
digitalmars.D - First Impressions!

reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
Hi,

I've been using D for a personal project for about two weeks now 
and just thought I'd share my initial impression just in case 
it's useful! I like feedback on things I do, so I just assume 
others do too. Plus my opinion is the best on the internet! You
will see (hopefully the sarcasm is obvious otherwise I'll just 
appear pompous). It would probably be better if I did a 
retrospective after my project is completed, but with life who 
knows if that will happen. I could lose interest or something and 
not finish it. And then you guys wouldn't know my opinion. I 
can't allow that.

I'll start off by saying I like the overall experience. I come 
from a C# and C++ background with a little bit of C mixed in. For 
the most part though, I work with C#, SQL and web technologies on 
a day to day basis. I did do a three year stint working with 
C/C++ (mostly C++), but I never really enjoyed it much. C++ is 
overly verbose, overly complicated, overly littered with poor 
legacy decisions, and too error prone. C# on the other hand has 
for the most part been a delight. The only problem is I don't 
find it to be the best when it comes to generative programming. 
C# can do some generative programming with its generics, but for 
the most part they've always struck me as specialized for 
container types, and doing anything remotely outside that purpose 
takes a fair bit of cleverness. I'm sick of being clever
in that aspect.

So here are some impressions good and bad:

+ Porting straight C# seems pretty straightforward. Even some of 
the .NET framework, like files and unicode, have fairly direct 
counterparts in D.

+ D code so far is pushing me towards more "flat" code (for a 
lack of a better way to phrase it) and so far that has helped 
tremendously when it comes to readability. C# is kind of the 
opposite. With its namespace -> class -> method structure coupled 
with lock, using, etc., you tend to do a lot of nesting. You are 
generally 3 '{' in before any true logic even begins. Then couple 
that with try/catch, IDisposable/using, locking, and then 
if/else, it can get quite chaotic very easily. So right away, I 
saw my C# code actually appear more readable when I translated it 
and I think it has to do with the flatness. I'm not sure if that 
opinion will hold when I delve into 'static if' a little more, 
but so far my uses of it haven't really dampened that opinion.
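So far 'static if' seems to preserve that flatness: it is resolved at compile time and introduces no scope of its own, so the untaken branch simply doesn't exist in the generated code. A minimal sketch (the function and its name are invented purely for illustration):

```d
// static if is evaluated at compile time; the branch that isn't
// taken disappears entirely, and no extra runtime nesting level
// or scope is introduced.
T clampToByte(T)(T value) {
    static if (T.sizeof > 1) {
        if (value > 255)
            value = 255;
    }
    return value;
}

void main() {
    assert(clampToByte(300) == 255);             // int path: clamped
    assert(clampToByte(cast(ubyte) 200) == 200); // ubyte path: branch compiled out
}
```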

+ Visual D. It might be that I had low expectations of it, 
because I read on the internet that D's tooling was poor (and 
nothing is ever wrong on the internet); however, the combination of 
Visual D and DMD actually exceeded my expectations. I've been 
quite happy with it. It was relatively easy to set up and worked 
as I would expect it to work. It lets me debug, add breakpoints, 
and does the basic syntax highlighting I would expect. It could 
have a few other features, but for a project that is not 
corporate backed, it was really above what I could have asked for.

+ So far, compiling is fast. And from what I hear it will stay 
fast. A big motivator. The one commercial C++ project I worked on 
was a beast and could take an hour+ to compile if you needed to 
compile something fundamental. C# is fairly fast, so I've grown 
accustomed to not having to go to the bathroom, get a drink, 
etc., before returning to find out I'm only on the linking step. 
I'm used to this: if the build takes more than about ten seconds, 
I prep myself for an error to deal with. I want that to remain.

- Some of the errors from DMD are a little strange. I don't want 
to crap on this too much, because for the most part it's fine. 
However, occasionally it throws errors where I still can't work 
out why THAT is the error it gave me. Some of you may have seen my 
question in the "Learn" forum about not knowing to use static in 
an embedded class, but the error was the following:

Error: 'this' is only defined in non-static member functions

I'd say the errors so far are a step above some of the cryptic 
stuff C++ can throw at you (however, I haven't delved that deeply 
into D templates yet, so don't hold me to this), but I'd put 
their quality somewhere between C#'s and C++'s, with C# being the 
ideal.
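For anyone who hits the same message: the cause is that a non-static nested class carries a hidden reference to its enclosing instance, and that reference is the 'this' the error is talking about. A small sketch (class names invented for illustration):

```d
class Outer {
    int x;

    // A non-static nested class holds a hidden reference to the
    // enclosing Outer instance -- the 'this' from the error message.
    class Inner {
        int outerX() { return x; } // reads the outer object's field
    }

    // A static nested class has no outer pointer, which matches
    // what C# nested classes do by default.
    static class Standalone { }
}

void main() {
    // `new Outer.Inner` alone would fail: there is no Outer 'this'.
    auto o = new Outer;
    o.x = 5;
    auto i = o.new Inner;           // ok: explicitly tied to 'o'
    assert(i.outerX() == 5);

    auto s = new Outer.Standalone;  // ok: no outer instance needed
    assert(s !is null);
}
```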

+ The standard library so far is really good. Nullable worked as 
I thought it should. I just guessed a few of the methods based on 
what I had seen at that point and got it right. So it appears 
consistent and intuitive. I also like the fact I can peek at the 
code and understand it by just reading it. Unlike with C++ where 
I still don't know how some of the stuff is *really* implemented. 
The STL almost seems like it's written in a completely different 
language than the stuff it enables. For instance, I figured out 
how to do packages by seeing it in Phobos.
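As a sketch of the kind of guessing that worked, std.typecons.Nullable behaves about how a C# programmer would expect:

```d
import std.typecons : Nullable;

void main() {
    Nullable!int n;   // starts out empty ("null")
    assert(n.isNull);

    n = 42;           // assigning gives it a value
    assert(!n.isNull);
    assert(n.get == 42);

    n.nullify();      // back to the empty state
    assert(n.isNull);
}
```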

- ...however, where are all of the collections? No Queue? No 
Stack? No HashTable? I've read that it's not a big focus because 
some of the built in stuff *can* behave like those things. The C# 
project I'm porting utilizes queues and specifically C#'s 
Dictionary<> quite a bit, so I'm not looking forward to having to 
hand-roll my own or use things that aren't fundamentally them. 
This is definitely the biggest negative I've come across. I want 
a queue, not something that *can* behave as a queue. I definitely 
expected more from a language that is this old.

+ Packages and 'public import'. I really think it's useful to 
forward imports/using statements. It kind of packages everything 
that is required to use that thing in your namespace/package 
together. So you don't have to include a dozen things. C and C++ 
can do this with their #includes, but in an unsatisfactory way. At 
least in my opinion.
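For anyone unfamiliar, the Phobos idiom looks roughly like this; the file layout and module names here are hypothetical, just to show the shape:

```d
// mylib/package.d  (hypothetical layout)
//
// With this file in place, `import mylib;` pulls in both
// submodules at once, much like a single C# `using` bringing in
// a whole namespace.
module mylib;

public import mylib.parser;
public import mylib.lexer;
```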

- Modules. I like modules better than #include, but I don't like 
them better than C#'s namespaces. Specifically I don't like how 
there is this gravity that kind of pulls me to associate a module 
with a file. It appears you don't have to, because I can do the 
package thing, but whenever I try to do things outside that one 
idiom I end up in a soup of errors. I'm sure I'm just not used to 
it, but so far it's been a little dissatisfying. Sometimes I want 
where it is physically on my file system to be different from how 
I include it in other source files. To me, C#'s namespaces are 
really the standard to beat or meet.

+ Unit tests. Finally built in unit tests. Enough said here. If 
the lack of collections was the biggest negative, this is the 
biggest positive. I would like to enable them at build time if 
possible though.
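For reference, the build-time switch exists: unittest blocks sit right next to the code they check and are compiled and run when building with `dmd -unittest` (the helper function here is invented for illustration):

```d
import std.string : strip;

/// Returns true when the line is blank after trimming whitespace.
bool isBlank(string line) {
    return line.strip.length == 0;
}

// Compiled and executed only when building with -unittest;
// in a normal release build it costs nothing.
unittest {
    assert(isBlank("   "));
    assert(isBlank(""));
    assert(!isBlank(" x "));
}
```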

- Attributes. I had another post in the Learn forum about 
attributes which was unfortunate. At first I was excited because 
it seems like on the surface it would help me write better code, 
but it gets a little tedious and tiresome to have to remember to 
decorate code with them. It seems like most of them should have 
been the defaults. I would have preferred if the compiler helped 
me and reminded me. I asked if there was a way to enforce them 
globally, which I guess there is, but I guess there's also not a 
way to turn some of them off afterwards. A bit unfortunate. But 
at least I can see some solutions to this.
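One mitigation worth mentioning: attributes can be applied wholesale with the colon syntax, so each declaration doesn't need its own decoration (the functions are invented for illustration):

```d
// Everything after a colon-style attribute list gets those
// attributes, so individual declarations don't need decorating.
@safe pure nothrow @nogc:

int twice(int x)  { return 2 * x; }
int square(int x) { return x * x; }

// A GC allocation or an unsafe cast anywhere below the colon
// would now be rejected at compile time.
```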

- The defaults for primitives seem off. They seem to encourage 
errors. I don't think that is the best design decision even if it 
encourages the errors to be caught as quickly as possible. I 
think the better decision would be to not have the errors occur. 
When I asked about this, there seemed to be a disconnect between 
the spec and the implementation. The spec says a declaration 
should error if not explicitly set, but the implementation just 
initializes them to something that is likely to error, like NaN 
for floats, which I would have expected to be 0 based on prior 
experience with other languages.
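The defaults as implemented today, for anyone checking:

```d
void main() {
    int i;    // integers default to 0
    float f;  // floats default to float.nan
    char c;   // chars default to 0xFF, an invalid UTF-8 byte

    assert(i == 0);
    assert(f != f);    // NaN is the only value not equal to itself
    assert(c == 0xFF);
}
```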

- Immutable. I'm not sure I fully understand it. On the surface 
it seemed like const but transitive. I tried having a method 
return an immutable value, but when I used it in my unit test I 
got some weird errors about objects not being able to return 
immutable (I forget the exact error...apologies). I refactored to 
use const, and it all worked as I expected, but I don't get why 
the immutable didn't work. I was returning a value type, so I 
don't see why something like assert(object.errorCount == 0) would 
have triggered errors. But it did. I have a set of classes that 
keep track of snapshots of specific counts that seems like a 
perfect fit for immutable (because I don't want those 'snapshots' 
to change...like ever), but I kept getting errors trying to use 
it like const. The type string seems to be an immutable(char[]) 
which works exactly the way I was expecting, and I haven't run 
into problems, so I'm not sure what the problem was. I'm just 
more confused knowing that string works, but what I did didn't.
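A guess at what a working version of the snapshot idea looks like: since the counts form a plain value type with no pointers, a copy converts to immutable implicitly (all names invented for illustration):

```d
struct Snapshot {
    int errorCount;
    int warningCount;
}

class Tracker {
    private int errors, warnings;

    // A value type with no mutable indirections converts to
    // immutable implicitly, so returning one from a const method
    // is fine:
    immutable(Snapshot) snapshot() const {
        return Snapshot(errors, warnings);
    }
}

void main() {
    auto t = new Tracker;
    immutable snap = t.snapshot();
    assert(snap.errorCount == 0);
    // snap.errorCount = 1; // error: cannot modify immutable expression
}
```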

+- Unicode support is good, although I think D's string type 
should probably have been UTF-16 by default, especially 
considering the utf module states:

"UTF character support is restricted to '\u0000' <= character <= 
'\U0010FFFF'."

Seems like the natural fit for me. Plus, for the vast majority of 
use cases I am pretty much guaranteed one char per code point. Not the 
biggest issue in the world and maybe I'm just being overly 
critical here.
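For completeness, all three encodings are first-class in D; string just happens to be the UTF-8 one, and wstring is the closest match to C#'s System.String:

```d
void main() {
    string  s = "héllo"; // immutable(char)[]  -- UTF-8
    wstring w = "héllo"; // immutable(wchar)[] -- UTF-16, closest to C#
    dstring d = "héllo"; // immutable(dchar)[] -- UTF-32, one code point per element

    // .length counts code units, not characters:
    assert(s.length == 6); // 'é' is two UTF-8 bytes
    assert(w.length == 5);
    assert(d.length == 5);
}
```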

+ Templates seem powerful. I've only fiddled thus far, but I 
don't think I've quite comprehended their usefulness yet. It will 
probably take me some time to figure out how to wield them 
effectively. One thing I accidentally stumbled upon that I liked 
was that I could simulate inheritance in structs with them, by 
using the mixin keyword. That was cool, and I'm not even sure if 
that is what they were really meant to enable.
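The struct trick mentioned above looks roughly like this (the types are made up for illustration): a mixin template pastes its members into whatever scope mixes it in, which lets structs share fields and methods without class inheritance.

```d
mixin template Counted() {
    int count;
    void bump() { ++count; }
}

// Both structs get their own copy of the mixed-in members.
struct Apples  { mixin Counted; }
struct Oranges { mixin Counted; }

void main() {
    Apples a;
    a.bump();
    a.bump();
    assert(a.count == 2);
}
```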

So those are just some of my thoughts. Tell me why I'm wrong :P
Nov 27
next sibling parent docandrew <x x.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - ...however, where are all of the collections? No Queue? No 
 Stack? No HashTable? I've read that it's not a big focus 
 because some of the built in stuff *can* behave like those 
 things. The C# project I'm porting utilizes queues and a 
 specifically C#'s Dictionary<> quite a bit, so I'm not looking 
 forward to having to hand roll my own or use something that 
 aren't fundamentally them. This is definitely the biggest 
 negative I've come across. I want a queue, not something that 
 *can* behave as a queue. I definitely expected more from a 
 language that is this old.
Good feedback overall, thanks for checking it out. You're not wrong, but some of the design decisions that feel strange to newcomers at first have been heavily-debated, generally well-reasoned, and just take some time to get used to. That sounds like a cop-out, but stick with it and I think you'll find that a lot of the decisions make sense - see the extensive discussion on NaN-default for floats, for example.

Just one note about the above comment though: the std.container.dlist doubly-linked list has methods that you can use to put together stacks and queues easily: https://dlang.org/phobos/std_container_dlist.html

Also, D's associative arrays implement a hash map (https://dlang.org/spec/hash-map.html), which I think should take care of most of C#'s Dictionary functionality.

Anyhow, D is a big language (for better and sometimes worse), so it's easy to miss some of the good nuggets buried within the spec/library.

-Doc
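A quick sketch of both suggestions (keys and values invented for illustration):

```d
import std.container.dlist : DList;

void main() {
    // Queue via DList: insert at the back, remove from the front.
    DList!int q;
    q.insertBack(1);
    q.insertBack(2);
    q.insertBack(3);
    assert(q.front == 1);
    q.removeFront();
    assert(q.front == 2);
    // (For a stack, insertBack/removeBack on the same type works.)

    // C#'s Dictionary<K,V> maps to the built-in associative array:
    int[string] ages = ["alice": 30];
    ages["bob"] = 25;
    assert("bob" in ages);
    assert(ages["alice"] == 30);
}
```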
Nov 27
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 28/11/2017 3:01 AM, A Guy With an Opinion wrote:
 Hi,
 
 I've been using D for a personal project for about two weeks now and 
 just thought I'd share my initial impression just in case it's useful! I 
 like feedback on things I do, so I just assume others do to. Plus my 
 opinion is the best on the internet! You will see (hopefully the sarcasm 
 is obvious otherwise I'll just appear pompous). It would probably be 
 better if I did a retrospective after my project is completed, but with 
 life who knows if that will happen. I could lose interest or something 
 and not finish it. And then you guys wouldn't know my opinion. I can't 
 allow that.
 
 I'll start off by saying I like the overall experience. I come from a C# 
 and C++ background with a little bit of C mixed in. For the most part 
 though, I work with C#, SQL and web technologies on a day to day basis. 
 I did do a three year stint working with C/C++ (mostly C++), but I never 
 really enjoyed it much. C++ is overly verbose, overly complicated, 
 overly littered with poor legacy decisions, and too error prone. C# on 
 the other hand has for the most part been a delight. The only problem is 
 I don't find it to be the best when it comes to generative programming. 
 C# can do some generative programming with it's generics, but for the 
 most part it's always struck me as more specialized for container types 
 and to do anything remotely outside of it's purpose takes a fair bit of 
 cleverness. I'm sick of being clever in that aspect.
 
 So here are some impressions good and bad:
 
 + Porting straight C# seems pretty straight forward. Even some of the 
 .NET framework, like files and unicode, have fairly direct counterparts 
 in D.
 
 + D code so far is pushing me towards more "flat" code (for a lack of a 
 better way to phrase it) and so far that has helped tremendously when it 
 comes to readability. C# kind is the opposite. With it's namespace -> 
 class -> method coupled with lock, using, etc...you tend to do a lot of 
 nesting. You are generally 3 '{' in before any true logic even begins. 
 Then couple that with try/catch, IDisposable/using, locking, and then 
 if/else, it can get quite chaotic very easily. So right away, I saw my 
 C# code actually appear more readable when I translated it and I think 
 it has to do with the flatness. I'm not sure if that opinion will hold 
 when I delve into 'static if' a little more, but so far my uses of it 
 haven't really dampened that opinion.
 
 + Visual D. It might be that I had poor expectations of it, because I 
 read D's tooling was poor on the internet (and nothing is ever wrong on 
 the internet), however, the combination of Visual D and DMD actually 
 exceeded my expectations. I've been quite happy with it. It was 
 relatively easy to set up and worked as I would expect it to work. It 
 lets me debug, add breakpoints, and does the basic syntax highlighting I 
 would expect. It could have a few other features, but for a project that 
 is not corporate backed, it was really above what I could have asked for.
 
 + So far, compiling is fast. And from what I hear it will stay fast. A 
 big motivator. The one commercial C++ project I worked on was a beast 
 and could take an hour+ to compile if you needed to compile something 
 fundamental. C# is fairly fast, so I've grown accustomed to not having 
 to go to the bathroom, get a drink, etc...before returning to find out 
 I'm on the linking step. I'm used to if it doesn't take less than ten 
 seconds (probably less) then I prep myself for an error to deal with. I 
 want this to remain.
 
 - Some of the errors from DMD are a little strange. I don't want to crap 
 on this too much, because for the most part it's fine. However 
 occasionally it throws errors I still can't really work out why THAT is 
 the error it gave me. Some of you may have saw my question in the 
 "Learn" forum about not knowing to use static in an embedded class, but 
 the error was the following:
 
 Error: 'this' is only defined in non-static member functions
 
 I'd say the errors so far are above some of the cryptic stuff C++ can 
 throw at you (however, I haven't delved that deeply into D templates 
 yet, so don't hold me to this yet), but in terms of quality I'd put it 
 somewhere between C# and C++ in quality. With C# being the ideal.
 
 + The standard library so far is really good. Nullable worked as I 
 thought it should. I just guessed a few of the methods based on what I 
 had seen at that point and got it right. So it appears consistent and 
 intuitive. I also like the fact I can peek at the code and understand it 
 by just reading it. Unlike with C++ where I still don't know how some of 
 the stuff is *really* implemented. The STL almost seems like it's 
 written in a completely different language than the stuff it enables. 
 For instance, I figured out how to do packages by seeing it in Phobos.
 
 - ...however, where are all of the collections? No Queue? No Stack? No 
 HashTable? I've read that it's not a big focus because some of the built 
 in stuff *can* behave like those things. The C# project I'm porting 
 utilizes queues and a specifically C#'s Dictionary<> quite a bit, so I'm 
 not looking forward to having to hand roll my own or use something that 
 aren't fundamentally them. This is definitely the biggest negative I've 
 come across. I want a queue, not something that *can* behave as a queue. 
 I definitely expected more from a language that is this old.
It's on our TODO list. Allocators need to come out of experimental, and we need some form of RC, before we tackle it again.

In the meantime, https://github.com/economicmodeling/containers is pretty good.
 + Packages and 'public import'. I really think it's useful to forward 
 imports/using statements. It kind of packages everything that is 
 required to use that thing in your namespace/package together. So you 
 don't have to include a dozen things. C and C++ can do this with it's 
 #includes, but in an unsatisfactory way. At least in my opinion.
 
 - Modules. I like modules better than #include, but I don't like them 
 better than C#'s namespaces. Specifically I don't like how there is this 
 gravity that kind of pulls me to associate a module with a file. It 
 appears you don't have to, because I can do the package thing, but 
 whenever I try to do things outside that one idiom I end up in a soup of 
 errors. I'm sure I'm just not use to it, but so far it's been a little 
 dissatisfying. Sometimes I want where it is physically on my file system 
 to be different from how I include it in other source files. To me, C#'s 
 namespaces are really the standard to beat or meet.
Modules are a fairly well understood concept from the ML family. You're just not used to them yet :)

Keep in mind we do have namespaces for binding to C++ code, and I haven't heard of anybody abusing them just to get namespaces. They tend to be ugly hacks with ambiguity running through them. Of course, I never had to use them in C++, so I'm sure somebody can give you some war stories with them ;)
 + Unit tests. Finally built in unit tests. Enough said here. If the lack 
 of collections was the biggest negative, this is the biggest positive. I 
 would like to enable them at build time if possible though.
I keep saying it, if you don't have unit tests built in, you don't care about code quality!
 - Attributes. I had another post in the Learn forum about attributes 
 which was unfortunate. At first I was excited because it seems like on 
 the surface it would help me write better code, but it gets a little 
 tedious and tiresome to have to remember to decorate code with them. It 
 seems like most of them should have been the defaults. I would have 
 preferred if the compiler helped me and reminded me. I asked if there 
 was a way to enforce them globally, which I guess there is, but I guess 
 there's also not a way to turn some of them off afterwards. A bit 
 unfortunate. But at least I can see some solutions to this.
You don't need to bother with them for most code :)
 - The defaults for primitives seem off. They seem to encourage errors. I 
 don't think that is the best design decision even if it encourages the 
 errors to be caught as quickly as possible. I think the better decision 
 would be to not have the errors occur. When I asked about this, there 
 seemed to be a disassociation between the spec and the implementation. 
 The spec says a declaration should error if not explicitly set, but the 
 implementation just initializes them to something that is likely to 
 error. Like NaN for floats which I would have thought would have been 0 
 based on prior experiences with other languages.
Doesn't mean the other languages are right either.
 - Immutable. I'm not sure I fully understand it. On the surface it 
 seemed like const but transitive. I tried having a method return an 
 immutable value, but when I used it in my unit test I got some weird 
 errors about objects not being able to return immutable (I forget the 
 exact error...apologies). I refactored to use const, and it all worked 
 as I expected, but I don't get why the immutable didn't work. I was 
 returning a value type, so I don't see why passing in 
 assert(object.errorCount == 0) would have triggered errors. But it did. 
 I have a set of classes that keep track of snapshots of specific counts 
 that seems like a perfect fit for immutable (because I don't want those 
 'snapshots' to change...like ever), but I kept getting errors trying to 
 use it like const. The type string seems to be an immutable(char[]) 
 which works exactly the way I was expecting, and I haven't ran into 
 problems, so I'm not sure what the problem was. I'm just more confused 
 knowing that string works, but what I did didn't.
 
 +- Unicode support is good. Although I think D's string type should have 
 probably been utf16 by default. Especially considering the utf module 
 states:
 
 "UTF character support is restricted to '\u0000' <= character <= 
 '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority of use 
 cases I am pretty guaranteed a char = codepoint. Not the biggest issue 
 in the world and maybe I'm just being overly critical here.
UTF-16 uses a lot more memory than UTF-8. If anything, I would argue for UTF-32 over UTF-16. If you need a wstring, use a wstring!

Be aware that Microsoft is alone in thinking UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode.
 + Templates seem powerful. I've only fiddled thus far, but I don't think 
 I've quite comprehended their usefulness yet. It will probably take me 
 some time to figure out how to wield them effectively. One thing I 
 accidentally stumbled upon that I liked was that I could simulate 
 inheritance in structs with them, by using the mixin keyword. That was 
 cool, and I'm not even sure if that is what they were really meant to 
 enable.
And that is where we use alias this instead. I do wish it were fully implemented, though (multiple alias this).

Welcome!
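For comparison, the alias this version of struct "inheritance" looks like this (types invented for illustration):

```d
struct Base {
    int id;
    string describe() { return "base"; }
}

struct Derived {
    Base base;
    alias base this; // Derived subtypes Base: Base's members are
                     // reachable directly, and Derived converts
                     // to Base implicitly.
    string extra() { return "extra"; }
}

void main() {
    Derived d;
    d.id = 7;                   // forwarded to d.base.id
    assert(d.describe() == "base");

    Base b = d;                 // implicit conversion via alias this
    assert(b.id == 7);
}
```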
Nov 27
next sibling parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole 
wrote:
 Its on our TODO list.

 Allocators need to come out of experimental and some form of RC 
 before we tackle it again.

 In the mean time https://github.com/economicmodeling/containers 
 is pretty good.
That's good to hear.
 I keep saying it, if you don't have unit tests built in, you 
 don't care about code quality!
I just like not having to create a throwaway project to test my code. It's nice to use unit tests for what I used to create console apps for, and then they forever ensure my code works the same!
 You don't need to bother with them for most code :)
That seems to be what people here are saying, but that seems so sad...
 Doesn't mean the other languages are right either.
That is true, but I'm still unconvinced that making the person's program likely to error is better than initializing a number to 0. Zero is such a fundamental default for so many things. And it would be consistent with the other number types.
 If you need a wstring, use a wstring!

 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
I do come from that world, so there is a chance I'm just comfortable with it.
Nov 27
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the person's program 
 likely to error is better than initializing a number to 0. Zero is such a 
 fundamental default for so many things. And it would be consistent with 
 the other number types.
basically, default initializers aren't meant to give a "usable value", they're meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing:

int a;
a += 42;

is still bad code, even if you know that `a` is guaranteed to be zero.

int a = 0;
a += 42;

is the "right" way to write it. if you look at default values from this PoV, you'll see that NaN makes more sense than zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Nov 27
parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:
 A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the 
 person's program likely to error is better than initializing a 
 number to 0. Zero is such a fundamental default for so many 
 things. And it would be consistent with the other number types.
basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is that it was undefined behavior. A lot of language features aim to make things well defined and less verbose. Once a language matures, that's what a big portion of its newer features become: less verbose shortcuts for commonly done things.

I agree it's important that it's well defined; I'm just thinking it should be a value that someone actually wants some notable fraction of the time, not something no one wants ever. I could be persuaded, but so far I'm not drinking the Kool-Aid on that. It's not the end of the world, I was just confused when my float was NaN.
Nov 27
next sibling parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
Opinion wrote:
 On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:
 A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the 
 person's program likely to error is better than initializing 
 a number to 0. Zero is such a fundamental default for so many 
 things. And it would be consistent with the other number 
 types.
basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is it was undefined behavior. A lot of language features aim to make things well defined and have less verbose representations. Once a language matures that's what a big portion of their newer features become. Less verbose shortcuts of commonly done things. I agree it's important that it's well defined, I'm just thinking it should be a value that someone actually wants some notable fraction of the time. Not something no one wants ever. I could be persuaded, but so far I'm not drinking the koolaid on that. It's not the end of the world, I was just confused when my float was NaN.
Also, C and C++ don't just have undefined behavior; sometimes they have inconsistent behavior. Sometimes int a; actually is set to 0.
Nov 27
next sibling parent codephantom <me noyb.com> writes:
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an 
Opinion wrote:
 Also, C and C++ didn't just have undefined behavior, sometimes 
 it has inconsistent behavior. Sometimes int a; is actually set 
 to 0.
set to?
Nov 27
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an 
Opinion wrote:
 On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
 Opinion wrote:
 [...]
Also, C and C++ didn't just have undefined behavior, sometimes it has inconsistent behavior. Sometimes int a; is actually set to 0.
It's only auto variables that are undefined. Statics and file-scope variables (aka globals) are defined (zero-initialized).
Nov 28
prev sibling next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
A Guy With an Opinion wrote:

 Eh...I still don't agree.
anyway, it is something that won't be changed, 'cause there may be code that relies on the current default values. i'm not really trying to change your mind, i just tried to give the rationale behind the choice. that's why `char.init` is 255 too, not zero.

still, explicit variable initialization looks better to me. with default init, it is hard to say if the author just forgot to initialize a variable and it happens to work, or if he knows about the default value and used it. and with explicit init, the reader doesn't have to guess what the default value is.
Nov 27
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
Opinion wrote:
 On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:
 A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the 
 person's program likely to error is better than initializing 
 a number to 0. Zero is such a fundamental default for so many 
 things. And it would be consistent with the other number 
 types.
basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is it was undefined behavior. A lot of language features aim to make things well defined and have less verbose representations. Once a language matures that's what a big portion of their newer features become. Less verbose shortcuts of commonly done things. I agree it's important that it's well defined, I'm just thinking it should be a value that someone actually wants some notable fraction of the time. Not something no one wants ever. I could be persuaded, but so far I'm not drinking the koolaid on that. It's not the end of the world, I was just confused when my float was NaN.
Just a little anecdote from a maintainer of a legacy project in C. My predecessors on that project had the habit of systematically initialising every auto-declared variable at the beginning of a function. The code base was initiated in the early '90s and written by people who were typical BASIC programmers, so functions were very often hundreds of lines long and they all started with a lot of declarations.

In my years of reviewing that code, I was really surprised by how often I found bugs because the variables had been wrongly initialised. By initialising with 0 or NULL, the data-flow pass was essentially suppressed from the start, so it could not detect when variables were used before they had been properly populated with the right values the functionality required. These kinds of bugs were very subtle.

To make it short: 0 is an arbitrary number that often is the right value, but when it isn't, it can be a pain to detect that it was the wrong value.
Nov 28
prev sibling parent reply Kagamin <spam here.lot> writes:
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole 
wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
Nov 30
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody 
 else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size. 2. The code doesn't work well with C. C doesn't even have a UCS2 type. 3. There's no reasonable way to audit the code to see if it handles surrogate pairs correctly. Surrogate pairs occur only rarely, so the code is never tested for it, and the bugs may remain latent for many, many years. With UTF8, multibyte code points are much more common, so bugs are detected much earlier.
Dec 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was awesome.
 Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat. But then again, in non-Latin locales you'd generally store your strings separately of the executable (usually in l10n files), so this may not be that big an issue. But the blanket statement "Most strings are in ASCII" is not correct. T -- Bare foot: (n.) A device for locating thumb tacks on the floor.
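The size trade-off being debated can be measured directly in D; a sketch comparing an ASCII string and a BMP CJK string in both encodings (`.length` counts code units, so multiplying by the unit size gives bytes):

```d
void main()
{
    // ASCII: UTF-8 is half the size of UTF-16
    string  a8  = "hello";
    wstring a16 = "hello"w;
    assert(a8.length  * char.sizeof  == 5);  // 5 bytes
    assert(a16.length * wchar.sizeof == 10); // 10 bytes

    // CJK (BMP): 3 bytes per character in UTF-8, 2 in UTF-16
    string  c8  = "日本語";
    wstring c16 = "日本語"w;
    assert(c8.length  * char.sizeof  == 9); // 9 bytes, 150% of UTF-16
    assert(c16.length * wchar.sizeof == 6); // 6 bytes
}
```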
Dec 01
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/1/2017 3:16 PM, H. S. Teoh wrote:
 This is not true in Asia, esp. where the CJK block is extensively used.
 A CJK block character is 3 bytes in UTF-8, meaning that string sizes are
 150% of the UCS2 encoding.  If your code contains a lot of CJK text,
 that's a lot of bloat.
 
 But then again, in non-Latin locales you'd generally store your strings
 separately of the executable (usually in l10n files), so this may not be
 that big an issue. But the blanket statement "Most strings are in ASCII"
 is not correct.
Are you sure about that? I know that Asian languages will be longer in UTF-8. But how much data that programs handle is in those languages? The language of business, science, programming, aviation, and engineering is english. Of course, D itself is agnostic about that. The compiler, for example, accepts strings, identifiers, and comments in Chinese in UTF-16 format.
Dec 02
parent Jacob Carlborg <doob me.com> writes:
On 2017-12-02 11:02, Walter Bright wrote:

 Are you sure about that? I know that Asian languages will be longer in 
 UTF-8. But how much data that programs handle is in those languages? The 
 language of business, science, programming, aviation, and engineering is 
 english.
Not necessarily. I've seen code in non-English languages, i.e. where the identifiers are non-English. But of course, most programming languages use English for keywords and built-in functions. -- /Jacob Carlborg
Dec 02
prev sibling next sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
 cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
That's true in theory; in practice it's not that severe, as the CJK languages never appear in isolation but are embedded in a lot of ASCII. You can read a case study here [1] which shows 106% for Simplified Chinese, 76% for Traditional Chinese, 129% for Japanese and 94% for Korean. These numbers are for pure text. Publish it on the web embedded in bloated HTML and there goes the size advantage of UTF-16.
 But then again, in non-Latin locales you'd generally store your 
 strings separately of the executable (usually in l10n files), 
 so this may not be that big an issue. But the blanket statement 
 "Most strings are in ASCII" is not correct.
False, in the sense that isolated pure text is rare and is generally delivered inside some file format, most times ASCII based like docx, odf, tmx, xliff, akoma ntoso etc... [1]: https://stackoverflow.com/questions/6883434/at-all-times-text-encoded-in-utf-8-will-never-give-us-more-than-a-50-file-size
Dec 02
parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 2 December 2017 at 10:35:50 UTC, Patrick Schluter 
wrote:
 On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 [...]
That's true in theory, in practice it's not that severe as the CJK languages are never isolated and appear embedded in a lot of ASCII. You can read here a case study [1] which shows 106% for Simplified Chinese, 76% for Traditional Chinese, 129% for Japanese and 94% for Korean. These numbers for pure text.
106% for Korean; I copied the wrong column. Traditional Chinese was smaller, probably because of whitespace.
 Publish it on the web embedded in bloated html and there goes 
 the size advantage of UTF-16

 [...]
Dec 02
prev sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
 cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
Yep, that's why five years back many of the major Chinese sites were still not using UTF-8:

http://xahlee.info/w/what_encoding_do_chinese_websites_use.html

That led that Chinese guy to also rant against UTF-8 a couple years ago:

http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html

Considering China buys more smartphones than the US and Europe combined, it's time people started recognizing their importance when it comes to issues like this:

https://www.statista.com/statistics/412108/global-smartphone-shipments-global-region/

Regarding the unique representation issue Jonathan brings up, I've heard people say that was to provide an easier path for legacy encodings, ie some used combining characters and others didn't, so Unicode chose to accommodate both so both groups would move to Unicode. It would be nice if the Unicode people spent their time pruning and regularizing what they have, rather than adding more useless stuff.

Speaking of which, completely agree with Walter and Jonathan that there's no need to add emoji and other such symbols to Unicode, should have never been added. Unicode is supposed to standardize long-existing characters, not promote marginal new symbols to characters. If there's a real need for it, chat software will figure out a way to do it, no need to add such symbols to the Unicode character set.
Dec 02
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 2 December 2017 at 22:16:09 UTC, Joakim wrote:
 On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
 cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
Yep, that's why five years back many of the major Chinese sites were still not using UTF-8: http://xahlee.info/w/what_encoding_do_chinese_websites_use.html
Summary Taiwan sites almost all use UTF-8. Very old ones still use BIG5. Mainland China sites mostly still use GBK or GB2312, but a few newer ones use UTF-8. Many top Japan, Korea, sites also use UTF-8, but some uses EUC (Extended Unix Code) variants. This probably means that UTF-8 might dominate in the future. mmmh
 That led that Chinese guy to also rant against UTF-8 a couple 
 years ago:

 http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html
A rant from someone reproaching a video for not providing reasons why utf-8 is good, while himself not providing any reasons why utf-8 is bad. I'm not denying the issues with utf-8, only that the ranter doesn't provide any useful info on what issues "Asians" encounter with it, besides legacy reasons (which are important, but do not enter into judging the technical quality of an encoding). Add to that that he advocates for GB18030, which is quite inferior to utf-8 except in the legacy-support area (here are some of the advantages of utf-8 that GB18030 does not possess: self-synchronization, algorithmic mapping of code points, error detection). If his only beef with utf-8 is the size of CJK text, then he shouldn't argue for UTF-32 as he seems to do at the end.
Dec 03
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/2/17 5:16 PM, Joakim wrote:
 Yep, that's why five years back many of the major Chinese sites were 
 still not using UTF-8:
 
 http://xahlee.info/w/what_encoding_do_chinese_websites_use.html
 
 That led that Chinese guy to also rant against UTF-8 a couple years ago:
 
 http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html
BTW has anyone been in contact with Xah Lee? Perhaps we could commission him to write some tutorial material for D. -- Andrei
Dec 04
parent Joakim <dlang joakim.fea.st> writes:
On Monday, 4 December 2017 at 21:23:51 UTC, Andrei Alexandrescu 
wrote:
 On 12/2/17 5:16 PM, Joakim wrote:
 Yep, that's why five years back many of the major Chinese 
 sites were still not using UTF-8:
 
 http://xahlee.info/w/what_encoding_do_chinese_websites_use.html
 
 That led that Chinese guy to also rant against UTF-8 a couple 
 years ago:
 
 http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html
BTW has anyone been in contact with Xah Lee? Perhaps we could commission him to write some tutorial material for D. -- Andrei
I traded email with him last summer, emailed you his email address just now.
Dec 04
prev sibling next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - Some of the errors from DMD are a little strange.
Yes, indeed, and many of them don't help much in finding the real source of your problem. I think improvements to dmd's error reporting would be the #1 productivity gain D could get right now.
 - ...however, where are all of the collections? No Queue? No 
 Stack? No HashTable?
I always say "meh" to that because any second year student can slap those together in... well, for a second year student, maybe a couple hours for the student, but after that you're looking at just a few minutes, especially leveraging D's built in arrays and associative arrays as your foundation. Sure, they'd be nice to have, but it isn't a dealbreaker in the slightest. Try turning Dictionary<string, string> into D's string[string], for example.
 Sometimes I want where it is physically on my file system to be 
 different from how I include it in other source files.
This is a common misconception, though one promoted by several of the tools: you don't actually need to match file system layout to modules. OK, sure, D does require one module == one file. But the file name and location is not actually tied to the import name you use in code. They can be anything, you just need to pass the list of files to the compiler so it can parse them and figure out the names.
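A concrete sketch of that decoupling (file and module names here are made up): the name that `import` sees comes from the `module` declaration, not from the path on disk:

```d
// File: src/whatever_name.d -- the file name need not match the module name
module myapp.collections;   // this is what 'import myapp.collections;' finds

struct Queue(T) { T[] data; }

// In another file (e.g. main.d):
//     import myapp.collections;
//     Queue!int q;
//
// Then hand both files to the compiler so it can resolve the names:
//     dmd main.d src/whatever_name.d
```

Only when you rely on the compiler to *locate* an imported module automatically (via -I paths) does the file layout have to mirror the module name.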
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate.
Yeah, of course, from my post there you know my basic opinion on them. I've written in more detail about them elsewhere and don't feel like it tonight, but I think they are a big failure right now.... but they could be fixed if we're willing to take a few steps (#0 improve the error messages, #1 add opposites to all of them, e.g. throws and gc, #2, change the defaults via a single declaration at the module level, #3 omg revel in how useful they are)
 - Immutable. I'm not sure I fully understand it. On the surface 
 it seemed like const but transitive.
const is transitive too. So the difference is really that `const` means YOU won't change it, whereas `immutable` means NOBODY will change it. What's important there is that to make something immutable, you need to prove to the compiler's satisfaction that nobody else can change it either. const/immutable in D isn't as common as in its family of languages (C++ notably), but when you do get to use it - at least once you get to know it - it is useful.
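The distinction can be made concrete in a few lines; a minimal sketch:

```d
void main()
{
    int x = 1;
    const(int)* cp = &x;  // const view: *we* can't write through cp...
    x = 2;                // ...but the data may still change underneath us
    assert(*cp == 2);

    immutable int y = 3;       // immutable: *nobody* can ever change y
    immutable(int)* ip = &y;   // fine: y really is immutable
    assert(*ip == 3);
    // immutable(int)* bad = &x; // error: x is mutable, no such guarantee
}
```

That's why making something immutable requires proving to the compiler that no mutable reference to it can exist.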
 I was returning a value type, so I don't see why passing in 
 assert(object.errorCount == 0) would have triggered errors.
Was the object itself immutable? I suspect you wrote something like this: immutable int errorCount() { return ...; } But this is a curious syntax... the `immutable` there actually applies to the *object*, not the return value! It means you can call this method on an immutable object (in fact, it means you MUST call it on an immutable object. const is the middle ground that allows you to call it on either) immutable(int) errorCount() { return ...; } note the parens, is how you apply it to the return value. Yes, this is kinda weird, and style guides tend to suggest putting the qualifiers after the argument list for the `this` thing instead of before... but the language allows it before, so it trips up a LOT of people like this.
 The type string seems to be an immutable(char[]) which works 
 exactly the way I was expecting,
It is actually `immutable(char)[]`. The parens are important here - it applies to the contents of the array, but not the array itself here.
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:
Note that it has UTF-16 built in as well, with almost equal support. Put `w` at the end of a literal: `"this literal is UTF-16"w` // notice the w after the " and you get utf16. It considers that to be `wstring` instead of `string`, but it works basically the same. If you are doing a lot of Windows API work, this is pretty useful!
 That was cool, and I'm not even sure if that is what they were 
 really meant to enable.
yes, indeed. plugging my book https://www.packtpub.com/application-development/d-cookbook i talk about much of this stuff in there
Nov 27
parent A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:24:46 UTC, Adam D. Ruppe wrote:
 immutable(int) errorCount() { return ...; }
I actually did try something like that, because I remembered seeing the parens around the string definition. I think at that point I was just so riddled with errors I just took a step back and went back to something I know. Just to make sure I wasn't going insane.
Nov 27
prev sibling next sibling parent reply Michael V. Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:

 + D code so far is pushing me towards more "flat" code (for a 
 lack of a better way to phrase it) and so far that has helped 
 tremendously when it comes to readability. C# kind is the 
 opposite. With it's namespace -> class -> method coupled with 
 lock, using, etc...you tend to do a lot of nesting. You are 
 generally 3 '{' in before any true logic even begins. Then 
 couple that with try/catch, IDisposable/using, locking, and 
 then if/else, it can get quite chaotic very easily. So right 
 away, I saw my C# code actually appear more readable when I 
 translated it and I think it has to do with the flatness. I'm 
 not sure if that opinion will hold when I delve into 'static 
 if' a little more, but so far my uses of it haven't really 
 dampened that opinion.
I come from a heavy C#/C++ background. I also *felt* this as well, but never really consciously thought about it until you mentioned it :-)
 - Some of the errors from DMD are a little strange. I don't 
 want to crap on this too much, because for the most part it's 
 fine. However occasionally it throws errors I still can't 
 really work out why THAT is the error it gave me. Some of you 
 may have saw my question in the "Learn" forum about not knowing 
 to use static in an embedded class, but the error was the 
 following:

 Error: 'this' is only defined in non-static member functions
Please submit things like this to the issue tracker. They are very easy to fix, and if I'm aware of them, I'll probably do the work. But, please provide a code example and offer a suggestion of what you would prefer it to say; it just makes things easier.
 - Modules. I like modules better than #include, but I don't 
 like them better than C#'s namespaces. Specifically I don't 
 like how there is this gravity that kind of pulls me to 
 associate a module with a file. It appears you don't have to, 
 because I can do the package thing, but whenever I try to do 
 things outside that one idiom I end up in a soup of errors. I'm 
 sure I'm just not use to it, but so far it's been a little 
 dissatisfying. Sometimes I want where it is physically on my 
 file system to be different from how I include it in other 
 source files. To me, C#'s namespaces are really the standard to 
 beat or meet.
I feel the same. I don't like that modules are tied to files; it seems like such an arbitrary limitation. We're not alone: https://youtu.be/6_xdfSVRrKo?t=353
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate. At first I was excited 
 because it seems like on the surface it would help me write 
 better code, but it gets a little tedious and tiresome to have 
 to remember to decorate code with them. It seems like most of 
 them should have been the defaults. I would have preferred if 
 the compiler helped me and reminded me. I asked if there was a 
 way to enforce them globally, which I guess there is, but I 
 guess there's also not a way to turn some of them off 
 afterwards. A bit unfortunate. But at least I can see some 
 solutions to this.
Yep. One of my pet peeves in D.
 - The defaults for primitives seem off. They seem to encourage 
 errors. I don't think that is the best design decision even if 
 it encourages the errors to be caught as quickly as possible. I 
 think the better decision would be to not have the errors 
 occur. When I asked about this, there seemed to be a 
 disassociation between the spec and the implementation. The 
 spec says a declaration should error if not explicitly set, but 
 the implementation just initializes them to something that is 
 likely to error. Like NaN for floats which I would have thought 
 would have been 0 based on prior experiences with other 
 languages.
Another one of my pet peeves in D. Though this post (http://forum.dlang.org/post/tcldaatzzbhjoamnvniu forum.dlang.org) made me realize we might be able to do something about that.
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:

 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."
See http://utf8everywhere.org/
 + Templates seem powerful. I've only fiddled thus far, but I 
 don't think I've quite comprehended their usefulness yet. It 
 will probably take me some time to figure out how to wield them 
 effectively. One thing I accidentally stumbled upon that I 
 liked was that I could simulate inheritance in structs with 
 them, by using the mixin keyword. That was cool, and I'm not 
 even sure if that is what they were really meant to enable.
Templates, CTFE, and mixins are gravy! and D's the only language I know of that has this symbiotic feature set.
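The struct-"inheritance" trick mentioned above is usually done with a mixin template; a minimal sketch (the names are made up for illustration):

```d
import std.conv : to;

// Members injected into any aggregate that mixes this in
mixin template Describable()
{
    int id;
    string describe() { return "id=" ~ id.to!string; }
}

struct Widget
{
    mixin Describable;  // Widget now has id and describe(),
                        // much like inheriting from a base type
    string name;
}

void main()
{
    Widget w;
    w.id = 42;
    w.name = "answer";
    assert(w.describe() == "id=42");
}
```

Unlike class inheritance there is no polymorphism here; the members are simply pasted into each struct at compile time.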
 So those are just some of my thoughts. Tell me why I'm wrong :P
I share much of your perspective. Thanks for the interesting read. Mike
Nov 27
parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:37:04 UTC, Michael V. Franklin 
wrote:
 Please submit things like this to the issue tracker.  They are 
 very easy to fix, and if I'm aware of them, I'll probably do 
 the work.  But, please provide a code example and
 offer a suggestion of what you would prefer it to say; it just 
 makes things easier.>
I'd be happy to submit an issue, but I'm not quite sure I'd be the best to determine an error message (at least not this early), mainly because I have no clue what it was yelling at me about. I only knew to add static because I told people my intentions and they suggested it. I guess having a non-statically marked nested class is a valid feature imported from the Java world; I'm just not as familiar with that specific feature of Java, so I have no idea what the text really had to do with anything. Maybe appending "if you meant to make a static class" would have been helpful. I fiddled with Rust a little too, and that's what they tend to do very well: make verbose error messages.
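For reference, a sketch of the Java-style feature in question, static vs. non-static nested classes, and where the confusing 'this' error comes from:

```d
class Outer
{
    int x = 1;

    class Inner            // non-static: carries a hidden reference to an Outer
    {
        int outerX() { return x; }  // can reach the enclosing object's members
    }

    static class Standalone {}      // static: no hidden outer reference
}

void main()
{
    auto o = new Outer;
    auto i = o.new Inner;           // must supply the enclosing instance
    assert(i.outerX() == 1);

    auto s = new Outer.Standalone;  // no Outer instance needed
    // auto bad = new Outer.Inner;  // error: Inner needs an Outer 'this'
}
```

Hence the fix of marking the nested class static when you don't need access to the enclosing instance.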
 We're not alone:  https://youtu.be/6_xdfSVRrKo?t=353
And he was so much better at articulating it than I was. Another C# guy though. :)
Nov 27
parent reply Michael V. Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an 
Opinion wrote:

 I'd be happy to submit an issue, but I'm not quite sure I'd be 
 the best to determine an error message (at least not this 
 early). Mainly because I have no clue what it was yelling at me 
 about. I only new to add static because I told people my 
 intentions and they suggested it. I guess having a non 
 statically marked class is a valid feature imported from Java 
 world.
If this was on the forum, please point me to it. I'll see if I can understand what's going on and do something about it. Thanks, Mike
Nov 27
parent A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 05:16:54 UTC, Michael V. Franklin 
wrote:
 On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an 
 Opinion wrote:

 I'd be happy to submit an issue, but I'm not quite sure I'd be 
 the best to determine an error message (at least not this 
 early). Mainly because I have no clue what it was yelling at 
me about. I only knew to add static because I told people my 
 intentions and they suggested it. I guess having a non 
 statically marked class is a valid feature imported from Java 
 world.
If this was on the forum, please point me to it. I'll see if I can understand what's going on and do something about it. Thanks, Mike
https://forum.dlang.org/thread/vcvlffjxowgdvpvjsijq forum.dlang.org
Nov 27
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/27/17 10:01 PM, A Guy With an Opinion wrote:
 Hi,
Hi Guy, welcome, and I wanted to say I was saying "me too" while reading much of your post. I worked on a C# based client/server for about 5 years, and the biggest thing I agree with you on is the generic programming. I was also using D at the time, and using generics felt like eating a superbly under-baked cake. A few points:
 - Some of the errors from DMD are a little strange. I don't want to crap 
 on this too much, because for the most part it's fine. However 
 occasionally it throws errors I still can't really work out why THAT is 
 the error it gave me. Some of you may have saw my question in the 
 "Learn" forum about not knowing to use static in an embedded class, but 
 the error was the following:
 
 Error: 'this' is only defined in non-static member functions
Yes, this is simply a bad error message. Many of our bad error messages come from something called "lowering", where one piece of code is converted to another piece of code, and then the error message happens on the converted code. So essentially you are getting errors on code you didn't write! They are more difficult to fix, since we can't change the real error message (it applies to real code as well), and the code that generated the lowered code is decoupled from the error. I think this is one of those cases.
 I'd say the errors so far are above some of the cryptic stuff C++ can 
 throw at you (however, I haven't delved that deeply into D templates 
 yet, so don't hold me to this yet), but in terms of quality I'd put it 
 somewhere between C# and C++ in quality. With C# being the ideal.
Once you use templates a lot, the error messages explode in cryptology :) But generally, you can get the gist of your errors if you can half-way decipher the mangling.
 - ...however, where are all of the collections? No Queue? No Stack? No 
 HashTable? I've read that it's not a big focus because some of the built 
 in stuff *can* behave like those things. The C# project I'm porting 
 utilizes queues and a specifically C#'s Dictionary<> quite a bit, so I'm 
 not looking forward to having to hand roll my own or use something that 
 aren't fundamentally them. This is definitely the biggest negative I've 
 come across. I want a queue, not something that *can* behave as a queue. 
 I definitely expected more from a language that is this old.
I haven't touched this in years, but it should still work pretty well (if you try it and it doesn't compile for some reason, please submit an issue there): https://github.com/schveiguy/dcollections It has more of a Java/C# feel than other libraries, including an interface hierarchy. That being said, Queue is just so easy to implement given a linked list, I never bothered :)
 + Unit tests. Finally built in unit tests. Enough said here. If the lack 
 of collections was the biggest negative, this is the biggest positive. I 
 would like to enable them at build time if possible though.
+1000

About the running of unit tests at build time, many people version their main function like this:

version(unittest) void main() {}
else int main(string[] args) // real declaration
{
    ...
}

This way, when you build with -unittest, you only run unit tests, and exit immediately. So enabling them at build time is quite easy.
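For completeness, a sketch of the built-in unittest blocks that pattern ends up running (errorCount here is a made-up function under test):

```d
// Hypothetical function under test
int errorCount(const int[] errors)
{
    return cast(int) errors.length;
}

unittest
{
    // Compiled in and executed only with -unittest,
    // e.g.: dmd -unittest -main app.d && ./app
    assert(errorCount([]) == 0);
    assert(errorCount([1, 2]) == 2);
}
```

The blocks sit right next to the code they test, which is a big part of why they actually get written.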
 - Attributes. I had another post in the Learn forum about attributes 
 which was unfortunate. At first I was excited because it seems like on 
 the surface it would help me write better code, but it gets a little 
 tedious and tiresome to have to remember to decorate code with them. It 
 seems like most of them should have been the defaults. I would have 
 preferred if the compiler helped me and reminded me. I asked if there 
 was a way to enforce them globally, which I guess there is, but I guess 
 there's also not a way to turn some of them off afterwards. A bit 
 unfortunate. But at least I can see some solutions to this.
If you are using more templates (and I use them the more I write D code), you will not have this problem. Templates infer almost all attributes.
 - Immutable. I'm not sure I fully understand it. On the surface it 
 seemed like const but transitive. I tried having a method return an 
 immutable value, but when I used it in my unit test I got some weird 
 errors about objects not being able to return immutable (I forget the 
 exact error...apologies). I refactored to use const, and it all worked 
 as I expected, but I don't get why the immutable didn't work. I was 
 returning a value type, so I don't see why passing in 
 assert(object.errorCount == 0) would have triggered errors. But it did. 
This is likely because of Adam's suggestion -- you were incorrectly declaring a function that returned an immutable like this: immutable T foo(); Where the immutable *doesn't* apply to the return value, but to the function itself. immutable applied to a function is really applying immutable to the 'this' reference.
 + Templates seem powerful. I've only fiddled thus far, but I don't think 
 I've quite comprehended their usefulness yet. It will probably take me 
 some time to figure out how to wield them effectively. One thing I 
 accidentally stumbled upon that I liked was that I could simulate 
 inheritance in structs with them, by using the mixin keyword. That was 
 cool, and I'm not even sure if that is what they were really meant to 
 enable.
Templates and generative programming is what hooks you on D. You will be spoiled when you work on other languages :) -Steve
Nov 28
parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven 
Schveighoffer wrote:
 This is likely because of Adam's suggestion -- you were 
 incorrectly declaring a function that returned an immutable 
 like this:
 immutable T foo();

 -Steve
That's exactly what it was I think. As I stated before, I tried to do immutable(T) but I was drowning in errors at that point that I just took a step back. I'll try to refactor it back to using immutable. I just honestly didn't quite know what I was doing obviously.
Nov 28
parent A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven 
Schveighoffer wrote:
 https://github.com/schveiguy/dcollections
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
 https://github.com/economicmodeling/containers
Thanks. I'll check both out. It's not that I don't want to write them, it's just I don't want to stop what I'm doing when I need them and write them. It takes me out of my thought process.
Nov 28
prev sibling next sibling parent Guillaume Piolat <contact spam.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 So those are just some of my thoughts. Tell me why I'm wrong :P
You are not supposed to come to this forum with well-balanced opinions and reasonable arguments. It's not colourful enough to be heard! Instead, make a dent in the universe.

Prepare your most impactful, most offensive statements to push your personal agenda of what your own system programming language would be like, if you had the stamina. Use doubtful analogies and references to languages with wildly different goals than D. Prepare to abuse the volunteers, and say how much you would dare to use D, if only it would do "just this one obvious change". Having this feature would make the BlobTech industry switch to D overnight!

And you haven't asked for any new feature, especially no new _syntax_ was demanded! I don't know, find anything: "It would be nice to have a shortcut syntax for when you want to add zero. Writing 0 + x is cumbersome, when +x would do it. It has the nice benefit of unifying unary and binary operators, and thus leads to a simplified implementation."

Do you realize the dangers of looking satisfied?
Nov 28
prev sibling next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate. At first I was excited 
 because it seems like on the surface it would help me write 
 better code, but it gets a little tedious and tiresome to have 
 to remember to decorate code with them. It seems like most of 
 them should have been the defaults. I would have preferred if 
 the compiler helped me and reminded me. I asked if there was a 
 way to enforce them globally, which I guess there is, but I 
 guess there's also not a way to turn some of them off 
 afterwards. A bit unfortunate. But at least I can see some 
 solutions to this.
Attributes were one of my biggest hurdles when working on my own projects. For example, it's a huge PITA when you have to add a debug writeln deep down in your call stack, and it ends up violating a bunch of function attributes further up. Thankfully, wrapping statements in debug {} allows you to ignore pure and @safe violations in that code if you compile with the flag -debug.

Also, you can apply attributes to your whole project by adding them to main:

void main(string[] args) @safe {}

Although this isn't recommended, as almost no program can be completely @safe. You can do it on a per-file basis by putting the attributes at the top like so:

@safe:
pure:
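Putting those together, a small sketch (the function is invented for illustration; the debug block's exemption from purity checks only matters when you build with -debug):

```d
@safe pure:   // file-level attributes apply to the declarations below

int triple(int x)
{
    // Normally calling the impure writeln from a pure function is an
    // error, but debug-conditional code is exempt from that check.
    debug { import std.stdio : writeln; writeln("x = ", x); }
    return 3 * x;
}

void main()
{
    assert(triple(2) == 6);
}
```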
Nov 28
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote:
 You can do it on a per-file basis by putting the attributes at 
 the top like so
That doesn't quite work since it doesn't descend into aggregates. And you can't turn most of them off.
Nov 28
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2017-11-28 17:24, Adam D. Ruppe wrote:

 That doesn't quite work since it doesn't descend into aggregates. And 
 you can't turn most of them off.
And if your project is a library. -- /Jacob Carlborg
Nov 28
prev sibling parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 16:24:56 UTC, Adam D. Ruppe wrote:
 That doesn't quite work since it doesn't descend into 
 aggregates. And you can't turn most them off.
I take it adding those inverse attributes is no trivial thing?
Nov 28
next sibling parent reply Michael V. Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an 
Opinion wrote:

 I take it adding those inverse attributes is no trivial thing?
It would require a DIP: https://github.com/dlang/DIPs

This DIP is related (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but I don't know what's happening with it.

Mike
Nov 28
parent reply Mike Parker <aldacron gmail.com> writes:
On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. Franklin 
wrote:

 This DIP is related 
 (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but 
 I don't know what's happening with it.
It's awaiting formal review. I'll move it forward when the formal review queue clears out a bit.
Nov 28
parent A Guy With a Question <aguywithanquestion gmail.com> writes:
On Tuesday, 28 November 2017 at 22:08:48 UTC, Mike Parker wrote:
 On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. 
 Franklin wrote:

 This DIP is related 
 (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) 
 but I don't know what's happening with it.
It's awaiting formal review. I'll move it forward when the formal review queue clears out a bit.
How well does Phobos play with it? I'm finding, for instance, that it's not playing too well with nothrow. Things throw, and I don't understand why.
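A common workaround for throwing Phobos calls is to catch inside the nothrow function; a quick sketch (the function name and fallback behavior are invented for illustration):

```d
import std.conv : to;

// Much of Phobos can throw (e.g. std.conv.to on bad input), so a
// nothrow caller has to handle the Exception itself:
int parseOr(string s, int fallback) nothrow
{
    try
        return s.to!int;
    catch (Exception e)     // nothrow only concerns Exceptions, not Errors
        return fallback;
}

void main()
{
    assert(parseOr("42", 0) == 42);
    assert(parseOr("oops", -1) == -1);
}
```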
Nov 29
prev sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an 
Opinion wrote:
 I take it adding those inverse attributes is no trivial thing?
Technically, it is extremely trivial. Politically, that's a different matter. There have been arguments before about the words or the syntax (is it "@gc" or "@nogc(false)", for example? tbh I think the latter is kinda elegant, but the former works too, I just want something that works) and the process (so much paperwork!) and all kinds of nonsense.
Nov 28
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote:

 you can apply attributes to your whole project by adding them 
 to main

 void main(string[] args) @safe {}

 Although this isn't recommended, as almost no program can be 
 completely @safe.
In fact, I believe it is. When you have something unsafe you can manually wrap it with @trusted. Same goes with nothrow, since you can catch everything thrown. But putting @nogc on main is of course not recommended except in special cases, and pure is completely out of the question.
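A quick sketch of the @trusted-wrapping idiom (the allocate wrapper is invented for illustration; the extern(C) declarations stand in for any @system API):

```d
// C functions are @system by default; declaring them directly here
// keeps the example self-contained.
extern (C) void* malloc(size_t);
extern (C) void free(void*);

// We manually vouch for this call's safety, as described above,
// so @safe code may use it.
@trusted void* allocate(size_t n)
{
    return malloc(n);
}

@safe void main()
{
    auto p = allocate(16);
    assert(p !is null);
    () @trusted { free(p); }();  // inline @trusted lambda for the cleanup
}
```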
Nov 30
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 +- Unicode support is good. Although I think D's string type should have 
 probably been utf16 by default. Especially considering the utf module states:
 
 "UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority of use cases I
am 
 pretty guaranteed a char = codepoint. Not the biggest issue in the world and 
 maybe I'm just being overly critical here.
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs.

https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java

As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32.

I recommend using UTF8.
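The code-unit counts are easy to see in D itself; a small sketch using an emoji that needs a surrogate pair in UTF-16:

```d
void main()
{
    string  s8  = "😂";   // UTF-8
    wstring s16 = "😂"w;  // UTF-16
    dstring s32 = "😂"d;  // UTF-32

    // One code point (U+1F602), but not one code unit in UTF-8/UTF-16:
    assert(s8.length  == 4);  // four UTF-8 code units
    assert(s16.length == 2);  // a surrogate pair
    assert(s32.length == 1);  // one code point
}
```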
Nov 30
next sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:
 On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:
 
 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority 
 of use cases I am pretty guaranteed a char = codepoint. Not 
 the biggest issue in the world and maybe I'm just being overly 
 critical here.
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8.
Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some starting off with the earlier UCS-2: https://en.m.wikipedia.org/wiki/UTF-16#Usage Not saying either is better, each has their flaws, just pointing out it's more than just Windows.
Nov 30
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 2:39 AM, Joakim wrote:
 Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some 
 starting off with the earlier UCS-2:
 
 https://en.m.wikipedia.org/wiki/UTF-16#Usage
 
 Not saying either is better, each has their flaws, just pointing out it's more 
 than just Windows.
I stand corrected.
Nov 30
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d 
wrote:
 On 11/30/2017 2:39 AM, Joakim wrote:
 Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some
 starting off with the earlier UCS-2:

 https://en.m.wikipedia.org/wiki/UTF-16#Usage

 Not saying either is better, each has their flaws, just pointing out
 it's more than just Windows.
I stand corrected.
I get the impression that the stuff that uses UTF-16 is mostly stuff that picked an encoding early on in the Unicode game and thought that they picked one that guaranteed that a code unit would be an entire character. Many of them picked UCS-2 and then switched later to UTF-16, but once they picked a 16-bit encoding, they were kind of stuck.

Others - most notably C/C++ and the *nix world - picked UTF-8 for backwards compatibility, and once it became clear that UCS-2 / UTF-16 wasn't going to cut it for a code unit representing a character, most stuff that went Unicode went UTF-8.

Language-wise, I think that most of the UTF-16 use is driven by the fact that Java went with UCS-2 / UTF-16, and C# followed them (both because they were copying Java and because the Win32 API had gone with UCS-2 / UTF-16). So, that's had a lot of influence on folks, though most others have gone with UTF-8 for backwards compatibility and because it typically takes up less space for non-Asian text. But the use of UTF-16 in Windows, Java, and C# does seem to have resulted in some folks thinking that wide characters mean Unicode, and narrow characters mean ASCII.

I really wish that everything would just go to UTF-8 and that UTF-16 would die, but that would just break too much code. And if we were willing to do that, I'm sure that we could come up with a better encoding than UTF-8 (e.g. getting rid of Unicode normalization as being a thing and never having multiple encodings for the same character), but _that_'s never going to happen.

- Jonathan M Davis
Nov 30
next sibling parent A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis 
wrote:
 On Thursday, November 30, 2017 03:37:37 Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 2:39 AM, Joakim wrote:
 Java, .NET, Qt, Javascript, and a handful of others use 
 UTF-16 too, some starting off with the earlier UCS-2:

 https://en.m.wikipedia.org/wiki/UTF-16#Usage

 Not saying either is better, each has their flaws, just 
 pointing out it's more than just Windows.
I stand corrected.
I get the impression that the stuff that uses UTF-16 is mostly stuff that picked an encoding early on in the Unicode game and thought that they picked one that guaranteed that a code unit would be an entire character.
I don't think that's true, though. Haven't you always been able to combine two code points into one visual representation (Ä for example)? To me it's still two characters to look for when going through the string, but the UI or text interpreter might choose to combine them.

So in certain domains, such as trying to visually represent the character, yes, a code point is not a character, if what you mean by character is the visual representation. But what we are referring to as a character can kind of morph depending on context. When you are running through the data in the algorithm behind the scenes, you care about the *information*, and therefore the code point. And then we are really just having a semantics battle if someone calls that a character.
 Many of them picked UCS-2 and then switched later to UTF-16, 
 but once they picked a 16-bit encoding, they were kind of stuck.

 Others - most notably C/C++ and the *nix world - picked UTF-8 
 for backwards compatibility, and once it became clear that 
 UCS-2 / UTF-16 wasn't going to cut it for a code unit 
 representing a character, most stuff that went Unicode went 
 UTF-8.
That's only because C used ASCII, and thus a char was a byte. UTF-8 is in line with this, so literally nothing needs to change to get pretty much the same behavior. It makes sense. With this in mind, it actually might make sense for D to use it.
Nov 30
prev sibling next sibling parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis 
wrote:
 On Thursday, November 30, 2017 03:37:37 Walter Bright via 
 Digitalmars-d wrote:
 Language-wise, I think that most of the UTF-16 is driven by the 
 fact that Java went with UCS-2 / UTF-16, and C# followed them 
 (both because they were copying Java and because the Win32 API 
 had gone with UCS-2 / UTF-16). So, that's had a lot of 
 influence on folks, though most others have gone with UTF-8 for 
 backwards compatibility and because it typically takes up less 
 space for non-Asian text. But the use of UTF-16 in Windows, 
 Java, and C# does seem to have resulted in some folks thinking 
 that wide characters means Unicode, and narrow characters 
 meaning ASCII.
 - Jonathan M Davis
I think it also simplifies the logic. You are not always looking to represent the code points symbolically; you are just trying to see what information is in them. Therefore, if you can practically treat a code point as the unit of data behind the scenes, it simplifies the logic.
Nov 30
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, November 30, 2017 18:32:46 A Guy With a Question via 
Digitalmars-d wrote:
 On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis

 wrote:
 On Thursday, November 30, 2017 03:37:37 Walter Bright via
 Digitalmars-d wrote:
 Language-wise, I think that most of the UTF-16 is driven by the
 fact that Java went with UCS-2 / UTF-16, and C# followed them
 (both because they were copying Java and because the Win32 API
 had gone with UCS-2 / UTF-16). So, that's had a lot of
 influence on folks, though most others have gone with UTF-8 for
 backwards compatibility and because it typically takes up less
 space for non-Asian text. But the use of UTF-16 in Windows,
 Java, and C# does seem to have resulted in some folks thinking
 that wide characters means Unicode, and narrow characters
 meaning ASCII.

 - Jonathan M Davis
I think it also simplifies the logic. You are not always looking to represent the codepoints symbolically. You are just trying to see what information is in it. Therefore, if you can practically treat a codepoint as the unit of data behind the scenes, it simplifies the logic.
Even if that were true, UTF-16 code units are not code points. If you want to operate on code points, you have to go to UTF-32. And even if you're at UTF-32, you have to worry about Unicode normalization, otherwise the same information can be represented differently even if all you care about is code points and not graphemes. And of course, some stuff really does care about graphemes, since those are the actual characters.

Ultimately, you have to understand how code units, code points, and graphemes work and what you're doing with a particular algorithm so that you know at which level you should operate and where the pitfalls are. Some code can operate on code units and be fine; some can operate on code points; and some can operate on graphemes. But there is no one-size-fits-all solution that makes it all magically easy and efficient to use.

And UTF-16 does _nothing_ to improve any of this over UTF-8. It's just a different way to encode code points. And really, it makes things worse, because it usually takes up more space than UTF-8, and it makes it easier to miss when you screw up your Unicode handling, because more UTF-16 code units are valid code points than UTF-8 code units are, but they still aren't all valid code points. So, if you use UTF-8, you're more likely to catch your mistakes.

Honestly, I think that the only good reason to use UTF-16 is if you're interacting with existing APIs that use UTF-16, and even then, I think that in most cases, you're better off using UTF-8 and converting to UTF-16 only when you have to. Strings eat less memory that way, and mistakes are more easily caught. And if you're writing cross-platform code in D, then Windows is really the only place that you're typically going to have to deal with UTF-16, so it definitely works better in general to favor UTF-8 in D programs.

But regardless, at least D gives you the tools to deal with the different Unicode encodings relatively cleanly and easily, so you can use whichever Unicode encoding you need to. Most D code is going to use UTF-8 though.

- Jonathan M Davis
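For what it's worth, Phobos exposes all three levels directly; a small sketch using a combining accent:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "e" followed by a combining acute accent: one grapheme ("é" as
    // the user sees it), two code points, three UTF-8 code units.
    auto s = "e\u0301";
    assert(s.byCodeUnit.walkLength == 3);  // code units
    assert(s.walkLength == 2);             // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 1);  // graphemes
}
```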
Nov 30
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 9:56 AM, Jonathan M Davis wrote:
 I'm sure that we could come up with a better encoding than UTF-8 (e.g.
 getting rid of Unicode normalization as being a thing and never having
 multiple encodings for the same character), but _that_'s never going to
 happen.
UTF-8 is not the cause of that particular problem, it's caused by the Unicode committee being a committee. Other Unicode problems are caused by the committee trying to add semantic information to code points, which causes nothing but problems. I.e. the committee forgot that Unicode is a character set, and nothing more.
Dec 01
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, December 01, 2017 15:54:31 Walter Bright via Digitalmars-d wrote:
 On 11/30/2017 9:56 AM, Jonathan M Davis wrote:
 I'm sure that we could come up with a better encoding than UTF-8 (e.g.
 getting rid of Unicode normalization as being a thing and never having
 multiple encodings for the same character), but _that_'s never going to
 happen.
UTF-8 is not the cause of that particular problem, it's caused by the Unicode committee being a committee. Other Unicode problems are caused by the committee trying to add semantic information to code points, which causes nothing but problems. I.e. the committee forgot that Unicode is a character set, and nothing more.
Oh, definitely. UTF-8 is arguably the best that Unicode has, but Unicode in general is what's broken, because the folks designing it made poor choices. And personally, I think that their worst decisions tend to be at the code point level (e.g. having the same character be representable by different combinations of code points).

Quite possibly the most depressing thing that I've run into with Unicode, though, was finding out that emojis have their own code points. Emojis are specifically representable by a sequence of existing characters (usually ASCII), because they came from folks trying to represent pictures with text. The fact that they're then trying to put those pictures into the Unicode standard just blatantly shows that the Unicode folks have lost sight of what they're up to. It's like if they started trying to add Unicode characters for words. It makes no sense. But unfortunately, we just have to live with it... :(

- Jonathan M Davis
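The multiple-encodings problem is easy to demonstrate with std.uni's normalization support (a sketch; é chosen purely for illustration):

```d
import std.uni;  // normalize, NFC, NFD

void main()
{
    // The same character as two different code point sequences:
    string composed   = "\u00E9";   // é as one precomposed code point
    string decomposed = "e\u0301";  // e + combining acute accent

    assert(composed != decomposed);                 // raw bytes differ
    assert(normalize!NFC(decomposed) == composed);  // compare after normalizing
    assert(normalize!NFD(composed) == decomposed);
}
```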
Dec 01
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/1/2017 8:08 PM, Jonathan M Davis wrote:
 And personally, I think that their worst decisions tend to be at the code
 point level (e.g. having the same character being representable by different
 combinations of code points).
Yup. I've presented that point of view a couple times on HackerNews, and some Unicode people took umbrage at that. The case they presented fell a little flat.
 Quite possbily the most depressing thing that I've run into with Unicode
 though was finding out that emojis had their own code points. Emojis are
 specifically representable by a sequence of existing characters (usually
 ASCII), because they came from folks trying to represent pictures with text.
 The fact that they're then trying to put those pictures into the Unicode
 standard just blatantly shows that the Unicode folks have lost sight of what
 they're up to. It's like if they started trying to add Unicode characters
 for words. It makes no sense. But unfortunately, we just have to live with
 it... :(
Yah, I've argued against that, too. And those "international" icons are arguably one of the dumber ideas to ever sweep the world, yet they seem to be celebrated without question.

Have you ever tried to look up an icon in a dictionary? It doesn't work. So if you don't know what an icon means, you're hosed. If it is a word you don't understand, you can look it up in a dictionary. Furthermore, you don't need to know English to know what "ON" means. There is no more cognitive difficulty asking someone what "ON" means than there is asking what "|" means. Is an illiterate person from XxLand really going to understand that "|" means "ON" without help?

My car has a bunch of emoticons labeling the controls. I can't figure out what any of them do without reading the manual, or just pushing random buttons until what I want happens. One button has an icon on it that looks like a snowflake. What does that do? Turn on the A/C? Defrost the frosty windows? Set the AWD in slippery mode? Turn on the Christmas lights? On my pre-madness truck, they're labeled in English. Never had any trouble with that.

Part of the problem I've seen is that people do things like "vote for my emoji/icon and I'll vote for yours!" And then when they get something accepted, they wear it as a badge of status and write articles saying how you, too, can get your whatever accepted as an icon.

It's madness, madness I say!
Dec 02
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 2 December 2017 at 10:20:10 UTC, Walter Bright wrote:
 On 12/1/2017 8:08 PM, Jonathan M Davis wrote:
 [...]
Yup. I've presented that point of view a couple times on HackerNews, and some Unicode people took umbrage at that. The case they presented fell a little flat. [...]
Where it gets really fun is the when there is color composition for emoticons U+1F466 = 👦 U+1F466 U+1F3FF = 👦🏿
Dec 02
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Dec 02, 2017 at 02:20:10AM -0800, Walter Bright via Digitalmars-d wrote:
[...]
 My car has a bunch emoticons labeling the controls. I can't figure out
 what any of them do without reading the manual, or just pushing random
 buttons until what I want happens. One button has an icon on it that
 looks like a snowflake. What does that do? Turn on the A/C? Defrost
 the frosty windows?  Set the AWD in slippery mode? Turn on the
 Christmas lights?
The same can be argued for the icon mania started by the GUI craze in the 90's that has now become the de facto standard. Some icons are more obvious than others, but nowadays GUI toolbars are full of inscrutable icons of unclear meaning that are basically opaque unless you already have prior knowledge of what they're supposed to represent. Thankfully most(?) GUI programs have enough sanity left to provide tooltips with textual labels for what each button means. Still, it betrays the emperor's invisible clothes of the "graphics == intuitive" mantra -- you still have to learn the icons, just like you have to learn the keywords of a text-based UI, before you can use the software effectively.

Reminds me also of the infamous Mystery Meat navigation style of the 90's, where people would use images for navigation weblinks on their websites, so that you basically don't know where they're linking to until you click on them.

This is why I think GUIs and the whole "desktop metaphor" craze are heading the wrong direction, and why 95% of my computer usage is via a text terminal. There's a place for graphical interfaces, but it's gone too far these days.

But thanks to Unicode emoticons, we can now have icons on my text terminal too, isn't that just wonderful?! Esp. when a missing/incompatible font causes them to show up as literal blank boxes. The power of a standardized, universal character set, lemme tell ya!

T

-- 
Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen
Dec 02
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/2/2017 5:59 PM, H. S. Teoh wrote:
 [...]
Even worse, companies go and copyright their icons, guaranteeing they have to be substantially different for every company! If there ever was an Emperor's New Clothes, it's icons and emojis.
Dec 02
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/2/17 11:28 PM, Walter Bright wrote:
 On 12/2/2017 5:59 PM, H. S. Teoh wrote:
 [...]
Even worse, companies go and copyright their icons, guaranteeing they have to be substantially different for every company!
I like this site for icons. Only requires you to reference them in your about box: https://icons8.com/ -Steve
Dec 04
prev sibling parent Kagamin <spam here.lot> writes:
On Sunday, 3 December 2017 at 01:59:58 UTC, H. S. Teoh wrote:
 Still, it betrays the emperor's invisible clothes of the 
 "graphics == intuitive" mantra -- you still have to learn the 
 icons just like you have to learn the keywords of a text-based 
 UI, before you can use the software effectively.
What happened when you ran vi for the first time?
Dec 04
prev sibling next sibling parent reply codephantom <me noyb.com> writes:
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis 
wrote:
 The fact that they're then trying to put those pictures into 
 the Unicode standard just blatantly shows that the Unicode 
 folks have lost sight of what they're up to. It's like if they 
 started trying to add Unicode characters for words. It makes no 
 sense. But unfortunately, we just have to live with it... :(

 - Jonathan M Davis
The real problem is that sometimes people don't feel like a little cat with a smiling face. Sometimes people actually get pissed off at something, and would like to express it. Do the people on the Unicode consortium consider such communication to be invalid?

Where are the emojis for saying "I'm pissed off at this.. or that.."?

(unicode consortium == emoji censorship)

https://www.google.com.au/search?q=fuck+you+emoticon&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiWkMzMpOvXAhWIj5QKHVnGC5YQ_AUICigB&biw=1536&bih=736
Dec 02
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote:
 Do the people on the unicode consortium consider such 
 communication to be invalid?
https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130 On the other hand try to google "emoji sexual"…
Dec 02
parent reply codephantom <me noyb.com> writes:
On Saturday, 2 December 2017 at 16:44:56 UTC, Ola Fosheim Grøstad 
wrote:
 On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote:
 Do the people on the unicode consortium consider such 
 communication to be invalid?
https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130 On the other hand try to google "emoji sexual"…
No. Humans never express negative emotions, and also never communicate a desire to have sex. That explains a lot about the Unicode consortium. 's', 'e', 'x' is ok, just not together.

Q. What's the difference between a politician and an emoji?
A. Nothing. You cannot take either at face value.

..oophs. Politics again. I should know better. But my wider point is, unicode emojis are useless if they only contain those that 'some' consider to be politically correct, or socially acceptable.

The Unicode consortium is a bunch of ... (I don't have the unicode emoji representation yet to complete that sentence).
Dec 02
parent codephantom <me noyb.com> writes:
On Sunday, 3 December 2017 at 01:11:14 UTC, codephantom wrote:
 but my wider point is, unicode emojis are useless if they only 
 contain those that 'some' consider to be politically correct, 
 or socially acceptable.

 The Unicode consortium is a bunch of ...   (I don't have the 
 unicode emoji representation yet to complete that sentence).
btw. Good article here, further demonstrating my point.. "We're talking about engineers that are concerned about standards and internationalization issues who now have to do something more in line with Apple or Google's marketing teams,". https://www.buzzfeed.com/charliewarzel/thanks-to-apples-influence-youre-not-getting-a-rifle-emoji
Dec 02
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis 
wrote:
 code points. Emojis are specifically representable by a 
 sequence of existing characters (usually ASCII), because they 
 came from folks trying to represent pictures with text.
They are used as symbols culturally, which is how written languages happen, so I think the real question is whether they have just implemented the ones that have become widespread over a long period of time or whether they have deliberately created completely new ones... It makes sense for the most used ones. E.g. I don't want "8-(3+4)" to render as "😳3+4" ;-)

There is also a difference between Ø and ∅, because the meaning is different. Too bad the same does not apply to arrows (math vs non-math usage). So yeah, they could do better, but it's not too bad.

If something is widely used in a way that gives signs a different meaning, then it makes sense to introduce a new symbol for it, so that one can both render them slightly differently and so that programs can interpret them correctly.
Dec 02
prev sibling next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:
 On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 [...]
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8.
I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.
Nov 30
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 2:47 AM, Nicholas Wilson wrote:
 As far as I can tell, pretty much the only users of UTF16 are Windows 
 programs. Everyone else uses UTF8 or UCS32.
I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.
I meant UCS-4, which is identical to UTF-32. It's hard keeping all that stuff straight. Sigh. https://en.wikipedia.org/wiki/UTF-32
Nov 30
parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 11:41:09 UTC, Walter Bright 
wrote:
 On 11/30/2017 2:47 AM, Nicholas Wilson wrote:
 As far as I can tell, pretty much the only users of UTF16 are 
 Windows programs. Everyone else uses UTF8 or UCS32.
I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.
I meant UCS-4, which is identical to UTF-32. It's hard keeping all that stuff straight. Sigh. https://en.wikipedia.org/wiki/UTF-32
It's also worth mentioning that the more I think about it, the UTF8 vs. UTF16 thing was probably not worth mentioning with the rest of the things I listed out. It's pretty minor and more of a preference.
Nov 30
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 5:22 AM, A Guy With a Question wrote:
 It's also worth mentioning that the more I think about it, the UTF8 vs. UTF16 
 thing was probably not worth mentioning with the rest of the things I listed 
 out. It's pretty minor and more of a preference.
Both Windows and Java selected UTF16 before surrogates were added, so it was a reasonable decision made in good faith. But an awful lot of Windows/Java code has latent bugs in it because of not dealing with surrogates. D is designed from the ground up to work smoothly with UTF8/UTF16 multi-codeunit encodings. If you do decide to use UTF16, please take advantage of this and deal with surrogates correctly. When you do decide to give up on UTF16 (!) and go with UTF8, your code will be easy to convert to UTF8.
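D's handling here is visible in the smallest example: iterating a wstring with a dchar loop variable (or running any range algorithm over it) decodes surrogate pairs into single code points automatically. A minimal sketch:

```d
void main()
{
    wstring s = "💩"w;       // U+1F4A9 lies outside the BMP
    assert(s.length == 2);   // two UTF-16 code units: a surrogate pair

    // foreach over dchar decodes the pair into one code point
    size_t n;
    foreach (dchar c; s)
        n++;
    assert(n == 1);
}
```

The same decoding happens transparently if you later switch the declaration from wstring to string, which is what makes the UTF-16-to-UTF-8 conversion Walter mentions relatively painless.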
Nov 30
prev sibling parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:
 On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:
 
 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority 
 of use cases I am pretty guaranteed a char = codepoint. Not 
 the biggest issue in the world and maybe I'm just being overly 
 critical here.
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8.
As long as you understand its limitations, I think most bugs can be avoided. Where UTF16 breaks down is pretty well defined, and also super rare. I think UTF32 would be great too, but it seems like just a waste of space 99% of the time. UTF8 isn't horrible; I am not going to never use D because it uses UTF8 (that would be silly), especially when wstring also seems baked into the language. However, it can complicate code, because you pretty much always have to assume character != codepoint outside of ASCII. I can see a reasonable person arguing that forcing you to assume character != code point is actually a good thing. And that is a valid opinion.
Nov 30
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, November 30, 2017 13:18:37 A Guy With a Question via 
Digitalmars-d wrote:
 As long as you understand it's limitations I think most bugs can
 be avoided. Where UTF16 breaks down, is pretty well defined.
 Also, super rare. I think UTF32 would be great to, but it seems
 like just a waste of space 99% of the time. UTF8 isn't horrible,
 I am not going to never use D because it uses UTF8 (that would be
 silly). Especially when wstring also seems baked into the
 language. However, it can complicate code because you pretty much
 always have to assume character != codepoint outside of ASCII. I
 can see a reasonable person arguing that it forcing you assume
 character != code point is actually a good thing. And that is a
 valid opinion.
The reality of the matter is that if you want to write fully valid Unicode, then you have to understand the differences between code units, code points, and graphemes, and since it really doesn't make sense to operate at the grapheme level for everything (it would be terribly slow and is completely unnecessary for many algorithms), you pretty much have to come to accept that in the general case, you can't assume that something like a char represents an actual character, regardless of its encoding. UTF-8 vs UTF-16 doesn't change anything in that respect except for the fact that there are more characters which fit fully in a UTF-16 code unit than a UTF-8 code unit, so it's easier to think that you're correctly handling Unicode when you actually aren't. And if you're not dealing with Asian languages, UTF-16 uses up more space than UTF-8. But either way, they're both wrong if you're trying to treat a code unit as a code point, let alone a grapheme. It's just that we have a lot of programmers who only deal with English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both. - Jonathan M Davis
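The three levels Jonathan describes line up with three ranges in D's standard library. A minimal sketch, using 'e' plus a combining acute accent so that one visible character spans multiple code points and code units:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: renders as é
    string s = "e\u0301";

    assert(s.byCodeUnit.walkLength == 3);  // UTF-8 code units (1 + 2 bytes)
    assert(s.walkLength == 2);             // code points (auto-decoding)
    assert(s.byGrapheme.walkLength == 1);  // graphemes: what a reader sees
}
```

Any algorithm that picks the wrong level for its task is wrong at that level no matter which of UTF-8 or UTF-16 is underneath, which is exactly the point above.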
Nov 30
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis 
wrote:
 [...] And if you're not dealing with Asian languages, UTF-16 
 uses up more space than UTF-8.
Not even that in most cases. Only if you use unstructured text can it happen that UTF-16 needs less space than UTF-8. In most cases, the text is embedded in some sort of ML (html, odf, docx, tmx, xliff, akoma ntoso, etc...) which puts the balance again to the side of UTF-8.
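Patrick's observation is easy to check: the ASCII markup wrapped around the text doubles in size under UTF-16, which usually outweighs any savings on the non-Latin payload. An illustrative measurement (the Greek-in-HTML snippet is made up):

```d
import std.conv : to;

void main()
{
    // ASCII tags around a short run of Greek text
    string markup = "<p lang=\"el\">γειά σου</p>";

    size_t utf8Bytes  = markup.length;                         // bytes as UTF-8
    size_t utf16Bytes = markup.to!wstring.length * wchar.sizeof; // bytes as UTF-16

    // The Greek letters cost 2 bytes in either encoding, but every ASCII
    // tag character doubles under UTF-16, so UTF-8 wins overall.
    assert(utf8Bytes < utf16Bytes);
}
```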
Nov 30
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis 
wrote:
 English and thus don't as easily hit the cases where their code 
 is wrong. For better or worse, UTF-16 hides it better than 
 UTF-8, but the problem exists in both.
To give just an example of what can go wrong with UTF-16: reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32, block by block, and hitting an SMP codepoint exactly at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
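That boundary scenario can be sketched in a few lines of D: before handing a block of UTF-16 to a converter, hold back a trailing high surrogate so it can be joined with the low surrogate from the next read. The helper names below are illustrative, not a real API:

```d
// High surrogates occupy 0xD800 .. 0xDBFF
bool isHighSurrogate(wchar c) { return c >= 0xD800 && c <= 0xDBFF; }

// Number of code units safe to convert now; a trailing high surrogate
// must be carried over and prepended to the next buffer.
size_t decodableLength(const(wchar)[] buf)
{
    if (buf.length && isHighSurrogate(buf[$ - 1]))
        return buf.length - 1;
    return buf.length;
}

void main()
{
    wstring poo = "💩"w;            // surrogate pair: 0xD83D 0xDCA9
    auto firstBlock = poo[0 .. 1];  // block boundary falls mid-pair

    // Nothing is safely decodable yet: carry the high surrogate forward.
    assert(decodableLength(firstBlock) == 0);
    assert(decodableLength(poo) == 2);
}
```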
Nov 30
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/30/17 1:20 PM, Patrick Schluter wrote:
 On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote:
 English and thus don't as easily hit the cases where their code is 
 wrong. For better or worse, UTF-16 hides it better than UTF-8, but the 
 problem exists in both.
To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html -Steve
Nov 30
parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M 
 Davis wrote:
 English and thus don't as easily hit the cases where their 
 code is wrong. For better or worse, UTF-16 hides it better 
 than UTF-8, but the problem exists in both.
To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8, people who implement the low-level code generally think about multiple code units at the buffer boundary; with UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, which exist in UTF-8 as well but are less consciously acknowledged: overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more: endianness and size.
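Of the pitfalls listed, overlong encoding is the easiest to demonstrate: the byte pair 0xC0 0xAF "naively" decodes to '/' (U+002F), a classic path-traversal trick, but the spec forbids any encoding longer than necessary, and D's std.utf.validate rejects it:

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // Overlong two-byte encoding of '/' (U+002F); 0xC0 never
    // appears in well-formed UTF-8.
    char[] overlong = [cast(char) 0xC0, cast(char) 0xAF];

    assertThrown!UTFException(validate(overlong));

    // The canonical one-byte form is, of course, fine.
    validate("/");
}
```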
Nov 30
next sibling parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
wrote:
 On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
 Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M 
 Davis wrote:
 English and thus don't as easily hit the cases where their 
 code is wrong. For better or worse, UTF-16 hides it better 
 than UTF-8, but the problem exists in both.
To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size.
Most problems with UTF16 are applicable to UTF8. The only issue that isn't is that if you are just dealing with ASCII, it's a bit of a waste of space.
Dec 01
parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 December 2017 at 12:21:22 UTC, A Guy With a Question 
wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
 wrote:
 On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
 Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 [...]
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size.
Most problems with UTF16 are applicable to UTF8. The only issue that isn't is that if you are just dealing with ASCII, it's a bit of a waste of space.
That's what I said. UTF-16 and UTF-8 have the same issues, but UTF-16 has 2 more: endianness and bloat for ASCII. All 3 encodings have their pluses and minuses; that's why D supports all 3, but with a preference for UTF-8.
Dec 01
prev sibling next sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
wrote:
 On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
 Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 [...]
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the
I meant isolated code-units, of course.
 same issues as UTF-8, plus some more, endianness and size.
Dec 01
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/1/17 7:26 AM, Patrick Schluter wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote:
  isolated codepoints. 
I meant isolated code-units, of course.
Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :) -Steve
Dec 01
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via Digitalmars-d 
wrote:
 On 12/1/17 7:26 AM, Patrick Schluter wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote:
  isolated codepoints.
I meant isolated code-units, of course.
Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :)
What, you mean that Unicode can be confusing? No way! ;) LOL. I have to be careful with that too. What bugs me even more though is that the Unicode spec talks about code points being characters, and then talks about combining characters for grapheme clusters - and this in spite of the fact that what most people would consider a character is a grapheme cluster and _not_ a code point. But they presumably had to come up with new terms for a lot of this nonsense, and that's not always easy. Regardless, what they came up with is complicated enough that it's arguably a miracle whenever a program actually handles Unicode text 100% correctly. :| - Jonathan M Davis
Dec 01
parent A Guy With a Question <aguywithanquestion gmail.com> writes:
On Friday, 1 December 2017 at 18:31:46 UTC, Jonathan M Davis 
wrote:
 On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via 
 Digitalmars-d wrote:
 On 12/1/17 7:26 AM, Patrick Schluter wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
 wrote:
  isolated codepoints.
I meant isolated code-units, of course.
Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :)
What, you mean that Unicode can be confusing? No way! ;) LOL. I have to be careful with that too. What bugs me even more though is that the Unicode spec talks about code points being characters, and then talks about combining characters for grapheme clusters - and this in spite of the fact that what most people would consider a character is a grapheme cluster and _not_ a code point. But they presumably had to come up with new terms for a lot of this nonsense, and that's not always easy. Regardless, what they came up with is complicated enough that it's arguably a miracle whenever a program actually handles Unicode text 100% correctly. :| - Jonathan M Davis
And dealing with that complexity can introduce bugs in its own right, because it's hard to get right. That's why it's sometimes easier just to simplify things and exclude certain ways of looking at the string.
Dec 01
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 10:07 PM, Patrick Schluter wrote:
 endianness
Yeah, I forgot to mention that one. As if anyone remembers to put in the Byte Order Mark :-(
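When the producer did remember the BOM, sniffing the byte order is a two-byte check; when they didn't, all you can do is guess. A sketch of that logic (the enum and function names are illustrative, not a real API):

```d
enum Encoding { utf16le, utf16be, unknown }

// U+FEFF serialized in UTF-16LE is FF FE; in UTF-16BE it is FE FF.
Encoding sniffBom(const(ubyte)[] bytes)
{
    if (bytes.length >= 2)
    {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) return Encoding.utf16le;
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) return Encoding.utf16be;
    }
    return Encoding.unknown;  // no BOM: guess, or ask the producer
}

void main()
{
    ubyte[] withBom = [0xFF, 0xFE, 0x41, 0x00];  // BOM + 'A' in UTF-16LE
    ubyte[] without = [0x41, 0x00];              // same 'A', no BOM

    assert(sniffBom(withBom) == Encoding.utf16le);
    assert(sniffBom(without) == Encoding.unknown);
}
```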
Dec 02
prev sibling parent Kagamin <spam here.lot> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate. At first I was excited 
 because it seems like on the surface it would help me write 
 better code, but it gets a little tedious and tiresome to have 
 to remember to decorate code with them.
Then do it the C# way. There's choice.
 I think the better decision would be to not have the errors 
 occur.
Hehe, I'm not against living in an ideal world either.
 - Immutable. I'm not sure I fully understand it. On the surface 
 it seemed like const but transitive. I tried having a method 
 return an immutable value, but when I used it in my unit test I 
 got some weird errors about objects not being able to return 
 immutable (I forget the exact error...apologies).
That's the point of static type system: if you make a mistake, the code doesn't compile.
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:

 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."

 Seems like the natural fit for me.
UTF-16 is inadequate for the range '\u0000' <= character <= '\U0010FFFF', though. UCS2 was adequate (for '\u0000' <= character <= '\uFFFF'), but lost relevance. UTF-16 exists only as backward compatibility for early adopters of Unicode who built on UCS2.
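The inadequacy is concrete: any code point above U+FFFF needs two UTF-16 code units, formed by subtracting 0x10000 and splitting the remaining 20 bits across a surrogate pair. A minimal sketch for the very top of the range:

```d
void main()
{
    dchar c = 0x10FFFF;              // top of the Unicode range
    uint v = c - 0x10000;            // 20 bits to split across the pair

    wchar hi = cast(wchar)(0xD800 | (v >> 10));   // high surrogate
    wchar lo = cast(wchar)(0xDC00 | (v & 0x3FF)); // low surrogate

    assert(hi == 0xDBFF && lo == 0xDFFF);

    // Round-trip check against the compiler's own UTF-16 encoding
    assert([hi, lo] == "\U0010FFFF"w);
}
```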
 Plus for the vast majority of use cases I am pretty guaranteed 
 a char = codepoint.
That way, only end users will be able to catch bugs in the production system. It's not the best strategy, is it? Text is often persistent data; how do you plan to fix a text-handling bug when corruption has accumulated for years and spilled all over the place?
Nov 30