
digitalmars.D - First Impressions!

reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
Hi,

I've been using D for a personal project for about two weeks now 
and just thought I'd share my initial impression just in case 
it's useful! I like feedback on things I do, so I just assume 
others do too. Plus my opinion is the best on the internet! You
will see (hopefully the sarcasm is obvious otherwise I'll just 
appear pompous). It would probably be better if I did a 
retrospective after my project is completed, but with life who 
knows if that will happen. I could lose interest or something and 
not finish it. And then you guys wouldn't know my opinion. I 
can't allow that.

I'll start off by saying I like the overall experience. I come 
from a C# and C++ background with a little bit of C mixed in. For 
the most part though, I work with C#, SQL and web technologies on 
a day to day basis. I did do a three year stint working with 
C/C++ (mostly C++), but I never really enjoyed it much. C++ is 
overly verbose, overly complicated, overly littered with poor 
legacy decisions, and too error prone. C# on the other hand has 
for the most part been a delight. The only problem is I don't 
find it to be the best when it comes to generative programming. 
C# can do some generative programming with its generics, but for 
the most part they've always struck me as specialized for 
container types, and doing anything remotely outside that purpose 
takes a fair bit of cleverness. I'm sick of being clever
in that aspect.

So here are some impressions good and bad:

+ Porting straight C# seems pretty straightforward. Even some of 
the .NET framework, like files and unicode, have fairly direct 
counterparts in D.

+ D code so far is pushing me towards more "flat" code (for a 
lack of a better way to phrase it) and so far that has helped 
tremendously when it comes to readability. C# is kind of the 
opposite. With its namespace -> class -> method structure coupled 
with lock, using, etc., you tend to do a lot of nesting. You are 
generally 3 '{' in before any true logic even begins. Then couple 
that with try/catch, IDisposable/using, locking, and then 
if/else, it can get quite chaotic very easily. So right away, I 
saw my C# code actually appear more readable when I translated it 
and I think it has to do with the flatness. I'm not sure if that 
opinion will hold when I delve into 'static if' a little more, 
but so far my uses of it haven't really dampened that opinion.
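So far 'static if' seems to preserve that flatness: it is resolved at compile time and introduces no scope of its own, so the untaken branch simply doesn't exist in the generated code. A minimal sketch (the function and its name are invented purely for illustration):

```d
// static if is evaluated at compile time; the branch that isn't
// taken disappears entirely, and no extra runtime nesting level
// or scope is introduced.
T clampToByte(T)(T value) {
    static if (T.sizeof > 1) {
        if (value > 255)
            value = 255;
    }
    return value;
}

void main() {
    assert(clampToByte(300) == 255);             // int path: clamped
    assert(clampToByte(cast(ubyte) 200) == 200); // ubyte path: branch compiled out
}
```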

+ Visual D. It might be that I had low expectations of it, 
because I read on the internet that D's tooling was poor (and 
nothing is ever wrong on the internet); however, the combination of 
Visual D and DMD actually exceeded my expectations. I've been 
quite happy with it. It was relatively easy to set up and worked 
as I would expect it to work. It lets me debug, add breakpoints, 
and does the basic syntax highlighting I would expect. It could 
have a few other features, but for a project that is not 
corporate backed, it was really above what I could have asked for.

+ So far, compiling is fast. And from what I hear it will stay 
fast. A big motivator. The one commercial C++ project I worked on 
was a beast and could take an hour+ to compile if you needed to 
compile something fundamental. C# is fairly fast, so I've grown 
accustomed to not having to go to the bathroom, get a drink, 
etc., before returning to find out I'm only on the linking step. 
I'm used to this: if the build takes more than about ten seconds, 
I prep myself for an error to deal with. I want that to remain.

- Some of the errors from DMD are a little strange. I don't want 
to crap on this too much, because for the most part it's fine. 
However, occasionally it throws errors where I still can't work 
out why THAT is the error it gave me. Some of you may have seen my 
question in the "Learn" forum about not knowing to use static in 
an embedded class, but the error was the following:

Error: 'this' is only defined in non-static member functions

I'd say the errors so far are a step above some of the cryptic 
stuff C++ can throw at you (however, I haven't delved that deeply 
into D templates yet, so don't hold me to this), but I'd put 
their quality somewhere between C#'s and C++'s, with C# being the 
ideal.
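For anyone who hits the same message: the cause is that a non-static nested class carries a hidden reference to its enclosing instance, and that reference is the 'this' the error is talking about. A small sketch (class names invented for illustration):

```d
class Outer {
    int x;

    // A non-static nested class holds a hidden reference to the
    // enclosing Outer instance -- the 'this' from the error message.
    class Inner {
        int outerX() { return x; } // reads the outer object's field
    }

    // A static nested class has no outer pointer, which matches
    // what C# nested classes do by default.
    static class Standalone { }
}

void main() {
    // `new Outer.Inner` alone would fail: there is no Outer 'this'.
    auto o = new Outer;
    o.x = 5;
    auto i = o.new Inner;           // ok: explicitly tied to 'o'
    assert(i.outerX() == 5);

    auto s = new Outer.Standalone;  // ok: no outer instance needed
    assert(s !is null);
}
```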

+ The standard library so far is really good. Nullable worked as 
I thought it should. I just guessed a few of the methods based on 
what I had seen at that point and got it right. So it appears 
consistent and intuitive. I also like the fact I can peek at the 
code and understand it by just reading it. Unlike with C++ where 
I still don't know how some of the stuff is *really* implemented. 
The STL almost seems like it's written in a completely different 
language than the stuff it enables. For instance, I figured out 
how to do packages by seeing it in Phobos.
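As a sketch of the kind of guessing that worked, std.typecons.Nullable behaves about how a C# programmer would expect:

```d
import std.typecons : Nullable;

void main() {
    Nullable!int n;   // starts out empty ("null")
    assert(n.isNull);

    n = 42;           // assigning gives it a value
    assert(!n.isNull);
    assert(n.get == 42);

    n.nullify();      // back to the empty state
    assert(n.isNull);
}
```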

- ...however, where are all of the collections? No Queue? No 
Stack? No HashTable? I've read that it's not a big focus because 
some of the built in stuff *can* behave like those things. The C# 
project I'm porting utilizes queues and specifically C#'s 
Dictionary<> quite a bit, so I'm not looking forward to having to 
hand-roll my own or use things that aren't fundamentally them. 
This is definitely the biggest negative I've come across. I want 
a queue, not something that *can* behave as a queue. I definitely 
expected more from a language that is this old.

+ Packages and 'public import'. I really think it's useful to 
forward imports/using statements. It kind of packages everything 
that is required to use that thing in your namespace/package 
together. So you don't have to include a dozen things. C and C++ 
can do this with their #includes, but in an unsatisfactory way. At 
least in my opinion.
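For anyone unfamiliar, the Phobos idiom looks roughly like this; the file layout and module names here are hypothetical, just to show the shape:

```d
// mylib/package.d  (hypothetical layout)
//
// With this file in place, `import mylib;` pulls in both
// submodules at once, much like a single C# `using` bringing in
// a whole namespace.
module mylib;

public import mylib.parser;
public import mylib.lexer;
```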

- Modules. I like modules better than #include, but I don't like 
them better than C#'s namespaces. Specifically I don't like how 
there is this gravity that kind of pulls me to associate a module 
with a file. It appears you don't have to, because I can do the 
package thing, but whenever I try to do things outside that one 
idiom I end up in a soup of errors. I'm sure I'm just not used to 
it, but so far it's been a little dissatisfying. Sometimes I want 
where it is physically on my file system to be different from how 
I include it in other source files. To me, C#'s namespaces are 
really the standard to beat or meet.

+ Unit tests. Finally built in unit tests. Enough said here. If 
the lack of collections was the biggest negative, this is the 
biggest positive. I would like to enable them at build time if 
possible though.
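For reference, the build-time switch exists: unittest blocks sit right next to the code they check and are compiled and run when building with `dmd -unittest` (the helper function here is invented for illustration):

```d
import std.string : strip;

/// Returns true when the line is blank after trimming whitespace.
bool isBlank(string line) {
    return line.strip.length == 0;
}

// Compiled and executed only when building with -unittest;
// in a normal release build it costs nothing.
unittest {
    assert(isBlank("   "));
    assert(isBlank(""));
    assert(!isBlank(" x "));
}
```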

- Attributes. I had another post in the Learn forum about 
attributes which was unfortunate. At first I was excited because 
it seems like on the surface it would help me write better code, 
but it gets a little tedious and tiresome to have to remember to 
decorate code with them. It seems like most of them should have 
been the defaults. I would have preferred if the compiler helped 
me and reminded me. I asked if there was a way to enforce them 
globally, which I guess there is, but I guess there's also not a 
way to turn some of them off afterwards. A bit unfortunate. But 
at least I can see some solutions to this.
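One mitigation worth mentioning: attributes can be applied wholesale with the colon syntax, so each declaration doesn't need its own decoration (the functions are invented for illustration):

```d
// Everything after a colon-style attribute list gets those
// attributes, so individual declarations don't need decorating.
@safe pure nothrow @nogc:

int twice(int x)  { return 2 * x; }
int square(int x) { return x * x; }

// A GC allocation or an unsafe cast anywhere below the colon
// would now be rejected at compile time.
```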

- The defaults for primitives seem off. They seem to encourage 
errors. I don't think that is the best design decision even if it 
encourages the errors to be caught as quickly as possible. I 
think the better decision would be to not have the errors occur. 
When I asked about this, there seemed to be a disconnect between 
the spec and the implementation. The spec says a declaration 
should error if not explicitly set, but the implementation just 
initializes them to something that is likely to error, like NaN 
for floats, which I would have expected to be 0 based on prior 
experience with other languages.
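The defaults as implemented today, for anyone checking:

```d
void main() {
    int i;    // integers default to 0
    float f;  // floats default to float.nan
    char c;   // chars default to 0xFF, an invalid UTF-8 byte

    assert(i == 0);
    assert(f != f);    // NaN is the only value not equal to itself
    assert(c == 0xFF);
}
```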

- Immutable. I'm not sure I fully understand it. On the surface 
it seemed like const but transitive. I tried having a method 
return an immutable value, but when I used it in my unit test I 
got some weird errors about objects not being able to return 
immutable (I forget the exact error...apologies). I refactored to 
use const, and it all worked as I expected, but I don't get why 
the immutable didn't work. I was returning a value type, so I 
don't see why something like assert(object.errorCount == 0) would 
have triggered errors. But it did. I have a set of classes that 
keep track of snapshots of specific counts that seems like a 
perfect fit for immutable (because I don't want those 'snapshots' 
to change...like ever), but I kept getting errors trying to use 
it like const. The type string seems to be an immutable(char[]) 
which works exactly the way I was expecting, and I haven't run 
into problems, so I'm not sure what the problem was. I'm just 
more confused knowing that string works, but what I did didn't.
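A guess at what a working version of the snapshot idea looks like: since the counts form a plain value type with no pointers, a copy converts to immutable implicitly (all names invented for illustration):

```d
struct Snapshot {
    int errorCount;
    int warningCount;
}

class Tracker {
    private int errors, warnings;

    // A value type with no mutable indirections converts to
    // immutable implicitly, so returning one from a const method
    // is fine:
    immutable(Snapshot) snapshot() const {
        return Snapshot(errors, warnings);
    }
}

void main() {
    auto t = new Tracker;
    immutable snap = t.snapshot();
    assert(snap.errorCount == 0);
    // snap.errorCount = 1; // error: cannot modify immutable expression
}
```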

+- Unicode support is good, although I think D's string type 
should probably have been UTF-16 by default, especially 
considering the utf module states:

"UTF character support is restricted to '\u0000' <= character <= 
'\U0010FFFF'."

Seems like the natural fit for me. Plus, for the vast majority of 
use cases I am pretty much guaranteed one char per code point. Not the 
biggest issue in the world and maybe I'm just being overly 
critical here.
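For completeness, all three encodings are first-class in D; string just happens to be the UTF-8 one, and wstring is the closest match to C#'s System.String:

```d
void main() {
    string  s = "héllo"; // immutable(char)[]  -- UTF-8
    wstring w = "héllo"; // immutable(wchar)[] -- UTF-16, closest to C#
    dstring d = "héllo"; // immutable(dchar)[] -- UTF-32, one code point per element

    // .length counts code units, not characters:
    assert(s.length == 6); // 'é' is two UTF-8 bytes
    assert(w.length == 5);
    assert(d.length == 5);
}
```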

+ Templates seem powerful. I've only fiddled thus far, but I 
don't think I've quite comprehended their usefulness yet. It will 
probably take me some time to figure out how to wield them 
effectively. One thing I accidentally stumbled upon that I liked 
was that I could simulate inheritance in structs with them, by 
using the mixin keyword. That was cool, and I'm not even sure if 
that is what they were really meant to enable.
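The struct trick mentioned above looks roughly like this (the types are made up for illustration): a mixin template pastes its members into whatever scope mixes it in, which lets structs share fields and methods without class inheritance.

```d
mixin template Counted() {
    int count;
    void bump() { ++count; }
}

// Both structs get their own copy of the mixed-in members.
struct Apples  { mixin Counted; }
struct Oranges { mixin Counted; }

void main() {
    Apples a;
    a.bump();
    a.bump();
    assert(a.count == 2);
}
```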

So those are just some of my thoughts. Tell me why I'm wrong :P
Nov 27
next sibling parent docandrew <x x.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - ...however, where are all of the collections? No Queue? No 
 Stack? No HashTable? I've read that it's not a big focus 
 because some of the built in stuff *can* behave like those 
 things. The C# project I'm porting utilizes queues and a 
 specifically C#'s Dictionary<> quite a bit, so I'm not looking 
 forward to having to hand roll my own or use something that 
 aren't fundamentally them. This is definitely the biggest 
 negative I've come across. I want a queue, not something that 
 *can* behave as a queue. I definitely expected more from a 
 language that is this old.
Good feedback overall, thanks for checking it out. You're not wrong, but some of the design decisions that feel strange to newcomers at first have been heavily-debated, generally well-reasoned, and just take some time to get used to. That sounds like a cop-out, but stick with it and I think you'll find that a lot of the decisions make sense - see the extensive discussion on NaN-default for floats, for example.

Just one note about the above comment though: the std.container.dlist doubly-linked list has methods that you can use to put together stacks and queues easily: https://dlang.org/phobos/std_container_dlist.html

Also, D's associative arrays implement a hash map (https://dlang.org/spec/hash-map.html), which I think should take care of most of C#'s Dictionary functionality.

Anyhow, D is a big language (for better and sometimes worse), so it's easy to miss some of the good nuggets buried within the spec/library.

-Doc
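A quick sketch of both suggestions (keys and values invented for illustration):

```d
import std.container.dlist : DList;

void main() {
    // Queue via DList: insert at the back, remove from the front.
    DList!int q;
    q.insertBack(1);
    q.insertBack(2);
    q.insertBack(3);
    assert(q.front == 1);
    q.removeFront();
    assert(q.front == 2);
    // (For a stack, insertBack/removeBack on the same type works.)

    // C#'s Dictionary<K,V> maps to the built-in associative array:
    int[string] ages = ["alice": 30];
    ages["bob"] = 25;
    assert("bob" in ages);
    assert(ages["alice"] == 30);
}
```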
Nov 27
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 28/11/2017 3:01 AM, A Guy With an Opinion wrote:
 Hi,
 
 I've been using D for a personal project for about two weeks now and 
 just thought I'd share my initial impression just in case it's useful! I 
 like feedback on things I do, so I just assume others do to. Plus my 
 opinion is the best on the internet! You will see (hopefully the sarcasm 
 is obvious otherwise I'll just appear pompous). It would probably be 
 better if I did a retrospective after my project is completed, but with 
 life who knows if that will happen. I could lose interest or something 
 and not finish it. And then you guys wouldn't know my opinion. I can't 
 allow that.
 
 I'll start off by saying I like the overall experience. I come from a C# 
 and C++ background with a little bit of C mixed in. For the most part 
 though, I work with C#, SQL and web technologies on a day to day basis. 
 I did do a three year stint working with C/C++ (mostly C++), but I never 
 really enjoyed it much. C++ is overly verbose, overly complicated, 
 overly littered with poor legacy decisions, and too error prone. C# on 
 the other hand has for the most part been a delight. The only problem is 
 I don't find it to be the best when it comes to generative programming. 
 C# can do some generative programming with it's generics, but for the 
 most part it's always struck me as more specialized for container types 
 and to do anything remotely outside of it's purpose takes a fair bit of 
 cleverness. I'm sick of being clever in that aspect.
 
 So here are some impressions good and bad:
 
 + Porting straight C# seems pretty straight forward. Even some of the 
 .NET framework, like files and unicode, have fairly direct counterparts 
 in D.
 
 + D code so far is pushing me towards more "flat" code (for a lack of a 
 better way to phrase it) and so far that has helped tremendously when it 
 comes to readability. C# kind is the opposite. With it's namespace -> 
 class -> method coupled with lock, using, etc...you tend to do a lot of 
 nesting. You are generally 3 '{' in before any true logic even begins. 
 Then couple that with try/catch, IDisposable/using, locking, and then 
 if/else, it can get quite chaotic very easily. So right away, I saw my 
 C# code actually appear more readable when I translated it and I think 
 it has to do with the flatness. I'm not sure if that opinion will hold 
 when I delve into 'static if' a little more, but so far my uses of it 
 haven't really dampened that opinion.
 
 + Visual D. It might be that I had poor expectations of it, because I 
 read D's tooling was poor on the internet (and nothing is ever wrong on 
 the internet), however, the combination of Visual D and DMD actually 
 exceeded my expectations. I've been quite happy with it. It was 
 relatively easy to set up and worked as I would expect it to work. It 
 lets me debug, add breakpoints, and does the basic syntax highlighting I 
 would expect. It could have a few other features, but for a project that 
 is not corporate backed, it was really above what I could have asked for.
 
 + So far, compiling is fast. And from what I hear it will stay fast. A 
 big motivator. The one commercial C++ project I worked on was a beast 
 and could take an hour+ to compile if you needed to compile something 
 fundamental. C# is fairly fast, so I've grown accustomed to not having 
 to go to the bathroom, get a drink, etc...before returning to find out 
 I'm on the linking step. I'm used to if it doesn't take less than ten 
 seconds (probably less) then I prep myself for an error to deal with. I 
 want this to remain.
 
 - Some of the errors from DMD are a little strange. I don't want to crap 
 on this too much, because for the most part it's fine. However 
 occasionally it throws errors I still can't really work out why THAT is 
 the error it gave me. Some of you may have saw my question in the 
 "Learn" forum about not knowing to use static in an embedded class, but 
 the error was the following:
 
 Error: 'this' is only defined in non-static member functions
 
 I'd say the errors so far are above some of the cryptic stuff C++ can 
 throw at you (however, I haven't delved that deeply into D templates 
 yet, so don't hold me to this yet), but in terms of quality I'd put it 
 somewhere between C# and C++ in quality. With C# being the ideal.
 
 + The standard library so far is really good. Nullable worked as I 
 thought it should. I just guessed a few of the methods based on what I 
 had seen at that point and got it right. So it appears consistent and 
 intuitive. I also like the fact I can peek at the code and understand it 
 by just reading it. Unlike with C++ where I still don't know how some of 
 the stuff is *really* implemented. The STL almost seems like it's 
 written in a completely different language than the stuff it enables. 
 For instance, I figured out how to do packages by seeing it in Phobos.
 
 - ...however, where are all of the collections? No Queue? No Stack? No 
 HashTable? I've read that it's not a big focus because some of the built 
 in stuff *can* behave like those things. The C# project I'm porting 
 utilizes queues and a specifically C#'s Dictionary<> quite a bit, so I'm 
 not looking forward to having to hand roll my own or use something that 
 aren't fundamentally them. This is definitely the biggest negative I've 
 come across. I want a queue, not something that *can* behave as a queue. 
 I definitely expected more from a language that is this old.
It's on our TODO list. Allocators need to come out of experimental, and we need some form of RC, before we tackle it again.

In the meantime, https://github.com/economicmodeling/containers is pretty good.
 + Packages and 'public import'. I really think it's useful to forward 
 imports/using statements. It kind of packages everything that is 
 required to use that thing in your namespace/package together. So you 
 don't have to include a dozen things. C and C++ can do this with it's 
 #includes, but in an unsatisfactory way. At least in my opinion.
 
 - Modules. I like modules better than #include, but I don't like them 
 better than C#'s namespaces. Specifically I don't like how there is this 
 gravity that kind of pulls me to associate a module with a file. It 
 appears you don't have to, because I can do the package thing, but 
 whenever I try to do things outside that one idiom I end up in a soup of 
 errors. I'm sure I'm just not use to it, but so far it's been a little 
 dissatisfying. Sometimes I want where it is physically on my file system 
 to be different from how I include it in other source files. To me, C#'s 
 namespaces are really the standard to beat or meet.
Modules are a fairly well understood concept from the ML family. You're just not used to them yet :)

Keep in mind we do have namespaces for binding to C++ code, and I haven't heard of anybody abusing them just to get namespaces. They tend to be ugly hacks with ambiguity running through them. Of course, I never had to use them in C++, so I'm sure somebody can give you some war stories with them ;)
 + Unit tests. Finally built in unit tests. Enough said here. If the lack 
 of collections was the biggest negative, this is the biggest positive. I 
 would like to enable them at build time if possible though.
I keep saying it, if you don't have unit tests built in, you don't care about code quality!
 - Attributes. I had another post in the Learn forum about attributes 
 which was unfortunate. At first I was excited because it seems like on 
 the surface it would help me write better code, but it gets a little 
 tedious and tiresome to have to remember to decorate code with them. It 
 seems like most of them should have been the defaults. I would have 
 preferred if the compiler helped me and reminded me. I asked if there 
 was a way to enforce them globally, which I guess there is, but I guess 
 there's also not a way to turn some of them off afterwards. A bit 
 unfortunate. But at least I can see some solutions to this.
You don't need to bother with them for most code :)
 - The defaults for primitives seem off. They seem to encourage errors. I 
 don't think that is the best design decision even if it encourages the 
 errors to be caught as quickly as possible. I think the better decision 
 would be to not have the errors occur. When I asked about this, there 
 seemed to be a disassociation between the spec and the implementation. 
 The spec says a declaration should error if not explicitly set, but the 
 implementation just initializes them to something that is likely to 
 error. Like NaN for floats which I would have thought would have been 0 
 based on prior experiences with other languages.
Doesn't mean the other languages are right either.
 - Immutable. I'm not sure I fully understand it. On the surface it 
 seemed like const but transitive. I tried having a method return an 
 immutable value, but when I used it in my unit test I got some weird 
 errors about objects not being able to return immutable (I forget the 
 exact error...apologies). I refactored to use const, and it all worked 
 as I expected, but I don't get why the immutable didn't work. I was 
 returning a value type, so I don't see why passing in 
 assert(object.errorCount == 0) would have triggered errors. But it did. 
 I have a set of classes that keep track of snapshots of specific counts 
 that seems like a perfect fit for immutable (because I don't want those 
 'snapshots' to change...like ever), but I kept getting errors trying to 
 use it like const. The type string seems to be an immutable(char[]) 
 which works exactly the way I was expecting, and I haven't ran into 
 problems, so I'm not sure what the problem was. I'm just more confused 
 knowing that string works, but what I did didn't.
 
 +- Unicode support is good. Although I think D's string type should have 
 probably been utf16 by default. Especially considering the utf module 
 states:
 
 "UTF character support is restricted to '\u0000' <= character <= 
 '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority of use 
 cases I am pretty guaranteed a char = codepoint. Not the biggest issue 
 in the world and maybe I'm just being overly critical here.
UTF-16 uses a lot more memory than UTF-8. If anything, I would argue for UTF-32 over UTF-16. If you need a wstring, use a wstring!

Be aware that Microsoft is alone in thinking UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode.
 + Templates seem powerful. I've only fiddled thus far, but I don't think 
 I've quite comprehended their usefulness yet. It will probably take me 
 some time to figure out how to wield them effectively. One thing I 
 accidentally stumbled upon that I liked was that I could simulate 
 inheritance in structs with them, by using the mixin keyword. That was 
 cool, and I'm not even sure if that is what they were really meant to 
 enable.
And that is where we use alias this instead. I do wish it were fully implemented, though (multiple alias this).

Welcome!
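For comparison, the alias this version of struct "inheritance" looks like this (types invented for illustration):

```d
struct Base {
    int id;
    string describe() { return "base"; }
}

struct Derived {
    Base base;
    alias base this; // Derived subtypes Base: Base's members are
                     // reachable directly, and Derived converts
                     // to Base implicitly.
    string extra() { return "extra"; }
}

void main() {
    Derived d;
    d.id = 7;                   // forwarded to d.base.id
    assert(d.describe() == "base");

    Base b = d;                 // implicit conversion via alias this
    assert(b.id == 7);
}
```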
Nov 27
next sibling parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole 
wrote:
 Its on our TODO list.

 Allocators need to come out of experimental and some form of RC 
 before we tackle it again.

 In the mean time https://github.com/economicmodeling/containers 
 is pretty good.
That's good to hear.
 I keep saying it, if you don't have unit tests built in, you 
 don't care about code quality!
I just like not having to create a throwaway project to test my code. It's nice to use unit tests for what I used to create console apps for, and then they forever ensure my code works the same!
 You don't need to bother with them for most code :)
That seems to be what people here are saying, but that seems so sad...
 Doesn't mean the other languages are right either.
That is true, but I'm still unconvinced that making the person's program likely to error is better than initializing a number to 0. Zero is such a fundamental default for so many things. And it would be consistent with the other number types.
 If you need a wstring, use a wstring!

 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
I do come from that world, so there is a chance I'm just comfortable with it.
Nov 27
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the person's program 
 likely to error is better than initializing a number to 0. Zero is such a 
 fundamental default for so many things. And it would be consistent with 
 the other number types.
basically, default initializers aren't meant to give a "usable value", they're meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing:

int a;
a += 42;

is still bad code, even if you know that `a` is guaranteed to be zero.

int a = 0;
a += 42;

is the "right" way to write it. if you look at default values from this PoV, you'll see that NaN makes more sense than zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Nov 27
parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:
 A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the 
 person's program likely to error is better than initializing a 
 number to 0. Zero is such a fundamental default for so many 
 things. And it would be consistent with the other number types.
basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is that it was undefined behavior. A lot of language features aim to make things well defined and less verbose. Once a language matures, that's what a big portion of its newer features become: less verbose shortcuts for commonly done things.

I agree it's important that it's well defined; I'm just thinking it should be a value that someone actually wants some notable fraction of the time, not something no one wants ever. I could be persuaded, but so far I'm not drinking the Kool-Aid on that. It's not the end of the world, I was just confused when my float was NaN.
Nov 27
next sibling parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
Opinion wrote:
 On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:
 A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the 
 person's program likely to error is better than initializing 
 a number to 0. Zero is such a fundamental default for so many 
 things. And it would be consistent with the other number 
 types.
basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is it was undefined behavior. A lot of language features aim to make things well defined and have less verbose representations. Once a language matures that's what a big portion of their newer features become. Less verbose shortcuts of commonly done things. I agree it's important that it's well defined, I'm just thinking it should be a value that someone actually wants some notable fraction of the time. Not something no one wants ever. I could be persuaded, but so far I'm not drinking the koolaid on that. It's not the end of the world, I was just confused when my float was NaN.
Also, C and C++ don't just have undefined behavior; sometimes they have inconsistent behavior. Sometimes int a; actually is set to 0.
Nov 27
next sibling parent codephantom <me noyb.com> writes:
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an 
Opinion wrote:
 Also, C and C++ didn't just have undefined behavior, sometimes 
 it has inconsistent behavior. Sometimes int a; is actually set 
 to 0.
set to?
Nov 27
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an 
Opinion wrote:
 On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
 Opinion wrote:
 [...]
Also, C and C++ didn't just have undefined behavior, sometimes it has inconsistent behavior. Sometimes int a; is actually set to 0.
It's only auto variables that are undefined. Statics and file-scope variables (aka globals) are defined (zero-initialized).
Nov 28
prev sibling next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
A Guy With an Opinion wrote:

 Eh...I still don't agree.
anyway, it is something that won't be changed, 'cause there may be code that relies on the current default values. i'm not really trying to change your mind, i just tried to give the rationale behind the choice. that's why `char.init` is 255 too, not zero.

still, explicit variable initialization looks better to me. with default init, it is hard to say if the author just forgot to initialize a variable and it happens to work, or if he knows about the default value and used it. and with explicit init, the reader doesn't have to guess what the default value is.
Nov 27
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
Opinion wrote:
 On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:
 A Guy With an Opinion wrote:

 That is true, but I'm still unconvinced that making the 
 person's program likely to error is better than initializing 
 a number to 0. Zero is such a fundamental default for so many 
 things. And it would be consistent with the other number 
 types.
basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-)
Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is it was undefined behavior. A lot of language features aim to make things well defined and have less verbose representations. Once a language matures that's what a big portion of their newer features become. Less verbose shortcuts of commonly done things. I agree it's important that it's well defined, I'm just thinking it should be a value that someone actually wants some notable fraction of the time. Not something no one wants ever. I could be persuaded, but so far I'm not drinking the koolaid on that. It's not the end of the world, I was just confused when my float was NaN.
Just a little anecdote from a maintainer of a legacy project in C. My predecessors on that project had the habit of systematically initialising every auto-declared variable at the beginning of a function. The code base was initiated in the early '90s and written by people who were typical BASIC programmers, so functions were very often hundreds of lines long and they all started with a lot of declarations.

In my years of reviewing that code, I was really surprised by how often I found bugs because the variables had been wrongly initialised. By initialising with 0 or NULL, the data-flow pass was essentially suppressed from the start, so it could not detect when variables were used before they had been properly populated with the right values the functionality required. These kinds of bugs were very subtle.

To make it short: 0 is an arbitrary number that often is the right value, but when it isn't, it can be a pain to detect that it was the wrong value.
Nov 28
prev sibling parent reply Kagamin <spam here.lot> writes:
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole 
wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
Nov 30
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody 
 else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size. 2. The code doesn't work well with C. C doesn't even have a UCS2 type. 3. There's no reasonable way to audit the code to see if it handles surrogate pairs correctly. Surrogate pairs occur only rarely, so the code is never tested for it, and the bugs may remain latent for many, many years. With UTF8, multibyte code points are much more common, so bugs are detected much earlier.
Dec 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was awesome.
 Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat. But then again, in non-Latin locales you'd generally store your strings separately of the executable (usually in l10n files), so this may not be that big an issue. But the blanket statement "Most strings are in ASCII" is not correct. T -- Bare foot: (n.) A device for locating thumb tacks on the floor.
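The size trade-off being debated can be measured directly in D; a sketch comparing an ASCII string and a BMP CJK string in both encodings (`.length` counts code units, so multiplying by the unit size gives bytes):

```d
void main()
{
    // ASCII: UTF-8 is half the size of UTF-16
    string  a8  = "hello";
    wstring a16 = "hello"w;
    assert(a8.length  * char.sizeof  == 5);  // 5 bytes
    assert(a16.length * wchar.sizeof == 10); // 10 bytes

    // CJK (BMP): 3 bytes per character in UTF-8, 2 in UTF-16
    string  c8  = "日本語";
    wstring c16 = "日本語"w;
    assert(c8.length  * char.sizeof  == 9); // 9 bytes, 150% of UTF-16
    assert(c16.length * wchar.sizeof == 6); // 6 bytes
}
```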
Dec 01
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/1/2017 3:16 PM, H. S. Teoh wrote:
 This is not true in Asia, esp. where the CJK block is extensively used.
 A CJK block character is 3 bytes in UTF-8, meaning that string sizes are
 150% of the UCS2 encoding.  If your code contains a lot of CJK text,
 that's a lot of bloat.
 
 But then again, in non-Latin locales you'd generally store your strings
 separately of the executable (usually in l10n files), so this may not be
 that big an issue. But the blanket statement "Most strings are in ASCII"
 is not correct.
Are you sure about that? I know that Asian languages will be longer in UTF-8. But how much data that programs handle is in those languages? The language of business, science, programming, aviation, and engineering is english. Of course, D itself is agnostic about that. The compiler, for example, accepts strings, identifiers, and comments in Chinese in UTF-16 format.
Dec 02
parent Jacob Carlborg <doob me.com> writes:
On 2017-12-02 11:02, Walter Bright wrote:

 Are you sure about that? I know that Asian languages will be longer in 
 UTF-8. But how much data that programs handle is in those languages? The 
 language of business, science, programming, aviation, and engineering is 
 english.
Not necessarily. I've seen code in non-English languages, i.e. where the identifiers are non-English. But of course, most programming languages use English for keywords and built-in functions. -- /Jacob Carlborg
Dec 02
prev sibling next sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
 cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
That's true in theory; in practice it's not that severe, as the CJK languages never appear in isolation but are embedded in a lot of ASCII. You can read a case study here [1] which shows 106% for Simplified Chinese, 76% for Traditional Chinese, 129% for Japanese and 94% for Korean. These numbers are for pure text. Publish it on the web embedded in bloated HTML and there goes the size advantage of UTF-16.
 But then again, in non-Latin locales you'd generally store your 
 strings separately of the executable (usually in l10n files), 
 so this may not be that big an issue. But the blanket statement 
 "Most strings are in ASCII" is not correct.
False, in the sense that isolated pure text is rare and is generally delivered inside some file format, most times ASCII based like docx, odf, tmx, xliff, akoma ntoso etc... [1]: https://stackoverflow.com/questions/6883434/at-all-times-text-encoded-in-utf-8-will-never-give-us-more-than-a-50-file-size
Dec 02
parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 2 December 2017 at 10:35:50 UTC, Patrick Schluter 
wrote:
 On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 [...]
That's true in theory, in practice it's not that severe as the CJK languages are never isolated and appear embedded in a lot of ASCII. You can read here a case study [1] which shows 106% for Simplified Chinese, 76% for Traditional Chinese, 129% for Japanese and 94% for Korean. These numbers for pure text.
106% for Korean; I copied the wrong column. Traditional Chinese was smaller, probably because of whitespace.
 Publish it on the web embedded in bloated html and there goes 
 the size advantage of UTF-16

 [...]
Dec 02
prev sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
 cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
Yep, that's why five years back many of the major Chinese sites were still not using UTF-8:

http://xahlee.info/w/what_encoding_do_chinese_websites_use.html

That led that Chinese guy to also rant against UTF-8 a couple years ago:

http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html

Considering China buys more smartphones than the US and Europe combined, it's time people started recognizing their importance when it comes to issues like this:

https://www.statista.com/statistics/412108/global-smartphone-shipments-global-region/

Regarding the unique representation issue Jonathan brings up, I've heard people say that was to provide an easier path for legacy encodings, ie some used combining characters and others didn't, so Unicode chose to accommodate both so both groups would move to Unicode. It would be nice if the Unicode people spent their time pruning and regularizing what they have, rather than adding more useless stuff.

Speaking of which, completely agree with Walter and Jonathan that there's no need to add emoji and other such symbols to Unicode, should have never been added. Unicode is supposed to standardize long-existing characters, not promote marginal new symbols to characters. If there's a real need for it, chat software will figure out a way to do it, no need to add such symbols to the Unicode character set.
Dec 02
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 2 December 2017 at 22:16:09 UTC, Joakim wrote:
 On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 9:23 AM, Kagamin wrote:
 On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
 cattermole wrote:
 Be aware Microsoft is alone in thinking that UTF-16 was 
 awesome. Everybody else standardized on UTF-8 for Unicode.
UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
Yep, that's why five years back many of the major Chinese sites were still not using UTF-8: http://xahlee.info/w/what_encoding_do_chinese_websites_use.html
Summary Taiwan sites almost all use UTF-8. Very old ones still use BIG5. Mainland China sites mostly still use GBK or GB2312, but a few newer ones use UTF-8. Many top Japan, Korea, sites also use UTF-8, but some uses EUC (Extended Unix Code) variants. This probably means that UTF-8 might dominate in the future. mmmh
 That led that Chinese guy to also rant against UTF-8 a couple 
 years ago:

 http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html
A rant from someone reproaching a video for not providing reasons why utf-8 is good, while himself not providing any reasons why utf-8 is bad. I'm not denying the issues with utf-8, only that the ranter doesn't provide any useful info on what issues "Asians" encounter with it, besides legacy reasons (which are important, but do not enter into judging the technical quality of an encoding). Add to that that he advocates for GB18030, which is quite inferior to utf-8 except in the legacy-support area (here are some of the advantages of utf-8 that GB18030 does not possess: self-synchronization, algorithmic mapping of code points, error detection). If his only beef with utf-8 is the size of CJK text, then he shouldn't argue for UTF-32 as he seems to do at the end.
Dec 03
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/2/17 5:16 PM, Joakim wrote:
 Yep, that's why five years back many of the major Chinese sites were 
 still not using UTF-8:
 
 http://xahlee.info/w/what_encoding_do_chinese_websites_use.html
 
 That led that Chinese guy to also rant against UTF-8 a couple years ago:
 
 http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html
BTW has anyone been in contact with Xah Lee? Perhaps we could commission him to write some tutorial material for D. -- Andrei
Dec 04
parent Joakim <dlang joakim.fea.st> writes:
On Monday, 4 December 2017 at 21:23:51 UTC, Andrei Alexandrescu 
wrote:
 On 12/2/17 5:16 PM, Joakim wrote:
 Yep, that's why five years back many of the major Chinese 
 sites were still not using UTF-8:
 
 http://xahlee.info/w/what_encoding_do_chinese_websites_use.html
 
 That led that Chinese guy to also rant against UTF-8 a couple 
 years ago:
 
 http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html
BTW has anyone been in contact with Xah Lee? Perhaps we could commission him to write some tutorial material for D. -- Andrei
I traded email with him last summer, emailed you his email address just now.
Dec 04
prev sibling next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - Some of the errors from DMD are a little strange.
Yes, indeed, and many of them don't help much in finding the real source of your problem. I think improvements to dmd's error reporting would be the #1 productivity gain D could get right now.
 - ...however, where are all of the collections? No Queue? No 
 Stack? No HashTable?
I always say "meh" to that because any second year student can slap those together in... well, for a second year student, maybe a couple hours for the student, but after that you're looking at just a few minutes, especially leveraging D's built in arrays and associative arrays as your foundation. Sure, they'd be nice to have, but it isn't a dealbreaker in the slightest. Try turning Dictionary<string, string> into D's string[string], for example.
 Sometimes I want where it is physically on my file system to be 
 different from how I include it in other source files.
This is a common misconception, though one promoted by several of the tools: you don't actually need to match file system layout to modules. OK, sure, D does require one module == one file. But the file name and location is not actually tied to the import name you use in code. They can be anything, you just need to pass the list of files to the compiler so it can parse them and figure out the names.
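A concrete sketch of that decoupling (file and module names here are made up): the name that `import` sees comes from the `module` declaration, not from the path on disk:

```d
// File: src/whatever_name.d -- the file name need not match the module name
module myapp.collections;   // this is what 'import myapp.collections;' finds

struct Queue(T) { T[] data; }

// In another file (e.g. main.d):
//     import myapp.collections;
//     Queue!int q;
//
// Then hand both files to the compiler so it can resolve the names:
//     dmd main.d src/whatever_name.d
```

Only when you rely on the compiler to *locate* an imported module automatically (via -I paths) does the file layout have to mirror the module name.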
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate.
Yeah, of course, from my post there you know my basic opinion on them. I've written in more detail about them elsewhere and don't feel like it tonight, but I think they are a big failure right now.... but they could be fixed if we're willing to take a few steps (#0 improve the error messages, #1 add opposites to all of them, e.g. throws and gc, #2, change the defaults via a single declaration at the module level, #3 omg revel in how useful they are)
 - Immutable. I'm not sure I fully understand it. On the surface 
 it seemed like const but transitive.
const is transitive too. So the difference is really that `const` means YOU won't change it, whereas `immutable` means NOBODY will change it. What's important there is that to make something immutable, you need to prove to the compiler's satisfaction that nobody else can change it either. const/immutable in D isn't as common as in its family of languages (C++ notably), but when you do get to use it - at least once you get to know it - it is useful.
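The distinction can be made concrete in a few lines; a minimal sketch:

```d
void main()
{
    int x = 1;
    const(int)* cp = &x;  // const view: *we* can't write through cp...
    x = 2;                // ...but the data may still change underneath us
    assert(*cp == 2);

    immutable int y = 3;       // immutable: *nobody* can ever change y
    immutable(int)* ip = &y;   // fine: y really is immutable
    assert(*ip == 3);
    // immutable(int)* bad = &x; // error: x is mutable, no such guarantee
}
```

That's why making something immutable requires proving to the compiler that no mutable reference to it can exist.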
 I was returning a value type, so I don't see why passing in 
 assert(object.errorCount == 0) would have triggered errors.
Was the object itself immutable? I suspect you wrote something like this: immutable int errorCount() { return ...; } But this is a curious syntax... the `immutable` there actually applies to the *object*, not the return value! It means you can call this method on an immutable object (in fact, it means you MUST call it on an immutable object. const is the middle ground that allows you to call it on either) immutable(int) errorCount() { return ...; } note the parens, is how you apply it to the return value. Yes, this is kinda weird, and style guides tend to suggest putting the qualifiers after the argument list for the `this` thing instead of before... but the language allows it before, so it trips up a LOT of people like this.
 The type string seems to be an immutable(char[]) which works 
 exactly the way I was expecting,
It is actually `immutable(char)[]`. The parens are important here - it applies to the contents of the array, but not the array itself here.
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:
Note that it has UTF-16 built in as well, with almost equal support. Put `w` at the end of a literal: `"this literal is UTF-16"w` // notice the w after the " and you get utf16. It considers that to be `wstring` instead of `string`, but it works basically the same. If you are doing a lot of Windows API work, this is pretty useful!
 That was cool, and I'm not even sure if that is what they were 
 really meant to enable.
yes, indeed. plugging my book https://www.packtpub.com/application-development/d-cookbook i talk about much of this stuff in there
Nov 27
parent A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:24:46 UTC, Adam D. Ruppe wrote:
 immutable(int) errorCount() { return ...; }
I actually did try something like that, because I remembered seeing the parens around the string definition. I think at that point I was just so riddled with errors I just took a step back and went back to something I know. Just to make sure I wasn't going insane.
Nov 27
prev sibling next sibling parent reply Michael V. Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:

 + D code so far is pushing me towards more "flat" code (for a 
 lack of a better way to phrase it) and so far that has helped 
 tremendously when it comes to readability. C# kind is the 
 opposite. With it's namespace -> class -> method coupled with 
 lock, using, etc...you tend to do a lot of nesting. You are 
 generally 3 '{' in before any true logic even begins. Then 
 couple that with try/catch, IDisposable/using, locking, and 
 then if/else, it can get quite chaotic very easily. So right 
 away, I saw my C# code actually appear more readable when I 
 translated it and I think it has to do with the flatness. I'm 
 not sure if that opinion will hold when I delve into 'static 
 if' a little more, but so far my uses of it haven't really 
 dampened that opinion.
I come from a heavy C#/C++ background. I also *felt* this as well, but never really consciously thought about it until you mentioned it :-)
 - Some of the errors from DMD are a little strange. I don't 
 want to crap on this too much, because for the most part it's 
 fine. However occasionally it throws errors I still can't 
 really work out why THAT is the error it gave me. Some of you 
 may have saw my question in the "Learn" forum about not knowing 
 to use static in an embedded class, but the error was the 
 following:

 Error: 'this' is only defined in non-static member functions
Please submit things like this to the issue tracker. They are very easy to fix, and if I'm aware of them, I'll probably do the work. But, please provide a code example and offer a suggestion of what you would prefer it to say; it just makes things easier.
 - Modules. I like modules better than #include, but I don't 
 like them better than C#'s namespaces. Specifically I don't 
 like how there is this gravity that kind of pulls me to 
 associate a module with a file. It appears you don't have to, 
 because I can do the package thing, but whenever I try to do 
 things outside that one idiom I end up in a soup of errors. I'm 
 sure I'm just not use to it, but so far it's been a little 
 dissatisfying. Sometimes I want where it is physically on my 
 file system to be different from how I include it in other 
 source files. To me, C#'s namespaces are really the standard to 
 beat or meet.
I feel the same. I don't like that modules are tied to files; it seems like such an arbitrary limitation. We're not alone: https://youtu.be/6_xdfSVRrKo?t=353
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate. At first I was excited 
 because it seems like on the surface it would help me write 
 better code, but it gets a little tedious and tiresome to have 
 to remember to decorate code with them. It seems like most of 
 them should have been the defaults. I would have preferred if 
 the compiler helped me and reminded me. I asked if there was a 
 way to enforce them globally, which I guess there is, but I 
 guess there's also not a way to turn some of them off 
 afterwards. A bit unfortunate. But at least I can see some 
 solutions to this.
Yep. One of my pet peeves in D.
 - The defaults for primitives seem off. They seem to encourage 
 errors. I don't think that is the best design decision even if 
 it encourages the errors to be caught as quickly as possible. I 
 think the better decision would be to not have the errors 
 occur. When I asked about this, there seemed to be a 
 disassociation between the spec and the implementation. The 
 spec says a declaration should error if not explicitly set, but 
 the implementation just initializes them to something that is 
 likely to error. Like NaN for floats which I would have thought 
 would have been 0 based on prior experiences with other 
 languages.
Another one of my pet peeves in D. Though this post (http://forum.dlang.org/post/tcldaatzzbhjoamnvniu forum.dlang.org) made me realize we might be able to do something about that.
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:

 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."
See http://utf8everywhere.org/
 + Templates seem powerful. I've only fiddled thus far, but I 
 don't think I've quite comprehended their usefulness yet. It 
 will probably take me some time to figure out how to wield them 
 effectively. One thing I accidentally stumbled upon that I 
 liked was that I could simulate inheritance in structs with 
 them, by using the mixin keyword. That was cool, and I'm not 
 even sure if that is what they were really meant to enable.
Templates, CTFE, and mixins are gravy! and D's the only language I know of that has this symbiotic feature set.
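The struct-"inheritance" trick mentioned above is usually done with a mixin template; a minimal sketch (the names are made up for illustration):

```d
import std.conv : to;

// Members injected into any aggregate that mixes this in
mixin template Describable()
{
    int id;
    string describe() { return "id=" ~ id.to!string; }
}

struct Widget
{
    mixin Describable;  // Widget now has id and describe(),
                        // much like inheriting from a base type
    string name;
}

void main()
{
    Widget w;
    w.id = 42;
    w.name = "answer";
    assert(w.describe() == "id=42");
}
```

Unlike class inheritance there is no polymorphism here; the members are simply pasted into each struct at compile time.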
 So those are just some of my thoughts. Tell me why I'm wrong :P
I share much of your perspective. Thanks for the interesting read. Mike
Nov 27
parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 04:37:04 UTC, Michael V. Franklin 
wrote:
 Please submit things like this to the issue tracker.  They are 
 very easy to fix, and if I'm aware of them, I'll probably do 
 the work.  But, please provide a code example and
 offer a suggestion of what you would prefer it to say; it just 
 makes things easier.>
I'd be happy to submit an issue, but I'm not quite sure I'd be the best to determine an error message (at least not this early), mainly because I have no clue what it was yelling at me about. I only knew to add static because I told people my intentions and they suggested it. I guess having a non-statically marked nested class is a valid feature imported from the Java world; I'm just not as familiar with that specific feature of Java, so I have no idea what the text really had to do with anything. Maybe appending "if you meant to make a static class" would have been helpful. I fiddled with Rust a little too, and that's what they tend to do very well: make verbose error messages.
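For reference, a sketch of the Java-style feature in question, static vs. non-static nested classes, and where the confusing 'this' error comes from:

```d
class Outer
{
    int x = 1;

    class Inner            // non-static: carries a hidden reference to an Outer
    {
        int outerX() { return x; }  // can reach the enclosing object's members
    }

    static class Standalone {}      // static: no hidden outer reference
}

void main()
{
    auto o = new Outer;
    auto i = o.new Inner;           // must supply the enclosing instance
    assert(i.outerX() == 1);

    auto s = new Outer.Standalone;  // no Outer instance needed
    // auto bad = new Outer.Inner;  // error: Inner needs an Outer 'this'
}
```

Hence the fix of marking the nested class static when you don't need access to the enclosing instance.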
 We're not alone:  https://youtu.be/6_xdfSVRrKo?t=353
And he was so much better at articulating it than I was. Another C# guy though. :)
Nov 27
parent reply Michael V. Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an 
Opinion wrote:

 I'd be happy to submit an issue, but I'm not quite sure I'd be 
 the best to determine an error message (at least not this 
 early). Mainly because I have no clue what it was yelling at me 
 about. I only new to add static because I told people my 
 intentions and they suggested it. I guess having a non 
 statically marked class is a valid feature imported from Java 
 world.
If this was on the forum, please point me to it. I'll see if I can understand what's going on and do something about it. Thanks, Mike
Nov 27
parent A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 05:16:54 UTC, Michael V. Franklin 
wrote:
 On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an 
 Opinion wrote:

 I'd be happy to submit an issue, but I'm not quite sure I'd be 
 the best to determine an error message (at least not this 
 early). Mainly because I have no clue what it was yelling at 
me about. I only knew to add static because I told people my 
 intentions and they suggested it. I guess having a non 
 statically marked class is a valid feature imported from Java 
 world.
If this was on the forum, please point me to it. I'll see if I can understand what's going on and do something about it. Thanks, Mike
https://forum.dlang.org/thread/vcvlffjxowgdvpvjsijq forum.dlang.org
Nov 27
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/27/17 10:01 PM, A Guy With an Opinion wrote:
 Hi,
Hi Guy, welcome, and I wanted to say I was saying "me too" while reading much of your post. I worked on a C# based client/server for about 5 years, and the biggest thing I agree with you on is the generic programming. I was also using D at the time, and using generics felt like eating a superbly under-baked cake. A few points:
 - Some of the errors from DMD are a little strange. I don't want to crap 
 on this too much, because for the most part it's fine. However 
 occasionally it throws errors I still can't really work out why THAT is 
 the error it gave me. Some of you may have saw my question in the 
 "Learn" forum about not knowing to use static in an embedded class, but 
 the error was the following:
 
 Error: 'this' is only defined in non-static member functions
Yes, this is simply a bad error message. Many of our bad error messages come from something called "lowering", where one piece of code is converted to another piece of code, and then the error message happens on the converted code. So essentially you are getting errors on code you didn't write! They are more difficult to fix, since we can't change the real error message (it applies to real code as well), and the code that generated the lowered code is decoupled from the error. I think this is one of those cases.
 I'd say the errors so far are above some of the cryptic stuff C++ can 
 throw at you (however, I haven't delved that deeply into D templates 
 yet, so don't hold me to this yet), but in terms of quality I'd put it 
 somewhere between C# and C++ in quality. With C# being the ideal.
Once you use templates a lot, the error messages explode in cryptology :) But generally, you can get the gist of your errors if you can half-way decipher the mangling.
 - ...however, where are all of the collections? No Queue? No Stack? No 
 HashTable? I've read that it's not a big focus because some of the built 
 in stuff *can* behave like those things. The C# project I'm porting 
 utilizes queues and a specifically C#'s Dictionary<> quite a bit, so I'm 
 not looking forward to having to hand roll my own or use something that 
 aren't fundamentally them. This is definitely the biggest negative I've 
 come across. I want a queue, not something that *can* behave as a queue. 
 I definitely expected more from a language that is this old.
I haven't touched this in years, but it should still work pretty well (if you try it and it doesn't compile for some reason, please submit an issue there): https://github.com/schveiguy/dcollections It has more of a Java/C# feel than other libraries, including an interface hierarchy. That being said, Queue is just so easy to implement given a linked list, I never bothered :)
 + Unit tests. Finally built in unit tests. Enough said here. If the lack 
 of collections was the biggest negative, this is the biggest positive. I 
 would like to enable them at build time if possible though.
+1000

About the running of unit tests at build time, many people version their main function like this:

version(unittest) void main() {}
else int main(string[] args) // real declaration
{
    ...
}

This way, when you build with -unittest, you only run unit tests, and exit immediately. So enabling them at build time is quite easy.
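For completeness, a sketch of the built-in unittest blocks that pattern ends up running (errorCount here is a made-up function under test):

```d
// Hypothetical function under test
int errorCount(const int[] errors)
{
    return cast(int) errors.length;
}

unittest
{
    // Compiled in and executed only with -unittest,
    // e.g.: dmd -unittest -main app.d && ./app
    assert(errorCount([]) == 0);
    assert(errorCount([1, 2]) == 2);
}
```

The blocks sit right next to the code they test, which is a big part of why they actually get written.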
 - Attributes. I had another post in the Learn forum about attributes 
 which was unfortunate. At first I was excited because it seems like on 
 the surface it would help me write better code, but it gets a little 
 tedious and tiresome to have to remember to decorate code with them. It 
 seems like most of them should have been the defaults. I would have 
 preferred if the compiler helped me and reminded me. I asked if there 
 was a way to enforce them globally, which I guess there is, but I guess 
 there's also not a way to turn some of them off afterwards. A bit 
 unfortunate. But at least I can see some solutions to this.
If you are using more templates (and I use them the more I write D code), you will not have this problem. Templates infer almost all attributes.
 - Immutable. I'm not sure I fully understand it. On the surface it 
 seemed like const but transitive. I tried having a method return an 
 immutable value, but when I used it in my unit test I got some weird 
 errors about objects not being able to return immutable (I forget the 
 exact error...apologies). I refactored to use const, and it all worked 
 as I expected, but I don't get why the immutable didn't work. I was 
 returning a value type, so I don't see why passing in 
 assert(object.errorCount == 0) would have triggered errors. But it did. 
This is likely because of Adam's suggestion -- you were incorrectly declaring a function that returned an immutable like this: immutable T foo(); Where the immutable *doesn't* apply to the return value, but to the function itself. immutable applied to a function is really applying immutable to the 'this' reference.
 + Templates seem powerful. I've only fiddled thus far, but I don't think 
 I've quite comprehended their usefulness yet. It will probably take me 
 some time to figure out how to wield them effectively. One thing I 
 accidentally stumbled upon that I liked was that I could simulate 
 inheritance in structs with them, by using the mixin keyword. That was 
 cool, and I'm not even sure if that is what they were really meant to 
 enable.
Templates and generative programming is what hooks you on D. You will be spoiled when you work on other languages :) -Steve
Nov 28
parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven 
Schveighoffer wrote:
 This is likely because of Adam's suggestion -- you were 
 incorrectly declaring a function that returned an immutable 
 like this:
 immutable T foo();

 -Steve
That's exactly what it was I think. As I stated before, I tried to do immutable(T) but I was drowning in errors at that point that I just took a step back. I'll try to refactor it back to using immutable. I just honestly didn't quite know what I was doing obviously.
Nov 28
parent A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven 
Schveighoffer wrote:
 https://github.com/schveiguy/dcollections
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
 https://github.com/economicmodeling/containers
Thanks. I'll check both out. It's not that I don't want to write them, it's just I don't want to stop what I'm doing when I need them and write them. It takes me out of my thought process.
Nov 28
prev sibling next sibling parent Guillaume Piolat <contact spam.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 So those are just some of my thoughts. Tell me why I'm wrong :P
You are not supposed to come to this forum with well-balanced opinions and reasonable arguments. It's not colourful enough to be heard! Instead, make a dent in the universe.

Prepare your most impactful, most offensive statements to push your personal agenda of what your own system programming language would be like, if you had the stamina. Use doubtful analogies and references to languages with wildly different goals than D. Prepare to abuse the volunteers, and say how much you would dare to use D, if only it would do "just this one obvious change". Having this feature would make the BlobTech industry switch to D overnight!

And you haven't asked for any new feature, especially no new _syntax_ was demanded! I don't know, find anything: "It would be nice to have a shortcut syntax for when you want to add zero. Writing 0 + x is cumbersome, when +x would do it. It has the nice benefit of unifying unary and binary operators, and thus leads to a simplified implementation."

Do you realize the dangers of looking satisfied?
Nov 28
prev sibling next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate. At first I was excited 
 because it seems like on the surface it would help me write 
 better code, but it gets a little tedious and tiresome to have 
 to remember to decorate code with them. It seems like most of 
 them should have been the defaults. I would have preferred if 
 the compiler helped me and reminded me. I asked if there was a 
 way to enforce them globally, which I guess there is, but I 
 guess there's also not a way to turn some of them off 
 afterwards. A bit unfortunate. But at least I can see some 
 solutions to this.
Attributes were one of my biggest hurdles when working on my own projects. For example, it's a huge PITA when you have to add a debug writeln deep down in your call stack, and it ends up violating a bunch of function attributes further up. Thankfully, wrapping statements in debug {} allows you to ignore pure and @safe violations in that code if you compile with the flag -debug.

Also, you can apply attributes to your whole project by adding them to main:

void main(string[] args) @safe {}

Although this isn't recommended, as almost no program can be completely @safe. You can do it on a per-file basis by putting the attributes at the top like so:

@safe:
pure:
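Putting those together, a small sketch (the function is invented for illustration; the debug block's exemption from purity checks only matters when you build with -debug):

```d
@safe pure:   // file-level attributes apply to the declarations below

int triple(int x)
{
    // Normally calling the impure writeln from a pure function is an
    // error, but debug-conditional code is exempt from that check.
    debug { import std.stdio : writeln; writeln("x = ", x); }
    return 3 * x;
}

void main()
{
    assert(triple(2) == 6);
}
```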
Nov 28
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote:
 You can do it on a per-file basis by putting the attributes at 
 the top like so
That doesn't quite work since it doesn't descend into aggregates. And you can't turn most of them off.
Nov 28
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2017-11-28 17:24, Adam D. Ruppe wrote:

 That doesn't quite work since it doesn't descend into aggregates. And 
 you can't turn most of them off.
And if your project is a library. -- /Jacob Carlborg
Nov 28
prev sibling parent reply A Guy With an Opinion <aguywithanopinion gmail.com> writes:
On Tuesday, 28 November 2017 at 16:24:56 UTC, Adam D. Ruppe wrote:
 That doesn't quite work since it doesn't descend into 
 aggregates. And you can't turn most them off.
I take it adding those inverse attributes is no trivial thing?
Nov 28
next sibling parent reply Michael V. Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an 
Opinion wrote:

 I take it adding those inverse attributes is no trivial thing?
It would require a DIP: https://github.com/dlang/DIPs

This DIP is related (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but I don't know what's happening with it.

Mike
Nov 28
parent reply Mike Parker <aldacron gmail.com> writes:
On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. Franklin 
wrote:

 This DIP is related 
 (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but 
 I don't know what's happening with it.
It's awaiting formal review. I'll move it forward when the formal review queue clears out a bit.
Nov 28
parent A Guy With a Question <aguywithanquestion gmail.com> writes:
On Tuesday, 28 November 2017 at 22:08:48 UTC, Mike Parker wrote:
 On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. 
 Franklin wrote:

 This DIP is related 
 (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) 
 but I don't know what's happening with it.
It's awaiting formal review. I'll move it forward when the formal review queue clears out a bit.
How well does Phobos play with it? I'm finding, for instance, that it's not playing too well with nothrow. Things throw, and I don't understand why.
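A common workaround for throwing Phobos calls is to catch inside the nothrow function; a quick sketch (the function name and fallback behavior are invented for illustration):

```d
import std.conv : to;

// Much of Phobos can throw (e.g. std.conv.to on bad input), so a
// nothrow caller has to handle the Exception itself:
int parseOr(string s, int fallback) nothrow
{
    try
        return s.to!int;
    catch (Exception e)     // nothrow only concerns Exceptions, not Errors
        return fallback;
}

void main()
{
    assert(parseOr("42", 0) == 42);
    assert(parseOr("oops", -1) == -1);
}
```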
Nov 29
prev sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an 
Opinion wrote:
 I take it adding those inverse attributes is no trivial thing?
Technically, it is extremely trivial. Politically, that's a different matter. There have been arguments before about the words or the syntax (is it "@gc" or "@nogc(false)", for example? tbh I think the latter is kinda elegant, but the former works too, I just want something that works) and the process (so much paperwork!) and all kinds of nonsense.
Nov 28
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote:

 you can apply attributes to your whole project by adding them 
 to main

 void main(string[] args) @safe {}

 Although this isn't recommended, as almost no program can be 
 completely @safe.
In fact, I believe it is. When you have something unsafe you can manually wrap it with @trusted. Same goes with nothrow, since you can catch everything thrown. But putting @nogc on main is of course not recommended except in special cases, and pure is completely out of the question.
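A quick sketch of the @trusted-wrapping idiom (the allocate wrapper is invented for illustration; the extern(C) declarations stand in for any @system API):

```d
// C functions are @system by default; declaring them directly here
// keeps the example self-contained.
extern (C) void* malloc(size_t);
extern (C) void free(void*);

// We manually vouch for this call's safety, as described above,
// so @safe code may use it.
@trusted void* allocate(size_t n)
{
    return malloc(n);
}

@safe void main()
{
    auto p = allocate(16);
    assert(p !is null);
    () @trusted { free(p); }();  // inline @trusted lambda for the cleanup
}
```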
Nov 30
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 +- Unicode support is good. Although I think D's string type should have 
 probably been utf16 by default. Especially considering the utf module states:
 
 "UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority of use cases I
am 
 pretty guaranteed a char = codepoint. Not the biggest issue in the world and 
 maybe I'm just being overly critical here.
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs.

https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java

As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32.

I recommend using UTF8.
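The code-unit counts are easy to see in D itself; a small sketch using an emoji that needs a surrogate pair in UTF-16:

```d
void main()
{
    string  s8  = "😂";   // UTF-8
    wstring s16 = "😂"w;  // UTF-16
    dstring s32 = "😂"d;  // UTF-32

    // One code point (U+1F602), but not one code unit in UTF-8/UTF-16:
    assert(s8.length  == 4);  // four UTF-8 code units
    assert(s16.length == 2);  // a surrogate pair
    assert(s32.length == 1);  // one code point
}
```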
Nov 30
next sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:
 On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:
 
 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority 
 of use cases I am pretty guaranteed a char = codepoint. Not 
 the biggest issue in the world and maybe I'm just being overly 
 critical here.
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8.
Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some starting off with the earlier UCS-2: https://en.m.wikipedia.org/wiki/UTF-16#Usage Not saying either is better, each has their flaws, just pointing out it's more than just Windows.
Nov 30
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 2:39 AM, Joakim wrote:
 Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some 
 starting off with the earlier UCS-2:
 
 https://en.m.wikipedia.org/wiki/UTF-16#Usage
 
 Not saying either is better, each has their flaws, just pointing out it's more 
 than just Windows.
I stand corrected.
Nov 30
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d 
wrote:
 On 11/30/2017 2:39 AM, Joakim wrote:
 Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some
 starting off with the earlier UCS-2:

 https://en.m.wikipedia.org/wiki/UTF-16#Usage

 Not saying either is better, each has their flaws, just pointing out
 it's more than just Windows.
I stand corrected.
I get the impression that the stuff that uses UTF-16 is mostly stuff that picked an encoding early on in the Unicode game and thought that they picked one that guaranteed that a code unit would be an entire character. Many of them picked UCS-2 and then switched later to UTF-16, but once they picked a 16-bit encoding, they were kind of stuck.

Others - most notably C/C++ and the *nix world - picked UTF-8 for backwards compatibility, and once it became clear that UCS-2 / UTF-16 wasn't going to cut it for a code unit representing a character, most stuff that went Unicode went UTF-8.

Language-wise, I think that most of the UTF-16 use is driven by the fact that Java went with UCS-2 / UTF-16, and C# followed them (both because they were copying Java and because the Win32 API had gone with UCS-2 / UTF-16). So, that's had a lot of influence on folks, though most others have gone with UTF-8 for backwards compatibility and because it typically takes up less space for non-Asian text. But the use of UTF-16 in Windows, Java, and C# does seem to have resulted in some folks thinking that wide characters mean Unicode, and narrow characters mean ASCII.

I really wish that everything would just go to UTF-8 and that UTF-16 would die, but that would just break too much code. And if we were willing to do that, I'm sure that we could come up with a better encoding than UTF-8 (e.g. getting rid of Unicode normalization as being a thing and never having multiple encodings for the same character), but _that_'s never going to happen.

- Jonathan M Davis
Nov 30
next sibling parent A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis 
wrote:
 On Thursday, November 30, 2017 03:37:37 Walter Bright via 
 Digitalmars-d wrote:
 On 11/30/2017 2:39 AM, Joakim wrote:
 Java, .NET, Qt, Javascript, and a handful of others use 
 UTF-16 too, some starting off with the earlier UCS-2:

 https://en.m.wikipedia.org/wiki/UTF-16#Usage

 Not saying either is better, each has their flaws, just 
 pointing out it's more than just Windows.
I stand corrected.
I get the impression that the stuff that uses UTF-16 is mostly stuff that picked an encoding early on in the Unicode game and thought that they picked one that guaranteed that a code unit would be an entire character.
I don't think that's true, though. Haven't you always been able to combine two code points into one visual representation (Ä for example)? To me it's still two characters to look for when going through the string, but the UI or text interpreter might choose to combine them.

So in certain domains, such as trying to visually represent the character, yes, a code point is not a character, if what you mean by character is the visual representation. But what we are referring to as a character can kind of morph depending on context. When you are running through the data in the algorithm behind the scenes, you care about the *information*, and therefore the code point. And then we are really just having a semantics battle if someone calls that a character.
 Many of them picked UCS-2 and then switched later to UTF-16, 
 but once they picked a 16-bit encoding, they were kind of stuck.

 Others - most notably C/C++ and the *nix world - picked UTF-8 
 for backwards compatibility, and once it became clear that 
 UCS-2 / UTF-16 wasn't going to cut it for a code unit 
 representing a character, most stuff that went Unicode went 
 UTF-8.
That's only because C used ASCII, and thus a char was a byte. UTF-8 is in line with this, so literally nothing needs to change to get pretty much the same behavior. It makes sense. With this in mind, it actually might make sense for D to use it.
Nov 30
prev sibling next sibling parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis 
wrote:
 On Thursday, November 30, 2017 03:37:37 Walter Bright via 
 Digitalmars-d wrote:
 Language-wise, I think that most of the UTF-16 is driven by the 
 fact that Java went with UCS-2 / UTF-16, and C# followed them 
 (both because they were copying Java and because the Win32 API 
 had gone with UCS-2 / UTF-16). So, that's had a lot of 
 influence on folks, though most others have gone with UTF-8 for 
 backwards compatibility and because it typically takes up less 
 space for non-Asian text. But the use of UTF-16 in Windows, 
 Java, and C# does seem to have resulted in some folks thinking 
 that wide characters means Unicode, and narrow characters 
 meaning ASCII.
 - Jonathan M Davis
I think it also simplifies the logic. You are not always looking to represent the code points symbolically; you are just trying to see what information is in them. Therefore, if you can practically treat a code point as the unit of data behind the scenes, it simplifies the logic.
Nov 30
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, November 30, 2017 18:32:46 A Guy With a Question via 
Digitalmars-d wrote:
 On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis

 wrote:
 On Thursday, November 30, 2017 03:37:37 Walter Bright via
 Digitalmars-d wrote:
 Language-wise, I think that most of the UTF-16 is driven by the
 fact that Java went with UCS-2 / UTF-16, and C# followed them
 (both because they were copying Java and because the Win32 API
 had gone with UCS-2 / UTF-16). So, that's had a lot of
 influence on folks, though most others have gone with UTF-8 for
 backwards compatibility and because it typically takes up less
 space for non-Asian text. But the use of UTF-16 in Windows,
 Java, and C# does seem to have resulted in some folks thinking
 that wide characters means Unicode, and narrow characters
 meaning ASCII.

 - Jonathan M Davis
I think it also simplifies the logic. You are not always looking to represent the codepoints symbolically. You are just trying to see what information is in it. Therefore, if you can practically treat a codepoint as the unit of data behind the scenes, it simplifies the logic.
Even if that were true, UTF-16 code units are not code points. If you want to operate on code points, you have to go to UTF-32. And even if you're at UTF-32, you have to worry about Unicode normalization, otherwise the same information can be represented differently even if all you care about is code points and not graphemes. And of course, some stuff really does care about graphemes, since those are the actual characters.

Ultimately, you have to understand how code units, code points, and graphemes work and what you're doing with a particular algorithm so that you know at which level you should operate and where the pitfalls are. Some code can operate on code units and be fine; some can operate on code points; and some can operate on graphemes. But there is no one-size-fits-all solution that makes it all magically easy and efficient to use.

And UTF-16 does _nothing_ to improve any of this over UTF-8. It's just a different way to encode code points. And really, it makes things worse, because it usually takes up more space than UTF-8, and it makes it easier to miss when you screw up your Unicode handling, because more UTF-16 code units are valid code points than UTF-8 code units are, but they still aren't all valid code points. So, if you use UTF-8, you're more likely to catch your mistakes.

Honestly, I think that the only good reason to use UTF-16 is if you're interacting with existing APIs that use UTF-16, and even then, I think that in most cases, you're better off using UTF-8 and converting to UTF-16 only when you have to. Strings eat less memory that way, and mistakes are more easily caught. And if you're writing cross-platform code in D, then Windows is really the only place that you're typically going to have to deal with UTF-16, so it definitely works better in general to favor UTF-8 in D programs.

But regardless, at least D gives you the tools to deal with the different Unicode encodings relatively cleanly and easily, so you can use whichever Unicode encoding you need to. Most D code is going to use UTF-8 though.

- Jonathan M Davis
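For what it's worth, Phobos exposes all three levels directly; a small sketch using a combining accent:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "e" followed by a combining acute accent: one grapheme ("é" as
    // the user sees it), two code points, three UTF-8 code units.
    auto s = "e\u0301";
    assert(s.byCodeUnit.walkLength == 3);  // code units
    assert(s.walkLength == 2);             // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 1);  // graphemes
}
```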
Nov 30
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 9:56 AM, Jonathan M Davis wrote:
 I'm sure that we could come up with a better encoding than UTF-8 (e.g.
 getting rid of Unicode normalization as being a thing and never having
 multiple encodings for the same character), but _that_'s never going to
 happen.
UTF-8 is not the cause of that particular problem, it's caused by the Unicode committee being a committee. Other Unicode problems are caused by the committee trying to add semantic information to code points, which causes nothing but problems. I.e. the committee forgot that Unicode is a character set, and nothing more.
Dec 01
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, December 01, 2017 15:54:31 Walter Bright via Digitalmars-d wrote:
 On 11/30/2017 9:56 AM, Jonathan M Davis wrote:
 I'm sure that we could come up with a better encoding than UTF-8 (e.g.
 getting rid of Unicode normalization as being a thing and never having
 multiple encodings for the same character), but _that_'s never going to
 happen.
UTF-8 is not the cause of that particular problem, it's caused by the Unicode committee being a committee. Other Unicode problems are caused by the committee trying to add semantic information to code points, which causes nothing but problems. I.e. the committee forgot that Unicode is a character set, and nothing more.
Oh, definitely. UTF-8 is arguably the best that Unicode has, but Unicode in general is what's broken, because the folks designing it made poor choices. And personally, I think that their worst decisions tend to be at the code point level (e.g. having the same character be representable by different combinations of code points).

Quite possibly the most depressing thing that I've run into with Unicode, though, was finding out that emojis have their own code points. Emojis are specifically representable by a sequence of existing characters (usually ASCII), because they came from folks trying to represent pictures with text. The fact that they're then trying to put those pictures into the Unicode standard just blatantly shows that the Unicode folks have lost sight of what they're up to. It's like if they started trying to add Unicode characters for words. It makes no sense. But unfortunately, we just have to live with it... :(

- Jonathan M Davis
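The multiple-encodings problem is easy to demonstrate with std.uni's normalization support (a sketch; é chosen purely for illustration):

```d
import std.uni;  // normalize, NFC, NFD

void main()
{
    // The same character as two different code point sequences:
    string composed   = "\u00E9";   // é as one precomposed code point
    string decomposed = "e\u0301";  // e + combining acute accent

    assert(composed != decomposed);                 // raw bytes differ
    assert(normalize!NFC(decomposed) == composed);  // compare after normalizing
    assert(normalize!NFD(composed) == decomposed);
}
```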
Dec 01
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/1/2017 8:08 PM, Jonathan M Davis wrote:
 And personally, I think that their worst decisions tend to be at the code
 point level (e.g. having the same character being representable by different
 combinations of code points).
Yup. I've presented that point of view a couple times on HackerNews, and some Unicode people took umbrage at that. The case they presented fell a little flat.
 Quite possbily the most depressing thing that I've run into with Unicode
 though was finding out that emojis had their own code points. Emojis are
 specifically representable by a sequence of existing characters (usually
 ASCII), because they came from folks trying to represent pictures with text.
 The fact that they're then trying to put those pictures into the Unicode
 standard just blatantly shows that the Unicode folks have lost sight of what
 they're up to. It's like if they started trying to add Unicode characters
 for words. It makes no sense. But unfortunately, we just have to live with
 it... :(
Yah, I've argued against that, too. And those "international" icons are arguably one of the dumber ideas to ever sweep the world, yet they seem to be celebrated without question.

Have you ever tried to look up an icon in a dictionary? It doesn't work. So if you don't know what an icon means, you're hosed. If it is a word you don't understand, you can look it up in a dictionary. Furthermore, you don't need to know English to know what "ON" means. There is no more cognitive difficulty asking someone what "ON" means than there is asking what "|" means. Is an illiterate person from XxLand really going to understand that "|" means "ON" without help?

My car has a bunch of emoticons labeling the controls. I can't figure out what any of them do without reading the manual, or just pushing random buttons until what I want happens. One button has an icon on it that looks like a snowflake. What does that do? Turn on the A/C? Defrost the frosty windows? Set the AWD in slippery mode? Turn on the Christmas lights? On my pre-madness truck, they're labeled in English. Never had any trouble with that.

Part of the problem I've seen is that people do things like "vote for my emoji/icon and I'll vote for yours!" And then when they get something accepted, they wear it as a badge of status and write articles saying how you, too, can get your whatever accepted as an icon.

It's madness, madness I say!
Dec 02
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 2 December 2017 at 10:20:10 UTC, Walter Bright wrote:
 On 12/1/2017 8:08 PM, Jonathan M Davis wrote:
 [...]
Yup. I've presented that point of view a couple times on HackerNews, and some Unicode people took umbrage at that. The case they presented fell a little flat. [...]
Where it gets really fun is the when there is color composition for emoticons U+1F466 = 👦 U+1F466 U+1F3FF = 👦🏿
Dec 02
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Dec 02, 2017 at 02:20:10AM -0800, Walter Bright via Digitalmars-d wrote:
[...]
 My car has a bunch emoticons labeling the controls. I can't figure out
 what any of them do without reading the manual, or just pushing random
 buttons until what I want happens. One button has an icon on it that
 looks like a snowflake. What does that do? Turn on the A/C? Defrost
 the frosty windows?  Set the AWD in slippery mode? Turn on the
 Christmas lights?
The same can be argued for the icon mania started by the GUI craze in the 90's that has now become the de facto standard. Some icons are more obvious than others, but nowadays GUI toolbars are full of inscrutable icons of unclear meaning that are basically opaque unless you already have prior knowledge of what they're supposed to represent. Thankfully most(?) GUI programs have enough sanity left to provide tooltips with textual labels for what each button means. Still, it betrays the emperor's invisible clothes of the "graphics == intuitive" mantra -- you still have to learn the icons, just like you have to learn the keywords of a text-based UI, before you can use the software effectively.

Reminds me also of the infamous Mystery Meat navigation style of the 90's, where people would use images for navigation weblinks on their websites, so that you basically don't know where they're linking to until you click on them.

This is why I think GUIs and the whole "desktop metaphor" craze are heading the wrong direction, and why 95% of my computer usage is via a text terminal. There's a place for graphical interfaces, but it's gone too far these days.

But thanks to Unicode emoticons, we can now have icons on my text terminal too, isn't that just wonderful?! Esp. when a missing/incompatible font causes them to show up as literal blank boxes. The power of a standardized, universal character set, lemme tell ya!

T

-- 
Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen
Dec 02
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/2/2017 5:59 PM, H. S. Teoh wrote:
 [...]
Even worse, companies go and copyright their icons, guaranteeing they have to be substantially different for every company! If there ever was an Emperor's New Clothes, it's icons and emojis.
Dec 02
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/2/17 11:28 PM, Walter Bright wrote:
 On 12/2/2017 5:59 PM, H. S. Teoh wrote:
 [...]
Even worse, companies go and copyright their icons, guaranteeing they have to be substantially different for every company!
I like this site for icons. Only requires you to reference them in your about box: https://icons8.com/ -Steve
Dec 04
prev sibling parent Kagamin <spam here.lot> writes:
On Sunday, 3 December 2017 at 01:59:58 UTC, H. S. Teoh wrote:
 Still, it betrays the emperor's invisible clothes of the 
 "graphics == intuitive" mantra -- you still have to learn the 
 icons just like you have to learn the keywords of a text-based 
 UI, before you can use the software effectively.
What happened when you ran vi for the first time?
Dec 04
prev sibling next sibling parent reply codephantom <me noyb.com> writes:
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis 
wrote:
 The fact that they're then trying to put those pictures into 
 the Unicode standard just blatantly shows that the Unicode 
 folks have lost sight of what they're up to. It's like if they 
 started trying to add Unicode characters for words. It makes no 
 sense. But unfortunately, we just have to live with it... :(

 - Jonathan M Davis
The real problem is that sometimes people don't feel like a little cat with a smiling face. Sometimes people actually get pissed off at something, and would like to express it. Do the people on the Unicode consortium consider such communication to be invalid?

Where are the emojis for saying "I'm pissed off at this.. or that.."?

(unicode consortium == emoji censorship)

https://www.google.com.au/search?q=fuck+you+emoticon&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiWkMzMpOvXAhWIj5QKHVnGC5YQ_AUICigB&biw=1536&bih=736
Dec 02
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote:
 Do the people on the unicode consortium consider such 
 communication to be invalid?
https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130 On the other hand try to google "emoji sexual"…
Dec 02
parent reply codephantom <me noyb.com> writes:
On Saturday, 2 December 2017 at 16:44:56 UTC, Ola Fosheim Grøstad 
wrote:
 On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote:
 Do the people on the unicode consortium consider such 
 communication to be invalid?
https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130 On the other hand try to google "emoji sexual"…
No. Humans never express negative emotions, and also never communicate a desire to have sex. That explains a lot about the Unicode consortium. 's', 'e', 'x' is ok, just not together.

Q. What's the difference between a politician and an emoji?
A. Nothing. You cannot take either at face value.

..oophs. Politics again. I should know better. But my wider point is, unicode emojis are useless if they only contain those that 'some' consider to be politically correct, or socially acceptable.

The Unicode consortium is a bunch of ... (I don't have the unicode emoji representation yet to complete that sentence).
Dec 02
parent codephantom <me noyb.com> writes:
On Sunday, 3 December 2017 at 01:11:14 UTC, codephantom wrote:
 but my wider point is, unicode emojis are useless if they only 
 contain those that 'some' consider to be politically correct, 
 or socially acceptable.

 The Unicode consortium is a bunch of ...   (I don't have the 
 unicode emoji representation yet to complete that sentence).
btw. Good article here, further demonstrating my point.. "We're talking about engineers that are concerned about standards and internationalization issues who now have to do something more in line with Apple or Google's marketing teams,". https://www.buzzfeed.com/charliewarzel/thanks-to-apples-influence-youre-not-getting-a-rifle-emoji
Dec 02
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis 
wrote:
 code points. Emojis are specifically representable by a 
 sequence of existing characters (usually ASCII), because they 
 came from folks trying to represent pictures with text.
They are used as symbols culturally, which is how written languages happen, so I think the real question is whether they have just implemented the ones that have become widespread over a long period of time or whether they have deliberately created completely new ones... It makes sense for the most used ones. E.g. I don't want "8-(3+4)" to render as "😳3+4" ;-)

There is also a difference between Ø and ∅, because the meaning is different. Too bad the same does not apply to arrows (math vs non-math usage). So yeah, they could do better, but it's not too bad.

If something is widely used in a way that gives signs a different meaning, then it makes sense to introduce a new symbol for it, so that one can both render them slightly differently and so that programs can interpret them correctly.
Dec 02
prev sibling next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:
 On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 [...]
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8.
I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.
Nov 30
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 2:47 AM, Nicholas Wilson wrote:
 As far as I can tell, pretty much the only users of UTF16 are Windows 
 programs. Everyone else uses UTF8 or UCS32.
I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.
I meant UCS-4, which is identical to UTF-32. It's hard keeping all that stuff straight. Sigh. https://en.wikipedia.org/wiki/UTF-32
Nov 30
parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 11:41:09 UTC, Walter Bright 
wrote:
 On 11/30/2017 2:47 AM, Nicholas Wilson wrote:
 As far as I can tell, pretty much the only users of UTF16 are 
 Windows programs. Everyone else uses UTF8 or UCS32.
I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.
I meant UCS-4, which is identical to UTF-32. It's hard keeping all that stuff straight. Sigh. https://en.wikipedia.org/wiki/UTF-32
It's also worth mentioning that the more I think about it, the UTF8 vs. UTF16 thing was probably not worth mentioning with the rest of the things I listed out. It's pretty minor and more of a preference.
Nov 30
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 5:22 AM, A Guy With a Question wrote:
 It's also worth mentioning that the more I think about it, the UTF8 vs. UTF16 
 thing was probably not worth mentioning with the rest of the things I listed 
 out. It's pretty minor and more of a preference.
Both Windows and Java selected UTF16 before surrogates were added, so it was a reasonable decision made in good faith. But an awful lot of Windows/Java code has latent bugs in it because of not dealing with surrogates. D is designed from the ground up to work smoothly with UTF8/UTF16 multi-codeunit encodings. If you do decide to use UTF16, please take advantage of this and deal with surrogates correctly. When you do decide to give up on UTF16 (!) and go with UTF8, your code will be easy to convert to UTF8.
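D's handling here is visible in the smallest example: iterating a wstring with a dchar loop variable (or running any range algorithm over it) decodes surrogate pairs into single code points automatically. A minimal sketch:

```d
void main()
{
    wstring s = "💩"w;       // U+1F4A9 lies outside the BMP
    assert(s.length == 2);   // two UTF-16 code units: a surrogate pair

    // foreach over dchar decodes the pair into one code point
    size_t n;
    foreach (dchar c; s)
        n++;
    assert(n == 1);
}
```

The same decoding happens transparently if you later switch the declaration from wstring to string, which is what makes the UTF-16-to-UTF-8 conversion Walter mentions relatively painless.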
Nov 30
prev sibling parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:
 On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:
 
 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."
 
 Seems like the natural fit for me. Plus for the vast majority 
 of use cases I am pretty guaranteed a char = codepoint. Not 
 the biggest issue in the world and maybe I'm just being overly 
 critical here.
Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8.
As long as you understand its limitations, I think most bugs can be avoided. Where UTF16 breaks down is pretty well defined, and also super rare. I think UTF32 would be great too, but it seems like just a waste of space 99% of the time. UTF8 isn't horrible; I am not going to never use D because it uses UTF8 (that would be silly), especially when wstring also seems baked into the language. However, it can complicate code, because you pretty much always have to assume character != codepoint outside of ASCII. I can see a reasonable person arguing that forcing you to assume character != code point is actually a good thing. And that is a valid opinion.
Nov 30
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, November 30, 2017 13:18:37 A Guy With a Question via 
Digitalmars-d wrote:
 As long as you understand it's limitations I think most bugs can
 be avoided. Where UTF16 breaks down, is pretty well defined.
 Also, super rare. I think UTF32 would be great to, but it seems
 like just a waste of space 99% of the time. UTF8 isn't horrible,
 I am not going to never use D because it uses UTF8 (that would be
 silly). Especially when wstring also seems baked into the
 language. However, it can complicate code because you pretty much
 always have to assume character != codepoint outside of ASCII. I
 can see a reasonable person arguing that it forcing you assume
 character != code point is actually a good thing. And that is a
 valid opinion.
The reality of the matter is that if you want to write fully valid Unicode, then you have to understand the differences between code units, code points, and graphemes, and since it really doesn't make sense to operate at the grapheme level for everything (it would be terribly slow and is completely unnecessary for many algorithms), you pretty much have to come to accept that in the general case, you can't assume that something like a char represents an actual character, regardless of its encoding. UTF-8 vs UTF-16 doesn't change anything in that respect except for the fact that there are more characters which fit fully in a UTF-16 code unit than a UTF-8 code unit, so it's easier to think that you're correctly handling Unicode when you actually aren't. And if you're not dealing with Asian languages, UTF-16 uses up more space than UTF-8. But either way, they're both wrong if you're trying to treat a code unit as a code point, let alone a grapheme. It's just that we have a lot of programmers who only deal with English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both. - Jonathan M Davis
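The three levels Jonathan describes line up with three ranges in D's standard library. A minimal sketch, using 'e' plus a combining acute accent so that one visible character spans multiple code points and code units:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: renders as é
    string s = "e\u0301";

    assert(s.byCodeUnit.walkLength == 3);  // UTF-8 code units (1 + 2 bytes)
    assert(s.walkLength == 2);             // code points (auto-decoding)
    assert(s.byGrapheme.walkLength == 1);  // graphemes: what a reader sees
}
```

Any algorithm that picks the wrong level for its task is wrong at that level no matter which of UTF-8 or UTF-16 is underneath, which is exactly the point above.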
Nov 30
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis 
wrote:
 [...] And if you're not dealing with Asian languages, UTF-16 
 uses up more space than UTF-8.
Not even that in most cases. Only if you use unstructured text can it happen that UTF-16 needs less space than UTF-8. In most cases, the text is embedded in some sort of ML (html, odf, docx, tmx, xliff, akoma ntoso, etc...) which puts the balance again to the side of UTF-8.
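Patrick's observation is easy to check: the ASCII markup wrapped around the text doubles in size under UTF-16, which usually outweighs any savings on the non-Latin payload. An illustrative measurement (the Greek-in-HTML snippet is made up):

```d
import std.conv : to;

void main()
{
    // ASCII tags around a short run of Greek text
    string markup = "<p lang=\"el\">γειά σου</p>";

    size_t utf8Bytes  = markup.length;                         // bytes as UTF-8
    size_t utf16Bytes = markup.to!wstring.length * wchar.sizeof; // bytes as UTF-16

    // The Greek letters cost 2 bytes in either encoding, but every ASCII
    // tag character doubles under UTF-16, so UTF-8 wins overall.
    assert(utf8Bytes < utf16Bytes);
}
```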
Nov 30
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis 
wrote:
 English and thus don't as easily hit the cases where their code 
 is wrong. For better or worse, UTF-16 hides it better than 
 UTF-8, but the problem exists in both.
To give just an example of what can go wrong with UTF-16: reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32, block by block, and hitting an SMP codepoint exactly at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
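That boundary scenario can be sketched in a few lines of D: before handing a block of UTF-16 to a converter, hold back a trailing high surrogate so it can be joined with the low surrogate from the next read. The helper names below are illustrative, not a real API:

```d
// High surrogates occupy 0xD800 .. 0xDBFF
bool isHighSurrogate(wchar c) { return c >= 0xD800 && c <= 0xDBFF; }

// Number of code units safe to convert now; a trailing high surrogate
// must be carried over and prepended to the next buffer.
size_t decodableLength(const(wchar)[] buf)
{
    if (buf.length && isHighSurrogate(buf[$ - 1]))
        return buf.length - 1;
    return buf.length;
}

void main()
{
    wstring poo = "💩"w;            // surrogate pair: 0xD83D 0xDCA9
    auto firstBlock = poo[0 .. 1];  // block boundary falls mid-pair

    // Nothing is safely decodable yet: carry the high surrogate forward.
    assert(decodableLength(firstBlock) == 0);
    assert(decodableLength(poo) == 2);
}
```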
Nov 30
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/30/17 1:20 PM, Patrick Schluter wrote:
 On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote:
 English and thus don't as easily hit the cases where their code is 
 wrong. For better or worse, UTF-16 hides it better than UTF-8, but the 
 problem exists in both.
To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html -Steve
Nov 30
parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M 
 Davis wrote:
 English and thus don't as easily hit the cases where their 
 code is wrong. For better or worse, UTF-16 hides it better 
 than UTF-8, but the problem exists in both.
To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8, people who implement the low-level code generally think about multiple code units at the buffer boundary; with UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, which exist in UTF-8 as well but are less consciously acknowledged: overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more: endianness and size.
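Of the pitfalls listed, overlong encoding is the easiest to demonstrate: the byte pair 0xC0 0xAF "naively" decodes to '/' (U+002F), a classic path-traversal trick, but the spec forbids any encoding longer than necessary, and D's std.utf.validate rejects it:

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // Overlong two-byte encoding of '/' (U+002F); 0xC0 never
    // appears in well-formed UTF-8.
    char[] overlong = [cast(char) 0xC0, cast(char) 0xAF];

    assertThrown!UTFException(validate(overlong));

    // The canonical one-byte form is, of course, fine.
    validate("/");
}
```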
Nov 30
next sibling parent reply A Guy With a Question <aguywithanquestion gmail.com> writes:
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
wrote:
 On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
 Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M 
 Davis wrote:
 English and thus don't as easily hit the cases where their 
 code is wrong. For better or worse, UTF-16 hides it better 
 than UTF-8, but the problem exists in both.
To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size.
Most problems with UTF16 are applicable to UTF8. The only issue that isn't is that if you are just dealing with ASCII, it's a bit of a waste of space.
Dec 01
parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 December 2017 at 12:21:22 UTC, A Guy With a Question 
wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
 wrote:
 On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
 Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 [...]
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size.
Most problems with UTF16 are applicable to UTF8. The only issue that isn't is that if you are just dealing with ASCII, it's a bit of a waste of space.
That's what I said. UTF-16 and UTF-8 have the same issues, but UTF-16 has 2 more: endianness and bloat for ASCII. All 3 encodings have their pluses and minuses; that's why D supports all 3, but with a preference for UTF-8.
Dec 01
prev sibling next sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
wrote:
 On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
 Schveighoffer wrote:
 On 11/30/17 1:20 PM, Patrick Schluter wrote:
 [...]
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the
I meant isolated code-units, of course.
 same issues as UTF-8, plus some more, endianness and size.
Dec 01
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/1/17 7:26 AM, Patrick Schluter wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote:
  isolated codepoints. 
I meant isolated code-units, of course.
Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :) -Steve
Dec 01
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via Digitalmars-d 
wrote:
 On 12/1/17 7:26 AM, Patrick Schluter wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote:
  isolated codepoints.
I meant isolated code-units, of course.
Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :)
What, you mean that Unicode can be confusing? No way! ;) LOL. I have to be careful with that too. What bugs me even more though is that the Unicode spec talks about code points being characters, and then talks about combining characters for grapheme clusters - and this in spite of the fact that what most people would consider a character is a grapheme cluster and _not_ a code point. But they presumably had to come up with new terms for a lot of this nonsense, and that's not always easy. Regardless, what they came up with is complicated enough that it's arguably a miracle whenever a program actually handles Unicode text 100% correctly. :| - Jonathan M Davis
Dec 01
parent A Guy With a Question <aguywithanquestion gmail.com> writes:
On Friday, 1 December 2017 at 18:31:46 UTC, Jonathan M Davis 
wrote:
 On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via 
 Digitalmars-d wrote:
 On 12/1/17 7:26 AM, Patrick Schluter wrote:
 On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
 wrote:
  isolated codepoints.
I meant isolated code-units, of course.
Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :)
What, you mean that Unicode can be confusing? No way! ;) LOL. I have to be careful with that too. What bugs me even more though is that the Unicode spec talks about code points being characters, and then talks about combining characters for grapheme clusters - and this in spite of the fact that what most people would consider a character is a grapheme cluster and _not_ a code point. But they presumably had to come up with new terms for a lot of this nonsense, and that's not always easy. Regardless, what they came up with is complicated enough that it's arguably a miracle whenever a program actually handles Unicode text 100% correctly. :| - Jonathan M Davis
And dealing with that complexity can introduce bugs in its own right, because it's hard to get right. That's why it's sometimes easier just to simplify things and exclude certain ways of looking at the string.
Dec 01
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/30/2017 10:07 PM, Patrick Schluter wrote:
 endianness
Yeah, I forgot to mention that one. As if anyone remembers to put in the Byte Order Mark :-(
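When the producer did remember the BOM, sniffing the byte order is a two-byte check; when they didn't, all you can do is guess. A sketch of that logic (the enum and function names are illustrative, not a real API):

```d
enum Encoding { utf16le, utf16be, unknown }

// U+FEFF serialized in UTF-16LE is FF FE; in UTF-16BE it is FE FF.
Encoding sniffBom(const(ubyte)[] bytes)
{
    if (bytes.length >= 2)
    {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) return Encoding.utf16le;
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) return Encoding.utf16be;
    }
    return Encoding.unknown;  // no BOM: guess, or ask the producer
}

void main()
{
    ubyte[] withBom = [0xFF, 0xFE, 0x41, 0x00];  // BOM + 'A' in UTF-16LE
    ubyte[] without = [0x41, 0x00];              // same 'A', no BOM

    assert(sniffBom(withBom) == Encoding.utf16le);
    assert(sniffBom(without) == Encoding.unknown);
}
```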
Dec 02
prev sibling parent Kagamin <spam here.lot> writes:
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
 - Attributes. I had another post in the Learn forum about 
 attributes which was unfortunate. At first I was excited 
 because it seems like on the surface it would help me write 
 better code, but it gets a little tedious and tiresome to have 
 to remember to decorate code with them.
Then do it the C# way. There's choice.
 I think the better decision would be to not have the errors 
 occur.
Hehe, I'm not against living in an ideal world either.
 - Immutable. I'm not sure I fully understand it. On the surface 
 it seemed like const but transitive. I tried having a method 
 return an immutable value, but when I used it in my unit test I 
 got some weird errors about objects not being able to return 
 immutable (I forget the exact error...apologies).
That's the point of static type system: if you make a mistake, the code doesn't compile.
 +- Unicode support is good. Although I think D's string type 
 should have probably been utf16 by default. Especially 
 considering the utf module states:

 "UTF character support is restricted to '\u0000' <= character 
 <= '\U0010FFFF'."

 Seems like the natural fit for me.
UTF-16 is inadequate for the range '\u0000' <= character <= '\U0010FFFF', though. UCS2 was adequate (for '\u0000' <= character <= '\uFFFF'), but lost relevance. UTF-16 exists only as backward compatibility for early adopters of Unicode who built on UCS2.
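The inadequacy is concrete: any code point above U+FFFF needs two UTF-16 code units, formed by subtracting 0x10000 and splitting the remaining 20 bits across a surrogate pair. A minimal sketch for the very top of the range:

```d
void main()
{
    dchar c = 0x10FFFF;              // top of the Unicode range
    uint v = c - 0x10000;            // 20 bits to split across the pair

    wchar hi = cast(wchar)(0xD800 | (v >> 10));   // high surrogate
    wchar lo = cast(wchar)(0xDC00 | (v & 0x3FF)); // low surrogate

    assert(hi == 0xDBFF && lo == 0xDFFF);

    // Round-trip check against the compiler's own UTF-16 encoding
    assert([hi, lo] == "\U0010FFFF"w);
}
```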
 Plus for the vast majority of use cases I am pretty guaranteed 
 a char = codepoint.
That way, only end users will be able to catch bugs in the production system. It's not the best strategy, is it? Text is often persistent data; how do you plan to fix a text-handling bug when corruption has accumulated for years and spilled all over the place?
Nov 30