
digitalmars.D - Updating D beyond Unicode 2.0

reply Neia Neutuladh <neia ikeran.org> writes:
D's currently accepted identifier characters are based on Unicode 
2.0:

* ASCII range values are handled specially.
* Letters and combining marks from Unicode 2.0 are accepted.
* Numbers outside the ASCII range are accepted.
* Eight random punctuation marks are accepted.

This follows the C99 standard.
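To make that concrete, here's roughly what compiles and what doesn't 
under the current rules (an illustrative sketch, not verified against 
any particular compiler release):

    void main()
    {
        int é = 1;   // Latin letter, in Unicode 2.0: accepted
        int π = 2;   // Greek letter, in Unicode 2.0: accepted
        //int Ꭰ = 3; // Cherokee letter, added in Unicode 3.0: rejected
    }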


Most modern languages accept a wide range of Unicode identifier 
characters: Python, ECMAScript, just to name a few. A small number of 
languages reject non-ASCII characters: Dart, Perl. Some languages are 
weirdly generous: Swift and C11 allow everything outside the Basic 
Multilingual Plane.

I'd like to update that so that D accepts something as a valid 
identifier character if it's a letter or combining mark or 
modifier symbol that's present in Unicode 11, or a non-ASCII 
number. This allows the 146 most popular writing systems and a 
lot more characters from those writing systems. This *would* 
reject those eight random punctuation marks, so I'll keep them in 
as legacy characters.
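In rough terms, the proposed predicate would look something like the 
following minimal sketch. It leans on std.uni, whose tables track 
whatever Unicode version the installed Phobos ships with (not 
necessarily Unicode 11), and it elides the modifier-symbol and legacy 
punctuation tables:

    import std.uni : isAlpha, isMark, isNumber;
    import std.ascii : isAlphaNum;

    bool isIdentChar(dchar c)
    {
        if (c < 0x80)                     // ASCII handled specially
            return c == '_' || isAlphaNum(c);
        // Non-ASCII: letters, combining marks, and numbers.
        // Modifier symbols (category Sk) and the eight legacy
        // punctuation marks would need explicit table entries.
        return isAlpha(c) || isMark(c) || isNumber(c);
    }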

It would mean we don't have to reference the C99 standard when 
enumerating the allowed characters; we just have to refer to the 
Unicode standard, which we already need to talk about in the 
lexical part of the spec.

It might also make the lexer a tiny bit faster; it reduces the 
number of valid-ident-char segments to search from 245 to 134. On 
the other hand, it will change the ident char ranges from wchar 
to dchar, which means the table takes up marginally more memory.
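For reference, the table in question is just a sorted list of inclusive 
ranges that the lexer searches per character; something like this 
sketch (layout assumed for illustration, not dmd's actual data 
structure):

    struct CharRange { dchar lo, hi; }  // ranges were wchar-sized before

    bool inIdentTable(const CharRange[] table, dchar c)
    {
        // binary search over sorted, non-overlapping ranges
        size_t lo = 0, hi = table.length;
        while (lo < hi)
        {
            const mid = (lo + hi) / 2;
            if (c < table[mid].lo)      hi = mid;
            else if (c > table[mid].hi) lo = mid + 1;
            else return true;
        }
        return false;
    }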

And, of course, it lets you write programs entirely in Linear B, 
and that's a marketing ploy not to be missed.

I've got this coded up and can submit a PR, but I thought I'd get 
feedback here first.

Does anyone see any horrible potential problems here?

Or is there an interestingly better option?

Does this need a DIP?
Sep 21 2018
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
When I originally started with D, I thought non-ASCII identifiers with Unicode 
was a good idea. I've since slowly become less and less enthusiastic about it.

First off, D source text simply must (and does) fully support Unicode in 
comments, characters, and string literals. That's not an issue.

But identifiers? I've seen hardly any use of non-ASCII identifiers in C, 
C++, or D. In fact, I've seen zero use of it outside of test cases. I don't see 
much point in expanding the support of it. If people use such identifiers, the 
result would most likely be annoyance rather than illumination when people who 
don't know that language have to work on the code.

Extending it further will also cause problems for all the tools that work 
with D object code, like debuggers, disassemblers, linkers, filesystems, etc.

Absent a much more compelling rationale for it, I'd say no.
Sep 21 2018
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
 But identifiers? I've seen hardly any use of non-ASCII 
 identifiers in C, C++, or D. In fact, I've seen zero use of it 
 outside of test cases.
Do you look at Japanese D code much? Or Turkish? Or Chinese? I know 
there are decently sized D communities in those languages, and I am 
pretty sure I have seen identifiers in their languages before, but I 
can't find it right now.

The thing is, there's a pretty clear potential for observation bias 
here. Even our search engine queries are going to be biased toward 
English-language results, so there can be a whole D world kinda 
invisible to you and me.

We should reach out and get solid stats before making a final decision.
 most likely be annoyance rather than illumination when people 
 who don't know that language have to work on the code.
Well, for example, with a Chinese company, they may very well find forced English identifiers to be an annoyance.
Sep 21 2018
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 09/21/2018 04:18 PM, Adam D. Ruppe wrote:

 Well, for example, with a Chinese company, they may very well find
 forced English identifiers to be an annoyance.
Fully agreed, but as far as I know, Turkish companies use English in 
source code. The Turkish alphabet is Latin based, where dotted and 
undotted versions of Latin letters are distinct and produce different 
meanings. Quick examples:

sık: dense (n), squeeze (v), ...
sik: penis (n), f*ck (v) [1]
şık: one of multiple choices (1), swanky (2)
döndür: return
dondur: make frozen
sök: disassemble, dismantle, ...
sok: insert, install, ...
şok: shock

Hence, non-Unicode is unacceptable in Turkish code unless we reserve 
programming to English speakers only, which is unacceptable because it 
would be exclusionary and would produce English identifiers that are 
frequently amusing. I've seen the latter in code of English learners. :)

Ali

[1] 
https://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
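Incidentally, these letters sit in Latin Extended-A, which Unicode 2.0 
already covers, so they should already be legal in D identifiers 
today; a quick illustrative check (assumed behavior, not verified 
against a specific compiler):

    void main()
    {
        int sık;  // dotted and dotless i are distinct code points,
        int sik;  // so these are two different identifiers
        int şok;  // ş (U+015F) is a Unicode 2.0 Latin letter
    }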
Sep 23 2018
parent Kagamin <spam here.lot> writes:
On Sunday, 23 September 2018 at 11:18:42 UTC, Ali Çehreli wrote:
 Hence, non-Unicode is unacceptable in Turkish code
You even contributed to http://code.google.com/p/trileri/source/browse/trunk/tr/yazi.d
Sep 23 2018
prev sibling next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
 But identifiers? I've seen hardly any use of non-ASCII 
 identifiers in C, C++, or D. In fact, I've seen zero use of it 
 outside of test cases. I don't see much point in expanding the 
 support of it. If people use such identifiers, the result would 
 most likely be annoyance rather than illumination when people 
 who don't know that language have to work on the code.
...you *do* know that not every codebase has people working on it who 
only know English, right?

If I took a software development job in China, I'd need to learn 
Chinese. I'd expect the codebase to be in Chinese. Because a Chinese 
company generally operates in Chinese, and they're likely to have a 
lot of employees who only speak Chinese. And no, you can't just 
transcribe Chinese into ASCII. Same for Spanish, Norwegian, German, 
Polish, Russian -- heck, it's almost easier to list out the languages 
you *don't* need non-ASCII characters for.

Anyway, here's some more D code using non-ASCII identifiers, in case 
you need examples: https://git.ikeran.org/dhasenan/muzikilo
Sep 21 2018
next sibling parent Thomas Mader <thomas.mader gmail.com> writes:
On Saturday, 22 September 2018 at 01:08:26 UTC, Neia Neutuladh 
wrote:
 ...you *do* know that not every codebase has people working on 
 it who only know English, right?
This topic boils down to diversity vs. productivity, and supporting 
diversity in this case is questionable.

I work in a German speaking company and we have no developers who are 
not speaking German for now. In fact all are native speakers. Still we 
write our code, comments and commit messages in English. Even at 
university you learn that you should use English to code. The 
reasoning is simple: you never know who will work on your code in the 
future. If a company writes code in Chinese, they will have a hard 
time expanding the development of their codebase, even though Chinese 
is spoken by that many people. So even though you could use all sorts 
of characters, in a productive environment you had better choose not 
to do so. You might end up shooting yourself in the foot in the long 
run.

Diversity is important in other areas, but I don't see much advantage 
here, at least for now, because the spoken languages of today don't 
differ tremendously in what they are capable of expressing. This is 
also true for today's programming languages. Most of them are just 
different syntax for the very same ideas and concepts. That's not very 
helpful to bring people together and advance. My understanding is that 
even life, with its great diversity, has just one language (DNA) to 
define it.
Sep 22 2018
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/21/18 9:08 PM, Neia Neutuladh wrote:
 On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
 But identifiers? I've seen hardly any use of non-ASCII 
 identifiers in C, C++, or D. In fact, I've seen zero use of it outside 
 of test cases. I don't see much point in expanding the support of it. 
 If people use such identifiers, the result would most likely be 
 annoyance rather than illumination when people who don't know that 
 language have to work on the code.
 ...you *do* know that not every codebase has people working on it who 
 only know English, right?

 If I took a software development job in China, I'd need to learn 
 Chinese. I'd expect the codebase to be in Chinese. Because a Chinese 
 company generally operates in Chinese, and they're likely to have a 
 lot of employees who only speak Chinese. And no, you can't just 
 transcribe Chinese into ASCII. Same for Spanish, Norwegian, German, 
 Polish, Russian -- heck, it's almost easier to list out the languages 
 you *don't* need non-ASCII characters for.

 Anyway, here's some more D code using non-ASCII identifiers, in case 
 you need examples: https://git.ikeran.org/dhasenan/muzikilo
But aren't we arguing about the wrong thing here? D already accepts 
non-ASCII identifiers. What languages need an upgrade to unicode 
symbol names? In other words, what symbols aren't possible with the 
current support?

Or maybe I'm misunderstanding something.

-Steve
Sep 22 2018
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Saturday, 22 September 2018 at 12:35:27 UTC, Steven 
Schveighoffer wrote:
 But aren't we arguing about the wrong thing here? D already 
 accepts non-ASCII identifiers.
Walter was doing that thing that people in the US who only speak 
English tend to do: forgetting that other people speak other 
languages, and that people who speak English can learn other languages 
to work with people who don't speak English. He was saying it's 
inevitably a mistake to use non-ASCII characters in identifiers and 
that nobody does use them in practice.

Walter talking like that sounds like he'd like to remove support for 
non-ASCII identifiers from the language. I've gotten by without 
maintaining a set of personal patches on top of DMD so far, and I'd 
like it if I didn't have to start.
 What languages need an upgrade to unicode symbol names? In 
 other words, what symbols aren't possible with the current 
 support?
Chinese and Japanese have gained about eleven thousand symbols since 
Unicode 2. Unicode 2 covers 25 writing systems, while Unicode 11 
covers 146.

Just updating to Unicode 3 would give us Cherokee, Ge'ez (multiple 
languages), Khmer (Cambodian), Mongolian, Burmese, Sinhala (Sri 
Lanka), Thaana (Maldivian), Canadian aboriginal syllabics, and Yi 
(Nuosu).
Sep 22 2018
next sibling parent reply Erik van Velzen <erik evanv.nl> writes:
On Saturday, 22 September 2018 at 16:56:10 UTC, Neia Neutuladh 
wrote:
 Walter was doing that thing that people in the US who only 
 speak English tend to do: forgetting that other people speak 
 other languages, and that people who speak English can learn 
 other languages to work with people who don't speak English. He 
 was saying it's inevitably a mistake to use non-ASCII 
 characters in identifiers and that nobody does use them in 
 practice.
There's a more charitable view, and that's that even furriners usually 
use English identifiers.

Nobody in this thread so far has said they are programming in 
non-ASCII. If there was a contingent of Japanese or Chinese users 
doing that, then surely they would speak up here or in Bugzilla to 
advocate for this feature?
Sep 22 2018
next sibling parent Neia Neutuladh <neia ikeran.org> writes:
On Saturday, 22 September 2018 at 19:59:42 UTC, Erik van Velzen 
wrote:
 Nobody in this thread so far has said they are programming in 
 non-ASCII.
I did. https://git.ikeran.org/dhasenan/muzikilo
Sep 22 2018
prev sibling next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 22 September 2018 at 19:59:42 UTC, Erik van Velzen 
wrote:
 Nobody in this thread so far has said they are programming in 
 non-ASCII.
This is the obvious observation bias I alluded to before: of course 
people who don't read and write English aren't in this thread, since 
they cannot read or write the English used in this thread! Ditto for 
bugzilla.

Absence of evidence CAN be evidence of absence... but not when the 
absence is so easily explained by our shared bias.

Neia Neutuladh posted one link. I have seen Japanese D code before on 
twitter, but cannot find it now (surely because the search engines 
also share this bias). Perhaps those are the only two examples in 
existence, but I stand by my belief that we must reach out to these 
other communities somehow and do a proper, proactive study before 
dismissing the possibility.
Sep 22 2018
parent reply sarn <sarn theartofmachinery.com> writes:
On Sunday, 23 September 2018 at 00:18:06 UTC, Adam D. Ruppe wrote:
 I have seen Japanese D code before on twitter, but cannot find 
 it now (surely because the search engines also share this bias).
You can find a lot more Japanese D code on this blogging platform:
https://qiita.com/tags/dlang

Here's the most recent post to save you a click:
https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62
Sep 22 2018
parent reply Shachar Shemesh <shachar weka.io> writes:
On 23/09/18 04:29, sarn wrote:
 On Sunday, 23 September 2018 at 00:18:06 UTC, Adam D. Ruppe wrote:
 I have seen Japanese D code before on twitter, but cannot find it now 
 (surely because the search engines also share this bias).
 You can find a lot more Japanese D code on this blogging platform:
 https://qiita.com/tags/dlang

 Here's the most recent post to save you a click:
 https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62
Comments in Japanese. Identifiers in English.

Not advancing your point, I think.

Shachar
Sep 22 2018
parent reply sarn <sarn theartofmachinery.com> writes:
On Sunday, 23 September 2018 at 06:53:21 UTC, Shachar Shemesh 
wrote:
 On 23/09/18 04:29, sarn wrote:
 You can find a lot more Japanese D code on this blogging 
 platform:
 https://qiita.com/tags/dlang
 
 Here's the most recent post to save you a click:
 https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62
 Comments in Japanese. Identifiers in English.

 Not advancing your point, I think.

 Shachar
Well, I knew that when I posted, so I honestly have no idea what point you assumed I was making.
Sep 23 2018
parent Shachar Shemesh <shachar weka.io> writes:
On 23/09/18 15:38, sarn wrote:
 On Sunday, 23 September 2018 at 06:53:21 UTC, Shachar Shemesh wrote:
 On 23/09/18 04:29, sarn wrote:
 You can find a lot more Japanese D code on this blogging platform:
 https://qiita.com/tags/dlang

 Here's the most recent post to save you a click:
 https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62
 Comments in Japanese. Identifiers in English.

 Not advancing your point, I think.

 Shachar
Well, I knew that when I posted, so I honestly have no idea what point you assumed I was making.
I don't know what point you were trying to make. That's precisely why 
I posted.

I don't think D currently or ever enforces what type of (legal UTF-8) 
text you could use in comments or strings. This thread is about what's 
legal to use in identifiers. The example you brought does not use 
Unicode in identifiers, and is, therefore, irrelevant to the 
discussion we're having.

That was the point *I* was trying to make.

Shachar
Sep 23 2018
prev sibling parent aliak <something something.com> writes:
On Saturday, 22 September 2018 at 19:59:42 UTC, Erik van Velzen 
wrote:
 If there was a contingent of Japanese or Chinese users doing 
 that then surely they would speak up here or in Bugzilla to 
 advocate for this feature?
https://forum.dlang.org/post/piwvbtetcwyxlalocxkw forum.dlang.org
Sep 23 2018
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/22/18 12:56 PM, Neia Neutuladh wrote:
 On Saturday, 22 September 2018 at 12:35:27 UTC, Steven Schveighoffer wrote:
 But aren't we arguing about the wrong thing here? D already accepts 
 non-ASCII identifiers.
 Walter was doing that thing that people in the US who only speak 
 English tend to do: forgetting that other people speak other 
 languages, and that people who speak English can learn other 
 languages to work with people who don't speak English.
I don't think he was doing that. I think what he was saying was, D 
tried to accommodate users who don't normally speak English, and they 
still use English (for the most part) for coding.

I'm actually surprised there isn't much code out there that is written 
with other identifiers besides ASCII, given that C99 supported them. I 
assumed it was because they weren't supported. Now I learn that they 
are supported, yet almost all C code I've ever seen is written in 
English. Perhaps that's just because I don't frequent foreign language 
sites though :)

But many people here speak English as a second language, and vouch for 
their cultures still using English to write code.
 He was saying it's inevitably a mistake to use 
 non-ASCII characters in identifiers and that nobody does use them in 
 practice.
I would expect people probably do try to use them in practice, it's just that the problems they run into aren't worth the effort (tool/environment support). But I have no first or even second hand experience with this. It does seem like Walter has a lot of experience with it though.
 Walter talking like that sounds like he'd like to remove support for 
 non-ASCII identifiers from the language. I've gotten by without 
 maintaining a set of personal patches on top of DMD so far, and I'd like 
 it if I didn't have to start.
I don't think he was saying that. I think he was against expanding 
support for further Unicode identifiers because the first effort did 
not produce any measurable benefit.

I'd be shocked, given the recent positions of Walter and Andrei, if 
they decided to remove non-ASCII identifiers that are currently 
supported, thereby breaking any existing code.
 What languages need an upgrade to unicode symbol names? In other 
 words, what symbols aren't possible with the current support?
 Chinese and Japanese have gained about eleven thousand symbols since 
 Unicode 2. Unicode 2 covers 25 writing systems, while Unicode 11 
 covers 146.

 Just updating to Unicode 3 would give us Cherokee, Ge'ez (multiple 
 languages), Khmer (Cambodian), Mongolian, Burmese, Sinhala (Sri 
 Lanka), Thaana (Maldivian), Canadian aboriginal syllabics, and Yi 
 (Nuosu).
Very interesting! I would agree that we should at least add support 
for Unicode symbols that are used in spoken languages, especially 
since we already support symbols that aren't ASCII. I don't see the 
downside, especially if you can already use Unicode 2.0 symbols for 
identifiers (the ship has already sailed).

It could be a good incentive to get kids in countries where English 
isn't commonly spoken to try D out as a first programming language ;) 
Using your native language to show example code could be a huge 
benefit for teaching coding.

My recommendation is to put the PR up for review (that you said you 
had ready) and see what happens. Having an actual patch to talk about 
could change minds. At the very least, it's worth not wasting the 
effort you have already spent. Even if it does need a DIP, the PR can 
show that one less piece of effort is needed to get it implemented.

-Steve
Sep 24 2018
prev sibling next sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
 When I originally started with D, I thought non-ASCII 
 identifiers with Unicode was a good idea. I've since slowly 
 become less and less enthusiastic about it.

 First off, D source text simply must (and does) fully support 
 Unicode in comments, characters, and string literals. That's 
 not an issue.

 But identifiers? I've seen hardly any use of non-ASCII 
 identifiers in C, C++, or D. In fact, I've seen zero use of it 
 outside of test cases. I don't see much point in expanding the 
 support of it. If people use such identifiers, the result would 
 most likely be annoyance rather than illumination when people 
 who don't know that language have to work on the code.

 Extending it further will also cause problems for all the tools 
 that work with D object code, like debuggers, disassemblers, 
 linkers, filesystems, etc.
To wit, Windows linker error with Unicode symbol: https://github.com/ldc-developers/ldc/pull/2850#issuecomment-422968161
 Absent a much more compelling rationale for it, I'd say no.
I'm torn. I completely agree with Adam and others that people should 
be able to use any language they want. But the Unicode spec is such a 
tire fire that I'm leery of extending support for it.

Someone linked this Swift chapter on Unicode handling in an earlier 
forum thread; read the section on emoji in particular:

https://oleb.net/blog/2017/11/swift-4-strings/

I was laughing out loud when reading about composing "family" emojis 
with zero-width joiners. If you told me that was a tech parody, I'd 
have believed it.

I believe Swift just punts their Unicode support to ICU, like most any 
other project these days. That's a horrible sign, that you've created 
a spec so grotesquely complicated that most everybody relies on a 
single project to not have to deal with it.
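For the curious, the "family" composition glues individual person 
emoji together with zero-width joiners (U+200D); a minimal sketch in D 
(the string contents are the point here, nothing D-specific):

    void main()
    {
        // man + ZWJ + woman + ZWJ + girl: renderers that support it
        // draw these five code points as a single family glyph
        string family = "\U0001F468\u200D\U0001F469\u200D\U0001F467";
        assert(family.length == 18); // 18 UTF-8 code units, 5 code points
    }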
Sep 21 2018
next sibling parent Neia Neutuladh <neia ikeran.org> writes:
On Saturday, 22 September 2018 at 04:54:59 UTC, Joakim wrote:
 To wit, Windows linker error with Unicode symbol:

 https://github.com/ldc-developers/ldc/pull/2850#issuecomment-422968161
That's a good argument for sticking to ASCII for name mangling.
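You can inspect what the linker gets handed with .mangleof; a 
hypothetical probe (the exact mangled string depends on the module 
name):

    // π is already a legal D identifier (Greek, Unicode 2.0); its raw
    // UTF-8 bytes land in the mangled symbol, which is what trips up
    // some linkers and object-file tools
    void π() {}
    pragma(msg, π.mangleof);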
 I'm torn. I completely agree with Adam and others that people 
 should be able to use any language they want. But the Unicode 
 spec is such a tire fire that I'm leery of extending support 
 for it.
The compiler doesn't have to do much with Unicode processing, fortunately.
Sep 21 2018
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, September 21, 2018 10:54:59 PM MDT Joakim via Digitalmars-d 
wrote:
 I'm torn. I completely agree with Adam and others that people
 should be able to use any language they want. But the Unicode
 spec is such a tire fire that I'm leery of extending support for
 it.
Unicode identifiers may make sense in a code base that is going to be 
used solely by a group of developers who speak a particular language 
that uses a number of non-ASCII characters (especially languages like 
Chinese or Japanese), but it has no business in any code that's 
intended for international use. It just causes problems. At best, a 
particular, regional keyboard may be able to handle a particular 
symbol, but most other keyboards won't be able to. So, using that 
symbol causes problems for all of the developers from other parts of 
the world, even if those developers also have Unicode symbols in their 
native languages.
 Someone linked this Swift chapter on Unicode handling in an
 earlier forum thread, read the section on emoji in particular:

 https://oleb.net/blog/2017/11/swift-4-strings/

 I was laughing out loud when reading about composing "family"
 emojis with zero-width joiners. If you told me that was a tech
 parody, I'd have believed it.
Honestly, I was horrified to find out that emojis were even in 
Unicode. It makes no sense whatsoever. Emojis are supposed to be 
sequences of characters that can be interpreted as images. Treating 
them like Unicode symbols is like treating entire words like Unicode 
symbols. It's just plain stupid and a clear sign that Unicode has gone 
completely off the rails (if it was ever on them). Unfortunately, it's 
the best tool that we have for the job.

- Jonathan M Davis
Sep 22 2018
next sibling parent reply Shachar Shemesh <shachar weka.io> writes:
On 22/09/18 11:52, Jonathan M Davis wrote:
 
 Honestly, I was horrified to find out that emojis were even in Unicode. It
 makes no sense whatsover. Emojis are supposed to be sequences of characters
 that can be interepreted as images. Treating them like Unicode symbols is
 like treating entire words like Unicode symbols. It's just plain stupid and
 a clear sign that Unicode has gone completely off the rails (if it was ever
 on them). Unfortunately, it's the best tool that we have for the job.
 
 - Jonathan M Davis
Thank Allah that someone said it before I had to. I could not agree 
more. Encoding whole words as single Unicode code points makes no 
sense.

U+FDF2

Shachar
Sep 22 2018
parent reply Thomas Mader <thomas.mader gmail.com> writes:
On Saturday, 22 September 2018 at 10:24:48 UTC, Shachar Shemesh 
wrote:
 Thank Allah that someone said it before I had to. I could not 
 agree more. Encoding whole words as single Unicode code points 
 makes no sense.
The goal of Unicode is to support diversity; if you argue against 
that, you don't need Unicode at all. What you are saying is basically 
that you would remove Chinese too.

Emojis are not my world either, but they are an expression system / 
language.
Sep 22 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, September 22, 2018 4:51:47 AM MDT Thomas Mader via Digitalmars-
d wrote:
 On Saturday, 22 September 2018 at 10:24:48 UTC, Shachar Shemesh

 wrote:
 Thank Allah that someone said it before I had to. I could not
 agree more. Encoding whole words as single Unicode code points
 makes no sense.
The goal of Unicode is to support diversity, if you argue against that you don't need Unicode at all. What you are saying is basically that you would remove Chinese too. Emojis are not my world either but it is an expression system / language.
Unicode is supposed to be a universal way of representing every 
character in every language. Emojis are not characters. They are 
sequences of characters that people use to represent images. I do not 
understand how an argument can even be made that they belong in 
Unicode. As I said, it's exactly the same as arguing that words should 
be represented in Unicode. Unfortunately, however, at least some of 
them are in there. :|

- Jonathan M Davis
Sep 22 2018
next sibling parent Shachar Shemesh <shachar weka.io> writes:
On 22/09/18 14:28, Jonathan M Davis wrote:
 As I said, it's exactly the same
 as arguing that words should be represented in Unicode. Unfortunately,
 however, at least some of them are in there. :|
 
 - Jonathan M Davis
To be fair to them, that word is part of the "Arabic Presentation 
Forms" section. The "Presentation Forms" sections are meant as 
backwards compatibility toward code points that existed before, and 
are not meant to be generated by Unicode aware applications.

Shachar
Sep 22 2018
prev sibling parent reply Thomas Mader <thomas.mader gmail.com> writes:
On Saturday, 22 September 2018 at 11:28:48 UTC, Jonathan M Davis 
wrote:
 Unicode is supposed to be a universal way of representing every 
 character in every language. Emojis are not characters. They 
 are sequences of characters that people use to represent 
 images. I do not understand how an argument can even be made 
 that they belong in Unicode. As I said, it's exactly the same 
 as arguing that words should be represented in Unicode. 
 Unfortunately, however, at least some of them are in there. :|
At least since the incorporation of emojis, it's not supposed to be a 
universal way of representing characters anymore. :-) Maybe there was 
a time when that was true, I don't know, but I think they see Unicode 
as a way to express all language symbols. And emoji is nothing else 
than a language where each symbol stands for an emotion/word/sentence.

If Unicode only allows languages with characters which are used to 
form words, it's excluding languages which use other ways of 
expressing something. Would you suggest removing such writing systems 
from Unicode? What should a museum do which is in need of software to 
somehow manage Egyptian hieroglyphs?

Unicode was made to support all sorts of writing systems, and using 
multiple characters per word is just one way to form a writing system.
Sep 22 2018
parent reply Shachar Shemesh <shachar weka.io> writes:
On 22/09/18 15:13, Thomas Mader wrote:
 Would you suggest to remove such writing systems out of Unicode?
 What should a museum do which is in need of a software to somehow manage 
 Egyptian hieroglyphs?
If memory serves me right, hieroglyphs actually represent consonants 
(vowels are implicit), and as such, are most definitely "characters".

The only language I can think of, off the top of my head, where words 
have distinct signs is sign language. It is a good question whether 
Unicode should include such a language (difficulty of representing 
motion in a font aside).

Shachar
Sep 22 2018
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Saturday, 22 September 2018 at 12:24:49 UTC, Shachar Shemesh 
wrote:
 If memory serves me right, hieroglyphs actually represent 
 consonants (vowels are implicit), and as such, are most 
 definitely "characters".
Egyptian hieroglyphics uses logographs (symbols representing whole words, which might be multiple syllables), letters, and determinants (which don't represent any word but disambiguate the surrounding words). Looking things up serves me better than memory, usually.
 The only language I can think of, off the top of my head, where 
 words have distinct signs is sign language.
Logographic writing systems. There is one logographic writing system 
still in common use, and it's the standard writing system for Chinese 
and Japanese. That's about 1.4 billion people. It was used in Korea 
until hangul became popularized.

Unicode also aims to support writing systems that aren't used anymore. 
That means Mayan, cuneiform (several variants), Egyptian hieroglyphics 
and demotic script, several extinct variants on the Chinese writing 
system, and Luwian.

Sign languages generally don't have writing systems. They're also not 
generally related to any ambient spoken languages (for instance, 
American Sign Language is derived from French Sign Language), so if 
you speak sign language and can write, you're bilingual. Anyway, 
without writing systems, sign languages are irrelevant to Unicode.
Sep 22 2018
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 09/22/2018 09:27 AM, Neia Neutuladh wrote:

 Logographic writing systems. There is one logographic writing system
 still in common use, and it's the standard writing system for Chinese
 and Japanese.
I had the misconception of each Chinese character meaning a word until 
I read "The Chinese Language, Fact and Fantasy" by John DeFrancis. One 
thing I learned was that Chinese is not purely logographic.

Ali
Sep 23 2018
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/22/18 4:52 AM, Jonathan M Davis wrote:
 I was laughing out loud when reading about composing "family"
 emojis with zero-width joiners. If you told me that was a tech
 parody, I'd have believed it.
 Honestly, I was horrified to find out that emojis were even in 
 Unicode. It makes no sense whatsoever. Emojis are supposed to be 
 sequences of characters that can be interpreted as images. Treating 
 them like Unicode symbols is like treating entire words like Unicode 
 symbols. It's just plain stupid and a clear sign that Unicode has 
 gone completely off the rails (if it was ever on them). 
 Unfortunately, it's the best tool that we have for the job.
But aren't some (many?) Chinese/Japanese characters representing whole 
words?

-Steve
Sep 22 2018
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, September 22, 2018 6:37:09 AM MDT Steven Schveighoffer via 
Digitalmars-d wrote:
 On 9/22/18 4:52 AM, Jonathan M Davis wrote:
 I was laughing out loud when reading about composing "family"
 emojis with zero-width joiners. If you told me that was a tech
 parody, I'd have believed it.
 Honestly, I was horrified to find out that emojis were even in 
 Unicode. It makes no sense whatsoever. Emojis are supposed to be 
 sequences of characters that can be interpreted as images. Treating 
 them like Unicode symbols is like treating entire words like Unicode 
 symbols. It's just plain stupid and a clear sign that Unicode has 
 gone completely off the rails (if it was ever on them). 
 Unfortunately, it's the best tool that we have for the job.
But aren't some (many?) Chinese/Japanese characters representing whole words?
It's true that they're not characters in the sense that Roman 
characters are characters, but they're still part of the alphabets for 
those languages. Emojis are specifically formed from sequences of 
characters - e.g. :) is two characters which are already expressible 
on their own. They're meant to represent a smiley face, but it's a 
sequence of characters already. There's no need whatsoever to 
represent anything extra in Unicode.

It's already enough of a disaster that there are multiple ways to 
represent the same character in Unicode without nonsense like emojis. 
It's stuff like this that really makes me wish that we could come up 
with a new standard that would replace Unicode, but that's likely a 
pipe dream at this point.

- Jonathan M Davis
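For concreteness, the "multiple ways to represent the same character" 
problem is what Unicode normalization deals with; a minimal sketch 
using std.uni (assuming current Phobos):

    import std.uni : normalize, NFC;

    void main()
    {
        string precomposed = "\u00E9";  // é as a single code point
        string decomposed  = "e\u0301"; // e plus combining acute accent
        assert(precomposed != decomposed);                // bytes differ
        assert(normalize!NFC(decomposed) == precomposed); // same after NFC
    }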
Sep 22 2018
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/22/18 8:58 AM, Jonathan M Davis wrote:
 On Saturday, September 22, 2018 6:37:09 AM MDT Steven Schveighoffer via
 Digitalmars-d wrote:
 On 9/22/18 4:52 AM, Jonathan M Davis wrote:
 I was laughing out loud when reading about composing "family"
 emojis with zero-width joiners. If you told me that was a tech
 parody, I'd have believed it.
Honestly, I was horrified to find out that emojis were even in Unicode. It makes no sense whatsover. Emojis are supposed to be sequences of characters that can be interepreted as images. Treating them like Unicode symbols is like treating entire words like Unicode symbols. It's just plain stupid and a clear sign that Unicode has gone completely off the rails (if it was ever on them). Unfortunately, it's the best tool that we have for the job.
But aren't some (many?) Chinese/Japanese characters representing whole words?
 It's true that they're not characters in the sense that Roman 
 characters are characters, but they're still part of the alphabets 
 for those languages. Emojis are specifically formed from sequences of 
 characters - e.g. :) is two characters which are already expressible 
 on their own. They're meant to represent a smiley face, but it's a 
 sequence of characters already. There's no need whatsoever to 
 represent anything extra in Unicode.

 It's already enough of a disaster that there are multiple ways to 
 represent the same character in Unicode without nonsense like emojis. 
 It's stuff like this that really makes me wish that we could come up 
 with a new standard that would replace Unicode, but that's likely a 
 pipe dream at this point.
But there are tons of emojis that have nothing to do with sequences of 
characters. Like houses, or planes, or whatever. I don't even know 
what the sequences of characters are for them. I think it started out 
like that, but turned into something else.

Either way, I can't imagine any benefit from using emojis in symbol 
names.

-Steve
Sep 24 2018
prev sibling parent sarn <sarn theartofmachinery.com> writes:
On Saturday, 22 September 2018 at 12:37:09 UTC, Steven 
Schveighoffer wrote:
 But aren't some (many?) Chinese/Japanese characters 
 representing whole words?

 -Steve
Kind of hair-splitting, but it's more accurate to say that some Chinese/Japanese words can be written with one character. Like how English speakers wouldn't normally say that "A" and "I" are characters representing whole words.
Sep 22 2018
prev sibling next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis 
wrote:
 Unicode identifiers may make sense in a code base that is going 
 to be used solely by a group of developers who speak a 
 particular language that uses a number a of non-ASCII 
 characters (especially languages like Chinese or Japanese), but 
 it has no business in any code that's intended for 
 international use. It just causes problems.
You have a problem when you need to share a codebase between two 
organizations using different languages. "Just use ASCII" is not the 
solution. "Use a language that most developers in both organizations 
can use" is. That's *usually* going to be English, but not always. For 
instance, a Belorussian company doing outsourcing work for a Russian 
company might reasonably write code in Russian.

If you're writing for a global audience, as most open source code is, 
you're usually going to use the most widely spoken language.
Sep 22 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, September 22, 2018 10:07:38 AM MDT Neia Neutuladh via 
Digitalmars-d wrote:
 On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis

 wrote:
 Unicode identifiers may make sense in a code base that is going
 to be used solely by a group of developers who speak a
 particular language that uses a number a of non-ASCII
 characters (especially languages like Chinese or Japanese), but
 it has no business in any code that's intended for
 international use. It just causes problems.
You have a problem when you need to share a codebase between two organizations using different languages. "Just use ASCII" is not the solution. "Use a language that most developers in both organizations can use" is. That's *usually* going to be English, but not always. For instance, a Belorussian company doing outsourcing work for a Russian company might reasonably write code in Russian. If you're writing for a global audience, as most open source code is, you're usually going to use the most widely spoken language.
My point is that if your code base is definitely only going to be used 
within a group of people who are using a keyboard that supports a 
Unicode character that you want to use, then it's not necessarily a 
problem to use it. But if you're writing code that may be seen or used 
by a general audience (especially if it's going to be open source), 
then it needs to be in ASCII, or it's a serious problem. Even if it's 
a character like lambda that most everyone is going to understand, 
many, many programmers are not going to be able to type it on their 
keyboards, and that's going to cause nothing but problems.

For better or worse, English is the international language of science 
and engineering, and that includes programming. So, any programs that 
are intended to be seen and used by the world at large need to be in 
ASCII. And the biggest practical issue with that is whether a 
character is even on a typical keyboard. Using a Unicode character in 
a program makes it so that many programmers cannot type it. And even 
given the large breadth of Unicode characters, you could have a 
keyboard that supports a number of Unicode characters and still not 
have the Unicode character in question. So, open source programs need 
to be in ASCII.

Now, I don't know that it's a problem to support a wide range of 
Unicode characters in identifiers when you consider the issues of 
folks whose native language is not English (especially when it's a 
language like Chinese or Japanese), but open source programs should 
only be using ASCII identifiers. And unfortunately, sometimes, the 
fact that a language supports Unicode identifiers has led English 
speakers to do stupid things like use the lambda character in 
identifiers. So, I can understand Walter's reticence to go further 
with supporting Unicode identifiers, but on the other hand, when you 
consider how many people there are on the planet who use a language 
that doesn't even use the Latin alphabet, it's arguably a good idea to 
fully support Unicode identifiers.

- Jonathan M Davis
Sep 22 2018
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/22/2018 6:01 PM, Jonathan M Davis wrote:
 For better or worse, English is the international language of science and
 engineering, and that includes programming.
In the earlier days of D, I put on the web pages a google widget that 
would automatically translate the page into any language google 
supported. This was eventually removed (not by me) because nobody 
wanted it. Nobody (besides me) even noticed it was removed. And the D 
community is a very international one.

Supporting Unicode in identifiers gives users a false sense that it's 
a good idea to use them. Lots of programming tools don't work well 
with Unicode. Even Windows doesn't by default - you've got to run 
"chcp 65001" each time you open a console window. Filesystems don't 
work reliably with Unicode. Heck, the reason module names should be 
lower case in D is because mixed case doesn't work reliably across 
filesystems.

D supports Unicode in identifiers because C and C++ do, and we want to 
be able to interoperate with them. Extending Unicode identifier 
support off into other directions, especially ones that break such 
interoperability, is just doing a disservice to users.
Sep 23 2018
next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Sunday, 23 September 2018 at 21:12:13 UTC, Walter Bright wrote:
 D supports Unicode in identifiers because C and C++ do, and we 
 want to be able to interoperate with them. Extending Unicode 
 identifier support off into other directions, especially ones 
 that break such interoperability, is just doing a disservice to 
 users.
Okay, that's why you previously selected C99 as the standard for what characters to allow. Do you want to update to match C11? It's been out for the better part of a decade, after all.
Sep 23 2018
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
 Okay, that's why you previously selected C99 as the standard for what 
 characters to allow. Do you want to update to match C11? It's been out 
 for the better part of a decade, after all.
I wasn't aware it changed in C11.
Sep 23 2018
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:
 On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
 Okay, that's why you previously selected C99 as the standard 
 for what characters to allow. Do you want to update to match 
 C11? It's been out for the better part of a decade, after all.
I wasn't aware it changed in C11.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 522 
(PDF numbering) or 504 (internal numbering).

Outside the BMP, almost everything is allowed, including many things 
that are not currently mapped to any Unicode value. Within the BMP, a 
heck of a lot of stuff is allowed, including a lot that D doesn't 
currently allow.

GCC hasn't even updated to the C99 standard here, as far as I can 
tell, but clang-5.0 is up to date.
Sep 23 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/24/18 12:23 AM, Neia Neutuladh wrote:
 On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:
 On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
 Okay, that's why you previously selected C99 as the standard for what 
 characters to allow. Do you want to update to match C11? It's been 
 out for the better part of a decade, after all.
I wasn't aware it changed in C11.
 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 522 
 (PDF numbering) or 504 (internal numbering).

 Outside the BMP, almost everything is allowed, including many things 
 that are not currently mapped to any Unicode value. Within the BMP, a 
 heck of a lot of stuff is allowed, including a lot that D doesn't 
 currently allow.

 GCC hasn't even updated to the C99 standard here, as far as I can 
 tell, but clang-5.0 is up to date.
I searched around for the current state of symbol names in C, and 
found some really crappy rules, though maybe this site isn't up to 
date?: https://en.cppreference.com/w/c/language/identifier

What I understand from that is:

1. Yes, you can use any Unicode character you want in C/C++ (seemingly 
since C99).

2. There are no rules about what *encoding* is acceptable; it's 
implementation defined. So various compilers have different rules as 
to what will be accepted in the actual source code. In fact, I read 
somewhere that not even ASCII is guaranteed to be supported.

The result being, that you have to write the identifiers with an ASCII 
escape sequence in order for it to be actually portable. Which to me, 
completely defeats the purpose of using such identifiers in the first 
place. For example, on that page, they have a line that works in 
clang, not in GCC (tagged as implementation defined):

char *🐱 = "cat";

The portable version looks like this:

char *\U0001f431 = "cat";

Seriously, who wants to use that?

Now, D can potentially do better (especially when all front-ends are 
the same) and support such things in the spec, but I think the 
argument "because C supports it" is kind of bunk. Or am I reading it 
wrong?

In any case, I would expect that symbol name support should be focused 
only on languages which people use, not emojis. If there are words in 
Chinese or Japanese that can't be expressed using D, while other words 
can, it would seem inconsistent to a Chinese or Japanese speaking 
user, and I think we should work to fix that. I just have no idea what 
the state of that is.

I also tend to agree that most code is going to be written in English, 
even when the primary language of the user is not. Part of the reason, 
which I haven't read here yet, is that all the keywords are in 
English. Someone has to kind of understand those to get the meaning of 
some constructs, and it's going to read strangely with the non-English 
words.

One group which I believe hasn't spoken up yet is the group making the 
hunt framework, whom I believe are all Chinese? At least their web 
site is. It would be good to hear from a group like that, which has 
large experience writing mature D code (it appears all to be in 
English), and how they feel about the support.

-Steve
Sep 24 2018
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 24 September 2018 at 13:26:14 UTC, Steven 
Schveighoffer wrote:
 Part of the reason, which I haven't read here yet, is that all 
 the keywords are in English.
Eh, those are kinda opaque sequences anyway, since the meanings aren't quite what the normal dictionary definition is anyway. Look up "int" in the dictionary... or "void", or even "string". They are just a handful of magic sequences we learn with the programming language. (And in languages like Rust, "fn", lol.)
 One group which I believe hasn't spoken up yet is the group 
 making the hunt framework, whom I believe are all Chinese? At 
 least their web site is.
I know they used a lot of my code as a starting point, and I, of 
course, wrote it in English, so that could have biased it a bit too. 
Though that might be a general point: if you want to use these 
libraries, they already come in some language.

Just even so, I still find it kinda hard to believe that everybody 
everywhere uses only English in all their code. Maybe our efforts 
should be going toward the Chinese market via natural language support 
instead of competing with Rust on computer language features :P
 It would be good to hear from a group like that which has large 
 experience writing mature D code (it appears all to be in 
 English) and how they feel about the support.
definitely.
Sep 24 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/24/18 10:14 AM, Adam D. Ruppe wrote:
 On Monday, 24 September 2018 at 13:26:14 UTC, Steven Schveighoffer wrote:
 Part of the reason, which I haven't read here yet, is that all the 
 keywords are in English.
Eh, those are kinda opaque sequences anyway, since the meanings aren't quite what the normal dictionary definition is anyway. Look up "int" in the dictionary... or "void", or even "string". They are just a handful of magic sequences we learn with the programming language. (And in languages like Rust, "fn", lol.)
Well, even on top of that, the standard library is full of English 
words that read very coherently when used together (if you understand 
English). I can't imagine a long chain of English algorithms with some 
Chinese one pasted in the middle looks very good :)

I suppose you could alias them all...

-Steve
Sep 24 2018
parent reply Martin Tschierschke <mt smartdolphin.de> writes:
On Monday, 24 September 2018 at 14:34:21 UTC, Steven 
Schveighoffer wrote:
 On 9/24/18 10:14 AM, Adam D. Ruppe wrote:
 On Monday, 24 September 2018 at 13:26:14 UTC, Steven 
 Schveighoffer wrote:
 Part of the reason, which I haven't read here yet, is that 
 all the keywords are in English.
Eh, those are kinda opaque sequences anyway, since the meanings aren't quite what the normal dictionary definition is anyway. Look up "int" in the dictionary... or "void", or even "string". They are just a handful of magic sequences we learn with the programming language. (And in languages like Rust, "fn", lol.)
Well, even on top of that, the standard library is full of English words that read very coherently when used together (if you understand English). I can't imagine a long chain of English algorithms with some Chinese one pasted in the middle looks very good :) I suppose you could alias them all... -Steve
You might get really funny error messages:

    🙂 can't be casted to int.

:-)

And if you have to increment the number of cars you can write:

    🚗++;

This might give really funny looking programs!
Sep 24 2018
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/24/18 2:20 PM, Martin Tschierschke wrote:
 On Monday, 24 September 2018 at 14:34:21 UTC, Steven Schveighoffer wrote:
 On 9/24/18 10:14 AM, Adam D. Ruppe wrote:
 On Monday, 24 September 2018 at 13:26:14 UTC, Steven Schveighoffer 
 wrote:
 Part of the reason, which I haven't read here yet, is that all the 
 keywords are in English.
Eh, those are kinda opaque sequences anyway, since the meanings aren't quite what the normal dictionary definition is anyway. Look up "int" in the dictionary... or "void", or even "string". They are just a handful of magic sequences we learn with the programming language. (And in languages like Rust, "fn", lol.)
 Well, even on top of that, the standard library is full of English 
 words that read very coherently when used together (if you understand 
 English). I can't imagine a long chain of English algorithms with 
 some Chinese one pasted in the middle looks very good :)

 I suppose you could alias them all...
 You might get really funny error messages:

     🙂 can't be casted to int.
Haha, it could be cynical as well:

    int can’t be casted to int🤔

Oh, the games we could play.

-Steve
Sep 24 2018
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Monday, 24 September 2018 at 13:26:14 UTC, Steven 
Schveighoffer wrote:
 2. There are no rules about what *encoding* is acceptable, it's 
 implementation defined. So various compilers have different 
 rules as to what will be accepted in the actual source code. In 
 fact, I read somewhere that not even ASCII is guaranteed to be 
 supported.
Indeed. IBM mainframes have C compilers too, but not ASCII. They code 
in EBCDIC. That's why, for instance, it's not portable to do things 
like

     if(c >= 'A' && c <= 'Z') printf("CAPITAL LETTER\n");

which does not work in EBCDIC: the uppercase letters there are not 
contiguous, so the range also matches non-letter code points.
Sep 24 2018
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/24/18 3:18 PM, Patrick Schluter wrote:
 On Monday, 24 September 2018 at 13:26:14 UTC, Steven Schveighoffer wrote:
 2. There are no rules about what *encoding* is acceptable, it's 
 implementation defined. So various compilers have different rules as 
 to what will be accepted in the actual source code. In fact, I read 
 somewhere that not even ASCII is guaranteed to be supported.
 Indeed. IBM mainframes have C compilers too, but not ASCII. They code 
 in EBCDIC. That's why, for instance, it's not portable to do things 
 like

      if(c >= 'A' && c <= 'Z') printf("CAPITAL LETTER\n");

 which does not work in EBCDIC.
Right. But it's just a side-note -- I'd guess all modern compilers 
support ASCII, and definitely ones that we would want to interoperate 
with.

Besides, that example is more concerned about *input data* encoding, 
not *source code* encoding. If the above is written in ASCII, then I 
would assume that the bytes in the source file are the ASCII bytes, 
and probably the IBM compilers would not know what to do with such 
files (it would all be gibberish if you opened it in an EBCDIC 
editor). You'd first have to translate it to EBCDIC, which is a red 
flag that likely this isn't going to work :)

-Steve
Sep 24 2018
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 23 September 2018 at 21:12:13 UTC, Walter Bright wrote:
 D supports Unicode in identifiers because C and C++ do, and we 
 want to be able to interoperate with them. Extending Unicode 
 identifier support off into other directions, especially ones 
 that break such interoperability, is just doing a disservice to 
 users.
I always thought D supported Unicode with the goal of going forward 
with it while C was stuck with ASCII: 
http://www.drdobbs.com/cpp/time-for-unicode/228700405

"The D programming language has already driven stakes in the ground, 
saying it will not support 16 bit processors, processors that don't 
have 8 bit bytes, and processors with crippled, non-IEEE floating 
point. Is it time to drive another stake in and say the time for 
Unicode has come?"

Have you changed your mind since?
Sep 23 2018
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/23/2018 6:06 PM, Dennis wrote:
 Have you changed your mind since?
D the language is well suited to the development of Unicode apps. D source code is another matter.
Sep 23 2018
parent reply Dennis <dkorpel gmail.com> writes:
On Monday, 24 September 2018 at 01:32:38 UTC, Walter Bright wrote:
 D the language is well suited to the development of Unicode 
 apps. D source code is another matter.
But in the article you specifically talk about the use of Unicode in 
the context of source code instead of apps:

"With the D programming language, we continuously run up against the 
problem that ASCII has reached its expressivity limits."

"There are the chevrons « and » which serve as another set of brackets 
to lighten the overburdened ambiguities of ( ). There are the 
dot-product and cross-product characters · and × which would make 
lovely infix operator tokens for math libraries. The greek letters 
would be great for math variable names."
Sep 24 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, September 24, 2018 4:19:31 AM MDT Dennis via Digitalmars-d wrote:
 On Monday, 24 September 2018 at 01:32:38 UTC, Walter Bright wrote:
 D the language is well suited to the development of Unicode
 apps. D source code is another matter.
 But in the article you specifically talk about the use of Unicode in 
 the context of source code instead of apps:

 "With the D programming language, we continuously run up against the 
 problem that ASCII has reached its expressivity limits."

 "There are the chevrons « and » which serve as another set of 
 brackets to lighten the overburdened ambiguities of ( ). There are 
 the dot-product and cross-product characters · and × which would make 
 lovely infix operator tokens for math libraries. The greek letters 
 would be great for math variable names."
Given that the typical keyboard has none of those characters, 
maintaining code that used any of them would be a royal pain. It's one 
thing if they're used in the occasional string as data, but it's quite 
another if they're used as identifiers or operators. I don't see how 
that would be at all maintainable. You'd be forced to constantly copy 
and paste rather than type.

- Jonathan M Davis
Sep 24 2018
next sibling parent Dennis <dkorpel gmail.com> writes:
On Monday, 24 September 2018 at 10:36:50 UTC, Jonathan M Davis 
wrote:
 Given that the typical keyboard has none of those characters, 
 maintaining code that used any of them would be a royal pain.
Note that I'm not trying to argue either way, it's just that I used to think of Walter's stance on D and Unicode as: "D would fully embrace Unicode if only editors/debuggers etc. would embrace it too" But now I read:
 D supports Unicode in identifiers because C and C++ do, and we 
 want to be able to interoperate with them."
So I wonder what changed. I guess it's mostly answered in the first reply:
 When I originally started with D, I thought non-ASCII 
 identifiers with Unicode was a good idea. I've since slowly 
 become less and less enthusiastic about it.
Sep 24 2018
prev sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 24 September 2018 at 10:36:50 UTC, Jonathan M Davis 
wrote:
 Given that the typical keyboard has none of those characters, 
 maintaining code that used any of them would be a royal pain.
It is pretty easy to type them with a little keyboard config change, and editors like vim can even pick those up from comments in the file, though you have to train your fingers to use them effectively too... but if you were maintaining something long term, you'd just do that.
Sep 24 2018
prev sibling parent reply Abdulhaq <alynch4047 gmail.com> writes:
On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis 
wrote:

 Honestly, I was horrified to find out that emojis were even in 
 Unicode. It makes no sense whatsoever. Emojis are supposed to be 
 sequences of characters that can be interpreted as images. 
 Treating them like Unicode symbols is like treating entire 
 words like Unicode symbols. It's just plain stupid and a clear 
 sign that Unicode has gone completely off the rails (if it was 
 ever on them). Unfortunately, it's the best tool that we have 
 for the job.
According to the Unicode website, http://unicode.org/standard/WhatIsUnicode.html, """ Support of Unicode forms the foundation for the representation of languages and symbols in all major operating systems, search engines, browsers, laptops, and smart phones—plus the Internet and World Wide Web (URLs, HTML, XML, CSS, JSON, etc.)""" Note, Unicode supports symbols, not just characters. The smiley face symbol predates its ':-)' usage in ASCII text, https://www.smithsonianmag.com/arts-culture/who-really-invented-the-smiley-face-2058483/. It's fundamentally a symbol, not a sequence of characters. Therefore it is not unreasonable for it to be encoded with a Unicode number. I do agree though, of course, that it would seem bizarre to use an emoji as a D identifier. The early history of computer science is completely dominated by cultures who use Latin-script-based characters, and hence, quite reasonably, text encoding and its automated visual representation by computer-based devices is dominated by the requirements of Latin-script languages. However, the world keeps turning and, despite DT's best efforts, China et al. look set to become dominant. Even if not China, the chances are that eventually a non-Latin-script-based language will become very important. Parochial views like "all open source code should be in ASCII" will look silly. However, until that time D developers have to spend their time where it can be most useful. Hence the decision of whether to apply Neia's patch / ideas or not mainly depends on how much downstream effort would be required (debuggers etc., as Walter pointed out), and how much the gain is. As Unicode 2.0 is already supported, I would guess that the vast majority of people with access to a computer can already enter identifiers in D that are rich enough for them. As Adam said though, it would be a good idea to at least ask!
Sep 23 2018
parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/23/2018 12:06 PM, Abdulhaq wrote:
 The early history of computer science is completely dominated by cultures who 
 use Latin-script-based characters,
Small character sets are much more implementable on primitive systems like telegraphs and electro-mechanical ttys. It wasn't even practical to display a rich character set until the early 1980's or so. There wasn't enough memory. Glass ttys at the time could barely, and I mean barely, display ASCII. I know because I designed and built one.
Sep 25 2018
prev sibling parent reply aliak <something something.com> writes:
On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
 When I originally started with D, I thought non-ASCII 
 identifiers with Unicode was a good idea. I've since slowly 
 become less and less enthusiastic about it.

 First off, D source text simply must (and does) fully support 
 Unicode in comments, characters, and string literals. That's 
 not an issue.

 But identifiers? I haven't seen hardly any use of non-ascii 
 identifiers in C, C++, or D. In fact, I've seen zero use of it 
 outside of test cases. I don't see much point in expanding the 
 support of it. If people use such identifiers, the result would 
 most likely be annoyance rather than illumination when people 
 who don't know that language have to work on the code.
Not seeing identifiers in languages you don't program in or can't read is expected. If it's supported, it will be used. Japanese Swift: https://speakerdeck.com/codelynx/programming-swift-in-japanese
 Extending it further will also cause problems for all the tools 
 that work with D object code, like debuggers, disassemblers, 
 linkers, filesystems, etc.

 Absent a much more compelling rationale for it, I'd say no.
More compelling than: "there're 6 billion people in this world who don't speak english?" Allowing people to program in their own language while reducing the cognitive friction for people who want to learn programming in the majority of the world seems like a no-brainer thing to do.
Sep 23 2018
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/23/2018 9:52 AM, aliak wrote:
 Not seeing identifiers in languages you don't program in or can't read is 
 expected.
On the other hand, I've been programming for 40 years. I've customized my C++ compiler to emit error messages in various languages: https://github.com/DigitalMars/Compiler/blob/master/dm/src/dmc/msgsx.c I've implemented SHIFT-JIS encodings, along with .950 (Chinese) and .949 (Korean) code pages in the C++ compiler. I've worked in Japan writing software for Japanese companies. I've sold compilers internationally for 30 years (mostly to Germany and Japan). I did the tech support, meaning I'd see their code. --- There's a reason why dmd doesn't have international error messages. My experience with it is that international users don't want it. They prefer the English messages. I'm sure if you look hard enough you'll find someone using non-ASCII characters in identifiers. --- When I visited Remedy Games in Finland a few years back, I was surprised that everyone in the company was talking in English. I asked if they were doing that out of courtesy to me. They laughed, and said no, they talked in English because they came from all over the world, and English was the only language they had in common.
Sep 23 2018
next sibling parent reply 0xEAB <desisma heidel.beer> writes:
On Sunday, 23 September 2018 at 20:49:39 UTC, Walter Bright wrote:
 There's a reason why dmd doesn't have international error 
 messages. My experience with it is that international users 
 don't want it. They prefer the english messages.
I'm a native German speaker. As for my part, I agree on this, indeed. There are several reasons for this:

- Usually such translations are terrible, simply put.
- Disjointed translations [0]
- Non-idiomatic sentences that still sound like English somehow.
- Translations of tech terms [1]
- Non-idiomatic translations of tech terms [2]

However, well-done translations might be quite nice: in VS 2010 I was happy with the German error messages. I'm not sure whether it was just delusion, but I think it got worse with some later version, though.

[0] There's nothing worse than every single sentence being treated on its own during the translation process. At least that's what you'd often think when you face a longer error message. Usually you're confronted with non-linked and kindergarten-like sentences that don't seem to be meant to be put together. Often you'd think there were several translators. Favorite problem with this: 2 different terms for the same thing in two sentences.

[1] e.g. "integer type" -> "ganzzahliger Datentyp". This just sounds weird. Anyone using "int" in their code knows what it means anyway... Nevertheless, there are some common translations that are fine (primarily because they're common), e.g. "error" -> "Fehler".

[2] e.g. "assertion" -> "Assertionsfehler". This particular one can be found in Windows 10 and is not even proper German.
Sep 24 2018
next sibling parent 0xEAB <desisma heidel.beer> writes:
On Monday, 24 September 2018 at 15:17:14 UTC, 0xEAB wrote:

 German error messages.
addendum: I've been using the English version since VS2017
Sep 24 2018
prev sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 09/24/2018 08:17 AM, 0xEAB wrote:

 - Non-idiomatic translations of tech terms [2]
This is something I had heard from a Digital Research programmer in early 90s: English message was something like "No memory left" and the German translation was "No memory on the left hand side" :) Ali
Sep 25 2018
next sibling parent Simen =?UTF-8?B?S2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:
On Wednesday, 26 September 2018 at 02:12:07 UTC, Ali Çehreli 
wrote:
 On 09/24/2018 08:17 AM, 0xEAB wrote:

 - Non-idiomatic translations of tech terms [2]
This is something I had heard from a Digital Research programmer in early 90s: English message was something like "No memory left" and the German translation was "No memory on the left hand side" :)
My ex-girlfriend tried to learn SQL from a book that had gotten a prize for its use of Norwegian. As a result, every single concept used a different name from what everybody else uses, and while it may be possible to learn some SQL from this, it made googling an absolute nightmare. Just imagine a whole book saying CHOOSE for SELECT, IF for WHERE, and USING instead of FROM - only worse, since it's a different language. It even used SQL pseudo-code with these made-up names, and showed how to translate it to proper SQL as more of an afterthought. -- Simen
Sep 25 2018
prev sibling next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Wednesday, 26 September 2018 at 02:12:07 UTC, Ali Çehreli 
wrote:
 On 09/24/2018 08:17 AM, 0xEAB wrote:

 - Non-idiomatic translations of tech terms [2]
This is something I had heard from a Digital Research programmer in early 90s: English message was something like "No memory left" and the German translation was "No memory on the left hand side" :)
The K&R in German was of the same "quality". That happens when the translator is not an IT person himself.
Sep 25 2018
prev sibling next sibling parent reply ShadoLight <ettienne.gilbert gmail.com> writes:
On Wednesday, 26 September 2018 at 02:12:07 UTC, Ali Çehreli 
wrote:
 On 09/24/2018 08:17 AM, 0xEAB wrote:

 - Non-idiomatic translations of tech terms [2]
[snip]
 English message was something like "No memory left" and the 
 German translation was "No memory on the left hand side" :)

 Ali
Not sure if this was not just some urban legend, but there was a delightful story back in the late 80s/early 90s about the early translation programs. They were in particular not very good at idiomatic translations, so people would play with idiomatic expressions from language X (say english) to language Y, and then back from Y to X - and then see what was returned. Apparently the expression "the spirit is willing but the flesh is weak" translated to Russian and back was returned by one such program as: "The vodka is good but the meat is rotten!"
Sep 26 2018
parent abcde1234 <abcde1234 ge.sd> writes:
On Wednesday, 26 September 2018 at 12:57:21 UTC, ShadoLight wrote:
 On Wednesday, 26 September 2018 at 02:12:07 UTC, Ali Çehreli 
 wrote:
 On 09/24/2018 08:17 AM, 0xEAB wrote:

 - Non-idiomatic translations of tech terms [2]
[snip]
 English message was something like "No memory left" and the 
 German translation was "No memory on the left hand side" :)

 Ali
Not sure if this was not just some urban legend, but there was a delightful story back in the late 80s/early 90s about the early translation programs. They were in particular not very good at idiomatic translations, so people would play with idiomatic expressions from language X (say english) to language Y, and then back from Y to X - and then see what was returned. Apparently the expression "the spirit is willing but the flesh is weak" translated to Russian and back was returned by one such program as: "The vodka is good but the meat is rotten!"
In case you missed it, this was spread widely in the tech news a month or so ago: https://translate.google.fr/?hl=fr#so/en/ngoo%20m%20goon%20goob%20goo%20goo%20goo%20mgoo%20goo%20goo%20goo%20goo%20goo%20m%20goo Still some progress to be made.
Sep 26 2018
prev sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
A delicious Turkish dessert is "kabak tatlısı", made of squash. Now, it 
so happens that "kabak" also means "zucchini" in Turkish. Imagine my 
shock when I came across that dessert recipe in English that used 
zucchini as the ingredient! :)

Ali
Sep 26 2018
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, September 26, 2018 11:15:01 PM MDT Ali Çehreli via 
Digitalmars-d wrote:
 A delicious Turkish dessert is "kabak tatlısı", made of squash. Now, it
 so happens that "kabak" also means "zucchini" in Turkish. Imagine my
 shock when I came across that dessert recipe in English that used
 zucchini as the ingredient! :)
Was it any good? ;) - Jonathan M Davis
Sep 26 2018
prev sibling parent reply Andrea Fontana <nospam example.com> writes:
On Thursday, 27 September 2018 at 05:15:01 UTC, Ali Çehreli wrote:
 A delicious Turkish dessert is "kabak tatlısı", made of squash. 
 Now, it so happens that "kabak" also means "zucchini" in 
 Turkish. Imagine my shock when I came across that dessert recipe 
 in English that used zucchini as the ingredient! :)

 Ali
You can't even imagine how many Italian words and recipes are distorted... Andrea
Sep 27 2018
parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 27 September 2018 at 07:03:51 UTC, Andrea Fontana 
wrote:
 On Thursday, 27 September 2018 at 05:15:01 UTC, Ali Çehreli 
 wrote:
 A delicious Turkish dessert is "kabak tatlısı", made of squash. 
 Now, it so happens that "kabak" also means "zucchini" in 
 Turkish. Imagine my shock when I came across that dessert 
 recipe in English that used zucchini as the ingredient! :)

 Ali
You can't even imagine how many italian words and recipes are distorted... Andrea
+1 :-P
Sep 27 2018
prev sibling next sibling parent Andrea Fontana <nospam example.com> writes:
On Sunday, 23 September 2018 at 20:49:39 UTC, Walter Bright wrote:
 On 9/23/2018 9:52 AM, aliak wrote:

 There's a reason why dmd doesn't have international error 
 messages. My experience with it is that international users 
 don't want it. They prefer the english messages.
Yes please. Keep them in English. But please, add an error code in front of them too.
 I'm sure if you look hard enough you'll find someone using 
 non-ASCII characters in identifiers.
It depends on what I'm developing. If I'm writing a public library that I'm planning to release on github, I use English identifiers. But of course, if it is a piece of software for my company or for myself, I use Italian identifiers. Andrea
Sep 26 2018
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, September 23, 2018 2:49:39 PM MDT Walter Bright via Digitalmars-d 
wrote:
 There's a reason why dmd doesn't have international error messages. My
 experience with it is that international users don't want it. They prefer
 the english messages.
It reminds me of one of the reasons that Bryan Cantrill thinks that many folks use Linux - they want to be able to google their stack traces. Of course, that same argument would be a reason to use C/C++ rather than switching to D. But having an error in a format that's more common - and therefore more likely to have been posted somewhere where you might find a discussion of it, and maybe its solution - can be valuable. And that's without even getting into all of the translation issues discussed elsewhere in this thread. And it's not like compiler error messages - or programming speak in general - are really traditional English anyway. - Jonathan M Davis
Sep 26 2018
prev sibling next sibling parent reply Erik van Velzen <erik evanv.nl> writes:
Agreed with Walter.

I'm all on board with i18n but I see no need for non-ascii 
identifiers.

Even identifiers with a non-latin origin are usually written in 
the latin script.

As for real-world usage I've seen Cyrillic identifiers a few 
times in PHP.
Sep 21 2018
parent reply Seb <seb wilzba.ch> writes:
On Friday, 21 September 2018 at 23:00:45 UTC, Erik van Velzen 
wrote:
 Agreed with Walter.

 I'm all on board with i18n but I see no need for non-ascii 
 identifiers.

 Even identifiers with a non-latin origin are usually written in 
 the latin script.

 As for real-world usage I've seen Cyrillic identifiers a few 
 times in PHP.
A: Wait. Using emojis as identifiers is not a good idea? B: Yes. A: But the cool kids are doing it: https://codepen.io/andresgalante/pen/jbGqXj In all seriousness I hate it when someone thought it was funny to use the lambda symbol as an identifier and I have to copy that symbol whenever I want to use it because there's no convenient way to type it. (This is already supported in D.)
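For reference, a minimal sketch (hypothetical names) of what that already-legal code looks like; λ is a Unicode 2.0 letter, so D accepts it as an identifier today:

import std.stdio;

void main()
{
    // λ (U+03BB) lexes as an ordinary identifier
    auto λ = (int x) => x * 2;
    writeln(λ(21)); // prints 42
}

Easy enough to read at the call site; not so easy to type without a compose key or copy-paste.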
Sep 21 2018
next sibling parent Neia Neutuladh <neia ikeran.org> writes:
On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:
 A: Wait. Using emojis as identifiers is not a good idea?
 B: Yes.
 A: But the cool kids are doing it:
The C11 spec says that emoji should be allowed in identifiers (ISO publication N1570 page 504/522), so it's not just the cool kids. I'm not in favor of emoji in identifiers.
 In all seriousness I hate it when someone thought it was funny to 
 use the lambda symbol as an identifier and I have to copy that 
 symbol whenever I want to use it because there's no convenient 
 way to type it.
It's supported because λ is a letter in a language spoken by thirteen million people. I mean, would you want to have to name a variable "lumиnosиty" because someone got annoyed at people using "i" as a variable name?
Sep 21 2018
prev sibling next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 22/09/2018 11:17 AM, Seb wrote:
 In all seriousness I hate it when someone thought it was funny to use the 
 lambda symbol as an identifier and I have to copy that symbol whenever I 
 want to use it because there's no convenient way to type it.
 (This is already supported in D.)
This can be strongly mitigated by using a compose key. But they are not terribly common unfortunately.
Sep 21 2018
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:
 A: Wait. Using emojis as identifiers is not a good idea?
 B: Yes.
 A: But the cool kids are doing it:

 https://codepen.io/andresgalante/pen/jbGqXj
It's not like we have a lot of good fonts (I know only one), and even fewer of them are suitable for code, and they can't realistically be expected to do everything; monospace fonts are often even ASCII-only.
Sep 23 2018
prev sibling parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:
 In all seriousness I hate it when someone thought it was funny to 
 use the lambda symbol as an identifier and I have to copy that 
 symbol whenever I want to use it because there's no convenient 
 way to type it.
 (This is already supported in D.)
I just want to chime in that I've definitely used greek letters in "ordinary" code - it's handy when writing math and feeling lazy. Note that on Linux, with a simple configuration tweak (Windows key mapped to Compose, and https://gist.githubusercontent.com/zkat/6718053/raw/4535a2e2a988aa90937a69dbb8f10e6a43b4010/.XCompose ), you can for instance type "<windows key> l a m" to make the lambda symbol, or other Greek letters, very easily.
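For instance, something like this (a minimal sketch, hypothetical names) compiles today, since Greek letters are already in D's accepted identifier set:

import std.math;
import std.stdio;

void main()
{
    double μ = 0.0; // mean
    double σ = 1.0; // standard deviation
    double x = 0.5;

    // Gaussian density, written close to its textbook form
    double φ = exp(-((x - μ) ^^ 2) / (2 * σ ^^ 2)) / (σ * sqrt(2 * PI));
    writeln(φ);
}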
Sep 25 2018
parent reply Dukc <ajieskola gmail.com> writes:
When I make code that I expect to be used only around here, I 
generally write the code itself in English but comments in my own 
language. I agree that in general, it's better to stick with 
English in identifiers when the programming language and the 
standard library are in English.

On Tuesday, 25 September 2018 at 09:28:33 UTC, FeepingCreature 
wrote:
 On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:
 In all seriousness I hate it when someone thought it was funny to 
 use the lambda symbol as an identifier and I have to copy that 
 symbol whenever I want to use it because there's no convenient 
 way to type it.
 (This is already supported in D.)
I just want to chime in that I've definitely used greek letters in "ordinary" code - it's handy when writing math and feeling lazy.
On the other hand, Unicode identifiers still have their value IMO. The quote above is one reason for that - if there is a very specialized codebase, it may just be impractical to transliterate everything. Another reason is that something may not have a good translation to English. If there is an enum type listing city names, it is IMO better to write them as normal, using Unicode. CityName.seinäjoki, not CityName.seinaejoki.
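For illustration, a minimal sketch (hypothetical city list); ä is among the letters D already accepts, so this compiles as-is:

import std.stdio;

enum CityName
{
    helsinki,
    seinäjoki,
    jyväskylä,
}

void main()
{
    auto home = CityName.seinäjoki;
    writeln(home); // prints "seinäjoki"
}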
Sep 25 2018
parent reply Shachar Shemesh <shachar weka.io> writes:
On 25/09/18 15:35, Dukc wrote:
 Another reason is that something may not have a good translation to 
 English. If there is an enum type listing city names, it is IMO better 
 to write them as normal, using Unicode. CityName.seinäjoki, not 
 CityName.seinaejoki.
This sounded like a very compelling example, until I gave it a second thought. I now fail to see how this example translates to a real-life scenario. City names (data, changes over time) as enums (compile time set) seem like a horrible idea. That may sound like a very technical objection to an otherwise valid point, but I really think that's not the case. The properties that cause city names to be poor candidates for enum values are the same as those that make them Unicode candidates. Shachar
Sep 25 2018
next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 26 September 2018 at 06:50:47 UTC, Shachar Shemesh 
wrote:
 The properties that cause city names to be poor candidates for 
 enum values are the same as those that make them Unicode 
 candidates.
How so?
 City names (data, changes over time) as enums (compile time 
 set) seem like a horrible idea.
In most cases yes. But not always. You might be doing some sort of game where certain cities are a central concept, not just data with properties. Another possibility is that you're using code as data, AKA scripting. And who says anyway you can't make a program that's designed specifically for certain cities?
Sep 26 2018
parent reply Shachar Shemesh <shachar weka.io> writes:
On 26/09/18 10:26, Dukc wrote:
 On Wednesday, 26 September 2018 at 06:50:47 UTC, Shachar Shemesh wrote:
 The properties that cause city names to be poor candidates for enum 
 values are the same as those that make them Unicode candidates.
How so?
 City names (data, changes over time) as enums (compile time set) seem 
 like a horrible idea.
In most cases yes. But not always. You might be doing some sort of game where certain cities are a central concept, not just data with properties. Another possibility is that you're using code as data, AKA scripting. And who says anyway you can't make a program that's designed specifically for certain cities?
Sure you can. It's just very poor design. I think, when asking such questions, two types of answers are relevant. One is hypotheticals where you say "this design requires this". For such answers, the design needs to be a good one. It makes no sense to design a language to support a hypothetical design which is not a good one. The other type of answer is "it's being done in the real world". If it's in active use in the real world, it might make sense to support it, even if we can agree that the design is not optimal. Since your answer is hypothetical, I think arguing that this is not a good way to code is a valid response. Shachar
Sep 26 2018
parent Dukc <ajieskola gmail.com> writes:
On Wednesday, 26 September 2018 at 07:37:28 UTC, Shachar Shemesh 
wrote:
 The other type of answer is "it's being done in the real 
 world". If it's in active use in the real world, it might make 
 sense to support it, even if we can agree that the design is 
 not optimal.

 Shachar
Two years ago, I took part in implementing a commercial game. It would have faced the same thing, were it used. Anyway, the game has three characters with completely different abilities. The abilities were unique enough that it made sense to name some functions after the characters. One of the characters really has a non-ASCII character in his name, and that meant naming him differently in the code.
Sep 26 2018
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/26/18 2:50 AM, Shachar Shemesh wrote:
 On 25/09/18 15:35, Dukc wrote:
 Another reason is that something may not have a good translation to 
 English. If there is an enum type listing city names, it is IMO better 
 to write them as normal, using Unicode. CityName.seinäjoki, not 
 CityName.seinaejoki.
This sounded like a very compelling example, until I gave it a second thought. I now fail to see how this example translates to a real-life scenario. City names (data, changes over time) as enums (compile time set) seem like a horrible idea. That may sound like a very technical objection to an otherwise valid point, but I really think that's not the case. The properties that cause city names to be poor candidates for enum values are the same as those that make them Unicode candidates.
Hm... I could actually see some "clever" use of opDispatch being used to define cities or other such names. In any case, I think the biggest pro for supporting Unicode symbol names is -- we already support Unicode symbol names. It doesn't make a whole lot of sense to only support some of them. -Steve
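P.S. A minimal sketch of the opDispatch idea (hypothetical names), where any identifier, Unicode included, becomes a lookup rather than an enum member:

import std.stdio;

struct CityName
{
    // Unknown members fall back to opDispatch; the identifier used
    // at the call site becomes a compile-time string.
    static string opDispatch(string name)()
    {
        return name; // a real program would look up data keyed by `name`
    }
}

void main()
{
    writeln(CityName.seinäjoki); // prints "seinäjoki"
}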
Sep 26 2018
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/25/2018 11:50 PM, Shachar Shemesh wrote:
 This sounded like a very compelling example, until I gave it a second thought. I 
 now fail to see how this example translates to a real-life scenario.
Also, there are usually common ASCII versions of city names, such as Cologne for Köln.
Sep 26 2018
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2018-09-21 18:27, Neia Neutuladh wrote:
 D's currently accepted identifier characters are based on Unicode 2.0:
 
 * ASCII range values are handled specially.
 * Letters and combining marks from Unicode 2.0 are accepted.
 * Numbers outside the ASCII range are accepted.
 * Eight random punctuation marks are accepted.
 
 This follows the C99 standard.
 

 Python, ECMAScript, just to name a few. A small number of languages 
 reject non-ASCII characters: Dart, Perl. Some languages are weirdly 
 generous: Swift and C11 allow everything outside the Basic Multilingual 
 Plane.
 
 I'd like to update that so that D accepts something as a valid 
 identifier character if it's a letter or combining mark or modifier 
 symbol that's present in Unicode 11, or a non-ASCII number. This allows 
 the 146 most popular writing systems and a lot more characters from 
 those writing systems. This *would* reject those eight random 
 punctuation marks, so I'll keep them in as legacy characters.
 
 It would mean we don't have to reference the C99 standard when 
 enumerating the allowed characters; we just have to refer to the Unicode 
 standard, which we already need to talk about in the lexical part of the 
 spec.
 
 It might also make the lexer a tiny bit faster; it reduces the number of 
 valid-ident-char segments to search from 245 to 134. On the other hand, 
 it will change the ident char ranges from wchar to dchar, which means 
 the table takes up marginally more memory.
 
 And, of course, it lets you write programs entirely in Linear B, and 
 that's a marketing ploy not to be missed.
 
 I've got this coded up and can submit a PR, but I thought I'd get 
 feedback here first.
 
 Does anyone see any horrible potential problems here?
 
 Or is there an interestingly better option?
 
 Does this need a DIP?
I'm not a native English speaker but I write all my public and private code in English. I expect anyone I work with to write their code in English as well, and I make sure they do. English is not enough either, it has to be American English. Despite this, I think that D should support as much of Unicode as possible (including using Unicode for identifiers). It should not be up to the programming language to decide which language the developer should write the code in. -- /Jacob Carlborg
Sep 25 2018
prev sibling parent reply rjframe <dlang ryanjframe.com> writes:
On Fri, 21 Sep 2018 16:27:46 +0000, Neia Neutuladh wrote:

 I've got this coded up and can submit a PR, but I thought I'd get
 feedback here first.
 
 Does anyone see any horrible potential problems here?
 
 Or is there an interestingly better option?
 
 Does this need a DIP?
I just want to point out, since this thread is still living, that there have been very few answers to the actual question ("should I submit my PR?"). Walter did answer the question, with the reasons that Unicode identifier support is not useful/helpful and could cause issues with tooling. Which is likely correct; and if we really want to follow this logic, Unicode identifier support should be removed from D entirely. I don't recall seeing anyone in favor providing technical reasons, save the OP. Especially since the work is done, it makes sense to me to ask for the PR for review. Worst case scenario, it sits there until we need it.
Sep 26 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/26/18 5:54 AM, rjframe wrote:
 On Fri, 21 Sep 2018 16:27:46 +0000, Neia Neutuladh wrote:
 
 I've got this coded up and can submit a PR, but I thought I'd get
 feedback here first.

 Does anyone see any horrible potential problems here?

 Or is there an interestingly better option?

 Does this need a DIP?
I just want to point out since this thread is still living that there have been very few answers to the actual question ("should I submit my PR?"). Walter did answer the question, with the reasons that Unicode identifier support is not useful/helpful and could cause issues with tooling. Which is likely correct; and if we really want to follow this logic, Unicode identifier support should be removed from D entirely.
This is a non-starter. We can't break people's code, especially for trivial reasons like 'you shouldn't code that way because others don't like it'. I'm pretty sure Walter would be against removing Unicode support for identifiers.
 
 I don't recall seeing anyone in favor providing technical reasons, save
 the OP.
There doesn't necessarily need to be a technical reason. In fact, there really isn't one -- people can get by with using ASCII identifiers just fine (and many/most people do). Supporting Unicode would be purely for social or inclusive reasons (it may make D more approachable to non-English speaking schoolchildren for instance). As an only-English speaking person, it doesn't bother me either way to have Unicode identifiers. But the fact that we *already* support Unicode identifiers leads me to expect that we support *all* Unicode identifiers. It doesn't make a whole lot of sense to only support some of them.
 
 Especially since the work is done, it makes sense to me to ask for the PR
 for review. Worst case scenario, it sits there until we need it.
I suggested this as well. https://forum.dlang.org/post/poaq1q$its$1 digitalmars.com I think it stands a good chance of getting incorporated, just for the simple fact that it's enabling and not disruptive. -Steve
Sep 26 2018
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
 This is a non-starter. We can't break people's code, especially for trivial 
 reasons like 'you shouldn't code that way because others don't like it'. I'm 
 pretty sure Walter would be against removing Unicode support for identifiers.
We're not going to remove it, because there's not much to gain from it. But expanding it seems of vanishingly little value. Note that each thing that gets added to D adds weight to it, and it needs to pull its weight. Nothing is free. I don't see a scenario where someone would be learning D and not know English. Non-English D instructional material is nearly non-existent. dlang.org is all in English. Don't most languages have a Romaji-like representation? C/C++ have made efforts in the past to support non-ASCII coding - digraphs, trigraphs, and alternate keywords. They've all failed miserably. The only people who seem to know those features even exist are language lawyers.
Sep 26 2018
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 26 September 2018 at 20:43:47 UTC, Walter Bright 
wrote:
 I don't see a scenario where someone would be learning D and 
 not know English. Non-English D instructional material is 
 nearly non-existent.
http://ddili.org/ders/d/
Sep 26 2018
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/26/18 4:43 PM, Walter Bright wrote:
 But expanding it seems of vanishingly little value. Note that each thing 
 that gets added to D adds weight to it, and it needs to pull its weight. 
 Nothing is free.
It may be that the weight is already there in the form of Unicode symbol support, and just the range of the characters supported isn't good enough for some languages. It might be like replacing your refrigerator -- you get an upgrade, but it's not going to take up any more space because you get rid of the old one. I would like to see the PR before passing judgment on the heft of the change. The value is simply in the consistency -- when some of the words for your language can be valid symbols but others can't, it becomes a weird guessing game as to what is supported. It would be like saying all identifiers can have any letters except `q`. Sure, you can get around that, but it's weirdly exclusive. I claim complete ignorance as to what is required; it hasn't been technically laid out what is at stake, and I'm not bilingual anyway. It could be true that I'm completely misunderstanding the positions of others. -Steve
Sep 26 2018
prev sibling next sibling parent Neia Neutuladh <neia ikeran.org> writes:
On 09/26/2018 01:43 PM, Walter Bright wrote:
 Don't most languages have a Romaji-like 
 representation?
Yes, a lot of languages that don't use the Latin alphabet have standard transcriptions into the Latin alphabet. Standard transcriptions into ASCII are much less common, and newer Unicode versions include more Latin characters to better support languages (and other use cases) using the Latin alphabet.
Sep 26 2018
prev sibling parent reply aliak <something something.com> writes:
On Wednesday, 26 September 2018 at 20:43:47 UTC, Walter Bright 
wrote:
 On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
 This is a non-starter. We can't break people's code, 
 especially for trivial reasons like 'you shouldn't code that 
 way because others don't like it'. I'm pretty sure Walter 
 would be against removing Unicode support for identifiers.
We're not going to remove it, because there's not much to gain from it. But expanding it seems of vanishingly little value. Note that each thing that gets added to D adds weight to it, and it needs to pull its weight. Nothing is free. I don't see a scenario where someone would be learning D and not know English. Non-English D instructional material is nearly non-existent. dlang.org is all in English. Don't most languages have a Romanji-like representation?
It's not that they don't know English. It's that non-English speakers can process words and sentences in their own language much more efficiently than in English. Knowing a language is not binary. Here's an example from this year's spring semester at NTNU (a Norwegian uni): http://folk.ntnu.no/frh/grprog/eksempel/eks_20.cpp ... That's the basic programming course. Whether the professor would use that I guess would depend on the ratio of English/non-English speakers. But it's there nonetheless. Of course Norway is a bad example because the English level here is, arguably, higher than in many English-speaking countries :p But it's a great example because even if you're great at English, sometimes people are still more comfortable/confident/efficient in their own native language. Some tech meetups from different countries try to do things in English and mostly it works. But it's been seen consistently with non-English audiences that presentations given in English result in silence, whereas if it's in their native language you have actual engagement. I fail to understand how supporting a version of Unicode from (not sure when it was released) 3 billion decades ago should just be left as is, neither updated nor removed, when there's someone who's willing to update it.
 C/C++ have made efforts in the past to support non-ASCII coding 
 - digraphs, trigraphs, and alternate keywords. They've all 
 failed miserably. The only people who seem to know those 
 features even exist are language lawyers.
This is not relevant. Trigraphs and digraphs did indeed fail miserably, but they do not represent any non-ASCII characters. The existential reasons for those abominations were different. Anyway, on a related note: D itself (not identifiers, but std) also supports Unicode 6 or something. That's from 2010. That's a decade ago. We're at Unicode 11 now. And I've already had someone tell me (while trying to get them to use D) - "hold on, it supports Unicode from a decade ago? Nah, I'm not touching it". Not that it's the same as supporting identifiers in code, but still the reaction is relevant. Cheers, - Ali
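P.S. For concreteness, a minimal sketch of the kind of std.uni classification in question (isAlpha is an existing std.uni function; the version complaint is about the Unicode tables behind it):

import std.stdio;
import std.uni : isAlpha;

void main()
{
    writeln(isAlpha('λ')); // true: λ is classified as a letter
    writeln(isAlpha('1')); // false: a digit, not a letter
}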
Sep 27 2018
next sibling parent reply Shachar Shemesh <shachar weka.io> writes:
On 27/09/18 10:35, aliak wrote:
 Here's an example from this years spring semester and NTNU (norwegian 
 uni): http://folk.ntnu.no/frh/grprog/eksempel/eks_20.cpp
 
 ... That's the basic programming course. Whether the professor would use 
 that I guess would depend on ratio of English/non-English speakers. But 
 it's there nonetheless.
I'm sorry I keep bringing this up, but context is really important here. The program you link to has non-ASCII in the comments and in the literals, but not in the identifiers. Nobody is opposed to having those. Shachar
Sep 27 2018
parent reply aliak <something something.com> writes:
On Thursday, 27 September 2018 at 08:16:00 UTC, Shachar Shemesh 
wrote:
 On 27/09/18 10:35, aliak wrote:
 Here's an example from this years spring semester and NTNU 
 (norwegian uni): 
 http://folk.ntnu.no/frh/grprog/eksempel/eks_20.cpp
 
 ... That's the basic programming course. Whether the professor 
 would use that I guess would depend on ratio of 
 English/non-English speakers. But it's there nonetheless.
I'm sorry I keep bringing this up, but context is really important here. The program you link to has non-ASCII in the comments and in the literals, but not in the identifiers. Nobody is opposed to having those. Shachar
The point was that being able to use non-English in code is demonstrably both helpful and useful to people. Norwegian happens to be easily anglicize-able. I've already linked to non-ASCII code versions in a previous post if you want that too.
Sep 27 2018
parent reply Shachar Shemesh <shachar weka.io> writes:
On 27/09/18 16:38, aliak wrote:
 The point was that being able to use non-English in code is demonstrably 
 both helpful and useful to people. Norwegian happens to be easily 
 anglicize-able. I've already linked to non ascii code versions in a 
 previous post if you want that too.
If you wish to make a point about something irrelevant to the discussion, that's fine. It is, however, irrelevant, mostly because it is uncontested. This thread is about the use of non-English in *identifiers*. This thread is not about comments. It is not about literals (i.e. - strings). Only about identifiers (function names, variable names etc.). If you have real world examples of those, that would be both interesting and relevant. Shachar
Sep 27 2018
parent reply aliak <something something.com> writes:
On Thursday, 27 September 2018 at 13:59:48 UTC, Shachar Shemesh 
wrote:
 On 27/09/18 16:38, aliak wrote:
 The point was that being able to use non-English in code is 
 demonstrably both helpful and useful to people. Norwegian 
 happens to be easily anglicize-able. I've already linked to 
 non ascii code versions in a previous post if you want that 
 too.
If you wish to make a point about something irrelevant to the discussion, that's fine. It is, however, irrelevant, mostly because it is uncontested. This thread is about the use of non-English in *identifiers*. This thread is not about comments. It is not about literals (i.e. - strings). Only about identifiers (function names, variable names etc.). If you have real world examples of those, that would be both interesting and relevant. Shachar
English doesn't mean ASCII. You can write non-English in ASCII, which you would've noticed if you'd opened the link, which had identifiers in Norwegian (which is not English). And again, I've already posted a link that shows non-ASCII identifiers. I'll paste it again here in case you don't want to read the thread: https://speakerdeck.com/codelynx/programming-swift-in-japanese
Sep 27 2018
parent reply sarn <sarn theartofmachinery.com> writes:
On Thursday, 27 September 2018 at 16:34:37 UTC, aliak wrote:
 On Thursday, 27 September 2018 at 13:59:48 UTC, Shachar Shemesh 
 wrote:
 On 27/09/18 16:38, aliak wrote:
 The point was that being able to use non-English in code is 
 demonstrably both helpful and useful to people. Norwegian 
 happens to be easily anglicize-able. I've already linked to 
 non ascii code versions in a previous post if you want that 
 too.
If you wish to make a point about something irrelevant to the discussion, that's fine. It is, however, irrelevant, mostly because it is uncontested. This thread is about the use of non-English in *identifiers*. This thread is not about comments. It is not about literals (i.e. - strings). Only about identifiers (function names, variable names etc.). If you have real world examples of those, that would be both interesting and relevant. Shachar
English doesn't mean ASCII. You can write non-English in ASCII, which you would've noticed if you'd opened the link, which had identifiers in Norwegian (which is not English). And again, I've already posted a link that shows non-ASCII identifiers. I'll paste it again here in case you don't want to read the thread: https://speakerdeck.com/codelynx/programming-swift-in-japanese
Shachar seems to be aiming for an internet high score by shooting down threads without reading them. You have better things to do. http://www.paulgraham.com/vb.html
Sep 27 2018
parent reply Dukc <ajieskola gmail.com> writes:
On Friday, 28 September 2018 at 02:23:32 UTC, sarn wrote:
 Shachar seems to be aiming for an internet high score by 
 shooting down threads without reading them.  You have better 
 things to do.
 http://www.paulgraham.com/vb.html
I believe you're being too harsh. It's easy to miss a part of a post sometimes.
Sep 28 2018
next sibling parent sarn <sarn theartofmachinery.com> writes:
On Friday, 28 September 2018 at 11:37:10 UTC, Dukc wrote:
 It's easy to miss a part of a post sometimes.
That's very true, and it's always good to give people the benefit of the doubt. But most people are able to post constructively here without * Abrasively and condescendingly declaring others' posts to be completely pointless * Doing that based on one single aspect of a post, without bothering to check the whole post or parent post * Doubling down even after getting a hint that the poster might not have posted 100% cluelessly * Doing all this more than once in a thread If Shachar starts posting constructively, I'll happily engage. I mean that. Otherwise I won't waste my time, and I'll tell others not to waste theirs, too.
Sep 28 2018
prev sibling parent reply Shachar Shemesh <shachar weka.io> writes:
On 28/09/18 14:37, Dukc wrote:
 On Friday, 28 September 2018 at 02:23:32 UTC, sarn wrote:
 Shachar seems to be aiming for an internet high score by shooting down 
 threads without reading them.  You have better things to do.
 http://www.paulgraham.com/vb.html
I believe you're being too harsh. It's easy to miss a part of a post sometimes.
A minor correction: Aliak is not accusing me of missing a part of the post. He's accusing me of not taking into account something he said in a different part of the *thread*. I.e. - I missed something he said in one of the other (as of this writing, 98) posts of this thread, thus causing Dukc to label me a bullshitter.
Sep 28 2018
parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 29 September 2018 at 02:22:55 UTC, Shachar Shemesh 
wrote:
 I missed something he said in one of the other (as of this 
 writing, 98) posts of this thread, and thus causing Dukc to 
 label me a bullshitter.
I know you meant Sarn, but still... can you please be a bit less aggressive with your wording?
Sep 29 2018
parent reply Shachar Shemesh <shachar weka.io> writes:
On 29/09/18 16:52, Dukc wrote:
 On Saturday, 29 September 2018 at 02:22:55 UTC, Shachar Shemesh wrote:
 I missed something he said in one of the other (as of this writing, 
 98) posts of this thread, and thus causing Dukc to label me a 
 bullshitter.
I know you meant Sarn, but still... can you please be a bit less aggressive with your wording?
From the article (the furthest point I read in it):
 When I ask myself what I've found life is too short for, the word that pops 
 into my head is "bullshit."
That is the word used by the article *you* linked to, in reference to me. If it offends you enough to be accused of *calling* someone that, just imagine how I felt being *called* that very same name. Seriously, I don't make it a habit of being offended by random people on the Internet, but this is more a conscious decision than a naturally thick skin. Seeing that label hurt. Don't worry. I've been on the Internet since 1991. That's longer than the median age here (i.e. - I've been on the Internet since before most of you were born). I've had my own fair share of flame wars, including some that, to my chagrin, I've started. In other words, I got over it. I did not reply, big though the temptation was. But the right time to be sensitive about what words are being used was *before* you linked to the article. Taking offense at being called out for calling someone something you find offensive is hypocritical. I never understood the focus on words. It's not the use of that word that offended me, it's the fact that you thought anything I did justified using it. I don't think using "cattle excrement" instead would have been any less hurtful. And it's not that the rest of your post was thoughtful, considerate, and took pains to give constructive criticism, with or without hurting anyone's feelings. It's just that that part doesn't seem to be what bothers you. Shachar
Sep 29 2018
parent Shachar Shemesh <shachar weka.io> writes:
On Saturday, 29 September 2018 at 16:19:38 UTC, ag0aep6g wrote:
 On 09/29/2018 04:19 PM, Shachar Shemesh wrote:
 On 29/09/18 16:52, Dukc wrote:
[...]
 I know you meant Sarn, but still... can you please be a bit 
 less aggressive with your wording?
From the article (the furthest point I read in it):
 When I ask myself what I've found life is too short for, the 
 word that pops into my head is "bullshit."
Dukc didn't post that link. sarn did.
You are 100% correct. My most sincere apologies. I am going to stop responding to this thread now. Shachar
Sep 29 2018
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2018 12:35 AM, aliak wrote:
 Anyway, on a related note: D itself (not identifiers, but std) also supports 
 Unicode 6 or something. That's from 2010. That's a decade ago. We're at Unicode 
 11 now. And I've already had someone tell me (while trying to get them to use D) 
 - "hold on, it supports Unicode from a decade ago? Nah, I'm not touching it". Not 
 that it's the same as supporting identifiers in code, but still the reaction is 
 relevant.
Nobody is suggesting D not support Unicode in strings, comments, and the standard library. Please file any issues on Bugzilla, and PRs to fix them.
Sep 27 2018
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
 Does this need a DIP?
Feel free to write one, but its chances of getting incorporated are remote and would require a pretty strong rationale that I haven't seen yet.
Sep 26 2018