
digitalmars.D - Re: Why UTF-8/16 character encodings?

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 28, 2013 at 02:54:30AM +0200, Torje Digernes wrote:
 On Tuesday, 28 May 2013 at 00:34:20 UTC, Manu wrote:
On 28 May 2013 09:05, Walter Bright <newshound2 digitalmars.com>
wrote:

On 5/27/2013 3:18 PM, H. S. Teoh wrote:

Well, D *does* support non-English identifiers, y'know... for
example:

        import std.stdio;

        void main(string[] args) {
                int число = 1;  // "число" is Russian for "number"
                foreach (и; 0..100)
                        число += и;
                writeln(число);
        }

Of course, whether that's a good practice is a different
story. :)

I've recently come to the opinion that that's a bad idea, and D should not support it.

Why? You said previously that you'd love to support extended operators ;)

I find features such as support for uncommon symbols in variables a strength as it makes some physics formulas a bit easier to read in code form, which in my opinion is a good thing.

I think there's a difference between allowing math symbols (which includes things like (a subset of) Greek letters that mathematicians love) in identifiers, and allowing full Unicode. What if you're assigned to maintain code containing identifiers that have letters that don't appear in any of your installed fonts?

I think it's OK to allow math symbols, but allowing the entire set of Unicode characters is going a bit too far, IMO. For one thing, if some code has identifiers written in Arabic, I wouldn't be able to understand the code, simply because I'd have a hard time telling different identifiers apart.

Besides, if the rest of the language (keywords, Phobos, etc.) is in English, then I don't see any compelling reason to use a different language in identifiers, other than to submit IODCC entries. :-P

C doesn't support Unicode identifiers, for one thing, but I've seen working C code written by people who barely understand any English -- it didn't stop them at all. (The comments were of course in their native language -- the compiler ignores everything inside them anyway, so 8-bit native encodings or even UTF-8 can be sneaked in without provoking compiler errors.)

T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan
May 27 2013
parent "Torje Digernes" <torjehoa pvv.org> writes:
On Tuesday, 28 May 2013 at 01:17:37 UTC, H. S. Teoh wrote:

I think there's a difference between allowing math symbols (which includes things like (a subset of) Greek letters that mathematicians love) in identifiers, and allowing full Unicode. What if you're assigned to maintain code containing identifiers that have letters that don't appear in any of your installed fonts?

I think it's OK to allow math symbols, but allowing the entire set of Unicode characters is going a bit too far, IMO. For one thing, if some code has identifiers written in Arabic, I wouldn't be able to understand the code, simply because I'd have a hard time telling different identifiers apart.

Besides, if the rest of the language (keywords, Phobos, etc.) is in English, then I don't see any compelling reason to use a different language in identifiers, other than to submit IODCC entries. :-P

C doesn't support Unicode identifiers, for one thing, but I've seen working C code written by people who barely understand any English -- it didn't stop them at all. (The comments were of course in their native language -- the compiler ignores everything inside them anyway, so 8-bit native encodings or even UTF-8 can be sneaked in without provoking compiler errors.)

T

artificially limiting the allowable symbols. What about symbols relevant to other fields, which do not happen to use primarily Greek symbols: are they to be treated differently? What you propose is a built-in coding standard for D, based on your feelings on a topic.

If what you fear is that Unicode will suddenly make cooperation impossible, I doubt you are right; after all, there are all kinds of ways to make terrible variable names (q, w, e, r ... qq, qw). If any such identifiers show up in a project, I assume they get cleaned up, so why wouldn't the same happen to Unicode identifiers if they are causing problems? Think about it: it should happen even faster, because the symbol might not be accessible to everyone, whereas a single- or double-letter gibberish name is perfectly reproducible and might grow into the project, confusing every new reader.

Are you going to argue for disallowing variables that are not a compound word or a dictionary word in English?
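
For what it's worth, here is a minimal sketch of the kind of physics code the "easier to read" argument has in mind. The function name angularFrequency and the spring/mass numbers are made up for illustration and appear nowhere else in this thread; the sketch also assumes the compiler accepts Greek letters in identifiers the same way it accepts the Cyrillic ones in the число example.

```d
import std.math : PI, sqrt;
import std.stdio : writefln;

// Hypothetical helper (not from the thread): angular frequency of a
// mass-spring oscillator, ω = sqrt(k / m).
double angularFrequency(double k, double m)
{
    immutable ω = sqrt(k / m);
    return ω;
}

void main()
{
    immutable π = PI;
    immutable ω = angularFrequency(4.0, 1.0); // k = 4 N/m, m = 1 kg
    immutable T = 2 * π / ω;                   // oscillation period in seconds
    writefln("ω = %s rad/s, T = %s s", ω, T);
}
```

Whether `T = 2 * π / ω` reads better than `period = 2 * PI / omega` is exactly the judgment call being argued about; the point is only that the choice can be left to the project rather than to the language.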
May 29 2013