digitalmars.D - Re: Why UTF-8/16 character encodings?

"H. S. Teoh" <hsteoh quickfur.ath.cx> May 27 2013

"Torje Digernes" <torjehoa pvv.org> May 29 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, May 28, 2013 at 02:54:30AM +0200, Torje Digernes wrote:
> On Tuesday, 28 May 2013 at 00:34:20 UTC, Manu wrote:
>> On 28 May 2013 09:05, Walter Bright <newshound2 digitalmars.com> wrote:
>>> On 5/27/2013 3:18 PM, H. S. Teoh wrote:
>>>> Well, D *does* support non-English identifiers, y'know... for
>>>> example:
>>>>
>>>>         import std.stdio; // needed for writeln
>>>>
>>>>         void main(string[] args) {
>>>>                 int число = 1;
>>>>                 foreach (и; 0..100)
>>>>                         число += и;
>>>>                 writeln(число);
>>>>         }
>>>>
>>>> Of course, whether that's a good practice is a different
>>>> story. :)
>>>
>>> I've recently come to the opinion that that's a bad idea, and D
>>> should not support it.
>>
>> Why? You said previously that you'd love to support extended
>> operators ;)
>
> I find features such as support for uncommon symbols in variables a
> strength as it makes some physics formulas a bit easier to read in
> code form, which in my opinion is a good thing.


I think there's a difference between allowing math symbols (which
includes things like (a subset of) Greek letters that mathematicians
love) in identifiers, and allowing full Unicode. What if you're assigned
to maintain code containing identifiers that have letters that don't
appear in any of your installed fonts?

I think it's OK to allow math symbols, but allowing the entire set of
Unicode characters is going a bit too far, IMO. For one thing, if some
code has identifiers written in Arabic, I wouldn't be able to understand
the code, simply because I'd have a hard time telling different
identifiers apart.  Besides, if the rest of the language (keywords,
Phobos, etc.) is in English, then I don't see any compelling reason to
use a different language in identifiers, other than to submit IODCC
entries. :-P
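
To make the math-symbol case concrete, here is a minimal sketch in D.
The function name `lorentzFactor` and the sample value 0.6 are made up
for illustration; nothing here comes from an actual project. It assumes
the compiler accepts Greek letters in identifiers, just as it accepts
the Cyrillic ones in the example quoted above:

```d
import std.math : sqrt;
import std.stdio : writefln;

/// Lorentz factor γ = 1 / sqrt(1 - β²), where β = v / c.
/// (Hypothetical helper, written only to show Greek identifiers.)
double lorentzFactor(double β)
{
    immutable γ = 1.0 / sqrt(1.0 - β * β);
    return γ;
}

void main()
{
    immutable β = 0.6; // illustrative speed as a fraction of c
    writefln("γ = %s", lorentzFactor(β));
}
```

Whether `β` and `γ` read better here than `beta` and `gamma` is exactly
the judgment call being argued in this thread.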

C doesn't support Unicode identifiers, for one thing, but I've seen
working C code written by people who barely understand any English -- it
didn't stop them at all. (The comments were of course in their native
language -- the compiler ignores everything inside anyway so 8-bit
native encodings or even UTF-8 can be sneaked in without provoking
compiler errors.)


T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan

May 27 2013

"Torje Digernes" <torjehoa pvv.org> writes:

On Tuesday, 28 May 2013 at 01:17:37 UTC, H. S. Teoh wrote:
> I think there's a difference between allowing math symbols (which
> includes things like (a subset of) Greek letters that mathematicians
> love) in identifiers, and allowing full Unicode. What if you're
> assigned to maintain code containing identifiers that have letters
> that don't appear in any of your installed fonts?
>
> I think it's OK to allow math symbols, but allowing the entire set of
> Unicode characters is going a bit too far, IMO. For one thing, if some
> code has identifiers written in Arabic, I wouldn't be able to
> understand the code, simply because I'd have a hard time telling
> different identifiers apart.  Besides, if the rest of the language
> (keywords, Phobos, etc.) is in English, then I don't see any
> compelling reason to use a different language in identifiers, other
> than to submit IODCC entries. :-P
>
> C doesn't support Unicode identifiers, for one thing, but I've seen
> working C code written by people who barely understand any English --
> it didn't stop them at all. (The comments were of course in their
> native language -- the compiler ignores everything inside anyway so
> 8-bit native encodings or even UTF-8 can be sneaked in without
> provoking compiler errors.)

artificially limiting the allowable symbols. Are other symbols, 
relevant in fields that do not primarily use Greek symbols, to be 
treated differently?

What you propose is a built-in coding standard for D, based on 
your feelings about the topic.

If what you fear is that Unicode will suddenly make cooperation 
impossible, I doubt you are right. After all, there are all kinds 
of ways to make terrible variable names (q, w, e, r ... qq, qw). 
If any such identifiers show up in a project, I assume they get 
cleaned up, so why wouldn't the same happen to Unicode 
identifiers if they are causing problems? Think about it: it 
should happen even faster, because the symbol might not be 
accessible to everyone, whereas single- or double-letter 
gibberish is perfectly reproducible and might grow into the 
project, confusing every new reader. Are you going to argue for 
disallowing variables that are not a compound word or a 
dictionary word in English?

May 29 2013
