
D - char - Documentation Error

Jill.Ramonsky aculab.com writes:
Hi,

Not sure if this is the right place to report this. I am very, VERY impressed
with D - especially with the UTF support. Spending some time learning D now.

But there's an error in the documentation of the Basic Data Types. It says:
"char = unsigned 8 bit ASCII".

I would point out that ASCII is, *BY DEFINITION*, only seven bits wide, and
therefore the phrase "8 bit ASCII" is meaningless and open to all sorts of
possible misinterpretation. Three corrections are possible, and I don't know
which one is right:
1. char = unsigned 7 bit ASCII.
2. char = unsigned 8 bit UTF-8
3. char = unsigned 8 bit ISO 8859-1 (aka LATIN-1)

Please note that while choice 3 is a subset of Unicode, it is incompatible with
choice 2. The three choices differ in how byte values 0x80 to 0xFF are
interpreted. Specifically:
1. ASCII - byte values 0x80 to 0xFF are undefined
2. UTF-8 - byte values 0x80 to 0xFF are reserved for multi-byte UTF-8 sequences
3. ISO 8859-1 - byte values 0x80 to 0xFF represent Unicode characters U+0080 to
U+00FF.
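To make the difference between the three readings concrete, here is a small sketch in Python rather than D. Python's built-in `ascii`, `utf-8` and `latin-1` codecs stand in for the three candidate definitions of `char`, and the helper name `interpret` is hypothetical. It checks that bytes below 0x80 agree under all three, then decodes the single byte 0xE9 from the disputed range under each one.

```python
# Sketch only: Python codecs stand in for the three candidate meanings of D's char.
CANDIDATES = ("ascii", "utf-8", "latin-1")

def interpret(data: bytes, encoding: str) -> str:
    """Decode `data` with `encoding`, or describe why decoding failed (hypothetical helper)."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return f"error: {err.reason}"

# Byte values 0x00..0x7F decode to the same character under all three readings.
assert all(
    bytes([b]).decode(enc) == chr(b) for b in range(0x80) for enc in CANDIDATES
)

# A byte in the disputed 0x80..0xFF range: only Latin-1 maps it directly to U+00E9.
disputed = bytes([0xE9])
for enc in CANDIDATES:
    print(f"{enc:8} -> {interpret(disputed, enc)!r}")

# In UTF-8 the same character U+00E9 is stored as two bytes instead.
print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'
```

Decoding 0xE9 as ASCII fails because the value is above 0x7F, and decoding it as UTF-8 fails because 0xE9 is only the first byte of a three-byte sequence.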

This seems a simple thing to fix. If this is not the right place to report this,
please can someone point me to the right place. Thanks.
Mar 11 2004
Sigbjørn Lund Olsen <sigbjorn lundolsen.net> writes:

This is probably the right(est) place for everything D atm :-)

D chars are UTF-8 bytes. On the website under 'Basic Data Types', and in the 0.80 docs I have on my HD, this is what it says:

char    unsigned 8 bit UTF-8
wchar   unsigned 16 bit UTF-16
dchar   unsigned 32 bit UTF-32

I can't find the phrase "8 bit ASCII" anywhere in the docs, so I'm guessing you probably want to update your D compiler.

Cheers,
Sigbjørn Lund Olsen
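Sigbjørn's list pairs `char`, `wchar` and `dchar` with UTF-8, UTF-16 and UTF-32. One consequence worth seeing is that a single code point may need several `char` or `wchar` units. The sketch below is Python rather than D, and the `code_units` helper and `UNIT_BYTES` table are hypothetical. It assumes the little-endian codec variants so that no byte order mark is counted.

```python
# Sketch only: count code units per code point in the encodings the D docs
# associate with char (UTF-8), wchar (UTF-16) and dchar (UTF-32).
UNIT_BYTES = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}  # bytes per code unit

def code_units(text: str, encoding: str) -> int:
    """Number of fixed-size code units `text` occupies in `encoding` (hypothetical helper)."""
    return len(text.encode(encoding)) // UNIT_BYTES[encoding]

for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    counts = {enc: code_units(ch, enc) for enc in UNIT_BYTES}
    print(f"U+{ord(ch):04X}: {counts}")
```

For U+1F600, which lies outside the Basic Multilingual Plane, this reports four UTF-8 units, two UTF-16 units (a surrogate pair) and one UTF-32 unit.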
Mar 11 2004
J C Calvarese <jcc7 cox.net> writes:
Jill.Ramonsky aculab.com wrote:
 Hi,
 
 Not sure if this is the right place to report this. I am very, VERY impressed
 with D - especially with the UTF support. Spending some time learning D now.

Welcome to the D newsgroup. You're in the right place. This is where bugs and errata are reported.
 
 But there's an error in the documentation of the Basic Data Types. It says:
 "char = unsigned 8 bit ASCII".

Perhaps you're looking at the quite outdated Apr 2003 PDF snapshot that can be found at: http://www.prowiki.org/wiki4d/wiki.cgi?LanguageSpecification. If you like PDFs, there's a more recent one (Jan 2004) available on the same page. It's outdated, too, but not as much.

The online docs (http://www.digitalmars.com/d/index.html) and those included with the compiler change fairly often. It's probably best to look at them if you want to see the current and most accurate description of D.

(I don't claim to know anything about UTF, so I won't try to address the substance of your message. :) )

--
Justin
http://jcc_7.tripod.com/d/
Mar 11 2004