D - Documentation Error

Unicode User <Unicode_member pathlink.com> writes:

Hi,

Not sure if this is the right place to report this. I am very, VERY impressed
with D - especially with the UTF support. Spending some time learning D now.

But there's an error in the documentation of the Basic Data Types. It says:
"char = unsigned 8 bit ASCII".

I would point out that ASCII, *BY DEFINITION*, is only seven bits wide, and
therefore the phrase "8 bit ASCII" is meaningless and open to all sorts of
misinterpretation. Three corrections are possible, and I don't know
which one is right:
1. char = unsigned 7 bit ASCII
2. char = unsigned 8 bit UTF-8
3. char = unsigned 8 bit ISO 8859-1 (aka Latin-1)

Please note that while choice 3 is a subset of Unicode, it is incompatible with
choice 2. The three choices differ in how byte values 0x80 to 0xFF are
interpreted. Specifically:
1. ASCII - values 0x80 to 0xFF are undefined
2. UTF-8 - values 0x80 to 0xFF occur only as parts of multi-byte UTF-8 sequences
3. ISO 8859-1 - values 0x80 to 0xFF represent Unicode chars U+0080 to
U+00FF.
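
To make the difference concrete, here is a minimal sketch that reads the single byte 0xE9 under each of the three definitions. It is written in Python only because the point is language-neutral; it says nothing about how D itself handles char, and the helper name try_decode is made up for this illustration.

```python
# The same byte value, 0xE9, read under each of the three candidate definitions.
raw = bytes([0xE9])


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short note if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return "<invalid in " + encoding + ">"


as_ascii = try_decode(raw, "ascii")         # ASCII stops at 0x7F, so this fails
as_latin1 = try_decode(raw, "iso-8859-1")   # one byte maps directly to U+00E9
as_utf8 = try_decode(raw, "utf-8")          # a lone 0xE9 is an incomplete sequence

# In UTF-8, U+00E9 takes two bytes, and both fall in the 0x80..0xFF range.
utf8_bytes = "\u00e9".encode("utf-8")

# ascii() keeps the output printable on any terminal encoding.
print(as_ascii, ascii(as_latin1), as_utf8, utf8_bytes.hex())
```

Only the ISO 8859-1 reading turns the lone byte into a character. Under UTF-8 the same character needs the two-byte sequence 0xC3 0xA9, which is why choices 2 and 3 cannot both hold for the same data.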

This seems a simple thing to fix. If this is not the right place to report this,
please can someone point me to the right place. Thanks.

Mar 11 2004

"Walter" <walter digitalmars.com> writes:

You must be looking at an old version. The current doc defines char as
unsigned 8 bit UTF-8. -Walter

Mar 11 2004
