digitalmars.D - Suggestion: char.init, wchar.init and dchar.init

reply Arcane Jill <Arcane_member pathlink.com> writes:
Hi,

The default value of NaN for floating point numbers is an excellent idea. I
suggest that we do the same thing for chars, wchars and dchars.

The init value for char should (IMO) be 0xFF. Rationale - char by definition
contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
sequence. It is a clear indication of an unassigned value.
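
The claim about 0xFF can be checked mechanically. The sketch below uses Python rather than D, purely as a neutral illustration: it encodes every Unicode scalar value as UTF-8 and records which byte values ever appear. The helper names `utf8_bytes_in_use` and `is_valid_utf8` are made up for this example.

```python
def utf8_bytes_in_use():
    """Collect every byte value that occurs in the UTF-8 encoding of some scalar value."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values and cannot be encoded
        used.update(chr(cp).encode("utf-8"))
    return used


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


used = utf8_bytes_in_use()
never_used = sorted(set(range(256)) - used)

print([hex(b) for b in never_used])  # 0xff is one of the entries
print(is_valid_utf8(b"\xff"))        # False
```

Since 0xFF is never produced by any encoding, a char holding 0xFF cannot be mistaken for part of real text.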

The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
(equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
This codepoint will remain forever unassigned, precisely so that it may be used
for purposes such as this.
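
Unicode's own term for code points like U+FFFF is "noncharacter". As a minimal sketch (again in Python, for illustration only), the hypothetical helper `is_noncharacter` below encodes the standard's rule: the range U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the code points Unicode permanently reserves as noncharacters."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]

print(len(noncharacters))                  # 66
print(is_noncharacter(0xFFFF))             # True
print(unicodedata.category(chr(0xFFFF)))   # Cn (unassigned)
```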

Be it noted that the codepoint 0 is a bad choice for a default value. It
might have made sense in C, where '\0' has special meaning as a string
terminator, but in D '\0' is just another character. Unicode defines '\0' as a
control character whose interpretation is implementation dependent. Better, I
feel, to use a value with universal meaning.

Jill
Jun 07 2004
next sibling parent Ilya Minkov <minkov cs.tum.edu> writes:
Gets my vote!

-eye
Jun 07 2004
prev sibling next sibling parent "Walter" <newshound digitalmars.com> writes:
That's a good idea.
Jun 07 2004
prev sibling parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
 Hi,
 
 The default value of NaN for floating point numbers is an excellent idea. I
 suggest that we do the same thing for chars, wchars and dchars.
 
 The init value for char should (IMO) be 0xFF. Rationale - char by definition
 contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
 sequence. It is a clear indication of an unassigned value.
 
 The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
 dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
 (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
 is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
 Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
 This codepoint will remain forever unassigned, precisely so that it may be used
 for purposes such as this.
 
 Be it noted that the codepoint 0 is a bad choice for a default value. It
 might have made sense in C, where '\0' has special meaning as a string
 terminator, but in D '\0' is just another character. Unicode defines '\0' as a
 control character whose interpretation is implementation dependent. Better, I
 feel, to use a value with universal meaning.

I like the 0 initialization. It is consistent and easy to understand and
remember. And it has an important function. If anyone ever passes an
uninitialized D memory block to functions that expect a 0-terminated string
then nothing bad will happen.

But then again, I also don't like that floats are initialized to NaN.

If it HAS to be done then there should definitely be an easy-to-remember
property for the char types to test for this. Otherwise many programmers
will have a hard time remembering which value means "not a char".

Hauke
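
The concern about 0-terminated strings can be made concrete with a small model. This is a sketch in Python, not D or C: the hypothetical `c_strlen` imitates C's `strlen`, but confined to the buffer it is given, and compares a zero-filled block with one filled with 0xFF.

```python
def c_strlen(buf: bytes):
    """Model of C's strlen restricted to buf.

    Returns the index of the first 0 byte, or None if the buffer holds no
    terminator (real C code would keep reading past the end of the block).
    """
    pos = buf.find(0)
    return None if pos == -1 else pos


zero_filled = bytes(8)       # block initialized to 0
ff_filled = b"\xff" * 8      # block initialized to 0xFF

print(c_strlen(zero_filled))  # 0: reads as an empty string
print(c_strlen(ff_filled))    # None: no terminator inside the block
```

With zero initialization the block looks like an empty string to C code. With 0xFF initialization there is no terminator inside the block, so the missing initialization surfaces as an error instead of passing silently, which is the trade-off the thread is debating.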
Jun 07 2004
next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
 If it HAS to be done then there should definitely be an easy-to-remember
 property for the char types to test for this. Otherwise many programmers
 will have a hard time remembering which value means "not a char".

.init?
Jun 07 2004
prev sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca2754$h5k$1 digitaldaemon.com>, Hauke Duden says...

If it HAS to be done then there should definitely be an easy-to-remember 
property for the char types to test for this. Otherwise many programmers 
will have a hard time remembering which value means "not a char".

You're not supposed to /test/ for uninitialized variables - you're simply
supposed to initialize them! And that error, of course, is exactly what we're
trying to catch. Anyway, you could always test for "if (c == char.init)" no
matter what char.init was.

By the way, I got to look at your Unichar code today. Excellent stuff. It's on
my machine now. Also, you were right about doxygen, judging by the quality of
your documentation - it really does rock.

Jill
Jun 07 2004