
digitalmars.D - Assigning certain wchars to a char is allowed

Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Currently, D accepts the assignment of wchar literals to chars, even in
certain dubious cases. In these examples:

   char ch1 = '\u0100'; // Error: cannot convert ... type wchar to char
   char ch2 = '\u0044'; // No Error, ok.
   char ch3 = '\u00E7'; // No Error, but should it be ok?
   writefln(ch3); // prints: Error: 4invalid UTF-8 sequence

Shouldn't the third case also be an error? Although the code point
itself (0xE7) fits within the range of a char, in UTF-8 it is encoded
with two code units (0xC3 0xA7), so it cannot be stored in a single char.
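To make the encoding argument concrete, here is a minimal hand-written UTF-8 encoder in D. `utf8Encode` is a hypothetical helper written only for illustration (it is not a Phobos function), and it assumes its argument is a valid Unicode scalar value. It shows that only code points below 0x80 fit in one char: U+0044 encodes to the single unit 0x44, while U+00E7 needs 0xC3 0xA7 and U+0100 needs 0xC4 0x80.

```d
import std.stdio : writefln;

/// Hypothetical helper: encodes one code point as UTF-8 code units.
/// Assumes `c` is a valid Unicode scalar value (no surrogates, <= 0x10FFFF).
ubyte[] utf8Encode(dchar c)
{
    if (c < 0x80)        // U+0000..U+007F: one unit, identical to ASCII
        return [cast(ubyte) c];
    if (c < 0x800)       // U+0080..U+07FF: two units, 110xxxxx 10xxxxxx
        return [cast(ubyte)(0xC0 | (c >> 6)),
                cast(ubyte)(0x80 | (c & 0x3F))];
    if (c < 0x10000)     // U+0800..U+FFFF: three units
        return [cast(ubyte)(0xE0 | (c >> 12)),
                cast(ubyte)(0x80 | ((c >> 6) & 0x3F)),
                cast(ubyte)(0x80 | (c & 0x3F))];
    // U+10000..U+10FFFF: four units
    return [cast(ubyte)(0xF0 | (c >> 18)),
            cast(ubyte)(0x80 | ((c >> 12) & 0x3F)),
            cast(ubyte)(0x80 | ((c >> 6) & 0x3F)),
            cast(ubyte)(0x80 | (c & 0x3F))];
}

void main()
{
    // The three code points from the examples above.
    static immutable uint[] codePoints = [0x44, 0xE7, 0x100];
    foreach (cp; codePoints)
        writefln("U+%04X -> %(%02X %)", cp, utf8Encode(cast(dchar) cp));
}
```

Under this sketch, a code point can only be stored in a single char when `utf8Encode` returns exactly one unit, i.e. when it is below 0x80.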

-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to
be... unnatural."
Nov 25 2005
Georg Wrede <georg.wrede nospam.org> writes:
Bruno Medeiros wrote:
 Currently, D accepts the assignment of wchar literals to chars, even in
 certain dubious cases. In these examples:
 
   char ch1 = '\u0100'; // Error: cannot convert ... type wchar to char
   char ch2 = '\u0044'; // No Error, ok.
   char ch3 = '\u00E7'; // No Error, but should it be ok?
   writefln(ch3); // prints: Error: 4invalid UTF-8 sequence
 
 Shouldn't the third case also be an error? Although the code point
 itself (0xE7) fits within the range of a char, in UTF-8 it is encoded
 with two code units (0xC3 0xA7), so it cannot be stored in a single char.
Yes. Sort of. Check the String Unified Theory thread for more info. It'll be fixed. (But it's not as easy as just making that an error: the fix has to be, and will be, part of a larger rework around chars, strings and such.)
Nov 25 2005