digitalmars.D.bugs - Weird error on char literal outside UTF-16 or UTF-32 range

Stewart Gordon <smjg_1998 yahoo.com> writes:
dchar qwert = '\U00110000';
----------
D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF 
character \U08x
----------

I don't know if it's intended behaviour to reject UTF-32 codes that are 
outside the range that's valid so far.  But that error message doesn't 
exactly make sense.

It's the exact same error for any value above '\U0010FFFF' AFAICT, and 
also for the 'permanently unassigned' codes ('\U0000FFFF', '\U0000FFFE', 
'\uFFFF', '\uFFFE')....

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.
Aug 10 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfa6p8$1i5a$1 digitaldaemon.com>, Stewart Gordon says...
> dchar qwert = '\U00110000';
> ----------
> D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF 
> character \U08x
> ----------

I'll leave Walter to comment on that error message.
> I don't know if it's intended behaviour to reject UTF-32 codes that are 
> outside the range that's valid so far.

Yes and no. As I understand it, it goes like this:

# dchar qwert = 0x00110000;    // should succeed
# dchar qwert = '\U00110000';  // should fail

It's only because you put it inside a character literal that you got 
problems - and I think that's reasonable, because (as you know), there is 
no such character as U+110000, but there /is/ such a number as 0x110000.

There are some fancy esoteric reasons why you might want to store 
noncharacters in a dchar, but only if you /really/ know what you're doing - 
and in such circumstances you would never pass such a value to a UTF 
conversion function, because you /know/ it's going to fail to validate.
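
As a rough illustration of the "fail to validate" point, here is a minimal sketch that assumes a current Phobos. The calls `std.utf.validate` and `std.exception.assertThrown` are my choice for the demonstration and are not named anywhere in this thread; the variable names are hypothetical.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // The number 0x110000 is stored through an explicit cast, so no
    // character literal is involved and the lexer has nothing to reject.
    dchar[] text = [cast(dchar) 'a', cast(dchar) 0x00110000];

    // Validation is expected to reject the out-of-range value.
    assertThrown!UTFException(validate(text));

    // A string made only of real characters validates without throwing.
    dchar[] fine = "ab"d.dup;
    validate(fine);
}
```

The value is perfectly storable as a number; it is only the step that treats it as a character that fails.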
> But that error message doesn't 
> exactly make sense.

I can't argue with that.

Arcane Jill
Aug 10 2004
"Walter" <newshound digitalmars.com> writes:
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:cfa6p8$1i5a$1 digitaldaemon.com...
> dchar qwert = '\U00110000';
> ----------
> D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF
> character \U08x
> ----------
>
> I don't know if it's intended behaviour to reject UTF-32 codes that are
> outside the range that's valid so far.  But that error message doesn't
> exactly make sense.

That's 'cuz the format is supposed to be \\U%08x, not \\U08x <g>
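
To make the difference concrete, here is a small sketch of the corrected format string. It uses `std.format.format` purely as an assumption for illustration; the compiler's own message code is not shown in this thread, and `code` is a hypothetical stand-in for the offending value. Without the `%`, the text `08x` is ordinary literal text, which matches the `\U08x` seen in the reported message.

```d
import std.format : format;

void main()
{
    // Hypothetical stand-in for the code point from the bug report.
    uint code = 0x00110000;

    // "%08x" prints the value as eight zero-padded lowercase hex digits;
    // "\\U" is an escaped backslash followed by a literal 'U'.
    string message = format("invalid UTF character \\U%08x", code);

    assert(message == `invalid UTF character \U00110000`);
}
```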

> It's the exact same error for any value above '\U0010FFFF' AFAICT, and
> also for the 'permanently unassigned' codes ('\U0000FFFF', '\U0000FFFE',
> '\uFFFF', '\uFFFE')....

If you want to use invalid UTF characters, you'll need to do it explicitly:

    dchar qwert = cast(dchar)0x00110000;

Also, all the Phobos library functions that deal with UTF strings are only
defined to work with valid UTF characters.
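
A minimal sketch of how the explicit cast might be paired with a validity check before handing the value to any UTF routine. `std.utf.isValidDchar` is an assumption on my part as a convenient check in current Phobos; it is not mentioned in the thread.

```d
import std.utf : isValidDchar;

void main()
{
    // The explicit cast from above: the dchar now holds the number
    // 0x110000, with no claim that it is a valid character.
    dchar qwert = cast(dchar) 0x00110000;

    // Values above 0x10FFFF are reported as invalid.
    assert(!isValidDchar(qwert));

    // The highest code point in range is accepted.
    assert(isValidDchar(cast(dchar) 0x0010FFFF));
}
```

Checking first keeps out-of-range values away from library functions that are only defined for valid input.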
Aug 10 2004