digitalmars.D - UTF-8 bug
- Arcane Jill <Arcane_member pathlink.com> Jun 05 2004
- "Walter" <newshound digitalmars.com> Jun 05 2004
The following is correct behavior, and is implemented correctly. Nice one! The compiler correctly rejects the following line.

    char c = 'ß'; // compile error - invalid UTF-8 sequence

However, we see a related bug in the following example:

    char c = 0xC3; // first byte of a UTF-8 sequence
    wchar w = c;

This auto-promotion should fail, throwing a runtime exception (because 0xC3 by itself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds, as though c had contained an ISO-8859-1 character instead of a UTF-8 fragment.

Arcane Jill
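A minimal sketch of the behavior described in this post, assuming a reasonably recent D compiler (the compiler under discussion in 2004 may have differed in details). It spells 'ß' as the escape `\u00DF` so the result does not depend on how the source file is encoded, and uses asserts rather than printed output:

```d
void main()
{
    // 'ß' (U+00DF) takes two UTF-8 code units, so it cannot fit in one char.
    string s = "\u00DF";
    assert(s.length == 2);
    assert(s[0] == 0xC3 && s[1] == 0x9F);

    // The lone lead byte is promoted as if it were a complete character.
    char c = 0xC3;      // first byte of a UTF-8 sequence
    wchar w = c;        // implicit promotion, no runtime check
    assert(w == 0x00C3); // U+00C3, the ISO-8859-1 reading of the byte
}
```

The last assert is the complaint in concrete form: the promotion widens the code unit value unchanged instead of rejecting an incomplete sequence.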
Jun 05 2004
"Arcane Jill" <Arcane_member pathlink.com> wrote in message news:c9s7nu$1255$1 digitaldaemon.com...
> The following is correct behavior, and is implemented correctly. Nice one! The
> compiler correctly rejects the following line.
>
>     char c = 'ß'; // compile error - invalid UTF-8 sequence
>
> However, we see a related bug in the following example:
>
>     char c = 0xC3; // first byte of a UTF-8 sequence
>     wchar w = c;
>
> This auto-promotion should fail, throwing a runtime exception (because 0xC3 by
> itself is an invalid UTF-8 sequence). Current behavior is that the cast
> succeeds, as though c had contained an ISO-8859-1 character instead of a UTF-8
> fragment.

I see what you're saying. Doing so would require a runtime test; I'm not sure about the tradeoffs.
Jun 05 2004
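To make the tradeoff concrete, here is a rough sketch of what such a runtime test could look like if written by hand. `promoteChecked` is a hypothetical helper name, not part of the language or of Phobos, and the choice of a plain `Exception` is an assumption made for illustration:

```d
// Hypothetical helper: promote a single UTF-8 code unit to wchar,
// rejecting anything that is not a complete one-byte sequence.
wchar promoteChecked(char c)
{
    // Only values below 0x80 are complete UTF-8 sequences on their own;
    // 0x80 and above are lead or continuation bytes of longer sequences.
    if (c >= 0x80)
        throw new Exception("char is not a complete UTF-8 sequence");
    return c; // safe: ASCII maps to the same code point in UTF-16
}

void main()
{
    assert(promoteChecked('A') == 'A');

    bool threw = false;
    try
        promoteChecked(cast(char) 0xC3); // lone lead byte
    catch (Exception e)
        threw = true;
    assert(threw);
}
```

Under these assumptions the test is a single comparison and branch per promotion, which is the cost that would have to be weighed against the current unchecked behavior.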
"Walter" <newshound digitalmars.com>