D - UTF8/16 always 8/16 bits ?
- Achilleas Margaritis <Achilleas_member pathlink.com> Apr 22 2004
- Ben Hinkle <bhinkle4 juno.com> Apr 22 2004
- "Scott Egan" <scotte tpg.com.aux> Apr 22 2004
The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?
Apr 22 2004
On Thu, 22 Apr 2004 11:57:31 +0000 (UTC), Achilleas Margaritis <Achilleas_member pathlink.com> wrote:
The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?
In std.utf (http://www.digitalmars.com/d/phobos.html#utf) there are functions like dchar decode(char[] s, inout uint idx) that take a UTF-8 char[] and an index, return the UTF-32 code point, and advance the index by one or more bytes. A char is always 8 bits and a wchar is always 16 bits, but one character may occupy several of them. The regular array indexing [] doesn't know about multi-slot characters. -Ben
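To illustrate the decode loop described above, here is a minimal sketch. It assumes a current D compiler and Phobos, where the index parameter of std.utf.decode is a ref size_t rather than the inout uint shown in the 2004 signature. The helper name codePoints is made up for this example.

```d
import std.stdio : writefln;
import std.utf : decode;

// Hypothetical helper: walk a UTF-8 string and collect its code points.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t i = 0;                // index in UTF-8 code units (bytes)
    while (i < s.length)
        result ~= decode(s, i);  // decode advances i by 1 to 4 code units
    return result;
}

void main()
{
    // 'a' takes 1 byte, U+00E9 takes 2 bytes, U+20AC takes 3 bytes.
    string s = "a\u00E9\u20AC";
    writefln("%s code units, %s code points", s.length, codePoints(s).length);
}
```

Note that s.length counts code units, not characters, which is why the two numbers differ for any string containing non-ASCII text.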
Apr 22 2004
It doesn't. Although they are called UTF-8 and UTF-16, they are just arrays of appropriately sized chars. The O/S is what really has to deal with them as Unicode. This means, of course, that by using indexes against the char[] and mucking around with the data you may end up with invalid Unicode. Such is life.

"Achilleas Margaritis" <Achilleas_member pathlink.com> wrote in message news:c68bvb$1vgk$1 digitaldaemon.com...
The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?
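The risk of ending up with invalid Unicode can be sketched as follows. This again assumes a current Phobos, where std.utf.validate throws a UTFException on malformed input. The wrapper isValidUtf8 is a made-up name for this example.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical wrapper: report whether s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on malformed UTF-8
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string s = "a\u00E9";          // 3 code units: 'a' plus a 2-byte sequence
    writeln(isValidUtf8(s));       // the whole string is well formed
    writeln(isValidUtf8(s[0 .. 2])); // the slice cuts the 2-byte sequence in half
}
```

Slicing itself does not complain, because it works on code units. The damage only shows up when something later tries to decode the result.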
Apr 22 2004