D - UTF8/16 always 8/16 bits?

Achilleas Margaritis <Achilleas_member pathlink.com> writes:
The Unicode standard says that UTF8 and UTF16 characters vary in size. How does
D handle this? Is it assumed that UTF8 chars are always 8 bits and UTF16 chars
are always 16 bits?
Apr 22 2004
Ben Hinkle <bhinkle4 juno.com> writes:
On Thu, 22 Apr 2004 11:57:31 +0000 (UTC), Achilleas Margaritis
<Achilleas_member pathlink.com> wrote:

> The Unicode standard says that UTF8 and UTF16 characters vary in size. How
> does D handle this? Is it assumed that UTF8 chars are always 8 bits and
> UTF16 chars are always 16 bits?

In std.utf (http://www.digitalmars.com/d/phobos.html#utf) there are functions
like

    dchar decode(char[] s, inout uint idx)

that take a UTF8 char[] and an index, return the UTF32 code point at that
index, and advance the index by one or more bytes. Regular array indexing
with [] doesn't know about multi-slot characters; it addresses individual
8-bit (or, for wchar[], 16-bit) slots.

-Ben
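
As an illustration of the decode loop described above, here is a minimal sketch. It assumes a current D compiler and Phobos, where std.utf.decode takes the index as a ref size_t rather than the inout uint shown in the 2004 signature. The helper name codePoints is made up for this example and is not part of Phobos.

```d
import std.stdio : writefln;
import std.utf : decode;

/// Hypothetical helper: decodes every code point of a UTF-8 string
/// and returns them as an array of UTF-32 code points.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t idx = 0;
    while (idx < s.length)
    {
        // decode reads one code point starting at idx and advances idx
        // past every byte that code point occupied (1 to 4 bytes).
        result ~= decode(s, idx);
    }
    return result;
}

void main()
{
    // 'a' takes 1 byte, U+00E9 takes 2 bytes, U+20AC takes 3 bytes.
    string s = "a\u00E9\u20AC";
    // prints "6 bytes, 3 code points"
    writefln("%s bytes, %s code points", s.length, codePoints(s).length);
}
```

The point of the sketch is that s.length counts 8-bit slots, not characters, so the two numbers differ as soon as the text leaves ASCII.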
Apr 22 2004
"Scott Egan" <scotte tpg.com.aux> writes:
It doesn't. Although they are called UTF-8 and UTF-16, they are just arrays of
chars of the appropriate width (8-bit char, 16-bit wchar).

The O/S is what really has to deal with them as Unicode.

This means, of course, that by indexing into the char[] and mucking
around with the data you may end up with invalid Unicode.
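
To make that concrete, here is a small sketch, again assuming a current D compiler and Phobos. It uses std.utf.validate, which throws a UTFException on malformed input; the wrapper isValidUtf8 is a name invented for this example.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: returns true if s is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // Three characters, but U+00E9 needs two bytes each time.
    string s = "\u00E9t\u00E9";
    writeln(s.length);               // 5
    writeln(isValidUtf8(s[0 .. 1])); // false: the slice cuts U+00E9 in half
    writeln(isValidUtf8(s[0 .. 2])); // true: the slice ends on a boundary

    // The same applies to UTF-16: one character outside the Basic
    // Multilingual Plane is stored as two 16-bit wchars.
    wstring w = "\U0001D11E"w;
    writeln(w.length);               // 2
}
```

So the slots are fixed-width, but nothing stops a slice or an index from landing in the middle of a multi-slot character.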

Such is life.

"Achilleas Margaritis" <Achilleas_member pathlink.com> wrote in message
news:c68bvb$1vgk$1 digitaldaemon.com...
> The Unicode standard says that UTF8 and UTF16 characters vary in size. How
> does D handle this? Is it assumed that UTF8 chars are always 8 bits and
> UTF16 chars are always 16 bits?

Apr 22 2004