digitalmars.D - ASCII to UTF conversion?

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

Maybe I missed something in the D Docs, but is there a way to convert from 
ASCII to UTF?  Sometimes problems arise when dealing with non-UTF-aware 
functions (like those in some libraries), when they return ASCII strings 
that have characters above 0x7F.  All it ends me up with is heartache and 
"4Invalid UTF-8 Sequence" exceptions.

So is there a standard function for doing this, or would I just be better 
off looping through the string and replacing any above-0x7F characters with 
underscores or something?
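The fallback described above (looping over the bytes and substituting a placeholder) could be sketched like this, assuming present-day D (D2). The name `replaceNonAscii` is hypothetical and not a Phobos function. Note that the approach is lossy: every byte above 0x7F becomes the same underscore.

```d
/// Hypothetical helper: replaces every byte above 0x7F with a placeholder so
/// that the result is pure ASCII, and therefore also valid UTF-8.
string replaceNonAscii(const(ubyte)[] bytes, char placeholder = '_')
{
    auto result = new char[bytes.length];
    foreach (i, b; bytes)
        result[i] = b < 0x80 ? cast(char) b : placeholder;
    return result.idup;
}

unittest
{
    const(ubyte)[] input = [0x63, 0x61, 0x66, 0xE9]; // "café" in Latin-1
    assert(replaceNonAscii(input) == "caf_");
}
```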

Nov 28 2005

Anders F Björklund <afb algonet.se> writes:

Jarrett Billingsley wrote:

 Maybe I missed something in the D Docs, but is there a way to convert from 
 ASCII to UTF?  Sometimes problems arise when dealing with non-UTF-aware 
 functions (like those in some libraries), when they return ASCII strings 
 that have characters above 0x7F.  All it ends me up with is heartache and 
 "4Invalid UTF-8 Sequence" exceptions.

You need to find out which encoding your non-UTF functions return.
Hint: it's not ASCII, as that is a 7-bit encoding compatible with UTF-8.

 So is there a standard function for doing this, or would I just be better 
 off looping through the string and replacing any above-0x7F characters with 
 underscores or something? 

There are no functions in Phobos (as far as I know), but libiconv works.
See: http://www.prowiki.org/wiki4d/wiki.cgi?CharsAndStrs ("8 bit enc.")

--anders

Nov 29 2005

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Jarrett Billingsley wrote:
 Maybe I missed something in the D Docs, but is there a way to convert from 
 ASCII to UTF?  Sometimes problems arise when dealing with non-UTF-aware 
 functions (like those in some libraries), when they return ASCII strings 
 that have characters above 0x7F.  All it ends me up with is heartache and 
 "4Invalid UTF-8 Sequence" exceptions.
 
 So is there a standard function for doing this, or would I just be better 
 off looping through the string and replacing any above-0x7F characters with 
 underscores or something? 

ASCII to UTF-8 is simple: ASCII is a subset of UTF-8, so a pure ASCII string 
is already valid UTF-8.

But since you mention characters above 0x7F, I assume you mean something 
other than ASCII...
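A minimal sketch of the pure-ASCII case, assuming present-day D (D2). The name `asciiToUtf8` is hypothetical and not a Phobos function. It only verifies that every byte is below 0x80 and then reinterprets the bytes as UTF-8, since no byte needs to change:

```d
import std.exception : enforce;

/// Hypothetical helper: returns the input as a UTF-8 string if every byte is
/// 7-bit ASCII, and throws otherwise. No byte is modified.
string asciiToUtf8(const(ubyte)[] bytes)
{
    foreach (b; bytes)
        enforce(b < 0x80, "input is not pure ASCII");
    // Copy so the result is immutable and independent of the input buffer.
    return cast(string) bytes.idup;
}

unittest
{
    const(ubyte)[] input = [0x44, 0x21]; // "D!"
    assert(asciiToUtf8(input) == "D!");
}
```

Input containing a byte such as 0xE9 makes the helper throw, which is a sign that the data is in some 8-bit encoding rather than ASCII.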

Here is a simple Latin-1 to UTF-16 converter:

(Disclaimer: no code is tested.)

For 8-bit character sets other than Latin-1 (ISO 8859-1) you will need a 
library to supply the mapping. (Unicode's lower 256 code points map 1:1 
to Latin-1)
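The 1:1 mapping mentioned above keeps such a converter short. A minimal sketch, assuming present-day D (D2) and input that really is Latin-1. The names `latin1ToUtf16` and `latin1ToUtf8` are hypothetical and not Phobos functions. The UTF-8 variant is included because bytes above 0x7F need two UTF-8 code units each:

```d
/// Hypothetical helper: widens each Latin-1 byte to one UTF-16 code unit.
wstring latin1ToUtf16(const(ubyte)[] latin1)
{
    auto result = new wchar[latin1.length];
    foreach (i, b; latin1)
        result[i] = cast(wchar) b; // byte 0xNN is code point U+00NN
    return result.idup;
}

/// Hypothetical helper: encodes each Latin-1 byte as one or two UTF-8 bytes.
string latin1ToUtf8(const(ubyte)[] latin1)
{
    char[] result;
    foreach (b; latin1)
    {
        if (b < 0x80)
        {
            result ~= cast(char) b; // ASCII range: unchanged
        }
        else
        {
            // U+0080..U+00FF: two-byte form 110xxxxx 10xxxxxx
            result ~= cast(char) (0xC0 | (b >> 6));
            result ~= cast(char) (0x80 | (b & 0x3F));
        }
    }
    return result.idup;
}

unittest
{
    const(ubyte)[] input = [0x63, 0x61, 0x66, 0xE9]; // "café" in Latin-1
    assert(latin1ToUtf16(input) == "caf\u00E9"w);
    assert(latin1ToUtf8(input) == "caf\u00E9");
}
```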

/Oskar

Nov 29 2005

"Walter Bright" <newshound digitalmars.com> writes:

"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message
news:dmgmc4$hed$1 digitaldaemon.com...
 Maybe I missed something in the D Docs, but is there a way to convert from
 ASCII to UTF?  Sometimes problems arise when dealing with non-UTF-aware
 functions (like those in some libraries), when they return ASCII strings
 that have characters above 0x7F.  All it ends me up with is heartache and
 "4Invalid UTF-8 Sequence" exceptions.

 So is there a standard function for doing this, or would I just be better
 off looping through the string and replacing any above-0x7F characters with
 underscores or something?

You can try the functions in std.charset.

Nov 29 2005

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
news:dmgmc4$hed$1 digitaldaemon.com...

Thanks for the replies!  Walter's suggestion is what I was looking for - 
totally missed those functions.

And yes, I suppose I meant "Latin 1."  I didn't realize that the formal 
definition of ASCII was still so strict as to mean just the characters 
between 0x0 and 0x7F; for me, characters between 0x0 and 0xFF have always 
been "ASCII."  I guess that's what happens when you only have five years of 
programming experience.

Nov 29 2005
