digitalmars.D - print non-ASCII/UTF-8 string
- Egor Starostin (16/16) Dec 22 2006 Let's say that file q.txt contains some characters bigger than 0x7f (for
- Pragma (27/44) Dec 22 2006 It's funny that you should bring this up now. I had a thread over in
- Egor Starostin (3/9) Dec 22 2006 It's not my case, I think.
- Jarrett Billingsley (5/8) Dec 22 2006 Hm. This might be one case where printf is actually useful:
- Thomas Kuehne (11/19) Dec 22 2006 -----BEGIN PGP SIGNED MESSAGE-----
- BCS (4/23) Dec 22 2006 This works as well. But only because array parts are in the correct
- Bruno Medeiros (7/20) Dec 23 2006 Or rather:
Let's say that file q.txt contains some characters bigger than 0x7f (for
example, from windows-1252 encoding).
In that case the following snippet:
***
import std.stdio;
import std.stream;
void main() {
    Stream f = new BufferedFile("q.txt");
    foreach (char[] l; f) {
        writefln("%s", l);
    }
}
***
will fail with 'Error: 4invalid UTF-8 sequence' because D's strings are in
UTF-8, right?
My question is: is there any way to print out non-UTF-8 data exactly in the
same encoding (which may be unknown) as in original file?
Dec 22 2006
Egor Starostin wrote:
> My question is: is there any way to print out non-UTF-8 data exactly in the
> same encoding (which may be unknown) as in original file?
It's funny that you should bring this up now. I had a thread over in
d.D.learn regarding this very thing. The following should help you get
started:
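// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
char[] Latin1ToUTF8(char[] value){
    char[] result;
    for(size_t i=0; i<value.length; i++){
        char ch = value[i];
        if(ch < 0x80){
            result ~= ch; // ASCII passes through unchanged
        }
        else{
            // Code points 0x80-0xFF become a two-byte sequence.
            result ~= cast(char)(0xC0 | (ch >> 6));
            result ~= cast(char)(0x80 | (ch & 0x3F));
        }
    }
    return result;
}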
(this could be optimized to use fewer concatenations, but I think it
gets the point across)
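A rough sketch of the "fewer concatenations" idea, written against present-day D rather than the 2006 compiler and not taken from this thread: count the bytes that need two UTF-8 code units, allocate once, then fill the buffer. The name `latin1ToUtf8Prealloc` and the `ubyte` input type are assumptions made for illustration.

```d
import std.stdio;

// Hypothetical helper, not from the thread: converts ISO-8859-1 bytes to
// UTF-8 with a single allocation instead of one append per character.
char[] latin1ToUtf8Prealloc(const(ubyte)[] value)
{
    // First pass: every byte >= 0x80 needs two UTF-8 code units.
    size_t needed = value.length;
    foreach (b; value)
        if (b >= 0x80)
            ++needed;

    auto result = new char[needed];
    size_t pos = 0;

    // Second pass: fill the preallocated buffer.
    foreach (b; value)
    {
        if (b < 0x80)
        {
            result[pos++] = cast(char) b;
        }
        else
        {
            result[pos++] = cast(char)(0xC0 | (b >> 6));
            result[pos++] = cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result;
}

void main()
{
    // "café" in Latin-1: the final byte 0xE9 becomes 0xC3 0xA9 in UTF-8.
    ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    writeln(latin1ToUtf8Prealloc(latin1));
}
```

Like the loop version, this only covers true Latin-1, where each byte value equals its Unicode code point.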
I have no clue how to work from other code pages, as I gather the
transform would be far less straightforward than it is for Latin-1.
Also, I have no idea how to *detect* what code page is being used
based on the input set. I don't even know if that's possible; like you,
I'd love to hear about it should someone else know of an algorithm.
--
- EricAnderton at yahoo
Dec 22 2006
> It's funny that you should bring this up now. I had a thread over in
> d.D.learn regarding this very thing. The following should help you get
> started:
It's not my case, I think. I don't need to convert to UTF-8. I just need to raw print exactly the same string as in original file.
Dec 22 2006
"Egor Starostin" <egorst gmail.com> wrote in message news:emgvkj$1ll6$1 digitaldaemon.com...
> I don't need to convert to UTF-8. I just need to raw print exactly the same string as in original file.
Hm. This might be one case where printf is actually useful:
    foreach(l; f)
        printf("%s\n", toStringz(l));
Dec 22 2006
Jarrett Billingsley wrote on 2006-12-22:
> Hm. This might be one case where printf is actually useful:
> foreach(l; f) printf("%s\n", toStringz(l));
This should work more reliably and consume fewer resources:
    printf("%.*s\n", l.length, l.ptr);
Thomas
Dec 22 2006
Thomas Kuehne wrote:
> printf("%.*s\n", l.length, l.ptr);
This works as well, but only because the array's parts (length, then pointer) are in the correct order to begin with:
    printf("%.*s\n", l);
Dec 22 2006
Jarrett Billingsley wrote:
> Hm. This might be one case where printf is actually useful:
> foreach(l; f) printf("%s\n", toStringz(l));
Or rather:
    dout.write(cast(ubyte[]) line);
?
--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Dec 23 2006
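For readers on a present-day D compiler, where `std.stream` is no longer part of Phobos, the same "print the bytes untouched" idea might be sketched roughly as follows. This is an illustration under that assumption, not code from the thread; `copyRaw` is a hypothetical helper name, while `File.byChunk` and `File.rawWrite` are existing `std.stdio` calls.

```d
import std.stdio;

// Hypothetical helper: copies bytes from one File to another without
// decoding them, so the original encoding (whatever it is) is preserved.
void copyRaw(File input, File output)
{
    // byChunk yields ubyte[] slices, so no UTF-8 validation is involved.
    foreach (ubyte[] chunk; input.byChunk(4096))
        output.rawWrite(chunk);
}

void main()
{
    // "q.txt" is the file name used in the original question.
    copyRaw(File("q.txt", "rb"), stdout);
}
```

Because the data is handled as `ubyte[]` throughout, the program never needs to know which code page the file uses.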








