digitalmars.D.learn - latin-1 encoding
- "Simen Haugen" <simen norstat.no> Jan 11 2007
- Johan Granberg <lijat.meREM OVE.gmail.com> Jan 11 2007
- "Simen Haugen" <simen norstat.no> Jan 11 2007
- Johan Granberg <lijat.meREM OVE.gmail.com> Jan 12 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Jan 12 2007
- "Frank Benoit (keinfarbton)" <benoit tionex.removethispart.de> Jan 12 2007
I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...
Jan 11 2007
Simen Haugen wrote: I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...
What are you trying to do? It would be helpful to know whether you want to read files in Latin-1 or want your whole program to use it internally.
Jan 11 2007
"Johan Granberg" wrote: What are you trying to do? It would be helpful to know whether you want to read files in Latin-1 or want your whole program to use it internally.
Reading and writing files.
Jan 11 2007
Simen Haugen wrote: "Johan Granberg" wrote: What are you trying to do? It would be helpful to know whether you want to read files in Latin-1 or want your whole program to use it internally.
Reading and writing files.
There are no string manipulation functions in the standard library that will help you there, but you could read the files as usual and store them in ubyte[] instead of char[]. If you want to use string manipulation functions, the easiest approach would be to convert to UTF-8; there was some discussion of how to do that a couple of weeks ago.
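A minimal sketch of the ubyte[] approach, assuming a current D2 compiler and Phobos (the thread predates D2, so details may differ on older releases). The helper names readBytes and writeBytes are made up for illustration; std.file.read and std.file.write are the only library calls used.

```d
import std.file : read, write;

/// Reads a whole file as raw bytes, with no assumption about its encoding.
/// (readBytes is a hypothetical helper name, not a Phobos function.)
ubyte[] readBytes(string path)
{
    // std.file.read returns void[]; viewing it as ubyte[] keeps the bytes
    // untouched and avoids claiming they are valid UTF-8, as char[] would.
    return cast(ubyte[]) read(path);
}

/// Writes raw bytes back unchanged.
/// (writeBytes is a hypothetical helper name, not a Phobos function.)
void writeBytes(string path, const(ubyte)[] data)
{
    write(path, data);
}
```

Keeping the data in ubyte[] means nothing will treat a Latin-1 byte such as 0xE5 as the start of a UTF-8 sequence by accident.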
Jan 12 2007
Simen Haugen wrote: "Johan Granberg" wrote: What are you trying to do? It would be helpful to know whether you want to read files in Latin-1 or want your whole program to use it internally.
Reading and writing files.
Now I'm no expert in character encodings, but isn't Latin-1 just the first 256 code points (or whatever they're called) of Unicode, packed into a single byte per character? If so, it should be pretty trivial to convert Latin-1 characters to Unicode, either to wchar[]/dchar[] by direct one-to-one assignment (no multibyte sequences possible) or to char[] by using std.utf.encode, like this:
-----
// warning: incomplete, untested code
import std.utf;

ubyte[] data_lat1;
// ... fill data_lat1 array
char[] data_utf8; // perhaps preallocate this to a reasonable length
foreach (c; data_lat1) {
    // each Latin-1 byte is the Unicode code point of the same value
    std.utf.encode(data_utf8, c);
}
-----
And UTF to Latin-1 should be pretty easy too:
-----
// again: incomplete, untested code
import std.utf;

char[] data_utf; // wchar[] and dchar[] should work as well
ubyte[] data_lat1; // again, preallocate a reasonable array if you want
size_t i = 0;
while (i < data_utf.length) {
    dchar c = std.utf.decode(data_utf, i); // advances i
    assert(c < 0x100); // make sure it fits
    data_lat1 ~= cast(ubyte) c; // explicit narrowing, safe after the check
}
-----
I should note that by 'preallocate' I mean '"new" an array and set the length to 0'. Setting the length to 0 is important, since otherwise your output will get appended to the end of a default-initialized array, which isn't what you want ;)
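The two conversions described in this post can also be wrapped as functions. This is only a sketch, assuming a current D2 compiler and Phobos; the names latin1ToUtf8 and utf8ToLatin1 are made up for illustration. Unlike the snippets in the post, it throws an exception instead of asserting, since asserts are compiled out of release builds.

```d
import std.utf : decode, encode;

/// Latin-1 to UTF-8: every byte is the Unicode code point of the same value.
/// (latin1ToUtf8 is a hypothetical name, not a Phobos function.)
string latin1ToUtf8(const(ubyte)[] lat1)
{
    char[] utf8;
    utf8.reserve(lat1.length); // output needs at least one byte per input byte
    foreach (b; lat1)
        encode(utf8, cast(dchar) b); // appends one or two UTF-8 code units
    return utf8.idup;
}

/// UTF-8 to Latin-1: only code points up to U+00FF can be represented.
/// (utf8ToLatin1 is a hypothetical name, not a Phobos function.)
ubyte[] utf8ToLatin1(string utf8)
{
    ubyte[] lat1;
    lat1.reserve(utf8.length); // output is never longer than the input
    size_t i = 0;
    while (i < utf8.length)
    {
        dchar c = decode(utf8, i); // advances i past the decoded sequence
        if (c > 0xFF)
            throw new Exception("code point does not fit in Latin-1");
        lat1 ~= cast(ubyte) c;
    }
    return lat1;
}
```

For example, the Latin-1 bytes 0x62 0x6C 0xE5 convert to the UTF-8 string "blå", and converting that string back gives the same three bytes.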
Jan 12 2007
Simen Haugen wrote: I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...
You can try the Mango project. It has a package called ICU that does conversions between various encodings and Unicode.
Jan 12 2007








