digitalmars.D.learn - latin-1 encoding

"Simen Haugen" <simen norstat.no> writes:
I'm just starting to look at D, but I can't seem to find any encodings for 
latin-1 in the standard library... 
Jan 11 2007
Johan Granberg <lijat.meREM OVE.gmail.com> writes:
Simen Haugen wrote:

 I'm just starting to look at D, but I can't seem to find any encodings for
 latin-1 in the standard library...

What are you trying to do? It would be helpful to know if you want to read files in latin-1 or if you want your whole program to use it internally.
Jan 11 2007
"Simen Haugen" <simen norstat.no> writes:
"Johan Granberg" wrote:
 What are you trying to do? It would be helpful to know if you want to read files in latin-1 or if you want your whole program to use it internally.

Reading and writing files.
Jan 11 2007
Johan Granberg <lijat.meREM OVE.gmail.com> writes:
Simen Haugen wrote:

 "Johan Granberg" wrote:
 What are you trying to do? It would be helpful to know if you want to read files in latin-1 or if you want your whole program to use it internally.

Reading and writing files.

There are no string manipulation functions in the standard library that will help you there, but you could read the files as usual and store them in ubyte[] instead of char[]. If you want to use string manipulation functions, the easiest approach would be to convert to UTF-8; there was some discussion of how to do that a couple of weeks ago.
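A minimal sketch of that approach (keep the file contents in ubyte[], convert to UTF-8 only when string functions are needed), written for present-day D2/Phobos rather than the D1 library of 2007 and assuming the whole file fits in memory. The helper names `readLatin1`, `writeLatin1` and `latin1ToUtf8` and the file name `latin1-demo.txt` are hypothetical. `std.file.read` returns the raw bytes as void[] without interpreting them, and the conversion relies on each Latin-1 byte value being the same number as its Unicode code point.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;
import std.utf : encode;

// Hypothetical helper: load the file as raw bytes; no UTF validation happens.
ubyte[] readLatin1(string path)
{
    return cast(ubyte[]) read(path);
}

// Hypothetical helper: store raw Latin-1 bytes unchanged.
void writeLatin1(string path, const(ubyte)[] data)
{
    write(path, data);
}

// Hypothetical helper: each Latin-1 byte is the code point with the same
// value, so it can be UTF-8 encoded directly.
string latin1ToUtf8(const(ubyte)[] data)
{
    char[] result;
    result.reserve(data.length); // at least one UTF-8 byte per input byte
    foreach (b; data)
        encode(result, cast(dchar) b);
    return result.idup;
}

void main()
{
    import std.stdio : writeln;

    auto path = buildPath(tempDir(), "latin1-demo.txt");
    // "Blåbær" in Latin-1: 'å' is 0xE5 and 'æ' is 0xE6
    ubyte[] original = [0x42, 0x6C, 0xE5, 0x62, 0xE6, 0x72];
    writeLatin1(path, original);
    scope (exit) remove(path);

    ubyte[] bytes = readLatin1(path);
    assert(bytes == original); // bytes survive the round trip untouched

    string text = latin1ToUtf8(bytes);
    assert(text == "Blåbær");
    writeln(text);
}
```

Writing goes the other way: decode the UTF-8 text back to code points and store each one as a single byte, as long as every code point is below 0x100.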
Jan 12 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Simen Haugen wrote:
 "Johan Granberg" wrote:
 What are you trying to do? It would be helpful to know if you want to read files in latin-1 or if you want your whole program to use it internally.

Reading and writing files.

Now I'm no expert in character encodings, but isn't Latin-1 just the first 256 codepoints (or whatever they're called) of Unicode, packed into a single byte per character? If so, it should be pretty trivial to convert Latin-1 characters to Unicode, either to wchar[]/dchar[] by direct one-to-one assignment (no multibyte sequences possible) or to char[] by using std.utf.encode, like this:

-----
// warning: incomplete, untested code
import std.utf;

ubyte[] data_lat1;
// ... fill data_lat1 array
char[] data_utf8; // perhaps preallocate this to a reasonable length
foreach (c; data_lat1)
{
    // each Latin-1 byte value equals its Unicode code point
    std.utf.encode(data_utf8, cast(dchar) c);
}
-----

And UTF to Latin-1 should be pretty easy too:

-----
// again: incomplete, untested code
import std.utf;

char[] data_utf; // wchar[] and dchar[] should work as well
ubyte[] data_lat1; // again, preallocate a reasonable array if you want
size_t i = 0;
while (i < data_utf.length)
{
    dchar c = std.utf.decode(data_utf, i); // advances i
    assert(c < 0x100); // make sure it fits in Latin-1
    data_lat1 ~= cast(ubyte) c; // explicit narrowing from dchar to ubyte
}
-----

I should note that by "preallocate" I mean "new" an array and set the length to 0. Setting the length to 0 is important, since otherwise your output will get appended to the end of a default-initialized array, which isn't what you want ;)
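The wchar[]/dchar[] route mentioned above ("direct one-to-one assignment") is not shown in the post. A minimal sketch in present-day D2, assuming the same byte-to-code-point correspondence; the function names `latin1ToDchars` and `dcharsToLatin1` are hypothetical. Unlike the `assert` in the UTF-to-Latin-1 snippet, which is compiled out in `-release` builds, the reverse direction here uses `std.exception.enforce`, so a character outside Latin-1 throws instead of being silently truncated.

```d
import std.exception : enforce;
import std.format : format;

// Hypothetical helper: widening each byte to dchar is lossless, because
// Latin-1 byte values are identical to the first 256 Unicode code points.
dchar[] latin1ToDchars(const(ubyte)[] data)
{
    auto result = new dchar[data.length];
    foreach (i, b; data)
        result[i] = cast(dchar) b;
    return result;
}

// Hypothetical helper: the reverse direction, rejecting code points above U+00FF.
ubyte[] dcharsToLatin1(const(dchar)[] text)
{
    auto result = new ubyte[text.length];
    foreach (i, c; text)
    {
        enforce(c < 0x100,
                format("U+%04X cannot be represented in Latin-1", cast(uint) c));
        result[i] = cast(ubyte) c;
    }
    return result;
}

void main()
{
    import std.exception : assertThrown;

    ubyte[] latin1 = [0x43, 0x61, 0x66, 0xE9]; // "Café" in Latin-1
    dchar[] text = latin1ToDchars(latin1);
    assert(text == "Café"d);
    assert(dcharsToLatin1(text) == latin1);

    // The euro sign (U+20AC) is not part of Latin-1, so this throws.
    assertThrown(dcharsToLatin1("€"d));
}
```

Because every element of a dchar[] is a whole code point, both loops index input and output in lockstep, which is why no std.utf calls are needed on this path.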
Jan 12 2007
"Frank Benoit (keinfarbton)" <benoit tionex.removethispart.de> writes:
Simen Haugen wrote:
 I'm just starting to look at D, but I can't seem to find any encodings for 
 latin-1 in the standard library... 
 
 

You can try the mango project. It has a package called ICU that does conversions between various encodings and Unicode.
Jan 12 2007