digitalmars.D - using encodings other than UTF-8 - part 2

reply "Piotr Dworaczyk" <pjdworaczyk o2.pl> writes:
Hi,
is there any way to process non-ASCII characters in encodings other than
UTF-8?

I asked a similar question some time ago
(http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=54417)
about Polish national characters, but haven't found any examples since.

D is very seductive as a better-designed language in the C/C++/Java/C#
family.
As a hobby programmer and CS teacher I have already thought about using it
as a teaching tool,
but, please understand, the character encoding issues are a no-go.

To convey the significance of the problem: besides UTF-8 (which still isn't
very popular),
there are two major encodings of the Polish national characters:
windows-1250 (cp-1250) and iso-8859-2.

I have already thought about running a conversion to UTF-8 before the D
program's launch, and a conversion from UTF-8 back to the appropriate
encoding afterwards, but that makes little sense.

So is there a way to do this, or could support for it be added in future
versions of the standard library?

Thanks for your answers,

Piotr Dworaczyk

--
Using Opera Mail: http://www.opera.com/mail/
Oct 06 2007
parent reply Marcin Kuszczak <aarti interia.pl> writes:
I use something like this for converting from other encodings:

import std.file;
import std.windows.charset;

char[] readFile(char[] name) {
        // read the raw bytes and append the terminating zero that fromMBSz expects
        char[] file=cast(char[])std.file.read(name) ~ '\0';
        // convert from codepage 1250 to UTF-8
        file=std.windows.charset.fromMBSz(cast(char*)file, 1250);
        return file;
}

and similarly for writing:
        std.windows.charset.toMBSz(content, 1250);      //1250 - Polish Windows codepage

It seems that it works on Windows (and only on Windows).

But I have to agree that support for codepages should be much better
(e.g. easily detecting the current codepage)...

For docs, see:
http://www.digitalmars.com/d/phobos/std_windows_charset.html

--
Regards
Marcin Kuszczak
(Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://www.zapytajmnie.com (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
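For readers on a current D2 toolchain, a portable variant of the same conversion might look roughly like the sketch below. It assumes Phobos' std.encoding module and its Windows1250String type, which as far as I know postdate this thread and are not part of the Phobos version discussed here. The function name cp1250ToUtf8 is made up for illustration.

```d
import std.encoding : transcode, Windows1250String;

/// Decode raw Windows-1250 bytes into a UTF-8 string.
/// (cp1250ToUtf8 is a hypothetical helper name, not a Phobos function.)
string cp1250ToUtf8(const(ubyte)[] raw)
{
    string utf8;
    // Reinterpret the bytes as Windows-1250 code units; transcode then
    // maps each of them to the corresponding UTF-8 sequence.
    transcode(cast(Windows1250String) raw, utf8);
    return utf8;
}
```

The bytes would typically come from std.file.read, cast to const(ubyte)[]. The reverse direction should work the same way with the argument types swapped, and std.encoding's Latin2String should cover ISO-8859-2, again assuming a Phobos version that ships those types.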
Oct 06 2007
parent reply "Piotr Dworaczyk" <pjdworaczyk o2.pl> writes:
 I use something like this for converting from other encodings:

 char[] readFile(char[] name) {
         char[] file=cast(char[])std.file.read(name) ~ '\0';
         file=std.windows.charset.fromMBSz(cast(char*)file, 1250);
         return file;
 }

 and similarly for writing:
         std.windows.charset.toMBSz(content, 1250);      //1250 - polish windows codepage

 But I have to agree that support for codepages should be much better
 (e.g. easy detecting current codepage)...

Welcome to The Club :)

--
Using Opera Mail: http://www.opera.com/mail/
Oct 06 2007
parent Jay Norwood <jayn io.com> writes:
The FOX Toolkit project has made a big effort over the last couple of years to add a bunch of text codecs. Perhaps porting that library to D would supply the support you want.

http://www.fox-toolkit.org/fox.html
Oct 07 2007