digitalmars.D - using encodings other than UTF-8

digitalmars.D - using encodings other than UTF-8 - part 2

Piotr Dworaczyk (28/28) Oct 06 2007 Hi,

Marcin Kuszczak (21/50) Oct 06 2007 I use something like this for converting from other encodings:

Piotr Dworaczyk (5/17) Oct 06 2007 Welcome to The Club :)

Jay Norwood (3/27) Oct 07 2007 The fox tools project had a big effort to add a bunch of text codecs ove...

"Piotr Dworaczyk" <pjdworaczyk o2.pl> writes:

Hi,
is there any way to process non-ASCII characters in encodings other than
UTF-8?

I asked a similar question some time ago
(http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=54417)
about Polish national characters, but haven't found any examples since.



language.
As a hobby programmer and CS teacher I have already thought about using it as a
teaching tool,
but, please understand, the character encoding issues are a no-go.

To show the significance of the problem: besides
UTF-8 (which still isn't very popular),
there are two major encodings of the Polish national characters:
windows-1250 (cp-1250) and iso-8859-2.

I have already thought about converting the input to UTF-8 before the D
program's launch, and converting its output from UTF-8 to the appropriate encoding,
but that makes little sense.

So is there a way, or could support be added in future
versions of the standard library?

Thanks for your answers,

Piotr Dworaczyk

-- 
Using Opera Mail: http://www.opera.com/mail/

Oct 06 2007

Marcin Kuszczak <aarti interia.pl> writes:


I use something like this for converting from other encodings:

import std.file;
import std.windows.charset;

// Reads a windows-1250 file and returns its contents as UTF-8.
char[] readFile(char[] name) {
        // fromMBSz expects a zero-terminated string, so append '\0'
        char[] file=cast(char[])std.file.read(name) ~ '\0';
        // 1250 - Polish Windows codepage
        file=std.windows.charset.fromMBSz(cast(char*)file, 1250);
        return file;
}

and similarly for writing:
        std.windows.charset.toMBSz(content, 1250);      //1250 - Polish Windows codepage

This seems to work on Windows (and only on Windows). But I have to agree
that support for codepages should be much better (e.g. easy detection of the
current codepage)...

For docs, see: http://www.digitalmars.com/d/phobos/std_windows_charset.html
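
A minimal sketch of the codepage detection mentioned above, not code from this thread. It assumes a D2 compiler whose druntime exposes the Win32 function GetACP through core.sys.windows.winnls; the name currentCodepage and the non-Windows fallback value of 0 are hypothetical and chosen only for illustration.

```d
version (Windows)
{
    import core.sys.windows.winnls : GetACP;

    /// Returns the active ANSI codepage reported by Windows
    /// (for example 1250 on a Polish setup).
    uint currentCodepage()
    {
        return GetACP();
    }
}
else
{
    /// Hypothetical fallback: other systems have no Windows ANSI
    /// codepage, so this sketch simply reports 0.
    uint currentCodepage()
    {
        return 0;
    }
}
```

The result could be passed as the codePage argument of fromMBSz and toMBSz in place of the hard-coded 1250. According to the std.windows.charset docs, a codePage of 0 (the default) already selects the system's ANSI codepage, so leaving the argument out may be enough when the file uses the machine's own encoding.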


-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://www.zapytajmnie.com (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------

Oct 06 2007

"Piotr Dworaczyk" <pjdworaczyk o2.pl> writes:

 I use something like this for converting from other encodings:

 char[] readFile(char[] name) {

         char[] file=cast(char[])std.file.read(name) ~ '\0';
         file=std.windows.charset.fromMBSz(cast(char*)file, 1250);
         return file;
 }

 and similarly for writing:
         std.windows.charset.toMBSz(content, 1250);      //1250 - polish windows codepage

Thanks / Dzięki / for the code.

 But I have to agree that support for codepages should be much better
 (e.g. easy detecting current codepage)...

Welcome to The Club :)



Oct 06 2007

Jay Norwood <jayn io.com> writes:


The FOX toolkit project has put a lot of effort into adding text codecs over the
last couple of years. Perhaps porting that library to D would supply the
support you want.

http://www.fox-toolkit.org/fox.html
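
For comparison, here is a sketch of what library-level codec support can look like without any Windows-specific calls. It is not from this thread: it assumes a recent D2 release of Phobos in which std.encoding provides Windows1250String and transcode, and the helper names fromCp1250 and toCp1250 are hypothetical.

```d
import std.encoding : Windows1250String, transcode;

/// Decodes windows-1250 bytes into a UTF-8 string.
string fromCp1250(const(ubyte)[] raw)
{
    string utf8;
    // Reinterpret the raw bytes as windows-1250 text, then transcode to UTF-8.
    transcode(cast(Windows1250String) raw.idup, utf8);
    return utf8;
}

/// Encodes UTF-8 text as windows-1250 bytes.
immutable(ubyte)[] toCp1250(string text)
{
    Windows1250String encoded;
    transcode(text, encoded);
    return cast(immutable(ubyte)[]) encoded;
}
```

Under the same assumption, iso-8859-2 could be handled the same way by swapping in Latin2String. Reading and writing the bytes themselves can stay with std.file.read and std.file.write.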

Oct 07 2007
