
digitalmars.D - Character-code sets other than utf8

reply Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:
Happy new year.
Akemasite Omedetou Gozaimasu.

I like the D language.
D was designed as a language that uses UTF-8.
Therefore, we should use a UTF-8 console.

However, we still have to use legacy character encodings such as MS932, EUC-JP, and Shift_JIS.
(I am Japanese.)

I have been waiting for official support for more character encodings in D.

However, they are not supported yet.

Please tell me the specification for an official string conversion library, if there is one.

I want to handle encodings other than UTF-8 in an officially supported way.

I also want to write library code in an officially supported way.

http://www.digitalmars.com/d/archives/digitalmars/D/learn/1510.html

I read this thread, but I still don't know what to do...

Thanks, Hiroshi Sakurai.
Jan 15 2006
parent reply Anders F Björklund <afb algonet.se> writes:
Hiroshi Sakurai wrote:

 Happy new year.
 Akemasite Omedetou Gozaimasu.

Gott Nytt År.
 I like the D language.
 D was designed as a language that uses UTF-8.
 Therefore, we should use a UTF-8 console.

D only supports UTF-8 consoles. If you run it from a console which doesn't use UTF-8, you'll get errors. (e.g. the args[] could contain invalid Unicode...)
 However, we still have to use legacy character encodings such as MS932, EUC-JP, and Shift_JIS.
 (I am Japanese.)
 
 I have been waiting for official support for more character encodings in D.
 
 However, they are not supported yet.

You can use the "iconv" library to translate to/from legacy encodings:

"Japanese EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1"
http://www.gnu.org/software/libiconv/

--anders

PS. Old D module at http://www.algonet.se/~afb/d/libiconv.d
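
For illustration, here is a minimal sketch of calling iconv from D to turn Shift_JIS bytes into a UTF-8 string. It is not the libiconv.d module mentioned above. It assumes a current D2 compiler (not the 2006 compiler used in this thread) and a Linux system where the C library exports iconv_open, iconv, and iconv_close under those names. The extern (C) declarations are hand-written from <iconv.h>, and toUTF8From is a hypothetical helper name.

```d
import std.string : toStringz;

// Hand-written bindings to the C iconv API (assumed to match <iconv.h>).
extern (C)
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

// Hypothetical helper: convert bytes in the encoding `fromcode` to UTF-8.
string toUTF8From(const(ubyte)[] raw, string fromcode)
{
    auto cd = iconv_open("UTF-8", fromcode.toStringz);
    if (cd is cast(iconv_t) -1)
        throw new Exception("iconv does not support " ~ fromcode);
    scope (exit) iconv_close(cd);

    // Generous output buffer: four UTF-8 bytes per input byte is more
    // than the Japanese encodings listed above can produce.
    auto outBuf = new char[raw.length * 4 + 4];
    char* inPtr = cast(char*) raw.ptr; // iconv does not modify the input
    size_t inLeft = raw.length;
    char* outPtr = outBuf.ptr;
    size_t outLeft = outBuf.length;

    if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == cast(size_t) -1)
        throw new Exception("invalid or incomplete " ~ fromcode ~ " input");

    // Keep only the bytes that iconv actually wrote.
    return outBuf[0 .. $ - outLeft].idup;
}

void main()
{
    import std.stdio : writeln;

    // The two characters of "日本" encoded as Shift_JIS.
    immutable ubyte[] sjis = [0x93, 0xFA, 0x96, 0x7B];
    writeln(toUTF8From(sjis, "SHIFT_JIS"));
}
```

On systems where iconv lives in a separate library, the program would probably also need to link against it (for example by passing -L-liconv to the compiler).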
Jan 15 2006
parent reply Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:
Thank you, Anders.
Gott Nytt År.

libiconv.d is a very good library!
But I am hoping for a public domain license.

For now, I write code like this:

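import std.cstream;          // din, dout
import std.windows.charset;  // toMBSz, fromMBSz (Windows only)
import std.utf;
import std.string;           // toString, toStringz

void main() {
    // Convert UTF-8 to the console code page before writing.
    dout.writeLine(toString(toMBSz("What is your name?")));
    // fromMBSz expects a zero-terminated string, so terminate the input first.
    char[] name = fromMBSz(toStringz(din.readLine()));
    dout.writeLine(toString(toMBSz("Your name is " ~ name ~ ".")));
}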

For now, I do not write Japanese text in D.

I hope that stdio will honor locale information when I write it in dmd.conf, for example:

LOCALE=Japanese
or
CHARSET=Shift_JIS

I would like a string conversion library to be added to std.conv.

For example:

CharConv std.conv.charconv(char[] tocode, char[] fromcode);
char[] std.conv.iconv(CharConv cd, char[] inbuf);
int std.conv.iconv_close(CharConv cd);

or...

module std.conv.cp1252;
wchar[] cp1252toUTF16(ubyte[] raw) {}
dchar[] cp1252toUTF32(ubyte[] raw) {}
ubyte[] UTF16toCP1252(char[] raw)  {}
ubyte[] UTF16toCP1252(wchar[] raw) {}
ubyte[] UTF16toCP1252(dchar[] raw) {}

module std.conv.cp932;
wchar[] cp932toUTF16(ubyte[] raw) {}
dchar[] cp932toUTF32(ubyte[] raw) {}
ubyte[] UTF16toCP932(char[] raw)  {}
ubyte[] UTF16toCP932(wchar[] raw) {}
ubyte[] UTF16toCP932(dchar[] raw) {}

module std.conv.sjis;
wchar[] SJIStoUTF16(ubyte[] raw) {}
dchar[] SJIStoUTF32(ubyte[] raw) {}
ubyte[] UTF16toSJIS(char[] raw)  {}
ubyte[] UTF16toSJIS(wchar[] raw) {}
ubyte[] UTF16toSJIS(dchar[] raw) {}

module std.conv.eucjp;
wchar[] EUCJPtoUTF16(ubyte[] raw) {}
dchar[] EUCJPtoUTF32(ubyte[] raw) {}
ubyte[] UTF16toEUCJP(char[] raw)  {}
ubyte[] UTF16toEUCJP(wchar[] raw) {}
ubyte[] UTF16toEUCJP(dchar[] raw) {}

Jan 15 2006
next sibling parent reply Anders F Björklund <afb algonet.se> writes:
Hiroshi Sakurai wrote:

 libiconv.d is a very good library!
 But I am hoping for a public domain license.

I did some conversion mapping routines in D earlier, but they won't be available without copyright, sorry. Tables should be at http://www.unicode.org/Public/MAPPINGS/
 I would like a string conversion library to be added to std.conv.

Okay, will defer the question for Walter's answer then... --anders
Jan 15 2006
parent Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:
Thank you, Anders!

--Hiroshi Sakurai
Jan 15 2006
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
Would you be interested in writing such conversion routines? 
Jan 15 2006
next sibling parent Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:
In article <dqeq8e$13t$1 digitaldaemon.com>, Walter Bright says...
Would you be interested in writing such conversion routines? 

I am glad to receive a comment from Walter.

At first I did not intend to write an official conversion routine. But users on the 2ch BBS say "the D language cannot be used", and reading that is very frustrating for me (T-T). So now I do want to write one.

So far I have only been porting existing conversion code and playing with it. Therefore, I would also like to write routines against an official specification.

I would like you to read the 2ch BBS thread:
http://www.excite.co.jp/world/english/web/?wb_url=http%3A%2F%2Fpc8.2ch.net%2Ftest%2Fread.cgi%2Ftech%2F1137068104%2F&wb_lp=JAEN&wb_dis=2

Thanks, Hiroshi Sakurai.
Jan 15 2006
prev sibling parent reply Anders F Björklund <afb algonet.se> writes:
Walter Bright wrote:

 Would you be interested in writing such conversion routines? 

I think they are mostly needed for Windows D programs that want to avoid linking to / using an LGPL'ed library. For Mac OS X and for Linux, libiconv comes with the system. (So it would seem a little like re-inventing the wheel, no?)

Possibly do some better D wrappers, to make it easier to use.

It would still need an addition that helps it tell what encoding the current terminal has, or what codepage is being used, so that it can cast() and convert the args[] to UTF-8. (At the moment, they are passed in the native OS encoding...)

--anders
Jan 16 2006
parent Anders F Björklund <afb algonet.se> writes:
 For Mac OS X and for Linux, libiconv comes with the system.
 (so it would seem a little like re-inventing the wheel, no?))

And mango.icu also contains such conversion routines. I think ICU supports something like 230 locales now? (The default Mac libiconv does something similar, too.)

Did some quick hacks* for common 1-byte conversions (ISO-8859-1, CP1252, etc.), but redoing all of the more complex conversions in D just seems like a waste... when there are *two* libraries that are ready to use? (Plus, one can probably use the built-in Windows functions, versioned, on that platform instead of a 3rd-party lib.)

--anders

* Along the lines of "wchar[256] mapping;", that is.
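
The "wchar[256] mapping;" idea for 1-byte encodings can be sketched roughly as follows. This is not the code referred to above, only an illustration written for a current D2 compiler. The function name cp1252toUTF16 is borrowed from the proposal earlier in the thread, buildCp1252Table and cp1252High are hypothetical names, and using U+FFFD for the five unassigned CP1252 bytes is an assumption about how to handle them.

```d
// Code points for CP1252 bytes 0x80 .. 0x9F. The five bytes that CP1252
// leaves unassigned are mapped to U+FFFD here (an assumption).
immutable wchar[32] cp1252High = [
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
];

// Build the full 256-entry table: identity for the ASCII and Latin-1
// ranges, with the Windows-specific block patched in.
wchar[256] buildCp1252Table()
{
    wchar[256] table;
    foreach (i; 0 .. 256)
        table[i] = cast(wchar) i;
    foreach (i, c; cp1252High)
        table[0x80 + i] = c;
    return table;
}

// Evaluated at compile time, so the lookup table is a constant.
immutable wchar[256] mapping = buildCp1252Table();

// One table lookup per input byte; every CP1252 byte becomes exactly
// one UTF-16 code unit.
wchar[] cp1252toUTF16(const(ubyte)[] raw)
{
    auto result = new wchar[raw.length];
    foreach (i, b; raw)
        result[i] = mapping[b];
    return result;
}

void main()
{
    import std.stdio : writeln;

    // The euro sign followed by "10" in CP1252.
    ubyte[] raw = [0x80, 0x31, 0x30];
    writeln(cp1252toUTF16(raw));
}
```

A table like this only works for encodings where one byte is one character. Multi-byte encodings such as Shift_JIS or EUC-JP need a stateful decoder and much larger tables, which is the part that seems better left to iconv or ICU.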
Jan 16 2006