
digitalmars.D - Chinese characters in a string (GDC)

ryuka <Yuki001 gmail.com> writes:
Hey, I'm a Chinese D user. My D compiler (GDC) doesn't work well when a string contains Chinese characters.
When I type "您好" or "您好"w in my source code as a wchar[], GDC displays compile errors like these:
hello.d:33: invalid UTF-8 sequence
hello.d:33: invalid UTF-8 sequence
hello.d:33: invalid UTF-8 sequence
hello.d:33: invalid UTF-8 sequence
:: === Build finished: 4 errors, 0 warnings ===
I don't know whether DMD has the same problem, but with GDC I can't find a way to put Chinese characters into a wchar[]. I don't want to use hex values, so I came here for some help.
Thanks
May 15 2007
"Aziz K." <aziz.kerim gmail.com> writes:
Hello ryuka,

It looks to me like you aren't using an editor that saves the source file
as Unicode. I'm pretty sure your editor saves the source file in a
codepage; otherwise the DMD front end wouldn't complain about it (the
front end is the same for DMD and GDC).
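A quick way to see why the front end complains, as a Python sketch (Python is used only for illustration; this is not the DMD front end's code). It assumes the editor saved the file in GBK, i.e. code page 936, the usual ANSI code page on Simplified Chinese Windows: the GBK bytes for 您好 do not form valid UTF-8, while the UTF-8 encoding of the same text does. The helper `first_invalid_utf8_offset` is hypothetical.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding,
    or None if the whole buffer is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # Python's UTF-8 decoder is strict
        return None
    except UnicodeDecodeError as err:
        return err.start


text = "您好"
gbk_bytes = text.encode("gbk")     # what a code page 936 editor writes to disk
utf8_bytes = text.encode("utf-8")  # what the D front end expects to read

print(gbk_bytes.hex(), first_invalid_utf8_offset(gbk_bytes))    # c4fabac3 0
print(utf8_bytes.hex(), first_invalid_utf8_offset(utf8_bytes))  # e682a8e5a5bd None
```

0xC4 announces a two-byte UTF-8 sequence, but the following byte 0xFA is not a continuation byte (those have the form 10xxxxxx), so decoding fails at offset 0.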
May 15 2007
ryuka <Yuki001 gmail.com> writes:
Aziz K. Wrote:

 [...]

Anyway, I tried changing the code page to others. However, when the D source is stored in some other code pages, the compiler works fine (no compile errors), but my application displays strange characters that aren't the characters I typed in the source. g++ programs display these strange characters too when using these code pages, so there is still a problem using Chinese on my OS. Thank you.
May 15 2007
Roberto Mariottini <rmariottini mail.com> writes:
ryuka wrote:
[...]

The problem is that the current D console API doesn't translate from the internal Unicode representation to the codepage currently selected in your console. So even if you write your source as UTF-something (so the compiler can understand it), when you call writef (or printf or the like) it will not translate those UTF sequences back to the console codepage, and you'll get garbled characters on the screen. This makes D console programs currently unusable for any language other than English.
Ciao
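The mismatch can be reproduced in a few lines of Python, as a sketch that assumes the console uses code page 936 (GBK): the program emits the UTF-8 bytes of its string unchanged, the console decodes them with its own code page, and other characters appear. Encoding to the console's code page first avoids the problem.

```python
text = "您好"
emitted = text.encode("utf-8")  # the program writes its UTF-8 bytes untranslated

# A console using code page 936 decodes those bytes as GBK;
# bytes it cannot decode become U+FFFD instead of raising.
shown = emitted.decode("cp936", errors="replace")
print(shown == text)  # False: different characters appear

# Translating to the console's code page first keeps the text intact
translated = text.encode("cp936")
print(translated.decode("cp936") == text)  # True
```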
May 15 2007
Carlos Santander <csantander619 gmail.com> writes:
Roberto Mariottini wrote:
 ryuka wrote:
 [...]

 The problem is that the current D console API doesn't translate from
 internal Unicode representation to the codepage currently selected in
 your console. [...] This makes D console programs currently unusable
 for any language other than English.

That's Windows-only. On Linux and Mac OS X, where the consoles use UTF-8, such characters show up correctly. ryuka, you can use UTF-8 on Windows by using the Lucida Console font and running "chcp 65001" (IIRC). Or you can convert your text to the local codepage before sending it to the console.

--
Carlos Santander Bernal
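The second suggestion above, converting text to the local code page before it reaches the console, can be sketched in Python. The helper names `codec_for_codepage` and `write_for_console` are hypothetical, and mapping a `chcp` number to a Python codec name (with 65001 treated as UTF-8) is an assumption of this sketch.

```python
import io


def codec_for_codepage(codepage: int) -> str:
    """Map a Windows code page number (as printed by chcp) to a Python codec name."""
    # 65001 is the UTF-8 code page; many others exist in Python as cpNNN (cp936, cp437, ...)
    return "utf-8" if codepage == 65001 else "cp%d" % codepage


def write_for_console(text: str, codepage: int, stream) -> int:
    """Encode text in the console's code page and write it to a binary stream.
    Characters the code page cannot represent become '?'. Returns bytes written."""
    data = text.encode(codec_for_codepage(codepage), errors="replace")
    stream.write(data)
    return len(data)


# An in-memory stream stands in for the console here
out = io.BytesIO()
write_for_console("您好\n", 936, out)
print(out.getvalue())  # b'\xc4\xfa\xba\xc3\n'
```

On a real console the stream would be `sys.stdout.buffer`, and the code page number would have to come from the console itself; both are assumptions here.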
May 15 2007
"Aziz K." <aziz.kerim gmail.com> writes:
Roberto Mariottini wrote:
 The problem is that the current D console API doesn't translate from
 internal Unicode representation to the codepage currently selected in  
 your console.

It doesn't need to convert to the codepage set in the console. There is a neat function called WriteConsoleW() which can print any Unicode character to the console regardless of the current codepage.

Other than that, there is the issue with the command line, because the arguments aren't passed to the main function as Unicode. To remedy that, I've written a function that retrieves the command line with GetCommandLineW() and parses it into an array of wchar[]s (applying the weird escaping rules cmd.exe uses). I've only used calloc and realloc, so the function can be used before the garbage collector has been initialized. I'll submit the function to Bugzilla when I've finished it. First I'd like to see it in Phobos for a while, in case some bugs crop up; if it stands the test of time, Walter could move it to dmain2.d so that every D application supports Unicode command-line arguments by default :-)

Apart from these problems, another one comes to mind: the font set in the console has to support the code points you want to print, otherwise you will only get small boxes. I don't know how to switch to a TrueType font other than Lucida Console (the dialog restricts you to only two fonts), but as far as I remember there is a guide out there explaining how to do it.

PS: Go to http://openquran.googlecode.com/svn/trunk/src/main.d if you want to see how I'm using WriteConsoleW in the Windows version of my program.
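The WriteConsoleW route described above can be tried from Python through ctypes; what follows is only a rough sketch. The helper name `write_unicode` is hypothetical, the real API is called only on Windows, and the sketch assumes standard output is an actual console window, since WriteConsoleW fails when output is redirected to a file or pipe. On other platforms it simply writes to sys.stdout. The font caveat still applies: the console font must contain the glyphs.

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = 0xFFFFFFF5  # the Win32 constant (DWORD)-11


def write_unicode(text: str) -> int:
    """Write text to the console without going through the code page on Windows.
    Returns the number of UTF-16 code units written (the unit WriteConsoleW counts in)."""
    units = len(text.encode("utf-16-le")) // 2
    if sys.platform != "win32":
        # Not Windows: fall back to ordinary stdout output
        sys.stdout.write(text)
        sys.stdout.flush()
        return units
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    kernel32.GetStdHandle.argtypes = [ctypes.c_ulong]
    kernel32.WriteConsoleW.restype = ctypes.c_int
    kernel32.WriteConsoleW.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_ulong,
                                       ctypes.POINTER(ctypes.c_ulong), ctypes.c_void_p]
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = ctypes.c_ulong(0)
    # c_wchar is 16 bits on Windows, so the str is passed as UTF-16
    if not kernel32.WriteConsoleW(handle, text, units, ctypes.byref(written), None):
        raise ctypes.WinError(ctypes.get_last_error())
    return written.value
```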
 This makes D console programs currently unusable for any language other  
 than English.

Yes, but fortunately not any longer.

Regards.
May 15 2007
"Aziz K." <aziz.kerim gmail.com> writes:
You can review my command-line parser now, if you like. I committed it a  
few minutes ago.
http://openquran.googlecode.com/svn/trunk/src/CmdLine.d
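CmdLine.d itself is not reproduced here. As an independent sketch of the kind of splitting such a parser has to do, the Python function below follows the rules Microsoft documents for the C runtime and CommandLineToArgvW (these rules are implemented there rather than in cmd.exe itself): unquoted spaces and tabs separate arguments, double quotes group text, and backslashes are literal unless they precede a double quote. The name `split_windows_cmdline` is hypothetical, and the sketch ignores the special treatment of the program name and of `""` inside a quoted argument.

```python
from typing import List


def split_windows_cmdline(cmdline: str) -> List[str]:
    """Split a Windows command line into arguments (simplified C runtime rules)."""
    args: List[str] = []
    buf: List[str] = []
    in_quotes = False
    have_arg = False  # lets "" produce an empty argument
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i  # length of the backslash run
            if j < n and cmdline[j] == '"':
                # 2k backslashes + quote -> k backslashes, quote toggles quoting
                # 2k+1 backslashes + quote -> k backslashes and a literal quote
                buf.append("\\" * (count // 2))
                if count % 2:
                    buf.append('"')
                    j += 1
            else:
                buf.append("\\" * count)  # not before a quote: taken literally
            have_arg = True
            i = j
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(buf))
                buf, have_arg = [], False
            i += 1
        else:
            buf.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(buf))
    return args


print(split_windows_cmdline(r'prog "hello world" a\"b C:\dir\ ""'))
# ['prog', 'hello world', 'a"b', 'C:\\dir\\', '']
```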
May 15 2007
Julio César Carrascal Urquijo writes:
ryuka wrote:
 [...]

You should save your source code as either UTF-8, UTF-16 or UTF-32 (with a signature, i.e. a byte order mark).
May 15 2007