digitalmars.D.bugs - char code

reply Hiroshi Sakurai <Hiroshi_member pathlink.com> writes:
Hi.
This topic was raised on the 2ch BBS:
http://pc8.2ch.net/test/read.cgi/tech/1109933426/567

and in the bug tracker of the Japanese D language wiki:
http://f17.aaa.livedoor.jp/~labamba/?BugTrack%2F13

Non-ASCII WYSIWYG strings produce illegal UTF-8.

ver dmd0.123
/* code page utf8 */
private import std.stream;
void main()
{
    // valid
    char[] str = "ワロスw";
    stdout.writeString(str);  // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
    // invalid
    char[] str2 = r"ワロスw";  // or char[] str2 = `ワロスw`;
    stdout.writeString(str2); // invalid output : E3 E3 E3 EF
    return;
}

Thanks,
Hiroshi Sakurai.
Sorry, my English is very poor. OTL
May 16 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Hiroshi Sakurai" <Hiroshi_member pathlink.com> wrote in message 
news:d6bm67$cfr$1 digitaldaemon.com...
 Illegal non-ascii WYSIWYG string.

 ver dmd0.123
 /*code page utf8 */
 private import std.stream;
 void main()
 {
 // valid
 char[] str = "f叔糠X,-";
 stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
 // invalid
 char[] str2 = r"f叔糠X,-"; // or char[] str = `f叔糠X,-`;
 stdout.writeString(str2); // invalid output : E3 E3 E3 EF
 return;
 }

I'm confused. Is the problem with raw strings like r"blah" or with std.stream? Stream.writeString doesn't look at encodings, so whatever is going wrong happens before the call to writeString.

Since I don't have the proper fonts or encoding support in my news reader, I only see the raw string r"f..." with boxes in it, so I can't tell what is actually in the source file you are trying to compile. A raw string is a sequence of bytes assumed to be UTF-8 encoded. Is that what is in your source file?

-Ben
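One way to check what the compiler actually stored for each literal, independent of any stream or terminal, is to dump the code units of the string as hex. The following is a minimal sketch, assuming a current D2 compiler and a source file saved as UTF-8; `hexDump` is a hypothetical helper written for this illustration, not part of Phobos.

```d
import std.format : format;
import std.stdio : writeln;

/// Formats the UTF-8 code units of `s` as space-separated uppercase hex.
/// (Hypothetical helper for this illustration.)
string hexDump(const(char)[] s)
{
    // Reinterpret the code units as raw bytes; no decoding takes place.
    auto bytes = cast(const(ubyte)[]) s;
    return format("%(%02X %)", bytes);
}

void main()
{
    // The same text as a regular, a raw, and a backquoted literal.
    string regular = "ワロスw";
    string raw = r"ワロスw";
    string backquoted = `ワロスw`;

    writeln(hexDump(regular));
    writeln(hexDump(raw));
    writeln(hexDump(backquoted));
}
```

On a compiler without the reported bug, all three lines should be identical and match the "valid output" bytes from the report. If the raw and backquoted lines come out shorter, the bytes were lost at compile time rather than in writeString.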
May 17 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 void main()
 {
 // valid
 char[] str = "fソスfソスfX,-";
 stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
 // invalid
 char[] str2 = r"fソスfソスfX,-"; // or char[] str = `fソスfソスfX,-`;
 stdout.writeString(str2); // invalid output : E3 E3 E3 EF
 return;
 }

 I'm confused. Is the problem with raw strings like r"blah" or with std.stream? Stream.writeString doesn't look at encodings, so whatever is going wrong happens before the call to writeString. Since I don't have the proper fonts or encoding support in my news reader, I only see the raw string r"f..." with boxes in it, so I can't tell what is actually in the source file you are trying to compile.
Yes, and the boxes are U+FFFD, that is the Unicode replacement character. Whatever he typed in, it didn't make its way to us. But it is interesting to note that dmd's behaviour for the normal and the wysiwyg string is still different:

UTF8: 66 ef bf bd 66 ef bf bd 66 58 2c 2d
UTF16: 66 fffd 66 fffd 66 58 2c 2d

This is the normal string in UTF8 and UTF16 (note the U+FFFD replacement character).

UTF8: 66 ef 66 ef 66 58 2c 2d
UTF16: 66 f9af 66 58 2c 2d

And this one is the wysiwyg string, with the contents of the other one copied and pasted. Note that dmd omitted the "BF BD" after "66 EF". That produces illegal Unicode, as you can see from the UTF16 translation (which is simply wrong, because the algorithm does not check for invalid input).

Hmm, after some more thinking I found that the whole f?f?fX,- sequence is wrong; it just does not match the "valid output" he denotes above. He wants to input the following:

UTF8: e3 83 af e3 83 ad e3 82 b9 ef bd 97
UTF16: 30ef 30ed 30b9 ff57

Does anybody know how to input these characters with Linux? I don't have any input device for that :)
Or easier, Hiroshi, could you please send your input file over the list?

Ciao
uwe
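The byte sequences from the report can be checked mechanically. The sketch below assumes a current D2 compiler with Phobos (`std.utf.validate` and `std.utf.toUTF16`); `isValidUtf8` is a hypothetical wrapper written for this illustration. It confirms that the 12-byte sequence is well-formed UTF-8 and converts to the four UTF-16 code units listed above, while the 4-byte output of the wysiwyg string is rejected because each lead byte is missing its continuation bytes.

```d
import std.stdio : writefln;
import std.utf : UTFException, toUTF16, validate;

/// Returns true if `bytes` form well-formed UTF-8.
/// (Hypothetical wrapper around std.utf.validate for this illustration.)
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Bytes the reporter lists as the valid output.
    immutable ubyte[] expected = [0xE3, 0x83, 0xAF, 0xE3, 0x83, 0xAD,
                                  0xE3, 0x82, 0xB9, 0xEF, 0xBD, 0x97];
    // Bytes reported for the wysiwyg string: only the lead bytes survive.
    immutable ubyte[] truncated = [0xE3, 0xE3, 0xE3, 0xEF];

    assert(isValidUtf8(expected));
    assert(!isValidUtf8(truncated));

    // Convert the valid sequence to UTF-16 and print each code unit.
    wstring utf16 = toUTF16(cast(string) expected);
    foreach (wchar unit; utf16)
        writefln("%04x", cast(ushort) unit); // 30ef, 30ed, 30b9, ff57
}
```

Validating before converting is the step the UTF16 translation quoted above skipped, which is how a value like f9af could appear for malformed input.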
May 17 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Does anybody know how to input these characters with Linux? I don't have  
 any input device for that :)
 Or easier, Hiroshi, could you please send your input file over the list?
Hm, as I see now, Thomas already accomplished that (how?). Please ignore my posting. :(

Ciao
uwe
May 17 2005
parent reply Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:
Uwe Salomon wrote on Tue, 17 May 2005 19:16:21 +0200:
 Does anybody know how to input these characters with Linux? I don't have  
 any input device for that :)
 Or easier, Hiroshi, could you please send your input file over the list?
 Hm, as i see now, Thomas already accomplished that (how?).
Where is the pröbļém with Uniode on Linux 蜷 ?

Thomas
May 17 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Hm, as i see now, Thomas already accomplished that (how?).
 Where is the pröbļém with Uniode on Linux 蜷 ?
Yes, put salt on the open wound! :-P I don't have problems with Unicode, but I don't know a program/method to insert arbitrary Unicode characters into text... Thus I can only insert the characters that are on my keyboard.. ( ナやぎツカナァ竊絶凪津クテセツィテヲテ淌ート打桔トクナ etc.)

uwe
May 17 2005
parent Thomas Kuehne <thomas-dloop kuehne.THISISSPAM.cn> writes:
Uwe Salomon wrote:
 Hm, as i see now, Thomas already accomplished that (how?).
 Where is the pröbļém with Uniode on Linux 蜷 ?
 Yes, put salt on the open wound! :-P I don't have problems with Unicode, but i don't know a program/method to insert arbitrary Unicode characters into text... Thus i can only insert the characters that are on my keyboard.. ( ナやぎツカナァ竊絶凪津クテセツィテヲテ淌ート打桔トクナ etc.)
There are input modules for X11 and GTK that support quite a range of scripts. Last time I checked, Qt/KDE didn't have any way to add native input modules. If you are desperate you might try http://yudit.org/ (X-based) or http://sourceforge.net/projects/jgim/ (Java-based) to input "simple" languages.

Thomas
May 18 2005
prev sibling parent Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:
Hiroshi Sakurai wrote on Tue, 17 May 2005 02:50:47 +0000 (UTC):
 Illegal non-ascii WYSIWYG string. 

 ver dmd0.123 
 /*code page utf8 */ 
 private import std.stream; 
 void main() 
 { 
 // valid 
 char[] str = "ワロスw"; 
 stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97 
 // invalid 
 char[] str2 = r"ワロスw"; // or char[] str = `ワロスw`; 
 stdout.writeString(str2); // invalid output : E3 E3 E3 EF 
 return; 
 } 
Added to DStress as
http://dstress.kuehne.cn/run/u/unicode_08_A.d
http://dstress.kuehne.cn/run/u/unicode_08_B.d
http://dstress.kuehne.cn/run/u/unicode_08_C.d
http://dstress.kuehne.cn/run/u/unicode_08_D.d
 sorry, my english is very poor. OTL
I could understand your message, thus your English can't be that bad ;)

Thomas
May 17 2005