digitalmars.D.bugs - [Issue 10668] New: Unicode characters, when taken from strings, are not printed correctly
http://d.puremagic.com/issues/show_bug.cgi?id=10668

           Summary: Unicode characters, when taken from strings, are not printed correctly
           Product: D
           Version: D2
          Platform: x86_64
        OS/Version: Mac OS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: MATTCA sky.com

Created an attachment (id=1234)
A small program which demonstrates the issue.

When obtaining a char from within a string of non-ASCII characters (in this example, the pound sign '£'), the resulting char will not be printed correctly to the console (via std.stdio.writeln). Instead, the '?' symbol is printed.

However, when printing the entire string, the '£' is printed correctly.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 19 2013
The content of the attachment, just in case:

    module main;

    import std.stdio;

    void main(string[] args)
    {
        string s = "£££";
        writeln(s);    // Output: £££

        char c = s[0];
        writeln(c);    // Output: ?
        writeln(s[0]); // Output: ?
    }
Jul 19 2013
monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |monarchdodra gmail.com
         Resolution|                            |INVALID

Well... what did you think it was going to print? You have a UTF-8 sequence.

char c = s[0]; will extract the first code*point* of your unicode. You want the first code*unit*.

http://www.fileformat.info/info/unicode/char/a3/index.htm

E.g.: £ is the code point U+00A3. In UTF-8 it is represented by the sequence:
[0xC2, 0xA3]

When you write "char c = s[0];", you are extracting the first code unit, which is 0xC2. When you pass this to writeln, what happens mostly depends on your locale/codepage. If it is set to UTF-8 (CP65001 on Windows), then it will print the "unknown character", since you passed an incomplete sequence.

The correct code you want is:
dchar c = s.front; (remember to import std.array to get front).

Another alternative is to simply work from the ground up with dstrings.

    module main;

    import std.stdio;

    void main(string[] args)
    {
        dstring s = "£££";
        writeln(s);    // Output: £££

        dchar c = s[0];
        writeln(c);    // Output: £
        writeln(s[0]); // Output: £
    }

Do you have access to "The D Programming Language"? It has the best introduction to Unicode/UTF I've read.
Jul 19 2013
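The distinction between indexing a string and taking its front can be sketched as below. This is only an illustration, not code from the report: the helper names firstCodeUnit and firstCodePoint are made up for this example, and front is imported from std.range.primitives on the assumption of a current Phobos (the thread itself refers to std.array). The last line assumes a UTF-8 console.

```d
import std.range.primitives : front;
import std.stdio : writefln, writeln;
import std.utf : decode;

// Hypothetical helper: the first UTF-8 code unit of s.
// For non-ASCII text this is only part of a character.
char firstCodeUnit(string s)
{
    return s[0];
}

// Hypothetical helper: the first code point of s.
// front decodes as many code units as the character needs.
dchar firstCodePoint(string s)
{
    return s.front;
}

void main()
{
    string s = "£££";

    writefln("0x%02X", cast(ubyte) firstCodeUnit(s)); // 0xC2
    writefln("U+%04X", cast(uint) firstCodePoint(s)); // U+00A3

    // std.utf.decode does the same decoding explicitly and
    // advances the index past the code units it consumed.
    size_t index = 0;
    dchar c = decode(s, index);
    writeln(index); // 2, because '£' occupies two code units
    writeln(c);     // the decoded character
}
```

Indexing stays cheap because it never decodes; the cost is that s[0] is only meaningful on its own when the character happens to be ASCII.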
Nils <nilsbossung googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nilsbossung googlemail.com

> Well... what did you think it was going to print? you have a utf-8 sequence.
>
> char c = s[0]; will extract the first code*point*

You mean code*unit*.

> of your unicode. You want the first code*unit*.

code*point*
Jul 19 2013
Thanks for the response! Yeah, after posting I converted my project to use dstrings on the off chance it would work and, lo and behold, that seems to be the fix.

I plan on eventually getting the book, although I've read some bad reviews of the e-book/Kindle version, so I'm waiting a little longer to get a hard copy.
Jul 19 2013
> You mean code*unit*.

Oops. Massive face-palm. Thank you for correcting me.
Jul 19 2013
> Yeah, I converted my project to use dstrings on the off chance it worked

I'd recommend trying to get your project to work with "normal" UTF-8 strings. They're the norm in D, and you'll have to get around to understanding how they work sooner or later.

To make it *really* simple, a UTF-8 string should be handled like a bidirectional range of dchars. You can ask for front/back, popFront/popBack, and empty. Stick to only these primitives, and your code is *guaranteed* to work.

All the other primitives (length, index, slice), while *present*, operate on code units rather than code points. They require much more knowledge of what is going on, and should be used only when you *know* what you are doing.

As a matter of fact, if you ask a string whether it supports, say, length ("hasLength!string"), it will say "false".
Jul 19 2013
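A minimal sketch of the range-primitive approach recommended above, assuming a Phobos version in which narrow strings are decoded to dchar when treated as ranges. The helper name codePoints is invented for this illustration; everything else comes from std.range.primitives.

```d
import std.range.primitives : back, empty, front, hasLength, popFront, walkLength;
import std.stdio : writeln;

// Hypothetical helper: collect the code points of s using only
// empty, front and popFront, never indexing or slicing.
dchar[] codePoints(string s)
{
    dchar[] result;
    while (!s.empty)
    {
        result ~= s.front; // decodes one whole character
        s.popFront();      // skips all of its code units
    }
    return result;
}

void main()
{
    string s = "a£b";

    // As a range, string deliberately reports no length.
    static assert(!hasLength!string);

    writeln(s.length);     // 4: the number of UTF-8 code units
    writeln(s.walkLength); // 3: the number of code points

    writeln(codePoints(s)); // the three characters, decoded one by one
    writeln(s.back);        // the last code point, decoded from the end
}
```

The .length property still exists on the array itself; hasLength only describes how the string behaves as a range, which is why the two counts above differ.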