digitalmars.D.bugs - [Issue 1024] New: invalid UTF-8 sequence for \u00B6 (¶) in comment
- d-bugmail puremagic.com (15/15) Mar 04 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1024
- d-bugmail puremagic.com (23/23) Mar 04 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1024
- d-bugmail puremagic.com (10/14) Mar 04 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1024
- d-bugmail puremagic.com (11/11) Mar 04 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1024
- Frits van Bommel (2/4) Mar 04 2007 Just to be clear: I meant the extra character before the '//'.
- Frank Benoit (keinfarbton) (1/1) Mar 04 2007 right, they are corrupted.
- d-bugmail puremagic.com (11/11) Mar 04 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1024
- Lionello Lunesu (3/21) Mar 06 2007 THAT's a known bug, issue 111
http://d.puremagic.com/issues/show_bug.cgi?id=1024

           Summary: invalid UTF-8 sequence for \u00B6 (¶) in comment
           Product: D
           Version: 1.007
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: benoit tionex.de

Having \u00b6 in a single-line comment (//) makes DMD report "invalid UTF-8 sequence". An editor correctly shows "¶".

--
Mar 04 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1024

------- Comment #1 from fvbommel wxs.nl 2007-03-04 15:41 -------
Was the encoding UTF-8? Did your file start with the appropriate BOM? (DMD requires a BOM to consider a file anything other than pure ASCII)

Here's a test for you, try to reproduce this:
---
urxae urxae:~/tmp$ cat utf.d
//¶
urxae urxae:~/tmp$ hd utf.d
00000000  ef bb bf 2f 2f c2 b6 0a                           |...//...|
00000008
urxae urxae:~/tmp$ dmd -c utf.d
urxae urxae:~/tmp$
---
The first command shows the contents of the file (apparently cat doesn't handle BOMs, it just sends it straight to the console; that's where the extra symbol comes from).
The second shows the hexdump of the file. Note the 'ef bb bf' UTF-8 BOM, and the 'c2 b6' encoding of the '¶'.
The third command shows DMD compiling the file successfully.

See http://www.digitalmars.com/d/lex.html (under "Source Text") for the details on encodings accepted by DMD.

--
Mar 04 2007
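The byte layout in the hexdump from comment #1 can be reproduced outside of D. The following is a minimal sketch in Python (chosen here only to inspect bytes; it is not part of the original thread), assuming the file contains nothing but the BOM, the comment, and a newline. The names `source_text` and `file_bytes` are illustrative.

```python
import codecs

# The one-line D source from comment #1: a '//' comment containing U+00B6.
source_text = "//\u00b6\n"

# Prepend the UTF-8 byte order mark (ef bb bf), then encode the text as UTF-8.
file_bytes = codecs.BOM_UTF8 + source_text.encode("utf-8")

# Print the bytes in the same two-digit hex style as the hexdump.
print(" ".join(f"{b:02x}" for b in file_bytes))

# Decoding with "utf-8-sig" strips the BOM again and recovers the text.
assert file_bytes.decode("utf-8-sig") == source_text
```

The printed bytes should match the hexdump line: the three BOM bytes, two slashes (2f 2f), the two-byte encoding of '¶' (c2 b6), and the newline (0a).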
http://d.puremagic.com/issues/show_bug.cgi?id=1024

------- Comment #2 from fvbommel wxs.nl 2007-03-04 15:45 -------
(In reply to comment #1)
[snip]
> //¶
[snip]
> (apparently cat doesn't handle BOMs, it just sends it straight to the console;
> that's where the extra symbol comes from).

And apparently somewhere along the line from bugzilla to the newsgroup message showing up in my Thunderbird, that character is stripped...
(for anyone only reading this in the newsgroups trying to figure out what I was talking about: look at the bugzilla page)

--
Mar 04 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1024

benoit tionex.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

------- Comment #3 from benoit tionex.de 2007-03-04 16:15 -------
In my Thunderbird it shows up correctly, but I changed the standard encoding to UTF-8.
You are right about the BOM. I added a BOM to my file, and it compiles.

--
Mar 04 2007
d-bugmail puremagic.com wrote:
> ------- Comment #3 from benoit tionex.de 2007-03-04 16:15 -------
> in my thunderbird it shows up correctly.

Just to be clear: I meant the extra character before the '//'.
Mar 04 2007
Right, they are corrupted.
Mar 04 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1024

------- Comment #4 from benoit tionex.de 2007-03-04 16:41 -------
Hm, I forgot to reactivate the code that produces the UTF chars...
Now I can say it also works without a BOM; my problem was a coding error in writing the file.

    char[] line;
    ...
    line ~= '\u00b6'; // This produced the corrupt file content
    line ~= "\u00b6"; // This way the file is written correctly
    // write the line into the file

--
Mar 04 2007
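Why the character-literal append in comment #4 corrupts the file can be illustrated at the byte level. The sketch below is in Python, not D, and rests on an assumption the thread does not state explicitly: that appending '\u00b6' stored the code point value as the single byte 0xB6, whereas appending "\u00b6" stored the two-byte UTF-8 encoding. The helper `is_valid_utf8` and the variable names are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Assumption: the character-literal append wrote the code point as one raw byte.
truncated = b"//" + bytes([0xB6]) + b"\n"

# The string-literal append writes the UTF-8 encoding of U+00B6 (c2 b6).
encoded = b"//" + "\u00b6".encode("utf-8") + b"\n"

print(is_valid_utf8(truncated))  # False: 0xB6 alone is a stray continuation byte
print(is_valid_utf8(encoded))    # True
```

Under that assumption, a lone 0xB6 has the bit pattern of a UTF-8 continuation byte with no lead byte before it, which is consistent with the "invalid UTF-8 sequence" message in the original report.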
d-bugmail puremagic.com wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=1024
>
> ------- Comment #4 from benoit tionex.de 2007-03-04 16:41 -------
> hm, i did forget to reactivate the code that produces the utf chars....
> Now i can say, it also works without BOM, and my problem was a coding error in
> writing the file.
>
> char[] line;
> ...
> line ~= '\u00b6'; // This made the corrupt file content
> line ~= "\u00b6"; // So the file is written correctly
> // write the line into the file

THAT's a known bug, issue 111
http://d.puremagic.com/issues/show_bug.cgi?id=111
Mar 06 2007