
digitalmars.D.bugs - [Issue 1024] New: invalid UTF-8 sequence for \u00B6 (¶) in comment

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1024

           Summary: invalid UTF-8 sequence for \u00B6 (¶) in comment
           Product: D
           Version: 1.007
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: benoit tionex.de


Having \u00b6 (¶) in a single-line comment (//) makes DMD report
"invalid UTF-8 sequence".
An editor displays the "¶" correctly.


-- 
Mar 04 2007
next sibling parent d-bugmail puremagic.com writes:

------- Comment #1 from fvbommel wxs.nl  2007-03-04 15:41 -------
Was the encoding UTF-8? Did your file start with the appropriate BOM? (DMD
requires a BOM to consider a file anything other than pure ASCII)

Here's a test for you, try to reproduce this:
---
urxae urxae:~/tmp$ cat utf.d
//¶
urxae urxae:~/tmp$ hd utf.d
00000000  ef bb bf 2f 2f c2 b6 0a                           |...//...|
00000008
urxae urxae:~/tmp$ dmd -c utf.d
urxae urxae:~/tmp$ 
---
The first command shows the contents of the file (apparently cat doesn't handle
BOMs, it just sends it straight to the console; that's where the extra symbol
comes from).
The second shows the hexdump of the file. Note the 'ef bb bf' UTF-8 BOM, and
the 'c2 b6' encoding of the '¶'.
The third command shows DMD compiling the file successfully.

See http://www.digitalmars.com/d/lex.html (under "Source Text") for the details
on the encodings accepted by DMD.
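
The byte-level check described above can also be sketched in D itself. This is
only an illustration under assumptions: it targets a current D2 compiler with
Phobos (std.utf), not the DMD 1.007 from the report, and the helper names
hasUtf8Bom and isValidUtf8 are made up for this example. The byte values are
the ones from the hexdump above.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `data` starts with the UTF-8 BOM (EF BB BF).
bool hasUtf8Bom(const(ubyte)[] data)
{
    return data.length >= 3
        && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}

/// Hypothetical helper: true if `data`, after an optional BOM, is
/// well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] data)
{
    if (hasUtf8Bom(data))
        data = data[3 .. $];
    try
    {
        // std.utf.validate throws UTFException on malformed input.
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // The eight bytes from the hexdump: BOM, "//", U+00B6 as C2 B6, newline.
    immutable ubyte[] fromHexdump =
        [0xEF, 0xBB, 0xBF, 0x2F, 0x2F, 0xC2, 0xB6, 0x0A];
    assert(hasUtf8Bom(fromHexdump));
    assert(isValidUtf8(fromHexdump));

    // The same content without the BOM is still well-formed UTF-8.
    assert(!hasUtf8Bom(fromHexdump[3 .. $]));
    assert(isValidUtf8(fromHexdump[3 .. $]));
}
```

Running the helpers over the raw bytes of a source file is one way to tell a
missing BOM apart from a genuinely malformed byte sequence.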


-- 
Mar 04 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:

------- Comment #2 from fvbommel wxs.nl  2007-03-04 15:45 -------
(In reply to comment #1)
[snip]
 //¶

 (apparently cat doesn't handle
 BOMs, it just sends it straight to the console; that's where the extra symbol
 comes from).

And apparently somewhere along the line from bugzilla to the newsgroup message
showing up in my Thunderbird, that character is stripped...
(For anyone only reading this in the newsgroups and trying to figure out what I
was talking about: look at the bugzilla page.)


-- 
Mar 04 2007
prev sibling next sibling parent reply d-bugmail puremagic.com writes:

benoit tionex.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Comment #3 from benoit tionex.de  2007-03-04 16:15 -------
In my Thunderbird it shows up correctly,
but I changed the default encoding to UTF-8.

You are right about the BOM. I added a BOM to my file, and it compiles.


-- 
Mar 04 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
d-bugmail puremagic.com wrote:
 ------- Comment #3 from benoit tionex.de  2007-03-04 16:15 -------
 in my thunderbird it shows up correctly.

Just to be clear: I meant the extra character before the '//'.
Mar 04 2007
parent "Frank Benoit (keinfarbton)" <benoit tionex.removethispart.de> writes:
right, they are corrupted.
Mar 04 2007
prev sibling parent reply d-bugmail puremagic.com writes:

------- Comment #4 from benoit tionex.de  2007-03-04 16:41 -------
Hm, I forgot to reactivate the code that produces the UTF chars...

Now I can say that it also works without a BOM; my problem was a coding error in
writing the file.

char[] line;
...
line ~= '\u00b6'; // character literal: this produced the corrupt file content
line ~= "\u00b6"; // string literal: with this the file is written correctly
// write the line into the file
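
A minimal sketch of why the two appends can differ, under stated assumptions:
it targets a current D2 compiler with Phobos, where appending a character
literal may no longer misbehave, so the failing case is simulated by storing
the single byte B6. The thread does not show the actual corrupt file contents,
so that byte is an assumption, and the helpers wellFormed and appendEncoded are
made up for this example.

```d
import std.utf : encode, validate, UTFException;

/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool wellFormed(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical helper: appends the UTF-8 encoding of `c` to `line`.
void appendEncoded(ref char[] line, dchar c)
{
    char[4] buf;
    immutable n = encode(buf, c); // number of UTF-8 code units written
    line ~= buf[0 .. n];
}

void main()
{
    // Correct: U+00B6 needs two UTF-8 code units, C2 B6.
    char[] good = "//".dup;
    appendEncoded(good, '\u00b6');
    assert(good.length == 4);
    assert(good[2] == 0xC2 && good[3] == 0xB6);
    assert(wellFormed(good));

    // Simulated failure (assumption): only the low byte B6 is stored.
    // A lone B6 is a continuation byte without a lead byte, which is
    // an invalid UTF-8 sequence.
    char[] bad = "//".dup;
    bad ~= cast(char) 0xB6;
    assert(bad.length == 3);
    assert(!wellFormed(bad));
}
```

The point of the sketch is only that a single char cannot hold U+00B6, so a
one-byte append of it cannot yield valid UTF-8.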


-- 
Mar 04 2007
parent Lionello Lunesu <lio lunesu.remove.com> writes:
d-bugmail puremagic.com wrote:
 http://d.puremagic.com/issues/show_bug.cgi?id=1024
 
 
 
 
 
 ------- Comment #4 from benoit tionex.de  2007-03-04 16:41 -------
 hm, i did forget to reactivate the code that produces the utf chars....
 
 Now i can say, it also works without BOM, and my problem was a coding error in
 writing the file.
 
 char[] line;
 ...
 line ~= '\u00b6'; // This made the corrupt file content
 line ~= "\u00b6"; // So the file is written correctly
 // write the line into the file
 

THAT's a known bug, issue 111 http://d.puremagic.com/issues/show_bug.cgi?id=111
Mar 06 2007