digitalmars.D.bugs - [Issue 786] New: the \ EndOfFile EscapeSequence in double-quoted strings doesn't work


d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786

           Summary: the \ EndOfFile EscapeSequence in double-quoted strings
                    doesn't work
           Product: D
           Version: 0.178
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Keywords: rejects-valid, spec
          Severity: normal
          Priority: P3
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: thecybershadow gmail.com


Spec non-conformance, I believe.

Spec: http://www.digitalmars.com/d/lex.html#StringLiteral

Program:

void main()
{
  char[] eof_literal = "\";  // the character after the backslash is \u001A,
                             // as per the specs
}

Compiler output:

C:\...>dmd lexical.d
lexical.d(3): unterminated string constant starting at lexical.d(3)
lexical.d(3): semicolon expected, not 'EOF'
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement

(that's 19 repeating lines)


--

Jan 02 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com





"End of File

EndOfFile:
        physical end of the file
        \u0000
        \u001A
"

AIUI, locating the end of the code conceptually happens before tokenization. 
But indeed, the spec isn't crystal clear on this.


--

Jan 03 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786






Intermingling EOF detection with tokenisation would require quite a few
changes within DMD, and it makes no sense to me, as it would allow reading past
the physical end of the file.


--

Jan 06 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786


bugzilla digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID





0x1A is listed in lex.html as 'end of file', which trumps any token. I think
the spec is reasonably clear on this: "The source text is terminated by
whichever comes first." The reason for this is that some (old) text editors put
out a 0x1A to mark end of file.

Not a bug.
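
The "whichever comes first" rule quoted above can be illustrated with a small sketch. This is not DMD's actual code; the function name logicalEnd is hypothetical, and the sketch only assumes the three EndOfFile forms listed in the spec (physical end of the file, \u0000, \u001A).

```d
/// Hypothetical helper (not part of DMD): returns the index at which the
/// source text logically ends, i.e. the position of the first 0x00 or 0x1A
/// byte, or the physical length if neither occurs.
size_t logicalEnd(const(char)[] source)
{
    foreach (i, c; source)
    {
        // Either character terminates the source text, no matter
        // which token it appears to be inside.
        if (c == '\0' || c == '\x1A')
            return i;
    }
    return source.length; // physical end of the file
}
```

Under this reading, a 0x1A that follows a backslash inside a string literal ends the source text right after the backslash, so the lexer never sees the closing quote. That would be consistent with the "unterminated string constant" error in the original report.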


--

Feb 02 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786






In that case, why is "\ EndOfFile" listed as a valid EscapeSequence token?


--

Feb 02 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786






If a \ is the last character in a file, the escape sequence will resolve to the
\ character. That's what that is for.
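
A minimal sketch of the behaviour described here, assuming the text has already been cut off at its logical end. The function decodeEscape is hypothetical and handles only a handful of escape forms; it is an illustration, not the DMD lexer.

```d
/// Hypothetical sketch (not DMD's implementation): decodes the escape that
/// starts at text[pos], which must be a backslash, and advances pos past it.
char decodeEscape(const(char)[] text, ref size_t pos)
{
    assert(pos < text.length && text[pos] == '\\');
    ++pos;                      // skip the backslash
    if (pos >= text.length)     // nothing follows: the "\ EndOfFile" form
        return '\\';            // resolves to the backslash itself
    immutable c = text[pos++];
    switch (c)
    {
        case 'n':  return '\n';
        case 't':  return '\t';
        case '\\': return '\\';
        case '"':  return '"';
        default:   return c;    // simplification for this sketch only
    }
}
```

In this sketch the escape yields a backslash when nothing follows it, but the enclosing string literal is still unterminated, so the program is rejected either way.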


--

Feb 02 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786






But a StringLiteral can never be the last token of a syntactically valid D
source file, or can it?


--

Feb 03 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786






Currently, no, it can't, hence the error message about semicolon expected
instead of EOF. But the lexer doesn't (and shouldn't) know syntax, it just
knows tokens.


--

Feb 03 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=786






Exactly.  So really,

    EscapeSequence: \ EndOfFile

has no effect except perhaps on what error message the compiler throws.

Moreover, unless I'm mistaken, the spec gives no meaning to this EscapeSequence
form, which is probably why we're all asking.


--

Feb 04 2007
