digitalmars.D.bugs - [Issue 24190] New: Identifier tokenizer is greedy steals new line

https://issues.dlang.org/show_bug.cgi?id=24190

          Issue ID: 24190
           Summary: Identifier tokenizer is greedy steals new line
                    characters
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: alphaglosined gmail.com

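Currently, the tokenizer for identifiers is too greedy. When it reaches one of
the non-ASCII new line characters, it consumes that character as part of the
identifier and reports an error itself, when it should probably stop there and
defer to the outer loop, both for handling the new line and for any error.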

```console
$ cat lsps.d
void main ()
{
    enum b = 8;
    mixin ("enum a1 =\u2028b; pragma (msg, a1);");
    mixin ("enum a2\u2028= b; pragma (msg, a2);");
    mixin ("enum\u2028a3 = b; pragma (msg, a3);");
}
$ dmd lsps.d
8
lsps.d-mixin-5(5): Error: char 0x2028 not allowed in identifier
lsps.d-mixin-6(6): Error: char 0x2028 not allowed in identifier
```

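The character 0x2028 (U+2028, LINE SEPARATOR) is a valid new line character in
D source, so the second and third mixins should compile like the first one.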
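
The suggested behaviour can be sketched as follows. This is only an illustration, not dmd's actual lexer code: `scanIdentifier` and `isUnicodeNewline` are hypothetical names, and `std.uni.isAlpha` is an assumed stand-in for whatever identifier-character check dmd really uses. The point is that the identifier scanner stops in front of a character it cannot accept, without consuming it or raising an error.

```d
import std.uni : isAlpha;
import std.utf : decode;

/// Hypothetical helper: true for the non-ASCII new line characters.
bool isUnicodeNewline(dchar c) pure nothrow @nogc @safe
{
    return c == '\u2028' || c == '\u2029';
}

/// Hypothetical identifier scanner: returns the length in bytes of the
/// identifier at the start of `src`. It stops in front of any character
/// that cannot be part of an identifier and does not report an error.
size_t scanIdentifier(string src) pure @safe
{
    size_t i = 0;
    while (i < src.length)
    {
        size_t next = i;
        const dchar c = decode(src, next); // advances `next` past one code point
        const bool ok = c < 0x80
            ? (c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (i > 0 && c >= '0' && c <= '9'))
            : isAlpha(c); // assumption: approximates dmd's non-ASCII check
        if (!ok)
            break; // leave the character for the outer loop to classify
        i = next;
    }
    return i;
}

unittest
{
    string src = "a2\u2028= b;";
    const n = scanIdentifier(src);
    assert(src[0 .. n] == "a2");                 // identifier ends before U+2028
    size_t next = n;
    assert(isUnicodeNewline(decode(src, next))); // outer loop sees the new line
}
```

With this split, the outer loop would be the only place that decides whether a non-identifier character is a new line, some other valid token, or an error.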

--
Oct 17 2023