
digitalmars.D.bugs - [Issue 24218] New: U+0000 (NUL) cannot be used in string literal

https://issues.dlang.org/show_bug.cgi?id=24218

          Issue ID: 24218
           Summary: U+0000 (NUL) cannot be used in string literal
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: kdevel vogtner.de

According to the documentation [1], DoubleQuotedStrings consist of
DoubleQuotedCharacters, which are defined as

   DoubleQuotedCharacter:
      Character               [2]
      EscapeSequence
      EndOfLine

EndOfLine is defined as

   EndOfLine:
       \u000D
       \u000A
       \u000D \u000A
       \u2028
       \u2029
       EndOfFile

and EndOfFile is defined as

   EndOfFile:
       physical end of the file
       \u0000
       \u001A
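Read literally, this grammar makes a raw NUL (U+0000) or SUB (U+001A) between the quotes a valid DoubleQuotedCharacter, because each one is an EndOfFile and therefore an EndOfLine. The D sketch below contrasts that reading with a scanner that instead treats those two code points as the end of input, which is what the dmd output further down suggests happens. It is a hypothetical toy for illustration, not dmd's lexer: `findClosingQuote` and `isEndOfFileChar` are made-up names, and escape sequences are skipped naively (a backslash plus one byte).

```d
import std.stdio : writeln;

// Hypothetical helper: true for the two EndOfFile code points the spec lists
// (U+0000 NUL and U+001A SUB). "Physical end of the file" is modeled by
// simply running out of input.
bool isEndOfFileChar(dchar c) pure nothrow @safe @nogc
{
    return c == '\u0000' || c == '\u001A';
}

// Hypothetical toy scanner (not dmd's lexer). `rest` is the source text just
// after an opening '"'. Returns the index of the closing '"', or -1 if the
// literal is unterminated.
//   stopAtEofChar == false: NUL/SUB are ordinary DoubleQuotedCharacters,
//                           as the grammar allows.
//   stopAtEofChar == true:  NUL/SUB end the input, as dmd appears to do.
ptrdiff_t findClosingQuote(string rest, bool stopAtEofChar) pure nothrow @safe @nogc
{
    for (size_t i = 0; i < rest.length; ++i)
    {
        const char c = rest[i];
        if (stopAtEofChar && isEndOfFileChar(c))
            return -1;              // treated as end of file: unterminated
        if (c == '\\')
        {
            ++i;                    // naive: skip the single escaped byte
            continue;
        }
        if (c == '"')
            return cast(ptrdiff_t) i;
    }
    return -1;                      // physical end of input reached
}

void main()
{
    // Text following the opening quote in: string t = "<NUL>";
    immutable rest = "\u0000\";";
    writeln(findClosingQuote(rest, false)); // 1
    writeln(findClosingQuote(rest, true));  // -1
}
```

With `stopAtEofChar` set to `false` the literal closes at index 1; with `true` it is reported as unterminated, which corresponds to the "unterminated string constant" error dmd gives for the NUL and SUB versions.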

Hence this code

```d
void main ()
{
   version (NUL) enum frag = "\u0000";
   version (SOH) enum frag = "\u0001";
   version (SUB) enum frag = "\u001a";
   enum s = "string t = \"" ~ frag ~ "\";";
   pragma (msg, s);
   mixin (s);
}
```

should compile successfully for all three versions, but only the SOH version does:

```
$ dmd -version=NUL nul2
string t = "
nul2.d-mixin-8(8): Error: unterminated string constant starting at nul2.d-mixin-8(8)
nul2.d-mixin-8(8): Error: semicolon needed to end declaration of `t` instead of `End of File`
$ dmd -version=SOH nul2
string t = "";
$ dmd -version=SUB nul2
string t = "";
nul2.d-mixin-8(8): Error: unterminated string constant starting at nul2.d-mixin-8(8)
nul2.d-mixin-8(8): Error: semicolon needed to end declaration of `t` instead of `End of File`
```
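For contrast, the sketch below uses the same mixin pattern but spells the NUL as the escape sequence `\u0000` in the generated source instead of embedding a raw NUL character. Since the NUL then reaches the lexer as an EscapeSequence, this variant is expected to compile, which suggests the problem is limited to raw U+0000 (and U+001A) characters inside the literal. `makeString` is a hypothetical name used only for this sketch.

```d
import std.stdio : writeln;

// Same pattern as the report, but the generated source contains the six
// characters \u0000 (an EscapeSequence) instead of a raw NUL character.
string makeString()
{
    enum s = "string t = \"\\u0000\";";
    pragma (msg, s);   // prints at compile time: string t = "\u0000";
    mixin (s);
    return t;
}

void main()
{
    writeln(makeString().length); // 1: the string holds a single NUL
}
```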


[1] https://dlang.org/spec/lex.html#DoubleQuotedString
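[2] This is questionable, since Character is any Unicode character, which would
include `"` and `\` themselves. The definition at
https://en.cppreference.com/w/cpp/language/string_literal, which excludes the
double quote, backslash, and newline, seems to get it right.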

--
Nov 01 2023