digitalmars.D.bugs - [Issue 18532] New: Hex literals produce invalid strings

https://issues.dlang.org/show_bug.cgi?id=18532

          Issue ID: 18532
           Summary: Hex literals produce invalid strings
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: default_357-line yahoo.de

Hex literals let you declare strings that are invalid UTF-8. This violates the
docs as well as the type system.

"\xff" is an expression of type string. string is defined (
https://dlang.org/spec/arrays.html#strings ) to be in UTF-8 format.
Furthermore, string is an array of char, and chars are defined to be UTF-8
code units. 0xFF is not a valid UTF-8 code unit: that byte never occurs in
well-formed UTF-8.

The docs state that hex strings do not perform UTF-8 checking. The docs
accurately describe the code; the code is mistaken since it breaks the type.

Either the behavior of hex literals must be changed, or the definition of char
must be changed. As it stands, the documentation and the behavior contradict
each other.

Maybe hex literals can be ubyte[]?
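
A minimal sketch of the problem, assuming the compiler behavior described
above (the literal is accepted without UTF-8 checking). isValidUtf8 is a
hypothetical helper written for this example, wrapping std.utf.validate from
Phobos; the ubyte array at the end only illustrates the suggestion and is not
an existing hex-literal feature:

```d
import std.utf : UTFException, validate;

// Hypothetical helper (not part of Phobos): true if `s` is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Typed as string, yet it holds the single byte 0xFF.
    string s = "\xff";
    assert(s.length == 1);
    assert(cast(ubyte) s[0] == 0xFF);

    // Run-time validation rejects what the type claims is UTF-8.
    assert(!isValidUtf8(s));

    // For contrast, the code point U+00FF is encoded as two code units.
    assert(isValidUtf8("\u00ff"));
    assert("\u00ff".length == 2);

    // Raw bytes held in a type that makes no UTF-8 promise.
    immutable(ubyte)[] raw = [0xFF];
    assert(raw[0] == 0xFF);
}
```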

--
Feb 27 2018