
digitalmars.D.bugs - [Issue 18241] New: Missing characters from std.uni.unicode.Default_Ignorable_Code_Point

https://issues.dlang.org/show_bug.cgi?id=18241

          Issue ID: 18241
           Summary: Missing characters from
                    std.uni.unicode.Default_Ignorable_Code_Point
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: hsteoh quickfur.ath.cx

The set returned by unicode.Default_Ignorable_Code_Point is missing some
characters listed in:

    http://www.unicode.org/L2/L2002/02368-default-ignorable.pdf

where Default_Ignorable_Code_Point is defined as:

    Other_Default_Ignorable_Code_Point + (Cf + Cc + Cs - White_Space)

While characters in Other_Default_Ignorable_Code_Point seem to be included
correctly, two characters in Cf appear to be missing from the set:

- U+06DD
- U+070F

Furthermore, characters in (Cc - White_Space) are also missing:

- U+0000 to U+0008
- U+000E to U+001F
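A minimal reproduction sketch in D, assuming std.uni can look up the general categories by their short names (`unicode.Cf`, `unicode.Cc`, `unicode.Cs`) and using its set union (`|`) and difference (`-`) operators. It builds (Cf + Cc + Cs - White_Space) from the definition quoted above and prints whatever is absent from `unicode.Default_Ignorable_Code_Point`. Other_Default_Ignorable_Code_Point is left out because those characters already appear to be handled. The printed list depends on the Phobos version and the Unicode tables it ships, so no expected output is given. The only hard assertion is the stable fact that, below U+0020, Cc minus White_Space is exactly the two ranges listed above.

```d
// Sketch: list code points that the quoted definition says should be
// Default_Ignorable but that Phobos' set does not contain.
// Other_Default_Ignorable_Code_Point is omitted (reported as correct).
import std.stdio : writefln;
import std.uni : unicode;

void main()
{
    auto phobos = unicode.Default_Ignorable_Code_Point;

    // (Cf + Cc + Cs - White_Space), built with std.uni set union and difference
    auto expected = (unicode.Cf | unicode.Cc | unicode.Cs) - unicode.White_Space;

    // Sanity check: in U+0000..U+001F only Cc contributes, and removing
    // White_Space (U+0009..U+000D) leaves U+0000..U+0008 and U+000E..U+001F.
    foreach (uint c; 0 .. 0x20)
        assert(expected[c] == (c <= 0x08 || c >= 0x0E));

    // Whatever remains after subtracting Phobos' set is what it is missing
    auto missing = expected - phobos;
    foreach (iv; missing.byInterval)
        writefln("U+%04X..U+%04X", iv.a, iv.b - 1); // intervals are half-open [a, b)
}
```

Compiling and running this against a given Phobos release should show whether U+06DD, U+070F and the C0 control ranges are still reported as missing.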


(See also: PR #5, referencing the Unicode Standard section 5.22.)


Not sure if this is because these missing characters were added in a later
version of the Unicode standard than the one std.uni originally implemented.

--
Jan 15