digitalmars.D.bugs - [Issue 18844] New: std.utf.decode skips valid character on invalid


          Issue ID: 18844
           Summary: std.utf.decode skips valid character on invalid
                    multibyte sequence
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: default_357-line yahoo.de

When decoding an invalid UTF-8 string, such as cast(string) [cast(ubyte) 'ä',
't'], with Yes.useReplacementDchar, std.utf.decode advances the index past the
byte at which the multibyte sequence was found to be invalid, even if that byte
is itself a valid start of a new sequence. In this example decode advances the
index to 2, so the string decodes as "�" when it should decode as

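For comparison, here is a minimal sketch in Python (not Phobos code) of a decoder that stops before the first byte that cannot continue the current sequence, so that byte is decoded again on the next call. It assumes that cast(ubyte) 'ä' yields the byte 0xE4, which is a lead byte for a three-byte sequence. The names decode_one and decode_all are hypothetical, and the handling of overlong and surrogate encodings is deliberately simplified.

```python
REPLACEMENT = "\ufffd"

# Assumed byte values for the string in the report: 0xE4 followed by 't'.
data = bytes([0xE4, ord("t")])


def decode_one(buf, index):
    """Decode one code point at buf[index]; return (char, next_index).

    On an ill-formed sequence, return U+FFFD and leave next_index pointing
    at the first byte that was not a continuation byte, so that a valid
    character following the broken sequence is not skipped.
    """
    first = buf[index]
    if first < 0x80:
        return chr(first), index + 1
    if 0xC2 <= first <= 0xDF:
        need = 1
    elif 0xE0 <= first <= 0xEF:
        need = 2
    elif 0xF0 <= first <= 0xF4:
        need = 3
    else:
        # Stray continuation byte or invalid lead byte: consume only it.
        return REPLACEMENT, index + 1

    # Consume at most `need` continuation bytes (0x80..0xBF).
    end = index + 1
    while end < len(buf) and end - index <= need and 0x80 <= buf[end] <= 0xBF:
        end += 1

    if end - index != need + 1:
        # Sequence was cut short; `end` is the offending byte, not past it.
        return REPLACEMENT, end
    try:
        return buf[index:end].decode("utf-8"), end
    except UnicodeDecodeError:
        # Simplification: overlong or surrogate encodings are replaced whole.
        return REPLACEMENT, end


def decode_all(buf):
    """Decode a whole buffer by calling decode_one repeatedly."""
    out, index = [], 0
    while index < len(buf):
        char, index = decode_one(buf, index)
        out.append(char)
    return "".join(out)


print(decode_one(data, 0))    # index advances to 1, not 2
print(repr(decode_all(data)))
```

On the same two bytes, Python's built-in data.decode("utf-8", errors="replace") also emits one replacement character and then decodes the following byte, which makes it a convenient reference when checking the behaviour described above.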
May 09 2018