digitalmars.D.bugs - [Issue 22473] New: dmd foreach loops throw exceptions on invalid UTF

https://issues.dlang.org/show_bug.cgi?id=22473

          Issue ID: 22473
           Summary: dmd foreach loops throw exceptions on invalid UTF
                    sequences, use replacementDchar instead
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: druntime
          Assignee: nobody puremagic.com
          Reporter: bugzilla digitalmars.com

A simple foreach loop:

    void test(char[] a)
    {
        foreach (dchar c; a) { }
    }

will throw a UnicodeException if `a` is not a valid UTF-8 string. Instead, it
should replace each invalid sequence with replacementDchar.
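
A minimal sketch of how the current behaviour can be observed. The helper name `foreachThrows` is made up for illustration, and the loop variable is declared as `dchar` so that the loop actually decodes. The exception is caught as a plain `Exception` so the sketch does not depend on the exact exception class.

```d
// Hypothetical helper, not part of druntime: reports whether a
// decoding foreach over `a` throws.
bool foreachThrows(const(char)[] a)
{
    try
    {
        // A dchar loop variable over a char array makes foreach decode UTF-8.
        foreach (dchar c; a) { }
        return false;
    }
    catch (Exception e)
    {
        return true;
    }
}

unittest
{
    // 0x80 is a continuation byte with no lead byte, so this is malformed UTF-8.
    char[] bad = ['a', 'b', cast(char) 0x80];
    assert(!foreachThrows("hello"));
    assert(foreachThrows(bad));
}
```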

The foreach code is compiled to call _aApplycd1() in druntime/src/rt/aApply.d,
which calls decode() in druntime/src/core/internal/utf.d, which throws the
exception.

replacementDchar is defined in std.utf as `\uFFFD`.
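
For comparison, std.utf.byDchar already substitutes replacementDchar for malformed input by default. The sketch below shows roughly what the requested foreach behaviour would yield. `decodeLossy` is a hypothetical helper name, and the sketch appends to a GC array only to keep the example short.

```d
import std.utf : byDchar, replacementDchar;

// Hypothetical helper: decode UTF-8 into code points, with malformed
// input coming out as replacementDchar instead of an exception.
dchar[] decodeLossy(const(char)[] a)
{
    dchar[] result;
    foreach (dchar c; a.byDchar)
        result ~= c;
    return result;
}

unittest
{
    // A trailing stray continuation byte becomes one replacementDchar.
    char[] bad = ['a', 'b', cast(char) 0x80];
    assert(decodeLossy(bad) == ['a', 'b', replacementDchar]);
    assert(decodeLossy("hi") == "hi"d);
}
```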

The reason to make this change is that the current behavior has the same
problems autodecoding has: it can't be turned off, it throws, and it may
allocate with the GC. Oh, and it's slow.
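
As an illustration of what a non-throwing decode allows, the sketch below iterates with std.utf.byDchar inside a function marked nothrow and @nogc. `countReplaced` is a hypothetical name, and the sketch assumes byDchar infers both attributes for a char array, which is what it was designed for. Note that a genuine U+FFFD in the input is counted as well.

```d
import std.utf : byDchar, replacementDchar;

// Hypothetical helper: counts the code points that decode to
// replacementDchar, without throwing or touching the GC.
size_t countReplaced(const(char)[] a) nothrow @nogc
{
    size_t n;
    foreach (dchar c; a.byDchar)
    {
        if (c == replacementDchar)
            ++n;
    }
    return n;
}

unittest
{
    char[] bad = ['a', 'b', cast(char) 0x80];
    assert(countReplaced("hello") == 0);
    assert(countReplaced(bad) == 1);
}
```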

--
Nov 03 2021