digitalmars.D.bugs - [Issue 18462] New: std.regex.matchFirst doesn't work well with characters from extended ASCII

https://issues.dlang.org/show_bug.cgi?id=18462

          Issue ID: 18462
           Summary: std.regex.matchFirst doesn't work well with characters
                    from extended ASCII
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: greensunny12 gmail.com

---
void main(string[] args)
{
        import std.string, std.stdio, std.regex;
        static ctr = regex(`^`);

        // unicode works
        string line = "µ";
        line.representation.writeln; // [194, 181]

        // but not extended ASCII
        line = "\xB5"; // [181]
        line.writeln; // works
        auto m = line.matchFirst(ctr); // throws std.utf.UTFException
}
---

The error message is:

```
std.utf.UTFException /usr/include/dlang/dmd/std/utf.d(1380): Invalid UTF-8
sequence (at index 1)
----------------
??:? pure dchar std.utf.decodeImpl!(true, 0, const(char)[]).decodeImpl(ref
const(char)[], ref ulong) [0x8884beda]
??:? pure @trusted dchar std.utf.decode!(0, const(char)[]).decode(ref
const(char)[], ref ulong) [0x8884be5d]
??:? pure @safe bool std.regex.internal.ir.Input!(char).Input.nextChar(ref
dchar, ref ulong) [0x8885e318]
```
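
A possible workaround until this is addressed, sketched under the assumption that the stray bytes are ISO-8859-1 (Latin-1): validate the input with `std.utf.validate` and, if that throws, transcode it to UTF-8 with `std.encoding.transcode` before handing it to `matchFirst`. The helper name `toValidUtf8` is hypothetical and not part of Phobos, and the Latin-1 guess may well be wrong for other inputs.

```d
import std.encoding : Latin1String, transcode;
import std.regex : matchFirst, regex;
import std.stdio : writeln;
import std.string : representation;
import std.utf : UTFException, validate;

// Hypothetical helper, not part of Phobos: returns `raw` unchanged when it
// is already valid UTF-8, otherwise reinterprets it as ISO-8859-1 (an
// assumption about the data) and transcodes it to UTF-8.
string toValidUtf8(string raw)
{
    try
    {
        validate(raw); // throws UTFException on malformed UTF-8
        return raw;
    }
    catch (UTFException e)
    {
        string fixed;
        transcode(cast(Latin1String) raw.representation, fixed);
        return fixed;
    }
}

void main()
{
    auto re = regex(`^`);

    // The lone byte 0xB5 from the report, converted before matching.
    string line = toValidUtf8("\xB5");
    line.representation.writeln; // [194, 181], the UTF-8 encoding of U+00B5

    auto m = line.matchFirst(re); // no UTFException on the converted string
    writeln(m.empty);
}
```

This only sidesteps the exception on the caller's side; it does not change how `std.regex` decodes its input.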

--
Feb 18 2018