digitalmars.D.bugs - [Issue 18462] New: std.regex.matchFirst doesn't work well with
- d-bugmail puremagic.com (41/41) Feb 18 2018 https://issues.dlang.org/show_bug.cgi?id=18462
https://issues.dlang.org/show_bug.cgi?id=18462

Issue ID: 18462
Summary: std.regex.matchFirst doesn't work well with characters from extended ASCII
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: phobos
Assignee: nobody puremagic.com
Reporter: greensunny12 gmail.com

---
void main(string[] args)
{
    import std.string, std.stdio, std.regex;
    static ctr = regex(`^`);

    // unicode works
    string line = "µ";
    line.representation.writeln; // [194, 181]

    // but not extended ASCII
    line = "\xB5"; // [181]
    line.writeln; // works
    auto m = line.matchFirst(ctr);
}
---

The error message is:

```
std.utf.UTFException@/usr/include/dlang/dmd/std/utf.d(1380): Invalid UTF-8 sequence (at index 1)
----------------
??:? pure dchar std.utf.decodeImpl!(true, 0, const(char)[]).decodeImpl(ref const(char)[], ref ulong) [0x8884beda]
??:? pure @trusted dchar std.utf.decode!(0, const(char)[]).decode(ref const(char)[], ref ulong) [0x8884be5d]
??:? pure @safe bool std.regex.internal.ir.Input!(char).Input.nextChar(ref dchar, ref ulong) [0x8885e318]
```

--
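For readers hitting the same exception, here is a minimal workaround sketch. It is not the resolution of this issue. It assumes the input bytes are really Latin-1 (ISO-8859-1), which the report does not state; `isValidUtf8` and `latin1ToUtf8` are hypothetical helper names introduced only for illustration. The lone byte `0xB5` is a UTF-8 continuation byte with no lead byte, so `std.utf.validate` rejects it; re-encoding each byte as its own code point yields a string that the regex engine can decode.

```d
import std.regex : matchFirst, regex;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical helper: treats every byte as a Latin-1 code point
/// (an assumption about the data) and re-encodes it as UTF-8.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
        result ~= cast(dchar) b; // appending a dchar encodes it as UTF-8
    return result;
}

void main()
{
    import std.string : representation;

    auto re = regex(`^`);
    string line = "\xB5"; // a single byte, not valid UTF-8

    assert(!isValidUtf8(line));

    // Re-encode before handing the text to std.regex.
    string fixed = isValidUtf8(line) ? line : latin1ToUtf8(line.representation);
    assert(fixed == "\u00B5");

    auto m = fixed.matchFirst(re);
    assert(!m.empty); // `^` matches at the start of the re-encoded string
}
```

If the bytes come from some other 8-bit encoding, the per-byte cast would produce the wrong characters, so the conversion step has to match the actual source encoding.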
Feb 18 2018