
digitalmars.D.bugs - [Issue 395] New: std.regexp incorrectly handles UTF text

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=395

           Summary: std.regexp incorrectly handles UTF text
           Product: D
           Version: unspecified
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: ddparnell bigpond.com


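It seems that the std.regexp module doesn't correctly handle wildcard
matching against non-ASCII text. In the test below, searching for the literal
substring works, but the same search with its two non-ASCII characters
replaced by '.' wildcards fails. The equivalent wildcard pattern works on
ASCII input.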

 import std.stdio;
 import std.regexp;
 import std.utf;
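 // Checks that both strings are valid UTF-8, prints the sample's raw bytes,
 // then prints the pattern's bytes and the byte index that std.regexp.find()
 // returns for it (-1 when there is no match).
 void test(char[] sample, char[] pat)
 {
    int pos;
    validate(sample);   // throws if sample is not valid UTF-8
    validate(pat);
    writefln("sample = %s", cast(ubyte[])sample);
    pos = find(sample, pat);
    writefln("Where = %s %s", cast(ubyte[])pat, pos);
 }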
 void main()
 {
    test("\u3026a\u2021\u5004b\u4011", "a\u2021\u5004b"); // works
    test("\u3026a\u2021\u5004b\u4011", "a..b"); // fails

    test("1a23b4", "a23b"); // works
    test("1a23b4", "a..b"); // works

 }
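
For illustration, here is a small sketch in present-day D2 rather than the 2006 compiler used in the report. std.regexp has since been removed from Phobos in favour of std.regex, whose '.' matches one code point. The sketch shows that the substring "a\u2021\u5004b" occupies 8 UTF-8 bytes but only 4 code points. A '.' that consumed single bytes rather than whole characters could therefore not match "a..b" here. That byte-level explanation is an assumption, since the report does not state the cause of the failure.

```d
// Sketch (assumes D2 Phobos): contrasts UTF-8 byte length with code-point
// count for the report's sample, then matches "a..b" with std.regex, the
// successor to std.regexp, where '.' matches a whole code point.
import std.stdio : writefln;
import std.utf : count;
import std.regex : regex, matchFirst;

void main()
{
    string sub = "a\u2021\u5004b";

    // 'a' and 'b' take 1 byte each; U+2021 and U+5004 take 3 bytes each.
    assert(sub.length == 8);   // UTF-8 code units (bytes)
    assert(count(sub) == 4);   // code points

    // With code-point-aware '.', the two dots span the two 3-byte characters.
    auto m = matchFirst("\u3026a\u2021\u5004b\u4011", regex("a..b"));
    assert(!m.empty);
    assert(m.hit == sub);

    writefln("bytes = %s, code points = %s, match = %s",
             sub.length, count(sub), m.hit);
}
```

Under a purely byte-oriented '.', the same two characters would need six dots ("a......b") to be spanned, which is not what a user writing the pattern expects.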


-- 
Oct 02 2006
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=395


bugzilla digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED





Fixed in DMD 0.169, but more UTF bugs probably remain.


-- 
Oct 10 2006