digitalmars.D.bugs - [Issue 1395] New: Regex does not support range [something to \*]
- d-bugmail puremagic.com (171/171) Aug 02 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1395
- d-bugmail puremagic.com (11/11) Aug 02 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1395
- d-bugmail puremagic.com (10/10) Oct 11 2009 http://d.puremagic.com/issues/show_bug.cgi?id=1395
- d-bugmail puremagic.com (12/12) Sep 25 2010 http://d.puremagic.com/issues/show_bug.cgi?id=1395
http://d.puremagic.com/issues/show_bug.cgi?id=1395 Summary: Regex does not support range [something to \*] Product: D Version: 1.021 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Phobos AssignedTo: bugzilla digitalmars.com ReportedBy: alan akbkhome.com test case: r = new Regex("^(#)?([\w-\*]+)"); Error: inverted range in character class w > * in ^(#)?([\w-\*]+) (error expanded a bit here..) the problem appears to be the '*' char as a [...-*] Along with having \w as the start char (which does resolve well in the check in regex.) This is a fix for parseRange() that seems to work. for (;;) { int lastStart = 0; if (p == pattern.length) goto Lerr; switch (pattern[p]) { case ']': switch (rs) { case RS.dash: r.setbit2('-'); case RS.rliteral: r.setbit2(c); break; case RS.start: break; default: assert(0); } p++; break; case '\\': p++; r.setbitmax(cmax); if (p == pattern.length) goto Lerr; switch (pattern[p]) { case 'd': for (i = '0'; i <= '9'; i++) r.bits[i] = 1; lastStart = '0'; goto Lrs; case 'D': for (i = 1; i < '0'; i++) r.bits[i] = 1; for (i = '9' + 1; i <= cmax; i++) r.bits[i] = 1; lastStart = 1; goto Lrs; case 's': lastStart = -1; for (i = 0; i <= cmax; i++) if (isspace(i)) { r.bits[i] = 1; lastStart = lastStart > -1 ? lastStart : i; } goto Lrs; case 'S': lastStart = -1; for (i = 1; i <= cmax; i++) if (!isspace(i)) { r.bits[i] = 1; lastStart = lastStart > -1 ? lastStart : i; } goto Lrs; case 'w': lastStart = -1; for (i = 1; i <= cmax; i++) if (isword(i)) { r.bits[i] = 1; lastStart = lastStart > -1 ? lastStart : i; } goto Lrs; case 'W': for (i = 1; i <= cmax; i++) if (!isword(i)) { r.bits[i] = 1; lastStart = lastStart > -1 ? lastStart : i; } goto Lrs; Lrs: switch (rs) { case RS.dash: r.setbit2('-'); case RS.rliteral: r.setbit2(c); break; default: break; } rs = RS.start; continue; default: break; } c2 = escape(); goto Lrange; case '-': p++; if (rs == RS.start) goto Lrange; else if (rs == RS.rliteral) rs = RS.dash; else if (rs == RS.dash) { r.setbit2(c); r.setbit2('-'); rs = RS.start; } continue; default: lastStart = c; c2 = pattern[p]; p++; Lrange: switch (rs) { case RS.rliteral: r.setbit2(c); case RS.start: c = c2; rs = RS.rliteral; break; case RS.dash: if (c2 == '*') { // [a-\*] for (i = lastStart; i <= cmax; i++) r.bits[i] = 1; r.setbitmax(cmax); rs = RS.start; break; } else if (c > c2) { error("inverted range in character class " ~ cast(char)c ~ " > " ~ cast(char)c2 ~ " in " ~ pattern ~ " \n" ); return 0; } r.setbitmax(c2); //printf("c = %x, c2 = %x\n",c,c2); for (; c <= c2; c++) r.bits[c] = 1; rs = RS.start; break; default: assert(0); } continue; } break; } --
Aug 02 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1395 ------- Comment #1 from alan akbkhome.com 2007-08-03 00:05 ------- After looking into this in more detail - the fix is not valid. The two regex's that are causing me problems are: [\w-\.] and [\w-\*] I think the parser needs to intepret that as [\w\*-] [\w\.-] --
Aug 02 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1395 Andrei Alexandrescu <andrei metalanguage.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED CC| |andrei metalanguage.com AssignedTo|nobody puremagic.com |andrei metalanguage.com -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 11 2009
http://d.puremagic.com/issues/show_bug.cgi?id=1395 Andrei Alexandrescu <andrei metalanguage.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |INVALID --- Comment #2 from Andrei Alexandrescu <andrei metalanguage.com> 2010-09-25 13:57:16 PDT --- I didn't find \* listed as a valid escape inside a character class in EcmaScript regexes (http://bclary.com/2004/11/07/#a-15.10.1). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 25 2010