digitalmars.D.bugs - [Issue 11765] New: std.regex: Negation of character class is not applied to base class first
- d-bugmail puremagic.com (45/45) Dec 18 2013 https://d.puremagic.com/issues/show_bug.cgi?id=11765
- d-bugmail puremagic.com (10/15) Dec 18 2013 https://d.puremagic.com/issues/show_bug.cgi?id=11765
- d-bugmail puremagic.com (13/16) Dec 18 2013 https://d.puremagic.com/issues/show_bug.cgi?id=11765
- d-bugmail puremagic.com (9/9) Dec 19 2013 https://d.puremagic.com/issues/show_bug.cgi?id=11765
- d-bugmail puremagic.com (9/22) Dec 19 2013 https://d.puremagic.com/issues/show_bug.cgi?id=11765
- d-bugmail puremagic.com (8/14) Dec 19 2013 https://d.puremagic.com/issues/show_bug.cgi?id=11765
- d-bugmail puremagic.com (7/17) Dec 20 2013 https://d.puremagic.com/issues/show_bug.cgi?id=11765
- d-bugmail puremagic.com (26/30) Jan 10 2014 https://d.puremagic.com/issues/show_bug.cgi?id=11765
https://d.puremagic.com/issues/show_bug.cgi?id=11765

           Summary: std.regex: Negation of character class is not applied to base class first
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: andrej.mitrovich gmail.com

11:37:06 PST ---
-----
import std.regex;
import std.stdio;

void main()
{
    // expected: [["3"]] - but got: [["2"]]
    writeln("123456789".match("[^1--[2]]"));

    // the above is *currently* equivalent to:
    writeln("123456789".match("[^[1--[2]]]"));
    // which means: subtract "1 - 2" (equals 1),
    // and then negate it (so "2" will match first in the string)

    // but I expect the first case to be equivalent to:
    writeln("123456789".match("[[^1]--[2]]"));
    // which means: negate 1 (for discussion assume 2-9 range),
    // subtract 2 and you get 3-9, which means "3" will match first.
}
-----

I'm not sure whether this is just how ECMAScript does it (since std.regex references it), but e.g. .NET does negation on the base class first (the "1" class above) and *then* it does subtraction with another class.

You can test this behavior here: http://refiddle.com/

Using .net syntax:

[^01-[2]]
0123456789

It matches "3".

Either way, if this report is invalid (e.g. expected behavior), then I think we should update the docs so they state the precedence of the negation.

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 18 2013
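The two readings of "[^1--[2]]" described in the report can be modelled without std.regex at all, using plain membership predicates. This is only an illustrative sketch: `readingA`, `readingB` and `firstAccepted` are hypothetical names introduced here, and the code says nothing about how std.regex is implemented internally.

```d
import std.stdio : writeln;

/// Reading A: negate the whole difference, not({1} minus {2}), i.e. not {1}.
bool readingA(dchar c) { return !(c == '1' && c != '2'); }

/// Reading B: negate the base class first, (not {1}) minus {2}.
bool readingB(dchar c) { return c != '1' && c != '2'; }

/// Hypothetical helper: first character of `text` accepted by `inClass`.
dchar firstAccepted(string text, bool function(dchar) inClass)
{
    foreach (dchar c; text)
        if (inClass(c))
            return c;
    return dchar.init; // nothing matched
}

void main()
{
    writeln(firstAccepted("123456789", &readingA)); // 2
    writeln(firstAccepted("123456789", &readingB)); // 3
}
```

Reading A reproduces the reported result ("2"), while reading B gives the result the reporter expects ("3").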
https://d.puremagic.com/issues/show_bug.cgi?id=11765

11:38:32 PST ---
> Using .net syntax:
> [^01-[2]]
> 0123456789
> It matches "3".

Never mind the leading zero, I meant to use this simpler example:

[^1-[2]]
123456789

It matches "3".
Dec 18 2013
https://d.puremagic.com/issues/show_bug.cgi?id=11765

Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com

11:56:14 PST ---
> I'm not sure whether this is just how ECMAScript does it (since std.regex
> references it), but e.g. .NET does negation on the base class first (The "1"
> class above) and *then* it does subtraction with another class.

ECMAScript doesn't even have it AFAIK ;)

I think you (and .NET) are right - the priority of the unary '^' operator should be higher than that of any of the binary ops.
Dec 18 2013
https://d.puremagic.com/issues/show_bug.cgi?id=11765

04:55:48 PST ---
Is the following sample caused by the same issue?

writeln("abcdefghijklmnopqrstuvwxyz".match("[a-z&&[^aeiuo]]"));

It writes [["a"]]; I was expecting the first non-vowel, [["b"]]. It returns "b" in Ruby. As for .NET, I haven't found the syntax it uses.
Dec 19 2013
https://d.puremagic.com/issues/show_bug.cgi?id=11765

10:27:35 PST ---
>> Using .net syntax:
>> [^01-[2]]
>> 0123456789
>> It matches "3".
>
> Nevermind the leading zero, I meant to use this simpler example:
> [^1-[2]]
> 123456789
> It matches "3".

Actually, because of the single dash it works as if all is fine...
This one is a good case:

[^1--[2]]
Dec 19 2013
https://d.puremagic.com/issues/show_bug.cgi?id=11765

10:31:23 PST ---
> Is the following sample caused by the same issue?
>
> writeln("abcdefghijklmnopqrstuvwxyz".match("[a-z&&[^aeiuo]]"));
>
> It writes [["a"]], I was expecting the first non-vowel [["b"]]. It returns "b"
> in Ruby, as for .NET I haven't found the syntax it uses.

From the look of it - an unrelated bug in set intersection. Better split it off as a new issue.
Dec 19 2013
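For reference, the result the poster expects from "[a-z&&[^aeiuo]]" follows from reading "&&" as a plain set intersection. The sketch below models that reading with ordinary predicates rather than std.regex; `isLowerNonVowel` and `firstLowerNonVowel` are hypothetical names used only for illustration.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

/// Membership test for [a-z&&[^aeiuo]] read as a set intersection.
bool isLowerNonVowel(dchar c)
{
    bool inBase = c >= 'a' && c <= 'z';     // a-z
    bool inNegated = !"aeiuo".canFind(c);   // [^aeiuo]
    return inBase && inNegated;             // &&
}

/// Hypothetical helper: first character accepted by the class, or dchar.init.
dchar firstLowerNonVowel(string text)
{
    foreach (dchar c; text)
        if (isLowerNonVowel(c))
            return c;
    return dchar.init;
}

void main()
{
    writeln(firstLowerNonVowel("abcdefghijklmnopqrstuvwxyz")); // b
}
```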
https://d.puremagic.com/issues/show_bug.cgi?id=11765

00:51:17 PST ---
>> Is the following sample caused by the same issue?
>>
>> writeln("abcdefghijklmnopqrstuvwxyz".match("[a-z&&[^aeiuo]]"));
>>
>> It writes [["a"]], I was expecting the first non-vowel [["b"]]. It returns "b"
>> in Ruby, as for .NET I haven't found the syntax it uses.
>
> From the look of it - an unrelated bug in set intersection. Better split it off
> as a new issue.

Filed as Issue 11784.
Dec 20 2013
https://d.puremagic.com/issues/show_bug.cgi?id=11765

12:24:42 PST ---
Ruby makes me nervous:

print /[^abc[e-f]&&[ybc]]/.match('~haystack')

Prints '~', meaning that the ^ operator has _lower_ priority than '&&'. I'm surprised, but it's the precedent.

And indeed the following reports an empty set and warnings about '-' without escape, i.e. '--' is not supported...

print /[^1--[2]]/.match("0123456789")

re.rb:2: warning: character class has '-' without escape: /[^2--[1]]/
re.rb:2: empty range in char class: /[^2--[1]]/

> [^1-[2]]
> 123456789
> It matches "3".

And .NET is disappointing: [^[2]-1] doesn't match anything. They somehow special-cased only the form [..-[set]] and arbitrary nesting of it.

So we have no good precedents. My thoughts are to make it a proper operator precedence grammar with priorities:

0 - implicit union (pieces that stand together, evaluated first)
1 - ^ (negation)
2 - &&
3 - --
4 - || (explicit union, evaluated last)
Jan 10 2014
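The priority list proposed in the last message can be sketched as a small recursive-descent evaluator with one parsing level per priority. This is a toy model under stated assumptions, not the std.regex parser: it handles ASCII only, has no escapes, and the names `Set`, `Parser`, `evalClass` and `firstMatch` are hypothetical.

```d
import std.algorithm.searching : startsWith;
import std.exception : enforce;
import std.stdio : writeln;

alias Set = bool[128]; // assumption: ASCII-only universe

struct Parser
{
    string src;
    size_t pos;

    bool eat(string tok)
    {
        if (!src[pos .. $].startsWith(tok)) return false;
        pos += tok.length;
        return true;
    }

    bool atOperator()
    {
        auto rest = src[pos .. $];
        return rest.startsWith("]") || rest.startsWith("&&")
            || rest.startsWith("--") || rest.startsWith("||");
    }

    Set parseClass() // '[' expression ']'
    {
        enforce(eat("["), "expected '['");
        Set s = parseOr();
        enforce(eat("]"), "expected ']'");
        return s;
    }

    Set parseOr() // priority 4: explicit union, evaluated last
    {
        Set s = parseDiff();
        while (eat("||")) { Set r = parseDiff(); foreach (i, ref b; s) b = b || r[i]; }
        return s;
    }

    Set parseDiff() // priority 3: set difference
    {
        Set s = parseAnd();
        while (eat("--")) { Set r = parseAnd(); foreach (i, ref b; s) b = b && !r[i]; }
        return s;
    }

    Set parseAnd() // priority 2: intersection
    {
        Set s = parseNeg();
        while (eat("&&")) { Set r = parseNeg(); foreach (i, ref b; s) b = b && r[i]; }
        return s;
    }

    Set parseNeg() // priority 1: negation of the adjacent run
    {
        if (!eat("^")) return parseRun();
        Set s = parseNeg();
        foreach (ref b; s) b = !b;
        return s;
    }

    Set parseRun() // priority 0: implicit union, evaluated first
    {
        Set s; // starts empty (all false)
        while (pos < src.length && !atOperator())
        {
            if (src[pos] == '[')
            {
                Set r = parseClass();
                foreach (i, ref b; s) b = b || r[i];
                continue;
            }
            char lo = src[pos++];
            char hi = lo;
            // a single '-' followed by a plain character forms a range like a-z
            if (pos + 1 < src.length && src[pos] == '-' && src[pos + 1] != '-'
                && src[pos + 1] != '[' && src[pos + 1] != ']')
            {
                hi = src[pos + 1];
                pos += 2;
            }
            enforce(lo <= hi && hi < 128, "ASCII ranges only");
            foreach (int c; lo .. hi + 1) s[c] = true;
        }
        return s;
    }
}

/// Evaluates a bracketed class to the set of ASCII characters it accepts.
Set evalClass(string pattern)
{
    auto p = Parser(pattern);
    Set s = p.parseClass();
    enforce(p.pos == pattern.length, "trailing input");
    return s;
}

/// First character of `text` in the class, or char.init if there is none.
char firstMatch(string text, string pattern)
{
    Set s = evalClass(pattern);
    foreach (char c; text)
        if (c < 128 && s[c]) return c;
    return char.init;
}

void main()
{
    writeln(firstMatch("123456789", "[^1--[2]]"));          // 3
    writeln(firstMatch("123456789", "[^[1--[2]]]"));        // 2
    writeln(firstMatch("~haystack", "[^abc[e-f]&&[ybc]]")); // y
}
```

Under these priorities "[^1--[2]]" yields "3", as the original report expects. The Ruby example yields "y" rather than the "~" that Ruby prints, because here '^' binds tighter than '&&'.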