digitalmars.D.bugs - [Issue 15773] New: D's treatment of whitespace in character classes

https://issues.dlang.org/show_bug.cgi?id=15773

          Issue ID: 15773
           Summary: D's treatment of whitespace in character classes in
                    free-form regexes is not the same as Perl's
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: minor
          Priority: P1
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: d20160306.20.mlaker spamgourmet.com

In Perl, whitespace in a character class is always significant, even in /x
extended mode:

msl james:~$ perl -wE 'say "Matched" if "a b" =~ /[c d]/'
Matched
msl james:~$ perl -wE 'say "Matched" if "a b" =~ /[c d]/x'
Matched
msl james:~$

In "x" free-form mode, D's std.regex ignores whitespace even inside a character
class:

msl james:~$ rdmd --eval='auto rx = regex("[c d]", ""); "a b".matchFirst(rx).writeln'
[" "]
msl james:~$ rdmd --eval='auto rx = regex("[c d]", "x"); "a b".matchFirst(rx).writeln'
[]
msl james:~$ rdmd --eval='auto rx = ctRegex!("[c d]", ""); "a b".matchFirst(rx).writeln'
[" "]
msl james:~$ rdmd --eval='auto rx = ctRegex!("[c d]", "x"); "a b".matchFirst(rx).writeln'
[]
msl james:~$

I wasted an hour of debugging time because I didn't expect this difference: I
thought whitespace would always be significant inside a character class.
Perhaps other developers will have the same expectation that I had. I don't
suggest that we change the behaviour of std.regex, because that would break too
much existing code, but could we explicitly mention D's behaviour in the docs?
Many thanks.

--
Mar 06 2016