digitalmars.D - Problem with RegExp

reply Matthew <matthewcsims gmail.com> writes:
I was playing around with RegExp and noticed it is not working like I think it
should be working. This is the same with both 1.0 and 2.0.

import std.stdio;
import std.regexp;

void main (char [][] args) {
    string text = "Why doesn't it find the sssss's?";

    RegExp pattern = new RegExp(r"[^\s]+");
    // Notice the escape code in the expression.

    RegExp list = pattern.search(text);

    foreach(m; list) {
        writefln(m.match(0));
    }

}

The regular expression should match one or more non-whitespace characters. On
my computer the whitespace characters don't match, but neither do lowercase s's.
Interestingly enough, if I try the following:

RegExp pattern = new RegExp(r"[^\W]+");

I get the exact same behavior, except now capital W's aren't matched instead of
lowercase s's. I don't know if this is a bug or not, as I've never used the RegExp
class before, but I wonder if it does this for everyone?
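
For reference, a minimal sketch of the behavior described as expected. It assumes a Phobos release that ships std.regex (the module that later replaced std.regexp) rather than the RegExp class used in the post, and the helper name tokens is made up for illustration.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

/// Hypothetical helper: collects every non-overlapping match of `pattern` in `text`.
string[] tokens(string text, string pattern)
{
    string[] result;
    foreach (m; matchAll(text, regex(pattern)))
        result ~= m.hit; // m.hit is the full text of the current match
    return result;
}

void main()
{
    // \s inside a negated class should exclude whitespace only,
    // so the runs of lowercase s must survive intact.
    foreach (t; tokens("Why doesn't it find the sssss's?", r"[^\s]+"))
        writeln(t);
}
```

With a correct engine this should print the six whitespace-separated words, the last one being sssss's? in one piece.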
Jan 02 2008
next sibling parent reply Russell Lewis <webmaster villagersonline.com> writes:
Do you need a double-backslash?
Jan 02 2008
parent Matthew <matthewcsims gmail.com> writes:
Russell Lewis Wrote:

Do you need a double-backslash?

I did until I put the r before the quote, as in r"...". I still got the same behavior, though.
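
As a quick illustration of the point about the r prefix: in D, an r"..." literal is a WYSIWYG (raw) string, so a single backslash there denotes the same text as a doubled backslash in an ordinary literal. A minimal sketch:

```d
import std.stdio : writeln;

void main()
{
    // WYSIWYG (raw) string: the backslash is kept literally.
    string raw = r"[^\s]+";

    // Ordinary string: the backslash itself has to be escaped.
    string escaped = "[^\\s]+";

    // Both literals hand the regex engine the same six characters.
    assert(raw == escaped);
    writeln(raw);
}
```

So the escaping style should not be what causes the difference in matching here.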
Jan 02 2008
prev sibling next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
I think it's a bug: \s matches invisible characters AND the character s. [^\s] seems to be interpreted like [^s\s].
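
A rough way to check that hypothesis is to write the suspected class out by hand and compare the output with what the buggy RegExp printed. The sketch below assumes a Phobos release with std.regex (not the std.regexp module under discussion); the helper name tokens is hypothetical.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

/// Hypothetical helper: collects every non-overlapping match of `pattern` in `text`.
string[] tokens(string text, string pattern)
{
    string[] result;
    foreach (m; matchAll(text, regex(pattern)))
        result ~= m.hit;
    return result;
}

void main()
{
    string text = "Why doesn't it find the sssss's?";

    // What [^\s]+ is supposed to mean: runs of non-whitespace.
    writeln(tokens(text, r"[^\s]+"));

    // The suspected misreading: whitespace AND the letter s are excluded,
    // so words are split wherever an s occurs.
    writeln(tokens(text, r"[^s\s]+"));
}
```

If the second line matches the output of the original program, that would support the [^s\s] reading.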
Jan 02 2008
next sibling parent Matthew <matthewcsims gmail.com> writes:
dennis luehring Wrote:

I think it's a bug: \s matches invisible characters AND the character s. [^\s] seems to be interpreted like [^s\s].

I didn't want to say it was a bug, because whenever I do that I get jinxed and it ends up being my own code, but equivalent code in C# doesn't seem to demonstrate the problem.

using System;
using System.Text.RegularExpressions;

class MyClass {
    public static void Main (String [] args) {
        string text = "Does it find the ssssss's?";

        // Verbatim string (@"...") so that \s reaches the regex engine unchanged.
        Regex pattern = new Regex(@"[^\s]+");

        foreach (Match m in pattern.Matches(text)) {
            Console.WriteLine(m);
        }
    }
}

Now I'm going to continue writing my new D program, because if I have to write public static void Main one more time I think I'm just gonna snap.
Jan 02 2008
prev sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Maybe we can use the regex test suite from the Perl source:

\t\op\regexp.t - test program
\t\op\re_tests - test cases
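
The general idea of a table-driven regex test can be sketched in a few lines. This is only an illustration, not the format of Perl's re_tests file: the Case record, the passes helper, and the sample rows are all hypothetical, and the code assumes a Phobos release with std.regex.

```d
import std.regex : matchFirst, regex;
import std.stdio : writefln;

/// Hypothetical test-case record: a pattern, a subject string, and the
/// expected first match (an empty `expected` means "no match expected").
struct Case
{
    string pattern;
    string subject;
    string expected;
}

/// Returns true when the first match of `c.pattern` in `c.subject`
/// equals `c.expected`.
bool passes(Case c)
{
    auto m = matchFirst(c.subject, regex(c.pattern));
    return m.empty ? c.expected.length == 0 : m.hit == c.expected;
}

void main()
{
    // Sample rows built around the two classes discussed in this thread.
    auto cases = [
        Case(r"[^\s]+", "sssss's?", "sssss's?"),
        Case(r"[^\W]+", "WWW now", "WWW"),
        Case(r"\s+", "nospace", ""),
    ];

    foreach (c; cases)
        writefln("%s on %s: %s", c.pattern, c.subject, passes(c) ? "ok" : "FAIL");
}
```

A real port would read the rows from the Perl test file instead of hard-coding them.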
Jan 02 2008
parent Matthew <matthewcsims gmail.com> writes:
dennis luehring Wrote:

 Maybe we can use the regex test suite from the Perl source:
 
 \t\op\regexp.t - test program
 \t\op\re_tests - test cases
 

I don't have Perl or its sources on my computer. It's a Windows machine and I just got it, so it really doesn't have anything on it. But apparently, from the bug reports, RegExp is pretty buggy. Apparently both the Phobos and Tango regex implementations are pretty buggy, so the test suite would probably fail in many places. If I were ambitious I might try to write another regex lib; however, I'm sure someone else is already working on this. Perhaps, if it were really that important, a well-tested C or C++ regex library could just be wrapped by D. That would save a lot of time.
Jan 02 2008
prev sibling parent reply Tom <tom nospam.com> writes:
Matthew wrote:
 I was playing around with RegExp and noticed it is not working like I think it
should be working. This is the same with both 1.0 and 2.0.
 ...

RegExp has been broken for quite some time now. Search Bugzilla and you'll see. -- Tom;
Jan 02 2008
parent Matthew <matthewcsims gmail.com> writes:
Tom Wrote:

RegExp has been broken for quite some time now. Search Bugzilla and you'll see. -- Tom;

So I've noticed since posting.
Jan 02 2008