digitalmars.D - Regular expression woes
- just jeff <jeffrparsons optusnet.com.au> Jan 17 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 17 2007
- just jeff <psychobrat gmail.com> Jan 17 2007
- just jeff <psychobrat gmail.com> Jan 17 2007
- "Lionello Lunesu" <lionello lunesu.remove.com> Jan 17 2007
- just jeff <jeffrparsons optusnet.com.au> Jan 18 2007
- "Lionello Lunesu" <lionello lunesu.remove.com> Jan 19 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Jan 20 2007
- just jeff <jeffrparsons optusnet.com.au> Jan 23 2007
Is this a bug, or am I misunderstanding something? The code...
# import std.stdio;
# import std.regexp;
#
# int main(char[][] args) {
#     char[] string = "xfooxxxxxfoox";
#     writefln("Greedy matching:");
#     foreach (RegExp match; RegExp("x.*x").search(string))
#         writefln("%s[%s]%s", match.pre, match.match(0), match.post);
#     writefln("Conservative matching:");
#     foreach (RegExp match; RegExp("x.*?x").search(string))
#         writefln("%s[%s]%s", match.pre, match.match(0), match.post);
#     return 0;
# }
...compiled under GDC 0.21 (using the Phobos version that ships
therewith) yields:
Greedy matching:
[xfooxxxxx]foox
Conservative matching:
[xfoox]xxxxfoox
xfoox[xx]xxfoox
xfooxxx[xx]foox
The latter part (conservative matching) makes plenty of sense to me, but
I thought the former should have matched the whole string (i.e. read
"[xfooxxxxxfoox]").
Is this behaviour intended?
Thanks. :)
Jan 17 2007
When searching for x.*x in xfooxxxxxfoox, Visual Studio 2005 matches the entire string:
[xfooxxxxxfoox]
Also, the following two appear to be missing from "conservative matching":
xfoo[xx]xxxfoox
xfooxxxx[xfoox]
L.
Jan 17 2007
Lionello Lunesu wrote:
> (...) Also, the following two appear to be missing from "conservative matching":
> xfoo[xx]xxxfoox
> xfooxxxx[xfoox]
I wouldn't have expected them to be found; I had thought standard regex behavior was not to find overlapping matches (i.e. to start searching again just past the end of any match it finds). I'm at work at the moment (and unfortunately without my laptop), so the only library I have available to test that on is the VBA one that comes with Access (*shudders* :P), but that doesn't find those two matches either.
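The difference between the two scanning strategies can be sketched in Java with java.util.regex. This is only an illustration, not a description of how std.regexp or Visual Studio scans internally; the class and method names (OverlapScan, nonOverlapping, overlapping, bracket) are made up for the example. The standard loop resumes at the end of each match, while the overlap-permitting loop resumes one character past the start of each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapScan {
    // Formats a match the way the original post does: pre[match]post.
    static String bracket(String input, int start, int end) {
        return input.substring(0, start)
                + "[" + input.substring(start, end) + "]"
                + input.substring(end);
    }

    // Standard scan: each search resumes at the END of the previous match,
    // so reported matches never overlap.
    static List<String> nonOverlapping(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(bracket(input, m.start(), m.end()));
        }
        return out;
    }

    // Overlap-permitting scan (hypothetical helper): each search resumes one
    // character past the START of the previous match.
    static List<String> overlapping(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int from = 0;
        // find(int) resets the matcher and searches from the given index.
        while (from <= input.length() && m.find(from)) {
            out.add(bracket(input, m.start(), m.end()));
            from = m.start() + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxfoox";
        System.out.println("Non-overlapping:");
        nonOverlapping("x.*?x", input).forEach(System.out::println);
        System.out.println("Overlapping:");
        overlapping("x.*?x", input).forEach(System.out::println);
    }
}
```

With the string from the original post, the non-overlapping scan reports the same three matches that std.regexp printed, and the overlap-permitting scan additionally reports xfoo[xx]xxxfoox and xfooxxxx[xfoox], among others.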
Jan 17 2007
I just tested Java, and it doesn't return the extra matches either. I'll trawl through std.regexp when I get home to see if I can find what's going on. Any inspiration would be appreciated. I presume the default in std.regexp -is- supposed to be a greedy match, and not some strange sort of half-way match? Perhaps I presume too much? o_0
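A minimal sketch of such a Java check, assuming the same input string as the original post. The class name GreedyVsReluctant and the helper findAll are invented for the example; only Pattern and Matcher are real library types. It runs the greedy pattern x.*x and the reluctant pattern x.*?x through the usual find() loop and formats each match as pre[match]post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects every match found by the standard find() loop,
    // formatted as pre[match]post like the D program in the first post.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(input.substring(0, m.start())
                    + "[" + m.group() + "]"
                    + input.substring(m.end()));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxfoox";
        System.out.println("Greedy matching:");
        findAll("x.*x", input).forEach(System.out::println);
        System.out.println("Reluctant matching:");
        findAll("x.*?x", input).forEach(System.out::println);
    }
}
```

Under java.util.regex the greedy pattern yields a single match covering the whole string, [xfooxxxxxfoox], while the reluctant pattern yields the same three matches as the D output. That only shows what Java does; it does not by itself settle what std.regexp is meant to do.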
Jan 17 2007
"just jeff" <psychobrat gmail.com> wrote in message news:eom9pa$tjm$1 digitaldaemon.com...
> Lionello Lunesu wrote:
>> (...) Also, the following two appear to be missing from "conservative matching":
>> xfoo[xx]xxxfoox
>> xfooxxxx[xfoox]
> I wouldn't have expected them to be found; I had thought standard regex behavior was not to find overlapping matches (i.e. to start searching again just past the end of any match it finds).
VS2005 did find them, using x.@x
L.
Jan 17 2007
> VS2005 did find them, using x.@x
Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;) Care to elaborate?
Jan 18 2007
"just jeff" <jeffrparsons optusnet.com.au> wrote in message news:eopti4$lan$1 digitaldaemon.com...
>> VS2005 did find them, using x.@x
> Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;) Care to elaborate?
http://msdn2.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
But, interestingly, the .NET framework uses the same .*?
http://msdn2.microsoft.com/en-us/library/3206d374(VS.80).aspx
Jan 19 2007
Lionello Lunesu wrote:
> "just jeff" <jeffrparsons optusnet.com.au> wrote in message news:eopti4$lan$1 digitaldaemon.com...
>>> VS2005 did find them, using x.@x
>> Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;) Care to elaborate?
> http://msdn2.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
> But, interestingly, the .NET framework uses the same .*?
> http://msdn2.microsoft.com/en-us/library/3206d374(VS.80).aspx
Looks like the .NET framework uses the "standard" syntax. The reason VS uses a different syntax is probably that it's meant for searching source code, where characters like *, ( and ) are common in C-like languages. They are therefore likely to be searched for literally quite often, and excessive quoting is inconvenient. {} are probably searched for a lot less, so they are arguably better choices for meta-characters in this context.
Jan 20 2007
Could somebody confident in the way std.regexp should work please confirm whether or not this is a bug?
Jan 23 2007








