digitalmars.D - RegExp.find() now crippled

Steve Teale <steve.teale britseyeview.com> writes:
Some time ago in phobos2, the following:

   RegExp wsr = RegExp("(\\s+)");
   int p = wsr.find("<thingie att1=\"whatever\">");
   writefln("%s|%s|%s %d",wsr.pre(),  wsr.match(1), wsr.post(), p);

would print:

<thingie| |att1="whatever"> 7

Now it prints

<thingie| |att1="whatever"> 1

The new return value is pretty useless, equivalent to returning a bool. It
seems to me that the 'find' verb's subject should be the string, not the RegExp
object.

This looks like a case of the implementation being changed to match the
documentation, when in fact it would have been better to change the
documentation to match the implementation.

Either that, or RegExp should have an indexOf method that behaves like
string.indexOf.

Steve
Nov 14 2010
KennyTM~ <kennytm gmail.com> writes:
Isn't std.regexp replaced by std.regex? Why are both of them still in Phobos 2? (oh, and std.regex is missing a documented .index (= .src_start) property.)
Nov 15 2010
Steve Teale <steve.teale britseyeview.com> writes:
I guess std.regexp is still there because not all of us necessarily want to iterate a range to simply find out the position of the first whitespace in a string.

Part of the expressiveness of languages is that one should be free to use the style that suits, and not have to read the documentation every time one uses it. Give me options in Phobos by all means. D2 is not going to succeed by forcing its users to use unfamiliar, and maybe not yet very fashionable constructions.

I'm pissed off because this change broke a lot of my code, which I had not used for some time, but now have a paying customer for. The code did not break because of D language evolution. It broke because somebody decided they did not like the style of std.regexp. All I wanted was plain old regular expressions, similar to JavaScript, or PHP, or other popular languages, and std.regexp did that pretty well at one time.

Steve
Nov 15 2010
Jesse Phillips <jessekphillips+D gmail.com> writes:
Steve Teale Wrote:

> I guess std.regexp is still there because not all of us necessarily want to
> iterate a range to simply find out the position of the first whitespace in a
> string.

I'm pretty sure it is still there for the same reason many modules are: people are still trying to figure out when it should be removed.

> Part of the expressiveness of languages is that one should be free to use the
> style that suits, and not have to read the documentation every time one uses
> it. Give me options in Phobos by all means.

That has nothing to do with expressiveness; familiarity and ease of use, sure.

> D2 is not going to succeed by forcing its users to use unfamiliar, and maybe
> not yet very fashionable constructions.

Not providing something does not mean forcing people to use something else.

> I'm pissed off because this change broke a lot of my code, which I had not
> used for some time, but now have a paying customer for. The code did not break
> because of D language evolution. It broke because somebody decided they did not
> like the style of std.regexp.  All I wanted was plain old regular expressions,
> similar to JavaScript, or PHP, or other popular languages, and std.regexp did
> that pretty well at one time.

I agree, there is no reason a module that is scheduled for deletion should have changes made that would cause existing code to break. But looking at the history, there don't seem to have been any such changes for at least the last year. The only questionable change (one that wasn't just type changes to auto or spacing) happened 3 months ago, but I don't think the behavior was intended to change: http://www.dsource.org/projects/phobos/changeset/1923/trunk/phobos/std/regexp.d
Nov 15 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I am sorry for the inadvertent change; it wasn't meant to change the semantics of existing code. I'm not sure whether one of my unrelated 64-bit changes messed things up. You may want to file a bug report.

There are a number of good reasons for which I was compelled to split std.regex from std.regexp. I'm sure you or others would have found them just as compelling if you saw things the same way.

Phobos 1 has experimented in std.string and std.regexp with juxtaposing APIs of various languages (PHP, Ruby, Python). The reasoning was that people familiar with any of those languages could feel right at home by using APIs with similar nomenclatures and semantics. The result was some strange bedfellows in std.string such as "column" or "capwords" and an outright mess in std.regexp.

The interface of std.regexp is without a doubt the worst I've ever seen, by a long shot. I have never been able to use it without poring through the documentation _several times_ and without confirming to myself via a small test case that I'm doing the right thing. The simplest problem is this: std.regexp uses the words "exec", "find", "match", "search", and "test", all to mean regular expression matching. There is absolutely no logic to how meanings are ascribed to words, and there is absolutely no recourse other than rote memorization of various arbitrary decisions. The resulting FrankenAPI is likely familiar to no one except those who've actually spent time learning it, in spite of trying to be familiar to everyone.

So I spawned std.regex in an attempt to sanitize the API (I made minor, if any, changes to the engine; I am in fact having significant trouble maintaining it). The advantages of std.regex are:

* No more class definition. Nobody is supposed to inherit RegExp anyway so it's useless to brand the object as a class.
* Engine is separated from matches, which means that engines can be memoized for efficiency. Currently regex() only memoizes the last engine.
* The new engine works with any character size.
* Simpler API: create a regex, call match() against that regex and a string, look at the resulting RegexMatch object.

If this all annoys you more than the old API, I will need to disagree. If you have suggestions on how std.regex can be improved, I'm all ears.

Andrei
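
For readers on a current Phobos, the "create a regex, match, inspect the result" workflow can be sketched roughly as follows. This assumes the present-day std.regex interface (matchFirst returning a Captures value with pre, post and indexing), which differs from the 2010 API discussed in this thread. The helper name indexOfMatch is hypothetical and not part of Phobos; it recovers the position that the old RegExp.find used to return from the length of the text preceding the match.

```d
import std.regex : matchFirst, regex;
import std.stdio : writefln;

/// Hypothetical helper (not part of Phobos): index of the first match of
/// `pattern` in `input`, or -1 when there is no match.
ptrdiff_t indexOfMatch(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    // The text before the match is `m.pre`, so its length is the match offset.
    return m.empty ? -1 : cast(ptrdiff_t) m.pre.length;
}

void main()
{
    auto input = "<thingie att1=\"whatever\">";
    auto m = matchFirst(input, regex(`(\s+)`));
    if (!m.empty)
    {
        // Same shape as the writefln call in the original post:
        // prefix | first capture group | suffix, then the match offset.
        writefln("%s|%s|%s %d", m.pre, m[1], m.post, m.pre.length);
    }
    writefln("%d", indexOfMatch("no-whitespace-here", `\s+`));
}
```

Deriving the offset from pre keeps the sketch to members that Captures is known to expose, at the cost of one extra slice length lookup.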
Nov 15 2010
Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
I'm pretty sure that can be filed as a bug. The behavior is still documented as returning the index of the match, and the standalone std.regexp.find works that way. Patch:

    @@ -1045,7 +1045,7 @@
     {
         int i = test(string);
         if (i)
    -        i = pmatch[0].rm_so != 0;
    +        i = pmatch[0].rm_so;
         else
             i = -1;            // no match
         return i;
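
To see why the one-token change matters, here is a small standalone sketch of the two versions of that return logic. The function names findBuggy and findFixed and their parameters are hypothetical stand-ins: `matched` plays the role of the result of test(), and `startOffset` the role of pmatch[0].rm_so. It is not the Phobos source, only an illustration of the comparison collapsing the offset to 0 or 1.

```d
import std.stdio : writefln;

/// Mirrors the line removed by the patch: the `!= 0` comparison turns the
/// match offset into a boolean, so any match past index 0 yields 1.
int findBuggy(bool matched, int startOffset)
{
    if (matched)
        return cast(int)(startOffset != 0);
    return -1; // no match
}

/// Mirrors the line added by the patch: the offset itself is returned.
int findFixed(bool matched, int startOffset)
{
    if (matched)
        return startOffset;
    return -1; // no match
}

void main()
{
    // A match starting at offset 7, and one starting at offset 0.
    writefln("buggy: %d, fixed: %d", findBuggy(true, 7), findFixed(true, 7));
    writefln("buggy: %d, fixed: %d", findBuggy(true, 0), findFixed(true, 0));
    writefln("buggy: %d, fixed: %d", findBuggy(false, 0), findFixed(false, 0));
}
```

Both versions agree for a match at offset 0 and for no match, which may be why the regression was easy to miss.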
Nov 15 2010