digitalmars.D - [draft] New std.regex walkthrough

reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
For a couple of releases now we have had a new, revamped std.regex that, as
far as I'm concerned, works nicely, thanks to my GSoC commitment last summer.
Yet there was a certain dark trend around std.regex/std.regexp, as both had
severe bugs, missing documentation and whatnot, enough to consider them
unusable or dismiss them prematurely.

It's about time to break this gloomy aura, and show that std.regex is 
actually easy to use, that it does the thing and has some nice extras.

Link: http://blackwhale.github.com/regular-expression.html

Comments are welcome from experts and newbies alike, in fact it should 
encourage people to try out a few tricks ;)

This is intended as a replacement for an article on dlang.org
about outdated (and soon to disappear) std.regexp:
http://dlang.org/regular-expression.html

[Spoiler] one example relies on a parser bug being fixed (blush):
https://github.com/D-Programming-Language/phobos/pull/481
Well, it was a specific lookahead inside lookaround, so that's not a severe
bug ;)

P.S. I've been following through a bunch of new bug reports recently, 
thanks to everyone involved :)


-- 
Dmitry Olshansky
Mar 13 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/13/12 2:27 PM, Dmitry Olshansky wrote:
 For a couple of releases we have a new revamped std.regex, that as far
 as I'm concerned works nicely, thanks to my GSOC commitment last summer.
 Yet there was certain dark trend around std.regex/std.regexp as both had
 severe bugs, missing documentation and what not, enough to consider them
 unusable or dismiss prematurely.

 It's about time to break this gloomy aura, and show that std.regex is
 actually easy to use, that it does the thing and has some nice extras.

 Link: http://blackwhale.github.com/regular-expression.html

Reddited: http://www.reddit.com/r/programming/comments/quyy1/walk_through_regexen_in_the_d_programming/

Andrei
Mar 13 2012
prev sibling next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Dmitry Olshansky" <dmitry.olsh gmail.com> wrote in message 
news:jjo73v$4gv$1 digitalmars.com...
 For a couple of releases we have a new revamped std.regex, that as far as 
 I'm concerned works nicely, thanks to my GSOC commitment last summer. Yet 
 there was certain dark trend around std.regex/std.regexp as both had 
 severe bugs, missing documentation and what not, enough to consider them 
 unusable or dismiss prematurely.

 It's about time to break this gloomy aura, and show that std.regex is 
 actually easy to use, that it does the thing and has some nice extras.

 Link: http://blackwhale.github.com/regular-expression.html

 Comments are welcome from experts and newbies alike, in fact it should 
 encourage people to try out a few tricks ;)

 This is intended as replacement for an article on dlang.org
 about outdated (and soon to disappear) std.regexp:
 http://dlang.org/regular-expression.html

 [Spoiler] one example relies on a parser bug being fixed (blush):
 https://github.com/D-Programming-Language/phobos/pull/481
 Well, it was a specific lookahead inside lookaround so that's not severe 
 bug ;)

 P.S. I've been following through a bunch of new bug reports recently, 
 thanks to everyone involved :)

Looks nice at an initial glance through. A few things I'll point out though:

- The bullet-list immediately after the text "Now, come to think of it, this
tiny sample showed a lot of useful things already:" looks like it's outdented
instead of indented. Just kinda looks a little odd.

- Speaking of the same line, I'd omit the "Now, come to think of it" part. It
sounds too "stream-of-consciousness" and not very "professional article".

- I'm very much in favor of using backticked strings for regexes instead of
r"", because with the latter, you can't include double-quotes, which I'd
think would be a much more common need in a regex than a backtick. Although
I understand that backticks aren't easy to make on some keyboards. (In the
US layout I have, it's just an unshifted tilde, ie, the key just to the left
of "1". I guess some people don't have a backtick key though?)
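
To illustrate the quoting point, here is a minimal sketch (not taken from the article; the helper name firstQuoted is made up for illustration) of a pattern that has to match double quotes. Inside backticks the pattern can be written as-is, whereas an r"" string cannot contain a double quote at all:

```d
import std.regex : match, regex;

/// Returns the text inside the first double-quoted span, or null if none.
/// (Hypothetical helper, for illustration only.)
string firstQuoted(string input)
{
    // Backticks delimit a WYSIWYG string, so the pattern may contain " freely.
    auto m = match(input, regex(`"([^"]*)"`));
    return m.empty ? null : m.captures[1];
}

void main()
{
    assert(firstQuoted(`say "hello" twice`) == "hello");
    assert(firstQuoted("no quotes here") is null);
}
```

The trade-off is the mirror image: a backticked string cannot contain a backtick, which is rarely needed in a regex.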
Mar 13 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13.03.2012 23:42, Nick Sabalausky wrote:
 "Dmitry Olshansky"<dmitry.olsh gmail.com>  wrote in message
 news:jjo73v$4gv$1 digitalmars.com...
 For a couple of releases we have a new revamped std.regex, that as far as
 I'm concerned works nicely, thanks to my GSOC commitment last summer. Yet
 there was certain dark trend around std.regex/std.regexp as both had
 severe bugs, missing documentation and what not, enough to consider them
 unusable or dismiss prematurely.

 It's about time to break this gloomy aura, and show that std.regex is
 actually easy to use, that it does the thing and has some nice extras.

 Link: http://blackwhale.github.com/regular-expression.html

 Comments are welcome from experts and newbies alike, in fact it should
 encourage people to try out a few tricks ;)

 This is intended as replacement for an article on dlang.org
 about outdated (and soon to disappear) std.regexp:
 http://dlang.org/regular-expression.html

 [Spoiler] one example relies on a parser bug being fixed (blush):
 https://github.com/D-Programming-Language/phobos/pull/481
 Well, it was a specific lookahead inside lookaround so that's not severe
 bug ;)

 P.S. I've been following through a bunch of new bug reports recently,
 thanks to everyone involved :)

 Looks nice at an initial glance through. Few things I'll point out though:

 - The bullet-list immediately after the text "Now, come to think of it, this
 tiny sample showed a lot of useful things already:" looks like it's outdented
 instead of indented. Just kinda looks a little odd.

 - Speaking of the same line, I'd omit the "Now, come to think of it" part. It
 sounds too "stream-of-consciousness" and not very "professional article".

Thanks, these are the kind of things I intend to fix/improve/etc. Hence the [draft] prefix.
 - I'm very much in favor of using backticked strings for regexes instead of
 r"", because with the latter, you can't include double-quotes, which I'd
 think would be a much more common need in a regex than a backtick. Although
 I understand that backticks aren't easy to make on some keyboards. (In the
 US layout I have, it's just an unshifted tilde, ie, the key just to the left
 of "1". I guess some people don't have a backtick key though?)

Same here, but I recall there is a movement (was it?) against backticked
strings, including some of DPL's highly ranked members ;)
So I thought that maybe it's best to not impose my (perverted?) style on
readers.

-- 
Dmitry Olshansky
Mar 13 2012
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Dmitry Olshansky:

 It's about time to break this gloomy aura, and show that std.regex is 
 actually easy to use, that it does the thing and has some nice extras.

This seems a good moment to ask people regarding this small problem, that we
have already discussed a little in Bugzilla (there is a significant need to
show some Bugzilla discussions here):

http://d.puremagic.com/issues/show_bug.cgi?id=7260

The problem is easy to show:

    import std.stdio: write, writeln;
    import std.regex: regex, match;

    void main() {
        string text = "abc312de";

        // No flags: only the first match is produced.
        foreach (c; text.match("1|2|3|4"))
            write(c, " ");
        writeln();

        // "g" flag: matching repeats over the whole input.
        foreach (c; text.match(regex("1|2|3|4", "g")))
            write(c, " ");
        writeln();
    }

It outputs:

    ["3"]
    ["3"] ["1"] ["2"]

In my code I have seen that usually the "g" option (that means "repeat over
the whole input") is what I want. So what do you think about making "g" the
default?

This request is not as arbitrary as it looks, if you compare to the older
API. See Bug 7260 for more info.

Bye,
bearophile
Mar 13 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14.03.2012 0:05, bearophile wrote:
 Dmitry Olshansky:

 It's about time to break this gloomy aura, and show that std.regex is
 actually easy to use, that it does the thing and has some nice extras.

 This seems a good moment to ask people regarding this small problem, that we
 have already discussed a little in Bugzilla (there is a significant need to
 show here some Bugzilla discussions):

 http://d.puremagic.com/issues/show_bug.cgi?id=7260

Yeah, it's the prime thing that I regret when thinking of the current API.
 The problem is easy to show:

 import std.stdio: write, writeln;
 import std.regex: regex, match;

 void main() {
      string text = "abc312de";

      foreach (c; text.match("1|2|3|4"))
          write(c, " ");
      writeln();

      foreach (c; text.match(regex("1|2|3|4", "g")))
          write(c, " ");
      writeln();
 }


 It outputs:

 ["3"]
 ["3"] ["1"] ["2"]

 In my code I have seen that usually the "g" option (that means "repeat over the
 whole input") is what I want. So what do you think about making "g" the
default?

Yet I'm not convinced that an extra "non-global" flag is the way to go. I'd
propose to yank the "g" flag entirely, assuming all regexes are global, but
that breaks code in a lot of subtle ways.

Problems with using the global flag by default:

1. Generic stuff:
   assert(equal(match(...), someOtherRange));
   // a normal regex silently becomes global, quite unexpectedly

2. replace would then have to become 2 funcs, replaceFirst and replaceAll,
   or we are back to the problem of an extra flag.

I'm thinking there is a path through opApply to allow foreach iteration over
a non-global regex as if it had the global flag, yet without giving it the
full range interface. It's hackish, but so far it's as good as it gets.
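
For comparison, later Phobos releases did end up splitting the API along these lines: std.regex gained matchFirst/matchAll and replaceFirst/replaceAll, none of which depend on the "g" flag. A minimal sketch, assuming a Phobos version that ships these functions (they postdate this thread, so this is not the API under discussion above):

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex, replaceAll, replaceFirst;

void main()
{
    auto digit = regex("1|2|3|4"); // note: no "g" flag anywhere
    string text = "abc312de";

    // The function name, not a flag, decides between "first" and "all".
    assert(matchFirst(text, digit).hit == "3");
    assert(matchAll(text, digit).map!(m => m.hit).array == ["3", "1", "2"]);

    // Same split for replacement.
    assert(replaceFirst(text, digit, "#") == "abc#12de");
    assert(replaceAll(text, digit, "#") == "abc###de");
}
```

With this split, generic code that receives a regex no longer changes behaviour depending on how the pattern was constructed.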
 This request is not as arbitrary as it looks, if you compare to the older API.
See Bug 7260 for more info.

-- Dmitry Olshansky
Mar 13 2012
prev sibling next sibling parent "Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Tuesday, 13 March 2012 at 19:27:59 UTC, Dmitry Olshansky wrote:
 For a couple of releases we have a new revamped std.regex, that 
 as far as I'm concerned works nicely, thanks to my GSOC 
 commitment last summer. Yet there was certain dark trend around 
 std.regex/std.regexp as both had severe bugs, missing 
 documentation and what not, enough to consider them unusable or 
 dismiss prematurely.

Thank you for the work Dmitry, I look forward to reading this and ultimately have been happy with the changes. D has been getting a great number of face lifts on its many faces.
Mar 13 2012
prev sibling next sibling parent reply Brad Anderson <eco gnuk.net> writes:
On Tue, Mar 13, 2012 at 1:27 PM, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 For a couple of releases we have a new revamped std.regex, that as far as
 I'm concerned works nicely, thanks to my GSOC commitment last summer. Yet
 there was certain dark trend around std.regex/std.regexp as both had severe
 bugs, missing documentation and what not, enough to consider them unusable
 or dismiss prematurely.

 It's about time to break this gloomy aura, and show that std.regex is
 actually easy to use, that it does the thing and has some nice extras.

 Link: http://blackwhale.github.com/regular-expression.html

 Comments are welcome from experts and newbies alike, in fact it should
 encourage people to try out a few tricks ;)

 This is intended as replacement for an article on dlang.org
 about outdated (and soon to disappear) std.regexp:
 http://dlang.org/regular-expression.html

 [Spoiler] one example relies on a parser bug being fixed (blush):
 https://github.com/D-Programming-Language/phobos/pull/481
 Well, it was a specific lookahead inside lookaround so that's not severe
 bug ;)

 P.S. I've been following through a bunch of new bug reports recently,
 thanks to everyone involved :)


 --
 Dmitry Olshansky

Second paragraph:
- "..,expressions, though one though one should..." has too many "though
one"s

Third paragraph:
- "...keeping it's implementation..." should be "its"
- "We'll see how close to built-ins one can get this way." was kind of
confusing.  I'd consider just doing away with the distinction between
built in and non-built in regex since it's an implementation detail most
programmers who use it don't even need to know about.  Maybe say that it
is not built in and explain why that is a neat thing to have (meaning,
the language itself is powerful enough to express it in user code).

Fourth paragraph:
- "...article you'd have..." should probably be "you'll" or, preferably,
"you will".
- "...utilize it's API..." should be "its"
- "yet it's not required to get an understanding of the API." I'd
probably change this to "...yet it's not required to understand the API"

Lost track of which paragraph:
- "... that allows writing a regex pattern in it's natural notation"
another "its"
- "trying to match special characters like" I'd write "trying to match
special regex characters like" for clarity
- "over input like e.g. search or simillar" I'd remove the e.g., write
search as "search()" to show it's a function in other languages and fix
the spelling of similar :P
- "An element type is Captures for the string type being used, it is a
random access range." I just found this confusing.  Not sure what it's
trying to say.
- "I won't go into full detail of the range conception, suffice to say,"
I'd change "conception" to "concept" and remove "suffice to say". (It's
a shame we don't have a range article we can link to).
- "At that time ancors like" misspelled "anchors"
- "Needless to say, one need not" I'd remove the "Needless to say,"
because I think it's actually important to say :P
- "replace(text, regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})","g"),
"--");" Is this code example correct?  It references $1, $2, etc. in the
explanatory paragraph below but they are nowhere to be found.
- When you are explaining named captures it sounds like you are about to
show them in the subsequent code example but you are actually showing
what it'd look like without them which was a bit confusing.
- Maybe some more words on what lookaround/lookahead do as I was lost.
- "Amdittedly, barrage of ? and ! makes regex rather obscure, more then
it's actually is. However" should be "Admittedly, the barrage of ? and !
makes the regex rather obscure, more than it actually is.".  Maybe
change "obscure" to a different adjective. Perhaps "complex looking" or
"complicated". (note I've removed the "However" as the upcoming sentence
isn't contradicting what you just said.)
- "Needless to say it's", again, I think it's rather important to say :P
- "Run-time version took around 10-20us on my machine, admittedly no
statistics." here, borrow this "µ" :P.  Also, I'd get rid of "admittedly
no statistics".
- "meaningful tasks, it's features" another "its"
- "together it's major" and another :P
- "...flexible tools: match, replace, spliter" should be spelled "splitter"


Great article.  I didn't even know about the replacement delegate
feature which is something I've often wished I could use in other regex
systems.  D and Phobos need more articles like this.  We should have a
link to it from the std.regex documentation once this is added to the
website.

Regards,
Brad Anderson
Mar 13 2012
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14.03.2012 0:32, Brad Anderson wrote:
 On Tue, Mar 13, 2012 at 1:27 PM, Dmitry Olshansky <dmitry.olsh gmail.com
 <mailto:dmitry.olsh gmail.com>> wrote:

     For a couple of releases we have a new revamped std.regex, that as
     far as I'm concerned works nicely, thanks to my GSOC commitment last
     summer. Yet there was certain dark trend around std.regex/std.regexp
     as both had severe bugs, missing documentation and what not, enough
     to consider them unusable or dismiss prematurely.

     It's about time to break this gloomy aura, and show that std.regex
     is actually easy to use, that it does the thing and has some nice
     extras.

     Link: http://blackwhale.github.com/__regular-expression.html
     <http://blackwhale.github.com/regular-expression.html>

     Comments are welcome from experts and newbies alike, in fact it
     should encourage people to try out a few tricks ;)

     This is intended as replacement for an article on dlang.org
     <http://dlang.org>
     about outdated (and soon to disappear) std.regexp:
     http://dlang.org/regular-__expression.html
     <http://dlang.org/regular-expression.html>

     [Spoiler] one example relies on a parser bug being fixed (blush):
     https://github.com/D-__Programming-Language/phobos/__pull/481
     <https://github.com/D-Programming-Language/phobos/pull/481>
     Well, it was a specific lookahead inside lookaround so that's not
     severe bug ;)

     P.S. I've been following through a bunch of new bug reports
     recently, thanks to everyone involved :)


     --
     Dmitry Olshansky


 Second paragraph:
 - "..,expressions, though one though one should..." has too many "though
 one"s

 Third paragraph:
 - "...keeping it's implementation..." should be "its"
 - "We'll see how close to built-ins one can get this way." was kind of
 confusing.  I'd consider just doing away with the distinction between
 built in and non-built in regex since it's an implementation detail most
 programmers who use it don't even need to know about.  Maybe say that it
 is not built in and explain why that is a neat thing to have (meaning,
 the language itself is powerful enough to express it in user code).

 Fourth paragraph:
 - "...article you'd have..." should probably be "you'll" or, preferably,
 "you will".
 - "...utilize it's API..." should be "its"
 - "yet it's not required to get an understanding of the API." I'd
 probably change this to "...yet it's not required to understand the API"

 Lost track of which paragraph:
 - "... that allows writing a regex pattern in it's natural notation"
 another "its"
 - "trying to match special characters like" I'd write "trying to match
 special regex characters like" for clarity
 - "over input like e.g. search or simillar" I'd remove the e.g., write
 search as "search()" to show it's a function in other languages and fix
 the spelling of similar :P
 - "An element type is Captures for the string type being used, it is a
 random access range." I just found this confusing.  Not sure what it's
 trying to say.
 - "I won't go into full detail of the range conception, suffice to say,"
 I'd change "conception" to "concept" and remove "suffice to say". (It's
 a shame we don't a range article we can link to).
 - "At that time ancors like" misspelled "anchors"
 - "Needless to say, one need not" I'd remove the "Needless to say,"
 because I think it's actually important to say :P
 - "replace(text, regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})","g"),
 "--");" Is this code example correct?  It references $1, $2, etc. in the
 explanatory paragraph below but they are no where to be found.
 - When you are explaining named captures it sounds like you are about to
 show them in the subsequent code example but you are actually showing
 what it'd look like without them which was a bit confusing.
 - Maybe some more words on what lookaround/lookahead do as I was lost.
 - "Amdittedly, barrage of ? and ! makes regex rather obscure, more then
 it's actually is. However" should be "Admittedly, the barrage of ? and !
 makes the regex rather obscure, more than it actually is.".  Maybe
 change "obscure" to a different adjective. Perhaps "complex looking" or
 "complicated". (note I've removed the "However" as the upcoming sentence
 isn't contradicting what you just said.
 - "Needless to say it's", again, I think it's rather important to say :P
 - "Run-time version took around 10-20us on my machine, admittedly no
 statistics." here, borrow this "µ" :P.  Also, I'd get rid of "admittedly
 no statistics".
 - "meaningful tasks, it's features" another "its"
 - "together it's major" and another :P
 - "...flexible tools: match, replace, spliter" should be spelled "splitter"

Wow, thanks a lot, that sure was a thorough read. I'm going to carefully work through this list tomorrow.
 Great article.  I didn't even know about the replacement delegate
 feature which is something I've often wished I could use in other regex
 systems.  D and Phobos need more articles like this.  We should have a
 link to it from the std.regex documentation once this is added to the
 website.

 Regards,
 Brad Anderson

-- Dmitry Olshansky
Mar 13 2012
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14.03.2012 0:32, Brad Anderson wrote:
 On Tue, Mar 13, 2012 at 1:27 PM, Dmitry Olshansky <dmitry.olsh gmail.com
 <mailto:dmitry.olsh gmail.com>> wrote:

     For a couple of releases we have a new revamped std.regex, that as
     far as I'm concerned works nicely, thanks to my GSOC commitment last
     summer. Yet there was certain dark trend around std.regex/std.regexp
     as both had severe bugs, missing documentation and what not, enough
     to consider them unusable or dismiss prematurely.

     It's about time to break this gloomy aura, and show that std.regex
     is actually easy to use, that it does the thing and has some nice
     extras.

     Link: http://blackwhale.github.com/__regular-expression.html
     <http://blackwhale.github.com/regular-expression.html>

     Comments are welcome from experts and newbies alike, in fact it
     should encourage people to try out a few tricks ;)

     This is intended as replacement for an article on dlang.org
     <http://dlang.org>
     about outdated (and soon to disappear) std.regexp:
     http://dlang.org/regular-__expression.html
     <http://dlang.org/regular-expression.html>

     [Spoiler] one example relies on a parser bug being fixed (blush):
     https://github.com/D-__Programming-Language/phobos/__pull/481
     <https://github.com/D-Programming-Language/phobos/pull/481>
     Well, it was a specific lookahead inside lookaround so that's not
     severe bug ;)

     P.S. I've been following through a bunch of new bug reports
     recently, thanks to everyone involved :)


     --
     Dmitry Olshansky


 Second paragraph:
 - "..,expressions, though one though one should..." has too many "though
 one"s

 Third paragraph:
 - "...keeping it's implementation..." should be "its"
 - "We'll see how close to built-ins one can get this way." was kind of
 confusing.  I'd consider just doing away with the distinction between
 built in and non-built in regex since it's an implementation detail most
 programmers who use it don't even need to know about.  Maybe say that it
 is not built in and explain why that is a neat thing to have (meaning,
 the language itself is powerful enough to express it in user code).

Yeah, the point about built-in vs library is kind of dangling in the air for now. Will see how to wrap it up.
 Fourth paragraph:
 - "...article you'd have..." should probably be "you'll" or, preferably,
 "you will".
 - "...utilize it's API..." should be "its"
 - "yet it's not required to get an understanding of the API." I'd
 probably change this to "...yet it's not required to understand the API"

 Lost track of which paragraph:
 - "... that allows writing a regex pattern in it's natural notation"
 another "its"
 - "trying to match special characters like" I'd write "trying to match
 special regex characters like" for clarity
 - "over input like e.g. search or simillar" I'd remove the e.g., write
 search as "search()" to show it's a function in other languages and fix
 the spelling of similar :P
 - "An element type is Captures for the string type being used, it is a
 random access range." I just found this confusing.  Not sure what it's
 trying to say.
 - "I won't go into full detail of the range conception, suffice to say,"
 I'd change "conception" to "concept" and remove "suffice to say". (It's
 a shame we don't a range article we can link to).
 - "At that time ancors like" misspelled "anchors"

All to the point and fixed.
 - "Needless to say, one need not" I'd remove the "Needless to say,"
 because I think it's actually important to say :P

It's not important, as it has no effect on matching if there are no anchors. It's just cleaner for the reader, because otherwise it alerts them along the way: "hm, this guy doesn't know what multi-line is, let's stay sharp and watch out for other problems".
 - "replace(text, regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})","g"),
 "--");" Is this code example correct?  It references $1, $2, etc. in the
 explanatory paragraph below but they are no where to be found.

Damnable DDoc ate my dollars! And that's inside a source code section. Any ideas on how to avoid this mess?
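
For reference, a minimal sketch of how numbered references look in a replacement format string. The format "$3-$1-$2" and the helper name isoDates are assumptions for illustration only; the article's actual format string is not recoverable from the garbled listing quoted above:

```d
import std.regex : regex, replace;

/// Rewrites every M/D/YYYY date in `text` as YYYY-M-D.
/// (Hypothetical helper, for illustration only.)
string isoDates(string text)
{
    // Three capture groups: $1 = first number, $2 = second, $3 = year.
    auto date = regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})", "g");
    // Assumed format string, not the article's original.
    return replace(text, date, "$3-$1-$2");
}

void main()
{
    assert(isoDates("due 3/14/2012, not 12/25/2012")
        == "due 2012-3-14, not 2012-12-25");
}
```

Each $N in the format string is substituted with the text captured by the N-th parenthesized group of the current match.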
 - When you are explaining named captures it sounds like you are about to
 show them in the subsequent code example but you are actually showing
 what it'd look like without them which was a bit confusing.
 - Maybe some more words on what lookaround/lookahead do as I was lost.

 - "Amdittedly, barrage of ? and ! makes regex rather obscure, more then
 it's actually is. However" should be "Admittedly, the barrage of ? and !
 makes the regex rather obscure, more than it actually is.".  Maybe
 change "obscure" to a different adjective. Perhaps "complex looking" or
 "complicated". (note I've removed the "However" as the upcoming sentence
 isn't contradicting what you just said.
 - "Needless to say it's", again, I think it's rather important to say :P

Here I concur ;)
 - "Run-time version took around 10-20us on my machine, admittedly no
 statistics." here, borrow this "µ" :P.  Also, I'd get rid of "admittedly
 no statistics".
 - "meaningful tasks, it's features" another "its"
 - "together it's major" and another :P

Yeah, that's an "it's" killing parade :)
 - "...flexible tools: match, replace, spliter" should be spelled "splitter"


 Great article.  I didn't even know about the replacement delegate
 feature which is something I've often wished I could use in other regex
 systems.  D and Phobos need more articles like this.  We should have a
 link to it from the std.regex documentation once this is added to the
 website.
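
For anyone else who missed the replacement delegate feature, a minimal sketch follows. It assumes the replaceAll!(fun) form from current std.regex (the 2012 spelling was replace!(fun) with the "g" flag); the helper name doubleNumbers and the sample text are invented for illustration.

```d
import std.conv : to;
import std.regex;
import std.stdio;

// Hypothetical helper: replace every number in text with twice its value.
string doubleNumbers(string text)
{
    // The function passed as a template argument receives the Captures of
    // each match and returns the replacement string for that match.
    return replaceAll!(m => to!string(to!int(m.hit) * 2))(text, regex(r"[0-9]+"));
}

void main()
{
    writeln(doubleNumbers("3 apples and 12 pears"));
}
```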

Thanks again. -- Dmitry Olshansky
Mar 14 2012
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Mar 13, 2012 at 11:27:57PM +0400, Dmitry Olshansky wrote:
 For a couple of releases we have a new revamped std.regex, that as
 far as I'm concerned works nicely, thanks to my GSOC commitment last
 summer. Yet there was certain dark trend around std.regex/std.regexp
 as both had severe bugs, missing documentation and what not, enough
 to consider them unusable or dismiss prematurely.
 
 It's about time to break this gloomy aura, and show that std.regex
 is actually easy to use, that it does the thing and has some nice
 extras.
 
 Link: http://blackwhale.github.com/regular-expression.html
 
 Comments are welcome from experts and newbies alike, in fact it
 should encourage people to try out a few tricks ;)

Yay! Updated docs are always a good thing. I'd like to do some copy-editing to make it nicer to read. (Hope you don't mind my extensive revisions, I'm trying to make the docs as professional as possible.) My revisions are in straight text under the quoted sections, and inline comments are enclosed in [].
 Introduction
 
 String processing is a kind of daily routine that most applications do
 in a one way or another.  It should come as no wonder that many
 programming languages have standard libraries stoked with specialized
 functions for common needs.

String processing is a common task performed by many applications. Many programming languages come with standard libraries that are equipped with a variety of functions for common string processing needs.
 The D programming language standard library among others offers a nice
 assortment in std.string and generic ones from std.algorithm.

The D programming language standard library also offers a nice assortment of such functions in std.string, as well as generic functions in std.algorithm that can also work with strings.
 Still no amount of fixed functionality could cover all needs, as
 naturally flexible text data needs flexible solutions. 

Still no amount of predefined string functions could cover all needs. Text data is very flexible by nature, and so needs flexible solutions.
 Here is where regular expressions come in handy, often succinctly
 called as regexes.

This is where regular expressions, or regexes for short, come in.
 Simple yet powerful language for defining patterns of strings, put
 together with a substitution mechanism, forms a Swiss Army knife of
 text processing.

Regexes are a simple yet powerful language for defining patterns of strings, and when integrated with a substitution mechanism, form a Swiss Army knife of text processing.
 It's considered so useful that a number of languages provides built-in
 support for regular expressions, though one though one should not jump
 to conclusion that built-in implies faster processing or more
 features. It's all about getting more convenient and friendly syntax
 for typical operations and usage patterns. 

It's considered so useful that a number of languages provide built-in support for regular expressions. (This doesn't necessarily mean, however, that built-in implies faster processing or more features. It's more a matter of providing a more convenient and friendly syntax for typical operations and usage patterns.) [I think it's better to put the second part in parentheses, since it's not really the main point of this doc.]
 The D programming language provides a standard library module
 std.regex.

[OK]
 Being a highly expressive systems language, it opens a possibility to
 get a good look and feel via core features, while keeping it's
 implementation within the language.

Being a highly expressive systems language, D allows regexes to be implemented within the language itself, yet still have the same level of readability and usability that a built-in implementation would provide.
 We'll see how close to built-ins one can get this way. 

We will see below how close to built-in regexes we can get.
 By the end of article you'd have a good understanding of regular
 expression capabilities in this library, and how to utilize it's API
 in a most straightforward way.

By the end of this article, you will have a good understanding of the regular expression capabilities offered by this library, and how to utilize its API in the most straightforward way.
 Examples in this article assume the reader has fairly good
 understanding of regex elements, yet it's not required to get an
 understanding of the API.

Examples in this article assume that the reader has a fairly good understanding of regex elements, but this is not required to get an understanding of the API. [I'll do this much for now. More to come later.]

T

-- 
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Mar 13 2012
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14.03.2012 0:54, H. S. Teoh wrote:
 On Tue, Mar 13, 2012 at 11:27:57PM +0400, Dmitry Olshansky wrote:
 For a couple of releases we have a new revamped std.regex, that as
 far as I'm concerned works nicely, thanks to my GSOC commitment last
 summer. Yet there was certain dark trend around std.regex/std.regexp
 as both had severe bugs, missing documentation and what not, enough
 to consider them unusable or dismiss prematurely.

 It's about time to break this gloomy aura, and show that std.regex
 is actually easy to use, that it does the thing and has some nice
 extras.

 Link: http://blackwhale.github.com/regular-expression.html

 Comments are welcome from experts and newbies alike, in fact it
 should encourage people to try out a few tricks ;)

Yay! Updated docs are always a good thing. I'd like to do some copy-editing to make it nicer to read. (Hope you don't mind my extensive revisions, I'm trying to make the docs as professional as possible.) My revisions are in straight text under the quoted sections, and inline comments are enclosed in [].

Thanks, I'm concerned with the "make it nicer to read" part ;)

[... a bunch of good stuff to work through later on ...]

-- 
Dmitry Olshansky
Mar 13 2012
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14.03.2012 0:54, H. S. Teoh wrote:
 On Tue, Mar 13, 2012 at 11:27:57PM +0400, Dmitry Olshansky wrote:
 For a couple of releases we have a new revamped std.regex, that as
 far as I'm concerned works nicely, thanks to my GSOC commitment last
 summer. Yet there was certain dark trend around std.regex/std.regexp
 as both had severe bugs, missing documentation and what not, enough
 to consider them unusable or dismiss prematurely.

 It's about time to break this gloomy aura, and show that std.regex
 is actually easy to use, that it does the thing and has some nice
 extras.

 Link: http://blackwhale.github.com/regular-expression.html

 Comments are welcome from experts and newbies alike, in fact it
 should encourage people to try out a few tricks ;)

Yay! Updated docs are always a good thing. I'd like to do some copy-editing to make it nicer to read. (Hope you don't mind my extensive revisions, I'm trying to make the docs as professional as possible.) My revisions are in straight text under the quoted sections, and inline comments are enclosed in [].
 Introduction

 String processing is a kind of daily routine that most applications do
 in a one way or another.  It should come as no wonder that many
 programming languages have standard libraries stoked with specialized
 functions for common needs.

String processing is a common task performed by many applications. Many programming languages come with standard libraries that are equipped with a variety of functions for common string processing needs.

 The D programming language standard library among others offers a nice
 assortment in std.string and generic ones from std.algorithm.

The D programming language standard library also offers a nice assortment of such functions in std.string, as well as generic functions in std.algorithm that can also work with strings.
 Still no amount of fixed functionality could cover all needs, as
 naturally flexible text data needs flexible solutions.

Still no amount of predefined string functions could cover all needs. Text data is very flexible by nature, and so needs flexible solutions.
 Here is where regular expressions come in handy, often succinctly
 called as regexes.

This is where regular expressions, or regexes for short, come in.
 Simple yet powerful language for defining patterns of strings, put
 together with a substitution mechanism, forms a Swiss Army knife of
 text processing.

Regexes are a simple yet powerful language for defining patterns of strings, and when integrated with a substitution mechanism, form a Swiss Army knife of text processing.
 It's considered so useful that a number of languages provides built-in
 support for regular expressions, though one though one should not jump
 to conclusion that built-in implies faster processing or more
 features. It's all about getting more convenient and friendly syntax
 for typical operations and usage patterns.

It's considered so useful that a number of languages provide built-in support for regular expressions. (This doesn't necessarily mean, however, that built-in implies faster processing or more features. It's more a matter of providing a more convenient and friendly syntax for typical operations and usage patterns.) [I think it's better to put the second part in parentheses, since it's not really the main point of this doc.]

I think putting that much in parens is a bad idea, but your wording is clearly superior.
 The D programming language provides a standard library module
 std.regex.

[OK]
 Being a highly expressive systems language, it opens a possibility to
 get a good look and feel via core features, while keeping it's
 implementation within the language.

Being a highly expressive systems language, D allows regexes to be implemented within the language itself, yet still have the same level of readability and usability that a built-in implementation would provide.

Nice!
 We'll see how close to built-ins one can get this way.

We will see below how close to built-in regexes we can get.
 By the end of article you'd have a good understanding of regular
 expression capabilities in this library, and how to utilize it's API
 in a most straightforward way.

By the end of this article, you will have a good understanding of the regular expression capabilities offered by this library, and how to utilize its API in the most straightforward way.
 Examples in this article assume the reader has fairly good
 understanding of regex elements, yet it's not required to get an
 understanding of the API.

Examples in this article assume that the reader has a fairly good understanding of regex elements, but this is not required to get an understanding of the API. [I'll do this much for now. More to come later.]

Thanks. -- Dmitry Olshansky
Mar 14 2012