digitalmars.D.learn - Question about using regex

James Oliphant (21/21) Mar 21 2012 While following the regex discussion, I have been compiling the examples...

Dmitry Olshansky (21/42) Mar 21 2012 Mm-hm it means the fix to use size_t by default is in upstream, but not

Dmitry Olshansky (5/51) Mar 21 2012 Oh wait, it's in this chapter :) I probably should make more noise about...

James Oliphant <jollie.roger gmail.com> writes:

While following the regex discussion, I have been compiling the examples 
to help with my understanding of how it works.

From Dmitry's example page:
	http://blackwhale.github.com/regular-expression.html
and from the dlang.org website:
	http://dlang.org/phobos/std_regex.html

std.regex.replace calls a delegate
	auto delegate(Captures!string)
which does not compile.  The definition in Phobos for Captures is
	struct Captures(R,DIndex)
and for the purposes of these examples changing the delegate to
	auto delegate(Captures!(string,uint))
seems to work.  Is this correct?


In another example on Dmitry's page that starts:
	auto m = match("Ranges are hot!", r"(\w)\w*(\w)"); //at least 3 "word" symbols
The output from the example is "Ranges, R, s", but I don't quite 
understand why those were the matches in this case.  Also, does the 
regular expression imply matching at least 2 "word" symbols, where \w* means 
match 0 or more "word" symbols?

These newsgroups are a great resource, keep up the great work!

Mar 21 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 21.03.2012 20:05, James Oliphant wrote:
 While following the regex discussion, I have been compiling the examples
 to help with my understanding of how it works.

 From Dmitry's example page:
 	http://blackwhale.github.com/regular-expression.html
 and from the dlang.org website:
 	http://dlang.org/phobos/std_regex.html

 std.regex.replace calls a delegate
 	auto delegate(Captures!string)
 which does not compile.  The definition in Phobos for Captures is
 	struct Captures(R,DIndex)
 and for the purposes of these examples changing the delegate to
 	auto delegate(Captures!(string,uint))
 seems to work.  Is this correct?

Mm-hm, it means the fix to use size_t as the default index type is in 
upstream, but not in 2.058, I think. Users shouldn't need to specify the 
index type; it is a hook for future extension.
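A minimal sketch of the callback form, assuming a recent Phobos where replaceAll takes the callback as a template argument and Captures!string compiles without spelling out an index type. The helper name capitalizeWords is hypothetical, not from the chapter:

```d
import std.regex : Captures, regex, replaceAll;
import std.stdio : writeln;
import std.uni : toUpper;

// Hypothetical helper: capitalize the first letter of every word.
// The callback takes Captures!string with no explicit index type.
string capitalizeWords(string input)
{
    auto re = regex(`\b(\w)(\w*)`);
    // m[1] is the first letter, m[2] is the rest of the word.
    return replaceAll!((Captures!string m) => toUpper(m[1]) ~ m[2])(input, re);
}

void main()
{
    writeln(capitalizeWords("ranges are hot")); // Ranges Are Hot
}
```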

 In another example on Dmitry's page that starts:
 	auto m = match("Ranges are hot!", r"(\w)\w*(\w)"); //at least 3 "word" symbols
 The output from the example is "Ranges, R, s", but I don't quite
 understand why those were the matches in this case.


OK, \w matches any single word character, that is, a letter, a digit, or 
one of a few other oddities*.
Now (\w) captures one character into the 1st _submatch_ ('R').
\w* greedily consumes the rest of the word ("anges"), then backtracks by 
one character so that the final (\w) can match.
The second (\w) thus captures the last character ('s') into the 2nd _submatch_.
The captures object lists the submatches captured during one match; [0] is 
the whole match ("Ranges").
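A small sketch of the same walkthrough, assuming a recent Phobos that provides matchFirst (the chapter's example uses match). The helper firstMatchParts is hypothetical:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: return the whole match followed by the two submatches.
string[] firstMatchParts(string input)
{
    auto c = matchFirst(input, regex(`(\w)\w*(\w)`));
    return [c[0], c[1], c[2]];
}

void main()
{
    writeln(firstMatchParts("Ranges are hot!")); // ["Ranges", "R", "s"]
}
```

Indexing the captures object this way is where "Ranges, R, s" comes from: [0] is the whole match, [1] and [2] are the two groups.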

I get that people tend to think I was about to show multiple 
_matches_ here, but that belongs to the next chapter. Here I was just 
showing how to work with submatches; that needs to be stressed somehow.


*This is an enormously useful tool for getting info on Unicode stuff, and 
on regex in particular:
http://unicode.org/cldr/utility/index.jsp


 Also, does the
 regular expression imply matching at least 2 "word" symbols, where \w* means
 match 0 or more "word" symbols?

Yup, that's right: at least 2. I should correct the wording.
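To see the "at least 2" reading in action, here is a sketch under the same assumptions (matchFirst from a recent Phobos; the helper wholeMatch is hypothetical):

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: return the whole match, or null if there is none.
string wholeMatch(string input)
{
    auto c = matchFirst(input, regex(`(\w)\w*(\w)`));
    return c.empty ? null : c[0];
}

void main()
{
    writeln(wholeMatch("a") is null); // true: a single word character is not enough
    writeln(wholeMatch("ab"));        // ab: one char per (\w), \w* matches nothing
    writeln(wholeMatch("I am"));      // am: "I" alone fails, so the search moves on
}
```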

 These newsgroups are a great resource, keep up the great work!

You are welcome.

-- 
Dmitry Olshansky

Mar 21 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 21.03.2012 21:13, Dmitry Olshansky wrote:
 I get that people tend to think I was about to show multiple
 _matches_ here, but that belongs to the next chapter. Here I was just
 showing how to work with submatches; that needs to be stressed somehow.

Oh wait, it is in this chapter :) I should probably make more noise about 
the "g" flag, and separate submatches from the range of matches more cleanly.
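For contrast with submatches, a sketch of a range of matches. It assumes a recent Phobos, where matchAll plays the role the "g" flag played with match; the helper allMatches is hypothetical:

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper: collect the whole text of every match, not just the first.
string[] allMatches(string input)
{
    string[] result;
    foreach (c; matchAll(input, regex(`(\w)\w*(\w)`)))
        result ~= c[0]; // each element is its own Captures with [0], [1], [2]
    return result;
}

void main()
{
    writeln(allMatches("Ranges are hot!")); // ["Ranges", "are", "hot"]
}
```

Each element of that range is a separate Captures, so c[1] and c[2] would give 'R'/'s', 'a'/'e' and 'h'/'t' in turn.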



-- 
Dmitry Olshansky

Mar 21 2012

