digitalmars.D.learn - regex issue

"Joshua Niehus" <jm.niehus gmail.com> writes:

Hello,

Does anyone know why I would get different results between
ctRegex and regex in the following snippet?

Thanks,
Josh

---

import std.stdio, std.regex;

void main() {
     string strcmd = "./myApp.rb -os OSX -path \"/GIT/Ruby Apps/sec\" -conf 'no timer'";

     auto ctre = ctRegex!(`(".*")|('.*')`, "g");
     auto   re =   regex (`(".*")|('.*')`, "g");

     auto ctm = match(strcmd, ctre);
     foreach(ct; ctm)
       writeln(ct.hit());

     auto m = match(strcmd, re);
     foreach(h; m)
       writeln(h.hit());
}
/* output */
"/GIT/Ruby Apps/sec"
'no timer'
"/GIT/Ruby Apps/sec"

Mar 15 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 16.03.2012 7:36, Joshua Niehus wrote:
 Hello,

 Does anyone know why I would get different results between
 ctRegex and regex in the following snippet?

Ehm, because they have different engines that _should_ give identical
results. The default one apparently has a bug, which I'm looking into.
Please file a bug report.
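
Since both engines are meant to agree, one way to check a pattern is to
collect the hits from each and compare them. The following is only a
minimal sketch: it assumes a Phobos recent enough to provide matchAll
(the later spelling of match with the "g" flag), and the helper name
hits is made up for illustration.

```d
import std.algorithm : map;
import std.array : array;
import std.regex;
import std.stdio;

// Hypothetical helper: collect every hit of a global search into an array.
string[] hits(R)(string input, R re)
{
    return matchAll(input, re).map!(m => m.hit).array;
}

void main()
{
    string strcmd = `./myApp.rb -os OSX -path "/GIT/Ruby Apps/sec" -conf 'no timer'`;

    auto ctre = ctRegex!(`(".*")|('.*')`); // compiled at compile time
    auto re = regex(`(".*")|('.*')`);      // compiled at run time

    auto fromCt = hits(strcmd, ctre);
    auto fromRt = hits(strcmd, re);

    writeln(fromCt);
    writeln(fromRt);

    // Fails on a Phobos build where the two engines disagree.
    assert(fromCt == fromRt, "ctRegex and regex disagree");
}
```

On a build that still has the bug reported in this thread, the final
assert is the line that would fire.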

 Thanks,
 Josh

 ---

 import std.stdio, std.regex;

 void main() {
 string strcmd = "./myApp.rb -os OSX -path \"/GIT/Ruby Apps/sec\" -conf 'no timer'";

 auto ctre = ctRegex!(`(".*")|('.*')`, "g");
 auto re = regex (`(".*")|('.*')`, "g");

 auto ctm = match(strcmd, ctre);
 foreach(ct; ctm)
 writeln(ct.hit());

 auto m = match(strcmd, re);
 foreach(h; m)
 writeln(h.hit());
 }
 /* output */
 "/GIT/Ruby Apps/sec"
 'no timer'
 "/GIT/Ruby Apps/sec"


-- 
Dmitry Olshansky

Mar 16 2012

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Friday, 16 March 2012 at 08:34:18 UTC, Dmitry Olshansky wrote:
 Ehm, because they have different engines that _should_ give 
 identical results. And the default one apparently has a bug, 
 that I'm looking into.
 Fill the bug report plz.

Ok, submitted: id 7718

Thanks,
Josh

Mar 16 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 16.03.2012 20:05, Joshua Niehus wrote:
 On Friday, 16 March 2012 at 08:34:18 UTC, Dmitry Olshansky wrote:
 Ehm, because they have different engines that _should_ give identical
 results. And the default one apparently has a bug, that I'm looking into.
 Fill the bug report plz.

 Ok, submitted: id 7718

 Thanks,
 Josh

And the fix is coming
https://github.com/D-Programming-Language/phobos/pull/462

I'd also like to take this opportunity to thank you: this turned out to
be a surprisingly big oversight in that engine's code, and tracking it
down revealed some fundamental things to me.

-- 
Dmitry Olshansky

Mar 17 2012

"Jay Norwood" <jayn prismnet.com> writes:

On Friday, 16 March 2012 at 03:36:12 UTC, Joshua Niehus wrote:
 Hello,

 Does anyone know why I would get different results between
 ctRegex and regex in the following snippet?

 Thanks,
 Josh

I also have questions about the matchers. From what I
understand of the docs, if I use this greedy matcher to count
lines, it should have counted all the lines in the first match
(when I had it outside the foreach). In that case, I should have
been able to do something like:

matches=match(input,ctr);
l_cnt = matches.length();

But I only get length=1, and so I'm a bit concerned that greedy 
is not really working. In fact, it is about 3x faster to just run 
the second piece of code, so I think something must be wrong...


void wcp_ctRegex(string fn)
{
	string input = cast(string)std.file.read(fn);
	enum ctr =  ctRegex!("\n","g");
	ulong l_cnt;
	foreach(m; match(input,ctr))
	{
		l_cnt ++;
	}
}


void wcp_char(string fn)
{
	string input = cast(string)std.file.read(fn);
	ulong l_cnt;
	foreach(c; input)
	{
		if (c == '\n')
			l_cnt ++;
	}
}

Mar 18 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 19.03.2012 6:50, Jay Norwood wrote:
 On Friday, 16 March 2012 at 03:36:12 UTC, Joshua Niehus wrote:
 Hello,

 Does anyone know why I would get different results between
 ctRegex and regex in the following snippet?

 Thanks,
 Josh

 I'm also having questions about the matchers. From what I understand in
 the docs, if I use this greedy matcher to count lines, it should have
 counted all the lines in the first match (when I hade it outside the
 foreach.

As I said in the main D group, that's wrong: regex doesn't just count
matches, it finds the slices that match.
To make this efficient, it returns a lazy range that searches
on request. "g" means global :)
So code like this is cool and fast:
foreach(m; match(input, ctr))
{
	if(m.hit == "magic we are looking for")
		break; // <<< ---- no greedy find it all syndrome
}
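
To make the laziness concrete, here is a small sketch of an early exit.
It assumes a Phobos that has matchAll (equivalent to match with the "g"
flag used in this thread); firstIndexOf is a made-up name used only for
this illustration.

```d
import std.regex;
import std.stdio;

// Return the position (0-based) of the first word equal to `wanted`,
// or -1 if no word matches. The range is lazy, so words after the
// wanted one are never searched for.
long firstIndexOf(string input, string wanted)
{
    long i = 0;
    foreach (m; matchAll(input, regex(`\w+`)))
    {
        if (m.hit == wanted)
            return i; // stop here; the rest of the input is left unscanned
        ++i;
    }
    return -1;
}

void main()
{
    writeln(firstIndexOf("alpha beta gamma delta", "beta")); // prints 1
}
```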

  In that case, I should have been able to do something like:
 matches=match(input,ctr);
 l_cnt = matches.length();

 But I only get length=1, and so I'm a bit concerned that greedy is not
 really working. In fact, it is about 3x faster to just run the second
 piece of code, so I think something must be wrong...


 void wcp_ctRegex(string fn)
 {
 string input = cast(string)std.file.read(fn);
 enum ctr = ctRegex!("\n","g");
 ulong l_cnt;
 foreach(m; match(input,ctr))
 {
 l_cnt ++;
 }
 }


 void wcp_char(string fn)
 {
 string input = cast(string)std.file.read(fn);
 ulong l_cnt;
 foreach(c; input)
 {
 if (c == '\n')
 l_cnt ++;
 }
 }


-- 
Dmitry Olshansky

Mar 19 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 19.03.2012 12:05, Dmitry Olshansky wrote:
 On 19.03.2012 6:50, Jay Norwood wrote:
 On Friday, 16 March 2012 at 03:36:12 UTC, Joshua Niehus wrote:
 Hello,

 Does anyone know why I would get different results between
 ctRegex and regex in the following snippet?

 Thanks,
 Josh

 I'm also having questions about the matchers. From what I understand in
 the docs, if I use this greedy matcher to count lines, it should have
 counted all the lines in the first match (when I hade it outside the
 foreach.

 Like I told in main D group it's wrong - regex doesn't only count
 matches. It finds slices that do match.
 Thus to make it more efficient, it returns lazy range that does searches
 on request. "g" - means global :)
 Then code like this is cool and fast:
 foreach(m; match(input, ctr))
 {
 if(m.hit == "magic we are looking for")
 break; // <<< ---- no greedy find it all syndrome
 }

 In that case, I should have been able to do something like:
 matches=match(input,ctr);
 l_cnt = matches.length();


I'm curious what this length() does as I have no length for RegexMatch 
in the API :)

 But I only get length=1, and so I'm a bit concerned that greedy is not
 really working. In fact, it is about 3x faster to just run the second
 piece of code, so I think something must be wrong...



-- 
Dmitry Olshansky

Mar 19 2012

"Jay Norwood" <jayn prismnet.com> writes:

On Monday, 19 March 2012 at 08:14:18 UTC, Dmitry Olshansky wrote:
 On 19.03.2012 12:05, Dmitry Olshansky wrote:
 In that case, I should have been able to do something like:
 matches=match(input,ctr);
 l_cnt = matches.length();


 I'm curious what this length() does as I have no length for 
 RegexMatch in the API :)

 But I only get length=1, and so I'm a bit concerned that 
 greedy is not
 really working. In fact, it is about 3x faster to just run 
 the second
 piece of code, so I think something must be wrong...



http://dlang.org/phobos/std_regex.html#length

Yes, I should have typed matches.captures.length. It is always
returning 1, even though the description indicates the "g" flag
should create a match object that contains all the submatches.

Mar 19 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 19.03.2012 16:59, Jay Norwood wrote:
 On Monday, 19 March 2012 at 08:14:18 UTC, Dmitry Olshansky wrote:
 On 19.03.2012 12:05, Dmitry Olshansky wrote:
 In that case, I should have been able to do something like:
 matches=match(input,ctr);
 l_cnt = matches.length();


 I'm curious what this length() does as I have no length for RegexMatch
 in the API :)

 But I only get length=1, and so I'm a bit concerned that greedy is not
 really working. In fact, it is about 3x faster to just run the second
 piece of code, so I think something must be wrong...



 http://dlang.org/phobos/std_regex.html#length

 Yes, I should have typed matches.captures.length. It is always returning
 1, even though the desciption indicates the "g" flag should create a
 match object that contains all the submatches.

Captures is a range of submatches, as in: "(a)(b)(c)" has 3 submatches +
1 whole match == 4.
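
A short sketch of that arithmetic, assuming a Phobos that provides
matchFirst and matchAll (later additions; the thread itself uses match
and .captures). It also shows why a pattern with no groups, such as
"\n", always reports a length of 1 per match.

```d
import std.range : walkLength;
import std.regex;

void main()
{
    // Three groups plus the whole match give four captures.
    auto m = matchFirst("abc", regex(`(a)(b)(c)`));
    assert(!m.empty);
    assert(m.length == 4);
    assert(m[0] == "abc");            // index 0 is the whole match
    assert(m[1] == "a" && m[3] == "c");

    // No groups: each match holds only the whole match, so length is 1
    // no matter how many newlines the input contains.
    auto nl = matchFirst("a\nb\nc\n", regex("\n"));
    assert(nl.length == 1);

    // The number of matches is the length of the lazy range instead.
    assert(matchAll("a\nb\nc\n", regex("\n")).walkLength == 3);
}
```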

-- 
Dmitry Olshansky

Mar 19 2012

"Jay Norwood" <jayn prismnet.com> writes:

On Monday, 19 March 2012 at 08:05:18 UTC, Dmitry Olshansky wrote:
 Like I told in main D group it's wrong - regex doesn't only 
 count matches. It finds slices that do match.
 Thus to make it more efficient, it returns lazy range that does 
 searches on request. "g" - means global :)
 Then code like this is cool and fast:
 foreach(m; match(input, ctr))
 {
 	if(m.hit == "magic we are looking for")
 		break; // <<< ---- no greedy find it all syndrome
 }

ok, global. So the document implies that I should be able to get
a single match object with a count of the submatches. So I think
maybe I've jumped to the wrong conclusion about how to use it,
thinking I could just use "\n" and the "g" flag to get all the
matches for the range of "\n". So it looks like instead that the
term "submatches" needs more explanation. What exactly
constitutes a submatch? I inferred it just meant any single match
among many.

   //create static regex at compile-time, contains fast native code
   enum ctr = ctRegex!(`^.*/([^/]+)/?$`);

   //works just like normal regex:
   auto m2 = match("foo/bar", ctr);   //first match found here if any
   assert(m2);   // be sure to check if there is a match, before examining contents!
   assert(m2.captures[1] == "bar");//captures is a range of submatches, 0 - full match


btw, I couldn't get this \p option to work for the Unicode
properties. Can you provide a working example of it?

\p{PropertyName}  Matches character that belongs to unicode
PropertyName set. Single letter abbreviations could be used
without surrounding {,}.

Mar 19 2012

"Jay Norwood" <jayn prismnet.com> writes:

On Monday, 19 March 2012 at 13:27:03 UTC, Jay Norwood wrote:
 ok, global.  So the document implies that I should be able to 
 get a single match object with a count of the submatches.  So I 
 think maybe I've jumped to the wrong conclusion about how to 
 use it, thinking I could just use "\n" and "g" flag got get all 
 the matches for the range of "\n".  So it looks like instead 
 that the term "submatches" needs more explanation.  What 
 exactly constitutes a submatch?  I infered it just meant any 
 single match among many.

   //create static regex at compile-time, contains fast native 
 code
   enum ctr = ctRegex!(`^.*/([^/]+)/?$`);

   //works just like normal regex:
   auto m2 = match("foo/bar", ctr);   //first match found here 
 if any
   assert(m2);   // be sure to check if there is a match, before 
 examining contents!
   assert(m2.captures[1] == "bar");//captures is a range of 
 submatches, 0 - full match


 btw, I couldn't get this \p option to work for the uni 
 properties.  Can you provide some example of that which works?

 \p{PropertyName}  Matches character that belongs to unicode 
 PropertyName set. Single letter abreviations could be used 
 without surrounding {,}.


so, to answer my own question, it appears that the (regex) is
the portion that is considered a submatch that gets counted.

so counting lines would be something that has a (\n) in it,
although I'll have to figure out what that will be exactly.


(regex)  Matches subexpression regex, saving matched portion of
text for later retrieval.

Mar 19 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 19.03.2012 17:39, Jay Norwood wrote:
 On Monday, 19 March 2012 at 13:27:03 UTC, Jay Norwood wrote:
 ok, global. So the document implies that I should be able to get a
 single match object with a count of the submatches. So I think maybe
 I've jumped to the wrong conclusion about how to use it, thinking I
 could just use "\n" and "g" flag got get all the matches for the range
 of "\n". So it looks like instead that the term "submatches" needs
 more explanation. What exactly constitutes a submatch? I infered it
 just meant any single match among many.

 //create static regex at compile-time, contains fast native code
 enum ctr = ctRegex!(`^.*/([^/]+)/?$`);

 //works just like normal regex:
 auto m2 = match("foo/bar", ctr); //first match found here if any
 assert(m2); // be sure to check if there is a match, before examining
 contents!
 assert(m2.captures[1] == "bar");//captures is a range of submatches, 0
 - full match


 btw, I couldn't get this \p option to work for the uni properties. Can
 you provide some example of that which works?

 \p{PropertyName} Matches character that belongs to unicode
 PropertyName set. Single letter abreviations could be used without
 surrounding {,}.


 so, to answer my own question, it appears that the (regex) is the
 portion that is considered a submatch that gets counted.

 so counting lines would be something that has a (\n) in it, although
 I'll have to figure out what that will be exactly.

That's right. However, counting is completely separate from regex; you'd
want to use std.algorithm's count:
count(match(....,"\n"));

or, more Unicode-friendly:
count(match(...., regex("$","m"))); //note the multi-line flag

Also observe that there is simply no way to get more than a constant
number of submatches.

 (regex) Matches subexpression regex, saving matched portion of text for
 later retrival.

An example of unicode properties:
\p{WhiteSpace} matches any unicode whitespace char


-- 
Dmitry Olshansky

Mar 19 2012

"Jay Norwood" <jayn prismnet.com> writes:

On Monday, 19 March 2012 at 13:55:39 UTC, Dmitry Olshansky wrote:
 That's right, however counting is completely separate from 
 regex, you'd want to use std.algorithm count:
 count(match(....,"\n"));

 or more unicode-friendly:
 count(match(...., regex("$","m")); //note the multi-line flag

This only sets l_cnt to 1

void wcp_cnt_match1 (string fn)
{
	string input = cast(string)std.file.read(fn);
	enum ctr =  ctRegex!("$","m");
	ulong l_cnt = std.algorithm.count(match(input,ctr));
}

This works ok, but though concise it is not very fast

void wcp (string fn)
{
	string input = cast(string)std.file.read(fn);
	ulong l_cnt = std.algorithm.count(input,"\n");
}


 Also observe that there is simply no way to get more then 
 constant number of submatches.

 (regex) Matches subexpression regex, saving matched portion of 
 text for
 later retrival.

 An example of unicode properties:
 \p{WhiteSpace} matches any unicode whitespace char

This fails to build, so I'd guess \p is missing

void wcp (string fn)
{
	enum ctr =  ctRegex!("\p{WhiteSpace}","m");
}

------ Build started: Project: a7, Configuration: Release Win32 ------
Building Release\a7.exe...
a7.d(210): undefined escape sequence \p

Building Release\a7.exe failed!
Details saved as "file://G:\d\a7\a7\Release\a7.buildlog.html"
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

Mar 19 2012

"Jay Norwood" <jayn prismnet.com> writes:

On Monday, 19 March 2012 at 19:24:30 UTC, Jay Norwood wrote:
 This fails to build, so I'd guess is missing \p

 void wcp (string fn)
 {
 	enum ctr =  ctRegex!("\p{WhiteSpace}","m");
 }

 ------ Build started: Project: a7, Configuration: Release Win32
 ------
 Building Release\a7.exe...
 a7.d(210): undefined escape sequence \p

 Building Release\a7.exe failed!
 Details saved as "file://G:\d\a7\a7\Release\a7.buildlog.html"
 ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped
 ==========

So I tried something a little different, and this apparently gets 
further along to another error message.  But it looks like at 
this point it decides that the unicode properties are not 
available at compile time...


void wcp_bug_no_p(string fn)
{
	enum ctr =  ctRegex!(r"\p{WhiteSpace}","m");
}


------ Build started: Project: a7, Configuration: Debug Win32 ------
Building Debug\a7.exe...
G:\d\dmd2\windows\bin\..\..\src\phobos\std\regex.d(786): Error: static variable unicodeProperties cannot be read at compile time
G:\d\dmd2\windows\bin\..\..\src\phobos\std\regex.d(786):        called from here: assumeSorted(unicodeProperties)
G:\d\dmd2\windows\bin\..\..\src\phobos\std\regex.d(1937):        called from here: getUnicodeSet(result[0u..k],negated,cast(bool)(this.re_flags & cast(RegexOption)2u))

Mar 19 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 19.03.2012 23:24, Jay Norwood wrote:
 On Monday, 19 March 2012 at 13:55:39 UTC, Dmitry Olshansky wrote:
 That's right, however counting is completely separate from regex,
 you'd want to use std.algorithm count:
 count(match(....,"\n"));

 or more unicode-friendly:
 count(match(...., regex("$","m")); //note the multi-line flag


Ehm, I forgot the "g" flag myself, so it would be

count(match(...., regex("$","gm")));

and

count(match(...., regex("\n","g")));
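
Spelled out as a complete program, the "\n" variant might look like the
sketch below. It assumes a Phobos with matchAll (the later equivalent of
match with "g"); the function names are illustrative only. The plain
std.algorithm count over the characters is included for comparison.

```d
import std.algorithm : count;
import std.regex;
import std.stdio;

// Count newlines by walking the lazy range of regex matches.
size_t countLinesRegex(string input)
{
    return matchAll(input, regex("\n")).count;
}

// Count newlines without any regex at all.
size_t countLinesPlain(string input)
{
    return input.count('\n');
}

void main()
{
    string input = "first\nsecond\nthird\n";
    writeln(countLinesRegex(input)); // prints 3
    writeln(countLinesPlain(input)); // prints 3
}
```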

Note that if your task is to split a buffer on exactly the '\n' byte,
then a loop with memchr is about as fast as it gets; no amount of magic
compiler optimizations would make other generic ways better (even
theoretically). What they *could* do is narrow the difference.
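
A memchr loop of that kind could be sketched as follows. This is an
illustration under the assumption that the C runtime's memchr is reached
through core.stdc.string; countNewlines is a made-up name, and no timing
claim is attached to it.

```d
import core.stdc.string : memchr;
import std.stdio;

// Count '\n' bytes by repeatedly asking memchr for the next one.
size_t countNewlines(const(char)[] buf)
{
    size_t n = 0;
    const(char)* p = buf.ptr;
    size_t left = buf.length;

    while (left > 0)
    {
        auto hit = cast(const(char)*) memchr(p, '\n', left);
        if (hit is null)
            break;                      // no newline in the remainder
        ++n;
        immutable consumed = cast(size_t)(hit - p) + 1;
        left -= consumed;               // skip past the newline just found
        p = hit + 1;
    }
    return n;
}

void main()
{
    writeln(countNewlines("a\nb\nc\n")); // prints 3
}
```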

 This only sets l_cnt to 1

 void wcp_cnt_match1 (string fn)
 {
 string input = cast(string)std.file.read(fn);
 enum ctr = ctRegex!("$","m");
 ulong l_cnt = std.algorithm.count(match(input,ctr));
 }

 This works ok, but though concise it is not very fast

 void wcp (string fn)
 {
 string input = cast(string)std.file.read(fn);
 ulong l_cnt = std.algorithm.count(input,"\n");
 }

BTW, I suggest separating I/O from the actual work or, better yet,
timing both separately via std.datetime.StopWatch.
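
One way to time the two phases separately is sketched below. Assumptions:
in later Phobos releases StopWatch lives in std.datetime.stopwatch (the
import used here), and the temporary file name and its contents are
invented so the example runs on its own.

```d
import std.algorithm : count;
import std.array : replicate;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writefln;

void main()
{
    // Create a throwaway input file (hypothetical name).
    auto fn = buildPath(tempDir, "wcp_timing_sketch.txt");
    write(fn, "some line of text\n".replicate(100_000));
    scope (exit) remove(fn);

    // Phase 1: I/O only.
    auto sw = StopWatch(AutoStart.yes);
    string input = readText(fn);
    auto ioTime = sw.peek;

    // Phase 2: the actual work, timed from zero again.
    sw.reset();
    auto lines = input.count('\n');
    auto workTime = sw.peek;

    writefln("read: %s, count: %s, lines: %s", ioTime, workTime, lines);
    assert(lines == 100_000);
}
```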

 This fails to build, so I'd guess is missing \p

 void wcp (string fn)
 {
 enum ctr = ctRegex!("\p{WhiteSpace}","m");
 }

 ------ Build started: Project: a7, Configuration: Release Win32
 ------
 Building Release\a7.exe...
 a7.d(210): undefined escape sequence \p

Not a bug: \p is being treated as an escape sequence by the compiler.
How do you think \n works in your non-regex examples? ;)
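
The usual ways around this are a WYSIWYG string (r"..." or backticks) or
a doubled backslash, so that the backslash reaches the regex parser
intact. The sketch below uses the run-time regex, since the ctRegex
attempt earlier in the thread hit a compile-time limitation, and spells
the property as White_Space, the Unicode name; that std.regex accepts
this spelling is an assumption of the example.

```d
import std.algorithm : count;
import std.regex;
import std.stdio;

void main()
{
    // "\p{White_Space}" in an ordinary double-quoted literal is rejected
    // by the compiler as an undefined escape sequence, so it is not used.
    auto wysiwyg  = regex(r"\p{White_Space}");  // WYSIWYG string
    auto backtick = regex(`\p{White_Space}`);   // WYSIWYG string, backticks
    auto doubled  = regex("\\p{White_Space}");  // escaped backslash

    string text = "one two\tthree\nfour";       // space, tab, newline

    foreach (re; [wysiwyg, backtick, doubled])
        assert(matchAll(text, re).count == 3);

    writeln("all three spellings matched the same characters");
}
```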


-- 
Dmitry Olshansky

Mar 20 2012

"Jay Norwood" <jayn prismnet.com> writes:

On Tuesday, 20 March 2012 at 10:28:11 UTC, Dmitry Olshansky wrote:
 Note that if your task is to split buffer by exactly '\n' byte 
 then loop with memchr is about as fast as it gets, no amount of 
 magic compiler optimizations would make other generic ways 
 better (even theoretically). What they *could* do is bring the 
 difference lower.

ok, I'll use memchr.

 This works ok, but though concise it is not very fast
 void wcp (string fn)
 {
 string input = cast(string)std.file.read(fn);
 ulong l_cnt = std.algorithm.count(input,"\n");
 }

 BTW I suggest to separate I/O from actual work or better yet, 
 time both separately via std.datetime.StopWatch.

I'm timing with the stopwatch. I have separate functions where
I've measured an empty func and just the file reads with an empty
loop, so I can see the deltas. All these are being executed inside
a parallel foreach loop ... so 7 threads reading different files,
and since that is the end target, the overall measurement in that
context is more meaningful to me. The file I/O is on the order of
25ms for chunk reads or 30ms for full file reads in these
results, as it is all reads of about 20MB for the full test from
a 510 series SSD with SATA3. The reads are being done in
parallel by the threads in the threadpool. Each file is 2MB.
So any total times you see in my comments are for 10 tasks being
executed in a parallel foreach loop, with the file read portion
previously timed at around 30ms.

 This fails to build, so I'd guess is missing \p

 void wcp (string fn)
 {
 enum ctr = ctRegex!("\p{WhiteSpace}","m");
 }

 ------ Build started: Project: a7, Configuration: Release Win32
 ------
 Building Release\a7.exe...
 a7.d(210): undefined escape sequence \p

 Not a bug, a compiler escape sequence.
 How do you think \n works in your non-regex examples ? ;)

yes, thanks.  I read your other link and that was helpful.   I 
think I presumed that the escape handling was something belonging 
to stdio, while regex would have its own valid escapes that would 
include \p.  But I see now that the string literals have their 
own set of escapes.

Mar 20 2012

James Miller <james aatch.net> writes:

On 21 March 2012 04:26, Jay Norwood <jayn prismnet.com> wrote:
 yes, thanks. I read your other link and that was helpful. I think I
 presumed that the escape handling was something belonging to stdio, while
 regex would have its own valid escapes that would include \p. But I see now
 that the string literals have their own set of escapes.

Can you imagine the madness if escapes were specific to stdio, or some
other library! "Ok, and I'll just send this newline over the
network... Dammit, std.network doesn't escape \n". Also means that you
have perfect consistency between usages of strings, no strange other
usages of the same escape sequence...

--
James Miller

Mar 20 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 19.03.2012 17:27, Jay Norwood wrote:
 On Monday, 19 March 2012 at 08:05:18 UTC, Dmitry Olshansky wrote:
 Like I told in main D group it's wrong - regex doesn't only count
 matches. It finds slices that do match.
 Thus to make it more efficient, it returns lazy range that does
 searches on request. "g" - means global :)
 Then code like this is cool and fast:
 foreach(m; match(input, ctr))
 {
 if(m.hit == "magic we are looking for")
 break; // <<< ---- no greedy find it all syndrome
 }

 ok, global. So the document implies that I should be able to get a
 single match object with a count of the submatches. So I think maybe
 I've jumped to the wrong conclusion about how to use it, thinking I
 could just use "\n" and "g" flag got get all the matches for the range
 of "\n". So it looks like instead that the term "submatches" needs more
 explanation. What exactly constitutes a submatch? I infered it just
 meant any single match among many.

Maybe replacing "submatch" with "capture" helps. But I thought it was
easy to see that any subexpression in a regex, e.g. "(\w+)", is captured
into a submatch. Are you aware that sub-expressions in a regex are also
extracted from the text?

 //create static regex at compile-time, contains fast native code
 enum ctr = ctRegex!(`^.*/([^/]+)/?$`);

 //works just like normal regex:
 auto m2 = match("foo/bar", ctr); //first match found here if any
 assert(m2); // be sure to check if there is a match, before examining
 contents!
 assert(m2.captures[1] == "bar");//captures is a range of submatches, 0 -
 full match

BTW, in the above example it should be clearly visible what captures are.

 btw, I couldn't get this \p option to work for the uni properties. Can
 you provide some example of that which works?

 \p{PropertyName} Matches character that belongs to unicode PropertyName
 set. Single letter abreviations could be used without surrounding {,}.

Ouch, I see that the docs are no good :)
But well, they are reference-like anyway; you might want to take a look
at a healthier and lengthier overview:
http://blackwhale.github.com/regular-expression.html


-- 
Dmitry Olshansky

Mar 19 2012
