digitalmars.D.learn - Restrictions in std.regexp?


Olaf Pohlmann <op nospam.org> writes:

Hi,

the documentation of std.regexp is somewhat sparse, so I tried to find 
out a few things on my own. There seems to be no way to do lookaheads 
and lookbehinds. This:

	RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");

should find "CD" as a match, but it yields a runtime error:

	Error: *+? not allowed in atom

Is there any other way to get this working or am I just out of luck with 
the current implementation?



op

May 02 2006

Lionello Lunesu <lio lunesu.remove.com> writes:

Olaf Pohlmann wrote:
 Hi,
 
 the documentation of std.regexp is somewhat sparse, so I tried to find 
 out a few things on my own. There seems to be no way to do lookaheads 
 and lookbehinds. This:
 
     RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
 
 should find "CD" as a match, but it yields a runtime error:

Use "AB(CD)EF" and re.match(1) ??
I'm very inexperienced with regexp, mind you :S
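
To make the difference between the two approaches concrete, here is a minimal sketch using Python's re module (not std.regexp, whose behaviour may differ). It only illustrates that a capture group can return the same "CD" as the lookaround pattern; the variable names are made up for the example.

```python
import re

text = "ABCDEF"

# Lookaround: the context is asserted but not consumed,
# so the overall match is just "CD".
lookaround = re.search(r"(?<=AB)CD(?=EF)", text)

# Capture group: the overall match is "ABCDEF",
# but group 1 holds only the middle part.
grouped = re.search(r"AB(CD)EF", text)

print(lookaround.group(0))
print(grouped.group(0))
print(grouped.group(1))
```

The capture-group form needs no lookaround support from the engine; the price is that the overall match also includes the surrounding "AB" and "EF".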

L.

May 02 2006

"Derek Parnell" <derek psych.ward> writes:

On Tue, 02 May 2006 23:39:13 +1000, Olaf Pohlmann <op nospam.org> wrote:

 Hi,

 the documentation of std.regexp is somewhat sparse, so I tried to find  
 out a few things on my own. There seems to be no way to do lookaheads  
 and lookbehinds. This:

 	RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");

 should find "CD" as a match, but it yields a runtime error:

 	Error: *+? not allowed in atom

 Is there any other way to get this working or am I just out of luck with  
 the current implementation?

I can't tell what it is you are trying to do but it seems that the RE  
syntax you are expecting is not what has been implemented. See  
http://www.digitalmars.com/ctg/regular.html for details.

Are you looking for an optional "AB" followed by "CD" followed by an  
optional "EF" ?

If so try

     RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Here is a sample program ...

import std.stdio;
import std.regexp;

void main()
{
   RegExp re = search("AXCDEFGHI", "(AB)?(CD)(EF)?");

   writefln("PRE: %s", re.pre());
   writefln("MATCH: %s", re.match(0));
   writefln("SUB1: %s", re.match(1));
   writefln("SUB2: %s", re.match(2));  // this should be 'CD'
   writefln("SUB3: %s", re.match(3));
   writefln("POST: %s", re.post());
}
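
For comparison, the same pattern and input can be tried in Python's re module. This is only a sketch of how a backtracking engine treats the optional groups; std.regexp may report an unmatched group differently (Python gives None), and pre/post are computed by hand here because Python has no direct equivalent of re.pre() and re.post().

```python
import re

m = re.search(r"(AB)?(CD)(EF)?", "AXCDEFGHI")

# Text before and after the overall match.
pre = m.string[:m.start()]
post = m.string[m.end():]

print("PRE:", pre)
print("MATCH:", m.group(0))
print("SUB1:", m.group(1))  # the optional "AB" did not take part in the match
print("SUB2:", m.group(2))
print("SUB3:", m.group(3))
print("POST:", post)
```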

-- 
Derek Parnell
Melbourne, Australia

May 02 2006

Olaf Pohlmann <op nospam.org> writes:

Derek Parnell wrote:
 Are you looking for an optional "AB" followed by "CD" followed by an  
 optional "EF" ?

No. I'm looking for a string that is preceded and followed by
well-defined other strings. The match should *not* return the whole
sequence, only what is in the middle. It's actually about parsing some
kind of text markup. If it were HTML like "<body><h1>Welcome</h1></body>",
it should let me retrieve only the "Welcome". If you just use grouping,
the match will be the whole <h1> element, so you have to extract the
content in a second step. The regexp with lookahead and lookbehind works
fine in Python:

import re
html = "<body>\n<h1>Welcome</h1>\n</body>"
# The lookbehind and lookahead assert the tags without including them in the match.
m = re.search(r"(?<=\<h1\>).*?(?=\</h1\>)", html)
print(html[m.start():m.end()])

This prints 'Welcome'.

The regexp is a bit hard to read, so see 
http://docs.python.org/lib/re-syntax.html for a description.

Now, I can retrieve the whole h1 element with the D version of regexps 
and then do another scan for the content but it would be nice to get it 
in one step, like in the Python version.
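
The general task described here, pulling out the text between two known delimiters, can also be done in one step with a capture group. A minimal sketch in Python follows; the helper name `between` is hypothetical and not part of any library, and the sketch assumes the wanted content stays on one line, since `.` does not match a newline by default.

```python
import re

def between(text, before, after):
    """Return the text between the first `before` and the next `after`, or None."""
    # re.escape makes the delimiters literal, so they need no manual escaping.
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    m = re.search(pattern, text)
    return m.group(1) if m else None

html = "<body>\n<h1>Welcome</h1>\n</body>"
print(between(html, "<h1>", "</h1>"))
```

Because the delimiters sit outside the group, no second scan is needed: group 1 is already the content.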


op

May 02 2006

Olaf Pohlmann <op nospam.org> writes:

Derek Parnell wrote:
     RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Oops, this is actually very close to the solution, just drop both '?'. 
It's even more readable than what I tried before:

import std.stdio;
import std.regexp;

void main()
{
    char[] html = "<body>\n<h1>Welcome</h1>\n</body>";
    RegExp re = search(html, r"(\<h1\>)(.*?)(\</h1\>)");
    if (re !is null)
        writefln("%s", re.match(2));
}



op

May 02 2006
