
digitalmars.D.learn - negative assertion support for RegExp?

Thomas Kühne writes:

Is there any D library that offers regular expressions with negative
assertion support?

There seems to be no documented way to use negative assertions in
Phobos' regular expressions. (http://digitalmars.com/ctg/regular.html)

Usually the syntax "(?!doNotMatch)" is used for that, e.g. in the Perl-compatible regex engines found on Linux systems.


Thomas


-- sample code ---
import std.regexp;
import std.stdio;
	
int main(){
	char[] log=
		"IP:127.0.0.1; USER:some; additional info\n"
		"IP:123.3.8.17; USER:other; additional info\n";

	char[] pattern = "^(?!IP:(127[.]0[.]0[.]1)); USER:([^; ]*);";
	char[] format = "; USER:$2 $1;";
	char[] attributes = "g";	

	char[] filtered = sub(log, pattern, format, attributes);
	
	writef("---unfiltered---\n%s\n", log);
	writef("---filtered---\n%s\n", filtered);
	
	return 0;
}

/* Expected Output:

---unfiltered---
IP:127.0.0.1; USER:some; additional info
IP:123.3.8.17; USER:other; additional info

---filtered---
IP:127.0.0.1; USER:some; additional info
IP:123.3.8.17; USER:other 123.3.8.17; additional info

*/
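
For comparison, the intent of the sample above can be sketched with the later Phobos module std.regex, which, unlike the std.regexp used in this thread, documents "(?!...)" negative lookahead. This is only a sketch under assumptions: the pattern is restructured so that the IP is captured outside the lookahead (a group inside a negative assertion can never supply text to the replacement), the format string is adjusted to match, and filterLog is a made-up helper name.

```d
import std.regex : regex, replaceAll;
import std.stdio : write;

/// Appends the IP after the user name on every line whose IP is not 127.0.0.1.
/// Hypothetical helper; the pattern is an adaptation of the one in the post.
string filterLog(string log)
{
    // "m" lets ^ match at the start of each line.
    // (?!127[.]0[.]0[.]1;) rejects the line if the IP field is exactly 127.0.0.1.
    auto re = regex(`^IP:(?!127[.]0[.]0[.]1;)([^;]*); USER:([^; ]*);`, "m");

    // $1 = IP, $2 = user name; replaceAll rewrites every matching line.
    return replaceAll(log, re, "IP:$1; USER:$2 $1;");
}

void main()
{
    string log = "IP:127.0.0.1; USER:some; additional info\n"
               ~ "IP:123.3.8.17; USER:other; additional info\n";

    write("---unfiltered---\n", log, "\n");
    write("---filtered---\n", filterLog(log), "\n");
}
```

With this input the first line is left untouched and the second becomes "IP:123.3.8.17; USER:other 123.3.8.17; additional info".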
Aug 13 2005
Manfred Nowak <svv1999 hotmail.com> writes:
Thomas Kühne <thomas-dloop kuehne.THISISSPAM.cn> wrote:

[...] 
 Is there any D library that offers regular expressions with
 negative assertion support?

Why do you need such? With a little bit of programming with split, find and rfind you should be able to use std.regexp for that purpose.

-manfred
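
A rough sketch of what such a manual approach could look like for the simple log format in the original post. It assumes a current Phobos, so splitLines, indexOf and startsWith stand in for the split/find/rfind functions named above; filterLine is a hypothetical helper, and it only approximates the regex (the user name is taken as everything up to the next ';').

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;
import std.string : indexOf, splitLines;

/// Appends the IP after the user name unless the line's IP is 127.0.0.1.
/// Hypothetical helper; lines that do not fit the expected layout pass through.
string filterLine(string line)
{
    enum ipTag = "IP:";
    enum userTag = "; USER:";
    enum localhost = "IP:127.0.0.1;";

    // The "negative assertion": skip lines for the excluded IP.
    if (!line.startsWith(ipTag) || line.startsWith(localhost))
        return line;

    // Locate the end of the IP field.
    immutable ipEnd = line.indexOf(userTag);
    if (ipEnd < 0)
        return line;

    // Locate the ';' that terminates the user name.
    immutable userStart = ipEnd + userTag.length;
    immutable userLen = line[userStart .. $].indexOf(';');
    if (userLen < 0)
        return line;
    immutable userEnd = userStart + userLen;

    // Re-assemble: everything up to the user name, a space, the IP, the rest.
    return line[0 .. userEnd] ~ " " ~ line[ipTag.length .. ipEnd] ~ line[userEnd .. $];
}

void main()
{
    string log = "IP:127.0.0.1; USER:some; additional info\n"
               ~ "IP:123.3.8.17; USER:other; additional info\n";

    foreach (line; log.splitLines)
        writeln(filterLine(line));
}
```

This stays manageable for one fixed layout; each additional alternative or exclusion adds another hand-written branch.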
Aug 13 2005
AJG <AJG_member pathlink.com> writes:
In article <ddmogt$19aq$1 digitaldaemon.com>, Manfred Nowak says...
Thomas Kühne <thomas-dloop kuehne.THISISSPAM.cn> wrote:

[...] 
 Is there any D library that offers regular expressions with
 negative assertion support?

Why do you need such? With a little bit of programming with split, find and rfind you should be able to use std.regexp for that purpose.

To save himself that bit of programming? ;) Regexes are currently somewhat limited in phobos. I find myself missing Perl features all the time.

--AJG.
Aug 14 2005
Thomas Kühne writes:

AJG wrote:
 In article <ddmogt$19aq$1 digitaldaemon.com>, Manfred Nowak says...
 
Thomas Kühne <thomas-dloop kuehne.THISISSPAM.cn> wrote:

[...] 

Is there any D library that offers regular expressions with
negative assertion support?

[...] Why do you need such? With a little bit of programming with split, find and rfind you should be able to use std.regexp for that purpose.

To save himself that bit of programming? ;) Regexes are currently somewhat limited in phobos. I find myself missing Perl features all the time.

What I gave was a very simple regex. The production ones are nested, include alternatives and contain more than one negative assertion.

Thomas
Aug 14 2005
Manfred Nowak <svv1999 hotmail.com> writes:
Thomas Kühne <thomas-dloop kuehne.THISISSPAM.cn> wrote:

[...]
 What I gave was a very simple regex. The production ones are
 nested, include alternatives and contain more than one negative
 assertion. 

Then I do not believe that an approach with REs and "assertions" is feasible in terms of run-time requirements in the first place, or in terms of development and maintenance time, because you are implementing some sort of lexer/parser for a language for which you have neither an explicit formal grammar nor definitions of the lexical tokens. I do not know the details of the PCRE implementation, but I do not believe that a tool whose emphasis is on REs incidentally also implements an LALR parser.

-manfred
Aug 14 2005
AJG <AJG_member pathlink.com> writes:
Hi Thomas,

Actually, I ported PCRE version 5 to D about a month ago when Walter told me
phobos didn't support named groups. AFAIK it works correctly; I compiled the
test program (a version of grep) and it didn't show any errors.

The only problem is that it's not object-oriented (it's the C API).

Anyway, I'm going to upload the code and maybe you can use that.
You can find example code in main.d. All you need essentially is:

# import pcre;

And off you go. If you have Build you can do:

% build main

And that's it.

Let me know if you find it useful. If there's enough interest, I could develop a
D-based OO interface for it, and maybe Walter will consider it for inclusion in
phobos to replace the old regex.

Some technical notes:

I ported the code with SUPPORT_UTF8, but _not_ with SUPPORT_UCP because that was
just a lot of bloat. Also, the LINK_SIZE I selected was 2, the default.

Here's the link:

http://pantheon.yale.edu/~ajg36/pcre.zip

Enjoy!
--AJG.



Aug 14 2005
Derek Parnell <derek psych.ward> writes:
On Sun, 14 Aug 2005 08:02:41 +0000 (UTC), AJG wrote:

 Hi Thomas,
 
 Actually, I ported PCRE version 5 to D about a month ago when Walter told me
 phobos didn't support named groups. AFAIK it works correctly; I compiled the
 test program (a version of grep) and it didn't show any errors.
 
 The only problem is that it's not object-oriented (it's the C API).

I don't see that O-O is a requirement. A simple procedural API is quite satisfactory.

-- 
Derek Parnell
Melbourne, Australia
14/08/2005 11:07:28 PM
Aug 14 2005
Thomas Kühne writes:

AJG wrote:
 Hi Thomas,
 
 Actually, I ported PCRE version 5 to D about a month ago when Walter told me
 phobos didn't support named groups. AFAIK it works correctly; I compiled the
 test program (a version of grep) and it didn't show any errors.
 
 The only problem is that it's not object-oriented (it's the C API).
 
 Anyway, I'm going to upload the code and maybe you can use that.
 You can find example code in main.d. All you need essentially is:
 
 # import pcre;
 
 And off you go. If you have Build you can do:
 
 % build main
 
 And that's it.
 
 Let me know if you find it useful. If there's enough interest, I could develop a
 D-based OO interface for it, and maybe Walter will consider it for inclusion in
 phobos to replace the old regex.
 
 Some technical notes:
 
 I ported the code with SUPPORT_UTF8, but _not_ with SUPPORT_UCP because that was
 just a lot of bloat. Also, the LINK_SIZE I selected was 2, the default.
 
 Here's the link:
 
 http://pantheon.yale.edu/~ajg36/pcre.zip

Thanks for the code :)))

The main.d sample requires two small changes:

line 1
< private import pcre_c;
> private import pcre;

line 8
< pcre *re;
> pcre.pcre *re;

I think PCRE_D - after a bit of clean up and some unittests - might become a valuable Phobos addon.

Thomas
Aug 14 2005