www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Compile Time Regular Expressions

reply "Craig Black" <cblack ara.com> writes:
If I understand correctly, D templates can take text as parameters.  I also 
understand that there are limitations as to the length of the text string. 
If this limitation could be eliminated, would it then be possible to 
generate highly-optimized source code from a regular expression at compile 
time?

Just a thought.

-Craig 
Dec 12 2005
next sibling parent reply "Kris" <fu bar.com> writes:
Yes, that would be quite a feature :-)

There was quite a bit of discussion early in the year about supporting 
regex-expressions as a first class citizens within the language. Doing so 
via a template could be a fine resolution.

- Kris

"Craig Black" <cblack ara.com> wrote in message 
news:dnkj03$298u$1 digitaldaemon.com...
 If I understand correctly, D templates can take text as parameters.  I 
 also understand that there are limitations as to the length of the text 
 string. If this limitation could be eliminated, would it then be possible 
 to generate highly-optimized source code from a regular expression at 
 compile time?

 Just a thought.

 -Craig
 

Dec 12 2005
next sibling parent reply "Craig Black" <cblack ara.com> writes:
These new D features are very exciting.  D is breaking into new territory 
and the implications are vast and unexplored.  It's like buried treasure.

-Craig 
Dec 12 2005
parent Don Clugston <dac nospam.com.au> writes:
Craig Black wrote:
 These new D features are very exciting.  D is breaking into new territory 
 and the implications are vast and unexplored.  It's like buried treasure.
 
 -Craig 
 

It's like being the first to walk on Mars. :-)
Dec 12 2005
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Kris" <fu bar.com> wrote in message news:dnkk6s$2al1$1 digitaldaemon.com...
 Yes, that would be quite a feature :-)

 There was quite a bit of discussion early in the year about supporting
 regex-expressions as a first class citizens within the language.

Just a reminder that it isn't in the language for technical reasons, not because it's a bad idea.
 Doing so via a template could be a fine resolution.

That does sound like it'd be cool! I'd like to set it against David Abraham's C++ method of doing regex by using operator overloading and expression templates, a technique that is technologically marvelous but, in my not so humble opinion (!), is not aesthetically right.
Dec 13 2005
next sibling parent reply Larry Evans <cppljevans cos-internet.com> writes:
On 12/13/2005 04:03 AM, Walter Bright wrote:
[snip]
 
Doing so via a template could be a fine resolution.

That does sound like it'd be cool! I'd like to set it against David Abraham's C++ method of doing regex by using operator overloading and expression templates, a technique that is technologically marvelous but, in my not so humble opinion (!), is not aesthetically right.

Could you provide a link to David's method, ... or are you referring to spirit ( http://www.boost.org/libs/spirit/index.html ). If you're referring to spirit, then, IIUC, without the operator overloading, you'd have to compose your regex with a complicated typedef: typedef alternative<term1,alternative<term2,term3> term1_2_or_3; to represent the same regexp with operator overloading: term1 | term2 | term3 Isn't the 2nd form more aesthetic than the 1st form?
Dec 13 2005
parent "Walter Bright" <newshound digitalmars.com> writes:
"Larry Evans" <cppljevans cos-internet.com> wrote in message
news:dnmcdl$rhd$1 digitaldaemon.com...
 On 12/13/2005 04:03 AM, Walter Bright wrote:
 [snip]
Doing so via a template could be a fine resolution.

That does sound like it'd be cool! I'd like to set it against David Abraham's C++ method of doing regex by using operator overloading and expression templates, a technique that is technologically marvelous but,


 my not so humble opinion (!), is not aesthetically right.

Could you provide a link to David's method, ... or are you referring to spirit ( http://www.boost.org/libs/spirit/index.html ).

My mistake. Eric Niebler created this, not David, here's the link: http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/index.html
 If you're
 referring to spirit, then, IIUC, without the operator overloading, you'd
 have to compose your regex with a complicated typedef:

    typedef alternative<term1,alternative<term2,term3> term1_2_or_3;

 to represent the same regexp with operator overloading:

    term1 | term2 | term3

 Isn't the 2nd form more aesthetic than the 1st form?

The problem is that with C++ (or D) operator overloading, one cannot: 1) invent new operators 2) change operator precedence 3) change operator binding Therefore, one cannot create any form of operator overloading which looks like conventional regular expressions. They will, *necessarilly*, look like C++ (or D) expressions. I believe the resulting confusion creates more problems than it solves.
Dec 13 2005
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:>

 Doing so via a template could be a fine resolution.

That does sound like it'd be cool! I'd like to set it against David Abraham's C++ method of doing regex by using operator overloading and expression templates, a technique that is technologically marvelous but, in my not so humble opinion (!), is not aesthetically right.

Agreed. Spirit shows a masterful grasp of C++ template syntax but the result is still a bit jarring (through no fault of David's). A string-based regex parser could be great as a comparison, though it might be a bit much for an introductory article on D templates. Sean
Dec 13 2005
parent "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message
news:dnn1j7$1jup$1 digitaldaemon.com...
 Walter Bright wrote:>

 Doing so via a template could be a fine resolution.

That does sound like it'd be cool! I'd like to set it against David Abraham's C++ method of doing regex by using operator overloading and expression templates, a technique that is technologically marvelous but,


 my not so humble opinion (!), is not aesthetically right.

Agreed. Spirit shows a masterful grasp of C++ template syntax but the result is still a bit jarring (through no fault of David's). A string-based regex parser could be great as a comparison, though it might be a bit much for an introductory article on D templates.

I agree it is not David's fault (although Eric Niebler created the regex library, not David http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/index.html), the results are crippled by the limitations of operator overloading. David and I had argued the merits/demerits of this at some length on a private mailing list thread.
Dec 13 2005
prev sibling next sibling parent reply BCS <BCS_member pathlink.com> writes:
In article <dnkj03$298u$1 digitaldaemon.com>, Craig Black says...
If I understand correctly, D templates can take text as parameters.  I also 
understand that there are limitations as to the length of the text string. 
If this limitation could be eliminated, would it then be possible to 
generate highly-optimized source code from a regular expression at compile 
time?

Just a thought.

-Craig 

parameter but may only be used in a context where they can be reduced at compile time to a constant literal. this might be permissible because there would never be a need to generate a mangled name for the specialized template. This would allow things like this. isOneOf(T i, T[] a) { static if(a.length == 1) const bool isOneOf = is(i == a[0]); else const bool isOneOf = isOneOf!(i, a[0..length/2]) || isOneOf!(i, a[length/2..length]); } or maybe even things like mergesort and array filters.
Dec 12 2005
next sibling parent reply Sean Kelly <sean f4.ca> writes:
BCS wrote:
 How about a special case of templates that allow any arbitrary constant
 parameter but may only be used in a context where they can be reduced at
compile
 time to a constant literal. this might be permissible because there would never
 be a need to generate a mangled name for the specialized template. This would
 allow things like this.
 
 
 isOneOf(T i, T[] a)
 {
 static if(a.length == 1)
 const bool isOneOf = is(i == a[0]);
 else
 const bool isOneOf = isOneOf!(i, a[0..length/2]) || isOneOf!(i,
 a[length/2..length]);
 }
 
 or maybe even things like mergesort and array filters.

I think this should already be possible, as the optimizer should throw away most metaprogramming template code once the constants have been determined, which should precede any symbol length errors. Sean
Dec 12 2005
parent reply BCS <BCS_member pathlink.com> writes:
Can the following (or it's equivalant) be done in a template, if so how?

template moo(T[] a)
{
	T[] moo = a[0..length/2];
}
Dec 12 2005
parent "Chris Miller" <chris dprogramming.com> writes:
On Mon, 12 Dec 2005 17:44:57 -0500, BCS <BCS_member pathlink.com> wrote:

 Can the following (or it's equivalant) be done in a template, if so how?

 template moo(T[] a)
 {
 	T[] moo = a[0..length/2];
 }

I came up with: private import std.stdio; template moo(T, T[] a) { const T[] moo = a[0..a.length/2]; } int main() { const char[] s = moo!(char, "hello world"); writefln("%s", s); return 0; } and DMD complains: Error: arithmetic/string type expected for value-parameter, not T[] Change the "T[] a" to "char[] a" for the template and it works.
Dec 12 2005
prev sibling parent reply "Craig Black" <cblack ara.com> writes:
 How about a special case of templates that allow any arbitrary constant
 parameter but may only be used in a context where they can be reduced at 
 compile
 time to a constant literal. this might be permissible because there would 
 never
 be a need to generate a mangled name for the specialized template. This 
 would
 allow things like this.


 isOneOf(T i, T[] a)
 {
 static if(a.length == 1)
 const bool isOneOf = is(i == a[0]);
 else
 const bool isOneOf = isOneOf!(i, a[0..length/2]) || isOneOf!(i,
 a[length/2..length]);
 }

 or maybe even things like mergesort and array filters.

thereby eliminating name-mangling? -Craig
Dec 12 2005
parent BCS <BCS_member pathlink.com> writes:
Craig Black wrote:
How about a special case of templates that allow any arbitrary constant
parameter but may only be used in a context where they can be reduced at 
compile
time to a constant literal. this might be permissible because there would 
never
be a need to generate a mangled name for the specialized template. This 
would
allow things like this.


isOneOf(T i, T[] a)
{
static if(a.length == 1)
const bool isOneOf = is(i == a[0]);
else
const bool isOneOf = isOneOf!(i, a[0..length/2]) || isOneOf!(i,
a[length/2..length]);
}

or maybe even things like mergesort and array filters.

Are you suggesting working around the limitation by inlining the method and thereby eliminating name-mangling? -Craig

IIRC the name-mangling is needed to put the template instance into the symbol table. If it all ends up as a literal, it doesn't need to be put in, just as a string literal never ends up in the table
Dec 12 2005
prev sibling next sibling parent "Craig Black" <cblack ara.com> writes:
With regard to Spirit and Neibler's library, I agree with Walter's comments. 
Although it is cool that it can be done, the syntax is very ugly.  (As if 
regular expression syntax wasn't bad enough.)  D is poised to overcome these 
limitations.

-Craig 
Dec 14 2005
prev sibling next sibling parent Don Clugston <dac nospam.com.au> writes:
Craig Black wrote:
 If I understand correctly, D templates can take text as parameters.  I also 
 understand that there are limitations as to the length of the text string. 
 If this limitation could be eliminated, would it then be possible to 
 generate highly-optimized source code from a regular expression at compile 
 time?
 
 Just a thought.
 
 -Craig 

I have the beginnings of a library ("meta") designed for exactly those type of applications. Currently, the only usage of it is for D Dynamic Libraries, so it's currently living in the 'meta' subdirectory of the SVN repository for DDL on dsource. Currently, it only has mathematical and rudimentary string functions. There are a few workarounds for bugs in the current DMD compiler, but it's already obvious that the final code is going to be very clean and understandable. BTW, the limit on the length of the text string only applies to the compile-time portion, and the limit is huge (about 2Kbytes). Although because you tend to use templates recursively, you run into the limit at much smaller sizes. For regexp, you normally provide the string to be searched at compile time, and there's no limit on how long that can be. The regexp string is limited, but it should still be big enough to accomodate any regexp I've ever seen.
Dec 15 2005
prev sibling parent reply Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Craig Black wrote:
 If I understand correctly, D templates can take text as parameters.  I also 
 understand that there are limitations as to the length of the text string. 
 If this limitation could be eliminated, would it then be possible to 
 generate highly-optimized source code from a regular expression at compile 
 time?
 
 Just a thought.
 
 -Craig 
 
 

ability to run a regexp on a string at compile time? Or to generate at compile-time an (optimized) code that will parse a given (constant) regexp in an arbitrary (non-constant) string? -- Bruno Medeiros - CS/E student "Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
Dec 15 2005
parent reply "Craig Black" <cblack ara.com> writes:
 I'm not understanding clearly on this one. Are you asking for the ability 
 to run a regexp on a string at compile time? Or to generate at 
 compile-time an (optimized) code that will parse a given (constant) regexp 
 in an arbitrary (non-constant) string?

It would be ideal to be able to generate optimized code at compile-time from a regular expression. The regular expression could be a parameter to a D template. This template would be able to parse a an arbitrary string at run-time using this optimized code. Does that make sense? -Craig
Dec 16 2005
parent Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Craig Black wrote:
I'm not understanding clearly on this one. Are you asking for the ability 
to run a regexp on a string at compile time? Or to generate at 
compile-time an (optimized) code that will parse a given (constant) regexp 
in an arbitrary (non-constant) string?

It would be ideal to be able to generate optimized code at compile-time from a regular expression. The regular expression could be a parameter to a D template. This template would be able to parse a an arbitrary string at run-time using this optimized code. Does that make sense? -Craig

-- Bruno Medeiros - CS/E student "Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
Dec 17 2005