www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Would there be interest in a SERIOUS compile-time regex parser?

reply Don Clugston <dac nospam.com.au> writes:
In the past, Eric and I both developed compile-time regex engines, but 
they were proof-of-concept rather than something you'd actually use in 
production code. I think this also applies to C++ metaprogramming regexp 
engines, too.

I've had a bit of play around with the regexp code in Phobos, and have 
convinced myself that it would be straightforward to create a 
compile-time wrapper for the existing engine.

Usage could be something like:
--------
void main()
{
	char [] s = "abcabcabab";
          // case insensitive search
	foreach(m; rexSearch!("ab+", "i")(s))
	{
		writefln("%s[%s]%s", m.pre, m.match(0), m.post);
	}
}
--------

It would behave *exactly* like the existing std.regexp, except that 
compilation into the internal form would happen via template 
metaprogramming, so that
(1) all errors would be caught at compile time, and
(2) there'd be a minor speedup because the compilation step would not 
happen at runtime, and
(3) otherwise it wouldn't be any faster than the existing regexp. 
However, there'd be no template code bloat, either.

Existing code would be unchanged. You could even write:

Regexp a = StaticRegExp!("ab?(ab*)+", "g");

(assign a pre-compiled regular expression to an existing phobos RegExp).

There's potentially a greater speedup possible, because the Regexp class 
could become a struct, with no need for any dynamic memory allocation; 
but if this was done, mixing runtime and compile-time regexps together 
wouldn't be as seamless. And of course there's load of room for future 
enhancement.

BUT...

The question is -- would this be worthwhile? I'm really not interested 
in making another toy.
It's straightforward, but tedious, and would double the length of 
std.regexp.
Would the use of templates be such a turn-off that people wouldn't use it?
Do the benefits exceed the cost?
Oct 16 2006
next sibling parent reply rm <roel.mathys gmail.com> writes:
Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but
 they were proof-of-concept rather than something you'd actually use in
 production code. I think this also applies to C++ metaprogramming regexp
 engines, too.
 
 I've had a bit of play around with the regexp code in Phobos, and have
 convinced myself that it would be straightforward to create a
 compile-time wrapper for the existing engine.
 
 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------
 
 It would behave *exactly* like the existing std.regexp, except that
 compilation into the internal form would happen via template
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp.
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).
 
 There's potentially a greater speedup possible, because the Regexp class
 could become a struct, with no need for any dynamic memory allocation;
 but if this was done, mixing runtime and compile-time regexps together
 wouldn't be as seamless. And of course there's load of room for future
 enhancement.
 
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested
 in making another toy.
 It's straightforward, but tedious, and would double the length of
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

I'm not so far as looking into the current regexp module. But otoh I've already done some of the homework: template findChar(char[] stringToSearch, char charToFind) { static if ( stringToSearch.length == 0 || stringToSearch[0] == charToFind ) const int findChar = 0; else const int findChar = 1 + findChar!( stringToSearch[1..stringToSearch.length] , charToFind); } gives the position of the char in the string, but if the position == length of stringToSearch, charToFind is not present. I've got some others as well, I can parse an string literal into an integer :-) I'm willing to give a hand if you want. roel
Oct 16 2006
parent Don Clugston <dac nospam.com.au> writes:
rm wrote:
 Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but
 they were proof-of-concept rather than something you'd actually use in
 production code. I think this also applies to C++ metaprogramming regexp
 engines, too.

 I've had a bit of play around with the regexp code in Phobos, and have
 convinced myself that it would be straightforward to create a
 compile-time wrapper for the existing engine.

 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------

 It would behave *exactly* like the existing std.regexp, except that
 compilation into the internal form would happen via template
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp.
 However, there'd be no template code bloat, either.

 Existing code would be unchanged. You could even write:

 Regexp a = StaticRegExp!("ab?(ab*)+", "g");

 (assign a pre-compiled regular expression to an existing phobos RegExp).

 There's potentially a greater speedup possible, because the Regexp class
 could become a struct, with no need for any dynamic memory allocation;
 but if this was done, mixing runtime and compile-time regexps together
 wouldn't be as seamless. And of course there's load of room for future
 enhancement.

 BUT...

 The question is -- would this be worthwhile? I'm really not interested
 in making another toy.
 It's straightforward, but tedious, and would double the length of
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

I'm not so far as looking into the current regexp module. But otoh I've already done some of the homework: template findChar(char[] stringToSearch, char charToFind) { static if ( stringToSearch.length == 0 || stringToSearch[0] == charToFind ) const int findChar = 0; else const int findChar = 1 + findChar!( stringToSearch[1..stringToSearch.length] , charToFind); } gives the position of the char in the string, but if the position == length of stringToSearch, charToFind is not present. I've got some others as well, I can parse an string literal into an integer :-) I'm willing to give a hand if you want.

Thanks. Some of my old code is on dsource, in ddl/meta; it was much more difficult back then, when there were so many compiler bugs. meta.nameof is the code I'm proudest of. I have the basic framework in, and I've made progress on the character class stuff, which I think is the most complicated part (since it's full of bit masking operations on arrays). So far everything has been quite straightforward, and no innovation has been required. A fantastically helpful thing you could do immediately would be to improve the code coverage of the unit tests in std.regexp. Currently it's only about 40%. For my purposes, I only need high test coverage, not necessarily sensible tests (I'll just test the compiled output). Sorry that it's not a very sexy job. (Hint: we need some tests for the undocumented features like "a*?").
Oct 17 2006
prev sibling next sibling parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
I only speak for myself, but there are my 2 cents anyway:

I don't see myself ever needing this kind of thing.
As for the extra speedup, well, I've never been a speed freak, I always 
try to sacrifice performance in favor of code readability.

So I don't see this to be worthwhile.

Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but 
 they were proof-of-concept rather than something you'd actually use in 
 production code. I think this also applies to C++ metaprogramming regexp 
 engines, too.
 
 I've had a bit of play around with the regexp code in Phobos, and have 
 convinced myself that it would be straightforward to create a 
 compile-time wrapper for the existing engine.
 
 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------
 
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).
 
 There's potentially a greater speedup possible, because the Regexp class 
 could become a struct, with no need for any dynamic memory allocation; 
 but if this was done, mixing runtime and compile-time regexps together 
 wouldn't be as seamless. And of course there's load of room for future 
 enhancement.
 
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

Oct 16 2006
prev sibling next sibling parent "Andrey Khropov" <andkhropov_nosp m_mtu-net.ru> writes:
Don Clugston wrote:

 Do the benefits exceed the cost?

I think yes. IMHO compile time version should be the default in std.regexp. And it's also a good usecase of D features for the world at large. -- AKhropov
Oct 16 2006
prev sibling next sibling parent Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but 
 they were proof-of-concept rather than something you'd actually use in 
 production code. I think this also applies to C++ metaprogramming regexp 
 engines, too.
 
 I've had a bit of play around with the regexp code in Phobos, and have 
 convinced myself that it would be straightforward to create a 
 compile-time wrapper for the existing engine.
 
 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------
 
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).
 
 There's potentially a greater speedup possible, because the Regexp class 
 could become a struct, with no need for any dynamic memory allocation; 
 but if this was done, mixing runtime and compile-time regexps together 
 wouldn't be as seamless. And of course there's load of room for future 
 enhancement.
 
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

It's always difficult to forsee the ramifications of anything that is new in this sense. I'm curious as to how many people have used the Boost implementation... maybe that would give you an idea of how much real-world potential it has. FWIW, I could use it within Enki for regex expressions, when they're used in an input grammar. It would yield some nice speed increases for regex heavy designs, w/o having to re-implement it all over again for Enki's sake. The only requirement I have is that it must handle UTF correctly, preferably UTF32 or a templated char type. -- - EricAnderton at yahoo
Oct 16 2006
prev sibling next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

Yes, for the following reasons: 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted. 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy. 3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time passed had ensured that it was a solid piece of code. 4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch. Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal. I think it should be in a separate module, say, regexp_static or something like that. Ideally, the user can switch between the two just by changing the module name in his code, and compare.
Oct 16 2006
next sibling parent rm <roel.mathys gmail.com> writes:
Walter Bright wrote:
 Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested
 in making another toy.
 It's straightforward, but tedious, and would double the length of
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use
 it?
 Do the benefits exceed the cost?

Yes, for the following reasons: 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted. 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy. 3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time passed had ensured that it was a solid piece of code. 4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch. Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal. I think it should be in a separate module, say, regexp_static or something like that. Ideally, the user can switch between the two just by changing the module name in his code, and compare.

to code stuff like this, it would really come in handy, if there were a compiler switch which could be set, to trace which template instantiations are going on with which arguments. For the moment I'm putting lot's of pragma-s in to manually trace this stuff, but it's so verbose ;-) roel
Oct 16 2006
prev sibling parent reply Don Clugston <dac nospam.com.au> writes:
Walter Bright wrote:
 Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use 
 it?
 Do the benefits exceed the cost?

Yes, for the following reasons: 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted.

I haven't worked out how to do backtracking sensibly when using mixins, so what I'm doing is simply compiling to the bytecoded program at compile time. Later on, it would be possible to add another step which looks for simple pattern types, and generates specific code for them (my guess is that 90% of all regexps use just one of a few simple categories).
 2) When I show the current one to people, as opposed to Eric Niebler's 
 C++ one, the response is "but the D one is just a toy" with the 
 implication that D is just a toy.

<raise eyebrows> I wonder if anyone is actually using the C++ regexp in production code? Personally I get the feeling that so much of boost is extremely clever, but not terribly practical.
 3) I wrote std.regexp long before people needed or asked for it. I knew 
 it would eventually become a critically needed module, and that came to 
 pass. It was nice that it was there when they needed it, and the time 
 passed had ensured that it was a solid piece of code.
 
 4) Your crafting of the current toy version was the catalyst for a big 
 leap forward in D's templating system. Doing a professional one may 
 expose critical problems that need to be fixed, and even if no such 
 flaws are discovered, it would prove that D's TMP capabilities are up to 
 scratch.
 
 Sure, a lot of people are turned off by templates. That's one of my 
 motivations for making things like associative arrays usable without 
 templates. But for the people who do use templates, this can be a very 
 big deal.

There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user. In my code, the situation is basically like this: --------- class RegExp { char [] program; void makeProgramAtCompileTime(char [] pattern)() { program = RegExpCompile!(pattern); } void makeProgramAtRunTime(char [] pattern) { program = runtimecompile(pattern); } } --------- Imagine if there were some way to create an alias makeProgram(char [] pattern, int otherparameters) which became makeProgramAtCompileTime!(pattern)(otherparameters); if pattern were a compile time literal, but became makeProgramAtRunTime(pattern, otherparameters); if pattern were a variable. Then the user wouldn't even need to know that they were using templates. It would also let you do some pretty amazing stuff. Consider something as basic as sin(x). real sin_compiletime(real x)() { static if (x==0) return x; // we can do this one else return sin_runtime(x); // everything else is too hard, pass it to the runtime version. } Which is much easier than training the optimiser that it can replace sin(0) with 0. For more complex situations, you could perform optimisations that are technically infeasible for a normal optimiser. You could also catch more errors at compile time: real sqrt_compiletime(real x)() { static assert(x>=0, "Error, square root of negative number"); return sqrt_runtime(x); } There probably aren't many cases where this kind of optimisation would actually be worth doing, but for those cases, the benefit could be considerable. One possible syntax would be something like: real sqrt(template real x) { static if (is(x: const)) return sqrt_compiletime!(x); else return sqrt_runtime(x); } Another might be "template alias". Something to think about; not a proposal at present.
Oct 17 2006
next sibling parent Bill Baxter <wbaxter gmail.com> writes:
Don Clugston wrote:
 Walter Bright wrote:
 
 Don Clugston wrote:

 2) When I show the current one to people, as opposed to Eric Niebler's 
 C++ one, the response is "but the D one is just a toy" with the 
 implication that D is just a toy.

<raise eyebrows> I wonder if anyone is actually using the C++ regexp in production code? Personally I get the feeling that so much of boost is extremely clever, but not terribly practical.

That's certainly been my experience with boost. It's hit-or-miss. shared_ptr stuff -- great. functional and serialization -- clever, but ultimately a big waste of my time. It's easier to write the dang ugly for loop than to figure out the right incantation of _1's and _2's and binders to get functional::map working. I really wish boost had some sort of feedback form at the bottom of every page so people could rate the packages and make comments. Kind of like the MathWorks site for user-contributed Matlab code. Could be a good thing for D-source too. --bb
Oct 17 2006
prev sibling next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Don Clugston wrote:
 There's a potential piece of syntax sugar that IMHO would be a killer 
 feature. If it were possible to overload a function based on whether a 
 parameter is a compile-time literal, or a variable, the use of template 
 metaprogramming could become completely invisible to the user.

That is a good idea.
Oct 17 2006
parent reply =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:

 Don Clugston wrote:
 There's a potential piece of syntax sugar that IMHO would be a killer 
 feature. If it were possible to overload a function based on whether a 
 parameter is a compile-time literal, or a variable, the use of template 
 metaprogramming could become completely invisible to the user.

That is a good idea.

That put a crazy idea into my head. disclaimer: My education is in physics not computer science, I have no idea of how to write a compiler so my view may be very wrong ;-) I have previously suggested that D handle functions on compile-time literal by making them inline and let the optimizer evaluate them. Would it be possible to combine the template-engine and the optimizer such that one of them could be refactored away ??
Oct 17 2006
next sibling parent Walter Bright <newshound digitalmars.com> writes:
I don't know.
Oct 17 2006
prev sibling parent reply Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:
Knud Sørensen wrote:
 On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:
 
 Don Clugston wrote:
 There's a potential piece of syntax sugar that IMHO would be a killer 
 feature. If it were possible to overload a function based on whether a 
 parameter is a compile-time literal, or a variable, the use of template 
 metaprogramming could become completely invisible to the user.


That put a crazy idea into my head. disclaimer: My education is in physics not computer science, I have no idea of how to write a compiler so my view may be very wrong ;-) I have previously suggested that D handle functions on compile-time literal by making them inline and let the optimizer evaluate them. Would it be possible to combine the template-engine and the optimizer such that one of them could be refactored away ??

Three short answers: 1. This would be easy to do (ie it is only heavy syntax sugar) for a limited subset of the language (this subset would exclude reference semantics). 2. It would be possible to do for almost the entire language, but would require a real interpreter at compile time, which means a fair bit more work (but still possible and very appealing). 3. Neither of these two mechanisms would be enough to do away with the primary motivation for templates -- parameterised types. However, they would remove the need for a Turing-complete template system. <rant> Long answers: 1. Value semantics are inherently easier for a compiler to handle, because evaluation of them is just done by inlining, const folding and const propagation. Right now, an automated conversion of normal D functions to template form is pretty much possible, using the following rules: - convert every function to a template - convert every assignment to a variable into static single assignment form: a = 5; a = 6; becomes const a__1 = 5; const a__2 = 6; - convert function calls into template instantiations a = foo(); becomes const a__1 = foo!().__val; - convert returns into const __val = whatever; - now that everything is constant, we can use static if instead of if everywhere. So, just convert all ifs into static ifs. - convert loops into recursive functions: while (condition) { ... } becomes template __Loop1(local_variables....) { // Do stuff static if (condition) __val = _Loop10!(new_local_variables).__val; } This basically does it but, since the template system only allows value semantics, this conversion must only allow them, so it means: - no pointers - no classes - no index-setting/slice-setting of arrays - no inout/out parameters and also - no assembly - no C code because the template system can't be expected to handle them. 2. By implementing a real interpreter in the compiler, you can handle reference semantics. The only things you can't handle are - assembly - other languages - global non-const variables (which should generally be avoided anyway) Both of these systems would incorporate a way of saying, 'evaluate this at compile time', even if it were just as simple as declaring the variable const, using it as a template parameter or in a static if: version (good) { const int N = 80; } else { const int N = 16 } const foo = sqrt(N); // foo is 4 static if (sqrt(foo) > 3) // that expression must be a compile-time constant alias ubyte MyByte; else alias byte MyByte; List!(foo, MyByte) myList; The main motivations for this are, (in decreasing order of importance): - ability to use runtime functions for static if and template parameters (where static if is important because of - early catching of errors (eg incorrect regexes) - efficiency (eg not having to compile the regex at runtime; also, having cheap reflection at compile time) In my opinion, fully supporting an interpreter at compile time simply turns metaprogramming into normal D with the requirement that it be optimized away by the compiler, and the extensibility and opportunities that this could allow is a huge motivation. However, being a very large amount of work with a lot of details, there is no hope of it being completed any time soon, but I think the motivation is so much that it is worth devoting an entire release of D (maybe 2.0 or 3.0) to being a 'metaprogramming release.' </rant> Cheers, Reiner
Oct 17 2006
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Reiner Pope wrote:
 Knud Sørensen wrote:
 On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:

 Don Clugston wrote:



 Three short answers:
 
 1. This would be easy to do (ie it is only heavy syntax sugar) for a 
 limited subset of the language (this subset would exclude reference 
 semantics).
 
 2. It would be possible to do for almost the entire language, but would 
 require a real interpreter at compile time, which means a fair bit more 
 work (but still possible and very appealing).
 
 3. Neither of these two mechanisms would be enough to do away with the 
 primary motivation for templates -- parameterised types. However, they 
 would remove the need for a Turing-complete template system.

 Both of these systems would incorporate a way of saying, 'evaluate this 
 at compile time', even if it were just as simple as declaring the 
 variable const, using it as a template parameter or in a static if:
 
   version (good) { const int N = 80; }
   else { const int N = 16 }
   const foo = sqrt(N); // foo is 4
   static if (sqrt(foo) > 3) // that expression must be a compile-time 
 constant
     alias ubyte MyByte;
   else
     alias byte MyByte;
   List!(foo, MyByte) myList;
 
 The main motivations for this are, (in decreasing order of importance):
   - ability to use runtime functions for static if and template 
 parameters (where static if is important because of
   - early catching of errors (eg incorrect regexes)
   - efficiency (eg not having to compile the regex at runtime; also, 
 having cheap reflection at compile time)

This is exactly what I've been hoping someone would invent too. It seems like the next logical step in template metaprogramming. I think Walter's hip to the idea too. Soon after the last big discussion about metaprogramming stuff and making it more like a true compile-time programming language, "static if" appeared. I think that was something that Walter realized was "low-hanging fruit", which could quickly be implemented today and have a big impact now without waiting around till D 3.0. So, I think it's vaguely on the D roadmap somewhere, it's just a huge undertaking and will take time. --bb
Oct 17 2006
prev sibling parent Sean Kelly <sean f4.ca> writes:
Don Clugston wrote:
 
 Imagine if there were some way to create an alias
 makeProgram(char [] pattern, int otherparameters)
 
 which became
 makeProgramAtCompileTime!(pattern)(otherparameters);
 if pattern were a compile time literal,
 but became
 makeProgramAtRunTime(pattern, otherparameters);
 if pattern were a variable.
 
 Then the user wouldn't even need to know that they were using templates.

I've always thought that meta-heavy libraries like Blitz++ were a noble attempt at something the language didn't really support, but the ability to mix compile and runtime code using the same call syntax would completely change things.
 One possible syntax would be something like:
 
 real sqrt(template real x)
 {
   static if (is(x: const)) return sqrt_compiletime!(x);
   else return sqrt_runtime(x);
 }

This is pretty clear and straightforward. Something along these lines would be a killer feature for 2.0. Sean
Oct 17 2006
prev sibling next sibling parent Michael Butscher <mbutscher gmx.de> writes:
Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.

Maybe it would be simpler, but similar effective, to add some persistence functionality to std.regexp.RegExp. This could mean to convert an instance of RegExp to a byte array, cache it in a file, later load the array back and recreate the instance. So the "compilation" of the RE to its opcodes is done only once and program initialization becomes faster. Michael
Oct 16 2006
prev sibling next sibling parent BCS <BCS pathlink.com> writes:
Don Clugston wrote:
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

I have an idea for a project that would benefit from it.
Oct 16 2006
prev sibling next sibling parent Georg Wrede <georg.wrede nospam.org> writes:
Don Clugston wrote:
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).

IMHO, this would be a Killer App in template programming. And definitely one of the show-off gems in D.
Oct 17 2006
prev sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Don Clugston wrote:
 
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.
 

Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ? -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Oct 21 2006
next sibling parent reply Don Clugston <dac nospam.com.au> writes:
Bruno Medeiros wrote:
 Don Clugston wrote:
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.

Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?

It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.
Oct 21 2006
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Don Clugston wrote:
 Bruno Medeiros wrote:
 Don Clugston wrote:
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.

Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?

It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.

From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() ) -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Oct 26 2006
parent Don Clugston <dac nospam.com.au> writes:
Bruno Medeiros wrote:
 Don Clugston wrote:
 Bruno Medeiros wrote:
 Don Clugston wrote:
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would 
 not happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.

Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?

It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.

From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() )

The graph part is done via 'goto' opcodes. But the majority of the simple graph branches get converted into filters.
Oct 27 2006
prev sibling parent Benji Smith <dlanguage benjismith.net> writes:
Bruno Medeiros wrote:
 Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
 those ignorant on the matter, like me, what kind of processing is done 
 when creating a regexp, in terms of this internal form you speak of?

Generally speaking, regular expressions are compiled into a directed state graph. For example, a regex like this: a(b|c)d ...compiles into a state-graph like this: <mono-spaced> b a -< >- d c </mono-spaced> The regex engine is a state-machine-crawler that knows how to navigate from node to node in the graph, including rules about when it is allowed to backtrack, etc. So, the compiled representation of a regular expression isn't "opcodes" in the traditional sense of the word. It's just a very particular kind of state graph. --Benji
Oct 23 2006