digitalmars.D - Would there be interest in a SERIOUS compile-time regex parser?

Don Clugston (42/42) Oct 16 2006 In the past, Eric and I both developed compile-time regex engines, but

rm (20/71) Oct 16 2006 I'm not so far as looking into the current regexp module.

Don Clugston (15/90) Oct 17 2006 Thanks.

Hasan Aljudy (6/57) Oct 16 2006 I only speak for myself, but there are my 2 cents anyway:
Andrey Khropov (5/6) Oct 16 2006 I think yes. IMHO compile time version should be the default in std.rege...
Pragma (12/63) Oct 16 2006 It's always difficult to forsee the ramifications of anything that is
Walter Bright (24/30) Oct 16 2006 Yes, for the following reasons:

rm (7/45) Oct 16 2006 to code stuff like this, it would really come in handy, if there were a
Don Clugston (65/97) Oct 17 2006 I haven't worked out how to do backtracking sensibly when using mixins,

Bill Baxter (10/21) Oct 17 2006 That's certainly been my experience with boost. It's hit-or-miss.
Walter Bright (2/6) Oct 17 2006 That is a good idea.

=?iso-8859-1?q?Knud_S=F8rensen?= (9/16) Oct 17 2006 That put a crazy idea into my head.

Walter Bright (1/1) Oct 17 2006 I don't know.
Reiner Pope (88/107) Oct 17 2006 This is a very interesting question and I've thought a fair bit about th...

Bill Baxter (13/50) Oct 17 2006 This is exactly what I've been hoping someone would invent too. It

Sean Kelly (8/27) Oct 17 2006 I've always thought that meta-heavy libraries like Blitz++ were a noble

Michael Butscher (8/12) Oct 16 2006 Maybe it would be simpler, but similar effective, to add some
BCS (2/10) Oct 16 2006 I have an idea for a project that would benefit from it.
Georg Wrede (3/10) Oct 17 2006 IMHO, this would be a Killer App in template programming.
Bruno Medeiros (10/20) Oct 21 2006 Whoa, "internal form" and "bytecoded program"? Out of curiosity, for

Don Clugston (4/22) Oct 21 2006 It's *nowhere near* as complicated as it sounds. If you look into the

Bruno Medeiros (8/31) Oct 26 2006 From my quick look, it is a simpler representation of the regexp

Don Clugston (3/32) Oct 27 2006 The graph part is done via 'goto' opcodes. But the majority of the

Benji Smith (17/20) Oct 23 2006 Generally speaking, regular expressions are compiled into a directed

Don Clugston <dac nospam.com.au> writes:

In the past, Eric and I both developed compile-time regex engines, but 
they were proof-of-concept rather than something you'd actually use in 
production code. I think this also applies to C++ metaprogramming regexp 
engines, too.

I've had a bit of play around with the regexp code in Phobos, and have 
convinced myself that it would be straightforward to create a 
compile-time wrapper for the existing engine.

Usage could be something like:
--------
void main()
{
	char [] s = "abcabcabab";
          // case insensitive search
	foreach(m; rexSearch!("ab+", "i")(s))
	{
		writefln("%s[%s]%s", m.pre, m.match(0), m.post);
	}
}
--------

It would behave *exactly* like the existing std.regexp, except that 
compilation into the internal form would happen via template 
metaprogramming, so that
(1) all errors would be caught at compile time, and
(2) there'd be a minor speedup because the compilation step would not 
happen at runtime, and
(3) otherwise it wouldn't be any faster than the existing regexp. 
However, there'd be no template code bloat, either.

Existing code would be unchanged. You could even write:

Regexp a = StaticRegExp!("ab?(ab*)+", "g");

(assign a pre-compiled regular expression to an existing phobos RegExp).

There's potentially a greater speedup possible, because the Regexp class 
could become a struct, with no need for any dynamic memory allocation; 
but if this was done, mixing runtime and compile-time regexps together 
wouldn't be as seamless. And of course there's load of room for future 
enhancement.

BUT...

The question is -- would this be worthwhile? I'm really not interested 
in making another toy.
It's straightforward, but tedious, and would double the length of 
std.regexp.
Would the use of templates be such a turn-off that people wouldn't use it?
Do the benefits exceed the cost?

Oct 16 2006

rm <roel.mathys gmail.com> writes:

Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but
 they were proof-of-concept rather than something you'd actually use in
 production code. I think this also applies to C++ metaprogramming regexp
 engines, too.
 
 I've had a bit of play around with the regexp code in Phobos, and have
 convinced myself that it would be straightforward to create a
 compile-time wrapper for the existing engine.
 
 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------
 
 It would behave *exactly* like the existing std.regexp, except that
 compilation into the internal form would happen via template
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp.
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).
 
 There's potentially a greater speedup possible, because the Regexp class
 could become a struct, with no need for any dynamic memory allocation;
 but if this was done, mixing runtime and compile-time regexps together
 wouldn't be as seamless. And of course there's load of room for future
 enhancement.
 
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested
 in making another toy.
 It's straightforward, but tedious, and would double the length of
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

I'm not so far as looking into the current regexp module.
But otoh I've already done some of the homework:

template findChar(char[] stringToSearch, char charToFind)
{
  static
    if ( stringToSearch.length == 0
       || stringToSearch[0] == charToFind )
      const int findChar = 0;
    else
      const int findChar
         = 1 + findChar!( stringToSearch[1..stringToSearch.length]
                        , charToFind);
}

gives the position of the char in the string, but if the position ==
length of stringToSearch, charToFind is not present.

I've got some others as well, I can parse an string literal into an
integer :-)

I'm willing to give a hand if you want.

roel

Oct 16 2006

Don Clugston <dac nospam.com.au> writes:

rm wrote:
 Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but
 they were proof-of-concept rather than something you'd actually use in
 production code. I think this also applies to C++ metaprogramming regexp
 engines, too.

 I've had a bit of play around with the regexp code in Phobos, and have
 convinced myself that it would be straightforward to create a
 compile-time wrapper for the existing engine.

 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------

 It would behave *exactly* like the existing std.regexp, except that
 compilation into the internal form would happen via template
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp.
 However, there'd be no template code bloat, either.

 Existing code would be unchanged. You could even write:

 Regexp a = StaticRegExp!("ab?(ab*)+", "g");

 (assign a pre-compiled regular expression to an existing phobos RegExp).

 There's potentially a greater speedup possible, because the Regexp class
 could become a struct, with no need for any dynamic memory allocation;
 but if this was done, mixing runtime and compile-time regexps together
 wouldn't be as seamless. And of course there's load of room for future
 enhancement.

 BUT...

 The question is -- would this be worthwhile? I'm really not interested
 in making another toy.
 It's straightforward, but tedious, and would double the length of
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

 
 I'm not so far as looking into the current regexp module.
 But otoh I've already done some of the homework:
 
 template findChar(char[] stringToSearch, char charToFind)
 {
   static
     if ( stringToSearch.length == 0
        || stringToSearch[0] == charToFind )
       const int findChar = 0;
     else
       const int findChar
          = 1 + findChar!( stringToSearch[1..stringToSearch.length]
                         , charToFind);
 }
 
 gives the position of the char in the string, but if the position ==
 length of stringToSearch, charToFind is not present.
 
 I've got some others as well, I can parse an string literal into an
 integer :-)
 
 I'm willing to give a hand if you want.

Thanks.
Some of my old code is on dsource, in ddl/meta; it was much more 
difficult back then, when there were so many compiler bugs.
meta.nameof is the code I'm proudest of.

I have the basic framework in, and I've made progress on the character 
class stuff, which I think is the most complicated part (since it's full 
of bit masking operations on arrays). So far everything has been quite 
straightforward, and no innovation has been required.

A fantastically helpful thing you could do immediately would be to 
improve the code coverage of the unit tests in std.regexp. Currently 
it's only about 40%. For my purposes, I only need high test coverage, 
not necessarily sensible tests (I'll just test the compiled output). 
Sorry that it's not a very sexy job.
(Hint: we need some tests for the undocumented features like "a*?").

Oct 17 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

I only speak for myself, but there are my 2 cents anyway:

I don't see myself ever needing this kind of thing.
As for the extra speedup, well, I've never been a speed freak, I always 
try to sacrifice performance in favor of code readability.

So I don't see this to be worthwhile.

Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but 
 they were proof-of-concept rather than something you'd actually use in 
 production code. I think this also applies to C++ metaprogramming regexp 
 engines, too.
 
 I've had a bit of play around with the regexp code in Phobos, and have 
 convinced myself that it would be straightforward to create a 
 compile-time wrapper for the existing engine.
 
 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------
 
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).
 
 There's potentially a greater speedup possible, because the Regexp class 
 could become a struct, with no need for any dynamic memory allocation; 
 but if this was done, mixing runtime and compile-time regexps together 
 wouldn't be as seamless. And of course there's load of room for future 
 enhancement.
 
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

Oct 16 2006

"Andrey Khropov" <andkhropov_nosp m_mtu-net.ru> writes:

Don Clugston wrote:

 Do the benefits exceed the cost?

I think yes. IMHO compile time version should be the default in std.regexp.

And it's also a good usecase of D features for the world at large.

-- 
AKhropov

Oct 16 2006

Pragma <ericanderton yahoo.removeme.com> writes:

Don Clugston wrote:
 In the past, Eric and I both developed compile-time regex engines, but 
 they were proof-of-concept rather than something you'd actually use in 
 production code. I think this also applies to C++ metaprogramming regexp 
 engines, too.
 
 I've had a bit of play around with the regexp code in Phobos, and have 
 convinced myself that it would be straightforward to create a 
 compile-time wrapper for the existing engine.
 
 Usage could be something like:
 --------
 void main()
 {
     char [] s = "abcabcabab";
          // case insensitive search
     foreach(m; rexSearch!("ab+", "i")(s))
     {
         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
     }
 }
 --------
 
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).
 
 There's potentially a greater speedup possible, because the Regexp class 
 could become a struct, with no need for any dynamic memory allocation; 
 but if this was done, mixing runtime and compile-time regexps together 
 wouldn't be as seamless. And of course there's load of room for future 
 enhancement.
 
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

It's always difficult to forsee the ramifications of anything that is 
new in this sense.  I'm curious as to how many people have used the 
Boost implementation... maybe that would give you an idea of how much 
real-world potential it has.

FWIW, I could use it within Enki for regex expressions, when they're 
used in an input grammar.  It would yield some nice speed increases for 
regex heavy designs, w/o having to re-implement it all over again for 
Enki's sake.  The only requirement I have is that it must handle UTF 
correctly, preferably UTF32 or a templated char type.

-- 
- EricAnderton at yahoo

Oct 16 2006

Walter Bright <newshound digitalmars.com> writes:

Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

Yes, for the following reasons:

1) I think it would make for faster regexp's, more than just by omitting 
the compile step. That's because the bytecoded program wouldn't need to 
be interpreted.

2) When I show the current one to people, as opposed to Eric Niebler's 
C++ one, the response is "but the D one is just a toy" with the 
implication that D is just a toy.

3) I wrote std.regexp long before people needed or asked for it. I knew 
it would eventually become a critically needed module, and that came to 
pass. It was nice that it was there when they needed it, and the time 
passed had ensured that it was a solid piece of code.

4) Your crafting of the current toy version was the catalyst for a big 
leap forward in D's templating system. Doing a professional one may 
expose critical problems that need to be fixed, and even if no such 
flaws are discovered, it would prove that D's TMP capabilities are up to 
scratch.

Sure, a lot of people are turned off by templates. That's one of my 
motivations for making things like associative arrays usable without 
templates. But for the people who do use templates, this can be a very 
big deal.

I think it should be in a separate module, say, regexp_static or 
something like that. Ideally, the user can switch between the two just 
by changing the module name in his code, and compare.

Oct 16 2006

rm <roel.mathys gmail.com> writes:

Walter Bright wrote:
 Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested
 in making another toy.
 It's straightforward, but tedious, and would double the length of
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use
 it?
 Do the benefits exceed the cost?

 
 Yes, for the following reasons:
 
 1) I think it would make for faster regexp's, more than just by omitting
 the compile step. That's because the bytecoded program wouldn't need to
 be interpreted.
 
 2) When I show the current one to people, as opposed to Eric Niebler's
 C++ one, the response is "but the D one is just a toy" with the
 implication that D is just a toy.
 
 3) I wrote std.regexp long before people needed or asked for it. I knew
 it would eventually become a critically needed module, and that came to
 pass. It was nice that it was there when they needed it, and the time
 passed had ensured that it was a solid piece of code.
 
 4) Your crafting of the current toy version was the catalyst for a big
 leap forward in D's templating system. Doing a professional one may
 expose critical problems that need to be fixed, and even if no such
 flaws are discovered, it would prove that D's TMP capabilities are up to
 scratch.
 
 Sure, a lot of people are turned off by templates. That's one of my
 motivations for making things like associative arrays usable without
 templates. But for the people who do use templates, this can be a very
 big deal.
 
 I think it should be in a separate module, say, regexp_static or
 something like that. Ideally, the user can switch between the two just
 by changing the module name in his code, and compare.

to code stuff like this, it would really come in handy, if there were a
compiler switch which could be set, to trace which template
instantiations are going on with which arguments.

For the moment I'm putting lot's of pragma-s in to manually trace this
stuff, but it's so verbose ;-)

roel

Oct 16 2006

Don Clugston <dac nospam.com.au> writes:

Walter Bright wrote:
 Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use 
 it?
 Do the benefits exceed the cost?

 
 Yes, for the following reasons:
 
 1) I think it would make for faster regexp's, more than just by omitting 
 the compile step. That's because the bytecoded program wouldn't need to 
 be interpreted.

I haven't worked out how to do backtracking sensibly when using mixins, 
so what I'm doing is simply compiling to the bytecoded program at 
compile time. Later on, it would be possible to add another step which 
looks for simple pattern types, and generates specific code for them (my 
guess is that 90% of all regexps use just one of a few simple categories).

 2) When I show the current one to people, as opposed to Eric Niebler's 
 C++ one, the response is "but the D one is just a toy" with the 
 implication that D is just a toy.

<raise eyebrows> I wonder if anyone is actually using the C++ regexp in 
production code? Personally I get the feeling that so much of boost is 
extremely clever, but not terribly practical.

 3) I wrote std.regexp long before people needed or asked for it. I knew 
 it would eventually become a critically needed module, and that came to 
 pass. It was nice that it was there when they needed it, and the time 
 passed had ensured that it was a solid piece of code.
 
 4) Your crafting of the current toy version was the catalyst for a big 
 leap forward in D's templating system. Doing a professional one may 
 expose critical problems that need to be fixed, and even if no such 
 flaws are discovered, it would prove that D's TMP capabilities are up to 
 scratch.
 
 Sure, a lot of people are turned off by templates. That's one of my 
 motivations for making things like associative arrays usable without 
 templates. But for the people who do use templates, this can be a very 
 big deal.

There's a potential piece of syntax sugar that IMHO would be a killer 
feature. If it were possible to overload a function based on whether a 
parameter is a compile-time literal, or a variable, the use of template 
metaprogramming could become completely invisible to the user.

In my code, the situation is basically like this:
---------
class RegExp
{
   char [] program;

   void makeProgramAtCompileTime(char [] pattern)()
   {
	program = RegExpCompile!(pattern);
   }

   void makeProgramAtRunTime(char [] pattern)
   {
     program = runtimecompile(pattern);
   }
}
---------

Imagine if there were some way to create an alias
makeProgram(char [] pattern, int otherparameters)

which became
makeProgramAtCompileTime!(pattern)(otherparameters);
if pattern were a compile time literal,
but became
makeProgramAtRunTime(pattern, otherparameters);
if pattern were a variable.

Then the user wouldn't even need to know that they were using templates.
It would also let you do some pretty amazing stuff.
Consider something as basic as sin(x).

real sin_compiletime(real x)()
{
   static if (x==0) return x; // we can do this one
   else return sin_runtime(x); // everything else is too hard, pass it 
to the runtime version.
}

Which is much easier than training the optimiser that it can replace 
sin(0) with 0. For more complex situations, you could perform 
optimisations that are technically infeasible for a normal optimiser.
You could also catch more errors at compile time:

real sqrt_compiletime(real x)()
{
   static assert(x>=0, "Error, square root of negative number");
   return sqrt_runtime(x);
}

There probably aren't many cases where this kind of optimisation would 
actually be worth doing, but for those cases, the benefit could be 
considerable.

One possible syntax would be something like:

real sqrt(template real x)
{
   static if (is(x: const)) return sqrt_compiletime!(x);
   else return sqrt_runtime(x);
}

Another might be "template alias".

Something to think about; not a proposal at present.

Oct 17 2006

Bill Baxter <wbaxter gmail.com> writes:

Don Clugston wrote:
 Walter Bright wrote:
 
 Don Clugston wrote:

 2) When I show the current one to people, as opposed to Eric Niebler's 
 C++ one, the response is "but the D one is just a toy" with the 
 implication that D is just a toy.

 
 <raise eyebrows> I wonder if anyone is actually using the C++ regexp in 
 production code? Personally I get the feeling that so much of boost is 
 extremely clever, but not terribly practical.

That's certainly been my experience with boost.  It's hit-or-miss. 
shared_ptr stuff -- great.  functional and serialization -- clever, but 
ultimately a big waste of my time.  It's easier to write the dang ugly 
for loop than to figure out the right incantation of _1's and _2's and 
binders to get functional::map working.  I really wish boost had some 
sort of feedback form at the bottom of every page so people could rate 
the packages and make comments.  Kind of like the MathWorks site for 
user-contributed Matlab code.  Could be a good thing for D-source too.

--bb

Oct 17 2006

Walter Bright <newshound digitalmars.com> writes:

Don Clugston wrote:
 There's a potential piece of syntax sugar that IMHO would be a killer 
 feature. If it were possible to overload a function based on whether a 
 parameter is a compile-time literal, or a variable, the use of template 
 metaprogramming could become completely invisible to the user.

That is a good idea.

Oct 17 2006

=?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:

On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:

 Don Clugston wrote:
 There's a potential piece of syntax sugar that IMHO would be a killer 
 feature. If it were possible to overload a function based on whether a 
 parameter is a compile-time literal, or a variable, the use of template 
 metaprogramming could become completely invisible to the user.

 
 That is a good idea.

That put a crazy idea into my head.

disclaimer: My education is in physics not computer science, 
I have no idea of how to write a compiler so my view may be very wrong ;-)

I have previously suggested that D handle functions on compile-time
literal by making them inline and let the optimizer evaluate them. 
 
Would it be possible to combine the template-engine and the optimizer
such that one of them could be refactored away ??

Oct 17 2006

Walter Bright <newshound digitalmars.com> writes:

I don't know.

Oct 17 2006

Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:

Knud S�rensen wrote:
 On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:
 
 Don Clugston wrote:
 There's a potential piece of syntax sugar that IMHO would be a killer 
 feature. If it were possible to overload a function based on whether a 
 parameter is a compile-time literal, or a variable, the use of template 
 metaprogramming could become completely invisible to the user.

 That is a good idea.

 
 That put a crazy idea into my head.
 
 disclaimer: My education is in physics not computer science, 
 I have no idea of how to write a compiler so my view may be very wrong ;-)
 
 I have previously suggested that D handle functions on compile-time
 literal by making them inline and let the optimizer evaluate them. 
  
 Would it be possible to combine the template-engine and the optimizer
 such that one of them could be refactored away ?? 

This is a very interesting question and I've thought a fair bit about this.

Three short answers:

1. This would be easy to do (ie it is only heavy syntax sugar) for a 
limited subset of the language (this subset would exclude reference 
semantics).

2. It would be possible to do for almost the entire language, but would 
require a real interpreter at compile time, which means a fair bit more 
work (but still possible and very appealing).

3. Neither of these two mechanisms would be enough to do away with the 
primary motivation for templates -- parameterised types. However, they 
would remove the need for a Turing-complete template system.


<rant>
Long answers:

1. Value semantics are inherently easier for a compiler to handle, 
because evaluation of them is just done by inlining, const folding and 
const propagation. Right now, an automated conversion of normal D 
functions to template form is pretty much possible, using the following 
rules:
    - convert every function to a template
    - convert every assignment to a variable into static single 
assignment form:
         a = 5;
         a = 6;
             becomes
         const a__1 = 5;
         const a__2 = 6;
    - convert function calls into template instantiations
         a = foo();
             becomes
         const a__1 = foo!().__val;
    - convert returns into
        const __val = whatever;
    - now that everything is constant, we can use static if instead of 
if everywhere. So, just convert all ifs into static ifs.
    - convert loops into recursive functions:
       while (condition)
       {  ... }
         becomes
       template __Loop1(local_variables....)
       {
          // Do stuff
          static if (condition) __val = _Loop10!(new_local_variables).__val;
       }

This basically does it but, since the template system only allows value 
semantics, this conversion must only allow them, so it means:
   - no pointers
   - no classes
   - no index-setting/slice-setting of arrays
   - no inout/out parameters
and also
   - no assembly
   - no C code
because the template system can't be expected to handle them.


2. By implementing a real interpreter in the compiler, you can handle 
reference semantics. The only things you can't handle are
   - assembly
   - other languages
   - global non-const variables (which should generally be avoided anyway)




Both of these systems would incorporate a way of saying, 'evaluate this 
at compile time', even if it were just as simple as declaring the 
variable const, using it as a template parameter or in a static if:

   version (good) { const int N = 80; }
   else { const int N = 16 }
   const foo = sqrt(N); // foo is 4
   static if (sqrt(foo) > 3) // that expression must be a compile-time 
constant
     alias ubyte MyByte;
   else
     alias byte MyByte;
   List!(foo, MyByte) myList;

The main motivations for this are, (in decreasing order of importance):
   - ability to use runtime functions for static if and template 
parameters (where static if is important because of
   - early catching of errors (eg incorrect regexes)
   - efficiency (eg not having to compile the regex at runtime; also, 
having cheap reflection at compile time)



In my opinion, fully supporting an interpreter at compile time simply 
turns metaprogramming into normal D with the requirement that it be 
optimized away by the compiler, and the extensibility and opportunities 
that this could allow is a huge motivation. However, being a very large 
amount of work with a lot of details, there is no hope of it being 
completed any time soon, but I think the motivation is so much that it 
is worth devoting an entire release of D (maybe 2.0 or 3.0) to being a 
'metaprogramming release.'

</rant>

Cheers,

Reiner

Oct 17 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Reiner Pope wrote:
 Knud S�rensen wrote:
 On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:

 Don Clugston wrote:



 Three short answers:
 
 1. This would be easy to do (ie it is only heavy syntax sugar) for a 
 limited subset of the language (this subset would exclude reference 
 semantics).
 
 2. It would be possible to do for almost the entire language, but would 
 require a real interpreter at compile time, which means a fair bit more 
 work (but still possible and very appealing).
 
 3. Neither of these two mechanisms would be enough to do away with the 
 primary motivation for templates -- parameterised types. However, they 
 would remove the need for a Turing-complete template system.

...
 Both of these systems would incorporate a way of saying, 'evaluate this 
 at compile time', even if it were just as simple as declaring the 
 variable const, using it as a template parameter or in a static if:
 
   version (good) { const int N = 80; }
   else { const int N = 16 }
   const foo = sqrt(N); // foo is 4
   static if (sqrt(foo) > 3) // that expression must be a compile-time 
 constant
     alias ubyte MyByte;
   else
     alias byte MyByte;
   List!(foo, MyByte) myList;
 
 The main motivations for this are, (in decreasing order of importance):
   - ability to use runtime functions for static if and template 
 parameters (where static if is important because of
   - early catching of errors (eg incorrect regexes)
   - efficiency (eg not having to compile the regex at runtime; also, 
 having cheap reflection at compile time)

This is exactly what I've been hoping someone would invent too.  It 
seems like the next logical step in template metaprogramming.  I think 
Walter's hip to the idea too.  Soon after the last big discussion about 
metaprogramming stuff and making it more like a true compile-time 
programming language, "static if" appeared.  I think that was something 
that Walter realized was "low-hanging fruit", which could quickly be 
implemented today and have a big impact now without waiting around till 
D 3.0.

So, I think it's vaguely on the D roadmap somewhere, it's just a huge 
undertaking and will take time.

--bb

Oct 17 2006

Sean Kelly <sean f4.ca> writes:

Don Clugston wrote:
 
 Imagine if there were some way to create an alias
 makeProgram(char [] pattern, int otherparameters)
 
 which became
 makeProgramAtCompileTime!(pattern)(otherparameters);
 if pattern were a compile time literal,
 but became
 makeProgramAtRunTime(pattern, otherparameters);
 if pattern were a variable.
 
 Then the user wouldn't even need to know that they were using templates.

I've always thought that meta-heavy libraries like Blitz++ were a noble 
attempt at something the language didn't really support, but the ability 
to mix compile and runtime code using the same call syntax would 
completely change things.

 One possible syntax would be something like:
 
 real sqrt(template real x)
 {
   static if (is(x: const)) return sqrt_compiletime!(x);
   else return sqrt_runtime(x);
 }

This is pretty clear and straightforward.  Something along these lines 
would be a killer feature for 2.0.


Sean

Oct 17 2006

Michael Butscher <mbutscher gmx.de> writes:

Don Clugston wrote:
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.

Maybe it would be simpler, but similar effective, to add some 
persistence functionality to std.regexp.RegExp.

This could mean to convert an instance of RegExp to a byte array, cache 
it in a file, later load the array back and recreate the instance.

So the "compilation" of the RE to its opcodes is done only once and 
program initialization becomes faster.


Michael

Oct 16 2006

BCS <BCS pathlink.com> writes:

Don Clugston wrote:
 BUT...
 
 The question is -- would this be worthwhile? I'm really not interested 
 in making another toy.
 It's straightforward, but tedious, and would double the length of 
 std.regexp.
 Would the use of templates be such a turn-off that people wouldn't use it?
 Do the benefits exceed the cost?

I have an idea for a project that would benefit from it.

Oct 16 2006

Georg Wrede <georg.wrede nospam.org> writes:

Don Clugston wrote:
 However, there'd be no template code bloat, either.
 
 Existing code would be unchanged. You could even write:
 
 Regexp a = StaticRegExp!("ab?(ab*)+", "g");
 
 (assign a pre-compiled regular expression to an existing phobos RegExp).

IMHO, this would be a Killer App in template programming.

And definitely one of the show-off gems in D.

Oct 17 2006

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Don Clugston wrote:
 
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.
 

Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
those ignorant on the matter, like me, what kind of processing is done 
when creating a regexp, in terms of this internal form you speak of? Is 
it converted to a simple internal form, or something more complex? 
Bytecoded program seems pretty complex stuff, especially for a regexp 
(isn't the translation direct) ?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 21 2006

Don Clugston <dac nospam.com.au> writes:

Bruno Medeiros wrote:
 Don Clugston wrote:
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.

 
 Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
 those ignorant on the matter, like me, what kind of processing is done 
 when creating a regexp, in terms of this internal form you speak of? Is 
 it converted to a simple internal form, or something more complex? 
 Bytecoded program seems pretty complex stuff, especially for a regexp 
 (isn't the translation direct) ?

It's *nowhere near* as complicated as it sounds. If you look into the 
source of std.regexp, you'll see what I mean -- there's a function 
called 'compile'.

Oct 21 2006

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Don Clugston wrote:
 Bruno Medeiros wrote:
 Don Clugston wrote:
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would not 
 happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.

 Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
 those ignorant on the matter, like me, what kind of processing is done 
 when creating a regexp, in terms of this internal form you speak of? 
 Is it converted to a simple internal form, or something more complex? 
 Bytecoded program seems pretty complex stuff, especially for a regexp 
 (isn't the translation direct) ?

 
 It's *nowhere near* as complicated as it sounds. If you look into the 
 source of std.regexp, you'll see what I mean -- there's a function 
 called 'compile'.

 From my quick look, it is a simpler representation of the regexp 
string, but it still a linear (opcode) representation. It is not a graph 
state like I'd expect (and like benji said). (one is used/created later? 
I couldn't tell from std.regexp.test() )

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 26 2006

Don Clugston <dac nospam.com.au> writes:

Bruno Medeiros wrote:
 Don Clugston wrote:
 Bruno Medeiros wrote:
 Don Clugston wrote:
 It would behave *exactly* like the existing std.regexp, except that 
 compilation into the internal form would happen via template 
 metaprogramming, so that
 (1) all errors would be caught at compile time, and
 (2) there'd be a minor speedup because the compilation step would 
 not happen at runtime, and
 (3) otherwise it wouldn't be any faster than the existing regexp. 
 However, there'd be no template code bloat, either.

 Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
 those ignorant on the matter, like me, what kind of processing is 
 done when creating a regexp, in terms of this internal form you speak 
 of? Is it converted to a simple internal form, or something more 
 complex? Bytecoded program seems pretty complex stuff, especially for 
 a regexp (isn't the translation direct) ?

 It's *nowhere near* as complicated as it sounds. If you look into the 
 source of std.regexp, you'll see what I mean -- there's a function 
 called 'compile'.

 
  From my quick look, it is a simpler representation of the regexp 
 string, but it still a linear (opcode) representation. It is not a graph 
 state like I'd expect (and like benji said). (one is used/created later? 
 I couldn't tell from std.regexp.test() )

The graph part is done via 'goto' opcodes. But the majority of the 
simple graph branches get converted into filters.

Oct 27 2006

Benji Smith <dlanguage benjismith.net> writes:

Bruno Medeiros wrote:
 Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
 those ignorant on the matter, like me, what kind of processing is done 
 when creating a regexp, in terms of this internal form you speak of?

Generally speaking, regular expressions are compiled into a directed 
state graph. For example, a regex like this:

   a(b|c)d

...compiles into a state-graph like this:

<mono-spaced>

      b
a -<   >- d
      c

</mono-spaced>

The regex engine is a state-machine-crawler that knows how to navigate 
from node to node in the graph, including rules about when it is allowed 
to backtrack, etc.

So, the compiled representation of a regular expression isn't "opcodes" 
in the traditional sense of the word. It's just a very particular kind 
of state graph.

--Benji

Oct 23 2006

D Programming

C/C++ Programming

Other

digitalmars.D - Would there be interest in a SERIOUS compile-time regex parser?