digitalmars.D - Would there be interest in a SERIOUS compile-time regex parser?
- Don Clugston (42/42) Oct 16 2006 In the past, Eric and I both developed compile-time regex engines, but
- rm (20/71) Oct 16 2006 I'm not so far as looking into the current regexp module.
- Don Clugston (15/90) Oct 17 2006 Thanks.
- Hasan Aljudy (6/57) Oct 16 2006 I only speak for myself, but there are my 2 cents anyway:
- Andrey Khropov (5/6) Oct 16 2006 I think yes. IMHO compile time version should be the default in std.rege...
- Pragma (12/63) Oct 16 2006 It's always difficult to forsee the ramifications of anything that is
- Walter Bright (24/30) Oct 16 2006 Yes, for the following reasons:
- rm (7/45) Oct 16 2006 to code stuff like this, it would really come in handy, if there were a
- Don Clugston (65/97) Oct 17 2006 I haven't worked out how to do backtracking sensibly when using mixins,
- Bill Baxter (10/21) Oct 17 2006 That's certainly been my experience with boost. It's hit-or-miss.
- Walter Bright (2/6) Oct 17 2006 That is a good idea.
- =?iso-8859-1?q?Knud_S=F8rensen?= (9/16) Oct 17 2006 That put a crazy idea into my head.
- Walter Bright (1/1) Oct 17 2006 I don't know.
- Reiner Pope (88/107) Oct 17 2006 This is a very interesting question and I've thought a fair bit about th...
- Bill Baxter (13/50) Oct 17 2006 This is exactly what I've been hoping someone would invent too. It
- Sean Kelly (8/27) Oct 17 2006 I've always thought that meta-heavy libraries like Blitz++ were a noble
- Michael Butscher (8/12) Oct 16 2006 Maybe it would be simpler, but similar effective, to add some
- BCS (2/10) Oct 16 2006 I have an idea for a project that would benefit from it.
- Georg Wrede (3/10) Oct 17 2006 IMHO, this would be a Killer App in template programming.
- Bruno Medeiros (10/20) Oct 21 2006 Whoa, "internal form" and "bytecoded program"? Out of curiosity, for
- Don Clugston (4/22) Oct 21 2006 It's *nowhere near* as complicated as it sounds. If you look into the
- Bruno Medeiros (8/31) Oct 26 2006 From my quick look, it is a simpler representation of the regexp
- Don Clugston (3/32) Oct 27 2006 The graph part is done via 'goto' opcodes. But the majority of the
- Benji Smith (17/20) Oct 23 2006 Generally speaking, regular expressions are compiled into a directed
In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too. I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine. Usage could be something like: -------- void main() { char [] s = "abcabcabab"; // case insensitive search foreach(m; rexSearch!("ab+", "i")(s)) { writefln("%s[%s]%s", m.pre, m.match(0), m.post); } } -------- It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either. Existing code would be unchanged. You could even write: Regexp a = StaticRegExp!("ab?(ab*)+", "g"); (assign a pre-compiled regular expression to an existing phobos RegExp). There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement. BUT... The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?
Oct 16 2006
Don Clugston wrote:In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too. I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine. Usage could be something like: -------- void main() { char [] s = "abcabcabab"; // case insensitive search foreach(m; rexSearch!("ab+", "i")(s)) { writefln("%s[%s]%s", m.pre, m.match(0), m.post); } } -------- It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either. Existing code would be unchanged. You could even write: Regexp a = StaticRegExp!("ab?(ab*)+", "g"); (assign a pre-compiled regular expression to an existing phobos RegExp). There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement. BUT... The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?I'm not so far as looking into the current regexp module. But otoh I've already done some of the homework: template findChar(char[] stringToSearch, char charToFind) { static if ( stringToSearch.length == 0 || stringToSearch[0] == charToFind ) const int findChar = 0; else const int findChar = 1 + findChar!( stringToSearch[1..stringToSearch.length] , charToFind); } gives the position of the char in the string, but if the position == length of stringToSearch, charToFind is not present. I've got some others as well, I can parse an string literal into an integer :-) I'm willing to give a hand if you want. roel
Oct 16 2006
rm wrote:Don Clugston wrote:Thanks. Some of my old code is on dsource, in ddl/meta; it was much more difficult back then, when there were so many compiler bugs. meta.nameof is the code I'm proudest of. I have the basic framework in, and I've made progress on the character class stuff, which I think is the most complicated part (since it's full of bit masking operations on arrays). So far everything has been quite straightforward, and no innovation has been required. A fantastically helpful thing you could do immediately would be to improve the code coverage of the unit tests in std.regexp. Currently it's only about 40%. For my purposes, I only need high test coverage, not necessarily sensible tests (I'll just test the compiled output). Sorry that it's not a very sexy job. (Hint: we need some tests for the undocumented features like "a*?").In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too. I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine. Usage could be something like: -------- void main() { char [] s = "abcabcabab"; // case insensitive search foreach(m; rexSearch!("ab+", "i")(s)) { writefln("%s[%s]%s", m.pre, m.match(0), m.post); } } -------- It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either. Existing code would be unchanged. You could even write: Regexp a = StaticRegExp!("ab?(ab*)+", "g"); (assign a pre-compiled regular expression to an existing phobos RegExp). There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement. BUT... The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?I'm not so far as looking into the current regexp module. But otoh I've already done some of the homework: template findChar(char[] stringToSearch, char charToFind) { static if ( stringToSearch.length == 0 || stringToSearch[0] == charToFind ) const int findChar = 0; else const int findChar = 1 + findChar!( stringToSearch[1..stringToSearch.length] , charToFind); } gives the position of the char in the string, but if the position == length of stringToSearch, charToFind is not present. I've got some others as well, I can parse an string literal into an integer :-) I'm willing to give a hand if you want.
Oct 17 2006
I only speak for myself, but there are my 2 cents anyway: I don't see myself ever needing this kind of thing. As for the extra speedup, well, I've never been a speed freak, I always try to sacrifice performance in favor of code readability. So I don't see this to be worthwhile. Don Clugston wrote:In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too. I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine. Usage could be something like: -------- void main() { char [] s = "abcabcabab"; // case insensitive search foreach(m; rexSearch!("ab+", "i")(s)) { writefln("%s[%s]%s", m.pre, m.match(0), m.post); } } -------- It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either. Existing code would be unchanged. You could even write: Regexp a = StaticRegExp!("ab?(ab*)+", "g"); (assign a pre-compiled regular expression to an existing phobos RegExp). There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement. BUT... The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?
Oct 16 2006
Don Clugston wrote:Do the benefits exceed the cost?I think yes. IMHO compile time version should be the default in std.regexp. And it's also a good usecase of D features for the world at large. -- AKhropov
Oct 16 2006
Don Clugston wrote:In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too. I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine. Usage could be something like: -------- void main() { char [] s = "abcabcabab"; // case insensitive search foreach(m; rexSearch!("ab+", "i")(s)) { writefln("%s[%s]%s", m.pre, m.match(0), m.post); } } -------- It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either. Existing code would be unchanged. You could even write: Regexp a = StaticRegExp!("ab?(ab*)+", "g"); (assign a pre-compiled regular expression to an existing phobos RegExp). There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement. BUT... The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?It's always difficult to forsee the ramifications of anything that is new in this sense. I'm curious as to how many people have used the Boost implementation... maybe that would give you an idea of how much real-world potential it has. FWIW, I could use it within Enki for regex expressions, when they're used in an input grammar. It would yield some nice speed increases for regex heavy designs, w/o having to re-implement it all over again for Enki's sake. The only requirement I have is that it must handle UTF correctly, preferably UTF32 or a templated char type. -- - EricAnderton at yahoo
Oct 16 2006
Don Clugston wrote:The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?Yes, for the following reasons: 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted. 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy. 3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time passed had ensured that it was a solid piece of code. 4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch. Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal. I think it should be in a separate module, say, regexp_static or something like that. Ideally, the user can switch between the two just by changing the module name in his code, and compare.
Oct 16 2006
Walter Bright wrote:Don Clugston wrote:to code stuff like this, it would really come in handy, if there were a compiler switch which could be set, to trace which template instantiations are going on with which arguments. For the moment I'm putting lot's of pragma-s in to manually trace this stuff, but it's so verbose ;-) roelThe question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?Yes, for the following reasons: 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted. 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy. 3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time passed had ensured that it was a solid piece of code. 4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch. Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal. I think it should be in a separate module, say, regexp_static or something like that. Ideally, the user can switch between the two just by changing the module name in his code, and compare.
Oct 16 2006
Walter Bright wrote:Don Clugston wrote:I haven't worked out how to do backtracking sensibly when using mixins, so what I'm doing is simply compiling to the bytecoded program at compile time. Later on, it would be possible to add another step which looks for simple pattern types, and generates specific code for them (my guess is that 90% of all regexps use just one of a few simple categories).The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?Yes, for the following reasons: 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted.2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy.<raise eyebrows> I wonder if anyone is actually using the C++ regexp in production code? Personally I get the feeling that so much of boost is extremely clever, but not terribly practical.3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time passed had ensured that it was a solid piece of code. 4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch. Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal.There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user. In my code, the situation is basically like this: --------- class RegExp { char [] program; void makeProgramAtCompileTime(char [] pattern)() { program = RegExpCompile!(pattern); } void makeProgramAtRunTime(char [] pattern) { program = runtimecompile(pattern); } } --------- Imagine if there were some way to create an alias makeProgram(char [] pattern, int otherparameters) which became makeProgramAtCompileTime!(pattern)(otherparameters); if pattern were a compile time literal, but became makeProgramAtRunTime(pattern, otherparameters); if pattern were a variable. Then the user wouldn't even need to know that they were using templates. It would also let you do some pretty amazing stuff. Consider something as basic as sin(x). real sin_compiletime(real x)() { static if (x==0) return x; // we can do this one else return sin_runtime(x); // everything else is too hard, pass it to the runtime version. } Which is much easier than training the optimiser that it can replace sin(0) with 0. For more complex situations, you could perform optimisations that are technically infeasible for a normal optimiser. You could also catch more errors at compile time: real sqrt_compiletime(real x)() { static assert(x>=0, "Error, square root of negative number"); return sqrt_runtime(x); } There probably aren't many cases where this kind of optimisation would actually be worth doing, but for those cases, the benefit could be considerable. One possible syntax would be something like: real sqrt(template real x) { static if (is(x: const)) return sqrt_compiletime!(x); else return sqrt_runtime(x); } Another might be "template alias". Something to think about; not a proposal at present.
Oct 17 2006
Don Clugston wrote:Walter Bright wrote:That's certainly been my experience with boost. It's hit-or-miss. shared_ptr stuff -- great. functional and serialization -- clever, but ultimately a big waste of my time. It's easier to write the dang ugly for loop than to figure out the right incantation of _1's and _2's and binders to get functional::map working. I really wish boost had some sort of feedback form at the bottom of every page so people could rate the packages and make comments. Kind of like the MathWorks site for user-contributed Matlab code. Could be a good thing for D-source too. --bbDon Clugston wrote: 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy.<raise eyebrows> I wonder if anyone is actually using the C++ regexp in production code? Personally I get the feeling that so much of boost is extremely clever, but not terribly practical.
Oct 17 2006
Don Clugston wrote:There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user.That is a good idea.
Oct 17 2006
On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:Don Clugston wrote:That put a crazy idea into my head. disclaimer: My education is in physics not computer science, I have no idea of how to write a compiler so my view may be very wrong ;-) I have previously suggested that D handle functions on compile-time literal by making them inline and let the optimizer evaluate them. Would it be possible to combine the template-engine and the optimizer such that one of them could be refactored away ??There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user.That is a good idea.
Oct 17 2006
Knud Sørensen wrote:On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:This is a very interesting question and I've thought a fair bit about this. Three short answers: 1. This would be easy to do (ie it is only heavy syntax sugar) for a limited subset of the language (this subset would exclude reference semantics). 2. It would be possible to do for almost the entire language, but would require a real interpreter at compile time, which means a fair bit more work (but still possible and very appealing). 3. Neither of these two mechanisms would be enough to do away with the primary motivation for templates -- parameterised types. However, they would remove the need for a Turing-complete template system. <rant> Long answers: 1. Value semantics are inherently easier for a compiler to handle, because evaluation of them is just done by inlining, const folding and const propagation. Right now, an automated conversion of normal D functions to template form is pretty much possible, using the following rules: - convert every function to a template - convert every assignment to a variable into static single assignment form: a = 5; a = 6; becomes const a__1 = 5; const a__2 = 6; - convert function calls into template instantiations a = foo(); becomes const a__1 = foo!().__val; - convert returns into const __val = whatever; - now that everything is constant, we can use static if instead of if everywhere. So, just convert all ifs into static ifs. - convert loops into recursive functions: while (condition) { ... } becomes template __Loop1(local_variables....) { // Do stuff static if (condition) __val = _Loop10!(new_local_variables).__val; } This basically does it but, since the template system only allows value semantics, this conversion must only allow them, so it means: - no pointers - no classes - no index-setting/slice-setting of arrays - no inout/out parameters and also - no assembly - no C code because the template system can't be expected to handle them. 2. By implementing a real interpreter in the compiler, you can handle reference semantics. The only things you can't handle are - assembly - other languages - global non-const variables (which should generally be avoided anyway) Both of these systems would incorporate a way of saying, 'evaluate this at compile time', even if it were just as simple as declaring the variable const, using it as a template parameter or in a static if: version (good) { const int N = 80; } else { const int N = 16 } const foo = sqrt(N); // foo is 4 static if (sqrt(foo) > 3) // that expression must be a compile-time constant alias ubyte MyByte; else alias byte MyByte; List!(foo, MyByte) myList; The main motivations for this are, (in decreasing order of importance): - ability to use runtime functions for static if and template parameters (where static if is important because of - early catching of errors (eg incorrect regexes) - efficiency (eg not having to compile the regex at runtime; also, having cheap reflection at compile time) In my opinion, fully supporting an interpreter at compile time simply turns metaprogramming into normal D with the requirement that it be optimized away by the compiler, and the extensibility and opportunities that this could allow is a huge motivation. However, being a very large amount of work with a lot of details, there is no hope of it being completed any time soon, but I think the motivation is so much that it is worth devoting an entire release of D (maybe 2.0 or 3.0) to being a 'metaprogramming release.' </rant> Cheers, ReinerDon Clugston wrote:That put a crazy idea into my head. disclaimer: My education is in physics not computer science, I have no idea of how to write a compiler so my view may be very wrong ;-) I have previously suggested that D handle functions on compile-time literal by making them inline and let the optimizer evaluate them. Would it be possible to combine the template-engine and the optimizer such that one of them could be refactored away ??There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user.That is a good idea.
Oct 17 2006
Reiner Pope wrote:Knud Sørensen wrote:On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:Don Clugston wrote:Three short answers: 1. This would be easy to do (ie it is only heavy syntax sugar) for a limited subset of the language (this subset would exclude reference semantics). 2. It would be possible to do for almost the entire language, but would require a real interpreter at compile time, which means a fair bit more work (but still possible and very appealing). 3. Neither of these two mechanisms would be enough to do away with the primary motivation for templates -- parameterised types. However, they would remove the need for a Turing-complete template system....Both of these systems would incorporate a way of saying, 'evaluate this at compile time', even if it were just as simple as declaring the variable const, using it as a template parameter or in a static if: version (good) { const int N = 80; } else { const int N = 16 } const foo = sqrt(N); // foo is 4 static if (sqrt(foo) > 3) // that expression must be a compile-time constant alias ubyte MyByte; else alias byte MyByte; List!(foo, MyByte) myList; The main motivations for this are, (in decreasing order of importance): - ability to use runtime functions for static if and template parameters (where static if is important because of - early catching of errors (eg incorrect regexes) - efficiency (eg not having to compile the regex at runtime; also, having cheap reflection at compile time)This is exactly what I've been hoping someone would invent too. It seems like the next logical step in template metaprogramming. I think Walter's hip to the idea too. Soon after the last big discussion about metaprogramming stuff and making it more like a true compile-time programming language, "static if" appeared. I think that was something that Walter realized was "low-hanging fruit", which could quickly be implemented today and have a big impact now without waiting around till D 3.0. So, I think it's vaguely on the D roadmap somewhere, it's just a huge undertaking and will take time. --bb
Oct 17 2006
Don Clugston wrote:Imagine if there were some way to create an alias makeProgram(char [] pattern, int otherparameters) which became makeProgramAtCompileTime!(pattern)(otherparameters); if pattern were a compile time literal, but became makeProgramAtRunTime(pattern, otherparameters); if pattern were a variable. Then the user wouldn't even need to know that they were using templates.I've always thought that meta-heavy libraries like Blitz++ were a noble attempt at something the language didn't really support, but the ability to mix compile and runtime code using the same call syntax would completely change things.One possible syntax would be something like: real sqrt(template real x) { static if (is(x: const)) return sqrt_compiletime!(x); else return sqrt_runtime(x); }This is pretty clear and straightforward. Something along these lines would be a killer feature for 2.0. Sean
Oct 17 2006
Don Clugston wrote:The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp.Maybe it would be simpler, but similar effective, to add some persistence functionality to std.regexp.RegExp. This could mean to convert an instance of RegExp to a byte array, cache it in a file, later load the array back and recreate the instance. So the "compilation" of the RE to its opcodes is done only once and program initialization becomes faster. Michael
Oct 16 2006
Don Clugston wrote:BUT... The question is -- would this be worthwhile? I'm really not interested in making another toy. It's straightforward, but tedious, and would double the length of std.regexp. Would the use of templates be such a turn-off that people wouldn't use it? Do the benefits exceed the cost?I have an idea for a project that would benefit from it.
Oct 16 2006
Don Clugston wrote:However, there'd be no template code bloat, either. Existing code would be unchanged. You could even write: Regexp a = StaticRegExp!("ab?(ab*)+", "g"); (assign a pre-compiled regular expression to an existing phobos RegExp).IMHO, this would be a Killer App in template programming. And definitely one of the show-off gems in D.
Oct 17 2006
Don Clugston wrote:It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ? -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Oct 21 2006
Bruno Medeiros wrote:Don Clugston wrote:It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?
Oct 21 2006
Don Clugston wrote:Bruno Medeiros wrote:From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() ) -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#DDon Clugston wrote:It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?
Oct 26 2006
Bruno Medeiros wrote:Don Clugston wrote:The graph part is done via 'goto' opcodes. But the majority of the simple graph branches get converted into filters.Bruno Medeiros wrote:From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() )Don Clugston wrote:It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that (1) all errors would be caught at compile time, and (2) there'd be a minor speedup because the compilation step would not happen at runtime, and (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?
Oct 27 2006
Bruno Medeiros wrote:Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of?Generally speaking, regular expressions are compiled into a directed state graph. For example, a regex like this: a(b|c)d ...compiles into a state-graph like this: <mono-spaced> b a -< >- d c </mono-spaced> The regex engine is a state-machine-crawler that knows how to navigate from node to node in the graph, including rules about when it is allowed to backtrack, etc. So, the compiled representation of a regular expression isn't "opcodes" in the traditional sense of the word. It's just a very particular kind of state graph. --Benji
Oct 23 2006