www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - RegExps and templates

reply xs0 <xs0 xs0.com> writes:
Hello!

I was reading some of the RegExp posts and a thought occured to me - 
would it be possible to express regular expressions using 
meta-programming techniques? If so, it would be fairly easy to do 
compiler support for them - it would just rewrite the RE as a template 
instance, and optimization and whatnot would be inherited automatically.

To support RE also at run-time, it may be neat to have the templates 
work in two modes - either as static functions or as runtime objects, to 
allow runtime use.

It would also allow construction of REs by explicitly instancing 
templates, and the whole infrastructure could be expanded by 
implementing new templates that do matching, as long as they implemented 
some specified interface (this would have no compiler support, of course).

Now, the questions are:
- is this possible at all? I'm no RE expert, so I can't answer that..
- what would be the required interfaces?
- what would the templates look like?
- if everything else is worked out, is Walter willing to include this in 
the language?

Finally, for a partial example (with only matching):

interface REPart
{
// returns true if there is a match
// pos gets updated to where the match ends, or left alone if not
     bit isMatch(dchar[] text, inout int pos);
}

class MatchChar(dchar C) : REPart
{
// this is the static part:
     static bit isMatch(dchar[] text, inout int pos)
     {
         if (text[pos]==C) {
             ++pos;
             return 1;
         }
         return 0;
     }

// this is the dynamic part:
     dchar val;
     this(dchar val)
     {
         this.val=val;
     }

     bit isMatch(dchar[] text, inout int pos)
     {
         if (text[pos]==val) {
             ++pos;
             return 1;
         }
         return 0;
     }

}

class MatchCharRange(dchar A, dchar B) : REPart
{
     static bit isMatch(dchar[] text, inout int pos)
     {
         if (A<=text[pos] && text[pos]<=B) {
            ++pos;
            return 1;
         }
         return 0;
     }
// and the same for dynamic support
}

class MatchCount(int min, int max, SUB) : REPart
{
     static bit isMatch(dchar[] text, inout int pos)
     {
         int tmppos=pos;
         int count=0;
         while (SUB.isMatch(text, tmppos))
             count++;

         if (count>=min && count<=max) {
             pos=tmppos;
             return 1;
         }
         return 0;
     }
}

class MatchEither(A,B) : REPart
{
     static bit isMatch(dchar[] text, inout int pos)
     {
         int posA=pos;
         bit matchA=A.isMatch(text, posA);

         int posB=pos;
         bit matchB=B.isMatch(text, posB);

         if (matchA && matchB) { // match the longer one
             pos=posA>posB?posA:posB;
             return 1;
         }

         if (matchA || matchB) {
             pos=matchA?posA:posB;
             return 1;
         }

         return 0;
     }
}

class MatchTwo(A,B) : REPart
{
     static bit isMatch(dchar[] text, inout int pos)
     {
         int tmppos=pos;
         if (A.isMatch(text, tmppos) && B.isMatch(text, tmppos)) {
             pos=tmppos;
             return 1;
         }
         return 0;
     }
}


And now, something like

[0-9]+([.][0-9]+)?

would become

MatchTwo!(
     MatchCount!(1, int.max,
         MatchCharRange!('0', '9')
     ),
     MatchCount!(0, 1,
         MatchTwo!(
             MatchChar!('.'),
             MatchCount!(1, int.max,
                 MatchCharRange!('0', '9')
             )
         )
     )
).isMatch(..)



Thoughts?


xs0
Feb 27 2005
parent reply zwang <nehzgnaw gmail.com> writes:
Interesting.  But what's the benefit of using template-based RE?

Isn't "MatchTwo!(MatchCount!(1, int.max, MatchCharRange!('0', '9')),
MatchCount!(0, 1, MatchTwo!(MatchChar!('.'), MatchCount!(1, int.max,
MatchCharRange!('0', '9')))))" overly verbose?
What's wrong with the good old succinct form "[0-9]+([.][0-9]+)?"?



xs0 wrote:
 Hello!
 
 I was reading some of the RegExp posts and a thought occured to me - 
 would it be possible to express regular expressions using 
 meta-programming techniques? If so, it would be fairly easy to do 
 compiler support for them - it would just rewrite the RE as a template 
 instance, and optimization and whatnot would be inherited automatically.
 
 To support RE also at run-time, it may be neat to have the templates 
 work in two modes - either as static functions or as runtime objects, to 
 allow runtime use.
 
 It would also allow construction of REs by explicitly instancing 
 templates, and the whole infrastructure could be expanded by 
 implementing new templates that do matching, as long as they implemented 
 some specified interface (this would have no compiler support, of course).
 
 Now, the questions are:
 - is this possible at all? I'm no RE expert, so I can't answer that..
 - what would be the required interfaces?
 - what would the templates look like?
 - if everything else is worked out, is Walter willing to include this in 
 the language?
 
 Finally, for a partial example (with only matching):
 
 interface REPart
 {
 // returns true if there is a match
 // pos gets updated to where the match ends, or left alone if not
     bit isMatch(dchar[] text, inout int pos);
 }
 
 class MatchChar(dchar C) : REPart
 {
 // this is the static part:
     static bit isMatch(dchar[] text, inout int pos)
     {
         if (text[pos]==C) {
             ++pos;
             return 1;
         }
         return 0;
     }
 
 // this is the dynamic part:
     dchar val;
     this(dchar val)
     {
         this.val=val;
     }
 
     bit isMatch(dchar[] text, inout int pos)
     {
         if (text[pos]==val) {
             ++pos;
             return 1;
         }
         return 0;
     }
 
 }
 
 class MatchCharRange(dchar A, dchar B) : REPart
 {
     static bit isMatch(dchar[] text, inout int pos)
     {
         if (A<=text[pos] && text[pos]<=B) {
            ++pos;
            return 1;
         }
         return 0;
     }
 // and the same for dynamic support
 }
 
 class MatchCount(int min, int max, SUB) : REPart
 {
     static bit isMatch(dchar[] text, inout int pos)
     {
         int tmppos=pos;
         int count=0;
         while (SUB.isMatch(text, tmppos))
             count++;
 
         if (count>=min && count<=max) {
             pos=tmppos;
             return 1;
         }
         return 0;
     }
 }
 
 class MatchEither(A,B) : REPart
 {
     static bit isMatch(dchar[] text, inout int pos)
     {
         int posA=pos;
         bit matchA=A.isMatch(text, posA);
 
         int posB=pos;
         bit matchB=B.isMatch(text, posB);
 
         if (matchA && matchB) { // match the longer one
             pos=posA>posB?posA:posB;
             return 1;
         }
 
         if (matchA || matchB) {
             pos=matchA?posA:posB;
             return 1;
         }
 
         return 0;
     }
 }
 
 class MatchTwo(A,B) : REPart
 {
     static bit isMatch(dchar[] text, inout int pos)
     {
         int tmppos=pos;
         if (A.isMatch(text, tmppos) && B.isMatch(text, tmppos)) {
             pos=tmppos;
             return 1;
         }
         return 0;
     }
 }
 
 
 And now, something like
 
 [0-9]+([.][0-9]+)?
 
 would become
 
 MatchTwo!(
     MatchCount!(1, int.max,
         MatchCharRange!('0', '9')
     ),
     MatchCount!(0, 1,
         MatchTwo!(
             MatchChar!('.'),
             MatchCount!(1, int.max,
                 MatchCharRange!('0', '9')
             )
         )
     )
 ).isMatch(..)
 
 
 
 Thoughts?
 
 
 xs0

Feb 27 2005
parent reply xs0 <xs0 xs0.com> writes:
Well, the benefit is that it gets compiled to native code that handles 
exactly that RE. The idea is to make it simple to support in the 
compiler, otherwise it will never happen, I guess... And naturally, if 
it was compiler-supported, you would only write "[0-9]+([.][0-9]+)?" 
with a char or two indicating it was a RE :)


xs0

zwang wrote:
 Interesting.  But what's the benefit of using template-based RE?
 
 Isn't "MatchTwo!(MatchCount!(1, int.max, MatchCharRange!('0', '9')),
 MatchCount!(0, 1, MatchTwo!(MatchChar!('.'), MatchCount!(1, int.max,
 MatchCharRange!('0', '9')))))" overly verbose?
 What's wrong with the good old succinct form "[0-9]+([.][0-9]+)?"?

Feb 27 2005
parent reply Georg Wrede <georg.wrede nospam.org> writes:
Did you read the thread "What is a regular expression" (started feb 16)? 
How is this related to it?

I mean, is this in addition to that, or instead of it? Or something else?

xs0 wrote:
 Well, the benefit is that it gets compiled to native code that handles 
 exactly that RE. The idea is to make it simple to support in the 
 compiler, otherwise it will never happen, I guess... And naturally, if 
 it was compiler-supported, you would only write "[0-9]+([.][0-9]+)?" 
 with a char or two indicating it was a RE :)
 
 
 xs0
 
 zwang wrote:
 
 Interesting.  But what's the benefit of using template-based RE?

 Isn't "MatchTwo!(MatchCount!(1, int.max, MatchCharRange!('0', '9')),
 MatchCount!(0, 1, MatchTwo!(MatchChar!('.'), MatchCount!(1, int.max,
 MatchCharRange!('0', '9')))))" overly verbose?
 What's wrong with the good old succinct form "[0-9]+([.][0-9]+)?"?


Feb 27 2005
parent xs0 <xs0 xs0.com> writes:
Georg Wrede wrote:
 Did you read the thread "What is a regular expression" (started feb 16)? 
 How is this related to it?
 
 I mean, is this in addition to that, or instead of it? Or something else?

Yes I did, that's what got me started. In this thread I was trying to focus on implementation, rather than syntax or other topics. Unfortunately, it seems it was premature, as there are hardly any replies :) xs0
Feb 28 2005