www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - "spirit" like template parser generator in 100 Lines of code (almost)

reply BCS <BCS pathlink.com> writes:
(The almost is with regards to it working not the 100 LOC, if all blank 
lines and comments are removed it is exactly 100 LOC.)

It actual doesn't work because of several things. However, some of these 
are bugs (I think) and the rest I am putting forward as feature requests.

What I have actually done is make an almost working prototype of a 
template system that generates functions for a recursive decent parser. 
It is to be mixed into a class or struct. The parameter for the template 
is a string containing almost normal BNF reduction definitions.

It extends BNF by allowing "|" for "or" but doesn't implement optional 
or repeated terms.

It isn't implemented yet, but actions will be defined by providing 
specializations for the "Action" template.

Example:

<code>
struct P
{
  char[] data;
  int pos;

  PObject Action(char[]string:"ADD_ACT")(PObject l, PObject e, PObject r)
  {
   Numeric ln = cast(Numeric) l;
   Numeric rn = cast(Numeric) r;
   return new AddExp(ln, rn);
  }
  ... //some more actions

  PObject ParseRule(char[] string : "PLUS")()
  {
   if (data[pos] == '+')
   {
    pos++;
    return PObject.pass;
   }
   return PObject.fail;
  }
  ... // more terminals

// ADD_EXP is the name of the reduction
// MUL2ADD, ADD_ACT, SUB_ACT, CAT2ADD are action names
  mixin Parser!("ADD_EXP :
       MUL2ADD / MUL_EXP |
       ADD_ACT / ADD_EXP PLUS MUL_EXP |
       SUB_ACT / ADD_EXP MINUS MUL_EXP |
       CAT2ADD / CAT_EXP;"); // also has more non terminals.
}

vod main()
{
  P p;
  p.data = GetData();
  p.pos = 0;
  if(ret = cast(AddExp)p.ParseRule!("ADD_EXP")())
   //good
}
</code>

(The templates are at the bottom of the post)

Now for why it doesn't work.

===Bugs:

-- Mixins don't work quite right. --
this for instance doesn't work unless the second line in foo is removed:

template First(char[] string) { const char[] First = string[0..1] }
template Temp(char[] string)
{
  static if(string != "")
  {
   void TempFn(char[] s : First!(string)) {}
   mixin Temp!(string[1..$]);
  }
}
struct S
{
  mixin Temp!("ab");
  void Foo()
  {
   TempFn!("a")();
   TempFn!("b")();// this fails
  }
}

what seems to be happening is that only the first iteration of the 
template is actual mixed in. Use of pragma(msg,...) seems to show that 
template does recurse correctly.

I think this is a bug

-- The parameter on a foreach isn't const. --
Actual this doesn't stop anything from working, it just makes some 
things more difficult.

foreach(i;TupleMaker!())
  Template!(i)();

becomes:

alias TupleMaker!() tempTuple
foreach(j,_;tempTuple)
{
  const i = tempTuple[j];
  Template!()();
}

This is not a trivial complaint. While I was working on my BigNum 
template I noticed that the const variable adds ops to the function even 
with the -O flag.


===New Feature Needed To Make This Work.


-- Abstract Templates --

The idea here is that a template marked as abstract would have no 
generic implementation, only specializations. Furthermore the compiler 
would assume that any given specializations it is asked for exists and 
leave it up to the linker to find the correct implementation. this would 
allow cyclical calls by specializations of a template.

abstract template foo(char[] s)
{
  void foo(uint);
}

template foo(char[] s : "A")
{
  void foo(uint i)
  {  // nothing generated here
   if(i) foo!("B")(i-1);
  }
}

template foo(char[] s : "B") // foo!("B") generated here.
{
  void foo(uint i)
  {
   if(i>1) foo!("A")(i-2);
   if(i>1) foo!("C")(0); // will cause a link error
  }
}

-- "static foreach" at same level as "static if" --

two parts:
  make it "static forach"
  allow it the same places as "static if"

template foo(char[] string)
{
  static foreach(i;BreakBy!(string,','))
   mixin Other!(i);
}

this isn't strictly needed to make my template work, but it makes it a 
whole lot cleaner.


===================


Now for what you all have been waiting for!!!!!

origonal formatting at:
http://www.webpages.uidaho.edu/~shro8822/dspirit.d

<code>
/************
  Discard leading whitespace
*/
template DropWhite(char[] string)
{
  static if(string == "")
   alias string DropWhite;
  else static if(string[0] == ' ' || string[0] == '\t' || string[0] == 
'\r' || string[0] == '\n')
   alias DropWhite!(string[1..$]) DropWhite;
  else
   alias string DropWhite;
}

/************
  return a slice upto the first whitespace char
*/
template UseTillWhite(char[] string)
{
  static if(string == "" || string[0] == ' ' || string[0] == '\t' || 
string[0] == '\r' || string[0] == '\n')
   const char[] UseTillWhite = "";
  else
   const char[] UseTillWhite = string[0] ~ UseTillWhite!(string[1..$]);
}

/************
  return a slice starting with the first whitespace char
*/
template DiscardTillWhite(char[] string)
{
  static if(string == "" || string[0] == ' ' || string[0] == '\t' || 
string[0] == '\r' || string[0] == '\n')
   alias string DiscardTillWhite;
  else
   alias DiscardTillWhite!(string[1..$]) DiscardTillWhite;
}

/************
  Return a slice up-to but not including the first instance of t.
*/
template UseTill(char[] string, char t)
{
  static if(string == "" || string[0] == t)
   const char[] UseTill = "";
  else
   const char[] UseTill = string[0..1] ~ UseTill!(string[1..$],t);
}

/***********
  Return a slice starting after the first instance of t and containing 
the rest of the string.
*/
template DiscardTill(char[] string, char t)
{
  static if(string == "")
   const char[] DiscardTill = "";
  else static if(string[0] == t)
   const char[] DiscardTill = string[1..$];
  else
   const char[] DiscardTill = DiscardTill!(string[1..$],t);
}

/***********
  discard [ ]*
  then return [a-zA-Z_][a-zA-Z0-9_]*
*/
template GetID(char[] string) { alias GetID_A!(DropWhite!(string)) GetID; }
template GetID_A(char[] string)
{
  static if(string == "")
   const char[] GetID_A = "";
  else static if('_' == string[0] || ('a' <= string[0] && string[0] <= 
'z') || ('A' <= string[0] && string[0] <= 'Z'))
   const char[] GetID_A = string[0..1] ~ GetID_B!(string[1..$]);
  else
   const char[] GetID_A = "";
}
template GetID_B(char[] string)
{
  static if(string == "")
   const char[] GetID_B = "";
  else static if('_' == string[0] || ('a' <= string[0] && string[0] <= 
'z') || ('A' <= string[0] && string[0] <= 'Z') || ('0' <= string[0] && 
string[0] <= '9'))
   const char[] GetID_B = string[0..1] ~ GetID_B!(string[1..$]);
  else
   const char[] GetID_B = "";
}

/***********
  mixin temp with {V + 'string' broken up by 'c'}
*/
template BreakBy(char[] string, char d, V...)
{
  static if(string == "")
   alias V BreakBy;
  else static
   alias BreakBy!(DiscardTill!(string,d), d, V, UseTill!(string,d)) BreakBy;
}

/***********
  mixin temp with {V + 'string' broken up by 'c'}
*/
template SplitWhite(char[] string, V...)
{
  static if(DropWhite!(string).length == 0)
   alias V SplitWhite;
  else
   alias SplitWhite!(DropWhite!(DiscardTillWhite!(string)), V, 
UseTillWhite!(DropWhite!(string))) SplitWhite;
}

/****
PARSER  : RULE PARSER | ;
RULE  : ID.name ":" CASE ALT ";";
ALT  : "|" CASE ALT | ;
CASE  : ID.action "/" SEQUENCE;
SEQUENCE : ID SEQUENCE | ;
*/
template Parser(char[] string)
{
  static if(string != "") // test for PARSER.2
  {
    // get RULE
   private static const char[] rule = DropWhite!(UseTill!(string, ';'));

    // report
   pragma(msg, rule);

    // instance RULE
   bool ParseRule(char[] name : GetID!(rule))()
   {
     // record start location
    uint start = this.pos;
     // try caes
    alias BreakBy!(DropWhite!(DiscardTill!(rule,':')),'|') cases;
    caseLoop: foreach(ci,c;cases)
    {
     const char[] Case = cases[ci];
     pragma(msg, "caseLoop: "~Case);
      // return to start location
     this.pos = start;

      // allocate storage for returns
     alias SplitWhite!(DiscardTill!(Case,'/')) clauses;

      // try clauses
     foreach(index, cl;clauses)
     {
      const char[] clause = clauses[index];
      pragma(msg, "clauseLoop: "~clause);
       // parse item
      bool fail = ParseRule!(GetID!(clause))();
       // if failed try next case
      if(fail) continue caseLoop;
       // otherwise try next clause
     }
      // enact reduction when all clauses fit.
     return false;
    }

    return true;
   }
    // recur on remainder of PARSER
   mixin Parser!(DiscardTill!(string, ';'));
  }
}
</code>
Dec 18 2006
next sibling parent BCS <BCS pathlink.com> writes:
More work and some other thoughts.

BCS wrote:
[...]
 What I have actually done is make an almost working prototype of a 
 template system that generates functions for a recursive decent parser. 
 It is to be mixed into a class or struct. The parameter for the template 
 is a string containing almost normal BNF reduction definitions.
 
[...]

 It isn't implemented yet, but actions will be defined by providing 
 specializations for the "Action" template.
This is now done in one form, I'm not sure I quite like it though... [...]
 Now for why it doesn't work.
 
 ===Bugs:
 
 -- Mixins don't work quite right. --
 
 what seems to be happening is that only the first iteration of the 
 template is actual mixed in. Use of pragma(msg,...) seems to show that 
 template does recurse correctly.
 
 I think this is a bug
A bit more work and it seems that a mixin can't add new specializations to a template in the scope that is mixed into. template a(){template b(char : 'c'){uint i;}} struct S { template b(char : 'd'){uint i;} mixin a!(); // only b!('d') avalable // b!('c') masked by 'd' version }
[...]
 ===New Feature Needed To Make This Work.
 
 
 -- Abstract Templates --
 
 The idea here is that a template marked as abstract would have no 
 generic implementation, only specializations. Furthermore the compiler 
 would assume that any given specializations it is asked for exists and 
 leave it up to the linker to find the correct implementation. this would 
 allow cyclical calls by specializations of a template.
 
After a bit of playing around I'm not sure that this is strictly needed in this case. The order that templates are evaluated in may make it unneeded. I'd have to play around with it more to know for sure. However having the concept of an "abstract template" would remove this from the realm of "implementation specific details". source: http://www.webpages.uidaho.edu/~shro8822/dspirit.d exmaple: http://www.cs.uidaho.edu/~temp0092/test_t.d
Dec 18 2006
prev sibling next sibling parent reply Don Clugston <dac nospam.com.au> writes:
BCS wrote:
 -- The parameter on a foreach isn't const. --
 Actual this doesn't stop anything from working, it just makes some 
 things more difficult.
 
 foreach(i;TupleMaker!())
  Template!(i)();
 
 becomes:
 
 alias TupleMaker!() tempTuple
 foreach(j,_;tempTuple)
 {
  const i = tempTuple[j];
  Template!()();
 }
 
 This is not a trivial complaint. While I was working on my BigNum 
 template I noticed that the const variable adds ops to the function even 
 with the -O flag.
Another alternative workaround is this: ------- template countTo(int a, A...) { static if (a==0) alias A countTo; else static if (a&1) alias countTo!(a-1, void, A) countTo; else alias countTo!(a/2, A, A) countTo; } foreach(j,_; countTo!(tempTuple.length)) { const i = tempTuple[j]; } In fact, if you can calculate the value of 'i' based only on 'j', this can be a much more general solution, for example for your Bignum template. It is very quick, can cope with numbers as high as 20000 or so, and doesn't generate any runtime code (in fact, it can't, because there's nothing you can do with a tuple of 20000 voids, except count it). It works well for cases like: foreach(i,_; countTo!(s.length)) { static if (s[i]=='+') { ... } ... } allowing you to iterate over the contents of a string. There are some pretty cool things that work here -- you can add labels in one iteration, and generate goto statements (or asm jmp instructions) in a different iteration. I don't know how that works, but it's awesome because it lets you turn a compile-time string into a run-time state machine...
Dec 19 2006
parent BCS <BCS pathilink.com> writes:
Don Clugston wrote:
 Another alternative workaround is this:
 -------
 template countTo(int a, A...)
 {
     static if (a==0) alias A countTo;
     else static if (a&1) alias countTo!(a-1, void, A) countTo;
     else alias countTo!(a/2, A, A) countTo;
 }
 
 foreach(j,_; countTo!(tempTuple.length)) {
   const i = tempTuple[j];
 }
 In fact, if you can calculate the value of 'i' based only on 'j', this 
 can be a much more general solution, for example for your Bignum 
 template. It is very quick, can cope with numbers as high as 20000 or 
 so, and doesn't generate any runtime code (in fact, it can't, because 
 there's nothing you can do with a tuple of 20000 voids, except count it).
 It works well for cases like:
 
 foreach(i,_; countTo!(s.length)) {
   static if (s[i]=='+') { ... }
   ...
 }
 allowing you to iterate over the contents of a string.
 There are some pretty cool things that work here -- you can add labels 
 in one iteration, and generate goto statements (or asm jmp instructions) 
 in a different iteration. I don't know how that works, but it's awesome 
 because it lets you turn a compile-time string into a run-time state 
 machine...
That would be interesting to try. In fact the use of a "make dummy tuple list" meta function was the next thing I planed to do for BigNum. (BTW, my first hack at it soaked up my whole system with a~=5k) OTOH that still doesn't solve the "instance for each" case. For that you still need a local const. Anyway, neat stuff.
Dec 19 2006
prev sibling parent BCS <BCS pathilink.com> writes:
After a bit of consideration and puzzling, I think I have figured out 
what is going on with the mixins, templates and specializations. From 
this, I'm not sure if what is happening is a bug or not. However it 
definitely prevents my program from working and blocks off a whole set 
of functionality for D templates.

*What I'm trying to do:*

I'm trying to mixin several function templates into the same scope, all 
with the same name, but differently specialized. This would effectively 
allow for identifier generation from strings.

void Foo(char[] s: "hello")(){}
void Foo(char[] s: "world")(){}

void fn(char[] string)()
{
  Foo!(string)();	// go find a Foo to call
}

The usefulness of this comes in when the string that is used comes from 
template processing. A dumb example of this could convert a string to an 
expression evaluation (why someone would want to do that...)

Eval!("SET this FROM that USING something over there")();
goes to:
Val!("this")=Act!("that")(Val!("something"),Val!("over"),Val!("there"));

*What is Happening That Prevents This:*

Things are getting mixed in for the most part. However it seems 
specializations for a template must all reside in the same "mixin 
scope". Supposedly through the use of named mixins and aliases, 
overloading of functions using more than one mixin can work. However 
this doesn't seem to work when generating specializations. This seems to 
be the same problem as experienced by Bill Baxter, namely that function 
templates aren't functions, but templates with functions in them. This 
causes problems because some things that work for functions don't work 
for templates. Importantly, templates mixed in from different scope 
don't overload.

*What Would Be Needed To Allow This:*

Two things would be needed. Firstly, some means to make a mixed in 
template (a template inside of a template that is used as a mixin) 
overload with other templates at the scope it is mixed into. Without 
this, none of this will be of much use.
	Secondly, A problems involving order-of-instancing shows up with 
templates as it is. If two template specialization refer to each other, 
there is the possibility that one of them will try to instance the other 
before the specialization for the other is created. The result would be 
either a compile error or an attempt to instance the wrong template. On 
the other hand, if I'm not missing something, a compiler would also be 
correct to lazy evaluate specializations, in which case this problem may 
well not occur. Some way to ensure the desired behavior here would be 
needed.
	One solution would be to define an "abstract template" at the outermost 
scope. This template would have no implementation, but would specify 
what symbols are to be generated. Any use of this template would skip 
the argument deduction rules for templates and just assume that an exact 
specialization will exist. If one is not specified, then an error is 
generated later, probably at link time.

Template Mapper(char[] from, char[] to)
{
  template Map(char[] string: from)
  {
   const char[] Map = to;
  }
}

struct Foo
{
  abstract template Map(char[])
  {
   const char[] Map;
  }

  void fn()
  {
   writeln("%s", Map!("hello"));	// maps "hello" to "world"
   writeln("%s", Map!("good"));	// link error
  }

  mixin Mapper!("hello", "world");
}

[1] see mixin.html down at the bottom
[2] digitalmars.D.learn:"Q about template function overloads"
[3] see "Argument Deduction" in template.html
Dec 19 2006