digitalmars.D - Anyone interested in a Spirit for D?

Walter Bright (5/5) Oct 18 2006 Along the lines of Don's regexp template metaprograms, is anyone

Pragma (18/26) Oct 18 2006 Now there's an idea!

J Duncan (3/33) Oct 18 2006 Yeah Its a good idea, but my first thought was "is that even possible?"
Walter Bright (15/30) Oct 18 2006 I think the operator overloading aspect of Spirit is only a minor part

Bill Baxter (19/32) Oct 18 2006 Huh. Very interesting. Here's the example:

Richard Koch (4/46) Oct 18 2006 all that is cool, but (i know i am the dummy here) readability as in bnf...

Bill Baxter (15/27) Oct 18 2006 You mean this coco?

Paolo Invernizzi (8/11) Oct 19 2006 As a Spirit user, I can tell that readability is one of the most

Walter Bright (2/15) Oct 19 2006 You're probably right.

Bill Baxter (46/50) Oct 18 2006 Now that would be useful I think.

Walter Bright (7/25) Oct 18 2006 Yes, it would be. But there's a catastrophic problem with it. Spirit

Bill Baxter (20/40) Oct 18 2006 But maybe you could allow the user to access those terminals via strings...
BCS (57/73) Oct 18 2006 How about delegate literals for code snipits??
Kristian (17/28) Oct 19 2006 =

Walter Bright (1/1) Oct 19 2006 Not a bad idea.
Jacques (45/57) Oct 19 2006 I actually did something similar to this when creating a generic PEG

BCS (61/69) Oct 19 2006 it's conceivable that in D the above parser could just be created with:
Karen Lanrap (7/10) Oct 19 2006 I do not see why transferring a tool into a library should reduce any

Pragma (38/50) Oct 18 2006 From what I gather, that's the major benefit, other than a

Bill Baxter (38/92) Oct 18 2006 Yes! Sounds like we're thinking along the same lines here. But if

pragma (15/121) Oct 18 2006 Yes and no. The parser generator has a good deal of flexibility

Walter Bright <newshound digitalmars.com> writes:

Along the lines of Don's regexp template metaprograms, is anyone 
interested in a Spirit-like parser generator capability in D?

http://spirit.sourceforge.net/



http://www.codeproject.com/useritems/spart.asp

Oct 18 2006

Pragma <ericanderton yahoo.removeme.com> writes:

Walter Bright wrote:
 Along the lines of Don's regexp template metaprograms, is anyone 
 interested in a Spirit-like parser generator capability in D?
 
 http://spirit.sourceforge.net/
 

 
 http://www.codeproject.com/useritems/spart.asp

Now there's an idea!

Words of caution to follow:

FWIW, I looked into doing this years ago, and didn't get to far.  The 
biggest hurdle, aside from the limitations of templates at the time, was 
a lack of unary operators to override.  In particular, not being able to 
override unary '*' and '!' caused some cosmetic problems.

The only other major hangup I had was not having IFTI so I could 
instantiate templates transparently.  This feature alone could close the 
gap on most of Spirit's useage of C++ templates.  At a minimum, it means 
that a D programmer could get very close to the cosmetic appeal of 
Spirit (operator problems aside).

Don't get me wrong: I'm not a nay-sayer here.  I think this is very 
doable and worthwhile suggestion by Walter.  Folks should take it 
seriously. But it will require some design compromises and changes from 
the original - IMO, it'll probably require more of a re-write than a port.

-- 
- EricAnderton at yahoo

Oct 18 2006

J Duncan <jtd514 nospam.ameritech.net> writes:

Pragma wrote:
 Walter Bright wrote:
 Along the lines of Don's regexp template metaprograms, is anyone 
 interested in a Spirit-like parser generator capability in D?

 http://spirit.sourceforge.net/



 http://www.codeproject.com/useritems/spart.asp

 
 Now there's an idea!
 
 Words of caution to follow:
 
 FWIW, I looked into doing this years ago, and didn't get to far.  The 
 biggest hurdle, aside from the limitations of templates at the time, was 
 a lack of unary operators to override.  In particular, not being able to 
 override unary '*' and '!' caused some cosmetic problems.
 
 The only other major hangup I had was not having IFTI so I could 
 instantiate templates transparently.  This feature alone could close the 
 gap on most of Spirit's useage of C++ templates.  At a minimum, it means 
 that a D programmer could get very close to the cosmetic appeal of 
 Spirit (operator problems aside).
 
 Don't get me wrong: I'm not a nay-sayer here.  I think this is very 
 doable and worthwhile suggestion by Walter.  Folks should take it 
 seriously. But it will require some design compromises and changes from 
 the original - IMO, it'll probably require more of a re-write than a port.
 

Yeah Its a good idea, but my first thought was "is that even possible?" 
It wont be spirit, but a lexer in the uh spirit of spirit :)

Oct 18 2006

Walter Bright <newshound digitalmars.com> writes:

Pragma wrote:
 Words of caution to follow:
 
 FWIW, I looked into doing this years ago, and didn't get to far.  The 
 biggest hurdle, aside from the limitations of templates at the time, was 
 a lack of unary operators to override.  In particular, not being able to 
 override unary '*' and '!' caused some cosmetic problems.

I think the operator overloading aspect of Spirit is only a minor part 
of the implementation - in fact, just a pretty shell around it. It could 
all be done using functional notation.

 The only other major hangup I had was not having IFTI so I could 
 instantiate templates transparently.  This feature alone could close the 
 gap on most of Spirit's useage of C++ templates.  At a minimum, it means 
 that a D programmer could get very close to the cosmetic appeal of 
 Spirit (operator problems aside).


use operator overloading or even templates.


 Don't get me wrong: I'm not a nay-sayer here.  I think this is very 
 doable and worthwhile suggestion by Walter.  Folks should take it 
 seriously. But it will require some design compromises and changes from 
 the original - IMO, it'll probably require more of a re-write than a port.

I think it would be a complete rewrite.

The reason I'm interested in it for D is that:

1) it's a pretty cool library
2) it's one of Boost's most popular ones
3) it's been touted as a reason why D is no good and C++ roolz
4) it's popular enough to have been a driving force behind improvements 
in C++ compilers
5) it would surely improve D
6) and last, and most importantly, it's very useful

Oct 18 2006

Bill Baxter <wbaxter gmail.com> writes:

Walter Bright wrote:

 use operator overloading or even templates.

Huh.  Very interesting.  Here's the example:

// spirit:
num_p >> *( ch_p(',') >> num_p)


Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

Though it's definitely not as easy to read, I think I might actually 

use of operator-overloading is that it can be a real pain to discover 
things because they don't have real names.



   Seq( Digit, Start( Seq(Ch(','), Digit)))

That doesn't look too bad to me.

Still it would rock the world if you could just do:
   parser("digit (',' digit)*");
and have the grammar be verified at compile-time.

 I think it would be a complete rewrite.
 
 The reason I'm interested in it for D is that:
 
 1) it's a pretty cool library
 2) it's one of Boost's most popular ones
 3) it's been touted as a reason why D is no good and C++ roolz
 4) it's popular enough to have been a driving force behind improvements 
 in C++ compilers
 5) it would surely improve D
 6) and last, and most importantly, it's very useful

Excellent reasons.

--bb

Oct 18 2006

Richard Koch <dr.richard.koch t-online.de> writes:

Bill Baxter wrote:
 Walter Bright wrote:

 doesn't use operator overloading or even templates.

 
 Huh.  Very interesting.  Here's the example:
 
 // spirit:
 num_p >> *( ch_p(',') >> num_p)
 

 Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))
 
 Though it's definitely not as easy to read, I think I might actually 

 use of operator-overloading is that it can be a real pain to discover 
 things because they don't have real names.
 


   Seq( Digit, Start( Seq(Ch(','), Digit)))
 
 That doesn't look too bad to me.
 
 Still it would rock the world if you could just do:
   parser("digit (',' digit)*");
 and have the grammar be verified at compile-time.
 
 I think it would be a complete rewrite.

 The reason I'm interested in it for D is that:

 1) it's a pretty cool library
 2) it's one of Boost's most popular ones
 3) it's been touted as a reason why D is no good and C++ roolz
 4) it's popular enough to have been a driving force behind 
 improvements in C++ compilers
 5) it would surely improve D
 6) and last, and most importantly, it's very useful

 
 Excellent reasons.
 
 --bb

all that is cool, but (i know i am the dummy here) readability as in bnf 
is something that eludes me. better to go for coco?

richard

Oct 18 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Richard Koch wrote:
 Bill Baxter wrote:
 Walter Bright wrote:


 // spirit:
 num_p >> *( ch_p(',') >> num_p)


 Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

 all that is cool, but (i know i am the dummy here) readability as in bnf 
 is something that eludes me. better to go for coco?
 

You mean this coco?
http://www.ssw.uni-linz.ac.at/Coco/

Not sure what you mean by coco being more readable than BNF.  Coco's 
grammar looks pretty much like BNF to me.

----- from Taste.atg -----
VarDecl                  (. wchar_t* name; int type; .)
= Type<type>
   Ident<name>            (. tab->NewObj(name, var, type); .)
   { ',' Ident<name>      (. tab->NewObj(name, var, type); .)
   } ';'.
--------------------------

As far as I can tell that's just good old EBNF with some notations.
    VarDecl ::= Type Ident ("," Ident)* ";"

--bb

Oct 18 2006

Paolo Invernizzi <arathorn NOSPAM_fastwebnet.it> writes:

Walter Bright wrote:


 use operator overloading or even templates.

 3) it's been touted as a reason why D is no good and C++ roolz

As a Spirit user, I can tell that readability is one of the most 
important factor. So, IMHO, if you want to target your 'point 3', I 
think you must have at least the same amount of operator overloading you 
can have in C++.

Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P

---
Paolo Invernizzi

Oct 19 2006

Walter Bright <newshound digitalmars.com> writes:

Paolo Invernizzi wrote:
 Walter Bright wrote:
 

 doesn't use operator overloading or even templates.

 
 3) it's been touted as a reason why D is no good and C++ roolz

 
 As a Spirit user, I can tell that readability is one of the most 
 important factor. So, IMHO, if you want to target your 'point 3', I 
 think you must have at least the same amount of operator overloading you 
 can have in C++.
 
 Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P

You're probably right.

Oct 19 2006

Bill Baxter <wbaxter gmail.com> writes:

Walter Bright wrote:
 Along the lines of Don's regexp template metaprograms, is anyone 
 interested in a Spirit-like parser generator capability in D?
 
 http://spirit.sourceforge.net/

Now that would be useful I think.

Take this example from the Spirit intro of code to make a parser for a 
list of real numbers:
    r = real_p >> *(ch_p(',') >> real_p);

In EBNF that's just:
    real_number ("," real_number)*

In C++ you have to get creative with the operator overloading there 
(prefix '*' used to denote the regexp Kleene star, '>>' used to separate 
tokens)

But given Don's experiments with compile-time text parsing in D, it's 
conceivable that in D the above parser could just be created with:

    r = make_parser("real_number (',' real_number)*");

I.e. use the EBNF version directly in a string literal that gets parsed 
at compile time.
That would be pretty cool.

Though, you know, even thinking about Boost::Spirit, I have to wonder if 
it really is necessary.  From the intro it says that it's primary use is 
"extremely small micro-parsers", not a full blown language processor. 
But if that's the target then the runtime overhead of translating the 
EBNF description to a parser would be pretty trivial.  So I guess the 
real benefit of a compile-time parser-generator is that your grammar can 
be _verified_ at compile-time.

I wonder if it would be any easier to make a compile-time grammar 
verifier than a full blown parser generator?   Then just do the 
parser-generating at runtime.


---
heh heh, this is fun.  From one of the code examples:

   typedef
    alternative<alternative<space_parser, sequence<sequence<
    strlit<const char*>, kleene_star<difference<anychar_parser,
    chlit<char> > > >, chlit<char> > >, sequence<sequence<
    strlit<const char*>, kleene_star<difference<anychar_parser,
    strlit<const char*> > > >, strlit<const char*> > >
    skip_t;

    skip_t skip;

That monster type signature was determined by deliberately forcing a 
compiler error and then copy-pasting the type from the resulting error 
message.  Too funny.  (Note that this as given not as the main way to 
use the library but as a way to eliminate some of the code bloat all the 
templates lead to -- another reason to not try to generate the parser at 
compile-time, but just verify it.)

At any rate the Spirit documentation seems to be rife with juicy 
comments of the form "yes it looks funky, but we're stuck with C++ 
here".  So it's a good place to get ideas for how to make things better.

--bb

Oct 18 2006

Walter Bright <newshound digitalmars.com> writes:

Bill Baxter wrote:
 But given Don's experiments with compile-time text parsing in D, it's 
 conceivable that in D the above parser could just be created with:
 
    r = make_parser("real_number (',' real_number)*");
 
 I.e. use the EBNF version directly in a string literal that gets parsed 
 at compile time.
 That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit 
enables code snippets to be attached to terminals by overloading the [] 
operator. If the EBNF was all in a string literal, this would be impossible.

 Though, you know, even thinking about Boost::Spirit, I have to wonder if 
 it really is necessary.  From the intro it says that it's primary use is 
 "extremely small micro-parsers", not a full blown language processor. 
 But if that's the target then the runtime overhead of translating the 
 EBNF description to a parser would be pretty trivial.  So I guess the 
 real benefit of a compile-time parser-generator is that your grammar can 
 be _verified_ at compile-time.

I disagree. I think the real benefit is avoiding reliance on an add-on 
tool. Such tools are a nuisance; making archival, maintenance, etc., clumsy.

 At any rate the Spirit documentation seems to be rife with juicy 
 comments of the form "yes it looks funky, but we're stuck with C++ 
 here".  So it's a good place to get ideas for how to make things better.

Yup.

Oct 18 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Walter Bright wrote:
 Bill Baxter wrote:
 But given Don's experiments with compile-time text parsing in D, it's 
 conceivable that in D the above parser could just be created with:

    r = make_parser("real_number (',' real_number)*");

 I.e. use the EBNF version directly in a string literal that gets 
 parsed at compile time.
 That would be pretty cool.

 
 Yes, it would be. But there's a catastrophic problem with it. Spirit 
 enables code snippets to be attached to terminals by overloading the [] 
 operator. If the EBNF was all in a string literal, this would be 
 impossible.

But maybe you could allow the user to access those terminals via strings:
    r.lookup_terminal("real_number").add_action(&func);
or just
    r.add_action("real_number", &func);


 So I guess the real benefit of a compile-time parser-generator is that 
 your grammar can be _verified_ at compile-time.

 
 I disagree. I think the real benefit is avoiding reliance on an add-on 
 tool. Such tools are a nuisance; making archival, maintenance, etc., 
 clumsy.

Hmm.  Well if no external tools is the main benefit, then simply making 
Lex/Yacc (or more apropriately, Enki) into a library should be 
sufficient.  I guess you do need some way to attach code to terminals at 
runtime, but that's doable via various existing callback mechanisms. 
The machinery needed is basically the same as signals/slots.  You just 
need to be able to do something like
         connect(ASTreeNode.accept(), mycode);
  at runtime.

Then you should be able to get this kind of thing to work:

    auto r = make_parser_node("real_number (',' real_number)*");
    r.add_action("real_number", &func);

using nothing but runtime parsing of the grammar to build your AST.  No 
fancy templates needed, except perhaps in adding the callback to &func.

That kind of thing could be done in C++ too.

--bb

Oct 18 2006

BCS <BCS pathlink.com> writes:

Walter Bright wrote:
 Bill Baxter wrote:
 
 But given Don's experiments with compile-time text parsing in D, it's 
 conceivable that in D the above parser could just be created with:

    r = make_parser("real_number (',' real_number)*");

 I.e. use the EBNF version directly in a string literal that gets 
 parsed at compile time.
 That would be pretty cool.

 
 
 Yes, it would be. But there's a catastrophic problem with it. Spirit 
 enables code snippets to be attached to terminals by overloading the [] 
 operator. If the EBNF was all in a string literal, this would be 
 impossible.


How about delegate literals for code snipits??


template parse(char[] rulename, char[] rule, T, T delegate(T[]) act)
{
	// mixin allows this to be used in the scope
	// parse!(rulename)(out T);

	// for rule ==
	// "AssignExpression | AssignExpression ',' Expression",
	// parse!(rulename) expand to something like this

		// mixin a specialization
	bool parse!(rulename)(out T ret)
	{
		T[2] set
		if(T* ret = rule!("AssignExpression")(set[0]))
		{
			ret = act[0](set);
			return true;
		}
		if(rule!("AssignExpression")(set[0]) &&
			rule!("Expression")(set[1]))
		{
			ret = act[1](set);
			return true;
		}
		false;
	}
	
}

///used something like this

class Parser : RootParser
{
	mixin parse("Expression",
			"
			AssignExpression |
			AssignExpression ',' Expression
			",
		Expr,
		[
		(Expr[] e){return e[0];},
		(Expr[] e){return new Expr(e[0], e[1]);}
		]
		);

	mixin parse("AssignExpression",
			"
			ConditionalExpression |
			ConditionalExpression '=' AssignExpression |
			ConditionalExpression '+=' AssignExpression
			"
		Expr,
		[
		(Expr[] e){return e[0];},
		(Expr[] e){return new AssignExper(e[0], e[1]);},
		(Expr[] e){return new AssignExper(e[0], e[1]);}
		]
		);

}

//// I've never used mixins so I most likely have something wrong in there

Oct 18 2006

Kristian <kjkilpi gmail.com> writes:

On Thu, 19 Oct 2006 00:12:41 +0300, Walter Bright  =

<newshound digitalmars.com> wrote:

 Bill Baxter wrote:
 But given Don's experiments with compile-time text parsing in D, it's=


  =

 conceivable that in D the above parser could just be created with:
     r =3D make_parser("real_number (',' real_number)*");
  I.e. use the EBNF version directly in a string literal that gets  =


 parsed at compile time.
 That would be pretty cool.

 Yes, it would be. But there's a catastrophic problem with it. Spirit  =

 enables code snippets to be attached to terminals by overloading the [=

]  =

 operator. If the EBNF was all in a string literal, this would be  =

 impossible.

Well, couldn't one use arrays to define actions? For example:

     r =3D make_parser("real_number[0] (',' real_number[1])*");

     array[] =3D {&firstAction,   //[0]
                &generalAction  //[1]
                };
     r.attach(array);


You could also give string ids for actions:

     r =3D make_parser("real_number[myaction] (',' real_number[real])*")=
;

     array[] =3D {"myaction", &firstAction,
                "real", &generalAction
                };
     r.attach(array);

Oct 19 2006

Walter Bright <newshound digitalmars.com> writes:

Not a bad idea.

Oct 19 2006

Jacques <jpdt telkomsa.net> writes:

Kristian wrote:
 Well, couldn't one use arrays to define actions? For example:

     r = make_parser("real_number[0] (',' real_number[1])*");

     array[] = {&firstAction,   //[0]
                &generalAction  //[1]
                };
     r.attach(array);


 You could also give string ids for actions:

     r = make_parser("real_number[myaction] (',' real_number[real])*");

     array[] = {"myaction", &firstAction,
                "real", &generalAction
                };
     r.attach(array);

I actually did something similar to this when creating a generic PEG 
parser. (http://en.wikipedia.org/wiki/Parsing_Expression_Grammar)
I opted to use the square brackets to denote semantic groups. Each 
semantic group has an associated delegate, which performs an action
on the nodes in the group and returns a replacement node for the parse 
tree.

The following is part of my calculator test that parses and evaluates an 
expression.

  auto c = new RuleSet();

  // Number.create creates a number object from a string.
  c.addRegex("Number", 	
     "\s*(([\+\-])*([0-9]+)(\.([0-9]*))?(([eE])([\+\-]*)([0-9]+))?)\s*",
     &Number.create);
  c.addRegex("Identifier", r"\s*([a-zA-Z_]\w*)\s*");


  c.addRule("Statement", "Assignment / Expression");
  c.addRule("Assignment", r"[Identifier '=' Expression]",
       // assign value to variable
       delegate Node(Node[] n) {
	 // All this casting feels a bit clunky!
          char[] s = (cast(RegexNode)n[0]).match[1];
          vars[s] = (cast(Number)n[2]).eval();
          return new Number(vars[s]);
       });

  c.addRule("Expression", "[Term (('+' / '-') Term)*]",
       // perform addition and subtraction
       delegate Node(Node[] n) {
	   ...
       });

  c.addRule("Term", "[Factor (('*' / '/') Factor)*]",
       // perform multiplication and division
       delegate Node(Node[] n) {
	 ...
       });


  c.addRule("Factor", "Number / [Identifier] / ['(' Expression ')']",
       // Get value of identifier
       delegate Node(Node[] n) {
        ...
       },
       // 2 is a shortcut for returning 2nd node in semantic group.
       // The expression itself in this case.
       2);

  auto p = new Parser(c);


Its all runtime as my templating abilities arent up to the task.  A 
compile time version will be fun though!

Oct 19 2006

BCS <BCS pathlink.com> writes:

Sorry about that garbald post, lets try that again:

Walter Bright wrote:

 Bill Baxter wrote:

 But given Don's experiments with compile-time text parsing in D, 


it's conceivable that in D the above parser could just be created with:
    r = make_parser("real_number (',' real_number)*");

 I.e. use the EBNF version directly in a string literal that gets 


parsed at compile time.
 That would be pretty cool.



 Yes, it would be. But there's a catastrophic problem with it. Spirit 

enables code snippets to be attached to terminals by overloading the [] 
operator. If the EBNF was all in a string literal, this would be impossible.



How about delegate literals for code snipits??


template parse(char[] rulename, char[] rule, T, T delegate(T[]) act)
{
     // mixin allows this to be used in the scope
     // parse!(rulename)(out T);

     // for rule ==
     // "AssignExpression | AssignExpression ',' Expression",
     // parse!(rulename) expand to something like this

         // mixin a specialization
     bool parse!(rulename)(out T ret)
     {
         T[2] set
         if(T* ret = rule!("AssignExpression")(set[0]))
         {
             ret = act[0](set);
             return true;
         }
         if(rule!("AssignExpression")(set[0]) &&
             rule!("Expression")(set[1]))
         {
             ret = act[1](set);
             return true;
         }
         false;
     }

}

///used something like this

class Parser : RootParser
{
     mixin parse("Expression",
             "
             AssignExpression |
             AssignExpression ',' Expression
             ",
         Expr,
         [
         (Expr[] e){return e[0];},
         (Expr[] e){return new Expr(e[0], e[1]);}
         ]
         );

     mixin parse("AssignExpression",
             "
             ConditionalExpression |
             ConditionalExpression '=' AssignExpression |
             ConditionalExpression '+=' AssignExpression
             "
         Expr,
         [
         (Expr[] e){return e[0];},
         (Expr[] e){return new AssignExper(e[0], e[1]);},
         (Expr[] e){return new AssignExper(e[0], e[1]);}
         ]
         );

}

//// I've never used mixins so I most likely have something wrong in there

Oct 19 2006

Karen Lanrap <karen digitaldaemon.com> writes:

Walter Bright wrote:

 I disagree. I think the real benefit is avoiding reliance on an
 add-on tool. Such tools are a nuisance; making archival,
 maintenance, etc., clumsy. 

I do not see why transferring a tool into a library should reduce any 
clumsiness.

The process of developing compilers is still more a piece of art than  
a piece of engineering and who can propose that the D community is 
able to establish such an engeneering principle by the lonely fact 
that D has templates and mixins?

Oct 19 2006

Pragma <ericanderton yahoo.removeme.com> writes:

Bill Baxter wrote:
 [snip]

 Though, you know, even thinking about Boost::Spirit, I have to wonder if 
 it really is necessary.  From the intro it says that it's primary use is 
 "extremely small micro-parsers", not a full blown language processor. 
 But if that's the target then the runtime overhead of translating the 
 EBNF description to a parser would be pretty trivial.  So I guess the 
 real benefit of a compile-time parser-generator is that your grammar can 
 be _verified_ at compile-time.

 From what I gather, that's the major benefit, other than a 
"self-documenting design".  All the "prettyness" of using a near EBNF 
syntax in C++ code gets you close enough to actual EBNF that it's 
apparent what and how it functions.

However, the only problem with composing this as an EBNF compile-time 
parser, is that you can't attach actions to arbitrary terminals without 
some sort of binding lookup.  I'm not saying it's impossible, but it'll 
be a little odd to use until we get some stronger reflection support.

But what you're suggesting could just as easily be a Compile-Time 
rendition of Enki. It's quite possible to pull off.  Especially if you 
digest the grammar one production at a time as to side-step any 
recursion depth limitations when processing the parser templates. :)

auto grammar = new Parser(
   Production!("Number ::= NumberPart {NumberPart}",
     // binding attached to production ('all' is supplied by default?)
     void function(char[] all){
       writefln("Parsed Number: %s",all);
     }
   ),
   Production!("NumberPart ::= Sep | Digit "),
   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
   Production!("Sep ::= '_' | ','")
);

// call specifying start production
grammar.parse("Number",myInput);

Depending on how you'd like the call bindings to go, you could probably 
go about as complex as what Enki lets you get away with.  But you'll 
have to accept a 'soft' binding in there someplace, hence you loose the 
type/name checking benefits of being at compile time.

 
 I wonder if it would be any easier to make a compile-time grammar 
 verifier than a full blown parser generator?   Then just do the 
 parser-generating at runtime.

Maybe I don't fully understand, but I don't think there's a gain there. 
  If you've already gone through the gyrations of parsing the BNF 
expression, it's hardly any extra trouble to do something at each step 
of the resulting parse tree*.

(* of course template-based parsers use the call-tree as a parse-tree 
but that's besides the point)

-- 
- EricAnderton at yahoo

Oct 18 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Pragma wrote:
 Bill Baxter wrote:
 [snip]

  >
 Though, you know, even thinking about Boost::Spirit, I have to wonder 
 if it really is necessary.  From the intro it says that it's primary 
 use is "extremely small micro-parsers", not a full blown language 
 processor. But if that's the target then the runtime overhead of 
 translating the EBNF description to a parser would be pretty trivial.  
 So I guess the real benefit of a compile-time parser-generator is that 
 your grammar can be _verified_ at compile-time.

 
  From what I gather, that's the major benefit, other than a 
 "self-documenting design".  All the "prettyness" of using a near EBNF 
 syntax in C++ code gets you close enough to actual EBNF that it's 
 apparent what and how it functions.
 
 However, the only problem with composing this as an EBNF compile-time 
 parser, is that you can't attach actions to arbitrary terminals without 
 some sort of binding lookup.  I'm not saying it's impossible, but it'll 
 be a little odd to use until we get some stronger reflection support.
 
 But what you're suggesting could just as easily be a Compile-Time 
 rendition of Enki. It's quite possible to pull off.  Especially if you 
 digest the grammar one production at a time as to side-step any 
 recursion depth limitations when processing the parser templates. :)

Yes!  Sounds like we're thinking along the same lines here.  But if 
Walter's right, that the compile-time verification is not a big deal, 
then it would be even simpler.

Actually it sounds very similar to the way writing shader code for 
OpenGL/Direct3D works.  You have to compile the code it to use it, but 
conveniently compilation is so fast that you can do it at run-time 
easily.  Or if you prefer, you can still precompile it.  What I like to 
do is set up my IDE to go ahead and precompile my shaders just so I can 
check for errors at compile time, but then I use the runtime compilation 
in the end anyway because that makes some things easier -- like 
modifying the code on the fly.

It actually works pretty well I think.  The only difference between 
shader code and grammar code is that shader code doesn't need to make 
any callbacks.  But callbacks aren't hard.

 auto grammar = new Parser(
   Production!("Number ::= NumberPart {NumberPart}",
     // binding attached to production ('all' is supplied by default?)
     void function(char[] all){
       writefln("Parsed Number: %s",all);
     }
   ),
   Production!("NumberPart ::= Sep | Digit "),
   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
   Production!("Sep ::= '_' | ','")
 );
 
 // call specifying start production
 grammar.parse("Number",myInput);

That's one way to do it, but I think you could also allow bindings to be 
attached after the fact:

  auto grammar = new Parser(
      "Number ::= NumberPart {NumberPart}
       NumberPart ::= Sep | Digit
       Digit ::= 0|1|2|3|4|5|6|7|8|9
       Sep ::= '_' | ','");
    );

  grammer.attach("Number",
      // binding attached to production ('all' is supplied by default?)
      void function(char[] all){
        writefln("Parsed Number: %s",all);
      })

This is _exactly_ how parameter binding works in shader code.  Just here 
the value we're binding is a function pointer instead of a texture 
coordinate or something.

 Depending on how you'd like the call bindings to go, you could probably 
 go about as complex as what Enki lets you get away with.  But you'll 
 have to accept a 'soft' binding in there someplace, hence you loose the 
 type/name checking benefits of being at compile time.

I'll have to take your word for it.  You mean in Enki you can say that 
Number has to output something convertible to 'real'?

 I wonder if it would be any easier to make a compile-time grammar 
 verifier than a full blown parser generator?   Then just do the 
 parser-generating at runtime.

 
 Maybe I don't fully understand, but I don't think there's a gain there. 
  If you've already gone through the gyrations of parsing the BNF 
 expression, it's hardly any extra trouble to do something at each step 
 of the resulting parse tree*.
 
 (* of course template-based parsers use the call-tree as a parse-tree 
 but that's besides the point)

Yeh, I was just talking crap.  I thought maybe you might be able to save 
some bookkeeping if all you cared about was that the grammar made a 
valid tree, but didn't care about it's output.  But probably it's the 
other way around.  Checking validity is the hard part, not making a tree.

--bb

Oct 18 2006

pragma <ericanderton yahoo.com> writes:

Bill Baxter wrote:
 Pragma wrote:
 Bill Baxter wrote:
 [snip]

  >
 Though, you know, even thinking about Boost::Spirit, I have to wonder 
 if it really is necessary.  From the intro it says that it's primary 
 use is "extremely small micro-parsers", not a full blown language 
 processor. But if that's the target then the runtime overhead of 
 translating the EBNF description to a parser would be pretty 
 trivial.  So I guess the real benefit of a compile-time 
 parser-generator is that your grammar can be _verified_ at compile-time.

  From what I gather, that's the major benefit, other than a 
 "self-documenting design".  All the "prettyness" of using a near EBNF 
 syntax in C++ code gets you close enough to actual EBNF that it's 
 apparent what and how it functions.

 However, the only problem with composing this as an EBNF compile-time 
 parser, is that you can't attach actions to arbitrary terminals 
 without some sort of binding lookup.  I'm not saying it's impossible, 
 but it'll be a little odd to use until we get some stronger reflection 
 support.

 But what you're suggesting could just as easily be a Compile-Time 
 rendition of Enki. It's quite possible to pull off.  Especially if you 
 digest the grammar one production at a time as to side-step any 
 recursion depth limitations when processing the parser templates. :)

 
 Yes!  Sounds like we're thinking along the same lines here.  But if 
 Walter's right, that the compile-time verification is not a big deal, 
 then it would be even simpler.
 
 Actually it sounds very similar to the way writing shader code for 
 OpenGL/Direct3D works.  You have to compile the code it to use it, but 
 conveniently compilation is so fast that you can do it at run-time 
 easily.  Or if you prefer, you can still precompile it.  What I like to 
 do is set up my IDE to go ahead and precompile my shaders just so I can 
 check for errors at compile time, but then I use the runtime compilation 
 in the end anyway because that makes some things easier -- like 
 modifying the code on the fly.
 
 It actually works pretty well I think.  The only difference between 
 shader code and grammar code is that shader code doesn't need to make 
 any callbacks.  But callbacks aren't hard.
 
 auto grammar = new Parser(
   Production!("Number ::= NumberPart {NumberPart}",
     // binding attached to production ('all' is supplied by default?)
     void function(char[] all){
       writefln("Parsed Number: %s",all);
     }
   ),
   Production!("NumberPart ::= Sep | Digit "),
   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
   Production!("Sep ::= '_' | ','")
 );

 // call specifying start production
 grammar.parse("Number",myInput);

 
 That's one way to do it, but I think you could also allow bindings to be 
 attached after the fact:
 
  auto grammar = new Parser(
      "Number ::= NumberPart {NumberPart}
       NumberPart ::= Sep | Digit
       Digit ::= 0|1|2|3|4|5|6|7|8|9
       Sep ::= '_' | ','");
    );
 
  grammer.attach("Number",
      // binding attached to production ('all' is supplied by default?)
      void function(char[] all){
        writefln("Parsed Number: %s",all);
      })
 
 This is _exactly_ how parameter binding works in shader code.  Just here 
 the value we're binding is a function pointer instead of a texture 
 coordinate or something.
 
 Depending on how you'd like the call bindings to go, you could 
 probably go about as complex as what Enki lets you get away with.  But 
 you'll have to accept a 'soft' binding in there someplace, hence you 
 loose the type/name checking benefits of being at compile time.

 
 I'll have to take your word for it.  You mean in Enki you can say that 
 Number has to output something convertible to 'real'?

Yes and no.  The parser generator has a good deal of flexibility 
built-in, including a pseudo-variant type that tries to perform 
conversions wherever possible.  For instance, if we re-wrote the 
production for Number like so:

Number
   = real handleNumber(whole,part)
   ::= (NumberPart {NumberPart}):whole '.'
       (NumberPart {NumberPart}):fraction;

... Enki would emit code that binds the chars traversed for for 'whole' 
and 'fraction', and passes those onto a function called 'handleNumber' 
that returns a real.  That return value is passed up parse chain so that 
other terminals can bind to it:

Foobar = writeMe(foo) ::= Number:foo;

And so on.

 
 I wonder if it would be any easier to make a compile-time grammar 
 verifier than a full blown parser generator?   Then just do the 
 parser-generating at runtime.

 Maybe I don't fully understand, but I don't think there's a gain 
 there.  If you've already gone through the gyrations of parsing the 
 BNF expression, it's hardly any extra trouble to do something at each 
 step of the resulting parse tree*.

 (* of course template-based parsers use the call-tree as a parse-tree 
 but that's besides the point)

 
 Yeh, I was just talking crap.  I thought maybe you might be able to save 
 some bookkeeping if all you cared about was that the grammar made a 
 valid tree, but didn't care about it's output.  But probably it's the 
 other way around.  Checking validity is the hard part, not making a tree.
 
 --bb

Oct 18 2006

D Programming

C/C++ Programming

Other

digitalmars.D - Anyone interested in a Spirit for D?