www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Anyone interested in a Spirit for D?

reply Walter Bright <newshound digitalmars.com> writes:
Along the lines of Don's regexp template metaprograms, is anyone 
interested in a Spirit-like parser generator capability in D?

http://spirit.sourceforge.net/

Apparently, someone has gotten Spirit to work with C#:

http://www.codeproject.com/useritems/spart.asp
Oct 18 2006
next sibling parent reply Pragma <ericanderton yahoo.removeme.com> writes:
Walter Bright wrote:
 Along the lines of Don's regexp template metaprograms, is anyone 
 interested in a Spirit-like parser generator capability in D?
 
 http://spirit.sourceforge.net/
 
 Apparently, someone has gotten Spirit to work with C#:
 
 http://www.codeproject.com/useritems/spart.asp

Now there's an idea! Words of caution to follow: FWIW, I looked into doing this years ago, and didn't get to far. The biggest hurdle, aside from the limitations of templates at the time, was a lack of unary operators to override. In particular, not being able to override unary '*' and '!' caused some cosmetic problems. The only other major hangup I had was not having IFTI so I could instantiate templates transparently. This feature alone could close the gap on most of Spirit's useage of C++ templates. At a minimum, it means that a D programmer could get very close to the cosmetic appeal of Spirit (operator problems aside). Don't get me wrong: I'm not a nay-sayer here. I think this is very doable and worthwhile suggestion by Walter. Folks should take it seriously. But it will require some design compromises and changes from the original - IMO, it'll probably require more of a re-write than a port. -- - EricAnderton at yahoo
Oct 18 2006
next sibling parent J Duncan <jtd514 nospam.ameritech.net> writes:
Pragma wrote:
 Walter Bright wrote:
 Along the lines of Don's regexp template metaprograms, is anyone 
 interested in a Spirit-like parser generator capability in D?

 http://spirit.sourceforge.net/

 Apparently, someone has gotten Spirit to work with C#:

 http://www.codeproject.com/useritems/spart.asp

Now there's an idea! Words of caution to follow: FWIW, I looked into doing this years ago, and didn't get to far. The biggest hurdle, aside from the limitations of templates at the time, was a lack of unary operators to override. In particular, not being able to override unary '*' and '!' caused some cosmetic problems. The only other major hangup I had was not having IFTI so I could instantiate templates transparently. This feature alone could close the gap on most of Spirit's useage of C++ templates. At a minimum, it means that a D programmer could get very close to the cosmetic appeal of Spirit (operator problems aside). Don't get me wrong: I'm not a nay-sayer here. I think this is very doable and worthwhile suggestion by Walter. Folks should take it seriously. But it will require some design compromises and changes from the original - IMO, it'll probably require more of a re-write than a port.

Yeah Its a good idea, but my first thought was "is that even possible?" It wont be spirit, but a lexer in the uh spirit of spirit :)
Oct 18 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Pragma wrote:
 Words of caution to follow:
 
 FWIW, I looked into doing this years ago, and didn't get to far.  The 
 biggest hurdle, aside from the limitations of templates at the time, was 
 a lack of unary operators to override.  In particular, not being able to 
 override unary '*' and '!' caused some cosmetic problems.

I think the operator overloading aspect of Spirit is only a minor part of the implementation - in fact, just a pretty shell around it. It could all be done using functional notation.
 The only other major hangup I had was not having IFTI so I could 
 instantiate templates transparently.  This feature alone could close the 
 gap on most of Spirit's useage of C++ templates.  At a minimum, it means 
 that a D programmer could get very close to the cosmetic appeal of 
 Spirit (operator problems aside).

I think it would be worth looking at again. The C# version of it doesn't use operator overloading or even templates.
 Don't get me wrong: I'm not a nay-sayer here.  I think this is very 
 doable and worthwhile suggestion by Walter.  Folks should take it 
 seriously. But it will require some design compromises and changes from 
 the original - IMO, it'll probably require more of a re-write than a port.

I think it would be a complete rewrite. The reason I'm interested in it for D is that: 1) it's a pretty cool library 2) it's one of Boost's most popular ones 3) it's been touted as a reason why D is no good and C++ roolz 4) it's popular enough to have been a driving force behind improvements in C++ compilers 5) it would surely improve D 6) and last, and most importantly, it's very useful
Oct 18 2006
next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
Walter Bright wrote:
 I think it would be worth looking at again. The C# version of it doesn't 
 use operator overloading or even templates.

Huh. Very interesting. Here's the example: // spirit: num_p >> *( ch_p(',') >> num_p) // C# Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit))) Though it's definitely not as easy to read, I think I might actually prefer the C# version. Part of the annoyance with Boost super-clever use of operator-overloading is that it can be a real pain to discover things because they don't have real names. I bet the C# version could be compacted with some aliases or imports (assuming C# has these): Seq( Digit, Start( Seq(Ch(','), Digit))) That doesn't look too bad to me. Still it would rock the world if you could just do: parser("digit (',' digit)*"); and have the grammar be verified at compile-time.
 I think it would be a complete rewrite.
 
 The reason I'm interested in it for D is that:
 
 1) it's a pretty cool library
 2) it's one of Boost's most popular ones
 3) it's been touted as a reason why D is no good and C++ roolz
 4) it's popular enough to have been a driving force behind improvements 
 in C++ compilers
 5) it would surely improve D
 6) and last, and most importantly, it's very useful

Excellent reasons. --bb
Oct 18 2006
parent reply Richard Koch <dr.richard.koch t-online.de> writes:
Bill Baxter wrote:
 Walter Bright wrote:
 I think it would be worth looking at again. The C# version of it 
 doesn't use operator overloading or even templates.

Huh. Very interesting. Here's the example: // spirit: num_p >> *( ch_p(',') >> num_p) // C# Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit))) Though it's definitely not as easy to read, I think I might actually prefer the C# version. Part of the annoyance with Boost super-clever use of operator-overloading is that it can be a real pain to discover things because they don't have real names. I bet the C# version could be compacted with some aliases or imports (assuming C# has these): Seq( Digit, Start( Seq(Ch(','), Digit))) That doesn't look too bad to me. Still it would rock the world if you could just do: parser("digit (',' digit)*"); and have the grammar be verified at compile-time.
 I think it would be a complete rewrite.

 The reason I'm interested in it for D is that:

 1) it's a pretty cool library
 2) it's one of Boost's most popular ones
 3) it's been touted as a reason why D is no good and C++ roolz
 4) it's popular enough to have been a driving force behind 
 improvements in C++ compilers
 5) it would surely improve D
 6) and last, and most importantly, it's very useful

Excellent reasons. --bb

is something that eludes me. better to go for coco? richard
Oct 18 2006
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Richard Koch wrote:
 Bill Baxter wrote:
 Walter Bright wrote:


 // spirit:
 num_p >> *( ch_p(',') >> num_p)

 // C#
 Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

is something that eludes me. better to go for coco?

You mean this coco? http://www.ssw.uni-linz.ac.at/Coco/ Not sure what you mean by coco being more readable than BNF. Coco's grammar looks pretty much like BNF to me. ----- from Taste.atg ----- VarDecl (. wchar_t* name; int type; .) = Type<type> Ident<name> (. tab->NewObj(name, var, type); .) { ',' Ident<name> (. tab->NewObj(name, var, type); .) } ';'. -------------------------- As far as I can tell that's just good old EBNF with some notations. VarDecl ::= Type Ident ("," Ident)* ";" --bb
Oct 18 2006
prev sibling parent reply Paolo Invernizzi <arathorn NOSPAM_fastwebnet.it> writes:
Walter Bright wrote:

 I think it would be worth looking at again. The C# version of it doesn't 
 use operator overloading or even templates.

 3) it's been touted as a reason why D is no good and C++ roolz

As a Spirit user, I can tell that readability is one of the most important factor. So, IMHO, if you want to target your 'point 3', I think you must have at least the same amount of operator overloading you can have in C++. Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P --- Paolo Invernizzi
Oct 19 2006
parent Walter Bright <newshound digitalmars.com> writes:
Paolo Invernizzi wrote:
 Walter Bright wrote:
 
 I think it would be worth looking at again. The C# version of it 
 doesn't use operator overloading or even templates.

 3) it's been touted as a reason why D is no good and C++ roolz

As a Spirit user, I can tell that readability is one of the most important factor. So, IMHO, if you want to target your 'point 3', I think you must have at least the same amount of operator overloading you can have in C++. Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P

You're probably right.
Oct 19 2006
prev sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
Walter Bright wrote:
 Along the lines of Don's regexp template metaprograms, is anyone 
 interested in a Spirit-like parser generator capability in D?
 
 http://spirit.sourceforge.net/

Now that would be useful I think. Take this example from the Spirit intro of code to make a parser for a list of real numbers: r = real_p >> *(ch_p(',') >> real_p); In EBNF that's just: real_number ("," real_number)* In C++ you have to get creative with the operator overloading there (prefix '*' used to denote the regexp Kleene star, '>>' used to separate tokens) But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with: r = make_parser("real_number (',' real_number)*"); I.e. use the EBNF version directly in a string literal that gets parsed at compile time. That would be pretty cool. Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary. From the intro it says that it's primary use is "extremely small micro-parsers", not a full blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial. So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time. I wonder if it would be any easier to make a compile-time grammar verifier than a full blown parser generator? Then just do the parser-generating at runtime. --- heh heh, this is fun. From one of the code examples: typedef alternative<alternative<space_parser, sequence<sequence< strlit<const char*>, kleene_star<difference<anychar_parser, chlit<char> > > >, chlit<char> > >, sequence<sequence< strlit<const char*>, kleene_star<difference<anychar_parser, strlit<const char*> > > >, strlit<const char*> > > skip_t; skip_t skip; That monster type signature was determined by deliberately forcing a compiler error and then copy-pasting the type from the resulting error message. Too funny. (Note that this as given not as the main way to use the library but as a way to eliminate some of the code bloat all the templates lead to -- another reason to not try to generate the parser at compile-time, but just verify it.) At any rate the Spirit documentation seems to be rife with juicy comments of the form "yes it looks funky, but we're stuck with C++ here". So it's a good place to get ideas for how to make things better. --bb
Oct 18 2006
next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Bill Baxter wrote:
 But given Don's experiments with compile-time text parsing in D, it's 
 conceivable that in D the above parser could just be created with:
 
    r = make_parser("real_number (',' real_number)*");
 
 I.e. use the EBNF version directly in a string literal that gets parsed 
 at compile time.
 That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.
 Though, you know, even thinking about Boost::Spirit, I have to wonder if 
 it really is necessary.  From the intro it says that it's primary use is 
 "extremely small micro-parsers", not a full blown language processor. 
 But if that's the target then the runtime overhead of translating the 
 EBNF description to a parser would be pretty trivial.  So I guess the 
 real benefit of a compile-time parser-generator is that your grammar can 
 be _verified_ at compile-time.

I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance; making archival, maintenance, etc., clumsy.
 At any rate the Spirit documentation seems to be rife with juicy 
 comments of the form "yes it looks funky, but we're stuck with C++ 
 here".  So it's a good place to get ideas for how to make things better.

Yup.
Oct 18 2006
next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Bill Baxter wrote:
 But given Don's experiments with compile-time text parsing in D, it's 
 conceivable that in D the above parser could just be created with:

    r = make_parser("real_number (',' real_number)*");

 I.e. use the EBNF version directly in a string literal that gets 
 parsed at compile time.
 That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

But maybe you could allow the user to access those terminals via strings: r.lookup_terminal("real_number").add_action(&func); or just r.add_action("real_number", &func);
 So I guess the real benefit of a compile-time parser-generator is that 
 your grammar can be _verified_ at compile-time.

I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance; making archival, maintenance, etc., clumsy.

Hmm. Well if no external tools is the main benefit, then simply making Lex/Yacc (or more apropriately, Enki) into a library should be sufficient. I guess you do need some way to attach code to terminals at runtime, but that's doable via various existing callback mechanisms. The machinery needed is basically the same as signals/slots. You just need to be able to do something like connect(ASTreeNode.accept(), mycode); at runtime. Then you should be able to get this kind of thing to work: auto r = make_parser_node("real_number (',' real_number)*"); r.add_action("real_number", &func); using nothing but runtime parsing of the grammar to build your AST. No fancy templates needed, except perhaps in adding the callback to &func. That kind of thing could be done in C++ too. --bb
Oct 18 2006
prev sibling next sibling parent BCS <BCS pathlink.com> writes:
Walter Bright wrote:
 Bill Baxter wrote:
 
 But given Don's experiments with compile-time text parsing in D, it's 
 conceivable that in D the above parser could just be created with:

    r = make_parser("real_number (',' real_number)*");

 I.e. use the EBNF version directly in a string literal that gets 
 parsed at compile time.
 That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

How about delegate literals for code snipits?? template parse(char[] rulename, char[] rule, T, T delegate(T[]) act) { // mixin allows this to be used in the scope // parse!(rulename)(out T); // for rule == // "AssignExpression | AssignExpression ',' Expression", // parse!(rulename) expand to something like this // mixin a specialization bool parse!(rulename)(out T ret) { T[2] set if(T* ret = rule!("AssignExpression")(set[0])) { ret = act[0](set); return true; } if(rule!("AssignExpression")(set[0]) && rule!("Expression")(set[1])) { ret = act[1](set); return true; } false; } } ///used something like this class Parser : RootParser { mixin parse("Expression", " AssignExpression | AssignExpression ',' Expression ", Expr, [ (Expr[] e){return e[0];}, (Expr[] e){return new Expr(e[0], e[1]);} ] ); mixin parse("AssignExpression", " ConditionalExpression | ConditionalExpression '=' AssignExpression | ConditionalExpression '+=' AssignExpression " Expr, [ (Expr[] e){return e[0];}, (Expr[] e){return new AssignExper(e[0], e[1]);}, (Expr[] e){return new AssignExper(e[0], e[1]);} ] ); } //// I've never used mixins so I most likely have something wrong in there
Oct 18 2006
prev sibling next sibling parent reply Kristian <kjkilpi gmail.com> writes:
On Thu, 19 Oct 2006 00:12:41 +0300, Walter Bright  =

<newshound digitalmars.com> wrote:

 Bill Baxter wrote:
 But given Don's experiments with compile-time text parsing in D, it's=


 conceivable that in D the above parser could just be created with:
     r =3D make_parser("real_number (',' real_number)*");
  I.e. use the EBNF version directly in a string literal that gets  =


 parsed at compile time.
 That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit =

 enables code snippets to be attached to terminals by overloading the [=

 operator. If the EBNF was all in a string literal, this would be  =

 impossible.

Well, couldn't one use arrays to define actions? For example: r =3D make_parser("real_number[0] (',' real_number[1])*"); array[] =3D {&firstAction, //[0] &generalAction //[1] }; r.attach(array); You could also give string ids for actions: r =3D make_parser("real_number[myaction] (',' real_number[real])*")= ; array[] =3D {"myaction", &firstAction, "real", &generalAction }; r.attach(array);
Oct 19 2006
next sibling parent Walter Bright <newshound digitalmars.com> writes:
Not a bad idea.
Oct 19 2006
prev sibling parent Jacques <jpdt telkomsa.net> writes:
Kristian wrote:
 Well, couldn't one use arrays to define actions? For example:

     r = make_parser("real_number[0] (',' real_number[1])*");

     array[] = {&firstAction,   //[0]
                &generalAction  //[1]
                };
     r.attach(array);


 You could also give string ids for actions:

     r = make_parser("real_number[myaction] (',' real_number[real])*");

     array[] = {"myaction", &firstAction,
                "real", &generalAction
                };
     r.attach(array);

I actually did something similar to this when creating a generic PEG parser. (http://en.wikipedia.org/wiki/Parsing_Expression_Grammar) I opted to use the square brackets to denote semantic groups. Each semantic group has an associated delegate, which performs an action on the nodes in the group and returns a replacement node for the parse tree. The following is part of my calculator test that parses and evaluates an expression. auto c = new RuleSet(); // Number.create creates a number object from a string. c.addRegex("Number", "\s*(([\+\-])*([0-9]+)(\.([0-9]*))?(([eE])([\+\-]*)([0-9]+))?)\s*", &Number.create); c.addRegex("Identifier", r"\s*([a-zA-Z_]\w*)\s*"); c.addRule("Statement", "Assignment / Expression"); c.addRule("Assignment", r"[Identifier '=' Expression]", // assign value to variable delegate Node(Node[] n) { // All this casting feels a bit clunky! char[] s = (cast(RegexNode)n[0]).match[1]; vars[s] = (cast(Number)n[2]).eval(); return new Number(vars[s]); }); c.addRule("Expression", "[Term (('+' / '-') Term)*]", // perform addition and subtraction delegate Node(Node[] n) { ... }); c.addRule("Term", "[Factor (('*' / '/') Factor)*]", // perform multiplication and division delegate Node(Node[] n) { ... }); c.addRule("Factor", "Number / [Identifier] / ['(' Expression ')']", // Get value of identifier delegate Node(Node[] n) { ... }, // 2 is a shortcut for returning 2nd node in semantic group. // The expression itself in this case. 2); auto p = new Parser(c); Its all runtime as my templating abilities arent up to the task. A compile time version will be fun though!
Oct 19 2006
prev sibling next sibling parent BCS <BCS pathlink.com> writes:
Sorry about that garbald post, lets try that again:

Walter Bright wrote:

 Bill Baxter wrote:

 But given Don's experiments with compile-time text parsing in D, 


    r = make_parser("real_number (',' real_number)*");

 I.e. use the EBNF version directly in a string literal that gets 


 That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit

operator. If the EBNF was all in a string literal, this would be impossible. How about delegate literals for code snipits?? template parse(char[] rulename, char[] rule, T, T delegate(T[]) act) { // mixin allows this to be used in the scope // parse!(rulename)(out T); // for rule == // "AssignExpression | AssignExpression ',' Expression", // parse!(rulename) expand to something like this // mixin a specialization bool parse!(rulename)(out T ret) { T[2] set if(T* ret = rule!("AssignExpression")(set[0])) { ret = act[0](set); return true; } if(rule!("AssignExpression")(set[0]) && rule!("Expression")(set[1])) { ret = act[1](set); return true; } false; } } ///used something like this class Parser : RootParser { mixin parse("Expression", " AssignExpression | AssignExpression ',' Expression ", Expr, [ (Expr[] e){return e[0];}, (Expr[] e){return new Expr(e[0], e[1]);} ] ); mixin parse("AssignExpression", " ConditionalExpression | ConditionalExpression '=' AssignExpression | ConditionalExpression '+=' AssignExpression " Expr, [ (Expr[] e){return e[0];}, (Expr[] e){return new AssignExper(e[0], e[1]);}, (Expr[] e){return new AssignExper(e[0], e[1]);} ] ); } //// I've never used mixins so I most likely have something wrong in there
Oct 19 2006
prev sibling parent Karen Lanrap <karen digitaldaemon.com> writes:
Walter Bright wrote:

 I disagree. I think the real benefit is avoiding reliance on an
 add-on tool. Such tools are a nuisance; making archival,
 maintenance, etc., clumsy. 

I do not see why transferring a tool into a library should reduce any clumsiness. The process of developing compilers is still more a piece of art than a piece of engineering and who can propose that the D community is able to establish such an engeneering principle by the lonely fact that D has templates and mixins?
Oct 19 2006
prev sibling parent reply Pragma <ericanderton yahoo.removeme.com> writes:
Bill Baxter wrote:
 [snip]

 Though, you know, even thinking about Boost::Spirit, I have to wonder if 
 it really is necessary.  From the intro it says that it's primary use is 
 "extremely small micro-parsers", not a full blown language processor. 
 But if that's the target then the runtime overhead of translating the 
 EBNF description to a parser would be pretty trivial.  So I guess the 
 real benefit of a compile-time parser-generator is that your grammar can 
 be _verified_ at compile-time.

From what I gather, that's the major benefit, other than a "self-documenting design". All the "prettyness" of using a near EBNF syntax in C++ code gets you close enough to actual EBNF that it's apparent what and how it functions. However, the only problem with composing this as an EBNF compile-time parser, is that you can't attach actions to arbitrary terminals without some sort of binding lookup. I'm not saying it's impossible, but it'll be a little odd to use until we get some stronger reflection support. But what you're suggesting could just as easily be a Compile-Time rendition of Enki. It's quite possible to pull off. Especially if you digest the grammar one production at a time as to side-step any recursion depth limitations when processing the parser templates. :) auto grammar = new Parser( Production!("Number ::= NumberPart {NumberPart}", // binding attached to production ('all' is supplied by default?) void function(char[] all){ writefln("Parsed Number: %s",all); } ), Production!("NumberPart ::= Sep | Digit "), Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"), Production!("Sep ::= '_' | ','") ); // call specifying start production grammar.parse("Number",myInput); Depending on how you'd like the call bindings to go, you could probably go about as complex as what Enki lets you get away with. But you'll have to accept a 'soft' binding in there someplace, hence you loose the type/name checking benefits of being at compile time.
 
 I wonder if it would be any easier to make a compile-time grammar 
 verifier than a full blown parser generator?   Then just do the 
 parser-generating at runtime.

Maybe I don't fully understand, but I don't think there's a gain there. If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*. (* of course template-based parsers use the call-tree as a parse-tree but that's besides the point) -- - EricAnderton at yahoo
Oct 18 2006
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Pragma wrote:
 Bill Baxter wrote:
 [snip]

 Though, you know, even thinking about Boost::Spirit, I have to wonder 
 if it really is necessary.  From the intro it says that it's primary 
 use is "extremely small micro-parsers", not a full blown language 
 processor. But if that's the target then the runtime overhead of 
 translating the EBNF description to a parser would be pretty trivial.  
 So I guess the real benefit of a compile-time parser-generator is that 
 your grammar can be _verified_ at compile-time.

From what I gather, that's the major benefit, other than a "self-documenting design". All the "prettyness" of using a near EBNF syntax in C++ code gets you close enough to actual EBNF that it's apparent what and how it functions. However, the only problem with composing this as an EBNF compile-time parser, is that you can't attach actions to arbitrary terminals without some sort of binding lookup. I'm not saying it's impossible, but it'll be a little odd to use until we get some stronger reflection support. But what you're suggesting could just as easily be a Compile-Time rendition of Enki. It's quite possible to pull off. Especially if you digest the grammar one production at a time as to side-step any recursion depth limitations when processing the parser templates. :)

Yes! Sounds like we're thinking along the same lines here. But if Walter's right, that the compile-time verification is not a big deal, then it would be even simpler. Actually it sounds very similar to the way writing shader code for OpenGL/Direct3D works. You have to compile the code it to use it, but conveniently compilation is so fast that you can do it at run-time easily. Or if you prefer, you can still precompile it. What I like to do is set up my IDE to go ahead and precompile my shaders just so I can check for errors at compile time, but then I use the runtime compilation in the end anyway because that makes some things easier -- like modifying the code on the fly. It actually works pretty well I think. The only difference between shader code and grammar code is that shader code doesn't need to make any callbacks. But callbacks aren't hard.
 auto grammar = new Parser(
   Production!("Number ::= NumberPart {NumberPart}",
     // binding attached to production ('all' is supplied by default?)
     void function(char[] all){
       writefln("Parsed Number: %s",all);
     }
   ),
   Production!("NumberPart ::= Sep | Digit "),
   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
   Production!("Sep ::= '_' | ','")
 );
 
 // call specifying start production
 grammar.parse("Number",myInput);

That's one way to do it, but I think you could also allow bindings to be attached after the fact: auto grammar = new Parser( "Number ::= NumberPart {NumberPart} NumberPart ::= Sep | Digit Digit ::= 0|1|2|3|4|5|6|7|8|9 Sep ::= '_' | ','"); ); grammer.attach("Number", // binding attached to production ('all' is supplied by default?) void function(char[] all){ writefln("Parsed Number: %s",all); }) This is _exactly_ how parameter binding works in shader code. Just here the value we're binding is a function pointer instead of a texture coordinate or something.
 Depending on how you'd like the call bindings to go, you could probably 
 go about as complex as what Enki lets you get away with.  But you'll 
 have to accept a 'soft' binding in there someplace, hence you loose the 
 type/name checking benefits of being at compile time.

I'll have to take your word for it. You mean in Enki you can say that Number has to output something convertible to 'real'?
 I wonder if it would be any easier to make a compile-time grammar 
 verifier than a full blown parser generator?   Then just do the 
 parser-generating at runtime.

Maybe I don't fully understand, but I don't think there's a gain there. If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*. (* of course template-based parsers use the call-tree as a parse-tree but that's besides the point)

Yeh, I was just talking crap. I thought maybe you might be able to save some bookkeeping if all you cared about was that the grammar made a valid tree, but didn't care about it's output. But probably it's the other way around. Checking validity is the hard part, not making a tree. --bb
Oct 18 2006
parent pragma <ericanderton yahoo.com> writes:
Bill Baxter wrote:
 Pragma wrote:
 Bill Baxter wrote:
 [snip]

 Though, you know, even thinking about Boost::Spirit, I have to wonder 
 if it really is necessary.  From the intro it says that it's primary 
 use is "extremely small micro-parsers", not a full blown language 
 processor. But if that's the target then the runtime overhead of 
 translating the EBNF description to a parser would be pretty 
 trivial.  So I guess the real benefit of a compile-time 
 parser-generator is that your grammar can be _verified_ at compile-time.

From what I gather, that's the major benefit, other than a "self-documenting design". All the "prettyness" of using a near EBNF syntax in C++ code gets you close enough to actual EBNF that it's apparent what and how it functions. However, the only problem with composing this as an EBNF compile-time parser, is that you can't attach actions to arbitrary terminals without some sort of binding lookup. I'm not saying it's impossible, but it'll be a little odd to use until we get some stronger reflection support. But what you're suggesting could just as easily be a Compile-Time rendition of Enki. It's quite possible to pull off. Especially if you digest the grammar one production at a time as to side-step any recursion depth limitations when processing the parser templates. :)

Yes! Sounds like we're thinking along the same lines here. But if Walter's right, that the compile-time verification is not a big deal, then it would be even simpler. Actually it sounds very similar to the way writing shader code for OpenGL/Direct3D works. You have to compile the code it to use it, but conveniently compilation is so fast that you can do it at run-time easily. Or if you prefer, you can still precompile it. What I like to do is set up my IDE to go ahead and precompile my shaders just so I can check for errors at compile time, but then I use the runtime compilation in the end anyway because that makes some things easier -- like modifying the code on the fly. It actually works pretty well I think. The only difference between shader code and grammar code is that shader code doesn't need to make any callbacks. But callbacks aren't hard.
 auto grammar = new Parser(
   Production!("Number ::= NumberPart {NumberPart}",
     // binding attached to production ('all' is supplied by default?)
     void function(char[] all){
       writefln("Parsed Number: %s",all);
     }
   ),
   Production!("NumberPart ::= Sep | Digit "),
   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
   Production!("Sep ::= '_' | ','")
 );

 // call specifying start production
 grammar.parse("Number",myInput);

That's one way to do it, but I think you could also allow bindings to be attached after the fact: auto grammar = new Parser( "Number ::= NumberPart {NumberPart} NumberPart ::= Sep | Digit Digit ::= 0|1|2|3|4|5|6|7|8|9 Sep ::= '_' | ','"); ); grammer.attach("Number", // binding attached to production ('all' is supplied by default?) void function(char[] all){ writefln("Parsed Number: %s",all); }) This is _exactly_ how parameter binding works in shader code. Just here the value we're binding is a function pointer instead of a texture coordinate or something.
 Depending on how you'd like the call bindings to go, you could 
 probably go about as complex as what Enki lets you get away with.  But 
 you'll have to accept a 'soft' binding in there someplace, hence you 
 loose the type/name checking benefits of being at compile time.

I'll have to take your word for it. You mean in Enki you can say that Number has to output something convertible to 'real'?

Yes and no. The parser generator has a good deal of flexibility built-in, including a pseudo-variant type that tries to perform conversions wherever possible. For instance, if we re-wrote the production for Number like so: Number = real handleNumber(whole,part) ::= (NumberPart {NumberPart}):whole '.' (NumberPart {NumberPart}):fraction; ... Enki would emit code that binds the chars traversed for for 'whole' and 'fraction', and passes those onto a function called 'handleNumber' that returns a real. That return value is passed up parse chain so that other terminals can bind to it: Foobar = writeMe(foo) ::= Number:foo; And so on.
 
 I wonder if it would be any easier to make a compile-time grammar 
 verifier than a full blown parser generator?   Then just do the 
 parser-generating at runtime.

Maybe I don't fully understand, but I don't think there's a gain there. If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*. (* of course template-based parsers use the call-tree as a parse-tree but that's besides the point)

Yeh, I was just talking crap. I thought maybe you might be able to save some bookkeeping if all you cared about was that the grammar made a valid tree, but didn't care about it's output. But probably it's the other way around. Checking validity is the hard part, not making a tree. --bb

Oct 18 2006