www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Writing a language parser in D

reply Justin Johansson <procode adam-dott-com.au> writes:
Can D people please recommend suitable tools for generating a parser (in D) for
an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.

(Note.  This question is not about writing a parser for D.  It is about writing
a parser in D for another language which has an LL(1) grammar).

Thanks in advance for all help.

-- Justin Johansson
Sep 14 2009
next sibling parent reply div0 <div0 users.sourceforge.net> writes:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.
 
 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).
 
 Thanks in advance for all help.
 
 -- Justin Johansson
 

I've ported boost::spirit to d. No idea if it does what you want, but I've written some fairly complicated grammars with it. It's not a tool though, you just define your grammar directly in code. Which is either a plus or a minus depending on your point of view. Quick intro: http://www.boost.org/doc/libs/1_36_0/libs/spirit/classic/index.html And D implementation: http://www.sstk.co.uk/spiritd.php - -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iD8DBQFKrrmbT9LetA9XoXwRAg0ZAJ95oFr48DUbDEBGKUOCpWDNewYhGwCfQb83 ud7OQmiXnPmpAuRQdpLHyLc= =PPEc -----END PGP SIGNATURE-----
Sep 14 2009
next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Mon, Sep 14, 2009 at 2:46 PM, div0 <div0 users.sourceforge.net> wrote:
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1

 Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in=


or tools available nowadays, since my last foray into this area 10+ years a= go with YACC. =A0I've heard of tools like bison, SableCC etc but apart from= the names know nothing about them.
 (Note. =A0This question is not about writing a parser for D. =A0It is ab=


 Thanks in advance for all help.

 -- Justin Johansson

I've ported boost::spirit to d. No idea if it does what you want, but I've written some fairly complicated grammars with it. It's not a tool though, you just define your grammar directly in code. Which is either a plus or a minus depending on your point of view. Quick intro: http://www.boost.org/doc/libs/1_36_0/libs/spirit/classic/index.html And D implementation: http://www.sstk.co.uk/spiritd.php

I'm not seeing the powershell script or test app in that .zip file. I don't really need it, I was just curious what the syntax looked like without any operator overloading. --bb
Sep 14 2009
parent div0 <div0 users.sourceforge.net> writes:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bill Baxter wrote:
 On Mon, Sep 14, 2009 at 2:46 PM, div0 <div0 users.sourceforge.net> wrote:
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1

 Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.

 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).

 Thanks in advance for all help.

 -- Justin Johansson

but I've written some fairly complicated grammars with it. It's not a tool though, you just define your grammar directly in code. Which is either a plus or a minus depending on your point of view. Quick intro: http://www.boost.org/doc/libs/1_36_0/libs/spirit/classic/index.html And D implementation: http://www.sstk.co.uk/spiritd.php

I'm not seeing the powershell script or test app in that .zip file. I don't really need it, I was just curious what the syntax looked like without any operator overloading. --bb

Their both in the top level bit of the zip, build.ps1 & test0.d I've gone for template factory functions at the moment, it's quick and dirty:
 rT	values = rT.create(
         or(
             or(
                 or(
                     or(parseReal, boolVal[&_outer.gotBool]),
                     parseInt
                 ),
                 stringVal[&_outer.gotString]
             ),
             arrayValues
         ));
 
 rT	fieldName = rT.create(
         lexemeD[
             seq(
                 or(alphaP, chP!(chT)('_')),
                 star(or(alnumP, chP!(chT)('_')))
             )
         ]);
 
 rT	field = rT.create(
         seq(
             seq(fieldName[&_outer.gotFieldName], chP!(chT)(':')),
             values
         ));

One day I may write a something to generate the grammar from a string but I've got way too much other stuff to do at the mo, so that's a low priority. - -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iD8DBQFKr+HRT9LetA9XoXwRAppuAJ4n+0i/BCW4pVI3CPpBXXEadDlF8wCdG8RH gp+7369u/3k9hkE2E/vxapg= =ZYF3 -----END PGP SIGNATURE-----
Sep 15 2009
prev sibling parent reply Justin Johansson <procode adam-dott-com.au> writes:
 I've ported boost::spirit to d. No idea if it does what you want,
 but I've written some fairly complicated grammars with it.
 
 It's not a tool though, you just define your grammar directly in code.
 Which is either a plus or a minus depending on your point of view.

Thanks for all replies, Ellery, div0, Bill et. al. There's 101 odd productions in EBNF so whatever is the easiest to plug these directly into the tool or engine is probably the road of least resistance for this exercise. <JJ/>
Sep 14 2009
parent div0 <div0 users.sourceforge.net> writes:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Justin Johansson wrote:
 I've ported boost::spirit to d. No idea if it does what you want,
 but I've written some fairly complicated grammars with it.

 It's not a tool though, you just define your grammar directly in code.
 Which is either a plus or a minus depending on your point of view.

Thanks for all replies, Ellery, div0, Bill et. al. There's 101 odd productions in EBNF so whatever is the easiest to plug these directly into the tool or engine is probably the road of least resistance for this exercise. <JJ/>

Yar, that's a bit much for doing in spiritd. Least ways not with out extra non existent plumbing. - -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iD8DBQFKr+IHT9LetA9XoXwRAkECAKDL38p+IfBQpkqoXdM6vwRJya5uYACgg46l 7wqGhQEi/bhFoy9oZ7KNGKI= =bcBy -----END PGP SIGNATURE-----
Sep 15 2009
prev sibling next sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.
 
 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).
 
 Thanks in advance for all help.
 
 -- Justin Johansson
 

You might have a look at ANTLR. It's an LL(k) or LL(*) (versions) parser generator. I've found it suitable for writing a parser for D (yes I know), so it is definitely powerful enough. Currently, there is no implementation of ANTLR that generates D code, so if you really want a pure D parser, look elsewhere. However, I believe it can generate C, which you might be able to link to. If so, I'd say it's your best bet.
Sep 14 2009
parent reply Trass3r <mrmocool gmx.de> writes:
Ellery Newcomer schrieb:
 Currently, there is no implementation of ANTLR that generates D code, so
 if you really want a pure D parser, look elsewhere. However, I believe
 it can generate C, which you might be able to link to. If so, I'd say
 it's your best bet.

Not completely true, there is one, it's just antiquated http://www.mbutscher.de/antlrd/
Sep 15 2009
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Trass3r wrote:
 Ellery Newcomer schrieb:
 Currently, there is no implementation of ANTLR that generates D code, so
 if you really want a pure D parser, look elsewhere. However, I believe
 it can generate C, which you might be able to link to. If so, I'd say
 it's your best bet.

Not completely true, there is one, it's just antiquated http://www.mbutscher.de/antlrd/

That actually works?
Sep 15 2009
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Ellery Newcomer wrote:
 Trass3r wrote:
 Ellery Newcomer schrieb:
 Currently, there is no implementation of ANTLR that generates D code, so
 if you really want a pure D parser, look elsewhere. However, I believe
 it can generate C, which you might be able to link to. If so, I'd say
 it's your best bet.

http://www.mbutscher.de/antlrd/

That actually works?

Wow. I had a go with it, and it actually does a lot more than I thought it would. I am so going to dust this off and get it working!
Sep 15 2009
prev sibling next sibling parent Justin Johansson <procode adam-dott-com.au> writes:
Hi Nick,
Thanks.  The grammar is already spec'ed for LL ans so looking for the course of
least resistance.
I've used GOLD and spoken with its author, Devin Cook, in the past though. 
It's rather cool in a way.
Still it's great to see GOLD coming to a screen in the D village.
<JJ/>

 If you can't find anything you like for LL and don't mind switching to LALR 
 instead, I've recently released Goldie ( 
 http://www.dsource.org/projects/goldie ) which works in conjunction with 
 GOLD Parser Builder ( http://www.devincook.com/goldparser/ ). If you're 
 familiar with GOLD, Goldie is a GOLD Engine for D. I do plan to add support 
 for LL eventually, but that's kind of a ways off for now.

Sep 14 2009
prev sibling next sibling parent reply downs <default_357-line yahoo.de> writes:
Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.
 
 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).
 
 Thanks in advance for all help.
 
 -- Justin Johansson
 

In a completely different vein, tools.rd is a simplicistic recursive descent parser framework implemented at compiletime that I've used for most/all of my toy languages. It keeps things trivial - there's no lexing stage, it parses straight from input string. It's not that well documented, but if you want, give me a simple language description and I can write you a sample parser. It's probably the easiest to use though - just mix it in from D code :)
Sep 15 2009
parent reply Justin Johansson <procode adam-dott-com.au> writes:
Hi downs,

Thanks for the offer but since YACC is my prior background I'll probably go to
the closest tool which is the modern variant for LL(1).  Still if you have a
small sample to share I'm sure other D people will be delighted.

<JJ/>

downs Wrote:

 Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.
 
 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).
 
 Thanks in advance for all help.
 
 -- Justin Johansson
 

In a completely different vein, tools.rd is a simplicistic recursive descent parser framework implemented at compiletime that I've used for most/all of my toy languages. It keeps things trivial - there's no lexing stage, it parses straight from input string. It's not that well documented, but if you want, give me a simple language description and I can write you a sample parser. It's probably the easiest to use though - just mix it in from D code :)

Sep 15 2009
parent reply downs <default_357-line yahoo.de> writes:
Justin Johansson wrote:
downs Wrote:
 Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.

 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).

 Thanks in advance for all help.

 -- Justin Johansson


Hi downs, Thanks for the offer but since YACC is my prior background I'll probably go to the closest tool which is the modern variant for LL(1). Still if you have a small sample to share I'm sure other D people will be delighted. <JJ/>

Well for instance, take the PAD (Pastebin Adventure) component of my IRC bot, that can run simple text adventures from a variety of sources, like local Gobby sessions, Wikis and (originally) Pastebin.com: http://dsource.org/projects/scrapple/browser/trunk/idc/pad Let's look at http://dsource.org/projects/scrapple/browser/trunk/idc/pad/engine.d L175: gotToken Functions like this form the building blocks of tools.rd parsing. They always have the form "bool gotBlarghle(ref string st, out T result)" and return true if result could be parsed from st, otherwise false (in which case st is not modified). gotToken trivially removes a token from the input text. L200: bool accept(ref string st, string cmp): This function is called internally by the parser framework to decide if st starts with a comparison string, in which case it is removed and true returned. bool accept removes tokens from both strings and compares until a comparison fails (false, st not modified) or cmp is used up (true). L230: The first use of the actual Parser DSL. return mixin(gotMatchExpr("s: log")); This simply matches "log" against the input string s. Nothing fancy. L282: Not related to the parser but still, IMHO, insanely cool. const string Table = ` | bool | int | string | float --------+---------------+-------------+----------------------+-------- Boolean | b | b | b?q{true}p:q{false}p | Integer | i != 0 | i | Format(i) | i String | s == q{true}p | atoi(s) | s | atof(s) Float | | cast(int) f | Format(f) | f`; This table contains a conversion matrix for internal types to basic type. Two things are of interest: 1) q{}p is unrolled by .litstring_expand() into nested and escaped ""s. It's a backport of D2 nestable string literals to D1. 2) The table itself. tools.ctfe contains functionality to select rows, columns, and iterate the table in column-major order. This means the above table can be automatically translated into nested if/switch statements. L487: A more instructive use of the parser framework. if (mixin(gotMatchExpr("st: [==$#eq=true$|!=$#neq=true$|<=$#eq=smaller=true$|>=$#eq=greater=true$|<$#smaller=tru $|>$#greater=true$] " "$dg2 <- genExprMath$" ))) { ... } Okay, first we have a conditional branch: [a|b|c|d]. This matches each of the possible branches against the input string in turn. Segments in $$ indicate variable matches and/or programmatic reactions. $#eq=smaller=true$ basically translates to "execute eq=smaller=true when this part of the parse string is successfully reached. ". "$dg2 <- genExprMath$" means "Generate dg2 using the genExprMath function" It is assumed that this function follows the convention of bool(ref string, out typeof(dg2)). It hasn't been used in that sample, but "y <- foo/x" means "pass x as an extra parameter to foo". And that's basically it. :) Oh, just for fun, here's the unrolled D syntax for the above expression: (ref string s) { auto scratch = s; return ( true && (ref string s) { auto scratch = s; return (true && scratch.accept("==") && (((eq=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept("!=") && (((neq=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept("<=") && (((eq=smaller=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept(">=") && (((eq=greater=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept("<") && (((smaller=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept(">") && (((greater=true), true))) && ((s = scratch), true); }(scratch) && ( genExprMath(scratch, dg2 )) ) && ((s = scratch), true); }(st)
Sep 16 2009
next sibling parent Don <nospam nospam.com> writes:
downs wrote:
 L282: Not related to the parser but still, IMHO, insanely cool.
     const string Table = `
               | bool          | int         | string               | float
       --------+---------------+-------------+----------------------+--------
       Boolean | b             | b           | b?q{true}p:q{false}p | 
       Integer | i != 0        | i           | Format(i)            | i
       String  | s == q{true}p | atoi(s)     | s                    | atof(s)
       Float   |              | cast(int) f | Format(f)            | f`;
 
 This table contains a conversion matrix for internal types to basic type. 

That's a very interesting DSL <g>. Insanely cool, indeed.
Sep 16 2009
prev sibling parent Justin Johansson <procode adam-dott-com.au> writes:
Hmm, delightful.
Thanks for sharing.
There's obviously some very talented people out there :-)

Gotta put this in my input queue for later consumption.

JJ

downs Wrote:

 Justin Johansson wrote:
downs Wrote:
 Justin Johansson wrote:
 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.

 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).

 Thanks in advance for all help.

 -- Justin Johansson


Hi downs, Thanks for the offer but since YACC is my prior background I'll probably go to the closest tool which is the modern variant for LL(1). Still if you have a small sample to share I'm sure other D people will be delighted. <JJ/>

Well for instance, take the PAD (Pastebin Adventure) component of my IRC bot, that can run simple text adventures from a variety of sources, like local Gobby sessions, Wikis and (originally) Pastebin.com: http://dsource.org/projects/scrapple/browser/trunk/idc/pad Let's look at http://dsource.org/projects/scrapple/browser/trunk/idc/pad/engine.d L175: gotToken Functions like this form the building blocks of tools.rd parsing. They always have the form "bool gotBlarghle(ref string st, out T result)" and return true if result could be parsed from st, otherwise false (in which case st is not modified). gotToken trivially removes a token from the input text. L200: bool accept(ref string st, string cmp): This function is called internally by the parser framework to decide if st starts with a comparison string, in which case it is removed and true returned. bool accept removes tokens from both strings and compares until a comparison fails (false, st not modified) or cmp is used up (true). L230: The first use of the actual Parser DSL. return mixin(gotMatchExpr("s: log")); This simply matches "log" against the input string s. Nothing fancy. L282: Not related to the parser but still, IMHO, insanely cool. const string Table = ` | bool | int | string | float --------+---------------+-------------+----------------------+-------- Boolean | b | b | b?q{true}p:q{false}p | Integer | i != 0 | i | Format(i) | i String | s == q{true}p | atoi(s) | s | atof(s) Float | | cast(int) f | Format(f) | f`; This table contains a conversion matrix for internal types to basic type. Two things are of interest: 1) q{}p is unrolled by .litstring_expand() into nested and escaped ""s. It's a backport of D2 nestable string literals to D1. 2) The table itself. tools.ctfe contains functionality to select rows, columns, and iterate the table in column-major order. This means the above table can be automatically translated into nested if/switch statements. L487: A more instructive use of the parser framework. if (mixin(gotMatchExpr("st: [==$#eq=true$|!=$#neq=true$|<=$#eq=smaller=true$|>=$#eq=greater=true$|<$#smaller=tru $|>$#greater=true$] " "$dg2 <- genExprMath$" ))) { ... } Okay, first we have a conditional branch: [a|b|c|d]. This matches each of the possible branches against the input string in turn. Segments in $$ indicate variable matches and/or programmatic reactions. $#eq=smaller=true$ basically translates to "execute eq=smaller=true when this part of the parse string is successfully reached. ". "$dg2 <- genExprMath$" means "Generate dg2 using the genExprMath function" It is assumed that this function follows the convention of bool(ref string, out typeof(dg2)). It hasn't been used in that sample, but "y <- foo/x" means "pass x as an extra parameter to foo". And that's basically it. :) Oh, just for fun, here's the unrolled D syntax for the above expression: (ref string s) { auto scratch = s; return ( true && (ref string s) { auto scratch = s; return (true && scratch.accept("==") && (((eq=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept("!=") && (((neq=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept("<=") && (((eq=smaller=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept(">=") && (((eq=greater=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept("<") && (((smaller=true), true))) && ((s=scratch), true) || (((scratch=s), true) && scratch.accept(">") && (((greater=true), true))) && ((s = scratch), true); }(scratch) && ( genExprMath(scratch, dg2 )) ) && ((s = scratch), true); }(st)

Sep 16 2009
prev sibling next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
APaGeD can do LL parsers: http://apaged.mainia.de/
There is a fork: http://www.dsource.org/projects/apaged2

Not a recommendation perse, but just wanted to mention it as an option. 
Sep 15 2009
prev sibling next sibling parent Alexander Bothe <info alexanderbothe.com> writes:
Justin Johansson Wrote:

 Can D people please recommend suitable tools for generating a parser (in D)
for an LL(1) grammar.  There's bound to be much better parser generator tools
available nowadays, since my last foray into this area 10+ years ago with YACC.
 I've heard of tools like bison, SableCC etc but apart from the names know
nothing about them.
 
 (Note.  This question is not about writing a parser for D.  It is about
writing a parser in D for another language which has an LL(1) grammar).
 
 Thanks in advance for all help.
 
 -- Justin Johansson
 

My D-IDE (written in C#) has an extra DLL which contains an entire D Parser...It's based on the SharpDevelop parser and lexer.... may you can use it.... www.alexanderbothe.com/?id=27
Sep 15 2009
prev sibling parent BCS <none anon.com> writes:
Hello Justin,

 Can D people please recommend suitable tools for generating a parser
 (in D) for an LL(1) grammar.  There's bound to be much better parser
 generator tools available nowadays, since my last foray into this area
 10+ years ago with YACC.  I've heard of tools like bison, SableCC etc
 but apart from the names know nothing about them.
 
 (Note.  This question is not about writing a parser for D.  It is
 about writing a parser in D for another language which has an LL(1)
 grammar).
 
 Thanks in advance for all help.
 

I've written this: http://www.dsource.org/projects/scrapple/browser/trunk/dparser It's a pure compile time parser generator that takes grammars defined as text and generates a backtracking recursive decent parser.
Sep 15 2009