digitalmars.D - D grammar as LL(1) grammar ( OT)

BLS (8/8) Sep 15 2007 Sorry for beeing a bit off topic, anyway:

BCS (8/20) Sep 15 2007 I'm working on an automatic conversion to LL for a recursive decent pars...
Matti Niemenmaa (6/9) Sep 16 2007 Impossible due to allowing if-else without curly brackets:
Aziz K. (8/9) Sep 16 2007 Unfortunately not :/. Take this case for example:
Jascha Wetzel (20/32) Sep 16 2007 D needs arbitrary lookahead, that means you can write D code that can be...

BCS (13/18) Sep 16 2007 In general that may be true (I'd have to check) but often a specific gra...

Jascha Wetzel (17/44) Sep 17 2007 if you make sure, that the parser still generates the same parse tree as...

BCS (8/58) Sep 17 2007 I'm sort of hoping to pull the grammar right out of the docs (I'm
Manfred Nowak (7/9) Sep 18 2007 This indicates only that your parsing system might accept a super set

Jascha Wetzel (5/16) Sep 18 2007 definitely. the grammar in the docs is a lot more general than the DMD

BLS (4/16) Sep 16 2007 Thank you folks,

BLS <nanali nospam-wanadoo.fr> writes:

Sorry for beeing a bit off topic, anyway:

Do you see a chance to describe D grammar as LL(1) grammar..

Please notice that I am a more DB developer than a language-designer 
means keep your answere as simple as possible :)

all hints are welcome.
Thanks Bjoern

Well, at least one guy did it for *Erlang*, as you can see here:
http://platform.netbeans.org/articles/nbm_interview_caoyuan.html

Sep 15 2007

BCS <ao pathlink.com> writes:

Reply to bls,

 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..
 
 Please notice that I am a more DB developer than a language-designer
 means keep your answere as simple as possible :)
 
 all hints are welcome.
 Thanks Bjoern
 Well, at least one guy did it for *Erlang*, as you can see here:
 http://platform.netbeans.org/articles/nbm_interview_caoyuan.html
 

I'm working on an automatic conversion to LL for a recursive decent parser 
(not look ahead 1 but...) I expect to have a mostly working system in the 
near future (a month, maybe less, maybe much less, depends on homework ;)

I'm hoping to auto extract the grammar from the doc files, auto convert to 
LL, auto munge to get it working for my system ("now, now, cut up your grammar 
into small pieces, young man. Don't gulp it done like a animal" <G>)

I plan on sharing it all eventually.

Sep 15 2007

Matti Niemenmaa <see_signature for.real.address> writes:

BLS wrote:
 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..

Impossible due to allowing if-else without curly brackets:
http://en.wikipedia.org/wiki/Dangling_else

Explanation here: http://compilers.iecc.com/comparch/article/95-06-033

-- 
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi

Sep 16 2007

"Aziz K." <aziz.kerim gmail.com> writes:

Hello Björn,

BLS wrote:
 Do you see a chance to describe D grammar as LL(1) grammar..

Unfortunately not :/. Take this case for example:

auto dg = (int a){ return 0; };

You would have to peek behind the closing parenthesis for an opening curly  
bracket to know if you have to parse this as a delegate.

Regards,
Aziz

Sep 16 2007

Jascha Wetzel <firstname mainia.de> writes:

BLS wrote:
 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..
 
 Please notice that I am a more DB developer than a language-designer 
 means keep your answere as simple as possible :)
 
 all hints are welcome.
 Thanks Bjoern
 
 Well, at least one guy did it for *Erlang*, as you can see here:
 http://platform.netbeans.org/articles/nbm_interview_caoyuan.html

D needs arbitrary lookahead, that means you can write D code that can be 
parsed with LL(1), some that needs LL(173) and so on. the parser has to 
be able to parse any of that and thus may not have a fixed length lookahead.
For LL parsers that generally means that you have to use backtracking. 
Since DMD uses a hand-written parser, it can optimize the lookahead, 
such that not the whole parser has to work it's way until the point 
where it can decide what to parse, but only a simplified function that 
quickly scans for the relevant token.
E.g. there is a function called peekPastParen (or similar) that just 
finds the first token after a parenthesis and is therefore really fast.

If you generate an LL parser automatically, it's very difficult to apply 
such optimizations. The resulting parser will be considerably slower.
That's the reason i chose to write an LR parser generator to parse D. LR 
parsers use the information provided by a lookahead symbol much more 
efficiently. That is, LR(1) can parse more languages than LL(1), 
although both are just using a lookahead of length 1. Furthermore, LR 
parsers consider multiple potential parse trees at the same time, 
throwing away the wrong guesses, once the necessary tokens get 
available. LL parsers have to check each potential parse tree separately.

Sep 16 2007

BCS <ao pathlink.com> writes:

Reply to Jascha,


 Furthermore, LR parsers consider multiple potential parse trees at the
 same time, throwing away the wrong guesses, once the necessary tokens
 get available. LL parsers have to check each potential parse tree
 separately.
 

In general that may be true (I'd have to check) but often a specific grammar 
can be rearranges to (partly) avoid that, particularly the parts that appear 
in several trees.

for instance:

A ::= B c C | B d D | B e E ;

can be rearranged as

A ::= B A_ ;
A_ ::= c C | d D | e E ;

This at a minimum can avoid reparsing the B part of each option.

Admittedly this is actually changing the grammar that is being parsed (so 
I can't claim to be doing a better job with the original grammar) but these 
can be done automatically.

Sep 16 2007

Jascha Wetzel <firstname mainia.de> writes:

BCS wrote:
 Reply to Jascha,
 
 
 Furthermore, LR parsers consider multiple potential parse trees at the
 same time, throwing away the wrong guesses, once the necessary tokens
 get available. LL parsers have to check each potential parse tree
 separately.

 
 In general that may be true (I'd have to check) but often a specific 
 grammar can be rearranges to (partly) avoid that, particularly the parts 
 that appear in several trees.
 
 for instance:
 
 A ::= B c C | B d D | B e E ;
 
 can be rearranged as
 
 A ::= B A_ ;
 A_ ::= c C | d D | e E ;
 
 This at a minimum can avoid reparsing the B part of each option.
 
 Admittedly this is actually changing the grammar that is being parsed 
 (so I can't claim to be doing a better job with the original grammar) 
 but these can be done automatically.

if you make sure, that the parser still generates the same parse tree as 
for the unchanged grammar, that'd be cool. i wonder how this approach 
will do for the (large) D grammar.

take a look at http://seatd.mainia.de/grammar.html if you like. it's my 
version of the D grammar from the docs with the errors ironed out 
(currently, i'm not aware of any more, that is). it might save you some 
time.
like the version from the docs, it uses left recursion, though. but that 
you would have to change anyway. that grammar successfully parses tango, 
phobos, derelict, blade, dfl, core32 and more.

what i'd love to do is to have a benchmark some day, to let the 
different parsing techniques compete. i'd patch the DMD source, such 
that it can be run as a standalone parser and it'll represent the 
hand-written LL team. my seatd parser will run for the automated LR team 
and it'll be fun if you would throw your parser in as the automated LL 
team :)

Sep 17 2007

BCS <BCS pathlink.com> writes:

Jascha Wetzel wrote:
 BCS wrote:
 
 Reply to Jascha,


 Furthermore, LR parsers consider multiple potential parse trees at the
 same time, throwing away the wrong guesses, once the necessary tokens
 get available. LL parsers have to check each potential parse tree
 separately.

 In general that may be true (I'd have to check) but often a specific 
 grammar can be rearranges to (partly) avoid that, particularly the 
 parts that appear in several trees.

 for instance:

 A ::= B c C | B d D | B e E ;

 can be rearranged as

 A ::= B A_ ;
 A_ ::= c C | d D | e E ;

 This at a minimum can avoid reparsing the B part of each option.

 Admittedly this is actually changing the grammar that is being parsed 
 (so I can't claim to be doing a better job with the original grammar) 
 but these can be done automatically.

 
 
 if you make sure, that the parser still generates the same parse tree as 
 for the unchanged grammar, that'd be cool. i wonder how this approach 
 will do for the (large) D grammar.
 

Well, when I get the time, I'll known because I plan to do exactly that

 take a look at http://seatd.mainia.de/grammar.html if you like. it's my 
 version of the D grammar from the docs with the errors ironed out 
 (currently, i'm not aware of any more, that is). it might save you some 
 time.

I'm sort of hoping to pull the grammar right out of the docs (I'm 
thinking of building a set of DDoc macros that will generate it from the 
  the original source) this would have the advantage of giving me an 
excuse to rag on Walter about the error in the docs.

 like the version from the docs, it uses left recursion, though. but that 
 you would have to change anyway. that grammar successfully parses tango, 
 phobos, derelict, blade, dfl, core32 and more.
 
 what i'd love to do is to have a benchmark some day, to let the 
 different parsing techniques compete. i'd patch the DMD source, such 
 that it can be run as a standalone parser and it'll represent the 
 hand-written LL team.

DMD is LL? cool.

 my seatd parser will run for the automated LR team
 and it'll be fun if you would throw your parser in as the automated LL 
 team :)

Again, given time I expect to have that.

Sep 17 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Jascha Wetzel wrote

 that grammar successfully parses tango, 
 phobos, derelict, blade, dfl, core32 and more.

This indicates only that your parsing system might accept a super set 
of the D language.

From looking at the grammar I conclude that your parsing system might 
accept sources that are not accepted by DMD.

BTW do you have some papers on the "tdfa" you use?

-manfred

Sep 18 2007

Jascha Wetzel <firstname mainia.de> writes:

Manfred Nowak wrote:
 Jascha Wetzel wrote
 
 that grammar successfully parses tango, 
 phobos, derelict, blade, dfl, core32 and more.

 
 This indicates only that your parsing system might accept a super set 
 of the D language.
 
 From looking at the grammar I conclude that your parsing system might 
 accept sources that are not accepted by DMD.

definitely. the grammar in the docs is a lot more general than the DMD 
parser. i'm narrowing it down more and more. also, i'm going chase the 
parser through DStress, which also includes negative test cases.

 BTW do you have some papers on the "tdfa" you use?

http://laurikari.net/ville/spire2000-tnfa.pdf

Sep 18 2007

BLS <nanali nospam-wanadoo.fr> writes:

Thank you folks,
...these info's are bad news for my project. Have to think about it.

Bjoern

BLS schrieb:
 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..
 
 Please notice that I am a more DB developer than a language-designer 
 means keep your answere as simple as possible :)
 
 all hints are welcome.
 Thanks Bjoern
 
 Well, at least one guy did it for *Erlang*, as you can see here:
 http://platform.netbeans.org/articles/nbm_interview_caoyuan.html

Sep 16 2007

D Programming

C/C++ Programming

Other

digitalmars.D - D grammar as LL(1) grammar ( OT)