www.digitalmars.com         C & C++   DMDScript  

D - D grammar (yacc format)

reply "Sammy" <none here.com> writes:
I began to write a yacc style grammar for D.

I am using the parse.c from the dmd source.
Each function is being translated to a piece of
the grammar. Each piece is added to the source
file using some tag ( /** Grammar )

Eg:

/** Grammar
 *
 * %token "^" TOKxor
 *
 * XorExp : AndExp
 *           | XorExp "^" AndExp
 *           ;
 *
 **/

Expression *Parser::parseXorExp()
{   Expression *e;
    Expression *e2;
    Loc loc = this->loc;

    e = parseAndExp();
    while (token.value == TOKxor)
    {
 nextToken();
 e2 = parseAndExp();
 e = new XorExp(loc, e, e2);
    }
    return e;
}

I dont know if the final grammar will be LALR, nor
if it will be unabiguous.

If everything is ok, i will use bison to process the grammar.
if not, there are other parser generators who can
handle anbiguous and/or non-lalr grammars
( eg: btyacc, accent, msta, D'Parser )

I would like to see the grammar maintained ( i can do that )
and distributed with DMD ( i cant do that :-) .

I also have a lexer (made with flex), who needs to be
updated.

What do you think ?
Jan 28 2004
next sibling parent "C" <dont respond.com> writes:
Wow, this would be awesome!  Think you'll end up using somethign besides
bison though , D'Parser looks really cool.

C
"Sammy" <none here.com> wrote in message
news:bv941u$1e1e$1 digitaldaemon.com...
 I began to write a yacc style grammar for D.

 I am using the parse.c from the dmd source.
 Each function is being translated to a piece of
 the grammar. Each piece is added to the source
 file using some tag ( /** Grammar )

 Eg:

 /** Grammar
  *
  * %token "^" TOKxor
  *
  * XorExp : AndExp
  *           | XorExp "^" AndExp
  *           ;
  *
  **/

 Expression *Parser::parseXorExp()
 {   Expression *e;
     Expression *e2;
     Loc loc = this->loc;

     e = parseAndExp();
     while (token.value == TOKxor)
     {
  nextToken();
  e2 = parseAndExp();
  e = new XorExp(loc, e, e2);
     }
     return e;
 }

 I dont know if the final grammar will be LALR, nor
 if it will be unabiguous.

 If everything is ok, i will use bison to process the grammar.
 if not, there are other parser generators who can
 handle anbiguous and/or non-lalr grammars
 ( eg: btyacc, accent, msta, D'Parser )

 I would like to see the grammar maintained ( i can do that )
 and distributed with DMD ( i cant do that :-) .

 I also have a lexer (made with flex), who needs to be
 updated.

 What do you think ?

Jan 28 2004
prev sibling parent reply "Ben Hinkle" <bhinkle4 juno.com> writes:
Interestingly GCC's C++ frontend switched from using a yacc generated parser
to a hand-written parser in GCC-3.4. I don't know all the reasons why but
you could probably Google around to find out why. Personally I find the 3.4
hand-written parser is easier to understand.

-Ben

"Sammy" <none here.com> wrote in message
news:bv941u$1e1e$1 digitaldaemon.com...
 I began to write a yacc style grammar for D.

 I am using the parse.c from the dmd source.
 Each function is being translated to a piece of
 the grammar. Each piece is added to the source
 file using some tag ( /** Grammar )

 Eg:

 /** Grammar
  *
  * %token "^" TOKxor
  *
  * XorExp : AndExp
  *           | XorExp "^" AndExp
  *           ;
  *
  **/

 Expression *Parser::parseXorExp()
 {   Expression *e;
     Expression *e2;
     Loc loc = this->loc;

     e = parseAndExp();
     while (token.value == TOKxor)
     {
  nextToken();
  e2 = parseAndExp();
  e = new XorExp(loc, e, e2);
     }
     return e;
 }

 I dont know if the final grammar will be LALR, nor
 if it will be unabiguous.

 If everything is ok, i will use bison to process the grammar.
 if not, there are other parser generators who can
 handle anbiguous and/or non-lalr grammars
 ( eg: btyacc, accent, msta, D'Parser )

 I would like to see the grammar maintained ( i can do that )
 and distributed with DMD ( i cant do that :-) .

 I also have a lexer (made with flex), who needs to be
 updated.

 What do you think ?

Jan 28 2004
next sibling parent "Sammy" <none here.com> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:bv9etc$21ib$1 digitaldaemon.com...
 Interestingly GCC's C++ frontend switched from using a yacc generated

 to a hand-written parser in GCC-3.4. I don't know all the reasons why but
 you could probably Google around to find out why. Personally I find the

 hand-written parser is easier to understand.

 -Ben

I am not sure, and i dont remember where i read this, but I think its because the parser is faster, simpler and as you said, easier to understand ( and possibly to implement ). Implement the parser, any way you want. But if you want to have a clear picture of the language, i think you need a grammar. If you use EBNF, instead of pure BNF, you will end up with a grammar fitting on 2 or 3 pages instead of the 67 pages of the file parser.c So you are right, but i still find its still and interesting exercice. - S
Jan 28 2004
prev sibling next sibling parent "Sammy" <none here.com> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:bv9etc$21ib$1 digitaldaemon.com...
 Interestingly GCC's C++ frontend switched from using a yacc generated

 to a hand-written parser in GCC-3.4. I don't know all the reasons why but
 you could probably Google around to find out why. Personally I find the

 hand-written parser is easier to understand.

 -Ben

Found this short explaination on : http://gnu.essentkabel.com/software/gcc/gcc-3.4/changes.html GCC 3.4 Release Series. Changes, New Features, and Fixes C++ A hand-written recursive-descent C++ parser has replaced the YACC-derived C++ parser from previous GCC releases. The new parser contains much improved infrastructure needed for better parsing of C++ source codes, handling of extensions, and clean separation (where possible) between proper semantics analysis and parsing. The new parser fixes many bugs that were found in the old parser. - S
Jan 28 2004
prev sibling parent "Sammy" <none here.com> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:bv9etc$21ib$1 digitaldaemon.com...
 Interestingly GCC's C++ frontend switched from using a yacc generated

 to a hand-written parser in GCC-3.4. I don't know all the reasons why but
 you could probably Google around to find out why. Personally I find the

 hand-written parser is easier to understand.

 -Ben

Sorry, to post again. All those explanation tells that its a very good choice to use a hand written parser... From: http://compilers.iecc.com/comparch/article/02-01-022 ---------------------------------------------------------------------- Re: C++ parsing : what's new ? From comp.compilers From: Zack Weinberg <zackw panix.com> Newsgroups: comp.compilers Date: 4 Jan 2002 00:09:35 -0500 Organization: PANIX -- Public Access Networks Corp. References: 01-12-179 Keywords: C++, parse Posted-Date: 04 Jan 2002 00:09:35 EST Peter H. Froehlich <pfroehli ics.uci.edu> writes:
On Saturday, December 29, 2001, at 10:30 , Travis Nixon wrote:

 "Martin von Loewis" <loewis informatik.hu-berlin.de> wrote in message
 The gcc grammar is currently being rewritten, from a bison-based one
 to a hand-written recursive-descent parser.

Is there a discussion online anywhere about the reasons for doing this?

As I recall the motivation is performance. The people doing the rewrite expect the recursive descent parser to be way faster than the table-driven one they have now. However, I sure would *not* want to write a parser like that for a grammar as hairy as C/C++. :-)

The motivations are correctness, performance, and better error recovery, in that order. The existing gcc grammar for C++ does not interpret all language constructs correctly; this has been judged unfixable while continuing to use yacc. There are already horrible kludges to get certain constructs mostly right (look at cp/spew.c, if you want to know) that become unnecessary with a recursive descent parser. A hand-rolled recursive descent parser has a lot more information accessible to it. This should permit it to make better backtracking decisions. Bad backtracking is the major cause of bad performance in the existing parser. It will certainly permit better error recovery. Note that this approach is known to work - the Edison Design Group (EDG)'s C/C++ front end uses a hand-written recursive descent parser, and it does all the above things better than g++. As someone who's worked on both front ends, I can also say that I found it _easier_ to work with the recursive descent parser. zw
Jan 28 2004