www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Recursive descent is dangerous

reply Manfred Nowak <svv1999 hotmail.com> writes:
Recursive descent parsing  is dangerous, especially when there is 
no formal grammar at hand.

<cite  href="http://www.digitalmars.com/d/expression.html">
MulExpression:
		UnaryExpression
		MulExpression * UnaryExpression
UnaryExpression:
		* UnaryExpression
		NewExpression
NewExpression:
		new BasicType Stars
Stars:
		nothing
		*
		* Stars
</cite>

Do you see the ambiguousness of this grammar? A recursive descent 
parser, as currently used, does not see it.

Consider this derivation according to the grammar above, which 
seems to be excerpted from the recursive descent parser:

    MulExpression
--> MulExpression           * UnaryExpression
--> UnaryExpression         * UnaryExpression
--> * UnaryExpression       * * UnaryExpression
--> * * UnaryExpression     * * UnaryExpression
--> * * NewExpression       * * UnaryExpression
--> * * new BasicType Stars * * UnaryExpression
--> * * new int       *     * * Identifier

Then the following should be flawlessly parsed:

<code>
void main(){
  int** x;
  x=new int*;
  int* y;
  y= *x;

  int z= **x * *y;
  z= **new int* * *y;
}
</code>

But it cannot be parsed because of the ambiguity, which the 
recursive descent parser is not aware of and therefore parses the 
code in question as

--> **new BasicType Stars Identifier

which cannot be derived from the startsymbol.

Also parenthesing does not help

  Z= **(new int*) * *y;

because somehow int-promotion is made and the error mesage

 "can only * a pointer, not a 'int'"

is generated.

-manfred
Feb 28 2005
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Manfred Nowak wrote:
 Recursive descent parsing  is dangerous, especially when there is 
 no formal grammar at hand.

Ambiguous grammars are dangerous, before you get into the question of how to parse them.
 <cite  href="http://www.digitalmars.com/d/expression.html">
 MulExpression:
 		UnaryExpression
 		MulExpression * UnaryExpression
 UnaryExpression:
 		* UnaryExpression
 		NewExpression
 NewExpression:
 		new BasicType Stars
 Stars:
 		nothing
 		*
 		* Stars
 </cite>
 
 Do you see the ambiguousness of this grammar? A recursive descent 
 parser, as currently used, does not see it.

Yes. I even see the ambiguity of a simple statement like qwert * yuiop; Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Feb 28 2005
parent "Walter" <newshound digitalmars.com> writes:
The ambiguity of:

    a * b

is resolved in the parser by using the rule "if it parses as a declaration,
it is a declaration." Hence,

    a * b;

resolves as b being declared as a pointer to a.
Feb 28 2005
prev sibling parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Manfred Nowak wrote:
 Recursive descent parsing  is dangerous, especially when there is 
 no formal grammar at hand.

If one uses a recursive descent parser generator, it would bark and make you fix. I just had COCO/R bark at me yesterday for a similar reason - something that one cannot actually detect visually in a complex grammar. More recent recursive descent parser generators offer disambiguation on the basis of more lookahead or other means.
 <cite  href="http://www.digitalmars.com/d/expression.html">
 MulExpression:
 		UnaryExpression
 		MulExpression * UnaryExpression
 UnaryExpression:
 		* UnaryExpression
 		NewExpression
 NewExpression:
 		new BasicType Stars
 Stars:
 		nothing
 		*
 		* Stars
 </cite>
 
 Do you see the ambiguousness of this grammar? A recursive descent 
 parser, as currently used, does not see it.

This grammar need not represent the grammar on which DMD is actually built. However, this part might.
 Consider this derivation according to the grammar above, which 
 seems to be excerpted from the recursive descent parser:
 
     MulExpression
 --> MulExpression           * UnaryExpression
 --> UnaryExpression         * UnaryExpression
 --> * UnaryExpression       * * UnaryExpression
 --> * * UnaryExpression     * * UnaryExpression
 --> * * NewExpression       * * UnaryExpression
 --> * * new BasicType Stars * * UnaryExpression
 --> * * new int       *     * * Identifier
 
 Then the following should be flawlessly parsed:
 
 <code>
 void main(){
   int** x;
   x=new int*;
   int* y;
   y= *x;
 
   int z= **x * *y;
   z= **new int* * *y;
 }
 </code>
 
 But it cannot be parsed because of the ambiguity, which the 
 recursive descent parser is not aware of and therefore parses the 
 code in question as
 
 --> **new BasicType Stars Identifier
 
 which cannot be derived from the startsymbol.

A recursive descent parser never arrives at parsing this expression completely. To the contrary, an LR parser would parse this subexpression completely, and might later have problems deriving it from the root. However, since these decisions in an LR parser are made statically, a table generator would bark at you. As opposed to an LR parser one can quite nicely use the information on what we are trying to parse - the grammar can have separate sets of simiar rules for stuff which can be confusingly similar but can be disambiguated by where it may appear. This is not possible wlth bottom-up parsing. There are 2 other techniques I am aware of. One is GLR, of which i believe it uses a combination of top-down and bottom-up techniques - and is moderately fast - however i cannot see how it can solve the problem. Another is Earley parser which yuilds all possible derivations, but is probably not fast enough to be used for such a big task as D.
 Also parenthesing does not help
 
   Z= **(new int*) * *y;
 
 because somehow int-promotion is made and the error mesage
 
  "can only * a pointer, not a 'int'"
 
 is generated.

Int-promotion? I see the problem but i don't see how it happens. Perhaps it is an unrelated bug or perhaps the grammar needs tweaking. Anyway, what do you propose? How would a different type of parser solve this problem? - well, apart from Earley parser. Please feel free to correct me if you feel confident, i'm not very solid on the parsing ground and could have mixed something up, and am too tired to pick up the reference at the moment. -eye
Mar 04 2005
parent Manfred Nowak <svv1999 hotmail.com> writes:
Ilya Minkov <minkov cs.tum.edu> wrote:

[...]
 Int-promotion? I see the problem but i don't see how it happens.
 Perhaps it is an unrelated bug or perhaps the grammar needs
 tweaking. 
 
 Anyway, what do you propose? How would a different type of
 parser solve this problem? - well, apart from Earley parser.

I agree that this might be an unrelated bug. I detected this ambiguity, while trying to establish a LR-grammar for D. So my proposal is a dual one. First: use hand coded recursive descent parsers only if you have also a formal grammar at hand, from which you can detect ambiguities by the according tools preferable before you hand code the parser. Second: my LR-grammar can be disambiguated by intoducing the need for parentheses around new-expressions if followed by the multiplicative operator `*'. -manfred
Mar 11 2005