www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Walter - Discrepancy between EqualExpression spec and implementation

reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
I'm writing a small scripting language with a syntax based on D, and as 
such, I'm basing some of the parsing code off the the DMD frontend source. 
I've found a discrepancy between the spec and the code, though, and I'm not 
sure if it's a bug or if it was changed in the code on purpose and the spec 
just hasn't been updated.

For EqualExpressions, the following grammar is given:

EqualExpression:
    RelExpression
    EqualExpression == RelExpression
    EqualExpression != RelExpression
    EqualExpression is RelExpression
    EqualExpression !is RelExpression

Notice that the left-hand operand of the operators is an EqualExpression.

However, in the code (parse.c line 4365), instead of using parseEqualExp() 
to get the left-hand operand as I would have expected, it instead uses 
parseRelExp().  Meaning that the code implements the following grammar 
instead:

EqualExpression:
    RelExpression
    RelExpression == RelExpression
    RelExpression != RelExpression
    RelExpression is RelExpression
    RelExpression !is RelExpression

I'm pretty sure only Walter can answer this one! 
May 28 2006
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
news:e5d40i$vp4$1 digitaldaemon.com...
 I'm writing a small scripting language with a syntax based on D, and as 
 such, I'm basing some of the parsing code off the the DMD frontend source. 
 I've found a discrepancy between the spec and the code, though, and I'm 
 not sure if it's a bug or if it was changed in the code on purpose and the 
 spec just hasn't been updated.

I just realized that most of the parsing functions follow this same pattern (of not following the spec). The only functions that seem to follow the spec are parseAssignExp() and parseCondExp().
May 28 2006
next sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
Forgive me if I am off-base here, but the following code:

int main()
{
	int i = 0;
	bool b = i == 0 == true !is false;

	return 0;
}

Compiles without errors.  This would suggest to me that it is being 
parsed according to the spec, roughly.  Perhaps it's just not following 
that exact BNF (but implementing the same difference, e.g. outside 
EqualExpression)?

-[Unknown]


 "Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
 news:e5d40i$vp4$1 digitaldaemon.com...
 I'm writing a small scripting language with a syntax based on D, and as 
 such, I'm basing some of the parsing code off the the DMD frontend source. 
 I've found a discrepancy between the spec and the code, though, and I'm 
 not sure if it's a bug or if it was changed in the code on purpose and the 
 spec just hasn't been updated.

I just realized that most of the parsing functions follow this same pattern (of not following the spec). The only functions that seem to follow the spec are parseAssignExp() and parseCondExp().

May 28 2006
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Unknown W. Brackets" <unknown simplemachines.org> wrote in message 
news:e5djuj$1dik$1 digitaldaemon.com...

 This would suggest to me that it is being parsed according to the spec, 
 roughly.  Perhaps it's just not following that exact BNF (but implementing 
 the same difference, e.g. outside EqualExpression)?

I'm thinking about it. It's very weird, but I think I've got it. I think all Walter's done with this code is factored out a recursive step; in this case, calling parseEqualExp() is the same as calling parseRelExp(). Recursive descent parsers are weird.
May 28 2006
parent xs0 <xs0 xs0.com> writes:
 Recursive descent parsers are weird. 

Heh, I think it's quite the opposite, with recursive descent you can actually figure out what's going on. Have you seen a yacc-generated parser (it's not recursive descent but "shift/reduce", LALR or whatever that's called)? Now that's really weird: http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.tab http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.c Of course, the input file is much more readable, but you can't compare C++ code in one case with grammar definition code in the other case, can you? :) xs0
May 29 2006
prev sibling parent reply renox <renosky free.fr> writes:
Unknown W. Brackets wrote:

 Forgive me if I am off-base here, but the following code:
 
 int main()
 {
     int i = 0;
     bool b = i == 0 == true !is false;
 
     return 0;
 }
 
 Compiles without errors.

Urgh, if I see someone coding like this, let's just say that I won't be polite.. At some point I wonder if it wouldn't be better if in the language all the operators shouldn't have the same priority, just to force the programmers to use parenthesis! The only downside I see is a big pitfall for the beginners and perhaps less opportunity for optimisation by the compiler depending of the implementation.. Regards, RenoX
Jun 02 2006
parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
I have been known to revert checkins for code line that, haha.

Still, I like that I can write something like this:

x = original_x * cos(angle) - original_y * sin(angle);
y = original_y * cos(angle) + original_x * sin(angle);

And know it will work as I expect.  For this reason, I love operator 
precedence.

But I agree that "x == y == z" is just wrong.  I even take it farther 
and can't stand when people do "x = y = 0;", even if it is fine code.

Everyone has their own preferences.

-[Unknown]


 Urgh, if I see someone coding like this, let's just say that I won't be 
 polite..
 
 At some point I wonder if it wouldn't be better if in the language all 
 the operators shouldn't have the same priority, just to force the 
 programmers to use parenthesis!
 
 The only downside I see is a big pitfall for the beginners and perhaps 
 less opportunity for optimisation by the compiler depending of the 
 implementation..
 
 Regards,
 RenoX

Jun 02 2006
prev sibling parent reply James Dunne <james.jdunne gmail.com> writes:
Jarrett Billingsley wrote:
 "Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
 news:e5d40i$vp4$1 digitaldaemon.com...
 
I'm writing a small scripting language with a syntax based on D, and as 
such, I'm basing some of the parsing code off the the DMD frontend source. 
I've found a discrepancy between the spec and the code, though, and I'm 
not sure if it's a bug or if it was changed in the code on purpose and the 
spec just hasn't been updated.

I just realized that most of the parsing functions follow this same pattern (of not following the spec). The only functions that seem to follow the spec are parseAssignExp() and parseCondExp().

I've taken a serious look into this too :) Here's my answer: See the while loops implemented in most of those functions? Grammar rule: ------------- OrOrExpression: AndAndExpression OrOrExpression || AndAndExpression DMD implementation: ------------------- Expression *Parser::parseOrOrExp() { Expression *e; Expression *e2; Loc loc = this->loc; e = parseAndAndExp(); while (token.value == TOKoror) { nextToken(); e2 = parseAndAndExp(); e = new OrOrExp(loc, e, e2); } return e; } So, on the first iteration we either get an AndAndExpression alone, or an OrOrExpression with the left expression being an AndAndExpression and the right expression being an AndAndExpression. The left-recursive descent rule 'OrOrExpression || AndAndExpression' means to loop over the left side until a subsequent token is NOT the '||' token. Thus, we can only get an OrOrExpression out of two AndAndExpressions next to each other, separated by the '||' token, or just an AndAndExpression alone. So, the code does implement the spec correctly. All the other expression rules follow this pattern. As to the exceptional case of AssignExpression: Grammar rule: ------------- AssignExpression: ConditionalExpression ConditionalExpression = AssignExpression ConditionalExpression += AssignExpression ConditionalExpression -= AssignExpression ConditionalExpression *= AssignExpression ConditionalExpression /= AssignExpression ConditionalExpression %= AssignExpression ConditionalExpression &= AssignExpression ConditionalExpression |= AssignExpression ConditionalExpression ^= AssignExpression ConditionalExpression ~= AssignExpression ConditionalExpression <<= AssignExpression ConditionalExpression >>= AssignExpression ConditionalExpression >>>= AssignExpression DMD implementation: (I've expanded out the #define shortcut for readability) ------------------- Expression *Parser::parseAssignExp() { Expression *e; Expression *e2; Loc loc; e = parseCondExp(); while (1) { loc = this->loc; switch (token.value) { case TOKassign: nextToken(); e2 = parseAssignExp(); e = new AssignExp(loc,e,e2); continue; case TOKaddass: nextToken(); e2 = parseAssignExp(); e = new AddAssignExp(loc,e,e2); continue; case TOKminass: nextToken(); e2 = parseAssignExp(); e = new MinAssignExp(loc,e,e2); continue; case TOKmulass: nextToken(); e2 = parseAssignExp(); e = new MulAssignExp(loc,e,e2); continue; case TOKdivass: nextToken(); e2 = parseAssignExp(); e = new DivAssignExp(loc,e,e2); continue; case TOKmodass: nextToken(); e2 = parseAssignExp(); e = new ModAssignExp(loc,e,e2); continue; case TOKandass: nextToken(); e2 = parseAssignExp(); e = new AndAssignExp(loc,e,e2); continue; case TOKorass: nextToken(); e2 = parseAssignExp(); e = new OrAssignExp(loc,e,e2); continue; case TOKxorass: nextToken(); e2 = parseAssignExp(); e = new XorAssignExp(loc,e,e2); continue; case TOKshlass: nextToken(); e2 = parseAssignExp(); e = new ShlAssignExp(loc,e,e2); continue; case TOKshrass: nextToken(); e2 = parseAssignExp(); e = new ShrAssignExp(loc,e,e2); continue; case TOKushrass: nextToken(); e2 = parseAssignExp(); e = new UshrAssignExp(loc,e,e2); continue; case TOKcatass: nextToken(); e2 = parseAssignExp(); e = new CatAssignExp(loc,e,e2); continue; default: break; } break; } return e; } Here, we see that e2 is always evaluated as an AssignExpression, thus this rule is right-recursive, not left-recursive as the other pattern. Again, note the while loop here. I'm not sure how to make it clearer, but if you have any questions don't hesitate to ask. P.S. - I've also used this code pattern as a template for my own language parser and it does work quite well. -- Regards, James Dunne
May 29 2006
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"James Dunne" <james.jdunne gmail.com> wrote in message 
news:e5g01q$1l9r$1 digitaldaemon.com...

 The left-recursive descent rule 'OrOrExpression || AndAndExpression' means 
 to loop over the left side until a subsequent token is NOT the '||' token. 
 Thus, we can only get an OrOrExpression out of two AndAndExpressions next 
 to each other, separated by the '||' token, or just an AndAndExpression 
 alone.  So, the code does implement the spec correctly.  All the other 
 expression rules follow this pattern.

Ahhhahaha... I see. That's kind of what my mind was working towards when I was thinking about it.
 Here, we see that e2 is always evaluated as an AssignExpression, thus this 
 rule is right-recursive, not left-recursive as the other pattern. Again, 
 note the while loop here.

I thought it might have something to do with left- and right-associativity, as it's only the assignment operators which are right-associative (and the postfix operators too, but you don't see that with the way the D parser is structured). I used a different form of expression parsing before which required that you have the associativity of each operator known beforehand (along with the precedence, as using different associativity changed how you parsed the LHS and RHS of an operator, just like in the recursive descent method). The other method was a bit more terse and mathematical, but had the advantage of smaller code. But recursive descent is just.. cool, if a bit weird.
May 29 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Jarrett Billingsley wrote:
 I'm writing a small scripting language with a syntax based on D, and as 
 such, I'm basing some of the parsing code off the the DMD frontend source. 
 I've found a discrepancy between the spec and the code, though, and I'm not 
 sure if it's a bug or if it was changed in the code on purpose and the spec 
 just hasn't been updated.
 
 For EqualExpressions, the following grammar is given:
 
 EqualExpression:
     RelExpression
     EqualExpression == RelExpression
     EqualExpression != RelExpression
     EqualExpression is RelExpression
     EqualExpression !is RelExpression
 
 Notice that the left-hand operand of the operators is an EqualExpression.
 
 However, in the code (parse.c line 4365), instead of using parseEqualExp() 
 to get the left-hand operand as I would have expected, it instead uses 
 parseRelExp().  Meaning that the code implements the following grammar 
 instead:
 
 EqualExpression:
     RelExpression
     RelExpression == RelExpression
     RelExpression != RelExpression
     RelExpression is RelExpression
     RelExpression !is RelExpression
 
 I'm pretty sure only Walter can answer this one! 

It's implemented as (simplified): Expression *Parser::parseEqualExp() { Expression *e; Expression *e2; e = parseRelExp(); while (1) { enum TOK value = token.value; switch (value) { case TOKequal: case TOKnotequal: nextToken(); e2 = parseRelExp(); e = new EqualExp(value, loc, e, e2); continue; default: break; } break; } return e; } which means that: a == b == c == d is parsed as: (((a == b) == c) == d) which I think you'll see matches the grammar. The while loop makes it left-associative.
Jun 07 2006
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote in message 
news:e6606q$2t33$1 digitaldaemon.com...

 The while loop makes it left-associative.

Right, I noticed that when James replied about it; I looked at the associativity for all the C operators and noticed that only the left-associative operators were implemented this way.
Jun 07 2006
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Walter Bright wrote:
 which means that:
     a == b == c == d
 is parsed as:
     (((a == b) == c) == d)
 which I think you'll see matches the grammar. The while loop makes it 
 left-associative.

I think it would be a lot more logical (and useful) if that code were interpreted as: (a==b) && (b==c) && (c==d) //a==b==c==d Similarly, a < b < c could be the same as (a < b) && (b < c) //a<b<c L.
Jun 16 2006