digitalmars.D - Walter - Discrepancy between EqualExpression spec and implementation

Jarrett Billingsley (24/24) May 28 2006 I'm writing a small scripting language with a syntax based on D, and as

Jarrett Billingsley (5/10) May 28 2006 I just realized that most of the parsing functions follow this same patt...

Unknown W. Brackets (12/25) May 28 2006 Forgive me if I am off-base here, but the following code:

Jarrett Billingsley (6/9) May 28 2006 I'm thinking about it. It's very weird, but I think I've got it. I thi...

xs0 (10/11) May 29 2006 Heh, I think it's quite the opposite, with recursive descent you can

renox (11/22) Jun 02 2006 Urgh, if I see someone coding like this, let's just say that I won't be

Unknown W. Brackets (10/23) Jun 02 2006 I have been known to revert checkins for code line that, haha.

James Dunne (104/119) May 29 2006 I've taken a serious look into this too :) Here's my answer:

Jarrett Billingsley (14/23) May 29 2006 Ahhhahaha... I see. That's kind of what my mind was working towards whe...

Walter Bright (29/59) Jun 07 2006 It's implemented as (simplified):

Jarrett Billingsley (5/6) Jun 07 2006 Right, I noticed that when James replied about it; I looked at the
Lionello Lunesu (7/13) Jun 16 2006 I think it would be a lot more logical (and useful) if that code were

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

I'm writing a small scripting language with a syntax based on D, and as 
such, I'm basing some of the parsing code off the the DMD frontend source. 
I've found a discrepancy between the spec and the code, though, and I'm not 
sure if it's a bug or if it was changed in the code on purpose and the spec 
just hasn't been updated.

For EqualExpressions, the following grammar is given:

EqualExpression:
    RelExpression
    EqualExpression == RelExpression
    EqualExpression != RelExpression
    EqualExpression is RelExpression
    EqualExpression !is RelExpression

Notice that the left-hand operand of the operators is an EqualExpression.

However, in the code (parse.c line 4365), instead of using parseEqualExp() 
to get the left-hand operand as I would have expected, it instead uses 
parseRelExp().  Meaning that the code implements the following grammar 
instead:

EqualExpression:
    RelExpression
    RelExpression == RelExpression
    RelExpression != RelExpression
    RelExpression is RelExpression
    RelExpression !is RelExpression

I'm pretty sure only Walter can answer this one!

May 28 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
news:e5d40i$vp4$1 digitaldaemon.com...
 I'm writing a small scripting language with a syntax based on D, and as 
 such, I'm basing some of the parsing code off the the DMD frontend source. 
 I've found a discrepancy between the spec and the code, though, and I'm 
 not sure if it's a bug or if it was changed in the code on purpose and the 
 spec just hasn't been updated.

I just realized that most of the parsing functions follow this same pattern 
(of not following the spec).  The only functions that seem to follow the 
spec are parseAssignExp() and parseCondExp().

May 28 2006

"Unknown W. Brackets" <unknown simplemachines.org> writes:

Forgive me if I am off-base here, but the following code:

int main()
{
	int i = 0;
	bool b = i == 0 == true !is false;

	return 0;
}

Compiles without errors.  This would suggest to me that it is being 
parsed according to the spec, roughly.  Perhaps it's just not following 
that exact BNF (but implementing the same difference, e.g. outside 
EqualExpression)?

-[Unknown]


 "Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
 news:e5d40i$vp4$1 digitaldaemon.com...
 I'm writing a small scripting language with a syntax based on D, and as 
 such, I'm basing some of the parsing code off the the DMD frontend source. 
 I've found a discrepancy between the spec and the code, though, and I'm 
 not sure if it's a bug or if it was changed in the code on purpose and the 
 spec just hasn't been updated.

 
 I just realized that most of the parsing functions follow this same pattern 
 (of not following the spec).  The only functions that seem to follow the 
 spec are parseAssignExp() and parseCondExp().

May 28 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Unknown W. Brackets" <unknown simplemachines.org> wrote in message 
news:e5djuj$1dik$1 digitaldaemon.com...

 This would suggest to me that it is being parsed according to the spec, 
 roughly.  Perhaps it's just not following that exact BNF (but implementing 
 the same difference, e.g. outside EqualExpression)?

I'm thinking about it.  It's very weird, but I think I've got it.  I think 
all Walter's done with this code is factored out a recursive step; in this 
case, calling parseEqualExp() is the same as calling parseRelExp().

Recursive descent parsers are weird.

May 28 2006

xs0 <xs0 xs0.com> writes:

 Recursive descent parsers are weird. 

Heh, I think it's quite the opposite, with recursive descent you can 
actually figure out what's going on. Have you seen a yacc-generated 
parser (it's not recursive descent but "shift/reduce", LALR or whatever 
that's called)? Now that's really weird:

http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.tab
http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.c

Of course, the input file is much more readable, but you can't compare 
C++ code in one case with grammar definition code in the other case, can 
you? :)


xs0

May 29 2006

renox <renosky free.fr> writes:

Unknown W. Brackets wrote:

 Forgive me if I am off-base here, but the following code:
 
 int main()
 {
     int i = 0;
     bool b = i == 0 == true !is false;
 
     return 0;
 }
 
 Compiles without errors.

Urgh, if I see someone coding like this, let's just say that I won't be 
polite..

At some point I wonder if it wouldn't be better if in the language all 
the operators shouldn't have the same priority, just to force the 
programmers to use parenthesis!

The only downside I see is a big pitfall for the beginners and perhaps 
less opportunity for optimisation by the compiler depending of the 
implementation..

Regards,
RenoX

Jun 02 2006

"Unknown W. Brackets" <unknown simplemachines.org> writes:

I have been known to revert checkins for code line that, haha.

Still, I like that I can write something like this:

x = original_x * cos(angle) - original_y * sin(angle);
y = original_y * cos(angle) + original_x * sin(angle);

And know it will work as I expect.  For this reason, I love operator 
precedence.

But I agree that "x == y == z" is just wrong.  I even take it farther 
and can't stand when people do "x = y = 0;", even if it is fine code.

Everyone has their own preferences.

-[Unknown]


 Urgh, if I see someone coding like this, let's just say that I won't be 
 polite..
 
 At some point I wonder if it wouldn't be better if in the language all 
 the operators shouldn't have the same priority, just to force the 
 programmers to use parenthesis!
 
 The only downside I see is a big pitfall for the beginners and perhaps 
 less opportunity for optimisation by the compiler depending of the 
 implementation..
 
 Regards,
 RenoX

Jun 02 2006

James Dunne <james.jdunne gmail.com> writes:

Jarrett Billingsley wrote:
 "Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
 news:e5d40i$vp4$1 digitaldaemon.com...
 
I'm writing a small scripting language with a syntax based on D, and as 
such, I'm basing some of the parsing code off the the DMD frontend source. 
I've found a discrepancy between the spec and the code, though, and I'm 
not sure if it's a bug or if it was changed in the code on purpose and the 
spec just hasn't been updated.

 
 
 I just realized that most of the parsing functions follow this same pattern 
 (of not following the spec).  The only functions that seem to follow the 
 spec are parseAssignExp() and parseCondExp(). 
 
 

I've taken a serious look into this too :)  Here's my answer:

See the while loops implemented in most of those functions?

Grammar rule:
-------------
OrOrExpression:
	AndAndExpression
	OrOrExpression || AndAndExpression

DMD implementation:
-------------------
Expression *Parser::parseOrOrExp()
{   Expression *e;
     Expression *e2;
     Loc loc = this->loc;

     e = parseAndAndExp();
     while (token.value == TOKoror)
     {
	nextToken();
	e2 = parseAndAndExp();
	e = new OrOrExp(loc, e, e2);
     }
     return e;
}

So, on the first iteration we either get an AndAndExpression alone, or 
an OrOrExpression with the left expression being an AndAndExpression and 
the right expression being an AndAndExpression.

The left-recursive descent rule 'OrOrExpression || AndAndExpression' 
means to loop over the left side until a subsequent token is NOT the 
'||' token.  Thus, we can only get an OrOrExpression out of two 
AndAndExpressions next to each other, separated by the '||' token, or 
just an AndAndExpression alone.  So, the code does implement the spec 
correctly.  All the other expression rules follow this pattern.

As to the exceptional case of AssignExpression:

Grammar rule:
-------------
AssignExpression:
	ConditionalExpression
	ConditionalExpression = AssignExpression
	ConditionalExpression += AssignExpression
	ConditionalExpression -= AssignExpression
	ConditionalExpression *= AssignExpression
	ConditionalExpression /= AssignExpression
	ConditionalExpression %= AssignExpression
	ConditionalExpression &= AssignExpression
	ConditionalExpression |= AssignExpression
	ConditionalExpression ^= AssignExpression
	ConditionalExpression ~= AssignExpression
	ConditionalExpression <<= AssignExpression
	ConditionalExpression >>= AssignExpression
	ConditionalExpression >>>= AssignExpression

DMD implementation:
(I've expanded out the #define shortcut for readability)
-------------------
Expression *Parser::parseAssignExp() {
     Expression *e;
     Expression *e2;
     Loc loc;

     e = parseCondExp();
     while (1) {
         loc = this->loc;
         switch (token.value) {
             case TOKassign:  nextToken(); e2 = parseAssignExp(); e = 
new AssignExp(loc,e,e2); continue;
             case TOKaddass:  nextToken(); e2 = parseAssignExp(); e = 
new AddAssignExp(loc,e,e2); continue;
             case TOKminass:  nextToken(); e2 = parseAssignExp(); e = 
new MinAssignExp(loc,e,e2); continue;
             case TOKmulass:  nextToken(); e2 = parseAssignExp(); e = 
new MulAssignExp(loc,e,e2); continue;
             case TOKdivass:  nextToken(); e2 = parseAssignExp(); e = 
new DivAssignExp(loc,e,e2); continue;
             case TOKmodass:  nextToken(); e2 = parseAssignExp(); e = 
new ModAssignExp(loc,e,e2); continue;
             case TOKandass:  nextToken(); e2 = parseAssignExp(); e = 
new AndAssignExp(loc,e,e2); continue;
             case TOKorass:   nextToken(); e2 = parseAssignExp(); e = 
new OrAssignExp(loc,e,e2); continue;
             case TOKxorass:  nextToken(); e2 = parseAssignExp(); e = 
new XorAssignExp(loc,e,e2); continue;
             case TOKshlass:  nextToken(); e2 = parseAssignExp(); e = 
new ShlAssignExp(loc,e,e2); continue;
             case TOKshrass:  nextToken(); e2 = parseAssignExp(); e = 
new ShrAssignExp(loc,e,e2); continue;
             case TOKushrass: nextToken(); e2 = parseAssignExp(); e = 
new UshrAssignExp(loc,e,e2); continue;
             case TOKcatass:  nextToken(); e2 = parseAssignExp(); e = 
new CatAssignExp(loc,e,e2); continue;

             default:
             break;
         }
         break;
     }
     return e;
}

Here, we see that e2 is always evaluated as an AssignExpression, thus 
this rule is right-recursive, not left-recursive as the other pattern. 
Again, note the while loop here.

I'm not sure how to make it clearer, but if you have any questions don't 
hesitate to ask.

P.S. - I've also used this code pattern as a template for my own 
language parser and it does work quite well.

-- 
Regards,
James Dunne

May 29 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"James Dunne" <james.jdunne gmail.com> wrote in message 
news:e5g01q$1l9r$1 digitaldaemon.com...

 The left-recursive descent rule 'OrOrExpression || AndAndExpression' means 
 to loop over the left side until a subsequent token is NOT the '||' token. 
 Thus, we can only get an OrOrExpression out of two AndAndExpressions next 
 to each other, separated by the '||' token, or just an AndAndExpression 
 alone.  So, the code does implement the spec correctly.  All the other 
 expression rules follow this pattern.

Ahhhahaha... I see.  That's kind of what my mind was working towards when I 
was thinking about it.

 Here, we see that e2 is always evaluated as an AssignExpression, thus this 
 rule is right-recursive, not left-recursive as the other pattern. Again, 
 note the while loop here.

I thought it might have something to do with left- and right-associativity, 
as it's only the assignment operators which are right-associative (and the 
postfix operators too, but you don't see that with the way the D parser is 
structured).  I used a different form of expression parsing before which 
required that you have the associativity of each operator known beforehand 
(along with the precedence, as using different associativity changed how you 
parsed the LHS and RHS of an operator, just like in the recursive descent 
method).  The other method was a bit more terse and mathematical, but had 
the advantage of smaller code.  But recursive descent is just.. cool, if a 
bit weird.

May 29 2006

Walter Bright <newshound digitalmars.com> writes:

Jarrett Billingsley wrote:
 I'm writing a small scripting language with a syntax based on D, and as 
 such, I'm basing some of the parsing code off the the DMD frontend source. 
 I've found a discrepancy between the spec and the code, though, and I'm not 
 sure if it's a bug or if it was changed in the code on purpose and the spec 
 just hasn't been updated.
 
 For EqualExpressions, the following grammar is given:
 
 EqualExpression:
     RelExpression
     EqualExpression == RelExpression
     EqualExpression != RelExpression
     EqualExpression is RelExpression
     EqualExpression !is RelExpression
 
 Notice that the left-hand operand of the operators is an EqualExpression.
 
 However, in the code (parse.c line 4365), instead of using parseEqualExp() 
 to get the left-hand operand as I would have expected, it instead uses 
 parseRelExp().  Meaning that the code implements the following grammar 
 instead:
 
 EqualExpression:
     RelExpression
     RelExpression == RelExpression
     RelExpression != RelExpression
     RelExpression is RelExpression
     RelExpression !is RelExpression
 
 I'm pretty sure only Walter can answer this one! 

It's implemented as (simplified):

Expression *Parser::parseEqualExp()
{   Expression *e;
     Expression *e2;
     e = parseRelExp();
     while (1)
     {   enum TOK value = token.value;

         switch (value)
         {
             case TOKequal:
             case TOKnotequal:
                 nextToken();
                 e2 = parseRelExp();
                 e = new EqualExp(value, loc, e, e2);
                 continue;

             default:
                 break;
         }
         break;
     }
     return e;
}

which means that:
	a == b == c == d
is parsed as:
	(((a == b) == c) == d)
which I think you'll see matches the grammar. The while loop makes it 
left-associative.

Jun 07 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote in message 
news:e6606q$2t33$1 digitaldaemon.com...

 The while loop makes it left-associative.

Right, I noticed that when James replied about it; I looked at the 
associativity for all the C operators and noticed that only the 
left-associative operators were implemented this way.

Jun 07 2006

Lionello Lunesu <lio lunesu.remove.com> writes:

Walter Bright wrote:
 which means that:
     a == b == c == d
 is parsed as:
     (((a == b) == c) == d)
 which I think you'll see matches the grammar. The while loop makes it 
 left-associative.

I think it would be a lot more logical (and useful) if that code were 
interpreted as:

(a==b) && (b==c) && (c==d) //a==b==c==d

Similarly, a < b < c could be the same as

(a < b) && (b < c)  //a<b<c

L.

Jun 16 2006

D Programming

C/C++ Programming

Other

digitalmars.D - Walter - Discrepancy between EqualExpression spec and implementation