
digitalmars.D - context-free grammar

reply Simon Buerger <krox gmx.net> writes:
It is often said that D's grammar is easier to parse than C++'s, i.e. it 
should be possible to separate syntactic and semantic analysis, which 
is not possible in C++ with the template-"< >" and so on. But I found 
the following example:

The line "a * b = c;" can be interpreted in two ways:
-> Declaration of a variable b of type a*
-> (a*b) is itself an lvalue which is assigned to.

Current D (gdc 2.051) always interprets it the first way and yields 
an error if the second is meant. The workaround is simply to use 
parens like "(a*b)=c", so it's not a real issue. But at the same time, 
C++ (gcc 4.5) has no problem distinguishing the two even without parens.

So, is the advertising as "context-free grammar" wrong?

- Krox
Mar 04 2011
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Simon Buerger <krox gmx.net> wrote:

 It is often said that D's grammar is easier to parse than C++, i.e. it  
 should be possible to separate syntactic and semantic analysis, which is
 not possible in C++ with the template-"< >" and so on. But I found  
 following example:

 The Line "a * b = c;" can be interpreted in two ways:
 -> Declaration of variable b of type a*
 -> (a*b) is itself a lvalue which is assigned to.

 Current D (gdc 2.051) interprets it always in the first way and yields  
 an error if the second is meant. The Workaround is simply to use parens  
 like "(a*b)=c", so it's not a real issue. But at the same time, C++ (gcc  
 4.5) has no problem to distinguish it even without parens.

 So, is the advertising as "context-free grammar" wrong?

Well, obviously not. The grammar has one and only one meaning for that example - that of an a* called b, being set to c. This can be inferred with no other context. -- Simen
Mar 04 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Simen kjaeraas:

 Well, obviously not. The grammar has one and only one meaning for that
 example - that of an a* called b, being set to c. This can be inferred
 with no other context.

This little program:

struct Foo {
    int x;
    Foo opBinary(string op:"*")(Foo other) {
        Foo result = Foo(x * other.x);
        return result;
    }
    void opAssign(Foo other) { x = other.x; }
}
void main() {
    Foo a, b, c;
    a * b = c;
}

Gives:

test.d(10): Error: a is used as a type
test.d(10): Error: cannot implicitly convert expression (c) of type Foo to _error_*
test.d(10): Error: declaration test.main.b is already defined

While this one gives no errors:

struct Foo {
    int x;
    Foo opBinary(string op:"*")(Foo other) {
        Foo result = Foo(x * other.x);
        return result;
    }
    void opAssign(Foo other) { x = other.x; }
}
void main() {
    Foo a, b, c;
    (a * b) = c;
}

Bye,
bearophile
Mar 04 2011
parent reply SiegeLord <none none.org> writes:
bearophile Wrote:

 Simen kjaeraas:
 
 Well, obviously not. The grammar has one and only one meaning for that
 example - that of an a* called b, being set to c. This can be inferred
 with no other context.

 This little program:

 struct Foo {
     int x;
     Foo opBinary(string op:"*")(Foo other) {
         Foo result = Foo(x * other.x);
         return result;
     }
     void opAssign(Foo other) { x = other.x; }
 }
 void main() {
     Foo a, b, c;
     a * b = c;
 }

 Gives:

 test.d(10): Error: a is used as a type
 test.d(10): Error: cannot implicitly convert expression (c) of type Foo to _error_*
 test.d(10): Error: declaration test.main.b is already defined

 While this one gives no errors:

 struct Foo {
     int x;
     Foo opBinary(string op:"*")(Foo other) {
         Foo result = Foo(x * other.x);
         return result;
     }
     void opAssign(Foo other) { x = other.x; }
 }
 void main() {
     Foo a, b, c;
     (a * b) = c;
 }

 Bye,
 bearophile

Yeah, and this:

struct Foo { }
void main() {
    Foo a, b, c;
    (a * b) = c;
}

Gives an error. I don't see any problem here:

a * b; // always a pointer declaration
(a * b); // always a binary expression

-SiegeLord
Mar 04 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
SiegeLord wrote:
 Gives an error. I don't see any problem here:
 
 a * b; // always a pointer declaration
 (a * b); // always a binary expression

There isn't one. C++ decides if a*b=c; is a declaration or expression based on whether 'a' is a type or a variable. That requires semantic analysis. D's rule does not.
Mar 04 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 04 March 2011 17:05:57 Simon Buerger wrote:
 It is often said that D's grammar is easier to parse than C++, i.e. it
 should be possible to separate syntactic and semantic analysis, which
 is not possible in C++ with the template-"< >" and so on. But I found
 following example:
 
 The Line "a * b = c;" can be interpreted in two ways:
 -> Declaration of variable b of type a*
 -> (a*b) is itself a lvalue which is assigned to.
 
 Current D (gdc 2.051) interprets it always in the first way and yields
 an error if the second is meant. The Workaround is simply to use
 parens like "(a*b)=c", so it's not a real issue. But at the same time,
 C++ (gcc 4.5) has no problem to distinguish it even without parens.
 
 So, is the advertising as "context-free grammar" wrong?

Umm. How could a * b be assigned to? It's definitely not an lvalue. Do you mean that an overloaded opBinary!"*" is used which returns a ref? It certainly can't be done normally. - Jonathan M Davis
Mar 04 2011
parent reply uri <fan languages.org> writes:
Jonathan M Davis Wrote:

 On Friday 04 March 2011 17:05:57 Simon Buerger wrote:
 It is often said that D's grammar is easier to parse than C++, i.e. it
 should be possible to separate syntactic and semantic analysis, which
 is not possible in C++ with the template-"< >" and so on. But I found
 following example:
 
 The Line "a * b = c;" can be interpreted in two ways:
 -> Declaration of variable b of type a*
 -> (a*b) is itself a lvalue which is assigned to.
 
 Current D (gdc 2.051) interprets it always in the first way and yields
 an error if the second is meant. The Workaround is simply to use
 parens like "(a*b)=c", so it's not a real issue. But at the same time,
 C++ (gcc 4.5) has no problem to distinguish it even without parens.
 
 So, is the advertising as "context-free grammar" wrong?

Umm. How could a * b be assigned to? It's definitely not an lvalue. Do you mean that an overloaded opBinary!"*" is used which returns a ref? It certainly can't be done normally.

Explain why (a*b) is lvalue in bearophile's second example. This is one of the weird things in D. The language is too complex. It takes years to find out about the corner cases. I wouldn't use it for anything reliable
Mar 04 2011
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
uri wrote:
 Explain why (a*b) is lvalue in bearophile's second example.

Because the expression evaluates to a temporary, which is an lvalue.
 This is one of the weird things in D. The language is too complex. It takes
 years to find out about the corner cases.

It's not a weird corner case at all. Temporaries can be used as lvalues (in C++ too).
Mar 04 2011
parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 5/03/11 4:39 AM, Jonathan M Davis wrote:
 On Friday 04 March 2011 20:31:38 Walter Bright wrote:
 uri wrote:
 Explain why (a*b) is lvalue in bearophile's second example.

Because the expression evaluates to a temporary, which is an lvalue.
 This is one of the weird things in D. The language is too complex. It
 takes years to find out about the corner cases.

It's not a weird corner case at all. Temporaries can be used as lvalues (in C++ too).

Really? I thought that a temporary was pretty much _the_ classic example of an rvalue. If you can assign to temporaries, you can assign to most anything then, other than literals. Why on earth would assigning to temporaries be permitted? That just seems unnecessary and bug-prone. - Jonathan M Davis

How do you think array assignments in C++ work?

a[i] = x;

a[i] is just *(a + i), i.e. the evaluation of an expression that yields a temporary, which in this case is an lvalue. Same applies to all other operator[] overloads.
Mar 05 2011
parent reply Mafi <mafi example.org> writes:
Am 05.03.2011 13:10, schrieb Peter Alexander:
 On 5/03/11 4:39 AM, Jonathan M Davis wrote:
 On Friday 04 March 2011 20:31:38 Walter Bright wrote:
 uri wrote:
 Explain why (a*b) is lvalue in bearophile's second example.

Because the expression evaluates to a temporary, which is an lvalue.
 This is one of the weird things in D. The language is too complex. It
 takes years to find out about the corner cases.

It's not a weird corner case at all. Temporaries can be used as lvalues (in C++ too).

Really? I thought that a temporary was pretty much _the_ classic example of an rvalue. If you can assign to temporaries, you can assign to most anything then, other than literals. Why on earth would assigning to temporaries be permitted? That just seems unnecessary and bug-prone. - Jonathan M Davis

How do you think array assignments in C++ work? a[i] = x; a[i] is just *(a + i), i.e. the evaluation of an expression that yields a temporary, which in this case is an lvalue. Same applies to all other operator[] overloads.

No, the temporary in this case is not an lvalue. It's an address whose value is an lvalue. The result of operator[] is a reference, not a normal value. A reference is an address which is always implicitly dereferenced when used. In D we use opIndexAssign anyway.

Mafi
Mar 05 2011
parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 5/03/11 1:39 PM, Mafi wrote:
 Am 05.03.2011 13:10, schrieb Peter Alexander:
 How do you think array assignments in C++ work?

 a[i] = x;

 a[i] is just *(a + i), i.e. the evaluation of an expression that yields
 a temporary, which in this case is an lvalue. Same applies to all other
 operator[] overloads.

No, the temporary in this case is not an lvalue. It's an address whose value is an lvalue.

(a + i) is an address (type T*)
*(a + i) is an lvalue reference (type T&)

 The result of operator[] is a reference, not a normal value. A reference
 is an address which is always implicitly dereferenced when used.
 In D we use opIndexAssign anyways.

A reference is not an address. A reference is a synonym for another object. If that other object is an lvalue then the reference is also an lvalue.

Sources:

http://www.parashift.com/c++-faq-lite/references.html#faq-8.2
"In compiler writer lingo, a reference is an 'lvalue' (something that can appear on the left hand side of an assignment operator)."

http://www.artima.com/cppsource/rvalue.html
"To better distinguish these two types, we refer to a traditional C++ reference as an lvalue reference."

http://msdn.microsoft.com/en-us/library/64sa8b1e.aspx
"The operand of the address-of operator can be either a function designator or an l-value that designates an object"
(I can use the address-of operator on *(a+i), and it's not a function designator, therefore it is an l-value)
Mar 05 2011
parent reply Mafi <mafi example.org> writes:
Am 05.03.2011 16:44, schrieb Peter Alexander:
 On 5/03/11 1:39 PM, Mafi wrote:
 Am 05.03.2011 13:10, schrieb Peter Alexander:
 How do you think array assignments in C++ work?

 a[i] = x;

 a[i] is just *(a + i), i.e. the evaluation of an expression that yields
 a temporary, which in this case is an lvalue. Same applies to all other
 operator[] overloads.

No, the temporary in this case is not an lvalue. It's an address whose value is an lvalue.

 (a + i) is an address (type T*)
 *(a + i) is an lvalue reference (type T&)

I know. I meant the temporary itself (i.e. a + i) is not an lvalue; it lies around in some register which the coder should not explicitly write to. The dereferencing only changes what to do with the result. There's no computation behind dereferencing.
 The result of operator[] is a reference, not a normal value. A reference
 is an address which is always implicitly dereferenced when used.
 In D we use opIndexAssign anyways.

A reference is not an address. A reference is a synonym for another object. If that other object is an lvalue then the reference is also an lvalue.

A reference is nothing else than a pointer which the compiler handles differently at compile time. Look:

/++++++ main.d +++++++++/
import std.stdio;

// extern(C) to avoid mangling
extern(C) void refTest(ref int x);

void main() {
    int a = 5;
    refTest(a);
    a = 7;
    refTest(a);
    a = 42;
    refTest(a);
    writeln("END");
}

/+++++++ test.d ++++++++/
import std.stdio;

extern(C) void refTest(int* x) {
    writefln(" x = %s, *x = %s", x, *x);
}

/+++++++ compile +++++++/
dmd -c test.d
dmd main.d test.obj
./main

/+++++ output ++++++++/
 x = 12FE44, *x = 5
 x = 12FE44, *x = 7
 x = 12FE44, *x = 42
END

Tested with dmd 2.051 on Win7. Look: reference = pointer + implicit dereference
.......

Mar 05 2011
next sibling parent reply %u <wfunction hotmail.com> writes:
I didn't see this example being mentioned in this thread (although I
might have missed this), but would someone explain why (1) the code
below doesn't compile, and (2) why it's considered context-free?

struct MyStruct { ref MyStruct opMul(MyStruct x) { return this; } }
...
OpOverloadAbuse a, b;
a * b = b;

Thanks!
Mar 06 2011
parent reply %u <wfunction hotmail.com> writes:
 That's essentially the example that's been under discussion - though in this
 case it's a ref instead of a temporary for the lvalue. Regardless, it's context free because a * b is by definition a variable declaration, not a call to the multiplication operator. If you want it to use the multiplication operator, then use parens: (a * b) = b. It's context free, because it just assumes one of the two and it's _always_ that one, so there's no ambiguity. It is, _by definition_, a variable declaration.

Oh, I see. So is multiplication being special-cased in the grammar, or is it part of a more general rule in the language?

 Also, opMul is on its way to deprecation. opBinary should be used for
 overloading the multiplication operator.

Whoa! I did not know that; thanks.
Mar 06 2011
parent KennyTM~ <kennytm gmail.com> writes:
On Mar 7, 11 15:16, %u wrote:
 That's essentially the example that's been under discussion - though in this
 case it's a ref instead of a temporary for the lvalue. Regardless, it's context free because a * b is by definition a variable declaration, not a call to the multiplication operator. If you want it to use the multiplication operator, then use parens: (a * b) = b. It's context free, because it just assumes one of the two and it's _always_ that one, so there's no ambiguity. It is, _by definition_, a variable declaration.

 Oh, I see. So is multiplication being special-cased in the grammar, or is it part of a more general rule in the language?

Only statements of the form 'p*q=r' are "special-cased". Actually it's not quite fair to call it "special-cased"; it is just that declarations (which accept 'p*q=r') are processed before expressions.

import std.stdio;

struct Foo {
    ref Foo opBinary(string x) (int a) pure nothrow { return this; }
    ref Foo opBinaryRight(string x) (int a) pure nothrow { return this; }
}

void main() {
    Foo a;
    int b;
    a + 1 = a;
    a * 1 = a;
    1 + a = a;
    1 * a = a;
    a + b = a;
    //a * b = a;
}

 Also, opMul is on its way to deprecation. opBinary should be used for
 overloading the multiplication operator.

 Whoa! I did not know that; thanks.

Mar 06 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 06 March 2011 22:38:50 %u wrote:
 I didn't see this example being mentioned in this thread (although I
 might have missed this), but would someone explain why (1) the code
 below doesn't compile, and (2) why it's considered context-free?
 
 struct MyStruct { ref MyStruct opMul(MyStruct x) { return this; } }
 ...
 OpOverloadAbuse a, b;
 a * b = b;

That's essentially the example that's been under discussion - though in this case it's a ref instead of a temporary for the lvalue. Regardless, it's context free because a * b is by definition a variable declaration, not a call to the multiplication operator. If you want it to use the multiplication operator, then use parens: (a * b) = b. It's context free, because it just assumes one of the two and it's _always_ that one, so there's no ambiguity. It is, _by definition_, a variable declaration. Also, opMul is on its way to deprecation. opBinary should be used for overloading the multiplication operator. - Jonathan M Davis
Mar 06 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 04 March 2011 20:31:38 Walter Bright wrote:
 uri wrote:
 Explain why (a*b) is lvalue in bearophile's second example.

Because the expression evaluates to a temporary, which is an lvalue.
 This is one of the weird things in D. The language is too complex. It
 takes years to find out about the corner cases.

It's not a weird corner case at all. Temporaries can be used as lvalues (in C++ too).

Really? I thought that a temporary was pretty much _the_ classic example of an rvalue. If you can assign to temporaries, you can assign to most anything then, other than literals. Why on earth would assigning to temporaries be permitted? That just seems unnecessary and bug-prone. - Jonathan M Davis
Mar 04 2011
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"uri" <fan languages.org> wrote in message 
news:iks9jb$127g$1 digitalmars.com...
 Jonathan M Davis Wrote:

 On Friday 04 March 2011 17:05:57 Simon Buerger wrote:
 It is often said that D's grammar is easier to parse than C++, i.e. it
 should be possible to separate syntactic and semantic analysis, which
 is not possible in C++ with the template-"< >" and so on. But I found
 following example:

 The Line "a * b = c;" can be interpreted in two ways:
 -> Declaration of variable b of type a*
 -> (a*b) is itself a lvalue which is assigned to.

 Current D (gdc 2.051) interprets it always in the first way and yields
 an error if the second is meant. The Workaround is simply to use
 parens like "(a*b)=c", so it's not a real issue. But at the same time,
 C++ (gcc 4.5) has no problem to distinguish it even without parens.

 So, is the advertising as "context-free grammar" wrong?

Umm. How could a * b be assigned to? It's definitely not an lvalue. Do you mean that an overloaded opBinary!"*" is used which returns a ref? It certainly can't be done normally.

Explain why (a*b) is lvalue in bearophile's second example. This is one of the weird things in D. The language is too complex. It takes years to find out about the corner cases. I wouldn't use it for anything reliable

Corner cases are certainly a PITA in certain corner cases. But simplistic languages are a PITA in most everyday cases. If I felt that simpler languages were better I'd use Brainfuck as my primary language.
Mar 04 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 04 March 2011 19:10:35 uri wrote:
 Jonathan M Davis Wrote:
 On Friday 04 March 2011 17:05:57 Simon Buerger wrote:
 It is often said that D's grammar is easier to parse than C++, i.e. it
 should be possible to separate syntactic and semantic analysis, which
 is not possible in C++ with the template-"< >" and so on. But I found
 following example:
 
 The Line "a * b = c;" can be interpreted in two ways:
 -> Declaration of variable b of type a*
 -> (a*b) is itself a lvalue which is assigned to.
 
 Current D (gdc 2.051) interprets it always in the first way and yields
 an error if the second is meant. The Workaround is simply to use
 parens like "(a*b)=c", so it's not a real issue. But at the same time,
 C++ (gcc 4.5) has no problem to distinguish it even without parens.
 
 So, is the advertising as "context-free grammar" wrong?

Umm. How could a * b be assigned to? It's definitely not an lvalue. Do you mean that an overloaded opBinary!"*" is used which returns a ref? It certainly can't be done normally.

Explain why (a*b) is lvalue in bearophile's second example. This is one of the weird things in D. The language is too complex. It takes years to find out about the corner cases. I wouldn't use it for anything reliable

I'd argue that it's a bug related to operator overloading. = is converted to opAssign, which is then essentially a normal function. And because the result of * is a Foo which has an overloaded opAssign, the call to opAssign succeeds like any other function call would, because = was lowered to opAssign. I don't see what else could possibly be happening here. And since this violates the fact that opAssign is supposed to operate on lvalues, and the result of a * b is an rvalue, not an lvalue, this is a bug. - Jonathan M Davis
Mar 04 2011
prev sibling next sibling parent Rainer Schuetze <r.sagitario gmx.de> writes:
The ambiguities are simply resolved by this rule in the language 
specification: "Any ambiguities in the grammar between Statements and 
Declarations are resolved by the declarations taking precedence." ( 
http://www.digitalmars.com/d/2.0/statement.html ).

Simon Buerger wrote:
 It is often said that D's grammar is easier to parse than C++, i.e. it 
 should be possible to separate syntactic and semantic analysis, which is 
 not possible in C++ with the template-"< >" and so on. But I found 
 following example:
 
 The Line "a * b = c;" can be interpreted in two ways:
 -> Declaration of variable b of type a*
 -> (a*b) is itself a lvalue which is assigned to.
 
 Current D (gdc 2.051) interprets it always in the first way and yields 
 an error if the second is meant. The Workaround is simply to use parens 
 like "(a*b)=c", so it's not a real issue. But at the same time, C++ (gcc 
 4.5) has no problem to distinguish it even without parens.
 
 So, is the advertising as "context-free grammar" wrong?
 
 - Krox

Mar 04 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 06 March 2011 23:16:33 %u wrote:
 That's essentially the example that's been under discussion - though in
 this case it's a ref instead of a temporary for the lvalue. Regardless, it's context free because a * b is by definition a variable declaration, not a call to the multiplication operator. If you want it to use the multiplication operator, then use parens: (a * b) = b. It's context free, because it just assumes one of the two and it's _always_ that one, so there's no ambiguity. It is, _by definition_, a variable declaration.

 Oh, I see. So is multiplication being special-cased in the grammar, or is it part of a more general rule in the language?

It's not really that multiplication is being special-cased. It's that when something could either be a pointer declaration or a multiplicative expression, it's deemed to be a pointer declaration. Anywhere where it wouldn't be a pointer declaration, it's a multiplicative expression.

 Also, opMul is on its way to deprecation. opBinary should be used for
 overloading the multiplication operator.


_Most_ of the old opX functions are going to be deprecated in favor of functions like opUnary and opBinary - which are far more flexible. You should probably read http://www.digitalmars.com/d/2.0/operatoroverloading.html - and if you can, reading TDPL (The D Programming Language by Andrei Alexandrescu) would be even better. - Jonathan M Davis
Mar 06 2011