digitalmars.D - declaration/expression

reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Sorry for not posting this in learn, but I'd also like to hear the
Language Designer's input on this one.

How does dmd resolve the declaration/expression ambiguity?

My first instinct would be to try the declaration, and if it doesn't
work because the type doesn't exist or something like that then try the
expression, or vice versa. But that could easily lead to undefined and
unexpected behavior. what if both are valid?

Are there any straightforward rules for determining how to proceed?

And I might be wrong, but I don't think any of this is mentioned in the
spec? Should it be?

I can think of a number of examples, most of which dmd handles
gracefully. Here are a couple which, though contrived, seem to
illustrate that the rules are complicated, or more so than I would
conceive off the top of my head. Is the compiler doing what it should,
and if so, how?

import tango.io.Stdout;
void main(){
    int[4] i = [1,2,3,4];
    T(t); // compiler: I think this is an expression *barf*
    t(i[])(i[]); //compiler: I think this is a declaration *barf*
}

class T{
    public T opCall(int[] i){
        Stdout(i).newline;
        return this;
    }
}
Jun 22 2009
next sibling parent BCS <none anon.com> writes:
Hello Ellery,

 How does dmd resolve the declaration/expression ambiguity?
 

Could you elaborate? I'm not understanding the problem.
 
 import tango.io.Stdout;
 void main(){
 int[4] i = [1,2,3,4];
 T(t); // compiler: I think this is an expression *barf*

I don't think this can be interpreted as a declaration
 t(i[])(i[]); //compiler: I think this is a declaration *barf*

nor that
 }
 class T{
 public T opCall(int[] i){
 Stdout(i).newline;
 return this;
 }
 }

The only problem case I know of is: a * b = d; This can be either a declaration of a pointer to type a, named b and initialized to d, or an expression that assigns d to the result of a times b (operator overloading can make this valid).
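
A minimal sketch of the two readings, assuming a current D2 compiler (opBinary postdates this thread; D1 spelled the overload opMul). The names Num, pointerDeclaration and assignmentToProduct are made up for illustration. When a names a type, the statement declares a pointer. When a and b are values, the sketch wraps the product in parentheses, because the current D spec resolves statement/declaration ambiguities in favor of the declaration and documents parentheses as the way to get a statement.

```d
// Illustrative names only; none of this comes from the thread.
struct Num
{
    int v;

    // '*' returns the left operand by reference, so the product is assignable.
    ref Num opBinary(string op : "*")(Num rhs)
    {
        v *= rhs.v;
        return this;
    }
}

// Reading 1: 'a' is a type, so "a * b = ...;" declares b as a pointer to a.
int pointerDeclaration()
{
    alias a = int;
    int d = 7;
    a * b = &d;
    return *b;
}

// Reading 2: 'a' and 'b' are values; parentheses force an expression statement.
int assignmentToProduct()
{
    Num a = Num(2), b = Num(3), d = Num(10);
    (a * b) = d;   // a.opBinary!"*"(b) yields a reference to a, which then receives d
    return a.v;
}
```

Calling pointerDeclaration() reads d back through the declared pointer, and assignmentToProduct() shows that the assignment lands in a.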
Jun 22 2009
prev sibling next sibling parent Michal Minich <michal.minich gmail.com> writes:
 import tango.io.Stdout;
 void main(){
     int[4] i = [1,2,3,4];
     T(t); // compiler: I think this is an expression *barf*
     t(i[])(i[]); //compiler: I think this is a declaration *barf*
 }
 
 class T{
     public T opCall(int[] i){
         Stdout(i).newline;
         return this;
     }
 }

There are at least two errors in your example.

T(t); -- the "t" is not defined there.

public T opCall(int[] i) { -- this should be called as t(i); if you want to call T(i), make the method static.

I personally have not encountered the problem you describe. But you should be aware of this:

 Expressions that have no effect, like (x + x), are illegal in expression statements. If such an expression is needed, casting it to void will make it legal.

http://www.digitalmars.com/d/1.0/statement.html#ExpressionStatement
Jun 23 2009
prev sibling next sibling parent reply Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Tue, Jun 23, 2009 at 1:00 AM, Ellery
Newcomer<ellery-newcomer utulsa.edu> wrote:
 Sorry for not posting this in learn, but I'd also like to hear the
 Language Designer's input on this one.

 How does dmd resolve the declaration/expression ambiguity?

 My first instinct would be to try the declaration, and if it doesn't
 work because the type doesn't exist or something like that then try the
 expression, or vice versa. But that could easily lead to undefined and
 unexpected behavior. what if both are valid?

You're right; if a statement begins with an identifier, the compiler requires arbitrary lookahead to determine whether it's looking at an expression or a declaration. There's a good bit of duplicated code in DMD dedicated to parsing declarations. IIRC there's one version of the parsing that just returns whether or not it's "probably" a declaration, and another version that does the exact same thing but which actually builds the AST. Kind of icky. But that being said, I don't think there are actually any ambiguities in the grammar when it comes to this. Neither of the "problem" lines in your example code could possibly be interpreted as declarations, and I don't think I can come up with any actually ambiguous code.
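
The split described above, a lookahead pass that only answers "probably a declaration?" before anything builds the tree, can be sketched on a pre-split token list. This is a toy with made-up rules and names (isIdentifier, looksLikeDeclaration), not DMD's actual logic, and it deliberately ignores parenthesised C-style declarators.

```d
import std.ascii : isAlpha;

/// Toy definition: a token is an identifier if it starts with a letter or '_'.
bool isIdentifier(string tok)
{
    return tok.length > 0 && (isAlpha(tok[0]) || tok[0] == '_');
}

/// Lookahead-only pass: inspects tokens and builds nothing.
/// Toy rules (assumptions, not DMD's):
///   Identifier Identifier (= | ;)       -> declaration, e.g. "T t ;"
///   Identifier *... Identifier (= | ;)  -> declaration, e.g. "a * b = d ;"
///   anything else                       -> expression
bool looksLikeDeclaration(const string[] toks)
{
    if (toks.length < 2 || !isIdentifier(toks[0]))
        return false;

    size_t i = 1;
    // Skip any number of pointer stars; this is where the lookahead becomes unbounded.
    while (i < toks.length && toks[i] == "*")
        ++i;

    if (i >= toks.length || !isIdentifier(toks[i]))
        return false;

    // The declared name must be followed by an initialiser or the end of the statement.
    return i + 1 < toks.length && (toks[i + 1] == "=" || toks[i + 1] == ";");
}
```

A real parser would call such a predicate first and only then commit to either the declaration or the expression routine, which is why the scanning logic tends to exist twice.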
Jun 23 2009
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Jarrett Billingsley wrote:
 On Tue, Jun 23, 2009 at 1:00 AM, Ellery
 Newcomer<ellery-newcomer utulsa.edu> wrote:
 Sorry for not posting this in learn, but I'd also like to hear the
 Language Designer's input on this one.

 How does dmd resolve the declaration/expression ambiguity?

 My first instinct would be to try the declaration, and if it doesn't
 work because the type doesn't exist or something like that then try the
 expression, or vice versa. But that could easily lead to undefined and
 unexpected behavior. what if both are valid?

You're right; if a statement begins with an identifier, the compiler requires arbitrary lookahead to determine whether it's looking at an expression or a declaration. There's a good bit of duplicated code in DMD dedicated to parsing declarations. IIRC there's one version of the parsing that just returns whether or not it's "probably" a declaration, and another version that does the exact same thing but which actually builds the AST. Kind of icky.

Heh. I saw that. I also saw a toExpression function in various declaration structs. Didn't look deeply into it though.
 
 But that being said, I don't think there are actually any ambiguities
 in the grammar when it comes to this.  Neither of the "problem" lines
 in your example code could possibly be interpreted as declarations,
 and I don't think I can come up with any actually ambiguous code.

Wrong. Both are perfectly valid declarations (and did you miss my note? the compiler *IS* interpreting the second as a declaration).

Okay, consider the rule declarator, which is (or should, if the grammar wants to correctly reflect what the compiler is doing) defined like so

    Declarator:
        BasicType2opt Identifier DeclaratorSuffixesopt
        BasicType2opt ( Declarator ) DeclaratorSuffixesopt

This is what allows D to accept C-style (forgot about those, didn't ya?) declarations, and it's mostly what I'm referring to. Watch:

    int(i); //compiles exactly the same as 'int i;'
    int(*i)(int[]); //compiles the same as 'int function(int[]) i;'

So when I give something like

    T(t);
    t(*i)(i[]); //changed it a little, since 'int (i[])(int[])'
                // is semantically invalid

I intend a declaration and an expression. I get the opposite. Fortunately, neither compiles, due to semantic errors.

To restate my question, if I'm a parser and I see

    Identifier ( Identifier ) ;

which do I interpret it as?

    Type ( NewSymbol ) ;
    FunctionName ( Argument ) ;

If I see

    Identifier . Identifier ( * Identifier ) ;

what do I resolve it as?

And it just goes downhill from there.
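
For comparison, the D-style spelling named in the comment above, 'int function(int[]) i;', can be exercised on its own. A small sketch, assuming a current D2 compiler; sum and callThroughPointer are hypothetical names. Whether a given compiler release still accepts the C-style form int(*i)(int[]) is not something this sketch relies on.

```d
// Hypothetical helper, present only so the pointer has something to point at.
int sum(int[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

int callThroughPointer()
{
    // D-style declaration: i is a pointer to a function taking int[] and returning int.
    int function(int[]) i = &sum;
    int[4] data = [1, 2, 3, 4];
    return i(data[]);   // slice the static array to obtain an int[]
}
```

Because the D-style form starts with a type followed by the function keyword, a parser can classify it as a declaration without scanning past parentheses.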
Jun 23 2009
next sibling parent BCS <none anon.com> writes:
Hello Ellery,

 This is what allows D to accept C-style (forgot about those, didn't
 ya?) declarations, and it's mostly what I'm referring to. Watch:

Yes, and now I wish even more that DMD would too :(
Jun 23 2009
prev sibling parent reply Tim Matthews <tim.matthews7 gmail.com> writes:
Ellery Newcomer wrote:

 
 To restate my question, if I'm a parser and I see
 
 Identifier ( Identifier ) ;
 
 which do I interpret it as?
 
 Type ( NewSymbol ) ;
 FunctionName ( Argument ) ;
 

After some incremental parsing iterations you should be able to gradually resolve dependencies for each expression. If it's not ambiguous what the source is trying to describe and all its dependencies are resolved, then you add the new types that it may be declaring to a collection of parsed types. Repeat until everything can be parsed, and eventually you should know exactly what the first ID is (type, func, etc.).

IIRC opCall can not be declared static.

Sorry if I am completely missing the point, but this doesn't seem complex (in a problem-solving sense, though the code writing may be tedious).
Jun 23 2009
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Tim Matthews wrote:
 Ellery Newcomer wrote:
 
 To restate my question, if I'm a parser and I see

 Identifier ( Identifier ) ;

 which do I interpret it as?

 Type ( NewSymbol ) ;
 FunctionName ( Argument ) ;

After some incremental parsing iterations you should be able to gradually resolve dependencies for each expression. If it's not ambiguous on what the source is trying to describe and all its dependencies are resolved then you add the new types that it may be declaring to a collection of parsed types. Repeat until everything can be passed and eventually you should know exactly what the first ID is (type, func etc). IIRC opCall can not be declared static.

Remember back in D1 land when we didn't have struct constructors?
 
 Sorry if I am completely missing the point but this doesn't seem complex
 (in a problem solving sense but the code writing may be tedious)

Yeah, you're missing the point. The point is the D Language is billed as one whose lexer is completely independent of its parser, which is completely independent of its semantic analysis. The parser must be able to decide all of these without any help from semantic. Anything less is either failure or just plain wrong.

If you'll have another gander at my original example, you'll see that's exactly what DMD does. The compiler decides that T(t) is an expression and t(i[])(i[]) is a declaration, and if they don't resolve, then by golly that's just too bad. It's an error. Game over.

It's mildly restrictive from the user's perspective, but from the compiler writer's perspective, it is infinitely better than mixing semantic and syntactic analysis. And anyways, T(t) can be rewritten the normal way, and t(i[])(i[]) can be surrounded with parentheses to force it to be an expression.

But the question remains: how does the compiler decide this? I'm hoping for some simple rule like if it is a C-style declaration, then it must have a suffix or prefix for each level. It seems to be behaving something like this.

You are right, though, none of this is complex, just tedious. Reading the compiler's source code especially, though it sounds like I'm not going to get answers any other way.
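
The two workarounds mentioned above, declaring t the normal way and parenthesising the call chain, can be sketched as follows. This assumes a current D2 compiler and swaps the Tango output for a call counter so the effect is observable without any library; calls and forcedExpression are made-up names.

```d
class T
{
    int calls;   // counts opCall invocations (stand-in for the Stdout line)

    T opCall(int[] i)
    {
        ++calls;
        return this;
    }
}

int forcedExpression()
{
    int[4] i = [1, 2, 3, 4];
    T t = new T;        // the ordinary declaration form instead of T(t);
    (t(i[])(i[]));      // parentheses force an expression: two chained opCall invocations
    return t.calls;
}
```

The outer parentheses change nothing about what is evaluated; they only ensure the statement cannot start like a declarator.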
Jun 24 2009
parent Tim Matthews <tim.matthews7 gmail.com> writes:
Ellery Newcomer wrote:

 
 Yeah, you're missing the point. The point is the D Language is billed as
 one whose lexer is completely independent of its parser, which is
 completely independent of its semantic analysis. The parser must be able
 to decide all of these without any help from semantic. Anything less is
 either failure or just plain wrong.

So you are following those steps from http://digitalmars.com/d/2.0/lex.html. I don't think these are strict restrictions that a tool must obey to be called a D language parser, or that they prevent you from re-parsing. I think the page is really trying to point out the first few steps, and perhaps should be rewritten as:

1. source character set
   The source file is checked to see what character set it is, and the appropriate scanner is loaded. ASCII and UTF formats are accepted.
2. script line
   If the first line starts with #! then the first line is ignored.
3. parse

Also, from this page, http://digitalmars.com/d/2.0/overview.html, under features to drop: C source code compatibility.

This is not valid code with dmd v2.030 because D is not strictly compatible with C/C++:

    struct A
    {
        int i;
    }

    void main()
    {
        A(a);
    }

I've now tested that with structs, classes, and a typedef'd int, none of which worked. This does compile, however:

    void main()
    {
        int(a);
        a = 2;
    }

From that, dmd compatibility should be far simpler, but going beyond that would be nicer.

 it is infinitely better than mixing
 semantic and syntactic analysis

I didn't recommend that.
 But question remains: how does the compiler decide this?

Built in types have the extra C compatibility. Dmd doesn't like this though and if it matters to you enough, report it as a bug:

    alias int T;

    void main()
    {
        T(a);
        a = 2;
    }
 
 You are right, though, none of this is complex, just tedious. Reading
 the compiler's source code especially, though it sounds like I'm not
 going to get answers any other way.

If you have a parser that allows that syntax to work a bit more, then could you please provide an example of code here that is completely ambiguous to the compiler?
Jun 24 2009
prev sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Tue, Jun 23, 2009 at 8:35 PM, Ellery
Newcomer<ellery-newcomer utulsa.edu> wrote:

 Wrong. Both are perfectly valid declarations (and did you miss my note?
 the compiler *IS* interpreting the second as a declaration).
 Okay, consider the rule declarator, which is (or should, if the grammar
 wants to correctly reflect what the compiler is doing) defined like so

 Declarator:
      BasicType2opt Identifier DeclaratorSuffixesopt
      BasicType2opt ( Declarator ) DeclaratorSuffixesopt

Ah, fuck. I can't believe D still accepts those. All the ambiguity probably goes away without them, huh.
Jun 23 2009