digitalmars.D - declaration/expression

Ellery Newcomer (27/27) Jun 22 2009 Sorry for not posting this in learn, but I'd also like to hear the

BCS (9/24) Jun 22 2009 I don't think this can be interpreted as a declaration
Michal Minich (9/22) Jun 23 2009 In your example are at least two errors.
Jarrett Billingsley (13/20) Jun 23 2009 You're right; if a statement begins with an identifier, the compiler

Ellery Newcomer (29/53) Jun 23 2009 Heh. I saw that. I also saw a toExpression function in various

BCS (2/4) Jun 23 2009 Yes and I now I even more wish DMD would to :(
Jarrett Billingsley (4/11) Jun 23 2009 Ah, fuck. I can't believe D still accepts those. All the ambiguity
Tim Matthews (10/20) Jun 23 2009 After some incremental parsing iterations you should be able to

Ellery Newcomer (23/47) Jun 24 2009 Yeah, you're missing the point. The point is the D Language is billed as

Tim Matthews (45/58) Jun 24 2009 So you are following those steps from

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

Sorry for not posting this in learn, but I'd also like to hear the
Language Designer's input on this one.

How does dmd resolve the declaration/expression ambiguity?

My first instinct would be to try the declaration, and if it doesn't
work because the type doesn't exist or something like that then try the
expression, or vice versa. But that could easily lead to undefined and
unexpected behavior. What if both are valid?

Are there any straightforward rules for determining how to proceed?

And I might be wrong, but I don't think any of this is mentioned in the
spec? Should it be?

I can think of a number of examples, most of which dmd handles
gracefully. Here are a couple which, though contrived, seem to
illustrate that the rules are complicated, or more so than I would
conceive off the top of my head. Is the compiler doing what it should,
and if so, how?

import tango.io.Stdout;
void main(){
    int[4] i = [1,2,3,4];
    T(t); // compiler: I think this is an expression *barf*
    t(i[])(i[]); //compiler: I think this is a declaration *barf*
}

class T{
    public T opCall(int[] i){
        Stdout(i).newline;
        return this;
    }
}

Jun 22 2009

BCS <none anon.com> writes:

Hello Ellery,

 How does dmd resolve the declaration/expression ambiguity?
 

Could you elaborate? I'm not understanding the problem.

 
 import tango.io.Stdout;
 void main(){
 int[4] i = [1,2,3,4];
 T(t); // compiler: I think this is an expression *barf*

I don't think this can be interpreted as a declaration

 t(i[])(i[]); //compiler: I think this is a declaration *barf*

nor that

 }
 class T{
 public T opCall(int[] i){
 Stdout(i).newline;
 return this;
 }
 }

the only problem case I know of is:

a * b = d;

where this can be a decl of a pointer to type a called b and set to d,
or the result of the expression a times b getting assigned d (operator
overloading can make this valid).
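
A minimal sketch of the two readings, written in present-day D2 syntax rather than the 2009 operator names. The `Cell` struct and its `opBinary` overload are hypothetical, invented only to make a product assignable. The expression form is parenthesized so that it cannot be taken for a declaration; the sketch does not show which reading a compiler picks for the bare form.

```d
struct Cell
{
    int value;

    // Hypothetical overload: `x * y` scales x in place and returns it by
    // reference, which makes `(x * y) = z` a legal assignment.
    ref Cell opBinary(string op : "*")(Cell rhs)
    {
        value *= rhs.value;
        return this;
    }
}

void main()
{
    // Reading 1: `Type * name = init;` declares a pointer.
    int n = 7;
    int * p = &n;
    assert(*p == 7);

    // Reading 2: with a and b as values, the same token shape is a
    // multiplication whose result is assigned to.
    auto a = Cell(2);
    auto b = Cell(3);
    auto d = Cell(10);
    (a * b) = d;
    assert(a.value == 10);
}
```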

Jun 22 2009

Michal Minich <michal.minich gmail.com> writes:

 import tango.io.Stdout;
 void main(){
     int[4] i = [1,2,3,4];
     T(t); // compiler: I think this is an expression *barf*
     t(i[])(i[]); //compiler: I think this is a declaration *barf*
 }
 
 class T{
     public T opCall(int[] i){
         Stdout(i).newline;
         return this;
     }
 }

There are at least two errors in your example.

T(t); -- the "t" is not defined there
public T opCall(int[] i) { -- it should be called as t(i); if you want to
call T(i), make this method static.

I personally have not encountered the problem you are writing about. But
you should be aware of this: Expressions that have no effect, like (x + x),
are illegal in expression statements. If such an expression is needed,
casting it to void will make it legal.

http://www.digitalmars.com/d/1.0/statement.html#ExpressionStatement

Jun 23 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Jun 23, 2009 at 1:00 AM, Ellery
Newcomer<ellery-newcomer utulsa.edu> wrote:
 Sorry for not posting this in learn, but I'd also like to hear the
 Language Designer's input on this one.

 How does dmd resolve the declaration/expression ambiguity?

 My first instinct would be to try the declaration, and if it doesn't
 work because the type doesn't exist or something like that then try the
 expression, or vice versa. But that could easily lead to undefined and
 unexpected behavior. what if both are valid?

You're right; if a statement begins with an identifier, the compiler
requires arbitrary lookahead to determine whether it's looking at an
expression or a declaration.  There's a good bit of duplicated code in
DMD dedicated to parsing declarations.  IIRC there's one version of
the parsing that just returns whether or not it's "probably" a
declaration, and another version that does the exact same thing but
which actually builds the AST.  Kind of icky.
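
To make the "lookahead that only answers yes or no" idea concrete, here is a toy sketch in D. It is not dmd's code. It scans pre-split tokens, builds no AST, and assumes one rule that is only a guess: a bare `( identifier )` is not accepted as a declarator. All function names are hypothetical.

```d
import std.array : split;
import std.ascii : isAlpha;

// True if the token starts like an identifier.
bool isIdent(string s)
{
    return s.length > 0 && (isAlpha(s[0]) || s[0] == '_');
}

// Skips `Identifier ( . Identifier )*`.
bool skipTypeName(string[] toks, ref size_t pos)
{
    if (pos >= toks.length || !isIdent(toks[pos])) return false;
    pos++;
    while (pos + 1 < toks.length && toks[pos] == "." && isIdent(toks[pos + 1]))
        pos += 2;
    return true;
}

// Skips one bracketed group; the toy does not tell () and [] apart.
bool skipBalanced(string[] toks, ref size_t pos)
{
    int depth = 0;
    for (; pos < toks.length; pos++)
    {
        if (toks[pos] == "(" || toks[pos] == "[") depth++;
        else if (toks[pos] == ")" || toks[pos] == "]")
        {
            if (--depth == 0) { pos++; return true; }
        }
    }
    return false;
}

// Skips `*`* then an identifier or `( Declarator )`, then any suffixes.
bool skipDeclarator(string[] toks, ref size_t pos)
{
    while (pos < toks.length && toks[pos] == "*") pos++;
    if (pos >= toks.length) return false;
    if (isIdent(toks[pos])) pos++;
    else if (toks[pos] == "(")
    {
        pos++;
        // Assumed rule: a bare `( identifier )` is not a declarator.
        if (pos + 1 < toks.length && isIdent(toks[pos]) && toks[pos + 1] == ")")
            return false;
        if (!skipDeclarator(toks, pos)) return false;
        if (pos >= toks.length || toks[pos] != ")") return false;
        pos++;
    }
    else return false;
    while (pos < toks.length && (toks[pos] == "(" || toks[pos] == "["))
        if (!skipBalanced(toks, pos)) return false;
    return true;
}

// The yes/no predicate: type name, declarator, then `;` or `=`.
bool looksLikeDeclaration(string[] toks)
{
    size_t pos = 0;
    if (!skipTypeName(toks, pos)) return false;
    if (!skipDeclarator(toks, pos)) return false;
    return pos < toks.length && (toks[pos] == ";" || toks[pos] == "=");
}

void main()
{
    assert(!looksLikeDeclaration("T ( t ) ;".split));
    assert(looksLikeDeclaration("t ( i [ ] ) ( i [ ] ) ;".split));
    assert(looksLikeDeclaration("a * b = d ;".split));
    assert(!looksLikeDeclaration("f ( 1 ) ;".split));
}
```

Under that assumed rule the toy classifies the two lines from the original post the same way the compiler comments there describe: `T(t);` as an expression and `t(i[])(i[]);` as a declaration. A real parser would run a second pass over the same tokens to build the AST once the predicate has answered.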

But that being said, I don't think there are actually any ambiguities
in the grammar when it comes to this.  Neither of the "problem" lines
in your example code could possibly be interpreted as declarations,
and I don't think I can come up with any actually ambiguous code.

Jun 23 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

Jarrett Billingsley wrote:
 On Tue, Jun 23, 2009 at 1:00 AM, Ellery
 Newcomer<ellery-newcomer utulsa.edu> wrote:
 Sorry for not posting this in learn, but I'd also like to hear the
 Language Designer's input on this one.

 How does dmd resolve the declaration/expression ambiguity?

 My first instinct would be to try the declaration, and if it doesn't
 work because the type doesn't exist or something like that then try the
 expression, or vice versa. But that could easily lead to undefined and
 unexpected behavior. what if both are valid?

 
 You're right; if a statement begins with an identifier, the compiler
 requires arbitrary lookahead to determine whether it's looking at an
 expression or a declaration.  There's a good bit of duplicated code in
 DMD dedicated to parsing declarations.  IIRC there's one version of
 the parsing that just returns whether or not it's "probably" a
 declaration, and another version that does the exact same thing but
 which actually builds the AST.  Kind of icky.

Heh. I saw that. I also saw a toExpression function in various
declaration structs. Didn't look deeply into it though.
 
 But that being said, I don't think there are actually any ambiguities
 in the grammar when it comes to this.  Neither of the "problem" lines
 in your example code could possibly be interpreted as declarations,
 and I don't think I can come up with any actually ambiguous code.

Wrong. Both are perfectly valid declarations (and did you miss my note?
the compiler *IS* interpreting the second as a declaration).
Okay, consider the rule declarator, which is (or should, if the grammar
wants to correctly reflect what the compiler is doing) defined like so

Declarator:
      BasicType2opt Identifier DeclaratorSuffixesopt
      BasicType2opt ( Declarator ) DeclaratorSuffixesopt

This is what allows D to accept C-style (forgot about those, didn't ya?)
declarations, and it's mostly what I'm referring to. Watch:

int(i); //compiles exactly the same as 'int i;'
int(*i)(int[]); //compiles the same as 'int function(int[]) i;'

So when I give something like

T(t);
t(*i)(i[]); //changed it a little, since 'int (i[])(int[])'
//            is semantically invalid

I intend a declaration and an expression. I get the opposite.
Fortunately, neither compiles, due to semantic errors.

To restate my question, if I'm a parser and I see

Identifier ( Identifier ) ;

which do I interpret it as?

Type ( NewSymbol ) ;
FunctionName ( Argument ) ;

If I see

Identifier . Identifier ( * Identifier ) ;

what do I resolve it as?

And it just goes downhill from there.

Jun 23 2009

BCS <none anon.com> writes:

Hello Ellery,

 This is what allows D to accept C-style (forgot about those, didn't
 ya?) declarations, and it's mostly what I'm referring to. Watch:

Yes, and now I wish even more that DMD would too :(

Jun 23 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Jun 23, 2009 at 8:35 PM, Ellery
Newcomer<ellery-newcomer utulsa.edu> wrote:

 Wrong. Both are perfectly valid declarations (and did you miss my note?
 the compiler *IS* interpreting the second as a declaration).
 Okay, consider the rule declarator, which is (or should, if the grammar
 wants to correctly reflect what the compiler is doing) defined like so

 Declarator:
       BasicType2opt Identifier DeclaratorSuffixesopt
       BasicType2opt ( Declarator ) DeclaratorSuffixesopt

Ah, fuck.  I can't believe D still accepts those.  All the ambiguity
probably goes away without them, huh.

Jun 23 2009

Tim Matthews <tim.matthews7 gmail.com> writes:

Ellery Newcomer wrote:

 
 To restate my question, if I'm a parser and I see
 
 Identifier ( Identifier ) ;
 
 which do I interpret it as?
 
 Type ( NewSymbol ) ;
 FunctionName ( Argument ) ;
 


After some incremental parsing iterations you should be able to 
gradually resolve dependencies for each expression. If it's not 
ambiguous on what the source is trying to describe and all its 
dependencies are resolved then you add the new types that it may be 
declaring to a collection of parsed types. Repeat until everything can 
be parsed and eventually you should know exactly what the first ID is 
(type, func etc). IIRC opCall can not be declared static.

Sorry if I am completely missing the point but this doesn't seem complex 
(in a problem solving sense but the code writing may be tedious)

Jun 23 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

Tim Matthews wrote:
 Ellery Newcomer wrote:
 
 To restate my question, if I'm a parser and I see

 Identifier ( Identifier ) ;

 which do I interpret it as?

 Type ( NewSymbol ) ;
 FunctionName ( Argument ) ;

 
 
 After some incremental parsing iterations you should be able to
 gradually resolve dependencies for each expression. If it's not
 ambiguous on what the source is trying to describe and all its
 dependencies are resolved then you add the new types that it may be
 declaring to a collection of parsed types. Repeat until everything can
 be parsed and eventually you should know exactly what the first ID is
 (type, func etc). IIRC opCall can not be declared static.

Remember back in D1 land when we didn't have struct constructors?
 
 Sorry if I am completely missing the point but this doesn't seem complex
 (in a problem solving sense but the code writing may be tedious)

Yeah, you're missing the point. The point is the D Language is billed as
one whose lexer is completely independent of its parser, which is
completely independent of its semantic analysis. The parser must be able
to decide all of these without any help from semantic. Anything less is
either failure or just plain wrong.

If you'll have another gander at my original example, you'll see that's
exactly what DMD does. The compiler decides that T(t) is an expression
and t(i[])(i[]) is a declaration, and if they don't resolve, then by
golly that's just too bad. It's an error. Game over.

It's mildly restrictive from the user's perspective, but from the
compiler writer's perspective, it is infinitely better than mixing
semantic and syntactic analysis. And anyways, T(t) can be rewritten the
normal way, and t(i[])(i[]) can be surrounded with parentheses to force
it to be an expression.
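
A short sketch of both rewrites, assuming a current D2 compiler. The `calls` counter is added purely for illustration and is not part of the original example.

```d
class T
{
    int calls; // illustration only: counts opCall invocations

    T opCall(int[] i)
    {
        calls++;
        return this;
    }
}

void main()
{
    int[4] i = [1, 2, 3, 4];

    // "The normal way": type first, then the new name.
    T t = new T;

    // Wrapped in parentheses, the statement no longer starts with an
    // identifier, so it is read as an expression: two chained opCalls.
    (t(i[])(i[]));
    assert(t.calls == 2);
}
```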

But the question remains: how does the compiler decide this? I'm hoping for
some simple rule like if it is a C-style declaration, then it must have
a suffix or prefix for each level. It seems to be behaving something
like this.

You are right, though, none of this is complex, just tedious. Reading
the compiler's source code especially, though it sounds like I'm not
going to get answers any other way.

Jun 24 2009

Tim Matthews <tim.matthews7 gmail.com> writes:

Ellery Newcomer wrote:

 
 Yeah, you're missing the point. The point is the D Language is billed as
 one whose lexer is completely independent of its parser, which is
 completely independent of its semantic analysis. The parser must be able
 to decide all of these without any help from semantic. Anything less is
 either failure or just plain wrong.


So you are following those steps from 
http://digitalmars.com/d/2.0/lex.html. I don't think these are strict 
requirements that a tool must meet to be called a D language parser, or 
that they prevent you from re-parsing. I think the page is really just 
pointing out the first few steps, and perhaps should be rewritten as:

1 source character set
The source file is checked to see what character set it is, and the 
appropriate scanner is loaded. ASCII and UTF formats are accepted.
2 script line

3 parse

Also from this page http://digitalmars.com/d/2.0/overview.html
features to drop: C source code compatibility.

This is not valid code with dmd v2.030 because D is not strictly 
compatible with C/C++.

struct A
{
     int i;
}

void main()
{
     A(a);
}

I've now tested that with structs, classes, and a typedef'd int, none 
of which worked. This does compile, however:

void main()
{
     int(a);
     a = 2;
}

From that, dmd compatibility should be far simpler, but going beyond that 
would be nicer.

 it is infinitely better than mixing
 semantic and syntactic analysis

I didn't recommend that.


 But the question remains: how does the compiler decide this?

Built in types have the extra C compatibility. Dmd doesn't like this 
though and if it matters to you enough, report it as a bug:

alias int T;

void main()
{
     T(a);
     a = 2;
}


 
 You are right, though, none of this is complex, just tedious. Reading
 the compiler's source code especially, though it sounds like I'm not
 going to get answers any other way.


If you have a parser that allows that syntax to work a bit more, then 
could you please provide an example of code here that is completely 
ambiguous to the compiler?

Jun 24 2009
