digitalmars.D.learn - How do I find the arity of an Expression? (dmd hacking)

Chad J (7/7) Nov 29 2009 Given an Expression object in dmd, I'd like to know how many

Ary Borenszweig (4/12) Nov 30 2009 Where's the Expression object defined in D? I don't think there's such a...

Ellery Newcomer (2/14) Nov 30 2009 DMD source, eg parse.c, expression.c, etc

Chad J (2/18) Nov 30 2009 Right. That.

Ellery Newcomer (17/35) Nov 30 2009 Not that I know anything about DMD outside parse.c, but in expression.h,...

Ary Borenszweig (3/47) Nov 30 2009 It would be better if dmd's source code implemented the visitor pattern....

Chad J (96/122) Nov 30 2009 I think I'll reply to both of you in one post since the thoughts are

Ellery Newcomer (32/72) Nov 30 2009 Personally, I've come to hate this pattern (just from reading DMD src).

Chad J (45/121) Dec 01 2009 Yeah, I can see that. There seems to be a lot of duplicated patterns

Ellery Newcomer (5/17) Dec 01 2009 expression.h

Don (3/31) Dec 03 2009 It works in the version of DMD in svn. Everything in Andrei's book

Chad J <chadjoan __spam.is.bad__gmail.com> writes:

Given an Expression object in dmd, I'd like to know how many
subexpressions it contains and, even better, iterate over them.  I'd
like to do this in a general way, without having to create cases for all
of the different kinds of Expressions.  Is there some way to do this
that I've been missing?

Thanks,
- Chad

Nov 29 2009

Ary Borenszweig <ary esperanto.org.ar> writes:

Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them.  I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions.  Is there some way to do this
 that I've been missing?
 
 Thanks,
 - Chad

Where's the Expression object defined in D? I don't think there's such a 
thing (I mean, a reflection capability to get an expression out of 
something).

Nov 30 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad

 Where's the Expression object defined in D? I don't think there's such a
 thing (I mean, a reflection capability to get an expression out of
 something).

DMD source, eg parse.c, expression.c, etc

Nov 30 2009

Chad J <chadjoan __spam.is.bad__gmail.com> writes:

Ellery Newcomer wrote:
 On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad

 Where's the Expression object defined in D? I don't think there's such a
 thing (I mean, a reflection capability to get an expression out of
 something).

 
 DMD source, eg parse.c, expression.c, etc

Right.  That.

Nov 30 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 11/30/2009 12:32 PM, Chad J wrote:
 Ellery Newcomer wrote:
 On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad

 Where's the Expression object defined in D? I don't think there's such a
 thing (I mean, a reflection capability to get an expression out of
 something).

 DMD source, eg parse.c, expression.c, etc

 Right.  That.

Not that I know anything about DMD outside parse.c, but in expression.h, 
there be decls along the lines of

struct UnaExp{
   Expression* e1;
}
struct BinExp{
   Expression* e1;
   Expression* e2;
}

Iterating them would just be a tree walk. That takes care of 90% and ...

Oh. Yeah. There are a bunch of special cases.

But from what I've seen, iterating over an expression or any part of the 
AST involves implementing a function (like semantic) at each subclass. 
I'm betting there is no other way to do this.

I love ANTLR. It generates an arbitrary child/sibling AST, and walking 
it is a piece of cake.

Nov 30 2009

Ary Borenszweig <ary esperanto.org.ar> writes:

Ellery Newcomer wrote:
 On 11/30/2009 12:32 PM, Chad J wrote:
 Ellery Newcomer wrote:
 On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases 
 for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad

 Where's the Expression object defined in D? I don't think there's 
 such a
 thing (I mean, a reflection capability to get an expression out of
 something).

 DMD source, eg parse.c, expression.c, etc

 Right.  That.

 
 Not that I know anything about DMD outside parse.c, but in expression.h, 
 there be decls along the lines of
 
 struct UnaExp{
   Expression* e1;
 }
 struct BinExp{
   Expression* e1;
   Expression* e2;
 }
 
 Iterating them would just be a tree walk. That takes care of 90% and ...
 
 Oh. Yeah. There are a bunch of special cases.
 
 But from what I've seen, iterating over an expression or any part of the 
 AST involves implementing a function (like semantic) at each subclass. 
 I'm betting there is no other way to do this.
 
 I love ANTLR. It generates an arbitrary child/sibling AST, and walking 
 it is a piece of cake.

It would be better if dmd's source code implemented the visitor pattern. 
  Descent's port implements it and it uses it a a lot of places.

Nov 30 2009

Chad J <chadjoan __spam.is.bad__gmail.com> writes:

I think I'll reply to both of you in one post since the thoughts are
related.

Ary Borenszweig wrote:
 Ellery Newcomer wrote:
 Not that I know anything about DMD outside parse.c, but in
 expression.h, there be decls along the lines of

 struct UnaExp{
   Expression* e1;
 }
 struct BinExp{
   Expression* e1;
   Expression* e2;
 }

 Iterating them would just be a tree walk. That takes care of 90% and ...

 Oh. Yeah. There are a bunch of special cases.

 But from what I've seen, iterating over an expression or any part of
 the AST involves implementing a function (like semantic) at each
 subclass. I'm betting there is no other way to do this.


D'oh.  Yep that's what I was noticing too.  Thanks for confirming that.

 I love ANTLR. It generates an arbitrary child/sibling AST, and walking
 it is a piece of cake.

 
 It would be better if dmd's source code implemented the visitor pattern.
  Descent's port implements it and it uses it a a lot of places.

I wonder if Walter would go for it.  Is he very receptive of purely
architectural patches for dmd?

...

If you care for a bit of reading, I'll tell you my story and ping an idea.

This is about the property expression rewrite of course.  I'd love to
just use the current convention in dmd and write the rewrite as a
non-recursive function that gets called at every point in the tree
whenever someExpr->semantic(sc) is called.  However, there's a snag.

When doing the property rewrite, the inner-most subexpression will need
to generate the outermost temporaries.

An example:

prop1.prop2 += foo;

AFAIK, the compiler sees this as ((prop1).prop2) += foo; or
((prop1()).prop2()) += foo; after resolveProperties is called.  So prop1
is the innermost expression.

It would be rewritten as this comma expression:

(auto t1 = prop1()),
(auto t2 = t1.prop2()),
(t2 += foo),
(t1.prop2(t2)),
(prop1(t1))

So t1 is the outermost temporary, and it corresponds to prop1, the
innermost expression.

This is problematic in dmd's traditional approach, because the semantic
pass on prop1 does not have access to it's enclosing expression (as far
as I can tell).  If it did, I'd just go to the root of the tree, uproot
the tree, stick it in a comma expression, and just start nesting:

CommaExp // Step 1a for the rewrite.
|
+--> auto t1 = prop1()
|
+--> CommaExp // Step 1b
.    |
.    +--> CommaExp // Step 2a
.    |    |
.    |    +--> auto t2 = t1.prop2()
.    |    |
.    |    +--> CommaExp // Step 2b
.    |         |
.    |         +--> t2 += foo; // Originally the root.
.    |         |
.    |         +--> t1.prop2(t2);
.    |
.    +--> prop1(t1)

Then as long as the innermost expression does its rewrite before its
parent, then it will work.  If it doesn't go first, well I dunno.  I
don't seem to have this option anyways.  I can't find a way to get the
root.

Perhaps adding a root expression pointer to the Scope (sc) object that
gets tossed around semantic would work, but I haven't considered the
bookkeeping involved with that.  I already started taking a different
approach before thinking of that.

My plan then is to do the rewrite after the root's expr->semantic(sc)
call.  It will probably look like expr->doPropertyRewrite(), and be
called from statement.c.  But in there will be the guts of the function
that has a signature like this:

Expression *doPropertyRewriteGreedily(
    Expression *expr,    // The current expression being rewritten.
    Expressions *prolog, // Temporary decls and getter calls go here.
    Expressions *epilog, // Setter calls go here.
    int treatAsLvalue    // Is this expression mutated by its parent?
    )

This works out pretty nice since adding the temporaries and such into
the prolog and epilog looks very natural.  The function just calls
itself to walk the AST.

The problem is that walking the AST thing.  The function has to know
about expr's children in order to recurse on them.  This is easy in some
cases like postfix ++ and --, because that pattern is represented by one
subclass of Expression: PostExp.  Same for the properties themselves
(CallExp).  But then there's the more general patterns like all other
impure expressions (+=,-=,*=,/=, etc) and all pure expressions ever
(+,-,*,/,%,&,|, pure functions, literals, etc).  In a lot of those cases
the rewrite doesn't care about analyzing the expression beyond a certain
point, or at all, and it just wants to forward the children to the next
recursion level.  That's where I hit this iteration problem and all of
the special cases.

Now for the idea.

Currently I'm planning an approach that's a bit more minimalistic than
the visitor pattern; just to solve my immediate problem.  I will define
these methods in Expression:

virtual Expression *getChild(unsigned index);
virtual void setChild(unsigned index, Expression* val);
virtual unsigned numChildren();

Then I will implement them as needed in the subclasses of Expression.  I
figure I'd have to write this logic into my function anyways, so I might
as well make this usable to others.  I've already written a draft of the
property rewrite that uses these, so I know they get the job done in
principle.  I just hope this will all be OK in the patch (or even holds
up under debugging).

So yeah, if I haven't put you to sleep already, please let me know if I
messed up somewhere.

- Chad

Nov 30 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 11/30/2009 07:59 PM, Chad J wrote:
 If you care for a bit of reading, I'll tell you my story and ping an idea.

 This is about the property expression rewrite of course.  I'd love to
 just use the current convention in dmd and write the rewrite as a
 non-recursive function that gets called at every point in the tree
 whenever someExpr->semantic(sc) is called.  However, there's a snag.

Personally, I've come to hate this pattern (just from reading DMD src). 
It seems like the antithesis of code reuse.

 When doing the property rewrite, the inner-most subexpression will need
 to generate the outermost temporaries.

 An example:

 prop1.prop2 += foo;

 AFAIK, the compiler sees this as ((prop1).prop2) += foo; or
 ((prop1()).prop2()) += foo; after resolveProperties is called.  So prop1
 is the innermost expression.

 It would be rewritten as this comma expression:

 (auto t1 = prop1()),
 (auto t2 = t1.prop2()),
 (t2 += foo),
 (t1.prop2(t2)),
 (prop1(t1))

This is an interesting problem. I like it. Is this rewrite condoned by 
the powers that be? It'd be fun to implement if I ever get this far in 
my semantic analyzer.

 So t1 is the outermost temporary, and it corresponds to prop1, the
 innermost expression.

 This is problematic in dmd's traditional approach, because the semantic
 pass on prop1 does not have access to it's enclosing expression (as far
 as I can tell).  If it did, I'd just go to the root of the tree, uproot
 the tree, stick it in a comma expression, and just start nesting:

I'm not convinced you need access to the enclosing expression to make 
this rewrite happen. Working it out on paper, it appears to be a matter 
of a few AST copies, keeping track of your t's, and building your comma 
tree (in two directions simultaneously) as you traverse the AST via eg 
semantic.

Here's some chicken scratch that more or less illustrates what I 
envision going on (I can't attest that DMD's ASTs look exactly like 
this, but I assume they're similar):

http://personal.utulsa.edu/~ellery-newcomer/scribble.png

How much effort it would entail is another matter though. I would assume 
you could just add a field or two to struct Expression for pushing 
information up the tree. It seems like copying functionality is already 
there. Pushing 't' information across the tree would probably propagate 
to each special case and be the most annoying part. But basically, every 
time you come across a property, you generate a new symbol and pair of 
pre/post trees. Then at the top, you make a single tree copy with the 
last symbol generated.

 Now for the idea.

 Currently I'm planning an approach that's a bit more minimalistic than
 the visitor pattern; just to solve my immediate problem.  I will define
 these methods in Expression:

 virtual Expression *getChild(unsigned index);
 virtual void setChild(unsigned index, Expression* val);
 virtual unsigned numChildren();

I can imagine this would be welcome regardless of how you set about your 
rewrite procedure. I came across a need for something like this as I was 
building my semantic analyzer a while back. Can't wait for school to get 
out so I can get back to work.

 Then I will implement them as needed in the subclasses of Expression.  I
 figure I'd have to write this logic into my function anyways, so I might
 as well make this usable to others.  I've already written a draft of the
 property rewrite that uses these, so I know they get the job done in
 principle.  I just hope this will all be OK in the patch (or even holds
 up under debugging).

 So yeah, if I haven't put you to sleep already, please let me know if I
 messed up somewhere.

I do have one nitpick:

auto t1 = p1();

isn't an expression, it's a declaration. That could potentially make 
things difficult for you, particularly when your property assignment is 
deep within an expression tree.

 - Chad

Nov 30 2009

Chad J <chadjoan __spam.is.bad__gmail.com> writes:

Ellery Newcomer wrote:
On 11/30/2009 07:59 PM, Chad J wrote:
This is about the property expression rewrite of course. I'd love to
just use the current convention in dmd and write the rewrite as a
non-recursive function that gets called at every point in the tree
whenever someExpr->semantic(sc) is called. However, there's a snag.

Personally, I've come to hate this pattern (just from reading DMD src).
It seems like the antithesis of code reuse.

Yeah, I can see that. There seems to be a lot of duplicated patterns
and things done by convention. That makes it a little more difficult to
figure out what is going on. (It's done this way... but *why*?)

I just don't want to alienate Walter from my patch by defying the style.
It may be inevitable in a minor way though.

When doing the property rewrite, the inner-most subexpression will need
to generate the outermost temporaries.

An example:

prop1.prop2 += foo;

AFAIK, the compiler sees this as ((prop1).prop2) += foo; or
((prop1()).prop2()) += foo; after resolveProperties is called. So prop1
is the innermost expression.

It would be rewritten as this comma expression:

(auto t1 = prop1()),
(auto t2 = t1.prop2()),
(t2 += foo),
(t1.prop2(t2)),
(prop1(t1))

This is an interesting problem. I like it. Is this rewrite condoned by
the powers that be? It'd be fun to implement if I ever get this far in
my semantic analyzer.

No guarantees, but a lot of promise.

http://erdani.com/d/thermopylae.pdf
On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the
bottom: notice how Andrei seems to be hedging on properties working
correctly.

Now I can envision that behavior with array length being hacked in for
special cases. That would make the book example work but lack
generality since it wouldn't work for arbitrary expressions and maybe
not user-defined properties either.

Walter has already swung for explicit properties that require syntax
addition to the language (and annotations at the same time!) :
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=16862

I think if he'll go for explicit properties, he'll want them to work
correctly too. Think about it... even if the compiler knows that the
appearance of some variable in an expression is actually a property, how
is it going to figure out whether or not the setter should be called and
what it's going to pass to the setter?

I conjecture that this property expression rewrite is quite necessary
for properties to work unsurprisingly in arbitrary side-effectful
expressions. Whether the properties are implicit or explicit is
unrelated, and affects other things instead.

So Walter is going to have to accept this or he will be in for some
nasty surprises when he tries to make his explicit properties work.

(Of course he could always decide that properties aren't worth it. That
would be unfortunate.)

http://prowiki.org/wiki4d/wiki.cgi?DocComments/Property
For more info, if you haven't seen it already.

So t1 is the outermost temporary, and it corresponds to prop1, the
innermost expression.

This is problematic in dmd's traditional approach, because the semantic
pass on prop1 does not have access to it's enclosing expression (as far
as I can tell). If it did, I'd just go to the root of the tree, uproot
the tree, stick it in a comma expression, and just start nesting:

I'm not convinced you need access to the enclosing expression to make
this rewrite happen. Working it out on paper, it appears to be a matter
of a few AST copies, keeping track of your t's, and building your comma
tree (in two directions simultaneously) as you traverse the AST via eg
semantic.

Here's some chicken scratch that more or less illustrates what I
envision going on (I can't attest that DMD's ASTs look exactly like
this, but I assume they're similar):

http://personal.utulsa.edu/~ellery-newcomer/scribble.png

How much effort it would entail is another matter though. I would assume
you could just add a field or two to struct Expression for pushing
information up the tree. It seems like copying functionality is already
there. Pushing 't' information across the tree would probably propagate
to each special case and be the most annoying part. But basically, every
time you come across a property, you generate a new symbol and pair of
pre/post trees. Then at the top, you make a single tree copy with the
last symbol generated.

Yeah. I think you're right.

So yeah, if I haven't put you to sleep already, please let me know if I
messed up somewhere.

I do have one nitpick:

auto t1 = p1();

isn't an expression, it's a declaration. That could potentially make
things difficult for you, particularly when your property assignment is
deep within an expression tree.

Oh my.

I just tried out a comma expression with declarations in some D code and
dmd didn't like it at all. In hindsight I should've checked that a long
time ago.

I did this though because I saw it happening in other parts of
expression.c. Maybe I missed some details. Or maybe the backend will
be fine with stuff like this. I wonder.

Yeah, if I can't put declaration expressions into other expressions,
then things may get ugly. I'd probably have to put more code into
statement.c at least.

Eh, I'll know how things work out soon enough.

Dec 01 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 12/01/2009 02:35 PM, Chad J wrote:
 No guarantees, but a lot of promise.

 http://erdani.com/d/thermopylae.pdf
 On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the
 bottom: notice how Andrei seems to be hedging on properties working
 correctly.

Oh goodie. We're going to get arr.length += 4; to actually work.


 I did this though because I saw it happening in other parts of
 expression.c.  Maybe I missed some details.  Or maybe the backend will
 be fine with stuff like this.  I wonder.

 Yeah, if I can't put declaration expressions into other expressions,
 then things may get ugly.  I'd probably have to put more code into
 statement.c at least.

 Eh, I'll know how things work out soon enough.

expression.h

struct DeclarationExp

looks like you'll be fine and what I said applies really only to the parser.

Dec 01 2009

Don <nospam nospam.com> writes:

Ellery Newcomer wrote:
 On 12/01/2009 02:35 PM, Chad J wrote:
 No guarantees, but a lot of promise.

 http://erdani.com/d/thermopylae.pdf
 On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the
 bottom: notice how Andrei seems to be hedging on properties working
 correctly.

 
 Oh goodie. We're going to get arr.length += 4; to actually work.

It works in the version of DMD in svn. Everything in Andrei's book 
should be working now.

 I did this though because I saw it happening in other parts of
 expression.c.  Maybe I missed some details.  Or maybe the backend will
 be fine with stuff like this.  I wonder.

 Yeah, if I can't put declaration expressions into other expressions,
 then things may get ugly.  I'd probably have to put more code into
 statement.c at least.

 Eh, I'll know how things work out soon enough.

 
 expression.h
 
 struct DeclarationExp
 
 looks like you'll be fine and what I said applies really only to the 
 parser.

Dec 03 2009

D Programming

C/C++ Programming

Other

digitalmars.D.learn - How do I find the arity of an Expression? (dmd hacking)