www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - How do I find the arity of an Expression? (dmd hacking)

reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
Given an Expression object in dmd, I'd like to know how many
subexpressions it contains and, even better, iterate over them.  I'd
like to do this in a general way, without having to create cases for all
of the different kinds of Expressions.  Is there some way to do this
that I've been missing?

Thanks,
- Chad
Nov 29 2009
parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them.  I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions.  Is there some way to do this
 that I've been missing?
 
 Thanks,
 - Chad
Where's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
Nov 30 2009
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad
Where's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
DMD source, eg parse.c, expression.c, etc
Nov 30 2009
parent reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
Ellery Newcomer wrote:
 On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad
Where's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
DMD source, eg parse.c, expression.c, etc
Right. That.
Nov 30 2009
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 11/30/2009 12:32 PM, Chad J wrote:
 Ellery Newcomer wrote:
 On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad
Where's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
DMD source, eg parse.c, expression.c, etc
Right. That.
Not that I know anything about DMD outside parse.c, but in expression.h, there be decls along the lines of struct UnaExp{ Expression* e1; } struct BinExp{ Expression* e1; Expression* e2; } Iterating them would just be a tree walk. That takes care of 90% and ... Oh. Yeah. There are a bunch of special cases. But from what I've seen, iterating over an expression or any part of the AST involves implementing a function (like semantic) at each subclass. I'm betting there is no other way to do this. I love ANTLR. It generates an arbitrary child/sibling AST, and walking it is a piece of cake.
Nov 30 2009
parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Ellery Newcomer wrote:
 On 11/30/2009 12:32 PM, Chad J wrote:
 Ellery Newcomer wrote:
 On 11/30/2009 03:53 AM, Ary Borenszweig wrote:
 Chad J wrote:
 Given an Expression object in dmd, I'd like to know how many
 subexpressions it contains and, even better, iterate over them. I'd
 like to do this in a general way, without having to create cases 
 for all
 of the different kinds of Expressions. Is there some way to do this
 that I've been missing?

 Thanks,
 - Chad
Where's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
DMD source, eg parse.c, expression.c, etc
Right. That.
Not that I know anything about DMD outside parse.c, but in expression.h, there be decls along the lines of struct UnaExp{ Expression* e1; } struct BinExp{ Expression* e1; Expression* e2; } Iterating them would just be a tree walk. That takes care of 90% and ... Oh. Yeah. There are a bunch of special cases. But from what I've seen, iterating over an expression or any part of the AST involves implementing a function (like semantic) at each subclass. I'm betting there is no other way to do this. I love ANTLR. It generates an arbitrary child/sibling AST, and walking it is a piece of cake.
It would be better if dmd's source code implemented the visitor pattern. Descent's port implements it and it uses it a a lot of places.
Nov 30 2009
parent reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
I think I'll reply to both of you in one post since the thoughts are
related.

Ary Borenszweig wrote:
 Ellery Newcomer wrote:
 Not that I know anything about DMD outside parse.c, but in
 expression.h, there be decls along the lines of

 struct UnaExp{
   Expression* e1;
 }
 struct BinExp{
   Expression* e1;
   Expression* e2;
 }

 Iterating them would just be a tree walk. That takes care of 90% and ...

 Oh. Yeah. There are a bunch of special cases.

 But from what I've seen, iterating over an expression or any part of
 the AST involves implementing a function (like semantic) at each
 subclass. I'm betting there is no other way to do this.
D'oh. Yep that's what I was noticing too. Thanks for confirming that.
 I love ANTLR. It generates an arbitrary child/sibling AST, and walking
 it is a piece of cake.
It would be better if dmd's source code implemented the visitor pattern. Descent's port implements it and it uses it a a lot of places.
I wonder if Walter would go for it. Is he very receptive of purely architectural patches for dmd? ... If you care for a bit of reading, I'll tell you my story and ping an idea. This is about the property expression rewrite of course. I'd love to just use the current convention in dmd and write the rewrite as a non-recursive function that gets called at every point in the tree whenever someExpr->semantic(sc) is called. However, there's a snag. When doing the property rewrite, the inner-most subexpression will need to generate the outermost temporaries. An example: prop1.prop2 += foo; AFAIK, the compiler sees this as ((prop1).prop2) += foo; or ((prop1()).prop2()) += foo; after resolveProperties is called. So prop1 is the innermost expression. It would be rewritten as this comma expression: (auto t1 = prop1()), (auto t2 = t1.prop2()), (t2 += foo), (t1.prop2(t2)), (prop1(t1)) So t1 is the outermost temporary, and it corresponds to prop1, the innermost expression. This is problematic in dmd's traditional approach, because the semantic pass on prop1 does not have access to it's enclosing expression (as far as I can tell). If it did, I'd just go to the root of the tree, uproot the tree, stick it in a comma expression, and just start nesting: CommaExp // Step 1a for the rewrite. | +--> auto t1 = prop1() | +--> CommaExp // Step 1b . | . +--> CommaExp // Step 2a . | | . | +--> auto t2 = t1.prop2() . | | . | +--> CommaExp // Step 2b . | | . | +--> t2 += foo; // Originally the root. . | | . | +--> t1.prop2(t2); . | . +--> prop1(t1) Then as long as the innermost expression does its rewrite before its parent, then it will work. If it doesn't go first, well I dunno. I don't seem to have this option anyways. I can't find a way to get the root. Perhaps adding a root expression pointer to the Scope (sc) object that gets tossed around semantic would work, but I haven't considered the bookkeeping involved with that. I already started taking a different approach before thinking of that. My plan then is to do the rewrite after the root's expr->semantic(sc) call. It will probably look like expr->doPropertyRewrite(), and be called from statement.c. But in there will be the guts of the function that has a signature like this: Expression *doPropertyRewriteGreedily( Expression *expr, // The current expression being rewritten. Expressions *prolog, // Temporary decls and getter calls go here. Expressions *epilog, // Setter calls go here. int treatAsLvalue // Is this expression mutated by its parent? ) This works out pretty nice since adding the temporaries and such into the prolog and epilog looks very natural. The function just calls itself to walk the AST. The problem is that walking the AST thing. The function has to know about expr's children in order to recurse on them. This is easy in some cases like postfix ++ and --, because that pattern is represented by one subclass of Expression: PostExp. Same for the properties themselves (CallExp). But then there's the more general patterns like all other impure expressions (+=,-=,*=,/=, etc) and all pure expressions ever (+,-,*,/,%,&,|, pure functions, literals, etc). In a lot of those cases the rewrite doesn't care about analyzing the expression beyond a certain point, or at all, and it just wants to forward the children to the next recursion level. That's where I hit this iteration problem and all of the special cases. Now for the idea. Currently I'm planning an approach that's a bit more minimalistic than the visitor pattern; just to solve my immediate problem. I will define these methods in Expression: virtual Expression *getChild(unsigned index); virtual void setChild(unsigned index, Expression* val); virtual unsigned numChildren(); Then I will implement them as needed in the subclasses of Expression. I figure I'd have to write this logic into my function anyways, so I might as well make this usable to others. I've already written a draft of the property rewrite that uses these, so I know they get the job done in principle. I just hope this will all be OK in the patch (or even holds up under debugging). So yeah, if I haven't put you to sleep already, please let me know if I messed up somewhere. - Chad
Nov 30 2009
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 11/30/2009 07:59 PM, Chad J wrote:
 If you care for a bit of reading, I'll tell you my story and ping an idea.

 This is about the property expression rewrite of course.  I'd love to
 just use the current convention in dmd and write the rewrite as a
 non-recursive function that gets called at every point in the tree
 whenever someExpr->semantic(sc) is called.  However, there's a snag.
Personally, I've come to hate this pattern (just from reading DMD src). It seems like the antithesis of code reuse.
 When doing the property rewrite, the inner-most subexpression will need
 to generate the outermost temporaries.

 An example:

 prop1.prop2 += foo;

 AFAIK, the compiler sees this as ((prop1).prop2) += foo; or
 ((prop1()).prop2()) += foo; after resolveProperties is called.  So prop1
 is the innermost expression.

 It would be rewritten as this comma expression:

 (auto t1 = prop1()),
 (auto t2 = t1.prop2()),
 (t2 += foo),
 (t1.prop2(t2)),
 (prop1(t1))
This is an interesting problem. I like it. Is this rewrite condoned by the powers that be? It'd be fun to implement if I ever get this far in my semantic analyzer.
 So t1 is the outermost temporary, and it corresponds to prop1, the
 innermost expression.

 This is problematic in dmd's traditional approach, because the semantic
 pass on prop1 does not have access to it's enclosing expression (as far
 as I can tell).  If it did, I'd just go to the root of the tree, uproot
 the tree, stick it in a comma expression, and just start nesting:
I'm not convinced you need access to the enclosing expression to make this rewrite happen. Working it out on paper, it appears to be a matter of a few AST copies, keeping track of your t's, and building your comma tree (in two directions simultaneously) as you traverse the AST via eg semantic. Here's some chicken scratch that more or less illustrates what I envision going on (I can't attest that DMD's ASTs look exactly like this, but I assume they're similar): http://personal.utulsa.edu/~ellery-newcomer/scribble.png How much effort it would entail is another matter though. I would assume you could just add a field or two to struct Expression for pushing information up the tree. It seems like copying functionality is already there. Pushing 't' information across the tree would probably propagate to each special case and be the most annoying part. But basically, every time you come across a property, you generate a new symbol and pair of pre/post trees. Then at the top, you make a single tree copy with the last symbol generated.
 Now for the idea.

 Currently I'm planning an approach that's a bit more minimalistic than
 the visitor pattern; just to solve my immediate problem.  I will define
 these methods in Expression:

 virtual Expression *getChild(unsigned index);
 virtual void setChild(unsigned index, Expression* val);
 virtual unsigned numChildren();
I can imagine this would be welcome regardless of how you set about your rewrite procedure. I came across a need for something like this as I was building my semantic analyzer a while back. Can't wait for school to get out so I can get back to work.
 Then I will implement them as needed in the subclasses of Expression.  I
 figure I'd have to write this logic into my function anyways, so I might
 as well make this usable to others.  I've already written a draft of the
 property rewrite that uses these, so I know they get the job done in
 principle.  I just hope this will all be OK in the patch (or even holds
 up under debugging).

 So yeah, if I haven't put you to sleep already, please let me know if I
 messed up somewhere.
I do have one nitpick: auto t1 = p1(); isn't an expression, it's a declaration. That could potentially make things difficult for you, particularly when your property assignment is deep within an expression tree.
 - Chad
Nov 30 2009
parent reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
Ellery Newcomer wrote:
 On 11/30/2009 07:59 PM, Chad J wrote:
 This is about the property expression rewrite of course.  I'd love to
 just use the current convention in dmd and write the rewrite as a
 non-recursive function that gets called at every point in the tree
 whenever someExpr->semantic(sc) is called.  However, there's a snag.
Personally, I've come to hate this pattern (just from reading DMD src). It seems like the antithesis of code reuse.
Yeah, I can see that. There seems to be a lot of duplicated patterns and things done by convention. That makes it a little more difficult to figure out what is going on. (It's done this way... but *why*?) I just don't want to alienate Walter from my patch by defying the style. It may be inevitable in a minor way though.
 When doing the property rewrite, the inner-most subexpression will need
 to generate the outermost temporaries.

 An example:

 prop1.prop2 += foo;

 AFAIK, the compiler sees this as ((prop1).prop2) += foo; or
 ((prop1()).prop2()) += foo; after resolveProperties is called.  So prop1
 is the innermost expression.

 It would be rewritten as this comma expression:

 (auto t1 = prop1()),
 (auto t2 = t1.prop2()),
 (t2 += foo),
 (t1.prop2(t2)),
 (prop1(t1))
This is an interesting problem. I like it. Is this rewrite condoned by the powers that be? It'd be fun to implement if I ever get this far in my semantic analyzer.
No guarantees, but a lot of promise. http://erdani.com/d/thermopylae.pdf On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the bottom: notice how Andrei seems to be hedging on properties working correctly. Now I can envision that behavior with array length being hacked in for special cases. That would make the book example work but lack generality since it wouldn't work for arbitrary expressions and maybe not user-defined properties either. Walter has already swung for explicit properties that require syntax addition to the language (and annotations at the same time!) : http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=16862 I think if he'll go for explicit properties, he'll want them to work correctly too. Think about it... even if the compiler knows that the appearance of some variable in an expression is actually a property, how is it going to figure out whether or not the setter should be called and what it's going to pass to the setter? I conjecture that this property expression rewrite is quite necessary for properties to work unsurprisingly in arbitrary side-effectful expressions. Whether the properties are implicit or explicit is unrelated, and affects other things instead. So Walter is going to have to accept this or he will be in for some nasty surprises when he tries to make his explicit properties work. (Of course he could always decide that properties aren't worth it. That would be unfortunate.) http://prowiki.org/wiki4d/wiki.cgi?DocComments/Property For more info, if you haven't seen it already.
 So t1 is the outermost temporary, and it corresponds to prop1, the
 innermost expression.

 This is problematic in dmd's traditional approach, because the semantic
 pass on prop1 does not have access to it's enclosing expression (as far
 as I can tell).  If it did, I'd just go to the root of the tree, uproot
 the tree, stick it in a comma expression, and just start nesting:
I'm not convinced you need access to the enclosing expression to make this rewrite happen. Working it out on paper, it appears to be a matter of a few AST copies, keeping track of your t's, and building your comma tree (in two directions simultaneously) as you traverse the AST via eg semantic. Here's some chicken scratch that more or less illustrates what I envision going on (I can't attest that DMD's ASTs look exactly like this, but I assume they're similar): http://personal.utulsa.edu/~ellery-newcomer/scribble.png How much effort it would entail is another matter though. I would assume you could just add a field or two to struct Expression for pushing information up the tree. It seems like copying functionality is already there. Pushing 't' information across the tree would probably propagate to each special case and be the most annoying part. But basically, every time you come across a property, you generate a new symbol and pair of pre/post trees. Then at the top, you make a single tree copy with the last symbol generated.
Yeah. I think you're right.
 So yeah, if I haven't put you to sleep already, please let me know if I
 messed up somewhere.
I do have one nitpick: auto t1 = p1(); isn't an expression, it's a declaration. That could potentially make things difficult for you, particularly when your property assignment is deep within an expression tree.
Oh my. I just tried out a comma expression with declarations in some D code and dmd didn't like it at all. In hindsight I should've checked that a long time ago. I did this though because I saw it happening in other parts of expression.c. Maybe I missed some details. Or maybe the backend will be fine with stuff like this. I wonder. Yeah, if I can't put declaration expressions into other expressions, then things may get ugly. I'd probably have to put more code into statement.c at least. Eh, I'll know how things work out soon enough.
Dec 01 2009
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 12/01/2009 02:35 PM, Chad J wrote:
 No guarantees, but a lot of promise.

 http://erdani.com/d/thermopylae.pdf
 On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the
 bottom: notice how Andrei seems to be hedging on properties working
 correctly.
Oh goodie. We're going to get arr.length += 4; to actually work.
 I did this though because I saw it happening in other parts of
 expression.c.  Maybe I missed some details.  Or maybe the backend will
 be fine with stuff like this.  I wonder.

 Yeah, if I can't put declaration expressions into other expressions,
 then things may get ugly.  I'd probably have to put more code into
 statement.c at least.

 Eh, I'll know how things work out soon enough.
expression.h struct DeclarationExp looks like you'll be fine and what I said applies really only to the parser.
Dec 01 2009
parent Don <nospam nospam.com> writes:
Ellery Newcomer wrote:
 On 12/01/2009 02:35 PM, Chad J wrote:
 No guarantees, but a lot of promise.

 http://erdani.com/d/thermopylae.pdf
 On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the
 bottom: notice how Andrei seems to be hedging on properties working
 correctly.
Oh goodie. We're going to get arr.length += 4; to actually work.
It works in the version of DMD in svn. Everything in Andrei's book should be working now.
 I did this though because I saw it happening in other parts of
 expression.c.  Maybe I missed some details.  Or maybe the backend will
 be fine with stuff like this.  I wonder.

 Yeah, if I can't put declaration expressions into other expressions,
 then things may get ugly.  I'd probably have to put more code into
 statement.c at least.

 Eh, I'll know how things work out soon enough.
expression.h struct DeclarationExp looks like you'll be fine and what I said applies really only to the parser.
Dec 03 2009