digitalmars.D - The magic behind foreach (was: Re: Descent 0.5.3 released)
- Ary Borenszweig (97/106) Jan 21 2009 I still have to work on some stuff, but...
- Ary Borenszweig (3/134) Jan 21 2009 I mean:
- Denis Koroskin (9/117) Jan 21 2009 A side question: does foreach loop allocate memory (for delegate) in D2?...
- Christopher Wright (4/14) Jan 22 2009 This only fails if you wish to take a particular action when the calling...
- Bill Baxter (12/27) Jan 22 2009 It's not?
- Ary Borenszweig (55/85) Jan 22 2009 Why do you mean by "fails"? The compiler transforms the foreach's body,
- Bill Baxter (8/96) Jan 22 2009 I posted a proposal for how to hide the magic int from the user a
- Jarrett Billingsley (17/22) Jan 22 2009 Consider the case where you return a value inside a foreach loop:
- Bill Baxter (7/31) Jan 22 2009 Yep, that was the gist of it. Put the return value on the stack of
- Jarrett Billingsley (3/7) Jan 22 2009 Oh, I'm suggesting that the compiler do this instead of us having to
- Bill Baxter (11/19) Jan 22 2009 I see what you're saying now. So the user has to manipulate a bool
- Jarrett Billingsley (12/21) Jan 22 2009 Right. Granted, it's not perfect - it's still possible to write an
- Christopher Wright (14/28) Jan 23 2009 You didn't read my post carefully.
- Ary Borenszweig (2/32) Jan 24 2009 Why would you ever want to do that??
- Bill Baxter (7/40) Jan 24 2009 I could have read it 1000 times and it wouldn't have helped. It was
- Denis Koroskin (4/112) Jan 23 2009 Does AST preserve information whether a closure is static of dynamic?
Ary Borenszweig wrote:BCS wrote:I still have to work on some stuff, but... Before: --- module main; import std.stdio; class Foo { uint array[2]; int opApply(int delegate(ref uint) dg) { int result = 0; for(int i = 0; i < array.length; i++) { result = dg(array[i]); if(result) break; } return result; } } int main(char[][] args) { Foo foo = new Foo(); foreach(x; foo) { if (x == 3) { break; } writefln("%s", x); } return 0; } --- After: --- module main; import object; import std.stdio; class Foo: Object { uint[2] array; int opApply(int delegate(ref uint) dg) { assert(this, "null this"); { int result = 0; for(int i = 0; cast(uint) i < 2; i++) { result = dg(this.array[cast(uint) i]); if(result) break; } return result; } } } int main(char[][] args) { Foo foo = new Foo; foo.opApply(delegate (uint __applyArg0) { { { uint x = __applyArg0; if(x == 3) return 1; writefln("%s", x); } return 0; } } ); return 0; } --- Ummm... I was wondering... In every implemetation of opApply, after you invoke the delegate you must check the result to see if it's non zero, right? In that case, you must break the iteration. If the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case? Yes, yes, this is not trivial at all, but it's possible. And then D programmers no longer have to make ifs and return magic numbers to make foreach work. (think: do once in the compiler, eliminate thousands of boilerplate codes in programs) So basically: --- int opApply(void delegate(ref uint) dg) { for(int i = 0; i < array.length; i++) { dg(array[i]); } --- would be converted to: --- int opApply(void delegate(ref uint) dg) { for(int i = 0; i < array.length; i++) { result = dg(array[i]); if (result) return 1; } --- What do you think? (By the way, why opApply returns an int? What's the use of that?)Reply to Robert,Ok, I'll work on it. :-)That doesn't look entirely useless, especially for optimization. Perhaps hard to read, but easier than reading the assembly output ;-P!ditto; now that you have it might as well make it available.
Jan 21 2009
Ary Borenszweig wrote:Ary Borenszweig wrote: > BCS wrote: >> Reply to Robert, >> >>> That doesn't look entirely useless, especially for optimization. >>> Perhaps hard to read, but easier than reading the assembly output ;-P! >>> >> >> ditto; now that you have it might as well make it available. > > Ok, I'll work on it. :-) I still have to work on some stuff, but... Before: --- module main; import std.stdio; class Foo { uint array[2]; int opApply(int delegate(ref uint) dg) { int result = 0; for(int i = 0; i < array.length; i++) { result = dg(array[i]); if(result) break; } return result; } } int main(char[][] args) { Foo foo = new Foo(); foreach(x; foo) { if (x == 3) { break; } writefln("%s", x); } return 0; } --- After: --- module main; import object; import std.stdio; class Foo: Object { uint[2] array; int opApply(int delegate(ref uint) dg) { assert(this, "null this"); { int result = 0; for(int i = 0; cast(uint) i < 2; i++) { result = dg(this.array[cast(uint) i]); if(result) break; } return result; } } } int main(char[][] args) { Foo foo = new Foo; foo.opApply(delegate (uint __applyArg0) { { { uint x = __applyArg0; if(x == 3) return 1; writefln("%s", x); } return 0; } } ); return 0; } --- Ummm... I was wondering... In every implemetation of opApply, after you invoke the delegate you must check the result to see if it's non zero, right? In that case, you must break the iteration. If the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case? Yes, yes, this is not trivial at all, but it's possible. And then D programmers no longer have to make ifs and return magic numbers to make foreach work. (think: do once in the compiler, eliminate thousands of boilerplate codes in programs) So basically: --- int opApply(void delegate(ref uint) dg) { for(int i = 0; i < array.length; i++) { dg(array[i]); } --- would be converted to: --- int opApply(void delegate(ref uint) dg) {I mean: int opApply(int delegate(ref uint) dg) {for(int i = 0; i < array.length; i++) { result = dg(array[i]); if (result) return 1; } --- What do you think? (By the way, why opApply returns an int? What's the use of that?)
Jan 21 2009
On Thu, 22 Jan 2009 09:30:15 +0300, Ary Borenszweig <ary esperanto.org.ar> wrote:Ary Borenszweig wrote: > BCS wrote: >> Reply to Robert, >> >>> That doesn't look entirely useless, especially for optimization. >>> Perhaps hard to read, but easier than reading the assembly output ;-P! >>> >> >> ditto; now that you have it might as well make it available. > > Ok, I'll work on it. :-) I still have to work on some stuff, but... Before: --- module main; import std.stdio; class Foo { uint array[2]; int opApply(int delegate(ref uint) dg) { int result = 0; for(int i = 0; i < array.length; i++) { result = dg(array[i]); if(result) break; } return result; } } int main(char[][] args) { Foo foo = new Foo(); foreach(x; foo) { if (x == 3) { break; } writefln("%s", x); } return 0; } --- After: --- module main; import object; import std.stdio; class Foo: Object { uint[2] array; int opApply(int delegate(ref uint) dg) { assert(this, "null this"); { int result = 0; for(int i = 0; cast(uint) i < 2; i++) { result = dg(this.array[cast(uint) i]); if(result) break; } return result; } } } int main(char[][] args) { Foo foo = new Foo; foo.opApply(delegate (uint __applyArg0) { { { uint x = __applyArg0; if(x == 3) return 1; writefln("%s", x); } return 0; } } ); return 0; } --- Ummm... I was wondering... In every implemetation of opApply, after you invoke the delegate you must check the result to see if it's non zero, right? In that case, you must break the iteration. If the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case? Yes, yes, this is not trivial at all, but it's possible. And then D programmers no longer have to make ifs and return magic numbers to make foreach work. (think: do once in the compiler, eliminate thousands of boilerplate codes in programs) So basically: --- int opApply(void delegate(ref uint) dg) { for(int i = 0; i < array.length; i++) { dg(array[i]); } --- would be converted to: --- int opApply(void delegate(ref uint) dg) { for(int i = 0; i < array.length; i++) { result = dg(array[i]); if (result) return 1; } --- What do you think? (By the way, why opApply returns an int? What's the use of that?)A side question: does foreach loop allocate memory (for delegate) in D2? It certainly shouldn't, but escape analysis can't prove that it is safe to allocate closure on stack: int delegate(ref int) _dg; class Foo { int opApply(int delegate(ref int) dg) { _dg = dg; // dang! } } Perhaps, documentation should indicate that opApply mustn't store the delegate that is passed to it (it makes little sense anyway) and safely make foreach delegates static (if it is not done yet).
Jan 21 2009
Ary Borenszweig wrote:If the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case?This only fails if you wish to take a particular action when the calling code breaks out of iteration. This is not such a large use case that I think it worth preserving.
Jan 22 2009
On Fri, Jan 23, 2009 at 8:10 AM, Christopher Wright <dhasenan gmail.com> wrote:Ary Borenszweig wrote:It's not? foreach(i; things) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } Those are all fairly common things to do from inside the 'dg' call. The int is how the compiler distinguishes which case got you out of the dg. --bbIf the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case?This only fails if you wish to take a particular action when the calling code breaks out of iteration. This is not such a large use case that I think it worth preserving.
Jan 22 2009
Bill Baxter wrote:On Fri, Jan 23, 2009 at 8:10 AM, Christopher Wright <dhasenan gmail.com> wrote:Why do you mean by "fails"? The compiler transforms the foreach's body, it can transform the opApply's body.Ary Borenszweig wrote:If the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case?This only fails if you wish to take a particular action when the calling code breaks out of iteration. This is not such a large use case that I think it worth preserving.It's not? foreach(i; things) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } Those are all fairly common things to do from inside the 'dg' call. The int is how the compiler distinguishes which case got you out of the dg. --bbAaaah... Now I see what's the return value of opApply for. So I tried your code: (just the relevant piece) --- int main(char[][] args) { int a = 1, b = 2, c = 3, d = 4; Foo foo = new Foo(); foreach(i; foo) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } somewhere: return 0; } --- and DMD spits out this: --- int main(char[][] args) { int a = 1, b = 2, c = 3, d = 4; Foo foo = new Foo; switch(foo.opApply(delegate (uint __applyArg0) { { uint i = __applyArg0; if(i == cast(uint) a) return 0; if(i == cast(uint) b) return 1; if(i == cast(uint) d) return 2; if(i == cast(uint) c) return 3; } return 0; } )) { default: break; case 2: return; case 3: goto somewhere; } somewhere: return 0; } --- Intersting. The compiler (Walter?) is being smart here. :-)
Jan 22 2009
On Fri, Jan 23, 2009 at 8:52 AM, Ary Borenszweig <ary esperanto.org.ar> wrote:Bill Baxter wrote:I posted a proposal for how to hide the magic int from the user a while back, but my conclusion was that my approach would require AST macros in order to give it a reasonable syntax. With the ast macros you'd be able to do something like yield(i) in the body of your foreach, where yield is an appropriately defined macro. If anyone is interested I'll try to dig it up from the archives. --bbOn Fri, Jan 23, 2009 at 8:10 AM, Christopher Wright <dhasenan gmail.com> wrote:Why do you mean by "fails"? The compiler transforms the foreach's body, it can transform the opApply's body.Ary Borenszweig wrote:If the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case?This only fails if you wish to take a particular action when the calling code breaks out of iteration. This is not such a large use case that I think it worth preserving.It's not? foreach(i; things) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } Those are all fairly common things to do from inside the 'dg' call. The int is how the compiler distinguishes which case got you out of the dg. --bbAaaah... Now I see what's the return value of opApply for. So I tried your code: (just the relevant piece) --- int main(char[][] args) { int a = 1, b = 2, c = 3, d = 4; Foo foo = new Foo(); foreach(i; foo) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } somewhere: return 0; } --- and DMD spits out this: --- int main(char[][] args) { int a = 1, b = 2, c = 3, d = 4; Foo foo = new Foo; switch(foo.opApply(delegate (uint __applyArg0) { { uint i = __applyArg0; if(i == cast(uint) a) return 0; if(i == cast(uint) b) return 1; if(i == cast(uint) d) return 2; if(i == cast(uint) c) return 3; } return 0; } )) { default: break; case 2: return; case 3: goto somewhere; } somewhere: return 0; } --- Intersting. The compiler (Walter?) is being smart here. :-)
Jan 22 2009
On Thu, Jan 22, 2009 at 7:00 PM, Bill Baxter <wbaxter gmail.com> wrote:I posted a proposal for how to hide the magic int from the user a while back, but my conclusion was that my approach would require AST macros in order to give it a reasonable syntax. With the ast macros you'd be able to do something like yield(i) in the body of your foreach, where yield is an appropriately defined macro.Consider the case where you return a value inside a foreach loop: int find(char[] value) { foreach(k, v; someAA) if(v == value) return k; } How is this implemented? It puts a local variable in the stack frame of find(), called __result. Then the foreach delegate just writes into that local when it returns (__result = k; return IM_RETURNING_NOW;), and the compiler-generated cruft returns the value of that local from find() (switch(...) { case IM_RETURNING_NOW: return __result; }). The delegate return status code doesn't have to be any different. Just have the delegate return a bool (should iteration stop?) and put its actual return status in the stack frame of the enclosing function.
Jan 22 2009
On Fri, Jan 23, 2009 at 11:15 AM, Jarrett Billingsley <jarrett.billingsley gmail.com> wrote:On Thu, Jan 22, 2009 at 7:00 PM, Bill Baxter <wbaxter gmail.com> wrote:Yep, that was the gist of it. Put the return value on the stack of the calling frame. Nothing all that fancy really, just you want to be able to hide that __result = blah ugliness from the user. --bbI posted a proposal for how to hide the magic int from the user a while back, but my conclusion was that my approach would require AST macros in order to give it a reasonable syntax. With the ast macros you'd be able to do something like yield(i) in the body of your foreach, where yield is an appropriately defined macro.Consider the case where you return a value inside a foreach loop: int find(char[] value) { foreach(k, v; someAA) if(v == value) return k; } How is this implemented? It puts a local variable in the stack frame of find(), called __result. Then the foreach delegate just writes into that local when it returns (__result = k; return IM_RETURNING_NOW;), and the compiler-generated cruft returns the value of that local from find() (switch(...) { case IM_RETURNING_NOW: return __result; }). The delegate return status code doesn't have to be any different. Just have the delegate return a bool (should iteration stop?) and put its actual return status in the stack frame of the enclosing function.
Jan 22 2009
On Thu, Jan 22, 2009 at 9:38 PM, Bill Baxter <wbaxter gmail.com> wrote:Yep, that was the gist of it. Put the return value on the stack of the calling frame. Nothing all that fancy really, just you want to be able to hide that __result = blah ugliness from the user.Oh, I'm suggesting that the compiler do this instead of us having to do it with macros.
Jan 22 2009
On Fri, Jan 23, 2009 at 11:59 AM, Jarrett Billingsley <jarrett.billingsley gmail.com> wrote:On Thu, Jan 22, 2009 at 9:38 PM, Bill Baxter <wbaxter gmail.com> wrote:I see what you're saying now. So the user has to manipulate a bool return value instead of an int? Then you could have void opApply(bool delegate(ref int) dg) { foreach(i; internal_) { if (dg(i)) return; } } Yep, that looks like an improvement to me.Yep, that was the gist of it. Put the return value on the stack of the calling frame. Nothing all that fancy really, just you want to be able to hide that __result = blah ugliness from the user.Oh, I'm suggesting that the compiler do this instead of us having to do it with macros.
Jan 22 2009
On Fri, Jan 23, 2009 at 12:30 AM, Bill Baxter <wbaxter gmail.com> wrote:I see what you're saying now. So the user has to manipulate a bool return value instead of an int? Then you could have void opApply(bool delegate(ref int) dg) { foreach(i; internal_) { if (dg(i)) return; } } Yep, that looks like an improvement to me.Right. Granted, it's not perfect - it's still possible to write an opApply that doesn't respect breaks/returns/gotos out of the loop - but it's much more straightforward. A "perfect" opApply would probably look like: void opApply(void delegate(ref int) dg) { foreach(i; mData) dg(i); } but it would probably have to abuse exceptions to accomplish this, which doesn't sound like a great idea to me.
Jan 22 2009
Bill Baxter wrote:It's not? foreach(i; things) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } Those are all fairly common things to do from inside the 'dg' call. The int is how the compiler distinguishes which case got you out of the dg. --bbYou didn't read my post carefully. The use case you would lose is: int opApply (int delegate(something) dg) { auto result = dg(stuff); if (result) { // This would be impossible if we took a void delegate. log ("iteration stopped early"); return result; } return 0; }
Jan 23 2009
Christopher Wright wrote:Bill Baxter wrote:Why would you ever want to do that??It's not? foreach(i; things) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } Those are all fairly common things to do from inside the 'dg' call. The int is how the compiler distinguishes which case got you out of the dg. --bbYou didn't read my post carefully. The use case you would lose is: int opApply (int delegate(something) dg) { auto result = dg(stuff); if (result) { // This would be impossible if we took a void delegate. log ("iteration stopped early"); return result; } return 0; }
Jan 24 2009
On Sun, Jan 25, 2009 at 2:06 AM, Ary Borenszweig <ary esperanto.org.ar> wrote:Christopher Wright wrote:I could have read it 1000 times and it wouldn't have helped. It was not clear exactly when you were talking about the action being taken or by whom.Bill Baxter wrote:It's not? foreach(i; things) { if (i==a) continue; if (i==b) break; if (i==d) return; if (i==c) goto somewhere; } Those are all fairly common things to do from inside the 'dg' call. The int is how the compiler distinguishes which case got you out of the dg. --bbYou didn't read my post carefully.It was pretty clear that he didn't think it a common or particularly useful pattern. --bbThe use case you would lose is: int opApply (int delegate(something) dg) { auto result = dg(stuff); if (result) { // This would be impossible if we took a void delegate. log ("iteration stopped early"); return result; } return 0; }Why would you ever want to do that??
Jan 24 2009
On Thu, 22 Jan 2009 09:30:15 +0300, Ary Borenszweig <ary esperanto.org.ar> wrote:Ary Borenszweig wrote: > BCS wrote: >> Reply to Robert, >> >>> That doesn't look entirely useless, especially for optimization. >>> Perhaps hard to read, but easier than reading the assembly output ;-P! >>> >> >> ditto; now that you have it might as well make it available. > > Ok, I'll work on it. :-) I still have to work on some stuff, but... Before: --- module main; import std.stdio; class Foo { uint array[2]; int opApply(int delegate(ref uint) dg) { int result = 0; for(int i = 0; i < array.length; i++) { result = dg(array[i]); if(result) break; } return result; } } int main(char[][] args) { Foo foo = new Foo(); foreach(x; foo) { if (x == 3) { break; } writefln("%s", x); } return 0; } --- After: --- module main; import object; import std.stdio; class Foo: Object { uint[2] array; int opApply(int delegate(ref uint) dg) { assert(this, "null this"); { int result = 0; for(int i = 0; cast(uint) i < 2; i++) { result = dg(this.array[cast(uint) i]); if(result) break; } return result; } } } int main(char[][] args) { Foo foo = new Foo; foo.opApply(delegate (uint __applyArg0) { { { uint x = __applyArg0; if(x == 3) return 1; writefln("%s", x); } return 0; } } ); return 0; } --- Ummm... I was wondering... In every implemetation of opApply, after you invoke the delegate you must check the result to see if it's non zero, right? In that case, you must break the iteration. If the compiler can transform a "foreach" into an opApply call, passing the foreach body and converting breaks to "return 1" statements... can't opApply be specified as: int opApply(void delegate(ref uint) dg) { // note: delegate returns void } and the compiler transforms the opApply signature to the one that's used now, plus converting each dg call to a call and a check of return value != 0 and return 1 in that case? Yes, yes, this is not trivial at all, but it's possible. And then D programmers no longer have to make ifs and return magic numbers to make foreach work. (think: do once in the compiler, eliminate thousands of boilerplate codes in programs) So basically: --- int opApply(void delegate(ref uint) dg) { for(int i = 0; i < array.length; i++) { dg(array[i]); } --- would be converted to: --- int opApply(void delegate(ref uint) dg) { for(int i = 0; i < array.length; i++) { result = dg(array[i]); if (result) return 1; } --- What do you think? (By the way, why opApply returns an int? What's the use of that?)Does AST preserve information whether a closure is static of dynamic? I believe having this information is very useful for development in D2. It will help Tango preserve "no hidden allocations" contract when ported to D2.
Jan 23 2009