digitalmars.D.learn - Strange closure behaviour

reply Emmanuelle <VuLXn6DBW PPtUm7TvV6nsw.com> writes:
Take a look at this code:

---
import std.stdio;

void main()
{
     alias Func = void delegate(int);

     int[][] nums = new int[][5];
     Func[] funcs;
     foreach (x; 0 .. 5) {
         funcs ~= (int i) { nums[x] ~= i; };
     }

     foreach (i, func; funcs) {
         func(cast(int) i);
     }

     writeln(nums);
}
---

(https://run.dlang.io/is/oMjNRL)

The output is:

---
[[], [], [], [], [0, 1, 2, 3, 4]]
---

Personally, this makes no sense to me. This is the result I was 
expecting:

---
[[0], [1], [2], [3], [4]]
---

Why is it "locking" the bound `x` to the last element? It seems 
like the compiler is overwriting the closure for `x`, somehow. 
So, I'm wondering why D is doing that. Is it a compiler bug? Or 
is this the expected behaviour?
Jun 14 2019
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 15 June 2019 at 00:24:52 UTC, Emmanuelle wrote:
 Is it a compiler bug?
Yup, a very longstanding bug. You can work around it by wrapping it all in another layer of function which you immediately call (which is fairly common in JavaScript):

    funcs ~= ((x) => (int i) { nums[x] ~= i; })(x);

Or maybe less confusingly written long form:

    funcs ~= (delegate(x) {
        return (int i) { nums[x] ~= i; };
    })(x);

You write a function that returns your actual function, and immediately call it with the loop variable, which will explicitly make a copy of it.
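For reference, here is a minimal sketch of the same idea with a named helper instead of an immediately-called lambda. `makeAppender` and `run` are hypothetical names, not from the thread. The only assumption is the one described above: each call to the helper gets its own frame, so each returned delegate closes over its own copy of `x`.

```d
import std.stdio;

alias Func = void delegate(int);

// Hypothetical helper: `x` is a parameter, so every call has a fresh `x`
// for the returned delegate to capture.
Func makeAppender(int[][] nums, int x)
{
    return (int i) { nums[x] ~= i; };
}

// Hypothetical driver mirroring the original program.
int[][] run()
{
    int[][] nums = new int[][5];
    Func[] funcs;

    foreach (x; 0 .. 5)
        funcs ~= makeAppender(nums, x);

    foreach (i, func; funcs)
        func(cast(int) i);

    return nums;
}

void main()
{
    writeln(run()); // prints [[0], [1], [2], [3], [4]]
}
```

The outer `nums` slice is passed by value, but it still refers to the same five elements, so appending to `nums[x]` inside the delegate is visible to the caller.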
Jun 14 2019
parent reply Emmanuelle <VuLXn6DBW PPtUm7TvV6nsw.com> writes:
Oh, I see. Unfortunate that it's a longstanding compiler bug, but at least the rather awkward workaround will do. Thank you!
Jun 14 2019
parent reply Rémy Mouëza <remy.moueza gmail.com> writes:
I don't know if we can tell this is a compiler bug. The same behavior happens in Python. The logic is that the variable `x` is captured by the closure: the closure's context contains a pointer/reference to `x`. Whenever `x` is updated outside of the closure, the context still points to the modified `x`. Hence the seemingly strange behavior.

Adam's workaround ensures that the closure captures a temporary `x` variable on the stack: a copy will be made instead of taking a reference, since a pointer to `x` would be dangling once the `delegate(x){...}` returns.

Most of the time, we want a pointer/reference to the enclosed variables in our closures. Note that C++17 allows one to select the capture mode; the following link lists 8 of them: https://en.cppreference.com/w/cpp/language/lambda#Lambda_capture.

D offers a convenient default that works most of the time. The trade-off is having to deal with the creation of several closures referencing a variable being modified in a single scope, like the incremented `x` of the for loop.

That said, I wouldn't mind having the compiler deal with that case: detecting that `x` is within a for loop and making copies of it in the closures' contexts.
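To illustrate just the by-reference part of this, here is a small sketch with a single variable in a single scope (no loop involved). `readAfterUpdate` is a hypothetical name used only for the example.

```d
import std.stdio;

// Hypothetical example function: shows that a delegate sees later
// updates to a captured local, because it captures the variable itself.
int readAfterUpdate()
{
    int x = 1;
    auto read = () => x; // captures the variable `x`, not the value 1
    x = 42;              // update after the delegate was created
    return read();       // the delegate observes the update
}

void main()
{
    writeln(readAfterUpdate()); // prints 42
}
```

This covers only the uncontroversial case of one variable with one lifetime; it says nothing about what should happen to a variable declared inside a loop body.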
Jun 15 2019
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 15 June 2019 at 16:29:29 UTC, Rémy Mouëza wrote:
 I don't know if we can tell this is a compiler bug.
I can't remember where the key fact was, but I used to agree with you (several languages work this same way, and it makes a lot of sense for ease of the implementation), but someone convinced me otherwise by pointing to the language of the D spec. I just can't find that reference right now...

It is worth noting too that the current behavior also opens up a hole in the immutable promises; the loop variable can be passed as immutable to the outside via a delegate, but then modified afterward, which is unambiguously a bug.

Regardless of bug vs spec, it isn't implemented and I wouldn't expect that to change any time soon, so it is good to just learn the wrapper function technique :) (and it is useful in those other languages too)
Jun 15 2019
prev sibling next sibling parent Emmanuelle <VuLXn6DBW PPtUm7TvV6nsw.com> writes:
On Saturday, 15 June 2019 at 16:29:29 UTC, Rémy Mouëza wrote:
 I don't know if we can tell this is a compiler bug. The same 
 behavior happens in Python. The logic being variable `x` is 
 captured by the closure. That closure's context will contain a 
 pointer/reference to x. Whenever x is updated outside of the 
 closure, the context still points to the modified x. Hence the 
 seemingly strange behavior.
I come from Ruby, where it works as I expected, so I assumed all languages would work like that; but then, D surprised me, and now, Python too, and apparently a whole bunch of other languages (which is honestly kinda disheartening since I like throwing lambdas everywhere.)
Jun 15 2019
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 15.06.19 18:29, Rémy Mouëza wrote:
 I don't know if we can tell this is a compiler bug.
It's a bug. It's memory corruption. Different objects with overlapping lifetimes use the same memory location.
 The same behavior happens in Python.
No, it's not the same. Python has no sensible notion of variable scope.
>>> for i in range(3): pass
...
>>> print(i)
2

Yuck.
 The logic being variable `x` is captured by the 
 closure. That closure's context will contain a pointer/reference to x. 
 Whenever x is updated outside of the closure, the context still points 
 to the modified x. Hence the seemingly strange behavior.
 ...
It's not the same instance of the variable. Foreach loop variables are local to the loop body. They may both be called `x`, but they are not the same. It's most obvious with `immutable` variables.
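A small sketch of the scoping point, in contrast to the Python session above: in D the foreach variable is not visible after the loop, so its value has to be copied out explicitly. `lastSeen` is a hypothetical name used only for this example.

```d
import std.stdio;

// Hypothetical example function: the loop variable `x` exists only inside
// the loop, so we copy it into `last` to use it afterwards.
int lastSeen()
{
    int last = -1;

    foreach (x; 0 .. 3)
        last = x;

    // `x` is out of scope here; referring to it would not compile.
    static assert(!__traits(compiles, x));

    return last;
}

void main()
{
    writeln(lastSeen()); // prints 2
}
```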
 Adam's workaround ensures that the closure captures a temporary `x` 
 variable on the stack: a copy will be made instead of taking a 
 reference, since a pointer to `x` would be dangling once the 
 `delegate(x){...}` returns.
 
 Most of the time, we want a pointer/reference to the enclosed variables 
 in our closures. Note that C++ 17 allows one to select the capture mode: 
 the following link lists 8 of them: 
 https://en.cppreference.com/w/cpp/language/lambda#Lambda_capture.
 ...
No, this is not an issue of by value vs by reference. All captures in D are by reference, yet the behavior is wrong.
 D offers a convenient default that works most of the time. The trade-off 
 is having to deal with the creation of several closures referencing a 
 variable being modified in a single scope, like the incremented `x` of 
 the for loop.
 ...
By reference capturing may be a convenient default, but even capturing by reference the behavior is wrong.
Jun 15 2019
parent Rémy Mouëza <remy.moueza gmail.com> writes:
On Sunday, 16 June 2019 at 01:36:38 UTC, Timon Gehr wrote:
 It's a bug. It's memory corruption. Different objects with
 overlapping lifetimes use the same memory location.
Okay. Seen that way, it is clear to me why it's a bug.
 ...
 No, it's not the same. Python has no sensible notion of 
 variable scope.

 >>> for i in range(3): pass
 ...
 >>> print(i)
 2
 Yuck.
I got confused by this Python behavior:

    ls = []
    for i in range(0, 5):
        ls.append(lambda x: x + i)

    for fun in ls:
        print(fun(0))

This prints:

    4
    4
    4
    4
    4
Jun 16 2019