digitalmars.D - move+forward as intrinsics, incl. revised forward semantics for
- kinke (172/172) Oct 13 IMO we need to make `core.lifetime.{move,forward}` compiler
- Paul Backus (4/9) Oct 13 Unless the compiler can statically detect and prevent such
- Timon Gehr (20/213) Oct 13 Thanks for writing this up! I think this is a good starting point, but I...
IMO we need to make `core.lifetime.{move,forward}` compiler intrinsics, to enable further optimizations that aren't possible with a library solution.

`move`:

* semantics: move an lvalue to a new rvalue, at a new memory address, 'hijacking' the lvalue resources; the lvalue is reset to T.init (blit, not assignment!) afterwards
* will be complete with move ctor; syntax needs to be decided, but signature is `(ref T)` (yes, must be an explicit ref)
  * allows to opt out of the default blit (memcpy struct payload), e.g., to fix up interior pointers
  * move ctor interop with C++ should be doable (just getting the extern(C++) mangle right)
  * problem: handle/avoid all compiler-implicit moves/blits (would have to call move ctor and dtor now; emplace FTW!)
* would be nice as intrinsic:
  * not to have to import `core.lifetime` everywhere and end up with complicated template bloat for a basically trivial operation
  * potential optimization: elide lvalue reset to T.init and its destruction iff:
    * it is a local (can skip destruction)
    * and not used after the move
    * and the destruction of T.init is a noop (modulo mods to the struct's own payload), so its elision is not observable

`forward` must become an intrinsic:

* for vars with `ref` storage class: as-is, yields the original lvalue
* non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no move, and accordingly no destruction after forwarding (because the rvalue will already be destructed earlier)
  * only valid for locals (incl. params), the destruction of other lvalues cannot be skipped
  * invalid/undefined to access the original lvalue after forwarding it (has been destructed already)
  * probably only valid:
    * as function call argument expressions (glue layer needs to treat it like a frontend-generated temporary, passing it directly by ref)
    * as assignment right-hand-sides, for move-assign (`dst = forward!src;` => `dst.opAssign(forward!src);`)
    * as return expressions, for move-constructions (but prefer NRVO if possible, for direct emplace)
* probably needs to keep template syntax (`forward!x`, not `forward(x)`) for backwards compatibility with druntime template

Let's take a look at an example:

```D
import core.stdc.stdio;
import core.lifetime;

struct S
{
    int x;
    this(int x) { this.x = x; printf("ctor: %p\n", &this); }
    this(this) { printf("copy: %p\n", &this); }
    ~this() { printf("dtor: %p\n", &this); }
}

void main()
{
    {
        auto lval = S(1);
        printf("lval: %p\n", &lval);
        const r = bar1(lval);
        printf(" r: %p\n", &r);
    }

    {
        printf("\nrvalue:\n");
        const r = bar1(S(2));
        printf(" r: %p\n", &r);
    }
}

S bar1()(auto ref S s)
{
    printf("bar1: %p\n", &s);
    return bar2(forward!s);
}

S bar2()(auto ref S s)
{
    printf("bar2: %p\n", &s);
    return bar3(forward!s);
}

S bar3()(auto ref S s)
{
    printf("bar3: %p\n", &s);
    return bar4(forward!s);
}

S bar4()(auto ref S s)
{
    printf("bar4: %p, got a ref: %d\n", &s, __traits(isRef, s));
    return s; // copy parameter lvalue to return value
}
```

Output with DMD (and GDC), no backend optimizations:

```
ctor: 0x7ffebea26460
lval: 0x7ffebea26460
bar1: 0x7ffebea26460
bar2: 0x7ffebea26460
bar3: 0x7ffebea26460
bar4: 0x7ffebea26460, got a ref: 1
copy: 0x7ffebea263d0
 r: 0x7ffebea26464
dtor: 0x7ffebea26464
dtor: 0x7ffebea26460

rvalue:
ctor: 0x7ffebea2647c
bar1: 0x7ffebea26488
bar2: 0x7ffebea26424
bar3: 0x7ffebea263e4
bar4: 0x7ffebea263a4, got a ref: 0
copy: 0x7ffebea26358
dtor: 0x7ffebea263a4
dtor: 0x7ffebea263e4
dtor: 0x7ffebea26424
dtor: 0x7ffebea26488
 r: 0x7ffebea26478
dtor: 0x7ffebea26478
```

What we see is that current `core.lifetime.forward` propagates the ref-ness of the parameter, but has to `core.lifetime.move` it in the non-ref case, creating 3 explicit moves + destructions.

We also see that there are compiler-implicit moves ('optimized', i.e., no reset+destruction of the moved-from value):

* when passing the `S(2)` rvalue to `bar1` (not sure why, seems like a bug) - note the different addresses of `ctor` and `bar1`
* for the return values - the addresses of `copy` and `r` diverge (constructed 0x7ffebea26358, destructed 0x7ffebea26478)

With LDC, we at least already get perfectly forwarded return values (the addresses of `copy` and `r` are identical):

```
ctor: 0x7ffda922edbc
lval: 0x7ffda922edbc
bar1: 0x7ffda922edbc
bar2: 0x7ffda922edbc
bar3: 0x7ffda922edbc
bar4: 0x7ffda922edbc, got a ref: 1
copy: 0x7ffda922edb8
 r: 0x7ffda922edb8
dtor: 0x7ffda922edb8
dtor: 0x7ffda922edbc

rvalue:
ctor: 0x7ffda922eda0
bar1: 0x7ffda922ed6c
bar2: 0x7ffda922ed1c
bar3: 0x7ffda922eccc
bar4: 0x7ffda922ecc8, got a ref: 0
copy: 0x7ffda922eda4
dtor: 0x7ffda922ecc8
dtor: 0x7ffda922eccc
dtor: 0x7ffda922ed1c
dtor: 0x7ffda922ed6c
 r: 0x7ffda922eda4
dtor: 0x7ffda922eda4
```

The compiler needs to implement RVO (Return Value Optimization, different to Named-RVO!) to enable perfect forwarding of the return values. In this example, `r` is allocated in `main`, then its address passed and forwarded as hidden pointer all the way to `bar4`, where it gets copy-constructed.

With the proposed `forward` semantics, we'd get perfect forwarding of the `s` parameters too, without the 3 explicit moves and destructions. The `S(2)` rvalue would be created in `main`, then passed and forwarded directly by ref all the way to `bar4`, where it would get destructed when the `s` param goes out of scope.

This would make the compiler automatically `forward` suited lvalues. In the example, we wouldn't have to use a single explicit `forward` in the `barN` trampolines, *and* the copy-construction of the return value in the non-ref version of `bar4` would be optimized to a move-construction (`return forward!s`).
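
For reference, the `move` semantics described at the top of this post (new address, source blitted back to `T.init`) can be observed with today's library `core.lifetime.move`. This is only a minimal sketch: `Handle` and `moveAndInspect` are hypothetical names made up for illustration, and the reset relies on the type having a destructor.

```d
import core.lifetime : move;
import core.stdc.stdio : printf;

// Hypothetical resource wrapper, for illustration only.
struct Handle
{
    int fd = -1;          // Handle.init means "owns nothing"
    @disable this(this);  // non-copyable, so ownership can only be moved

    ~this()
    {
        // Only an instance that still owns something has work to do.
        if (fd != -1)
            printf("releasing fd %d\n", fd);
    }
}

// Returns [source.fd, target.fd] as observed right after the move.
int[2] moveAndInspect()
{
    auto source = Handle(3);
    auto target = move(source); // target takes over the resource
    return [source.fd, target.fd];
}

void main()
{
    const fds = moveAndInspect();
    assert(fds[0] == -1); // source was reset to Handle.init
    assert(fds[1] == 3);  // target now owns the resource
}
```

Both destructors still run at the end of `moveAndInspect`; the one for `source` just sees `Handle.init`. That reset plus no-op destruction is the pair the potential optimization above would elide.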
Oct 13
On Sunday, 13 October 2024 at 10:43:27 UTC, kinke wrote:
> * non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no move, and accordingly no destruction after forwarding (because the rvalue will already be destructed earlier)
>   * invalid/undefined to access the original lvalue after forwarding it (has been destructed already)

Unless the compiler can statically detect and prevent such accesses, this would make `forward!x` a `@system` operation, which IMO would be a step backward.
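
To make the concern concrete, here is a small sketch of what reading a parameter after `forward` does with the current library implementation. `Payload`, `sink` and `relayThenRead` are hypothetical names; the behaviour shown assumes today's `core.lifetime.forward`, which moves a non-ref parameter and leaves it as `T.init` when the type has a destructor.

```d
import core.lifetime : forward;

// Hypothetical payload type; the destructor is what makes
// core.lifetime.move reset a moved-from instance to Payload.init.
struct Payload
{
    int x;
    ~this() {}
}

int sink(Payload p) { return p.x; }

// Forwards `p` to sink, then reads `p` again.
// Returns [value seen by sink, value of p.x after forwarding].
int[2] relayThenRead()(auto ref Payload p)
{
    const seen = sink(forward!p);
    return [seen, p.x];
}

void main()
{
    auto lval = Payload(5);
    // ref parameter: forward!p is the caller's lvalue, sink gets a copy.
    assert(relayThenRead(lval) == [5, 5]);
    // non-ref parameter: forward!p moved it out, p is Payload.init now.
    assert(relayThenRead(Payload(5)) == [5, 0]);
}
```

With the library version the second read is well-defined, it just sees `Payload.init`. Under the proposed "re-interpret as rvalue" semantics the same read would touch an object that has already been destructed, unless the compiler rejects it.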
Oct 13
On 10/13/24 12:43, kinke wrote:
> IMO we need to make `core.lifetime.{move,forward}` compiler intrinsics, to enable further optimizations that aren't possible with a library solution.
> ...

Thanks for writing this up! I think this is a good starting point, but I would make some small tweaks.

> * semantics: move an lvalue to a new rvalue, at a new memory address, 'hijacking' the lvalue resources; the lvalue is reset to T.init (blit, not assignment!) afterwards

Makes sense, though if the compiler can determine that something is a last use, it can optimize out the address change.

> * will be complete with move ctor; syntax needs to be decided, but signature is `(ref T)` (yes, must be an explicit ref)

I can see either idea work here. What is most important is that it is in fact treated as a constructor. I guess the benefit of `this(S)` is uniformity with `this(ref S)`, and the benefit of `=this(ref S)` or `opMove(ref S)` is that it is obvious that the destructor will be called by the caller, potentially much later.

>   * allows to opt out of the default blit (memcpy struct payload), e.g., to fix up interior pointers
>   * move ctor interop with C++ should be doable (just getting the extern(C++) mangle right)
>   * problem: handle/avoid all compiler-implicit moves/blits (would have to call move ctor and dtor now; emplace FTW!)
> * would be nice as intrinsic:
>   * not to have to import `core.lifetime` everywhere and end up with complicated template bloat for a basically trivial operation
>   * potential optimization: elide lvalue reset to T.init and its destruction iff:
>     * it is a local (can skip destruction)
>     * and not used after the move
>     * and the destruction of T.init is a noop (modulo mods to the struct's own payload), so its elision is not observable
> ...

Well, as I alluded to earlier, I think in such cases the object should just keep its original address and the move constructor does not need to be called at all. It reduces to a safe version of `__rvalue` in this case.

> forward must become an intrinsic:
> * for vars with `ref` storage class: as-is, yields the original lvalue
> * non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no move, and accordingly no destruction after forwarding (because the rvalue will already be destructed earlier)
>   * only valid for locals (incl. params), the destruction of other lvalues cannot be skipped
>   * invalid/undefined to access the original lvalue after forwarding it (has been destructed already)

I think it would be better to do a `move`, where the `move` will usually be optimized to a safe `__rvalue` as above. I think unsafe `__rvalue` should be possible, but not `@safe`.

>   * probably only valid:
>     * as function call argument expressions (glue layer needs to treat it like a frontend-generated temporary, passing it directly by ref)
>     * as assignment right-hand-sides, for move-assign (`dst = forward!src;` => `dst.opAssign(forward!src);`)
>     * as return expressions, for move-constructions (but prefer NRVO if possible, for direct emplace)
> * probably needs to keep template syntax (`forward!x`, not `forward(x)`) for backwards compatibility with druntime template
>
> Let's take a look at an example:
> ...
>
> With the proposed `forward` semantics, we'd get perfect forwarding of the `s` parameters too, without the 3 explicit moves and destructions. The `S(2)` rvalue would be created in `main`, then passed and forwarded directly by ref all the way to `bar4`, where it would get destructed when the `s` param goes out of scope.
>
> This would make the compiler automatically `forward` suited lvalues. In the example, we wouldn't have to use a single explicit `forward` in the `barN` trampolines, *and* the copy-construction of the return value in the non-ref version of `bar4` would be optimized to a move-construction (`return forward!s`).

Sounds good, but I think simple cases like this one should be a priority. Even if there is no data-flow analysis as advanced as the one proposed in DIP1040, I think it is important that there is no copy in `bar4`.
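
As a point of comparison for the `bar4` copy, the following sketch counts postblit calls when a by-value parameter is returned through an explicit `core.lifetime.move`, which is what one has to write by hand today. `Counted`, `passThrough` and `copiesWithExplicitMove` are hypothetical names used only for this illustration.

```d
import core.lifetime : move;

// Hypothetical type that counts how often its postblit runs.
struct Counted
{
    static int copies;
    int x;
    this(this) { ++copies; }
    ~this() {}
}

// By-value parameter returned via an explicit move instead of `return c;`.
Counted passThrough(Counted c)
{
    return move(c);
}

// Number of postblit calls needed to pass an rvalue through passThrough.
int copiesWithExplicitMove()
{
    Counted.copies = 0;
    auto r = passThrough(Counted(2));
    return Counted.copies;
}

void main()
{
    assert(copiesWithExplicitMove() == 0);
}
```

Writing `return c;` instead is the situation from the quoted example, where the output shows a `copy:` line for the non-ref `bar4`. Getting rid of that copy without the explicit `move` is the simple case meant above.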
Oct 13