
digitalmars.D - move+forward as intrinsics, incl. revised forward semantics for

kinke <noone nowhere.com> writes:
IMO we need to make `core.lifetime.{move,forward}` compiler 
intrinsics, to enable further optimizations that aren't possible 
with a library solution.



`move`:
* semantics: move an lvalue to a new rvalue, at a new memory 
address, 'hijacking' the lvalue's resources; the lvalue is reset to 
T.init (blit, not assignment!) afterwards
* will be complete with move ctor; syntax needs to be decided, 
but signature is `(ref T)` (yes, must be an explicit ref)
   * allows opting out of the default blit (memcpy of the struct 
payload), e.g., to fix up interior pointers
   * move ctor interop with C++ should be doable (just getting the 
extern(C++) mangle right)
   * problem: handle/avoid all compiler-implicit moves/blits 
(would have to call move ctor and dtor now; emplace FTW!)
* would be nice as intrinsic:
   * not to have to import `core.lifetime` everywhere and end up 
with complicated template bloat for a basically trivial operation
   * potential optimization: elide lvalue reset to T.init and its 
destruction iff:
     * it is a local (can skip destruction)
     * and not used after the move
     * and the destruction of T.init is a noop (modulo mods to the 
struct's own payload), so its elision is not observable
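
To make the `move` semantics listed above concrete, here is a minimal sketch using the current library `core.lifetime.move`, not the proposed intrinsic. The `Handle` type, its `released` counter and the `demoMove` helper are made up for illustration. It assumes the documented library behaviour that the source is reset to `T.init` when the struct has a destructor or postblit.

```d
import core.lifetime : move;

struct Handle {
    int* p;                    // stand-in for an owned resource (hypothetical)
    static int released;       // counts destructions of non-init objects

    @disable this(this);       // no copies, so every transfer must be a move

    ~this() {
        if (p !is null) ++released;   // destroying Handle.init does nothing
    }
}

// Returns how many non-init objects were destructed in the inner scope.
int demoMove() {
    Handle.released = 0;
    {
        auto a = Handle(new int(42));
        auto b = move(a);          // b takes over a.p; a is blitted back to Handle.init
        assert(a.p is null);       // the moved-from lvalue is in its init state
        assert(*b.p == 42);        // the resource now belongs to b
    }                              // b is destructed, then a (as Handle.init)
    return Handle.released;
}

void main() {
    assert(demoMove() == 1);       // only b's destruction released a resource
}
```

Here the destruction of the reset `a` is unobservable and `a` is a local that is not used after the move, which is the kind of case where the reset and the second destructor call could be elided.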



forward must become an intrinsic:
* for vars with `ref` storage class: as-is, yields the original 
lvalue
* non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no 
move, and accordingly no destruction after forwarding (because 
the rvalue will already be destructed earlier)
   * only valid for locals (incl. params), the destruction of 
other lvalues cannot be skipped
   * invalid/undefined to access the original lvalue after 
forwarding it (has been destructed already)
   * probably only valid:
     * as function call argument expressions (glue layer needs to 
treat it like a frontend-generated temporary, passing it directly 
by ref)
     * as assignment right-hand-sides, for move-assign (`dst = 
forward!src;` => `dst.opAssign(forward!src);`)
     * as return expressions, for move-constructions (but prefer 
NRVO if possible, for direct emplace)
* probably needs to keep template syntax (`forward!x`, not 
`forward(x)`) for backwards compatibility with druntime template

Let's take a look at an example:
```D
import core.stdc.stdio;
import core.lifetime;

struct S {
     int x;

     this(int x) {
         this.x = x;
         printf("ctor: %p\n", &this);
     }

     this(this) {
         printf("copy: %p\n", &this);
     }

     ~this() {
         printf("dtor: %p\n", &this);
     }
}

void main() {
     {
         auto lval = S(1);
         printf("lval: %p\n", &lval);
         const r = bar1(lval);
         printf("   r: %p\n", &r);
     }

     {
         printf("\nrvalue:\n");
         const r = bar1(S(2));
         printf("   r: %p\n", &r);
     }
}

S bar1()(auto ref S s) {
     printf("bar1: %p\n", &s);
     return bar2(forward!s);
}

S bar2()(auto ref S s) {
     printf("bar2: %p\n", &s);
     return bar3(forward!s);
}

S bar3()(auto ref S s) {
     printf("bar3: %p\n", &s);
     return bar4(forward!s);
}

S bar4()(auto ref S s) {
     printf("bar4: %p, got a ref: %d\n", &s, __traits(isRef, s));
     return s; // copy parameter lvalue to return value
}
```

Output with DMD (and GDC), no backend optimizations:
```
ctor: 0x7ffebea26460
lval: 0x7ffebea26460
bar1: 0x7ffebea26460
bar2: 0x7ffebea26460
bar3: 0x7ffebea26460
bar4: 0x7ffebea26460, got a ref: 1
copy: 0x7ffebea263d0
    r: 0x7ffebea26464
dtor: 0x7ffebea26464
dtor: 0x7ffebea26460

rvalue:
ctor: 0x7ffebea2647c
bar1: 0x7ffebea26488
bar2: 0x7ffebea26424
bar3: 0x7ffebea263e4
bar4: 0x7ffebea263a4, got a ref: 0
copy: 0x7ffebea26358
dtor: 0x7ffebea263a4
dtor: 0x7ffebea263e4
dtor: 0x7ffebea26424
dtor: 0x7ffebea26488
    r: 0x7ffebea26478
dtor: 0x7ffebea26478
```

What we see is that current `core.lifetime.forward` propagates 
the ref-ness of the parameter, but has to `core.lifetime.move` it 
in the non-ref case, creating 3 explicit moves + destructions.
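
The ref-ness propagation can also be checked in isolation. The following is a small sketch with the current library `forward`; the `gotRef` and `relay` helpers are hypothetical names, not part of druntime.

```d
import core.lifetime : forward;

struct S {
    int x;
}

// Final receiver: reports whether its parameter was bound by ref.
bool gotRef()(auto ref S s) {
    return __traits(isRef, s);
}

// Trampoline: forwards its parameter with the current library `forward`.
bool relay()(auto ref S s) {
    return gotRef(forward!s);
}

void main() {
    auto lval = S(1);
    assert(relay(lval));    // lvalue in: the callee still sees a ref
    assert(!relay(S(2)));   // rvalue in: the callee receives a moved value
}
```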

We also see that there are compiler-implicit moves ('optimized', 
i.e., no reset+destruction of the moved-from value):
* when passing the `S(2)` rvalue to `bar1` (not sure why, seems 
like a bug) - note the different addresses of `ctor` and `bar1`
* for the return values - the addresses of `copy` and `r` diverge 
(constructed at 0x7ffebea26358, destructed at 0x7ffebea26478)

With LDC, we at least already get perfectly forwarded return 
values (the addresses of `copy` and `r` are identical):
```
ctor: 0x7ffda922edbc
lval: 0x7ffda922edbc
bar1: 0x7ffda922edbc
bar2: 0x7ffda922edbc
bar3: 0x7ffda922edbc
bar4: 0x7ffda922edbc, got a ref: 1
copy: 0x7ffda922edb8
    r: 0x7ffda922edb8
dtor: 0x7ffda922edb8
dtor: 0x7ffda922edbc

rvalue:
ctor: 0x7ffda922eda0
bar1: 0x7ffda922ed6c
bar2: 0x7ffda922ed1c
bar3: 0x7ffda922eccc
bar4: 0x7ffda922ecc8, got a ref: 0
copy: 0x7ffda922eda4
dtor: 0x7ffda922ecc8
dtor: 0x7ffda922eccc
dtor: 0x7ffda922ed1c
dtor: 0x7ffda922ed6c
    r: 0x7ffda922eda4
dtor: 0x7ffda922eda4
```

The compiler needs to implement RVO (Return Value Optimization, 
different to Named-RVO!) to enable perfect forwarding of the 
return values. In this example, `r` is allocated in `main`, then 
its address passed and forwarded as hidden pointer all the way to 
`bar4`, where it gets copy-constructed.
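
As a rough illustration of the RVO/NRVO distinction, here is a sketch with a non-copyable type (all names are hypothetical). It only shows that none of these returns needs a copy at the language level. Whether the value is actually constructed directly in the caller's storage depends on the compiler, as the outputs above show.

```d
struct NoCopy {
    int x;
    @disable this(this);    // any copy would be a compile-time error
}

// RVO candidate: the returned expression is an unnamed rvalue.
NoCopy make(int x) {
    return NoCopy(x);
}

// Trampoline: hands the callee's return value straight through.
NoCopy relay(int x) {
    return make(x);
}

// NRVO candidate: the returned expression is a named local.
NoCopy makeNamed(int x) {
    auto result = NoCopy(x);
    result.x += 1;
    return result;
}

void main() {
    const r = relay(1);
    const n = makeNamed(1);
    assert(r.x == 1);
    assert(n.x == 2);
}
```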

With the proposed `forward` semantics, we'd get perfect 
forwarding of the `s` parameters too, without the 3 explicit 
moves and destructions. The `S(2)` rvalue would be created in 
`main`, then passed and forwarded directly by ref all the way to 
`bar4`, where it would get destructed when the `s` param goes out 
of scope.



This would make the compiler automatically `forward` suitable 
lvalues. In the example, we wouldn't have to use a single 
explicit `forward` in the `barN` trampolines, *and* the 
copy-construction of the return value in the non-ref version of 
`bar4` would be optimized to a move-construction (`return 
forward!s`).
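
Until something like that exists, the same effect for the last hop can be written by hand with the current library `forward`. This is a sketch: `last` stands in for `bar4`, and the `copies` counter and `copiesFor` helper are made up for illustration.

```d
import core.lifetime : forward;

struct S {
    int x;
    static int copies;          // number of postblit calls
    this(this) { ++copies; }
}

// Like `bar4`, but returning through the current library `forward`.
S last()(auto ref S s) {
    return forward!s;           // ref: copies the lvalue; non-ref: moves the parameter
}

// Counts postblit calls for one call to `last`.
int copiesFor(bool passLvalue) {
    S.copies = 0;
    if (passLvalue) {
        auto lval = S(1);
        auto r = last(lval);
    } else {
        auto r = last(S(2));
    }
    return S.copies;
}

void main() {
    assert(copiesFor(true) == 1);   // lvalue argument: one copy into the return value
    assert(copiesFor(false) == 0);  // rvalue argument: moved, never copied
}
```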
Oct 13
Paul Backus <snarwin gmail.com> writes:
On Sunday, 13 October 2024 at 10:43:27 UTC, kinke wrote:
> * non-ref lvalues (NEW semantics): 're-interpret as rvalue' -
> no move, and accordingly no destruction after forwarding
> (because the rvalue will already be destructed earlier)
>   * invalid/undefined to access the original lvalue after
> forwarding it (has been destructed already)
Unless the compiler can statically detect and prevent such accesses, this would make `forward!x` a `@system` operation, which IMO would be a step backward.
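
For comparison, here is a sketch of what accessing the variable after forwarding looks like with the current library `forward`. The `Res` type and the helper names are hypothetical. In the non-ref case the parameter has been moved from and reset to `Res.init`, so a later access reads a well-defined value.

```d
import core.lifetime : forward;

struct Res {
    int* p;
    ~this() {}      // a destructor makes `move` reset its source to Res.init
}

int consume(Res r) {
    return *r.p;
}

// Returns true if `r` is back in its init state after being forwarded.
bool resetAfterForward()(auto ref Res r) {
    const seen = consume(forward!r);
    assert(seen != 0);
    return r.p is null;     // accessing `r` after forwarding it
}

void main() {
    assert(resetAfterForward(Res(new int(1))));   // rvalue: parameter was moved from
    auto lval = Res(new int(2));
    assert(!resetAfterForward(lval));             // lvalue: passed by ref, still intact
}
```

Under the proposed 're-interpret as rvalue' semantics there would be no reset, so the access in the non-ref case would touch an object that has already been destructed.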
Oct 13
Timon Gehr <timon.gehr gmx.ch> writes:
On 10/13/24 12:43, kinke wrote:
> IMO we need to make `core.lifetime.{move,forward}` compiler intrinsics,
> to enable further optimizations that aren't possible with a library
> solution.
> ...
Thanks for writing this up! I think this is a good starting point, but I would make some small tweaks.

 
> * semantics: move an lvalue to a new rvalue, at a new memory address,
> 'hijacking' the lvalue resources; the lvalue is reset to T.init (blit,
> not assignment!) afterwards
Makes sense, though if the compiler can determine that something is a last use, it can optimize out the address change.
> * will be complete with move ctor; syntax needs to be decided, but
> signature is `(ref T)` (yes, must be an explicit ref)
I can see either idea work here. What is most important is that it is in fact treated as a constructor. I guess the benefit of `this(S)` is uniformity with `this(ref S)`, and the benefit of `=this(ref S)` or `opMove(ref S)` is that it is obvious that the destructor will be called by the caller, potentially much later.
>    * allows to opt out of the default blit (memcpy struct payload),
> e.g., to fix up interior pointers
>    * move ctor interop with C++ should be doable (just getting the
> extern(C++) mangle right)
>    * problem: handle/avoid all compiler-implicit moves/blits (would have
> to call move ctor and dtor now; emplace FTW!)
> * would be nice as intrinsic:
>    * not to have to import `core.lifetime` everywhere and end up with
> complicated template bloat for a basically trivial operation
>    * potential optimization: elide lvalue reset to T.init and its
> destruction iff:
>      * it is a local (can skip destruction)
>      * and not used after the move
>      * and the destruction of T.init is a noop (modulo mods to the
> struct's own payload), so its elision not observable
> ...
Well, as I alluded to earlier, I think in such cases the object should just keep its original address and the move constructor does not need to be called at all. It reduces to a safe version of `__rvalue` in this case.

 
> forward must become an intrinsic:
> * for vars with `ref` storage class: as-is, yields the original lvalue
> * non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no move,
> and accordingly no destruction after forwarding (because the rvalue will
> already be destructed earlier)
>    * only valid for locals (incl. params), the destruction of other
> lvalues cannot be skipped
>    * invalid/undefined to access the original lvalue after forwarding it
> (has been destructed already)
I think it would be better to do a `move`, where the `move` will usually be optimized to a safe `__rvalue` as above. I think unsafe `__rvalue` should be possible, but not `@safe`.
>    * probably only valid:
>      * as function call argument expressions (glue layer needs to treat
> it like a frontend-generated temporary, passing it directly by ref)
>      * as assignment right-hand-sides, for move-assign
> (`dst = forward!src;` => `dst.opAssign(forward!src);`)
>      * as return expressions, for move-constructions (but prefer NRVO if
> possible, for direct emplace)
> * probably needs to keep template syntax (`forward!x`, not `forward(x)`)
> for backwards compatibility with druntime template
 
> ...
 

 
> This would make the compiler automatically `forward` suited lvalues. In
> the example, we wouldn't have to use a single explicit `forward` in the
> `barN` trampolines, *and* the copy-construction of the return value in
> the non-ref version of `bar4` would be optimized to a move-construction
> (`return forward!s`).
Sounds good, but I think simple cases like this one should be a priority. Even if there is no data-flow analysis as advanced as the one proposed in DIP1040, I think it is important that there is no copy in `bar4`.
Oct 13