www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Choosing an Approach for the Template Lowering of _d_arrayctor

reply Teodor Dutu <teodor.dutu gmail.com> writes:
Hi,

As part of SAoC 2021, I changed the lowering of `_d_arrayctor` 
from using the [runtime 
hook](https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/rt/arra
assign.d#L170-L201) to a template, via theses PRs:
- https://github.com/dlang/dmd/pull/13116
- https://github.com/dlang/druntime/pull/3627

We'll call the former approach the **hook** approach and the 
latter, the **template** approach.

This template approach required me to declare the returned array 
inside a union. This was needed because by declaring this array 
directly, the compiler was inserting a call to `__ArrayDtor` 
before exiting the function. This call was in contradiction with 
the error handling already performed by `_d_arrayctor`. I gave a 
few more details about this issue in [this 
post](https://forum.dlang.org/post/ukhovyoowgzjeaapyuvd forum.dlang.org).
However, this came with the drawback of not using NRVO, because of the [cast at
end of the function](https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/internal/array/
onstruction.d#L88). This additional copy causes a performance decrease,
compared to the hook approach, as I'm going to show later in this post.

A third approach, which I call the **hack** approach, is very 
close to the one using the template. The difference is that the 
hack adds a third, unused, pointer-type parameter to 
`_d_arrayctor`, in order to change the function's strong purity 
to weak purity. The reasons for this unorthodox approach are 
better explained in [this 
post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org), and
the PR for it is [here](https://github.com/dlang/druntime/pull/3587). Due to
this hack, there is no need to declare the returned array inside the scope of
`_d_arrayctor`, which also removes the need for the union trick. As a result,
no additional copying is performed and this approach has increased performance
when compared to the hook approach.

In order to measure the performance of each lowering, I designed 
[this](https://gist.github.com/teodutu/c6b01561e8eb0d12f9fc7ce11b519d9a)
benchmark, which I compiled with dmd and ran, using all 3 approaches. It
measures the time taken by the call to `_d_arrayctor` for 3 struct sizes: small
(empty), medium (64 bytes) and large (256 bytes), as well as 3 array lengths:
small (1 element), medium (64 elements) and large (256 elements). 256 elements
is a large enough array, because the lowering to `_d_arrayctor` is only
performed for static arrays, which are unlikely to be much larger. The
numerical results can be seen below:
```
Running benchmark on branch master (_d_arrayctor as a hook):
_0B_Struct_1Elem   1000000 runs: average time = 13ms; std dev = 
9.53674e-06
_0B_Struct_64Elems   1000000 runs: average time = 280.4ms; std 
dev = 0.489898
_0B_Struct_256Elems   1000000 runs: average time = 1062.18ms; std 
dev = 0.433128
_64B_Struct_1Elem   1000000 runs: average time = 13.01ms; std dev 
= 0.0994988
_64B_Struct_64Elems   1000000 runs: average time = 309.99ms; std 
dev = 0.0994992
_64B_Struct_256Elems   1000000 runs: average time = 1194.11ms; 
std dev = 0.31289
_256B_Struct_1Elem   1000000 runs: average time = 15ms; std dev = 
1.43051e-05
_256B_Struct_64Elems   1000000 runs: average time = 482.23ms; std 
dev = 1.73698
_256B_Struct_256Elems   1000000 runs: average time = 2960.72ms; 
std dev = 1.20067

Running benchmark with _d_arrayctor as a template:
0B_Struct_1Elem   1000000 runs: average time = 9.6934ms; std dev 
= 0.019658
0B_Struct_64Elems   1000000 runs: average time = 219.086ms; std 
dev = 0.467551
0B_Struct_256Elems   1000000 runs: average time = 820.951ms; std 
dev = 1.88878
64B_Struct_1Elem   1000000 runs: average time = 15.6967ms; std 
dev = 0.046066
64B_Struct_64Elems   1000000 runs: average time = 287.54ms; std 
dev = 0.979795
64B_Struct_256Elems   1000000 runs: average time = 1177.63ms; std 
dev = 1.06447
256B_Struct_1Elem   1000000 runs: average time = 20.8763ms; std 
dev = 0.259867
256B_Struct_64Elems   1000000 runs: average time = 981.405ms; std 
dev = 0.67177
256B_Struct_256Elems   1000000 runs: average time = 4029.59ms; 
std dev = 0.511771

Running benchmark with _d_arrayctor as a template (with the 
hack-y 3rd parameter):
_0B_Struct_1Elem   1000000 runs: average time = 9.00001ms; std 
dev = 6.67572e-06
_0B_Struct_64Elems   1000000 runs: average time = 183.94ms; std 
dev = 0.525738
_0B_Struct_256Elems   1000000 runs: average time = 689.091ms; std 
dev = 0.286183
_64B_Struct_1Elem   1000000 runs: average time = 10.01ms; std dev 
= 0.0994987
_64B_Struct_64Elems   1000000 runs: average time = 227.23ms; std 
dev = 0.645833
_64B_Struct_256Elems   1000000 runs: average time = 925.25ms; std 
dev = 0.766486
_256B_Struct_1Elem   1000000 runs: average time = 11.07ms; std 
dev = 0.255147
_256B_Struct_64Elems   1000000 runs: average time = 417.92ms; std 
dev = 0.271293
_256B_Struct_256Elems   1000000 runs: average time = 2576.1ms; 
std dev = 8.36003
```

In order to better visualise the numbers above, I also plotted 
the bar charts below:

![_d_arrayctor Performance 
Comparison](https://i.imgur.com/s9JjXSi.png)

As you can see, the hack is faster than the hook. Its running 
times are about 15-25% lower than those of the hook for larger 
structs (64 and 256 bytes) and as much as 35% lower for the empty 
struct. On the other hand, the template approach, while being 
20-25% faster than the hook when called on the empty struct, is 
as much as 35% slower than the hook, on larger (256-byte) structs.

Due to this discrepancy in performance, I think the best approach 
is the hack, despite adding an unused parameter to 
`_d_arrayctor`. What do you think?

Thanks,
Teodor
Nov 25 2021
next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:

 A third approach, which I call the **hack** approach, is very 
 close to the one using the template. The difference is that the 
 hack adds a third, unused, pointer-type parameter...
Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?
Nov 25 2021
parent reply Teodor Dutu <teodor.dutu gmail.com> writes:
On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov 
wrote:
 On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu 
 wrote:

 A third approach, which I call the **hack** approach, is very 
 close to the one using the template. The difference is that 
 the hack adds a third, unused, pointer-type parameter...
Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?
The first iteration of `_d_arrayctor` was like this, as implemented by [this PR](https://github.com/dlang/druntime/pull/2655). The problem with it was that it made the compiler issue some warnings when testing phobos, because the lowering of something like `S[3] dst = src;` looked like this: `_d_arrayctor(dst[], src[])`. `_d_arrayctor` was strongly pure and, since the return value of the function was ignored, the compiler deemed the whole function call useless. I went more in-depth about this problem in [this post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org). Following this, I tried to change `_d_arrayctor` to declare the returned array inside its own scope, hoping to make use of NRVO, but it wasn't the case, because I had to resort to the union trick.
Nov 25 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 25 November 2021 at 14:35:56 UTC, Teodor Dutu wrote:
 On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov 
 wrote:
 On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu 
 wrote:

 A third approach, which I call the **hack** approach, is very 
 close to the one using the template. The difference is that 
 the hack adds a third, unused, pointer-type parameter...
Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?
The first iteration of `_d_arrayctor` was like this, as implemented by [this PR](https://github.com/dlang/druntime/pull/2655). The problem with it was that it made the compiler issue some warnings when testing phobos, because the lowering of something like `S[3] dst = src;` looked like this: `_d_arrayctor(dst[], src[])`. `_d_arrayctor` was strongly pure and, since the return value of the function was ignored, the compiler deemed the whole function call useless.
If you take the first parameter by reference, that shouldn't happen. I.e. in ```d void _d_arrayctor(A,B)(return ref A to, scope B from) if (is(A : X[]) && is (B : Y[]) && is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) { /* ... */ } ``` `A` would be a static array, taken by reference.
Nov 25 2021
parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov 
wrote:
 If you take the first parameter by reference, that shouldn't 
 happen. I.e. in

 ```d
 void _d_arrayctor(A,B)(return ref A to, scope B from)
 if (is(A : X[]) && is (B : Y[]) && 
 is(immutable(typeof(A.init[0])) == 
 immutable(typeof(B.init[0])))) { /* ... */ }
 ```

 `A` would be a static array, taken by reference.
Unfortunately it is impossible to implement `_d_arrayctor` with this signature without invoking undefined behavior. See the discussion in this thread for details: https://forum.dlang.org/thread/simesvkancmscrtsciwq forum.dlang.org
Nov 25 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 25 November 2021 at 15:10:10 UTC, Paul Backus wrote:
 On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov 
 wrote:
 If you take the first parameter by reference, that shouldn't 
 happen. I.e. in

 ```d
 void _d_arrayctor(A,B)(return ref A to, scope B from)
 if (is(A : X[]) && is (B : Y[]) && 
 is(immutable(typeof(A.init[0])) == 
 immutable(typeof(B.init[0])))) { /* ... */ }
 ```

 `A` would be a static array, taken by reference.
Unfortunately it is impossible to implement `_d_arrayctor` with this signature without invoking undefined behavior. See the discussion in this thread for details: https://forum.dlang.org/thread/simesvkancmscrtsciwq forum.dlang.org
Yikes. But then this goes the same for the "hack" version of the OP, does it not?.. What can be done then? If I understand correctly, the union approach was arrived at due to void initialization, as in case of exception array of garbage data is getting destructed. But then, couldn't one just write a T.init into the whole array if an exception is thrown, and not rely on a union at all?.. ```d A _d_arrayctor(A,B)(B from) if (appropriateContraintGoesHere) { Unqual!(typeof(A.init[0]))[A.length] result = void; version (_D_BetterC) {} else scope(failure) foreach (ref it; result) emplaceInitializer(it); // this is nothrow copyEmplaceRange(from, result[]); return result; } ``` though there's the question of calling appropriate copy ctors, "thanks" to all the qualifier combinations.
Nov 25 2021
next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
wrote:

         foreach (ref it; result)
             emplaceInitializer(it); // this is nothrow
...pff, yeah, of course, and then we run into a segfault if it's an array of nested structs, which was the problem "solved" with void initialization in the first place. Not fun >:-E
Nov 25 2021
parent russhy <russhy gmail.com> writes:
Very nice post, with benchmark and graph to visualize! thanks for 
the effort!
Nov 25 2021
prev sibling parent reply Teodor Dutu <teodor.dutu gmail.com> writes:
On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
wrote:
 What can be done then? If I understand correctly, the union 
 approach was arrived at due to void initialization, as in case 
 of exception array of garbage data is getting destructed. But 
 then, couldn't one just write a T.init into the whole array if 
 an exception is thrown, and not rely on a union at all?..
Actually, I used void initialisation because `T.init` couldn't be used when T is a nested struct, because of the context pointer, which is unavailable in the scope of `_d_arrayctor`. On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:
 
```
    foreach (ref it; result)
        emplaceInitializer(it); // this is nothrow
```
...pff, yeah, of course, and then we run into a segfault if it's an array of nested structs, which was the problem "solved" with void initialization in the first place. Not fun >:-E
I think you've already touched on what I said above, here. It's just that there wouldn't be a seg fault, but a compiler error, because of the aforementioned context pointer.
Nov 25 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 25 November 2021 at 18:50:28 UTC, Teodor Dutu wrote:
 On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
 wrote:
 What can be done then? If I understand correctly, the union 
 approach was arrived at due to void initialization, as in case 
 of exception array of garbage data is getting destructed. But 
 then, couldn't one just write a T.init into the whole array if 
 an exception is thrown, and not rely on a union at all?..
Actually, I used void initialisation because `T.init` couldn't be used when T is a nested struct, because of the context pointer, which is unavailable in the scope of `_d_arrayctor`.
I know, I was the one that suggested void initialization :D
 On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
 wrote:
 
```
    foreach (ref it; result)
        emplaceInitializer(it); // this is nothrow
```
...pff, yeah, of course, and then we run into a segfault if it's an array of nested structs, which was the problem "solved" with void initialization in the first place. Not fun >:-E
I think you've already touched on what I said above, here. It's just that there wouldn't be a seg fault, but a compiler error, because of the aforementioned context pointer.
I'm just not expressing myself clearly :) This won't be a compile error. The `emplaceInitializer` above would plop a null context pointer into `it`, which, if the destructor actually accesses context, naturally would lead to a segfault in __ArrayDtor. This can be avoided though, if you leech the context pointer from the elements of `from` when emplacing the initializer.
Nov 25 2021
parent reply kinke <noone nowhere.com> writes:
On Thursday, 25 November 2021 at 19:58:11 UTC, Stanislav Blinov 
wrote:
 I'm just not expressing myself clearly :) This won't be a 
 compile error. The `emplaceInitializer` above would plop a null 
 context pointer into `it`, which, if the destructor actually 
 accesses context, naturally would lead to a segfault in 
 __ArrayDtor.
Note that this is exactly the current behavior of intimately-related .dup for non-POD slices: https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627
Nov 25 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:
 On Thursday, 25 November 2021 at 19:58:11 UTC, Stanislav Blinov 
 wrote:
 I'm just not expressing myself clearly :) This won't be a 
 compile error. The `emplaceInitializer` above would plop a 
 null context pointer into `it`, which, if the destructor 
 actually accesses context, naturally would lead to a segfault 
 in __ArrayDtor.
Note that this is exactly the current behavior of intimately-related .dup for non-POD slices: https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627
https://issues.dlang.org/show_bug.cgi?id=22547
Nov 26 2021
parent reply kinke <noone nowhere.com> writes:
On Friday, 26 November 2021 at 11:59:37 UTC, Stanislav Blinov 
wrote:
 On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:
 Note that this is exactly the current behavior of 
 intimately-related .dup for non-POD slices: 
 https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627
https://issues.dlang.org/show_bug.cgi?id=22547
Thanks for filing. What I wanted to get at is that there seems to be a need for a `copyEmplaceArray(S, T)(S[] source, T[] target)`, which can be used for _d_arrayctor, dup and copyEmplace (for static arrays). copyEmplace for static arrays currently handles throwing copies in yet another way (https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/life ime.d#L1266-L1285), but that's already filed: https://issues.dlang.org/show_bug.cgi?id=22177 With copyEmplaceArray, _d_arrayctor could keep on stack-allocating the static array result and returning it via NRVO.
Nov 26 2021
parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Friday, 26 November 2021 at 14:58:04 UTC, kinke wrote:
 On Friday, 26 November 2021 at 11:59:37 UTC, Stanislav Blinov 
 wrote:
 On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:
 Note that this is exactly the current behavior of 
 intimately-related .dup for non-POD slices: 
 https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627
https://issues.dlang.org/show_bug.cgi?id=22547
Thanks for filing. What I wanted to get at is that there seems to be a need for a `copyEmplaceArray(S, T)(S[] source, T[] target)`, which can be used for _d_arrayctor, dup and copyEmplace (for static arrays). copyEmplace for static arrays currently handles throwing copies in yet another way (https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/life ime.d#L1266-L1285), but that's already filed: https://issues.dlang.org/show_bug.cgi?id=22177 With copyEmplaceArray, _d_arrayctor could keep on stack-allocating the static array result and returning it via NRVO.
I've left some thoughts on that in the bug tracker. As for making a generic utility - absolutely agree. However, it does require an upgrade for `copyEmplace` - it __has__ to be made to work in CTFE, which it currently doesn't due to the way it blits. If `_d_arrayctor` is to be made a lowering for array initialization, **and** involve `copyEmplace`, we need it to work at compile time. `.dup` can (and does) work around it trivially, but `_d_arrayctor` wouldn't be able to.
Nov 27 2021
prev sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
I would look at placing an out or ref on one of the two parameters and 
rewrite the call slightly, rather than a new parameter.

F[] from = [...], to;
to._d_arrayctor(form);

To something more like that.

But the benchmarks will say weather this works or not.
Nov 25 2021