digitalmars.D - Choosing an Approach for the Template Lowering of _d_arrayctor
- Teodor Dutu (99/99) Nov 25 2021 Hi,
- Stanislav Blinov (4/7) Nov 25 2021 Hm... if you're going to have a pointer parameter, why not
- Teodor Dutu (13/21) Nov 25 2021 The first iteration of `_d_arrayctor` was like this, as
- Stanislav Blinov (10/31) Nov 25 2021 If you take the first parameter by reference, that shouldn't
- Paul Backus (6/15) Nov 25 2021 Unfortunately it is impossible to implement `_d_arrayctor` with
- Stanislav Blinov (23/41) Nov 25 2021 Yikes. But then this goes the same for the "hack" version of the
- Stanislav Blinov (5/7) Nov 25 2021 ...pff, yeah, of course, and then we run into a segfault if it's
- russhy (2/2) Nov 25 2021 Very nice post, with benchmark and graph to visualize! thanks for
- Teodor Dutu (10/25) Nov 25 2021 Actually, I used void initialisation because `T.init` couldn't be
- Stanislav Blinov (8/33) Nov 25 2021 I'm just not expressing myself clearly :) This won't be a compile
- kinke (5/10) Nov 25 2021 Note that this is exactly the current behavior of
- Stanislav Blinov (2/12) Nov 26 2021 https://issues.dlang.org/show_bug.cgi?id=22547
- kinke (11/16) Nov 26 2021 Thanks for filing. What I wanted to get at is that there seems to
- Stanislav Blinov (9/26) Nov 27 2021 I've left some thoughts on that in the bug tracker.
- rikki cattermole (6/6) Nov 25 2021 I would look at placing an out or ref on one of the two parameters and
Hi, As part of SAoC 2021, I changed the lowering of `_d_arrayctor` from using the [runtime hook](https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/rt/arra assign.d#L170-L201) to a template, via theses PRs: - https://github.com/dlang/dmd/pull/13116 - https://github.com/dlang/druntime/pull/3627 We'll call the former approach the **hook** approach and the latter, the **template** approach. This template approach required me to declare the returned array inside a union. This was needed because by declaring this array directly, the compiler was inserting a call to `__ArrayDtor` before exiting the function. This call was in contradiction with the error handling already performed by `_d_arrayctor`. I gave a few more details about this issue in [this post](https://forum.dlang.org/post/ukhovyoowgzjeaapyuvd forum.dlang.org). However, this came with the drawback of not using NRVO, because of the [cast at end of the function](https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/internal/array/ onstruction.d#L88). This additional copy causes a performance decrease, compared to the hook approach, as I'm going to show later in this post. A third approach, which I call the **hack** approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter to `_d_arrayctor`, in order to change the function's strong purity to weak purity. The reasons for this unorthodox approach are better explained in [this post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org), and the PR for it is [here](https://github.com/dlang/druntime/pull/3587). Due to this hack, there is no need to declare the returned array inside the scope of `_d_arrayctor`, which also removes the need for the union trick. As a result, no additional copying is performed and this approach has increased performance when compared to the hook approach. In order to measure the performance of each lowering, I designed [this](https://gist.github.com/teodutu/c6b01561e8eb0d12f9fc7ce11b519d9a) benchmark, which I compiled with dmd and ran, using all 3 approaches. It measures the time taken by the call to `_d_arrayctor` for 3 struct sizes: small (empty), medium (64 bytes) and large (256 bytes), as well as 3 array lengths: small (1 element), medium (64 elements) and large (256 elements). 256 elements is a large enough array, because the lowering to `_d_arrayctor` is only performed for static arrays, which are unlikely to be much larger. The numerical results can be seen below: ``` Running benchmark on branch master (_d_arrayctor as a hook): _0B_Struct_1Elem 1000000 runs: average time = 13ms; std dev = 9.53674e-06 _0B_Struct_64Elems 1000000 runs: average time = 280.4ms; std dev = 0.489898 _0B_Struct_256Elems 1000000 runs: average time = 1062.18ms; std dev = 0.433128 _64B_Struct_1Elem 1000000 runs: average time = 13.01ms; std dev = 0.0994988 _64B_Struct_64Elems 1000000 runs: average time = 309.99ms; std dev = 0.0994992 _64B_Struct_256Elems 1000000 runs: average time = 1194.11ms; std dev = 0.31289 _256B_Struct_1Elem 1000000 runs: average time = 15ms; std dev = 1.43051e-05 _256B_Struct_64Elems 1000000 runs: average time = 482.23ms; std dev = 1.73698 _256B_Struct_256Elems 1000000 runs: average time = 2960.72ms; std dev = 1.20067 Running benchmark with _d_arrayctor as a template: 0B_Struct_1Elem 1000000 runs: average time = 9.6934ms; std dev = 0.019658 0B_Struct_64Elems 1000000 runs: average time = 219.086ms; std dev = 0.467551 0B_Struct_256Elems 1000000 runs: average time = 820.951ms; std dev = 1.88878 64B_Struct_1Elem 1000000 runs: average time = 15.6967ms; std dev = 0.046066 64B_Struct_64Elems 1000000 runs: average time = 287.54ms; std dev = 0.979795 64B_Struct_256Elems 1000000 runs: average time = 1177.63ms; std dev = 1.06447 256B_Struct_1Elem 1000000 runs: average time = 20.8763ms; std dev = 0.259867 256B_Struct_64Elems 1000000 runs: average time = 981.405ms; std dev = 0.67177 256B_Struct_256Elems 1000000 runs: average time = 4029.59ms; std dev = 0.511771 Running benchmark with _d_arrayctor as a template (with the hack-y 3rd parameter): _0B_Struct_1Elem 1000000 runs: average time = 9.00001ms; std dev = 6.67572e-06 _0B_Struct_64Elems 1000000 runs: average time = 183.94ms; std dev = 0.525738 _0B_Struct_256Elems 1000000 runs: average time = 689.091ms; std dev = 0.286183 _64B_Struct_1Elem 1000000 runs: average time = 10.01ms; std dev = 0.0994987 _64B_Struct_64Elems 1000000 runs: average time = 227.23ms; std dev = 0.645833 _64B_Struct_256Elems 1000000 runs: average time = 925.25ms; std dev = 0.766486 _256B_Struct_1Elem 1000000 runs: average time = 11.07ms; std dev = 0.255147 _256B_Struct_64Elems 1000000 runs: average time = 417.92ms; std dev = 0.271293 _256B_Struct_256Elems 1000000 runs: average time = 2576.1ms; std dev = 8.36003 ``` In order to better visualise the numbers above, I also plotted the bar charts below: ![_d_arrayctor Performance Comparison](https://i.imgur.com/s9JjXSi.png) As you can see, the hack is faster than the hook. Its running times are about 15-25% lower than those of the hook for larger structs (64 and 256 bytes) and as much as 35% lower for the empty struct. On the other hand, the template approach, while being 20-25% faster than the hook when called on the empty struct, is as much as 35% slower than the hook, on larger (256-byte) structs. Due to this discrepancy in performance, I think the best approach is the hack, despite adding an unused parameter to `_d_arrayctor`. What do you think? Thanks, Teodor
Nov 25 2021
On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:A third approach, which I call the **hack** approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter...Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?
Nov 25 2021
On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov wrote:On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:The first iteration of `_d_arrayctor` was like this, as implemented by [this PR](https://github.com/dlang/druntime/pull/2655). The problem with it was that it made the compiler issue some warnings when testing phobos, because the lowering of something like `S[3] dst = src;` looked like this: `_d_arrayctor(dst[], src[])`. `_d_arrayctor` was strongly pure and, since the return value of the function was ignored, the compiler deemed the whole function call useless. I went more in-depth about this problem in [this post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org). Following this, I tried to change `_d_arrayctor` to declare the returned array inside its own scope, hoping to make use of NRVO, but it wasn't the case, because I had to resort to the union trick.A third approach, which I call the **hack** approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter...Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?
Nov 25 2021
On Thursday, 25 November 2021 at 14:35:56 UTC, Teodor Dutu wrote:On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov wrote:If you take the first parameter by reference, that shouldn't happen. I.e. in ```d void _d_arrayctor(A,B)(return ref A to, scope B from) if (is(A : X[]) && is (B : Y[]) && is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) { /* ... */ } ``` `A` would be a static array, taken by reference.On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:The first iteration of `_d_arrayctor` was like this, as implemented by [this PR](https://github.com/dlang/druntime/pull/2655). The problem with it was that it made the compiler issue some warnings when testing phobos, because the lowering of something like `S[3] dst = src;` looked like this: `_d_arrayctor(dst[], src[])`. `_d_arrayctor` was strongly pure and, since the return value of the function was ignored, the compiler deemed the whole function call useless.A third approach, which I call the **hack** approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter...Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?
Nov 25 2021
On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov wrote:If you take the first parameter by reference, that shouldn't happen. I.e. in ```d void _d_arrayctor(A,B)(return ref A to, scope B from) if (is(A : X[]) && is (B : Y[]) && is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) { /* ... */ } ``` `A` would be a static array, taken by reference.Unfortunately it is impossible to implement `_d_arrayctor` with this signature without invoking undefined behavior. See the discussion in this thread for details: https://forum.dlang.org/thread/simesvkancmscrtsciwq forum.dlang.org
Nov 25 2021
On Thursday, 25 November 2021 at 15:10:10 UTC, Paul Backus wrote:On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov wrote:Yikes. But then this goes the same for the "hack" version of the OP, does it not?.. What can be done then? If I understand correctly, the union approach was arrived at due to void initialization, as in case of exception array of garbage data is getting destructed. But then, couldn't one just write a T.init into the whole array if an exception is thrown, and not rely on a union at all?.. ```d A _d_arrayctor(A,B)(B from) if (appropriateContraintGoesHere) { Unqual!(typeof(A.init[0]))[A.length] result = void; version (_D_BetterC) {} else scope(failure) foreach (ref it; result) emplaceInitializer(it); // this is nothrow copyEmplaceRange(from, result[]); return result; } ``` though there's the question of calling appropriate copy ctors, "thanks" to all the qualifier combinations.If you take the first parameter by reference, that shouldn't happen. I.e. in ```d void _d_arrayctor(A,B)(return ref A to, scope B from) if (is(A : X[]) && is (B : Y[]) && is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) { /* ... */ } ``` `A` would be a static array, taken by reference.Unfortunately it is impossible to implement `_d_arrayctor` with this signature without invoking undefined behavior. See the discussion in this thread for details: https://forum.dlang.org/thread/simesvkancmscrtsciwq forum.dlang.org
Nov 25 2021
On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:foreach (ref it; result) emplaceInitializer(it); // this is nothrow...pff, yeah, of course, and then we run into a segfault if it's an array of nested structs, which was the problem "solved" with void initialization in the first place. Not fun >:-E
Nov 25 2021
Very nice post, with benchmark and graph to visualize! thanks for the effort!
Nov 25 2021
On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:What can be done then? If I understand correctly, the union approach was arrived at due to void initialization, as in case of exception array of garbage data is getting destructed. But then, couldn't one just write a T.init into the whole array if an exception is thrown, and not rely on a union at all?..Actually, I used void initialisation because `T.init` couldn't be used when T is a nested struct, because of the context pointer, which is unavailable in the scope of `_d_arrayctor`. On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:I think you've already touched on what I said above, here. It's just that there wouldn't be a seg fault, but a compiler error, because of the aforementioned context pointer.``` foreach (ref it; result) emplaceInitializer(it); // this is nothrow ```...pff, yeah, of course, and then we run into a segfault if it's an array of nested structs, which was the problem "solved" with void initialization in the first place. Not fun >:-E
Nov 25 2021
On Thursday, 25 November 2021 at 18:50:28 UTC, Teodor Dutu wrote:On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:I know, I was the one that suggested void initialization :DWhat can be done then? If I understand correctly, the union approach was arrived at due to void initialization, as in case of exception array of garbage data is getting destructed. But then, couldn't one just write a T.init into the whole array if an exception is thrown, and not rely on a union at all?..Actually, I used void initialisation because `T.init` couldn't be used when T is a nested struct, because of the context pointer, which is unavailable in the scope of `_d_arrayctor`.On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:I'm just not expressing myself clearly :) This won't be a compile error. The `emplaceInitializer` above would plop a null context pointer into `it`, which, if the destructor actually accesses context, naturally would lead to a segfault in __ArrayDtor. This can be avoided though, if you leech the context pointer from the elements of `from` when emplacing the initializer.I think you've already touched on what I said above, here. It's just that there wouldn't be a seg fault, but a compiler error, because of the aforementioned context pointer.``` foreach (ref it; result) emplaceInitializer(it); // this is nothrow ```...pff, yeah, of course, and then we run into a segfault if it's an array of nested structs, which was the problem "solved" with void initialization in the first place. Not fun >:-E
Nov 25 2021
On Thursday, 25 November 2021 at 19:58:11 UTC, Stanislav Blinov wrote:I'm just not expressing myself clearly :) This won't be a compile error. The `emplaceInitializer` above would plop a null context pointer into `it`, which, if the destructor actually accesses context, naturally would lead to a segfault in __ArrayDtor.Note that this is exactly the current behavior of intimately-related .dup for non-POD slices: https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627
Nov 25 2021
On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:On Thursday, 25 November 2021 at 19:58:11 UTC, Stanislav Blinov wrote:https://issues.dlang.org/show_bug.cgi?id=22547I'm just not expressing myself clearly :) This won't be a compile error. The `emplaceInitializer` above would plop a null context pointer into `it`, which, if the destructor actually accesses context, naturally would lead to a segfault in __ArrayDtor.Note that this is exactly the current behavior of intimately-related .dup for non-POD slices: https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627
Nov 26 2021
On Friday, 26 November 2021 at 11:59:37 UTC, Stanislav Blinov wrote:On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:Thanks for filing. What I wanted to get at is that there seems to be a need for a `copyEmplaceArray(S, T)(S[] source, T[] target)`, which can be used for _d_arrayctor, dup and copyEmplace (for static arrays). copyEmplace for static arrays currently handles throwing copies in yet another way (https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/life ime.d#L1266-L1285), but that's already filed: https://issues.dlang.org/show_bug.cgi?id=22177 With copyEmplaceArray, _d_arrayctor could keep on stack-allocating the static array result and returning it via NRVO.Note that this is exactly the current behavior of intimately-related .dup for non-POD slices: https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627https://issues.dlang.org/show_bug.cgi?id=22547
Nov 26 2021
On Friday, 26 November 2021 at 14:58:04 UTC, kinke wrote:On Friday, 26 November 2021 at 11:59:37 UTC, Stanislav Blinov wrote:I've left some thoughts on that in the bug tracker. As for making a generic utility - absolutely agree. However, it does require an upgrade for `copyEmplace` - it __has__ to be made to work in CTFE, which it currently doesn't due to the way it blits. If `_d_arrayctor` is to be made a lowering for array initialization, **and** involve `copyEmplace`, we need it to work at compile time. `.dup` can (and does) work around it trivially, but `_d_arrayctor` wouldn't be able to.On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:Thanks for filing. What I wanted to get at is that there seems to be a need for a `copyEmplaceArray(S, T)(S[] source, T[] target)`, which can be used for _d_arrayctor, dup and copyEmplace (for static arrays). copyEmplace for static arrays currently handles throwing copies in yet another way (https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/life ime.d#L1266-L1285), but that's already filed: https://issues.dlang.org/show_bug.cgi?id=22177 With copyEmplaceArray, _d_arrayctor could keep on stack-allocating the static array result and returning it via NRVO.Note that this is exactly the current behavior of intimately-related .dup for non-POD slices: https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627https://issues.dlang.org/show_bug.cgi?id=22547
Nov 27 2021
I would look at placing an out or ref on one of the two parameters and rewrite the call slightly, rather than a new parameter. F[] from = [...], to; to._d_arrayctor(form); To something more like that. But the benchmarks will say weather this works or not.
Nov 25 2021