digitalmars.D - Choosing an Approach for the Template Lowering of _d

digitalmars.D - Choosing an Approach for the Template Lowering of _d_arrayctor

Teodor Dutu (99/99) Nov 25 2021 Hi,

Stanislav Blinov (4/7) Nov 25 2021 Hm... if you're going to have a pointer parameter, why not

Teodor Dutu (13/21) Nov 25 2021 The first iteration of `_d_arrayctor` was like this, as

Stanislav Blinov (10/31) Nov 25 2021 If you take the first parameter by reference, that shouldn't

Paul Backus (6/15) Nov 25 2021 Unfortunately it is impossible to implement `_d_arrayctor` with

Stanislav Blinov (23/41) Nov 25 2021 Yikes. But then this goes the same for the "hack" version of the

Stanislav Blinov (5/7) Nov 25 2021 ...pff, yeah, of course, and then we run into a segfault if it's

russhy (2/2) Nov 25 2021 Very nice post, with benchmark and graph to visualize! thanks for

Teodor Dutu (10/25) Nov 25 2021 Actually, I used void initialisation because `T.init` couldn't be

Stanislav Blinov (8/33) Nov 25 2021 I'm just not expressing myself clearly :) This won't be a compile

kinke (5/10) Nov 25 2021 Note that this is exactly the current behavior of

Stanislav Blinov (2/12) Nov 26 2021 https://issues.dlang.org/show_bug.cgi?id=22547

kinke (11/16) Nov 26 2021 Thanks for filing. What I wanted to get at is that there seems to

Stanislav Blinov (9/26) Nov 27 2021 I've left some thoughts on that in the bug tracker.

rikki cattermole (6/6) Nov 25 2021 I would look at placing an out or ref on one of the two parameters and

Teodor Dutu <teodor.dutu gmail.com> writes:

Hi,

As part of SAoC 2021, I changed the lowering of `_d_arrayctor` 
from using the [runtime 
hook](https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/rt/arra
assign.d#L170-L201) to a template, via theses PRs:
- https://github.com/dlang/dmd/pull/13116
- https://github.com/dlang/druntime/pull/3627

We'll call the former approach the **hook** approach and the 
latter, the **template** approach.

This template approach required me to declare the returned array 
inside a union. This was needed because by declaring this array 
directly, the compiler was inserting a call to `__ArrayDtor` 
before exiting the function. This call was in contradiction with 
the error handling already performed by `_d_arrayctor`. I gave a 
few more details about this issue in [this 
post](https://forum.dlang.org/post/ukhovyoowgzjeaapyuvd forum.dlang.org).
However, this came with the drawback of not using NRVO, because of the [cast at
end of the function](https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/internal/array/
onstruction.d#L88). This additional copy causes a performance decrease,
compared to the hook approach, as I'm going to show later in this post.

A third approach, which I call the **hack** approach, is very 
close to the one using the template. The difference is that the 
hack adds a third, unused, pointer-type parameter to 
`_d_arrayctor`, in order to change the function's strong purity 
to weak purity. The reasons for this unorthodox approach are 
better explained in [this 
post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org), and
the PR for it is [here](https://github.com/dlang/druntime/pull/3587). Due to
this hack, there is no need to declare the returned array inside the scope of
`_d_arrayctor`, which also removes the need for the union trick. As a result,
no additional copying is performed and this approach has increased performance
when compared to the hook approach.

In order to measure the performance of each lowering, I designed 
[this](https://gist.github.com/teodutu/c6b01561e8eb0d12f9fc7ce11b519d9a)
benchmark, which I compiled with dmd and ran, using all 3 approaches. It
measures the time taken by the call to `_d_arrayctor` for 3 struct sizes: small
(empty), medium (64 bytes) and large (256 bytes), as well as 3 array lengths:
small (1 element), medium (64 elements) and large (256 elements). 256 elements
is a large enough array, because the lowering to `_d_arrayctor` is only
performed for static arrays, which are unlikely to be much larger. The
numerical results can be seen below:
```
Running benchmark on branch master (_d_arrayctor as a hook):
_0B_Struct_1Elem   1000000 runs: average time = 13ms; std dev = 
9.53674e-06
_0B_Struct_64Elems   1000000 runs: average time = 280.4ms; std 
dev = 0.489898
_0B_Struct_256Elems   1000000 runs: average time = 1062.18ms; std 
dev = 0.433128
_64B_Struct_1Elem   1000000 runs: average time = 13.01ms; std dev 
= 0.0994988
_64B_Struct_64Elems   1000000 runs: average time = 309.99ms; std 
dev = 0.0994992
_64B_Struct_256Elems   1000000 runs: average time = 1194.11ms; 
std dev = 0.31289
_256B_Struct_1Elem   1000000 runs: average time = 15ms; std dev = 
1.43051e-05
_256B_Struct_64Elems   1000000 runs: average time = 482.23ms; std 
dev = 1.73698
_256B_Struct_256Elems   1000000 runs: average time = 2960.72ms; 
std dev = 1.20067

Running benchmark with _d_arrayctor as a template:
0B_Struct_1Elem   1000000 runs: average time = 9.6934ms; std dev 
= 0.019658
0B_Struct_64Elems   1000000 runs: average time = 219.086ms; std 
dev = 0.467551
0B_Struct_256Elems   1000000 runs: average time = 820.951ms; std 
dev = 1.88878
64B_Struct_1Elem   1000000 runs: average time = 15.6967ms; std 
dev = 0.046066
64B_Struct_64Elems   1000000 runs: average time = 287.54ms; std 
dev = 0.979795
64B_Struct_256Elems   1000000 runs: average time = 1177.63ms; std 
dev = 1.06447
256B_Struct_1Elem   1000000 runs: average time = 20.8763ms; std 
dev = 0.259867
256B_Struct_64Elems   1000000 runs: average time = 981.405ms; std 
dev = 0.67177
256B_Struct_256Elems   1000000 runs: average time = 4029.59ms; 
std dev = 0.511771

Running benchmark with _d_arrayctor as a template (with the 
hack-y 3rd parameter):
_0B_Struct_1Elem   1000000 runs: average time = 9.00001ms; std 
dev = 6.67572e-06
_0B_Struct_64Elems   1000000 runs: average time = 183.94ms; std 
dev = 0.525738
_0B_Struct_256Elems   1000000 runs: average time = 689.091ms; std 
dev = 0.286183
_64B_Struct_1Elem   1000000 runs: average time = 10.01ms; std dev 
= 0.0994987
_64B_Struct_64Elems   1000000 runs: average time = 227.23ms; std 
dev = 0.645833
_64B_Struct_256Elems   1000000 runs: average time = 925.25ms; std 
dev = 0.766486
_256B_Struct_1Elem   1000000 runs: average time = 11.07ms; std 
dev = 0.255147
_256B_Struct_64Elems   1000000 runs: average time = 417.92ms; std 
dev = 0.271293
_256B_Struct_256Elems   1000000 runs: average time = 2576.1ms; 
std dev = 8.36003
```

In order to better visualise the numbers above, I also plotted 
the bar charts below:

![_d_arrayctor Performance 
Comparison](https://i.imgur.com/s9JjXSi.png)

As you can see, the hack is faster than the hook. Its running 
times are about 15-25% lower than those of the hook for larger 
structs (64 and 256 bytes) and as much as 35% lower for the empty 
struct. On the other hand, the template approach, while being 
20-25% faster than the hook when called on the empty struct, is 
as much as 35% slower than the hook, on larger (256-byte) structs.

Due to this discrepancy in performance, I think the best approach 
is the hack, despite adding an unused parameter to 
`_d_arrayctor`. What do you think?

Thanks,
Teodor

Nov 25 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:

 A third approach, which I call the **hack** approach, is very 
 close to the one using the template. The difference is that the 
 hack adds a third, unused, pointer-type parameter...

Hm... if you're going to have a pointer parameter, why not 
instead simply keep the two parameter variant but pass the first 
parameter by reference?

Nov 25 2021

Teodor Dutu <teodor.dutu gmail.com> writes:

On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov 
wrote:
 On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu 
 wrote:

 A third approach, which I call the **hack** approach, is very 
 close to the one using the template. The difference is that 
 the hack adds a third, unused, pointer-type parameter...

 Hm... if you're going to have a pointer parameter, why not 
 instead simply keep the two parameter variant but pass the 
 first parameter by reference?

The first iteration of `_d_arrayctor` was like this, as 
implemented by [this 
PR](https://github.com/dlang/druntime/pull/2655). The problem 
with it was that it made the compiler issue some warnings when 
testing phobos, because the lowering of something like `S[3] dst 
= src;` looked like this: `_d_arrayctor(dst[], src[])`. 
`_d_arrayctor` was strongly pure and, since the return value of 
the function was ignored, the compiler deemed the whole function 
call useless.

I went more in-depth about this problem in [this 
post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org).
Following this, I tried to change `_d_arrayctor` to declare the returned array
inside its own scope, hoping to make use of NRVO, but it wasn't the case,
because I had to resort to the union trick.

Nov 25 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Thursday, 25 November 2021 at 14:35:56 UTC, Teodor Dutu wrote:
 On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov 
 wrote:
 On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu 
 wrote:

 A third approach, which I call the **hack** approach, is very 
 close to the one using the template. The difference is that 
 the hack adds a third, unused, pointer-type parameter...

 Hm... if you're going to have a pointer parameter, why not 
 instead simply keep the two parameter variant but pass the 
 first parameter by reference?

 The first iteration of `_d_arrayctor` was like this, as 
 implemented by [this 
 PR](https://github.com/dlang/druntime/pull/2655). The problem 
 with it was that it made the compiler issue some warnings when 
 testing phobos, because the lowering of something like `S[3] 
 dst = src;` looked like this: `_d_arrayctor(dst[], src[])`. 
 `_d_arrayctor` was strongly pure and, since the return value of 
 the function was ignored, the compiler deemed the whole 
 function call useless.

If you take the first parameter by reference, that shouldn't 
happen. I.e. in

```d
void _d_arrayctor(A,B)(return ref A to, scope B from)
if (is(A : X[]) && is (B : Y[]) && 
is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) 
{ /* ... */ }
```

`A` would be a static array, taken by reference.

Nov 25 2021

Paul Backus <snarwin gmail.com> writes:

On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov 
wrote:
 If you take the first parameter by reference, that shouldn't 
 happen. I.e. in

 ```d
 void _d_arrayctor(A,B)(return ref A to, scope B from)
 if (is(A : X[]) && is (B : Y[]) && 
 is(immutable(typeof(A.init[0])) == 
 immutable(typeof(B.init[0])))) { /* ... */ }
 ```

 `A` would be a static array, taken by reference.

Unfortunately it is impossible to implement `_d_arrayctor` with 
this signature without invoking undefined behavior. See the 
discussion in this thread for details: 
https://forum.dlang.org/thread/simesvkancmscrtsciwq forum.dlang.org

Nov 25 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Thursday, 25 November 2021 at 15:10:10 UTC, Paul Backus wrote:
 On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov 
 wrote:
 If you take the first parameter by reference, that shouldn't 
 happen. I.e. in

 ```d
 void _d_arrayctor(A,B)(return ref A to, scope B from)
 if (is(A : X[]) && is (B : Y[]) && 
 is(immutable(typeof(A.init[0])) == 
 immutable(typeof(B.init[0])))) { /* ... */ }
 ```

 `A` would be a static array, taken by reference.

 Unfortunately it is impossible to implement `_d_arrayctor` with 
 this signature without invoking undefined behavior. See the 
 discussion in this thread for details: 
 https://forum.dlang.org/thread/simesvkancmscrtsciwq forum.dlang.org

Yikes. But then this goes the same for the "hack" version of the 
OP, does it not?..

What can be done then? If I understand correctly, the union 
approach was arrived at due to void initialization, as in case of 
exception array of garbage data is getting destructed. But then, 
couldn't one just write a T.init into the whole array if an 
exception is thrown, and not rely on a union at all?..

```d
A _d_arrayctor(A,B)(B from)
if (appropriateContraintGoesHere)
{
     Unqual!(typeof(A.init[0]))[A.length] result = void;
     version (_D_BetterC) {}
     else scope(failure)
         foreach (ref it; result)
             emplaceInitializer(it); // this is nothrow
     copyEmplaceRange(from, result[]);
     return result;
}
```

though there's the question of calling appropriate copy ctors, 
"thanks" to all the qualifier combinations.

Nov 25 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
wrote:

         foreach (ref it; result)
             emplaceInitializer(it); // this is nothrow

...pff, yeah, of course, and then we run into a segfault if it's 
an array of nested structs, which was the problem "solved" with 
void initialization in the first place. Not fun >:-E

Nov 25 2021

russhy <russhy gmail.com> writes:

Very nice post, with benchmark and graph to visualize! thanks for 
the effort!

Nov 25 2021

Teodor Dutu <teodor.dutu gmail.com> writes:

On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
wrote:
 What can be done then? If I understand correctly, the union 
 approach was arrived at due to void initialization, as in case 
 of exception array of garbage data is getting destructed. But 
 then, couldn't one just write a T.init into the whole array if 
 an exception is thrown, and not rely on a union at all?..

Actually, I used void initialisation because `T.init` couldn't be 
used when T is a nested struct, because of the context pointer, 
which is unavailable in the scope of `_d_arrayctor`.

On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
wrote:
 
```
    foreach (ref it; result)
        emplaceInitializer(it); // this is nothrow
```

 ...pff, yeah, of course, and then we run into a segfault if 
 it's an array of nested
 structs, which was the problem "solved" with void 
 initialization in the first place.
 Not fun >:-E

I think you've already touched on what I said above, here. It's 
just that there wouldn't be a seg fault, but a compiler error, 
because of the aforementioned context pointer.

Nov 25 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Thursday, 25 November 2021 at 18:50:28 UTC, Teodor Dutu wrote:
 On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
 wrote:
 What can be done then? If I understand correctly, the union 
 approach was arrived at due to void initialization, as in case 
 of exception array of garbage data is getting destructed. But 
 then, couldn't one just write a T.init into the whole array if 
 an exception is thrown, and not rely on a union at all?..

 Actually, I used void initialisation because `T.init` couldn't 
 be used when T is a nested struct, because of the context 
 pointer, which is unavailable in the scope of `_d_arrayctor`.

I know, I was the one that suggested void initialization :D

 On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov 
 wrote:
 
```
    foreach (ref it; result)
        emplaceInitializer(it); // this is nothrow
```

 ...pff, yeah, of course, and then we run into a segfault if 
 it's an array of nested
 structs, which was the problem "solved" with void 
 initialization in the first place.
 Not fun >:-E

 I think you've already touched on what I said above, here. It's 
 just that there wouldn't be a seg fault, but a compiler error, 
 because of the aforementioned context pointer.

I'm just not expressing myself clearly :) This won't be a compile 
error. The `emplaceInitializer` above would plop a null context 
pointer into `it`, which, if the destructor actually accesses 
context, naturally would lead to a segfault in __ArrayDtor.

This can be avoided though, if you leech the context pointer from 
the elements of `from` when emplacing the initializer.

Nov 25 2021

kinke <noone nowhere.com> writes:

On Thursday, 25 November 2021 at 19:58:11 UTC, Stanislav Blinov 
wrote:
 I'm just not expressing myself clearly :) This won't be a 
 compile error. The `emplaceInitializer` above would plop a null 
 context pointer into `it`, which, if the destructor actually 
 accesses context, naturally would lead to a segfault in 
 __ArrayDtor.

Note that this is exactly the current behavior of 
intimately-related .dup for non-POD slices: 
https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627

Nov 25 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:
 On Thursday, 25 November 2021 at 19:58:11 UTC, Stanislav Blinov 
 wrote:
 I'm just not expressing myself clearly :) This won't be a 
 compile error. The `emplaceInitializer` above would plop a 
 null context pointer into `it`, which, if the destructor 
 actually accesses context, naturally would lead to a segfault 
 in __ArrayDtor.

 Note that this is exactly the current behavior of 
 intimately-related .dup for non-POD slices: 
 https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627

https://issues.dlang.org/show_bug.cgi?id=22547

Nov 26 2021

kinke <noone nowhere.com> writes:

On Friday, 26 November 2021 at 11:59:37 UTC, Stanislav Blinov 
wrote:
 On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:
 Note that this is exactly the current behavior of 
 intimately-related .dup for non-POD slices: 
 https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627

 https://issues.dlang.org/show_bug.cgi?id=22547

Thanks for filing. What I wanted to get at is that there seems to 
be a need for a `copyEmplaceArray(S, T)(S[] source, T[] target)`, 
which can be used for _d_arrayctor, dup and copyEmplace (for 
static arrays). copyEmplace for static arrays currently handles 
throwing copies in yet another way 
(https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/life
ime.d#L1266-L1285), but that's already filed:
https://issues.dlang.org/show_bug.cgi?id=22177

With copyEmplaceArray, _d_arrayctor could keep on 
stack-allocating the static array result and returning it via 
NRVO.

Nov 26 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Friday, 26 November 2021 at 14:58:04 UTC, kinke wrote:
 On Friday, 26 November 2021 at 11:59:37 UTC, Stanislav Blinov 
 wrote:
 On Friday, 26 November 2021 at 02:42:23 UTC, kinke wrote:
 Note that this is exactly the current behavior of 
 intimately-related .dup for non-POD slices: 
 https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/object.d#L3601-L3627

 https://issues.dlang.org/show_bug.cgi?id=22547

 Thanks for filing. What I wanted to get at is that there seems 
 to be a need for a `copyEmplaceArray(S, T)(S[] source, T[] 
 target)`, which can be used for _d_arrayctor, dup and 
 copyEmplace (for static arrays). copyEmplace for static arrays 
 currently handles throwing copies in yet another way 
 (https://github.com/dlang/druntime/blob/595707b1ac8a439b2a7243b0abf95d4bc56239ff/src/core/life
ime.d#L1266-L1285), but that's already filed:
https://issues.dlang.org/show_bug.cgi?id=22177

 With copyEmplaceArray, _d_arrayctor could keep on 
 stack-allocating the static array result and returning it via 
 NRVO.

I've left some thoughts on that in the bug tracker.

As for making a generic utility - absolutely agree. However, it 
does require an upgrade for `copyEmplace` - it __has__ to be made 
to work in CTFE, which it currently doesn't due to the way it 
blits. If `_d_arrayctor` is to be made a lowering for array 
initialization, **and** involve `copyEmplace`, we need it to work 
at compile time. `.dup` can (and does) work around it trivially, 
but `_d_arrayctor` wouldn't be able to.

Nov 27 2021

rikki cattermole <rikki cattermole.co.nz> writes:

I would look at placing an out or ref on one of the two parameters and 
rewrite the call slightly, rather than a new parameter.

F[] from = [...], to;
to._d_arrayctor(form);

To something more like that.

But the benchmarks will say weather this works or not.

Nov 25 2021

D Programming

C/C++ Programming

Other

digitalmars.D - Choosing an Approach for the Template Lowering of _d_arrayctor