www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Possible bug when instantiating template function with nested struct

reply Teodor Dutu <teodor.dutu gmail.com> writes:
Hi,

As part of my project for SAoC 2021, I am trying to rewrite the 
hook to `_d_arrayctor` to a template function. This work was 
started by [Dan Printzell](https://github.com/Vild/) 2 years ago. 
Dan managed to implement a template version of `_d_arrayctor` in 
druntime and took some steps to switch to this version in dmd as 
well:
- https://github.com/dlang/druntime/pull/2655.
- https://github.com/dlang/dmd/pull/10102

In essence, Dan performs the following lowering:
```d
T[2] a;
T[2] b = a;
// is lowered to:
T[2] a;
T[2] b;
_darrayctor(b[], a[]);
```

After picking up Dan's work, I brought [some 
fixes](https://github.com/dlang/dmd/pull/13116) to his changes to 
dmd, so that the code now passes the tests in dmd and druntime. 
However, a few warnings issued by some tests in phobos revealed a 
flaw in `_d_arrayctor`'s signature. Some time ago, I wrote [a 
forum 
post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org) about
this issue.

It goes something like this: when lowered using `const` or 
`immutable` arguments, `_d_arrayctor` becomes strongly pure. For 
instance, in the code snippet below,
```d
struct S {};
const S[2] b;
const S[2] a = b;
```
the line `const S[2] a = b;` is lowered to `_d_arrayctor(a[], 
b[])`, which is the intended behaviour.

But, since, in this case, `_d_arrayctor` is strongly pure and 
since its return value is ignored, the compiler issues the 
warning in the test mentioned above. In addition, its strong 
purity might also cause calls to `_d_arrayctor` to be removed by 
the compiler as part of the optimisation phase.

My mentors and I first tried changing the type of either `to` or 
the `from` parameters to a mutable `void[]`, but in this case the 
compiler was unable to instantiate the function's template 
correctly. So this solution didn't work.

The only alternative we could initially come up with was to force 
`_d_arrayctor` to be weakly pure instead. We achieved this by 
adding a third, unused, pointer-type parameter, as implemented in 
[this PR](https://github.com/dlang/druntime/pull/3587), which 
changes `_d_arrayctor`'s signature to:
```d
Tarr _d_arrayctor(Tarr : T[], T)(return scope Tarr to, scope Tarr 
from, char* makeWeaklyPure = null)  trusted
```
But this is merely a stop-gap solution, because it acts against 
the language, by denying one of its properties: purity.

Razvan asked around and found a new, more elegant approach: 
change the signature of `_d_arrayctor` so that it creates the 
destination array itself, instead of simply modifying it. This 
will make use of the function's return value, thus removing the 
warnings from phobos that I mentioned above. So the lowering 
would be changed to something like:
```d
T[2] a;
T[2] b = a;
// would be roughly lowered to:
T[2] a;
T[2] b;
b = _darrayctor(a[], b.length);
```
The point of this approach is to make use of NRVO so that the 
contents of `b` are initialised directly inside `_d_arrayctor`, 
whithout having to copy them back to `b` after the function call.

But even this idea ran into trouble, as, by giving `b.length` as 
a function parameter, the constructed array can only be a dynamic 
array. As this lowering is only performed for static arrays, NRVO 
is not possible. The code works, the tests pass, but it's 
inefficient, thus possibly making the entire replacement of the 
runtime hook kind of useless. A working implementation of this 
approach can be found 
[here](https://github.com/teodutu/druntime/blob/38cbc346cbb0142707161b1dce3ab22187c80dad/src/core/internal/array/construction.d).

Then we had several attempts to create the returned array 
statically inside the scope of `_d_arrayctor` in the hope of 
triggering NRVO. The first was to pass the length of the newly 
created array as another template parameter, like so (note that 
this would create new a template instance for every different 
length of the created array and that's a lot of code):
```d
T[] _d_arrayctor(Tarr : T[], T, size_t length)(scope Tarr from) 
 trusted
{
     // ...
     T[length] to;
     // ...
     return to;  // this is line 83
}
```
As you've probably guessed, this code does not compile because it 
returns a reference to a local variable: `to`:
```
src/core/internal/array/construction.d(83): Error: returning 
`cast(S[])to` escapes a reference to local variable `to`
```

So I tried changing the function's signature once again:
```d
Tarr1 _d_arrayctor(Tarr1 : T1[], Tarr2 : T2[], T1, T2)(scope 
Tarr2 from)  trusted
{
     // ...
     Tarr1 to;  // this is line 35
     // ...
     return to;
}
```
This attempt binds `Tarr1` to a static array. In the case of 
[this 
unittest](https://github.com/dlang/druntime/blob/69ba1f900733f9929d3f704c9c66d393806a343b/src/core/internal/array/cons
ruction.d#L84-L99), for example, `Tarr` is bound to `S[4]`, after changing the
call to `_d_arrayctor` to:
```d
arr1 = _d_arrayctor!(typeof(arr1), typeof(arr2))(arr2[]);
```
This should have worked, but instead, the compiler raises a 
rather strange error:
```
src/core/internal/array/construction.d(35): Error: cannot access 
frame pointer of 
`core.internal.array.construction.__unittest_L87_C7.S`
```

The error seems to have something to do with `struct S` which is 
used to instantiate the template. When changing it and `counter` 
to `static`, the unittest passes.

Finally, my question is related to this error. Is this normal 
behaviour? Razvan and I tried to search for this bug and found 
these two issues:
- https://issues.dlang.org/show_bug.cgi?id=8850
- https://issues.dlang.org/show_bug.cgi?id=8863

[This comment](https://issues.dlang.org/show_bug.cgi?id=8863#c2) 
points out that indeed this is a bug. Or, at least, it was back 
in 2012. It was [supposedly 
fixed](https://github.com/dlang/dmd/commit/c6a41c414aa9177aef7dac
ac131addba20abf32), but it still manifests itself.

What could I do at this point?

Thanks,
Teodor
Nov 04 2021
parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Thursday, 4 November 2021 at 17:28:14 UTC, Teodor Dutu wrote:
 Hi,

 As part of my project for SAoC 2021, I am trying to rewrite the 
 hook to `_d_arrayctor` to a template function. This work was 
 started by [Dan Printzell](https://github.com/Vild/) 2 years 
 ago. Dan managed to implement a template version of 
 `_d_arrayctor` in druntime and took some steps to switch to 
 this version in dmd as well:
 - https://github.com/dlang/druntime/pull/2655.
 - https://github.com/dlang/dmd/pull/10102

 In essence, Dan performs the following lowering:
 ```d
 T[2] a;
 T[2] b = a;
 // is lowered to:
 T[2] a;
 T[2] b;
 _darrayctor(b[], a[]);
 ```

 After picking up Dan's work, I brought [some 
 fixes](https://github.com/dlang/dmd/pull/13116) to his changes 
 to dmd, so that the code now passes the tests in dmd and 
 druntime. However, a few warnings issued by some tests in 
 phobos revealed a flaw in `_d_arrayctor`'s signature. Some time 
 ago, I wrote [a forum 
 post](https://forum.dlang.org/post/simesvkancmscrtsciwq forum.dlang.org) about
this issue.
I've always wondered, whats the problem with representing an array through a simple struct? I've seen multiple comments in the forum where people mentioned that an array is basically a struct with one pointer and length. So why not have entire array lower to smth like this: ```d struct Array(T) { private T* store; private size_t length; this(inout ref Array!T array); // ... And rest of methods implementing array behavior } ``` Then compiler would just lower down all T[] expressions to Array!T, and then let druntime implement semantics of the array. This would be much more flexible since you won't need to go inside compiler itself to fix bugs related to array behavior, unless you need to change the struct interface that the compiler potentially will use. Best regards, Alexandru.
Nov 07 2021
next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 7 November 2021 at 12:02:49 UTC, Alexandru Ermicioi 
wrote:

 Then compiler would just lower down all T[] expressions to 
 Array!T, and then let druntime implement semantics of the array.
BetterC?..
Nov 07 2021
parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 7 November 2021 at 12:32:17 UTC, Stanislav Blinov 
wrote:
 On Sunday, 7 November 2021 at 12:02:49 UTC, Alexandru Ermicioi 
 wrote:

 Then compiler would just lower down all T[] expressions to 
 Array!T, and then let druntime implement semantics of the 
 array.
BetterC?..
Are they working properly today though? Per my understanding these funcs mentioned in this thread are also in druntime.
Nov 07 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 7 November 2021 at 12:55:15 UTC, Alexandru Ermicioi 
wrote:
 On Sunday, 7 November 2021 at 12:32:17 UTC, Stanislav Blinov 
 wrote:
 On Sunday, 7 November 2021 at 12:02:49 UTC, Alexandru Ermicioi 
 wrote:

 Then compiler would just lower down all T[] expressions to 
 Array!T, and then let druntime implement semantics of the 
 array.
BetterC?..
Are they working properly today though?
No, they don't. And won't until people stop trying to take the compiler's responsibilities away from it.
 Per my understanding these funcs mentioned in this thread are 
 also in druntime.
Arrays, ostensibly, are first class citizens in D, therefore should be implemented in the language, not a library.
Nov 07 2021
parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 7 November 2021 at 13:28:16 UTC, Stanislav Blinov 
wrote:
 On Sunday, 7 November 2021 at 12:55:15 UTC, Alexandru Ermicioi 
 wrote:
 On Sunday, 7 November 2021 at 12:32:17 UTC, Stanislav Blinov 
 wrote:
 On Sunday, 7 November 2021 at 12:02:49 UTC, Alexandru 
 Ermicioi wrote:

 Then compiler would just lower down all T[] expressions to 
 Array!T, and then let druntime implement semantics of the 
 array.
BetterC?..
Are they working properly today though?
No, they don't. And won't until people stop trying to take the compiler's responsibilities away from it.
 Per my understanding these funcs mentioned in this thread are 
 also in druntime.
Arrays, ostensibly, are first class citizens in D, therefore should be implemented in the language, not a library.
But why does it matter where you put it? Array operations would still get lowered to some code implementing the desired behavior. In the case I've suggested to a struct. This struct could be hard-coded in compiler itself, or put in a separate file (not necessarily druntime). It's just more flexible, and modular, if it is in some file rather hard-coded in compiler itself. As such I don't think the placement of array behavior, does matter to how this behavior is being implemented, be it a couple of funcs, or a struct. The question, is more what are downsides and things preventing at doing it as struct, since imho, a struct would perfectly represent what array is in d: A grouping of pointer and length, with couple of methods to operate upon them.
Nov 07 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 7 November 2021 at 13:59:11 UTC, Alexandru Ermicioi 
wrote:

 Arrays, ostensibly, are first class citizens in D, therefore 
 should be implemented in the language, not a library.
But why does it matter where you put it?
Because this should just compile and work: ```d T[3] a = T(42); ``` for any T that can be thus instantiated. At the moment, whether it does even compile depends: - on what T actually is, - on where this instantiation is taking place, - on your compile options. Which is just ridiculous.
Nov 07 2021
parent reply Teodor Dutu <teodor.dutu gmail.com> writes:
I think the issue of array representation is a serious enough 
topic that it deserves its own separate thread. In addition, this 
issue is outside the scope of initial question, which revolves 
around the bug highlighted here:
- https://issues.dlang.org/show_bug.cgi?id=8850
- https://issues.dlang.org/show_bug.cgi?id=8863

Do you have any suggestions about it?
Nov 07 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 7 November 2021 at 16:07:19 UTC, Teodor Dutu wrote:
 I think the issue of array representation is a serious enough 
 topic that it deserves its own separate thread. In addition, 
 this issue is outside the scope of initial question, which 
 revolves around the bug highlighted here:
 - https://issues.dlang.org/show_bug.cgi?id=8850
 - https://issues.dlang.org/show_bug.cgi?id=8863

 Do you have any suggestions about it?
void-initialize your to-be-returned array in _d_arrayctor? Which BTW should also bring your attention to the fact that such a function can never be trusted, and can only be as safe as the actual ctor calls it would defer to.
Nov 07 2021
parent reply Teodor Dutu <teodor.dutu gmail.com> writes:
On Sunday, 7 November 2021 at 17:01:20 UTC, Stanislav Blinov 
wrote:
 On Sunday, 7 November 2021 at 16:07:19 UTC, Teodor Dutu wrote:
 I think the issue of array representation is a serious enough 
 topic that it deserves its own separate thread. In addition, 
 this issue is outside the scope of initial question, which 
 revolves around the bug highlighted here:
 - https://issues.dlang.org/show_bug.cgi?id=8850
 - https://issues.dlang.org/show_bug.cgi?id=8863

 Do you have any suggestions about it?
void-initialize your to-be-returned array in _d_arrayctor?
Won't this be the same as doing this: ```d T[] to; // to be returned to.length = length; ``` The problem with this approach is that `to` isn't NRVO'd because the lowering to `_d_arrayctor` is only done for static arrays.
 Which BTW should also bring your attention to the fact that 
 such a function can never be  trusted, and can only be as  safe 
 as the actual ctor calls it would defer to.
I used ` trusted` because some unittests that used this lowering were ` safe`. It was a compile-time workaround.
Nov 07 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 7 November 2021 at 17:47:45 UTC, Teodor Dutu wrote:

 void-initialize your to-be-returned array in _d_arrayctor?
Won't this be the same as doing this: ```d T[] to; // to be returned to.length = length; ``` The problem with this approach is that `to` isn't NRVO'd because the lowering to `_d_arrayctor` is only done for static arrays.
??? No, it will not. The problem in your ```d Tarr1 _d_arrayctor(Tarr1 : T1[], Tarr2 : T2[], T1, T2)(scope Tarr2 from) { Tarr1 to; // ... return to; } ``` is the default initialization. S from the unittest is a nested struct, and can't be instantiated outside of scope of its parent. But this should work: ```d // Obviously concrete implementation would be generic here, and copy/blit accordingly. This one explicitly calls copy ctor for illustration purposes only. Tarr1 _d_arrayctor(Tarr1 : T1[], Tarr2 : T2[], T1, T2)(scope Tarr2 from) { // Infer safe-ty from an actual ctor. // Note that is is a nuclear option. Ideally you'd check a trait, // something like isSafeCopyable!(T1, T2), which would need to be defined, // so that no code gets emitted here regardless of build options if (false) { T1* a; T2* b; (*a).__ctor(*b); } return () trusted { Tarr1 to = void; foreach (i, ref it; to) it.__ctor(from[i]); return to; } (); } void main() safe { int counter; struct Nested { this(return ref scope typeof(this) n) { /* ... */ } ~this() { ++counter; } } Nested[3] arr1; Nested[3] arr2 = _d_arrayctor!(typeof(arr1))(arr1); } ```
 I used ` trusted` because some unittests that used this 
 lowering were ` safe`. It was a compile-time workaround.
? It should pass unittests without lying to the compiler, by correctly inferring safe-ty.
Nov 07 2021
parent Teodor Dutu <teodor.dutu gmail.com> writes:
On Sunday, 7 November 2021 at 18:15:53 UTC, Stanislav Blinov 
wrote:
 ??? No, it will not. The problem in your

 ```d
 Tarr1 _d_arrayctor(Tarr1 : T1[], Tarr2 : T2[], T1, T2)(scope 
 Tarr2 from)
 {
     Tarr1 to;
     // ...
     return to;
 }
 ```

 is the default initialization. S from the unittest is a nested 
 struct, and can't be instantiated outside of scope of its 
 parent. But this should work:

 ```d
 [...]
 Tarr1 to = void;
 [...]
 ```
Thanks a lot! It worked.
 I used ` trusted` because some unittests that used this 
 lowering were ` safe`. It was a compile-time workaround.
? It should pass unittests without lying to the compiler, by correctly inferring safe-ty.
Yes, you're right. ` trusted` was more of a stop-gap solution in order to get the code to work in the first place. Now that it does, I should look into a nicer way of having ` safe`-ty inferred. I'll probably follow your advice here, too. Thanks again!
Nov 07 2021
prev sibling parent reply Basile B. <b2.temp gmx.com> writes:
On Sunday, 7 November 2021 at 12:02:49 UTC, Alexandru Ermicioi 
wrote:
 I've always wondered, whats the problem with representing an 
 array through a simple struct?

 I've seen multiple comments in the forum where people mentioned 
 that an array is basically a struct with one pointer and 
 length. So why not have entire array lower to smth like this:

 [...]

 Best regards,
 Alexandru.
- `.length` and `.ptr` becomes functions calls. - things like `.length++` is not really supported, you have to make a `Length` struct for that - implicit conversions, templates constraints, would lead to plenty of templates instances (things like `if (is(T == U[]))` would require to instantiate `U[]` right ?) - dynamic arrays are part of the type system, being `TypeNext` derived they share common methods with pointers and static arrays. - the internal type is still required (or plenty of special cases) so that for example errors message display the type representation correctly and not as lowered. - CTFE must use the new struct template too ? - apparently (recently read that elsewhere) control of the bound checks cannot be based on `version` I think that those arrays, despite of small technic problem to implement them, would work but would slow compilation down (even without -O) and the code generated would be less efficient.
Nov 07 2021
parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 7 November 2021 at 14:08:32 UTC, Basile B. wrote:
 - `.length` and `.ptr` becomes functions calls.
 - things like `.length++` is not really supported, you have to 
 make a `Length` struct for that
That would make sense, since even right now .length is not just a simple number, otherwise, you wouldn't be able change the size of an array (i.e. allocate more memory).
 - implicit conversions, templates constraints, would lead to 
 plenty of templates instances (things like `if (is(T == U[]))` 
 would require to instantiate `U[]` right ?)
No idea about it being instantiated in is template. Are templates instantiated when doing is expression checks? Question to people knowledgeable about this.
 - dynamic arrays are part of the type system, being `TypeNext` 
 derived they share common methods with pointers and static 
 arrays.
That would be a breaking change, if the struct is not treated specially by compiler (i.e. compiler replacing the type).
 - the internal type is still required (or plenty of special 
 cases) so that for example errors message display the type 
 representation correctly and not as lowered.
So perhaps the internal type could then be a wrapper over struct one? Could that solve the issue with internal representation, and perhaps the type system (previous statement)?
 - CTFE must use the new struct template too?
What do you mean by that?
 - apparently (recently read that elsewhere) control of the 
 bound checks cannot be based on `version`
Why can't they?
Nov 07 2021
parent Basile B. <b2.temp gmx.com> writes:
On Sunday, 7 November 2021 at 14:38:14 UTC, Alexandru Ermicioi 
wrote:
 On Sunday, 7 November 2021 at 14:08:32 UTC, Basile B. wrote:
 - `.length` and `.ptr` becomes functions calls.
 - things like `.length++` is not really supported, you have to 
 make a `Length` struct for that
That would make sense, since even right now .length is not just a simple number, otherwise, you wouldn't be able change the size of an array (i.e. allocate more memory).
Actually I had in mind the simple fact to read array properties. Good news, it turns out that this point **is not valid**, in both cases optimizers (must be the common subexpression optim) only read the property once (according to [this](https://godbolt.org/z/E3eTEdWxP)).
 - implicit conversions, templates constraints, would lead to 
 plenty of templates instances (things like `if (is(T == U[]))` 
 would require to instantiate `U[]` right ?)
No idea about it being instantiated in is template. Are templates instantiated when doing is expression checks? Question to people knowledgeable about this.
 - dynamic arrays are part of the type system, being `TypeNext` 
 derived they share common methods with pointers and static 
 arrays.
That would be a breaking change, if the struct is not treated specially by compiler (i.e. compiler replacing the type).
 - the internal type is still required (or plenty of special 
 cases) so that for example errors message display the type 
 representation correctly and not as lowered.
So perhaps the internal type could then be a wrapper over struct one? Could that solve the issue with internal representation, and perhaps the type system (previous statement)?
 - CTFE must use the new struct template too?
What do you mean by that?
I mean it must use the AST of the new templatized struct.
 - apparently (recently read that elsewhere) control of the 
 bound checks cannot be based on `version`
Why can't they?
Because versions are not part of the mangle, so when you link you don't know if what's been generated in a *.o has bound checks activated or not.
Nov 07 2021