
digitalmars.D - How about some __initialize magic?

Stanislav Blinov <stanislav.blinov gmail.com> writes:
D lacks syntax for initializing the uninitialized. We can do this:

```d
T stuff = T(args); // or new T(args);
```

but this?..

```d
T* ptr = allocateForT();
// now what?.. Can't just do *ptr = T(args) - that's an assignment, not initialization!
// is T a struct? A union? A class? An int?.. Is it even a constructor call?..
```

This is, uh, "solved", using library functions - 
`emplaceInitializer`, `emplace`, `copyEmplace`, `moveEmplace`. 
The fact that there are __four__ functions to do this should 
already ring a bell, but if one was to look at how e.g. the 
`emplace` is implemented, there's lots and lots more to it - 
classes or structs? Constructor or no constructor? Postblit? 
Copy?.. And all the delegation... A single call to `emplace` may 
copy the bits around more than once. Talk about initializing a 
static array... Or look at `emplaceInitializer`, which the other 
three all depend upon: it is, currently, built on a hack just to 
avoid blowing up the stack (which is, ostensibly, what the previous 
less hacky hack led to). The upcoming `__traits(initSymbol)` would 
help in removing the hack, but won't help CTFE any. At various 
points of their lives, these things even explicitly called 
`memcpy`, which is just... argh! And some still do 
(`copyEmplace`, I'm looking at you). Call into the CRT to blit an 
8-byte struct? With statically known size and alignment? Just to 
sidestep the type system? Eh??? Much fun for copying arrays!
...And still, none of them would work in CTFE for many types, due 
to various implementation quirks (which include those very calls 
to memcpy, or reinterpret casts). This one could, potentially, be 
solved with more barbed wire and swear words, that is, code, 
but...
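
For readers who have not used these functions, here is a minimal sketch of how three of them are called today. It assumes a recent druntime that exposes `emplace`, `copyEmplace` and `moveEmplace` from `core.lifetime`; `Point`, `allocateForPoint` and `emplaceDemo` are hypothetical names made up for illustration, and `emplace` with no arguments stands in for `emplaceInitializer`.

```d
import core.lifetime : emplace, copyEmplace, moveEmplace;
import core.stdc.stdlib : malloc, free;

// Hypothetical example type, not from the post.
struct Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

// Stand-in for the post's allocateForT(); null checks omitted for brevity.
Point* allocateForPoint()
{
    return cast(Point*) malloc(Point.sizeof);
}

int[4] emplaceDemo()
{
    Point* a = allocateForPoint();
    emplace(a);             // writes Point.init into raw memory

    Point* b = allocateForPoint();
    emplace(b, 1, 2);       // runs the constructor in place

    Point* c = allocateForPoint();
    copyEmplace(*b, *c);    // copy-initializes *c from *b

    Point* d = allocateForPoint();
    moveEmplace(*c, *d);    // move-initializes *d from *c

    int[4] result = [a.x, b.x, b.y, d.x + d.y];

    free(a);
    free(b);
    free(c);
    free(d);
    return result;
}
```

Each call site has to pick the right function for the kind of initialization it wants, which is the duplication the post complains about.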

Thing is, all those functions are re-implementing what the 
compiler can already do, but in a library. Or rather, come very 
close to doing that, but still don't really get there. C++ with 
its library solution does this better!

What if the language specified a "magic" function, called, say, 
`__initialize`, that would just do the right thing (tm)? Given an 
lvalue, it would instruct the compiler to generate code writing 
the initializer, blitting, copying, or calling the appropriate 
constructor with the arguments. And most importantly, it would work 
in CTFE regardless of type, and not require weird dances around 
T.init, dummy types involving extra argument copies, or manual 
fieldwise and elementwise blits (which is what one would have to 
do in order to e.g. make `copyEmplace` CTFE-able).

I.e:

```d
// Write .init
T* raw0 = allocateForT();
// currently - emplaceInitializer(raw0);
(*raw0).__initialize;

// Initialize fields or call constructor, whichever is applicable for T(arg1, arg2)
T* raw1 = allocateForT();
// currently - raw1.emplace(forward!(arg1, arg2));
(*raw1).__initialize(forward!(arg1, arg2));

// Copy
T* raw2 = allocateForT();
// currently - copyEmplace(*raw1, *raw2);
(*raw2).__initialize(*raw1);

// Move
T* raw3 = allocateForT();
// currently - moveEmplace(*raw2, *raw3);
(*raw3).__initialize(move(*raw2));

// Could be called at runtime or during CTFE
auto createArray()
{
    // big array, don't initialize
    const(T)[1000] result = void;
    // exception handling omitted for brevity
    foreach (i, ref it; result)
    {
        // currently - `emplace`, which may fail to compile in CTFE
        it.__initialize(createIthElement(i));
    }
    return result;
}

// CTFE use case:
static auto array = createArray();
```

The wins are obvious - unified syntax, better error messages, 
CTFE support, less library voodoo failing at mimicking the 
compiler. The losses? I don't see any.

Note that I am not talking about yet another library function. 
This would not be a symbol in druntime, this would be compiler 
magic. Having that, `emplaceInitializer`, `emplace` and 
`copyEmplace` could be re-implemented in terms of `__initialize`, 
and eventually deprecated and removed. `moveEmplace` could linger 
until DIP1040 is implemented, tried, and proven. The `move` 
example, verbatim, would be pessimized compared to `moveEmplace` 
due to moving twice, which hopefully DIP1040 could solve.

I'm a bit hesitant to suggest how this should interact with 
`@safe`. On one hand, the established precedent is in `emplace` - 
it infers, and I'm leaning towards that, even though it can 
potentially invalidate existing state. On the other hand, because 
it can indeed invalidate existing state, it should be `@system`. 
But then it would require some additional facility just for 
inference, so it could be called `@trusted` correctly, otherwise 
it'd be useless. And that facility, whatever it is, better not be 
another library reincarnation of all required semantics. For 
example, something like a `__traits(isSafeToInitWith, T, args)`. 
Whichever the approach, it should definitely infer all other 
attributes.

There are undoubtedly other things to consider. For example - 
classes. It would seem prudent for this hypothetical 
`__initialize` to be calling class ctors. On the other hand, a 
reference itself is just a POD, and generic code might indeed 
want to write null as opposed to attempting to call a default 
constructor. Then again, generic code still would have to 
specialize for classes... Thoughts welcome.

What do you think? DIP this, yay or nay? Suggestions?..
Nov 27 2021
kinke <noone nowhere.com> writes:
On Saturday, 27 November 2021 at 21:56:05 UTC, Stanislav Blinov 
wrote:
 [...]
 Upcoming `__traits(initSymbol)` would help in removing the hack,
It's already removed in master.
 but won't help CTFE any. At various points of their lives, 
 these things even explicitly called `memcpy`, which is just... 
 argh! And some still do (`copyEmplace`, I'm looking at you). 
 Call into CRT to blit a 8-byte struct? With statically known 
 size and alignment? Just to sidestep type system? Eh???
1. Most optimizers recognize a memcpy call and its semantics, and try to avoid the lib call accordingly.
2. A slice copy (`target[] = source[]` with e.g. void[]-typed slices) is a memcpy with additional potential checks for matching length and no overlap (with enabled bounds checks IIRC), so memcpy avoids that overhead. It also works with -betterC; e.g., the aforementioned checks are implemented as a druntime helper function for LDC and so not available with -betterC.
3. I haven't checked, but if memcpy is the only real CTFE blocker for emplace at the moment, I guess one option would be extending the CTFE interpreter by a memcpy builtin, in order not to have to further uglify the existing library code.
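
As a rough illustration of the two blit styles compared in point 2, the sketch below copies the same 8-byte struct once through `memcpy` and once through a slice copy. `Pair` and both function names are hypothetical, and `ubyte[]` slices are used in place of `void[]` purely for the example; whether the slice copy performs the extra length and overlap checks depends on the compiler and flags.

```d
import core.stdc.string : memcpy;

// Hypothetical 8-byte struct, standing in for the one mentioned in the quote.
struct Pair { int a; int b; }

// Blit through the C runtime.
Pair blitViaMemcpy(ref const Pair source)
{
    Pair target = void;
    memcpy(&target, &source, Pair.sizeof);
    return target;
}

// Blit through a slice copy over the raw bytes of both objects.
Pair blitViaSliceCopy(ref const Pair source)
{
    Pair target = void;
    ubyte[] dst = (cast(ubyte*) &target)[0 .. Pair.sizeof];
    const(ubyte)[] src = (cast(const(ubyte)*) &source)[0 .. Pair.sizeof];
    dst[] = src[];
    return target;
}
```

Both variants reinterpret the object as untyped bytes, so neither is expected to run in CTFE as written.
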
 What do you think? DIP this, yay or nay? Suggestions?..
I'm not convinced I'm afraid. :) - I've been thinking in the other direction, treating core.lifetime.{move,forward} as builtins for codegen (possibly restricted to function call argument expressions), in order to save work for the optimizer and less bloat for debug builds.
Nov 27 2021
russhy <russhy gmail.com> writes:
I would love to be able to do:


```D

T* t = alloc();

(*t) = .{};

// or better
t.* = .{};

// then we could also go ahead and be able to do like:
t.* = .{ field_a: 1, field_b: 2 }

```


Basically relaxing that rule: 
https://dlang.org/spec/struct.html#static_struct_init


Other languages do that, and i love them

Don't let us stay behind because we refuse to move forward!
Nov 27 2021
Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 28 November 2021 at 03:19:49 UTC, russhy wrote:
 I would love to be able to do:
This is orthogonal to this discussion. Even if concise initializer syntax that you suggest was allowed...
 ```D

 T* t = alloc();

 (*t) = .{};
 ```
...that's an assignment. I.e. that would lower down to `uninitializedGarbage.opAssign(T.init);`. Destructing garbage and/or calling operators on garbage isn't exactly the way to success :) Which is the crux of the problem in question, and why things like `emplace` exist in the first place.
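
The difference can be observed without touching garbage. The following is only a sketch with a hypothetical `Tracker` type that counts `opAssign` calls; it assumes `core.lifetime.emplace` and a C heap allocation standing in for `alloc()`.

```d
import core.lifetime : emplace;
import core.stdc.stdlib : malloc, free;

// Hypothetical type that counts how often its opAssign runs.
struct Tracker
{
    static int assignCalls;
    int payload;

    void opAssign(Tracker rhs)
    {
        ++assignCalls;          // a real opAssign may read the old state of `this`
        payload = rhs.payload;
    }
}

int[2] assignVsEmplace()
{
    Tracker.assignCalls = 0;
    auto p = cast(Tracker*) malloc(Tracker.sizeof); // null check omitted

    emplace(p, Tracker(7));     // initialization: opAssign is not involved
    int afterEmplace = Tracker.assignCalls;

    *p = Tracker(9);            // assignment: lowers to an opAssign call
    int afterAssign = Tracker.assignCalls;

    free(p);
    return [afterEmplace, afterAssign];
}
```

Here the assignment is harmless only because `*p` was initialized first; on truly uninitialized memory the same `opAssign` call would operate on garbage.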
Nov 28 2021
russhy <russhy gmail.com> writes:
On Sunday, 28 November 2021 at 08:54:39 UTC, Stanislav Blinov 
wrote:
 On Sunday, 28 November 2021 at 03:19:49 UTC, russhy wrote:
 I would love to be able to do:
 This is orthogonal to this discussion. Even if concise initializer syntax that you suggest was allowed...
 ```D

 T* t = alloc();

 (*t) = .{};
 ```
 ...that's an assignment. I.e. that would lower down to `uninitializedGarbage.opAssign(T.init);`. Destructing garbage and/or calling operators on garbage isn't exactly the way to success :) Which is the crux of the problem in question, and why things like `emplace` exist in the first place.

this is the exact same issue

this is exactly why i mentioned it

emplace is a library, it doesn't solve anything

it solves people's addiction to "import" things

if you tell people they need to import a package to do initialization, then the language is a failure

``.{}`` wins over ``__initialize``

there needs to be a movement to stop making syntax such a pain to write, and make things overall consistent

It's the same with enums

``MyEnumDoingThings myEnumThatINeed = MyEnumDoingThings.SOMETHING_IS_NOT_RIGHT;``

And now you want the same for everything else

``(*raw1).__initialize(forward!(arg1, arg2));``

more typing! templates!! more long lines!!! more slowness!!!!
Nov 28 2021
Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 28 November 2021 at 16:36:05 UTC, russhy wrote:

 this is the exact same issue
No, it isn't.
 It's the same with enums
No, it isn't.
 And now you want to same for everything else
No, I don't.
 ``(*raw1).__initialize(forward!(arg1, arg2));``

 more typing! templates!! more long lines!!! more slowness!!!!
Way off mark here.
 This is orthogonal to this discussion. Even if concise 
 initializer syntax that you suggest was allowed
 let's improve it then, let's play more with it
 instead of introducing new functions/templates
 i feel like this is the perfect place to have such improvements 
 take place
This topic has nothing to do with what you're talking about.
Nov 28 2021
russhy <russhy gmail.com> writes:
On Sunday, 28 November 2021 at 19:30:11 UTC, Stanislav Blinov 
wrote:

 This topic has nothing to do with what you're talking about.
It does, you just don't understand what "we could improve it" means: relaxing its rules, and reusing the syntax for doing what you ask for
Nov 28 2021
Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 28 November 2021 at 22:00:05 UTC, russhy wrote:
 On Sunday, 28 November 2021 at 19:30:11 UTC, Stanislav Blinov 
 wrote:

 This topic has nothing to do with what you're talking about.
 It does, you just don't understand what "we could improve it" means: relaxing its rules, and reusing the syntax for doing what you ask for

Oh I have no doubt that there is indeed some lack of understanding here. So I'm going to try one last time.

The problem in question lies in the assignment operator, __not__ whatever's on the right hand side of it. It's absolutely irrelevant here how you spell the initializer.

First please understand the difference between initialization and assignment. Then read up on https://dlang.org/spec/declaration.html#void_init and then try to understand that assigning to uninitialized structs that have an explicit or implicit `opAssign` defined would involve using uninitialized values, which may lead to UB. And that is just one of the problems that existing library solutions address. The rest is spelled out in the first post.

Have fun with this little program:

```d
import std.stdio;

void main()
{
    File file = void;
    file = File.init; // File.init, .{}, BANANAS - doesn't matter, it's UB
}
```

So once again, if you want to discuss initializer syntax, feel free to create a topic for that as that is not what's in question here.
Nov 29 2021
russhy <russhy gmail.com> writes:
On Monday, 29 November 2021 at 08:35:06 UTC, Stanislav Blinov 
wrote:
 (..)
Oh i see.., thanks for being patient with me and providing me links where i can read more about it! sorry for derailing the post!
Nov 29 2021
russhy <russhy gmail.com> writes:
On Sunday, 28 November 2021 at 08:54:39 UTC, Stanislav Blinov 
wrote:
 This is orthogonal to this discussion. Even if concise 
 initializer syntax that you suggest was allowed
let's improve it then, let's play more with it

instead of introducing new functions/templates

i feel like this is the perfect place to have such improvements take place
Nov 28 2021
Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 28 November 2021 at 02:15:37 UTC, kinke wrote:
 On Saturday, 27 November 2021 at 21:56:05 UTC, Stanislav Blinov 
 wrote:
 [...]
 Upcoming `__traits(initSymbol)` would help in removing the 
 hack,
 It's already removed in master.
Cool!
 but won't help CTFE any. At various points of their lives, 
 these things even explicitly called `memcpy`, which is just... 
 argh! And some still do (`copyEmplace`, I'm looking at you). 
 Call into CRT to blit a 8-byte struct? With statically known 
 size and alignment? Just to sidestep type system? Eh???
 1. Most optimizers recognize a memcpy call and its semantics, and try to avoid the lib call accordingly.
I'd rather not leave this to "try". Not only because it's work that needn't be done, but also for debug performance. Exactly the stuff you talk about at the end of your post :D
 2. A slice copy (`target[] = source[]` with e.g. void[]-typed 
 slices) is a memcpy with additional potential checks for 
 matching length and no overlap (with enabled bounds checks 
 IIRC), so memcpy avoids that overhead. It also works with 
 -betterC; e.g., the aforementioned checks are implemented as a 
 druntime helper function for LDC and so not available with 
 -betterC.
Slice copies aren't needed :) Nor would they work in CTFE, as that requires reinterpret-casting a T to a slice.
 3. I haven't checked, but if memcpy is the only real CTFE 
 blocker for emplace at the moment, I guess one option would be 
 extending the CTFE interpreter by a memcpy builtin, in order 
 not to have to further uglify the existing library code.
`emplace` is also deficient: https://github.com/dlang/druntime/blob/2b7873da09c63761fe6e69dc4dd225c0844ed4e9/src/core/internal/lifetime.d#L31-L59

Also note that that's already one call down from `emplace`, and potentially could `move` the bits or copy the argument(s) again (to call the fake struct ctor), and then, of course, again, in implementation of that fake ctor. Same goes for the actual non-fake struct `__ctor` version. Initializing large structs or those having expensive copy ctors is no fun. `-O` build may help with some of that, of course, but again I'd rather this didn't need to be in the first place.

`emplaceInitializer` also may not work in all cases. Current one would fail on that mangling business, upcoming one - because `__traits(initSymbol)` gives you a `void[]`, meaning a reinterpret cast is needed somewhere, meaning no dice for CTFE. And that means none of these guys would work when initializer is required, since everyone in the `emplace` family is dependent on `emplaceInitializer`. So CTFE-able implementation would be back to union fun. Except, of course, for classes, which is... questionable.

Making `mem*` functions available to CTFE would be a big improvement for sure, but it only solves half the problem (the other being reinterpret casts). `emplace` in CTFE should fail for one reason only - if the ctor is not CTFE-able (i.e. that's caller's responsibility). So far, it may fail for reasons that are down to language plumbing :(
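
To illustrate the reinterpret-cast point, here is a minimal runtime-only sketch of writing an initializer through `__traits(initSymbol)`. It is not the druntime implementation: `Config`, `writeInitializer` and `freshConfig` are hypothetical names, and it assumes a compiler recent enough to support the trait, which yields an untyped slice whose `ptr` is null for zero-initialized types.

```d
import core.stdc.string : memcpy, memset;

// Hypothetical struct with a non-zero initializer.
struct Config
{
    int retries = 3;
    int timeout = 30;
}

// Write T.init into `target` by blitting the type's init symbol.
// The untyped slice forces a raw memory copy, so this is runtime-only.
void writeInitializer(T)(ref T target) @trusted
    if (is(T == struct))
{
    const(void)[] initializer = __traits(initSymbol, T);
    if (initializer.ptr is null)
        memset(&target, 0, T.sizeof);                // all-zero .init
    else
        memcpy(&target, initializer.ptr, T.sizeof);  // blit the init symbol
}

Config freshConfig()
{
    Config c = void;
    writeInitializer(c);
    return c;
}
```

The step from the untyped slice to a typed `T` goes through raw memory, which is the part a CTFE-able implementation would have to replace.
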
 What do you think? DIP this, yay or nay? Suggestions?..
 I'm not convinced I'm afraid. :) - I've been thinking in the other direction, treating core.lifetime.{move,forward} as builtins for codegen (possibly restricted to function call argument expressions), in order to save work for the optimizer and less bloat for debug builds.
A compiler extension? Wouldn't that require semantics to be the same? Surely you wouldn't want to artificially limit their implementation in compiler just because library versions are deficient? I mean, I'm not against this idea, but AFAIUI that route mandates we make library versions more robust. Then again, why have four builtins where one can suffice? ;)
Nov 28 2021
Per Nordlöw <per.nordlow gmail.com> writes:
On Saturday, 27 November 2021 at 21:56:05 UTC, Stanislav Blinov 
wrote:
 ```d
 T stuff = T(args); // or new T(args);
 ```

 but this?..

 ```d
 T* ptr = allocateForT();
 // now what?.. Can't just do *ptr = T(args) - that's an assignment, not initialization!
 // is T a struct? A union? A class? An int?.. Is it even a constructor call?..
 ```

What about taking inspiration from C++'s https://en.cppreference.com/w/cpp/language/new#Placement_new ?

Allocators could be supported by either
- passing the allocator to `new()` as the first normal or template parameter, or
- having `new` be definable as a member function of an allocator (or any struct or class)?
Dec 13 2021
Tejas <notrealemail gmail.com> writes:
On Saturday, 27 November 2021 at 21:56:05 UTC, Stanislav Blinov 
wrote:
 D lacks syntax for initializing the uninitialized. We can do 
 this:

 [...]
Think this can help with that `void` initialization problem? We can have a node attribute `isInitialized` in the frontend, depending on which we can decide whether to call the destructor or not at the end of the scope?
Dec 19 2021
vit <vit vit.vit> writes:
On Saturday, 27 November 2021 at 21:56:05 UTC, Stanislav Blinov 
wrote:
 ...
Nice idea, placement new is really missing from D.

I have the reverse problem with emplace than you. emplaceRef wrongly infers the `@safe` attribute for non-CTFE code if assignment is `@system`, because of this:

```d
//emplaceRef:
    if (__ctfe)
    {
        ///...
            chunk = forward!(args[0]);
        ///...
    }
```

Example:

```d
struct Foo{
    this(scope ref const typeof(this) rhs) @safe {}
    void opAssign(scope ref const typeof(this) rhs) @system {}
}

void main() @safe {
    import core.lifetime : emplace;

    Foo foo;

    {
        const Foo* ptr;
        emplace(ptr, foo);  //OK __ctfe path doesn't exist
    }
    {
        Foo* ptr;
        emplace(ptr, foo);  //ERROR __ctfe path exists and calls @system opAssign
    }
}
```

Error: `@safe` function `D main` cannot call `@system` function `core.lifetime.emplace!(Foo, Foo).emplace`

D has one good thing: you can create a custom emplace which runs your own code between `emplaceInitializer` and the ctor. You can initialize your own vptr before the ctor.
Jan 04 2022