digitalmars.D - Initializing an Immutable Field with Magic: The "Fake Placement New"

reply FeepingCreature <feepingcreature gmail.com> writes:
How would you initialize an immutable field outside the 
constructor?

For instance, assume you're trying to implement a tagged union, 
and you want to switch it to a new type - but it so happens that 
the type you're trying to switch to is an immutable struct.

...
immutable struct S { int i; }
union
{
   ...
   S s;
}

For instance, you might, like me, decide that std.conv.emplace 
does what you want:

...
   emplace(&s, S(5));
...

You would then get a strange compiler error saying that the return 
value of a "side effect free" function cannot be silently 
discarded; and if you changed the call, as DMD recommends, to 
`cast(void) emplace(&s, S(5));`, you would discover with some 
astonishment that the call is silently removed.

Emplace does not emplace!

What's happening here? From the perspective of the compiler, it 
makes perfect sense.

Emplace is defined as a pure function, meaning that it cannot 
have any effects other than through its parameters. However, its 
parameters are, in order, a pointer to an immutable struct 
("can't" change the caller's data - it's immutable) and a value 
parameter, S, which also can't change the caller. And we told DMD 
to throw away the return value.

So DMD ends up, rather reasonably, convinced that this emplace is 
a no-op. We would need to convince DMD of something like, "it 
looks like I'm giving it an S*, but it's actually a ~magical 
type~ with the same layout as S but not immutable". This is not 
possible.

Instead, we have to use the same magic hack that emplace also 
uses internally: fake placement new!

See, there is *exactly one* construct in the language that is 
allowed to assign a new value to an immutable field, and it's the 
constructor. So we have to make DMD believe that our variable is 
a field in a type we control (hack 1), and then explicitly call 
that type's constructor with our new value (hack 2).

struct Wrapper
{
   S s;

   this(S s)
   {
     this.s = s; // the one operation allowed to assign immutable fields
   }
}

`Wrapper` has the same layout as `S`, because it basically *is* 
`S`.

Then we call the constructor as if we were currently constructing 
Wrapper at a location that "just happens" to overlap with our 
field.

Wrapper* wrapper = cast(Wrapper*) &s;
wrapper.__ctor(S(5)); // fake placement new

What a mess. Works though.

Demo: https://run.dlang.io/is/kg7j3f
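
For readers who cannot open the link, here is a minimal self-contained sketch of the two hacks above applied to a tagged union. It is not the code behind the demo link: `Tagged`, `Kind`, `switchToS` and `demo` are hypothetical names made up for illustration, and the whole thing leans on the undocumented `__ctor` symbol, so treat it as an experiment rather than a recommended pattern.

```d
import std.stdio : writeln;

immutable struct S { int i; }

// Same layout as S; its constructor is the one place allowed to set `s`.
struct Wrapper
{
    S s;

    this(S s)
    {
        this.s = s;
    }
}

// Hypothetical tagged union, for illustration only.
struct Tagged
{
    enum Kind { holdsInt, holdsS }

    Kind kind;
    union
    {
        int number;
        S s;
    }

    void switchToS(int value) @system
    {
        // hack 1: pretend the union member is a Wrapper we are constructing
        Wrapper* wrapper = cast(Wrapper*) &s;
        // hack 2: call the constructor explicitly (fake placement new)
        wrapper.__ctor(S(value));
        kind = Kind.holdsS;
    }
}

int demo()
{
    Tagged t;
    t.number = 42;   // the union currently holds the int member
    t.switchToS(5);  // switch it over to the immutable struct
    return t.s.i;
}

void main()
{
    writeln(demo());
}
```

Whether this is actually valid D, as opposed to something DMD merely accepts, is the open question.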
Jul 26 2019
parent reply ag0aep6g <anonymous example.com> writes:
On 26.07.19 12:11, FeepingCreature wrote:
 How would you initialize an immutable field outside the constructor?
Not, I guess. [...]
 What a mess. Works though.
 
 Demo: https://run.dlang.io/is/kg7j3f
That looks like a complicated way of casting away immutable. `*cast(int*) &value.s.i = 5;` also "works", but has undefined behavior, of course. Surely, calling `__ctor` on an existing immutable instance also has undefined behavior.
Jul 26 2019
parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 26 July 2019 at 10:25:06 UTC, ag0aep6g wrote:
 That looks like a complicated way of casting away immutable.
 `*cast(int*) &value.s.i = 5;` also "works", but has undefined 
 behavior, of course. Surely, calling `__ctor` on an existing 
 immutable instance also has undefined behavior.
Sure, in this example you can do that, but in a generic function you have no idea what's inside S.
Jul 26 2019
parent reply ag0aep6g <anonymous example.com> writes:
On 26.07.19 12:40, FeepingCreature wrote:
 On Friday, 26 July 2019 at 10:25:06 UTC, ag0aep6g wrote:
 That looks like a complicated way of casting away immutable.
 `*cast(int*) &value.s.i = 5;` also "works", but has undefined behavior, 
 of course. Surely, calling `__ctor` on an existing immutable instance 
 also has undefined behavior.
Sure, in this example you can do that, but in a generic function you have no idea what's inside S.
My point is that you can't do either. You can't mutate immutable data. Doesn't matter whether you try it with a `cast` or with `__ctor`. Both ways are not allowed.
Jul 26 2019
parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 26 July 2019 at 10:53:32 UTC, ag0aep6g wrote:
 My point is that you can't do either. You can't mutate 
 immutable data. Doesn't matter whether you try it with a `cast` 
 or with `__ctor`. Both ways are not allowed.
Sure you can. Look at the link, you're doing it :)

More specifically, immutable is kind of awkward. You have to differentiate between immutable types and immutable memory. Those are *often* the same, but not always.

The thing you cannot do is mutate memory that was *allocated* immutable - i.e. that came out of new T or T() where T was marked immutable, or had immutable fields. But that doesn't happen with immutable fields inside a union, because unions screen off all that stuff; they can't not, because immutable fields and mutable fields may freely overlap. So instead of forbidding mutable-immutable overlap in unions, the language basically just throws up its hands and goes "yeah, whatever."

So when you're switching a tagged union to an immutable member, you're not dealing with "immutable memory", you're, effectively, dealing with an uninitialized field. And you can always set an uninitialized field to a new value, whether it's immutable or not, because that's *how the constructor hack works in the first place*. If abusing a constructor like this was broken, the constructor would *itself* be broken.
Jul 26 2019
parent reply ag0aep6g <anonymous example.com> writes:
On 26.07.19 13:14, FeepingCreature wrote:
 On Friday, 26 July 2019 at 10:53:32 UTC, ag0aep6g wrote:
 My point is that you can't do either. You can't mutate immutable data. 
 Doesn't matter whether you try it with a `cast` or with `__ctor`. Both 
 ways are not allowed.
Sure you can. Look at the link, you're doing it :)
What you can do is write invalid programs that seem to behave as you want. But they're invalid. They might explode any time.
 More specifically, immutable is kind of awkward. You have to differentiate 
 between immutable types and immutable memory. Those are *often* the 
 same, but not always.
As far as I understand, they're the same to the language. Consequently, they're the same to me. D doesn't have C++'s const where it matters how the data was declared originally.
 The thing you cannot do is mutate memory that was *allocated* immutable 
 - ie. that came out of new T or T() where T was marked immutable, or had 
 immutable fields. But that doesn't happen with immutable fields inside a 
 union, because unions screen off all that stuff; they can't not, because 
 immutable fields and mutable fields may freely overlap. So instead of 
 forbidding mutable-immutable overlap in unions, the language basically 
 just throws up its hands and goes "yeah, whatever."
Do we have it in the spec somewhere that unions defeat immutable? I'm skeptical if that can be sound.

As far as I know, we usually say that this function:

    void f(immutable int* p)
    {
        /* ... do something with *p ... */
        g();
        /* ... do more stuff with *p ... */
    }

can assume that `*p` is the same before and after calling `g`. But if unions have the power to defeat immutable, that assumption is invalid.

Or maybe we can only use that super power of unions if we take care that no other code can observe what we're doing? Can that be specified without undermining the assumption above?
 So when you're switching a tagged union to an immutable member, you're 
 not dealing with "immutable memory", you're, effectively, dealing with 
 an uninitialized field. And you can always set an uninitialized field to 
 a new value, whether it's immutable or not, because that's *how the 
 constructor hack works in the first place*. If abusing a constructor 
 like this was broken, the constructor would *itself* be broken.
That might make sense, but it's at odds with the current spec and implementation. If an immutable union field is considered uninitialized until written to, then the language should forbid accessing it before that (in @safe code). We can't have `immutable` data change its observable value.
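
A small sketch of the situation being described: a union in which a mutable and an immutable field overlap, read through the immutable field before and after a write through the mutable one. `Overlap` and `observe` are hypothetical names chosen for illustration. The sketch assumes the compiler accepts this overlap in `@system` code, as the thread suggests; it shows what can be observed, not that the behavior is sanctioned by the spec.

```d
// Hypothetical union in which a mutable and an immutable int share storage.
union Overlap
{
    int mutableView;
    immutable int immutableView;
}

// Reads the immutable field before and after a write through the mutable one.
int[2] observe() @system
{
    Overlap u;
    int[2] seen;

    u.mutableView = 1;
    seen[0] = u.immutableView;

    u.mutableView = 2;         // no cast, no __ctor, just a normal store
    seen[1] = u.immutableView; // same "immutable" field, read again

    return seen;
}

void main()
{
    import std.stdio : writeln;
    writeln(observe());
}
```

If the two reads differ, the immutable field has changed its observable value without any cast, which is the soundness concern raised here.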
Jul 26 2019
parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 26 July 2019 at 12:12:19 UTC, ag0aep6g wrote:
 As far as I know, we usually say that this function:

     void f(immutable int* p)
     {
         /* ... do something with *p ... */
         g();
         /* ... do more stuff with *p ... */
     }

 can assume that `*p` is the same before and after calling `g`. 
 But if unions have the power to defeat immutable, that 
 assumption is invalid.
This is not correct, though it seems correct. This example hits the key of the problem though, so well spotted.

What if `g()` manually freed `p`, then allocated some new memory, and that new memory just so happened to exist at the same address? You would have observed a change in the value of `*p`, even though it was marked immutable. Now, this is invalid behavior, but it's not invalid behavior *of f*; the entire program is just written in a way that you were able to keep one pointer alive past the lifespan of the data it referenced.

Nullable and Algebraic, two types that run into such issues (Nullable uses the union hack internally!), let you control the lifespan of their members via `nullify` or assigning a different type, respectively. As such, if you take a reference to Nullable.get or an algebraic member, and then nullify or reassign it, you have broken your program. It is up to the user, not the compiler, to ensure that this does not happen.
Jul 26 2019
parent reply ag0aep6g <anonymous example.com> writes:
On 26.07.19 14:36, FeepingCreature wrote:
 On Friday, 26 July 2019 at 12:12:19 UTC, ag0aep6g wrote:
 As far as I know, we usually say that this function:

     void f(immutable int* p)
     {
         /* ... do something with *p ... */
         g();
         /* ... do more stuff with *p ... */
     }

 can assume that `*p` is the same before and after calling `g`. But if 
 unions have the power to defeat immutable, that assumption is invalid.
This is not correct, though it seems correct. This example hits the key of the problem though, so well spotted. What if `g()` manually freed `p`, then allocated some new memory, and that new memory just so happened to exist at the same address? You would have observed a change in the value of `*p`, even though it was marked immutable. Now, this is invalid behavior, but it's not invalid behavior *of f*; the entire program is just written in a way that you were able to keep one pointer alive past the lifespan of the data it referenced.
It's invalid, yes. So we don't need to consider it. If the only way to break the assumption is to rely on undefined behavior, then there is no way to break the assumption.

`free`ing p and then dereferencing it has undefined behavior. It doesn't matter that the address happens to be reused by another allocation.

The interesting part is whether you're relying on undefined behavior with your union/__ctor stuff. If yes, then your code is just invalid. If no, then you can break the immutability assumption in a seemingly valid way. That would be interesting, but I'm not convinced that your code is valid.

The pain points:
1) The spec doesn't say clearly when union fields are considered initialized.
2) DMD allows @safe access of (uninitialized) immutable union fields.
3) __ctor can be called on an existing instance in @safe code. That's clearly a bug.
Jul 26 2019
next sibling parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 26 July 2019 at 14:19:11 UTC, ag0aep6g wrote:
 The interesting part is whether you're relying on undefined 
 behavior with your union/__ctor stuff. If yes, then your code 
 is just invalid. If no, then you can break the immutability 
 assumption in a seemingly valid way. That would be interesting, 
 but I'm not convinced that your code is valid.

 The pain points:
 1) The spec doesn't say clearly when union fields are 
 considered initialized.
 2) DMD allows @safe access of (uninitialized) immutable union 
 fields.
 3) __ctor can be called on an existing instance in @safe code. 
 That's clearly a bug.
I think you are just seriously overestimating the D spec.

Note that undefined behavior is a term of art arising from C/C++, referring to behavior explicitly called out as open to the compiler implementation. __ctor is not undefined behavior; I'd call it "unofficial behavior". The spec doesn't mention it.

It so happens that defining a constructor, which validly initializes an immutable field, also defines a magical added function __ctor, on which the spec says nothing, but which happens to have the same effect as the constructor. Such a function could not be written normally, but it appears anyways.

In any case, this is frontend business, and there is only one frontend, and there is unlikely to ever be another, especially an incompatible one. So as unofficial business goes, it's probably pretty reliable. It might be changed, but if so, it'll probably be marked deprecated; even if not, some other technique will appear in its place. (emplaceRef still has to be implemented *somehow*.)
Jul 26 2019
parent reply ag0aep6g <anonymous example.com> writes:
On 26.07.19 16:47, FeepingCreature wrote:
 On Friday, 26 July 2019 at 14:19:11 UTC, ag0aep6g wrote:
[...]
 The pain points:
 1) The spec doesn't say clearly when union fields are considered 
 initialized.
 2) DMD allows @safe access of (uninitialized) immutable union fields.
 3) __ctor can be called on an existing instance in @safe code. That's 
 clearly a bug.
I think you are just seriously overestimating the D spec.
Overestimating? I'm saying that it's not good enough.
 Note that undefined behavior is a term of art arising from C/C++, 
 referring to behavior explicitly called out as open to the compiler 
 implementation. __ctor is not undefined behavior; I'd call it 
 "unofficial behavior". The spec doesn't mention it.
Undefined behavior isn't something to be filled by an implementation. Undefined behavior is given to operations that are specced as invalid. If you rely on undefined behavior, you don't have a valid program in the specced language anymore. An implementation is of course free to assign meaning to an originally invalid operation, but in doing so it creates a (superset) dialect of the language.

And sure, you can write invalid programs that seem to happen to work as you want when compiled with DMD/LDC/GDC. I don't think that should be considered good or normal. If relying on undefined behavior is necessary, that just shows that D still has a long way to go.
 It so happens that defining a constructor, which validly initializes an 
 immutable field, also defines a magical added function __ctor, on which 
 the spec says nothing, but which happens to have the same effect as the 
 constructor. Such a function could not be written normally, but it 
 appears anyways.
As shown initially, if you don't mind relying on UB, you can also just cast.
Jul 26 2019
parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 26 July 2019 at 15:32:49 UTC, ag0aep6g wrote:
 As shown initially, if you don't mind relying on UB, you can 
 also just cast.
As *immediately answered*, no you *can't*. You can't write a generic function that assigns a new value to a pointer with cast, because the immutable may be on a field, and in any case the value may have assignment disabled.
Jul 26 2019
parent reply ag0aep6g <anonymous example.com> writes:
On 26.07.19 17:35, FeepingCreature wrote:
 On Friday, 26 July 2019 at 15:32:49 UTC, ag0aep6g wrote:
 As shown initially, if you don't mind relying on UB, you can also just 
 cast.
As *immediately answered*, no you *can't*. You can't write a generic function that assigns a new value to a pointer with cast, because the immutable may be on a field, and in any case the value may have assignment disabled.
    struct S
    {
        immutable int i;
        @disable void opAssign(S);
    }

    void f(S* ptr)
    {
        *cast(ubyte[S.sizeof]*) ptr = cast(ubyte[S.sizeof]) S(5);
    }
Jul 26 2019
parent FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 26 July 2019 at 15:55:14 UTC, ag0aep6g wrote:
 On 26.07.19 17:35, FeepingCreature wrote:
 On Friday, 26 July 2019 at 15:32:49 UTC, ag0aep6g wrote:
 As shown initially, if you don't mind relying on UB, you can 
 also just cast.
As *immediately answered*, no you *can't*. You can't write a generic function that assigns a new value to a pointer with cast, because the immutable may be on a field, and in any case the value may have assignment disabled.
     struct S
     {
         immutable int i;
         @disable void opAssign(S);
     }

     void f(S* ptr)
     {
         *cast(ubyte[S.sizeof]*) ptr = cast(ubyte[S.sizeof]) S(5);
     }
Ah, good point. I believe this is approximately what moveEmplace does anyways. (Except with a memcpy.)
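
A rough sketch of the memcpy flavor of that idea, for comparison with the byte-array cast quoted above. This is not the Phobos implementation of moveEmplace: `rawOverwrite`, `Slot` and `demo` are hypothetical names, the helper ignores destructors and postblits entirely, and whether overwriting the bytes of a value with immutable fields is valid D is precisely what this thread disputes.

```d
import core.stdc.string : memcpy;

struct S
{
    immutable int i;
    @disable void opAssign(S);
}

// Hypothetical helper, not a Phobos function: blits the bytes of `value`
// over `*target` without going through assignment or a constructor.
void rawOverwrite(T)(T* target, T value) @system
{
    memcpy(cast(void*) target, cast(const(void)*) &value, T.sizeof);
}

// The storage lives in a union, mirroring the tagged-union use case.
union Slot
{
    int raw;
    S s;
}

int demo()
{
    Slot slot;
    slot.raw = 1;                // union currently holds the int member
    rawOverwrite(&slot.s, S(5)); // switch to S by copying raw bytes
    return slot.s.i;
}

void main()
{
    assert(demo() == 5);
}
```

Unlike the `__ctor` route, this needs no wrapper type and is generic over `T`, at the price of bypassing everything the type system knows about `T`.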
Jul 26 2019
prev sibling parent FeepingCreature <feepingcreature gmail.com> writes:
On Friday, 26 July 2019 at 14:19:11 UTC, ag0aep6g wrote:
 The pain points:
 1) The spec doesn't say clearly when union fields are 
 considered initialized.
 2) DMD allows @safe access of (uninitialized) immutable union 
 fields.
 3) __ctor can be called on an existing instance in @safe code. 
 That's clearly a bug.
I forgot to mention: none of this is @safe, of course. Manual lifetime management is almost inherently unsafe. Which is why Nullable is peppered with @trusted...
Jul 26 2019