digitalmars.D - =void in struct definition

Shachar Shemesh <shachar weka.io> writes:
struct S {
   int a;
   int[5000] arr = void;
}

void func() {
   S s;
}

During the s initialization, the entire "S" area is initialized, 
including the member arr which we asked to be = void.

Is this a bug?

Shachar
Apr 09 2018
Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 9 April 2018 at 11:06:50 UTC, Shachar Shemesh wrote:
 struct S {
   int a;
   int[5000] arr = void;
 }

 void func() {
   S s;
 }

 During the s initialization, the entire "S" area is 
 initialized, including the member arr which we asked to be = 
 void.

 Is this a bug?

 Shachar
Not semantically, but you might consider it a performance bug. This particular one could be fixed, but I cannot say how messy the details are. There is potential for code that silently relies on the behavior and would break in very non-obvious ways if we fixed it.
Apr 09 2018
Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 9 April 2018 at 11:15:14 UTC, Stefan Koch wrote:
 On Monday, 9 April 2018 at 11:06:50 UTC, Shachar Shemesh wrote:
 [ ... ]
 During the s initialization, the entire "S" area is 
 initialized, including the member arr which we asked to be = 
 void.

 Is this a bug?

 Shachar
[ ... ] {This could be fixed, but may break code} [ ... ]
So currently on initialization we do this:
---
structPtr = cast(StructType*) alloc(structSize);
memcpy(structPtr, StructType.static_struct_initializer, StructType.sizeof);
---
which we could change to
---
structPtr = cast(StructType*) alloc(structSize);
foreach (initializerSegment; StructType.InitializerSegments)
{
    memcpy((cast(void*) structPtr) + initializerSegment.segmentOffset,
           cast(void*) initializerSegment.segmentPtr,
           initializerSegment.segmentSize);
}
---
This will potentially remove quite a lot of binary bloat, since void members no longer need to be stored in initializers, and it also reduces initialization overhead.
In terms of implementation this _should_ be straightforward, but well ... runtime and compiler interaction can be a mess.
Apr 09 2018
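
The segmented copy proposed above can be tried out by hand. The following is only a minimal sketch under stated assumptions: `InitSegment`, `initBySegments`, `initA`, and `initB` are hypothetical names, the segment table is written manually (the proposal would have the compiler emit it), and the extra member `b` is added purely so that there are two separate segments.

```d
import core.stdc.string : memcpy;

struct S
{
    int a;
    int[5000] arr = void;
    int b;
}

// Hypothetical start values for the members that are not void-initialized.
immutable int initA = 7;
immutable int initB = 9;

// Hypothetical descriptor for one contiguous run of initialized bytes.
struct InitSegment
{
    size_t offset;       // where the run starts inside the struct
    size_t size;         // how many bytes it covers
    const(void)* source; // bytes to copy from
}

// Copies only the described segments; the bytes of `arr` are never touched.
void initBySegments(S* target)
{
    // Hand-written table standing in for compiler-generated data.
    const InitSegment[2] segments = [
        InitSegment(S.a.offsetof, S.a.sizeof, &initA),
        InitSegment(S.b.offsetof, S.b.sizeof, &initB),
    ];

    foreach (seg; segments)
        memcpy(cast(ubyte*) target + seg.offset, seg.source, seg.size);
}

void main()
{
    S s = void;         // no initialization at all
    initBySegments(&s); // writes 8 bytes instead of S.sizeof
    assert(s.a == 7 && s.b == 9);
}
```

Here only `a` and `b` are written, so the cost no longer depends on the size of `arr`.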
jmh530 <john.michael.hall gmail.com> writes:
On Monday, 9 April 2018 at 11:15:14 UTC, Stefan Koch wrote:
 Not semantically, but you might consider it a performance bug.
 This particular one could be fixed, but I cannot say how messy 
 the details are.
 There is potential for code that silently relies on the 
 behavior and would break in very non-obvious ways if we fixed 
 it.
If the fix causes non-obvious breakage, then why not a DIP for an opInit that overrides the default initialization and has the desired new functionality? Though it would be annoying to have two ways of doing the same thing...
Apr 09 2018
Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 9 April 2018 at 14:11:35 UTC, jmh530 wrote:
 On Monday, 9 April 2018 at 11:15:14 UTC, Stefan Koch wrote:
 Not semantically, but you might consider it a performance bug.
 This particular one could be fixed, but I cannot say how messy 
 the details are.
 There is potential for code that silently relies on the 
 behavior and would break in very non-obvious ways if we fixed 
 it.
 If the fix causes non-obvious breakage, then why not a DIP for an 
 opInit that overrides the default initialization and has the 
 desired new functionality? Though it would be annoying to have 
 two ways of doing the same thing...
It's not worth a DIP. You can write a static initializer function and pass it a GCAlloced pointer.
Apr 09 2018
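
A minimal sketch of the "static initializer function plus GC-allocated pointer" workaround might look like this. The names `initialize` and `makeS` are hypothetical, and the sketch assumes the caller never reads `arr` before writing it.

```d
import core.memory : GC;

struct S
{
    int a;
    int[5000] arr = void;

    // Hypothetical helper: sets up only the members that need a defined value.
    static void initialize(S* p, int val)
    {
        p.a = val;
    }
}

S* makeS(int val)
{
    // GC.malloc does not promise zeroed memory, so nothing is pre-filled here.
    // NO_SCAN is passed because S contains no pointers.
    auto p = cast(S*) GC.malloc(S.sizeof, GC.BlkAttr.NO_SCAN);
    S.initialize(p, val);
    return p;
}

void main()
{
    auto s = makeS(12);
    assert(s.a == 12);
}
```

Because the memory comes straight from the allocator, S.init is never copied into it; the trade-off is that correctness of `arr` is entirely up to the caller.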
Simen Kjærås <simen.kjaras gmail.com> writes:
On Monday, 9 April 2018 at 11:06:50 UTC, Shachar Shemesh wrote:
 struct S {
   int a;
   int[5000] arr = void;
 }

 void func() {
   S s;
 }

 During the s initialization, the entire "S" area is 
 initialized, including the member arr which we asked to be = 
 void.

 Is this a bug?
https://issues.dlang.org/show_bug.cgi?id=16956

--
  Simen
Apr 09 2018
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, April 09, 2018 14:06:50 Shachar Shemesh via Digitalmars-d wrote:
 struct S {
    int a;
    int[5000] arr = void;
 }

 void func() {
    S s;
 }

 During the s initialization, the entire "S" area is initialized,
 including the member arr which we asked to be = void.

 Is this a bug?
It looks like Andrei created an issue about it as an enhancement request several years ago:

https://issues.dlang.org/show_bug.cgi?id=11331

- Jonathan M Davis
Apr 09 2018
Shachar Shemesh <shachar weka.io> writes:
On 09/04/18 14:22, Jonathan M Davis wrote:
 On Monday, April 09, 2018 14:06:50 Shachar Shemesh via Digitalmars-d wrote:
 struct S {
     int a;
     int[5000] arr = void;
 }

 void func() {
     S s;
 }

 During the s initialization, the entire "S" area is initialized,
 including the member arr which we asked to be = void.

 Is this a bug?
 It looks like Andrei created an issue about it as an enhancement
 request several years ago:
 
 https://issues.dlang.org/show_bug.cgi?id=11331
 
 - Jonathan M Davis
Except that issue talks about default constructed objects. My problem happens also with objects constructed with a constructor:

extern(C) void func(ref S s);

struct S {
    uint a;
    int[5000] arr = void;

    this(uint val) {
        a = val;
    }
}

void main() {
    auto s = S(12);

    // To prevent the optimizer from optimizing s away
    func(s);
}

$ ldc2 -c -O3 -g test.d
$ objdump -S -r test.o | ddemangle > test.s

0000000000000000 <_Dmain>:
    }
}

void main() {
   0:   48 81 ec 28 4e 00 00    sub    $0x4e28,%rsp
   7:   48 8d 7c 24 04          lea    0x4(%rsp),%rdi
    auto s = S(12);
   c:   31 f6                   xor    %esi,%esi
   e:   ba 20 4e 00 00          mov    $0x4e20,%edx
  13:   e8 00 00 00 00          callq  18 <_Dmain+0x18>
                        14: R_X86_64_PLT32      memset-0x4
        a = val;
  18:   c7 04 24 0c 00 00 00    movl   $0xc,(%rsp)
  1f:   48 89 e7                mov    %rsp,%rdi

    // To prevent the optimizer from optimizing s away
    func(s);
  22:   e8 00 00 00 00          callq  27 <_Dmain+0x27>
                        23: R_X86_64_PLT32      func-0x4
}
  27:   31 c0                   xor    %eax,%eax
  29:   48 81 c4 28 4e 00 00    add    $0x4e28,%rsp
  30:   c3                      retq

Notice the call to memset.

Shachar
Apr 11 2018
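
The constants in the disassembly above can be cross-checked against the struct layout. This is a small sketch, assuming the usual 4-byte `int`/`uint` layout and the System V x86-64 calling convention (first three integer arguments in %rdi, %rsi, %rdx): the memset starts at offset 4 and covers 0x4e20 bytes, which is exactly the void-initialized `arr`.

```d
struct S
{
    uint a;
    int[5000] arr = void;

    this(uint val)
    {
        a = val;
    }
}

// lea 0x4(%rsp),%rdi : the memset destination is the start of arr
static assert(S.arr.offsetof == 0x4);

// mov $0x4e20,%edx : the memset length is 5000 * int.sizeof = 20000 bytes
static assert(S.arr.sizeof == 0x4e20);

// The struct itself is 20004 bytes; the 0x4e28 stack adjustment is that
// size plus padding.
static assert(S.sizeof == 0x4 + 0x4e20);

void main()
{
}
```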
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, April 11, 2018 10:45:40 Shachar Shemesh via Digitalmars-d 
wrote:
 On 09/04/18 14:22, Jonathan M Davis wrote:
 On Monday, April 09, 2018 14:06:50 Shachar Shemesh via Digitalmars-d 
wrote:
 struct S {

     int a;
     int[5000] arr = void;

 }

 void func() {

     S s;

 }

 During the s initialization, the entire "S" area is initialized,
 including the member arr which we asked to be = void.

 Is this a bug?
 It looks like Andrei created an issue about it as an enhancement
 request several years ago:

 https://issues.dlang.org/show_bug.cgi?id=11331

 - Jonathan M Davis

 Except that issue talks about default constructed objects. My problem
 happens also with objects constructed with a constructor:

 [ ... ]

 Notice the call to memset.

 Shachar
All objects are initialized with their init values prior to the constructor being called. So, whether an object is simply default-initialized or whether the constructor is called, you're going to get the same behavior except for the fact that the constructor would normally do further initialization beyond the init value. As such, if there's a problem with the default-initialized value, you're almost certainly going to get the same problem when you call a constructor.

- Jonathan M Davis
Apr 11 2018
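
A short sketch of the rule described above: members already hold their init values when the constructor body starts running. The struct `P` and its members are hypothetical names chosen for the illustration.

```d
struct P
{
    int x = 5; // default initializer, part of P.init
    int seen;

    this(int v)
    {
        seen = x; // x already holds its init value here
        x = v;    // this is the second write to x
    }
}

void main()
{
    auto p = P(42);
    assert(p.seen == 5);
    assert(p.x == 42);
}
```

The first write to `x` comes from copying P.init and the second from the constructor, which is the double initialization discussed in this thread.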
Shachar Shemesh <shachar weka.io> writes:
On 11/04/18 10:58, Jonathan M Davis wrote:
 All objects are initialized with their init values prior to the constructor
 being called. So, whether an object is simply default-initialized or whether
 the constructor is called, you're going to get the same behavior except for
 the fact that the constructor would normally do further initialization
 beyond the init value. As such, if there's a problem with the
 default-initialized value, you're almost certainly going to get the same
 problem when you call a constructor.
 
 - Jonathan M Davis
 
That's horrible! That means that constructor initialized objects, regardless of size, get initialized twice.

Shachar
Apr 11 2018
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, April 11, 2018 11:31:16 Shachar Shemesh via Digitalmars-d 
wrote:
 On 11/04/18 10:58, Jonathan M Davis wrote:
 All objects are initialized with their init values prior to the
 constructor being called. So, whether an object is simply
 default-initialized or whether the constructor is called, you're going
 to get the same behavior except for the fact that the constructor would
 normally do further initialization beyond the init value. As such, if
 there's a problem with the
 default-initialized value, you're almost certainly going to get the same
 problem when you call a constructor.

 - Jonathan M Davis
 That's horrible! That means that constructor initialized objects,
 regardless of size, get initialized twice.
Well, only the stuff you initialize in the constructor gets initialized twice, but yeah, it could result in effectively initializing everything twice if you initialize everything in the constructor. It's one of those design choices that's geared towards correctness, since it avoids ever dealing with the type having garbage, and the fact that you can do stuff like

struct S
{
    int _i;

    this(int i)
    {
        foo();
        _i = 42;
    }

    void foo()
    {
        writeln(_i);
    }
}

means that if it doesn't initialize it with the init value first, then you get undefined behavior, because _i would then be garbage when it's read (which isn't necessarily a big deal with an int but could really matter if it were something like a pointer). It also factors into how classes are guaranteed to be fully initialized to the correct type _before_ any constructors are run (avoiding the problems that you get in C++ when calling virtual functions in constructors or destructors).

Unfortunately, because you're allowed to call arbitrary functions before initializing members, it's also possible to violate the type system with regards to const or immutable. e.g.

struct S
{
    immutable int _i;

    this(int i)
    {
        foo();
        _i = 42;
    }

    void foo()
    {
        writeln(_i);
    }
}

reads _i before it's fully initialized, so its state isn't identical every time it's accessed like it's supposed to be. However, because the object is default-initialized first, you never end up reading garbage, and the behavior is completely deterministic even if it arguably violates the type system. What the correct solution to that particular problem is, I don't know (probably at least disallowing calling any member functions prior to initializing any immutable or const members), but the fact that the object is default-initialized first reduces the severity of the problem.

And while you can end up with portions of an object effectively being initialized twice, for your average struct, I doubt that it matters much. It's when you start doing stuff like having large static arrays that it really becomes a problem. It also wouldn't surprise me if ldc optimized out some of the double-initializations at least some of the time, but I very much doubt that dmd's optimizer is ever that smart.

Depending on the implementation of the constructor though, I would think that it would be possible for the compiler to determine that it doesn't actually need to default-initialize the struct first (or that it can just default-initialize pieces of it), because it can guarantee that a member variable isn't read before it's initialized by the constructor. So, at least in theory, the front end should be able to do some optimizations there. However, I have no idea if it ever does.

I think that in theory, the idea is that we want initialization to be as correct as possible, so there should be no garbage or undefined behavior involved, and in the case of classes, the object should be fully the type that it's supposed to be when its constructor is called so that you don't get bad behavior from virtual functions, but we then have = void so that specific variables can avoid that extra initialization cost when profiling or whatnot show that it's important. So, if you have something like

struct S
{
    int _a;
    int[5000] _b;

    this(int a)
    {
        _a = a;
    }
}

then it's going to behave well as far as correctness goes, and then if the initialization is too expensive, you do

S s = void;
s._a = 42;

I think that the problem is that void initialization was intended specifically for local variables, and the idea of = void for member variables was not really thought through. So, you can easily do something like

S s = void;
s._a = 42;

right now and avoid the default-initialization, but you can't cleanly do

struct S
{
    int _a;
    int[5000] _b = void;

    this(int a)
    {
        _a = a;
    }
}

So, the process is completely manual, which obviously sucks if it's something that you _always_ want to do with the type.

In general, D favors correctness over performance with the idea that it gives you backdoors to get around the correctness guarantees in order to get more performance when it matters, but in this case, the backdoor arguably needs some improvement.

- Jonathan M Davis
Apr 11 2018
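
The manual route described above can at least be wrapped in one place. This is a minimal sketch, assuming a hypothetical helper named `makeUninit`; it is not a language feature, and reading `_b` before writing it remains the caller's problem.

```d
struct S
{
    int _a;
    int[5000] _b;

    this(int a)
    {
        _a = a;
    }
}

// Hypothetical factory: skips default initialization entirely and then
// sets only the member that needs a defined value.
S makeUninit(int a)
{
    S s = void; // no S.init copy for this local
    s._a = a;
    return s;
}

void main()
{
    auto s = makeUninit(42);
    assert(s._a == 42);
    // s._b holds unspecified contents here and must be written before use.
}
```

This keeps the `= void` on the local variable, where it is honored today, rather than on the member.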
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/9/18 7:06 AM, Shachar Shemesh wrote:
 struct S {
    int a;
    int[5000] arr = void;
 }
 
 void func() {
    S s;
 }
 
 During the s initialization, the entire "S" area is initialized, 
 including the member arr which we asked to be = void.
 
 Is this a bug?
Not technically. It has to initialize `a` to 0. The only way we initialize structs is to copy the whole initializer with memcpy.

It would be possible to leave the "tail" uninitialized, and just store the initializer for the first members that have non-void initializers. But that's not how it works now.

If that were to happen, you'd still have the same issue with things like:

struct S {
    int[5000] arr = void;
    int a;
}

But maybe that's just something we would have to live with.

-Steve
Apr 09 2018
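
To see why member order matters for the "uninitialized tail" idea, one can compute how many leading bytes would still have to be copied in each layout. This is only a sketch; `Tail`, `Head`, `tailPrefix`, and `headPrefix` are hypothetical names, and the prefix is taken to end right after the last member with a non-void initializer.

```d
struct Tail
{
    int a;
    int[5000] arr = void; // void member last: it forms the skippable tail
}

struct Head
{
    int[5000] arr = void; // void member first: nothing can be skipped
    int a;
}

// Bytes up to and including the last member that has a real initializer.
enum tailPrefix = Tail.a.offsetof + Tail.a.sizeof;
enum headPrefix = Head.a.offsetof + Head.a.sizeof;

static assert(tailPrefix == 4);           // only `a` would be copied
static assert(headPrefix == Head.sizeof); // the whole struct would be copied

void main()
{
}
```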
Johan Engelen <j j.nl> writes:
On Monday, 9 April 2018 at 11:06:50 UTC, Shachar Shemesh wrote:
 struct S {
   int a;
   int[5000] arr = void;
 }

 void func() {
   S s;
 }

 During the s initialization, the entire "S" area is 
 initialized, including the member arr which we asked to be = 
 void.

 Is this a bug?
Could be optimized, yes, provided that the spec is updated. We discussed this live at the end of my DConf talk last year, and Walter (in audience) agreed upon the needed spec change. I haven't had/taken the time to work on it yet :(

The optimization of simplifying the initialization isn't too hard. But it is a bit tricky, Johannes wrote down some good points here:
https://issues.dlang.org/show_bug.cgi?id=15951
(note the padding bytes issue).

The good news is that there doesn't appear to be any spec about it, so technically there is no language breakage and currently it is an "accepts invalid" bug...

Over dinner me, deadalnix and some others discussed further optimization where emission of the large S.init could be eliminated. We worked out some details, but it's a little harder thing to do.

cheers,
  Johan
Apr 09 2018