
digitalmars.D - How to get to a class initializer through introspection?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I'm working on redoing typeid for classes without compiler magic, and 
stumbled upon the class initializer - the bytes blitted over the class 
before the constructor is called.

Any ideas on how to do that via introspection? The fields are 
accessible, but not their default values.

It seems like __traits(type, getInitializer) might be necessary.
Aug 02 2020
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu 
wrote:
 I'm working on redoing typeid for classes without compiler 
 magic, and stumbled upon the class initializer - the bytes 
 blitted over the class before the constructor is called.

 Any ideas on how to do that via introspection? The fields are 
 accessible, but not their default values.

 It seems like __traits(type, getInitializer) might be necessary.
So you are introducing new compiler magic in the form of __traits to replace the old compiler magic in the form of type-info? What exactly is the goal of this?
Aug 02 2020
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 2 August 2020 at 22:48:51 UTC, Stefan Koch wrote:
 What exactly is the goal of this?
It makes pay-as-you-go runtime easier for specialized use cases at least.
Aug 02 2020
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 3 August 2020 at 00:06:13 UTC, Adam D. Ruppe wrote:
 On Sunday, 2 August 2020 at 22:48:51 UTC, Stefan Koch wrote:
 What exactly is the goal of this?
It makes pay-as-you-go runtime easier for specialized use cases at least.
Isn't betterC supposed to do that?
Aug 02 2020
parent Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 3 August 2020 at 00:07:40 UTC, Stefan Koch wrote:
 Isn't betterC supposed to do that?
betterC is about just using the C runtime and doesn't allow you to use any of the D runtime, even if you want to.
Aug 02 2020
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/2/20 6:48 PM, Stefan Koch wrote:
 On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu wrote:
 I'm working on redoing typeid for classes without compiler magic, and 
 stumbled upon the class initializer - the bytes blitted over the class 
 before the constructor is called.

 Any ideas on how to do that via introspection? The fields are 
 accessible, but not their default values.

 It seems like __traits(type, getInitializer) might be necessary.
So you are introducing new compiler magic in the form of __traits to replace the old compiler magic in the form of type-info? What exactly is the goal of this?
The idea is to minimize the compiler magic and shift most of the work to the library in an on-demand manner. Most of the typeid stuff is (somewhat surprisingly) moveable to library code. Moving the classinfo part will expose the holes in the __traits offering. Once the typeid stuff is in druntime, there are numerous opportunities for improving and extending it.
Aug 02 2020
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 03.08.20 00:48, Stefan Koch wrote:
 On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu wrote:
 I'm working on redoing typeid for classes without compiler magic, and 
 stumbled upon the class initializer - the bytes blitted over the class 
 before the constructor is called.

 Any ideas on how to do that via introspection? The fields are 
 accessible, but not their default values.

 It seems like __traits(type, getInitializer) might be necessary.
So you are introducing new compiler magic in the form of __traits to replace the old compiler magic in the form of type-info? What exactly is the goal of this?
Orthogonality of magic.
Aug 04 2020
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu 
wrote:
 Any ideas on how to do that via introspection? The fields are 
 accessible, but not their default values.
It is ugly but possible right now to pull in the symbol via extern(C). See line 20 in my latest blog's example: http://dpldocs.info/this-week-in-d/Blog.Posted_2020_07_27.html#zero-runtime-classes ldc complains but it is a type mismatch not a fundamental barrier, I just didn't figure out the right thing to silence it yet.
 It seems like __traits(type, getInitializer) might be necessary.
but yes this would be generally nicer anyway imo.
Aug 02 2020
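A minimal sketch of the symbol-borrowing trick described above, not the code from the linked blog post. It assumes the compiler emits a class's initializer under the mangled name "_D" ~ qualified name ~ "6__initZ"; the names `Point` and `pointInitializer` are made up for illustration. As noted in the post, LDC may reject the declaration because of a type mismatch.

```d
class Point
{
    int x = 10;
    int y = 20;
}

// Assumption: a class's mangleof is "C" followed by its mangled qualified
// name, and the initializer blob lives at "_D" ~ that name ~ "6__initZ".
// `extern` declares the symbol without defining it; the compiler-generated
// definition is what the linker resolves it to.
pragma(mangle, "_D" ~ Point.mangleof[1 .. $] ~ "6__initZ")
extern immutable ubyte[__traits(classInstanceSize, Point)] pointInitializer;

unittest
{
    // Cross-check against the bytes TypeInfo reports at run time.
    auto viaTypeInfo = cast(const(ubyte)[]) typeid(Point).initializer;
    assert(pointInitializer[] == viaTypeInfo);
}
```

The bytes are only readable at run time; the symbol's contents are not visible to CTFE.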
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/2/20 7:59 PM, Adam D. Ruppe wrote:
 On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu wrote:
 Any ideas on how to do that via introspection? The fields are 
 accessible, but not their default values.
It is ugly but possible right now to pull in the symbol via extern(C). See line 20 in my latest blog's example: http://dpldocs.info/this-week-in-d/Blog.Posted_2020_07_27.html#zero-runtime-classes ldc complains but it is a type mismatch not a fundamental barrier, I just didn't figure out the right thing to silence it yet.
 It seems like __traits(type, getInitializer) might be necessary.
but yes this would be generally nicer anyway imo.
Holy Moly this works. FWIW here's what I plan to use: https://run.dlang.io/is/A6cbal. No need for __gshared because immutable stuff is already shared. The only bummer is it can't be read during compilation, but I assume there'd be a chicken and egg problem if it were. Thanks, Adam! Walter and I had absolutely no idea this can be done but we thought to post here on the off chance. Thanks again!
Aug 02 2020
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 3 August 2020 at 00:37:48 UTC, Andrei Alexandrescu 
wrote:
 Holy Moly this works. FWIW here's what I plan to use: 
 https://run.dlang.io/is/A6cbal. No need for __gshared because 
 immutable stuff is already shared.
Yes, indeed.
 The only bummer is it can't be read during compilation, but I 
 assume there'd be a chicken and egg problem if it were.
Eh, what if you did class A { ubyte[__traits(classInstanceSize, A)] recursive; } it gives a forward reference error, but it works elsewhere and I think that's fair. We could probably do something similar with a hypothetical __traits(initializer). So I think it is potentially possible and worth at least looking more into. My pragma(mangle) thing there was just a hack to make it work with today's dmd, but tomorrow's dmd could do a better job. (what set me on this path btw was just a hunch I could put opCast in Object itself and hack up my own dynamic cast without druntime. Obviously, I didn't finish that for the blog, but it *is* possible, at least if you explicitly mixin something. That is what led me to posting this thread to muse how to change that: https://forum.dlang.org/thread/huheqhyjkgoroeulmotj@forum.dlang.org but it is still awkward to deal with interface offsets without help from the compiler regardless. But still prolly doable with some more time.)
 Thanks, Adam! Walter and I had absolutely no idea this can be 
 done but we thought to post here on the off chance. Thanks 
 again!
Y'all should read my blog :P There's times when I go a full month with nothing to say since I get busy with day job work or around the house with family stuff etc. But I still try to slap something down at least once a month and I have a lot of magic tricks and just cool library stuff in there. Next week I'll probably post my little Tetris game I slapped together on an airplane a couple weeks ago. So not much to learn about D magic tricks but sometimes it is nice to just show how easy fun things are to do too!
Aug 02 2020
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/2/20 9:06 PM, Adam D. Ruppe wrote:
 Y'all should read my blog :P
Well said.
 There's times when I go a full month with nothing to say since I get 
 busy with day job work or around the house with family stuff etc. But I 
 still try to slap something down at least once a month and I have a lot 
 of magic tricks and just cool library stuff in there.
 
 Next week I'll probably post my little Tetris game I slapped together on 
 an airplane a couple weeks ago. So not much to learn about D magic 
 tricks but sometimes it is nice to just show how easy fun things are to 
 do too!
Before you mentally get into that... maybe you can also say what the whole deal with OffsetTypeInfo and offTi is. I reckon it's a sort of information on members of a class, but my attempts have been fruitless: https://run.dlang.io/is/68M1qn
Aug 02 2020
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 3 August 2020 at 01:16:37 UTC, Andrei Alexandrescu 
wrote:
 what the whole deal with OffsetTypeInfo and offTi is. I reckon 
 it's a sort of information on members of a class, but my 
 attempts have been fruitless:
I've never used that and looking at the compiler's source, I just see it outputting null with a comment saying "null for now, fix later"... it might not be implemented. Or if it is, Walter might know better than me as to where it is.
Aug 02 2020
parent Adam D. Ruppe <destructionator gmail.com> writes:
 On Monday, 3 August 2020 at 01:16:37 UTC, Andrei Alexandrescu 
 wrote:
 what the whole deal with OffsetTypeInfo
so if I had to guess, this was probably originally intended to support the precise GC and got replaced with RTInfo which uses __traits(getPointerBitmap) now instead. just speculation. of course if it were there, that could be potentially cool for runtime reflection stuff. but it should also prolly just be opt-in anyway imo.
Aug 02 2020
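For reference, a small sketch of what `__traits(getPointerBitmap)` reports, since the post above guesses that RTInfo builds on it. The struct `Node` is a made-up example. The first element of the result is the size of the type in bytes, and the following elements are a bitmap with one bit per `size_t`-sized word, set where that word may hold a GC-managed pointer.

```d
struct Node
{
    size_t count;   // word 0: plain integer, bit clear
    void*  payload; // word 1: may point into the GC heap, bit set
}

// Two words in total, and only the second one is a pointer.
static assert(__traits(getPointerBitmap, Node) == [2 * size_t.sizeof, 0b10]);
```

This gives pointer locations only. It carries no field names, types, or default values, so it is not a replacement for per-field offset information.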
prev sibling parent reply Johan <j j.nl> writes:
On Sunday, 2 August 2020 at 23:59:23 UTC, Adam D. Ruppe wrote:
 On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu 
 wrote:
 Any ideas on how to do that via introspection? The fields are 
 accessible, but not their default values.
It is ugly but possible right now to pull in the symbol via extern(C). See line 20 in my latest blog's example: http://dpldocs.info/this-week-in-d/Blog.Posted_2020_07_27.html#zero-runtime-classes ldc complains but it is a type mismatch not a fundamental barrier, I just didn't figure out the right thing to silence it yet.
 It seems like __traits(type, getInitializer) might be 
 necessary.
but yes this would be generally nicer anyway imo.
Thanks for that post, Adam. I've been trying the same thing lately. It's needed to fix this: https://issues.dlang.org/show_bug.cgi?id=21097 https://github.com/weka-io/druntime/blob/9e5a36b0fcac242c4d160d3d7d0c85565aebe79f/src/core/internal/lifetime.d#L118 -Johan
Aug 03 2020
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/20 4:04 AM, Johan wrote:
 On Sunday, 2 August 2020 at 23:59:23 UTC, Adam D. Ruppe wrote:
 On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu wrote:
 Any ideas on how to do that via introspection? The fields are 
 accessible, but not their default values.
It is ugly but possible right now to pull in the symbol via extern(C). See line 20 in my latest blog's example: http://dpldocs.info/this-week-in-d/Blog.Posted_2020_07_27.html#zero-runtime-classes ldc complains but it is a type mismatch not a fundamental barrier, I just didn't figure out the right thing to silence it yet.
 It seems like __traits(type, getInitializer) might be necessary.
but yes this would be generally nicer anyway imo.
Thanks for that post, Adam. I've been trying the same thing lately. It's needed to fix this: https://issues.dlang.org/show_bug.cgi?id=21097 https://github.com/weka-io/druntime/blob/9e5a36b0fcac242c4d160d3d7d0c85565aebe79f/src/core/internal/lifetime.d#L118 -Johan
Would it be effective to iterate through the .tupleof and initialize each in turn?
Aug 03 2020
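A rough sketch of the field-by-field idea raised in the question above, for structs only. `Config` and `resetFields` are hypothetical names. It relies on `T.init.tupleof`, which exists for structs; for a class, `T.init` is just `null`, so this does not carry over directly.

```d
struct Config
{
    int    retries = 3;
    string name    = "default";
    bool   verbose = true;
}

// Assign every field its declared default by walking .tupleof.
// Limitations: padding bytes between fields are not touched, and const or
// immutable fields cannot be assigned this way.
void resetFields(T)(ref T target) if (is(T == struct))
{
    foreach (i, ref field; target.tupleof)
        field = T.init.tupleof[i];
}

unittest
{
    Config c;
    c.retries = 0;
    c.name = "changed";
    c.verbose = false;
    resetFields(c);
    assert(c == Config.init);
}
```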
parent reply Johan <j j.nl> writes:
On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu 
wrote:
 On 8/3/20 4:04 AM, Johan wrote:
 On Sunday, 2 August 2020 at 23:59:23 UTC, Adam D. Ruppe wrote:
 On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu 
 wrote:
 Any ideas on how to do that via introspection? The fields 
 are accessible, but not their default values.
It is ugly but possible right now to pull in the symbol via extern(C). See line 20 in my latest blog's example: http://dpldocs.info/this-week-in-d/Blog.Posted_2020_07_27.html#zero-runtime-classes ldc complains but it is a type mismatch not a fundamental barrier, I just didn't figure out the right thing to silence it yet.
 It seems like __traits(type, getInitializer) might be 
 necessary.
but yes this would be generally nicer anyway imo.
Thanks for that post, Adam. I've been trying the same thing lately. It's needed to fix this: https://issues.dlang.org/show_bug.cgi?id=21097 https://github.com/weka-io/druntime/blob/9e5a36b0fcac242c4d160d3d7d0c85565aebe79f/src/core/internal/lifetime.d#L118 -Johan
Would it be effective to iterate through the .tupleof and initialize each in turn?
Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well and a memcpy would be that much nicer. If `__traits(type, getInitializer)` would return a symbol, then memcpy is easy and we're done. However, that forces us to emit these init symbols, except for very simple cases like all-zeros initialization (for which LDC no longer emits an init symbol). It is very beneficial for binary size to elide these all-zero initializer symbols. I don't know how much benefit there is for eliding near-zero symbols. If `__traits(type, getInitializer)` would return a function, then that's a different story... My current solution [*]: https://github.com/weka-io/druntime/blob/0dab4b0dc5cbccb891351095ff09b0558e3fbe06/src/core/internal/lifetime.d#L92-L140 -Johan [*] Hits an obscure mangling bug, so doesn't quite work with Weka's codebase yet
Aug 03 2020
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 3 August 2020 at 14:44:38 UTC, Johan wrote:
 If `__traits(type, getInitializer)` would return a function, 
 then that's a different story...
Yes... and then it could skip =void items too. Like struct A { int a = 10; ubyte[10000] b = void; int c = 20; } Could very well just be the two int assigns. Right now it will spit out 10000 zeros in the middle.
Aug 03 2020
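To make the `= void` point concrete, here is a hand-written stand-in for what a compiler-generated initializer function could look like for the struct in the post above. The function `initializeA` is purely illustrative; no compiler discussed in this thread generates it.

```d
struct A
{
    int a = 10;
    ubyte[10000] b = void;
    int c = 20;
}

// Only the two explicitly initialized fields are written. The 10000 bytes of
// `b` are skipped entirely, which a blit of the full init image cannot do.
void initializeA(ref A target)
{
    target.a = 10;
    target.c = 20;
}

unittest
{
    A value = void;
    value.b[0] = 0xAB;
    initializeA(value);
    assert(value.a == 10 && value.c == 20);
    assert(value.b[0] == 0xAB); // untouched, as `= void` allows
}
```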
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/20 10:47 AM, Adam D. Ruppe wrote:
 On Monday, 3 August 2020 at 14:44:38 UTC, Johan wrote:
 If `__traits(type, getInitializer)` would return a function, then 
 that's a different story...
Yes... and then it could skip =void items too. Like struct A {   int a = 10;   ubyte[10000] b = void;   int c = 20; } Could very well just be the two int assigns. Right now it will spit out 10000 zeros in the middle.
Oh, yes forgot about that important efficiency matter. Yes it does look like we need a __trait after all.
Aug 03 2020
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 4 August 2020 at 02:09:13 UTC, Andrei Alexandrescu 
wrote:
 Oh, yes forgot about that important efficiency matter. Yes it 
 does look like we need a __trait after all.
How do you think it should be exposed? An initialization function the compiler generates? Some kind of range of ranges? (so like a representation of "4 bytes of zero, 5000 bytes uninitialized, 4 bytes of 4s". Though at that point a .tupleof may make more sense, just gotta account for hidden fields too like the class vtable pointer.) I'm thinking the function is probably the best though then tweaking it becomes a compiler patch again. It would also want to be guaranteed to be inlined probably. I don't know though, it is kinda tricky to actually account for those =void things.
Aug 03 2020
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/20 10:35 PM, Adam D. Ruppe wrote:
 On Tuesday, 4 August 2020 at 02:09:13 UTC, Andrei Alexandrescu wrote:
 Oh, yes forgot about that important efficiency matter. Yes it does 
 look like we need a __trait after all.
How do you think it should be exposed? An initialization function the compiler generates? Some kind of range of ranges? (so like a representation of "4 bytes of zero, 5000 bytes uninitialized, 4 bytes of 4s". Though at that point a .tupleof may make more sense, just gotta account for hidden fields too like the class vtable pointer.) I'm thinking the function is probably the best though then tweaking it becomes a compiler patch again. It would also want to be guaranteed to be inlined probably. I don't know though, it is kinda tricky to actually account for those =void things.
I'm an introspection junkie so I just wish I got access to the initial value of every field.

Come to think of it - a litmus test for introspection is that you can print out during compilation an exact definition of any data structure in the program. That is, you should be able to write a function:

printDefinition(T)

such that during compilation, given:

struct S {
    int a = 42;
    immutable double b;
    string c = "hi";
    char[100] c = void;
    void func(double);
    ...
}

then printDefinition!T would output S during compilation. (Without method bodies, but with all qualifiers and attributes and alignment directives and all.)

From that perspective, clearly there's a need for __traits(initializerString, T, "c") or __traits(initializerString, T, 2). It always returns a string containing the initializer value ("void" for void) so code can either print it or mixin it.

For S, __traits(initializerString, T, 0) returns "42", __traits(initializerString, T, 2) and __traits(initializerString, T, "c") return "\"hi\"", and so on.
Aug 03 2020
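As a point of comparison, this is roughly how far a printDefinition-style function gets with the existing traits, restricted to structs with simple field types. `Sample` and `describeFields` are invented names. The default values come from `T.init.tupleof`, so a `= void` field would be indistinguishable from an ordinary default here, which is the gap the proposed trait is meant to close.

```d
import std.conv : to;

struct Sample
{
    int a = 42;
    string c = "hi";
    bool flag = true;
}

// Build a one-line textual definition of T at compile time.
string describeFields(T)() if (is(T == struct))
{
    string text = "struct " ~ T.stringof ~ " {";
    static foreach (i; 0 .. T.tupleof.length)
    {{
        alias FieldType = typeof(T.tupleof[i]);
        // Strings get their quotes back; everything else goes through to!string.
        static if (is(FieldType == string))
            enum value = `"` ~ T.init.tupleof[i] ~ `"`;
        else
            enum value = to!string(T.init.tupleof[i]);
        text ~= " " ~ FieldType.stringof ~ " "
              ~ __traits(identifier, T.tupleof[i]) ~ " = " ~ value ~ ";";
    }}
    return text ~ " }";
}

// Printed by the compiler while building.
pragma(msg, describeFields!Sample());
```

Methods, attributes, and alignment directives are not covered by this sketch.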
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/20 11:47 PM, Andrei Alexandrescu wrote:
 then printDefinition!T would output S during compilation.
s/printDefinition!T/printDefinition!S/
Aug 03 2020
prev sibling parent Simen Kjærås <simen.kjaras gmail.com> writes:
On Tuesday, 4 August 2020 at 03:47:44 UTC, Andrei Alexandrescu 
wrote:
 struct S {
     int a = 42;
     immutable double b;
     string c = "hi";
     char[100] c = void;
     void func(double);
     ...
 }

 then printDefinition!T would output S during compilation. 
 (Without method bodies, but with all qualifiers and attributes 
 and alignment directives and all.)

 From that perspective, clearly there's a need for 
 __traits(initializerString, T, "c") or 
 __traits(initializerString, T, 2). It always returns a string 
 containing the initializer value ("void" for void) so code can 
 either print it or mixin it.

 For S, __traits(initializerString, T, 0) returns "42", 
 __traits(initializerString, T, 2) and 
 __traits(initializerString, T, "c") return "\"hi\"", and so on.
The problem with initializerString is it doesn't play nice with mixins - when a field is of a type not defined or imported in the module that does the mixin, the compiler barfs. Since the initializer must be a compile-time constant, can't we just have the __trait return the value, and void in the case of void-initialization? (if so, what do we do for fields not explicitly initialized?) -- Simen
Aug 04 2020
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/20 10:44 AM, Johan wrote:
 On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu wrote:
 Would it be effective to iterate through the .tupleof and initialize 
 each in turn?
Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well and a memcpy would be that much nicer.
To play devil's advocate, the padding bytes should not have been changed by user code in the first place :o).
 If `__traits(type, getInitializer)` would return a symbol, then memcpy 
 is easy and we're done. However, that forces us to emit these init 
 symbols, except for very simple cases like all-zeros initialization (for 
 which LDC no longer emits an init symbol). It is very beneficial for 
 binary size to elide these all-zero initializer symbols. I don't know 
 how much benefit there is for eliding near-zero symbols.
 If `__traits(type, getInitializer)` would return a function, then that's 
 a different story...
 
 My current solution [*]: 
 https://github.com/weka-io/druntime/blob/0dab4b0dc5cbccb891351095ff09b0558e3fbe06/src/core/internal/lifetime.d#L92-L140 
 
 
 -Johan
 
 [*] Hits an obscure mangling bug, so doesn't quite work with Weka's 
 codebase yet
Cool. Also the https://dlang.org/spec/traits.html#isZeroInit flag may help. There's also the related trickery TypeInfo uses for the initializer() function: https://github.com/dlang/druntime/blob/master/src/object.d#L390 Array with null pointer and non-zero length. There's also offset info for fields available so I'd say there is enough material for a complete solution. Just throwing everything at the wall...
Aug 03 2020
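A short sketch combining the two pieces mentioned above: the `isZeroInit` trait and the `TypeInfo.initializer` convention of returning a null pointer with a non-zero length for all-zero types. The struct names and the helper `blitInitializer` are hypothetical.

```d
import core.stdc.string : memcpy, memset;

struct AllZero { int a; long b; void* p; }
struct NonZero { int a = 1; long b; }

static assert( __traits(isZeroInit, AllZero));
static assert(!__traits(isZeroInit, NonZero));

// Blit T's initial image over raw memory using the TypeInfo slice.
// A null .ptr means "length bytes of zero", so no init symbol is read.
void blitInitializer(T)(void[] memory) if (is(T == struct))
{
    const(void)[] init = typeid(T).initializer;
    assert(memory.length >= init.length);
    if (init.ptr is null)
        memset(memory.ptr, 0, init.length);
    else
        memcpy(memory.ptr, init.ptr, init.length);
}

unittest
{
    ubyte[AllZero.sizeof] raw = 0xFF;
    blitInitializer!AllZero(raw[]);
    foreach (b; raw)
        assert(b == 0);
}
```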
parent reply Johan <j j.nl> writes:
On Tuesday, 4 August 2020 at 02:03:34 UTC, Andrei Alexandrescu 
wrote:
 On 8/3/20 10:44 AM, Johan wrote:
 On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu 
 wrote:
 Would it be effective to iterate through the .tupleof and 
 initialize each in turn?
Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well and a memcpy would be that much nicer.
To play devil's advocate, the padding bytes should not have been changed by user code in the first place :o).
But the memory into which objects are placed will be tainted and thus the padding areas will not be the same for each object. (it's the same for =void members. All can be incorporated into the initializer function, but it's work.) -Johan
Aug 04 2020
parent reply Johannes Pfau <nospam example.com> writes:
Am Tue, 04 Aug 2020 09:31:16 +0000 schrieb Johan:

 On Tuesday, 4 August 2020 at 02:03:34 UTC, Andrei Alexandrescu wrote:
 On 8/3/20 10:44 AM, Johan wrote:
 On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu wrote:
 Would it be effective to iterate through the .tupleof and initialize
 each in turn?
Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well and a memcpy would be that much nicer.
To play devil's advocate, the padding bytes should not have been changed by user code in the first place :o).
But the memory into which objects are placed will be tainted and thus the padding areas will not be the same for each object. (it's the same for =void members. All can be incorporated into the initializer function, but it's work.) -Johan
I wonder whether an initial memset + then initializing members may be a good solution? The compiler backends may be clever enough to optimize the memset (e.g. if there are no gaps, so it's completely redundant, if there is a single gap and explicitly filling that gap is more efficient than zeroing everything, ...). However, in some cases a memcpy which copies both member initialization data and padding may be better? I'm not sure how to decide when which option is better or whether we can somehow have both... -- Johannes
Aug 04 2020
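The "memset first, then initialize members" option from the post above can be sketched as follows. `Padded` and `initZeroThenFields` are made-up names, and whether a backend actually trims the redundant part of the memset is not something this sketch demonstrates.

```d
import core.stdc.string : memcmp, memset;

struct Padded
{
    ubyte tag = 1;   // followed by padding bytes before `value`
    long  value = 2;
}

// Zero the whole object, padding included, then assign each field its
// declared default.
void initZeroThenFields(T)(ref T target) if (is(T == struct))
{
    memset(&target, 0, T.sizeof);
    foreach (i, ref field; target.tupleof)
        field = T.init.tupleof[i];
}

unittest
{
    // Start from differently "tainted" memory, as in the placement case.
    Padded a = void, b = void;
    memset(&a, 0xAA, Padded.sizeof);
    memset(&b, 0x55, Padded.sizeof);
    initZeroThenFields(a);
    initZeroThenFields(b);
    assert(memcmp(&a, &b, Padded.sizeof) == 0);
}
```

Because the padding is zeroed before the fields are written, a bytewise comparison of two freshly initialized objects succeeds regardless of what the memory held before.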
parent reply Johannes Pfau <nospam example.com> writes:
Am Tue, 04 Aug 2020 10:13:53 +0000 schrieb Johannes Pfau:

 Am Tue, 04 Aug 2020 09:31:16 +0000 schrieb Johan:
 
 On Tuesday, 4 August 2020 at 02:03:34 UTC, Andrei Alexandrescu wrote:
 On 8/3/20 10:44 AM, Johan wrote:
 On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu wrote:
 Would it be effective to iterate through the .tupleof and initialize
 each in turn?
Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well and a memcpy would be that much nicer.
To play devil's advocate, the padding bytes should not have been changed by user code in the first place :o).
But the memory into which objects are placed will be tainted and thus the padding areas will not be the same for each object. (it's the same for =void members. All can be incorporated into the initializer function, but it's work.) -Johan
I wonder whether an initial memset + then initializing members may be a good solution? The compiler backends may be clever enough to optimize the memset (e.g. if there are no gaps, so it's completely redundant, if there is a single gap and explicitly filling that gap is more efficient than zeroing everything, ...). However, in some cases a memcpy which copies both member initialization data and padding may be better? I'm not sure how to decide when which option is better or whether we can somehow have both...
A quick look at some generated ASM for C++ code suggests that GCC can "see through" memcpys if the copied data is "well known": https://godbolt.org/z/jno9KM *

So if GCC actually knows which data will be memcpyed, it may rewrite the memcpy to assignments of statically known values. Or it may rewrite the memcpy into multiple assignments skipping holes, it may remove redundant writes (e.g. if a member is immediately written after initialization), ...

I'd therefore suggest the following:

1) Make all init symbols COMDAT: This ensures that if a symbol is actually needed (address taken, real memcpy call) it will be available. But if it is not needed, the compiler does not have to output the symbol. If it's required in multiple files, COMDAT will merge the symbols into one.

2) Ensure the compiler always knows the data of that symbol. This probably means during codegen, the initializer should never be an external symbol. It needs to be a COMDAT symbol with attached initializer expression. And the initializer data must always be fully available in .di files.

The two rules combined should allow the backend to choose the initialization method that is most appropriate for the target architecture.

To summarize, implementing "initializer functions" may prevent this optimization to some degree (depends on inlining and other factors though). So I'd probably prefer to keep compiler generated initializer symbols for aggregates, but make sure that these symbols always have an initializer expression attached, so the backend can choose which one to use.

In addition, there needs to be some well-defined way for user code to initialize variables and trigger these optimizations. Most likely __builtin_memset(p, 0, size) and __builtin_memcpy(p, &T.init, T.sizeof) would be fine though.

* Interesting that the most efficient way to return a default-initialized aggregate on X86 by value is to just return an address to the initializer. I guess the ABI copies anyway in the caller...

-- Johannes
Aug 05 2020
parent reply Johan <j j.nl> writes:
On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau wrote:
 I'd therefore suggest the following:
 1) Make all init symbols COMDAT: This ensures that if a symbol 
 is
 actually needed (address taken, real memcpy call) it will be 
 available.
 But if it is not needed, the compiler does not have to output 
 the symbol.
 If it's required in multiple files, COMDAT will merge the 
 symbols into
 one.

 2) Ensure the compiler always knows the data of that symbol. 
 This probably means during codegen, the initializer should 
 never be an external symbol. It needs to be a COMDAT symbol 
 with attached initializer expression. And the initializer data 
 must always be fully available in .di files.

 The two rules combined should allow the backend to choose the 
 initialization method that is most appropriate for the target 
 architecture.
What you are suggesting is pretty much exactly what the compilers already do. Except that we don't expose the initialization symbol directly to the user (T.init is an rvalue, and does not point to the initialization symbol), but through TypeInfo.initializer. Not exposing the initializer symbol to the user had a nice benefit: for cases where we never want to emit an initializer symbol (very large structs), we simply removed that symbol and started doing something else (memset zero), without breaking any user code. However this only works for all-zero structs, because TypeInfo.initializer must return a slice ({init symbol, length}) to data or {null,length} for all-zero (the 'null' is what we started making use of). More complex cases cannot elide the symbol. Initializer functions would allow us to tailor initialization for more complex cases (e.g. with =void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...), without having to always turn on some backend optimizations (at -O0) and without having to expose a TypeInfo.initializer slice, but instead exposing a TypeInfo.initializer function pointer. -Johan
Aug 05 2020
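Since `T.init` is an rvalue, user code that wants to memcpy the initial image needs something addressable. A minimal sketch of one way to do that without TypeInfo follows. It is an illustration of the trade-off discussed above and not druntime's actual implementation; `Settings` and `emplaceDefault` are hypothetical names.

```d
import core.stdc.string : memcpy, memset;

struct Settings
{
    int level = 3;
    double scale = 1.5;
}

void emplaceDefault(T)(ref T target) if (is(T == struct))
{
    static if (__traits(isZeroInit, T))
    {
        // All-zero types need no initializer data in the binary.
        memset(&target, 0, T.sizeof);
    }
    else
    {
        // An addressable copy of T.init. This puts a T.sizeof-byte blob
        // into the binary, which is the cost under discussion for large types.
        static immutable T init = T.init;
        memcpy(&target, &init, T.sizeof);
    }
}

unittest
{
    Settings s = void;
    emplaceDefault(s);
    assert(s.level == 3 && s.scale == 1.5);
}
```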
parent reply Johannes Pfau <nospam example.com> writes:
Am Wed, 05 Aug 2020 14:36:37 +0000 schrieb Johan:

 On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau wrote:
 I'd therefore suggest the following:
 1) Make all init symbols COMDAT: This ensures that if a symbol is
 actually needed (address taken, real memcpy call) it will be available.
 But if it is not needed, the compiler does not have to output the
 symbol.
 If it's required in multiple files, COMDAT will merge the symbols into
 one.

 2) Ensure the compiler always knows the data of that symbol. This
 probably means during codegen, the initializer should never be an
 external symbol. It needs to be a COMDAT symbol with attached
 initializer expression. And the initializer data must always be fully
 available in .di files.

 The two rules combined should allow the backend to choose the
 initialization method that is most appropriate for the target
 architecture.
What you are suggesting is pretty much exactly what the compilers already do. Except that we don't expose the initialization symbol directly to the user (T.init is an rvalue, and does not point to the initialization symbol), but through TypeInfo.initializer. Not exposing the initializer symbol to the user had a nice benefit: for cases where we never want to emit an initializer symbol (very large structs), we simply removed that symbol and started doing something else (memset zero), without breaking any user code. However this only works for all-zero structs, because TypeInfo.initializer must return a slice ({init symbol, length}) to data or {null,length} for all-zero (the 'null' is what we started making use of). More complex cases cannot elide the symbol. Initializer functions would allow us to tailor initialization for more complex cases (e.g. with =void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...), without having to always turn on some backend optimizations (at -O0) and without having to expose a TypeInfo.initializer slice, but instead exposing a TypeInfo.initializer function pointer. -Johan
But initializer symbols are currently not in COMDAT, or does LDC implement that? That's a crucial point, as it addresses Andrei's initializer bloat point. And it also means you can avoid emitting the symbol if it's never referenced. But if it is referenced, it will be available. Initializer functions have the drawback that backends can no longer choose different strategies for -Os or -O2. All the other benefits you mention (=void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...) can also be handled properly by the backend in the initializer-symbol case if the initializer expression is available to the backend. And you have to ensure that the initialization function can always be inlined, so without -O flags it may also lead to suboptimal code... If the initializer optimizations depend on -O flags, it should also be possible to move the necessary steps in the backend into a different step which is executed even without optimization flags. Choosing to initialize using expressions vs. a symbol should not be an expensive step. I don't see how an initializer function would be more flexible than that. In fact, you could generate the initializer function in the backend if information about the initialization expression is always preserved. Constructing an initializer function earlier (in the frontend, or D user code) removes information about the target architecture (-Os, memory available, efficient addressing of local constant data, ...). Because of that, I think the backend is the best place to implement this and the frontend should just provide the symbol initializer expression. -- Johannes
Aug 05 2020
parent reply Johan <j j.nl> writes:
On Wednesday, 5 August 2020 at 16:08:59 UTC, Johannes Pfau wrote:
 Am Wed, 05 Aug 2020 14:36:37 +0000 schrieb Johan:

 On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau 
 wrote:
 I'd therefore suggest the following:
 1) Make all init symbols COMDAT: This ensures that if a 
 symbol is
 actually needed (address taken, real memcpy call) it will be 
 available.
 But if it is not needed, the compiler does not have to output 
 the
 symbol.
 If it's required in multiple files, COMDAT will merge the 
 symbols into
 one.

 2) Ensure the compiler always knows the data of that symbol. 
 This probably means during codegen, the initializer should 
 never be an external symbol. It needs to be a COMDAT symbol 
 with attached initializer expression. And the initializer 
 data must always be fully available in .di files.

 The two rules combined should allow the backend to choose the 
 initialization method that is most appropriate for the target 
 architecture.
What you are suggesting is pretty much exactly what the compilers already do. Except that we don't expose the initialization symbol directly to the user (T.init is an rvalue, and does not point to the initialization symbol), but through TypeInfo.initializer. Not exposing the initializer symbol to the user had a nice benefit: for cases where we never want to emit an initializer symbol (very large structs), we simply removed that symbol and started doing something else (memset zero), without breaking any user code. However this only works for all-zero structs, because TypeInfo.initializer must return a slice ({init symbol, length}) to data or {null,length} for all-zero (the 'null' is what we started making use of). More complex cases cannot elide the symbol. Initializer functions would allow us to tailor initialization for more complex cases (e.g. with =void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...), without having to always turn on some backend optimizations (at -O0) and without having to expose a TypeInfo.initializer slice, but instead exposing a TypeInfo.initializer function pointer. -Johan
But initializer symbols are currently not in COMDAT, or does LDC implement that? That's a crucial point, as it addresses Andrei's initializer bloat point. And it also means you can avoid emitting the symbol if it's never referenced. But if it is referenced, it will be available.
It does not matter whether the initializer symbol is in COMDAT, because (currently) it has to be dynamically accessible (e.g. by a user of a compiled library or e.g. by druntime GC object destroy code), and thus it cannot be determined at link/compile time whether it is referenced.
 Initializer functions have the drawback that backends can no 
 longer choose different strategies for -Os or -O2. All the 
 other benefits you mention (=void holes, padding shenanigans, 
 or non-zero-but-repetitive-constant double[1million] arrays, 
 ...) can also be handled properly by the backend in the 
 initializer-symbol case if the initializer expression is 
 available to the backend. And you have to ensure that the 
 initialization function can always be inlined, so without -O 
 flags it may also lead to suboptimal code...
Backends can also turn an initializer function into a memcpy function. It's perfectly fine if code is suboptimal without -O. You can simply express more with a function than with a symbol (a symbol implies the function "memcpy(all)", whereas a function could do that and more). How would you express =void using a symbol in an object file?
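To make the symbol-versus-function distinction concrete, here is a minimal sketch. It assumes a hypothetical struct `Samples`; `initFromSymbol` and `initByFunction` are illustrative names, not existing druntime functions. The first helper blits the init image obtained through `TypeInfo.initializer`, the second expresses the same result as code, which is the form that could skip `=void` fields or fill a repetitive array with a loop instead of storing the whole image.

```d
import core.stdc.string : memcpy, memset;

// Hypothetical type: one small field plus a large, repetitive array.
struct Samples
{
    int count = 3;
    double[1024] data = 1.5;
}

// Symbol-style initialization: copy the init image exposed by TypeInfo.
void initFromSymbol(ref Samples s)
{
    const(void)[] image = typeid(Samples).initializer();
    if (image.ptr is null)
        memset(&s, 0, image.length);          // all-zero types carry no image
    else
        memcpy(&s, image.ptr, image.length);  // "memcpy(all)"
}

// Function-style initialization: store the scalar, fill the array.
void initByFunction(ref Samples s)
{
    s.count = 3;
    s.data[] = 1.5;
}

void main()
{
    Samples a, b;
    initFromSymbol(a);
    initByFunction(b);
    assert(a.count == b.count);
    assert(a.data[] == b.data[]);
}
```

Both helpers leave the object in the same state here; the difference is only in how much constant data has to live in the object file.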
 If the initializer optimizations depend on -O flags, it should 
 also be possible to move the necessary steps in the backend 
 into a different step which is executed even without 
 optimization flags. Choosing to initialize using expressions 
 vs. a symbol should not be an expensive step.
Actually, this does sound like an expensive analysis to me (e.g. detecting the case of a large array with repetitive initialization inside a struct with a few other members). But maybe more practically, is it possible to enable/disable specific optimization passes for individual functions with gcc backend at -O0? (we can't with LLVM)
 I don't see how an initializer function would be more flexible 
 than that. In fact, you could generate the initializer function 
 in the backend if information about the initialization 
 expression is always preserved. Constructing an initializer 
 function earlier (in the frontend, or D user code) removes 
 information about the target architecture (-Os, memory 
 available, efficient addressing of local constant data, ...). 
 Because of that, I think the backend is the best place to 
 implement this and the frontend should just provide the symbol 
 initializer expression.
I'm a little confused because your last sentence is exactly what we currently do, with the terminology: frontend = dmd code that outputs a semantically analyzed AST. Backend = DMD/GCC/LLVM codegen. Possibly with "glue layer intermediate representation" in-between. What I thought is discussed in this thread, is that we move the complexity out of the compilers (so out of current backends) into druntime. For that, I think an initializer function is a good solution (similar to emitting a constructor function, rather than implementing that codegen inside the backend). -Johan
Aug 05 2020
parent reply Johannes Pfau <nospam example.com> writes:
Am Wed, 05 Aug 2020 22:19:11 +0000 schrieb Johan:


 But initializer symbols are currently not in COMDAT, or does LDC
 implement that? That's a crucial point, as it addresses Andrei's
 initializer bloat point. And it also means you can avoid emitting the
 symbol if it's never referenced. But if it is referenced, it will be
 available.
It does not matter whether the initializer symbol is in COMDAT, because (currently) it has to be dynamically accessible (e.g. by a user of a compiled library or e.g. by druntime GC object destroy code), and thus it cannot be determined at link/compile time whether it is referenced.
You're right, I forgot for a second that right now, the initializer symbol has to be accessible. So obviously making it comdat now is not possible, however I think Andrei wanted to make most of that optional with the TypeInfo changes. Regarding "e.g. by a user of a compiled library": That is exactly my point when I said the initializer _expression_ must always be available to the compiler, even for such precompiled libraries. And whenever an initializer is accessed in some code unit, the symbol should be generated and put into comdat. This way, there can be exactly 0 or 1 instances of the initializer symbol, pay-as-you-go depending on whether it's used.
 
 Initializer functions have the drawback that backends can no longer
 choose different strategies for -Os or -O2. All the other benefits you
 mention (=void holes, padding shenanigans, or non-zero-but-repetitive-constant
 double[1million] arrays, ...) can also be handled properly by
 the backend in the initializer-symbol case if the initializer
 expression is available to the backend. And you have to ensure that the
 initialization function can always be inlined, so without -O flags it
 may also lead to suboptimal code...
Backends can also turn an initializer function into a memcpy function.
Yes, but as there's no symbol with a global name, the compiler has to somehow place the data locally (local symbol / in code). Inline your code into two code units and you have unnecessarily duplicated initializer data. Interestingly, I can't even get GCC to convert an initializer function into a symbol: https://godbolt.org/z/b61fcs There's the same problem for inlining though, this will lead to lots of duplication bloat. So when using initializer functions, inlining should probably not be enforced and there needs to be a global function symbol as a fallback. OTOH we want the inliner to be able to actually inline initializer functions in any case...
 It's perfectly fine if code is suboptimal without -O.
 You can simply express more with a function than with a symbol (a symbol
 implies the function "memcpy(all)", whereas a function could do that and
 more).
That's why I'm not talking about only a symbol, I'm talking about the symbol backed by an initializer expression. The initializer expression (StructInitializer / ExpInitializer) is essentially the code representation of the initializer, as complex / compact as it may be. But the symbol fallback (SymbolExp?) can be useful in some cases.
 How would you express =void using a symbol in an object file?
Obviously there has to be some data there, 0, random, whatever. But again, I don't want to have the symbols, I only want to have them as a fallback when needed.

Maybe I don't really understand the problem. Consider this code: https://explore.dgnu.org/z/_yixUX
----------
struct Large
{
    ubyte a = 42;
    size_t[64] blob = void;
    ubyte b = 10;
}

void foo()
{
    Large l;
}
----------
Because of the byte-by-byte struct comparison, the blob memory actually has to be initialized to 0. Nevertheless, you can see that the backend does not reference the symbol at -O0 and it explicitly does: mov BYTE PTR [rbp-528], 42

So it does not only see "the symbol", it does see the individual field initializers. If byte-by-byte comparison wasn't a requirement, the backend (GCC) would initialize only a and b. Now move struct Large into a different file: you'll see that GCC now "only sees the symbol", so it copies from "_D1s5Large6__initZ". I see two problems with this:
* We do not get the symbol-less initializer form when using multiple files. That's why I think the frontend should make the initializer expression (StructInitializer), which provides expressions to initialize all fields, available even for aggregates in non-root modules.
* We always emit the initializer symbol and pay for the overhead ==> comdat.

Apart from that, there is also a GDC "bug" which seems to always emit the symbol-less initializer, if possible. It would be preferable to let the backend (GCC) choose which one to use, and according to some experiments in C++, that is possible. But it probably needs -O to choose the best solution.
 
 If the initializer optimizations depend on -O flags, it should also be
 possible to move the necessary steps in the backend into a different
 step which is executed even without optimization flags. Choosing to
 initialize using expressions vs. a symbol should not be an expensive
 step.
Actually, this does sound like an expensive analysis to me (e.g. detecting the case of a large array with repetitive initialization inside a struct with a few other members). But maybe more practically, is it possible to enable/disable specific optimization passes for individual functions with gcc backend at -O0? (we can't with LLVM)
Of course it depends on how far you go. Simply checking how much actual initialization data there is vs. =void and alignment holes is simple. Detecting foo [1, 2, 3, 1, 2, 3, 1, 2, 3] would be quite difficult. But how is that different when done in the frontend?

However, I'm not arguing at all that we should just pass a flat data buffer to the glue code and let the glue code figure out how to reconstruct initialization code from that. I'm suggesting that we always pass both, the comdat symbol and the initialization expression, to the backend. For GCC, we can simply pass any expression (I'm not sure if it has to be constant, i.e. computable at compile time) in the GCC GENERIC backend language to DECL_INITIAL for a variable. So if the initializer in D was this:
-------
struct Foo
{
    int[64] data = repeat(1, 3, 64);
}
-------
in theory we should be able to just pass the initializer code in its GENERIC form to DECL_INITIAL. The GCC backend could then just generate the code for initialization. So this then essentially is an initializer function, but of a more GCC readable kind.

In some cases (initialization of a global variable, maybe others) GCC would probably have to evaluate that code at compile time to obtain the data representation. That might be difficult, so maybe we have to consider this in the glue code and pass a complex expression/code based initializer in places where we can execute code, but a data based initializer where that's not possible. Ideally, we pass both options to GCC and let GCC choose. The GCC backend code could be as simple as:

    if (decl.initializer.isSymbol() && decl.initializer.symbol.hasInitializerExpression())
        // TODO: When to use expr vs. symbol?
        initializer = decl.initializer.symbol.initializerExpression;
 
 I don't see how an initializer function would be more flexible than
 that. In fact, you could generate the initializer function in the
 backend if information about the initialization expression is always
 preserved. Constructing an initializer function earlier (in the
 frontend, or D user code) removes information about the target
 architecture (-Os, memory available, efficient addressing of local
 constant data, ...). Because of that, I think the backend is the best
 place to implement this and the frontend should just provide the symbol
 initializer expression.
I'm a little confused because your last sentence is exactly what we currently do, with the terminology: frontend = dmd code that outputs a semantically analyzed AST. Backend = DMD/GCC/LLVM codegen. Possibly with "glue layer intermediate representation" in-between.
When I said backend there, I meant the GCC, architecture dependent backend, not the glue layer.
 What I thought is discussed in this thread, is that we move the
 complexity out of the compilers (so out of current backends) into
 druntime. For that, I think an initializer function is a good solution
 (similar to emitting a constructor function, rather than implementing
 that codegen inside the backend).
But how is an initializer function different to the backend from a tree of StructInitializer / ExpInitializer? This is a 1:1 representation of the default initializer as written by the user. If you were to write an initializer function, wouldn't you just wrap that initializer tree in a statement and into a function? But the backend would still have to do exactly the same code transformation, with the main difference that it now has to generate a function, inline the function and it has less information about the function (e.g. an initializer tree can be evaluated at compile time / const in GCC terms, a function may not necessarily be, side effects, ...). So it seems to me, just passing the initializer tree from frontend to glue layer is the most information-preserving solution.

Reflecting on this some more, I guess I finally understand your point about using a function. To summarize my points:
1) We do not get the expression initializer form when using multiple files. That's why I think the frontend should make the initializer expression (StructInitializer), which provides expressions to initialize all fields, available even for aggregates in non-root modules.
2) We always emit the initializer symbol and pay for the overhead ==> comdat
3) One thing I didn't consider so far: CTFE constant folding of expressions in the expression based initializer. I guess that can destroy interesting information for the glue layer. So here we really want two things: a code based initializer expression, which never does CTFE constant folding, and a folded / evaluated expression to initialize global variables.

So I guess if we decide we never need the symbol and drop point 2, the third point, a "non-CTFEd initializer expression", is probably pretty close to what you wanted as an initializer function. I just didn't think of it as a function...
OTOH my point about using a symbol to unify initializer storage used in multiple invocations across code units would also apply to expression based initializers: Having a function there would actually allow saving space in some cases compared to always inlining the expression. So maybe a comdat, usually-inlined but optionally available function (e.g. for -Os) is a good idea... I'm not sure if the GCC backend can handle an initializer function (with known body) as well as a DECL_INITIAL in non-optimizing cases though. Maybe this needs some backend engineering in GCC. (DECL_FUNC(DECL_INITIAL(x) = ...) ? -- Johannes
Aug 06 2020
parent reply Johan <j j.nl> writes:
Hi Johannes,
   Can you rewrite your email without all the GDC implementation 
details? Let's keep the discussion backend-agnostic.

The questions to solve are:
Q1 - What do we expose to the user? (an init symbol, an init 
function, typeid pointer to symbol/function for dynamic types... 
?) User code should be able to reset an object to the init state. 
Currently user code can do that without compile-time knowledge of 
the dynamic type of an object.
Q2 - Do we want to take care of initialization in druntime or 
inside the compilers? (currently it is done inside the compilers, 
and each backend does things its own way as long as it complies 
with the answer to Q1. Array comparison was moved from the 
compilers into druntime. It's the same kind of discussion.).

At the moment, we only provide user code dynamic access to 
initializer symbol through typeid.initializer. The idea in this 
thread was to add a way to have 'static' access that preserves 
type information (e.g. doing initialization by calling a druntime 
template function with type as template parameter).
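The dynamic access mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `Base`, `Derived` and `resetToInit` are hypothetical names, the helper does not run destructors, and it also overwrites the hidden monitor field, so it is not a replacement for `destroy`. The only runtime API used is `TypeInfo.initializer`, reached through `typeid` on the object.

```d
import core.stdc.string : memcpy, memset;

class Base { int x = 1; }
class Derived : Base { int y = 2; }

// Hypothetical helper: reset an object to its init state knowing only its
// dynamic type. No destructor is run; the hidden fields are overwritten too.
void resetToInit(Object obj)
{
    // typeid on a class instance yields the TypeInfo of the dynamic type.
    const(void)[] image = typeid(obj).initializer();
    if (image.ptr is null)
        memset(cast(void*) obj, 0, image.length);
    else
        memcpy(cast(void*) obj, image.ptr, image.length);
}

void main()
{
    Base b = new Derived;      // static type Base, dynamic type Derived
    b.x = 10;
    (cast(Derived) b).y = 20;

    resetToInit(b);

    assert(b.x == 1);
    assert((cast(Derived) b).y == 2);
}
```

Because the init image comes from the dynamic type, the `Derived` field is reset as well even though the caller only holds a `Base` reference.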

cheers,
   Johan
Aug 06 2020
parent Johannes Pfau <nospam example.com> writes:
Am Thu, 06 Aug 2020 12:58:22 +0000 schrieb Johan:

 Hi Johannes,
    Can you rewrite your email without all the GDC implementation
 details? Let's keep the discussion backend-agnostic.
 
 The questions to solve are:
 Q1 - What do we expose to the user? (an init symbol, an init function,
 typeid pointer to symbol/function for dynamic types... ?) User code
 should be able to reset an object to the init state.
 Currently user code can do that without compile-time knowledge of the
 dynamic type of an object.
 Q2 - Do we want to take care of initialization in druntime or inside the
 compilers? (currently it is done inside the compilers,
 and each backend does things its own way as long as it complies with the
 answer to Q1. Array comparison was moved from the compilers into
 druntime. It's the same kind of discussion.).
 
 At the moment, we only provide user code dynamic access to initializer
 symbol through typeid.initializer. The idea in this thread was to add a
 way to have 'static' access that preserves type information (e.g. doing
 initialization by calling a druntime template function with type as
 template parameter).
 
 cheers,
    Johan
Sorry, I guess that email text got much longer than what I initially wanted to write. In the following, I'll just call "variables with non-statically known type" "dynamic types".

Q1: Only an rvalue? I didn't know anything actually needs to get an initializer for a dynamic type. Where is this used, in the GC? If we really need that, we either need a pointer to a symbol or a function. I guess I'd agree the function is likely a better solution here. Maybe put it in the vtbl then, to get it out of TypeInfo. I don't mind exposing a function to the user if it's pay-as-you-go, e.g. only emitted on demand. Using it for dynamic types however means we'll always need to emit that function. So if it's somehow possible, I'd rather get rid of getting the initializer for dynamic types completely.

Q2: In the compilers. My previous messages were only considering cases where the type is statically known. In that case, I think the compilers can do better than a runtime solution could. (E.g. use code based initializers for small types, remove redundant initialization, emit a single initializer function for large types as the initialization code may get too large (especially if duplicated), -Os vs. -O2, ...).

-- Johannes
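For the statically typed case, a druntime-style template helper might look like the sketch below. `initialize`, `Zero` and `Tagged` are hypothetical names; the sketch assumes plain structs without postblits, custom `opAssign` or const fields. Since the type is a template parameter, the choice between zero-filling and copying `T.init` is made at compile time via `__traits(isZeroInit, T)`, and the function is only emitted for types that actually instantiate it.

```d
import core.stdc.string : memset;

// Hypothetical helper: initialization with the type known at compile time.
void initialize(T)(ref T value)
if (is(T == struct))
{
    static if (__traits(isZeroInit, T))
        memset(&value, 0, T.sizeof);   // no init image needed at all
    else
        value = T.init;                // compiler picks blit vs. field stores
}

struct Zero   { int a; int[4] b; }            // all fields default to 0
struct Tagged { int id = 42; char c = 'x'; }  // non-zero defaults

static assert( __traits(isZeroInit, Zero));
static assert(!__traits(isZeroInit, Tagged));

void main()
{
    auto z = Zero(5, [1, 2, 3, 4]);
    auto t = Tagged(0, 'a');

    initialize(z);
    initialize(t);

    assert(z == Zero.init);
    assert(t.id == 42 && t.c == 'x');
}
```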
Aug 06 2020
prev sibling parent Johan <j j.nl> writes:
On Monday, 3 August 2020 at 14:44:38 UTC, Johan wrote:
 My current solution [*]: 
 https://github.com/weka-io/druntime/blob/0dab4b0dc5cbccb891351095ff09b0558e3fbe06/src/core/internal/lifetime.d#L92-L140

 -Johan

 [*] Hits an obscure mangling bug, so doesn't quite work with 
 Weka's codebase yet
This is the bug: https://issues.dlang.org/show_bug.cgi?id=21120 -Johan
Aug 05 2020
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 3 August 2020 at 08:04:16 UTC, Johan wrote:
 https://github.com/weka-io/druntime/blob/9e5a36b0fcac242c4d160d3d7d0c85565aebe79f/src/core/internal/lifetime.d#L118
I'm not sure that is a virtual call once it is compiled - it ought to be a candidate for devirtualization and/or inlining since the typeinfo class instance is known at compile time. idk if the implementation actually does that though. The druntime method isn't marked final (it probably could be... the compiler puts out different instances of this object, not different subclasses) but still something like ldc's lto ought to be able to figure it out. Regardless, yeah, the pragma(mangle) trick can probably help you here too at least as a hacky solution. Let me know if you figure out the incantation to make ldc accept it, or if I come back to it I'll let you know the solution here too. But for dmd it is easy.
Aug 03 2020
parent Johan <j j.nl> writes:
On Monday, 3 August 2020 at 14:35:27 UTC, Adam D. Ruppe wrote:
 Regardless, yeah, the pragma(mangle) trick can probably help 
 you here too at least as a hacky solution. Let me know if you 
 figure out the incantation to make ldc accept it, or if I come 
 back to it I'll let you know the solution here too. But for dmd 
 it is easy.
For structs, `typeof(T.init)` works, but for classes I also have not figured out how to do it. Because a class variable is always a reference (pointer), you basically need something like `typeof(*Klass)`... -Johan
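A small sketch of the asymmetry, assuming a hypothetical struct `S` and class `Klass`: for a struct, `T.init` is a value of the struct type, whereas for a class it is just a null reference, so the instance layout has to be reached another way. The sketch uses `typeid(Klass).initializer` and `__traits(classInstanceSize, Klass)` and then blits the init image into raw memory by hand, which is roughly the step that happens before a constructor would run.

```d
import core.stdc.string : memcpy;

struct S     { int x = 7; }
class  Klass { int x = 7; }

// For a struct, S.init is a value of type S.
static assert(is(typeof(S.init) == S));
// For a class, Klass.init has the reference type; its value is null.
static assert(is(typeof(Klass.init) == Klass));

void main()
{
    assert(Klass.init is null);

    // The instance image is only reachable through TypeInfo here.
    const(void)[] image = typeid(Klass).initializer();
    assert(image.length == __traits(classInstanceSize, Klass));

    // Blit the image over raw memory; no constructor is called.
    void[] storage = new void[image.length];
    memcpy(storage.ptr, image.ptr, image.length);
    auto k = cast(Klass) storage.ptr;
    assert(k.x == 7);
}
```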
Aug 03 2020