
digitalmars.D - Solutions to the TypeInfo dependency injection issue?

reply Nathan Petrelli <npetrelli klassmaster.com> writes:
I'm referring to the issue raised by the Tango developers about a TypeInfo for
char[][] inflating the .EXE size by pulling in unneeded modules.

The solution given by Walter to this issue was a careful (and painful)
examination of object file symbols to determine the correct link order.

Have other solutions been planned or considered?

I don't think this is a long-term solution for big projects, especially if an
IDE is being used (most of them don't even let you specify the compilation
order of files).

I think it would be possible to build a tool that analyzes object files and
determines the optimal order in most cases, but this seems like a hack on par
with the moc compiler of the Qt project. A hack that's only needed to paper
over a deficiency in the compiler.
Mar 08 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Nathan Petrelli wrote:
 I'm referring to the issue raised by the Tango developers about a TypeInfo
 for char[][] inflating the .EXE size by pulling in unneeded modules.
 
 The solution given by Walter to this issue was a careful (and painful)
 examination of object file symbols to determine the correct link order.
 
 Have other solutions been planned or considered?
 
 I don't think this is a long-term solution for big projects, especially if
 an IDE is being used (most of them don't even let you specify the
 compilation order of files).
 
 I think it would be possible to build a tool that analyzes object files and
 determines the optimal order in most cases, but this seems like a hack on
 par with the moc compiler of the Qt project. A hack that's only needed to
 paper over a deficiency in the compiler.

This situation only crops up when you're passing all the modules at once to dmd and then putting the resulting object files into a library. Try compiling the modules independently when they are intended to be put in a library.
Mar 08 2007
parent reply kris <foo bar.com> writes:
Walter Bright wrote:

 This situation also only crops up when you're passing all the modules at 
 once to dmd, and then putting the resulting object files into a library. 
 Try compiling the modules independently when they are intended to be put 
 in a library.

Could you please be explicit about what the distinctions would be? And how it would affect template generation also? Please assume I am a complete idiot, and lead me through step-by-step:

(a) what the implications are for discrete versus "batch" compilation
(b) how the different compilation approaches lead to differing results
(c) how templates are affected at each step

I'm hoping this will lead to a "comprehensive" set of instructions to help others create useful libs for Win32 with DM tools. The longer and more detailed these instructions are, the better it will be for D
Mar 08 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
kris wrote:
 Walter Bright wrote:
 
 This situation also only crops up when you're passing all the modules 
 at once to dmd, and then putting the resulting object files into a 
 library. Try compiling the modules independently when they are 
 intended to be put in a library.

 Could you please be explicit about what the distinctions would be? And how
 it would affect template generation also? Please assume I am a complete
 idiot, and lead me through step-by-step:
 
 (a) what the implications are for discrete versus "batch" compilation
 (b) how the different compilation approaches lead to differing results
 (c) how templates are affected at each step
 
 I'm hoping this will lead to a "comprehensive" set of instructions to help
 others create useful libs for Win32 with DM tools. The longer and more
 detailed these instructions are, the better it will be for D

When you compile:
	dmd -c a b
then dmd is assuming that a.obj and b.obj will be linked together, so it does not matter which object file something is placed in. In other words, it does not generate things twice.

On the other hand:
	dmd -c a
	dmd -c b
then dmd doesn't know, when compiling a.obj, what will be in b.obj, so it assumes the worst and generates it.

In other words:
	dmd -c a b
	lib foo.lib a.obj b.obj
is not a good way to create a library; instead:
	dmd -c a
	dmd -c b
	lib foo.lib a.obj b.obj
Mar 08 2007
next sibling parent reply kris <foo bar.com> writes:
Walter Bright wrote:
 kris wrote:
 
 Walter Bright wrote:

 This situation also only crops up when you're passing all the modules 
 at once to dmd, and then putting the resulting object files into a 
 library. Try compiling the modules independently when they are 
 intended to be put in a library.

 Could you please be explicit about what the distinctions would be? And how
 it would affect template generation also? Please assume I am a complete
 idiot, and lead me through step-by-step:
 
 (a) what the implications are for discrete versus "batch" compilation
 (b) how the different compilation approaches lead to differing results
 (c) how templates are affected at each step
 
 I'm hoping this will lead to a "comprehensive" set of instructions to help
 others create useful libs for Win32 with DM tools. The longer and more
 detailed these instructions are, the better it will be for D

 When you compile:
 	dmd -c a b
 then dmd is assuming that a.obj and b.obj will be linked together, so it
 does not matter which object file something is placed in. In other words,
 it does not generate things twice.
 
 On the other hand:
 	dmd -c a
 	dmd -c b
 then dmd doesn't know, when compiling a.obj, what will be in b.obj, so it
 assumes the worst and generates it.
 
 In other words:
 	dmd -c a b
 	lib foo.lib a.obj b.obj
 is not a good way to create a library; instead:
 	dmd -c a
 	dmd -c b
 	lib foo.lib a.obj b.obj

What about (c) how templates are affected at each step ?
Mar 08 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
kris wrote:
 What about (c) how templates are affected at each step ?

It's the same algorithm - nothing special about templates.
Mar 08 2007
parent reply kris <foo bar.com> writes:
Walter Bright wrote:
 kris wrote:
 
 What about (c) how templates are affected at each step ?

It's the same algorithm - nothing special about templates.

Is it possible, do you think, to be just a little more forthcoming on this?

1) when you batch-compile code with multiple references to a template, there is just one instance generated.

2) when you compile the same code modules individually, there are presumably multiple template instances generated?

3) how does the linker resolve the multiple template instances to just one?
Mar 08 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
kris wrote:
 Walter Bright wrote:
 kris wrote:

 What about (c) how templates are affected at each step ?

It's the same algorithm - nothing special about templates.

Is it possible, do you think, to be just a little more forthcoming on this?

1) when you batch-compile code with multiple references to a template, there is just one instance generated.

Yes.
 2) when you compile the same code modules individually, there are 
 presumably multiple template instances generated?

Yes.
 3) how does the linker resolve the multiple template instances to just one?

The template instantiations are put into COMDAT sections, and the linker discards redundant ones.
Mar 08 2007
parent reply kris <foo bar.com> writes:
Walter Bright wrote:
 kris wrote:
 
 Walter Bright wrote:

 kris wrote:

 What about (c) how templates are affected at each step ?

It's the same algorithm - nothing special about templates.

Is it possible, do you think, to be just a little more forthcoming on this?

1) when you batch-compile code with multiple references to a template, there is just one instance generated.

Yes.
 2) when you compile the same code modules individually, there are 
 presumably multiple template instances generated?

Yes.
 3) how does the linker resolve the multiple template instances to just 
 one?

The template instantiations are put into COMDAT sections, and the linker discards redundant ones.

Thank you;

4) all symbols required to represent typeinfo and templates are now duplicated in each object file?

5) the linker does not have to search beyond the current object file for instances of #4 (as suggested by larsivi)?

6) the result is a library with many more duplicate symbols than before, but arranged in such a manner that persuades the linker to do the "right thing"?

7) there is no possibility of the linker following a 'bad chain', and thus linking in unused or otherwise redundant code?
Mar 08 2007
parent reply Sean Kelly <sean f4.ca> writes:
kris wrote:
 Walter Bright wrote:
 kris wrote:

 Walter Bright wrote:

 kris wrote:

 What about (c) how templates are affected at each step ?

It's the same algorithm - nothing special about templates.

Is it possible, do you think, to be just a little more forthcoming on this?

1) when you batch-compile code with multiple references to a template, there is just one instance generated.

Yes.
 2) when you compile the same code modules individually, there are 
 presumably multiple template instances generated?

Yes.
 3) how does the linker resolve the multiple template instances to 
 just one?

The template instantiations are put into COMDAT sections, and the linker discards redundant ones.

 Thank you;
 
 4) all symbols required to represent typeinfo and templates are now
 duplicated in each object file?

My guess is separate compilation generates all TypeInfo and templates used by that module into the module's object file. Which I believe is a "yes."
 5) The linker does not have to search beyond the current object file for 
 instances of #4 (as suggested by larsivi) ?

Correct.
 6) the result is a library with many more duplicate symbols than before, 
 but arranged in such a manner that persuades the linker to do the "right 
 thing" ?

Yes.
 7) there is no possibility of the linker following a 'bad chain', and 
 thus linking in unused or otherwise redundant code ?

It certainly seems that way. We get larger object files and libraries in exchange for smaller executables.

If any of the above is wrong, someone please correct me.


Sean
Mar 11 2007
parent Pragma <ericanderton yahoo.removeme.com> writes:
Sean Kelly wrote:
 kris wrote:
 6) the result is a library with many more duplicate symbols than 
 before, but arranged in such a manner that persuades the linker to do 
 the "right thing" ?

Yes.
 7) there is no possibility of the linker following a 'bad chain', and 
 thus linking in unused or otherwise redundant code ?

It certainly seems that way. We get larger object files and libraries in exchange for smaller executables. If any of the above is wrong, someone please correct me.

That agrees with my experience, although I'm not sure about the "smaller executables" part. I think the reason why we sometimes get larger executables is more incidental than deliberate, so it doesn't always work out that way. But if we opt for larger object files, then yes, we *always* get the smallest executable size as a result.

A nice thing to add to DMD for all this would be to emit "fat .obj files" when -c is supplied, no matter how many .d files are passed on the command line. That way, the optimizations Walter has added (non-duplication of templates and typeinfo) would still be useful for direct-to-link situations (w/o -c).

--
- EricAnderton at yahoo
Mar 12 2007
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 08 Mar 2007 12:36:24 -0800, Walter Bright wrote:

 Walter Bright wrote:
 
 This situation also only crops up when you're passing all the modules 
 at once to dmd, and then putting the resulting object files into a 
 library. Try compiling the modules independently when they are 
 intended to be put in a library.



 When you compile:
 	dmd -c a b
 then dmd is assuming that a.obj and b.obj will be linked together, so it 
 does not matter which object file something is placed in. In other 
 words, it does not generate things twice.
 
 On the other hand:
 	dmd -c a
 	dmd -c b
 then dmd doesn't know, when compiling a.obj what will be in b.obj, so it 
 assumes the worst and generates it.
 
 In other words:
 	dmd -c a b
 	lib foo.lib a.obj b.obj
 is not a good way to create a library, instead:
 	dmd -c a
 	dmd -c b
 	lib foo.lib a.obj b.obj

One of the things that greatly impressed me was DMD's ability to quickly compile multiple files in one pass, rather than the make-like process of doing one file per DMD run. So when I came to write Bud, I made a lot of effort to ensure that I could compile as many files as possible in one call to the compiler.

It now seems that you are warning us against this feature of DMD in the case of creating libraries. This is extremely disappointing.

I will add a new switch to Bud to force file-by-file compilation.

--
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
Mar 08 2007
parent kris <foo bar.com> writes:
Derek Parnell wrote:
 One of the things that greatly impressed me was DMD's ability to quickly
 compile multiple files in one pass, rather than the make-like process on
 doing one file per DMD run. So when I came to write Bud, I made a lot of
 effort to ensure that I could compile as many as possible files in one call
 to the compiler. 
 
 It now seems that you are warning us against this feature of DMD, in the
 case of creating libraries. This is extremely disappointing.
 
 I will add a new switch to Bud to force file-by-file compilation.

I don't see that as a hardship when building libs, since it perhaps doesn't happen as often as "regular" builds (assuming, of course, that this strategy actually resolves the underlying issue)?

Having said that, the new switch will be *greatly* appreciated. Means we can avoid having to create and maintain the damn make-files. Thanks, Derek!
Mar 08 2007