digitalmars.D - Why doesn't DMD create any redundant symbols?

Gregor Richards <Richards codu.org> writes:
This is a problem that comes up for me again and again in making DSSS 
work everywhere. When DMD is used to compile several modules with 
-c, it never creates any redundant data, and as far as I can tell it also 
doesn't mark any data that could be redundant as common. This means 
that DSSS has to build one file at a time with DMD. This makes certain 
obnoxious people complain about DSSS being slow, because it takes an 
incredible ten seconds to compile a fairly large library. When I 
switched it to compiling multiple files simultaneously, it took <1 
second, but the result was wrong for reasons described below.

When DMD comes across typeinfo (for example), it puts the typeinfo 
symbol into only one of the .o files it is generating, even if the 
symbol is used in several. On the surface, this seems like a good idea, 
but in reality it causes a whole slew of problems with bogus intermodule 
dependencies. With this, foo.io.output could arbitrarily depend on 
foo.net.ipvsix.udp, because some piece of typeinfo was put there.
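
A toy model may make the bogus dependency concrete. The sketch below is an illustration only, not DMD's actual algorithm: it assumes that in a batch compile the first module on the command line that references a typeinfo symbol receives its only definition. The function names and the symbol name `TypeInfo_Aa` are placeholders chosen for this example.

```python
# Toy model (an assumption, not DMD's real logic): in a batch compile,
# each shared symbol is defined in exactly one of the object files.

def emit_once(uses):
    """uses: dict mapping module name -> set of symbols it references.
    Returns dict mapping module name -> set of symbols its object defines.
    The first module (in dict order) that references a symbol owns it."""
    defined = {module: set() for module in uses}
    seen = set()
    for module, symbols in uses.items():
        for sym in sorted(symbols):
            if sym not in seen:
                defined[module].add(sym)
                seen.add(sym)
    return defined


def object_deps(uses, defined):
    """For each object, the other objects it needs at link time."""
    owner = {sym: mod for mod, syms in defined.items() for sym in syms}
    return {mod: {owner[s] for s in syms if owner[s] != mod}
            for mod, syms in uses.items()}


# Two unrelated modules that happen to use the same typeinfo.
uses = {
    "foo.net.ipvsix.udp": {"TypeInfo_Aa"},
    "foo.io.output": {"TypeInfo_Aa"},
}
defined = emit_once(uses)
deps = object_deps(uses, defined)
print(deps["foo.io.output"])  # {'foo.net.ipvsix.udp'}
```

Neither module imports the other in this model, yet the object for foo.io.output cannot link without the object for foo.net.ipvsix.udp.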

First, libraries. I don't know precisely how .lib files work on Windows, 
but linking .a files will pick-and-choose only those .o files that are 
used. With these bogus inter-module dependencies, it will often be 
forced to drag in the whole library, even though only a small chunk of 
it is actually necessary. This just causes big binaries, except when 
libraries have conditional dependencies - if module foo.a depends on 
another library, but module foo.b does not, it is now unpredictable 
which libraries are necessary. Oof.
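
The pick-and-choose behaviour of archive linking can be sketched with a small simulation. This is a simplified model, not a real linker: members are pulled from the archive only when they define a symbol that is still undefined. The member names, the symbol names, and the external symbol `ext_socket_send` are all hypothetical.

```python
# Simplified model of linking against a static archive (an illustration,
# not a real linker): pull a member only if it defines a needed symbol.

def pull_members(undefined, archive):
    """archive: dict member -> {"defs": set, "refs": set}.
    Returns (members pulled in, symbols no member could satisfy)."""
    pulled, defined, unresolved = set(), set(), set()
    pending = set(undefined)
    while pending:
        sym = pending.pop()
        if sym in defined:
            continue
        member = next((name for name, obj in archive.items()
                       if sym in obj["defs"]), None)
        if member is None:
            unresolved.add(sym)  # would have to come from another library
            continue
        pulled.add(member)
        defined |= archive[member]["defs"]
        pending |= archive[member]["refs"] - defined
    return pulled, unresolved


# Layout where the typeinfo was placed only in udp.o.
emit_once_layout = {
    "output.o": {"defs": {"foo_io_write"}, "refs": {"TypeInfo_Aa"}},
    "udp.o": {"defs": {"foo_udp_send", "TypeInfo_Aa"},
              "refs": {"ext_socket_send"}},
}
# Layout where every object carries its own copy of the typeinfo.
redundant_layout = {
    "output.o": {"defs": {"foo_io_write", "TypeInfo_Aa"}, "refs": set()},
    "udp.o": {"defs": {"foo_udp_send", "TypeInfo_Aa"},
              "refs": {"ext_socket_send"}},
}

program_needs = {"foo_io_write"}
pulled_once, unresolved_once = pull_members(program_needs, emit_once_layout)
pulled_red, unresolved_red = pull_members(program_needs, redundant_layout)
print(sorted(pulled_once), sorted(unresolved_once))
# ['output.o', 'udp.o'] ['ext_socket_send']
print(sorted(pulled_red), sorted(unresolved_red))
# ['output.o'] []
```

In the first layout a program that only writes output ends up pulling in the UDP object and, with it, a dependency on an external networking library. In the second layout it needs neither.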

Second, incremental compilation. This is one I didn't realize was a 
problem until recently. DSSS will perform incremental compilation when 
only one file has changed by only compiling that file. However, that 
causes more issues with these common data problems. Now, typeinfo could 
be doubly defined but not marked common, or (by means I don't quite 
understand) not defined at all. So, I now have to compile one file at a 
time, even when building binaries.
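
The two failure modes can be reproduced in a toy model. The sketch assumes a batch compile defines each shared symbol in one object only, while a file compiled alone defines everything it uses. The post does not explain how a symbol ends up undefined, so the second case is just one plausible route. All names are placeholders.

```python
from collections import Counter

# Toy model (an assumption, not DMD's real logic): a batch compile puts
# each shared symbol in exactly one object; a file compiled alone
# defines every symbol it uses.

def emit_once(uses):
    """First module (in dict order) that references a symbol defines it."""
    defined = {module: set() for module in uses}
    seen = set()
    for module, symbols in uses.items():
        for sym in sorted(symbols):
            if sym not in seen:
                defined[module].add(sym)
                seen.add(sym)
    return defined


def link_check(uses, objects):
    """Returns (symbols defined more than once, symbols defined nowhere).
    Definitions are treated as ordinary, not common, so duplicates clash."""
    counts = Counter(s for syms in objects.values() for s in syms)
    duplicates = {s for s, n in counts.items() if n > 1}
    needed = set().union(*uses.values())
    missing = needed - set(counts)
    return duplicates, missing


uses = {"a": {"TypeInfo_Aa"}, "b": {"TypeInfo_Aa"}}
objs = emit_once(uses)  # a.o owns the symbol, b.o only references it

# Case 1: only b changed. Compiled alone, b.o now carries its own copy,
# which clashes with the copy still sitting in the old a.o.
case1 = dict(objs, b=set(uses["b"]))
print(link_check(uses, case1))   # ({'TypeInfo_Aa'}, set())

# Case 2: a was edited and no longer uses the symbol. The new a.o drops
# the definition, and the stale b.o never had one.
uses_after_edit = {"a": set(), "b": {"TypeInfo_Aa"}}
case2 = dict(objs, a=set())
print(link_check(uses_after_edit, case2))  # (set(), {'TypeInfo_Aa'})
```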

The solution to all of this is simple: Create redundant symbols in the 
object files, marked as common. I know this can be done because it's 
done properly with one file at a time. This increases the size of the 
object files, but since it reduces bogus intermodule dependencies and 
sections marked as common will be merged anyway, it actually reduces the 
size of produced binaries, as well as making linking a significantly 
less complex problem.
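
As a rough sketch of the proposal, assume every object defines each symbol it uses and marks it common, and that the linker keeps a single copy per name. This is a simplified model of common-symbol merging, and the module and symbol names are placeholders.

```python
# Simplified model of the proposal: redundant definitions marked common,
# merged by name at link time.

def emit_redundant(uses):
    """Every object defines (as common) each symbol it references."""
    return {module: set(symbols) for module, symbols in uses.items()}


def merge_common(objects):
    """Linker model: common definitions with the same name collapse."""
    merged = set()
    for symbols in objects.values():
        merged |= symbols
    return merged


uses = {
    "a": {"TypeInfo_Aa", "TypeInfo_i"},
    "b": {"TypeInfo_Aa"},
    "c": {"TypeInfo_Aa", "TypeInfo_i"},
}
objs = emit_redundant(uses)

copies_in_objects = sum(len(symbols) for symbols in objs.values())
copies_in_binary = len(merge_common(objs))
self_contained = all(uses[m] <= objs[m] for m in uses)

print(copies_in_objects)  # 5
print(copies_in_binary)   # 2
print(self_contained)     # True
```

The object files grow (five definitions instead of two), but the linked result still holds one copy of each, and no object depends on another merely to find its typeinfo.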

I have to assume there's a reason for this, so, to summarize: Why 
doesn't DMD create any redundant symbols in .o files?

  - Gregor Richards
Aug 28 2007
Walter Bright <newshound1 digitalmars.com> writes:
Gregor Richards wrote:
> I have to assume there's a reason for this, so, to summarize: Why
> doesn't DMD create any redundant symbols in .o files?

It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated. The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
Aug 28 2007
Gregor Richards <Richards codu.org> writes:
Walter Bright wrote:
> Gregor Richards wrote:
>> I have to assume there's a reason for this, so, to summarize: Why
>> doesn't DMD create any redundant symbols in .o files?
>
> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated. The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?

OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.

  - Gregor Richards
Aug 28 2007
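
The trade-off in the post above can be written as a simple inequality. The sketch uses the figures from the opening post (about ten seconds one file at a time, under one second for a batch build, taken here as an upper bound of 1.0). The thread gives no timing for a batch build with redundant data, so the values tried for it are hypothetical.

```python
# Worked arithmetic for the trade-off. The 10 s and 1 s figures come from
# the opening post; the batch-with-redundancy times are hypothetical.

def redundant_option_wins(t_one_at_a_time, t_batch, t_batch_redundant):
    """True if the time lost by building one file at a time exceeds the
    time redundant output would add to a batch build."""
    penalty_today = t_one_at_a_time - t_batch_redundant
    cost_of_redundancy = t_batch_redundant - t_batch
    return penalty_today > cost_of_redundancy


def break_even(t_one_at_a_time, t_batch):
    """Batch-with-redundancy time at which both differences are equal."""
    return (t_one_at_a_time + t_batch) / 2


print(break_even(10.0, 1.0))                  # 5.5
print(redundant_option_wins(10.0, 1.0, 2.0))  # True
print(redundant_option_wins(10.0, 1.0, 6.0))  # False
```

With those figures, the claim holds as long as a batch build with redundant data finishes in under 5.5 seconds.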
Walter Bright <newshound1 digitalmars.com> writes:
Gregor Richards wrote:
> Walter Bright wrote:
>> Gregor Richards wrote:
>>> I have to assume there's a reason for this, so, to summarize: Why
>>> doesn't DMD create any redundant symbols in .o files?
>>
>> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated. The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
>
> OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.

It's a good idea, but it would be a fair bit of work the way dmd is designed.
Aug 28 2007
Frank Benoit <keinfarbton googlemail.com> writes:
Walter Bright wrote:
> Gregor Richards wrote:
>> Walter Bright wrote:
>>> Gregor Richards wrote:
>>>> I have to assume there's a reason for this, so, to summarize: Why
>>>> doesn't DMD create any redundant symbols in .o files?
>>>
>>> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated. The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
>>
>> OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.
>
> It's a good idea, but it would be a fair bit of work the way dmd is designed.

DSSS now has oneatatime=on as the default, to avoid these problems. But the compile time is no longer acceptable. Several people complained that they canceled the compilation of DWT after 15 minutes. With oneatatime=off the same build took <15 sec. See also http://d.puremagic.com/issues/show_bug.cgi?id=1838
Feb 14 2008
Christopher Wright <dhasenan gmail.com> writes:
Frank Benoit wrote:
> Walter Bright wrote:
>> Gregor Richards wrote:
>>> Walter Bright wrote:
>>>> Gregor Richards wrote:
>>>>> I have to assume there's a reason for this, so, to summarize: Why
>>>>> doesn't DMD create any redundant symbols in .o files?
>>>>
>>>> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated. The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
>>>
>>> OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.
>>
>> It's a good idea, but it would be a fair bit of work the way dmd is designed.
>
> DSSS now has oneatatime=on as the default, to avoid these problems. But the compile time is no longer acceptable. Several people complained that they canceled the compilation of DWT after 15 minutes. With oneatatime=off the same build took <15 sec. See also http://d.puremagic.com/issues/show_bug.cgi?id=1838

The problem being that, without those possibly redundant symbols, you get stuff dying at link time because DMD never bothered to include the symbol anywhere? Performance is secondary to correctness.
Feb 14 2008
Derek Parnell <derek psych.ward> writes:
On Tue, 28 Aug 2007 10:36:03 -0700, Walter Bright wrote:

> Gregor Richards wrote:
>> I have to assume there's a reason for this, so, to summarize: Why
>> doesn't DMD create any redundant symbols in .o files?
>
> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated. The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?

However, this assumption is not a valid one. There are valid reasons to compile a set of files (all named on the one command line) that are not necessarily going to be linked together. Also, tools such as make, rebuild and bud can determine which subset of a set of files has changed and then recompile only that subset. I have found that doing this sometimes causes conflicting definitions between the newly compiled object files of the subset and the previously compiled object files of the rest of the set.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Feb 16 2008
Alexander Panek <alexander.panek brainsware.org> writes:
Walter Bright wrote:
> Gregor Richards wrote:
>> I have to assume there's a reason for this, so, to summarize: Why
>> doesn't DMD create any redundant symbols in .o files?
>
> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.

Pardon my ignorance, but, who cares? DMD is fast enough that such a time penalty for doing something correctly is "excusable" (in other words: needed). Requiring [forcing] the developers of build tools to work around that problem seems kinda weird, to me.
Feb 16 2008