digitalmars.D - Incremental compilation with DMD

Tom S (73/73) Sep 11 2009 Short story: DMD probably needs an option to output template instances

Ary Borenszweig (6/8) Sep 11 2009 Hi Tom,
Robert Jacques (4/73) Sep 11 2009 On the other hand, one-at-a-time builds can be done in parallel if you
Walter Bright (3/6) Sep 11 2009 Try compiling with -lib, which will put each template instance into its

Tom S (56/63) Sep 12 2009 Thanks for the suggestion. Unfortunately it's a no-go since -lib seems

Tom S (8/22) Sep 12 2009 To clarify, this is not the only issue with -lib. The libs would either
Walter Bright (6/9) Sep 12 2009 Sure, but -multiobj and -lib generate exactly the same object files,

Tom S (17/28) Sep 12 2009 You're right, I'm sorry. I must've overlooked something in the lib dumps...

Walter Bright (18/44) Sep 12 2009 All the .lib file is, is:

Tom S (12/14) Sep 12 2009 I'm not sure what you mean by "the -lib approach". Just how do you

Walter Bright (2/13) Sep 12 2009 You only have to build one source file with -lib, not all of them.

Tom S (37/51) Sep 13 2009 So you mean compiling each file separately? That's only an option if we

Walter Bright (6/64) Sep 13 2009 What you can try is creating a database that is basically a lib (call it...

Tom S (6/10) Sep 13 2009 That's what I'm getting at :)

Walter Bright (4/12) Sep 13 2009 With this approach, you could wind up with some 'dead' obj files in

Don (4/17) Sep 13 2009 I'm feeling horribly guilty for having asked for module-level static

Tom S (16/34) Sep 13 2009 No need to feel guilty. This problem actually manifests itself in many

Tom S (82/86) Sep 15 2009 OK, there we go: http://h3.team0xf.com/increBuild2.7z // I hope it's...

Walter Bright (6/13) Sep 17 2009 If you are compiling files with -lib, and nobody calls those CTFE

Tom S (9/23) Sep 17 2009 It could be debug info, because with -g something definitely is linked

Walter Bright (12/32) Sep 17 2009 The linker doesn't pull in obj modules based on symbolic debug info. You...

Tom S (15/50) Sep 17 2009 I tested it on a single-module program before posting. Basically void

Walter Bright (10/15) Sep 18 2009 The best way to determine what is linked in to an executable is to

Tom S (31/49) Sep 18 2009 Tests seem to indicate otherwise. By the way, the linker in gcc can also...

Walter Bright (3/17) Sep 17 2009 Please post to bugzilla.

Tom S (7/27) Sep 17 2009 http://d.puremagic.com/issues/show_bug.cgi?id=3328

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Short story: DMD probably needs an option to output template instances 
to all object files that need them.

Long story:

I've been trying to make incremental compilation in xfBuild reliable, 
but it turns out that it's really tricky with DMD. Consider the 
following example:

* module A instantiates template T from module C
* module B instantiates the same template T from module C (with the same 
arguments)
* compile all modules at the same time in the order: A, B, C
* now A.obj contains the instantiation of T
* remove the instantiation from the A module
* perform an incremental compilation - 'A' was changed, so only it has 
to be recompiled
* linking of A.obj, B.obj and C.obj fails because no module has the 
instantiation of T for B.obj
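To make the failure concrete, here is a minimal Python model of it. As described above, it assumes that DMD emits each template instance only into the object file of the first module in the compile order that needs it. `compile_modules`, `unresolved` and the symbol strings are hypothetical names chosen for illustration, not anything in DMD or xfBuild. The `emit_all` flag models the proposed option of emitting instances into every object that uses them.

```python
def compile_modules(order, uses, emit_all=False):
    """Model one DMD invocation over `order` (modules in compile order).

    uses: {module: set of template instances the module needs}
    Returns {object file: (defined instances, undefined references)}.
    """
    emitted = set()  # instances already placed in an object during this run
    objs = {}
    for mod in order:
        defined, undefined = set(), set()
        for inst in uses[mod]:
            if inst in emitted and not emit_all:
                undefined.add(inst)  # rely on the earlier module's object
            else:
                defined.add(inst)    # first user (or every user) gets a copy
                emitted.add(inst)
        objs[mod + ".obj"] = (defined, undefined)
    return objs


def unresolved(objs):
    """Symbols referenced by some object but defined by none."""
    defined = set().union(*(d for d, _ in objs.values()))
    needed = set().union(*(u for _, u in objs.values()))
    return needed - defined


# Full build in the order A, B, C: only A.obj gets the instance.
uses = {"A": {"T!int"}, "B": {"T!int"}, "C": set()}
objs = compile_modules(["A", "B", "C"], uses)

# Edit A so it no longer instantiates T, then recompile only A.
edited = dict(uses, A=set())
objs.update(compile_modules(["A"], edited))
print(unresolved(objs))  # {'T!int'}: B.obj's reference has no definition
```

With `emit_all=True`, B.obj carries its own copy of the instance, so recompiling A alone no longer leaves B.obj with a dangling reference.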

What happens is that the optimization in DMD to only emit templates to 
the first module that needs it creates implicit inter-module 
dependencies. I've tried tracking them by modifying DMD, but I still 
couldn't find them all - it seems that one would have to dig deep into 
the codegen, as my attempts at hacking the frontend (mostly template.c) 
weren't enough.

Still, I managed to figure out some of these implicit dependencies 
and tried using this extra info in xfBuild when deciding what to
compile incrementally. I've tossed it on a project of mine with > 350 
modules and no circular imports. The result was that even a trivial 
change caused most of the project to be pulled into compilation.

When doing regular incremental compilation, all modules that import the 
modified ones must be recompiled as well. And all modules that import 
these, and so on, up to the root of the project. This is because the 
incremental build tool must assume that the modules that import module 
'A' could have code of the form 'static if (A.something) { ... } else { 
... }' or another form of it. As far as I know, it's not trivial to 
detect whether this is really the case or whether the change is isolated 
to 'A'.
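This conservative rule amounts to taking the transitive closure of the reverse import graph. Here is a minimal Python sketch, assuming the import graph is already available as a plain dict. `modules_to_rebuild` is a hypothetical helper.

```python
from collections import deque


def modules_to_rebuild(imports, changed):
    """Changed modules plus everything that imports them, transitively.

    imports: {module: set of modules it imports directly}
    changed: iterable of modified module names
    """
    # Invert the graph: module -> modules that import it.
    importers = {}
    for mod, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(mod)

    result = set(changed)
    todo = deque(result)
    while todo:
        mod = todo.popleft()
        for user in importers.get(mod, ()):
            if user not in result:  # visit each module once
                result.add(user)
                todo.append(user)
    return result


imports = {
    "main": {"gui", "util"},
    "gui": {"util"},
    "util": set(),
    "net": set(),
}
print(sorted(modules_to_rebuild(imports, ["util"])))  # ['gui', 'main', 'util']
```

Touching a low-level module such as `util` pulls in everything above it. This is why the rule alone already rebuilds much of a project.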

When trying to cope with the implicit dependencies caused by template 
instantiations and references, one also has to recompile all modules 
that contain template references to a module/object file which gets the 
instance. In the first example, it would mean recompiling module 'B' 
whenever 'A' changes. The graph of dependencies here doesn't depend very 
much on the structure of imports in a project, but rather on the order 
in which DMD decides to run semantic() on template instances.

Add up these two conservative mechanisms and it turns out that tweaking 
a simple function causes half of your project to be rebuilt. This is not 
acceptable. And even if it were, getting these implicit dependencies is 
probably a matter of either hacking the backend or dumping object files 
and matching unresolved symbols with comdats. Neither would be very fast 
or portable.

Compiling modules one-at-a-time is not a solution because it's too slow.

Thus my suggestion of adding an option to DMD so it may emit template 
instances to all object files that use them. If anyone has alternative 
ideas, I'd be glad to hear them, because I'm running out of options. The 
approach I'm currently using in an experimental version of xfBuild is:

* get a fixed order of modules to be compiled determined by the order 
DMD calls semantic() on them with the root modules at the end
* when a module is modified, additionally recompile all modules that 
occur after it in the list
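Here is a sketch of that rule in Python. It assumes `order` is the list of modules in the order DMD ran semantic() on them, with root modules last. Recompiling everything after each modified module is a union of suffixes, so it reduces to the suffix starting at the earliest modified module. The helper name is hypothetical, and modules missing from the list (e.g. newly added ones) are simply ignored here.

```python
def rebuild_suffix(order, modified):
    """Modules to recompile under the fixed-order rule.

    order:    modules in the order DMD ran semantic() on them, roots last
    modified: set of changed modules
    """
    positions = [i for i, mod in enumerate(order) if mod in modified]
    if not positions:
        return []
    # Recompiling everything after *each* modified module is the same as
    # recompiling everything from the earliest one onwards.
    return order[min(positions):]


order = ["util", "gui", "net", "main"]
print(rebuild_suffix(order, {"gui"}))  # ['gui', 'net', 'main']
```

Note that `net` is recompiled even though nothing it depends on changed. This is the "way too many modules" cost mentioned below.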

This quite obviously ends up compiling way too many modules, but seems 
to work reliably (except when OPTLINK decides to crash) without 
requiring full rebuilds all the time. Still, I fear there might be 
corner cases where it will fail as well. DMD sometimes places 
initializers in weird places, e.g.:

.objs\xf-nucleus-model-ILinkedKernel.obj(xf-nucleus-model-ILinkedKernel)
  Error 42: Symbol Undefined 
_D61TypeInfo_S2xf7nucleus9particles13BasicParticle13BasicParticle6__initZ

The two modules (xf.nucleus.model.ILinkedKernel and 
xf.nucleus.particles.BasicParticle) are unrelated. This error occurred 
once, somewhere deep into an automated attempt to break the experimental 
xfBuild by touching random modules and performing incremental builds.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 11 2009

Ary Borenszweig <ary esperanto.org.ar> writes:

Tom S wrote:
 Short story: DMD probably needs an option to output template instances 
 to all object files that need them.

Hi Tom,

What you describe here is very interesting and useful. I think of adding 
an incremental builder to Descent in some point in the future and I'll 
probably encounter the same problem.

So I vote++ for emitting template instances in every obj that uses them.

Sep 11 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Fri, 11 Sep 2009 07:47:11 -0400, Tom S  
<h3r3tic remove.mat.uni.torun.pl> wrote:
 Short story: DMD probably needs an option to output template instances  
 to all object files that need them.

 Compiling modules one-at-a-time is not a solution because it's too slow.

 Thus my suggestion of adding an option to DMD so it may emit template  
 instances to all object files that use them. If anyone has alternative  
 ideas, I'd be glad to hear them, because I'm running out of options. The  
 approach I'm currently using in an experimental version of xfBuild is:

 * get a fixed order of modules to be compiled determined by the order  
 DMD calls semantic() on them with the root modules at the end
 * when a module is modified, additionally recompile all modules that  
 occur after it in the list

 This quite obviously ends up compiling way too many modules, but seems  
 to work reliably (except when OPTLINK decides to crash) without  
 requiring full rebuilds all the time. Still, I fear there might be  
 corner cases where it will fail as well. DMD sometimes places  
 initializers in weird places, e.g.:

 .objs\xf-nucleus-model-ILinkedKernel.obj(xf-nucleus-model-ILinkedKernel)
   Error 42: Symbol Undefined  
 _D61TypeInfo_S2xf7nucleus9particles13BasicParticle13BasicParticle6__initZ

 The two modules (xf.nucleus.model.ILinkedKernel and  
 xf.nucleus.particles.BasicParticle) are unrelated. This error occured  
 once, somewhere deep into an automated attempt to break the experimental  
 xfBuild by touching random modules and performing incremental builds.

On the other hand, one-at-a-time builds can be done in parallel if you  
have multi-cores. Of course, still not a net win on my system, so vote++

Sep 11 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Thus my suggestion of adding an option to DMD so it may emit template 
 instances to all object files that use them. If anyone has alternative 
 ideas, I'd be glad to hear them, because I'm running out of options.

Try compiling with -lib, which will put each template instance into its 
own obj file.

Sep 11 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Tom S wrote:
 Thus my suggestion of adding an option to DMD so it may emit template 
 instances to all object files that use them. If anyone has alternative 
 ideas, I'd be glad to hear them, because I'm running out of options.

 
 Try compiling with -lib, which will put each template instance into its 
 own obj file.

Thanks for the suggestion. Unfortunately it's a no-go since -lib seems 
to have the same issue that compiling without -op does - if you have 
multiple modules with the same name (but different packages), one will 
overwrite the other in the lib. On the other hand, I was able to hack 
DMD a bit and use -multiobj since your suggestion gave me an idea :)

Basically, the approach would be to compile the project with -multiobj 
and move the generated objects to a local (per project) directory, 
renaming them so no conflicts arise.

The next step is to determine all public and comdat symbols in all of 
these object files - this might be done via a specialized program, 
however I've used Burton Radons' exelib to optimally run libunres.exe 
from DMC. The exports are saved to some sort of a database (a dumb 
structured file is ok).

The above is done on the initial build, so the next time we have 
some object files in a directory and a map of all their exported 
symbols. In an incremental step, we'll compile the modified modules, but 
won't move their object files immediately over to the special directory. 
We'll instead scan their public and comdat symbols and figure out which 
object files they replace from our already compiled set. For each symbol 
in the newly compiled objects, find which object in the original set 
defined it, then mark it. Add all marked files to a library (I call it 
junk.lib), then remove the source objects. Finally, move the newly 
compiled objects to the special object directory.
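The marking step might look like this in Python. It assumes each object file's public and comdat symbols have already been extracted (e.g. via libunres) into sets. `plan_incremental_step` is a hypothetical name. An old object is marked if it shares any symbol with the new ones, or if a new object reuses its file name.

```python
def plan_incremental_step(cache, new_objs):
    """Split the cached objects into those kept and those obsoleted.

    cache:    {object name: set of public/comdat symbols} from earlier builds
    new_objs: {object name: set of public/comdat symbols} just compiled
    Returns (objects for the special directory, objects for junk.lib).
    """
    regenerated = set().union(*new_objs.values())
    # Mark every old object that defined any re-generated symbol, plus any
    # old object whose file name a new object is about to take over.
    junk = {name: syms for name, syms in cache.items()
            if name in new_objs or syms & regenerated}
    kept = {name: syms for name, syms in cache.items() if name not in junk}
    kept.update(new_objs)
    return kept, junk


cache = {
    "a.obj": {"A.foo", "T!int"},  # A used to carry the T!int instance
    "b.obj": {"B.bar"},
    "c.obj": {"C.baz"},
}
kept, junk = plan_incremental_step(cache, {"a.1.obj": {"A.foo"}})
print(sorted(kept), sorted(junk))  # ['a.1.obj', 'b.obj', 'c.obj'] ['a.obj']
```

The old `a.obj` goes to junk.lib rather than being deleted, because it still holds `T!int`, which the recompiled A no longer emits.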

The junk.lib will be used if the newly compiled object files missed any 
shared symbols that were in the old objects and that would have been 
generated had more modules been passed to the compiler. In other words, it contains
symbols that the naive incremental compilation will lose.

When linking, all object files from the directory are passed explicitly 
to the compiler, so symbols are pulled eagerly from them; junk.lib, 
however, will only be queried if a symbol cannot be found in the set of 
objects in the special directory.
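Here is a toy Python model of that lookup order, assuming a simplified linker. Explicitly passed objects are always linked. A library member is pulled in only when it defines a symbol that is still undefined, and libraries are consulted in the order given. Real linkers (OPTLINK, GNU ld) differ in details such as how often they rescan libraries. The same on-demand rule is why a library member that contains only a static constructor never gets linked. All names here are hypothetical.

```python
def link(objects, libraries):
    """Toy linker: eager objects plus on-demand library members.

    objects:   {name: (defined symbols, undefined symbols)}, always linked
    libraries: list of {member: (defined, undefined)}, searched in order
    Returns (pulled members as (library index, name), missing symbols).
    """
    defined, needed = set(), set()
    for d, u in objects.values():
        defined |= d
        needed |= u
    pending = sorted(needed - defined)
    pulled, missing = [], set()
    while pending:
        sym = pending.pop()
        if sym in defined:
            continue  # satisfied by something pulled in meanwhile
        for i, lib in enumerate(libraries):
            member = next((m for m, (d, _) in lib.items() if sym in d), None)
            if member is not None:
                d, u = lib[member]
                pulled.append((i, member))
                defined |= d
                pending.extend(sorted(u - defined))
                break
        else:
            missing.add(sym)  # no library can provide it
    return pulled, missing


objects = {
    "a.1.obj": ({"A.foo"}, set()),    # recompiled A, no T!int any more
    "b.obj": ({"B.bar"}, {"T!int"}),  # still references T!int
}
junk_lib = {"a.obj": ({"A.old_helper", "T!int"}, set())}
print(link(objects, [junk_lib]))  # ([(0, 'a.obj')], set())
```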

I've put up a proof-of-concept implementation at 
http://h3.team0xf.com/increBuild.7z . It requires a slightly patched DMD 
(well, hacked actually), so it prints out the names of all objects it 
generates. Basically, uncomment the `printf("writing '%s'\n", fname);` 
in glue.c at line 133 and add `printf("writing '%s'\n", 
m->objfile->name->str);` after `m->genobjfile(global.params.multiobj);` 
in mars.c. I'm compiling the build tool with a recent (SVN-ish) version 
of Tango and DMD 1.047.

As for my own impressions of this idea, its biggest drawback probably is 
that the multitude of object files created via -multiobj strains the 
filesystem. Even when running on a ramdrive, my WinXP-based system took 
a good fraction of a second to move a few hundred object files to their 
destination directory. This can probably be improved on, as -multiobj 
seems to produce some empty object files (at least according to libunres 
and ddlinfo). It might also be possible to use specialized storage for 
object files by patching up dmd and hooking OPTLINK's calls to 
CreateFile. I'm not sure about Linux, but perhaps something based on 
FUSE might work. These last options are probably long shots, so I'm 
still quite curious how DMD might perform with outputting template 
instantiations into each object file that uses them.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 12 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Thus my suggestion of adding an option to DMD so it may emit template 
 instances to all object files that use them. If anyone has 
 alternative ideas, I'd be glad to hear them, because I'm running out 
 of options.

 Try compiling with -lib, which will put each template instance into 
 its own obj file.

 
 Thanks for the suggestion. Unfortunately it's a no-go since -lib seems 
 to have the same issue that compiling without -op does - if you have 
 multiple modules with the same name (but different packages), one will 
 overwrite the other in the lib.

To clarify, this is not the only issue with -lib. The libs would either 
have to be expanded into objects or static ctors would not run. And why 
extract them if -multiobj already generates them extracted?


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 12 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 As for my own impressions of this idea, its biggest drawback probably is 
 that the multitude of object files created via -multiobj strains the 
 filesystem.

Sure, but -multiobj and -lib generate exactly the same object files, 
it's just that -lib puts them all into a library so it doesn't strain 
the file system.

Extracting the obj files from the lib is pretty simple, you can see the 
libomf.c for the format.

Sep 12 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Tom S wrote:
 As for my own impressions of this idea, its biggest drawback probably 
 is that the multitude of object files created via -multiobj strains 
 the filesystem.

 
 Sure, but -multiobj and -lib generate exactly the same object files, 
 it's just that -lib puts them all into a library so it doesn't strain 
 the file system.
 
 Extracting the obj files from the lib is pretty simple, you can see the 
 libomf.c for the format.

You're right, I'm sorry. I must've overlooked something in the lib dumps 
and assumed one module overwrites the other.

So with -lib, it should be possible to only extract the object files 
that contain static constructors and the main function and keep the rest 
packed up. Does that sound about right?

By the way, using -lib causes DMD to eat a LOT of memory compared to the 
'normal' mode - in one of my projects, it eats up easily > 1.2GB and 
dies. This could be a downside to this approach. I haven't tested 
whether it's the same with -multiobj.

Would it be hard to add an option to DMD to control template emission? 
Apparently GDC has -femit-templates, so it's doable ;) LDC outputs 
instantiations to all objects.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 12 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 As for my own impressions of this idea, its biggest drawback probably 
 is that the multitude of object files created via -multiobj strains 
 the filesystem.

 Sure, but -multiobj and -lib generate exactly the same object files, 
 it's just that -lib puts them all into a library so it doesn't strain 
 the file system.

 Extracting the obj files from the lib is pretty simple, you can see 
 the libomf.c for the format.

 
 You're right, I'm sorry. I must've overlooked something in the lib dumps 
 and assumed one module overwrites the other.
 
 So with -lib, it should be possible to only extract the object files 
 that contain static constructors and the main function and keep the rest 
 packed up. Does that sound about right?

All the .lib file is, is:

[header]
[all the object files concatenated together and aligned]
[dictionary and index]

Linux .a libraries are the same idea, just a different format for the 
header, dictionary and index. The obj files are unmodified in the 
library. You can extract them based on whatever criteria you need.
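For the Linux side, here is a minimal Python reader for the common `ar` layout. The archive starts with an 8-byte `!<arch>\n` magic. Each member then has a 60-byte text header followed by the unmodified object file, padded to an even offset. The reader skips the GNU symbol index (`/`, the "dictionary") and resolves GNU long names through the `//` table. BSD-style `#1/` names and the OMF .lib layout are not handled. The function name is hypothetical.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # name[16] mtime[12] uid[6] gid[6] mode[8] size[10] "`\n"


def ar_members(data):
    """Yield (name, payload) for each object file in a Unix ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    long_names = b""
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[:16].rstrip(b" ")
        size = int(header[48:58].decode("ascii"))
        body = data[pos + HEADER_LEN:pos + HEADER_LEN + size]
        pos += HEADER_LEN + size + (size & 1)  # members start on even offsets
        if name in (b"/", b"/SYM64/"):
            continue  # GNU symbol index, i.e. the "dictionary"
        if name == b"//":
            long_names = body  # GNU table of names longer than 15 chars
            continue
        if name[:1] == b"/" and name[1:].isdigit():
            start = int(name[1:])
            name = long_names[start:long_names.index(b"/\n", start)]
        elif name.endswith(b"/"):
            name = name[:-1]  # GNU terminates short names with "/"
        yield name.decode(), body
```

Extracting members then only requires reading the archive into memory and writing each payload out under its name, filtered by whatever criteria the build tool needs.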

 By the way, using -lib causes DMD to eat a LOT of memory compared to the 
 'normal' mode - in one of my projects, it eats up easily > 1.2GB and 
 dies. This could be a downside to this approach. I haven't tested 
 whether it's the same with -multiobj.

Hmm. I build Phobos with -lib, and haven't experienced any problems, but 
it's possible as dmd doesn't ever discard any memory.


 Would it be hard to add an option to DMD to control template emission? 
 Apparently GDC has -femit-templates, so it's doable ;) LDC outputs 
 instantiations to all objects.

I've found the LDC approach to be generally a poor one (having much 
experience with it for C++, where there is no choice). It generates huge 
object files and there are often linker problems trying to remove the 
duplicates. I really got tired of "COMDAT" problems with linkers, and 
no, it wasn't just with Optlink. Having each template instantiation in 
its own obj file works out great, eliminating all those problems.

I don't really understand why the -lib approach is not working for your 
needs.

Sep 12 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 I don't really understand why the -lib approach is not working for your 
 needs.

I'm not sure what you mean by "the -lib approach". Just how do you 
exactly apply it to incremental compilation? If my project has a few 
hundred modules and I change just one line in one function, I don't want 
to rebuild it with -lib all again. I thought you were referring to the 
proof-of-concept incremental build tool I posted yesterday which used 
-multiobj, as it should be possible to optimize it using -lib... I just 
haven't tried that yet.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 12 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Walter Bright wrote:
 I don't really understand why the -lib approach is not working for 
 your needs.

 
 I'm not sure what you mean by "the -lib approach". Just how do you 
 exactly apply it to incremental compilation? If my project has a few 
 hundred modules and I change just one line in one function, I don't want 
 to rebuild it with -lib all again. I thought you were referring to the 
 proof-of-concept incremental build tool I posted yesterday which used 
 -multiobj, as it should be possible to optimize it using -lib... I just 
 haven't tried that yet.

You only have to build one source file with -lib, not all of them.

Sep 12 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 I don't really understand why the -lib approach is not working for 
 your needs.

 I'm not sure what you mean by "the -lib approach". Just how do you 
 exactly apply it to incremental compilation? If my project has a few 
 hundred modules and I change just one line in one function, I don't 
 want to rebuild it with -lib all again. I thought you were referring 
 to the proof-of-concept incremental build tool I posted yesterday 
 which used -multiobj, as it should be possible to optimize it using 
 -lib... I just haven't tried that yet.

 
 You only have to build one source file with -lib, not all of them.

So you mean compiling each file separately? That's only an option if we 
turn to the C/C++ way of doing projects - using .di files just like C 
headers - *everywhere*. Only then can changes in .d files be localized 

files because (to my knowledge) they have no means of changing what's 
compiled based on the contents of an imported module (basically they 
lack metaprogramming).

So we could give up and do it the C/C++ way with lots of duplicated code 
in headers (C++ is better here in allowing you to only implement 
methods of a class in the .cpp file instead of rewriting the complete 
class and filling in member functions, like the .d/.di approach would 
force) or we might have an incremental build tool that doesn't suck.

This is the picture as I see it:

* I need to rebuild all modules that import the changed modules, because 
some code in them might evaluate differently (static ifs on the imported 
modules, for instance - I explained that in my first post in this topic).

* I need to compile them all at once, because compiling each of them in 
succession yields massively long compile times.

* With your suggestion of using -lib, I assumed that you were suggesting 
building all these modules at once into a lib and then figuring out what 
to do with their object files one by one.

* Some object files need to be extracted because otherwise module ctors 
won't be linked into the executable.

* As this is incremental compilation, there will be object files from 
the previous build, some of which should not be linked, because that 
would cause multiple definition errors.

* The obsoleted object files can't be simply removed, since they might 
contain comdat symbols needed by some objects outside of the newly 
compiled set (I gave an example in my first post, but can provide actual 
D code that illustrates this issue). Thus they have to be moved into a 
lib and only pulled into linking on demand.

That's how my experimental build tool maps to the "-lib approach".


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 13 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 I don't really understand why the -lib approach is not working for 
 your needs.

 I'm not sure what you mean by "the -lib approach". Just how do you 
 exactly apply it to incremental compilation? If my project has a few 
 hundred modules and I change just one line in one function, I don't 
 want to rebuild it with -lib all again. I thought you were referring 
 to the proof-of-concept incremental build tool I posted yesterday 
 which used -multiobj, as it should be possible to optimize it using 
 -lib... I just haven't tried that yet.

 You only have to build one source file with -lib, not all of them.

 
 So you mean compiling each file separately?

Yes. Or a subset of the files.

 That's only an option if we 
 turn to the C/C++ way of doing projects - using .di files just like C 
 headers - *everywhere*. Only then can changes in .d files be localized 

 files because (to my knowledge) they have no means of changing what's 
 compiled based on the contents of an imported module (basically they 
 lack metaprogramming).
 
 So we could give up and do it the C/C++ way with lots of duplicated code 
 in headers (C++ is better here with allowing you to only implement 
 methods of a class in the .cpp file instead of rewriting the complete 
 class and filling in member functions, like the .d/.di approach would 
 force) or we might have an incremental build tool that doesn't suck.
 
 This is the picture as I see it:
 
 * I need to rebuild all modules that import the changed modules, because 
 some code in them might evaluate differently (static ifs on the imported 
 modules, for instance - I explained that in my first post in this topic).
 
 * I need to compile them all at once, because compiling each of them in 
 succession yields massively long compile times.
 
 * With your suggestion of using -lib, I assumed that you were suggesting 
 building all these modules at once into a lib and then figuring out what 
 to do with their object files one by one.
 
 * Some object files need to be extracted because otherwise module ctors 
 won't be linked into the executable.
 
 * As this is incremental compilation, there will be object files from 
 the previous build, some of which should not be linked, because that 
 would cause multiple definition errors.
 
 * The obsoleted object files can't be simply removed, since they might 
 contain comdat symbols needed by some objects outside of the newly 
 compiled set (I gave an example in my first post, but can provide actual 
 D code that illustrates this issue). Thus they have to be moved into a 
 lib and only pulled into linking on demand.
 
 That's how my experimental build tool maps to the "-lib approach".

What you can try is creating a database that is basically a lib (call it 
A.lib) of all the modules compiled with -lib. Then recompile all modules 
that depend on changed modules in one command, also with -lib, call it 
B.lib. Then for all the obj's in B, replace the corresponding ones in A.
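Modelled in Python, with libraries as ordered lists of `(member_name, payload)` pairs, the replacement step could be as simple as the sketch below. Matching members by name is an assumption here; a build tool could instead match them by the symbols they define. The helper name is hypothetical.

```python
def replace_members(a_members, b_members):
    """Replace A.lib members with the same-named members of B.lib.

    Both arguments are lists of (member name, payload) in archive order.
    Replaced members keep their position; members only in B are appended.
    Returns (merged member list, names of replaced members).
    """
    fresh = dict(b_members)
    merged, replaced = [], []
    for name, payload in a_members:
        if name in fresh:
            merged.append((name, fresh.pop(name)))
            replaced.append(name)
        else:
            merged.append((name, payload))
    merged.extend(fresh.items())  # genuinely new modules
    return merged, replaced


a_lib = [("m1.obj", b"old1"), ("m2.obj", b"old2"), ("m3.obj", b"old3")]
b_lib = [("m2.obj", b"new2"), ("m4.obj", b"new4")]
merged, replaced = replace_members(a_lib, b_lib)
print([name for name, _ in merged], replaced)
# ['m1.obj', 'm2.obj', 'm3.obj', 'm4.obj'] ['m2.obj']
```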

Sep 13 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call it 
 A.lib) of all the modules compiled with -lib. Then recompile all modules 
 that depend on changed modules in one command, also with -lib, call it 
 B.lib. Then for all the obj's in B, replace the corresponding ones in A.

That's what I'm getting at :)


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 13 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call 
 it A.lib) of all the modules compiled with -lib. Then recompile all 
 modules that depend on changed modules in one command, also with -lib, 
 call it B.lib. Then for all the obj's in B, replace the corresponding 
 ones in A.

 
 That's what I'm getting at :)

With this approach, you could wind up with some 'dead' obj files in 
A.lib, but aside from a bit of bloat in the lib file, they'll never wind 
up in the executable.

Sep 13 2009

Don <nospam nospam.com> writes:

Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call 
 it A.lib) of all the modules compiled with -lib. Then recompile all 
 modules that depend on changed modules in one command, also with 
 -lib, call it B.lib. Then for all the obj's in B, replace the 
 corresponding ones in A.

 That's what I'm getting at :)

 
 With this approach, you could wind up with some 'dead' obj files in 
 A.lib, but aside from a bit of bloat in the lib file, they'll never wind 
 up in the executable.

I'm feeling horribly guilty for having asked for module-level static 
if(). I have a dreadful suspicion that it might have been a profoundly 
bad idea.

Sep 13 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Don wrote:
 Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 What you can try is creating a database that is basically a lib 
 (call it A.lib) of all the modules compiled with -lib. Then 
 recompile all modules that depend on changed modules in one command, 
 also with -lib, call it B.lib. Then for all the obj's in B, replace 
 the corresponding ones in A.

 That's what I'm getting at :)

 With this approach, you could wind up with some 'dead' obj files in 
 A.lib, but aside from a bit of bloat in the lib file, they'll never 
 wind up in the executable.

 
 I'm feeling horribly guilty for having asked for module-level static 
 if(). I have a dreadful suspicion that it might have been a profoundly 
 bad idea.

No need to feel guilty. This problem actually manifests itself in many 
cases other than just static if, e.g. changing an alias in the modified 
module, adding some fields to a struct or methods to a class. Basically 
anything that would bite us if we had C/C++ projects solely in .h files 
(except multiple definition errors). I've prepared some examples (.d and 
.bat files) of these at http://h3.team0xf.com/dependencyFail.7z 
(-version is used instead of literally changing the code). I have no 

of static analysis.

As for the 'dead' obj files, one could run a 'garbage collection' step 
from time to time ;)
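Such a garbage collection step could be a mark phase over symbols. Starting from the undefined symbols of the always-linked objects, it marks every library member that would be pulled in to satisfy them, transitively. Here is a Python sketch under simplified linker assumptions. The names are hypothetical, and members are only "dead" relative to the current set of cached objects.

```python
def live_members(objects, library):
    """Members of `library` reachable from the always-linked `objects`.

    objects: {name: (defined symbols, undefined symbols)}
    library: {member: (defined symbols, undefined symbols)}
    Anything not returned is 'dead' for the current set of objects.
    """
    provider = {}
    for member, (defined, _) in library.items():
        for sym in defined:
            provider.setdefault(sym, member)  # first definer wins

    have, need = set(), []
    for defined, undefined in objects.values():
        have |= defined
        need.extend(undefined)

    live = set()
    while need:
        sym = need.pop()
        if sym in have or sym not in provider:
            continue
        member = provider[sym]
        live.add(member)
        defined, undefined = library[member]
        have |= defined
        need.extend(undefined)
    return live


objects = {
    "main.obj": ({"_Dmain"}, {"B.bar"}),
    "b.obj": ({"B.bar"}, {"T!int"}),
}
junk = {
    "a.obj": ({"T!int"}, {"C.helper"}),
    "c.obj": ({"C.helper"}, set()),
    "x.obj": ({"X.gone"}, set()),
}
print(sorted(live_members(objects, junk)))  # ['a.obj', 'c.obj']
```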

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 13 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 What you can try is creating a database that is basically a lib (call it 
 A.lib) of all the modules compiled with -lib. Then recompile all modules 
 that depend on changed modules in one command, also with -lib, call it 
 B.lib. Then for all the obj's in B, replace the corresponding ones in A.

OK, here we go: http://h3.team0xf.com/increBuild2.7z (I hope it's fine 
to include LIBUNRES here; it's just for convenience.)

This is the second incarnation of that incremental build tool 
experiment. This time it uses -lib instead of -multiobj, as suggested by 
Walter.

The algorithm works as follows:

* compile modules to a .lib file
* extract objects with static ctors or the __Dmain function (remove them 
from the lib)
* find out which old object files should be replaced
	* any objects any of whose symbols were regenerated in this compilation pass
* pack up the obsoleted object files into a 'junk' library
* prepend the 'junk' library to the /library chain/
* prepend the newly compiled library to the /library chain/
* link the executable by passing the cached object files and the whole 
library chain to the linker
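The steps above can be modelled roughly as follows, treating each object as a set of symbol names. The model rests on two assumptions. First, 'old object files' is read as the cached objects that are handed to the linker directly. Second, the name-based must_link_directly check is a hypothetical stand-in for really inspecting the object for static ctors or __Dmain. All names here are illustrative, and OMF handling is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class BuildState:
    # Objects passed straight to the linker, mapped to the symbols they define.
    cached_objs: dict = field(default_factory=dict)
    # Libraries in link order, newest first; each maps obj name -> symbols.
    lib_chain: list = field(default_factory=list)


def must_link_directly(symbols):
    # Hypothetical stand-in: the real tool looks for static ctors or
    # __Dmain in the object; this only pattern-matches symbol names.
    return "__Dmain" in symbols or any("staticCtor" in s for s in symbols)


def incremental_step(state, new_lib):
    """One pass; new_lib maps each freshly compiled obj to its symbols."""
    regenerated = set().union(*new_lib.values())

    # Extract objects with static ctors / __Dmain from the new library.
    extracted = {n: s for n, s in new_lib.items() if must_link_directly(s)}
    remaining = {n: s for n, s in new_lib.items() if n not in extracted}

    # An old cached object is obsolete if any of its symbols were regenerated.
    junk = {n: s for n, s in state.cached_objs.items() if s & regenerated}
    kept = {n: s for n, s in state.cached_objs.items() if n not in junk}

    chain = list(state.lib_chain)
    if junk:
        chain.insert(0, junk)       # the 'junk' library goes in front...
    if remaining:
        chain.insert(0, remaining)  # ...and the newest library before it
    return BuildState(cached_objs={**kept, **extracted}, lib_chain=chain)
```

Moving obsolete objects into a library rather than deleting them means they are only pulled in if some symbol is still missing after the newer libraries have been searched.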

It doesn't use the simple approach of having just one 'junk'/'A.lib' 
library and appending objects to it, because that's pretty slow due to 
the librarian having to re-generate the dictionary at each such 
operation. So instead it keeps a chain of all libraries generated in 
this process and passes them to the linker in the right order. This will 
waste more space than the naive approach, but should be faster.
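Order matters in the chain, at least under one simplifying assumption. The sketch below assumes that the linker searches the libraries in the order given and takes the first object that defines a needed symbol. Under that assumption, a newer library shadows stale copies further down the chain. pick_object is a hypothetical helper, not part of the tool.

```python
def pick_object(symbol, chain):
    """Return (library index, obj name) of the first definition of `symbol`,
    searching `chain` front to back; None if no library defines it."""
    for index, lib in enumerate(chain):
        for obj_name, symbols in lib.items():
            if symbol in symbols:
                return index, obj_name
    return None


chain = [
    {"Mod_new.obj": {"_D3Mod3fooFZv"}},                   # latest pass
    {"Mod_old.obj": {"_D3Mod3fooFZv", "_D3Mod3barFZv"}},  # stale copy
]
print(pick_object("_D3Mod3fooFZv", chain))  # (0, 'Mod_new.obj')
print(pick_object("_D3Mod3barFZv", chain))  # (1, 'Mod_old.obj')
```

The second lookup shows the flip side: a symbol that only a stale object still defines will drag that object in.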

The archive contains the source code and a compiled binary (DMD-Win only 
for now... Sorry, folks) as well as a little test in the test/ 
directory. It shows how naive incremental compilation fails (break.bat) 
and how this tool works (work.bat).

The tool can be used with the latest Mercurial revision of xfBuild ( 
http://bitbucket.org/h3r3tic/xfbuild/ ) by passing "+cincreBuild" to it. 
The support is a massive hack though, so expect some strangeness.

I was able to run it on the 'Test1' demo of my Hybrid GUI ( 
http://team0xf.com:1024/hybrid/file/c841d95675ca/Test1.d ) and a 
simple/dumb ray tracer based on OMG ( 
http://team0xf.com:1024/omg/file/5199ed783490/Tracer.d ). In incremental 
compilation it's not noticeably slower than the naive approach; however, 
DMD consumes more memory in -lib mode, and the executables produced by 
this approach are larger for some reason. For instance, with Hybrid, 
Test1.exe is about 20MB with increBuild, compared to about 5MB with the 
traditional approach. Perhaps there's some simple way to remove this 
bloat, since once compressed with UPX (even with the fastest compression 
method) the executables differ by just a few kilobytes.

When building my second largest project, DMD eats up about 1.2GB of 
memory and dies (even without -g). Luckily, xfBuild allows me to set the 
limit of modules to be compiled at a time, so when I capped it at 200, it 
compiled... but didn't link :( Somewhere in the process a library is 
created that confuses OPTLINK as well as "lib -l". There's one symbol in 
it that neither of these is able to see, and it results in an 
undefined reference when linking. The symbol is clearly there when using 
a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at 
http://h3.team0xf.com/strangeLib.7z . The symbol in question is 
compressed and this newsgroup probably won't chew the non-ansi chars 
well, but it can be found via a regex "D2xf3omg4core.*ctFromRealVee0P0Z".
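Searching the raw library bytes for that pattern takes only a few lines. This is a sketch that differs from the quoted pattern in two ways. It makes the `.*` non-greedy so that a match cannot run on through the rest of the file. It also uses DOTALL so that `.` matches any byte inside the compressed name. The function name is hypothetical.

```python
import re

# The pattern from the post, non-greedy, matching any byte in between.
SYMBOL_RE = re.compile(rb"D2xf3omg4core.*?ctFromRealVee0P0Z", re.DOTALL)


def find_symbol_offsets(data, pattern=SYMBOL_RE):
    """Return the byte offsets at which `pattern` matches inside `data`."""
    return [m.start() for m in pattern.finditer(data)]

# Usage (file name is illustrative):
#   with open("strange.lib", "rb") as f:
#       print(find_symbol_offsets(f.read()))
```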

One thing slowing this tool down is the need to call the librarian 
multiple times. DMD -lib will sometimes generate multiple objects with 
the same name, and you can only extract them (when using the librarian) 
by running lib -x multiple times. DMD should probably be patched up to 
include fully qualified module names in objects instead of just the last 
component (foo.Mod and bar.Mod both yield Mod.obj in the library), as -op 
doesn't seem to help here.
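The collision is easy to reproduce by mapping fully qualified module names to object names the way described above, keeping only the last component. qualified_object_name is a hypothetical naming scheme that illustrates the proposed fix. It is not something DMD does.

```python
from collections import defaultdict


def colliding_object_names(modules):
    """Group fully qualified module names by the object name they get when
    only the last component is used (foo.Mod -> Mod.obj); return only the
    groups with more than one module."""
    groups = defaultdict(list)
    for fqn in modules:
        groups[fqn.rsplit(".", 1)[-1] + ".obj"].append(fqn)
    return {obj: mods for obj, mods in groups.items() if len(mods) > 1}


def qualified_object_name(fqn):
    # Hypothetical scheme: keep the whole module path in the object name.
    return fqn + ".obj"
```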

Another idea that will map well onto any incremental builder would be to 
write a tool that will find the differences between modules and tell 
whether e.g. they're limited to function bodies. Then an incremental 
builder could assume that it doesn't have to recompile any dependencies, 
just this one modified file. Unfortunately, this assumption doesn't 
always hold - functions could be used via CTFE to generate code, thus 
the changes escape. Personally I'm of the opinion that functions should 
be explicitly marked for CTFE, and this is just another reason for such. 
I'm using a patched DMD with added pragma(ctfe) which instructs the 
compiler not to run any codegen or generate debug info for 
functions/aggregates marked as such. This trick alone can slim an 
executable down by a good megabyte, which sometimes is a life-saver with 
OPTLINK. I've been hearing that other people put their CTFE stuff into 
.di files, but this approach doesn't cover all cases of codegen via CTFE 
and string mixins.
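An incremental builder might act on such a diff tool as sketched below. All inputs are assumptions. body_only would come from the hypothetical diff tool, and ctfe_touched from something like the pragma(ctfe) marking. The reverse import graph importers is assumed to be known to the builder.

```python
from collections import deque


def modules_to_recompile(changed, body_only, ctfe_touched, importers):
    """
    changed      -- name of the modified module
    body_only    -- True if only function bodies changed (per the diff tool)
    ctfe_touched -- True if any changed function may be evaluated via CTFE
    importers    -- dict: module -> set of modules that import it directly
    """
    if body_only and not ctfe_touched:
        return {changed}  # the change cannot escape this module
    # Otherwise rebuild every module that transitively imports it.
    result, queue = {changed}, deque([changed])
    while queue:
        mod = queue.popleft()
        for dep in importers.get(mod, ()):
            if dep not in result:
                result.add(dep)
                queue.append(dep)
    return result
```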

I'm afraid I won't be doing any more prototypes soon - I really need 
to focus on my master's thesis :P But then, I don't really know how this 
tool can be improved without hacking the compiler or writing custom OMF 
processing.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 15 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Personally I'm of the opinion that functions should 
 be explicitly marked for CTFE, and this is just another reason for such. 
 I'm using a patched DMD with added pragma(ctfe) which instructs the 
 compiler not to run any codegen or generate debug info for 
 functions/aggregates marked as such. This trick alone can slim an 
 executable down by a good megabyte, which sometimes is a life-saver with 
 OPTLINK.

If you are compiling files with -lib, and nobody calls those CTFE 
functions at runtime, then they should never be linked in. (Virtual 
functions are always linked in, as they have a reference to them even if 
they are never called.)

Executables built this way shouldn't have dead functions in them.

Sep 17 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Tom S wrote:
 Personally I'm of the opinion that functions should be explicitly 
 marked for CTFE, and this is just another reason for such. I'm using a 
 patched DMD with added pragma(ctfe) which instructs the compiler not 
 to run any codegen or generate debug info for functions/aggregates marked 
 as such. This trick alone can slim an executable down by a good 
 megabyte, which sometimes is a life-saver with OPTLINK.

 
 If you are compiling files with -lib, and nobody calls those CTFE 
 functions at runtime, then they should never be linked in. (Virtual 
 functions are always linked in, as they have a reference to them even if 
 they are never called.)
 
 Executables built this way shouldn't have dead functions in them.

It could be debug info, because with -g something definitely is linked 
in whether it's -lib or not (except with -lib there's way more of it). 
With ctfe-mixin-based metaprogramming, you also end up with string 
literals that don't seem to get optimized away by the linker.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 17 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Personally I'm of the opinion that functions should be explicitly 
 marked for CTFE, and this is just another reason for such. I'm using 
 a patched DMD with added pragma(ctfe) which instructs the compiler 
 not to run any codegen or generate debug info for functions/aggregates 
 marked as such. This trick alone can slim an executable down by a 
 good megabyte, which sometimes is a life-saver with OPTLINK.

 If you are compiling files with -lib, and nobody calls those CTFE 
 functions at runtime, then they should never be linked in. (Virtual 
 functions are always linked in, as they have a reference to them even 
 if they are never called.)

 Executables built this way shouldn't have dead functions in them.

 
 It could be debug info, because with -g something definitely is linked 
 in whether it's -lib or not (except with -lib there's way more of it). 

The linker doesn't pull in obj modules based on symbolic debug info. You 
can find out what is pulling in a particular module by deleting it from 
the library, linking, and seeing what undefined symbol message the 
linker produces.
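The suggestion above can be modelled in a few lines. Each object is reduced to the symbols it defines and references. 'Deleting it from the library' then becomes a question: which references lose their only definition? This is a toy model with a hypothetical function name. In practice, the linker's own undefined-symbol messages give the answer.

```python
def undefined_after_removal(removed, objects):
    """
    objects -- dict: obj name -> (defined symbols, referenced symbols)
    Returns {symbol: [objects referencing it]} for the symbols that only
    `removed` defined, i.e. what the linker would report as undefined.
    """
    still_defined = set()
    for name, (defs, _) in objects.items():
        if name != removed:
            still_defined |= defs
    lost = objects[removed][0] - still_defined
    report = {}
    for name, (_, refs) in objects.items():
        if name == removed:
            continue
        for sym in sorted(refs & lost):
            report.setdefault(sym, []).append(name)
    return report
```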


 With ctfe-mixin-based metaprogramming, you also end up with string 
 literals that don't seem to get optimized away by the linker.

The linker has no idea what a string literal is, or what any other 
literals are, either. It doesn't know what a type is. It doesn't know 
what language the source code was. It only knows about symbols, 
sections, and bytes of binary data. The object module format offers no 
way to mark a piece of data as a string literal.

I do think it is possible, though, for the compiler to do a better job 
of not putting unneeded literals into the obj file.

Sep 17 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 Tom S wrote:
 Personally I'm of the opinion that functions should be explicitly 
 marked for CTFE, and this is just another reason for such. I'm using 
 a patched DMD with added pragma(ctfe) which instructs the compiler 
 not to run any codegen or generate debug info for functions/aggregates 
 marked as such. This trick alone can slim an executable down by a 
 good megabyte, which sometimes is a life-saver with OPTLINK.

 If you are compiling files with -lib, and nobody calls those CTFE 
 functions at runtime, then they should never be linked in. (Virtual 
 functions are always linked in, as they have a reference to them even 
 if they are never called.)

 Executables built this way shouldn't have dead functions in them.

 It could be debug info, because with -g something definitely is linked 
 in whether it's -lib or not (except with -lib there's way more of it). 

 
 The linker doesn't pull in obj modules based on symbolic debug info.

I wasn't implying that.


 You 
 can find out what is pulling in a particular module by deleting it from 
 the library, linking, and seeing what undefined symbol message the 
 linker produces.

I tested it on a single-module program before posting. Basically void 
main() {} and a single unused function void fooBar() {}. With -g, 
something with the function's mangled name ended up in the executable. 
Without -g, the linker was able to remove the function (I diffed the 
result against a build with the function removed from the source altogether).


 With ctfe-mixin-based metaprogramming, you also end up with string 
 literals that don't seem to get optimized away by the linker.

 
 The linker has no idea what a string literal is, or what any other 
 literals are, either. It doesn't know what a type is. It doesn't know 
 what language the source code was. It only knows about symbols, 
 sections, and bytes of binary data. The object module format offers no 
 way to mark a piece of data as a string literal.

I wasn't implying that either and I'm well aware of it :S I thought it 
would be easier for everyone to understand than any blurbing about 
LEDATA/LED386 and static data segments.


 I do think it is possible, though, for the compiler to do a better job 
 of not putting unneeded literals into the obj file.

That would be nice and perhaps might make OPTLINK crash less.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 17 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 I tested it on a single-module program before posting. Basically void 
 main() {} and a single unused function void fooBar() {}. With -g, 
 something with the function's mangled name ended up in the executable. 
 Without -g, the linker was able to remove the function (I diffed the 
 result against a build with the function removed from the source altogether).

The best way to determine what is linked in to an executable is to 
generate a map file with -L/map, and examine it. It will list all the 
symbols in it.

Also, if you specify a .obj file directly to the linker, it will put all 
of the symbols and data in that .obj file into the executable. The 
linker does NOT remove functions.

What it DOES do is pull obj files out of a library to resolve unresolved 
symbols from other obj files already linked in.

In other words, it's an additive process, not a subtractive one.
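The additive process described above can be captured in a toy model. Objects are reduced to (defined, referenced) symbol sets. Every explicitly given object goes in whole, and library objects are pulled in only to satisfy still-undefined symbols. The model assumes unique object names. It ignores any finer-grained per-section or per-COMDAT discarding that a real linker may perform.

```python
def link(explicit_objs, libraries):
    """
    explicit_objs -- dict: obj name -> (defined symbols, referenced symbols);
                     every explicitly given object is linked in full
    libraries     -- list of dicts of the same shape, searched in order
    Returns (names of linked objects, symbols that stayed unresolved).
    """
    linked = dict(explicit_objs)
    while True:
        defined = set().union(*(defs for defs, _ in linked.values()))
        wanted = set().union(*(refs for _, refs in linked.values())) - defined
        # Find one library object that defines a still-undefined symbol.
        pick = next(
            ((name, entry)
             for sym in sorted(wanted)
             for lib in libraries
             for name, entry in lib.items()
             if sym in entry[0]),
            None,
        )
        if pick is None:
            return set(linked), wanted
        linked[pick[0]] = pick[1]  # objects are only ever added, never removed
```

In this model, an unused function inside an explicitly given object is always kept. An unreferenced object sitting in a library is never pulled in.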

Sep 18 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Also, if you specify a .obj file directly to the linker, it will put all 
 of the symbols and data in that .obj file into the executable. The 
 linker does NOT remove functions.
 
 What it DOES do is pull obj files out of a library to resolve unresolved 
 symbols from other obj files already linked in.
 
 In other words, it's an additive process, not a subtractive one.

Tests seem to indicate otherwise. By the way, the linker in gcc can also 
remove unused sections (--gc-sections, which works best with 
-ffunction-sections).

----

cat foo.d

void main() {
}

version (WithFoo) {
    void foo() {
    }
}
dmd foo.d -c -of1.obj

dmd foo.d -version=WithFoo -c -of2.obj

diff 1.obj 2.obj

Files 1.obj and 2.obj differ

lib -l 1.obj   1>NUL  && cat 1.lst

Publics by name         module
__Dmain                          1
_D3foo12__ModuleInfoZ            1


Publics by module
1
         __Dmain                           _D3foo12__ModuleInfoZ

lib -l 2.obj   1>NUL  && cat 2.lst

Publics by name         module
__Dmain                          2
_D3foo12__ModuleInfoZ            2
_D3foo3fooFZv                    2


Publics by module
2
         __Dmain                           _D3foo12__ModuleInfoZ
         _D3foo3fooFZv

dmd -L/M 1.obj -of1.exe

dmd -L/M 2.obj -of2.exe

diff 1.exe 2.exe

diff 1.map 2.map


----
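For contrast, the GNU ld feature mentioned above works subtractively. -ffunction-sections gives each function its own section. --gc-sections then keeps only the sections reachable from the roots, such as the entry point. Below is a simplified reachability sketch that treats sections as graph nodes and references between them as edges. Real ld applies further rules, for example keeping sections marked KEEP() in the linker script.

```python
def gc_sections(sections, roots):
    """
    sections -- dict: section name -> set of sections it references
    roots    -- sections that must be kept (e.g. the one with the entry point)
    Returns the set of surviving sections; everything else is discarded.
    """
    keep, stack = set(), list(roots)
    while stack:
        sec = stack.pop()
        if sec in keep:
            continue
        keep.add(sec)
        stack.extend(sections.get(sec, ()))
    return keep
```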

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 18 2009

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 When building my second largest project, DMD eats up about 1.2GB of 
 memory and dies (even without -g). Luckily, xfBuild allows me to set the 
 limit of modules to be compiled at a time, so when I capped it at 200, it 
 compiled... but didn't link :( Somewhere in the process a library is 
 created that confuses OPTLINK as well as "lib -l". There's one symbol in 
 it that neither of these is able to see, and it results in an 
 undefined reference when linking. The symbol is clearly there when using 
 a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at 
 http://h3.team0xf.com/strangeLib.7z . The symbol in question is 
 compressed and this newsgroup probably won't chew the non-ansi chars 
 well, but it can be found via a regex "D2xf3omg4core.*ctFromRealVee0P0Z".

Please post to bugzilla.


 One thing slowing this tool down is the need to call the librarian 
 multiple times. DMD -lib will sometimes generate multiple objects with 
 the same name

Please post to bugzilla.

Sep 17 2009

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Tom S wrote:
 When building my second largest project, DMD eats up about 1.2GB of 
 memory and dies (even without -g). Luckily, xfBuild allows me to set 
 the limit of modules to be compiled at a time, so when I capped it at 
 200, it compiled... but didn't link :( Somewhere in the process a 
 library is created that confuses OPTLINK as well as "lib -l". There's 
 one symbol in it that neither of these is able to see, and it 
 results in an undefined reference when linking. The symbol is clearly 
 there when using a lib dumping tool from DDL or "libunres -d -c". I've 
 dropped the lib at http://h3.team0xf.com/strangeLib.7z . The symbol in 
 question is compressed and this newsgroup probably won't chew the 
 non-ansi chars well, but it can be found via a regex 
 "D2xf3omg4core.*ctFromRealVee0P0Z".

 
 Please post to bugzilla.

http://d.puremagic.com/issues/show_bug.cgi?id=3327


 One thing slowing this tool down is the need to call the librarian 
 multiple times. DMD -lib will sometimes generate multiple objects with 
 the same name

 
 Please post to bugzilla.

http://d.puremagic.com/issues/show_bug.cgi?id=3328


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Sep 17 2009
