digitalmars.D - Compilation times and idiomatic D code

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
Over time, what is considered "idiomatic D" has changed, and nowadays it
seems to be leaning heavily towards range-based code with UFCS chains
using std.algorithm and similar reusable pieces of code.

D (well, DMD specifically) is famed for its lightning-fast compilation
times.

So this left me wondering why my latest D project, a smallish codebase
with only ~5000 lines of code, a good part of which are comments, takes
about 11 seconds to compile.

A first hint is that these meager 5000 lines of code compile to a 600MB
executable. Well, large executables have been the plague of D since the
beginning, but the reasoning has always been that hello world examples
don't really count, because the language offers the machinery for much
more than that, and the idea is that as the code size grows, the "bloat
to functionality" ratio decreases.  But still... 600MB for 5000 lines of
code seems a bit excessive. Especially when stripping symbols cuts off
about *half* of that size.

Which leads to the discovery, to my horror, that there are some very,
VERY large symbols that are generated. Including one that's 388881
characters long. Yes, that's almost 400KB just for ONE symbol.  This
particular symbol is the result of a long UFCS chain in the main
program, and contains a lot of repeated elements, like
myTemplate__lambdaXXX_myTemplateArguments__mapXXX__Result__myTemplateArguments
and so on.  Each additional member in the UFCS chain causes a repetition
of all the previous members' return type names, plus the new typename,
causing an O(n^2) explosion in symbol size.
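
A rough way to see this growth for yourself is to print the length of the mangled type name after each stage of a chain. The sketch below assumes a D compiler with Phobos; `addOne`, `isEven` and `mangledLength` are hypothetical helpers made up for the illustration, and the chain is far shorter than the one described above. The numbers depend on the compiler version, so none are quoted here; compilers that already back-reference repeated names in their mangling will report much smaller values.

```d
import std.algorithm : filter, map;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helpers. Named functions are used instead of lambdas so the
// mangled names do not depend on compiler-generated lambda identifiers.
int addOne(int x) { return x + 1; }
bool isEven(int x) { return x % 2 == 0; }

// Length, in characters, of the mangled name of type T.
enum size_t mangledLength(T) = T.mangleof.length;

void main()
{
    // Each stage wraps the type of the previous stage as a template argument.
    auto r0 = iota(10);
    auto r1 = r0.map!addOne;
    auto r2 = r1.filter!isEven;
    auto r3 = r2.map!addOne;
    auto r4 = r3.filter!isEven;

    writefln("0 stages: %s chars", mangledLength!(typeof(r0)));
    writefln("1 stage:  %s chars", mangledLength!(typeof(r1)));
    writefln("2 stages: %s chars", mangledLength!(typeof(r2)));
    writefln("3 stages: %s chars", mangledLength!(typeof(r3)));
    writefln("4 stages: %s chars", mangledLength!(typeof(r4)));
}
```

This only measures the type names. The symbols for the range primitives (`front`, `popFront`, `empty`) of each stage embed those type names again, which is where the repetition described above comes from.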

Worse yet, because the typename encoded in this monster symbol is a
range, you have the same 300+KB of typename repeated for each of the
range primitives. And anything else this typename happens to be a
template argument to.  There's another related symbol that's 388944
characters long.  Not to mention all the range primitives (along with
their similarly huge typenames) of all the smaller types contained
within this monster typename.

Given this, it's no surprise that the compiler took 11 seconds to
compile a 5000-line program. Just imagine how much time is spent
generating these huge symbols, storing them in the symbol table,
comparing them in symbol table lookups, writing them to the executable,
etc.  And we're not even talking about the other smaller, but still
huge symbols that are also present -- 100KB symbols, 50KB symbols, 10KB
symbols, etc.  And think about the impact of this on the compiler's
memory footprint.

IOW, the very range-based idiom that has become one of the defining
characteristics of modern D is negating the selling point of fast
compilation.

I vaguely remember there was talk about compressing symbols when they
get too long... is there any hope of seeing this realized in the near
future?


T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune
Jul 05 2017
Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
 I vaguely remember there was talk about compressing symbols 
 when they get too long... is there any hope of seeing this 
 realized in the near future?
Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly.
See https://github.com/dlang/dmd/pull/5855

There is still a problem with the template system as a whole. Which I am working on in my spare time. And which will become my focus after newCTFE is done.
Jul 05 2017
jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the 
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855
A table in the comments [1] shows a significant reduction in bloat when compiling phobos unit tests. However, it shows a slight increase in build time. I would have expected a decrease. Any idea why that is?

[1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542
Jul 05 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
 On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 
 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855
That's very nice. Hope we will get this through sooner rather than later! [...]
 A table in the comments [1] shows a significant reduction in bloat
 when compiling phobos unit tests. However, it shows a slight increase
 in build time. I would have expected a decrease. Any idea why that is?
 
 [1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542
[...] not sure why that PR would interact with this one in this way.

In any case, I think the actual compilation times would depend on the details of the code. If you're using relatively shallow UFCS chains, like Phobos unittests tend to do, probably the compressed symbols won't give very much advantage over the cost of computing the compression. But if you have heavy usage of UFCS like in my code, this should cause significant speedups from not having to operate on 300KB symbols.


T

-- 
Help a man when he is in trouble and he will remember you when he is in trouble again.
Jul 05 2017
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
 On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
 On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855
 That's very nice. Hope we will get this through sooner rather than later!
I'm super-psyched this has moved from "proof of concept" to ready for review. Kudos to Rainer for his work on this! Has been a PITA for a while:

https://issues.dlang.org/show_bug.cgi?id=15831
https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
 In any case, I think the actual compilation times would depend on the
 details of the code.  If you're using relatively shallow UFCS chains,
 like Phobos unittests tend to do, probably the compressed symbols won't
 give very much advantage over the cost of computing the compression.
 But if you have heavy usage of UFCS like in my code, this should cause
 significant speedups from not having to operate on 300KB large symbols.
I have found that the linker gets REALLY slow when the symbols get large. So it's not necessarily the compiler that's slow for this.

-Steve
Jul 07 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Jul 07, 2017 at 09:32:24AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
 On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
 On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
[...]
 Rainer Schuetze is quite close to a solution. Which reduces the
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855
[...]
 I'm super-psyched this has moved from "proof of concept" to ready for
 review. Kudos to Rainer for his work on this! Has been a PITA for a
 while:
 
 https://issues.dlang.org/show_bug.cgi?id=15831
 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
Yes, kudos to Rainer for making this a (near) reality! [...]
 I have found that the linker gets REALLY slow when the symbols get
 large. So it's not necessarily the compiler that's slow for this.
[...]
True, I didn't profile the compiler carefully to discern whether it was the compiler that's slow, or the linker. But either way, having smaller symbols will benefit both.


T

-- 
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.
Jul 07 2017
John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
 I vaguely remember there was talk about compressing symbols 
 when they get too long... is there any hope of seeing this 
 realized in the near future?
 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855
 There is still a problem with the template system as a whole. Which I am working on in my spare time. And which will become my focus after newCTFE is done.
Please give consent for the D Foundation to clone you.
Jul 06 2017
kinke <noone nowhere.com> writes:
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
 I vaguely remember there was talk about compressing symbols 
 when they get too long... is there any hope of seeing this 
 realized in the near future?
LDC has an experimental feature replacing long names by their hash; ldc2 -help:
...
-hash-threshold=<uint> - Hash symbol names longer than this threshold (experimental)
Jul 05 2017
Jacob Carlborg <doob me.com> writes:
On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
 Over time, what is considered "idiomatic D" has changed, and nowadays it
 seems to be leaning heavily towards range-based code with UFCS chains
 using std.algorithm and similar reusable pieces of code.
It's not UFCS per se that causes the problem. If you're using the traditional calling syntax it would generate the same symbols.
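
That point is easy to check. A minimal sketch, assuming a named helper `addOne` (hypothetical) rather than a lambda, since each lambda literal is a distinct symbol and would make the two types differ for an unrelated reason:

```d
import std.algorithm : map;
import std.range : iota;

// Hypothetical helper used as the mapping function.
int addOne(int x) { return x + 1; }

void main()
{
    auto viaUfcs = iota(10).map!addOne;   // UFCS spelling
    auto viaCall = map!addOne(iota(10));  // traditional call syntax

    // Both spellings instantiate the same template with the same arguments,
    // so the resulting types and their mangled names are identical.
    static assert(is(typeof(viaUfcs) == typeof(viaCall)));
    static assert(typeof(viaUfcs).mangleof == typeof(viaCall).mangleof);
}
```

UFCS only changes how the call is written. The nesting of template instantiations, which is what ends up in the symbol name, is the same either way.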
 D (well, DMD specifically) is famed for its lightning speed compilation
 times.

 So this left me wondering why my latest D project, a smallish codebase
 with only ~5000 lines of code, a good part of which are comments, takes
 about 11 seconds to compile.
Yeah, it's usually all these D-specific compile-time features that are slowing down compilation.

DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) to compile, for a full build.

-- 
/Jacob Carlborg
Jul 06 2017
Atila Neves <atila.neves gmail.com> writes:
On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
 On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
 [...]
 It's not UFCS per se that causes the problem. If you're using the traditional calling syntax it would generate the same symbols.
 [...]
 Yeah, it's usually all these D-specific compile-time features that are slowing down compilation. DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) to compile, for a full build.
IIRC building Tango per package instead of all-at-once got the build time down to less than a second.

Atila
Jul 06 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Jul 06, 2017 at 01:32:04PM +0000, Atila Neves via Digitalmars-d wrote:
 On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
[...]
 Yeah, it's usually all these D-specific compile-time features that
 are slowing down compilation.
 
 DWT and Tango are two good examples of large code bases where very
 few of these features are used; they're written in a more
 traditional style.  They're at least 200k lines of code each and,
 IIRC, take around 10 seconds (or less) to compile, for a full
 build.
 IIRC building Tango per package instead of all-at-once got the build time down to less than a second.
[...]
Well, obviously D's famed compilation speed must still be applicable *somewhere*, otherwise we'd be hearing loud complaints. :-D

My point was that D's compile-time features, which are a big draw to me personally, and are also becoming a selling point of D, need improvement in this area.

I'm very happy to be pointed to Rainer's PR that implements symbol backreferencing compression. Apparently it has successfully compressed the largest symbol generated by Phobos unittests from 30KB (or something like that) down to about 1100 characters, which, though still on the large side, is much more reasonable than the present situation. I hope this PR will get merged in the near future.


T

-- 
Making non-nullable pointers is just plugging one hole in a cheese grater. -- Walter Bright
Jul 06 2017
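
The general idea behind back-referencing can be illustrated with a toy. This is not the scheme from the PR, whose encoding is not described in this thread: the dotted input format, the `@N` notation and the function name `toyCompress` are all invented here for illustration. The sketch replaces any name component that has already appeared with a short reference to its first occurrence.

```d
import std.array : split;
import std.conv : to;
import std.stdio : writeln;

// Toy scheme (hypothetical, not the DMD mangling format): split a dotted
// name into components and replace every component that was already seen
// with "@N", where N is the index of its first occurrence.
string toyCompress(string symbol)
{
    size_t[string] firstSeen;   // component -> index of first occurrence
    string result;

    foreach (index, part; symbol.split("."))
    {
        if (index > 0)
            result ~= ".";

        if (auto seen = part in firstSeen)
            result ~= "@" ~ to!string(*seen);
        else
        {
            firstSeen[part] = index;
            result ~= part;
        }
    }
    return result;
}

void main()
{
    // Prints: myTemplate.MapResult.@0.@1.@0
    writeln(toyCompress("myTemplate.MapResult.myTemplate.MapResult.myTemplate"));
}
```

The more often a long component repeats, the more such a scheme saves, which fits the expectation stated earlier in the thread that deep chains benefit more than shallow ones.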