digitalmars.D - Compilation times and idiomatic D code

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

Over time, what is considered "idiomatic D" has changed, and nowadays it
seems to be leaning heavily towards range-based code with UFCS chains
using std.algorithm and similar reusable pieces of code.

D (well, DMD specifically) is famed for its lightning speed compilation
times.

So this left me wondering why my latest D project, a smallish codebase
with only ~5000 lines of code, a good part of which are comments, takes
about 11 seconds to compile.

A first hint is that these meager 5000 lines of code compile to a 600MB
executable. Well, large executables have been the plague of D since the
beginning, but the reasoning has always been that hello world examples
don't really count, because the language offers the machinery for much
more than that, and the idea is that as the code size grows, the "bloat
to functionality" ratio decreases.  But still... 600MB for 5000 lines of
code seems a bit excessive. Especially when stripping symbols cut off
about *half* of that size.

Which leads to the discovery, to my horror, that there are some very,
VERY large symbols that are generated. Including one that's 388881
characters long. Yes, that's almost 400KB just for ONE symbol.  This
particular symbol is the result of a long UFCS chain in the main
program, and contains a lot of repeated elements, like
myTemplate__lambdaXXX_myTemplateArguments__mapXXX__Result__myTemplateArguments
and so on.  Each additional member in the UFCS chain causes a repetition
of all the previous members' return type names, plus the new typename,
causing an O(n^2) explosion in symbol size.
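
To see this growth on a small scale, here is a minimal sketch. The helper names stage1 to stage4 and mangledLength are made up for illustration and are not taken from the project described above. It prints the length of the mangled type name after each additional stage of a chain. The exact numbers depend on the compiler version, and a compiler that compresses symbols should show much slower growth.

```d
import std.algorithm : filter, map;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helpers: each one adds one more stage to the chain.
// Every stage wraps the previous range type in a new template instance,
// so its type name embeds the names of all the stages before it.
auto stage1() { return iota(10).map!(x => x + 1); }
auto stage2() { return stage1().filter!(x => x % 2 == 0); }
auto stage3() { return stage2().map!(x => x * 3); }
auto stage4() { return stage3().filter!(x => x > 5); }

// Length of the mangled name of type T, computed at compile time.
enum mangledLength(T) = T.mangleof.length;

void main()
{
    writefln("1 stage:  %s chars", mangledLength!(typeof(stage1())));
    writefln("2 stages: %s chars", mangledLength!(typeof(stage2())));
    writefln("3 stages: %s chars", mangledLength!(typeof(stage3())));
    writefln("4 stages: %s chars", mangledLength!(typeof(stage4())));
}
```

The range primitives (front, popFront, empty) of each stage are member functions of these types, so their symbols carry the same long type name again.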

Worse yet, because the typename encoded in this monster symbol is a
range, you have the same 300+KB of typename repeated for each of the
range primitives. And anything else this typename happens to be a
template argument to.  There's another related symbol that's 388944
characters long.  Not to mention all the range primitives (along with
their similarly huge typenames) of all the smaller types contained
within this monster typename.

Given this, it's no surprise that the compiler took 11 seconds to
compile a 5000-line program. Just imagine how much time is spent
generating these huge symbols, storing them in the symbol table,
comparing them in symbol table lookups, writing them to the executable,
etc.  And we're not even talking about the other smaller, but still
huge symbols that are also present -- 100KB symbols, 50KB symbols, 10KB
symbols, etc.  And think about the impact of this on the compiler's
memory footprint.

IOW, the very range-based idiom that has become one of the defining
characteristics of modern D is negating the selling point of fast
compilation.

I vaguely remember there was talk about compressing symbols when they
get too long... is there any hope of seeing this realized in the near
future?


T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune

Jul 05 2017

Stefan Koch <uplink.coder googlemail.com> writes:

On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
 I vaguely remember there was talk about compressing symbols 
 when they get too long... is there any hope of seeing this 
 realized in the near future?

Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the 
symbol-name bloat significantly.
See https://github.com/dlang/dmd/pull/5855


There is still a problem with the template system as a whole.
Which I am working on in my spare time.
And which will become my focus after newCTFE is done.

Jul 05 2017

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the 
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855

A table in the comments [1] shows a significant reduction in 
bloat when compiling phobos unit tests. However, it shows a 
slight increase in build time. I would have expected a decrease. 
Any idea why that is?

[1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542

Jul 05 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
 On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 
 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855


That's very nice.  Hope we will get this through sooner rather than
later!


[...]
 A table in the comments [1] shows a significant reduction in bloat
 when compiling phobos unit tests. However, it shows a slight increase
 in build time. I would have expected a decrease. Any idea why that is?
 
 [1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542


[...] sure why that PR would interact with this one in this way.

In any case, I think the actual compilation times would depend on the
details of the code.  If you're using relatively shallow UFCS chains,
like Phobos unittests tend to do, probably the compressed symbols won't
give very much advantage over the cost of computing the compression.
But if you have heavy usage of UFCS like in my code, this should cause
significant speedups from not having to operate on 300KB large symbols.


T

-- 
Help a man when he is in trouble and he will remember you when he is in trouble
again.

Jul 05 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
 On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
 On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855


 
 That's very nice.  Hope we will get this through sooner rather than
 later!

I'm super-psyched this has moved from "proof of concept" to ready for 
review. Kudos to Rainer for his work on this! Has been a PITA for a while:

https://issues.dlang.org/show_bug.cgi?id=15831
https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com

 In any case, I think the actual compilation times would depend on the
 details of the code.  If you're using relatively shallow UFCS chains,
 like Phobos unittests tend to do, probably the compressed symbols won't
 give very much advantage over the cost of computing the compression.
 But if you have heavy usage of UFCS like in my code, this should cause
 significant speedups from not having to operate on 300KB large symbols.

I have found that the linker gets REALLY slow when the symbols get 
large. So it's not necessarily the compiler that's slow for this.

-Steve

Jul 07 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Jul 07, 2017 at 09:32:24AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
 On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
 On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:



[...]
 Rainer Schuetze is quite close to a solution. Which reduces the
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855




[...]
 I'm super-psyched this has moved from "proof of concept" to ready for
 review. Kudos to Rainer for his work on this! Has been a PITA for a
 while:
 
 https://issues.dlang.org/show_bug.cgi?id=15831
 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com

Yes, kudos to Rainer for making this a (near) reality!


[...]
 I have found that the linker gets REALLY slow when the symbols get
 large. So it's not necessarily the compiler that's slow for this.

[...]

True, I didn't profile the compiler carefully to discern whether it was
the compiler that's slow, or the linker.  But either way, having smaller
symbols will benefit both.


T

-- 
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.

Jul 07 2017

John Colvin <john.loughran.colvin gmail.com> writes:

On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
 On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
 I vaguely remember there was talk about compressing symbols 
 when they get too long... is there any hope of seeing this 
 realized in the near future?

 Yes there is.
 Rainer Schuetze is quite close to a solution. Which reduces the 
 symbol-name bloat significantly.
 See https://github.com/dlang/dmd/pull/5855


 There is still a problem with the template system as a whole.
 Which I am working on in my spare time.
 And which will become my focus after newCTFE is done.

Please give consent for the D Foundation to clone you.

Jul 06 2017

kinke <noone nowhere.com> writes:

On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
 I vaguely remember there was talk about compressing symbols 
 when they get too long... is there any hope of seeing this 
 realized in the near future?

LDC has an experimental feature replacing long names by their 
hash; ldc2 -help:
...
   -hash-threshold=<uint>   - Hash symbol names longer than this threshold (experimental)

Jul 05 2017

Jacob Carlborg <doob me.com> writes:

On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
 Over time, what is considered "idiomatic D" has changed, and nowadays it
 seems to be leaning heavily towards range-based code with UFCS chains
 using std.algorithm and similar reusable pieces of code.

It's not UFCS per se that causes the problem. If you're using the 
traditional calling syntax it would generate the same symbols.
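
A minimal sketch of this point, with made-up function names (addOne, isEven, viaUfcs, viaCalls are illustrative only): the same pipeline is written once with UFCS and once with nested calls, and the compiler is asked to confirm that both produce the same type and therefore the same mangled name. Named functions are used instead of lambdas because two separate lambda literals would count as different template arguments.

```d
import std.algorithm : filter, map;
import std.range : iota;

// Hypothetical named functions, shared by both spellings so that the
// template arguments are identical.
int addOne(int x) { return x + 1; }
bool isEven(int x) { return x % 2 == 0; }

// UFCS spelling of the pipeline.
auto viaUfcs() { return iota(10).map!addOne.filter!isEven; }

// Traditional nested-call spelling of the same pipeline.
auto viaCalls() { return filter!isEven(map!addOne(iota(10))); }

// Both spellings instantiate the same templates with the same arguments,
// so the resulting type and its mangled name are identical.
static assert(is(typeof(viaUfcs()) == typeof(viaCalls())));
static assert(typeof(viaUfcs()).mangleof == typeof(viaCalls()).mangleof);

void main() {}
```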

 D (well, DMD specifically) is famed for its lightning speed compilation
 times.

 So this left me wondering why my latest D project, a smallish codebase
 with only ~5000 lines of code, a good part of which are comments, takes
 about 11 seconds to compile.

Yeah, it's usually all these D specific compile time features that are 
slowing down compilation.

DWT and Tango are two good examples of large code bases where very few 
of these features are used; they're written in a more traditional style. 
They're at least 200k lines of code each and, IIRC, take around 10 
seconds (or less) to compile, for a full build.

-- 
/Jacob Carlborg

Jul 06 2017

Atila Neves <atila.neves gmail.com> writes:

On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
 On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
 [...]

 It's not UFCS per se that causes the problem. If you're using 
 the traditional calling syntax it would generate the same 
 symbols.

 [...]

 Yeah, it's usually all these D specific compile time features 
 that are slowing down compilation.

 DWT and Tango are two good examples of large code bases where 
 very few of these features are used; they're written in a more 
 traditional style. They're at least 200k lines of code each 
 and, IIRC, take around 10 seconds (or less) to compile, for a 
 full build.

IIRC building Tango per package instead of all-at-once got the 
build time down to less than a second.

Atila

Jul 06 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jul 06, 2017 at 01:32:04PM +0000, Atila Neves via Digitalmars-d wrote:
 On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:

[...]
 Yeah, it's usually all these D specific compile time features that
 are slowing down compilation.
 
 DWT and Tango are two good examples of large code bases where very
 few of these features are used; they're written in a more
 traditional style.  They're at least 200k lines of code each and,
 IIRC, take around 10 seconds (or less) to compile, for a full
 build.

 
 IIRC building Tango per package instead of all-at-once got the build
 time down to less than a second.

[...]

Well, obviously D's famed compilation speed must still be applicable
*somewhere*, otherwise we'd be hearing loud complaints. :-D

My point was that D's compile-time features, which are a big draw to me
personally, and also becoming a selling point of D, need improvement in
this area.

I'm very happy to be pointed to Rainer's PR that implements symbol
backreferencing compression.  Apparently it has successfully compressed
the largest symbol generated by Phobos unittests from 30KB (or something
like that) down to about 1100 characters, which, though still on the
large side, is much more reasonable than the present situation.  I hope
this PR will get merged in the near future.
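
The idea behind back-reference compression can be sketched with a toy. This is not the scheme used in the PR and not the real D mangling: the function name compressToy, the "__" separator, and the "#<index>" notation are all assumptions made up for illustration. The sketch only shows why a name full of repeated components shrinks when each repeat is replaced by a short reference to its first occurrence.

```d
import std.array : join, split;
import std.conv : to;
import std.stdio : writeln;

// Toy scheme (NOT the real D mangling): split the name on "__" and replace
// every repeated component with "#<index of its first occurrence>".
string compressToy(string symbol)
{
    size_t[string] firstSeen;   // component -> index of first occurrence
    string[] result;
    foreach (i, part; symbol.split("__"))
    {
        if (auto idx = part in firstSeen)
            result ~= "#" ~ to!string(*idx);   // repeat: emit a back-reference
        else
        {
            firstSeen[part] = i;               // first occurrence: keep as-is
            result ~= part;
        }
    }
    return result.join("__");
}

void main()
{
    auto name = "myTemplate__lambda1__myTemplateArguments__map__Result__myTemplateArguments";
    writeln(name.length, " -> ", compressToy(name).length);
    writeln(compressToy(name));
}
```

In this toy the saving per repeat is the length of the repeated component minus the length of the reference, so the benefit grows with the amount of repetition, which matches the observation that deeply nested range types gain the most.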


T

-- 
Making non-nullable pointers is just plugging one hole in a cheese grater. --
Walter Bright

Jul 06 2017
