www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - D's memory-hungry templates

reply tsbockman <thomas.bockman gmail.com> writes:
While working on a small PR 
(https://github.com/dlang/phobos/pull/4420), I noticed that D's 
template computation system has horrific memory consumption (as 
well as being very slow).

I believe there are several reasons for this:

1) All template instantiations are memoized, even if they're just 
private internal implementation details. This causes the space 
complexity for recursive algorithms to generally be just as bad 
as the time complexity, whereas it should usually be much better.

2) Everything is immutable, which sometimes forces O(N) array 
algorithms to be replaced with O(N log(N)) tree algorithms. This 
compounds with (1).

3) Even though D *requires* that all template algorithms be 
recursive, the recursion limit is actually set very low (500). 
This forces efficient linear recursion O(N) algorithms to be 
replaced with wasteful O(N log(N)) binary recursion, unless N is 
guaranteed to be very small. This compounds with (1), and 
sometimes (2).

The combination of these issues causes very severe memory 
consumption and speed problems; as an example 
`staticSort!(aliasSeq!(iota(N)))`, which should have a time 
complexity of O(N log(N)) and a space complexity of O(N), instead 
seems to have a complexity of O(N^2 log(N)) for both time and 
space. This is awful - worse than any normal sorting algorithm, 
despite the fact that `staticSort` is based on the 
normally-very-efficient "merge sort" algorithm.

On my system, for N = 450 the sort takes about 300 ms and 
consumes 120 MB of memory. (With my pull request this is reduced 
to 80 ms and 35 MB, but that's still terrible.)

Ultimately, I believe it was a mistake for D to implement a 
separate, inferior programming language just for templates. 
However, it is too late to change that now (at least for D2), so 
I will offer some suggestions as to how memory consumption can be 
reduced within the current design:

A) Members of a template instantiation should be eagerly 
evaluated. Once a template has been fully evaluated, any private 
member which is not referenced by a public one should be deleted.

B) Template instantiations should be deleted after they are no 
longer accessible (even indirectly) via a top-level declaration.

C) The compiler should store `T...` in such a way that 
recursively appending N items to the beginning or end of an 
AliasSeq does not allocate more than O(N log(N)) additional 
memory. (O(N) is possible with more indirections.)

D) Implement tail recursion optimization for templates. Tail 
recursion should not count toward the recursion depth limit.

E) Consider eliminating the recursion limit entirely. Given that 
the template system is Turing Complete and mandates heavy use of 
recursion, there is no reason to think that a depth of 500 means 
something has gone wrong, any more than for a `while` loop 
running 500+ iterations. (Implementing this may require fixing 
the exponential name growth, though.)

Implementing A, B, and C should get D's template memory 
consumption under control.
Implementing D and E will make template computation more 
flexible, encouraging people to use it more and find new things 
to complain about.  :-P

Thoughts?
Jun 09 2016
next sibling parent reply maik klein <maikklein googlemail.com> writes:
On Thursday, 9 June 2016 at 14:46:12 UTC, tsbockman wrote:
 While working on a small PR 
 (https://github.com/dlang/phobos/pull/4420), I noticed that D's 
 template computation system has horrific memory consumption (as 
 well as being very slow).

 [...]
I run into the same issues with https://maikklein.github.io/2016/03/01/metaprogramming-typeobject/ I think doing metaprogramming that way is really neat but the memory consumption if I remember correctly was around 50 times worse than doing it without "type objects". Also C++ beat D in every compile time meta programming benchmark that I have tested. The only time when D was roughly as fast as C++ was with string mixins but they are even more memory hungry. Stuff like filtering odd integers from an AliasSeq of 100k elements etc. I mostly recreated those benchmarks in D https://github.com/boostorg/hana/tree/master/benchmark The only time when D compiled roughly as fast as C++ was with string mixins and they are even more memory hungry.
Jun 09 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/9/16 7:56 PM, maik klein wrote:
 Also C++ beat D in every compile time meta programming benchmark that I
 have tested.
Intriguing. Do you have a couple of representative benchmarks along with their C++ counterparts to post to bugzilla? Thanks! -- Andrei
Jun 10 2016
parent reply maik klein <maikklein googlemail.com> writes:
On Friday, 10 June 2016 at 10:27:09 UTC, Andrei Alexandrescu 
wrote:
 On 6/9/16 7:56 PM, maik klein wrote:
 Also C++ beat D in every compile time meta programming 
 benchmark that I
 have tested.
Intriguing. Do you have a couple of representative benchmarks along with their C++ counterparts to post to bugzilla? Thanks! -- Andrei
Not in a presentable form, I still have a framework on my other machine. I basically generated new D files from within D and then compiled them using ldc/dmd. I could clean it up and upload it when I have some time.
Jun 10 2016
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 10 June 2016 at 11:09:58 UTC, maik klein wrote:

 Not in a presentable form, I still have a framework on my other 
 machine. I basically generated new D files from within D and 
 then compiled them using ldc/dmd.

 I could clean it up and upload it when I have some time.
Yes please. Compile-time perf is always good to test :)
Jun 10 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/10/16 11:09 AM, maik klein wrote:
 On Friday, 10 June 2016 at 10:27:09 UTC, Andrei Alexandrescu wrote:
 On 6/9/16 7:56 PM, maik klein wrote:
 Also C++ beat D in every compile time meta programming benchmark that I
 have tested.
Intriguing. Do you have a couple of representative benchmarks along with their C++ counterparts to post to bugzilla? Thanks! -- Andrei
Not in a presentable form, I still have a framework on my other machine. I basically generated new D files from within D and then compiled them using ldc/dmd. I could clean it up and upload it when I have some time.
That would be very helpful. Thanks! -- Andrei
Jun 10 2016
prev sibling parent reply Alex Bradbury <asb asbradbury.org> writes:
On Thursday, 9 June 2016 at 14:46:12 UTC, tsbockman wrote:
 Ultimately, I believe it was a mistake for D to implement a 
 separate, inferior programming language just for templates. 
 However, it is too late to change that now (at least for D2), 
 so I will offer some suggestions as to how memory consumption 
 can be reduced within the current design:
If you have a design in mind, I'd be interested in hearing your proposals for an alternative.
Jun 09 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 9 June 2016 at 22:03:36 UTC, Alex Bradbury wrote:
 On Thursday, 9 June 2016 at 14:46:12 UTC, tsbockman wrote:
 Ultimately, I believe it was a mistake for D to implement a 
 separate, inferior programming language just for templates. 
 However, it is too late to change that now (at least for D2), 
 so I will offer some suggestions as to how memory consumption 
 can be reduced within the current design:
If you have a design in mind, I'd be interested in hearing your proposals for an alternative.
I have an incomplete design for a new programming language, and the beginnings of a proof-of-concept implementation (written in D). It was inspired mainly by D and Lisp, with goal being to distill the power and performance that D strives for down into a language that is much closer to the simplicity and conceptual purity of Lisp. The project is still at a very early stage, and I haven't decided yet how serious I am about completing it, but I'll give you a brief sketch anyway: * Very simple syntax, with the goal that a person should be able to easily visualize the resulting AST. Inspired by Lisp, but with a little extra complexity to keep people from getting Lost In a Sea of Parentheses. * statically typed * functional, but not pure - mutability and global state are permitted * what D calls a template is really just a function * symbols, AST nodes, functions, templates, and data types are all regular values which can be manipulated via normal code * function call syntax explicitly (and concisely) indicates whether to evaluate the call at compile-time, run-time, or whenever * currying: evaluating a function with some, but not all, of its parameters returns another function that can later be fed the rest of its parameters * currying a function at compile-time and then calling the result at run-time is equivalent to instantiating and calling a template function in D * most complex features, such as polymorphic classes, lambdas, reified generics, inlining, tail recursion, and string mixins can be implemented in the standard library The above approach to meta-programming would be vastly simpler than what D has done, and offer much better compile-time performance if implemented sensibly. (I also have other ideas about managing memory, mutability, and multi-tasking, as well, but they're off-topic and less fully formed.) For D itself, though, we must work with what we have. I think the suggestions I gave at the beginning of this thread, combined with the CTFE system upgrade that Stefan Koch is working on, would bring huge improvements to D's compile-time performance.
Jun 09 2016