digitalmars.D - Compiler scalability. Question inspired by OOM errors seen by Crystal

Pradeep Gowda (5/16) Aug 29 2017 How does D compare in this regard, especially in cases where

Nicholas Wilson (8/24) Aug 29 2017 Auto is not the problem you can easily figure out the return type

Petar Kirov [ZombineDev] (10/37) Aug 30 2017 Symbols do eat your RAM because, the compiler has to generate
H. S. Teoh via Digitalmars-d (15/19) Aug 30 2017 The symbol length problem has been greatly reduced by Rainer's recent

Stefan Koch (7/14) Aug 31 2017 Yes that is true.

H. S. Teoh via Digitalmars-d (41/61) Aug 31 2017 I think D's conception of templates, which IMO goes beyond C++'s

Stefan Koch (26/38) Aug 31 2017 That is a good example and it is one where we could maybe deduce

Petar Kirov [ZombineDev] (5/21) Aug 30 2017 D supports separate compilation by design. I.e. it doesn't

H. S. Teoh via Digitalmars-d (15/18) Aug 30 2017 Yes, I think the general recommendation for very large codebases is to

Petar Kirov [ZombineDev] (12/31) Aug 30 2017 Yep, that avoids parsing the same file over and over again (like

Pradeep Gowda <pradeep btbytes.com> writes:

I'm referring to this thread about Crystal --  
https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims to 
 be spiritually—and frequently literally—compatible with Ruby, 
 that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin type 
 inference is to load the entire program’s AST into RAM all at 
 once and then start your massive type inference pass. What 
 you’re seeing in this thread is how a “simple” fix to a YAML 
 parser error reporting hit that problem, causing Crystal to use 
 a critical amount too much RAM and OOM.

How does D compare in this regard, especially in cases where 
`auto` storage class specifiers are used liberally throughout the 
code base?

Aug 29 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda wrote:
 I'm referring to this thread about Crystal --  
 https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims to 
 be spiritually—and frequently literally—compatible with Ruby, 
 that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin type 
 inference is to load the entire program’s AST into RAM all at 
 once and then start your massive type inference pass. What 
 you’re seeing in this thread is how a “simple” fix to a YAML 
 parser error reporting hit that problem, causing Crystal to 
 use a critical amount too much RAM and OOM.

 How does D compare in this regard, especially in cases where 
 `auto` storage class specifiers are used liberally throughout 
 the code base?

Auto is not the problem you can easily figure out the return type 
of function that return a primitive type, and aggregates have to 
be specified.

The problem with D is the memory hogging nature of CTFE and the 
sheer number of templates that get instantiated when compiling 
big codebases. Symbol length is also a problem but that eats you 
dose space not your RAM.

Aug 29 2017

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Wednesday, 30 August 2017 at 02:53:42 UTC, Nicholas Wilson 
wrote:
 On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda 
 wrote:
 I'm referring to this thread about Crystal --  
 https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims 
 to be spiritually—and frequently literally—compatible with 
 Ruby, that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin 
 type inference is to load the entire program’s AST into RAM 
 all at once and then start your massive type inference pass. 
 What you’re seeing in this thread is how a “simple” fix to a 
 YAML parser error reporting hit that problem, causing Crystal 
 to use a critical amount too much RAM and OOM.

 How does D compare in this regard, especially in cases where 
 `auto` storage class specifiers are used liberally throughout 
 the code base?

 Auto is not the problem you can easily figure out the return 
 type of function that return a primitive type, and aggregates 
 have to be specified.

 The problem with D is the memory hogging nature of CTFE and the 
 sheer number of templates that get instantiated when compiling 
 big codebases. Symbol length is also a problem but that eats 
 you dose space not your RAM.

Symbols do eat your RAM because, the compiler has to generate 
them in RAM before writing them to a binary, and also don't 
forget `pragma (msg, symbol.mangle)` - the mangled name of each 
symbol is available for use in CTFE. However the mangling 
problems should finally be put to rest, thanks to Rainer 
Schuetze, now that
https://github.com/dlang/dmd/pull/5855 / 
https://github.com/dlang/dmd/pull/6998 was merged.

Aug 30 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Wed, Aug 30, 2017 at 02:53:42AM +0000, Nicholas Wilson via Digitalmars-d
wrote:
[...]
 The problem with D is the memory hogging nature of CTFE and the sheer
 number of templates that get instantiated when compiling big
 codebases. Symbol length is also a problem but that eats you dose
 space not your RAM.

The symbol length problem has been greatly reduced by Rainer's recent
mangle PR, which is now in git master and will be in the following
release. There is still room for even more reduction but they will be
comparatively minor. Rainer's fix has eliminated the bulk of the symbol
length problem.

The CTFE problem will be fixed soon, once Stefan Koch finishes his
newCTFE engine.

Templates are still an area where a lot of improvement can be made. But
Stefan has indicated interest in visiting this area in the compiler
after the newCTFE project is done.  So eventually we'll get there.


T

-- 
Famous last words: I *think* this will work...

Aug 30 2017

Stefan Koch <uplink.coder googlemail.com> writes:

On Wednesday, 30 August 2017 at 16:34:13 UTC, H. S. Teoh wrote:
 The CTFE problem will be fixed soon, once Stefan Koch finishes 
 his newCTFE engine.

 Templates are still an area where a lot of improvement can be 
 made. But Stefan has indicated interest in visiting this area 
 in the compiler after the newCTFE project is done.  So 
 eventually we'll get there.


 T

Yes that is true.
However since I've just taken a full-time job I cannot say when 
it will be finished.

As for templates I have chosen to work around the issue entirely 
and integrate type-manipulation and introspection abilities with 
ctfe. (Which is inherently more scalable then templates)

Aug 31 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Aug 31, 2017 at 04:36:41PM +0000, Stefan Koch via Digitalmars-d wrote:
 On Wednesday, 30 August 2017 at 16:34:13 UTC, H. S. Teoh wrote:
 
 The CTFE problem will be fixed soon, once Stefan Koch finishes his
 newCTFE engine.
 
 Templates are still an area where a lot of improvement can be made.
 But Stefan has indicated interest in visiting this area in the
 compiler after the newCTFE project is done.  So eventually we'll get
 there.
 
 
 T

 
 Yes that is true.
 However since I've just taken a full-time job I cannot say when it
 will be finished.
 
 As for templates I have chosen to work around the issue entirely and
 integrate type-manipulation and introspection abilities with ctfe.
 (Which is inherently more scalable then templates)

I think D's conception of templates, which IMO goes beyond C++'s
(conceptually, anyway, the current implementation is not too different
fundamentally), can be implemented in a much better way that makes it
more scalable.

C++ sees templates as code stencils to be instantiated with the given
template arguments, i.e., copy-n-paste the stencil and substitute
template parameters with the given arguments.

D's templates for the most part still retains that, and certainly the
current implementation essentially just follows the C++ model. But IMO
there's an important conceptual difference: D sees template parameters
not so much as parameters for code stencils to be copy-n-pasted, but as
*compile-time* parameters to a function / set of declarations.  I.e.,
they are parameters that are known at compile-time rather than runtime.

The distinction is subtle but IMO important, because it allows us to
break out of the C++ mold of copy-n-paste-this-stencil that is one of
the causes of template scalability problems (pass too many different
template arguments, and you get massive code bloat, one copy per
instantiation).  By treating template arguments as arguments known at
compile-time instead, the compiler can get smarter with when to
copy-n-paste a declaration, and when to just evaluate the template with
the argument without actually instantiating anything (no copying of AST
nodes, etc.).

A common example is a templated linked-list container, where much of the
linked-list pointer manipulation code is actually independent of the
type of the contained data.  The compiler really only needs to generate
distinct copies of the code when the data itself is being manipulated;
all the other code can be shared between template instantiations.

This is just one example off the top of my head; I'm sure there are
plenty of others that we can come up with once we break free of the C++
mold of thinking of templates as "copy-n-paste except substitute X with
Y".  Another example that just occurred to me is the various recursive
templates in std.meta for manipulating AliasSeq's.  There is actually no
need for the compiler to instantiate anything at all, except perhaps for
the final result. By treating the AliasSeq as an in-compiler data type
with compile-time operations performed on it, the compiler ought to be
able to evaluate the template without needing to instantiate anything
past the first level of recursion.


T

-- 
"How are you doing?" "Doing what?"

Aug 31 2017

Stefan Koch <uplink.coder googlemail.com> writes:

On Thursday, 31 August 2017 at 17:39:35 UTC, H. S. Teoh wrote:

 This is just one example off the top of my head; I'm sure there 
 are plenty of others that we can come up with once we break 
 free of the C++ mold of thinking of templates as "copy-n-paste 
 except substitute X with Y".  Another example that just 
 occurred to me is the various recursive templates in std.meta 
 for manipulating AliasSeq's.  There is actually no need for the 
 compiler to instantiate anything at all, except perhaps for the 
 final result. By treating the AliasSeq as an in-compiler data 
 type with compile-time operations performed on it, the compiler 
 ought to be able to evaluate the template without needing to 
 instantiate anything past the first level of recursion.


 T

That is a good example and it is one where we could maybe deduce 
what is doing on.
However the classification of a template (rather the deduction of 
unused intanciations) actually requires us to instantiate the 
template.
Because we cannot predict to what it might evaluate given 
specific parameters.
For this to work we have to produce all possible instantiation 
behaviors of a template, and deduplicate this set.
Which happens to be infinite, in most cases.
Even predicting if the set of instantiations can be finite is 
essentially an instance of the halting problem.

I've worked on this for some time until I realized that I was 
fighting without gaining ground.

It would end up a myriad of special cases and would be impossible 
to maintain.

Considering that the current template system which does not try 
anything smart is already fiendishly complicated, it'd rather not 
introduce an even more complicated 
template-instance-shape-predictor which will probably only work 
on the simplest of cases and is not guaranteed to bring wins that 
would justify the time it'll take to implement it.

TL;DR

The change to optimize certain groups of templates will require 
MASSIVE amounts of work and is unlikely to pay for itself.

Aug 31 2017

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda wrote:
 I'm referring to this thread about Crystal --  
 https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims to 
 be spiritually—and frequently literally—compatible with Ruby, 
 that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin type 
 inference is to load the entire program’s AST into RAM all at 
 once and then start your massive type inference pass. What 
 you’re seeing in this thread is how a “simple” fix to a YAML 
 parser error reporting hit that problem, causing Crystal to 
 use a critical amount too much RAM and OOM.

 How does D compare in this regard, especially in cases where 
 `auto` storage class specifiers are used liberally throughout 
 the code base?

D supports separate compilation by design. I.e. it doesn't 
require all the source files corresponding to all the object 
files being linked to produce the final executable, to be loaded 
in memory by the compiler.

Aug 30 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Wed, Aug 30, 2017 at 08:01:17AM +0000, via Digitalmars-d wrote:
[...]
 D supports separate compilation by design. I.e. it doesn't require all
 the source files corresponding to all the object files being linked to
 produce the final executable, to be loaded in memory by the compiler.

Yes, I think the general recommendation for very large codebases is to
compile one package at a time, i.e., one subdir at a time.

Nevertheless, compiler memory usage remains an issue on low-memory
systems.  My old work PC simply cannot handle running dmd without
grinding to a halt because there's just not enough RAM to go around
(only 500MB total).  It's generally not a problem for modern PCs with at
least a few GB of memory.  Walter chose compilation speed over memory
efficiency, so that's just the way it is.  In theory, one could turn on
GC in the compiler so that it will work on low-memory systems, but I'm
not sure if such a change will be accepted into dmd.


T

-- 
Questions are the beginning of intelligence, but the fear of God is the
beginning of wisdom.

Aug 30 2017

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Wednesday, 30 August 2017 at 16:39:32 UTC, H. S. Teoh wrote:
 On Wed, Aug 30, 2017 at 08:01:17AM +0000, via Digitalmars-d 
 wrote: [...]
 D supports separate compilation by design. I.e. it doesn't 
 require all the source files corresponding to all the object 
 files being linked to produce the final executable, to be 
 loaded in memory by the compiler.

 Yes, I think the general recommendation for very large 
 codebases is to compile one package at a time, i.e., one subdir 
 at a time.

Yep, that avoids parsing the same file over and over again (like 
would be the case in C++ with commonly used header files for each 
translation unit), yet provides enough granularity for 
parallelism and also somewhat limits the memory problem.

 Nevertheless, compiler memory usage remains an issue on 
 low-memory systems.  My old work PC simply cannot handle 
 running dmd without grinding to a halt because there's just not 
 enough RAM to go around (only 500MB total).  It's generally not 
 a problem for modern PCs with at least a few GB of memory.  
 Walter chose compilation speed over memory efficiency, so 
 that's just the way it is.  In theory, one could turn on GC in 
 the compiler so that it will work on low-memory systems, but

(Nods)

 I'm not sure if such a change will be accepted into dmd.


 T

We could always make that a compiler switch. I'm not sure if a 
substantial benefit could be gained during parsing and semantic 
analysis (e.g. consider staticIota - each instantiation needs to 
be kept in memory, since you never know when it will be needed 
again), but certainly once you convert the AST to the backend IR, 
you could immediately drop all memory allocated by the frontend.

Aug 30 2017

D Programming

C/C++ Programming

Other

digitalmars.D - Compiler scalability. Question inspired by OOM errors seen by Crystal