www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Compiler scalability. Question inspired by OOM errors seen by Crystal

reply Pradeep Gowda <pradeep btbytes.com> writes:
I'm referring to this thread about Crystal --  
https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims to 
 be spiritually—and frequently literally—compatible with Ruby, 
 that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin type 
 inference is to load the entire program’s AST into RAM all at 
 once and then start your massive type inference pass. What 
 you’re seeing in this thread is how a “simple” fix to a YAML 
 parser error reporting hit that problem, causing Crystal to use 
 a critical amount too much RAM and OOM.
How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?
Aug 29
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda wrote:
 I'm referring to this thread about Crystal --  
 https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims to 
 be spiritually—and frequently literally—compatible with Ruby, 
 that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin type 
 inference is to load the entire program’s AST into RAM all at 
 once and then start your massive type inference pass. What 
 you’re seeing in this thread is how a “simple” fix to a YAML 
 parser error reporting hit that problem, causing Crystal to 
 use a critical amount too much RAM and OOM.
How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?
Auto is not the problem you can easily figure out the return type of function that return a primitive type, and aggregates have to be specified. The problem with D is the memory hogging nature of CTFE and the sheer number of templates that get instantiated when compiling big codebases. Symbol length is also a problem but that eats you dose space not your RAM.
Aug 29
next sibling parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Wednesday, 30 August 2017 at 02:53:42 UTC, Nicholas Wilson 
wrote:
 On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda 
 wrote:
 I'm referring to this thread about Crystal --  
 https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims 
 to be spiritually—and frequently literally—compatible with 
 Ruby, that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin 
 type inference is to load the entire program’s AST into RAM 
 all at once and then start your massive type inference pass. 
 What you’re seeing in this thread is how a “simple” fix to a 
 YAML parser error reporting hit that problem, causing Crystal 
 to use a critical amount too much RAM and OOM.
How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?
Auto is not the problem you can easily figure out the return type of function that return a primitive type, and aggregates have to be specified. The problem with D is the memory hogging nature of CTFE and the sheer number of templates that get instantiated when compiling big codebases. Symbol length is also a problem but that eats you dose space not your RAM.
Symbols do eat your RAM because, the compiler has to generate them in RAM before writing them to a binary, and also don't forget `pragma (msg, symbol.mangle)` - the mangled name of each symbol is available for use in CTFE. However the mangling problems should finally be put to rest, thanks to Rainer Schuetze, now that https://github.com/dlang/dmd/pull/5855 / https://github.com/dlang/dmd/pull/6998 was merged.
Aug 30
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Aug 30, 2017 at 02:53:42AM +0000, Nicholas Wilson via Digitalmars-d
wrote:
[...]
 The problem with D is the memory hogging nature of CTFE and the sheer
 number of templates that get instantiated when compiling big
 codebases. Symbol length is also a problem but that eats you dose
 space not your RAM.
The symbol length problem has been greatly reduced by Rainer's recent mangle PR, which is now in git master and will be in the following release. There is still room for even more reduction but they will be comparatively minor. Rainer's fix has eliminated the bulk of the symbol length problem. The CTFE problem will be fixed soon, once Stefan Koch finishes his newCTFE engine. Templates are still an area where a lot of improvement can be made. But Stefan has indicated interest in visiting this area in the compiler after the newCTFE project is done. So eventually we'll get there. T -- Famous last words: I *think* this will work...
Aug 30
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 30 August 2017 at 16:34:13 UTC, H. S. Teoh wrote:
 The CTFE problem will be fixed soon, once Stefan Koch finishes 
 his newCTFE engine.

 Templates are still an area where a lot of improvement can be 
 made. But Stefan has indicated interest in visiting this area 
 in the compiler after the newCTFE project is done.  So 
 eventually we'll get there.


 T
Yes that is true. However since I've just taken a full-time job I cannot say when it will be finished. As for templates I have chosen to work around the issue entirely and integrate type-manipulation and introspection abilities with ctfe. (Which is inherently more scalable then templates)
Aug 31
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Aug 31, 2017 at 04:36:41PM +0000, Stefan Koch via Digitalmars-d wrote:
 On Wednesday, 30 August 2017 at 16:34:13 UTC, H. S. Teoh wrote:
 
 The CTFE problem will be fixed soon, once Stefan Koch finishes his
 newCTFE engine.
 
 Templates are still an area where a lot of improvement can be made.
 But Stefan has indicated interest in visiting this area in the
 compiler after the newCTFE project is done.  So eventually we'll get
 there.
 
 
 T
Yes that is true. However since I've just taken a full-time job I cannot say when it will be finished. As for templates I have chosen to work around the issue entirely and integrate type-manipulation and introspection abilities with ctfe. (Which is inherently more scalable then templates)
I think D's conception of templates, which IMO goes beyond C++'s (conceptually, anyway, the current implementation is not too different fundamentally), can be implemented in a much better way that makes it more scalable. C++ sees templates as code stencils to be instantiated with the given template arguments, i.e., copy-n-paste the stencil and substitute template parameters with the given arguments. D's templates for the most part still retains that, and certainly the current implementation essentially just follows the C++ model. But IMO there's an important conceptual difference: D sees template parameters not so much as parameters for code stencils to be copy-n-pasted, but as *compile-time* parameters to a function / set of declarations. I.e., they are parameters that are known at compile-time rather than runtime. The distinction is subtle but IMO important, because it allows us to break out of the C++ mold of copy-n-paste-this-stencil that is one of the causes of template scalability problems (pass too many different template arguments, and you get massive code bloat, one copy per instantiation). By treating template arguments as arguments known at compile-time instead, the compiler can get smarter with when to copy-n-paste a declaration, and when to just evaluate the template with the argument without actually instantiating anything (no copying of AST nodes, etc.). A common example is a templated linked-list container, where much of the linked-list pointer manipulation code is actually independent of the type of the contained data. The compiler really only needs to generate distinct copies of the code when the data itself is being manipulated; all the other code can be shared between template instantiations. This is just one example off the top of my head; I'm sure there are plenty of others that we can come up with once we break free of the C++ mold of thinking of templates as "copy-n-paste except substitute X with Y". Another example that just occurred to me is the various recursive templates in std.meta for manipulating AliasSeq's. There is actually no need for the compiler to instantiate anything at all, except perhaps for the final result. By treating the AliasSeq as an in-compiler data type with compile-time operations performed on it, the compiler ought to be able to evaluate the template without needing to instantiate anything past the first level of recursion. T -- "How are you doing?" "Doing what?"
Aug 31
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 31 August 2017 at 17:39:35 UTC, H. S. Teoh wrote:

 This is just one example off the top of my head; I'm sure there 
 are plenty of others that we can come up with once we break 
 free of the C++ mold of thinking of templates as "copy-n-paste 
 except substitute X with Y".  Another example that just 
 occurred to me is the various recursive templates in std.meta 
 for manipulating AliasSeq's.  There is actually no need for the 
 compiler to instantiate anything at all, except perhaps for the 
 final result. By treating the AliasSeq as an in-compiler data 
 type with compile-time operations performed on it, the compiler 
 ought to be able to evaluate the template without needing to 
 instantiate anything past the first level of recursion.


 T
That is a good example and it is one where we could maybe deduce what is doing on. However the classification of a template (rather the deduction of unused intanciations) actually requires us to instantiate the template. Because we cannot predict to what it might evaluate given specific parameters. For this to work we have to produce all possible instantiation behaviors of a template, and deduplicate this set. Which happens to be infinite, in most cases. Even predicting if the set of instantiations can be finite is essentially an instance of the halting problem. I've worked on this for some time until I realized that I was fighting without gaining ground. It would end up a myriad of special cases and would be impossible to maintain. Considering that the current template system which does not try anything smart is already fiendishly complicated, it'd rather not introduce an even more complicated template-instance-shape-predictor which will probably only work on the simplest of cases and is not guaranteed to bring wins that would justify the time it'll take to implement it. TL;DR The change to optimize certain groups of templates will require MASSIVE amounts of work and is unlikely to pay for itself.
Aug 31
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda wrote:
 I'm referring to this thread about Crystal --  
 https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

 Crystal is strongly typed, but overwhelmingly uses type 
 inference, rather than explicit types. Because Crystal aims to 
 be spiritually—and frequently literally—compatible with Ruby, 
 that’s a problem: to accomplish that, Crystal relies on 
 sometimes-nullable types with implicit structure and implicit 
 unions, such that, frequently, the only way to even begin type 
 inference is to load the entire program’s AST into RAM all at 
 once and then start your massive type inference pass. What 
 you’re seeing in this thread is how a “simple” fix to a YAML 
 parser error reporting hit that problem, causing Crystal to 
 use a critical amount too much RAM and OOM.
How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?
D supports separate compilation by design. I.e. it doesn't require all the source files corresponding to all the object files being linked to produce the final executable, to be loaded in memory by the compiler.
Aug 30
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Aug 30, 2017 at 08:01:17AM +0000, via Digitalmars-d wrote:
[...]
 D supports separate compilation by design. I.e. it doesn't require all
 the source files corresponding to all the object files being linked to
 produce the final executable, to be loaded in memory by the compiler.
Yes, I think the general recommendation for very large codebases is to compile one package at a time, i.e., one subdir at a time. Nevertheless, compiler memory usage remains an issue on low-memory systems. My old work PC simply cannot handle running dmd without grinding to a halt because there's just not enough RAM to go around (only 500MB total). It's generally not a problem for modern PCs with at least a few GB of memory. Walter chose compilation speed over memory efficiency, so that's just the way it is. In theory, one could turn on GC in the compiler so that it will work on low-memory systems, but I'm not sure if such a change will be accepted into dmd. T -- Questions are the beginning of intelligence, but the fear of God is the beginning of wisdom.
Aug 30
parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Wednesday, 30 August 2017 at 16:39:32 UTC, H. S. Teoh wrote:
 On Wed, Aug 30, 2017 at 08:01:17AM +0000, via Digitalmars-d 
 wrote: [...]
 D supports separate compilation by design. I.e. it doesn't 
 require all the source files corresponding to all the object 
 files being linked to produce the final executable, to be 
 loaded in memory by the compiler.
Yes, I think the general recommendation for very large codebases is to compile one package at a time, i.e., one subdir at a time.
Yep, that avoids parsing the same file over and over again (like would be the case in C++ with commonly used header files for each translation unit), yet provides enough granularity for parallelism and also somewhat limits the memory problem.
 Nevertheless, compiler memory usage remains an issue on 
 low-memory systems.  My old work PC simply cannot handle 
 running dmd without grinding to a halt because there's just not 
 enough RAM to go around (only 500MB total).  It's generally not 
 a problem for modern PCs with at least a few GB of memory.  
 Walter chose compilation speed over memory efficiency, so 
 that's just the way it is.  In theory, one could turn on GC in 
 the compiler so that it will work on low-memory systems, but
(Nods)
 I'm not sure if such a change will be accepted into dmd.


 T
We could always make that a compiler switch. I'm not sure if a substantial benefit could be gained during parsing and semantic analysis (e.g. consider staticIota - each instantiation needs to be kept in memory, since you never know when it will be needed again), but certainly once you convert the AST to the backend IR, you could immediately drop all memory allocated by the frontend.
Aug 30