digitalmars.D - Few ideas to reduce template bloat
- bearophile <bearophileHUGS lycos.com> Mar 25 2010
- Norbert Nemec <Norbert Nemec-online.de> Mar 25 2010
- bearophile <bearophileHUGS lycos.com> Mar 25 2010
- sebastien.f <sebastien noreply.com> Mar 25 2010
- bearophile <bearophileHUGS lycos.com> Mar 25 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 25 2010
- Walter Bright <newshound1 digitalmars.com> Mar 25 2010
- bearophile <bearophileHUGS lycos.com> Mar 25 2010
- Bernard Helyer <b.helyer gmail.com> Mar 25 2010
- Walter Bright <newshound1 digitalmars.com> Mar 26 2010
- bearophile <bearophileHUGS lycos.com> Mar 26 2010
- Walter Bright <newshound1 digitalmars.com> Mar 26 2010
- bearophile <bearophileHUGS lycos.com> Mar 26 2010
- Norbert Nemec <Norbert Nemec-online.de> Mar 26 2010
- Walter Bright <newshound1 digitalmars.com> Mar 28 2010
- Walter Bright <newshound1 digitalmars.com> Mar 28 2010
- bearophile <bearophileHUGS lycos.com> Mar 28 2010
- Steve Teale <steve.teale britseyeview.com> Mar 25 2010
There are some ways to reduce the amount of binary code produced by templates
in a language like D.
1) There are few situations where the types are different but the asm
instructions inside the function are the same, like uint and int, but this
situation probably is not common enough.
2) Even in function templates that get instantiated to a different asm there
can be parts that are shared. If the shared part is at the end of the function
the compiler can use a jump to use a single function exit point and remove some
code duplication. I don't think this is a common situation.
3) There are more situations where a template type is one of several classes
(OOP), this can produce the same duplicated asm. In this case the compiler can
keep only one version in the final binary. This seems common enough.
4) C# avoids code bloat because it has a JIT and templates are instantiated at
run-time. D can of course do the same if it uses the LLVM backend, but let's
ignore this possibility for now.
5) Java generics don't cause code bloat because it removes the types at runtime
and uses boxed values. This can be type-safe, but it's less flexible than C++/D
templates, and the boxing-unboxing decreases performance. On the other hand
programming practice shows that in most programs not all functions have to be
top performance. So a compromise can be invented (I have never found this idea
elsewhere). An attribute like boxed can be used to mark what function template
arguments are boxed (and don't multiply the template instances).
So for example this template function can be called with double or int values,
so it gets intantiated in four versions:
T2 foo(T1, T2)(T1 x, T2 y) {...}
Instantiation 1: foo(1, 2)
Instantiation 2: foo(1, 2.0)
Instantiation 3: foo(1.0, 2)
Instantiation 4: foo(1.0, 2.0)
If foo() is not performance-critical, then the same function template with one
boxed type, and with the same inputs, gets instantiated just two times,
halving its combinatorial explosion:
boxed(T2) foo(T1, boxed T2)(T1 x, T2 y) {...}
Instantiation 1: foo(1, 2) and foo(1, 2.0)
Instantiation 2: foo(1.0, 2) and foo(1.0, 2.0)
The first argument is like a normal template argument and created different
instantiations, the second argument gets automatically boxed-unboxed, so it's
like a single type.
Inside this version of the function foo() instructions like:
static if typeof(T2 == int) {...
are not allowed by the compiler (despite they are formally correct) because
instances of T2 are objects and the compiler reminds you that you have to use a
run time cast:
if (cast(Integer)T2 !is null) {...
There are ways to reduce code bloat with idioms used by the programmer, but
that's good for C++ programmers. Programmers in a more modern language prefer
something safer done by the compiler :-)
Bye,
bearophile
Mar 25 2010
bearophile wrote:4) C# avoids code bloat because it has a JIT and templates are instantiated at run-time. D can of course do the same if it uses the LLVM backend, but let's ignore this possibility for now. 5) Java generics don't cause code bloat because it removes the types at runtime and uses boxed values. This can be type-safe, but it's less flexible than C++/D templates, and the boxing-unboxing decreases performance. On the other hand programming practice shows that in most programs not all functions have to be top performance. So a compromise can be invented (I have never found this idea elsewhere). An attribute like boxed can be used to mark what function template arguments are boxed (and don't multiply the template instances).
Indeed, this is the very fundamental distinction between templates (C++) and generics (C# and Java). Templates: * have to be instantiated at compile time * usable for meta-programming * cannot be compiled or type-checked on its own --> not type-safe (i.e. if you don't set the constraints correctly, the instantiation may fail with an error message deep inside the template code) Generics: * Every template parameter must follow a clear interface. * generic code can be compiled and fully type-checked on its own * instantiation not needed at all but possible as optimization -> use boxed values without instantiation -> optionally instantiate at run-time/link-time/compile-time (analogous to function inlining) Though both concepts have a large overlap, it is difficult to bridge the gap without adding significant complexity. Some experimental languages try to get the best of both ways by smearing or even giving up the distinction between compile-time and run-time. Though these attempts are very promising for the future, they are clearly beyond the pragmatic goals of D. D is clearly designed as language that can be compiled to a static binary.
Mar 25 2010
Norbert Nemec:D is clearly designed as language that can be compiled to a static binary.<
Just to be sure you have understood: the solution number 5 is for a statically compiled language. It doesn't need a JIT. You just need normal objects, a virtual table, etc. Bye, bearophile
Mar 25 2010
This problem reminds me of a recent paper written by Bjarne Stroustrup (and others) describing and evaluating a technique named “generalized hoisting”. It’s applied to the C++ STL iterators and removes lot of code bloat and increase performances surprisingly. I found it a great read (as often with Bjarne.S) I thought you may find it interesting too. It says the ideas apply to the D language too. Minimizing Dependencies within Generic Classes for Faster and Smaller Programs. http://www.research.att.com/~bs/SCARY.pdf
Mar 25 2010
sebastien.f:I found it a great read (as often with Bjarne.S) I thought you may find it interesting too. It says the ideas apply to the D language too.
It seems a nice paper, thank you, I'll read it. Bjarne.S is the last one in the list of its authors. It's nice to see a paper written by such people that mentions D three times or more :-) Bye, bearophile
Mar 25 2010
On 03/25/2010 03:17 PM, bearophile wrote:sebastien.f:I found it a great read (as often with Bjarne.S) I thought you may find it interesting too. It says the ideas apply to the D language too.
It seems a nice paper, thank you, I'll read it. Bjarne.S is the last one in the list of its authors. It's nice to see a paper written by such people that mentions D three times or more :-)
That's a very, very nice hat tip from the authors. Andrei
Mar 25 2010
Andrei Alexandrescu wrote:On 03/25/2010 03:17 PM, bearophile wrote:sebastien.f:I found it a great read (as often with Bjarne.S) I thought you may find it interesting too. It says the ideas apply to the D language too.
It seems a nice paper, thank you, I'll read it. Bjarne.S is the last one in the list of its authors. It's nice to see a paper written by such people that mentions D three times or more :-)
That's a very, very nice hat tip from the authors.
Yes!
Mar 25 2010
After reading that paper I can say that D compilers can start doing the simpler thing: recognizing instantiations of a function template that are compiled to the same asm code, and keep only one of them. The Microsoft compiler seems to do that already. Bye, bearophile
Mar 25 2010
On 26/03/10 10:18, bearophile wrote:The Microsoft compiler seems to do that already.
Microsoft Visual D? =P
Mar 25 2010
bearophile wrote:After reading that paper I can say that D compilers can start doing the simpler thing: recognizing instantiations of a function template that are compiled to the same asm code, and keep only one of them. The Microsoft compiler seems to do that already.
Because of separate compilation, that can't be done with the compiler in the general case. The linker can, though.
Mar 26 2010
Walter Bright:Because of separate compilation, that can't be done with the compiler in the general case. The linker can, though.
LDC is able to do it already! I don't need 100% general solutions. Bye, bearophile
Mar 26 2010
bearophile wrote:Walter Bright:Because of separate compilation, that can't be done with the compiler in the general case. The linker can, though.
LDC is able to do it already! I don't need 100% general solutions.
Are you sure LDC isn't doing it at the link step? Regardless, if you implement a half solution in the compiler, and then a full solution in the linker, you've doubled your effort for the same result. This may be a practical solution, however, if you desire to work with any linker, including ones you have no control over.
Mar 26 2010
Walter Bright:Are you sure LDC isn't doing it at the link step?
This is the post where I have explained it: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108275 As you can see from the link too llvm source code at the top of my post: https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_25/lib/Transforms/IPO/MergeFunctions.cpp that's a LLVM optimization pass, and ldc is spitting out already optimized asm, so it's not something done by the linker.Regardless, if you implement a half solution in the compiler, and then a full solution in the linker, you've doubled your effort for the same result.
Right. Is dmd linker able to do that? The Gold linker (http://en.wikipedia.org/wiki/Gold_%28linker%29 ) that I think in future ldc will use may be able to do that. But you are right, if D modules are compiled separately the compiler can't recognize many cases of duplications. Bye, bearophile
Mar 26 2010
bearophile wrote:Norbert Nemec:D is clearly designed as language that can be compiled to a static binary.<
Just to be sure you have understood: the solution number 5 is for a statically compiled language. It doesn't need a JIT. You just need normal objects, a virtual table, etc.
Of course. In that sense, Java, C#, C++ and D are all in the same boat. My comment was geared towards languages that experiment with multi-stage compilation (or "specialization"). What I wanted to say: statically compiled languages typically distinguish between generics (which "exist" at run time as objects) and templates (which exist at compile time only). Most languages offer only one of both. Once you start merging both concepts you quickly end up with smearing the boundary between run-time and compile-time and depend on JIT technology or multi-stage compilation. D may of course use these techniques for optimization, but the language should not depend on them.
Mar 26 2010
bearophile wrote:Just to be sure you have understood: the solution number 5 is for a statically compiled language. It doesn't need a JIT. You just need normal objects, a virtual table, etc.
With enough indirection, template instances can all be done with one function regardless of the actual argument types. What is lost with that approach, however, are all the benefits of inlining, constant folding, specialization, etc. In addition, CPUs tend to do a much poorer job with indirection than they do with inline code. D's templates are also more powerful than generics in that they can be used to templatize data and symbols, not just functions and classes. Generic types can also be more bloated because they have to carry around the indirect references, while templated types can specialize them away. For example, a templated hash-map can often eliminate storing the hash, while a generic one will have to keep it regardless.
Mar 28 2010
Oh, one more thing. A template system can be used to create a user-defined generic system, but a generic system cannot be used to create a template system.
Mar 28 2010
Walter Bright:With enough indirection, template instances can all be done with one function regardless of the actual argument types. What is lost with that approach, however, are all the benefits of inlining, constant folding, specialization, etc
My idea 5 was not to remove templates from D, and not even to add normal Java-like generics to D. And I know templates are usually faster and more flexible. My idea was to create something mixed, where you can choose what input arguments are boxed as in Java, and what ones are not boxed as in D/C++. So the purpose was not to replace normal D templates, but to add flexibility. Because I think there are templates that don't need max performance, in such case using a mixed template-generic (or full generic) can give a "good enough" performance while keeping the final binary smaller. Reducing binary size is good because it decreases the pressure on the code L1 cache, and this speeds up the program a little. So it's like a cross between a template and a generic. This is a normal D template: T1 foo(T1, T2, T3)(T1 x, T2, y, T3 z) {...} This is a full java-style generic, that will have only one implementation in the binary, x, y and z will be boxed inside foo: generic T1 foo( generic T1, generic T2, generic T3)(T1 x, T2, y, T3 z) {...} And this is something mixed, here T1 and T3 will be boxed but T2 will not, T2 is like a D template argument, so the compiler will instantiate as many different T2 it will find: generic T1 foo( generic T1, T2, generic T3)(T1 x, T2, y, T3 z) {...}Generic types can also be more bloated because they have to carry around the indirect references, while templated types can specialize them away.<
You are right, as always in life it's a matter of compromises and balances :-) If you instantiate a hybrid template like foo( generic T1, T2, generic T3) like with ten different T2 types, you can usually produce more code than a single (probably bigger) 100% generic like generic T1 foo( generic T1, generic T2, generic T3) :-) Bye, bearophile
Mar 28 2010
On Thu, 25 Mar 2010 16:48:14 -0500, Andrei Alexandrescu wrote:On 03/25/2010 03:17 PM, bearophile wrote:sebastien.f:I found it a great read (as often with Bjarne.S) I thought you may find it interesting too. It says the ideas apply to the D language too.
It seems a nice paper, thank you, I'll read it. Bjarne.S is the last one in the list of its authors. It's nice to see a paper written by such people that mentions D three times or more :-)
That's a very, very nice hat tip from the authors. Andrei
Do we have a balanced tree implementation in Phobos these days? Steve
Mar 25 2010









Walter Bright <newshound1 digitalmars.com> 