digitalmars.D - Folding similar templates into one

Meta (67/67) Jun 26 2013 I apologize for the title, as I couldn't come up with something

Jonathan M Davis (27/32) Jun 26 2013 That's _very_ dependent on what the function does. A prime counter-examp...

Meta (26/66) Jun 27 2013 This may be true. Any kind of optimization would probably have to

Marco Leise (22/25) Jun 28 2013 I've thought of that as well, but you have to merge templates

Meta (11/34) Jun 28 2013 That's good to know. I wasn't sure if there were some cases when
monarch_dodra (47/58) Jun 28 2013 What is a "string lambda" ? Do you mean "a string alias latter

Meta (8/33) Jun 28 2013 The former. I didn't think it was *that* obscure of a term

Daniel Murphy (6/9) Jun 28 2013 The obvious place to do this is at link time - then you get to fold all

monarch_dodra (11/17) Jun 28 2013 Just because the arguments are known 99% of the time at compile

Meta (13/24) Jun 28 2013 That was probably the idea when templates were conceived of for

Meta (2/5) Jun 28 2013 s/characterizing/parameterizing

"Meta" <jared771 gmail.com> writes:

I apologize for the title, as I couldn't come up with something 
succinct and descriptive enough while still being reasonably 
short. We all know how powerful templates are in C++/D, but they 
have that damning cost of needing to generate new code for each 
new template argument. For example:

string format(string fmtStr)(T data)
{
     ...
}

As fmtStr is a compile-time template argument, the compiler will 
have to generate N nearly-identical format functions for N 
strings that format is instantiated with. This "template bloat" 
increases exponentially as more template arguments are added. 
Take std.exception.assertNotThrown:

void assertNotThrown(T : Throwable = Exception, E)
                     (lazy E expression,
                      string msg = null,

                      //Could be compile-time, but aren't,
                      //because of template bloat worries
                      string file = __FILE__,
                      size_t line = __LINE__)
{
     try
         expression();
     catch(T t)
     {
         immutable message = msg.empty ? t.msg : msg;
         immutable tail = message.empty ? "." : ": " ~ message;
         throw new AssertError(format("assertNotThrown failed: %s 
was thrown%s",
                                      T.stringof,
                                      tail),
                               file,
                               line,
                               t);
     }
}

The arguments file and line are known at compile time, but should 
not be made template arguments because there is a potential to 
create N*M copies of assertNotThrown, given N values of __FILE__ 
and M values of __LINE__.

Adding large numbers of template arguments spuriously can cause 
an explosion in binary size, especially when templates are very 
heavily used, such as in most D code. Furthermore, programmers 
worried about such bloat will undoubtedly take the safe route, 
and try to move as many arguments as possible from the 
compile-time parameter list to run-time. This means that 
generated code will not necessarily be efficient as it should be.

I'd like to ask, however, why should this be? The arguments line 
and file will only ever be strings and ints. They don't vary in 
type, only value, and changing those values will not affect how 
the function works. You could easily substitute out "exception.d" 
with "stdio.d", and the functionality will be unchanged.

This remains true for values of the same type. It doesn't matter 
whether you pass 1000, 10_000, or 100_000 as a template argument. 
The way the function operates is not observably different (note 
that for this to be true, we have to assume that static if does 
not exist). This is in contrast to passing a completely different 
type for T (say, RangeError), which may change how the function 
works.

I'd like to know, are there any template "optimizations" that DMD 
currently implements which mitigates such a situation such as 
with assertNotThrown, where a compile-time argument may vary only 
in value, and not type? Judging from the fact that this function 
was changed so that that __FILE__ and __LINE__ are taking as 
run-time parameters, I guess the answer is no. Is there anything 
we can do about this?

Jun 26 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, June 27, 2013 04:03:34 Meta wrote:
 I'd like to ask, however, why should this be? The arguments line
 and file will only ever be strings and ints. They don't vary in
 type, only value, and changing those values will not affect how
 the function works. You could easily substitute out "exception.d"
 with "stdio.d", and the functionality will be unchanged.

That's _very_ dependent on what the function does. A prime counter-example 
would be overloaded operators like opBinary which uses their string template 
arguments in mixins within the function, thereby generating completely 
different functions. And I believe that that is by far the more common use of 
strings as template parameters.

Also, when using strings and integral values such as __FILE__ and __LINE__, 
the functions _do_ have different code.

auto foo(string file = __FILE__, size_t line = __LINE__)(Bar b)
{
 auto f = file;
 auto l = line;

 ....
}

ends up generating different code depending on the values of file or line. The 
code is _not_ identical. The only way for the code not to vary is for those 
arguments to be runtime arguments. If they're compile-time arguments, then by 
definition, they affect the generated code if they're ever used.

Yes, it is theoretically possible for the compiler to use the same template 
definition under certain circumstances (especially if you're dealing with 
something like a templated struct that ultimately ends up with exactly the 
same layout across multiple instantiations), but the only way that the 
compiler could avoid duplicating the code with something like __FILE__ and 
__LINE__ being used as template arguments is to translate them into function 
arguments, because if the function has only one definition, then it can't have 
different values for the file or line number.

- Jonathan M Davis

Jun 26 2013

"Meta" <jared771 gmail.com> writes:

On Thursday, 27 June 2013 at 03:47:39 UTC, Jonathan M Davis wrote:
 That's _very_ dependent on what the function does. A prime 
 counter-example
 would be overloaded operators like opBinary which uses their 
 string template
 arguments in mixins within the function, thereby generating 
 completely
 different functions. And I believe that that is by far the more 
 common use of
 strings as template parameters.

This may be true. Any kind of optimization would probably have to 
ignore most operator functions. Another heavy use of strings, 
however, is string lambdas. Checking if two string lambdas are 
the same is a simple string comparison, and then that template 
can be compiled once and all copies can be ellided. Then it's 
just a call to that template wherever it is used with that 
particular lambda in the program. I think this could be done with 
any template that accepts some kind of lambda function, but it 
may be somewhat tricky currently to figure out if two non-string 
lambda are equal, and I believe that DMD currently generates a 
unique lambda wherever one is used.

 Also, when using strings and integral values such as __FILE__ 
 and __LINE__,
 the functions _do_ have different code.

 auto foo(string file = __FILE__, size_t line = __LINE__)(Bar b)
 {
  auto f = file;
  auto l = line;

  ....
 }

 ends up generating different code depending on the values of 
 file or line. The
 code is _not_ identical. The only way for the code not to vary 
 is for those
 arguments to be runtime arguments. If they're compile-time 
 arguments, then by
 definition, they affect the generated code if they're ever used.

I do know that the code generated is different. Turning 
compile-time arguments into run-time arguments will, 
unfortunately, completely defeat the point, as you know. It's a 
tricky problem, because you can't just partially compile a 
function and then change the compile-time arguments as you 
encounter each new instantiation. If we could do that, we 
wouldn't have this problem, and we would probably be using Common 
Lisp.

 Yes, it is theoretically possible for the compiler to use the 
 same template
 definition under certain circumstances (especially if you're 
 dealing with
 something like a templated struct that ultimately ends up with 
 exactly the
 same layout across multiple instantiations), but the only way 
 that the
 compiler could avoid duplicating the code with something like 
 __FILE__ and
 __LINE__ being used as template arguments is to translate them 
 into function
 arguments, because if the function has only one definition, 
 then it can't have
 different values for the file or line number.

This is something that could be potentially solved on the binary 
level, if you're sure that varying the value of a template 
argument will not affect how it functions. I'm not an expert on 
how DMD works, but could this possibly be done after the native 
code is generated, but before optimizations are performed? Just 
throwing out ideas.

Jun 27 2013

Marco Leise <Marco.Leise gmx.de> writes:

Am Thu, 27 Jun 2013 23:24:48 +0200
schrieb "Meta" <jared771 gmail.com>:

 I'm not an expert on 
 how DMD works, but could this possibly be done after the native 
 code is generated

I've thought of that as well, but you have to merge templates
at an earlier stage. In case of a compiler such as GCC, the
native code is not even generated in the compiler, but an
external assembler.
That said, DMD does not create duplicate template instances
for the same parameters. I'd assume this includes "string
lambdas". And I'm not sure what you really want to improve for
assertNotThrown. It is a unit testing function and the most
important thing at the moment seems to keep DMD's memory
consumption low, which more compile-time parameters can only
increase.
Another point is that today's best compilers are good at
detecting compile-time constants already and could inline a
function with runtime parameters that are statically known.
I've once let DMD create a switch-case with 81 cases for me
and used a 100% templated function specialized for all these
cases. The result was a major slow-down from all the
compile-time arguments. Run-time arguments are just better!

-- 
Marco

Jun 28 2013

"Meta" <jared771 gmail.com> writes:

On Friday, 28 June 2013 at 12:45:22 UTC, Marco Leise wrote:
 Am Thu, 27 Jun 2013 23:24:48 +0200
 schrieb "Meta" <jared771 gmail.com>:

 I'm not an expert on how DMD works, but could this possibly be 
 done after the native code is generated

 I've thought of that as well, but you have to merge templates
 at an earlier stage. In case of a compiler such as GCC, the
 native code is not even generated in the compiler, but an
 external assembler.
 That said, DMD does not create duplicate template instances
 for the same parameters. I'd assume this includes "string
 lambdas".

That's good to know. I wasn't sure if there were some cases when 
template were ellided or not, even with the same arguments. By 
string lambdas, I mean lambdas of the form sort!"a <= b" that are 
heavily used in std.algorithm.

 And I'm not sure what you really want to improve for
 assertNotThrown. It is a unit testing function and the most
 important thing at the moment seems to keep DMD's memory
 consumption low, which more compile-time parameters can only
 increase.

I was only using assertNotThrown as an example of a template 
function where parameters that could be compile-time arguments 
become run-time arguments to minimize template bloat.

 Another point is that today's best compilers are good at
 detecting compile-time constants already and could inline a
 function with runtime parameters that are statically known.
 I've once let DMD create a switch-case with 81 cases for me
 and used a 100% templated function specialized for all these
 cases. The result was a major slow-down from all the
 compile-time arguments. Run-time arguments are just better!

That is certainly a corner case. Perhaps finding a way to fold 
templates in some cases would still provide benefits in more 
normal cases.

Jun 28 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 28 June 2013 at 12:45:22 UTC, Marco Leise wrote:
 Am Thu, 27 Jun 2013 23:24:48 +0200
 schrieb "Meta" <jared771 gmail.com>:

 I'm not an expert on how DMD works, but could this possibly be 
 done after the native code is generated

 I've thought of that as well, but you have to merge templates
 at an earlier stage. In case of a compiler such as GCC, the
 native code is not even generated in the compiler, but an
 external assembler.
 That said, DMD does not create duplicate template instances
 for the same parameters. I'd assume this includes "string
 lambdas".

What is a "string lambda" ? Do you mean "a string alias latter 
parsed as an expression", or "an actual lambda function". There 
were talks about this recently in learn: If you *type* the same 
lambda function twice, it *will* generate two different templates.
http://forum.dlang.org/thread/rufkccowcdogctvckzar forum.dlang.org

For example:
--------
struct S(alias pred){}

void main()
{
     auto a = S!(() => 1)();
     auto b = S!(() => 1)();
     a = b;
}
--------
DMD v2.064 DEBUG
main.d(7): Error: cannot implicitly convert expression (b) of 
type S!(function int()
{
return 1;
}
) to S!(function int()
{
return 1;
}
)
--------

However,

--------
enum PRED = () => 1;

struct S(alias pred){}

void main()
{
     auto a = S!PRED();
     auto b = S!PRED();
     a = b;
}
--------
This is fine.

 From testing, *string* preds don't have this "feature"/"bug" (?), 
even if you compile time build them, the compiler will "see" that 
it is the same string (probably true for integrals and whatnot). 
But for lambdas, only their *name* counts (eg __lambda1__). So if 
you plan to re-use a lambda, it *must* be stored in a way the 
compiler can "see" its unicity, and not typed more than once.

Furthermore, I'm think this is the correct behavior.

Jun 28 2013

"Meta" <jared771 gmail.com> writes:

On Friday, 28 June 2013 at 13:20:47 UTC, monarch_dodra wrote:

 What is a "string lambda" ? Do you mean "a string alias latter 
 parsed as an expression", or "an actual lambda function".

The former. I didn't think it was *that* obscure of a term 
(https://www.google.com/search?q=%22string+lambda%22+site:forum.dlang.org).

 There were talks about this recently in learn: If you *type* 
 the same lambda function twice, it *will* generate two 
 different templates.
 http://forum.dlang.org/thread/rufkccowcdogctvckzar forum.dlang.org

I'm not sure if that's due to this bug or not: 
http://forum.dlang.org/thread/jdbhas$2ftb$1 digitalmars.com?page=2#post-jdcn98:241mg9:244:40digitalmars.com

 However,

 --------
 enum PRED = () => 1;

 struct S(alias pred){}

 void main()
 {
     auto a = S!PRED();
     auto b = S!PRED();
     a = b;
 }
 --------
 This is fine.

If that's all it takes to avoid copies, then maybe it's not so 
big of a problem for lambdas.

 From testing, *string* preds don't have this "feature"/"bug" 
 (?), even if you compile time build them, the compiler will 
 "see" that it is the same string (probably true for integrals 
 and whatnot). But for lambdas, only their *name* counts (eg 
 __lambda1__). So if you plan to re-use a lambda, it *must* be 
 stored in a way the compiler can "see" its unicity, and not 
 typed more than once.

Ditto.

Jun 28 2013

"Daniel Murphy" <yebblies nospamgmail.com> writes:

"Meta" <jared771 gmail.com> wrote in message 
news:ovlbbsdtyejzpsllfrcb forum.dlang.org...
I'm not an expert on how DMD works, but could this possibly be done after 
the native code is generated, but before optimizations are performed? Just 
throwing out ideas.

The obvious place to do this is at link time - then you get to fold all 
functions from all compilation units.  It becomes something similar to 
string constant pooling - you fold sections/data based on content, not 
labels.

Jun 28 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 27 June 2013 at 02:03:52 UTC, Meta wrote:
 I'd like to ask, however, why should this be? The arguments 
 line and file will only ever be strings and ints. They don't 
 vary in type, only value, and changing those values will not 
 affect how the function works. You could easily substitute out 
 "exception.d" with "stdio.d", and the functionality will be 
 unchanged.

Just because the arguments are known 99% of the time at compile 
time, doesn't mean they should be parameters. Parameters 
shouldn't be used to just mean "compile time" anyways, but 
*specifically* to create different versions of a function or 
code, which in this case, is not the case at all.

Besides, there are scenarios where you could pass these as 
runtime parameters from higher functions, if you want the 
exception to "look" like it came from a higher level call. Making 
these parameters would mean that *any* function that handles 
lines or filenames would have to be parameterized on that.

Jun 28 2013

"Meta" <jared771 gmail.com> writes:

On Friday, 28 June 2013 at 13:25:54 UTC, monarch_dodra wrote:
 Just because the arguments are known 99% of the time at compile 
 time, doesn't mean they should be parameters. Parameters 
 shouldn't be used to just mean "compile time" anyways, but 
 *specifically* to create different versions of a function or 
 code, which in this case, is not the case at all.

That was probably the idea when templates were conceived of for 
C++, but I think their usage has evolved far beyond that now, 
especially in D. With the advent of CTFE, there's an enormous 
amount that we can do at compile-time, a lot of which isn't 
strictly related to characterizing a function/struct/class/etc. 
on a type. Though I suppose it could be argued that with CTFE, 
doing compile-time template stuff isn't as important.

 Besides, there are scenarios where you could pass these as 
 runtime parameters from higher functions, if you want the 
 exception to "look" like it came from a higher level call.

I was just using assertNotThrown as an example to illustrate my 
point.

 Making these parameters would mean that *any* function that 
 handles lines or filenames would have to be parameterized on 
 that.

I don't think there's anything stopping you from passing __FILE__ 
and __LINE__ as template parameters to some functions, and as 
function parameters to other functions.

Jun 28 2013

"Meta" <jared771 gmail.com> writes:

On Friday, 28 June 2013 at 22:40:14 UTC, Meta wrote:
 With the advent of CTFE, there's an enormous amount that we can 
 do at compile-time, a lot of which isn't strictly related to 
 characterizing a function/struct/class/etc. on a type.

s/characterizing/parameterizing

Jun 28 2013

D Programming

C/C++ Programming

Other

digitalmars.D - Folding similar templates into one