www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Reducing variadic template combinatorics (C++ was onto something)

reply Steven Schveighoffer <schveiguy gmail.com> writes:
In a conversation on a pull request for one of my libraries, I 
came across an interesting revelation. Variadic template 
functions waste resources for the most part, because much of the 
time, you don't care about the *relationship* between the 
parameters. In many such functions, you just process them in a 
loop.

Let's start with the traditional variadic D pattern. I'll write a 
function which writes each individual parameter on its own line:

```d
void writelines(T...)(T values)
{
     import std.stdio;
     static foreach(v; values)
         writeln(v);
}
```

Every combination of every type creates a new template 
instantiation. We only save on instantiations when 2 calls happen 
to match all their types.

But this function that is instantiated is all just calls to 
`writeln`! It's not an interesting function, nor is it really 
worthy of applying a combinatoric solution. We aren't getting any 
special optimization by having the entire list in view at once.

Let's take an example call:

```d
writelines(1, 2, 3, 4);
```

Note how we had to *type out* each parameter individually, in the 
same order they would be processed in the loop. Well, we can 
write this ourselves!

```d
writeln(1); writeln(2); writeln(3); writeln(4);
```

This accomplishes the same thing, but instead of a template 
instantiation per group of writes, we only get one instantiation 
of `writeln` for integers. We effectively have removed the 
combinatorics.

Now, this is quite the ugly solution! We have to repeat the call 
for each one.

But notice how we have written the same exact list! But instead 
of `", "`, our separator is `"); writeln("`.

What if we could reduce that separator? Maybe we can use an 
`opCall`?

```d
struct WriteLines
{
     ref opCall(T)(T val) {
         import std.stdio;
         writeln(val);
         return this;
     }
}

enum writelines2 = WriteLines.init;
```

Now, how does this look?


```d
writelines2(1)(2)(3)(4);
```

Our separator has changed into `")("`. A little nicer, but still 
looks weird.

But more importantly, we have *one* instantiation of the `opCall` 
for *all integers*. I can write any number of integers, or any 
combination of integers and strings, or anything else, and we 
only get one instantiation per type. The combinatorics are gone, 
and yet I'm *mostly* writing the same thing.

How about we try an operator? Wait, isn't there another language 
that does this?

```d
struct WriteLinesCpp
{
     ref opBinary(string s: "<<", T)(T val) {
         import std.stdio;
         writeln(val);
         return this;
     }
}
enum coutlines = WriteLinesCpp.init;
```

And the usage is as you would expect:

```d
coutlines << 1 << 2 << 3 << 4;
```

Again, the benefit here is less combinatorics -- one 
instantiation per type -- and less junk functions which are 
unrolling the unrolled loops that we typed in the first place.

But... I still want to write `writelines(1, 2, 3, 4)`. The 
ergonomics there are nice! Is there some way we can capture this 
same reduction in complexity while still keeping the nice syntax?

I'll leave it up to the experts here to think about. I can 
probably think of ways, but I'm sure they would not stand up to 
scrutiny.

-Steve
Oct 13
next sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 In a conversation on a pull request for one of my libraries, I 
 came across an interesting revelation. Variadic template 
 functions waste resources for the most part, because much of 
 the time, you don't care about the *relationship* between the 
 parameters. In many such functions, you just process them in a 
 loop.

 [...]
I have fun ideas but youd need a benchmark; while I think I know how to estimate the big O of dmd, theres always the fun edge cases
Oct 13
prev sibling next sibling parent reply Daniel N <no public.email> writes:
On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 Now, how does this look?

 ```d
 writelines2(1)(2)(3)(4);
 ```

 -Steve
It's probably not what you want... [1,2,3,4].writeln; ... but I think the syntax is sufficiently nice.
Oct 13
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 14 October 2025 at 06:54:24 UTC, Daniel N wrote:
 On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven 
 Schveighoffer wrote:
 Now, how does this look?

 ```d
 writelines2(1)(2)(3)(4);
 ```
It's probably not what you want... [1,2,3,4].writeln; ... but I think the syntax is sufficiently nice.
So this won't work if the types are different. ```d writelines("value", 1); writelines2("value")(1); // ["value", 1].writeln; // does not work ``` -Steve
Oct 14
prev sibling next sibling parent reply Dennis <dkorpel gmail.com> writes:
Very interesting thing you're bringing up, I'll share my thoughts.

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 ```d
 void writelines(T...)(T values)
 {
     import std.stdio;
     static foreach(v; values)
         writeln(v);
 }
 ```

 Every combination of every type creates a new template 
 instantiation. We only save on instantiations when 2 calls 
 happen to match all their types.
True, but the instantiations are all very short, so they are cheap to process and get inlined in an optimized build. In theory, it's not much worse than writing the calls yourself like in your second example:
 ```d
 writeln(1); writeln(2); writeln(3); writeln(4);
 ```
Much more problematic is: ```D void writelines(T...)(T values) { static foreach(v; values) { // *hundreds of lines of implementation for writeln(v)* } } ``` Because then you duplicate the implementation multiple times, even though it's probably 99% the same for `int, uint, const(int), immutable(uint)` etc. + every variadic combination of those. So it's best practice to reduce the number of possible template parameter values as early as possible before getting to an implementation. Don't instantiate `impl!T` for every integral type when `impl!(T.sizeof)` also works. Here's a real example of employing this technique: https://github.com/dlang/druntime/pull/3852
 How about we try an operator? (...)
 And the usage is as you would expect:

 ```d
 coutlines << 1 << 2 << 3 << 4;
 ```
The problem here is that << is meant to do arithmetic, not construct a list. In Java you can write `System.out.println("" + 1 + 2 + 3 + 4);`, which is better, but + is still also used for arithmetic so it's still confusing (as seen by the `"" +` required to string concatenate rather than add). Now D has its own operator that naturally builds a list: ~ However, that often needlessly GC allocates, and it doesn't convert other types to strings. A pattern I often write is: ```D void writeInTag(ref OutBuffer buf, int x) { buf ~= "<"; buf ~= x; buf ~= ">"; } ``` I'd rather write: ```D buf ~= "<" ~ x ~ ">"; ``` That doesn't work because I don't want `string ~ int`, I want it to use my `OutBuffer ~ int` implementation, which requires a different operator precedence: ```D (buf ~= "<") ~ x ~ ">"; ``` This looks bad, so I've considered swapping out the assignment operator for something with lower precedence: ```D buf ~ "<" ~ x ~ ">"; ``` But now we're back in C++ operator overloading abuse territory. An unsuspecting programmer would see this as a typo, creating a concatenated value and discarding it with no side effect. The only unambiguous one-liners I found to work so far are format strings or interpolated strings: ```D buf.format("<%s>", x); buf ~= i"<$(x)>"; ```
 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?
I'd say: just write a template helper that does nothing but forward each argument to another call. If that's somehow significantly worse than manually writing 4 function calls, investigate why that is and see if that can be fixed.
Oct 14
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 14 October 2025 at 12:30:23 UTC, Dennis wrote:
 Very interesting thing you're bringing up, I'll share my 
 thoughts.

 On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven 
 Schveighoffer wrote:
 ```d
 void writelines(T...)(T values)
 {
     import std.stdio;
     static foreach(v; values)
         writeln(v);
 }
 ```

 Every combination of every type creates a new template 
 instantiation. We only save on instantiations when 2 calls 
 happen to match all their types.
True, but the instantiations are all very short, so they are cheap to process and get inlined in an optimized build. In theory, it's not much worse than writing the calls yourself like in your second example:
 ```d
 writeln(1); writeln(2); writeln(3); writeln(4);
 ```
Yes, but it's not just runtime cost. There's cost to the compile time. What I'm exploring is that we *have* an unrolled loop -- the parameter list. If somehow we can harness this into avoiding the different processing per call to reach the same conclusion, that's what I'm looking at. Whether this results in some kind of "best practice" or some special handling the compiler can use to save some compile time, I'm not sure. But I feel there is definitely some benefit that can be seen here.
 Much more problematic is:

 ```D
 void writelines(T...)(T values)
 {
     static foreach(v; values)
     {
         // *hundreds of lines of implementation for writeln(v)*
     }
 }
 ```

 Because then you duplicate the implementation multiple times, 
 even though it's probably 99% the same for `int, uint, 
 const(int), immutable(uint)` etc. + every variadic combination 
 of those. So it's best practice to reduce the number of 
 possible template parameter values as early as possible before 
 getting to an implementation. Don't instantiate `impl!T` for 
 every integral type when `impl!(T.sizeof)` also works.
It's also the same implementation for `int, int, int, int`! the *hundreds of lines of implementation* is not only repeated for different types, it's also repeated for the *same* type with different parameter configuration. In other words, `writelines(1, 2)` and `writelines(1, 2, 3)` creates 5 loop bodies, one for each parameter, whereas `writelines2(1)(2)` and `writelines2(1)(2)(3)` creates one "body" for int, and that's it. The compiler compiles and processes all variations of function calls independently, and sees no benefit from realizing all the loop bodies are the same.
 Here's a real example of employing this technique: 
 https://github.com/dlang/druntime/pull/3852
Nice, yes this is part of it.
 How about we try an operator? (...)
 And the usage is as you would expect:

 ```d
 coutlines << 1 << 2 << 3 << 4;
 ```
The problem here is that << is meant to do arithmetic, not construct a list. In Java you can write `System.out.println("" + 1 + 2 + 3 + 4);`, which is better, but + is still also used for arithmetic so it's still confusing (as seen by the `"" +` required to string concatenate rather than add).
I did not properly convey that I *don't* want this solution. It's just an exploration of what is possible with the current language. I dislike C++ iostream operators, and prefer the parameter list style for writeln. But I had never considered that there is a *compile time* benefit to the way C++ is doing it.
 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?
I'd say: just write a template helper that does nothing but forward each argument to another call. If that's somehow significantly worse than manually writing 4 function calls, investigate why that is and see if that can be fixed.
Right, if nothing else it is a "best practice" on how to avoid template bloat. But if there is a way we can convey to the compiler this pattern, it would be very nice! -Steve
Oct 14
parent reply Dennis <dkorpel gmail.com> writes:
On Tuesday, 14 October 2025 at 14:44:56 UTC, Steven Schveighoffer 
wrote:
 Right, if nothing else it is a "best practice" on how to avoid 
 template bloat. But if there is a way we can convey to the 
 compiler this pattern, it would be very nice!
I don't see the difference between the two, isn't the following both a "best practice" as well as conveying the pattern to the compiler? ```D void f(T...)(T args) if (T.length > 1) { foreach (arg; args) f(arg); } void f(T)(T arg) { // Implementation } ```
Oct 14
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 14 October 2025 at 15:35:48 UTC, Dennis wrote:
 On Tuesday, 14 October 2025 at 14:44:56 UTC, Steven 
 Schveighoffer wrote:
 Right, if nothing else it is a "best practice" on how to avoid 
 template bloat. But if there is a way we can convey to the 
 compiler this pattern, it would be very nice!
I don't see the difference between the two, isn't the following both a "best practice" as well as conveying the pattern to the compiler? ```D void f(T...)(T args) if (T.length > 1) { foreach (arg; args) f(arg); } void f(T)(T arg) { // Implementation } ```
Like if there were some way for the function author to identify that the variadic args are going to be looped independently, the compiler can save the code generation per type. I don't know, maybe it's not worth the effort. The idea is to keep the normal parameter list style, but the compiler rewrites it to cacheable templates. Either by being clued in that the variadic list is going to be processed this way, or by detecting it itself. For example, if it can tell that the loop body doesn't use anything except the element, then it could rewrite the function as a standalone template automatically. Sort of like a lambda can detect whether it can be a function or delegate based on whether it uses any data from the surrounding frame. I just feel like when your function is just doing a foreach over the tuple, it's a waste for the function to even exist, since you created the tuple in the first place! The whole impetus for this is someone in one of my projects wants to change a function from a typesafe variadic into a template variadic. I'm hesitant to bloat the code this way for the sake of a new feature. I've written a lot of variadic functions where the code mostly just loops over the variadic and does the same thing to each parameter independently. The bloat is always annoying but it's the best user API available in D. -Steve
Oct 14
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?
The answer to this, of course, is Lisp-style AST macros. You write some code that describes the AST transformation from `writelines(1, 2, 3, 4);` to `writeln(1); writeln(2); writeln(3); writeln(4);`, and the compiler executes this code at compile time and replaces the original AST with the result. Because the macro is totally ephemeral, it does not need to have space allocated for each expansion in the template cache, nor a unique name generated for the symbol table, nor time spent inlining the wrapper functions during optimization (or failing to), nor time spent during codegen and linking to process those wrapper functions. Since D will never have AST macros, we are condemned for eternity to suffer from either template bloat or awkward syntax.
Oct 14
parent monkyyy <crazymonkyyy gmail.com> writes:
On Tuesday, 14 October 2025 at 15:29:04 UTC, Paul Backus wrote:
 On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven 
 Schveighoffer wrote:
 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?
The answer to this, of course, is Lisp-style AST macros. You write some code that describes the AST transformation from `writelines(1, 2, 3, 4);` to `writeln(1); writeln(2); writeln(3); writeln(4);`, and the compiler executes this code at compile time and replaces the original AST with the result. Because the macro is totally ephemeral, it does not need to have space allocated for each expansion in the template cache, nor a unique name generated for the symbol table, nor time spent inlining the wrapper functions during optimization (or failing to), nor time spent during codegen and linking to process those wrapper functions. Since D will never have AST macros, we are condemned for eternity to suffer from either template bloat or awkward syntax.
d has "ephemeral" compile time execution, I just dont know how much is deleted.
Oct 14
prev sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 In a conversation on a pull request for one of my libraries, I 
 came across an interesting revelation. Variadic template 
 functions waste resources for the most part, because much of 
 the time, you don't care about the *relationship* between the 
 parameters. In many such functions, you just process them in a 
 loop.

 […]

 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?
Yes, macros. C++ was onto something.
Oct 15