digitalmars.D - Reducing variadic template combinatorics (C++ was onto something)

Steven Schveighoffer (92/92) Oct 13 In a conversation on a pull request for one of my libraries, I

monkyyy (4/11) Oct 13 I have fun ideas but youd need a benchmark; while I think I know
Daniel N (5/10) Oct 13 It's probably not what you want...

Steven Schveighoffer (8/20) Oct 14 So this won't work if the types are different.

Dennis (70/93) Oct 14 Very interesting thing you're bringing up, I'll share my thoughts.

Steven Schveighoffer (31/92) Oct 14 Yes, but it's not just runtime cost. There's cost to the compile

Dennis (16/19) Oct 14 I don't see the difference between the two, isn't the following

Steven Schveighoffer (26/45) Oct 14 Like if there were some way for the function author to identify

Paul Backus (15/19) Oct 14 The answer to this, of course, is Lisp-style AST macros. You

monkyyy (3/22) Oct 14 d has "ephemeral" compile time execution, I just dont know how

Quirin Schroll (3/14) Oct 15 Yes, macros. C++ was onto something.

Steven Schveighoffer <schveiguy gmail.com> writes:

In a conversation on a pull request for one of my libraries, I 
came across an interesting revelation. Variadic template 
functions waste resources for the most part, because much of the 
time, you don't care about the *relationship* between the 
parameters. In many such functions, you just process them in a 
loop.

Let's start with the traditional variadic D pattern. I'll write a 
function which writes each individual parameter on its own line:

```d
void writelines(T...)(T values)
{
     import std.stdio;
     static foreach(v; values)
         writeln(v);
}
```

Every combination of every type creates a new template 
instantiation. We only save on instantiations when 2 calls happen 
to match all their types.

But this function that is instantiated is all just calls to 
`writeln`! It's not an interesting function, nor is it really 
worthy of applying a combinatoric solution. We aren't getting any 
special optimization by having the entire list in view at once.

Let's take an example call:

```d
writelines(1, 2, 3, 4);
```

Note how we had to *type out* each parameter individually, in the 
same order they would be processed in the loop. Well, we can 
write this ourselves!

```d
writeln(1); writeln(2); writeln(3); writeln(4);
```

This accomplishes the same thing, but instead of a template 
instantiation per group of writes, we only get one instantiation 
of `writeln` for integers. We effectively have removed the 
combinatorics.

Now, this is quite the ugly solution! We have to repeat the call 
for each one.

But notice how we have written the same exact list! But instead 
of `", "`, our separator is `"); writeln("`.

What if we could reduce that separator? Maybe we can use an 
`opCall`?

```d
struct WriteLines
{
     ref opCall(T)(T val) {
         import std.stdio;
         writeln(val);
         return this;
     }
}

enum writelines2 = WriteLines.init;
```

Now, how does this look?


```d
writelines2(1)(2)(3)(4);
```

Our separator has changed into `")("`. A little nicer, but still 
looks weird.

But more importantly, we have *one* instantiation of the `opCall` 
for *all integers*. I can write any number of integers, or any 
combination of integers and strings, or anything else, and we 
only get one instantiation per type. The combinatorics are gone, 
and yet I'm *mostly* writing the same thing.

How about we try an operator? Wait, isn't there another language 
that does this?

```d
struct WriteLinesCpp
{
     ref opBinary(string s: "<<", T)(T val) {
         import std.stdio;
         writeln(val);
         return this;
     }
}
enum coutlines = WriteLinesCpp.init;
```

And the usage is as you would expect:

```d
coutlines << 1 << 2 << 3 << 4;
```

Again, the benefit here is less combinatorics -- one 
instantiation per type -- and less junk functions which are 
unrolling the unrolled loops that we typed in the first place.

But... I still want to write `writelines(1, 2, 3, 4)`. The 
ergonomics there are nice! Is there some way we can capture this 
same reduction in complexity while still keeping the nice syntax?

I'll leave it up to the experts here to think about. I can 
probably think of ways, but I'm sure they would not stand up to 
scrutiny.

-Steve

Oct 13

monkyyy <crazymonkyyy gmail.com> writes:

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 In a conversation on a pull request for one of my libraries, I 
 came across an interesting revelation. Variadic template 
 functions waste resources for the most part, because much of 
 the time, you don't care about the *relationship* between the 
 parameters. In many such functions, you just process them in a 
 loop.

 [...]

I have fun ideas but youd need a benchmark; while I think I know 
how to estimate the big O of dmd, theres always the fun edge cases

Oct 13

Daniel N <no public.email> writes:

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 Now, how does this look?

 ```d
 writelines2(1)(2)(3)(4);
 ```

 -Steve

It's probably not what you want...
[1,2,3,4].writeln;
... but I think the syntax is sufficiently nice.

Oct 13

Steven Schveighoffer <schveiguy gmail.com> writes:

On Tuesday, 14 October 2025 at 06:54:24 UTC, Daniel N wrote:
 On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven 
 Schveighoffer wrote:
 Now, how does this look?

 ```d
 writelines2(1)(2)(3)(4);
 ```

 It's probably not what you want...
 [1,2,3,4].writeln;
 ... but I think the syntax is sufficiently nice.

So this won't work if the types are different.

```d
writelines("value", 1);
writelines2("value")(1);
// ["value", 1].writeln; // does not work
```

-Steve

Oct 14

Dennis <dkorpel gmail.com> writes:

Very interesting thing you're bringing up, I'll share my thoughts.

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 ```d
 void writelines(T...)(T values)
 {
     import std.stdio;
     static foreach(v; values)
         writeln(v);
 }
 ```

 Every combination of every type creates a new template 
 instantiation. We only save on instantiations when 2 calls 
 happen to match all their types.

True, but the instantiations are all very short, so they are 
cheap to process and get inlined in an optimized build. In 
theory, it's not much worse than writing the calls yourself like 
in your second example:

 ```d
 writeln(1); writeln(2); writeln(3); writeln(4);
 ```

Much more problematic is:

```D
void writelines(T...)(T values)
{
     static foreach(v; values)
     {
         // *hundreds of lines of implementation for writeln(v)*
     }
}
```

Because then you duplicate the implementation multiple times, 
even though it's probably 99% the same for `int, uint, 
const(int), immutable(uint)` etc. + every variadic combination of 
those. So it's best practice to reduce the number of possible 
template parameter values as early as possible before getting to 
an implementation. Don't instantiate `impl!T` for every integral 
type when `impl!(T.sizeof)` also works.

Here's a real example of employing this technique: 
https://github.com/dlang/druntime/pull/3852

 How about we try an operator? (...)
 And the usage is as you would expect:

 ```d
 coutlines << 1 << 2 << 3 << 4;
 ```

The problem here is that << is meant to do arithmetic, not 
construct a list. In Java you can write `System.out.println("" + 
1 + 2 + 3 + 4);`, which is better, but + is still also used for 
arithmetic so it's still confusing (as seen by the `"" +` 
required to string concatenate rather than add). Now D has its 
own operator that naturally builds a list: ~

However, that often needlessly GC allocates, and it doesn't 
convert other types to strings. A pattern I often write is:

```D
void writeInTag(ref OutBuffer buf, int x)
{
     buf ~= "<";
     buf ~= x;
     buf ~= ">";
}
```

I'd rather write:
```D
buf ~= "<" ~ x ~ ">";
```

That doesn't work because I don't want `string ~ int`, I want it 
to use my `OutBuffer ~ int` implementation, which requires a 
different operator precedence:

```D
(buf ~= "<") ~ x ~ ">";
```

This looks bad, so I've considered swapping out the assignment 
operator for something with lower precedence:
```D
buf ~ "<" ~ x ~ ">";
```

But now we're back in C++ operator overloading abuse territory. 
An unsuspecting programmer would see this as a typo, creating a 
concatenated value and discarding it with no side effect. The 
only unambiguous one-liners I found to work so far are format 
strings or interpolated strings:

```D
buf.format("<%s>", x);
buf ~= i"<$(x)>";
```

 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?

I'd say: just write a template helper that does nothing but 
forward each argument to another call. If that's somehow 
significantly worse than manually writing 4 function calls, 
investigate why that is and see if that can be fixed.

Oct 14

Steven Schveighoffer <schveiguy gmail.com> writes:

On Tuesday, 14 October 2025 at 12:30:23 UTC, Dennis wrote:
 Very interesting thing you're bringing up, I'll share my 
 thoughts.

 On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven 
 Schveighoffer wrote:
 ```d
 void writelines(T...)(T values)
 {
     import std.stdio;
     static foreach(v; values)
         writeln(v);
 }
 ```

 Every combination of every type creates a new template 
 instantiation. We only save on instantiations when 2 calls 
 happen to match all their types.

 True, but the instantiations are all very short, so they are 
 cheap to process and get inlined in an optimized build. In 
 theory, it's not much worse than writing the calls yourself 
 like in your second example:

 ```d
 writeln(1); writeln(2); writeln(3); writeln(4);
 ```


Yes, but it's not just runtime cost. There's cost to the compile 
time.

What I'm exploring is that we *have* an unrolled loop -- the 
parameter list. If somehow we can harness this into avoiding the 
different processing per call to reach the same conclusion, 
that's what I'm looking at. Whether this results in some kind of 
"best practice" or some special handling the compiler can use to 
save some compile time, I'm not sure. But I feel there is 
definitely some benefit that can be seen here.

 Much more problematic is:

 ```D
 void writelines(T...)(T values)
 {
     static foreach(v; values)
     {
         // *hundreds of lines of implementation for writeln(v)*
     }
 }
 ```

 Because then you duplicate the implementation multiple times, 
 even though it's probably 99% the same for `int, uint, 
 const(int), immutable(uint)` etc. + every variadic combination 
 of those. So it's best practice to reduce the number of 
 possible template parameter values as early as possible before 
 getting to an implementation. Don't instantiate `impl!T` for 
 every integral type when `impl!(T.sizeof)` also works.

It's also the same implementation for `int, int, int, int`!

the *hundreds of lines of implementation* is not only repeated 
for different types, it's also repeated for the *same* type with 
different parameter configuration.

In other words, `writelines(1, 2)` and `writelines(1, 2, 3)` 
creates 5 loop bodies, one for each parameter, whereas 
`writelines2(1)(2)` and `writelines2(1)(2)(3)` creates one "body" 
for int, and that's it.

The compiler compiles and processes all variations of function 
calls independently, and sees no benefit from realizing all the 
loop bodies are the same.

 Here's a real example of employing this technique: 
 https://github.com/dlang/druntime/pull/3852

Nice, yes this is part of it.

 How about we try an operator? (...)
 And the usage is as you would expect:

 ```d
 coutlines << 1 << 2 << 3 << 4;
 ```

 The problem here is that << is meant to do arithmetic, not 
 construct a list. In Java you can write `System.out.println("" 
 + 1 + 2 + 3 + 4);`, which is better, but + is still also used 
 for arithmetic so it's still confusing (as seen by the `"" +` 
 required to string concatenate rather than add).

I did not properly convey that I *don't* want this solution. It's 
just an exploration of what is possible with the current language.

I dislike C++ iostream operators, and prefer the parameter list 
style for writeln. But I had never considered that there is a 
*compile time* benefit to the way C++ is doing it.

 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?

 I'd say: just write a template helper that does nothing but 
 forward each argument to another call. If that's somehow 
 significantly worse than manually writing 4 function calls, 
 investigate why that is and see if that can be fixed.

Right, if nothing else it is a "best practice" on how to avoid 
template bloat. But if there is a way we can convey to the 
compiler this pattern, it would be very nice!

-Steve

Oct 14

Dennis <dkorpel gmail.com> writes:

On Tuesday, 14 October 2025 at 14:44:56 UTC, Steven Schveighoffer 
wrote:
 Right, if nothing else it is a "best practice" on how to avoid 
 template bloat. But if there is a way we can convey to the 
 compiler this pattern, it would be very nice!

I don't see the difference between the two, isn't the following 
both a "best practice" as well as conveying the pattern to the 
compiler?

```D
void f(T...)(T args) if (T.length > 1)
{
     foreach (arg; args)
         f(arg);
}

void f(T)(T arg)
{
     // Implementation
}
```

Oct 14

Steven Schveighoffer <schveiguy gmail.com> writes:

On Tuesday, 14 October 2025 at 15:35:48 UTC, Dennis wrote:
 On Tuesday, 14 October 2025 at 14:44:56 UTC, Steven 
 Schveighoffer wrote:
 Right, if nothing else it is a "best practice" on how to avoid 
 template bloat. But if there is a way we can convey to the 
 compiler this pattern, it would be very nice!

 I don't see the difference between the two, isn't the following 
 both a "best practice" as well as conveying the pattern to the 
 compiler?

 ```D
 void f(T...)(T args) if (T.length > 1)
 {
     foreach (arg; args)
         f(arg);
 }

 void f(T)(T arg)
 {
     // Implementation
 }
 ```

Like if there were some way for the function author to identify 
that the variadic args are going to be looped independently, the 
compiler can save the code generation per type.

I don't know, maybe it's not worth the effort.

The idea is to keep the normal parameter list style, but the 
compiler rewrites it to cacheable templates. Either by being 
clued in that the variadic list is going to be processed this 
way, or by detecting it itself.

For example, if it can tell that the loop body doesn't use 
anything except the element, then it could rewrite the function 
as a standalone template automatically. Sort of like a lambda can 
detect whether it can be a function or delegate based on whether 
it uses any data from the surrounding frame.

I just feel like when your function is just doing a foreach over 
the tuple, it's a waste for the function to even exist, since you 
created the tuple in the first place!

The whole impetus for this is someone in one of my projects wants 
to change a function from a typesafe variadic into a template 
variadic. I'm hesitant to bloat the code this way for the sake of 
a new feature.

I've written a lot of variadic functions where the code mostly 
just loops over the variadic and does the same thing to each 
parameter independently. The bloat is always annoying but it's 
the best user API available in D.

-Steve

Oct 14

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?

The answer to this, of course, is Lisp-style AST macros. You 
write some code that describes the AST transformation from 
`writelines(1, 2, 3, 4);` to `writeln(1); writeln(2); writeln(3); 
writeln(4);`, and the compiler executes this code at compile time 
and replaces the original AST with the result. Because the macro 
is totally ephemeral, it does not need to have space allocated 
for each expansion in the template cache, nor a unique name 
generated for the symbol table, nor time spent inlining the 
wrapper functions during optimization (or failing to), nor time 
spent during codegen and linking to process those wrapper 
functions.

Since D will never have AST macros, we are condemned for eternity 
to suffer from either template bloat or awkward syntax.

Oct 14

monkyyy <crazymonkyyy gmail.com> writes:

On Tuesday, 14 October 2025 at 15:29:04 UTC, Paul Backus wrote:
 On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven 
 Schveighoffer wrote:
 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?

 The answer to this, of course, is Lisp-style AST macros. You 
 write some code that describes the AST transformation from 
 `writelines(1, 2, 3, 4);` to `writeln(1); writeln(2); 
 writeln(3); writeln(4);`, and the compiler executes this code 
 at compile time and replaces the original AST with the result. 
 Because the macro is totally ephemeral, it does not need to 
 have space allocated for each expansion in the template cache, 
 nor a unique name generated for the symbol table, nor time 
 spent inlining the wrapper functions during optimization (or 
 failing to), nor time spent during codegen and linking to 
 process those wrapper functions.

 Since D will never have AST macros, we are condemned for 
 eternity to suffer from either template bloat or awkward syntax.

d has "ephemeral" compile time execution, I just dont know how 
much is deleted.

Oct 14

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer 
wrote:
 In a conversation on a pull request for one of my libraries, I 
 came across an interesting revelation. Variadic template 
 functions waste resources for the most part, because much of 
 the time, you don't care about the *relationship* between the 
 parameters. In many such functions, you just process them in a 
 loop.

 […]

 But... I still want to write `writelines(1, 2, 3, 4)`. The 
 ergonomics there are nice! Is there some way we can capture 
 this same reduction in complexity while still keeping the nice 
 syntax?

Yes, macros. C++ was onto something.

Oct 15

D Programming

C/C++ Programming

Other

digitalmars.D - Reducing variadic template combinatorics (C++ was onto something)