www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Compile-time optimization

reply "JS" <js.mdnq gmail.com> writes:
There seems to be a lot of improvement for programming languages 
to optimize compile time aspects that are not taken into account. 
With ctfe I think such optimizations are more and more relevant 
in D.

I'll give a simple example:

Take a standard string joining function:

string join(string[] s)
{
     string ss;
     foreach(sss; s) ss ~= sss;
     return ss
}

when this string function is called at run-time with literal 
arguments it is extremely inefficient:

join(["a", "b", "c"]);

because the result can obviously be computed at compile-time by 
the compiler.

using ctfe can solve this problem but only when used in const 
results.

But there should be no reason why join(["a", "b", "c"]) can't be 
optimized at compile time so that the function call is replaced 
with "abc".

Now, think of join as defined like

string join(T...)(T t);

and we call it like

join("a", s, "b", "c", ss);

where s is string that is compile time evaluable and ss is a 
run-time string array.

the optimal join code would be something like

r = "a"~s~"bc"
foreach(sss; ss) r ~= sss;

(and which would change depending on the argument types and if 
they are compile time evaluable)


To generate such "optimal" code(one that gets the compiler to do 
as much work as it can at compile time) is somewhat convoluted in 
D. We can use string mixins to solve the problem but not easily 
because we can't pass variadic arguments to templates.

e.g.,

join(T...)(T s)
{
     mixin(tJoinH!(s))
}

fails because s is not known at compile time... even if s is 
partially known.

It would be nice if we could pass the arguments to a template 
even if they are not completely known at compile time.

e.g., join("a", s),

tJoinH can receive "a" and s but s's value, if not compile-time 
known, is undefined.

Hence, tJoinH can optimize the arguments by attempting to join 
the compile-time arguments and insert optimal code for the 
run-time arguments.

That is, if we call a function with an argument list, we should 
be able to pass that to a template without any errors and be able 
to determine the value of each argument... if the argument is 
undefined, then that becomes it's value.

Another way to see it work is with sum:

sum(1,2,x);

int sum(T...)(T i)
{
     // return 1 + 2 + i[3]; for the call above, it is optimal and 
our goal to generate
     // return mixin(tSumH!(i)); // hypothetically generates 1 + 2 
+ i[3] for the call above
     // return mixin(tSumH2!(i)); // generates i[1] + i[2] + i[3]. 
Current possible way

     // naive approach(assumes all arguments are known only at 
run-time):
     int y;
     foreach(ii; i) y += ii;
     return y;
}


I'm able to achieve this partially by passing T, rather than i, 
to the template function and the variable name, i. This works but 
may not produce optimal results(referencing the stack instead of 
the value of a literal... which prevents it from being further 
optimized).

e.g., from above, instead of return 1 + 2 + i[3]; I would 
generate return i[1] + i[2] + i[3]; Not optimal because i[1] and 
i[2] are treated as run-time entities even though in the above 
case they are not.


If D had some model to create functions that can optimize the way 
they deal with their compile-time arguments and run-time 
arguments then it would be pretty easy to create such functions.

e.g.,

string join(T...)(T t)    // Note join has a variadic parameter
{
     ctfe {
          // called when join is used in ctfe, t can be used to 
get the values directly if they exist at compile time, else their 
value is undefined.
          // can call join here with run-time entities

     }

     // runtime version goes here
}

we might have something like this

string join(T...)(T t) // easy notation to constrain T on char, 
string, string[]
{
     ctfe
     {
         string code = "string s;";
         foreach(k, val; t)
         {
             if (isDefined!val)
                 code ~= `s ~= "`~join(val)~`";`; // calls join as 
a ctfe on val.
             else
             {
                 if (type == string[])
                     code ~= `foreach(ss; t[`~k~`]) s~= sss;`;
                 } else code ~= `s ~= t[`~k~`];`;
             }
         }
         code ~= "return s;";
         mixin code;
     }

     // concatenates all arguments of t as if they were run-time 
arguments.
     string s;
     foreach(k, ss; t)
         static if (T[k].type == string[])
             foreach(sss; ss)
                 s ~= sss;
         else
             s ~= ss;
     return s;
}


I'm not saying this works but the idea is that the ctfe block is 
executed at compile time WHEN there are non-compile time 
arguments(ok, maybe ctfe isn't the best keyword). The block then 
figures out how to best deal with the non-ct arguments.

In the above code note how join is called in the ctfe block to 
handle known compile time types. This is just standard ctfe 
execution. The real "magic" happens when it is dealing with 
non-ct arguments, in which case it builds up meta code which is 
used as a mixin(special statement at end of ctfe block) that is 
inserted.

So for join("a", s); the code would execute something like

at compile time(in ctfe block)
     code = `string s;`;
     "a" is defined so
     code ~= `s ~= "~join("a")~`";`;
         join("a") is called, since "a" is known, the non-ctfe 
block is called, so we actually have
     code ~= `s ~= "a";`;
  next for s(which is undefined at compile time)
     code ~= `s ~= t[1];`;
  end
     code ~= "return s;";
so the string mixin code looks like:

     string s;
     s ~= "a";
     s ~= t[1];
     return s;

or simplified,

    return "a"~t[1];

which is an optimal join code for the join call join("a", s);

(it would make more sense to call a template in the ctfe block so 
we don't have form code strings of code strings)


Note also that having such a ctfe block does not break existing 
code structures and pre existing functions can easily be upgraded 
for optimal behavior.


Having the compiler do as much optimization as possible should 
produce a significant speedup in most applications.

When a function is called with any known compile time entities, 
it is possible for an optimization to take place. I think if D 
incorporates some mechanism in it's language to do so it will go 
a long.

In fact, all this simply demonstrates that ctfe's are not as 
powerful as they can be. Calling a ctfe'able function with just 
one run-time variable will prevent it from being ctfe'ed and 
optimized completely... even though there is still some 
possibility for optimization.


Hopefully some will read this acutely and attempt to ponder its 
potential.  I expect the immediate naysayers who can't see past 
their own noses... and those that get hung up on non-essential 
details(I said that the block doesn't have to be called ctfe... 
oh hell, you do realize we don't even need to use a block?).
Jul 23 2013
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 07/23/2013 09:21 AM, JS wrote:

 string join(T...)(T t)    // Note join has a variadic parameter
 {
      ctfe {
Are you aware of version(ctfe)?
 Hopefully some will read this acutely
Not me. :(
 and attempt to ponder its potential.
I understood that much though...
 I expect the immediate naysayers who can't see past their own 
noses... and
 those that get hung up on non-essential details(I said that the block
 doesn't have to be called ctfe... oh hell, you do realize we don't 
even need
 to use a block?).
That's me! :) Ali
Jul 23 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali Çehreli wrote:
 On 07/23/2013 09:21 AM, JS wrote:

 string join(T...)(T t)    // Note join has a variadic
parameter
 {
      ctfe {
Are you aware of version(ctfe)?
Hum... really? I don't think that exists. Either that, or I'm using it wrong. Could you provide an example? What you do have is "if (__ctfe)" though... //---- import std.stdio; int foo() { version (poop) return 1; else return 0; } void main(string[] args) { enum a = foo(); auto b = foo(); writeln(a, " ", b); } //---- Prints: 0 0 //---- import std.stdio; int foo() { if (__ctfe) return 1; else return 0; } void main(string[] args) { enum a = foo(); auto b = foo(); writeln(a, " ", b); } //---- Prints: 1 0
Jul 24 2013
next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 24 July 2013 at 09:56:41 UTC, monarch_dodra wrote:
     version (poop)
Goddamnit. There goes my credibility -_-' I was testing to see if I got a compile error with an inexistant version tag. Please replace that with "ctfe"
Jul 24 2013
parent "JS" <js.mdnq gmail.com> writes:
On Wednesday, 24 July 2013 at 09:58:02 UTC, monarch_dodra wrote:
 On Wednesday, 24 July 2013 at 09:56:41 UTC, monarch_dodra wrote:
    version (poop)
Goddamnit. There goes my credibility -_-' I was testing to see if I got a compile error with an inexistant version tag. Please replace that with "ctfe"
lol... I was wondering were the hell that came from. I thought maybe it meant you wrote that part of the code on the toilet ;)
Jul 24 2013
prev sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 07/24/2013 02:56 AM, monarch_dodra wrote:

 On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali Çehreli wrote:
 Are you aware of version(ctfe)?
Hum... really? I don't think that exists.
You are right. I must be remembering old proposals about it. Ali
Jul 24 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 24, 2013 at 09:22:14AM -0700, Ali Çehreli wrote:
 On 07/24/2013 02:56 AM, monarch_dodra wrote:
 
 On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali Çehreli wrote:
 Are you aware of version(ctfe)?
Hum... really? I don't think that exists.
You are right. I must be remembering old proposals about it.
[...] There's if(__ctfe), but unfortunately (IIRC) it doesn't work with static if, because CTFE runs after static ifs have been processed. So it might not be enough if you have code that has parts that can't run under CTFE. Those parts will taint all parent functions up the call tree as non-CTFEable, so once you have them you're out of luck. T -- We are in class, we are supposed to be learning, we have a teacher... Is it too much that I expect him to teach me??? -- RL
Jul 24 2013
next sibling parent reply "JS" <js.mdnq gmail.com> writes:
T,

I've looked over your code and understand what you are doing but 
I don't understand the expression

      is(typeof(func(args[0], args[1]))))

you are checking if func(a,b)'s return type is a type?
Jul 24 2013
parent "JS" <js.mdnq gmail.com> writes:
On Wednesday, 24 July 2013 at 18:12:37 UTC, JS wrote:
T,

Also, I've tried to wrap the template in a function and I can't 
get it to work:

	
	string join(T...)(string delim, T args)
	{
		return tupleStringReduce!(args);
	}

it seems tupleStringReduce just returns the tuple of of the 
optimized strings. I'll still need to write code that will deal 
mixing in the actual compile time and runtime code.

This doesn't seem that hard as tupleReduce essentially outlines 
the approach. The difference is instead of calling func directly 
I sort of need to mix it in as a code string.

e.g., for tupleStringReduce, func is x~y. I need to build a 
string using that "((arg[0]~arg[1])~arg[2])..." which is then 
mixed in at compile time but at runtime evaluates to a string(not 
an array).

Since there is no .codeof, AFAIK, I guess the only way to do this 
is pass func, and the func code string("x~y") in this case.

The idea though is to use the lamba as a ctfe to reduce the tuple 
when possible but *also* use it at runtime to evaluate the tuple.


Any ideas?
Jul 24 2013
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 24 July 2013 at 17:10:01 UTC, H. S. Teoh wrote:
 On Wed, Jul 24, 2013 at 09:22:14AM -0700, Ali Çehreli wrote:
 On 07/24/2013 02:56 AM, monarch_dodra wrote:
 
 On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali Çehreli wrote:
 Are you aware of version(ctfe)?
Hum... really? I don't think that exists.
You are right. I must be remembering old proposals about it.
[...] There's if(__ctfe), but unfortunately (IIRC) it doesn't work with static if, because CTFE runs after static ifs have been processed. So it might not be enough if you have code that has parts that can't run under CTFE. Those parts will taint all parent functions up the call tree as non-CTFEable, so once you have them you're out of luck. T
That's not true either. The code will compile, and CTFE will only complain if it *runs* into code that can't be executed CTFE-style. For example: //---- import std.stdio; import core.stdc.string; int foo() { if(__ctfe) { return 1; } else { int i = void; memset(&i, 0, 4); return i; } } void main(string[] args) { enum a = foo(); auto b = foo(); writeln(a, " ", b); } //---- This compiles and runs fine, even though memset is not ctfe-able.
Jul 24 2013
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 23, 2013 at 06:21:08PM +0200, JS wrote:
 There seems to be a lot of improvement for programming languages to
 optimize compile time aspects that are not taken into account. With
 ctfe I think such optimizations are more and more relevant in D.
[...]
 But there should be no reason why join(["a", "b", "c"]) can't be
 optimized at compile time so that the function call is replaced with
 "abc".
[...]
 To generate such "optimal" code(one that gets the compiler to do as
 much work as it can at compile time) is somewhat convoluted in D. We
 can use string mixins to solve the problem but not easily because we
 can't pass variadic arguments to templates.
 
 e.g.,
 
 join(T...)(T s)
 {
     mixin(tJoinH!(s))
 }
 
 fails because s is not known at compile time... even if s is
 partially known.
[...] Your CTFE-fu isn't good enough. ;-) Check this out: import std.stdio; template tuple(args...) { alias tuple = args; } /** * Given a tuple of strings, returns a tuple in which all * adjacent compile-time readable strings are concatenated. */ template tupleReduce(args...) { static if (args.length > 1) { static if (is(typeof(args[0])==string) && __traits(compiles, { enum x = args[0]; })) { static if (is(typeof(args[1])==string) && __traits(compiles, { enum x = args[1]; })) { alias tupleReduce = tupleReduce!(args[0] ~ args[1], args[2..$]); } else { alias tupleReduce = tuple!(args[0], args[1], tupleReduce!(args[2..$])); } } else { alias tupleReduce = tuple!(args[0], tupleReduce!(args[1..$])); } } else { alias tupleReduce = args; } } void main() { string x = "runtime1"; string y = "runtime2"; auto arr = [ tupleReduce!("a", "b", x, "c", "d", y, "e", "f") ]; writeln(arr); } The output is: ["ab", "runtime1", "cd", "runtime2", "ef"] which proves that all adjacent compile-time readable strings have been concatenated at compile-time. Using tupleReduce, you can optimize calls to join by writing: join(tupleReduce!("a", "b", x, "c", "d")); and it will be compiled as though you've written: join("ab", x, "cd"); Yeah, so tupleReduce looks a bit on the ugly side. But the point is that it works. So don't be so quick to say something is not possible in D, 'cos even with its current warts, D is one powerful beast. :-) T -- One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot
Jul 23 2013
parent "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Tuesday, 23 July 2013 at 18:10:54 UTC, H. S. Teoh wrote:
 	void main() {
 		string x = "runtime1";
 		string y = "runtime2";
 		auto arr = [ tupleReduce!("a", "b", x, "c", "d", y, "e", "f") 
 ];
 		writeln(arr);
 	}
Heh, missed this. Forgot about recursive aliasing too. My suggestion also is bogus since literals can't be aliased.
Jul 23 2013
prev sibling next sibling parent "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Tuesday, 23 July 2013 at 16:21:10 UTC, JS wrote:
 Now, think of join as defined like

 string join(T...)(T t);

 and we call it like

 join("a", s, "b", "c", ss);
To summarize to the best of my knowledge, the request is to provide string join(alias T...)() {} With the call changed to: join!("a", s, "b", "c", ss); I could be wrong here. I think think this is a reasonable request, but haven't looked at any implications of adding it.
Jul 23 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 23, 2013 at 11:09:24AM -0700, H. S. Teoh wrote:
[...]
 Using tupleReduce, you can optimize calls to join by writing:
 
 	join(tupleReduce!("a", "b", x, "c", "d"));
 
 and it will be compiled as though you've written:
 
 	join("ab", x, "cd");
[...] I've generalized tupleReduce to handle reduction of tuples by any CTFE-able binary function: import std.stdio; template tuple(args...) { alias tuple = args; } /** * Given a binary function and a tuple, returns a tuple in which all adjacent * compile-time readable items are recursively substituted with the return * value of the binary function applied to them. */ template tupleReduce(alias func, args...) { static if (args.length > 1) { static if (__traits(compiles, { enum x = args[0]; })) { static if (__traits(compiles, { enum x = args[1]; }) && is(typeof(func(args[0], args[1])))) { enum result = func(args[0], args[1]); alias tupleReduce = tupleReduce!(func, result, args[2..$]); } else { alias tupleReduce = tuple!(args[0], args[1], tupleReduce!(func, args[2..$])); } } else { alias tupleReduce = tuple!(args[0], tupleReduce!(func, args[1..$])); } } else { alias tupleReduce = args; } } /** * Given a tuple of strings, returns a tuple in which all adjacent compile-time * readable strings are concatenated. */ template tupleStringReduce(args...) { alias tupleStringReduce = tupleReduce!( (string x, string y) => x~y, args ); } void main() { string x = "runtime1"; string y = "runtime2"; auto arr = [ tupleStringReduce!("a", "b", x, "c", "d", y, "e", "f", "g", x) ]; writeln(arr); int i1=100, i2=200; auto brr = [ tupleReduce!((int x, int y) => x+y, 1, 2, i1, 3, 4, i2, 5, 6, 7) ]; writeln(brr); } The output is: ["ab", "runtime1", "cd", "runtime2", "efg", "runtime1"] [3, 100, 7, 200, 18] For the string case, to make user code more convenient, I've written an alias template tupleStringReduce() that basically supplies the binary function for concatenating adjacent strings. The second case shows how this can be explicitly specified via a function literal. The example here reduces adjacent compile-time readable ints via addition; you could substitute that with any binary function you want, say (x,y)=>x*y, or (x,y)=>x*2-y, or whatever. The only limit is that the function must be CTFE-able. With the help of the new tupleReduce, you can do all sorts of compile-time optimizations. D gives you power -- awesome power -- but you do have to learn how to tame the beast first. ;-) T -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
Jul 23 2013
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 23, 2013 at 12:42:18PM -0700, H. S. Teoh wrote:
[...]
 With the help of the new tupleReduce, you can do all sorts of
 compile-time optimizations.
[...] Here's an example of how to implement compile-time optimization of join() with the help of tupleReduce: // Just for the sake of example, the following function uses "," // to join the strings in the arguments. string join(string[] args...) { string result; string delim; foreach (arg; args) { result ~= delim ~ arg; delim = ","; // just for example } return result; } // This is the wrapper that uses tupleReduce (see my previous // post) to reduce the arguments at compile-time. For // illustration purposes, I omitted the "," from the reduction // function, so that you can tell from the output what has been // reduced at compile-time, and what is reduced at runtime. template optimizedJoin(args...) { string optimizedJoin() { return join(tupleReduce!((string x, string y) => x~y, args)); } } void main() { writeln(optimizedJoin!(x, "a", "bc", y, "de", "f")); } The output is: runtime1,abc,runtime2,def As you can see, "a" and "bc" have been concatenated at compile-time, and so have "de" and "f". Since x and y can't be read at compile-time, they remain as variable arguments. The optimizedJoin template then collects the elements of the reduced tuple and hands it off to join() to do the rest of the work at runtime. Notice that join() uses "delim ~ arg" to do the concatenation at runtime; we could ostensibly replace this with the same function literal we pass to tupleReduce in optimizedJoin(). Then you'll have a full implementation of join() that performs compile-time optimization of its parameters. I rest my case. :-) T -- In order to understand recursion you must first understand recursion.
Jul 23 2013
parent reply "JS" <js.mdnq gmail.com> writes:
On Tuesday, 23 July 2013 at 19:59:46 UTC, H. S. Teoh wrote:
 On Tue, Jul 23, 2013 at 12:42:18PM -0700, H. S. Teoh wrote:
 [...]
 With the help of the new tupleReduce, you can do all sorts of
 compile-time optimizations.
[...] Here's an example of how to implement compile-time optimization of join() with the help of tupleReduce: // Just for the sake of example, the following function uses "," // to join the strings in the arguments. string join(string[] args...) { string result; string delim; foreach (arg; args) { result ~= delim ~ arg; delim = ","; // just for example } return result; } // This is the wrapper that uses tupleReduce (see my previous // post) to reduce the arguments at compile-time. For // illustration purposes, I omitted the "," from the reduction // function, so that you can tell from the output what has been // reduced at compile-time, and what is reduced at runtime. template optimizedJoin(args...) { string optimizedJoin() { return join(tupleReduce!((string x, string y) => x~y, args)); } } void main() { writeln(optimizedJoin!(x, "a", "bc", y, "de", "f")); } The output is: runtime1,abc,runtime2,def As you can see, "a" and "bc" have been concatenated at compile-time, and so have "de" and "f". Since x and y can't be read at compile-time, they remain as variable arguments. The optimizedJoin template then collects the elements of the reduced tuple and hands it off to join() to do the rest of the work at runtime. Notice that join() uses "delim ~ arg" to do the concatenation at runtime; we could ostensibly replace this with the same function literal we pass to tupleReduce in optimizedJoin(). Then you'll have a full implementation of join() that performs compile-time optimization of its parameters. I rest my case. :-) T
I must study your powerful ctfe-fu master! Great fu you have! This is great that D can do this and from just a quick glance it looks like you solve the problem. Your code doesn't exactly do what I wanted but probably because I wasn't clear... I think it is modifiable and solves the problem I was having(again, assuming your code is correct). Note though that something like string x = "asd" join(x,...); // x is compile time known and should be joined at compile time. Not sure if D is smart enough to get this one? When I get some time later I'll check your code out, what I want to do is get something like string x = join(a, b, "a", "b"); to produce the same code as string x = a~b~"ab"; (no looping using for each except for string arrays) Thanks!
Jul 23 2013
next sibling parent reply "Yota" <yotaxp thatGoogleMailThing.com> writes:
On Tuesday, 23 July 2013 at 20:11:21 UTC, JS wrote:
 Your code doesn't exactly do what I wanted but probably because 
 I wasn't clear... I think it is modifiable and solves the 
 problem I was having(again, assuming your code is correct).

 Note though that something like

 string x = "asd"
 join(x,...); // x is compile time known and should be joined at 
 compile time. Not sure if D is smart enough to get this one?

 When I get some time later I'll check your code out, what I 
 want to do is get something like

 string x = join(a, b, "a", "b"); to produce the same code as
 string x = a~b~"ab";

 (no looping using for each except for string arrays)

 Thanks!
Don't forget that tuples can store more than just types! See this for instance. import std.stdio; typeof(Xs[0]) sum(Xs...)() { pragma(msg, Xs); // tuple(1, 2, 3) typeof(Xs[0]) y; foreach(x; Xs) y ~= x; return y; } int main() { auto n = sum!(1, 2, 3); enum m = sum!(1, 2, 3); pragma(msg, typeof(n)); // int writeln(n); // 6 writeln(m); // 6 return 0; } Works great. When I comment out the 'sum!("1", "2", "3")' and replace it with a literal '6', the compiled EXE is byte for byte exactly the same. You can exchange the '+=' for '~=' and it will concatenate strings instead. Of course, now all arguments MUST be known at compile time, but that's no surprise. PS: OK, just noticed that you were hoping to mix run time and compile time arguments. Oops.... Oh well, posting anyway because I think it's neat. lol As for a~"B"~"C" compiling into a~"BC", I think we just need to work on constant folding in the compiler. (If it doesn't already do this anyway.)
Jul 23 2013
next sibling parent "Yota" <yotaxp thatGoogleMailThing.com> writes:
On Tuesday, 23 July 2013 at 22:38:35 UTC, Yota wrote:
   typeof(Xs[0]) sum(Xs...)()
   {
       pragma(msg, Xs); // tuple(1, 2, 3)
       typeof(Xs[0]) y;
       foreach(x; Xs) y ~= x;
       return y;
   }
*facepalm* That ~= in there is what I get for copying code after I experiment on it. Ment for it to be a +=. Phooey ~Yota
Jul 23 2013
prev sibling parent reply "JS" <js.mdnq gmail.com> writes:
On Tuesday, 23 July 2013 at 22:38:35 UTC, Yota wrote:
 On Tuesday, 23 July 2013 at 20:11:21 UTC, JS wrote:
 Your code doesn't exactly do what I wanted but probably 
 because I wasn't clear... I think it is modifiable and solves 
 the problem I was having(again, assuming your code is correct).

 Note though that something like

 string x = "asd"
 join(x,...); // x is compile time known and should be joined 
 at compile time. Not sure if D is smart enough to get this one?

 When I get some time later I'll check your code out, what I 
 want to do is get something like

 string x = join(a, b, "a", "b"); to produce the same code as
 string x = a~b~"ab";

 (no looping using for each except for string arrays)

 Thanks!
Don't forget that tuples can store more than just types! See this for instance. import std.stdio; typeof(Xs[0]) sum(Xs...)() { pragma(msg, Xs); // tuple(1, 2, 3) typeof(Xs[0]) y; foreach(x; Xs) y ~= x; return y; } int main() { auto n = sum!(1, 2, 3); enum m = sum!(1, 2, 3); pragma(msg, typeof(n)); // int writeln(n); // 6 writeln(m); // 6 return 0; } Works great. When I comment out the 'sum!("1", "2", "3")' and replace it with a literal '6', the compiled EXE is byte for byte exactly the same. You can exchange the '+=' for '~=' and it will concatenate strings instead. Of course, now all arguments MUST be known at compile time, but that's no surprise. PS: OK, just noticed that you were hoping to mix run time and compile time arguments. Oops.... Oh well, posting anyway because I think it's neat. lol As for a~"B"~"C" compiling into a~"BC", I think we just need to work on constant folding in the compiler. (If it doesn't already do this anyway.)
Well, it's not that simple. Constant folding is easy when it is obvious to the compiler... but something buried in foreach loop that uses runtime and compile time objects is not obvious. My three points out of the post is: 1. D needs some way to make doing this stuff easier. 2. It needs to be done! ;) 3. Goto 1. I didn't realize D could even potentially do it, and still not sure if it can(have to check out T's post again and work with the code). I think there is a lot of potential power behind such techniques with regard to optimization. Suppose you are writing a matrix library. Wouldn't it be nice to have certain matrix computations that can be done at compile time actually done? (I suppose what I'm trying to do is extend the compiler's "constant folding" to be able to deal with much more complex cases). Another way to potentially do it, I think, is for the compiler to automatically expand ctfe functions and replace runtime entities with "placeholder" code which is then expanded at runtime. e.g., for a simple join join(T...)(T t) { string s; foreach(tt; t) s ~= tt; return s; } the compiler can unroll the loop to get s ~= t[0]~t[1]~t[2]~...; replace the compile time entities, e.g., suppose t[1] and t[3] are known at compile time at the invocation, then s ~= t[0]~"abcd"~t[2]~"asdf"; then this code could be used inline for the join call instead of the actual code. The key here is there is no dependence on the for loop. If all arguments are compile-time evaluable, then s could be just a reference lookup to a static string. If it can't be done, then no big deal, it is optimized as far as it can go. Only if all arguments passed are runtime entities will the exact code be used. This seems like it would take a pretty smart compiler and possibly it would have issues in some many cases. I personally don't mind writing the code itself but it seems pretty complex, and having no way to debug the code generation easily(AFAIK) makes it difficult to do. Possibly using version(ctfe) can help... I'll work on it when I get a chance...
Jul 23 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 24, 2013 at 01:35:23AM +0200, JS wrote:
[...]
 My three points out of the post is:
 
 1. D needs some way to make doing this stuff easier.
 2. It needs to be done! ;)
 3. Goto 1.
 
 I didn't realize D could even potentially do it, and still not sure
 if it can(have to check out T's post again and work with the code).
 
 I think there is a lot of potential power behind such techniques
 with regard to optimization.
 
 Suppose you are writing a matrix library. Wouldn't it be nice to
 have certain matrix computations that can be done at compile time
 actually done? (I suppose what I'm trying to do is extend the
 compiler's "constant folding" to be able to deal with much more
 complex cases).
[...] You're looking for expression templates. So far, nobody seems to have written expression templates in D yet (that I know of, anyway, I could be wrong). D should be more than well-equipped to handle expression templates, you just have to write them. (But having said that, they aren't exactly easy to write. :-P) I think the last time this topic came up, somebody mentioned the dreaded mythical AST macros, and the discussion got derailed. I've been meaning to write a D matrix library using expression templates, but I haven't gotten around to it yet. The basic idea behind expression templates is that overloaded operators (or function calls, if operator overloading isn't good enough for you) don't return concrete types, but a proxy wrapper type that gets unboxed into a real type when assigned to a concrete variable. The operators are overloaded to take these wrapper types as arguments, and basically construct a template tree paralleling the structure of the expression so that when you're finally assigning the result to a concrete variable, the assignment operator can traverse the expression tree, fold constants and do whatever other optimizations you wish, then generate optimized code for performing the operation represented by the tree. That's the theory anyway. In practice, expression templates are rather hard to write (and read!) if you're the unlucky library writer (it *is* really nice from the user's POV though!). They are also one of the sources of C++ error messages that span 15 pages per line, if you're ever unlucky enough to run into a problem. Last I was told, however, that the preferred approach in D is to use string mixins instead. You basically write a string containing an arbitrary expression (of arbtrary syntax, even!) that gets parsed by a CTFE lexer/parser that can perform arbitrary transformations on it, which generates D code as a string that you can use in a mixin. This is significantly more powerful than expression templates, because you're no longer constrained by the syntax of the host language, but your input string can be literally *anything*, like a DSL, or a math expression containing real, bona fide Unicode mathematical symbols for operators, sets, and what-have-you. You could even write your own language that way. All of it eventually gets compiled down into D code as a string mixin. Syntax-wise, there's the slight ugliness of having to write "mixin(MyDSL("A ± B / C → D"))", but that's more than outweighed by the total freedom of syntax within the input string itself. I saw an expression-template-based regex implementation in C++ once, and almost went blind. It overloaded C++ operators in rather ugly ways in order to work around the fact that C++ doesn't have native regex operators. By contrast, our own std.regex has a ctRegex function that essentially does the same thing while allowing you to use native regex syntax within a string literal. I think that's a far better approach for many cases. :) T -- It always amuses me that Windows has a Safe Mode during bootup. Does that mean that Windows is normally unsafe?
Jul 23 2013
parent "JS" <js.mdnq gmail.com> writes:
On Wednesday, 24 July 2013 at 01:07:15 UTC, H. S. Teoh wrote:
 On Wed, Jul 24, 2013 at 01:35:23AM +0200, JS wrote:
 [...]
 My three points out of the post is:
 
 1. D needs some way to make doing this stuff easier.
 2. It needs to be done! ;)
 3. Goto 1.
 
 I didn't realize D could even potentially do it, and still not 
 sure
 if it can(have to check out T's post again and work with the 
 code).
 
 I think there is a lot of potential power behind such 
 techniques
 with regard to optimization.
 
 Suppose you are writing a matrix library. Wouldn't it be nice 
 to
 have certain matrix computations that can be done at compile 
 time
 actually done? (I suppose what I'm trying to do is extend the
 compiler's "constant folding" to be able to deal with much more
 complex cases).
[...]
Your tuple reduce seems to work! Debugging the code shows no loops and memory debugging looks like the adjacent constants are folded! magnificent! As far as over optimizing, I am trying to write some library code and want to "get it right" the first time... having it not only optimize the best it can but also make the functions easier to use... e.g., variargs instead of nested calls. As far as ctfe running time issues, I believe they exist already? In any case, the compiler at some point has to mark a variable that is compile time evaluable as not evaluable at some point(probably at a function call) What we do know is arguments to ctfe's return compile time evaluable results... So it is easy to know that a variable is compile time evaluable but not necessarily actually be able to compute it at compile time.... a simple time out count with error/warning could be used... if failed it could be marked as a run-time entity at that point and beyond(so no optimization beyond the point where it can't be computed at compile time. Not perfect but covers 99.99% of cases.
 You're looking for expression templates. So far, nobody seems 
 to have
 written expression templates in D yet (that I know of, anyway, 
 I could
 be wrong). D should be more than well-equipped to handle 
 expression
 templates, you just have to write them. (But having said that, 
 they
 aren't exactly easy to write. :-P)
I think I'll pass, I'm still struggling just to learn D. You, OTH, seem like someone that can pull it off rather easily!
 I think the last time this topic came up, somebody mentioned 
 the dreaded
 mythical AST macros, and the discussion got derailed. I've been 
 meaning
 to write a D matrix library using expression templates, but I 
 haven't
 gotten around to it yet.

 The basic idea behind expression templates is that overloaded 
 operators
 (or function calls, if operator overloading isn't good enough 
 for you)
 don't return concrete types, but a proxy wrapper type that gets 
 unboxed
 into a real type when assigned to a concrete variable. The 
 operators are
 overloaded to take these wrapper types as arguments, and 
 basically
 construct a template tree paralleling the structure of the 
 expression so
 that when you're finally assigning the result to a concrete 
 variable,
 the assignment operator can traverse the expression tree, fold 
 constants
 and do whatever other optimizations you wish, then generate 
 optimized
 code for performing the operation represented by the tree.

 That's the theory anyway. In practice, expression templates are 
 rather
 hard to write (and read!) if you're the unlucky library writer 
 (it *is*
 really nice from the user's POV though!). They are also one of 
 the
 sources of C++ error messages that span 15 pages per line, if 
 you're
 ever unlucky enough to run into a problem.

 Last I was told, however, that the preferred approach in D is 
 to use
 string mixins instead. You basically write a string containing 
 an
 arbitrary expression (of arbtrary syntax, even!) that gets 
 parsed by a
 CTFE lexer/parser that can perform arbitrary transformations on 
 it,
 which generates D code as a string that you can use in a mixin. 
 This is
 significantly more powerful than expression templates, because 
 you're no
 longer constrained by the syntax of the host language, but your 
 input
 string can be literally *anything*, like a DSL, or a math 
 expression
 containing real, bona fide Unicode mathematical symbols for 
 operators,
 sets, and what-have-you. You could even write your own language 
 that
 way. All of it eventually gets compiled down into D code as a 
 string
 mixin.

 Syntax-wise, there's the slight ugliness of having to write
 "mixin(MyDSL("A ± B / C → D"))", but that's more than 
 outweighed by the
 total freedom of syntax within the input string itself. I saw an
 expression-template-based regex implementation in C++ once, and 
 almost
 went blind. It overloaded C++ operators in rather ugly ways in 
 order to
 work around the fact that C++ doesn't have native regex 
 operators. By
 contrast, our own std.regex has a ctRegex function that 
 essentially does
 the same thing while allowing you to use native regex syntax 
 within a
 string literal. I think that's a far better approach for many 
 cases. :)
I was wondering if that was possible. I was using string mixins to essentially accomplish the constant folding was thinking that one could implement java script or some other language in the strings and have them transformed. It could be pretty useful to have. I've still got a lot to learn about D but it seems nearly unbounded for what it can do compared to other languages. Thanks for making this work! I'll spend some time studying your code to learn how it works then try to apply it to my library code. Trying to do stripLeft/Right, split, join, etc... for strings. Each function taking an arbitrary number of run-time or compile-time char or string arguments with optimal code generation.
Jul 24 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 23, 2013 at 10:11:17PM +0200, JS wrote:
[...]
 Note though that something like
 
 string x = "asd"
 join(x,...); // x is compile time known and should be joined at
 compile time. Not sure if D is smart enough to get this one?
Well, if you want *that*, then you'll just have to submit a DMD pull request for the optimizer. ;-) The problem is that even though in this particular case it should be easy for the compiler to figure out, in the general case it's far from trivial. For example, you might write: string x = "asd"; x ~= "def"; join(x, ...); Arguably, the compiler should detect that the value of x is computable at compile-time, so it should apply the optimization. But the problem is, the code between the first line and the call to join() can be arbitrarily complex... for example: enum VeryLargeNumber = 2_391_382_391_483; string x = "asd"; if (canFactorize(VeryLargeNumber)) x ~= "x"; join(x, ...); Assuming canFactorize is CTFE-able, in theory the compiler should be able to figure out the value of x at compile-time. But integer factorization currently has no simple solution, which means your code will take 5 months to compile while DMD tries to solve integer factorization. Worse yet, you might write something like: string x = "asd"; void function(int x) fn = createArbitraryFunction(); if (canHalt(fn)) x ~= "x"; join(x, ...); Then your program will *never* finish compiling, because the halting problem is unsolvable, but nothing stops you from writing a CTFE-able solution that requires infinite time to find an answer.
 When I get some time later I'll check your code out, what I want to
 do is get something like
 
 string x = join(a, b, "a", "b"); to produce the same code as
 string x = a~b~"ab";
 
 (no looping using for each except for string arrays)
[...] That's pretty much what my code already does. Just use GDC and it will unroll the loop for you. ;-) Alternatively, since tupleReduce returns a tuple, you *could* just foreach over it (loops involving tuples are automatically unrolled at compile-time) and generate the code. That is, instead of using optimizedJoin() as a wrapper for a function, you could do something like this instead: string optimizedJoin(args...)() { string result; foreach (arg; tupleReduce!( (string x, string y) => x~y, args)) { result ~= arg; } return result; } I haven't tested this code, though, so I'm not 100% sure if it will work. But if not, a little massaging should make it work. On another note, though, it makes me wonder why you want this level of control over the generated code. Are you sure you're not doing premature optimization? You should be aware that the more you use such arcane tricks to generate this "optimized" code, the less the compiler's built-in optimizer will be able to understand your intent, and the more likely that it may be generating more inferior code than if you had just written the code the "normal" way. Generally, this level of manual optimization should only be done after you've profiled your code and identified the hotspots. I know from experience that more often than not, what you *think* is a hotspot actually isn't. It's only after running a profiler that you find out where the *real* hotspots are. :) I used to be a hardcore optimization freak -- used to read assembly output from the compiler to identify inefficiencies -- until one day I actually ran a profiler on my code. Turns out that the real hotspot is completely NOT where I expected it. All those late nights spent "optimizing" my code, only to get negligible performance gain, when a 1-line change in the *real* hotspot improved performance by 30% instantly (and only took me 5 minutes instead of hours and hours). T -- My program has no bugs! Only undocumented features...
Jul 23 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 23, 2013 at 12:58:11PM -0700, H. S. Teoh wrote:
[...]
 	void main() {
Oops, forgot to include the definition of x and y in main(): string x = "runtime1"; string y = "runtime2";
 		writeln(optimizedJoin!(x, "a", "bc", y, "de", "f"));
 	}
T -- People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
Jul 23 2013
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 24, 2013 at 08:12:36PM +0200, JS wrote:
 T,
 
 I've looked over your code and understand what you are doing but I
 don't understand the expression
 
      is(typeof(func(args[0], args[1]))))
 
 you are checking if func(a,b)'s return type is a type?
That's the usual idiom for checking if it's legal to pass args[0] and args[1] to func(). If it's legal, then typeof(func(args[0],args[1])) will return some type, which then passes the check. If it's illegal, then the call to func has no type because the compiler rejects the call; say you have func(int,int) and you attempted to call func("a","a"). So then typeof(func(...)) will not be a valid type, and the check fails. On Wed, Jul 24, 2013 at 08:42:48PM +0200, JS wrote:
 On Wednesday, 24 July 2013 at 18:12:37 UTC, JS wrote:
 T,
 
 Also, I've tried to wrap the template in a function and I can't get
 it to work:
 
 	
 	string join(T...)(string delim, T args)
 	{
 		return tupleStringReduce!(args);
 	}
 
 it seems tupleStringReduce just returns the tuple of of the
 optimized strings. I'll still need to write code that will deal
 mixing in the actual compile time and runtime code.
You can't return a tuple as a string. :) Mainly because the joining has only happened to the string literals, but you haven't told the compiler how the tuple itself is to be joined into a string. I thought I already gave example code to do this: string join(string[] args...) { string result; string delim; // You need this in order for the compiler to know what // you wanna do with the optimized tuple. foreach (arg; args) { result ~= delim ~ arg; delim = ","; // just for example } return result; } template optimizedJoin(args...) { // This wrapper optimizes the tuple and passes the // result to join(). string optimizedJoin() { return join(tupleReduce!((string x, string y) => x~y, args)); } } Note that you have to use template calling syntax for optimizedJoin, like this: string result = optimizedJoin!("a", "b", x, "c", "d"); I did try wrapping it so that you can use normal runtime call syntax, but it doesn't work because the tuple reduction must happen at template expansion stage. It's too late to do it in CTFE proper.
 This doesn't seem that hard as tupleReduce essentially outlines the
 approach. The difference is instead of calling func directly I sort
 of need to mix it in as a code string.
 
 e.g., for tupleStringReduce, func is x~y. I need to build a string
 using that "((arg[0]~arg[1])~arg[2])..." which is then mixed in at
 compile time but at runtime evaluates to a string(not an array).
 
 Since there is no .codeof, AFAIK, I guess the only way to do this is
 pass func, and the func code string("x~y") in this case.
 
 The idea though is to use the lamba as a ctfe to reduce the tuple
 when possible but *also* use it at runtime to evaluate the tuple.
[...] Alright, here's an improved version of the code: import std.stdio; template tuple(args...) { alias tuple = args; } /** * Given a binary function and a tuple, returns a tuple in which all adjacent * compile-time readable items are recursively substituted with the return * value of the binary function applied to them. */ template tupleReduce(alias func, args...) { static if (args.length > 1) { static if (__traits(compiles, { enum x = args[0]; })) { static if (__traits(compiles, { enum x = args[1]; }) && is(typeof(func(args[0], args[1])))) { enum result = func(args[0], args[1]); alias tupleReduce = tupleReduce!(func, result, args[2..$]); } else { alias tupleReduce = tuple!(args[0], args[1], tupleReduce!(func, args[2..$])); } } else { alias tupleReduce = tuple!(args[0], tupleReduce!(func, args[1..$])); } } else { alias tupleReduce = args; } } // I decided to use a real variadic instead of an array, to // force loop-unrolling. // Strictly speaking, we should add a signature constraint to // enforce uniform types in T..., but I was too lazy to figure // out how to phrase allSatisfy to check for that. For // production code, though you'll definitely want to do this. auto reduceImpl(alias fun, T...)(T args) { import std.functional; typeof(unaryFun!fun(args[0],args[0])) result; foreach (arg; args) { result = unaryFun!fun(result, arg); } return result; } // Nice wrapper for reduceImpl, so that we don't have to type // the lambda twice. template reduce(alias fun, args...) { string reduce() { return reduceImpl!fun(tupleReduce!(fun, args)); } } /** * Joins a list of strings. Adjacent CTFEable strings are joined * at compile-time into single strings, while runtime arguments * are joined at runtime. */ template join(args...) { string join() { // Behold, only one instance of the lambda! It // gets used both at compile-time and runtime. return reduce!((string x, string y) => x~y, args); } } void main() { // Example of how to use tupleReduce for compile-time // optimization of join: string x = "(runtime1)"; string y = "(runtime2)"; // Note that you still have to use compile-time argument // syntax for join. writeln(join!(x, "a", "bc", y, "de", "f", "g")); } Compiling with dmd -O -inline produced two nested function calls for the join!(...), though inside, the string literals have already been concatenated. Compiling with gdc -O3, OTOH, produced an unrolled loop in Dmain which basically sequentially appends to the result of join!(), i.e., it's as though I wrote: void main() { string x = "(runtime1)"; string y = "(runtime2)"; writeln(x ~ "abc" ~ y ~ "defg"); } You can't get much better than that. :) I think trying to repackage join!(...) as join(...) is a fool's errand, though. You'll likely only end up with pain. :) IMO writing join!(...) is a Good Enough(tm) compromise given that we have already achieved maximal compile-time optimization. T -- ASCII stupid question, getty stupid ANSI.
Jul 24 2013
parent reply "JS" <js.mdnq gmail.com> writes:
On Wednesday, 24 July 2013 at 19:52:57 UTC, H. S. Teoh wrote:
 On Wed, Jul 24, 2013 at 08:12:36PM +0200, JS wrote:
 T,
 
 I've looked over your code and understand what you are doing 
 but I
 don't understand the expression
 
      is(typeof(func(args[0], args[1]))))
 
 you are checking if func(a,b)'s return type is a type?
That's the usual idiom for checking if it's legal to pass args[0] and args[1] to func(). If it's legal, then typeof(func(args[0],args[1])) will return some type, which then passes the check. If it's illegal, then the call to func has no type because the compiler rejects the call; say you have func(int,int) and you attempted to call func("a","a"). So then typeof(func(...)) will not be a valid type, and the check fails. On Wed, Jul 24, 2013 at 08:42:48PM +0200, JS wrote:
 On Wednesday, 24 July 2013 at 18:12:37 UTC, JS wrote:
 T,
 
 Also, I've tried to wrap the template in a function and I 
 can't get
 it to work:
 
 	
 	string join(T...)(string delim, T args)
 	{
 		return tupleStringReduce!(args);
 	}
 
 it seems tupleStringReduce just returns the tuple of of the
 optimized strings. I'll still need to write code that will deal
 mixing in the actual compile time and runtime code.
You can't return a tuple as a string. :) Mainly because the joining has only happened to the string literals, but you haven't told the compiler how the tuple itself is to be joined into a string. I thought I already gave example code to do this: string join(string[] args...) { string result; string delim; // You need this in order for the compiler to know what // you wanna do with the optimized tuple. foreach (arg; args) { result ~= delim ~ arg; delim = ","; // just for example } return result; } template optimizedJoin(args...) { // This wrapper optimizes the tuple and passes the // result to join(). string optimizedJoin() { return join(tupleReduce!((string x, string y) => x~y, args)); } } Note that you have to use template calling syntax for optimizedJoin, like this: string result = optimizedJoin!("a", "b", x, "c", "d"); I did try wrapping it so that you can use normal runtime call syntax, but it doesn't work because the tuple reduction must happen at template expansion stage. It's too late to do it in CTFE proper.
 This doesn't seem that hard as tupleReduce essentially 
 outlines the
 approach. The difference is instead of calling func directly I 
 sort
 of need to mix it in as a code string.
 
 e.g., for tupleStringReduce, func is x~y. I need to build a 
 string
 using that "((arg[0]~arg[1])~arg[2])..." which is then mixed 
 in at
 compile time but at runtime evaluates to a string(not an 
 array).
 
 Since there is no .codeof, AFAIK, I guess the only way to do 
 this is
 pass func, and the func code string("x~y") in this case.
 
 The idea though is to use the lamba as a ctfe to reduce the 
 tuple
 when possible but *also* use it at runtime to evaluate the 
 tuple.
[...] Alright, here's an improved version of the code: import std.stdio; template tuple(args...) { alias tuple = args; } /** * Given a binary function and a tuple, returns a tuple in which all adjacent * compile-time readable items are recursively substituted with the return * value of the binary function applied to them. */ template tupleReduce(alias func, args...) { static if (args.length > 1) { static if (__traits(compiles, { enum x = args[0]; })) { static if (__traits(compiles, { enum x = args[1]; }) && is(typeof(func(args[0], args[1])))) { enum result = func(args[0], args[1]); alias tupleReduce = tupleReduce!(func, result, args[2..$]); } else { alias tupleReduce = tuple!(args[0], args[1], tupleReduce!(func, args[2..$])); } } else { alias tupleReduce = tuple!(args[0], tupleReduce!(func, args[1..$])); } } else { alias tupleReduce = args; } } // I decided to use a real variadic instead of an array, to // force loop-unrolling. // Strictly speaking, we should add a signature constraint to // enforce uniform types in T..., but I was too lazy to figure // out how to phrase allSatisfy to check for that. For // production code, though you'll definitely want to do this. auto reduceImpl(alias fun, T...)(T args) { import std.functional; typeof(unaryFun!fun(args[0],args[0])) result; foreach (arg; args) { result = unaryFun!fun(result, arg); } return result; } // Nice wrapper for reduceImpl, so that we don't have to type // the lambda twice. template reduce(alias fun, args...) { string reduce() { return reduceImpl!fun(tupleReduce!(fun, args)); } } /** * Joins a list of strings. Adjacent CTFEable strings are joined * at compile-time into single strings, while runtime arguments * are joined at runtime. */ template join(args...) { string join() { // Behold, only one instance of the lambda! It // gets used both at compile-time and runtime. return reduce!((string x, string y) => x~y, args); } } void main() { // Example of how to use tupleReduce for compile-time // optimization of join: string x = "(runtime1)"; string y = "(runtime2)"; // Note that you still have to use compile-time argument // syntax for join. writeln(join!(x, "a", "bc", y, "de", "f", "g")); } Compiling with dmd -O -inline produced two nested function calls for the join!(...), though inside, the string literals have already been concatenated. Compiling with gdc -O3, OTOH, produced an unrolled loop in Dmain which basically sequentially appends to the result of join!(), i.e., it's as though I wrote: void main() { string x = "(runtime1)"; string y = "(runtime2)"; writeln(x ~ "abc" ~ y ~ "defg"); } You can't get much better than that. :) I think trying to repackage join!(...) as join(...) is a fool's errand, though. You'll likely only end up with pain. :) IMO writing join!(...) is a Good Enough(tm) compromise given that we have already achieved maximal compile-time optimization.
Cool, this is definitely on the right track! I like how you are able to generate the expansion reduction without string mixins(the naive way I would try and do it). The main things I was after are achieved: Loop unrolling, mixed-time constant folding, and variable arguments. While I think I understand the overall idea you are using it will take me a bit to grasp exactly how you were able to get it all done correctly. I have two requests if you don't mind and have the time... Allow for array reduction and delim/indexing. When using join we typically want to join an array of strings while also inserting a delimiter between them... usually with the last one removed. Sometimes we want to know the index and maximum array size to do specialized joins. e.g., join("a", "b", ["c", "d"]) we might want to print out "4: 1a, 2b, 3c, 4d". Using your lambda's this is somewhat easy to achieve, (string a, int i, int m) => { return ((i == 0) ? to!string(m)~": " : "")~to!string(i)~a~((i < m) ? ", " : ""); } this would be a unary applied to each string... either at compile time, if possible, or runtime. I'm not sure how to achieve this because of the runtime/compile time issue... but I think it's similar to how to mention the use of reduce being used at compile time and runtime. As far as the array issue goes, it seems that a check for an array is needed then func needs to be checked if it can work on the array element and applied in a similar fashion... but just too much complexity for my little brain to grasp ;/ Having such features should 1. Provide a join function that is optimal in all conditions but also mimic the traditional join function. 2. Demonstrate techniques that produce such optimal performing code and make normal functions more performant and expressive(e.g., works well with variable args). I can imagine an analogous Boost like library for D that provides the optimal performance possible by getting the compiler to do as much work as possible behind the scenes. Once I get these programming techniques down I'll hopefully be able to start implementing some of routines(string stuff, math, etc...). At that point, if no one beats me to the punch I'll try to see if it was all worth it in the first place ;)
Jul 25 2013
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jul 25, 2013 at 03:47:23PM +0200, JS wrote:
[...]
 Cool, this is definitely on the right track! I like how you are able
 to generate the expansion reduction without string mixins(the naive
 way I would try and do it).
String mixins are sorta like nuclear warheads, a last resort when absolutely nothing else will work. They're usually overkill for most tasks. Generally, other less drastic alternatives should be tried first. :)
 The main things I was after are achieved: Loop unrolling, mixed-time
 constant folding, and variable arguments.
While it has been a fun exercise, I still think loop unrolling is in the province of the compiler's optimizer (e.g. gdc -O3). Unless you're writing performance-critical inner loops in a 3D game engine or a hard real-time application, it seems like premature optimization to me. As for constant folding, I wonder if expression templates may be a better approach for the general case. Tuple reduction works well when the reduction can be done pairwise, but in more complicated cases, like optimizing matrix expressions, say, you probably need something more sophisticated. Still, I think it would be cleaner if the language provided a way for the compiler to support such optimizations, so that we can take advantage of its built-in optimizer instead of reinventing the wheel in user code. I've often thought about how one might go about specifying expression reduction rules for user-defined types, like associativity for matrix expressions, etc., that tell the compiler what kind of rules the custom operators obey, that the optimizer can make use of.
 While I think I understand the overall idea you are using it will
 take me a bit to grasp exactly how you were able to get it all done
 correctly.
Maybe you should take the time to study the code, then next time you can do it yourself. ;-)
 I have two requests if you don't mind and have the time...
 
 Allow for array reduction and delim/indexing. When using join we
 typically want to join an array of strings while also inserting a
 delimiter between them... usually with the last one removed.
We already have std.algorithm.join... yes, yes I know you want compile-time optimization, so that can be an interesting exercise for your learning. ;-) But for starters, one way to achieve this is to preprocess your tuple to insert delimiters between elements, then hand it over to tupleReduce to optimize it. Something like this might do it: // Warning: untested code template insertDelim(string delim, args...) { static if (args.length > 1) { alias insertDelim = tuple!(args[0], delim, insertDelim!(args[1..$])); } else { alias insertDelim = args; } } // This should do the trick. Use insertDelim to insert the // delimiters, then tupleReduce it to concatenate all // compile-time-known arguments, then hand over the rest to the // runtime function as before. template join(string delim, args...) { string join() { return joinImpl(tupleReduce!( (string x, string y) => x~y, insertDelim!(delim, args))); } } If I didn't make a mistake in the above code, it should be able to optimize: join!(",", "a", "b", x, "c", "d"); into: "a,b," ~ x ~ ",c,d"
 Sometimes we want to know the index and maximum array size to do
 specialized joins.
[...] That's trivial, tuples can be indexed just like arrays at compile-time, and have a .length property just like arrays. T -- "Real programmers can write assembly code in any language. :-)" -- Larry Wall
Jul 25 2013
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jul 25, 2013 at 07:57:30AM -0700, H. S. Teoh wrote:
 On Thu, Jul 25, 2013 at 03:47:23PM +0200, JS wrote:
[...]
 Allow for array reduction and delim/indexing. When using join we
 typically want to join an array of strings while also inserting a
 delimiter between them... usually with the last one removed.
[...] Sigh... I could never resist a challenge. Here's a full, working, compilable, runnable implementation: import std.stdio; template tuple(args...) { alias tuple = args; } /** * Given a binary function and a tuple, returns a tuple in which all adjacent * compile-time readable items are recursively substituted with the return * value of the binary function applied to them. */ template tupleReduce(alias func, args...) { static if (args.length > 1) { static if (__traits(compiles, { enum x = args[0]; })) { static if (__traits(compiles, { enum x = args[1]; }) && is(typeof(func(args[0], args[1])))) { enum result = func(args[0], args[1]); alias tupleReduce = tupleReduce!(func, result, args[2..$]); } else { alias tupleReduce = tuple!(args[0], args[1], tupleReduce!(func, args[2..$])); } } else { alias tupleReduce = tuple!(args[0], tupleReduce!(func, args[1..$])); } } else { alias tupleReduce = args; } } auto reduceImpl(alias fun, T...)(T args) { import std.functional; typeof(unaryFun!fun(args[0],args[0])) result; foreach (arg; args) { result = unaryFun!fun(result, arg); } return result; } template reduce(alias fun, args...) { string reduce() { return reduceImpl!fun(tupleReduce!(fun, args)); } } template join(args...) { string join() { return reduce!((string x, string y) => x~y, args); } } template insertDelim(string delim, args...) { static if (args.length > 1) alias insertDelim = tuple!(args[0], delim, insertDelim!(delim, args[1..$])); else alias insertDelim = args; } template joinWithDelim(string delim, args...) { alias joinWithDelim = join!(insertDelim!(delim, args)); } void joinDelimTest() { // Example of joinWithDelim string x = "runtime1"; string y = "runtime2"; string z = "runtime3"; writeln(joinWithDelim!(":", "a", "b", x, "c", y, z, "d")); // Here's the proof that the delimiters have been concatenated with // adjacent items at compile-time where possible: writeln([ tupleReduce!((string x, string y) => x~y, insertDelim!(":", "a", "b", x, "c", y, z, "d")) ]); } void main() { joinDelimTest(); } The key points of note are the joinWithDelim template, which implements a fully-working join with specifiable delimiter, and the code in joinDelimTest() that illustrates how to use it and what it does. The program produces this output: a:b:runtime1:c:runtime2:runtime3:d ["a:b:", "runtime1", ":c:", "runtime2", ":", "runtime3", ":d"] The last line indicates that the delimiters have been concatenated with all adjacent compile-time readable arguments, where possible. So basically, we start with: "a" "b" x "c" y z "d" then delimiters are inserted: "a" ":" "b" ":" x ":" "c" ":" y ":" z ":" "d" then this is optimized at compile-time into: "a:b:" x ":c:" y ":" z ":d" By capturing this stage in an array, we get to see what the optimized tuple looks like (2nd line of output), before joinWithDelim, at runtime, finally interpolates x, y, and z into the result (1st line of output). Looking at the disassembly of the program, compiled with gdc -O3, the call to joinWithDelim is inside its own function (reasonable, since it's rather large), in which there are 7 array concatenations, as expected. (If the delimiters weren't optimized at compile-time, we'd have 13 concatenations instead.) That is, it's as if we wrote: "a:b" ~ x ~ ":c:" ~ y ~ ":" ~ z ~ ":d" (The first concatenation happens on the empty array of the return value; that's why there are 7 rather than 6.) Exercise for the reader: ;-) Large numbers of array concatenations are not efficient at runtime. Ideally, if the number of elements to be concatenated is > N, for some threshold N, we should use std.array.appender instead of the built-in ~. Rewrite the code to do this switchover to std.array.appender given some value of N, say 5. (Hint: since the length of tuples is conveniently available at compile-time as .length, we can just check if .length is greater than 5, and if so, instead of using ~ we call a different template function that declares a string appender, then use .put on it in a foreach over the tuple -- remember that foreach over tuples are expanded at compile-time.) (Unfortunately, this means that we can't use reduce!() anymore, since appender can only be used at runtime; at compile-time we still have to use ~. So we can't use the same lambda at both compile-time and runtime.) T -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
Jul 25 2013
parent reply "JS" <js.mdnq gmail.com> writes:
Thanks.. I am working on getting the code to work with unrolling 
an array:


module main;

import std.stdio, std.traits;

template tuple(args...)
{
     alias tuple = args;
}

template tUnrollArray(args...)
{
	static if (isArray!(typeof(args)))
	{
		pragma(msg, "isarray");
		static if (args.length > 1)
		{
			pragma(msg, "unrolling");
			alias tUnrollArray = tuple!(arg[0], args[1..$]);
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArray = tuple!(args[0][0], 
tUnrollArray!(args[0][1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArray = tuple!(args[0][0]);
			}
		}
	}
	else
		alias tUnrollArray = args;
}

void main(string[] argv)
{
	auto z = ["a", "b"];
         writeln(tUnrollArray!(["a", "b"]));
	writeln(tUnrollArray!(z));    // fails
	readln();
}


which seems to follow along the lines of your tuple reduce.

The problem is that I would like for it to unroll compile time 
known variables. I think we talked about that already and you 
said it wasn't a good idea because of potential infinite loops 
for some types of templates(which, I still think in most cases it 
should be ok, such as above.

Making z immutable does work but I don't get why making it static 
doesn't.

In any case, if the code looks correct I'll try and add the 
ability to have tupleReduce join elements that are arrays.

The alias stuff is starting to make some sense but having to do 
recursion for simple iterations isn't very fun ;/
Jul 25 2013
parent reply "JS" <js.mdnq gmail.com> writes:
The following code works better. Forgot to allow for multiple 
arguments. Runtime variables are not unrolled though but I'll 
work on understanding how you were able to unify the two as one(I 
think in your reduceImpl).

module main;

import std.stdio, std.traits;

template tuple(args...)
{
     alias tuple = args;
}

template tUnrollArgs(args...)
{
	static if (isArray!(typeof(args[0])))
	{
		pragma(msg, "isarray");
		static if (args.length > 1)
		{
			pragma(msg, "unrolling");
			alias tUnrollArgs = tuple!(tUnrollArgs!(args[0]), args[1..$]);
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArgs = tuple!(args[0][0], 
tUnrollArgs!(args[0][1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArgs = tuple!(args[0][0]);
			}
		}
	}
	else
		static if (args[0].length > 1)
			alias tUnrollArgs = tuple!(args[0], 
tUnrollArgs!(args[0][1..$]));
		else
			alias tUnrollArgs = args;
}

void main(string[] argv)
{
	auto z = ["1", "2"];
     writeln(tUnrollArgs!(["a", "b"], z, "c"));
	readln();
}
Jul 26 2013
parent "JS" <js.mdnq gmail.com> writes:
oops, that was a mess... I guess I need to learn how to program. 
This one is not much better but at least works. I'll have to 
rewrite the code.


module main;

import std.stdio, std.traits;

template tuple(args...)
{
     alias tuple = args;
}

template tUnrollArgs(args...)
{
	pragma(msg, ">"~args.stringof);
	static if (args.length == 0)
		alias tUnrollArgs = tuple!();
	else
	static if (isArray!(typeof(args[0])))
	{
		pragma(msg, "isarray");
		static if (args[0].length > 1)
		{
			pragma(msg, "unrolling");
			static if (args[0].length > 1)
			{
				pragma(msg, "length > 1");
				alias tUnrollArgs = tuple!(tUnrollArgs!(args[0][0]), 
tUnrollArgs!(args[0][1..$],args[1..$]));
			}
			else
			{
				pragma(msg, "length == 0 || 1");
				alias tUnrollArgs = args[0];
			}
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArgs = tuple!(args[0][0], 
tUnrollArgs!(args[0][1..$], args[1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArgs = tuple!(args[0][0], 
tUnrollArgs!(args[1..$]));
			}
		}
	}
	else
		//static if (args[0].length > 1)
			//alias tUnrollArgs = tuple!(args[0], 
tUnrollArgs!(args[0][1..$]));
		//else
			alias tUnrollArgs = args;
}

void main(string[] argv)
{
	immutable auto z = ["1", "2"];
     writeln(tUnrollArgs!(["a", "b"], z, "c"));
	readln();
}
Jul 26 2013