digitalmars.D - Compile-time optimization

JS (179/179) Jul 23 2013 There seems to be a lot of improvement for programming languages

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (8/17) Jul 23 2013 Not me. :(

monarch_dodra (38/44) Jul 24 2013 Hum... really? I don't think that exists. Either that, or I'm

monarch_dodra (4/5) Jul 24 2013 Goddamnit. There goes my credibility -_-'

JS (3/8) Jul 24 2013 lol... I was wondering were the hell that came from. I thought

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (3/6) Jul 24 2013 You are right. I must be remembering old proposals about it.

H. S. Teoh (11/20) Jul 24 2013 [...]

JS (5/5) Jul 24 2013 T,

JS (24/24) Jul 24 2013 On Wednesday, 24 July 2013 at 18:12:37 UTC, JS wrote:

monarch_dodra (28/48) Jul 24 2013 That's not true either. The code will compile, and CTFE will only

H. S. Teoh (65/85) Jul 23 2013 [...]

Jesse Phillips (3/10) Jul 23 2013 Heh, missed this. Forgot about recursive aliasing too.

Jesse Phillips (9/13) Jul 23 2013 To summarize to the best of my knowledge, the request is to
H. S. Teoh (86/93) Jul 23 2013 [...]
H. S. Teoh (46/48) Jul 23 2013 [...]

JS (17/72) Jul 23 2013 I must study your powerful ctfe-fu master! Great fu you have!

Yota (30/43) Jul 23 2013 Don't forget that tuples can store more than just types! See

Yota (5/12) Jul 23 2013 *facepalm*
JS (48/97) Jul 23 2013 Well, it's not that simple. Constant folding is easy when it is

H. S. Teoh (52/69) Jul 23 2013 [...]

JS (35/143) Jul 24 2013 Your tuple reduce seems to work! Debugging the code shows no

H. S. Teoh (73/85) Jul 23 2013 Well, if you want *that*, then you'll just have to submit a DMD pull

H. S. Teoh (8/11) Jul 23 2013 Oops, forgot to include the definition of x and y in main():
H. S. Teoh (150/186) Jul 24 2013 That's the usual idiom for checking if it's legal to pass args[0] and

JS (42/252) Jul 25 2013 Cool, this is definitely on the right track! I like how you are

H. S. Teoh (62/77) Jul 25 2013
H. S. Teoh (148/152) Jul 25 2013 [...]

JS (55/55) Jul 25 2013 Thanks.. I am working on getting the code to work with unrolling

JS (48/48) Jul 26 2013 The following code works better. Forgot to allow for multiple

JS (62/62) Jul 26 2013 oops, that was a mess... I guess I need to learn how to program.

"JS" <js.mdnq gmail.com> writes:

There seems to be a lot of improvement for programming languages 
to optimize compile time aspects that are not taken into account. 
With ctfe I think such optimizations are more and more relevant 
in D.

I'll give a simple example:

Take a standard string joining function:

string join(string[] s)
{
     string ss;
     foreach(sss; s) ss ~= sss;
     return ss
}

when this string function is called at run-time with literal 
arguments it is extremely inefficient:

join(["a", "b", "c"]);

because the result can obviously be computed at compile-time by 
the compiler.

using ctfe can solve this problem but only when used in const 
results.

But there should be no reason why join(["a", "b", "c"]) can't be 
optimized at compile time so that the function call is replaced 
with "abc".

Now, think of join as defined like

string join(T...)(T t);

and we call it like

join("a", s, "b", "c", ss);

where s is string that is compile time evaluable and ss is a 
run-time string array.

the optimal join code would be something like

r = "a"~s~"bc"
foreach(sss; ss) r ~= sss;

(and which would change depending on the argument types and if 
they are compile time evaluable)


To generate such "optimal" code(one that gets the compiler to do 
as much work as it can at compile time) is somewhat convoluted in 
D. We can use string mixins to solve the problem but not easily 
because we can't pass variadic arguments to templates.

e.g.,

join(T...)(T s)
{
     mixin(tJoinH!(s))
}

fails because s is not known at compile time... even if s is 
partially known.

It would be nice if we could pass the arguments to a template 
even if they are not completely known at compile time.

e.g., join("a", s),

tJoinH can receive "a" and s but s's value, if not compile-time 
known, is undefined.

Hence, tJoinH can optimize the arguments by attempting to join 
the compile-time arguments and insert optimal code for the 
run-time arguments.

That is, if we call a function with an argument list, we should 
be able to pass that to a template without any errors and be able 
to determine the value of each argument... if the argument is 
undefined, then that becomes it's value.

Another way to see it work is with sum:

sum(1,2,x);

int sum(T...)(T i)
{
     // return 1 + 2 + i[3]; for the call above, it is optimal and 
our goal to generate
     // return mixin(tSumH!(i)); // hypothetically generates 1 + 2 
+ i[3] for the call above
     // return mixin(tSumH2!(i)); // generates i[1] + i[2] + i[3]. 
Current possible way

     // naive approach(assumes all arguments are known only at 
run-time):
     int y;
     foreach(ii; i) y += ii;
     return y;
}


I'm able to achieve this partially by passing T, rather than i, 
to the template function and the variable name, i. This works but 
may not produce optimal results(referencing the stack instead of 
the value of a literal... which prevents it from being further 
optimized).

e.g., from above, instead of return 1 + 2 + i[3]; I would 
generate return i[1] + i[2] + i[3]; Not optimal because i[1] and 
i[2] are treated as run-time entities even though in the above 
case they are not.


If D had some model to create functions that can optimize the way 
they deal with their compile-time arguments and run-time 
arguments then it would be pretty easy to create such functions.

e.g.,

string join(T...)(T t)    // Note join has a variadic parameter
{
     ctfe {
          // called when join is used in ctfe, t can be used to 
get the values directly if they exist at compile time, else their 
value is undefined.
          // can call join here with run-time entities

     }

     // runtime version goes here
}

we might have something like this

string join(T...)(T t) // easy notation to constrain T on char, 
string, string[]
{
     ctfe
     {
         string code = "string s;";
         foreach(k, val; t)
         {
             if (isDefined!val)
                 code ~= `s ~= "`~join(val)~`";`; // calls join as 
a ctfe on val.
             else
             {
                 if (type == string[])
                     code ~= `foreach(ss; t[`~k~`]) s~= sss;`;
                 } else code ~= `s ~= t[`~k~`];`;
             }
         }
         code ~= "return s;";
         mixin code;
     }

     // concatenates all arguments of t as if they were run-time 
arguments.
     string s;
     foreach(k, ss; t)
         static if (T[k].type == string[])
             foreach(sss; ss)
                 s ~= sss;
         else
             s ~= ss;
     return s;
}


I'm not saying this works but the idea is that the ctfe block is 
executed at compile time WHEN there are non-compile time 
arguments(ok, maybe ctfe isn't the best keyword). The block then 
figures out how to best deal with the non-ct arguments.

In the above code note how join is called in the ctfe block to 
handle known compile time types. This is just standard ctfe 
execution. The real "magic" happens when it is dealing with 
non-ct arguments, in which case it builds up meta code which is 
used as a mixin(special statement at end of ctfe block) that is 
inserted.

So for join("a", s); the code would execute something like

at compile time(in ctfe block)
     code = `string s;`;
     "a" is defined so
     code ~= `s ~= "~join("a")~`";`;
         join("a") is called, since "a" is known, the non-ctfe 
block is called, so we actually have
     code ~= `s ~= "a";`;
  next for s(which is undefined at compile time)
     code ~= `s ~= t[1];`;
  end
     code ~= "return s;";
so the string mixin code looks like:

     string s;
     s ~= "a";
     s ~= t[1];
     return s;

or simplified,

    return "a"~t[1];

which is an optimal join code for the join call join("a", s);

(it would make more sense to call a template in the ctfe block so 
we don't have form code strings of code strings)


Note also that having such a ctfe block does not break existing 
code structures and pre existing functions can easily be upgraded 
for optimal behavior.


Having the compiler do as much optimization as possible should 
produce a significant speedup in most applications.

When a function is called with any known compile time entities, 
it is possible for an optimization to take place. I think if D 
incorporates some mechanism in it's language to do so it will go 
a long.

In fact, all this simply demonstrates that ctfe's are not as 
powerful as they can be. Calling a ctfe'able function with just 
one run-time variable will prevent it from being ctfe'ed and 
optimized completely... even though there is still some 
possibility for optimization.


Hopefully some will read this acutely and attempt to ponder its 
potential.  I expect the immediate naysayers who can't see past 
their own noses... and those that get hung up on non-essential 
details(I said that the block doesn't have to be called ctfe... 
oh hell, you do realize we don't even need to use a block?).

Jul 23 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 07/23/2013 09:21 AM, JS wrote:

 string join(T...)(T t)    // Note join has a variadic parameter
 {
      ctfe {

Are you aware of version(ctfe)?

 Hopefully some will read this acutely

Not me. :(

 and attempt to ponder its potential.

I understood that much though...

 I expect the immediate naysayers who can't see past their own 

noses... and
 those that get hung up on non-essential details(I said that the block
 doesn't have to be called ctfe... oh hell, you do realize we don't 

even need
 to use a block?).

That's me! :)

Ali

Jul 23 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali Çehreli wrote:
 On 07/23/2013 09:21 AM, JS wrote:

 string join(T...)(T t)    // Note join has a variadic

 parameter
 {
      ctfe {

 Are you aware of version(ctfe)?

Hum... really? I don't think that exists. Either that, or I'm 
using it wrong. Could you provide an example?

What you do have is "if (__ctfe)" though...

//----
import std.stdio;

int foo()
{
     version (poop)
         return 1;
     else
         return 0;
}

void main(string[] args)
{
     enum a = foo();
     auto b = foo();
     writeln(a, " ", b);
}
//----
Prints: 0 0

//----
import std.stdio;

int foo()
{
     if (__ctfe)
         return 1;
     else
         return 0;
}

void main(string[] args)
{
     enum a = foo();
     auto b = foo();
     writeln(a, " ", b);
}
//----
Prints: 1 0

Jul 24 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 24 July 2013 at 09:56:41 UTC, monarch_dodra wrote:
     version (poop)

Goddamnit. There goes my credibility -_-'

I was testing to see if I got a compile error with an inexistant 
version tag. Please replace that with "ctfe"

Jul 24 2013

"JS" <js.mdnq gmail.com> writes:

On Wednesday, 24 July 2013 at 09:58:02 UTC, monarch_dodra wrote:
 On Wednesday, 24 July 2013 at 09:56:41 UTC, monarch_dodra wrote:
    version (poop)

 Goddamnit. There goes my credibility -_-'

 I was testing to see if I got a compile error with an 
 inexistant version tag. Please replace that with "ctfe"

lol... I was wondering were the hell that came from. I thought 
maybe it meant you wrote that part of the code on the toilet ;)

Jul 24 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 07/24/2013 02:56 AM, monarch_dodra wrote:

 On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali Çehreli wrote:

 Are you aware of version(ctfe)?

 Hum... really? I don't think that exists.

You are right. I must be remembering old proposals about it.

Ali

Jul 24 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jul 24, 2013 at 09:22:14AM -0700, Ali �ehreli wrote:
 On 07/24/2013 02:56 AM, monarch_dodra wrote:
 
 On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali �ehreli wrote:

 
 Are you aware of version(ctfe)?

 Hum... really? I don't think that exists.

 
 You are right. I must be remembering old proposals about it.

[...]

There's if(__ctfe), but unfortunately (IIRC) it doesn't work with static
if, because CTFE runs after static ifs have been processed. So it might
not be enough if you have code that has parts that can't run under CTFE.
Those parts will taint all parent functions up the call tree as
non-CTFEable, so once you have them you're out of luck.


T

-- 
We are in class, we are supposed to be learning, we have a teacher... Is
it too much that I expect him to teach me??? -- RL

Jul 24 2013

"JS" <js.mdnq gmail.com> writes:

T,

I've looked over your code and understand what you are doing but 
I don't understand the expression

      is(typeof(func(args[0], args[1]))))

you are checking if func(a,b)'s return type is a type?

Jul 24 2013

"JS" <js.mdnq gmail.com> writes:

On Wednesday, 24 July 2013 at 18:12:37 UTC, JS wrote:
T,

Also, I've tried to wrap the template in a function and I can't 
get it to work:

	
	string join(T...)(string delim, T args)
	{
		return tupleStringReduce!(args);
	}

it seems tupleStringReduce just returns the tuple of of the 
optimized strings. I'll still need to write code that will deal 
mixing in the actual compile time and runtime code.

This doesn't seem that hard as tupleReduce essentially outlines 
the approach. The difference is instead of calling func directly 
I sort of need to mix it in as a code string.

e.g., for tupleStringReduce, func is x~y. I need to build a 
string using that "((arg[0]~arg[1])~arg[2])..." which is then 
mixed in at compile time but at runtime evaluates to a string(not 
an array).

Since there is no .codeof, AFAIK, I guess the only way to do this 
is pass func, and the func code string("x~y") in this case.

The idea though is to use the lamba as a ctfe to reduce the tuple 
when possible but *also* use it at runtime to evaluate the tuple.


Any ideas?

Jul 24 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 24 July 2013 at 17:10:01 UTC, H. S. Teoh wrote:
 On Wed, Jul 24, 2013 at 09:22:14AM -0700, Ali Çehreli wrote:
 On 07/24/2013 02:56 AM, monarch_dodra wrote:
 
 On Tuesday, 23 July 2013 at 17:41:01 UTC, Ali Çehreli wrote:

 
 Are you aware of version(ctfe)?

 Hum... really? I don't think that exists.

 
 You are right. I must be remembering old proposals about it.

 [...]

 There's if(__ctfe), but unfortunately (IIRC) it doesn't work 
 with static
 if, because CTFE runs after static ifs have been processed. So 
 it might
 not be enough if you have code that has parts that can't run 
 under CTFE.
 Those parts will taint all parent functions up the call tree as
 non-CTFEable, so once you have them you're out of luck.

 T

That's not true either. The code will compile, and CTFE will only 
complain if it *runs* into code that can't be executed 
CTFE-style. For example:

//----
import std.stdio;
import core.stdc.string;

int foo()
{
     if(__ctfe)
     {
         return 1;
     }
     else
     {
         int i = void;
         memset(&i, 0, 4);
         return i;
     }
}

void main(string[] args)
{
     enum a = foo();
     auto b = foo();
     writeln(a, " ", b);
}
//----

This compiles and runs fine, even though memset is not ctfe-able.

Jul 24 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 23, 2013 at 06:21:08PM +0200, JS wrote:
 There seems to be a lot of improvement for programming languages to
 optimize compile time aspects that are not taken into account. With
 ctfe I think such optimizations are more and more relevant in D.

[...]
 But there should be no reason why join(["a", "b", "c"]) can't be
 optimized at compile time so that the function call is replaced with
 "abc".

[...]
 To generate such "optimal" code(one that gets the compiler to do as
 much work as it can at compile time) is somewhat convoluted in D. We
 can use string mixins to solve the problem but not easily because we
 can't pass variadic arguments to templates.
 
 e.g.,
 
 join(T...)(T s)
 {
     mixin(tJoinH!(s))
 }
 
 fails because s is not known at compile time... even if s is
 partially known.

[...]

Your CTFE-fu isn't good enough. ;-) Check this out:

	import std.stdio;

	template tuple(args...) {
		alias tuple = args;
	}

	/**
	 * Given a tuple of strings, returns a tuple in which all
	 * adjacent compile-time readable strings are concatenated.
	 */
	template tupleReduce(args...)
	{
		static if (args.length > 1)
		{
			static if (is(typeof(args[0])==string) &&
				__traits(compiles, { enum x = args[0]; }))
			{
				static if (is(typeof(args[1])==string) &&
					__traits(compiles, { enum x = args[1]; }))
				{
					alias tupleReduce = tupleReduce!(args[0] ~
								args[1], args[2..$]);
				}
				else
				{
					alias tupleReduce = tuple!(args[0], args[1],
								tupleReduce!(args[2..$]));
				}
			}
			else
			{
				alias tupleReduce = tuple!(args[0],
							tupleReduce!(args[1..$]));
			}
		}
		else
		{
			alias tupleReduce = args;
		}
	}

	void main() {
		string x = "runtime1";
		string y = "runtime2";
		auto arr = [ tupleReduce!("a", "b", x, "c", "d", y, "e", "f") ];
		writeln(arr);
	}

The output is:

	["ab", "runtime1", "cd", "runtime2", "ef"]

which proves that all adjacent compile-time readable strings have been
concatenated at compile-time.

Using tupleReduce, you can optimize calls to join by writing:

	join(tupleReduce!("a", "b", x, "c", "d"));

and it will be compiled as though you've written:

	join("ab", x, "cd");

Yeah, so tupleReduce looks a bit on the ugly side. But the point is that
it works. So don't be so quick to say something is not possible in D,
'cos even with its current warts, D is one powerful beast.  :-)


T

-- 
One disk to rule them all, One disk to find them. One disk to bring them
all and in the darkness grind them. In the Land of Redmond where the
shadows lie. -- The Silicon Valley Tarot

Jul 23 2013

"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:

On Tuesday, 23 July 2013 at 18:10:54 UTC, H. S. Teoh wrote:
 	void main() {
 		string x = "runtime1";
 		string y = "runtime2";
 		auto arr = [ tupleReduce!("a", "b", x, "c", "d", y, "e", "f") 
 ];
 		writeln(arr);
 	}

Heh, missed this. Forgot about recursive aliasing too.

My suggestion also is bogus since literals can't be aliased.

Jul 23 2013

"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:

On Tuesday, 23 July 2013 at 16:21:10 UTC, JS wrote:
 Now, think of join as defined like

 string join(T...)(T t);

 and we call it like

 join("a", s, "b", "c", ss);

To summarize to the best of my knowledge, the request is to 
provide

string join(alias T...)() {}

With the call changed to:

join!("a", s, "b", "c", ss);

I could be wrong here.

I think think this is a reasonable request, but haven't looked at 
any implications of adding it.

Jul 23 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 23, 2013 at 11:09:24AM -0700, H. S. Teoh wrote:
[...]
 Using tupleReduce, you can optimize calls to join by writing:
 
 	join(tupleReduce!("a", "b", x, "c", "d"));
 
 and it will be compiled as though you've written:
 
 	join("ab", x, "cd");

[...]

I've generalized tupleReduce to handle reduction of tuples by any
CTFE-able binary function:

	import std.stdio;
	
	template tuple(args...) {
		alias tuple = args;
	}
	
	/**
	 * Given a binary function and a tuple, returns a tuple in which all adjacent
	 * compile-time readable items are recursively substituted with the return
	 * value of the binary function applied to them.
	 */
	template tupleReduce(alias func, args...)
	{
		static if (args.length > 1)
		{
			static if (__traits(compiles, { enum x = args[0]; }))
			{
				static if (__traits(compiles, { enum x = args[1]; }) &&
					is(typeof(func(args[0], args[1]))))
				{
					enum result = func(args[0], args[1]);
					alias tupleReduce = tupleReduce!(func, result,
						args[2..$]);
				}
				else
				{
					alias tupleReduce = tuple!(args[0], args[1],
						tupleReduce!(func, args[2..$]));
				}
			}
			else
			{
				alias tupleReduce = tuple!(args[0],
					tupleReduce!(func, args[1..$]));
			}
		}
		else
		{
			alias tupleReduce = args;
		}
	}
	
	/**
	 * Given a tuple of strings, returns a tuple in which all adjacent compile-time
	 * readable strings are concatenated.
	 */
	template tupleStringReduce(args...)
	{
		alias tupleStringReduce = tupleReduce!(
			(string x, string y) => x~y, args
		);
	}
	
	void main() {
		string x = "runtime1";
		string y = "runtime2";
		auto arr = [ tupleStringReduce!("a", "b", x, "c", "d", y, "e", "f", "g", x) ];
		writeln(arr);
	
		int i1=100, i2=200;
		auto brr = [ tupleReduce!((int x, int y) => x+y, 1, 2, i1, 3, 4, i2, 5, 6, 7)
];
		writeln(brr);
	}

The output is:

	["ab", "runtime1", "cd", "runtime2", "efg", "runtime1"]
	[3, 100, 7, 200, 18]

For the string case, to make user code more convenient, I've written an
alias template tupleStringReduce() that basically supplies the binary
function for concatenating adjacent strings.

The second case shows how this can be explicitly specified via a
function literal. The example here reduces adjacent compile-time
readable ints via addition; you could substitute that with any binary
function you want, say (x,y)=>x*y, or (x,y)=>x*2-y, or whatever. The
only limit is that the function must be CTFE-able.

With the help of the new tupleReduce, you can do all sorts of
compile-time optimizations.

D gives you power -- awesome power -- but you do have to learn how to
tame the beast first. ;-)


T

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it.
-- Donald Knuth

Jul 23 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 23, 2013 at 12:42:18PM -0700, H. S. Teoh wrote:
[...]
 With the help of the new tupleReduce, you can do all sorts of
 compile-time optimizations.

[...]

Here's an example of how to implement compile-time optimization of
join() with the help of tupleReduce:

	// Just for the sake of example, the following function uses ","
	// to join the strings in the arguments.
	string join(string[] args...) {
		string result;
		string delim;
		foreach (arg; args) {
			result ~= delim ~ arg;
			delim = ",";	// just for example
		}
		return result;
	}

	// This is the wrapper that uses tupleReduce (see my previous
	// post) to reduce the arguments at compile-time. For
	// illustration purposes, I omitted the "," from the reduction
	// function, so that you can tell from the output what has been
	// reduced at compile-time, and what is reduced at runtime.
	template optimizedJoin(args...)
	{
		string optimizedJoin() {
			return join(tupleReduce!((string x, string y) => x~y, args));
		}
	}

	void main() {
		writeln(optimizedJoin!(x, "a", "bc", y, "de", "f"));
	}

The output is:

	runtime1,abc,runtime2,def

As you can see, "a" and "bc" have been concatenated at compile-time, and
so have "de" and "f". Since x and y can't be read at compile-time, they
remain as variable arguments. The optimizedJoin template then collects
the elements of the reduced tuple and hands it off to join() to do the
rest of the work at runtime.

Notice that join() uses "delim ~ arg" to do the concatenation at
runtime; we could ostensibly replace this with the same function literal
we pass to tupleReduce in optimizedJoin(). Then you'll have a full
implementation of join() that performs compile-time optimization of its
parameters.

I rest my case. :-)


T

-- 
In order to understand recursion you must first understand recursion.

Jul 23 2013

"JS" <js.mdnq gmail.com> writes:

On Tuesday, 23 July 2013 at 19:59:46 UTC, H. S. Teoh wrote:
 On Tue, Jul 23, 2013 at 12:42:18PM -0700, H. S. Teoh wrote:
 [...]
 With the help of the new tupleReduce, you can do all sorts of
 compile-time optimizations.

 [...]

 Here's an example of how to implement compile-time optimization 
 of
 join() with the help of tupleReduce:

 	// Just for the sake of example, the following function uses 
 ","
 	// to join the strings in the arguments.
 	string join(string[] args...) {
 		string result;
 		string delim;
 		foreach (arg; args) {
 			result ~= delim ~ arg;
 			delim = ",";	// just for example
 		}
 		return result;
 	}

 	// This is the wrapper that uses tupleReduce (see my previous
 	// post) to reduce the arguments at compile-time. For
 	// illustration purposes, I omitted the "," from the reduction
 	// function, so that you can tell from the output what has been
 	// reduced at compile-time, and what is reduced at runtime.
 	template optimizedJoin(args...)
 	{
 		string optimizedJoin() {
 			return join(tupleReduce!((string x, string y) => x~y, args));
 		}
 	}

 	void main() {
 		writeln(optimizedJoin!(x, "a", "bc", y, "de", "f"));
 	}

 The output is:

 	runtime1,abc,runtime2,def

 As you can see, "a" and "bc" have been concatenated at 
 compile-time, and
 so have "de" and "f". Since x and y can't be read at 
 compile-time, they
 remain as variable arguments. The optimizedJoin template then 
 collects
 the elements of the reduced tuple and hands it off to join() to 
 do the
 rest of the work at runtime.

 Notice that join() uses "delim ~ arg" to do the concatenation at
 runtime; we could ostensibly replace this with the same 
 function literal
 we pass to tupleReduce in optimizedJoin(). Then you'll have a 
 full
 implementation of join() that performs compile-time 
 optimization of its
 parameters.

 I rest my case. :-)


 T


I must study your powerful ctfe-fu master! Great fu you have!

This is great that D can do this and from just a quick glance it 
looks like you solve the problem.

Your code doesn't exactly do what I wanted but probably because I 
wasn't clear... I think it is modifiable and solves the problem I 
was having(again, assuming your code is correct).

Note though that something like

string x = "asd"
join(x,...); // x is compile time known and should be joined at 
compile time. Not sure if D is smart enough to get this one?

When I get some time later I'll check your code out, what I want 
to do is get something like

string x = join(a, b, "a", "b"); to produce the same code as
string x = a~b~"ab";

(no looping using for each except for string arrays)

Thanks!

Jul 23 2013

"Yota" <yotaxp thatGoogleMailThing.com> writes:

On Tuesday, 23 July 2013 at 20:11:21 UTC, JS wrote:
 Your code doesn't exactly do what I wanted but probably because 
 I wasn't clear... I think it is modifiable and solves the 
 problem I was having(again, assuming your code is correct).

 Note though that something like

 string x = "asd"
 join(x,...); // x is compile time known and should be joined at 
 compile time. Not sure if D is smart enough to get this one?

 When I get some time later I'll check your code out, what I 
 want to do is get something like

 string x = join(a, b, "a", "b"); to produce the same code as
 string x = a~b~"ab";

 (no looping using for each except for string arrays)

 Thanks!


Don't forget that tuples can store more than just types!  See 
this for instance.

   import std.stdio;

   typeof(Xs[0]) sum(Xs...)()
   {
       pragma(msg, Xs); // tuple(1, 2, 3)
       typeof(Xs[0]) y;
       foreach(x; Xs) y ~= x;
       return y;
   }

   int main() {
       auto n = sum!(1, 2, 3);
       enum m = sum!(1, 2, 3);

       pragma(msg, typeof(n)); // int
       writeln(n); // 6
       writeln(m); // 6

       return 0;
   }

Works great.  When I comment out the 'sum!("1", "2", "3")' and 
replace it with a literal '6', the compiled EXE is byte for byte 
exactly the same.  You can exchange the '+=' for '~=' and it will 
concatenate strings instead.  Of course, now all arguments MUST 
be known at compile time, but that's no surprise.

PS: OK, just noticed that you were hoping to mix run time and 
compile time arguments.  Oops....
Oh well, posting anyway because I think it's neat. lol

As for a~"B"~"C" compiling into a~"BC", I think we just need to 
work on constant folding in the compiler. (If it doesn't already 
do this anyway.)

Jul 23 2013

"Yota" <yotaxp thatGoogleMailThing.com> writes:

On Tuesday, 23 July 2013 at 22:38:35 UTC, Yota wrote:
   typeof(Xs[0]) sum(Xs...)()
   {
       pragma(msg, Xs); // tuple(1, 2, 3)
       typeof(Xs[0]) y;
       foreach(x; Xs) y ~= x;
       return y;
   }

*facepalm*
That ~= in there is what I get for copying code after I 
experiment on it.  Ment for it to be a +=. Phooey

~Yota

Jul 23 2013

"JS" <js.mdnq gmail.com> writes:

On Tuesday, 23 July 2013 at 22:38:35 UTC, Yota wrote:
 On Tuesday, 23 July 2013 at 20:11:21 UTC, JS wrote:
 Your code doesn't exactly do what I wanted but probably 
 because I wasn't clear... I think it is modifiable and solves 
 the problem I was having(again, assuming your code is correct).

 Note though that something like

 string x = "asd"
 join(x,...); // x is compile time known and should be joined 
 at compile time. Not sure if D is smart enough to get this one?

 When I get some time later I'll check your code out, what I 
 want to do is get something like

 string x = join(a, b, "a", "b"); to produce the same code as
 string x = a~b~"ab";

 (no looping using for each except for string arrays)

 Thanks!


 Don't forget that tuples can store more than just types!  See 
 this for instance.

   import std.stdio;

   typeof(Xs[0]) sum(Xs...)()
   {
       pragma(msg, Xs); // tuple(1, 2, 3)
       typeof(Xs[0]) y;
       foreach(x; Xs) y ~= x;
       return y;
   }

   int main() {
       auto n = sum!(1, 2, 3);
       enum m = sum!(1, 2, 3);

       pragma(msg, typeof(n)); // int
       writeln(n); // 6
       writeln(m); // 6

       return 0;
   }

 Works great.  When I comment out the 'sum!("1", "2", "3")' and 
 replace it with a literal '6', the compiled EXE is byte for 
 byte exactly the same.  You can exchange the '+=' for '~=' and 
 it will concatenate strings instead.  Of course, now all 
 arguments MUST be known at compile time, but that's no surprise.

 PS: OK, just noticed that you were hoping to mix run time and 
 compile time arguments.  Oops....
 Oh well, posting anyway because I think it's neat. lol

 As for a~"B"~"C" compiling into a~"BC", I think we just need to 
 work on constant folding in the compiler. (If it doesn't 
 already do this anyway.)

Well, it's not that simple. Constant folding is easy when it is 
obvious to the compiler... but something buried in foreach loop 
that uses runtime and compile time objects is not obvious.

My three points out of the post is:

1. D needs some way to make doing this stuff easier.
2. It needs to be done! ;)
3. Goto 1.

I didn't realize D could even potentially do it, and still not 
sure if it can(have to check out T's post again and work with the 
code).

I think there is a lot of potential power behind such techniques 
with regard to optimization.

Suppose you are writing a matrix library. Wouldn't it be nice to 
have certain matrix computations that can be done at compile time 
actually done? (I suppose what I'm trying to do is extend the 
compiler's "constant folding" to be able to deal with much more 
complex cases).


Another way to potentially do it, I think, is for the compiler to 
automatically expand ctfe functions and replace runtime entities 
with "placeholder" code which is then expanded at runtime.

e.g., for a simple join

join(T...)(T t)
{
     string s;
     foreach(tt; t)
         s ~= tt;
     return s;
}

the compiler can unroll the loop to get

s ~= t[0]~t[1]~t[2]~...;

replace the compile time entities, e.g., suppose t[1] and t[3] 
are known at compile time at the invocation, then

s ~= t[0]~"abcd"~t[2]~"asdf";

then this code could be used inline for the join call instead of 
the actual code. The key here is there is no dependence on the 
for loop. If all arguments are compile-time evaluable, then s 
could be just a reference lookup to a static string. If it can't 
be done, then no big deal, it is optimized as far as it can go. 
Only if all arguments passed are runtime entities will the exact 
code be used.

This seems like it would take a pretty smart compiler and 
possibly it would have issues in some many cases. I personally 
don't mind writing the code itself but it seems pretty complex, 
and having no way to debug the code generation easily(AFAIK) 
makes it difficult to do.

Possibly using version(ctfe) can help... I'll work on it when I 
get a chance...

Jul 23 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jul 24, 2013 at 01:35:23AM +0200, JS wrote:
[...]
 My three points out of the post is:
 
 1. D needs some way to make doing this stuff easier.
 2. It needs to be done! ;)
 3. Goto 1.
 
 I didn't realize D could even potentially do it, and still not sure
 if it can(have to check out T's post again and work with the code).
 
 I think there is a lot of potential power behind such techniques
 with regard to optimization.
 
 Suppose you are writing a matrix library. Wouldn't it be nice to
 have certain matrix computations that can be done at compile time
 actually done? (I suppose what I'm trying to do is extend the
 compiler's "constant folding" to be able to deal with much more
 complex cases).

[...]

You're looking for expression templates. So far, nobody seems to have
written expression templates in D yet (that I know of, anyway, I could
be wrong). D should be more than well-equipped to handle expression
templates, you just have to write them. (But having said that, they
aren't exactly easy to write. :-P)

I think the last time this topic came up, somebody mentioned the dreaded
mythical AST macros, and the discussion got derailed. I've been meaning
to write a D matrix library using expression templates, but I haven't
gotten around to it yet.

The basic idea behind expression templates is that overloaded operators
(or function calls, if operator overloading isn't good enough for you)
don't return concrete types, but a proxy wrapper type that gets unboxed
into a real type when assigned to a concrete variable. The operators are
overloaded to take these wrapper types as arguments, and basically
construct a template tree paralleling the structure of the expression so
that when you're finally assigning the result to a concrete variable,
the assignment operator can traverse the expression tree, fold constants
and do whatever other optimizations you wish, then generate optimized
code for performing the operation represented by the tree.

That's the theory anyway. In practice, expression templates are rather
hard to write (and read!) if you're the unlucky library writer (it *is*
really nice from the user's POV though!). They are also one of the
sources of C++ error messages that span 15 pages per line, if you're
ever unlucky enough to run into a problem.

Last I was told, however, that the preferred approach in D is to use
string mixins instead. You basically write a string containing an
arbitrary expression (of arbtrary syntax, even!) that gets parsed by a
CTFE lexer/parser that can perform arbitrary transformations on it,
which generates D code as a string that you can use in a mixin. This is
significantly more powerful than expression templates, because you're no
longer constrained by the syntax of the host language, but your input
string can be literally *anything*, like a DSL, or a math expression
containing real, bona fide Unicode mathematical symbols for operators,
sets, and what-have-you. You could even write your own language that
way. All of it eventually gets compiled down into D code as a string
mixin.

Syntax-wise, there's the slight ugliness of having to write
"mixin(MyDSL("A ± B / C → D"))", but that's more than outweighed by the
total freedom of syntax within the input string itself. I saw an
expression-template-based regex implementation in C++ once, and almost
went blind. It overloaded C++ operators in rather ugly ways in order to
work around the fact that C++ doesn't have native regex operators. By
contrast, our own std.regex has a ctRegex function that essentially does
the same thing while allowing you to use native regex syntax within a
string literal. I think that's a far better approach for many cases. :)


T

-- 
It always amuses me that Windows has a Safe Mode during bootup. Does
that mean that Windows is normally unsafe?

Jul 23 2013

"JS" <js.mdnq gmail.com> writes:

On Wednesday, 24 July 2013 at 01:07:15 UTC, H. S. Teoh wrote:
 On Wed, Jul 24, 2013 at 01:35:23AM +0200, JS wrote:
 [...]
 My three points out of the post is:
 
 1. D needs some way to make doing this stuff easier.
 2. It needs to be done! ;)
 3. Goto 1.
 
 I didn't realize D could even potentially do it, and still not 
 sure
 if it can(have to check out T's post again and work with the 
 code).
 
 I think there is a lot of potential power behind such 
 techniques
 with regard to optimization.
 
 Suppose you are writing a matrix library. Wouldn't it be nice 
 to
 have certain matrix computations that can be done at compile 
 time
 actually done? (I suppose what I'm trying to do is extend the
 compiler's "constant folding" to be able to deal with much more
 complex cases).

 [...]

Your tuple reduce seems to work! Debugging the code shows no 
loops and memory debugging looks like the adjacent constants are 
folded! magnificent!

As far as over optimizing, I am trying to write some library code 
and want to "get it right" the first time... having it not only 
optimize the best it can but also make the functions easier to 
use... e.g., variargs instead of nested calls.

As far as ctfe running time issues, I believe they exist already?

In any case, the compiler at some point has to mark a variable 
that is compile time evaluable as not evaluable at some 
point(probably at a function call)

What we do know is arguments to ctfe's return compile time 
evaluable results... So it is easy to know that a variable is 
compile time evaluable but not necessarily actually be able to 
compute it at compile time.... a simple time out count with 
error/warning could be used... if failed it could be marked as a 
run-time entity at that point and beyond(so no optimization 
beyond the point where it can't be computed at compile time. Not 
perfect but covers 99.99% of cases.





 You're looking for expression templates. So far, nobody seems 
 to have
 written expression templates in D yet (that I know of, anyway, 
 I could
 be wrong). D should be more than well-equipped to handle 
 expression
 templates, you just have to write them. (But having said that, 
 they
 aren't exactly easy to write. :-P)

I think I'll pass, I'm still struggling just to learn D. You, 
OTH, seem like someone that can pull it off rather easily!
 I think the last time this topic came up, somebody mentioned 
 the dreaded
 mythical AST macros, and the discussion got derailed. I've been 
 meaning
 to write a D matrix library using expression templates, but I 
 haven't
 gotten around to it yet.

 The basic idea behind expression templates is that overloaded 
 operators
 (or function calls, if operator overloading isn't good enough 
 for you)
 don't return concrete types, but a proxy wrapper type that gets 
 unboxed
 into a real type when assigned to a concrete variable. The 
 operators are
 overloaded to take these wrapper types as arguments, and 
 basically
 construct a template tree paralleling the structure of the 
 expression so
 that when you're finally assigning the result to a concrete 
 variable,
 the assignment operator can traverse the expression tree, fold 
 constants
 and do whatever other optimizations you wish, then generate 
 optimized
 code for performing the operation represented by the tree.

 That's the theory anyway. In practice, expression templates are 
 rather
 hard to write (and read!) if you're the unlucky library writer 
 (it *is*
 really nice from the user's POV though!). They are also one of 
 the
 sources of C++ error messages that span 15 pages per line, if 
 you're
 ever unlucky enough to run into a problem.

 Last I was told, however, that the preferred approach in D is 
 to use
 string mixins instead. You basically write a string containing 
 an
 arbitrary expression (of arbtrary syntax, even!) that gets 
 parsed by a
 CTFE lexer/parser that can perform arbitrary transformations on 
 it,
 which generates D code as a string that you can use in a mixin. 
 This is
 significantly more powerful than expression templates, because 
 you're no
 longer constrained by the syntax of the host language, but your 
 input
 string can be literally *anything*, like a DSL, or a math 
 expression
 containing real, bona fide Unicode mathematical symbols for 
 operators,
 sets, and what-have-you. You could even write your own language 
 that
 way. All of it eventually gets compiled down into D code as a 
 string
 mixin.

 Syntax-wise, there's the slight ugliness of having to write
 "mixin(MyDSL("A ± B / C → D"))", but that's more than 
 outweighed by the
 total freedom of syntax within the input string itself. I saw an
 expression-template-based regex implementation in C++ once, and 
 almost
 went blind. It overloaded C++ operators in rather ugly ways in 
 order to
 work around the fact that C++ doesn't have native regex 
 operators. By
 contrast, our own std.regex has a ctRegex function that 
 essentially does
 the same thing while allowing you to use native regex syntax 
 within a
 string literal. I think that's a far better approach for many 
 cases. :)

I was wondering if that was possible. I was using string mixins 
to essentially accomplish the constant folding was thinking that 
one could implement java script or some other language in the 
strings and have them transformed. It could be pretty useful to 
have. I've still got a lot to learn about D but it seems nearly 
unbounded for what it can do compared to other languages.

Thanks for making this work! I'll spend some time studying your 
code to learn how it works then try to apply it to my library 
code.

Trying to do stripLeft/Right, split, join, etc... for strings. 
Each function taking an arbitrary number of run-time or 
compile-time char or string arguments with optimal code 
generation.

Jul 24 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 23, 2013 at 10:11:17PM +0200, JS wrote:
[...]
 Note though that something like
 
 string x = "asd"
 join(x,...); // x is compile time known and should be joined at
 compile time. Not sure if D is smart enough to get this one?

Well, if you want *that*, then you'll just have to submit a DMD pull
request for the optimizer. ;-)

The problem is that even though in this particular case it should be
easy for the compiler to figure out, in the general case it's far from
trivial. For example, you might write:

	string x = "asd";
	x ~= "def";
	join(x, ...);

Arguably, the compiler should detect that the value of x is computable
at compile-time, so it should apply the optimization. But the problem
is, the code between the first line and the call to join() can be
arbitrarily complex... for example:

	enum VeryLargeNumber = 2_391_382_391_483;
	string x = "asd";
	if (canFactorize(VeryLargeNumber))
		x ~= "x";
	join(x, ...);

Assuming canFactorize is CTFE-able, in theory the compiler should be
able to figure out the value of x at compile-time. But integer
factorization currently has no simple solution, which means your code
will take 5 months to compile while DMD tries to solve integer
factorization. Worse yet, you might write something like:

	string x = "asd";
	void function(int x) fn = createArbitraryFunction();
	if (canHalt(fn))
		x ~= "x";
	join(x, ...);

Then your program will *never* finish compiling, because the halting
problem is unsolvable, but nothing stops you from writing a CTFE-able
solution that requires infinite time to find an answer.


 When I get some time later I'll check your code out, what I want to
 do is get something like
 
 string x = join(a, b, "a", "b"); to produce the same code as
 string x = a~b~"ab";
 
 (no looping using for each except for string arrays)

[...]

That's pretty much what my code already does.  Just use GDC and it will
unroll the loop for you. ;-)

Alternatively, since tupleReduce returns a tuple, you *could* just
foreach over it (loops involving tuples are automatically unrolled at
compile-time) and generate the code. That is, instead of using
optimizedJoin() as a wrapper for a function, you could do something like
this instead:

	string optimizedJoin(args...)() {
		string result;
		foreach (arg; tupleReduce!(
				(string x, string y) => x~y, args))
		{
			result ~= arg;
		}
		return result;
	}

I haven't tested this code, though, so I'm not 100% sure if it will
work. But if not, a little massaging should make it work.

On another note, though, it makes me wonder why you want this level of
control over the generated code. Are you sure you're not doing premature
optimization? You should be aware that the more you use such arcane
tricks to generate this "optimized" code, the less the compiler's
built-in optimizer will be able to understand your intent, and the more
likely that it may be generating more inferior code than if you had just
written the code the "normal" way.

Generally, this level of manual optimization should only be done after
you've profiled your code and identified the hotspots. I know from
experience that more often than not, what you *think* is a hotspot
actually isn't. It's only after running a profiler that you find out
where the *real* hotspots are. :)  I used to be a hardcore optimization
freak -- used to read assembly output from the compiler to identify
inefficiencies -- until one day I actually ran a profiler on my code.
Turns out that the real hotspot is completely NOT where I expected it.
All those late nights spent "optimizing" my code, only to get negligible
performance gain, when a 1-line change in the *real* hotspot improved
performance by 30% instantly (and only took me 5 minutes instead of
hours and hours).


T

-- 
My program has no bugs! Only undocumented features...

Jul 23 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 23, 2013 at 12:58:11PM -0700, H. S. Teoh wrote:
[...]
 	void main() {

Oops, forgot to include the definition of x and y in main():

	string x = "runtime1";
	string y = "runtime2";

 		writeln(optimizedJoin!(x, "a", "bc", y, "de", "f"));
 	}


T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG

Jul 23 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jul 24, 2013 at 08:12:36PM +0200, JS wrote:
 T,
 
 I've looked over your code and understand what you are doing but I
 don't understand the expression
 
      is(typeof(func(args[0], args[1]))))
 
 you are checking if func(a,b)'s return type is a type?

That's the usual idiom for checking if it's legal to pass args[0] and
args[1] to func().

If it's legal, then typeof(func(args[0],args[1])) will return some type,
which then passes the check. If it's illegal, then the call to func has
no type because the compiler rejects the call; say you have
func(int,int) and you attempted to call func("a","a"). So then
typeof(func(...)) will not be a valid type, and the check fails.


On Wed, Jul 24, 2013 at 08:42:48PM +0200, JS wrote:
 On Wednesday, 24 July 2013 at 18:12:37 UTC, JS wrote:
 T,
 
 Also, I've tried to wrap the template in a function and I can't get
 it to work:
 
 	
 	string join(T...)(string delim, T args)
 	{
 		return tupleStringReduce!(args);
 	}
 
 it seems tupleStringReduce just returns the tuple of of the
 optimized strings. I'll still need to write code that will deal
 mixing in the actual compile time and runtime code.

You can't return a tuple as a string. :)  Mainly because the joining has
only happened to the string literals, but you haven't told the compiler
how the tuple itself is to be joined into a string.

I thought I already gave example code to do this:

	string join(string[] args...) {
		string result;
		string delim;

		// You need this in order for the compiler to know what
		// you wanna do with the optimized tuple.
		foreach (arg; args) {
			result ~= delim ~ arg;
			delim = ",";	// just for example
		}
		return result;
	}

	template optimizedJoin(args...)
	{
		// This wrapper optimizes the tuple and passes the
		// result to join().
		string optimizedJoin() {
			return join(tupleReduce!((string x, string y) => x~y, args));
		}
	}

Note that you have to use template calling syntax for optimizedJoin,
like this:

	string result = optimizedJoin!("a", "b", x, "c", "d");

I did try wrapping it so that you can use normal runtime call syntax,
but it doesn't work because the tuple reduction must happen at template
expansion stage. It's too late to do it in CTFE proper.


 This doesn't seem that hard as tupleReduce essentially outlines the
 approach. The difference is instead of calling func directly I sort
 of need to mix it in as a code string.
 
 e.g., for tupleStringReduce, func is x~y. I need to build a string
 using that "((arg[0]~arg[1])~arg[2])..." which is then mixed in at
 compile time but at runtime evaluates to a string(not an array).
 
 Since there is no .codeof, AFAIK, I guess the only way to do this is
 pass func, and the func code string("x~y") in this case.
 
 The idea though is to use the lamba as a ctfe to reduce the tuple
 when possible but *also* use it at runtime to evaluate the tuple.

[...]

Alright, here's an improved version of the code:

	import std.stdio;
	
	template tuple(args...) {
		alias tuple = args;
	}
	
	/**
	 * Given a binary function and a tuple, returns a tuple in which all adjacent
	 * compile-time readable items are recursively substituted with the return
	 * value of the binary function applied to them.
	 */
	template tupleReduce(alias func, args...)
	{
		static if (args.length > 1)
		{
			static if (__traits(compiles, { enum x = args[0]; }))
			{
				static if (__traits(compiles, { enum x = args[1]; }) &&
					is(typeof(func(args[0], args[1]))))
				{
					enum result = func(args[0], args[1]);
					alias tupleReduce = tupleReduce!(func, result,
						args[2..$]);
				}
				else
				{
					alias tupleReduce = tuple!(args[0], args[1],
						tupleReduce!(func, args[2..$]));
				}
			}
			else
			{
				alias tupleReduce = tuple!(args[0],
					tupleReduce!(func, args[1..$]));
			}
		}
		else
		{
			alias tupleReduce = args;
		}
	}
	
	// I decided to use a real variadic instead of an array, to
	// force loop-unrolling.
	// Strictly speaking, we should add a signature constraint to
	// enforce uniform types in T..., but I was too lazy to figure
	// out how to phrase allSatisfy to check for that. For
	// production code, though you'll definitely want to do this.
	auto reduceImpl(alias fun, T...)(T args)
	{
		import std.functional;
		typeof(unaryFun!fun(args[0],args[0])) result;
	
		foreach (arg; args) {
			result = unaryFun!fun(result, arg);
		}
		return result;
	}
	
	// Nice wrapper for reduceImpl, so that we don't have to type
	// the lambda twice.
	template reduce(alias fun, args...)
	{
		string reduce() {
			return reduceImpl!fun(tupleReduce!(fun, args));
		}
	}
	
	/**
	 * Joins a list of strings. Adjacent CTFEable strings are joined
	 * at compile-time into single strings, while runtime arguments
	 * are joined at runtime.
	 */
	template join(args...)
	{
		string join() {
			// Behold, only one instance of the lambda! It
			// gets used both at compile-time and runtime.
			return reduce!((string x, string y) => x~y, args);
		}
	}
	
	void main() {
		// Example of how to use tupleReduce for compile-time
		// optimization of join:
		string x = "(runtime1)";
		string y = "(runtime2)";

		// Note that you still have to use compile-time argument
		// syntax for join.
		writeln(join!(x, "a", "bc", y, "de", "f", "g"));
	}

Compiling with dmd -O -inline produced two nested function calls for the
join!(...), though inside, the string literals have already been
concatenated.

Compiling with gdc -O3, OTOH, produced an unrolled loop in Dmain which
basically sequentially appends to the result of join!(), i.e., it's as
though I wrote:

	void main() {
		string x = "(runtime1)";
		string y = "(runtime2)";
		writeln(x ~ "abc" ~ y ~ "defg");
	}

You can't get much better than that. :)

I think trying to repackage join!(...) as join(...) is a fool's errand,
though. You'll likely only end up with pain. :) IMO writing join!(...)
is a Good Enough(tm) compromise given that we have already achieved
maximal compile-time optimization.


T

-- 
ASCII stupid question, getty stupid ANSI.

Jul 24 2013

"JS" <js.mdnq gmail.com> writes:

On Wednesday, 24 July 2013 at 19:52:57 UTC, H. S. Teoh wrote:
 On Wed, Jul 24, 2013 at 08:12:36PM +0200, JS wrote:
 T,
 
 I've looked over your code and understand what you are doing 
 but I
 don't understand the expression
 
      is(typeof(func(args[0], args[1]))))
 
 you are checking if func(a,b)'s return type is a type?

 That's the usual idiom for checking if it's legal to pass 
 args[0] and
 args[1] to func().

 If it's legal, then typeof(func(args[0],args[1])) will return 
 some type,
 which then passes the check. If it's illegal, then the call to 
 func has
 no type because the compiler rejects the call; say you have
 func(int,int) and you attempted to call func("a","a"). So then
 typeof(func(...)) will not be a valid type, and the check fails.


 On Wed, Jul 24, 2013 at 08:42:48PM +0200, JS wrote:
 On Wednesday, 24 July 2013 at 18:12:37 UTC, JS wrote:
 T,
 
 Also, I've tried to wrap the template in a function and I 
 can't get
 it to work:
 
 	
 	string join(T...)(string delim, T args)
 	{
 		return tupleStringReduce!(args);
 	}
 
 it seems tupleStringReduce just returns the tuple of of the
 optimized strings. I'll still need to write code that will deal
 mixing in the actual compile time and runtime code.

 You can't return a tuple as a string. :)  Mainly because the 
 joining has
 only happened to the string literals, but you haven't told the 
 compiler
 how the tuple itself is to be joined into a string.

 I thought I already gave example code to do this:

 	string join(string[] args...) {
 		string result;
 		string delim;

 		// You need this in order for the compiler to know what
 		// you wanna do with the optimized tuple.
 		foreach (arg; args) {
 			result ~= delim ~ arg;
 			delim = ",";	// just for example
 		}
 		return result;
 	}

 	template optimizedJoin(args...)
 	{
 		// This wrapper optimizes the tuple and passes the
 		// result to join().
 		string optimizedJoin() {
 			return join(tupleReduce!((string x, string y) => x~y, args));
 		}
 	}

 Note that you have to use template calling syntax for 
 optimizedJoin,
 like this:

 	string result = optimizedJoin!("a", "b", x, "c", "d");

 I did try wrapping it so that you can use normal runtime call 
 syntax,
 but it doesn't work because the tuple reduction must happen at 
 template
 expansion stage. It's too late to do it in CTFE proper.


 This doesn't seem that hard as tupleReduce essentially 
 outlines the
 approach. The difference is instead of calling func directly I 
 sort
 of need to mix it in as a code string.
 
 e.g., for tupleStringReduce, func is x~y. I need to build a 
 string
 using that "((arg[0]~arg[1])~arg[2])..." which is then mixed 
 in at
 compile time but at runtime evaluates to a string(not an 
 array).
 
 Since there is no .codeof, AFAIK, I guess the only way to do 
 this is
 pass func, and the func code string("x~y") in this case.
 
 The idea though is to use the lamba as a ctfe to reduce the 
 tuple
 when possible but *also* use it at runtime to evaluate the 
 tuple.

 [...]

 Alright, here's an improved version of the code:

 	import std.stdio;
 	
 	template tuple(args...) {
 		alias tuple = args;
 	}
 	
 	/**
 	 * Given a binary function and a tuple, returns a tuple in 
 which all adjacent
 	 * compile-time readable items are recursively substituted 
 with the return
 	 * value of the binary function applied to them.
 	 */
 	template tupleReduce(alias func, args...)
 	{
 		static if (args.length > 1)
 		{
 			static if (__traits(compiles, { enum x = args[0]; }))
 			{
 				static if (__traits(compiles, { enum x = args[1]; }) &&
 					is(typeof(func(args[0], args[1]))))
 				{
 					enum result = func(args[0], args[1]);
 					alias tupleReduce = tupleReduce!(func, result,
 						args[2..$]);
 				}
 				else
 				{
 					alias tupleReduce = tuple!(args[0], args[1],
 						tupleReduce!(func, args[2..$]));
 				}
 			}
 			else
 			{
 				alias tupleReduce = tuple!(args[0],
 					tupleReduce!(func, args[1..$]));
 			}
 		}
 		else
 		{
 			alias tupleReduce = args;
 		}
 	}
 	
 	// I decided to use a real variadic instead of an array, to
 	// force loop-unrolling.
 	// Strictly speaking, we should add a signature constraint to
 	// enforce uniform types in T..., but I was too lazy to figure
 	// out how to phrase allSatisfy to check for that. For
 	// production code, though you'll definitely want to do this.
 	auto reduceImpl(alias fun, T...)(T args)
 	{
 		import std.functional;
 		typeof(unaryFun!fun(args[0],args[0])) result;
 	
 		foreach (arg; args) {
 			result = unaryFun!fun(result, arg);
 		}
 		return result;
 	}
 	
 	// Nice wrapper for reduceImpl, so that we don't have to type
 	// the lambda twice.
 	template reduce(alias fun, args...)
 	{
 		string reduce() {
 			return reduceImpl!fun(tupleReduce!(fun, args));
 		}
 	}
 	
 	/**
 	 * Joins a list of strings. Adjacent CTFEable strings are 
 joined
 	 * at compile-time into single strings, while runtime arguments
 	 * are joined at runtime.
 	 */
 	template join(args...)
 	{
 		string join() {
 			// Behold, only one instance of the lambda! It
 			// gets used both at compile-time and runtime.
 			return reduce!((string x, string y) => x~y, args);
 		}
 	}
 	
 	void main() {
 		// Example of how to use tupleReduce for compile-time
 		// optimization of join:
 		string x = "(runtime1)";
 		string y = "(runtime2)";

 		// Note that you still have to use compile-time argument
 		// syntax for join.
 		writeln(join!(x, "a", "bc", y, "de", "f", "g"));
 	}

 Compiling with dmd -O -inline produced two nested function 
 calls for the
 join!(...), though inside, the string literals have already been
 concatenated.

 Compiling with gdc -O3, OTOH, produced an unrolled loop in 
 Dmain which
 basically sequentially appends to the result of join!(), i.e., 
 it's as
 though I wrote:

 	void main() {
 		string x = "(runtime1)";
 		string y = "(runtime2)";
 		writeln(x ~ "abc" ~ y ~ "defg");
 	}

 You can't get much better than that. :)

 I think trying to repackage join!(...) as join(...) is a fool's 
 errand,
 though. You'll likely only end up with pain. :) IMO writing 
 join!(...)
 is a Good Enough(tm) compromise given that we have already 
 achieved
 maximal compile-time optimization.


Cool, this is definitely on the right track! I like how you are 
able to generate the expansion reduction without string 
mixins(the naive way I would try and do it).

The main things I was after are achieved: Loop unrolling, 
mixed-time constant folding, and variable arguments.

While I think I understand the overall idea you are using it will 
take me a bit to grasp exactly how you were able to get it all 
done correctly.

I have two requests if you don't mind and have the time...

Allow for array reduction and delim/indexing. When using join we 
typically want to join an array of strings while also inserting a 
delimiter between them... usually with the last one removed.

Sometimes we want to know the index and maximum array size to do 
specialized joins.

e.g.,

join("a", "b", ["c", "d"])

we might want to print out "4: 1a, 2b, 3c, 4d".

Using your lambda's this is somewhat easy to achieve,

(string a, int i, int m) => { return ((i == 0) ? to!string(m)~": 
" : "")~to!string(i)~a~((i < m) ? ", " : ""); }

this would be a unary applied to each string... either at compile 
time, if possible, or runtime.

I'm not sure how to achieve this because of the runtime/compile 
time issue... but I think it's similar to how to mention the use 
of reduce being used at compile time and runtime.

As far as the array issue goes, it seems that a check for an 
array is needed then func needs to be checked if it can work on 
the array element and applied in a similar fashion... but just 
too much complexity for my little brain to grasp ;/

Having such features should 1. Provide a join function that is 
optimal in all conditions but also mimic the traditional join 
function. 2. Demonstrate techniques that produce such optimal 
performing code and make normal functions more performant and 
expressive(e.g., works well with variable args).

I can imagine an analogous Boost like library for D that provides 
the optimal performance possible by getting the compiler to do as 
much work as possible behind the scenes.

Once I get these programming techniques down I'll hopefully be 
able to start implementing some of routines(string stuff, math, 
etc...). At that point, if no one beats me to the punch I'll try 
to see if it was all worth it in the first place ;)

Jul 25 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Jul 25, 2013 at 03:47:23PM +0200, JS wrote:
[...]
 Cool, this is definitely on the right track! I like how you are able
 to generate the expansion reduction without string mixins(the naive
 way I would try and do it).

 
String mixins are sorta like nuclear warheads, a last resort when
absolutely nothing else will work. They're usually overkill for most
tasks. Generally, other less drastic alternatives should be tried first.
:)


 The main things I was after are achieved: Loop unrolling, mixed-time
 constant folding, and variable arguments.

While it has been a fun exercise, I still think loop unrolling is in the
province of the compiler's optimizer (e.g. gdc -O3). Unless you're
writing performance-critical inner loops in a 3D game engine or a hard
real-time application, it seems like premature optimization to me.

As for constant folding, I wonder if expression templates may be a
better approach for the general case. Tuple reduction works well when
the reduction can be done pairwise, but in more complicated cases, like
optimizing matrix expressions, say, you probably need something more
sophisticated.

Still, I think it would be cleaner if the language provided a way for
the compiler to support such optimizations, so that we can take
advantage of its built-in optimizer instead of reinventing the wheel in
user code. I've often thought about how one might go about specifying
expression reduction rules for user-defined types, like associativity
for matrix expressions, etc., that tell the compiler what kind of rules
the custom operators obey, that the optimizer can make use of.


 While I think I understand the overall idea you are using it will
 take me a bit to grasp exactly how you were able to get it all done
 correctly.

Maybe you should take the time to study the code, then next time you can
do it yourself. ;-)


 I have two requests if you don't mind and have the time...
 
 Allow for array reduction and delim/indexing. When using join we
 typically want to join an array of strings while also inserting a
 delimiter between them... usually with the last one removed.

We already have std.algorithm.join... yes, yes I know you want
compile-time optimization, so that can be an interesting exercise for
your learning. ;-)

But for starters, one way to achieve this is to preprocess your tuple to
insert delimiters between elements, then hand it over to tupleReduce to
optimize it. Something like this might do it:

	// Warning: untested code
	template insertDelim(string delim, args...) {
		static if (args.length > 1) {
			alias insertDelim = tuple!(args[0], delim,
				insertDelim!(args[1..$]));
		} else {
			alias insertDelim = args;
		}
	}

	// This should do the trick. Use insertDelim to insert the
	// delimiters, then tupleReduce it to concatenate all
	// compile-time-known arguments, then hand over the rest to the
	// runtime function as before.
	template join(string delim, args...) {
		string join() {
			return joinImpl(tupleReduce!(
				(string x, string y) => x~y,
				insertDelim!(delim, args)));
		}
	}

If I didn't make a mistake in the above code, it should be able to
optimize:

	join!(",", "a", "b", x, "c", "d");

into:

	"a,b," ~ x ~ ",c,d"


 Sometimes we want to know the index and maximum array size to do
 specialized joins.

[...]

That's trivial, tuples can be indexed just like arrays at compile-time,
and have a .length property just like arrays.


T

-- 
"Real programmers can write assembly code in any language. :-)" -- Larry Wall

Jul 25 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Jul 25, 2013 at 07:57:30AM -0700, H. S. Teoh wrote:
 On Thu, Jul 25, 2013 at 03:47:23PM +0200, JS wrote:

[...]
 Allow for array reduction and delim/indexing. When using join we
 typically want to join an array of strings while also inserting a
 delimiter between them... usually with the last one removed.


[...]

Sigh... I could never resist a challenge. Here's a full, working,
compilable, runnable implementation:

	import std.stdio;
	
	template tuple(args...) {
		alias tuple = args;
	}
	
	/**
	 * Given a binary function and a tuple, returns a tuple in which all adjacent
	 * compile-time readable items are recursively substituted with the return
	 * value of the binary function applied to them.
	 */
	template tupleReduce(alias func, args...)
	{
		static if (args.length > 1)
		{
			static if (__traits(compiles, { enum x = args[0]; }))
			{
				static if (__traits(compiles, { enum x = args[1]; }) &&
					is(typeof(func(args[0], args[1]))))
				{
					enum result = func(args[0], args[1]);
					alias tupleReduce = tupleReduce!(func, result,
						args[2..$]);
				}
				else
				{
					alias tupleReduce = tuple!(args[0], args[1],
						tupleReduce!(func, args[2..$]));
				}
			}
			else
			{
				alias tupleReduce = tuple!(args[0],
					tupleReduce!(func, args[1..$]));
			}
		}
		else
		{
			alias tupleReduce = args;
		}
	}
	
	auto reduceImpl(alias fun, T...)(T args)
	{
		import std.functional;
		typeof(unaryFun!fun(args[0],args[0])) result;
	
		foreach (arg; args) {
			result = unaryFun!fun(result, arg);
		}
		return result;
	}
	
	template reduce(alias fun, args...)
	{
		string reduce() {
			return reduceImpl!fun(tupleReduce!(fun, args));
		}
	}
	
	template join(args...)
	{
		string join() {
			return reduce!((string x, string y) => x~y, args);
		}
	}
	
	template insertDelim(string delim, args...) {
		static if (args.length > 1)
			alias insertDelim = tuple!(args[0], delim,
				insertDelim!(delim, args[1..$]));
		else
			alias insertDelim = args;
	}
	
	template joinWithDelim(string delim, args...)
	{
		alias joinWithDelim = join!(insertDelim!(delim, args));
	}
	
	void joinDelimTest() {
		// Example of joinWithDelim
		string x = "runtime1";
		string y = "runtime2";
		string z = "runtime3";
	
		writeln(joinWithDelim!(":", "a", "b", x, "c", y, z, "d"));
	
		// Here's the proof that the delimiters have been concatenated with
		// adjacent items at compile-time where possible:
		writeln([ tupleReduce!((string x, string y) => x~y,
				insertDelim!(":", "a", "b", x, "c", y, z, "d")) ]);
	}
	
	void main() {
		joinDelimTest();
	}


The key points of note are the joinWithDelim template, which implements
a fully-working join with specifiable delimiter, and the code in
joinDelimTest() that illustrates how to use it and what it does. The
program produces this output:

	a:b:runtime1:c:runtime2:runtime3:d
	["a:b:", "runtime1", ":c:", "runtime2", ":", "runtime3", ":d"]

The last line indicates that the delimiters have been concatenated with
all adjacent compile-time readable arguments, where possible. So
basically, we start with:

	"a" "b" x "c" y z "d"

then delimiters are inserted:

	"a" ":" "b" ":" x ":" "c" ":" y ":" z ":" "d"

then this is optimized at compile-time into:

	"a:b:" x ":c:" y ":" z ":d"

By capturing this stage in an array, we get to see what the optimized
tuple looks like (2nd line of output), before joinWithDelim, at runtime,
finally interpolates x, y, and z into the result (1st line of output).

Looking at the disassembly of the program, compiled with gdc -O3, the
call to joinWithDelim is inside its own function (reasonable, since it's
rather large), in which there are 7 array concatenations, as expected.
(If the delimiters weren't optimized at compile-time, we'd have 13
concatenations instead.) That is, it's as if we wrote:

	"a:b" ~ x ~ ":c:" ~ y ~ ":" ~ z ~ ":d"

(The first concatenation happens on the empty array of the return value;
that's why there are 7 rather than 6.)

Exercise for the reader: ;-)

	Large numbers of array concatenations are not efficient at
	runtime.  Ideally, if the number of elements to be concatenated
	is > N, for some threshold N, we should use std.array.appender
	instead of the built-in ~.

	Rewrite the code to do this switchover to std.array.appender
	given some value of N, say 5.

	(Hint: since the length of tuples is conveniently available at
	compile-time as .length, we can just check if .length is greater
	than 5, and if so, instead of using ~ we call a different
	template function that declares a string appender, then use .put
	on it in a foreach over the tuple -- remember that foreach over
	tuples are expanded at compile-time.)

	(Unfortunately, this means that we can't use reduce!() anymore,
	since appender can only be used at runtime; at compile-time we
	still have to use ~. So we can't use the same lambda at both
	compile-time and runtime.)


T

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth? -- Michael Beibl

Jul 25 2013

"JS" <js.mdnq gmail.com> writes:

Thanks.. I am working on getting the code to work with unrolling 
an array:


module main;

import std.stdio, std.traits;

template tuple(args...)
{
     alias tuple = args;
}

template tUnrollArray(args...)
{
	static if (isArray!(typeof(args)))
	{
		pragma(msg, "isarray");
		static if (args.length > 1)
		{
			pragma(msg, "unrolling");
			alias tUnrollArray = tuple!(arg[0], args[1..$]);
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArray = tuple!(args[0][0], 
tUnrollArray!(args[0][1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArray = tuple!(args[0][0]);
			}
		}
	}
	else
		alias tUnrollArray = args;
}

void main(string[] argv)
{
	auto z = ["a", "b"];
         writeln(tUnrollArray!(["a", "b"]));
	writeln(tUnrollArray!(z));    // fails
	readln();
}


which seems to follow along the lines of your tuple reduce.

The problem is that I would like for it to unroll compile time 
known variables. I think we talked about that already and you 
said it wasn't a good idea because of potential infinite loops 
for some types of templates(which, I still think in most cases it 
should be ok, such as above.

Making z immutable does work but I don't get why making it static 
doesn't.

In any case, if the code looks correct I'll try and add the 
ability to have tupleReduce join elements that are arrays.

The alias stuff is starting to make some sense but having to do 
recursion for simple iterations isn't very fun ;/

Jul 25 2013

"JS" <js.mdnq gmail.com> writes:

The following code works better. Forgot to allow for multiple 
arguments. Runtime variables are not unrolled though but I'll 
work on understanding how you were able to unify the two as one(I 
think in your reduceImpl).

module main;

import std.stdio, std.traits;

template tuple(args...)
{
     alias tuple = args;
}

template tUnrollArgs(args...)
{
	static if (isArray!(typeof(args[0])))
	{
		pragma(msg, "isarray");
		static if (args.length > 1)
		{
			pragma(msg, "unrolling");
			alias tUnrollArgs = tuple!(tUnrollArgs!(args[0]), args[1..$]);
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArgs = tuple!(args[0][0], 
tUnrollArgs!(args[0][1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArgs = tuple!(args[0][0]);
			}
		}
	}
	else
		static if (args[0].length > 1)
			alias tUnrollArgs = tuple!(args[0], 
tUnrollArgs!(args[0][1..$]));
		else
			alias tUnrollArgs = args;
}

void main(string[] argv)
{
	auto z = ["1", "2"];
     writeln(tUnrollArgs!(["a", "b"], z, "c"));
	readln();
}

Jul 26 2013

"JS" <js.mdnq gmail.com> writes:

oops, that was a mess... I guess I need to learn how to program. 
This one is not much better but at least works. I'll have to 
rewrite the code.


module main;

import std.stdio, std.traits;

template tuple(args...)
{
     alias tuple = args;
}

template tUnrollArgs(args...)
{
	pragma(msg, ">"~args.stringof);
	static if (args.length == 0)
		alias tUnrollArgs = tuple!();
	else
	static if (isArray!(typeof(args[0])))
	{
		pragma(msg, "isarray");
		static if (args[0].length > 1)
		{
			pragma(msg, "unrolling");
			static if (args[0].length > 1)
			{
				pragma(msg, "length > 1");
				alias tUnrollArgs = tuple!(tUnrollArgs!(args[0][0]), 
tUnrollArgs!(args[0][1..$],args[1..$]));
			}
			else
			{
				pragma(msg, "length == 0 || 1");
				alias tUnrollArgs = args[0];
			}
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArgs = tuple!(args[0][0], 
tUnrollArgs!(args[0][1..$], args[1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArgs = tuple!(args[0][0], 
tUnrollArgs!(args[1..$]));
			}
		}
	}
	else
		//static if (args[0].length > 1)
			//alias tUnrollArgs = tuple!(args[0], 
tUnrollArgs!(args[0][1..$]));
		//else
			alias tUnrollArgs = args;
}

void main(string[] argv)
{
	immutable auto z = ["1", "2"];
     writeln(tUnrollArgs!(["a", "b"], z, "c"));
	readln();
}

Jul 26 2013

D Programming

C/C++ Programming

Other

digitalmars.D - Compile-time optimization