digitalmars.D.learn - Code generation tricks

"JS" <js.mdnq gmail.com> writes:
This seems to be a somewhat efficient string splitter

http://dpaste.dzfl.pl/4307aa5f

The basic idea is

for(int j = 0; j < s.length; j++)
	{
		mixin(ExpandVariadicIf!("??Cs[j]??s[j..min(s.length-1, j + %%L)]::", "d", "
			if (r.length <= i) r.length += 5;
			if (j != 0)
			{
				r[i++] = s[oldj..j];
				oldj = j + %%L;
			}
			else
				oldj = %%L;
		 j += %%L; continue;", T));
		
	}

ExpandVariadicIf creates a series of ifs, one for each variadic 
argument. The formatting is a bit strange (just some crap I threw 
together to get something working), but it boils down to 
generating code at compile time that minimizes computations and 
lookups by directly using the known compile-time literals passed.
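
As a rough illustration of the general idea (this is not the ExpandVariadicIf code from the paste), a chain of ifs can be built as a string at compile time and mixed in, so that each test compares against a literal. The names genCharTests and isSeparator are made up for this sketch, and it assumes the separator characters need no escaping inside a character literal (so no ' or \).

```d
/// Hypothetical generator: returns D source text with one `if` per
/// separator character. It runs at compile time (CTFE) when used in a mixin.
string genCharTests(string seps)
{
    string code;
    foreach (c; seps)
        code ~= "if (ch == '" ~ c ~ "') return true;\n";
    return code;
}

/// The separators are a template argument, so the mixin expands into
/// plain comparisons against character literals.
bool isSeparator(string seps)(char ch)
{
    mixin(genCharTests(seps));
    return false; // no generated test matched
}
```

With this, isSeparator!",;"(c) should behave like a hand-written `if (c == ',') ... if (c == ';') ...` chain.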

IMO these types of functions seem useful but ATM are just hacks. 
Hopefully there is a better way to do these sorts of things as I 
find them pretty useful.

One of the big issues is not being able to pass a variadic 
variable to a template directly, which is why the formatting 
string is necessary. (You can pass the typetuple to get the types 
and size, but not the compile-time values if they exist.)

I think in this case a variadic alias would be very useful.

alias T... => alias T0, alias T1, etc....
(e.g. T[0] is an alias, T.length is number of aliases, etc...)
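
For comparison, D's template sequence parameters (the `Seps...` form) accept compile-time values as well as types, which may cover part of this wish when the separators are known at compile time. The sketch below assumes exactly that; splitCT is a hypothetical name, char separators are assumed to be single code units, and empty tokens are kept.

```d
/// Hypothetical sketch: the separators are template arguments, so
/// Seps[i] is a compile-time constant and Seps.length is their count.
string[] splitCT(Seps...)(string s)
{
    string[] parts;
    size_t start = 0; // start of the current token
    size_t j = 0;
    outer: while (j < s.length)
    {
        foreach (i, _; Seps) // unrolled: one test per separator
        {
            static if (is(typeof(Seps[i]) : dchar))
            {
                // Character separator: compare one code unit with a literal.
                enum size_t len = 1;
                const hit = s[j] == Seps[i];
            }
            else
            {
                // String separator: its length is known at compile time.
                enum size_t len = Seps[i].length;
                static assert(len > 0, "empty separator");
                const hit = j + len <= s.length && s[j .. j + len] == Seps[i];
            }
            if (hit)
            {
                parts ~= s[start .. j];
                j += len;
                start = j;
                continue outer; // resume scanning right after the separator
            }
        }
        ++j;
    }
    parts ~= s[start .. $]; // trailing token (may be empty)
    return parts;
}
```

A call would look like splitCT!(',', "::")("a,b::c"), with the separators passed as template arguments instead of being encoded in a format string.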

In any case, maybe someone has a good way to make these things 
easier and more useful. Being able to handle variadic types and 
values in a consistent and simple way would make them more useful.
Jul 21 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
 This seems to be a somewhat efficient string splitter

How does this perform compared to naive/phobos splitting?
Jul 22 2013
"JS" <js.mdnq gmail.com> writes:
On Monday, 22 July 2013 at 21:04:42 UTC, John Colvin wrote:
How does this perform compared to naive/phobos splitting?

I don't know... probably not a huge difference unless Phobos's is heavily optimized. With just one delimiter, there should be no difference. With 100 delimiter literals, the difference should probably be significant, more so when chars are used. If the compiler is able to optimize slices of literal strings, it should be even better.

http://dpaste.dzfl.pl/2f10d24a

The code shows a bunch of errors on dpaste but compiles fine for me. Must be some command-line switch or something.
Jul 22 2013
"JS" <js.mdnq gmail.com> writes:
On Monday, 22 July 2013 at 21:04:42 UTC, John Colvin wrote:
How does this perform compared to naive/phobos splitting?

I don't know... probably not a huge difference unless Phobos's is heavily optimized. With just one delimiter, there should be no difference. With 100 delimiter literals, the difference should probably be significant, more so when chars are used. If the compiler is able to optimize slices of literal strings, it should be even better.

Here's my test code that you might be able to profile if you want:

http://dpaste.dzfl.pl/2f10d24a

The code shows a bunch of errors on dpaste but compiles fine for me. Must be some command-line switch or something.

The Expand templates simply allow one to expand the variadic args into compile-time expressions. E.g., we can do if (a == b) with normal args but not with variadic args; the templates help accomplish that. (I'm sure there are better ways... think of the code as a proof of concept.)
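
For anyone who wants to measure rather than guess, a comparison could be set up along these lines. This is only a sketch: naiveSplit and compareSplitters are hypothetical names, the baseline handles a single char delimiter only, and the paste's splitter would have to be added as a third function to get the numbers asked about.

```d
import std.array : split;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

/// Naive single-character splitter used as a baseline (hypothetical helper).
string[] naiveSplit(string s, char sep)
{
    string[] parts;
    size_t start = 0;
    foreach (j, c; s)
    {
        if (c == sep)
        {
            parts ~= s[start .. j];
            start = j + 1;
        }
    }
    parts ~= s[start .. $]; // trailing token (may be empty)
    return parts;
}

/// Times both splitters on the same input and prints the total durations.
void compareSplitters(string input, uint runs = 1_000)
{
    size_t sink; // accumulates results so the work is not trivially unused
    void viaNaive()  { sink += naiveSplit(input, ',').length; }
    void viaPhobos() { sink += input.split(',').length; }

    auto results = benchmark!(viaNaive, viaPhobos)(runs);
    writefln("naive:  %s", results[0]);
    writefln("phobos: %s", results[1]);
}
```

Calling compareSplitters with a long comma-separated string from main, in a release build, should give a first impression; the timings will depend on compiler and flags.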
Jul 22 2013
"anonymous" <anonymous example.com> writes:
On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
 This seems to be a somewhat efficient string splitter

 http://dpaste.dzfl.pl/4307aa5f

I probably shouldn't have done this, but I wanted to know what that abomination actually does, so I reduced it (code below). In the end, all it does is accept both char and string separators, which is rather simple when you have static if.

Some comments on the result:

* I think the fiddling with i and r.length is silly, but it had some impact on performance, so I left it in.
* Likewise, I'd rather just use std.algorithm.startsWith and not distinguish between char and string separators in split. Again, performance was slightly worse.

And here it is:

import std.algorithm : min;

inout(char)[][] split(Separators ...)(inout(char)[] s, Separators separators)
{
	size_t i = 0, oldj = 0;
	inout(char)[][] r;
	for(size_t j = 0; j < s.length; j++)
	{
		// Unrolled at compile time: one comparison per separator.
		foreach(si, S; Separators)
		{
			immutable sep = separators[si];
			static if(is(S : char))
			{
				auto slice = s[j];
				enum seplen = 1;
			}
			else static if(is(S : const(char)[]))
			{
				auto slice = s[j .. min(s.length, j + sep.length)];
				immutable seplen = sep.length;
			}
			else static assert(false);

			if(slice == sep)
			{
				// Grow the result array in chunks of 5 slots.
				if(r.length <= i) r.length += 5;
				if(j != 0) r[i++] = s[oldj .. j];
				j += seplen;
				oldj = j;
				// Stop testing further separators at this position (like
				// the `continue` in the original); otherwise s[j] could be
				// read past the end of s.
				break;
			}
		}
	}
	if(oldj < s.length)
	{
		auto tail = s[oldj .. $];
		if(tail.length > 0)
		{
			if(r.length <= i) r.length++;
			r[i++] = tail;
		}
	}
	r.length = i; // trim the unused slots
	return r;
}
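
The startsWith variant mentioned in the second comment might look roughly like the following. This is a sketch under assumptions, not the code that was measured: splitAny and sepLength are made-up names, it works on string only, char separators are assumed to be single code units, and unlike the reduced version it keeps empty tokens.

```d
import std.algorithm.searching : startsWith;

/// Length of a separator in code units: 1 for a character, .length for a string.
size_t sepLength(S)(S sep)
{
    static if (is(S : dchar))
        return 1;
    else
        return sep.length;
}

/// Splits `s` on any of the given separators (chars or strings).
string[] splitAny(Separators...)(string s, Separators separators)
{
    string[] parts;
    size_t start = 0; // start of the current token
    size_t j = 0;
    outer: while (j < s.length)
    {
        foreach (sep; separators) // unrolled: one startsWith call per separator
        {
            immutable len = sepLength(sep);
            // startsWith accepts a single element or a range as the needle,
            // so chars and strings need no separate branches here.
            if (len > 0 && s[j .. $].startsWith(sep))
            {
                parts ~= s[start .. j];
                j += len;
                start = j;
                continue outer;
            }
        }
        ++j;
    }
    parts ~= s[start .. $]; // trailing token (may be empty)
    return parts;
}
```

For example, splitAny("a,b;;c", ',', ";;") tests ',' and ";;" at each position. Whether this is slower than the static if version, as reported above, would need measuring.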
Jul 23 2013