
digitalmars.D - A simple way to do compile time loop unrolling

"finalpatch" <fengli gmail.com> writes:
Just want to share a new way I just discovered to do loop 
unrolling.

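import std.string : format;

// Builds, at compile time, the source text for N copies of CODE,
// formatted with the indices 0 .. N-1. N must be at least 1.
template Unroll(alias CODE, alias N)
{
     static if (N == 1)
         enum Unroll = format(CODE, 0);  // base case: index 0
     else
         enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
}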

After that you can write things like this, where op is a compile-time string such as "+":

mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));

and it gets expanded to

v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];

I find this method simpler than with foreach() and a tuple range, 
and also faster because it's identical to hand unrolling.
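
For readers who want to try this, here is a minimal self-contained sketch of the template in use. The Vec3 struct and its opOpAssign are assumptions made up for illustration (the post only shows the mixin line), and the static assert simply checks the expansion quoted above.

```d
import std.format : format;

// The recursive template from the post, unchanged apart from comments.
template Unroll(alias CODE, alias N)
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N - 1) ~ format(CODE, N - 1);
}

// Hypothetical 3-component vector type, only here to give the mixin a home.
struct Vec3
{
    float[3] v;

    // op is the operator as a compile-time string, e.g. "+" for +=.
    ref Vec3 opOpAssign(string op)(const Vec3 rhs)
    {
        mixin(Unroll!("v[%1$d]" ~ op ~ "=rhs.v[%1$d];", 3));
        return this;
    }
}

// The generated source text can be checked at compile time.
static assert(Unroll!("v[%1$d]+=rhs.v[%1$d];", 3)
    == "v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];");

void main()
{
    auto a = Vec3([1.0f, 2.0f, 3.0f]);
    a += Vec3([10.0f, 20.0f, 30.0f]);
    assert(a.v == [11.0f, 22.0f, 33.0f]);
}
```

Note that the base case is N == 1, so instantiating the template with N == 0 would recurse without terminating.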
May 31 2013
"finalpatch" <fengli gmail.com> writes:
Minor improvement:

template Unroll(alias CODE, alias N, alias SEP="")
{
     static if (N == 1)
         enum Unroll = format(CODE, 0);
     else
         enum Unroll = Unroll!(CODE, N-1, SEP)~SEP~format(CODE, N-1);
}

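So a vector dot product can be unrolled like this. The generated string is an expression, so the mixin goes where an expression is expected (dot is just an illustrative name):

auto dot = mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));

which becomes: auto dot = v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2];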

May 31 2013
Piotr Szturmaj <bncrbme jadamspam.pl> writes:
On 31.05.2013 16:06, finalpatch wrote:
> I find this method simpler than with foreach() and a tuple range, and
> also faster because it's identical to hand unrolling.

The advantage of foreach unrolling is that the compiler can choose the unrolling depth itself, since different depths may be faster or slower on different CPU targets. It also gives the compiler an opportunity to vectorize the loop. But I doubt that either is available in DMD; I'm not sure about GDC and LDC.
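
As a rough illustration of that alternative, the sketch below keeps an ordinary loop with a compile-time-known trip count and leaves any unrolling or vectorization to the optimizer. The Vec3 type is a made-up example, and the sketch makes no claim about what DMD, GDC or LDC actually do with this loop.

```d
// Hypothetical 3-component vector, used only for illustration.
struct Vec3
{
    float[3] v;

    ref Vec3 opOpAssign(string op)(const Vec3 rhs)
    {
        // v.length is a compile-time constant (3), so an optimizing
        // back end is free to unroll or vectorize this as it sees fit.
        foreach (i; 0 .. v.length)
            mixin("v[i] " ~ op ~ "= rhs.v[i];");
        return this;
    }
}

void main()
{
    auto a = Vec3([1.0f, 2.0f, 3.0f]);
    a -= Vec3([1.0f, 1.0f, 1.0f]);
    assert(a.v == [0.0f, 1.0f, 2.0f]);
}
```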
May 31 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hehe, first shot is always a trip, isn't it? Welcome aboard. We should have something like that in Phobos.

Andrei
May 31 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

> We should have something like that in phobos.

Better (some part of static foreach):
http://d.puremagic.com/issues/show_bug.cgi?id=4085

Bye,
bearophile
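
For comparison, here is a small sketch of the two loop-based forms this refers to: a foreach over a compile-time sequence (the "tuple range" mentioned at the top of the thread) and static foreach, which was added to the language in later D releases than the ones discussed here. The function names dotSeq and dotStatic are invented for the example.

```d
import std.meta : AliasSeq;

// foreach over a compile-time sequence: the body is instantiated once
// per element, with i a compile-time constant in each copy.
float dotSeq(const float[3] a, const float[3] b)
{
    float sum = 0;
    foreach (i; AliasSeq!(0, 1, 2))
        sum += a[i] * b[i];
    return sum;
}

// static foreach: same effect, without spelling out the sequence.
float dotStatic(const float[3] a, const float[3] b)
{
    float sum = 0;
    static foreach (i; 0 .. 3)
        sum += a[i] * b[i];
    return sum;
}

void main()
{
    float[3] x = [1.0f, 2.0f, 3.0f];
    float[3] y = [4.0f, 5.0f, 6.0f];
    assert(dotSeq(x, y) == 32.0f);    // 4 + 10 + 18
    assert(dotStatic(x, y) == 32.0f);
}
```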
May 31 2013
Marco Leise <Marco.Leise gmx.de> writes:
On Fri, 31 May 2013 16:33:19 +0200,
Piotr Szturmaj <bncrbme jadamspam.pl> wrote:

> It is also an opportunity to do loop vectorization. But I
> doubt that either is available in DMD, not sure about GDC and LDC.

GDC once vectorized something for me, where I used a struct of 4 ubyte fields. I don't remember if it was a loop at all. I think all I did was operate on 3 of the fields in sequence, applying the same operations, and the compiler loaded the whole struct into an SSE register. It really paid off speed-wise!

But when you think about it, working with RGB or XYZW vectors is a common task in programming, so I can see why they put so much work into vectorization. The caveat is just that you have to remember to add a fourth dummy field to XYZ or RGB.

-- 
Marco
May 31 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
Remember that in D, most side-effect-free functions can be run at compile time. No need for recursive template trickery:

mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());
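
To make the compile-time evaluation visible, the same pipeline can be wrapped in an ordinary function and checked with a static assert. This is only a sketch: the function name unrolled is hypothetical, and the imports are the Phobos modules the one-liner relies on.

```d
import std.algorithm : map;
import std.array : join;
import std.format : format;
import std.range : iota;

// An ordinary function; nothing about it is template-specific.
string unrolled(int n)
{
    return iota(n).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join();
}

// Initializing an enum forces the call to be evaluated at compile time.
enum code = unrolled(3);
static assert(code == "v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];");

void main()
{
    // The very same function also works at run time.
    assert(unrolled(2) == "v[0]+=rhs.v[0];v[1]+=rhs.v[1];");
}
```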
May 31 2013
Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Fri, 31 May 2013 19:30:10 +0200
"Peter Alexander" <peter.alexander.au gmail.com> wrote:

> mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());

Dayamn! I knew CTFE had improved considerably over the last year or so, but even I didn't expect something like that to be working already. That's crazy! :)
May 31 2013
"finalpatch" <fengli gmail.com> writes:
Wow! That's so very cool! We can make it even nicer with

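import std.algorithm, std.array, std.range, std.string;

template Unroll(alias CODE, alias N, alias SEP="")
{
     // Turn every bare % into the positional specifier %1$d.
     enum t = replace(CODE, "%", "%1$d");
     // Format the indices 0 .. N-1 and join the pieces with SEP.
     enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
}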

And use % as the placeholder instead of the ugly %1$d (the result is an expression, assigned here to an illustrative variable):

auto dot = mixin(Unroll!("v1[%]*v2[%]", 3, "+"));

It actually gets quite readable now.
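
A self-contained sketch of this final version in use follows. The dot3 function is an assumption added for illustration; the template body is the one from this post. One consequence worth knowing is that every % in CODE is rewritten, so the modulo operator cannot appear in the unrolled snippet.

```d
import std.algorithm : map;
import std.array : join, replace;
import std.format : format;
import std.range : iota;

template Unroll(alias CODE, alias N, alias SEP = "")
{
    // Every % becomes %1$d, so CODE cannot contain a literal % (e.g. modulo).
    enum t = replace(CODE, "%", "%1$d");
    enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
}

// The generated text is an expression, so it is mixed in as one.
float dot3(const float[3] v1, const float[3] v2)
{
    return mixin(Unroll!("v1[%]*v2[%]", 3, "+"));
}

static assert(Unroll!("v1[%]*v2[%]", 3, "+")
    == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");

void main()
{
    float[3] a = [1.0f, 2.0f, 3.0f];
    float[3] b = [4.0f, 5.0f, 6.0f];
    assert(dot3(a, b) == 32.0f); // 4 + 10 + 18
}
```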

May 31 2013