digitalmars.D - Vector performance

reply Manu <turkeyman gmail.com> writes:
Just thought I might share a real-life case study today. Been a lot of talk
of SIMD stuff, some people might be interested.

Working on an android product today, I noticed the matrix library was
burning a ridiculous amount of our frame time.
The disassembly looked like pretty normal ARM float code, so rewriting a
couple of the key routines to use the VFPU (carefully), our key device
moved from 19fps -> 34fps (limited at 30, we can now ship).
GalaxyS 2 is now running at 170fps, and devices we previously considered
un-viable can now actually get a release! .. Most devices saw around 25-45%
speed improvement.

Imagine if all vector code throughout was using the vector hardware nicely,
and not just one or 2 key functions...
Getting the API right (intuitively encouraging proper usage and disallowing
inefficient operations), it'll make a big difference!

Jan 10 2012
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Manu:

 Imagine if all vector code throughout was using the vector hardware nicely,
 and not just one or 2 key functions...

Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not here yet, but AVX is).

Bye,
bearophile
Jan 10 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/10/2012 6:39 AM, Manu wrote:
 On 10 January 2012 16:31, bearophile <bearophileHUGS lycos.com
 <mailto:bearophileHUGS lycos.com>> wrote:

     Manu:

      > Imagine if all vector code throughout was using the vector hardware nicely,
      > and not just one or 2 key functions...

     Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not here
     yet, but AVX is).


 Eventually.
 I don't think we need to do that until we have gotten the API right though.

Right. We'll see how the 128 bit SIMD works out before doing the work to extend it.
Jan 10 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 10 January 2012 16:31, bearophile <bearophileHUGS lycos.com> wrote:

 Manu:

 Imagine if all vector code throughout was using the vector hardware nicely,
 and not just one or 2 key functions...

Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not here yet, but AVX is).

Eventually.
I don't think we need to do that until we have gotten the API right though.
Jan 10 2012
prev sibling next sibling parent reply "F i L" <witte2008 gmail.com> writes:
On Tuesday, 10 January 2012 at 14:14:41 UTC, Manu wrote:
 Just thought I might share a real-life case study today. Been a 
 lot of talk
 of SIMD stuff, some people might be interested.

 Working on an android product today, I noticed the matrix 
 library was
 burning a ridiculous amount of our frame time.
 The disassembly looked like pretty normal ARM float code, so 
 rewriting a
 couple of the key routines to use the VFPU (carefully), our key 
 device
 moved from 19fps -> 34fps (limited at 30, we can now ship).
 GalaxyS 2 is now running at 170fps, and devices we previously 
 considered
 un-viable can now actually get a release! .. Most devices saw 
 around 25-45%
 speed improvement.

 Imagine if all vector code throughout was using the vector 
 hardware nicely,
 and not just one or 2 key functions...
 Getting the API right (intuitively encouraging proper usage and 
 disallowing
 inefficient operations), it'll make a big difference!

Wow, impressive difference.

In the future, how will [your idea of] D's SIMD vector libraries affect my math libraries? Will I simply replace:

  struct Vector4(T) {
      T x, y, z, w;
  }

with something like:

  struct Vector4(T) {
      __vector(T[4]) values;
  }

or will std.simd automatically provide a full range of vector operations (normalize, dot, cross, etc) like mono.simd? I can't help but hope for the latter; even if it does make my current efforts redundant, it would definitely be a benefit to future D pioneers.
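For readers trying the "replace the four scalars with one hardware vector" idea, here is a minimal sketch of what that could look like. It assumes a compiler and target where druntime's core.simd provides the float4 alias (for example x86-64 with SIMD enabled); the Vector4f name and its dot method are hypothetical and are not part of any library discussed in this thread.

```d
import core.simd;
import std.stdio;

// Hypothetical wrapper: four floats stored in one 128-bit vector value.
struct Vector4f
{
    float4 values;

    // Element-wise +, -, * and / forward to the vector type's own operators.
    Vector4f opBinary(string op)(Vector4f rhs)
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        float4 result = mixin("values " ~ op ~ " rhs.values");
        return Vector4f(result);
    }

    // Dot product: multiply lane by lane, then sum the lanes in scalar code.
    float dot(Vector4f rhs)
    {
        float4 products = values * rhs.values;
        float[4] lanes = products.array;
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
}

void main()
{
    float4 a = [1f, 2f, 3f, 4f];
    float4 b = [5f, 6f, 7f, 8f];
    auto va = Vector4f(a);
    auto vb = Vector4f(b);

    auto sum = va + vb;
    writeln(sum.values.array);   // the four lanes of the element-wise sum
    writeln(va.dot(vb));         // scalar dot product
}
```

The horizontal sum in dot is done in scalar code on purpose, to keep the sketch portable; a tuned library would likely use target-specific shuffles or intrinsics instead.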
Jan 10 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/11/2012 4:46 PM, F i L wrote:
 I think that is also possible if that's what you want to do, and I see no
 reason why any of these constructs wouldn't be efficient (or supported).
 You can probably even try it out now with what Walter has already done...

Cool, I was unaware Walter had begun implementing SIMD operations. I'll have to build DMD and test them out. What's the syntax like right now?

It's not ready yet. Give me some more time ;-)
Jan 11 2012
prev sibling parent simendsjo <simendsjo gmail.com> writes:
On 13.01.2012 12:21, Marco Leise wrote:
 Am 13.01.2012, 11:37 Uhr, schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 13 January 2012 04:16, Marco Leise <Marco.Leise gmx.de> wrote:
 Am 12.01.2012, 16:40 Uhr, schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 12 January 2012 08:29, Manu <turkeyman gmail.com> wrote:
 On 12 January 2012 02:46, F i L <witte2008 gmail.com> wrote:
 Well the idea is you can have both. You could even have a:

 Vector2!(Transition!(Vector4!(Transition!float))) // headache
 or something more practical...

 Vector4!(Vector4!float) // Matrix4f
 Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

 Or anything like that. I should point out that my example didn't
 make it
 clear that a Matrix4!(Transition!float) would be pointless
 compared to
 Transition!(Matrix4!float) unless each Transition held it's own
 iteration
 value. Example:

 struct Transition(T, bool isTimer = false) {

     T value, start, target;
     alias value this;

     static if (isTimer) {
         float time, speed;

         void update() {
             time += speed;
             value = start + ((target - start) * time);
         }
     }
 }

 That way each channel could update on it's own time frame. There may
 even
 be a way to have each channel be it's own separate Transition type.
 Which
 could be interesting. I'm still playing with possibilities.

The vector's aren't quite like that.. you can't make a hardware vector out of anything, only things the hardware supports: __vector(float[4]) for instance. You can make your own vector template that wraps those I guess if you want to make a matrix that way, but it sounds inefficient. When it comes to writing the vector/matrix operations, if you're assuming generic code, you won't be able to make it anywhere near as good as if you write a Matrix4x4 class.
 I think that is also possible if that's what you want to do, and
 I see
 no
 reason why any of these constructs wouldn't be efficient (or
 supported).
 You can probably even try it out now with what Walter has already
 done...

Cool, I was unaware Walter had begun implementing SIMD operations. I'll have to build DMD and test them out. What's the syntax like right now?

The syntax for the types (supporting basic arithmetic) look like __vector(float[4]) float4vector.. Try it on the latest GDC.

This will change. I'm uploading core.simd later which has a Vector!() template, and aliases for vfloat4, vdouble2, vint4, etc...

I don't plan on implementing vector intrinsics in the same way Walter is doing it.

a) GCC already provides its own intrinsics
b) The intrinsics I see Walter has already implemented in core.simd are restricted to the x86 line of architectures.

Regards

Looks like you two should discuss this. I see how Walter envisioned D to have an inline assembler unlike C, which resulted in several vendor specific syntaxes and how GCC has already done the bulk load of work to support SIMD and multiple platforms. Naturally you don't want to redo that work to wrap Walter's immature approach around the solid base in GDC. Can you please have a meeting together with the LDC devs and decide on a fair way for everyone to support inline ASM and SIMD intrinsics? Once there is a common ground for three compilers other compilers will want to go the same route and everyone is happy with source code that can be compiled by every compiler. I think this is a fundamental decision for a systems programming language.

Who are the LDC devs? :)

:) Actually I don't know. Only heard about this "LLVM" that's supposed to be good at source-to-source compilation and is more of a framework than a single compiler. And then LDC emerged around that and I recently heard that 'its pretty much up to date'. Since you are working on GDC it seemed natural someone else must be actively maintaining LDC... But dsource.org shows commits that are at least 2 years old. Look at the positive side: One less party to satisfy!

It was at bitbucket (updated ~6 months ago), but it seems it has moved to github (updated 2 days ago) https://github.com/ldc-developers/ldc
Jan 13 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 11 January 2012 02:47, F i L <witte2008 gmail.com> wrote:

 On Tuesday, 10 January 2012 at 14:14:41 UTC, Manu wrote:

 Just thought I might share a real-life case study today. Been a lot of
 talk
 of SIMD stuff, some people might be interested.

 Working on an android product today, I noticed the matrix library was
 burning a ridiculous amount of our frame time.
 The disassembly looked like pretty normal ARM float code, so rewriting a
 couple of the key routines to use the VFPU (carefully), our key device
 moved from 19fps -> 34fps (limited at 30, we can now ship).
 GalaxyS 2 is now running at 170fps, and devices we previously considered
 un-viable can now actually get a release! .. Most devices saw around
 25-45%
 speed improvement.

 Imagine if all vector code throughout was using the vector hardware
 nicely,
 and not just one or 2 key functions...
 Getting the API right (intuitively encouraging proper usage and
 disallowing
 inefficient operations), it'll make a big difference!

 Wow, impressive difference.

 In the future, how will [your idea of] D's SIMD vector libraries affect my
 math libraries? Will I simply replace:

   struct Vector4(T) {
       T x, y, z, w;
   }

 with something like:

   struct Vector4(T) {
       __vector(T[4]) values;
   }

This is too simple an example, but yes that's basically the idea. Have some code of more complex operations?
 or will std.simd automatically provide a full range of vector operations
 (normalize, dot, cross, etc) like mono.simd? I can't help but hope for the
 latter, even if it does make my current efforts redundant, it would
 defiantly be a benefit to future D pioneers.

Yes the lib would supply standard operations, probably even a matrix type or 2.
Jan 10 2012
prev sibling next sibling parent "F i L" <witte2008 gmail.com> writes:
Manu wrote:
 Yes the lib would supply standard operations, probably even a 
 matrix type or 2.

Okay cool. That's basically what I wanted to know. However, I'm still wondering exactly how flexible these libraries will be.
 Have some code of more complex operations?

My main concern is with my "transition" objects. Example:

  struct Transition(T) {
      T value, start, target;
      alias value this;

      void update(U)(U iteration) {
          value = start + ((target - start) * iteration);
      }
  }


  struct Vector4(T) {
      T x, y, z, w;

      auto abs() { ... }
      auto dot() { ... }
      auto norm() { ... }
      // etc...

      static if (isTransition!T) {
          void update(U)(U iteration) {
              x.update(iteration);
              y.update(iteration);
              z.update(iteration);
              w.update(iteration);
          }
      }
  }


  void main() {
      // Simple transition vector
      auto tranVec = Transition!(Vector4!float)();
      tranVec.target = Vector4!float(50f, 36f);
      tranVec.update(0.5f);

      // Or transition per channel
      auto vecTran = Vector4!(Transition!float)();
      vecTran.x.target = 50f;
      vecTran.y.target = 36f;
      vecTran.update(0.5f);
  }

I could make a free function "auto Linear(U)(U start, U target)" but it's best to keep things in object oriented containers, IMO. I've illustrated a simple linear transition here, but the goal is to make many different transition types: Bezier, EaseIn, Circular, Bounce, etc and continuous/physics ones like: SmoothLookAt, Giggly, Shaky, etc.

My matrix code also looks something like:

  struct Matrix4(T)
   if (isVector!T || isTransitionOfVector!T) {

      T x, y, z, w;
  }

So Transitions potentially work with matrices in some areas. I'm still new to Quaternion math, but I'm guessing these might be able to apply there as well.

So my main concern is how SIMD will affect this sort of flexibility, or if I'm going to have to rethink my whole model here to accommodate SSE operations. SIMD is usually 128 bit right? So making a Vector4!double doesn't really work... unless it was something like:

  struct Vector4(T) {
      version (SIMD_128) {
          static if (T.sizeof == 4) {        // .sizeof is in bytes: 32 bit
              __v128 xyzw;
          }
          else static if (T.sizeof == 8) {   // 64 bit
              __v128 xy;
              __v128 zw;
          }
      }
      version (SIMD_256) {
          // ...
      }
  }

Of course, that would obviously complicate the method code quite a bit. IDK, your thoughts?
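A compilable version of the per-channel case might look roughly like the following. This is only a sketch: the isTransition trait is a hypothetical helper written for this example (the post does not show how it is defined), the parameter is named t, and only the Vector4!(Transition!float) direction is covered, since the Transition!(Vector4!float) direction would also need arithmetic operators on Vector4.

```d
import std.stdio;

struct Transition(T)
{
    T value, start, target;
    alias value this;

    // Linear interpolation: t = 0 gives start, t = 1 gives target.
    void update(U)(U t)
    {
        value = start + (target - start) * t;
    }
}

// Hypothetical trait (an assumption, not from the thread):
// true only when T is some instance of Transition.
enum isTransition(T) = is(T == Transition!U, U);

struct Vector4(T)
{
    T x, y, z, w;

    // update() exists only when every channel is itself a Transition.
    static if (isTransition!T)
    {
        void update(U)(U t)
        {
            x.update(t);
            y.update(t);
            z.update(t);
            w.update(t);
        }
    }
}

void main()
{
    auto vecTran = Vector4!(Transition!float)();

    // D floats default to NaN, so start must be set explicitly.
    // z and w are left untouched here and therefore stay NaN.
    vecTran.x.start = 0f;  vecTran.x.target = 50f;
    vecTran.y.start = 0f;  vecTran.y.target = 36f;

    vecTran.update(0.5f);
    writeln(vecTran.x.value, " ", vecTran.y.value);

    // A plain Vector4!float gets no update() member at all.
    static assert(!__traits(hasMember, Vector4!float, "update"));
}
```

Using static if on a trait keeps the plain numeric Vector4 free of transition-only members, which is the flexibility the post is asking about.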
Jan 11 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 12 January 2012 01:15, F i L <witte2008 gmail.com> wrote:

 Manu wrote:

 Yes the lib would supply standard operations, probably even a matrix type
 or 2.

Okay cool. That's basically what I wanted to know. However, I'm still wondering exactly how flexible these libraries will be.

Define 'flexible'? Probably not very flexible, they will be fast!
 Have some code of more complex operations?

 My main concern is with my "transition" objects. Example:

   struct Transition(T) {
       T value, start, target;
       alias value this;

       void update(U)(U iteration) {
           value = start + ((target - start) * iteration);
       }
   }


   struct Vector4(T) {
       T x, y, z, w;

       auto abs() { ... }
       auto dot() { ... }
       auto norm() { ... }
       // etc...

       static if (isTransition!T) {
           void update(U)(U iteration) {
               x.update(iteration);
               y.update(iteration);
               z.update(iteration);
               w.update(iteration);
           }
       }
   }


   void main() {
       // Simple transition vector
       auto tranVec = Transition!(Vector4!float)();
       tranVec.target = Vector4!float(50f, 36f);
       tranVec.update(0.5f);

       // Or transition per channel
       auto vecTran = Vector4!(Transition!float)();
       vecTran.x.target = 50f;
       vecTran.y.target = 36f;
       vecTran.update(0.5f);
   }

 I could make a free function "auto Linear(U)(U start, U target)" but it's
 best to keep things in object oriented containers, IMO. I've illustrated a
 simple linear transition here, but the goal is to make many different
 transition types: Bezier, EaseIn, Circular, Bounce, etc and
 continuous/physics ones like: SmoothLookAt, Giggly, Shaky, etc.

I don't see any problem here. This looks trivial. It depends on basically nothing, it might even work with what Walter has already added, and no libs :)
I think the term 'iteration' is a bit ugly/misleading though, it should be 't' or 'time'.

 My matrix code also looks something like:

   struct Matrix4(T)
    if (isVector!T || isTransitionOfVector!T) {

       T x, y, z, w;
   }

 So Transitions potentially work with matrices in some areas. I'm still new
 to Quaternion math, but I'm guessing these might be able to apply there as
 well.

I would probably make a transition of matrices, rather than a matrix of vector transitions (so you can get references to the internal matrices)... but aside from that, I don't see any problems here either.

 So my main concern is how SIMD will affect this sort of flexibility, or if
 I'm going to have to rethink my whole model here to accommodate SSE
 operations. SIMD is usually 128 bit right? So making a Vector4!double
 doesn't really work... unless it was something like:

   struct Vector4(T) {
       version (SIMD_128) {
           static if (T.sizeof == 4) {        // .sizeof is in bytes: 32 bit
               __v128 xyzw;
           }
           else static if (T.sizeof == 8) {   // 64 bit
               __v128 xy;
               __v128 zw;
           }
       }
       version (SIMD_256) {
           // ...
       }
   }

 Of course, that would obviously complicate the method code quite a bit.
 IDK, your thoughts?

I think that is also possible if that's what you want to do, and I see no reason why any of these constructs wouldn't be efficient (or supported).
You can probably even try it out now with what Walter has already done...
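The "one register for floats, two for doubles" layout asked about above can be sketched with the vector aliases that druntime's core.simd ships today. This assumes a target where float4 and double2 are available; the field names follow the quoted example, and the scalar fallback branch is an addition made for this sketch.

```d
import core.simd;

struct Vector4(T)
{
    static if (is(T == float))
    {
        float4 xyzw;      // 4 x 32 bit fits one 128-bit register
    }
    else static if (is(T == double))
    {
        double2 xy;       // 2 x 64 bit
        double2 zw;       // 2 x 64 bit
    }
    else
    {
        T[4] values;      // scalar fallback for any other element type
    }
}

// The storage size follows directly from the chosen layout.
static assert(Vector4!float.sizeof == 16);
static assert(Vector4!double.sizeof == 32);
static assert(Vector4!int.sizeof == 16);

void main() {}
```

As the original post anticipates, every method would then need a matching static if per layout, which is where most of the extra complexity would end up.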
Jan 11 2012
prev sibling next sibling parent "F i L" <witte2008 gmail.com> writes:
Manu wrote:
 Define 'flexible'?
 Probably not very flexible, they will be fast!

Flexible as in my examples.
 I think the term 'iteration' is a bit ugly/misleading though, 
 it should be
 't' or 'time'.

I've tried to come up with a better term. I guess the logic behind 'iteration' (which i got from someone else) is that an iteration of 2 gives you a value of two distances from start to target. Whereas 'time' (or 't') could imply any measurement, eg, seconds or hours. Maybe 'tween', as in between? idk, i'll keep looking.
 I would probably make a transition of matrices, rather than a 
 matrix of
 vector transitions (so you can get references to the internal 
 matrices)...

Well the idea is you can have both. You could even have a:

  Vector2!(Transition!(Vector4!(Transition!float))) // headache

or something more practical...

  Vector4!(Vector4!float) // Matrix4f
  Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

Or anything like that. I should point out that my example didn't make it clear that a Matrix4!(Transition!float) would be pointless compared to Transition!(Matrix4!float) unless each Transition held its own iteration value. Example:

  struct Transition(T, bool isTimer = false) {

      T value, start, target;
      alias value this;

      static if (isTimer) {
          float time, speed;

          void update() {
              time += speed;
              value = start + ((target - start) * time);
          }
      }
  }

That way each channel could update on its own time frame. There may even be a way to have each channel be its own separate Transition type. Which could be interesting. I'm still playing with possibilities.
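To make the "each channel on its own time frame" idea concrete, here is a small runnable sketch of the timer variant. One detail is an assumption added here rather than taken from the post: time and speed get explicit zero initialisers, because D floating-point variables default to NaN and the accumulation would otherwise never produce a number.

```d
import std.stdio;

struct Transition(T, bool isTimer = false)
{
    T value, start, target;
    alias value this;

    static if (isTimer)
    {
        // Explicit initialisers (assumption): D floats default to NaN.
        float time = 0f, speed = 0f;

        // Advance this channel's own clock, then interpolate linearly.
        void update()
        {
            time += speed;
            value = start + (target - start) * time;
        }
    }
}

void main()
{
    // Two channels with the same endpoints but different speeds.
    Transition!(float, true) fast, slow;
    fast.start = 0f; fast.target = 10f; fast.speed = 0.5f;
    slow.start = 0f; slow.target = 10f; slow.speed = 0.25f;

    foreach (step; 0 .. 2)
    {
        fast.update();
        slow.update();
    }

    writeln(fast.value, " ", slow.value);
}
```

Nothing clamps time to 1 in this sketch, so a channel keeps moving past its target if update() continues to be called.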
 I think that is also possible if that's what you want to do, 
 and I see no
 reason why any of these constructs wouldn't be efficient (or 
 supported).
 You can probably even try it out now with what Walter has 
 already done...

Cool, I was unaware Walter had begun implementing SIMD operations. I'll have to build DMD and test them out. What's the syntax like right now?

I was under the impression you would be helping him here, or that you would be building the SIMD-based math libraries. Or something like that. That's why I was posting my examples in question to how the std.simd lib would compare.
Jan 11 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 12 January 2012 02:46, F i L <witte2008 gmail.com> wrote:

 Well the idea is you can have both. You could even have a:

   Vector2!(Transition!(Vector4!(Transition!float))) // headache
   or something more practical...

   Vector4!(Vector4!float) // Matrix4f
   Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

 Or anything like that. I should point out that my example didn't make it
 clear that a Matrix4!(Transition!float) would be pointless compared to
 Transition!(Matrix4!float) unless each Transition held it's own iteration
 value. Example:

   struct Transition(T, bool isTimer = false) {

       T value, start, target;
       alias value this;

       static if (isTimer) {
           float time, speed;

           void update() {
               time += speed;
               value = start + ((target - start) * time);
           }
       }
   }

 That way each channel could update on it's own time frame. There may even
 be a way to have each channel be it's own separate Transition type. Which
 could be interesting. I'm still playing with possibilities.

The vectors aren't quite like that.. you can't make a hardware vector out of anything, only things the hardware supports: __vector(float[4]) for instance.
You can make your own vector template that wraps those I guess if you want to make a matrix that way, but it sounds inefficient. When it comes to writing the vector/matrix operations, if you're assuming generic code, you won't be able to make it anywhere near as good as if you write a Matrix4x4 class.

 I think that is also possible if that's what you want to do, and I see no
 reason why any of these constructs wouldn't be efficient (or supported).
 You can probably even try it out now with what Walter has already done...

Cool, I was unaware Walter had begun implementing SIMD operations. I'll have to build DMD and test them out. What's the syntax like right now?

The syntax for the types (supporting basic arithmetic) looks like __vector(float[4]) float4vector.. Try it on the latest GDC.

 I was under the impression you would be helping him here, or that you would
 be building the SIMD-based math libraries. Or something like that. That's
 why I was posting my examples in question to how the std.simd lib would
 compare.

I know nothing of DMD. Once the type semantics and opcode intrinsics are working, I'll happily write the fiddly library, and I'm using GDC for my own experiments in the meantime while Walter works on the code gen.
Jan 12 2012
Iain Buclaw <ibuclaw ubuntu.com> writes:
On 12 January 2012 08:29, Manu <turkeyman gmail.com> wrote:
 On 12 January 2012 02:46, F i L <witte2008 gmail.com> wrote:
 Well the idea is you can have both. You could even have a:

   Vector2!(Transition!(Vector4!(Transition!float))) // headache
   or something more practical...

   Vector4!(Vector4!float) // Matrix4f
   Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

 Or anything like that. I should point out that my example didn't make it
 clear that a Matrix4!(Transition!float) would be pointless compared to
 Transition!(Matrix4!float) unless each Transition held its own iteration
 value. Example:

   struct Transition(T, bool isTimer = false) {

       T value, start, target;
       alias value this;

       static if (isTimer) {
           float time, speed;

           void update() {
               time += speed;
               value = start + ((target - start) * time);
           }
       }
   }

 That way each channel could update on its own time frame. There may even
 be a way to have each channel be its own separate Transition type. Which
 could be interesting. I'm still playing with possibilities.
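
For anyone who wants to try the Transition idea quoted above, here is a minimal runnable sketch of it. It rests on a few assumptions that are not in the original post: the timer fields are given explicit zero initialisers (D floats default to NaN, so the struct as posted would produce NaN on the first update), and the variable name `fade` and the step size are purely illustrative.

```d
struct Transition(T, bool isTimer = false)
{
    T value, start, target;
    alias value this;   // lets a Transition!T be read wherever a T is expected

    static if (isTimer)
    {
        // Assumption: explicit zeros, because D floats default to NaN
        // and a NaN time would poison every later update().
        float time = 0, speed = 0;

        void update()
        {
            time += speed;
            // Linear interpolation from start towards target.
            value = start + ((target - start) * time);
        }
    }
}

void main()
{
    Transition!(float, true) fade;   // hypothetical example instance
    fade.start = 0.0f;
    fade.target = 10.0f;
    fade.speed = 0.25f;              // hypothetical step per update

    fade.update();
    float current = fade;            // implicit conversion through alias this
    assert(current == 2.5f);

    // Without the timer flag, update() is not compiled into the type at all.
    static assert(!__traits(hasMember, Transition!float, "update"));
    static assert(__traits(hasMember, Transition!(float, true), "update"));
}
```

Because `static if` removes the timer members entirely when `isTimer` is false, a plain `Transition!float` carries only the three `T` fields, which is what makes the per-channel versus whole-matrix trade-off discussed above a compile-time choice.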

 The vectors aren't quite like that.. you can't make a hardware vector out
 of anything, only things the hardware supports: __vector(float[4]) for
 instance.
 You can make your own vector template that wraps those I guess if you want
 to make a matrix that way, but it sounds inefficient. When it comes to
 writing the vector/matrix operations, if you're assuming generic code, you
 won't be able to make it anywhere near as good as if you write a Matrix4x4
 class.

 I think that is also possible if that's what you want to do, and I see no
 reason why any of these constructs wouldn't be efficient (or supported).
 You can probably even try it out now with what Walter has already done...

 Cool, I was unaware Walter had begun implementing SIMD operations. I'll
 have to build DMD and test them out. What's the syntax like right now?

 The syntax for the types (supporting basic arithmetic) looks like __vector(float[4]) float4vector.. Try it on the latest GDC.

This will change. I'm uploading core.simd later, which has a Vector!() template and aliases for vfloat4, vdouble2, vint4, etc.

I don't plan on implementing vector intrinsics in the same way Walter is doing it:

a) GCC already provides its own intrinsics.
b) The intrinsics I see Walter has already implemented in core.simd are restricted to the x86 line of architectures.

Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
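
As a rough illustration of the vector types discussed above, the sketch below uses the aliases found in present-day core.simd, where the four-float type is spelled `float4` rather than the `vfloat4` name mentioned in this post. It assumes a target on which the compiler enables SIMD types (the `D_SIMD` version identifier, as predefined by DMD on x86_64); on other compilers or targets the guarded block is simply skipped.

```d
import core.simd;

void main()
{
    // Assumption: D_SIMD is predefined when vector types are available.
    version (D_SIMD)
    {
        // float4 is an alias for the built-in __vector(float[4]) type.
        static assert(is(float4 == __vector(float[4])));

        float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
        float4 b = 2.0f;          // a scalar initialiser fills all four lanes

        // Element-wise arithmetic on whole vectors.
        float4 c = a * b + a;

        // .array exposes the lanes as a static float[4].
        assert(c.array == [3.0f, 6.0f, 9.0f, 12.0f]);
    }
}
```

Only operations the hardware vector unit supports are available on these types, which is the point made earlier in the thread about not being able to build a hardware vector out of arbitrary element types.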
Jan 12 2012
"Marco Leise" <Marco.Leise gmx.de> writes:
On 12.01.2012 at 16:40, Iain Buclaw <ibuclaw ubuntu.com> wrote:

 This will change. I'm uploading core.simd later, which has a Vector!() template and aliases for vfloat4, vdouble2, vint4, etc. I don't plan on implementing vector intrinsics in the same way Walter is doing it: a) GCC already provides its own intrinsics; b) the intrinsics I see Walter has already implemented in core.simd are restricted to the x86 line of architectures. Regards

Looks like you two should discuss this. I see how Walter envisioned D having a built-in inline assembler, unlike C, where the lack of one resulted in several vendor-specific syntaxes. I also see how GCC has already done the bulk of the work to support SIMD on multiple platforms. Naturally you don't want to redo that work to wrap Walter's immature approach around the solid base in GDC.

Can you please have a meeting together with the LDC devs and decide on a fair way for everyone to support inline ASM and SIMD intrinsics? Once there is common ground for three compilers, other compilers will want to go the same route, and everyone is happy with source code that can be compiled by every compiler.

I think this is a fundamental decision for a systems programming language.
Jan 12 2012
Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 January 2012 04:16, Marco Leise <Marco.Leise gmx.de> wrote:
 Looks like you two should discuss this. I see how Walter envisioned D to
 have an inline assembler unlike C, which resulted in several vendor specific
 syntaxes and how GCC has already done the bulk load of work to support SIMD
 and multiple platforms. Naturally you don't want to redo that work to wrap
 Walter's immature approach around the solid base in GDC.
 Can you please have a meeting together with the LDC devs and decide on a
 fair way for everyone to support inline ASM and SIMD intrinsics? Once there
 is a common ground for three compilers other compilers will want to go the
 same route and everyone is happy with source code that can be compiled by
 every compiler.
 I think this is a fundamental decision for a systems programming language.

Who are the LDC devs? :)

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jan 13 2012
"Marco Leise" <Marco.Leise gmx.de> writes:
On 13.01.2012 at 11:37, Iain Buclaw <ibuclaw ubuntu.com> wrote:

Who are the LDC devs? :)

:) Actually I don't know. I've only heard about this "LLVM" that's supposed to be good at source-to-source compilation and is more of a framework than a single compiler. Then LDC emerged around that, and I recently heard that "it's pretty much up to date". Since you are working on GDC, it seemed natural that someone else must be actively maintaining LDC... But dsource.org shows commits that are at least 2 years old. Look on the positive side: one less party to satisfy!
Jan 13 2012