digitalmars.D - Vector performance

Manu (14/14) Jan 10 2012 Just thought I might share a real-life case study today. Been a lot of t...

bearophile (4/6) Jan 10 2012 Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not h...

Manu (3/9) Jan 10 2012 Eventually.

Walter Bright (2/11) Jan 10 2012 Right. We'll see how the 128 bit SIMD works out before doing the work to...

F i L (16/38) Jan 10 2012 Wow, impressive difference.

Manu (5/41) Jan 10 2012 This is too simple an example, but yes that's basically the idea. Have s...

F i L (72/75) Jan 11 2012 Okay cool. That's basically what I wanted to know. However, I'm

Manu (16/89) Jan 11 2012 Define 'flexible'?

F i L (39/54) Jan 11 2012 I've tried to come up with a better term. I guess the logic

Walter Bright (2/7) Jan 11 2012 It's not ready yet. Give me some more time ;-)

Manu (16/47) Jan 12 2012 The vector's aren't quite like that.. you can't make a hardware vector o...

Iain Buclaw (22/72) Jan 12 2012 This will change. I'm uploading core.simd later which has a Vector!()...

Marco Leise (12/89) Jan 12 2012 Looks like you two should discuss this. I see how Walter envisioned D to...

Iain Buclaw (15/113) Jan 13 2012 Who are the LDC devs? :)

Marco Leise (8/123) Jan 13 2012 :) Actually I don't know. Only heard about this "LLVM" that's supposed t...

simendsjo (3/127) Jan 13 2012 It was at bitbucket (updated ~6 months ago), but it seems it has moved

Manu <turkeyman gmail.com> writes:

Just thought I might share a real-life case study today. Been a lot of talk
of SIMD stuff, some people might be interested.

Working on an android product today, I noticed the matrix library was
burning a ridiculous amount of our frame time.
The disassembly looked like pretty normal ARM float code, so rewriting a
couple of the key routines to use the VFPU (carefully), our key device
moved from 19fps -> 34fps (limited at 30, we can now ship).
GalaxyS 2 is now running at 170fps, and devices we previously considered
un-viable can now actually get a release! .. Most devices saw around 25-45%
speed improvement.

Imagine if all vector code throughout was using the vector hardware nicely,
and not just one or 2 key functions...
Getting the API right (intuitively encouraging proper usage and disallowing
inefficient operations), it'll make a big difference!

Jan 10 2012

bearophile <bearophileHUGS lycos.com> writes:

Manu:

 Imagine if all vector code throughout was using the vector hardware nicely,
 and not just one or 2 key functions...

Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not here
yet, but AVX is).

Bye,
bearophile

Jan 10 2012

Manu <turkeyman gmail.com> writes:

On 10 January 2012 16:31, bearophile <bearophileHUGS lycos.com> wrote:

 Manu:

 Imagine if all vector code throughout was using the vector hardware nicely,
 and not just one or 2 key functions...

 Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not
 here yet, but AVX is).

Eventually.
I don't think we need to do that until we have gotten the API right though.

Jan 10 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 1/10/2012 6:39 AM, Manu wrote:
 On 10 January 2012 16:31, bearophile <bearophileHUGS lycos.com> wrote:

     Manu:

      Imagine if all vector code throughout was using the vector hardware nicely,
      and not just one or 2 key functions...

     Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not here
     yet, but AVX is).

 Eventually.
 I don't think we need to do that until we have gotten the API right though.

Right. We'll see how the 128 bit SIMD works out before doing the work to extend
it.

Jan 10 2012

"F i L" <witte2008 gmail.com> writes:

On Tuesday, 10 January 2012 at 14:14:41 UTC, Manu wrote:
 Just thought I might share a real-life case study today. Been a lot of talk
 of SIMD stuff, some people might be interested.

 Working on an android product today, I noticed the matrix library was
 burning a ridiculous amount of our frame time.
 The disassembly looked like pretty normal ARM float code, so rewriting a
 couple of the key routines to use the VFPU (carefully), our key device
 moved from 19fps -> 34fps (limited at 30, we can now ship).
 GalaxyS 2 is now running at 170fps, and devices we previously considered
 un-viable can now actually get a release! .. Most devices saw around 25-45%
 speed improvement.

 Imagine if all vector code throughout was using the vector hardware nicely,
 and not just one or 2 key functions...
 Getting the API right (intuitively encouraging proper usage and disallowing
 inefficient operations), it'll make a big difference!

Wow, impressive difference.

In the future, how will [your idea of] D's SIMD vector libraries
affect my math libraries? Will I simply replace:

    struct Vector4(T) {
        T x, y, z, w;
    }

with something like:

    struct Vector4(T) {
        __vector(T[4]) values;
    }

or will std.simd automatically provide a full range of vector
operations (normalize, dot, cross, etc) like mono.simd? I can't
help but hope for the latter, even if it does make my current
efforts redundant, it would definitely be a benefit to future D
pioneers.

Jan 10 2012

Manu <turkeyman gmail.com> writes:

On 11 January 2012 02:47, F i L <witte2008 gmail.com> wrote:

 On Tuesday, 10 January 2012 at 14:14:41 UTC, Manu wrote:

 Just thought I might share a real-life case study today. Been a lot of
 talk
 of SIMD stuff, some people might be interested.

 Working on an android product today, I noticed the matrix library was
 burning a ridiculous amount of our frame time.
 The disassembly looked like pretty normal ARM float code, so rewriting a
 couple of the key routines to use the VFPU (carefully), our key device
 moved from 19fps -> 34fps (limited at 30, we can now ship).
 GalaxyS 2 is now running at 170fps, and devices we previously considered
 un-viable can now actually get a release! .. Most devices saw around
 25-45%
 speed improvement.

 Imagine if all vector code throughout was using the vector hardware
 nicely,
 and not just one or 2 key functions...
 Getting the API right (intuitively encouraging proper usage and
 disallowing
 inefficient operations), it'll make a big difference!

 Wow, impressive difference.

 In the future, how will [your idea of] D's SIMD vector libraries affect my
 math libraries? Will I simply replace:

   struct Vector4(T) {
       T x, y, z, w;
   }

 with something like:

   struct Vector4(T) {
       __vector(T[4]) values;
   }

This is too simple an example, but yes that's basically the idea. Have some
code of more complex operations?


 or will std.simd automatically provide a full range of vector operations
 (normalize, dot, cross, etc) like mono.simd? I can't help but hope for the
 latter, even if it does make my current efforts redundant, it would
 definitely be a benefit to future D pioneers.

Yes the lib would supply standard operations, probably even a matrix type
or 2.

Jan 10 2012

"F i L" <witte2008 gmail.com> writes:

Manu wrote:
 Yes the lib would supply standard operations, probably even a 
 matrix type or 2.

Okay cool. That's basically what I wanted to know. However, I'm 
still wondering exactly how flexible these libraries will be.

 Have some code of more complex operations?

My main concern is with my "transition" objects. Example:

    struct Transition(T) {
        T value, start, target;
        alias value this;

        void update(U)(U iteration) {
            value = start + ((target - start) * iteration);
        }
    }


    struct Vector4(T) {
        T x, y, z, w;

        auto abs() { ... }
        auto dot() { ... }
        auto norm() { ... }
        // etc...

        static if (isTransition!T) {
            void update(U)(U iteration) {
                x.update(iteration);
                y.update(iteration);
                z.update(iteration);
                w.update(iteration);
            }
        }
    }


    void main() {
        // Simple transition vector
        auto tranVec = Transition!(Vector4!float)();
        tranVec.target = Vector4!float(50f, 36f);
        tranVec.update(0.5f);

        // Or transition per channel
        auto vecTran = Vector4!(Transition!float)();
        vecTran.x.target = 50f;
        vecTran.y.target = 36f;
        vecTran.update(0.5f);
    }

I could make a free function "auto Linear(U)(U start, U target)",
but it's best to keep things in object-oriented containers,
IMO. I've illustrated a simple linear transition here, but the
goal is to make many different transition types: Bezier, EaseIn,
Circular, Bounce, etc. and continuous/physics ones like
SmoothLookAt, Giggly, Shaky, etc.

My matrix code also looks something like:

    struct Matrix4(T)
     if (isVector(T) || isTransitionOfVector(T)) {
        T x, y, z, w;
    }

So Transitions potentially work with matrices in some areas. I'm
still new to quaternion math, but I'm guessing these might be
able to apply there as well.

So my main concern is how SIMD will affect this sort of
flexibility, or if I'm going to have to rethink my whole model
here to accommodate SSE operations. SIMD is usually 128 bit,
right? So making a Vector4!double doesn't really work... unless
it was something like:

    struct Vector4(T) {
        version (SIMD_128) {
            // .sizeof is measured in bytes, not bits
            static if (T.sizeof == 4) {
                __v128 xyzw;
            }
            else static if (T.sizeof == 8) {
                __v128 xy;
                __v128 zw;
            }
        }
        version (SIMD_256) {
            // ...
        }
    }

Of course, that would obviously complicate the method code quite 
a bit. IDK, your thoughts?
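
For anyone who wants to run the transition idea before any SIMD support lands, here is a minimal sketch in plain D. It assumes a scalar Vector4 with hand-written operators (the opBinary overloads are illustrative, not from any library) and takes the blend factor as a float named 't'; treat it as one possible shape, not a finished design.

```d
import std.stdio;

// Scalar stand-in for a 4-component vector; the operators below are
// illustrative and not taken from any library.
struct Vector4(T)
{
    T x = 0, y = 0, z = 0, w = 0;

    // Component-wise vector + vector and vector - vector.
    Vector4 opBinary(string op)(Vector4 rhs) const
        if (op == "+" || op == "-")
    {
        return Vector4(mixin("x " ~ op ~ " rhs.x"),
                       mixin("y " ~ op ~ " rhs.y"),
                       mixin("z " ~ op ~ " rhs.z"),
                       mixin("w " ~ op ~ " rhs.w"));
    }

    // Vector * scalar.
    Vector4 opBinary(string op : "*")(T s) const
    {
        return Vector4(x * s, y * s, z * s, w * s);
    }
}

// Linear transition: t = 0 yields start, t = 1 yields target.
struct Transition(T)
{
    T value, start, target;
    alias value this;

    void update(float t)
    {
        value = start + ((target - start) * t);
    }
}

void main()
{
    auto tran = Transition!(Vector4!float)();
    tran.target = Vector4!float(50f, 36f);   // z and w keep their default of 0
    tran.update(0.5f);
    writeln(tran.x, " ", tran.y);            // x and y are reached through alias this
}
```

Because Transition only needs +, - and * on T, the same template should also accept a SIMD-backed vector type later, as long as that type provides those three operators.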

Jan 11 2012

Manu <turkeyman gmail.com> writes:

On 12 January 2012 01:15, F i L <witte2008 gmail.com> wrote:

 Manu wrote:

 Yes the lib would supply standard operations, probably even a matrix type
 or 2.

 Okay cool. That's basically what I wanted to know. However, I'm still
 wondering exactly how flexible these libraries will be.


Define 'flexible'?
Probably not very flexible, they will be fast!


 Have some code of more complex operations?

 My main concern is with my "transition" objects. Example:

   struct Transition(T) {
       T value, start, target;
       alias value this;

       void update(U)(U iteration) {
           value = start + ((target - start) * iteration);

       }
   }


   struct Vector4(T) {
       T x, y, z, w;

       auto abs() { ... }
       auto dot() { ... }
       auto norm() { ... }
       // etc...

       static if (isTransition!T) {
           void update(U)(U iteration) {
               x.update(iteration);
               y.update(iteration);
               z.update(iteration);
               w.update(iteration);
           }
       }
   }


   void main() {
       // Simple transition vector
       auto tranVec = Transition!(Vector4!float)();
       tranVec.target = Vector4!float(50f, 36f);
       tranVec.update(0.5f);

       // Or transition per channel
       auto vecTran = Vector4!(Transition!float)();
       vecTran.x.target = 50f;
       vecTran.y.target = 36f;
       vecTran.update(0.5f);
   }

 I could make a free function "auto Linear(U)(U start, U target)", but it's
 best to keep things in object-oriented containers, IMO. I've
 illustrated a simple linear transition here, but the goal is to make many
 different transition types: Bezier, EaseIn, Circular, Bounce, etc. and
 continuous/physics ones like SmoothLookAt, Giggly, Shaky, etc.

I don't see any problem here. This looks trivial. It depends on basically
nothing, it might even work with what Walter has already added, and no libs
:)
I think the term 'iteration' is a bit ugly/misleading though, it should be
't' or 'time'.


My matrix code also looks something like:
   struct Matrix4(T)
    if (isVector(T) || isTransitionOfVector(T)) {

       T x, y, z, w;
   }

 So Transitions potentially work with matrices in some areas. I'm still new
 to quaternion math, but I'm guessing these might be able to apply there as
 well.

I would probably make a transition of matrices, rather than a matrix of
vector transitions (so you can get references to the internal matrices)...
but aside from that, I don't see any problems here either.


 So my main concern is how SIMD will affect this sort of flexibility, or if
 I'm going to have to rethink my whole model here to accommodate SSE
 operations. SIMD is usually 128 bit, right? So making a Vector4!double
 doesn't really work... unless it was something like:

   struct Vector4(T) {
       version (SIMD_128) {
           // .sizeof is measured in bytes, not bits
           static if (T.sizeof == 4) {
               __v128 xyzw;
           }
           else static if (T.sizeof == 8) {
               __v128 xy;
               __v128 zw;
           }
       }
       version (SIMD_256) {
           // ...
       }
   }

 Of course, that would obviously complicate the method code quite a bit.
 IDK, your thoughts?

I think that is also possible if that's what you want to do, and I see no
reason why any of these constructs wouldn't be efficient (or supported).
You can probably even try it out now with what Walter has already done...

Jan 11 2012

"F i L" <witte2008 gmail.com> writes:

Manu wrote:
 Define 'flexible'?
 Probably not very flexible, they will be fast!

Flexible as in my examples.


 I think the term 'iteration' is a bit ugly/misleading though, 
 it should be
 't' or 'time'.

I've tried to come up with a better term. I guess the logic
behind 'iteration' (which I got from someone else) is that an
iteration of 2 gives you a value two distances from start to
target, whereas 'time' (or 't') could imply any measurement, e.g.
seconds or hours. Maybe 'tween', as in between? IDK, I'll keep
looking.


 I would probably make a transition of matrices, rather than a 
 matrix of
 vector transitions (so you can get references to the internal 
 matrices)...

Well the idea is you can have both. You could even have a:

    Vector2!(Transition!(Vector4!(Transition!float))) // headache
    or something more practical...

    Vector4!(Vector4!float) // Matrix4f
    Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

Or anything like that. I should point out that my example didn't
make it clear that a Matrix4!(Transition!float) would be
pointless compared to Transition!(Matrix4!float) unless each
Transition held its own iteration value. Example:

    struct Transition(T, bool isTimer = false) {
        T value, start, target;
        alias value this;

        static if (isTimer) {
            float time = 0, speed = 0; // D floats default to NaN

            void update() {
                time += speed;
                value = start + ((target - start) * time);
            }
        }
    }

That way each channel could update on its own time frame. There
may even be a way to have each channel be its own separate
Transition type. Which could be interesting. I'm still playing
with possibilities.


 I think that is also possible if that's what you want to do, 
 and I see no
 reason why any of these constructs wouldn't be efficient (or 
 supported).
 You can probably even try it out now with what Walter has 
 already done...

Cool, I was unaware Walter had begun implementing SIMD 
operations. I'll have to build DMD and test them out. What's the 
syntax like right now?

I was under the impression you would be helping him here, or that 
you would be building the SIMD-based math libraries. Or something 
like that. That's why I was posting my examples in question to 
how the std.simd lib would compare.

Jan 11 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 1/11/2012 4:46 PM, F i L wrote:
 I think that is also possible if that's what you want to do, and I see no
 reason why any of these constructs wouldn't be efficient (or supported).
 You can probably even try it out now with what Walter has already done...

 Cool, I was unaware Walter had begun implementing SIMD operations. I'll have to
 build DMD and test them out. What's the syntax like right now?

It's not ready yet. Give me some more time ;-)

Jan 11 2012

Manu <turkeyman gmail.com> writes:

On 12 January 2012 02:46, F i L <witte2008 gmail.com> wrote:

 Well the idea is you can have both. You could even have a:

   Vector2!(Transition!(Vector4!(Transition!float))) // headache
   or something more practical...

   Vector4!(Vector4!float) // Matrix4f
   Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

 Or anything like that. I should point out that my example didn't make it
 clear that a Matrix4!(Transition!float) would be pointless compared to
 Transition!(Matrix4!float) unless each Transition held its own iteration
 value. Example:

   struct Transition(T, bool isTimer = false) {
       T value, start, target;
       alias value this;

       static if (isTimer) {
           float time = 0, speed = 0; // D floats default to NaN

           void update() {
               time += speed;
               value = start + ((target - start) * time);
           }
       }
   }

 That way each channel could update on its own time frame. There may even
 be a way to have each channel be its own separate Transition type. Which
 could be interesting. I'm still playing with possibilities.


The vectors aren't quite like that... you can't make a hardware vector out
of anything, only things the hardware supports: __vector(float[4]) for
instance.
You can make your own vector template that wraps those I guess if you want
to make a matrix that way, but it sounds inefficient. When it comes to
writing the vector/matrix operations, if you're assuming generic code, you
won't be able to make it anywhere near as good as if you write a Matrix4x4
class.


I think that is also possible if that's what you want to do, and I see no
 reason why any of these constructs wouldn't be efficient (or supported).
 You can probably even try it out now with what Walter has already done...

 Cool, I was unaware Walter had begun implementing SIMD operations. I'll
 have to build DMD and test them out. What's the syntax like right now?

The syntax for the types (supporting basic arithmetic) looks like
__vector(float[4]) float4vector... Try it on the latest GDC.


I was under the impression you would be helping him here, or that you would
 be building the SIMD-based math libraries. Or something like that. That's
 why I was posting my examples in question to how the std.simd lib would
 compare.

I know nothing of DMD. Once the type semantics and opcode intrinsics are
working, I'll happily write the fiddly library, and I'm using GDC for my
own experiment in the meantime while Walter works on the code gen.
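
For readers who want to try the __vector(float[4]) syntax mentioned above: druntime's core.simd exposes that type under the alias float4. The following is a minimal sketch, assuming an x86_64 target and a compiler where that alias is defined. The lerp helper is hypothetical; it simply applies the thread's "start + (target - start) * t" update to four lanes at once, with one blend factor per lane.

```d
import core.simd;
import std.stdio;

// Hypothetical helper: linear interpolation on four floats at once.
// Each lane has its own blend factor, so channels can advance independently.
float4 lerp(float4 start, float4 target, float4 t)
{
    return start + (target - start) * t;
}

void main()
{
    float4 start  = [0.0f, 0.0f, 0.0f, 0.0f];
    float4 target = [50.0f, 36.0f, 8.0f, 4.0f];
    float4 t      = [0.5f, 0.5f, 1.0f, 0.0f];

    float4 r = lerp(start, target, t);

    // .array views the vector as a float[4] for element access and printing.
    writeln(r.array);
}
```

Only vector-with-vector arithmetic is used here, which is the basic arithmetic the post says the types support.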

Jan 12 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 12 January 2012 08:29, Manu <turkeyman gmail.com> wrote:

 The syntax for the types (supporting basic arithmetic) look like
 __vector(float[4]) float4vector.. Try it on the latest GDC.

This will change.  I'm uploading core.simd later which has a Vector!()
template, and aliases for vfloat4, vdouble2, vint4, etc...

I don't plan on implementing vector intrinsics in the same way Walter
is doing it.

a) GCC already provides its own intrinsics.
b) The intrinsics I see Walter has already implemented in core.simd are
restricted to the x86 line of architectures.


Regards
--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
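
As a point of reference for the naming discussed above: in the core.simd that ships with current druntime, the Vector!() template exists and the aliases are spelled float4, double2 and int4 (rather than vfloat4 and so on). A small compile-time check, assuming a target with 128-bit vector support such as x86_64; BuiltinFloat4 is a name made up for this sketch.

```d
import core.simd;

// The built-in vector type, written the way it appears in the thread.
alias BuiltinFloat4 = __vector(float[4]);

// Vector!() is a thin wrapper around the built-in type,
// and float4 is an alias for one instantiation of it.
static assert(is(Vector!(float[4]) == BuiltinFloat4));
static assert(is(float4 == BuiltinFloat4));

// float4, double2 and int4 each fill one 128-bit register.
static assert(float4.sizeof == 16);
static assert(double2.sizeof == 16);
static assert(int4.sizeof == 16);

void main() {}
```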

Jan 12 2012

"Marco Leise" <Marco.Leise gmx.de> writes:

On 12.01.2012 at 16:40, Iain Buclaw <ibuclaw ubuntu.com> wrote:

 The syntax for the types (supporting basic arithmetic) look like
 __vector(float[4]) float4vector.. Try it on the latest GDC.

 This will change.  I'm uploading core.simd later which has a Vector!()
 template, and aliases for vfloat4, vdouble2, vint4, etc...

 I don't plan on implementing vector intrinsics in the same way Walter
 is doing it.

 a) GCC already provides its own intrinsics.
 b) The intrinsics I see Walter has already implemented in core.simd are
 restricted to the x86 line of architectures.


 Regards

Looks like you two should discuss this. I see how Walter envisioned D having
an inline assembler, unlike C, where the lack of one resulted in several
vendor-specific syntaxes, and how GCC has already done the bulk of the work to
support SIMD on multiple platforms. Naturally you don't want to redo that
work to wrap Walter's immature approach around the solid base in GDC.
Can you please have a meeting together with the LDC devs and decide on a
fair way for everyone to support inline ASM and SIMD intrinsics? Once
there is common ground for three compilers, other compilers will want to
go the same route and everyone is happy with source code that can be
compiled by every compiler.
I think this is a fundamental decision for a systems programming language.

Jan 12 2012

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 13 January 2012 04:16, Marco Leise <Marco.Leise gmx.de> wrote:

 Looks like you two should discuss this. I see how Walter envisioned D to
 have an inline assembler unlike C, which resulted in several vendor specific
 syntaxes and how GCC has already done the bulk load of work to support SIMD
 and multiple platforms. Naturally you don't want to redo that work to wrap
 Walter's immature approach around the solid base in GDC.
 Can you please have a meeting together with the LDC devs and decide on a
 fair way for everyone to support inline ASM and SIMD intrinsics? Once there
 is a common ground for three compilers other compilers will want to go the
 same route and everyone is happy with source code that can be compiled by
 every compiler.
 I think this is a fundamental decision for a systems programming language.

Who are the LDC devs? :)

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

Jan 13 2012

"Marco Leise" <Marco.Leise gmx.de> writes:

On 13.01.2012 at 11:37, Iain Buclaw <ibuclaw ubuntu.com> wrote:

 Who are the LDC devs? :)

:) Actually I don't know. I've only heard about this "LLVM" that's supposed to
be good at source-to-source compilation and is more of a framework than a
single compiler. Then LDC emerged around that, and I recently heard
that 'it's pretty much up to date'. Since you are working on GDC it seemed
natural that someone else must be actively maintaining LDC...
But dsource.org shows commits that are at least 2 years old. Look on the
positive side: one less party to satisfy!

Jan 13 2012

simendsjo <simendsjo gmail.com> writes:

On 13.01.2012 12:21, Marco Leise wrote:
 Am 13.01.2012, 11:37 Uhr, schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 13 January 2012 04:16, Marco Leise <Marco.Leise gmx.de> wrote:
 Am 12.01.2012, 16:40 Uhr, schrieb Iain Buclaw <ibuclaw ubuntu.com>:

 On 12 January 2012 08:29, Manu <turkeyman gmail.com> wrote:
 On 12 January 2012 02:46, F i L <witte2008 gmail.com> wrote:
 Well the idea is you can have both. You could even have a:

 Vector2!(Transition!(Vector4!(Transition!float))) // headache
 or something more practical...

 Vector4!(Vector4!float) // Matrix4f
 Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

 Or anything like that. I should point out that my example didn't make it
 clear that a Matrix4!(Transition!float) would be pointless compared to
 Transition!(Matrix4!float) unless each Transition held its own iteration
 value. Example:

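 struct Transition(T, bool isTimer = false) {
     T value, start, target;
     alias value this;

     static if (isTimer) {
         // D floats default to NaN, so initialize explicitly
         float time = 0, speed = 0;

         // Advance the timer, then interpolate from start toward target
         void update() {
             time += speed;
             value = start + ((target - start) * time);
         }
     }
 }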

 That way each channel could update on its own time frame. There may even
 be a way to have each channel be its own separate Transition type. Which
 could be interesting. I'm still playing with possibilities.



 The vectors aren't quite like that.. you can't make a hardware vector out
 of anything, only things the hardware supports: __vector(float[4]) for
 instance.
 You can make your own vector template that wraps those I guess if you want
 to make a matrix that way, but it sounds inefficient. When it comes to
 writing the vector/matrix operations, if you're assuming generic code, you
 won't be able to make it anywhere near as good as if you write a Matrix4x4
 class.


 I think that is also possible if that's what you want to do, and I see no
 reason why any of these constructs wouldn't be efficient (or supported).
 You can probably even try it out now with what Walter has already done...



 Cool, I was unaware Walter had begun implementing SIMD operations. I'll
 have to build DMD and test them out. What's the syntax like right now?



 The syntax for the types (supporting basic arithmetic) looks like
 __vector(float[4]) float4vector.. Try it on the latest GDC.

 This will change. I'm uploading core.simd later which has a Vector!()
 template, and aliases for vfloat4, vdouble2, vint4, etc...

 I don't plan on implementing vector intrinsics in the same way Walter
 is doing it.

 a) GCC already provides its own intrinsics
 b) The intrinsics I see Walter has already implemented in core.simd are
 restricted to the x86 line of architectures.


 Regards


 Looks like you two should discuss this. I see how Walter envisioned D to
 have an inline assembler unlike C, which resulted in several vendor-specific
 syntaxes, and how GCC has already done the bulk of the work to support SIMD
 and multiple platforms. Naturally you don't want to redo that work to wrap
 Walter's immature approach around the solid base in GDC.
 Can you please have a meeting together with the LDC devs and decide on a
 fair way for everyone to support inline ASM and SIMD intrinsics? Once there
 is a common ground for three compilers, other compilers will want to go the
 same route and everyone is happy with source code that can be compiled by
 every compiler.
 I think this is a fundamental decision for a systems programming language.

 Who are the LDC devs? :)

 :) Actually I don't know. Only heard about this "LLVM" that's supposed
 to be good at source-to-source compilation and is more of a framework
 than a single compiler. And then LDC emerged around that and I recently
 heard that 'it's pretty much up to date'. Since you are working on GDC it
 seemed natural someone else must be actively maintaining LDC...
 But dsource.org shows commits that are at least 2 years old. Look at the
 positive side: One less party to satisfy!

It was at bitbucket (updated ~6 months ago), but it seems it has moved 
to github (updated 2 days ago) https://github.com/ldc-developers/ldc

Jan 13 2012
