digitalmars.D - primitive vector types

Mattias Holm (18/18) Feb 19 2009 Since (SIMD) vectors are so common and every reasonable system supports

Denis Koroskin (3/21) Feb 19 2009 I don't see any reason why float4 can't be made a library type.

Don (4/36) Feb 19 2009 Walter at one point suggested that float[4] should be specially
Andrei Alexandrescu (10/42) Feb 19 2009 Yah, I was thinking the same:

Jason House (2/15) Feb 19 2009 I've looked for this functionality in std.bitmanip before. I never thou...
Denis Koroskin (5/46) Feb 19 2009 That would be great. If float4 makes its way into D, I'll share our blazi...

Andrei Alexandrescu (3/66) Feb 19 2009 Put me down for that. What do I need to do?

Denis Koroskin (3/66) Feb 19 2009 Convince Walter to add a float4 type and some intrinsics to DMD (I'll post...

Denis Koroskin (82/154) Feb 20 2009 Here is some nice documentation on the MMX, SSE and SSE2 intrinsics:

Mattias Holm (4/4) Feb 20 2009 Yeah, these are good statistics and do point out that the vector add/mul...

Christian Kamm (2/6) Feb 21 2009 Yes, LDC would follow. The main reason people can't use these intrinsics...

Michel Fortin (8/17) Feb 21 2009 Instead of introducing a new type, couldn't float[4] be the one

bearophile (4/6) Feb 21 2009 Alignment requirements, shuffling operations, scalar operations on just ...

Daniel Keep (6/14) Feb 21 2009 Another advantage would be that you could specify in the ABI that this

Don (18/34) Feb 21 2009 I don't think that's messy at all. I can't see much difference between

Andrei Alexandrescu (4/9) Feb 21 2009 I agree with float[4] as a good choice. So are value semantics for T[n]

Don (3/14) Feb 21 2009 Oh. I guess it was just a proposal then, and not implemented. We're not

Andrei Alexandrescu (14/28) Feb 21 2009 Yah. Walter agrees that that's the right thing to do. The only thing

Jarrett Billingsley (7/17) Feb 21 2009 Structs already work like this. In fact, the compiler will pass a

Andrei Alexandrescu (22/42) Feb 21 2009 Ok, you just tipped the balance :o). I'm also realizing something. The

Bill Baxter (5/13) Feb 21 2009 I don't follow you. Wouldn't they rather pass such huge chunks of

Andrei Alexandrescu (11/25) Feb 21 2009 What I'm saying (sorry for being unclear) is:

Bill Baxter (9/37) Feb 21 2009 Ok. And for large static arrays you can still explicitly use ref, right...

Andrei Alexandrescu (4/36) Feb 21 2009 Yah, ref on the callee side, or the [] operator without any arguments on...

Jarrett Billingsley (4/6) Feb 21 2009 Wait, what? I wouldn't have expected anything but "ref type[n] foo"

Christopher Wright (5/12) Feb 21 2009 void foo(T)(T t) {}

Jarrett Billingsley (2/17) Feb 21 2009 Oh. See, I don't usually template _everything_ so that didn't cross my ...

Michel Fortin (12/16) Feb 21 2009 I think it is the right decision too.

Andrei Alexandrescu (3/20) Feb 21 2009 Yah, and that would give a good model to follow for user-defined contain...

bearophile (7/11) Feb 22 2009 Well, I think the type system can be extended to manage that: the progra...

Christopher Wright (4/13) Feb 21 2009 This is less of an issue with the type of string literals being

Andrei Alexandrescu (4/18) Feb 21 2009 Exactly so. So with this other fix in the language there's even more

bearophile (6/25) Feb 21 2009 I have quoted it all because I like the experience and ideas you bring t...
Mattias Holm (19/39) Feb 21 2009 Yes, float[4] would be ok, if some CPU independent permutation support

Mattias Holm (19/19) Feb 22 2009 I think that the following would work reasonably well:

Denis Koroskin (2/22) Feb 22 2009 How would you implement it for user-defined types?

Christopher Wright (3/20) Feb 22 2009 T[] opIndex(int[] indices) { ... }

Denis Koroskin (2/21) Feb 22 2009 How about ranges included - v[0..3, 6, 5, 7..len] ?

bearophile (14/20) Feb 22 2009 Be careful, I think a syntax like:

Daniel Keep (47/70) Feb 22 2009 Swizzling a vector and multidimensional access are orthogonal on account

Don (4/35) Feb 23 2009 I've always believed that we need that syntax for multi-dimensional

Don (22/51) Feb 23 2009 Note that if you had static arrays with value semantics, with proper

Bill Baxter (5/56) Feb 23 2009 Actually its 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow

Don (16/74) Feb 23 2009 Yes. Is the syntax sugar actually needed for all the permutations?

bearophile (8/12) Feb 23 2009 ...
Andrei Alexandrescu (6/29) Feb 23 2009 There's no need to ever enumerate all functions - they can be generated

Bill Baxter (11/23) Feb 23 2009 I think the issue is just that however you get them there it's a lot

Chad J (8/12) Feb 23 2009 enum { x=0, y=1, z=2, w=3 }

Fawzi Mohamed (17/31) Feb 27 2009 Sorry to bump up this discussion, but I was away and then busy with

Daniel Keep (29/54) Feb 19 2009 I remember implementing a vector struct [1] quite some time ago that had

Jarrett Billingsley (4/15) Feb 19 2009 It's just align(16).

Daniel Keep (24/42) Feb 19 2009 Yeah, except it doesn't do what you want in this case. For example:

Don (2/58) Feb 19 2009 http://d.puremagic.com/issues/show_bug.cgi?id=2278

Lionello Lunesu (4/5) Feb 20 2009 That bug is interesting. Didn't Walter have to change DMD for the Mac to...

downs (4/31) Feb 20 2009 We have that in dglut too :) It's optimized for the float[3]/float[4] ca...

Bill Baxter (17/34) Feb 19 2009 To justify making them primitive types you need to show that they are

bearophile (7/9) Feb 19 2009 Recently I have discussed about this with the LDC main designer, my idea...
Mattias Holm (40/48) Feb 19 2009 Firstly they are widespread. They might be hampered by the fact that

Joel C. Salomon (8/14) Feb 22 2009 Given that the wedge product is a defined operation on vectors (a^b is a

Mattias Holm <hannibal.holm gmail.com> writes:

Since (SIMD) vectors are so common and every reasonable system supports 
them in one way or the other (and scalar emulation of them is rather 
simple), why not have support for them in D directly?

Yes, the array operations are nice (and one of the main reasons why 
I like D :) ), but they have the problem that an array of floats is only 
guaranteed to be aligned on float boundaries, not on vector boundaries. 
In my mind vectors are a primitive data type that should be exposed by 
the programming language.

Something OpenCL-like:

	float4 vec;
	vec.xyzw = {1.0, 1.0, 1.0, 1.0}; // assignment
	vec.xyzw = vec.wyxz; // permutation
	vec[i] = 1.0; // indexing
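Here is a minimal sketch of how the proposed permutation can be approximated in D without a new built-in type. It assumes a plain float[4] and a hypothetical helper template named swizzle, which is not part of any library. The mask letters are resolved to lane indices at compile time, so a misspelled mask fails to compile. The lanes are then copied one by one; whether that becomes a single shuffle instruction is up to the optimizer.

```d
/// Hypothetical helper: returns the lanes of v reordered by a mask such as
/// "wyxz" (x = lane 0, y = lane 1, z = lane 2, w = lane 3).
float[4] swizzle(string mask)(in float[4] v)
    if (mask.length == 4)
{
    // Evaluated during compilation; an invalid letter is a compile-time error.
    enum size_t[4] idx = () {
        size_t[4] r;
        foreach (i, c; mask)
        {
            switch (c)
            {
                case 'x': r[i] = 0; break;
                case 'y': r[i] = 1; break;
                case 'z': r[i] = 2; break;
                case 'w': r[i] = 3; break;
                default: assert(0, "swizzle letters must be x, y, z or w");
            }
        }
        return r;
    }();

    float[4] result;
    static foreach (i; 0 .. 4)
    {
        result[i] = v[idx[i]]; // lane i of the result takes lane idx[i] of v
    }
    return result;
}
```

With UFCS the call can be written v.swizzle!"wyxz"(), which is close to the proposed vec.wyxz. The write-through form vec.xyzw = ... would need a wrapper struct rather than a free function.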

And then we can easily imagine some extra nice features to have with 
respect to operators:

	vec ^ vec2; // 3d cross product for float vectors, for int vectors xor

Has this been discussed before?

/ Mattias

Feb 19 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm <hannibal.holm gmail.com>  
wrote:

I don't see any reason why float4 can't be made a library type.

Feb 19 2009

Don <nospam nospam.com> writes:

Denis Koroskin wrote:
 I don't see any reason why float4 can't be made a library type.

Walter at one point suggested that float[4] should be specially 
recognized by the compiler -- it would always be aligned, and stored in 
an SSE register if possible. Ditto for float[3].

Feb 19 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 I don't see any reason why float4 can't be made a library type.

Yah, I was thinking the same:

struct float4
{
     __align(16) float[4] data; // right syntax and value?
     alias data this;
}

This looks like something that should go into std.matrix pronto. It even 
has value semantics even though fixed arrays don't :o/.
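For comparison, here is a sketch of that struct in present-day D syntax, where the attribute is spelled align(16); D has no __align. The "alias data this" line forwards indexing, slicing and array operations to the wrapped array, and the struct wrapper gives copy-on-assignment semantics. The helper add is a hypothetical name. Whether align(16) on the field really produces 16-byte-aligned instances on the stack and heap depends on the compiler and platform. For that reason the checks below rely only on size and copying behavior.

```d
/// Sketch of the library float4 in present-day D syntax.
struct float4
{
    align(16) float[4] data; // D spells the attribute align(16)
    alias data this;         // indexing, slicing and array ops forward to data
}

/// Hypothetical helper: element-wise addition through alias this.
float4 add(in float4 a, in float4 b)
{
    float4 r;
    r[] = a[] + b[]; // array operation on the wrapped float[4]s
    return r;
}
```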


Andrei

Feb 19 2009

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu Wrote:

 Denis Koroskin wrote:
 I don't see any reason why float4 can't be made a library type.

 
 Yah, I was thinking the same:
 
 struct float4
 {
      __align(16) float[4] data; // right syntax and value?
      alias data this;
 }
 
 This looks like something that should go into std.matrix pronto. It even 
 has value semantics even though fixed arrays don't :o/.


I've looked for this functionality in std.bitmanip before.  I never thought to
look in std.matrix.  Fixed-size bit arrays seem like an obvious choice for
SIMD optimization.
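Here is a sketch of Jason's point, using a hypothetical 128-bit set type named Bits128 (not a Phobos type). The bitwise operators are written as D array operations over uint[4], which is exactly as wide as one SSE register. Nothing here forces SIMD code; it only gives the compiler an easy target to vectorize.

```d
/// Hypothetical fixed-size 128-bit set, stored as four 32-bit words
/// (the width of one SSE register).
struct Bits128
{
    uint[4] words;

    /// Bitwise &, | and ^ written as array operations over all four words.
    Bits128 opBinary(string op)(in Bits128 rhs) const
        if (op == "&" || op == "|" || op == "^")
    {
        Bits128 r;
        mixin("r.words[] = words[] " ~ op ~ " rhs.words[];");
        return r;
    }

    /// Reads bit number `bit` (0 to 127; larger values are out of bounds).
    bool opIndex(size_t bit) const
    {
        return ((words[bit / 32] >> (bit % 32)) & 1) != 0;
    }

    /// Sets bit number `bit`.
    void set(size_t bit)
    {
        words[bit / 32] |= 1u << (bit % 32);
    }
}
```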

Feb 19 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 19 Feb 2009 23:05:34 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
  I don't see any reason why float4 can't be made a library type.

 Yah, I was thinking the same:

 struct float4
 {
      __align(16) float[4] data; // right syntax and value?
      alias data this;
 }

 This looks like something that should go into std.matrix pronto. It even  
 has value semantics even though fixed arrays don't :o/.


 Andrei

That would be great. If float4 makes its way into D, I'll share our blazing fast
math code with the community (the most common operations on vectors, matrices,
quaternions etc.). It is written entirely in SSE (intrinsics, not asm; there is
a problem with inlining asm in D, IIRC. Can anyone elaborate on this?) and is
*very* fast. According to our benchmarks, that's the best we can squeeze out of
the hardware.

I know LLVM has support for a *very* wide range of intrinsics:
http://www.cs.ucla.edu/classes/spring08/cs259/llvm-2.2/include/llvm/Intrinsics.gen

Hopefully they will get into LDC (and DMD *hint* Walter *hint*) very soon.

Feb 19 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 Hopefully they will get into LDC (and DMD *hint* Walter *hint*) very soon.
 

Put me down for that. What do I need to do?

Andrei

Feb 19 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 20 Feb 2009 06:22:40 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 Hopefully they will get into LDC (and DMD *hint* Walter *hint*) very soon.

 Put me down for that. What do I need to do?

 Andrei

Convince Walter to add a float4 type and some intrinsics to DMD (I'll post a list
of the ones we use later); LDC will follow, I believe.
There should be some type that is treated specially. After all,
intrinsics have function signatures, and those should specify some concrete
types.

Feb 19 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 20 Feb 2009 08:55:16 +0300, Denis Koroskin <2korden gmail.com>  
wrote:

 Convince Walter to add a float4 type and some intrinsics to DMD (I'll post  
 a list of the ones we use later); LDC will follow, I believe.
 There should be some type that is treated specially. After all,  
 intrinsics have function signatures, and those should specify some  
 concrete types.

Here is some nice documentation on the MMX, SSE and SSE2 intrinsics:
http://msdn.microsoft.com/en-us/library/y0dh78ez(VS.80).aspx

Here are some quick statistics on which intrinsics are used in our code, and  
how many times.
Note that this doesn't directly map to how many times each one is *actually*  
used in user code.

This may give Walter some information about priorities (intrinsics that  
aren't used often could be given lower priority, for example).

Arithmetic Operations (Floating-Point SSE2 Intrinsics)
http://msdn.microsoft.com/en-us/library/708ya3be(VS.80).aspx
_mm_add_ss - 2
_mm_add_ps - 48
_mm_sub_ss - 4
_mm_sub_ps - 24
_mm_mul_ss - 2
_mm_mul_ps - 100
_mm_div_ss - 0
_mm_div_ps - 1
_mm_sqrt_ss - 0
_mm_sqrt_ps - 0
_mm_rcp_ss - 1
_mm_rcp_ps - 0
_mm_rsqrt_ss - 0
_mm_rsqrt_ps - 1
_mm_min_ss - 0
_mm_min_ps - 1
_mm_max_ss - 0
_mm_max_ps - 1

Store Operations (SSE)
http://msdn.microsoft.com/en-us/library/ybhzf6dk(VS.80).aspx
_mm_store_ss - 1
_mm_store1_ps - 0
_mm_store_ps1 - 0
_mm_store_ps - 0
_mm_storeu_ps - 0
_mm_storer_ps - 0
_mm_move_ss - 2

Set Operations (SSE)
http://msdn.microsoft.com/en-us/library/wbzwdy6a(VS.80).aspx
_mm_set_ss - 0
_mm_set1_ps - 0
_mm_set_ps1 - 19
_mm_set_ps - 45
_mm_setr_ps - 0
_mm_setzero_ps - 2

Logical Operations (SSE)
http://msdn.microsoft.com/en-us/library/9759as73(VS.80).aspx
_mm_and_ps - 2
_mm_andnot_ps - 0
_mm_or_ps - 0
_mm_xor_ps - 3

Miscellaneous Instructions That Use Streaming SIMD Extensions
http://msdn.microsoft.com/en-us/library/dzs626wx.aspx
_mm_shuffle_ps - 124
_mm_shuffle_pi16 - 0
_mm_unpackhi_ps - 0
_mm_unpacklo_ps - 0
_mm_loadh_pi - 0
_mm_storeh_pi - 0
_mm_movehl_ps - 0
_mm_movelh_ps - 0
_mm_loadl_pi - 0
_mm_storel_pi - 0
_mm_movemask_ps - 0
_mm_getcsr - 0
_mm_setcsr - 0
_mm_extract_si64 - 0
_mm_extracti_si64 - 0
_mm_insert_si64 - 0
_mm_inserti_si64 - 0

Comparison Intrinsics (SSE)
http://msdn.microsoft.com/en-us/library/w8kez9sf(VS.80).aspx
Not used

Conversion Operations (SSE)
http://msdn.microsoft.com/en-us/library/0d4dtzhb(VS.80).aspx
Not used

Macros
_MM_SHUFFLE - 100
    #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | ((fp0)))
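Here is a scalar model in D of what the _MM_SHUFFLE macro and _mm_shuffle_ps compute. It is useful for checking lane selection without SSE hardware. The names mmShuffle and shufflePs are hypothetical. The lane rules follow Intel's documented behavior of SHUFPS: the two low result lanes come from the first operand, and the two high ones come from the second.

```d
/// Port of the _MM_SHUFFLE macro: packs four 2-bit lane selectors into
/// an 8-bit immediate, with fp0 in the lowest two bits.
ubyte mmShuffle(uint fp3, uint fp2, uint fp1, uint fp0)
{
    return cast(ubyte)((fp3 << 6) | (fp2 << 4) | (fp1 << 2) | fp0);
}

/// Scalar model of _mm_shuffle_ps(a, b, imm): result lanes 0 and 1 are picked
/// from a, lanes 2 and 3 from b, each by two bits of imm (lowest bits first).
/// imm is a template parameter because the real intrinsic needs a constant.
float[4] shufflePs(ubyte imm)(in float[4] a, in float[4] b)
{
    float[4] r;
    r[0] = a[imm & 3];
    r[1] = a[(imm >> 2) & 3];
    r[2] = b[(imm >> 4) & 3];
    r[3] = b[(imm >> 6) & 3];
    return r;
}
```

For example, _MM_SHUFFLE(3, 2, 1, 0) is 0xE4 and keeps the lanes in their original order. Passing the same register twice with _MM_SHUFFLE(0, 1, 2, 3) reverses the four lanes.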

Feb 20 2009

Mattias Holm <hannibal.holm gmail.com> writes:

Yeah, these are good statistics, and they do point out that the vector
add/mul/permute operations are used whenever vectors are in use.

Intrinsics are one thing; however, platform-independent support would be better.
AltiVec has a different syntax for its permute instructions than SSE has for its
shuffle instructions, so in my mind primitive vectors should support all the
basic operations of the base type, such as +, -, * and / for float vectors.
Permutation should be supported with the OpenCL-like syntax I suggested (as it
is easy to remember).
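To illustrate how one portable syntax could sit on top of different instruction sets, here is a sketch that lowers an OpenCL-style swizzle string to the 8-bit SHUFPS immediate. It assumes both shuffle operands are the same register. The name sseShuffleImm is hypothetical. An AltiVec backend would instead build a vperm control vector from the same string. Because the function can run under CTFE, the immediate is computed at compile time whenever it is used as a constant.

```d
/// Hypothetical: lowers a swizzle such as "wzyx" to a SHUFPS immediate.
/// Lane i of the result takes source lane mask[i]; two bits per lane,
/// lane 0 in the lowest bits.
ubyte sseShuffleImm(string mask)
{
    assert(mask.length == 4, "need exactly four components");
    uint imm = 0;
    foreach (i, c; mask)
    {
        uint lane;
        switch (c)
        {
            case 'x': lane = 0; break;
            case 'y': lane = 1; break;
            case 'z': lane = 2; break;
            case 'w': lane = 3; break;
            default: assert(0, "swizzle letters must be x, y, z or w");
        }
        imm |= lane << (2 * i);
    }
    return cast(ubyte) imm;
}
```

With x, y, z and w mapped to 0, 1, 2 and 3, sseShuffleImm("wyxz") equals _MM_SHUFFLE(2, 0, 1, 3).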

Stuff like cross and dot products is up to libraries in my opinion. They could
be nice as operators for readability, but this is probably not worth the hassle
unless there is a way to override operators for standard types, which is
probably a bad idea anyway. Better would be proper Unicode support and a way to
define infix functions, so that it is possible to define a cross product function
with the proper cross product operator as its name at the library level.
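In that library-level spirit, here is a sketch of dot and cross products as ordinary D functions on float[3]. The names dot and cross are hypothetical, not Phobos functions. D cannot define new operator symbols, but UFCS lets the call read as x.cross(y), which gets part of the way toward infix notation.

```d
/// Hypothetical library function: 3D dot product.
float dot(in float[3] a, in float[3] b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/// Hypothetical library function: 3D cross product (right-handed).
float[3] cross(in float[3] a, in float[3] b)
{
    float[3] r;
    r[0] = a[1] * b[2] - a[2] * b[1];
    r[1] = a[2] * b[0] - a[0] * b[2];
    r[2] = a[0] * b[1] - a[1] * b[0];
    return r;
}
```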

/ Mattias

Feb 20 2009

Christian Kamm <kamm-incasoftware remove-garbage.de> writes:

Denis Koroskin wrote:
 Convince Walter to add a float4 type and some intrinsics to DMD (I'll post a
 list of the ones we use later); LDC will follow, I believe. There should be
 some type that is treated specially. After all, intrinsics have
 function signatures, and those should specify some concrete types.

Yes, LDC would follow. The main reason people can't use these intrinsics in LDC
at the moment is that there's no type in D that maps to an LLVM vector type.

Feb 21 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-02-21 04:34:07 -0500, Christian Kamm 
<kamm-incasoftware remove-garbage.de> said:

 Denis Koroskin wrote:
 Convince Walter to add a float4 type and some intrinsics to DMD (I'll post a
 list of the ones we use later); LDC will follow, I believe. There should be
 some type that is treated specially. After all, intrinsics have
 function signatures, and those should specify some concrete types.

 
 Yes, LDC would follow. The main reason people can't use these 
 intrinsics in LDC at the moment is that there's no type in D that maps 
 to an LLVM vector type.

Instead of introducing a new type, couldn't float[4] be the one 
mapped to a vector type? Why do we need a new type?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Feb 21 2009

bearophile <bearophileHUGS lycos.com> writes:

Michel Fortin:
 Instead of introducing a new type, couldn't float[4] be the one 
 mapped to a vector type? Why do we need a new type?

Alignment requirements, shuffling operations, scalar operations on just the
first item of the vector, etc. It may be doable, and it may even be a nice
idea, but it probably requires a lot of care.
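Here is a scalar model that makes the point about scalar operations concrete. It contrasts the packed _mm_add_ps with the scalar _mm_add_ss from Denis's list. The _ss form adds only lane 0 and passes lanes 1 to 3 through from the first operand. A plain float[4] with element-wise semantics does not express that by itself. The function names are hypothetical.

```d
/// Scalar model of the packed form (_mm_add_ps): all four lanes are added.
float[4] addPs(in float[4] a, in float[4] b)
{
    float[4] r;
    r[] = a[] + b[];
    return r;
}

/// Scalar model of the scalar form (_mm_add_ss): only lane 0 is added,
/// lanes 1 to 3 are passed through unchanged from a.
float[4] addSs(in float[4] a, in float[4] b)
{
    float[4] r = a;
    r[0] = a[0] + b[0];
    return r;
}
```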

Bye,
bearophile

Feb 21 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

bearophile wrote:
 Michel Fortin:
 Instead of introducing a new type, couldn't float[4] be the one 
 mapped to a vector type? Why do we need a new type?

 
 Alignment requirements, shuffling operations, scalar operations on just the
 first item of the vector, etc. It may be doable, and it may even be a nice
 idea, but it probably requires a lot of care.
 
 Bye,
 bearophile

Another advantage would be that you could specify in the ABI that this
vector type should be passed to and returned from functions via the XMM
registers.  You could make a specific exception for float[4], but that
just seems messy.

  -- Daniel

Feb 21 2009

Don <nospam nospam.com> writes:

Daniel Keep wrote:
 
 Another advantage would be that you could specify in the ABI that this
 vector type should be passed to and returned from functions via the XMM
 registers.  You could make a specific exception for float[4], but that
 just seems messy.
 
   -- Daniel

I don't think that's messy at all. I can't see much difference between 
special support for float[4] versus float4. It's better if the code can 
take advantage of hardware without specific support. Bear in mind that 
SSE/SSE2 is a temporary situation. AVX provides for much longer vectors 
(256-bit registers), and it's extensible. You'd end up needing to keep 
adding special types whenever a new CPU comes out.

Note that the fundamental concept which is missing from the C virtual 
machine is that all modern machines can efficiently perform operations 
on arrays of built-in types of length 2^n, for some small value of n.
We need to get this into the language abstraction. Not follow C++ in 
hacking a few extra special types onto the old, deficient C model. And I 
think D is actually in a position to do this.

float[4] would be a greatly superior option if it could be done.
The key requirements are:
(1) need to specify that static arrays are passed by value.
(2) need to keep stack aligned to 16.
The good news is that both of these appear to be done on DMD2-Mac!
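Requirement (1) can be observed from plain code. The sketch below assumes a present-day D2 compiler, where static arrays are value types and a float[4] parameter is a copy; that was not yet the case when this was written. The name scaled is hypothetical. Requirement (2), a 16-byte-aligned stack, is an ABI property that a portable snippet cannot demonstrate.

```d
/// The parameter v is the callee's own copy of the caller's four floats.
float[4] scaled(float[4] v, float k)
{
    v[] *= k; // array operation on the copy; the caller's array is untouched
    return v;
}
```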

Feb 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

I agree with float[4] as a good choice. So are value semantics for T[n] 
implemented on the Mac??

Andrei

Feb 21 2009

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 Don wrote:
 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

 
 I agree with float[4] as a good choice. So are value semantics for T[n] 
 implemented on the Mac??
 
 Andrei

Oh. I guess it was just a proposal then, and not implemented. We're not 
as close as I thought. Bummer.

Feb 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

 I agree with float[4] as a good choice. So are value semantics for 
 T[n] implemented on the Mac??

 Andrei

 Oh. I guess it was just a proposal then, and not implemented. We're not 
 as close as I thought. Bummer.

Yah. Walter agrees that that's the right thing to do. The only thing 
that worries us is passing by-value large statically-sized vectors to 
template functions. But then gaming code wants to do exactly that. It's 
hard to figure where to draw the line. Imagine the error message "Hey, 
you're going a bit overboard by passing 512 bytes around on the stack".

Besides, we already do have a solution for pass-by-value vectors: 
Tuple!(T[N]). That would put the burden in the right place (on the 
programmer actively wanting pass-by-value). But then it's a shame that 
the built-in type T[N] is a weird exception that must be handled in all 
template code.
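Here is a sketch of the Tuple!(T[N]) workaround, using std.typecons.Tuple with a named field. The alias Vec4, the field name lanes and the function negated are illustrative. Because Tuple is a struct, the wrapped array is copied on assignment and when passed to a function, regardless of how the compiler treats bare static arrays.

```d
import std.typecons : Tuple;

/// Four floats wrapped in a Tuple; the struct wrapper forces copy semantics.
alias Vec4 = Tuple!(float[4], "lanes");

/// Negates the lanes of its own copy and returns it.
Vec4 negated(Vec4 v)
{
    v.lanes[] = -v.lanes[];
    return v;
}
```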

No idea what the right choice is. I'm just dumping whatever is buzzing 
around my head whenever I think of the issue.


Andrei

Feb 21 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Yah. Walter agrees that that's the right thing to do. The only thing that
 worries us is passing by-value large statically-sized vectors to template
 functions. But then gaming code wants to do exactly that. It's hard to
 figure where to draw the line. Imagine the error message "Hey, you're going
 a bit overboard by passing 512 bytes around on the stack".

Structs already work like this.  In fact, the compiler will pass a
struct in a register if it's 1, 2, or 4 bytes on x86.  Having the
compiler "magically" put float[4]s in SSE registers seems like a
similar idea.

 Besides, we already do have a solution for pass-by-value vectors:
 Tuple!(T[N]). That would put the burden in the right place (on the
 programmer actively wanting pass-by-value). But then it's a shame that the
 built-in type T[N] is a weird exception that must be handled in all template
 code.

Please make them value types.  I, for one, am tired of dealing with their crap.

Feb 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jarrett Billingsley wrote:
 On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Yah. Walter agrees that that's the right thing to do. The only thing that
 worries us is passing by-value large statically-sized vectors to template
 functions. But then gaming code wants to do exactly that. It's hard to
 figure where to draw the line. Imagine the error message "Hey, you're going
 a bit overboard by passing 512 bytes around on the stack".

 
 Structs already work like this.  In fact, the compiler will pass a
 struct in a register if it's 1, 2, or 4 bytes on x86.  Having the
 compiler "magically" put float[4]s in SSE registers seems like a
 similar idea.

I agree.

 Besides, we already do have a solution for pass-by-value vectors:
 Tuple!(T[N]). That would put the burden in the right place (on the
 programmer actively wanting pass-by-value). But then it's a shame that the
 built-in type T[N] is a weird exception that must be handled in all template
 code.

 
 Please make them value types.  I, for one, am tired of dealing with their crap.

Ok, you just tipped the balance :o). I'm also realizing something. The 
scenario I'm most afraid of is something like:

char[10000] humongous = "This is a humongous message. I will type here 
exactly 10000 characters. ... ";
foreach (i; 1 .. 100_000_000) writeln(humongous);

But then there is a reason making this scenario rather scarce: for large 
static arrays, it's hard to keep the claimed length (10000) in sync with 
the actual length of the initializer. I used to think that's a language 
defect and suggested the syntax char[$] humongous = " ... " for it, such 
that the compiler infers the length from the initializer. But now I get 
to think that the defect actually discourages people from defining very 
large statically-sized arrays unwittingly.

With mixins and template techniques, very large static arrays can still 
be generated, but such advanced uses also have a nice feedback effect: those who 
know the language well enough to embark on such styles of coding will 
also likely understand the cautions needed in making them work well.

So, yes, it seems like it's a solid choice to make statically-sized 
arrays value types. Now we only need to convince Walter that 
implementation is "a simple matter of coding" :o).


Andrei

Feb 21 2009

Bill Baxter <wbaxter gmail.com> writes:

On Sun, Feb 22, 2009 at 4:00 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Jarrett Billingsley wrote:
 On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Yah. Walter agrees that that's the right thing to do. The only thing that
 worries us is passing by-value large statically-sized vectors to template
 functions. But then gaming code wants to do exactly that.



I don't follow you.  Wouldn't they rather pass such huge chunks of
data by reference?

--bb

Feb 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 4:00 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jarrett Billingsley wrote:
 On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Yah. Walter agrees that that's the right thing to do. The only thing that
 worries us is passing by-value large statically-sized vectors to template
 functions. But then gaming code wants to do exactly that.



 
 I don't follow you.  Wouldn't they rather pass such huge chunks of
 data by reference?
 
 --bb

What I'm saying (sorry for being unclear) is:

1. If we choose T[N] as a value type, the downside is that people may 
pass large arrays by value to, e.g., template functions.

2. The upside is that gaming programmers DO want to pass short arrays of 
type T[N] by value.

The conundrum is that a type system can't say that T[N] has some 
semantics for N <= Nmax and some other semantics for N > Nmax. So we 
need to pick one, and probably picking the value semantics is the right 
thing to do.


Andrei

Feb 21 2009

Bill Baxter <wbaxter gmail.com> writes:

On Sun, Feb 22, 2009 at 5:11 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 4:00 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jarrett Billingsley wrote:
 On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Yah. Walter agrees that that's the right thing to do. The only thing
 that
 worries us is passing by-value large statically-sized vectors to
 template
 functions. But then gaming code wants to do exactly that.



 I don't follow you.  Wouldn't they rather pass such huge chunks of
 data by reference?

 --bb

 What I'm saying (sorry for being unclear) is:

 1. If we choose T[N] as a value type, the downside is that people may pass
 large arrays by value to, e.g., template functions.

 2. The upside is that gaming programmers DO want to pass short arrays of
 type T[N] by value.

 The conundrum is that a type system can't say that T[N] has some semantics
 for N <= Nmax and some other semantics for N > Nmax. So we need to pick one,
 and probably picking the value semantics is the right thing to do.

Ok.  And for large static arrays you can still explicitly use ref, right?
That's what I was confused about -- sounded like you were saying ref
wouldn't even be possible.  I don't think there are many cases where
you don't know in advance if the static array you are expecting is
huge or not.  And you can always get creative with static if() in the
cases where you don't.

--bb

Feb 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 5:11 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 4:00 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jarrett Billingsley wrote:
 On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Yah. Walter agrees that that's the right thing to do. The only thing
 that
 worries us is passing by-value large statically-sized vectors to
 template
 functions. But then gaming code wants to do exactly that.



 I don't follow you.  Wouldn't they rather pass such huge chunks of
 data by reference?

 --bb

 What I'm saying (sorry for being unclear) is:

 1. If we choose T[N] as a value type, the downside is that people may pass
 large arrays by value to, e.g., template functions.

 2. The upside is that gaming programmers DO want to pass short arrays of
 type T[N] by value.

 The conundrum is that a type system can't say that T[N] has some semantics
 for N <= Nmax and some other semantics for N > Nmax. So we need to pick one,
 and probably picking the value semantics is the right thing to do.

 
 Ok.  And for large static arrays you can still explicitly use ref, right?

Yah, ref on the callee side, or the [] operator without any arguments on 
the caller side.

Andrei
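Assuming the value semantics settled on here (which is how static arrays behave in present-day D2), the two escape hatches Andrei names, `ref` on the callee side and `[]` on the caller side, can be sketched side by side with plain by-value passing. The function names are illustrative only.

```d
import std.stdio;

// By value: the callee receives its own copy of the static array.
void byValue(int[4] a) { a[0] = 100; }

// `ref` on the callee side: the caller's array is modified in place.
void byRef(ref int[4] a) { a[0] = 200; }

// `[]` on the caller side: the callee gets a slice (pointer + length)
// that aliases the caller's storage.
void bySlice(int[] a) { a[0] = 300; }

void main()
{
    int[4] v = [1, 2, 3, 4];

    byValue(v);
    writeln(v[0]); // 1: only the copy was changed

    byRef(v);
    writeln(v[0]); // 200

    bySlice(v[]);
    writeln(v[0]); // 300
}
```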

Feb 21 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Sat, Feb 21, 2009 at 5:47 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Yah, ref on the callee side, or the [] operator without any arguments on the
 caller side.

Wait, what?  I wouldn't have expected anything but "ref type[n] foo"
on the function parameter to pass byref.  What are you saying here?

Feb 21 2009

Christopher Wright <dhasenan gmail.com> writes:

Jarrett Billingsley wrote:
 On Sat, Feb 21, 2009 at 5:47 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Yah, ref on the callee side, or the [] operator without any arguments on the
 caller side.

 
 Wait, what?  I wouldn't have expected anything but "ref type[n] foo"
 on the function parameter to pass byref.  What are you saying here?

void foo(T)(T t) {}

int[5_000_000] i;
foo(i[]); // foo!(int[]) -- semi-by-reference
foo(i); // foo!(int[5_000_000]) -- by value
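The point above is about template argument deduction: the static type of the argument decides what `T` becomes. A small check, using a hypothetical `kind` helper that reports the deduced type and a much smaller array:

```d
import std.stdio;

// T is deduced from the argument, as in the foo(T)(T t) example.
string kind(T)(T t) { return T.stringof; }

void main()
{
    int[8] i;
    writeln(kind(i));   // int[8]: all 8 ints are copied into the call
    writeln(kind(i[])); // int[]: only a slice (pointer + length) is passed
}
```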

Feb 21 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Sat, Feb 21, 2009 at 6:37 PM, Christopher Wright <dhasenan gmail.com> wrote:
 Jarrett Billingsley wrote:
 On Sat, Feb 21, 2009 at 5:47 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Yah, ref on the callee side, or the [] operator without any arguments on
 the
 caller side.

 Wait, what?  I wouldn't have expected anything but "ref type[n] foo"
 on the function parameter to pass byref.  What are you saying here?

 void foo(T)(T t) {}

 int[5_000_000] i;
 foo(i[]); // foo!(int[]) -- semi-by-reference
 foo(i); // foo!(int[5_000_000]) -- by value

Oh.  See, I don't usually template _everything_ so that didn't cross my mind ;)

Feb 21 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-02-21 15:11:15 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 The conundrum is that a type system can't say that T[N] has some 
 semantics for N <= Nmax and some other semantics for N > Nmax. So we 
 need to pick one, and probably picking the value semantics is the right 
 thing to do.

I think it is the right decision too.

This way "static array" becomes the container type and "dynamic array" 
is the corresponding range type. Perhaps some concept renaming is in 
order for D2:

	static array  => array
	dynamic array => array range (or slice)

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Feb 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2009-02-21 15:11:15 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 The conundrum is that a type system can't say that T[N] has some 
 semantics for N <= Nmax and some other semantics for N > Nmax. So we 
 need to pick one, and probably picking the value semantics is the 
 right thing to do.

 
 I think it is the right decision too.
 
 This way "static array" becomes the container type and "dynamic array" 
 is the corresponding range type. Perhaps some concept renaming is in 
 order for D2:
 
     static array  => array
     dynamic array => array range (or slice)
 

Yah, and that would give a good model to follow for user-defined containers.

Andrei
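The container/range naming maps onto observable behavior: a static array owns its elements and copies them on assignment, while a slice is a view that can shrink like a range and writes through to the underlying storage. A minimal sketch in present-day D2:

```d
import std.range.primitives : popFront;
import std.stdio;

void main()
{
    int[3] container = [1, 2, 3]; // owns its three ints
    int[3] copy = container;      // value semantics: an independent copy
    copy[0] = 99;                 // container[0] is still 1

    int[] range = container[];    // a slice: a view into container's storage
    range.popFront();             // shrinks the view only; now covers [2, 3]
    range[0] = 20;                // writes through: container is [1, 20, 3]

    writeln(container, " ", copy, " ", range);
}
```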

Feb 21 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 The conundrum is that a type system can't say that T[N] has some 
 semantics for N <= Nmax and some other semantics for N > Nmax. So we 
 need to pick one, and probably picking the value semantics is the right 
 thing to do.

Well, I think the type system can be extended to manage that: the programmer
may specify an optional compiler command line argument like:
-ms 128
Now all structs/static arrays more than 128 bytes long are passed by reference
:-)

An alternative, less extreme solution: instead of changing the value/ref passing
semantics, when you add such an optional command-line argument the compiler gives
you a compilation warning (or even an error, if you want) everywhere you try to
pass a struct or static array longer than 128 bytes by value.

Bye,
bearophile

Feb 22 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 Jarrett Billingsley wrote:
 Please make them value types.  I, for one, am tired of dealing with 
 their crap.

 
 Ok, you just tipped the balance :o). I'm also realizing something. The 
 scenario I'm most afraid of is something like:
 
 char[10000] humongous = "This is a humongous message. I will type here 
 exactly 10000 characters. ... ";

This is less of an issue with the type of string literals being 
invariant(T)[] rather than invariant(T)[length]. Even less so if the 
default type for array literals were dynamic rather than static arrays.

Feb 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 Jarrett Billingsley wrote:
 Please make them value types.  I, for one, am tired of dealing with 
 their crap.

 Ok, you just tipped the balance :o). I'm also realizing something. The 
 scenario I'm most afraid of is something like:

 char[10000] humongous = "This is a humongous message. I will type here 
 exactly 10000 characters. ... ";

 
 This is less of an issue with the type of string literals being 
 invariant(T)[] rather than invariant(T)[length]. Even less so if the 
 default type for array literals were dynamic rather than static arrays.

Exactly so. So with this other fix in the language there's even more 
push toward value semantics for T[N].

Andrei

Feb 21 2009

bearophile <bearophileHUGS lycos.com> writes:

Don:
 I don't think that's messy at all. I can't see much difference between 
 special support for float[4] versus float4. It's better if the code can 
 take advantage of hardware without specific support. Bear in mind that 
 SSE/SSE2 is a temporary situation. AVX provides for much longer arrays 
 of vectors; and it's extensible. You'd end up needing to keep adding on 
 special types whenever a new CPU comes out.
 
 Note that the fundamental concept which is missing from the C virtual 
 machine is that all modern machines can efficiently perform operations 
 on arrays of built-in types of length 2^n, for some small value of n.
 We need to get this into the language abstraction. Not follow C++ in 
 hacking a few extra special types onto the old, deficient C model. And I 
 think D is actually in a position to do this.
 
 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

I have quoted it all because I like the experience and ideas you bring to D.
But how would operations like shuffling be expressed, or mapping a sqrt onto just the
first of those 4 floats, or onto all of them, etc.? What syntax could be used?

Regarding the array operations already implemented, are there ways to force the
compiler to inline such code/operations?

Bye,
bearophile
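Part of this can be expressed with what D already has: array operations for element-wise arithmetic, plain indexing for a single lane, and `foreach` for mapping a function over all lanes. A sketch with hypothetical function names, assuming present-day D2 (where static arrays are passed and returned by value); whether any of it becomes SIMD code is up to the compiler.

```d
import std.math : sqrt;
import std.stdio;

// Element-wise add written as a D array operation. Whether the compiler
// lowers it to a single SIMD instruction is not guaranteed.
float[4] addLanes(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] + b[];
    return r; // static arrays are returned by value
}

// Scalar operation on lane 0 only; the other lanes pass through unchanged.
float[4] sqrtFirst(float[4] v)
{
    v[0] = sqrt(v[0]); // v is a copy, so the caller's array is untouched
    return v;
}

// Map sqrt over every lane.
float[4] sqrtAll(float[4] v)
{
    foreach (ref x; v)
        x = sqrt(x);
    return v;
}

void main()
{
    float[4] a = [1, 4, 9, 16];
    float[4] ones = [1, 1, 1, 1];
    writeln(addLanes(a, ones));
    writeln(sqrtFirst(a));
    writeln(sqrtAll(a));
}
```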

Feb 21 2009

Mattias Holm <hannibal.holm gmail.com> writes:

On 2009-02-21 17:03:06 +0100, Don <nospam nospam.com> said:
 
 I don't think that's messy at all. I can't see much difference between 
 special support for float[4] versus float4. It's better if the code can 
 take advantage of hardware without specific support. Bear in mind that 
 SSE/SSE2 is a temporary situation. AVX provides for much longer arrays 
 of vectors; and it's extensible. You'd end up needing to keep adding on 
 special types whenever a new CPU comes out.
 
 Note that the fundamental concept which is missing from the C virtual 
 machine is that all modern machines can efficiently perform operations 
 on arrays of built-in types of length 2^n, for some small value of n.
 We need to get this into the language abstraction. Not follow C++ in 
 hacking a few extra special types onto the old, deficient C model. And 
 I think D is actually in a position to do this.
 
 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

Yes, float[4] would be ok, if some CPU independent permutation support 
can be added. Would this be with some intrinsic then or what? I very 
much like the OpenCL syntax for permutation, but I suppose that an 
intrinsic such as "float[4] noref permute(float[4] noref vec, int 
newPos0, int newPos1, int newPos2, int newPos3)" would work as well. 
Note that this should also work with double[2], byte[16], short[8] and 
int[4].

How would pass-by-value semantics be implemented without breaking 
compatibility? Would you have (yet another) type qualifier (noref, as 
used in the example above)?

In my opinion, vectors are a fundamental type, and there is a 
reason that arrays and vectors are kept separate in LLVM. The problem 
is exposing this to the programmer in the proper way. OpenCL does have 
a lot of nice things in it that might be worth considering. But, yeah, 
if something is done for ensuring the alignment of power of 2 vectors, 
the permutation support and the pass by value, then I would be fairly 
happy with that as well.

/ Mattias

Feb 21 2009

Mattias Holm <hannibal.holm gmail.com> writes:

I think that the following would work reasonably well:

	allow the [] operator for arrays to take comma separated lists of indices.

So the OpenCL like statement:

	v.xyzw = v2.wzyx;

will be written as:

	v[0,1,2,3] = v2[3,2,1,0];

Would this be OK? This is a general extension of array slicing, and 
it might be possible to permute with a combination of slices and 
indices like this (e.g. v[0..3, 6, 5, 7..len]). Are permutation 
operations something that Walter would be willing to add?

As someone else in this thread said, there needs to be a way to 
specify that static arrays are passed by value, so could the ref keyword 
be paired with its opposite, "byval" or something similar?

Also, functions need to be able to return static arrays, which is 
not possible at the moment.

Note that the support should be general and work with any array type 
(so that you can get YMM support whenever it makes it into future 
chips).


/ Mattias

Feb 22 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 22 Feb 2009 14:02:53 +0300, Mattias Holm <hannibal.holm gmail.com>
wrote:

 I think that the following would work reasonably well:

 	allow the [] operator for arrays to take comma separated lists of  
 indices.

 So the OpenCL like statement:

 	v.xyzw = v2.wzyx;

 will be written as:

 	v[0,1,2,3] = v2[3,2,1,0];

 Would this be OK? This is a general extension of array slicing, and
 it might be possible to permute with a combination of slices and indices
 like this (e.g. v[0..3, 6, 5, 7..len]). Are permutation operations
 something that Walter would be willing to add?

 As someone else in this thread said, there needs to be a way to
 specify that static arrays are passed by value, so could the ref keyword
 be paired with its opposite, "byval" or something similar?

 Also, functions need to be able to return static arrays, which is not
 possible at the moment.

 Note that the support should be general and work with any array type (so
 that you can get YMM support whenever it makes it into future
 chips).


 / Mattias

How would you implement it for user-defined types?

Feb 22 2009

Christopher Wright <dhasenan gmail.com> writes:

Denis Koroskin wrote:
 On Sun, 22 Feb 2009 14:02:53 +0300, Mattias Holm 
 <hannibal.holm gmail.com> wrote:
 
 I think that the following would work reasonably well:

     allow the [] operator for arrays to take comma separated lists of 
 indices.

 So the OpenCL like statement:

     v.xyzw = v2.wzyx;

 will be written as:

     v[0,1,2,3] = v2[3,2,1,0];

 
 How would you implement it for user-defined types?

T[] opIndex(int[] indices...) { ... }
void opIndexAssign(T[] values, int[] indices...) { ... }
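These signatures can be made concrete with typesafe variadic parameters, since `v[3, 2, 1, 0]` calls `opIndex(3, 2, 1, 0)` and `v[...] = x` calls `opIndexAssign(x, ...)` with the value first. The `Vec` type is a hypothetical sketch, not an existing library type.

```d
import std.stdio;

// Hypothetical fixed-size vector supporting v[i, j, ...] gathers/scatters.
struct Vec(T, size_t N)
{
    T[N] data;

    // v[3, 2, 1, 0]: returns a new array holding the selected lanes.
    T[] opIndex(size_t[] indices...)
    {
        auto r = new T[indices.length];
        foreach (k, i; indices)
            r[k] = data[i];
        return r;
    }

    // v[0, 1, 2, 3] = values: D passes the assigned value first.
    void opIndexAssign(T[] values, size_t[] indices...)
    {
        assert(values.length == indices.length);
        foreach (k, i; indices)
            data[i] = values[k];
    }
}

void main()
{
    Vec!(float, 4) v, v2;
    v2.data = [1f, 2, 3, 4];
    v[0, 1, 2, 3] = v2[3, 2, 1, 0];
    writeln(v.data);
}
```

Because `opIndex` returns a fresh array, `v[0, 1, 2, 3] = v[3, 2, 1, 0]` on the same vector is also safe. Mixed forms such as `v[0..3, 6]` would need additional `opSlice` overloads, which this sketch does not attempt.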

Feb 22 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 22 Feb 2009 16:51:10 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

 Denis Koroskin wrote:
 On Sun, 22 Feb 2009 14:02:53 +0300, Mattias Holm  
 <hannibal.holm gmail.com> wrote:

 I think that the following would work reasonably well:

     allow the [] operator for arrays to take comma separated lists of  
 indices.

 So the OpenCL like statement:

     v.xyzw = v2.wzyx;

 will be written as:

     v[0,1,2,3] = v2[3,2,1,0];

  How would you implement it for user-defined types?

 T[] opIndex(int[] indices...) { ... }
 void opIndexAssign(T[] values, int[] indices...) { ... }

How about ranges included - v[0..3, 6, 5, 7..len] ?

Feb 22 2009

bearophile <bearophileHUGS lycos.com> writes:

Mattias Holm:
 I think that the following would work reasonably well:
 	allow the [] operator for arrays to take comma separated lists of indices.
 So the OpenCL like statement:
 	v.xyzw = v2.wzyx;
 will be written as:
 	v[0,1,2,3] = v2[3,2,1,0];

Be careful, I think a syntax like:
v[0, 1]
v[0, 1, 2, 3]
is better left for indexing 2D and 4D arrays.
nD arrays can be quite important in a language like D.

So the following may be better and more compatible with the future:
v[0; 1; 2; 3] = v2[3; 2; 1; 0];
Or:
v[(0,1,2,3)] = v2[(3,2,1,0)];
Or:
v[[0,1,2,3]] = v2[[3,2,1,0]];

Bye,
bearophile

Feb 22 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

bearophile wrote:
 Mattias Holm:
 I think that the following would work reasonably well:
 	allow the [] operator for arrays to take comma separated lists of indices.
 So the OpenCL like statement:
 	v.xyzw = v2.wzyx;
 will be written as:
 	v[0,1,2,3] = v2[3,2,1,0];

 
 Be careful, I think a syntax like:
 v[0, 1]
 v[0, 1, 2, 3]
 is better left for indexing 2D and 4D arrays.
 nD arrays can be quite important in a language like D.
 
 So the following may be better and more compatible with the future:
 v[0; 1; 2; 3] = v2[3; 2; 1; 0];
 Or:
 v[(0,1,2,3)] = v2[(3,2,1,0)];
 Or:
 v[[0,1,2,3]] = v2[[3,2,1,0]];
 
 Bye,
 bearophile

Swizzling a vector and multidimensional access are orthogonal on account
of vectors having only one dimension [1].

struct FloatTuple(size_t n) { float[n] data; } // the rest is an exercise for the reader :P

struct vec4f
{
    union
    {
        float[4] data;
        struct { float w, x, y, z; }
    }

    float opIndex(size_t i) { return data[i]; }
    float opIndexAssign(float v, size_t i) { return data[i] = v; }

    FloatTuple!(2) opIndex(size_t i, size_t j)
    {
        return FloatTuple!(2)(data[i],data[j]);
    }

    FloatTuple!(2) opIndexAssign(FloatTuple!(2) vs, size_t i, size_t j)
    {
        data[i] = vs.data[0];
        data[j] = vs.data[1];
        return vs;
    }

    // and so on for 3 and 4 argument versions.
}

Personally, I think something like this is a better idea:

struct vec4f
{
    ...

    FloatTuple!(PermSpec.length) perm(string PermSpec)()
    {
        FloatTuple!(PermSpec.length) vs;
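        // Range!(n) (not shown) yields the compile-time indices 0 .. n-1,
        // so PermSpec[i] is a constant the mixin can use.
        foreach( i ; Range!(PermSpec.length) )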
            vs.data[i] = mixin(`this.`~PermSpec[i]);
        return vs;
    }
}

void main()
{
    vec4f a, b;
    a = ...;
    b = a.perm!("xyzw");
}

  -- Daniel


[1] By this I mean they're a 1D data type; vec4f represents a 4D vector
but in terms of implementation, it has only one dimension: the index of
data.
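The `perm` idea relies on a `Range!` helper that is not shown. A self-contained variant, assuming present-day D2 where `static foreach` can walk the spec string directly, might look like this; the component order (x, y, z, w rather than w, x, y, z above) and the names are this sketch's own choices.

```d
import std.stdio;

// Minimal body for the FloatTuple left as an exercise above.
struct FloatTuple(size_t n)
{
    float[n] data;
}

struct vec4f
{
    float x = 0, y = 0, z = 0, w = 0;

    // Each character of spec names a component; repeats such as "xxyy"
    // are allowed, and an unknown letter fails to compile in the mixin.
    FloatTuple!(spec.length) perm(string spec)() const
    {
        FloatTuple!(spec.length) vs;
        static foreach (i, c; spec)
            vs.data[i] = mixin("this." ~ c);
        return vs;
    }
}

void main()
{
    auto a = vec4f(1, 2, 3, 4);
    auto b = a.perm!"wzyx"();
    writeln(b.data);
}
```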

Feb 22 2009

Don <nospam nospam.com> writes:

Mattias Holm wrote:
 I think that the following would work reasonably well:
 
     allow the [] operator for arrays to take comma separated lists of 
 indices.

I've always believed that we need that syntax for multi-dimensional 
arrays. Swizzling ought to be possible on a multi-dimensional array, and 
I don't think it would be, with your proposal?

 
 So the OpenCL like statement:
 
     v.xyzw = v2.wzyx;
 
 will be written as:
 
     v[0,1,2,3] = v2[3,2,1,0];
 
 Would this be OK? This is a general extension of array slicing, and 
 it might be possible to permute with a combination of slices and indices 
 like this (e.g. v[0..3, 6, 5, 7..len]). Are permutation operations 
 something that Walter would be willing to add?
 
 As someone else in this thread said, there needs to be a way to 
 specify that static arrays are passed by value, so could the ref keyword 
 be paired with its opposite, "byval" or something similar?
 
 Also, functions need to be able to return static arrays, which is not 
 possible at the moment.
 
 Note that the support should be general and work with any array type (so 
 that you can get YMM support whenever it makes it into future chips).
 
 
 / Mattias

Feb 23 2009

Don <nospam nospam.com> writes:

Mattias Holm wrote:
 On 2009-02-21 17:03:06 +0100, Don <nospam nospam.com> said:
 I don't think that's messy at all. I can't see much difference between 
 special support for float[4] versus float4. It's better if the code 
 can take advantage of hardware without specific support. Bear in mind 
 that SSE/SSE2 is a temporary situation. AVX provides for much longer 
 arrays of vectors; and it's extensible. You'd end up needing to keep 
 adding on special types whenever a new CPU comes out.

 Note that the fundamental concept which is missing from the C virtual 
 machine is that all modern machines can efficiently perform operations 
 on arrays of built-in types of length 2^n, for some small value of n.
 We need to get this into the language abstraction. Not follow C++ in 
 hacking a few extra special types onto the old, deficient C model. And 
 I think D is actually in a position to do this.

 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

 
 Yes, float[4] would be ok, if some CPU independent permutation support 
 can be added. Would this be with some intrinsic then or what? I very 
 much like the OpenCL syntax for permutation, but I suppose that an 
 intrinsic such as "float[4] noref permute(float[4] noref vec, int 
 newPos0, int newPos1, int newPos2, int newPos3)" would work as well. 
 Note that this should also work with double[2], byte[16], short[8] and 
 int[4].

Note that if you had static arrays with value semantics, with proper 
alignment, then you could simply create

module std.swizzle;
float[4] permute(float[4] vec, int newPos0, int newPos1, int newPos2, 
int newPos3);  /* intrinsic */

float[4] wzyx(float[4] q) { return permute(q, 3, 2, 1, 0); }
float[4] xywz(float[4] q) { return permute(q, 0, 1, 3, 2); }
// etc

---
and your code would be:

import std.swizzle;

void main()
{
    float[4] t;
    auto u = t.wzyx;
}

I don't think this is terribly difficult once the value semantics are in 
place.
(Note that once you get beyond 4 members, the .xyzw syntax gives an 
explosion of functions; but I think it's workable at 4; 4! is only 24.
Beyond that point, you'd probably require direct permute calls).
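Until such an intrinsic exists, `permute` can be given a portable body so that the UFCS-style `t.wzyx` call already works. This sketch assumes 0-based source indices (lane k of the result is `q[pk]`); it does not imply that a compiler would map it to `shufps`.

```d
import std.stdio;

// Portable stand-in for the hypothetical permute intrinsic:
// result lane k is q[pk], with 0-based lane indices.
float[4] permute(float[4] q, int p0, int p1, int p2, int p3)
{
    float[4] r = [q[p0], q[p1], q[p2], q[p3]];
    return r;
}

float[4] wzyx(float[4] q) { return permute(q, 3, 2, 1, 0); }
float[4] xywz(float[4] q) { return permute(q, 0, 1, 3, 2); }

void main()
{
    float[4] t = [1, 2, 3, 4];
    auto u = t.wzyx; // UFCS: same as wzyx(t)
    writeln(u);
}
```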

Feb 23 2009

Bill Baxter <wbaxter gmail.com> writes:

On Mon, Feb 23, 2009 at 5:18 PM, Don <nospam nospam.com> wrote:
 Mattias Holm wrote:
 On 2009-02-21 17:03:06 +0100, Don <nospam nospam.com> said:
 I don't think that's messy at all. I can't see much difference between
 special support for float[4] versus float4. It's better if the code can take
 advantage of hardware without specific support. Bear in mind that SSE/SSE2
 is a temporary situation. AVX provides for much longer arrays of vectors;
 and it's extensible. You'd end up needing to keep adding on special types
 whenever a new CPU comes out.

 Note that the fundamental concept which is missing from the C virtual
 machine is that all modern machines can efficiently perform operations on
 arrays of built-in types of length 2^n, for some small value of n.
 We need to get this into the language abstraction. Not follow C++ in
 hacking a few extra special types onto the old, deficient C model. And I
 think D is actually in a position to do this.

 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

 Yes, float[4] would be ok, if some CPU independent permutation support can
 be added. Would this be with some intrinsic then or what? I very much like
 the OpenCL syntax for permutation, but I suppose that an intrinsic such as
 "float[4] noref permute(float[4] noref vec, int newPos0, int newPos1, int
 newPos2, int newPos3)" would work as well. Note that this should also work
 with double[2], byte[16], short[8] and int[4].

 Note that if you had static arrays with value semantics, with proper
 alignment, then you could simply create

 module std.swizzle;
 float[4] permute(float[4] vec, int newPos0, int newPos1, int newPos2, int
 newPos3);  /* intrinsic */

 float[4] wzyx(float[4] q) { return permute(q, 3, 2, 1, 0); }
 float[4] xywz(float[4] q) { return permute(q, 0, 1, 3, 2); }
 // etc

 ---
 and your code would be:

 import std.swizzle;

 void main()
 {
   float[4] t;
   auto u = t.wzyx;
 }

 I don't think this is terribly difficult once the value semantics are in
 place.
 (Note that once you get beyond 4 members, the .xyzw syntax gives an
 explosion of functions; but I think it's workable at 4; 4! is only 24.
 Beyond that point, you'd probably require direct permute calls).

Actually it's 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow
repeats like .xxyy.

--bb

Feb 23 2009

Don <nospam nospam.com> writes:

Bill Baxter wrote:
 On Mon, Feb 23, 2009 at 5:18 PM, Don <nospam nospam.com> wrote:
 Mattias Holm wrote:
 On 2009-02-21 17:03:06 +0100, Don <nospam nospam.com> said:
 I don't think that's messy at all. I can't see much difference between
 special support for float[4] versus float4. It's better if the code can take
 advantage of hardware without specific support. Bear in mind that SSE/SSE2
 is a temporary situation. AVX provides for much longer arrays of vectors;
 and it's extensible. You'd end up needing to keep adding on special types
 whenever a new CPU comes out.

 Note that the fundamental concept which is missing from the C virtual
 machine is that all modern machines can efficiently perform operations on
 arrays of built-in types of length 2^n, for some small value of n.
 We need to get this into the language abstraction. Not follow C++ in
 hacking a few extra special types onto the old, deficient C model. And I
 think D is actually in a position to do this.

 float[4] would be a greatly superior option if it could be done.
 The key requirements are:
 (1) need to specify that static arrays are passed by value.
 (2) need to keep stack aligned to 16.
 The good news is that both of these appear to be done on DMD2-Mac!

 Yes, float[4] would be ok, if some CPU independent permutation support can
 be added. Would this be with some intrinsic then or what? I very much like
 the OpenCL syntax for permutation, but I suppose that an intrinsic such as
 "float[4] noref permute(float[4] noref vec, int newPos0, int newPos1, int
 newPos2, int newPos3)" would work as well. Note that this should also work
 with double[2], byte[16], short[8] and int[4].

 Note that if you had static arrays with value semantics, with proper
 alignment, then you could simply create

 module std.swizzle;
 float[4] permute(float[4] vec, int newPos0, int newPos1, int newPos2, int
 newPos3);  /* intrinsic */

 float[4] wzyx(float[4] q) { return permute(q, 3, 2, 1, 0); }
 float[4] xywz(float[4] q) { return permute(q, 0, 1, 3, 2); }
 // etc

 ---
 and your code would be:

 import std.swizzle;

 void main()
 {
   float[4] t;
   auto u = t.wzyx;
 }

 I don't think this is terribly difficult once the value semantics are in
 place.
 (Note that once you get beyond 4 members, the .xyzw syntax gives an
 explosion of functions; but I think it's workable at 4; 4! is only 24.
 Beyond that point, you'd probably require direct permute calls).

 
 Actually it's 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow
 repeats like .xxyy.

Yes. Is the syntax sugar actually needed for all the permutations?
Even so, it's still only 256, which is probably still OK. I don't think 
a language change is required.

This scheme doesn't cover:
* shufps/shufpd where the two sources are different
* haddpd, haddps [SSE3] { double[2] a, b;  a[0]=a[0]+a[1]; a[1]=b[0]+b[1]; }
* non-temporal stores (although I think these are covered adequately by 
array operations)

and the byte/word operations:

* pack with saturation
* movmsk
* avg
* multiply and add.

So it looks to me as though with the minimal language changes, we could 
get almost complete SIMD support, with excellent syntax.
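The operations listed as not covered can at least be written as plain D functions over static arrays, which pins down their semantics. The names `hadd`, `shuffle2` and `avg` are hypothetical, and nothing here guarantees that a compiler emits `haddpd`, `shufps` or `pavgb` for them; these are ordinary scalar loops.

```d
import std.stdio;

// haddpd (SSE3): r[0] = a[0] + a[1], r[1] = b[0] + b[1].
double[2] hadd(double[2] a, double[2] b)
{
    double[2] r = [a[0] + a[1], b[0] + b[1]];
    return r;
}

// shufps with two different sources: the low two lanes come from a,
// the high two from b. Each index is 0-based and must be in 0 .. 4.
float[4] shuffle2(float[4] a, float[4] b, int i0, int i1, int j0, int j1)
{
    float[4] r = [a[i0], a[i1], b[j0], b[j1]];
    return r;
}

// pavgb: rounded unsigned byte average, (a + b + 1) >> 1 per lane.
ubyte[16] avg(ubyte[16] a, ubyte[16] b)
{
    ubyte[16] r;
    foreach (i; 0 .. 16)
        r[i] = cast(ubyte)((a[i] + b[i] + 1) >> 1); // sum computed as int
    return r;
}

void main()
{
    writeln(hadd([1.0, 2], [3.0, 4]));
    writeln(shuffle2([1f, 2, 3, 4], [5f, 6, 7, 8], 3, 0, 1, 2));
}
```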

Feb 23 2009

bearophile <bearophileHUGS lycos.com> writes:

Don:

 This scheme doesn't cover:

...
 and the byte/word operations:

...
 So it looks to me as though with the minimal language changes, we could 
 get almost complete SIMD support, with excellent syntax.

Do you also consider things from the near future?
http://en.wikipedia.org/wiki/SSE5
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

Bye,
bearophile

Feb 23 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Bill Baxter wrote:
 Actually it's 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow
 repeats like .xxyy.

 
 Yes. Is the syntax sugar actually needed for all the permutations?
 Even so, it's still only 256, which is probably still OK. I don't think 
 a language change is required.

There's no need to ever enumerate all functions - they can be generated 
with templates and mixins rather easily.
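A minimal sketch of the template/mixin approach, assuming CTFE and string mixins: a hypothetical `genSwizzles` builds the source for all 4^4 = 256 named swizzles at compile time and mixes them into the module. They are ordinary non-template functions, so every one of them is compiled whether it is used or not.

```d
// Hypothetical generator: builds, at compile time, the D source for every
// four-letter swizzle over "xyzw" (256 functions, repeats such as xxyy included).
string genSwizzles()
{
    enum comps = "xyzw";
    string code;
    foreach (a; 0 .. 4)
    foreach (b; 0 .. 4)
    foreach (c; 0 .. 4)
    foreach (d; 0 .. 4)
    {
        // e.g. "float[4] wzyx(float[4] v) { float[4] r = [v[3],v[2],v[1],v[0]]; return r; }"
        string name = "" ~ comps[a] ~ comps[b] ~ comps[c] ~ comps[d];
        code ~= "float[4] " ~ name ~ "(float[4] v) { float[4] r = ["
             ~ "v[" ~ cast(char)('0' + a) ~ "],"
             ~ "v[" ~ cast(char)('0' + b) ~ "],"
             ~ "v[" ~ cast(char)('0' + c) ~ "],"
             ~ "v[" ~ cast(char)('0' + d) ~ "]]; return r; }\n";
    }
    return code;
}

// Paste all 256 generated functions into this module.
mixin(genSwizzles());
```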

 This scheme doesn't cover:
 * shufp  where the two sources are different
 * haddpd, haddps [SSE3] { double[2] a, b;  a[0]=a[0]+a[1]; 
 a[1]=b[0]+b[1]; }
 * non-temporal stores (although I think these are covered adequately by 
 array operations)

Well probably we can find ways to generate those too.

 and the byte/word operations:
 
 * pack with saturation
 * movmsk
 * avg
 * multiply and add.
 
 So it looks to me as though with the minimal language changes, we could 
 get almost complete SIMD support, with excellent syntax.
 

That sounds great.


Andrei

Feb 23 2009

Bill Baxter <wbaxter gmail.com> writes:

On Mon, Feb 23, 2009 at 10:24 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Don wrote:
 Bill Baxter wrote:
 Actually it's 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow
 repeats like .xxyy.

 Yes. Is the syntax sugar actually needed for all the permutations?
 Even so, it's still only 256, which is probably still OK. I don't think a
 language change is required.

 There's no need to ever enumerate all functions - they can be generated with
 templates and mixins rather easily.

I think the issue is just that however you get them there it's a lot
of exe bloat.  The ideal would be something like a template that only
got instantiated when used, like    vec2 = vec1.swizzle!("xyzy");  ...
but with the syntax vec2 = vec1.xyzy, and one that can generate optimal
assembly.
Really, if .swizzle!("xyzy") can do that, though, that looks good
enough to me.  I would rather have that than 256 silly little
functions that get instantiated no matter what.
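One way to get the `vec1.xyzy` spelling while instantiating only the swizzles actually used is `opDispatch`, which D2 has since gained. This sketch assumes a hypothetical `Vec4` wrapper; `isSwizzle` and `lane` are illustrative helpers, and a misspelled name such as `.xyzq` fails the template constraint and is rejected at compile time.

```d
// Hypothetical helper: is s a valid 4-letter swizzle over x, y, z, w?
bool isSwizzle(string s)
{
    if (s.length != 4) return false;
    foreach (ch; s)
        if (ch != 'x' && ch != 'y' && ch != 'z' && ch != 'w') return false;
    return true;
}

// Map a component letter to its lane: x=0, y=1, z=2, w=3.
int lane(char ch) { return ch == 'w' ? 3 : ch - 'x'; }

struct Vec4
{
    float[4] data;

    // Explicit form: v.swizzle!"xyzy"(); instantiated only when used.
    Vec4 swizzle(string s)() const
        if (isSwizzle(s))
    {
        Vec4 r;
        foreach (i, ch; s)
            r.data[i] = data[lane(ch)];
        return r;
    }

    // Sugar: the compiler rewrites v.xyzy to v.opDispatch!"xyzy".
    Vec4 opDispatch(string s)() const
        if (isSwizzle(s))
    {
        return swizzle!s();
    }
}
```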

--bb

Feb 23 2009

Chad J <gamerchad __spam.is.bad__gmail.com> writes:

Don wrote:
 
 So it looks to me as though with the minimal language changes, we could
 get almost complete SIMD support, with excellent syntax.
 

enum { x=0, y=1, z=2, w=3 }
float[4] foo;
foo[x] = 42;
foo[y] = foo[x];
// etc
foo[] = [foo[y],foo[x],foo[y],foo[x]];

*grin*

Feb 23 2009

Fawzi Mohamed <fmohamed mac.com> writes:

On 2009-02-23 14:48:50 +0100, Chad J <gamerchad __spam.is.bad__gmail.com> said:

 Don wrote:
 
 So it looks to me as though with the minimal language changes, we could
 get almost complete SIMD support, with excellent syntax.
 

 
 enum { x=0, y=1, z=2, w=3 }
 float[4] foo;
 foo[x] = 42;
 foo[y] = foo[x];
 // etc
 foo[] = [foo[y],foo[x],foo[y],foo[x]];
 
 *grin*

Sorry to bump this discussion, but I was away and then busy with
other stuff... and coming back I had a lot of piled-up work (also making
Tango work with the brand new DMD on Mac ;) ), so I had missed it.

I think an aligned vector type would be very useful for small vectors.
Intrinsic functions could be useful, but I agree with Chad's "joke":
what I really want is for the compiler to use them when it should
(as I think downs did with his autovectorization patch for gdc).
Not a lot of clutter from special notation, just the normal notation,
compiled efficiently when possible.
A special type is probably needed (for alignment reasons), but then
that's about it from the language point of view (even that might be
avoided with align, or sometimes even without it, but maybe that is
expecting too much from the compiler).

I also would like to stress again that there are also doubles (but 
indeed the gain in that case is much smaller).

Fawzi

Feb 27 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm
 <hannibal.holm gmail.com> wrote:

 Since (SIMD) vectors are so common and every reasonable system supports
 them in one way or the other (and scalar emulation of this is rather
 simple), why not have support for this in D directly?

 [snip]

 I don't see any reason why float4 can't be made a library type.

 
 Yah, I was thinking the same:
 
 struct float4
 {
     __align(16) float[4] data; // right syntax and value?
     alias data this;
 }
 
 This looks like something that should go into std.matrix pronto. It even
 has value semantics even though fixed arrays don't :o/.
 
 
 Andrei

I remember implementing a vector struct [1] quite some time ago that had
an SSE-accelerated path.  There were three problems I had with it:

1. The alignment thing.  Incidentally, I just did a quick check and
don't see any notes in the changelog about __align(n) syntax.  As I
remember, there was no way to actually ensure the data was properly
aligned.  (There's "Data items in static data segment >= 16 bytes in
size are now paragraph aligned." but that doesn't help when the vectors
are on, say, the stack or in the heap.)

2. As soon as you use inline asm, you lose inlining.  When the functions
are as small as they are, this can be a bit of overhead.  It gets worse
when you realise that the CPU is spending most of its time running data
back and forth between main memory and the XMM registers...

   Array operations help, but they don't cover everything.

3. There was a not insignificant performance difference for using byref
passing on operators over byval passing.  Of course, you can't ACTUALLY
use byref because it completely breaks anything that uses a temporary
expression as an argument.

In the end, I just dropped it to see how BLADE would turn out.  I ended
up coming to the conclusion that while we can do a float[4] vector in D
and use SIMD to speed it up, there's not much point when BLADE is there.
 Of course, BLADE is a little unwieldy to use what with that mixin
malarky.  Pity we didn't get AST macros... :P

Anyway, just my AUD$0.02.

  -- Daniel


[1] That struct was scary.  It was one of those Vector!(type, size)
jobbies, so it had multiple paths through functions, members that only
existed for certain sizes, special-cased loop unrolling... don't even
ASK about the matrix struct... :P

Feb 19 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Thu, Feb 19, 2009 at 9:31 PM, Daniel Keep
<daniel.keep.lists gmail.com> wrote:
 struct float4
 {
     __align(16) float[4] data; // right syntax and value?
     alias data this;
 }


 1. The alignment thing.  Incidentally, I just did a quick check and
 don't see any notes in the changelog about __align(n) syntax.  As I
 remember, there was no way to actually ensure the data was properly
 aligned.  (There's "Data items in static data segment >= 16 bytes in
 size are now paragraph aligned." but that doesn't help when the vectors
 are on, say, the stack or in the heap.)

It's just align(16).

http://www.digitalmars.com/d/1.0/attribute.html#align

Feb 19 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Jarrett Billingsley wrote:
 On Thu, Feb 19, 2009 at 9:31 PM, Daniel Keep
 <daniel.keep.lists gmail.com> wrote:
 struct float4
 {
     __align(16) float[4] data; // right syntax and value?
     alias data this;
 }


 
 1. The alignment thing.  Incidentally, I just did a quick check and
 don't see any notes in the changelog about __align(n) syntax.  As I
 remember, there was no way to actually ensure the data was properly
 aligned.  (There's "Data items in static data segment >= 16 bytes in
 size are now paragraph aligned." but that doesn't help when the vectors
 are on, say, the stack or in the heap.)

 
 It's just align(16).
 
 http://www.digitalmars.com/d/1.0/attribute.html#align

Yeah, except it doesn't do what you want in this case.  For example:


module alignment;

import tango.io.Stdout;

struct float4
{
    align(16) float[4] data;
}

void main()
{
    float4 a;
    byte b;
    float4 c;

    Stdout.format("a   {0} & 0xF == {1}", &a, cast(size_t)&a & 0xF).newline;
    Stdout.format("c   {0} & 0xF == {1}", &c, cast(size_t)&c & 0xF).newline;
}



Output is:

a   12fe68 & 0xF == 8
c   12fe78 & 0xF == 8



The only way I found of guaranteeing alignment was to allocate all
vectors on the heap, using a custom allocator to allocate an extra 15
bytes and then align the result appropriately.  Obviously, for 16-byte
vectors, this is completely unacceptable.
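The over-allocation workaround amounts to rounding an address up to the next 16-byte boundary. A sketch assuming the D garbage collector, which keeps a block alive through interior pointers; `alignUp` and `newAlignedFloat4` are hypothetical names. It costs a heap allocation plus 15 spare bytes per vector, which is exactly the overhead objected to above.

```d
// Round addr up to the next multiple of alignment (a power of two).
size_t alignUp(size_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

struct float4 { float[4] data; }

// Hypothetical helper: request 15 spare bytes on the GC heap, then round the
// pointer up to a 16-byte boundary. The GC treats interior pointers as live,
// so the returned pointer keeps the whole block alive.
float4* newAlignedFloat4()
{
    auto raw = new ubyte[float4.sizeof + 15];
    auto p = cast(float4*) alignUp(cast(size_t) raw.ptr, 16);
    *p = float4.init; // raw memory is zeroed; give the floats their default values
    return p;
}
```

For example, the misaligned address 0x12fe68 from the output above rounds up to 0x12fe70.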

  -- Daniel

Feb 19 2009

Don <nospam nospam.com> writes:

Daniel Keep wrote:
 
 Jarrett Billingsley wrote:
 On Thu, Feb 19, 2009 at 9:31 PM, Daniel Keep
 <daniel.keep.lists gmail.com> wrote:
 struct float4
 {
     __align(16) float[4] data; // right syntax and value?
     alias data this;
 }

 1. The alignment thing.  Incidentally, I just did a quick check and
 don't see any notes in the changelog about __align(n) syntax.  As I
 remember, there was no way to actually ensure the data was properly
 aligned.  (There's "Data items in static data segment >= 16 bytes in
 size are now paragraph aligned." but that doesn't help when the vectors
 are on, say, the stack or in the heap.)

 It's just align(16).

 http://www.digitalmars.com/d/1.0/attribute.html#align

 
 Yeah, except it doesn't do what you want in this case.  For example:
 
 
 module alignment;
 
 import tango.io.Stdout;
 
 struct float4
 {
     align(16) float[4] data;
 }
 
 void main()
 {
     float4 a;
     byte b;
     float4 c;
 
     Stdout.format("a   {0} & 0xF == {1}", &a, cast(size_t)&a & 0xF).newline;
     Stdout.format("c   {0} & 0xF == {1}", &c, cast(size_t)&c & 0xF).newline;
 }
 
 
 
 Output is:
 
 a   12fe68 & 0xF == 8
 c   12fe78 & 0xF == 8
 
 
 
 The only way I found of guaranteeing alignment was to allocate all
 vectors on the heap, using a custom allocator to allocate an extra 15
 bytes and then align the result appropriately.  Obviously, for 16-byte
 vectors, this is completely unacceptable.
 
   -- Daniel

http://d.puremagic.com/issues/show_bug.cgi?id=2278

Feb 19 2009

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

 http://d.puremagic.com/issues/show_bug.cgi?id=2278

That bug is interesting. Didn't Walter have to change DMD for the Mac to 
make sure the stack is aligned to 16 bytes? Perhaps most of the work is done 
now?

L.

Feb 20 2009

downs <default_357-line yahoo.de> writes:

Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm
 <hannibal.holm gmail.com> wrote:

 Since (SIMD) vectors are so common and every reasonable system supports
 them in one way or the other (and scalar emulation of this is rather
 simple), why not have support for this in D directly?

 [snip]

 I don't see any reason why float4 can't be made a library type.

 Yah, I was thinking the same:

 struct float4
 {
     __align(16) float[4] data; // right syntax and value?
     alias data this;
 }

 This looks like something that should go into std.matrix pronto. It even
 has value semantics even though fixed arrays don't :o/.


 Andrei

 
 I remember implementing a vector struct [1] quite some time ago that had
 an SSE-accelerated path.

We have that in dglut too :) It's optimized for the float[3]/float[4] case -
all float[3]s add a hidden bogus member, then the arithmetic operations
generate for loops which are very easy for a properly patched gdc to
autovectorize :)

No loss of inlining, which means no ref vs. val issues. Alignment is still a
problem, but movups in the aligned case is just as fast as movaps, so I figure
it doesn't matter that much.

Autovec is sweet.
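A sketch of the padding approach described above, written in D2 and not taken from dglut: `Vec3` is a hypothetical type that stores three components in a `float[4]` whose last lane is hidden padding, and its operators are fixed-length loops over all four lanes that an autovectorizing compiler may (but need not) map onto SIMD instructions. Division is left out because the padding lane would compute 0/0.

```d
struct Vec3
{
    private float[4] v = [0, 0, 0, 0]; // v[3] is padding, never exposed

    this(float x, float y, float z) { v = [x, y, z, 0]; }

    float x() const { return v[0]; }
    float y() const { return v[1]; }
    float z() const { return v[2]; }

    // Element-wise +, -, * over all 4 lanes (padding included),
    // so the loop has a fixed trip count and no scalar tail.
    Vec3 opBinary(string op)(Vec3 rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        Vec3 r;
        foreach (i; 0 .. 4)
            mixin("r.v[i] = v[i] " ~ op ~ " rhs.v[i];");
        return r;
    }
}
```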

Feb 20 2009

Bill Baxter <wbaxter gmail.com> writes:

On Fri, Feb 20, 2009 at 4:25 AM, Mattias Holm <hannibal.holm gmail.com> wrote:
 Since (SIMD) vectors are so common and every reasonable system supports them
 in one way or the other (and scalar emulation of this is rather simple), why
 not have support for this in D directly?

 Yes, the array operations are nice (and one of the main reasons why I
 like D :) ), but they have the problem that an array of floats need only be
 aligned on float boundaries, not vector boundaries. In my mind vectors are a
 primitive data type that should be exposed by the programming language.

To justify making them primitive types you need to show that they are
widespread, and that there is some good reason that they cannot be
implemented in a library.  And even if they can't be implemented in a
library right now, it could be that fixing that reason is better than
making new built-in types.   For instance I think there are issues now
with getting the right stack alignment for structs.  But that's
something that needs to be fixed generally, not by making new built-in
types that know how to do alignment right.

 Something OpenCL-like:

        float4 vec;
        vec.xyzw = {1.0,1.0, 1.0, 1.0}; // assignment
        vec.xyzw = vec.wyxz; // permutation
        vec[i] = 1.0; // indexing

 And then we can easily imagine some extra nice features to have with
 respect to operators:

        vec ^ vec2; // 3d cross product for float vectors, xor for int vectors

All of this is pretty much doable in a library.  The only nonobvious
one is vec.xyzw = vec.wyxz which can be implemented using a bunch of
CTFE auto-code generation (see some earlier version of H3r3tic's
Hybrid library).

And using xor for cross product sucks.  It doesn't have the right precedence.

 Has this been discussed before?

I think it has and no one came up with a reason why these can't be
library types.

--bb

Feb 19 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:
 I think it has and no one came up with a reason why these can't be
 library types.

Recently I discussed this with the main LDC designer; my idea was
to have something like

Vect!(double, 2) v1;
Vect!(float, 4) v2;

And then make LDC use LLVM intrinsics for the operations on them. So I think
with some intrinsics most of that can be just a library.
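A minimal sketch of such a library type, assuming D2 operator overloading; `Vect` follows the name used above, while the operator set is an assumption. Element-wise operations are written as array operations, which gives a backend such as LDC's a natural place to use LLVM vector intrinsics, though nothing here guarantees that.

```d
struct Vect(T, size_t N)
{
    T[N] data;

    // Element-wise arithmetic expressed as a D array operation.
    Vect opBinary(string op)(Vect rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        Vect r;
        mixin("r.data[] = data[] " ~ op ~ " rhs.data[];");
        return r;
    }

    // Element access: v[i]
    ref inout(T) opIndex(size_t i) inout { return data[i]; }
}
```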

Bye,
bearophile

Feb 19 2009

Mattias Holm <hannibal.holm gmail.com> writes:

On 2009-02-19 21:04:06 +0100, Bill Baxter <wbaxter gmail.com> said:
 To justify making them primitive types you need to show that they are
 widespread, and that there is some good reason that they cannot be
 implemented in a library.  And even if they can't be implemented in a
 library right now, it could be that fixing that reason is better than
 making new built-in types.   For instance I think there are issues now
 with getting the right stack alignment for structs.  But that's
 something that needs to be fixed generally, not by making new built-in
 types that know how to do alignment right.

Firstly, they are widespread, although their use may be hampered by the
fact that many languages have no direct support for them. There are
reasons that GCC nowadays has standard vector syntax. It is a bit rough
(it only supports basic operations like element-wise add, div, mul,
etc.) and does not support permutations. The cross product is only a
nice-to-have operator for floats and is far from necessary; permutations
are more painful, though, as GCC at the moment forces you to write
CPU-specific intrinsics for them.

First there is the alignment issue, but as others have said, D has an
alignment attribute as well. More problematic for a library
implementation of these types, however, are calling conventions.

Passing a vector into a function is very efficient: vectors can be passed
in registers for both in and out values (if the ABI allows; I think the PPC
and x86-64 calling conventions do). The problem with the GCC version,
however, is that because of how its vectors work, you have to resort to
union hacks to get the scalars out of a vector:

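#include <xmmintrin.h>   /* declares __m128 */

typedef union {
	__m128 data;            /* the whole vector, one SSE register */
	struct {
		float x, y, z, w;   /* the same 16 bytes as four scalars */
	};
} vec4;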

Unfortunately, because of this, to make the compiler emit good code you
have to work with the vector type only, and bring out the union only
when inspecting the scalars in the vector, which unfortunately happens
quite often. (The union has ambiguous pass-by-value semantics: should it
be passed in one xmm register or in four, assuming x86-64 conventions?)

Also, if a vector is a struct, you not only get the alignment issue in
some cases but may also get different pass-by-value calling conventions
(on x86-64 they would in principle be the same, but that is just a
coincidence).

I do agree that if it could be implemented as a library type, then it
should be, but that doesn't work once you take calling conventions into
account.

Also, I am sure the optimiser could transform vector types more easily
than structs being passed around, since the compiler knows the semantics
of a primitive type.


/ Mattias

Feb 19 2009

"Joel C. Salomon" <joelcsalomon gmail.com> writes:

Mattias Holm wrote:
 And then we can easily imagine some extra nice features to have with
 respect to operators:
 
     vec ^ vec2; // 3d cross product for float vectors, for int vectors xor
 
 Has this been discussed before?

Given that the wedge product is a defined operation on vectors (a^b is a
bivector in the common plane of a & b with area |a||b|sin θ)—related to,
but very distinct from, the cross product—I’d call this a BAD operator
overload.
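To make the distinction concrete, a sketch assuming 3D vectors stored as `float[3]`; `cross`, `wedge` and `Bivector3` are illustrative names. In 3D the wedge product has the same three numbers as the cross product, but as coefficients of the basis planes e1^e2, e2^e3, e3^e1 rather than of the axes, so it is naturally a different type.

```d
// Cross product of two 3-vectors: another vector, perpendicular to both.
float[3] cross(float[3] a, float[3] b)
{
    float[3] r = [a[1] * b[2] - a[2] * b[1],
                  a[2] * b[0] - a[0] * b[2],
                  a[0] * b[1] - a[1] * b[0]];
    return r;
}

// Hypothetical bivector type: one coefficient per basis plane.
struct Bivector3 { float e12, e23, e31; }

// Wedge (outer) product: an oriented plane element, not a vector.
Bivector3 wedge(float[3] a, float[3] b)
{
    return Bivector3(a[0] * b[1] - a[1] * b[0],  // e1^e2
                     a[1] * b[2] - a[2] * b[1],  // e2^e3
                     a[2] * b[0] - a[0] * b[2]); // e3^e1
}
```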

(B.T.W., I am starting work on a Geometric Algebra library which will
implement all these products.)

—Joel Salomon

Feb 22 2009
