digitalmars.D - primitive vector types
- Mattias Holm (18/18) Feb 19 2009 Since (SIMD) vectors are so common and every reasonable system supports
- Denis Koroskin (3/21) Feb 19 2009 I don't see any reason why float4 can't be made a library type.
- Don (4/36) Feb 19 2009 Walter at one point suggested that float[4] should be specially
- Andrei Alexandrescu (10/42) Feb 19 2009 Yah, I was thinking the same:
- Jason House (2/15) Feb 19 2009 I've looked for this functionality in std.bitmanip before. I never thou...
- Denis Koroskin (5/46) Feb 19 2009 That would be great. If float4 gets its way into D, I'll share our blazi...
- Andrei Alexandrescu (3/66) Feb 19 2009 Put me down for that. What do I need to do?
- Denis Koroskin (3/66) Feb 19 2009 Convince Walter to add float4 type and some intrinsics to DMD (I'll post...
- Denis Koroskin (82/154) Feb 20 2009 Here is a nice documentation about MMX, SSE, SSE2 intrinsics:
- Mattias Holm (4/4) Feb 20 2009 Yeah, this is good statistics and does point out that the vector add/mul...
- Christian Kamm (2/6) Feb 21 2009 Yes, LDC would follow. The main reason people can't use these intrinsics...
- Michel Fortin (8/17) Feb 21 2009 Instead of introducing a new type, couldn't float[4] be the one
- bearophile (4/6) Feb 21 2009 Alignment requirements, shuffling operations, scalar operations on just ...
- Daniel Keep (6/14) Feb 21 2009 Another advantage would be that you could specify in the ABI that this
- Don (18/34) Feb 21 2009 I don't think that's messy at all. I can't see much difference between
- Andrei Alexandrescu (4/9) Feb 21 2009 I agree with float[4] as a good choice. So are value semantics for T[n]
- Don (3/14) Feb 21 2009 Oh. I guess it was just a proposal then, and not implemented. We're not
- Andrei Alexandrescu (14/28) Feb 21 2009 Yah. Walter agrees that that's the right thing to do. The only thing
- Jarrett Billingsley (7/17) Feb 21 2009 Structs already work like this. In fact, the compiler will pass a
- Andrei Alexandrescu (22/42) Feb 21 2009 Ok, you just tipped the balance :o). I'm also realizing something. The
- Bill Baxter (5/13) Feb 21 2009 I don't follow you. Wouldn't they rather pass such huge chunks of
- Andrei Alexandrescu (11/25) Feb 21 2009 What I'm saying (sorry for being unclear) is:
- Bill Baxter (9/37) Feb 21 2009 Ok. And for large static arrays you can still explicitly use ref, right...
- Andrei Alexandrescu (4/36) Feb 21 2009 Yah, ref on the callee side, or the [] operator without any arguments on...
- Jarrett Billingsley (4/6) Feb 21 2009 Wait, what? I wouldn't have expected anything but "ref type[n] foo"
- Christopher Wright (5/12) Feb 21 2009 void foo(T)(T t) {}
- Jarrett Billingsley (2/17) Feb 21 2009 Oh. See, I don't usually template _everything_ so that didn't cross my ...
- Michel Fortin (12/16) Feb 21 2009 I think it is the right decision too.
- Andrei Alexandrescu (3/20) Feb 21 2009 Yah, and that would give a good model to follow for user-defined contain...
- bearophile (7/11) Feb 22 2009 Well, I think the type system can be extended to manage that: the progra...
- Christopher Wright (4/13) Feb 21 2009 This is less of an issue with the type of string literals being
- Andrei Alexandrescu (4/18) Feb 21 2009 Exactly so. So with this other fix in the language there's even more
- bearophile (6/25) Feb 21 2009 I have quoted it all because I like the experience and ideas you bring t...
- Mattias Holm (19/39) Feb 21 2009 Yes, float[4] would be ok, if some CPU independent permutation support
- Mattias Holm (19/19) Feb 22 2009 I think that the following would work reasonably well:
- Denis Koroskin (2/22) Feb 22 2009 How would you implement it for user-defined types?
- Christopher Wright (3/20) Feb 22 2009 T[] opIndex(int[] indices) { ... }
- Denis Koroskin (2/21) Feb 22 2009 How about ranges included - v[0..3, 6, 5, 7..len] ?
- bearophile (14/20) Feb 22 2009 Be careful, I think a syntax like:
- Daniel Keep (47/70) Feb 22 2009 Swizzling a vector and multidimensional access are orthogonal on account
- Don (4/35) Feb 23 2009 I've always believed that we need that syntax for multi-dimensional
- Don (22/51) Feb 23 2009 Note that if you had static arrays with value semantics, with proper
- Bill Baxter (5/56) Feb 23 2009 Actually its 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow
- Don (16/74) Feb 23 2009 Yes. Is the syntax sugar actually needed for all the permutations?
- bearophile (8/12) Feb 23 2009 ...
- Andrei Alexandrescu (6/29) Feb 23 2009 There's no need to ever enumerate all functions - they can be generated
- Bill Baxter (11/23) Feb 23 2009 I think the issue is just that however you get them there it's a lot
- Chad J (8/12) Feb 23 2009 enum { x=0, y=1, z=2, w=3 }
- Fawzi Mohamed (17/31) Feb 27 2009 Sorry to bump up this discussion, but I was away and then busy with
- Daniel Keep (29/54) Feb 19 2009 I remember implementing a vector struct [1] quite some time ago that had
- Jarrett Billingsley (4/15) Feb 19 2009 It's just align(16).
- Daniel Keep (24/42) Feb 19 2009 Yeah, except it doesn't do what you want in this case. For example:
- Don (2/58) Feb 19 2009 http://d.puremagic.com/issues/show_bug.cgi?id=2278
- Lionello Lunesu (4/5) Feb 20 2009 That bug is interesting. Didn't Walter have to change DMD for the Mac to...
- downs (4/31) Feb 20 2009 We have that in dglut too :) It's optimized for the float[3]/float[4] ca...
- Bill Baxter (17/34) Feb 19 2009 To justify making them primitive types you need to show that they are
- bearophile (7/9) Feb 19 2009 Recently I have discussed about this with the LDC main designer, my idea...
- Mattias Holm (40/48) Feb 19 2009 Firstly they are widespread. They might be hampered by the fact that
- Joel C. Salomon (8/14) Feb 22 2009 Given that the wedge product is a defined operation on vectors (a^b is a
Since (SIMD) vectors are so common, and every reasonable system supports them in one way or another (scalar emulation is also rather simple), why not support them in D directly?

Yes, the array operations are nice (and one of the main reasons why I like D :) ), but they have the problem that an array of floats is aligned on float boundaries and not on vector boundaries.

In my mind, vectors are a primitive data type that should be exposed by the programming language. Something OpenCL-like:

    float4 vec;
    vec.xyzw = {1.0, 1.0, 1.0, 1.0}; // assignment
    vec.xyzw = vec.wyxz;             // permutation
    vec[i] = 1.0;                    // indexing

And then we can easily imagine some extra nice features with respect to operators:

    vec ^ vec2; // 3D cross product for float vectors, xor for int vectors

Has this been discussed before?

/ Mattias
Feb 19 2009
On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm <hannibal.holm gmail.com> wrote:
> In my mind vectors are a primitive data type that should be exposed by the programming language. Something OpenCL-like:
> [...]
> Has this been discussed before?

I don't see any reason why float4 can't be made a library type.
Feb 19 2009
Denis Koroskin wrote:
> I don't see any reason why float4 can't be made a library type.

Walter at one point suggested that float[4] should be specially recognized by the compiler: it would always be aligned, and stored in an SSE register if possible. Ditto for float[3].
Feb 19 2009
Denis Koroskin wrote:
> I don't see any reason why float4 can't be made a library type.

Yah, I was thinking the same:

    struct float4
    {
        __align(16) float[4] data; // right syntax and value?
        alias data this;
    }

This looks like something that should go into std.matrix pronto. It even has value semantics even though fixed arrays don't :o/.

Andrei
Feb 19 2009
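For readers following along, the struct in the post above can be fleshed out into something that compiles with a present-day D2 compiler. This is only a sketch under stated assumptions: the attribute is spelled `align(16)` rather than `__align(16)` (the post itself flags the syntax as uncertain), it is applied to the struct as a whole, and the `add` helper is a hypothetical illustration that is not part of the thread.

```d
// Sketch only: a library-level float4.
// Assumption: the alignment attribute is spelled align(16).
// Try it with: dmd -main -unittest -run float4.d
align(16) struct float4
{
    float[4] data = [0.0f, 0.0f, 0.0f, 0.0f];
    alias data this; // forwards indexing and slicing to the array
}

// Hypothetical helper (not from the thread): element-wise addition
// written with D's built-in array operations.
float4 add(float4 a, float4 b)
{
    float4 r;
    r.data[] = a.data[] + b.data[];
    return r;
}

unittest
{
    auto a = float4([1.0f, 2.0f, 3.0f, 4.0f]);
    auto b = a;  // structs are copied by value
    b[0] = 9.0f; // changes b only
    assert(a[0] == 1.0f);
    assert(add(a, b).data == [10.0f, 4.0f, 6.0f, 8.0f]);
    static assert(float4.alignof == 16);
}
```

Wrapping the array in a struct is what gives the value semantics mentioned in the post; whether a compiler keeps such a value in an SSE register is a separate question that this sketch does not address.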
Andrei Alexandrescu Wrote:
> Yah, I was thinking the same:
> [...]
> This looks like something that should go into std.matrix pronto. It even has value semantics even though fixed arrays don't :o/.

I've looked for this functionality in std.bitmanip before. I never thought to look in std.matrix. Fixed-size bit arrays seem like an obvious choice for SIMD optimization.
Feb 19 2009
On Thu, 19 Feb 2009 23:05:34 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> This looks like something that should go into std.matrix pronto. It even has value semantics even though fixed arrays don't :o/.

That would be great. If float4 finds its way into D, I'll share our blazing fast math code with the community (the most common operations on vectors, matrices, quaternions, etc.). It is written entirely with SSE intrinsics, not asm (there is a problem with inlining asm in D, IIRC; can anyone elaborate on this?), and it is *very* fast. According to our benchmarks, that's the best we can squeeze out of the hardware.

I know LLVM has support for a *very* wide range of intrinsics: http://www.cs.ucla.edu/classes/spring08/cs259/llvm-2.2/include/llvm/Intrinsics.gen

Hopefully they will get into LDC (and DMD *hint* Walter *hint*) very soon.
Feb 19 2009
Denis Koroskin wrote:
> That would be great. If float4 gets its way into D, I'll share our blazing fast math code with community (most common operations on vectors, matrices, quaternions etc).
> [...]

Put me down for that. What do I need to do?

Andrei
Feb 19 2009
On Fri, 20 Feb 2009 06:22:40 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Put me down for that. What do I need to do?

Convince Walter to add a float4 type and some intrinsics to DMD (I'll post a list of those we use later); LDC will follow, I believe. There should be some type that is treated specially. After all, intrinsics have function signatures, and those should specify concrete types.
Feb 19 2009
On Fri, 20 Feb 2009 08:55:16 +0300, Denis Koroskin <2korden gmail.com> wrote:
> Convince Walter to add float4 type and some intrinsics to DMD (I'll post a list of those we use later), LDC will follow, I believe.
> [...]

Here is some nice documentation on the MMX, SSE, and SSE2 intrinsics:
http://msdn.microsoft.com/en-us/library/y0dh78ez(VS.80).aspx

Here are some quick statistics on which intrinsics are used in our code and how many times. Note that this doesn't directly map to how many times each one is *actually* used in user code. The numbers may give Walter some information about priorities (intrinsics that aren't often used may be given lower priority, for example).

Arithmetic Operations (Floating-Point SSE2 Intrinsics)
http://msdn.microsoft.com/en-us/library/708ya3be(VS.80).aspx
_mm_add_ss - 2
_mm_add_ps - 48
_mm_sub_ss - 4
_mm_sub_ps - 24
_mm_mul_ss - 2
_mm_mul_ps - 100
_mm_div_ss - 0
_mm_div_ps - 1
_mm_sqrt_ss - 0
_mm_sqrt_ps - 0
_mm_rcp_ss - 1
_mm_rcp_ps - 0
_mm_rsqrt_ss - 0
_mm_rsqrt_ps - 1
_mm_min_ss - 0
_mm_min_ps - 1
_mm_max_ss - 0
_mm_max_ps - 1

Store Operations (SSE)
http://msdn.microsoft.com/en-us/library/ybhzf6dk(VS.80).aspx
_mm_store_ss - 1
_mm_store1_ps - 0
_mm_store_ps1 - 0
_mm_store_ps - 0
_mm_storeu_ps - 0
_mm_storer_ps - 0
_mm_move_ss - 2

Set Operations (SSE)
http://msdn.microsoft.com/en-us/library/wbzwdy6a(VS.80).aspx
_mm_set_ss - 0
_mm_set1_ps - 0
_mm_set_ps1 - 19
_mm_set_ps - 45
_mm_setr_ps - 0
_mm_setzero_ps - 2

Logical Operations (SSE)
http://msdn.microsoft.com/en-us/library/9759as73(VS.80).aspx
_mm_and_ps - 2
_mm_andnot_ps - 0
_mm_or_ps - 0
_mm_xor_ps - 3

Miscellaneous Instructions That Use Streaming SIMD Extensions
http://msdn.microsoft.com/en-us/library/dzs626wx.aspx
_mm_shuffle_ps - 124
_mm_shuffle_pi16 - 0
_mm_unpackhi_ps - 0
_mm_unpacklo_ps - 0
_mm_loadh_pi - 0
_mm_storeh_pi - 0
_mm_movehl_ps - 0
_mm_movelh_ps - 0
_mm_loadl_pi - 0
_mm_storel_pi - 0
_mm_movemask_ps - 0
_mm_getcsr - 0
_mm_setcsr - 0
_mm_extract_si64 - 0
_mm_extracti_si64 - 0
_mm_insert_si64 - 0
_mm_inserti_si64 - 0

Comparison Intrinsics (SSE)
http://msdn.microsoft.com/en-us/library/w8kez9sf(VS.80).aspx
Not used

Conversion Operations (SSE)
http://msdn.microsoft.com/en-us/library/0d4dtzhb(VS.80).aspx
Not used

Macros
_MM_SHUFFLE - 100
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | ((fp0)))
Feb 20 2009
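The `_MM_SHUFFLE` macro quoted in the list above only packs four 2-bit lane selectors into one byte, so it needs no compiler support of its own. A minimal sketch of the same computation as a D function usable at compile time might look as follows; the name `mmShuffle` is hypothetical and is not an existing DMD or LDC intrinsic.

```d
// Sketch: the _MM_SHUFFLE computation as a D function that also works
// in compile-time evaluation. The name mmShuffle is made up here.
// Try it with: dmd -main -unittest -run shuffle.d
ubyte mmShuffle(uint fp3, uint fp2, uint fp1, uint fp0) pure nothrow @nogc @safe
{
    assert(fp3 < 4 && fp2 < 4 && fp1 < 4 && fp0 < 4,
           "each selector must fit in 2 bits");
    return cast(ubyte)((fp3 << 6) | (fp2 << 4) | (fp1 << 2) | fp0);
}

// Selecting lanes 3,2,1,0 leaves a vector unchanged when both shuffle
// operands are the same vector.
enum ubyte identityMask = mmShuffle(3, 2, 1, 0);
static assert(identityMask == 0xE4);

unittest
{
    // Selecting lanes 0,1,2,3 reverses the element order.
    assert(mmShuffle(0, 1, 2, 3) == 0x1B);
}
```

Because the result is an ordinary constant, it could be passed to whatever shuffle intrinsic a compiler eventually exposes.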
Yeah, these are good statistics, and they do point out that the vector add/mul/permute operations are used whenever vectors are in use.

Intrinsics are one thing; better, however, would be platform-independent support. AltiVec has a different syntax for its permute instructions than the SSE shuffle instructions, so in my mind primitive vectors should support all the basic operations of the base type, such as +, -, * and / for float vectors. Permutation should be supported with the OpenCL-like syntax I suggested (as it is easy to remember).

Stuff like cross and dot products belongs in libraries, in my opinion. They could be nice as operators for readability, but this is probably not worth the hassle unless there is a way to override operators for standard types, which is probably a bad idea anyway. Better would be proper Unicode support and a way to define infix functions, so that a library could define a cross product function with the proper cross product operator as its name.

/ Mattias
Feb 20 2009
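As an illustration of the OpenCL-like permutation syntax discussed above, here is a sketch of how a library type could accept names such as `v.wyxz` in present-day D by using `opDispatch`. Everything in it is an assumption made for illustration: the type name `Vec4` and the helpers `isSwizzle` and `lane` are hypothetical, the code is plain scalar D, and nothing here maps to SSE or AltiVec permute instructions.

```d
// Sketch: OpenCL-style swizzles on a library type via opDispatch.
// Vec4, isSwizzle and lane are hypothetical names; this is scalar code.
// Try it with: dmd -main -unittest -run swizzle.d
struct Vec4
{
    float[4] data = [0.0f, 0.0f, 0.0f, 0.0f];

    // Handles v.wyxz, v.xxxx, ...: any four letters out of x, y, z, w.
    Vec4 opDispatch(string s)() const
        if (isSwizzle(s))
    {
        Vec4 r;
        foreach (i, c; s)
            r.data[i] = data[lane(c)];
        return r;
    }
}

// True for exactly four characters, each one of x, y, z or w.
bool isSwizzle(string s)
{
    if (s.length != 4)
        return false;
    foreach (c; s)
        if (c != 'x' && c != 'y' && c != 'z' && c != 'w')
            return false;
    return true;
}

// Maps a component letter to its index; only called on checked names.
size_t lane(char c)
{
    return c == 'x' ? 0 : c == 'y' ? 1 : c == 'z' ? 2 : 3;
}

unittest
{
    auto v = Vec4([1.0f, 2.0f, 3.0f, 4.0f]);
    assert(v.wyxz.data == [4.0f, 2.0f, 1.0f, 3.0f]);
    assert(v.xxxx.data == [1.0f, 1.0f, 1.0f, 1.0f]);
}
```

The template constraint means that a misspelled swizzle is rejected at compile time rather than failing at run time.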
Denis Koroskin wrote:
> Convince Walter to add float4 type and some intrinsics to DMD (I'll post a list of those we use later), LDC will follow, I believe. There should be some type that would be treated specially. After all, intrinsics have function signatures and those should specify some concrete types.

Yes, LDC would follow. The main reason people can't use these intrinsics in LDC at the moment is that there's no type in D that maps to an LLVM vector type.
Feb 21 2009
On 2009-02-21 04:34:07 -0500, Christian Kamm <kamm-incasoftware remove-garbage.de> said:
> Yes, LDC would follow. The main reason people can't use these intrinsics in LDC at the moment is that there's no type in D that maps to an LLVM vector type.

Instead of introducing a new type, couldn't float[4] be the one mapped to a vector type? Why do we need a new type?

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Feb 21 2009
Michel Fortin:
> Instead of introducing a new type, couldn't float[4] be the one mapped to a vector type? Why do we need a new type?

Alignment requirements, shuffling operations, scalar operations on just the first item of the vector, etc. It may be doable, and it may even be a nice idea, but it probably requires a lot of care.

Bye,
bearophile
Feb 21 2009
bearophile wrote:
> Alignment requirements, shuffling operations, scalar operations on just the first item of the vector, etc. It may be doable, and it may even be a nice idea, but it probably requires a lot of care.

Another advantage would be that you could specify in the ABI that this vector type should be passed to and returned from functions via the XMM registers. You could make a specific exception for float[4], but that just seems messy.

-- Daniel
Feb 21 2009
Daniel Keep wrote:
> Another advantage would be that you could specify in the ABI that this vector type should be passed to and returned from functions via the XMM registers. You could make a specific exception for float[4], but that just seems messy.

I don't think that's messy at all. I can't see much difference between special support for float[4] versus float4. It's better if the code can take advantage of hardware without specific support. Bear in mind that SSE/SSE2 is a temporary situation. AVX provides for much longer arrays of vectors, and it's extensible. You'd end up needing to keep adding special types whenever a new CPU comes out.

Note that the fundamental concept missing from the C virtual machine is that all modern machines can efficiently perform operations on arrays of built-in types of length 2^n, for some small value of n. We need to get this into the language abstraction, not follow C++ in hacking a few extra special types onto the old, deficient C model. And I think D is actually in a position to do this.

float[4] would be a greatly superior option if it could be done. The key requirements are:
(1) need to specify that static arrays are passed by value.
(2) need to keep the stack aligned to 16 bytes.
The good news is that both of these appear to be done on DMD2-Mac!
Feb 21 2009
Don wrote:
> float[4] would be a greatly superior option if it could be done. The key requirements are:
> (1) need to specify that static arrays are passed by value.
> (2) need to keep stack aligned to 16.
> The good news is that both of these appear to be done on DMD2-Mac!

I agree with float[4] as a good choice. So are value semantics for T[n] implemented on the Mac??

Andrei
Feb 21 2009
Andrei Alexandrescu wrote:
> I agree with float[4] as a good choice. So are value semantics for T[n] implemented on the Mac??

Oh. I guess it was just a proposal then, and not implemented. We're not as close as I thought. Bummer.
Feb 21 2009
Don wrote:
> Oh. I guess it was just a proposal then, and not implemented. We're not as close as I thought. Bummer.

Yah. Walter agrees that that's the right thing to do. The only thing that worries us is passing large statically-sized vectors by value to template functions. But then gaming code wants to do exactly that. It's hard to figure out where to draw the line. Imagine the error message "Hey, you're going a bit overboard by passing 512 bytes around on the stack".

Besides, we already do have a solution for pass-by-value vectors: Tuple!(T[N]). That would put the burden in the right place (on the programmer actively wanting pass-by-value). But then it's a shame that the built-in type T[N] is a weird exception that must be handled in all template code.

No idea what the right choice is. I'm just dumping whatever is buzzing around my head whenever I think of the issue.

Andrei
Feb 21 2009
On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Yah. Walter agrees that that's the right thing to do. The only thing that worries us is passing by-value large statically-sized vectors to template functions. But then gaming code wants to do exactly that. It's hard to figure where to draw the line. Imagine the error message "Hey, you're going a bit overboard by passing 512 bytes around on the stack".

Structs already work like this. In fact, the compiler will pass a struct in a register if it's 1, 2, or 4 bytes on x86. Having the compiler "magically" put float[4]s in SSE registers seems like a similar idea.

> Besides, we already do have a solution for pass-by-value vectors: Tuple!(T[N]). That would put the burden in the right place (on the programmer actively wanting pass-by-value). But then it's a shame that the built-in type T[N] is a weird exception that must be handled in all template code.

Please make them value types. I, for one, am tired of dealing with their crap.
Feb 21 2009
Jarrett Billingsley wrote:
> Structs already work like this. In fact, the compiler will pass a struct in a register if it's 1, 2, or 4 bytes on x86. Having the compiler "magically" put float[4]s in SSE registers seems like a similar idea.

I agree.

> Please make them value types. I, for one, am tired of dealing with their crap.

Ok, you just tipped the balance :o). I'm also realizing something. The scenario I'm most afraid of is something like:

    char[10000] humongous = "This is a humongous message. I will type here exactly 10000 characters. ... ";
    foreach (i; 1 .. 100_000_000) writeln(humongous);

But then there is a reason this scenario is rather scarce: for large static arrays, it's hard to keep the claimed length (10000) in sync with the actual length of the vectors. I used to think that's a language defect and suggested the syntax char[$] humongous = " ... " for it, such that the compiler infers the length from the initializer. But now I get to think that the defect actually discourages people from unwittingly defining very large statically-sized arrays.

With mixins and template techniques, very large static arrays can still be generated, but such advanced uses also have a nice feedback: those who know the language well enough to embark on such styles of coding will also likely understand the cautions needed to make them work well.

So, yes, it seems like a solid choice to make statically-sized arrays value types. Now we only need to convince Walter that implementation is "a simple matter of coding" :o).

Andrei
Feb 21 2009
On Sun, Feb 22, 2009 at 4:00 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Jarrett Billingsley wrote:
>> On Sat, Feb 21, 2009 at 12:32 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>>> Yah. Walter agrees that that's the right thing to do. The only thing that worries us is passing by-value large statically-sized vectors to template functions. But then gaming code wants to do exactly that.

I don't follow you. Wouldn't they rather pass such huge chunks of data by reference?

--bb
Feb 21 2009
Bill Baxter wrote:
> I don't follow you. Wouldn't they rather pass such huge chunks of data by reference?

What I'm saying (sorry for being unclear) is:

1. If we choose T[N] as a value type, the downside is that people may pass large arrays by value to e.g. template functions.
2. The upside is that gaming programmers DO want to pass short arrays of type T[N] by value.

The conundrum is that a type system can't say that T[N] has some semantics for N <= Nmax and some other semantics for N > Nmax. So we need to pick one, and probably picking the value semantics is the right thing to do.

Andrei
Feb 21 2009
On Sun, Feb 22, 2009 at 5:11 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> What I'm saying (sorry for being unclear) is:
> 1. If we choose T[N] as a value type, the downside is that people may pass large arrays by values to e.g. template functions.
> 2. The upside is that gaming programmers DO want to pass short arrays of type T[N] by value.
> The conundrum is that a type system can't say that T[N] has some semantics for N <= Nmax and some other semantics for N > Nmax. So we need to pick one, and probably picking the value semantics is the right thing to do.

Ok. And for large static arrays you can still explicitly use ref, right? That's what I was confused about -- sounded like you were saying ref wouldn't even be possible.

I don't think there are many cases where you don't know in advance if the static array you are expecting is huge or not. And you can always get creative with static if() in the cases where you don't.

--bb
Feb 21 2009
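A minimal sketch of the "get creative with static if()" idea, assuming T[N] has the value semantics discussed in this thread. The names `byValueLimit`, `sumSmall`, and `sumSlice` and the 128-byte threshold are hypothetical, chosen only for illustration; nothing like them is proposed for the language or Phobos here.

```d
// Hypothetical threshold (in bytes) above which by-value passing is rejected.
enum size_t byValueLimit = 128;

// Takes a static array by value; refuses to compile for large arrays.
E sumSmall(E, size_t N)(E[N] arr)
{
    static if (E.sizeof * N > byValueLimit)
    {
        static assert(false,
            "static array too large to pass by value; pass arr[] instead");
    }
    else
    {
        E total = 0;
        foreach (x; arr)
            total += x;
        return total;
    }
}

// Slice version for large data: only a pointer/length pair is copied.
E sumSlice(E)(const E[] arr)
{
    E total = 0;
    foreach (x; arr)
        total += x;
    return total;
}

void main()
{
    float[4] v = [1f, 2f, 3f, 4f];
    assert(sumSmall(v) == 10);

    float[64] big;   // 256 bytes, over the hypothetical limit
    big[] = 1;
    static assert(!__traits(compiles, sumSmall(big)));
    assert(sumSlice(big[]) == 64);
}
```

The check runs at compile time, so small vectors keep plain by-value calls while an oversized array produces an error that points the caller at the slice form.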
Bill Baxter wrote:
> Ok. And for large static arrays you can still explicitly use ref, right?

Yah, ref on the callee side, or the [] operator without any arguments on the caller side.

Andrei
Feb 21 2009
On Sat, Feb 21, 2009 at 5:47 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Yah, ref on the callee side, or the [] operator without any arguments on the caller side.

Wait, what? I wouldn't have expected anything but "ref type[n] foo" on the function parameter to pass byref. What are you saying here?
Feb 21 2009
Jarrett Billingsley wrote:
> On Sat, Feb 21, 2009 at 5:47 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Yah, ref on the callee side, or the [] operator without any arguments on the caller side.
> Wait, what? I wouldn't have expected anything but "ref type[n] foo" on the function parameter to pass byref. What are you saying here?

    void foo(T)(T t) {}
    int[5_000_000] i;
    foo(i[]); // foo!(int[]) -- semi-by-reference
    foo(i);   // foo!(int[5_000_000]) -- by value
Feb 21 2009
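The difference between the two calls above can be made observable with a small array. This is a sketch that assumes static arrays are passed by value, as proposed in this thread; `setFirst` is a hypothetical helper, not something from the posts.

```d
// Writes through whatever it was handed: a private copy for T[N],
// or the caller's storage for a slice T[].
void setFirst(T)(T t)
{
    t[0] = 99;
}

void main()
{
    int[4] a = [1, 2, 3, 4];

    setFirst(a);        // instantiates setFirst!(int[4]); works on a copy
    assert(a[0] == 1);  // caller's array is untouched

    setFirst(a[]);      // instantiates setFirst!(int[]); aliases a's storage
    assert(a[0] == 99); // the write is visible to the caller

    static assert(is(typeof(a[]) == int[]));
}
```

The slice call copies only a pointer and a length, which is the "semi-by-reference" behaviour mentioned in the post.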
On Sat, Feb 21, 2009 at 6:37 PM, Christopher Wright <dhasenan gmail.com> wrote:
>     void foo(T)(T t) {}
>     int[5_000_000] i;
>     foo(i[]); // foo!(int[]) -- semi-by-reference
>     foo(i);   // foo!(int[5_000_000]) -- by value

Oh. See, I don't usually template _everything_ so that didn't cross my mind ;)
Feb 21 2009
On 2009-02-21 15:11:15 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:
> The conundrum is that a type system can't say that T[N] has some semantics for N <= Nmax and some other semantics for N > Nmax. So we need to pick one, and probably picking the value semantics is the right thing to do.

I think it is the right decision too. This way "static array" becomes the container type and "dynamic array" is the corresponding range type.

Perhaps some concept renaming is in order for D2:

    static array  => array
    dynamic array => array range (or slice)

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Feb 21 2009
Michel Fortin wrote:
> I think it is the right decision too. This way "static array" becomes the container type and "dynamic array" is the corresponding range type.
>
> Perhaps some concept renaming is in order for D2:
>
>     static array  => array
>     dynamic array => array range (or slice)

Yah, and that would give a good model to follow for user-defined containers.

Andrei
Feb 21 2009
Andrei Alexandrescu:
> The conundrum is that a type system can't say that T[N] has some semantics for N <= Nmax and some other semantics for N > Nmax. So we need to pick one, and probably picking the value semantics is the right thing to do.

Well, I think the type system can be extended to manage that: the programmer may specify an optional compiler command line argument like:

    -ms 128

Now all structs/static arrays more than 128 bytes long are passed by reference :-)

Alternative solution, less extreme: instead of changing the value/ref pass semantics, when you add such an optional command line argument the compiler gives you a compilation warning (or even an error, if you want) everywhere you try to pass by value a struct or static array more than 128 bytes long.

Bye,
bearophile
Feb 22 2009
Andrei Alexandrescu wrote:
> Jarrett Billingsley wrote:
>> Please make them value types. I, for one, am tired of dealing with their crap.
> Ok, you just tipped the balance :o). I'm also realizing something. The scenario I'm most afraid of is something like:
>
>     char[10000] humongous = "This is a humongous message. I will type here exactly 10000 characters. ... ";

This is less of an issue with the type of string literals being invariant(T)[] rather than invariant(T)[length]. Even less so if the default type for array literals were dynamic rather than static arrays.
Feb 21 2009
Christopher Wright wrote:
> This is less of an issue with the type of string literals being invariant(T)[] rather than invariant(T)[length]. Even less so if the default type for array literals were dynamic rather than static arrays.

Exactly so. So with this other fix in the language there's even more push toward value semantics for T[N].

Andrei
Feb 21 2009
Don:
> I don't think that's messy at all. I can't see much difference between special support for float[4] versus float4. It's better if the code can take advantage of hardware without specific support. Bear in mind that SSE/SSE2 is a temporary situation. AVX provides for much longer arrays of vectors; and it's extensible. You'd end up needing to keep adding on special types whenever a new CPU comes out.
>
> Note that the fundamental concept which is missing from the C virtual machine is that all modern machines can efficiently perform operations on arrays of built-in types of length 2^n, for some small value of n. We need to get this into the language abstraction. Not follow C++ in hacking a few extra special types onto the old, deficient C model. And I think D is actually in a position to do this.
>
> float[4] would be a greatly superior option if it could be done. The key requirements are:
> (1) need to specify that static arrays are passed by value.
> (2) need to keep stack aligned to 16.
> The good news is that both of these appear to be done on DMD2-Mac!

I have quoted it all because I like the experience and ideas you bring to D.

But how can operations like shuffling be expressed, or mapping a sqrt on just the first item of such 4 floats, or on all of them? What syntax can be used?

Regarding the array operations already implemented, are there ways to force the compiler to inline such code/operations?

Bye,
bearophile
Feb 21 2009
On 2009-02-21 17:03:06 +0100, Don <nospam nospam.com> said:
> [snip]
> float[4] would be a greatly superior option if it could be done. The key requirements are:
> (1) need to specify that static arrays are passed by value.
> (2) need to keep stack aligned to 16.
> The good news is that both of these appear to be done on DMD2-Mac!

Yes, float[4] would be ok, if some CPU independent permutation support can be added. Would this be with some intrinsic then or what? I very much like the OpenCL syntax for permutation, but I suppose that an intrinsic such as "float[4] noref permute(float[4] noref vec, int newPos0, int newPos1, int newPos2, int newPos3)" would work as well. Note that this should also work with double[2], byte[16], short[8] and int[4].

How would pass-by-value semantics be implemented without breaking compatibility? Would you have (yet another) type qualifier (noref used in the example above)?

In my opinion, vectors are a fundamental type, and there is a reason that arrays and vectors are kept separate in LLVM. The problem is exposing this to the programmer in the proper way. OpenCL does have a lot of nice things in it that might be worth considering.

But, yeah, if something is done for ensuring the alignment of power of 2 vectors, the permutation support and the pass by value, then I would be fairly happy with that as well.

/ Mattias
Feb 21 2009
I think that the following would work reasonably well: allow the [] operator for arrays to take comma separated lists of indices.

So the OpenCL like statement:

    v.xyzw = v2.wzyx;

will be written as:

    v[0,1,2,3] = v2[3,2,1,0];

Would this be ok? This is a general extension of the array slicing, and it might be possible to permute with a combination of slices and indices like this (i.e. v[0..3, 6, 5, 7.. len]). Are permutation operations something that Walter would be willing to add?

As said by someone else in this thread, there needs to be a way to specify that static arrays are passed by value, so can the ref keyword be paired with the opposite "byval" or something similar? And also, functions need to be able to return static arrays, which is not possible at the moment.

Note that the support should be general and work with any array type (so that you can get YMM support whenever that makes it into the future chips).

/ Mattias
Feb 22 2009
On Sun, 22 Feb 2009 14:02:53 +0300, Mattias Holm <hannibal.holm gmail.com> wrote:
> I think that the following would work reasonably well: allow the [] operator for arrays to take comma separated lists of indices. So the OpenCL like statement: v.xyzw = v2.wzyx; will be written as: v[0,1,2,3] = v2[3,2,1,0];
> [snip]

How would you implement it for user-defined types?
Feb 22 2009
Denis Koroskin wrote:
> On Sun, 22 Feb 2009 14:02:53 +0300, Mattias Holm <hannibal.holm gmail.com> wrote:
>> I think that the following would work reasonably well: allow the [] operator for arrays to take comma separated lists of indices. So the OpenCL like statement: v.xyzw = v2.wzyx; will be written as: v[0,1,2,3] = v2[3,2,1,0];
> How would you implement it for user-defined types?

    T[] opIndex(int[] indices) { ... }
    void opIndexAssign(int[] indices, T[] values) { ... }
Feb 22 2009
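A sketch of how such index-list operators could look on a user-defined type. It uses the explicit array form v[[3, 2, 1, 0]] (one of the spellings suggested elsewhere in this thread) so it compiles without any language change; `Vec` is a hypothetical type. Note that D's opIndexAssign takes the assigned value as its first parameter, followed by the indices.

```d
struct Vec
{
    float[4] data;

    // v[[3, 2, 1, 0]] gathers the listed lanes into a new array.
    float[] opIndex(const size_t[] idx) const
    {
        auto r = new float[idx.length];
        foreach (k, i; idx)
            r[k] = data[i];
        return r;
    }

    // v[[0, 1]] = [a, b] scatters the values into the listed lanes.
    void opIndexAssign(const float[] values, const size_t[] idx)
    {
        assert(values.length == idx.length);
        foreach (k, i; idx)
            data[i] = values[k];
    }
}

void main()
{
    Vec v;
    v.data = [1f, 2f, 3f, 4f];

    assert(v[[3, 2, 1, 0]] == [4f, 3f, 2f, 1f]);

    v[[0, 1]] = [9f, 8f];
    assert(v.data == [9f, 8f, 3f, 4f]);
}
```

This version allocates on every gather, so it only illustrates the interface; it says nothing about how a compiler would map the operation to a shuffle instruction.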
On Sun, 22 Feb 2009 16:51:10 +0300, Christopher Wright <dhasenan gmail.com> wrote:
> Denis Koroskin wrote:
>> How would you implement it for user-defined types?
>
>     T[] opIndex(int[] indices) { ... }
>     void opIndexAssign(int[] indices, T[] values) { ... }

How about ranges included - v[0..3, 6, 5, 7..len] ?
Feb 22 2009
Mattias Holm:
> I think that the following would work reasonably well: allow the [] operator for arrays to take comma separated lists of indices. So the OpenCL like statement: v.xyzw = v2.wzyx; will be written as: v[0,1,2,3] = v2[3,2,1,0];

Be careful, I think a syntax like:

    v[0, 1]
    v[0, 1, 2, 3]

Is better left to index in 2D and 4D arrays. nD arrays can be quite important in a language like D.

So the following may be better and more compatible with the future:

    v[0; 1; 2; 3] = v2[3; 2; 1; 0];

Or:

    v[(0,1,2,3)] = v2[(3,2,1,0)];

Or:

    v[[0,1,2,3]] = v2[[3,2,1,0]];

Bye,
bearophile
Feb 22 2009
bearophile wrote:
> Mattias Holm:
>> I think that the following would work reasonably well: allow the [] operator for arrays to take comma separated lists of indices. So the OpenCL like statement: v.xyzw = v2.wzyx; will be written as: v[0,1,2,3] = v2[3,2,1,0];
> Be careful, I think a syntax like: v[0, 1] v[0, 1, 2, 3] Is better left to index in 2D and 4D arrays. nD arrays can be quite important in a language like D. So the following may be better and more compatible with the future: v[0; 1; 2; 3] = v2[3; 2; 1; 0]; Or: v[(0,1,2,3)] = v2[(3,2,1,0)]; Or: v[[0,1,2,3]] = v2[[3,2,1,0]];

Swizzling a vector and multidimensional access are orthogonal on account of vectors having only one dimension [1].

    struct FloatTuple(size_t n); // exercise to the reader :P

    struct vec4f
    {
        union
        {
            float[4] data;
            struct { float w, x, y, z; }
        }

        float opIndex(size_t i) { return data[i]; }
        float opIndexAssign(float v, size_t i) { return data[i] = v; }

        FloatTuple!(2) opIndex(size_t i, size_t j)
        {
            return FloatTuple!(2)(data[i],data[j]);
        }

        FloatTuple!(2) opIndexAssign(FloatTuple!(2) vs, size_t i, size_t j)
        {
            data[i] = vs.data[0];
            data[j] = vs.data[1];
            return vs;
        }

        // and so on for 3 and 4 argument versions.
    }

Personally, I think something like this is a better idea:

    struct vec4f
    {
        ...
        FloatTuple!(PermSpec.length) perm(string PermSpec)()
        {
            FloatTuple!(PermSpec.length) vs;
            foreach( i ; Range!(PermSpec.length) )
                vs.data[i] = mixin(`this.`~PermSpec[i]);
            return vs;
        }
    }

    void main()
    {
        vec4f a, b;
        a = ...;
        b = a.perm!("xyzw");
    }

-- Daniel

[1] By this I mean they're a 1D data type; vec4f represents a 4D vector but in terms of implementation, it has only one dimension: the index of data.
Feb 22 2009
Mattias Holm wrote:
> I think that the following would work reasonably well: allow the [] operator for arrays to take comma separated lists of indices.

I've always believed that we need that syntax for multi-dimensional arrays. Swizzling ought to be possible on a multi-dimensional array, and I don't think it would be, with your proposal?
Feb 23 2009
Mattias Holm wrote:
> Yes, float[4] would be ok, if some CPU independent permutation support can be added. Would this be with some intrinsic then or what? I very much like the OpenCL syntax for permutation, but I suppose that an intrinsic such as "float[4] noref permute(float[4] noref vec, int newPos0, int newPos1, int newPos2, int newPos3)" would work as well. Note that this should also work with double[2], byte[16], short[8] and int[4].

Note that if you had static arrays with value semantics, with proper alignment, then you could simply create

    module std.swizzle;

    float[4] permute(float[4] vec, int newPos0, int newPos1, int newPos2, int newPos3); /* intrinsic */

    float[4] wzyx(float[4] q) { return permute(q, 4, 3, 2, 1); }
    float[4] xywz(float[4] q) { return permute(q, 1, 2, 4, 3); }
    // etc

---

and your code would be:

    import std.swizzle;

    void main()
    {
        float[4] t;
        auto u = t.wzyx;
    }

I don't think this is terribly difficult once the value semantics are in place. (Note that once you get beyond 4 members, the .xyzw syntax gives an explosion of functions; but I think it's workable at 4; 4! is only 24. Beyond that point, you'd probably require direct permute calls).
Feb 23 2009
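A scalar reference sketch of the permute idea, assuming by-value static arrays. In the post, permute is a compiler intrinsic taking 1-based positions; this plain D stand-in uses 0-based lane indices (x=0, y=1, z=2, w=3), which is an assumption of the sketch and not something the thread settles.

```d
// Plain-D stand-in for the proposed intrinsic; a real one would be
// expected to lower to a single shuffle instruction where available.
float[4] permute(float[4] v, size_t p0, size_t p1, size_t p2, size_t p3)
{
    float[4] r = [v[p0], v[p1], v[p2], v[p3]];
    return r;
}

// Named wrappers, callable with member-style syntax via UFCS.
float[4] wzyx(float[4] q) { return permute(q, 3, 2, 1, 0); }
float[4] xywz(float[4] q) { return permute(q, 0, 1, 3, 2); }

void main()
{
    float[4] t = [1f, 2f, 3f, 4f];   // x=1, y=2, z=3, w=4

    auto u = t.wzyx;
    assert(u == [4f, 3f, 2f, 1f]);

    auto s = t.xywz;
    assert(s == [1f, 2f, 4f, 3f]);
}
```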
On Mon, Feb 23, 2009 at 5:18 PM, Don <nospam nospam.com> wrote:
> [snip]
> I don't think this is terribly difficult once the value semantics are in place. (Note that once you get beyond 4 members, the .xyzw syntax gives an explosion of functions; but I think it's workable at 4; 4! is only 24. Beyond that point, you'd probably require direct permute calls).

Actually it's 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow repeats like .xxyy.

--bb
Feb 23 2009
Bill Baxter wrote:
> Actually its 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow repeats like .xxyy.

Yes. Is the syntax sugar actually needed for all the permutations? Even so, it's still only 256, which is probably still OK. I don't think a language change is required.

This scheme doesn't cover:

* shufp where the two sources are different
* haddpd, haddps [SSE3] { double[2] a, b; a[0]=a[0]+a[1]; a[1]=b[0]+b[1]; }
* non-temporal stores (although I think these are covered adequately by array operations)

and the byte/word operations:

* pack with saturation
* movmsk
* avg
* multiply and add.

So it looks to me as though with the minimal language changes, we could get almost complete SIMD support, with excellent syntax.
Feb 23 2009
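For reference, the two-source operations listed above can be written as scalar D functions. This is only a sketch of their semantics, assuming by-value static arrays; `hadd` and `shuf` are hypothetical names, and the lane selection of `shuf` follows the usual SHUFPS rule (two lanes picked from the first operand, two from the second).

```d
// Horizontal add, as in the HADDPD description in the post:
// result[0] = a[0] + a[1], result[1] = b[0] + b[1].
double[2] hadd(double[2] a, double[2] b)
{
    double[2] r = [a[0] + a[1], b[0] + b[1]];
    return r;
}

// Two-source shuffle: low two lanes come from a, high two lanes from b.
float[4] shuf(float[4] a, float[4] b,
              size_t i0, size_t i1, size_t i2, size_t i3)
{
    float[4] r = [a[i0], a[i1], b[i2], b[i3]];
    return r;
}

void main()
{
    assert(hadd([1.0, 2.0], [10.0, 20.0]) == [3.0, 30.0]);

    float[4] a = [1f, 2f, 3f, 4f];
    float[4] b = [5f, 6f, 7f, 8f];
    assert(shuf(a, b, 0, 1, 2, 3) == [1f, 2f, 7f, 8f]);
}
```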
Don:
> This scheme doesn't cover: ...
> and the byte/word operations: ...
> So it looks to me as though with the minimal language changes, we could get almost complete SIMD support, with excellent syntax.

Do you consider things from the near future too?

http://en.wikipedia.org/wiki/SSE5
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

Bye,
bearophile
Feb 23 2009
Don wrote:
> Bill Baxter wrote:
>> Actually its 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow repeats like .xxyy.
> Yes. Is the syntax sugar actually needed for all the permutations? Even so, it's still only 256, which is probably still OK. I don't think a language change is required.

There's no need to ever enumerate all functions - they can be generated with templates and mixins rather easily.

> This scheme doesn't cover:
> * shufp where the two sources are different
> * haddpd, haddps [SSE3] { double[2] a, b; a[0]=a[0]+a[1]; a[1]=b[0]+b[1]; }
> * non-temporal stores (although I think these are covered adequately by array operations)

Well probably we can find ways to generate those too.

> and the byte/word operations:
> * pack with saturation
> * movmsk
> * avg
> * multiply and add.
> So it looks to me as though with the minimal language changes, we could get almost complete SIMD support, with excellent syntax.

That sounds great.

Andrei
Feb 23 2009
On Mon, Feb 23, 2009 at 10:24 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> There's no need to ever enumerate all functions - they can be generated with templates and mixins rather easily.

I think the issue is just that however you get them there it's a lot of exe bloat. The ideal would be something like a template that only got instantiated when used, like

    vec2 = vec1.swizzle!("xyzy");

... but with the syntax vec2 = vec1.xyzy. And which can generate optimal assembly.

Really, if .swizzle!("xyzy") can do that, though, that looks good enough to me. I would rather have that than 256 silly little functions that get instantiated no matter what.

--bb
Feb 23 2009
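A sketch of the swizzle!("xyzy") form as an ordinary template, assuming by-value static arrays. Only the specs that are actually used get instantiated, which is the property asked for above; whether the generated code is optimal is left to the compiler. `swizzle` and `lane` are hypothetical names for this sketch.

```d
// Maps a component letter to its lane index (x=0, y=1, z=2, w=3).
size_t lane(char c) pure
{
    switch (c)
    {
        case 'x': return 0;
        case 'y': return 1;
        case 'z': return 2;
        case 'w': return 3;
        default:  assert(0, "bad swizzle component");
    }
}

// One instantiation per distinct spec string, created on first use.
// The result has as many lanes as the spec has letters.
float[S.length] swizzle(string S)(float[4] v)
{
    float[S.length] r;
    foreach (i, c; S)
        r[i] = v[lane(c)];
    return r;
}

void main()
{
    float[4] vec1 = [1f, 2f, 3f, 4f];

    auto vec2 = vec1.swizzle!("xyzy");
    assert(vec2 == [1f, 2f, 3f, 2f]);

    auto rev = vec1.swizzle!("wzyx");
    assert(rev == [4f, 3f, 2f, 1f]);

    auto pair = vec1.swizzle!("zy");   // float[2]
    assert(pair == [3f, 2f]);
}
```

Because the spec is a compile-time string, repeats such as "xxyy" and shorter results such as "zy" come for free, without enumerating the 4^4 combinations.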
Don wrote:
> So it looks to me as though with the minimal language changes, we could get almost complete SIMD support, with excellent syntax.

    enum { x=0, y=1, z=2, w=3 }
    float[4] foo;
    foo[x] = 42;
    foo[y] = foo[x];
    // etc
    foo[] = [foo[y],foo[x],foo[y],foo[x]];

*grin*
Feb 23 2009
On 2009-02-23 14:48:50 +0100, Chad J <gamerchad __spam.is.bad__gmail.com> said:
> Don wrote:
>> So it looks to me as though with the minimal language changes, we could get almost complete SIMD support, with excellent syntax.
>
>     enum { x=0, y=1, z=2, w=3 }
>     float[4] foo;
>     foo[x] = 42;
>     foo[y] = foo[x];
>     // etc
>     foo[] = [foo[y],foo[x],foo[y],foo[x]];
>
> *grin*

Sorry to bump up this discussion, but I was away and then busy with other stuff... and coming back I had a lot of piled up work... (also making tango work with the brand new dmd on mac ;) so I had missed it.

I think that an aligned vector would be very useful to have for small vectors. Intrinsic functions could be useful, but I agree with the "joke" made by Chad. What I really want to see is that the compiler uses them when it should (as I think downs did with his autovectorization patch for gdc). Not too much clutter from special notation, but the normal one, done efficiently when possible.

A special type is probably needed (for alignment reasons), but then that's about it from the language point of view (even that might be avoided with align, or maybe sometimes even without, but maybe that is expecting too much from the compiler).

I also would like to stress again that there are also doubles (but indeed the gain in that case is much smaller).

Fawzi
Feb 27 2009
Andrei Alexandrescu wrote:
> Denis Koroskin wrote:
>> On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm <hannibal.holm gmail.com> wrote:
>>> Since (SIMD) vectors are so common and every reasonable system supports them in one way or the other (and scalar emulation of this is rather simple), why not have support for this in D directly? [snip]
>> I don't see any reason why float4 can't be made a library type.
> Yah, I was thinking the same:
>
>     struct float4
>     {
>         __align(16) float[4] data; // right syntax and value?
>         alias data this;
>     }
>
> This looks like something that should go into std.matrix pronto. It even has value semantics even though fixed arrays don't :o/.
>
> Andrei

I remember implementing a vector struct [1] quite some time ago that had an SSE-accelerated path. There were three problems I had with it:

1. The alignment thing. Incidentally, I just did a quick check and don't see any notes in the changelog about __align(n) syntax. As I remember, there was no way to actually ensure the data was properly aligned. (There's "Data items in static data segment >= 16 bytes in size are now paragraph aligned." but that doesn't help when the vectors are on, say, the stack or in the heap.)

2. As soon as you use inline asm, you lose inlining. When the functions are as small as they are, this can be a bit of overhead. It gets worse when you realise that the CPU is spending most of its time running data back and forth between main memory and the XMM registers... Array operations help, but they don't cover everything.

3. There was a not insignificant performance difference for using byref passing on operators over byval passing. Of course, you can't ACTUALLY use byref because it completely breaks anything that uses a temporary expression as an argument.

In the end, I just dropped it to see how BLADE would turn out. I ended up coming to the conclusion that while we can do a float[4] vector in D and use SIMD to speed it up, there's not much point when BLADE is there.

Of course, BLADE is a little unwieldy to use what with that mixin malarky. Pity we didn't get AST macros... :P

Anyway, just my AUD$0.02.

-- Daniel

[1] That struct was scary. It was one of those Vector!(type, size) jobbies, so it had multiple paths through functions, members that only existed for certain sizes, special-cased loop unrolling... don't even ASK about the matrix struct... :P
Feb 19 2009
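For reference, a minimal sketch of the library-type idea quoted above, written with the align(16) spelling (see the follow-up post) and assuming a reasonably recent D2 compiler. The add function is a hypothetical helper, and the sketch makes no claim that the requested alignment is honoured for stack or heap instances, which is the very problem raised in point 1.

```d
// Library-type float4 along the lines of the quoted struct.
struct float4
{
    align(16) float[4] data = [0.0f, 0.0f, 0.0f, 0.0f];
    alias data this;   // lets a float4 be indexed and sliced like its array
}

// Hypothetical helper (not from the thread): element-wise sum,
// passed and returned by value.
float4 add(float4 a, float4 b)
{
    float4 r;
    r.data[] = a.data[] + b.data[];
    return r;
}
```

Because of the alias this, `v[0] = 5.0f;` on a float4 works as it would on the underlying float[4].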
On Thu, Feb 19, 2009 at 9:31 PM, Daniel Keep <daniel.keep.lists gmail.com> wrote:
struct float4 {
    __align(16) float[4] data; // right syntax and value?
    alias data this;
}
1. The alignment thing. Incidentally, I just did a quick check and don't see any notes in the changelog about __align(n) syntax. As I remember, there was no way to actually ensure the data was properly aligned. (There's "Data items in static data segment >= 16 bytes in size are now paragraph aligned." but that doesn't help when the vectors are on, say, the stack or in the heap.)

It's just align(16).
http://www.digitalmars.com/d/1.0/attribute.html#align
Feb 19 2009
Jarrett Billingsley wrote:
On Thu, Feb 19, 2009 at 9:31 PM, Daniel Keep <daniel.keep.lists gmail.com> wrote:
struct float4 {
    __align(16) float[4] data; // right syntax and value?
    alias data this;
}
1. The alignment thing. Incidentally, I just did a quick check and don't see any notes in the changelog about __align(n) syntax. As I remember, there was no way to actually ensure the data was properly aligned. (There's "Data items in static data segment >= 16 bytes in size are now paragraph aligned." but that doesn't help when the vectors are on, say, the stack or in the heap.)
It's just align(16). http://www.digitalmars.com/d/1.0/attribute.html#align

Yeah, except it doesn't do what you want in this case. For example:

module alignment;
import tango.io.Stdout;

struct float4 { align(16) float[4] data; }

void main()
{
    float4 a;
    byte b;
    float4 c;
    Stdout.format("a {0} & 0xF == {1}", &a, cast(size_t)&a & 0xF).newline;
    Stdout.format("c {0} & 0xF == {1}", &c, cast(size_t)&c & 0xF).newline;
}

Output is:

a 12fe68 & 0xF == 8
c 12fe78 & 0xF == 8

The only way I found of guaranteeing alignment was to allocate all vectors on the heap, using a custom allocator to allocate an extra 15 bytes and then align the result appropriately. Obviously, for 16-byte vectors, this is completely unacceptable.
-- Daniel
Feb 19 2009
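The over-allocate-and-align workaround described above can be sketched roughly as follows. This is only an illustration, assuming a D2 compiler and the garbage-collected heap; alignedFloats is a hypothetical name, not the allocator Daniel actually used.

```d
// Hypothetical helper (not from the thread): returns `count` floats whose
// first element sits on a 16-byte boundary.
float[] alignedFloats(size_t count)
{
    enum size_t alignment = 16;
    // Over-allocate by alignment - 1 (15) bytes, as described in the post.
    auto raw = new ubyte[count * float.sizeof + alignment - 1];
    auto addr = cast(size_t) raw.ptr;
    // Number of bytes to skip to reach the next 16-byte boundary (0..15).
    auto offset = (alignment - (addr & (alignment - 1))) & (alignment - 1);
    // Reinterpret the aligned window as floats; the GC keeps the whole
    // block alive through this interior slice.
    return cast(float[]) raw[offset .. offset + count * float.sizeof];
}
```

For a single 16-byte vector this wastes up to 15 bytes plus a heap allocation, which is the cost the post calls unacceptable.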
Daniel Keep wrote:
Jarrett Billingsley wrote:
On Thu, Feb 19, 2009 at 9:31 PM, Daniel Keep <daniel.keep.lists gmail.com> wrote:
struct float4 {
    __align(16) float[4] data; // right syntax and value?
    alias data this;
}
1. The alignment thing. Incidentally, I just did a quick check and don't see any notes in the changelog about __align(n) syntax. As I remember, there was no way to actually ensure the data was properly aligned. (There's "Data items in static data segment >= 16 bytes in size are now paragraph aligned." but that doesn't help when the vectors are on, say, the stack or in the heap.)
It's just align(16). http://www.digitalmars.com/d/1.0/attribute.html#align
Yeah, except it doesn't do what you want in this case. For example:
module alignment;
import tango.io.Stdout;
struct float4 { align(16) float[4] data; }
void main()
{
    float4 a;
    byte b;
    float4 c;
    Stdout.format("a {0} & 0xF == {1}", &a, cast(size_t)&a & 0xF).newline;
    Stdout.format("c {0} & 0xF == {1}", &c, cast(size_t)&c & 0xF).newline;
}
Output is:
a 12fe68 & 0xF == 8
c 12fe78 & 0xF == 8
The only way I found of guaranteeing alignment was to allocate all vectors on the heap, using a custom allocator to allocate an extra 15 bytes and then align the result appropriately. Obviously, for 16-byte vectors, this is completely unacceptable.
-- Daniel

http://d.puremagic.com/issues/show_bug.cgi?id=2278
Feb 19 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2278

That bug is interesting. Didn't Walter have to change DMD for the Mac to make sure the stack is aligned to 16 bytes? Perhaps most of the work is done now?
L.
Feb 20 2009
Daniel Keep wrote:
Andrei Alexandrescu wrote:
Denis Koroskin wrote:
On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm <hannibal.holm gmail.com> wrote:
Since (SIMD) vectors are so common and every reasonable system supports them in one way or the other (and scalar emulation of this is rather simple), why not have support for this in D directly? [snip]
I don't see any reason why float4 can't be made a library type.
Yah, I was thinking the same:
struct float4 {
    __align(16) float[4] data; // right syntax and value?
    alias data this;
}
This looks like something that should go into std.matrix pronto. It even has value semantics even though fixed arrays don't :o/.
Andrei
I remember implementing a vector struct [1] quite some time ago that had an SSE-accelerated path.

We have that in dglut too :)
It's optimized for the float[3]/float[4] case - all float[3]s add a hidden bogus member, then the arithmetic operations generate for loops which are very easy for a properly patched gdc to autovectorize :)
No loss of inlining, which means no ref VS. val issues.
Alignment is still a problem, but movups in the aligned case is just as fast as movaps, so I figure it doesn't matter that much.
Autovec is sweet.
Feb 20 2009
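A rough sketch of the padding trick described above, assuming a D2 compiler. This is not dglut's actual code: Vec3 and its layout are hypothetical, and whether the loop is autovectorized depends entirely on the compiler and its flags.

```d
// Hypothetical 3-component vector that stores a hidden fourth lane so
// every operation runs over a fixed-length loop of four floats.
struct Vec3
{
    float[4] data = [0.0f, 0.0f, 0.0f, 0.0f]; // data[3] is padding only

    this(float x, float y, float z) { data = [x, y, z, 0.0f]; }

    // Element-wise +, - and * written as a plain loop, with no inline asm,
    // so the function stays inlinable and easy to autovectorize.
    Vec3 opBinary(string op)(Vec3 rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        Vec3 r;
        foreach (i; 0 .. 4)
        {
            mixin("r.data[i] = data[i] " ~ op ~ " rhs.data[i];");
        }
        return r;
    }
}
```

Usage is the ordinary operator form, for example `auto s = Vec3(1, 2, 3) + Vec3(4, 5, 6);`.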
On Fri, Feb 20, 2009 at 4:25 AM, Mattias Holm <hannibal.holm gmail.com> wrote:
Since (SIMD) vectors are so common and every reasonable system supports them in one way or the other (and scalar emulation of this is rather simple), why not have support for this in D directly? Yes, the array operations are nice (and one of the main reasons for why I like D :) ), but have the problem that an array of floats must be aligned on float boundaries and not vector boundaries. In my mind vectors are a primitive data type that should be exposed by the programming language.

To justify making them primitive types you need to show that they are widespread, and that there is some good reason that they cannot be implemented in a library. And even if they can't be implemented in a library right now, it could be that fixing that reason is better than making new built-in types. For instance I think there are issues now with getting the right stack alignment for structs. But that's something that needs to be fixed generally, not by making new built-in types that know how to do alignment right.

Something OpenCL-like:
float4 vec;
vec.xyzw = {1.0,1.0, 1.0, 1.0}; // assignment
vec.xyzw = vec.wyxz; // permutation
vec[i] = 1.0; // indexing
And then we can easily imagine some extra nice features to have with respect to operators:
vec ^ vec2; // 3d cross product for float vectors, for int vectors xor

All of this is pretty much doable in a library. The only nonobvious one is vec.xyzw = vec.wyxz, which can be implemented using a bunch of CTFE auto-code generation (see some earlier version of H3r3tic's Hybrid library). And using xor for cross product sucks. It doesn't have the right precedence.

Has this been discussed before?

I think it has and no one came up with a reason why these can't be library types.
--bb
Feb 19 2009
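To make the "doable in a library" point concrete, here is a minimal sketch of read-only swizzles such as vec.wyxz. It swaps in a different technique from the one Bill mentions: instead of CTFE string generation it uses opDispatch, a D2 feature that postdates this thread. Float4 and lane are hypothetical names, and a real library would also want assignment swizzles and compile-time rejection of bad letters.

```d
// Hypothetical library vector with read-only four-letter swizzles.
struct Float4
{
    float[4] data = [0.0f, 0.0f, 0.0f, 0.0f];

    // Map a component letter to its lane index.
    private static size_t lane(char c)
    {
        switch (c)
        {
            case 'x': return 0;
            case 'y': return 1;
            case 'z': return 2;
            case 'w': return 3;
            default:  assert(0, "unknown component letter");
        }
    }

    // Any unknown four-letter member name (v.wyxz, v.xxyy, ...) lands here;
    // the name is a compile-time string. Invalid letters are only caught
    // by the assert in lane, at run time.
    @property Float4 opDispatch(string name)() const
        if (name.length == 4)
    {
        Float4 r;
        foreach (i; 0 .. 4)
        {
            r.data[i] = data[lane(name[i])];
        }
        return r;
    }
}
```

With this, `v = v.wyxz;` reads much like the OpenCL-style permutation quoted above, without any language change.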
Bill Baxter:
I think it has and no one came up with a reason why these can't be library types.

Recently I have discussed this with the LDC main designer; my idea was to have something like:
Vect!(double, 2) v1;
Vect!(float, 4) v2;
And then make LDC use LLVM intrinsics for the operations on them. So I think with some intrinsics most of that can be just a library.
Bye,
bearophile
Feb 19 2009
On 2009-02-19 21:04:06 +0100, Bill Baxter <wbaxter gmail.com> said:
To justify making them primitive types you need to show that they are widespread, and that there is some good reason that they cannot be implemented in a library. And even if they can't be implemented in a library right now, it could be that fixing that reason is better than making new built-in types. For instance I think there's issues now with getting the right stack alignment for structs. But that's something that needs to be fixed generally, not by making new built-in types that know how to do alignment right.

Firstly, they are widespread. They might be hampered by the fact that they have no direct support in many languages, however. There are reasons that you nowadays have standard vector syntax in GCC. It is a bit rough (it only supports basic operations like elementwise add, div, mul etc.) and does not support permutations. The cross product is only a nice-to-have operator for floats and far from necessary; permutations are more painful though, as GCC at the moment forces you to write CPU-specific intrinsics for them.

First there are the alignment issues, but as others said, there is an alignment attribute in D as well. More problematic for a library implementation of these types, however, is calling conventions. Passing a vector into a function is very efficient; vectors can be passed in registers for both in and out values (if the ABI allows; I think the PPC and x86-64 calling conventions allow for this).

The problem with the GCC versions, however, is that due to how the vectors work in it, you have to resort to union hacks to get the scalars out of a vector:

union { __m128 data; struct { float x, y, z, w; } }

Unfortunately, due to this, in order to make the compiler issue nice code, you have to work with the vector type only, and only bring out the union when inspecting the scalars in the vector. That, unfortunately, will happen quite often. (The union has ambiguous call-by-value semantics: should it be passed in one or four xmm regs, assuming x86-64 conventions?)

Also, if a vector is a struct you do get the alignment issues in some cases, but you may also have different call-by-value calling conventions (with x86-64 they would in principle be the same, but that is just coincidence).

I do agree that if it could be implemented as a library type, then it should be, but this doesn't work if you take calling conventions into account. Also, I am sure that the optimiser could more easily make transformations on vector types than on structs that are being passed around, since the semantics of a primitive type are known by the compiler.

/ Mattias
Feb 19 2009
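The union hack quoted above looks roughly like this when transcribed to D. This is only a sketch: a plain float[4] stands in for the C __m128 type, Vec4 is a hypothetical name, and it says nothing about how any particular ABI would pass such a union by value, which is the concern raised in the post.

```d
// Hypothetical D counterpart of the C union: the same 16 bytes viewed
// either as four lanes or as named scalars.
union Vec4
{
    float[4] lanes;                 // stand-in for the vector type
    struct { float x, y, z, w; }    // scalar view of the same storage
}
```

Writing through one view and reading through the other is the whole point: after `v.lanes = [1, 2, 3, 4]`, `v.w` reads back 4.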
Mattias Holm wrote:
And then we can easily imagine some extra nice features to have with respect to operators:
vec ^ vec2; // 3d cross product for float vectors, for int vectors xor
Has this been discussed before?

Given that the wedge product is a defined operation on vectors (a^b is a bivector in the common plane of a & b with area |a||b|sin θ), related to, but very distinct from, the cross product, I'd call this a BAD operator overload.
(B.T.W., I am starting work on a Geometric Algebra library which will implement all these products.)
-- Joel Salomon
Feb 22 2009
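To show how the two products are related yet distinct in 3D, here is a small sketch, not taken from Joel's library. The function names and the choice of bivector basis order (e12, e23, e31) are assumptions made for illustration. The same three numbers appear in both results, but the cross product attaches them to a vector perpendicular to a and b, while the wedge product attaches them to oriented planes.

```d
// Cross product: a vector perpendicular to a and b.
float[3] cross(float[3] a, float[3] b)
{
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]];
}

// Wedge product: bivector components, in the assumed basis order
// (e12, e23, e31). Each component is the signed area of the
// parallelogram projected onto that coordinate plane.
float[3] wedge(float[3] a, float[3] b)
{
    return [a[0] * b[1] - a[1] * b[0],   // e12 (xy-plane)
            a[1] * b[2] - a[2] * b[1],   // e23 (yz-plane)
            a[2] * b[0] - a[0] * b[2]];  // e31 (zx-plane)
}
```

For the unit vectors x and y, cross gives the vector z, while wedge gives the unit bivector in the xy-plane, so a single ^ operator standing for the cross product would blur exactly this distinction.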