
digitalmars.D - Does dmd have SSE intrinsics?

reply Jeremie Pelletier <jeremiep gmail.com> writes:
While writing SSE assembly by hand in D is fun and works well, I'm wondering if
the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.

The reason is that the compiler can usually reorder the intrinsics to optimize
performance.

I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
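[Editor's note: for reference, this is roughly what the xmmintrin.h-style intrinsics being asked about look like on the C/C++ side. A minimal sketch, assuming an x86 target with SSE; the function name add4 is made up, the intrinsics are the standard Intel ones.]

```cpp
#include <xmmintrin.h>  // SSE intrinsics, as declared in xmmintrin.h

// Add two arrays of 4 floats with a single SSE addition.
// Unlike hand-written inline asm, the compiler is free to schedule
// and reorder these intrinsics together with surrounding code.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);             // load 4 floats (unaligned ok)
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // one ADDPS, then store
}
```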
Aug 26 2009
next sibling parent reply Don <nospam nospam.com> writes:
Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h
in C.
 
 The reason is that the compiler can usually reorder the intrinsics to optimize
performance.
 
 I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
I know this is an old post, but since it wasn't answered...

Make sure you know what the SSE intrinsics actually *do* in VC++/Intel! I've read many complaints about how poorly they perform on all compilers -- the penalty for allowing them to be reordered is that extra instructions are often added, which means that straightforward C code is sometimes faster!

In this regard, I'm personally excited about array operations. I think the need for SSE intrinsics and vectorisation is a result of abstraction inversion: the instruction set is higher-level than the "high level language"! Array operations allow D to catch up with asm again. When array operations get implemented properly, it'll be interesting to see how much need for SSE intrinsics remains.
Sep 21 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.
 The reason is that the compiler can usually reorder the intrinsics to optimize
performance.
 I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
 I know this is an old post, but since it wasn't answered...
 Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
 I've read many complaints about how poorly they perform on all compilers
 -- the penalty for allowing them to be reordered is that extra
 instructions are often added, which means that straightforward C code is
 sometimes faster!
 In this regard, I'm personally excited about array operations. I think
 the need for SSE intrinsics and vectorisation is a result of abstract
 inversion: the instruction set is higher-level than the "high level
 language"! Array operations allow D to catch up with asm again. When
 array operations get implemented properly, it'll be interesting to see
 how much need for SSE intrinsics remains.
What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.
Sep 21 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 What's wrong with the current implementation of array ops (other than a few
misc.
 bugs that have already been filed)?  I thought they already use SSE if
available.
The idea is to improve array operations so they become a handy way to efficiently use present and future (AVX too, http://en.wikipedia.org/wiki/Advanced_Vector_Extensions ) vector instructions. So for example if in my D code I have:

float[4] a = [1.f, 2., 3., 4.];
float[4] b = 10f;
float[4] c;
c[] = a[] + b[];

the compiler has to use a single inlined SSE instruction to implement the last line (the 4-float sum), and two instructions to load & broadcast the float value 10 into a whole XMM register. If the D code is:

float[8] a = [1.f, 2., 3., 4., 5., 6., 7., 8.];
float[8] b = [10.f, 20., 30., 40., 50., 60., 70., 80.];
float[8] c;
c[] = a[] + b[];

the current vector instructions aren't wide enough to do that in a single instruction (future AVX will be able to), so the compiler has to inline two SSE instructions. Currently such operations are implemented with calls to a function (which also tests which vector instructions are available), and that slows down code when you only have to sum 4 floats.

Another problem is that some important semantics are missing, for example some shuffling, and a few other things. With some care, most or all such operations (keeping a close eye on AVX too) can be mapped to built-in array methods...

The problem here is that you don't want to tie the D language too tightly to the currently available vector instructions, because in 5-10 years CPUs may change. So what you want is to add enough semantics that the compiler can later compile the code as best it can (with scalar instructions, with SSE1, with a future 1024-bit-wide AVX, or with something unknown today). If the language doesn't give enough semantics to the compiler, you are forced to do as GCC does: it now tries to infer vector operations from normal code, but that's a complex task and usually not as efficient as using GCC's SSE intrinsics. This is something that deserves a thread here :-)

In the end, implementing all this doesn't look hard. It's mostly a matter of designing it well (whereas auto-vectorization as in GCC is much harder to implement).

Bye,
bearophile
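[Editor's note: a hand-written C++ sketch of the lowering bearophile describes -- the broadcast plus one SSE addition for 4 floats, and two SSE additions for 8. Assumes an x86 target with SSE; the helper names are invented.]

```cpp
#include <xmmintrin.h>

// c[] = a[] + s: _mm_set1_ps is the "load & broadcast" step
// (a couple of instructions), _mm_add_ps is the single 4-float sum.
void add_scalar4(const float* a, float s, float* c) {
    __m128 vb = _mm_set1_ps(s);  // broadcast s into all 4 lanes
    _mm_storeu_ps(c, _mm_add_ps(_mm_loadu_ps(a), vb));
}

// The float[8] case needs two SSE additions (AVX could do it in one).
void add8(const float* a, const float* b, float* c) {
    _mm_storeu_ps(c,     _mm_add_ps(_mm_loadu_ps(a),     _mm_loadu_ps(b)));
    _mm_storeu_ps(c + 4, _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4)));
}
```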
Sep 21 2009
prev sibling parent reply Don <nospam nospam.com> writes:
dsimcha wrote:
 == Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.
 The reason is that the compiler can usually reorder the intrinsics to optimize
performance.
 I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
 I know this is an old post, but since it wasn't answered...
 Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
 I've read many complaints about how poorly they perform on all compilers
 -- the penalty for allowing them to be reordered is that extra
 instructions are often added, which means that straightforward C code is
 sometimes faster!
 In this regard, I'm personally excited about array operations. I think
 the need for SSE intrinsics and vectorisation is a result of abstract
 inversion: the instruction set is higher-level than the "high level
 language"! Array operations allow D to catch up with asm again. When
 array operations get implemented properly, it'll be interesting to see
 how much need for SSE intrinsics remains.
What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.
(1) They don't take advantage of fixed-length arrays. In particular, operations on float[4] should be a single SSE instruction (no function call, no loop, nothing). This will make a huge difference to game and graphics programmers, I believe.

(2) The operations don't block on cache size.

(3) DMD doesn't allow you to generate code assuming minimum CPU capabilities. (In fact, when generating inline asm, the CPU type is 8086! This is in bugzilla.) This limits the possible use of (1).

It's issue (1) which is the killer.
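[Editor's note: what issue (1) would buy game code can be approximated today with a thin value-type wrapper over the SSE register. A C++ sketch, not D; the Vec4 name is made up; assumes an x86/SSE target.]

```cpp
#include <xmmintrin.h>

// A float[4]-sized value type whose operator+ is exactly one ADDPS
// once inlined: no function call, no loop, nothing.
struct Vec4 {
    __m128 v;
    Vec4 operator+(Vec4 rhs) const { return {_mm_add_ps(v, rhs.v)}; }
    float operator[](int i) const {
        float tmp[4];
        _mm_storeu_ps(tmp, v);  // spill to read a single lane
        return tmp[i];
    }
    static Vec4 load(const float* p) { return {_mm_loadu_ps(p)}; }
};
```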
Sep 21 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Don wrote:
 dsimcha wrote:
 == Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm 
 wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.
 The reason is that the compiler can usually reorder the intrinsics 
 to optimize
performance.
 I could always use C code to implement my SSE routines but then I'd 
 lose the
ability to inline them in D.
 I know this is an old post, but since it wasn't answered...
 Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
 I've read many complaints about how poorly they perform on all compilers
 -- the penalty for allowing them to be reordered is that extra
 instructions are often added, which means that straightforward C code is
 sometimes faster!
 In this regard, I'm personally excited about array operations. I think
 the need for SSE intrinsics and vectorisation is a result of abstract
 inversion: the instruction set is higher-level than the "high level
 language"! Array operations allow D to catch up with asm again. When
 array operations get implemented properly, it'll be interesting to see
 how much need for SSE intrinsics remains.
What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.
(1) They don't take advantage of fixed-length arrays. In particular, operations on float[4] should be a single SSE instruction (no function call, no loop, nothing). This will make a huge difference to game and graphics programmers, I believe. (2) The operations don't block on cache size. (3) DMD doesn't allow you to generate code assuming a minimum CPU capabilities. (In fact, when generating inline asm, the CPU type is 8086! (this is in bugzilla)) This limits the possible use of (1). It's issue (1) which is the killer.
I agree that an -arch switch of some sort would be the best thing to hit dmd. It is already most useful in gcc, which supported up to core2 when I last used it.

I wrote a linear algebra module with support for 2D, 3D and 4D vectors, quaternions, and 3x2 and 4x4 matrices, all with template structs so I can declare them with float, double, or real components. I used SSE for the bigger operations, which grew the module size considerably. This is where I first started looking for SSE intrinsics.

It would also be greatly helpful if the compiler could generate SSE code by itself; it would save a LOT of inline assembly for simple operations.
Sep 21 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Don:
 (1) They don't take advantage of fixed-length arrays. In particular, 
 operations on float[4] should be a single SSE instruction (no function 
 call, no loop, nothing). This will make a huge difference to game and 
 graphics programmers, I believe.
[...]
It's issue (1) which is the killer.
In my answer I forgot to mention another small thing. The std.gc.malloc() of D returns pointers aligned to 16 bytes (though I'd like to add a second argument to that GC malloc to specify the alignment; this can be used to save some memory when the alignment isn't necessary), while I think std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes.

In the following code, if you want to implement the last line with one vector instruction, then the a and b arrays have to be aligned to 16 bytes. I think that currently LDC doesn't align a and b to 16 bytes:

float[4] a = [1.f, 2., 3., 4.];
float[4] b = 10f;
float[4] c;
c[] = a[] + b[];

So you may need a syntax like the following, which is not handy:

align(16) float[4] a = [1.f, 2., 3., 4.];
align(16) float[4] b = 10f;
align(16) float[4] c;
c[] = a[] + b[];

A possible solution is to automatically align all static arrays allocated on the stack to 16 bytes too (by default, but it could be changed to save stack space in specific situations) :-)

A note: in the future CPU vector instructions will probably relax their alignment requirements... it's already happening.

Bye,
bearophile
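[Editor's note: the proposed align(16) maps directly onto what C++ spells alignas. A small portable sketch (no intrinsics needed) showing how a declared alignment can be verified; the names are invented.]

```cpp
#include <cstdint>

// True when p sits on a 16-byte boundary -- the requirement of
// MOVAPS-style aligned vector loads discussed in the thread.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// alignas(16) plays the role of align(16): the compiler guarantees
// each array's address is a multiple of 16, even on the stack.
struct Buffers {
    alignas(16) float a[4];
    alignas(16) float b[4];
};
```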
Sep 21 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Don:
 (1) They don't take advantage of fixed-length arrays. In particular, 
 operations on float[4] should be a single SSE instruction (no function 
 call, no loop, nothing). This will make a huge difference to game and 
 graphics programmers, I believe.
[...]
 It's issue (1) which is the killer.
In my answer I have forgotten to say another small thing. The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I may like to add a second argument to such GC malloc, to specify the alignment, this can be used to save some memory when the alignment isn't necessary), while I think the std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes. In the following code if you want to implement the last line with one vector instruction then a and b arrays have to be aligned to 16 bytes. I think that currently LDC doesn't align a and b to 16 bytes. float[4] a = [1.f, 2., 3., 4.]; float[4] b[] = 10f; float[4] c[] = a[] + b[]; So you may need a syntax like the following, that's not handy: align(16) float[4] a = [1.f, 2., 3., 4.]; align(16) float[4] b[] = 10f; align(16) float[4] c[] = a[] + b[]; A possible solution is to automatically align to 16 (by default, but it can be changed to save stack space in specific situations) all static arrays allocated on the stack too :-) A note: in future probably CPU vector instructions will relax their alignment requirements... it's already happening. Bye, bearophile
That 16-byte alignment is a restriction of the current usage of bit fields. Since every bit in the field indexes a single 16-byte block, a simple shift 4 bits to the right translates a pointer into its index in the bit field. You could align on 4-byte boundaries, but at the cost of doubling the size of bit fields, and possibly having slower collection runs.

Doesn't SSE have aligned and unaligned versions of its move instructions, like MOVAPS and MOVUPS?
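[Editor's note: yes -- in intrinsic form the pair Jeremie mentions looks like this. A C++ sketch assuming an x86/SSE target: _mm_load_ps compiles to MOVAPS and faults on a misaligned address, _mm_loadu_ps to MOVUPS and tolerates any address.]

```cpp
#include <xmmintrin.h>

// Sum 4 floats from an address known to be 16-byte aligned (MOVAPS path).
float sum4_aligned(const float* p) {   // p MUST be 16-byte aligned
    __m128 v = _mm_load_ps(p);         // MOVAPS: faults if unaligned
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

// Same sum, but any address is accepted (MOVUPS path).
float sum4_unaligned(const float* p) {
    __m128 v = _mm_loadu_ps(p);        // MOVUPS: slower on older CPUs
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```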
Sep 21 2009
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 21 Sep 2009 18:32:50 -0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 bearophile wrote:
 Don:
 (1) They don't take advantage of fixed-length arrays. In particular,  
 operations on float[4] should be a single SSE instruction (no function  
 call, no loop, nothing). This will make a huge difference to game and  
 graphics programmers, I believe.
[...]
 It's issue (1) which is the killer.
In my answer I have forgotten to say another small thing. The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I may like to add a second argument to such GC malloc, to specify the alignment, this can be used to save some memory when the alignment isn't necessary), while I think the std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes. In the following code if you want to implement the last line with one vector instruction then a and b arrays have to be aligned to 16 bytes. I think that currently LDC doesn't align a and b to 16 bytes. float[4] a = [1.f, 2., 3., 4.]; float[4] b[] = 10f; float[4] c[] = a[] + b[]; So you may need a syntax like the following, that's not handy: align(16) float[4] a = [1.f, 2., 3., 4.]; align(16) float[4] b[] = 10f; align(16) float[4] c[] = a[] + b[]; A possible solution is to automatically align to 16 (by default, but it can be changed to save stack space in specific situations) all static arrays allocated on the stack too :-) A note: in future probably CPU vector instructions will relax their alignment requirements... it's already happening. Bye, bearophile
That 16bytes alignment is a restriction of the current usage of bit fields. Since every bit in the field indexes a single 16bytes block, a simple shift 4 bits to the right translate a pointer into its index in the bit field. You could align on 4 bytes boundaries but at the cost of doubling the size of bit fields, and possibly having slower collection runs. Doesn't SSE have aligned and unaligned versions of its move instructions? like MOVAPS and MOVUPS.
Yes, but the unaligned version is slower, even for aligned data.

Also, another issue for game/graphics/robotics programmers is the ability to return fixed-length arrays from functions, though struct wrappers mitigate this.
Sep 21 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

 Yes, but the unaligned version is slower, even for aligned data.
This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
 Also, another issue for game/graphic/robotic programmers is the ability to  
 return fixed length arrays from functions. Though struct wrappers  
 mitigates this.
Why doesn't D allow returning fixed-size arrays from functions? It's a basic feature that I find useful in many situations, and it looks more useful than most of the recent features implemented in D2.

Bye,
bearophile
Sep 22 2009
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 07:09:09 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the ability  
 to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed-length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory, so returning a fixed-length array usually means returning a pointer to now-invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but it basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).
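[Editor's note: the struct-wrapping workaround Robert describes is exactly what C++'s std::array provides. A sketch of the by-value semantics D lacked at the time; make_unit_x is an invented example name.]

```cpp
#include <array>

// A fixed-length array wrapped in a struct is returned by value,
// so no pointer to dead stack memory ever escapes the function.
std::array<float, 4> make_unit_x() {
    return {1.0f, 0.0f, 0.0f, 0.0f};
}
```

The caller receives a copy, sidestepping the dangling-pointer problem entirely at the cost of a (small, fixed-size) copy.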
Sep 22 2009
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

returnT!(S) get(S)();

where you have:

template returnT(T)
{
    static if( isStaticArrayType!(T) )
        alias typeof(T.dup) returnT;
    else
        alias T returnT;
}

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.
Sep 22 2009
next sibling parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Daniel Keep wrote:
 
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is: returnT!(S) get(S)(); where you have: template returnT(T) { static if( isStaticArrayType!(T) ) alias typeof(T.dup) returnT; else alias T returnT; } I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays. P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.
Why would you declare void variables? The point of declaring typed variables is to know what kind of storage to use, void means no storage at all. The only time I use void in variable types is for void* and void[] (which really is just a void* with a length). In fact, every single scope has an infinity of void variables, you just don't need to explicitly declare them :) 'void foo;' is the same semantically as ''.
Sep 22 2009
next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
Jeremie Pelletier wrote:

...
 Why would you declare void variables? The point of declaring typed
 variables is to know what kind of storage to use, void means no storage
 at all. The only time I use void in variable types is for void* and
 void[] (which really is just a void* with a length).
 
 In fact, every single scope has an infinity of void variables, you just
 don't need to explicitly declare them :)
 
 'void foo;' is the same semantically as ''.
exactly: thus 'return foo;' in generic code can mean 'return;' when foo is of type void. This is similar to how return foo(); is already allowed when foo itself returns void.
Sep 22 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed 
 variables is to know what kind of storage to use, void means no storage 
 at all. The only time I use void in variable types is for void* and 
 void[] (which really is just a void* with a length).
 
 In fact, every single scope has an infinity of void variables, you just 
 don't need to explicitly declare them :)
 
 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write:

ReturnType!(func) func(ParameterTupleOf!(func) params)
{
    auto result = innerObj.func(params);
    // do something interesting
    return result;
}

Except then you get the error:

voids have no value

So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
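[Editor's note: for comparison, C++ sidesteps part of this pain because `return f(...);` is legal even when f returns void -- though storing the result still hits the same wall described here. A hedged sketch with an invented name.]

```cpp
#include <utility>

// Wrap any callable transparently; compiles whether it returns a
// value or void, because C++ allows `return f(...)` when f's result
// type is void. Code that must *store* the result before returning
// still needs special-casing, exactly as described for D.
template <typename F, typename... Args>
decltype(auto) intercept(F&& f, Args&&... args) {
    // ...do something interesting before the call...
    return std::forward<F>(f)(std::forward<Args>(args)...);
}
```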
Sep 22 2009
next sibling parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed 
 variables is to know what kind of storage to use, void means no 
 storage at all. The only time I use void in variable types is for 
 void* and void[] (which really is just a void* with a length).

 In fact, every single scope has an infinity of void variables, you 
 just don't need to explicitly declare them :)

 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write: ReturnType!(func) func(ParameterTupleOf!(func) params) { auto result = innerObj.func(params); // do something interesting return result; } Except then you get the error: voids have no value So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
I don't get how void could be used to simplify generic code. You can already use type unions and variants for that, and if you need a single more generic type you can always use a void* to point to the data.

Besides, in your above example, suppose the interesting thing it's doing is to modify the result data -- how would the compiler know how to modify void? It would just push back the error to the next statement.

Why don't you just replace ReturnType!func with auto and let the compiler resolve the return type to void?
Sep 22 2009
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Jeremie Pelletier wrote:
 I don't get how void could be used to simplify generic code. You can
 already use type unions and variants for that and if you need a single
 more generic type you can always use void* to point to the data.
You can't take the address of a return value. I'm not even sure you could define a union type that would function generically without specialising on void anyway. And using a Variant is just ridiculous; it's adding runtime overhead that is completely unnecessary.
 Besides in your above example, suppose the interesting thing its doing
 is to modify the result data, how would the compiler know how to modify
 void? It would just push back the error to the next statement.
Example from actual code:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    alias ReturnType!(Fn) returnT;
    static if( is( returnT == void ) )
        Fn(args);
    else
        auto result = Fn(args);
    glCheckError();
    static if( !is( returnT == void ) )
        return result;
}

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    auto result = Fn(args);
    glCheckError();
    return result;
}

I don't CARE about the result. If I did, I wouldn't be allowing voids at all, or I would be special-casing on it anyway and it wouldn't be an issue. The point is that there is NO WAY in a generic function to NOT care what the return type is. You have to, even if it ultimately doesn't matter.
 Why don't you just replace ReturnType!func by auto and let the compiler
 resolve the return type to void?
Well, there's this thing called "D1". Quite a few people use it. Especially since D2 isn't finished yet.
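[Editor's note: Daniel's static-if special-casing has a direct C++17 analogue via `if constexpr`. A sketch with invented names; the error-check hook is marked as a placeholder rather than a real glCheckError binding.]

```cpp
#include <type_traits>
#include <utility>

// Mirrors the D glCheck pattern: call, run an error check, return the
// result -- with the void case compiled out by `if constexpr`.
template <typename F, typename... Args>
auto checked(F&& f, Args&&... args) {
    using R = std::invoke_result_t<F, Args...>;
    if constexpr (std::is_void_v<R>) {
        std::forward<F>(f)(std::forward<Args>(args)...);
        // glCheckError() would go here
    } else {
        R result = std::forward<F>(f)(std::forward<Args>(args)...);
        // glCheckError() would go here
        return result;
    }
}
```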
Sep 22 2009
parent Jeremie Pelletier <jeremiep gmail.com> writes:
Daniel Keep wrote:
 
 Jeremie Pelletier wrote:
 I don't get how void could be used to simplify generic code. You can
 already use type unions and variants for that and if you need a single
 more generic type you can always use void* to point to the data.
You can't take the address of a return value. I'm not even sure you could define a union type that would function generically without specialising on void anyway. And using a Variant is just ridiculous; it's adding runtime overhead that is completely unnecessary.
 Besides in your above example, suppose the interesting thing its doing
 is to modify the result data, how would the compiler know how to modify
 void? It would just push back the error to the next statement.
Example from actual code: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; } I don't CARE about the result. If I did, I wouldn't be allowing voids at all, or I would be special-casing on it anyway and it wouldn't be an issue. The point is that there is NO WAY in a generic function to NOT care what the return type is. You have to, even if it ultimately doesn't matter.
 Why don't you just replace ReturnType!func by auto and let the compiler
 resolve the return type to void?
Well, there's this thing called "D1". Quite a few people use it. Especially since D2 isn't finished yet.
Oops, sorry! I tend to forget the semantics and syntax of D1; I haven't used it since I first found out about D2!

I have to agree that you make a good point here: void values could be useful in such a case, so long as the value is only assigned by method calls and not modified locally. Basically, in your example, auto result would just mean "use no storage and ignore return statements on result if auto resolves to void, but keep the value around until I return result if auto resolves to any other type".

Jeremie
Sep 22 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 19:40:03 -0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed  
 variables is to know what kind of storage to use, void means no  
 storage at all. The only time I use void in variable types is for  
 void* and void[] (which really is just a void* with a length).

 In fact, every single scope has an infinity of void variables, you  
 just don't need to explicitly declare them :)

 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write: ReturnType!(func) func(ParameterTupleOf!(func) params) { auto result = innerObj.func(params); // do something interesting return result; } Except then you get the error: voids have no value So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
I don't get how void could be used to simplify generic code. You can already use type unions and variants for that and if you need a single more generic type you can always use void* to point to the data. Besides in your above example, suppose the interesting thing its doing is to modify the result data, how would the compiler know how to modify void? It would just push back the error to the next statement. Why don't you just replace ReturnType!func by auto and let the compiler resolve the return type to void?
Because auto returns suffer from forward-referencing problems:

// Bad
auto x = bar;
auto bar() { return foo; }
auto foo() { return 1.0; }

// Okay
auto foo() { return 1.0; }
auto bar() { return foo; }
auto x = bar;
Sep 22 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed 
 variables is to know what kind of storage to use, void means no 
 storage at all. The only time I use void in variable types is for 
 void* and void[] (which really is just a void* with a length).

 In fact, every single scope has an infinity of void variables, you 
 just don't need to explicitly declare them :)

 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write: ReturnType!(func) func(ParameterTupleOf!(func) params) { auto result = innerObj.func(params); // do something interesting return result; } Except then you get the error: voids have no value So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
Yah, but inside "do something interesting" you need to do special casing anyway. Andrei
Sep 22 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Yah, but inside "do something interesting" you need to do special casing 
 anyway.
 
 Andrei
Sure, but if you're writing a generic library you can punt the problem to the user, who may or may not care about the return value at all. As is, it's a cost you pay whether you care or not.
Sep 23 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).
The problem is that currently you have a class of types which can be passed as arguments but cannot be returned.

For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

    returnT!(S) get(S)();

where you have:

    template returnT(T)
    {
        static if( isStaticArrayType!(T) )
            alias typeof(T.dup) returnT;
        else
            alias T returnT;
    }

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Sep 22 2009
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Daniel Keep wrote:
[snip]
  The problem is that currently you have a class of types which can be
 passed as arguments but cannot be returned.
  For example, Tango's Variant has this horrible hack where the ACTUAL
 definition of Variant.get is:
      returnT!(S) get(S)();
  where you have:
      template returnT(T)
     {
         static if( isStaticArrayType!(T) )
             alias typeof(T.dup) returnT;
         else
             alias T returnT;
     }
  I can't recall the number of times this stupid hole in the language has
 bitten me.  As for safety concerns, it's really no different to allowing
 people to return delegates.  Not a very good reason, but I *REALLY* hate
 having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.
Sep 22 2009
parent reply grauzone <none example.net> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Daniel Keep wrote:
[snip]
  The problem is that currently you have a class of types which can be
 passed as arguments but cannot be returned.
  For example, Tango's Variant has this horrible hack where the ACTUAL
 definition of Variant.get is:
      returnT!(S) get(S)();
  where you have:
      template returnT(T)
     {
         static if( isStaticArrayType!(T) )
             alias typeof(T.dup) returnT;
         else
             alias T returnT;
     }
  I can't recall the number of times this stupid hole in the language has
 bitten me.  As for safety concerns, it's really no different to allowing
 people to return delegates.  Not a very good reason, but I *REALLY* hate
 having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.
I think static arrays should be value types. Then this isn't a problem anymore, and returning a static array can be handled exactly like returning structs. Didn't Walter once say that a type shouldn't behave differently if it's wrapped in a struct? With current static array semantics, this rule is violated. Whether a static array has reference or value semantics depends on whether it's inside a struct: if you copy a struct, the embedded static array obviously loses its reference semantics. Also, I second that it should be possible to declare void variables. It'd be really useful for doing return value handling when transparently wrapping delegate calls in generic code.
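The inconsistency described here fits in a few lines. This is a sketch of the behaviour being criticised (D1-era semantics, where a bare static array parameter is passed by reference), not a proposal:

```d
void bump(float[4] a)  { a[0] = 99; }   // mutates the caller's array

struct Wrap { float[4] a; }
void bumpW(Wrap w)     { w.a[0] = 99; } // mutates only the copy

void demo()
{
    float[4] x = 0;
    bump(x);    // x[0] is now 99: reference semantics
    Wrap y;
    bumpW(y);   // y.a[0] unchanged: wrapping gave it value semantics
}
```

The same data changes semantics merely by being placed inside a struct, which is exactly the rule violation mentioned above.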
Sep 22 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Daniel Keep wrote:
[snip]
  The problem is that currently you have a class of types which can be
 passed as arguments but cannot be returned.
  For example, Tango's Variant has this horrible hack where the ACTUAL
 definition of Variant.get is:
      returnT!(S) get(S)();
  where you have:
      template returnT(T)
     {
         static if( isStaticArrayType!(T) )
             alias typeof(T.dup) returnT;
         else
             alias T returnT;
     }
  I can't recall the number of times this stupid hole in the language 
 has
 bitten me.  As for safety concerns, it's really no different to 
 allowing
 people to return delegates.  Not a very good reason, but I *REALLY* 
 hate
 having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.
I think static arrays should be value types. Then this isn't a problem anymore, and returning a static array can be handled exactly like returning structs. Didn't Walter once say that a type shouldn't behave differently if it's wrapped in a struct? With current static array semantics, this rule is violated. Whether a static array has reference or value semantics depends on whether it's inside a struct: if you copy a struct, the embedded static array obviously loses its reference semantics.
Yah.
 Also, I second that it should be possible to declare void variables. 
 It'd be really useful for doing return value handling when transparently 
 wrapping delegate calls in generic code.
I think that already works. Andrei
Sep 22 2009
prev sibling next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Here's an OLD example:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        alias ReturnType!(Fn) returnT;

        static if( is( returnT == void ) )
            Fn(args);
        else
            auto result = Fn(args);

        glCheckError();

        static if( !is( returnT == void ) )
            return result;
    }

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        auto result = Fn(args);
        glCheckError();
        return result;
    }
Sep 22 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        scope(exit) glCheckError();
        return Fn(args);
    }

:o)

Andrei
Sep 22 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { scope(exit) glCheckError(); return Fn(args); } :o) Andrei
Calling into a frame handler for such a trivial routine, especially one used in real-time rendering, is definitely not a good idea, no matter how elegant the syntax is!
Sep 22 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic 
 code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { scope(exit) glCheckError(); return Fn(args); } :o) Andrei
Calling into a framehandler for such a trivial routine, especially if used with real time rendering, is definitely not a good idea, no matter how elegant its syntax is!
I guess that's what the smiley was about! Andrei
Sep 22 2009
parent Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic 
 code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { scope(exit) glCheckError(); return Fn(args); } :o) Andrei
Calling into a framehandler for such a trivial routine, especially if used with real time rendering, is definitely not a good idea, no matter how elegant its syntax is!
I guess that's what the smiley was about! Andrei
I thought it meant "there, problem solved!" :o)
Sep 22 2009
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-09-22 12:32:25 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them?
Here's some generic code that would benefit from void as a variable type in the D/Objective-C bridge. Basically, it keeps the result of a function call, does some cleaning, and returns the result (with value conversions if needed). Unfortunately, you need a separate path for functions that return void:

    // Call Objective-C code that may raise an exception here.
    static if (is(R == void))
        func(objcArgs);
    else
        ObjcType!(R) objcResult = func(objcArgs);

    _NSRemoveHandler2(&_localHandler);

    // Converting return value.
    static if (is(R == void))
        return;
    else
        return decapsulate!(R)(objcResult);

It could be rewritten in a simpler way if void variables were supported:

    // Call Objective-C code that may raise an exception here.
    ObjcType!(R) objcResult = func(objcArgs);

    _NSRemoveHandler2(&_localHandler);

    // Converting return value.
    return decapsulate!(R)(objcResult);

Note that returning the void result of a function call already works in D. You just can't "store" the result of such functions in a variable. That said, it's not a big hassle in this case, thanks to static if. What suffers most is code readability.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Sep 23 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile 
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the 
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
You could ease the restriction by disallowing implicit conversion from static to dynamic arrays in certain situations. A function returning a dynamic array cannot return a static array; you cannot assign the return value of a function returning a static array to a dynamic array. Or in those cases, put the static array on the heap.
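The two workarounds already in play in this thread, decaying to a heap copy with .dup and wrapping the static array in a struct, look roughly like this (a sketch with made-up function names):

```d
// Workaround 1: return a dynamic array by copying to the heap.
float[] makeVecDup()
{
    float[4] v = [1.0f, 2.0f, 3.0f, 4.0f];
    return v.dup; // heap allocation on every call
}

// Workaround 2: wrap the static array so it returns by value (RVO applies).
struct Vec4 { float[4] data; }
Vec4 makeVecWrapped()
{
    Vec4 v;
    foreach (i, ref e; v.data)
        e = i + 1.0f;
    return v;     // no heap allocation
}
```

The first is safe but allocates; the second is fast but forces a wrapper type onto every caller, which is the cost generic code keeps paying.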
Sep 22 2009
parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 19:06:22 -0400, Christopher Wright  
<dhasenan gmail.com> wrote:
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile  
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the  
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
You could ease the restriction by disallowing implicit conversion from static to dynamic arrays in certain situations. A function returning a dynamic array cannot return a static array; you cannot assign the return value of a function returning a static array to a dynamic array. Or in those cases, put the static array on the heap.
I'm not sure what you're referencing.
 A function returning a dynamic array cannot return a static array;
This is already true; you have to .dup the array to return it.
 you cannot assign the return value of a function returning a static  
 array to a dynamic array.
This is already sorta true; once the return value is assigned to a static array, it may then be implicitly cast to dynamic. Neither of which helps the situation.
Sep 22 2009
prev sibling parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 Robert Jacques:
 
 Yes, but the unaligned version is slower, even for aligned data.
This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
The problem is that the difference today is so extreme. On Core2:

    movaps [mem128], xmm0; // aligned, 1 micro-op
    movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!

In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access.

It all depends on how important you think performance on Core2 and earlier Intel processors is.
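In DMD-style inline asm, the two stores being compared look like this. A sketch only: the caller must guarantee that p is 16-byte aligned for the movaps version, or it will fault at runtime.

```d
void storeAligned(float* p)   // p MUST be 16-byte aligned
{
    asm { mov EAX, p; movaps [EAX], XMM0; }  // 1 micro-op on Core2
}

void storeUnaligned(float* p) // any alignment accepted
{
    asm { mov EAX, p; movups [EAX], XMM0; }  // ~9 micro-ops on Core2
}
```

The instructions are byte-for-byte interchangeable except for the alignment contract, which is why the Core2 penalty on already-aligned data is so frustrating.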
Sep 22 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Don wrote:
 bearophile wrote:
 Robert Jacques:

 Yes, but the unaligned version is slower, even for aligned data.
This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
The problem is that difference today is so extreme. On core2: movaps [mem128], xmm0; // aligned, 1 micro-op movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data! In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my Core2 quad. I now recall using a lot of movups instructions; thanks for the tip.
Sep 22 2009
parent reply #ponce <aliloko gmail.com> writes:
 In practice it's about an 8X speed difference!
 
 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's still 
 slower if it's an unaligned access.
 
 It all depends on how important you think performance on Core2 and 
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed, SSE is known to be overkill when dealing with unaligned data. In C++, writing SSE code is so painful that you either have to use intrinsics or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignment is not in standard C++. D already understands array operations like Eigen does, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and use SSE when possible (though there must be many other things to do :) ).
Sep 22 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
#ponce wrote:
 In practice it's about an 8X speed difference!

 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's still 
 slower if it's an unaligned access.

 It all depends on how important you think performance on Core2 and 
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data. In C++ writing SSE code is so painful you either have to use intrisics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignement is not in standard C++. D does already understand arrays operations like Eigen do, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many others things to do :) ).
The D memory manager already aligns data on 16-byte boundaries. The only case I can think of right now is when data is in a struct or class:

    struct
    {
        float[4] vec;  // aligned!
        int a;
        float[4] vec2; // unaligned!
    }
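When a field can't be guaranteed aligned, one portable workaround is to over-allocate and round the pointer up. A sketch with a hypothetical helper name:

```d
// Round p up to the next 16-byte boundary (a no-op if already aligned).
void* alignUp16(void* p)
{
    return cast(void*)((cast(size_t)p + 15) & ~cast(size_t)15);
}

void example()
{
    // Over-allocate by 15 bytes so an aligned 16-byte block always fits.
    ubyte[] raw = new ubyte[4 * float.sizeof + 15];
    float* vec = cast(float*)alignUp16(raw.ptr);
    // vec[0 .. 4] is now guaranteed 16-byte aligned.
}
```

This trades a few wasted bytes for a pointer you can safely hand to movaps-based code.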
Sep 22 2009
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 #ponce wrote:
 In practice it's about an 8X speed difference!

 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's still  
 slower if it's an unaligned access.

 It all depends on how important you think performance on Core2 and  
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data. In C++ writing SSE code is so painful you either have to use intrisics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignement is not in standard C++. D does already understand arrays operations like Eigen do, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many others things to do :) ).
The D memory manager already aligns data on 16 bytes boundaries. The only case I can think of right now is when data is in a struct or class: struct { float[4] vec; // aligned! int a; float[4] vec; // unaligned! }
Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (and therefore offset). And then there's the whole slicing-from-an-array issue.
Sep 22 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier 
 <jeremiep gmail.com> wrote:
 
 #ponce wrote:
 In practice it's about an 8X speed difference!

 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's 
 still slower if it's an unaligned access.

 It all depends on how important you think performance on Core2 and 
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data. In C++ writing SSE code is so painful you either have to use intrisics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignement is not in standard C++. D does already understand arrays operations like Eigen do, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many others things to do :) ).
The D memory manager already aligns data on 16 bytes boundaries. The only case I can think of right now is when data is in a struct or class: struct { float[4] vec; // aligned! int a; float[4] vec; // unaligned! }
Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (therefore offset). And then there's the whole slicing from an array issue.
Ah yes, you are right. Then I guess it really is up to the programmer to know whether the data is aligned or not and select different code paths accordingly. Adding checks at runtime just adds to the overhead we're trying to save by using SSE in the first place. It would be great if we could declare aliases to asm instructions and use template functions with a (bool aligned = true) parameter, then set a movps alias to either movaps or movups depending on the value of aligned.
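Short of instruction aliases, a template parameter plus static if gets close to this idea today. A sketch with a hypothetical function name; the selection happens at compile time, so there is no runtime check:

```d
void store4(bool aligned = true)(float* dst)
{
    static if (aligned)
        asm { mov EAX, dst; movaps [EAX], XMM0; } // fast path
    else
        asm { mov EAX, dst; movups [EAX], XMM0; } // any alignment
}

// store4!(true)(p);   // caller guarantees 16-byte alignment
// store4!(false)(q);  // caller can't guarantee it
```

Each instantiation compiles to exactly one of the two instructions, so the aligned path pays no penalty for the unaligned one existing.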
Sep 22 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Robert Jacques wrote:
 Yes, although classes have hidden vars, which are runtime dependent, 
 changing the offset. Structs may be embedded in other things (therefore 
 offset). And then there's the whole slicing from an array issue.
Um, no. Field accesses for class variables are (pointer + offset). Successive subclasses append their fields to the object, so if you sliced an object and changed its vtbl pointer, you could get a valid instance of its superclass. If the class layout weren't determined at compile time, field accesses would be as slow as virtual function calls.
Sep 22 2009
parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 18:56:12 -0400, Christopher Wright  
<dhasenan gmail.com> wrote:

 Robert Jacques wrote:
 Yes, although classes have hidden vars, which are runtime dependent,  
 changing the offset. Structs may be embedded in other things (therefore  
 offset). And then there's the whole slicing from an array issue.
Um, no. Field accesses for class variables are (pointer + offset). Successive subclasses append their fields to the object, so if you sliced an object and changed its vtbl pointer, you could get a valid instance of its superclass. If the class layout weren't determined at compile time, field accesses would be as slow as virtual function calls.
Clarification: I meant slicing an array of value types, i.e. if the size of the value type isn't a multiple of 16 (e.g. float3[]), then the alignment will change. As for classes, yes, the compiler knows, but the point is that you don't know the size, and therefore alignment, of your super-class. Worse, it could change with different runtimes or OSes. So trying to manually align things by introducing spacing vars, etc. is hard, error-prone and non-portable.
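The float3 case works out like this: each element is 12 bytes, so even with an aligned base pointer only every fourth element lands on a 16-byte boundary. A minimal sketch:

```d
struct float3 { float x, y, z; }
static assert(float3.sizeof == 12);

// In a float3[], element i starts i * 12 bytes past the array base, so
// i * 12 % 16 == 0 only for i = 0, 4, 8, ... A slice like arr[1 .. $]
// therefore starts unaligned even when arr.ptr itself is 16-byte aligned.
```

This is why slicing forces the unaligned code path regardless of how the allocator aligned the array.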
Sep 22 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 The D memory manager already aligns data on 16 bytes boundaries. The 
 only case I can think of right now is when data is in a struct or class:
LDC doesn't align to 16 the normal arrays inside functions: A small test program: void main() { float[4] a = [1.0f, 2.0, 3.0, 4.0]; float[4] b, c; b[] = 10.0f; c[] = a[] + b[]; } The ll code (the asm of the LLVM) LDC produces, this is the head: ldc -O3 -inline -release -output-ll vect1.d define x86_stdcallcc i32 _Dmain(%"char[][]" %unnamed) { entry: %a = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=5] %b = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=4] %c = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=4] %.gc_mem = call noalias i8* _d_newarrayvT(%object.TypeInfo* _D11TypeInfo_Af6__initZ, i32 4) ; <i8*> [#uses=5] [...] The asm it produces for the whole main (the call to the array op is inlined, while _d_array_init_float is not inlined, I don't know why): ldc -O3 -inline -release -output-s vect1.d _Dmain: pushl %esi subl $64, %esp movl $4, 4(%esp) movl $_D11TypeInfo_Af6__initZ, (%esp) call _d_newarrayvT movl $1065353216, (%eax) movl $1073741824, 4(%eax) movl $1077936128, 8(%eax) movl $1082130432, 12(%eax) movl 8(%eax), %ecx movl %ecx, 56(%esp) movl 4(%eax), %ecx movl %ecx, 52(%esp) movl (%eax), %eax movl %eax, 48(%esp) movl $1082130432, 60(%esp) leal 32(%esp), %esi movl %esi, (%esp) movl $2143289344, 8(%esp) movl $4, 4(%esp) call _d_array_init_float leal 16(%esp), %eax movl %eax, (%esp) movl $2143289344, 8(%esp) movl $4, 4(%esp) call _d_array_init_float movl %esi, (%esp) movl $1092616192, 8(%esp) movl $4, 4(%esp) call _d_array_init_float movss 48(%esp), %xmm0 addss 32(%esp), %xmm0 movss %xmm0, 16(%esp) movss 52(%esp), %xmm0 addss 36(%esp), %xmm0 movss %xmm0, 20(%esp) movss 56(%esp), %xmm0 addss 40(%esp), %xmm0 movss %xmm0, 24(%esp) movss 60(%esp), %xmm0 addss 44(%esp), %xmm0 movss %xmm0, 28(%esp) xorl %eax, %eax addl $64, %esp popl %esi ret $8 By the way, using Link-Time Optimization and interning LDC produces this LL (whole main): define x86_stdcallcc i32 _Dmain(%"char[][]" %unnamed) { entry: %b = alloca [4 x float], align 4 ; <[4 x float]*> 
[#uses=1] %c = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=1] %.gc_mem = call noalias i8* _d_newarrayvT(%object.TypeInfo* _D11TypeInfo_Af6__initZ, i32 4) ; <i8*> [#uses=4] %.gc_mem1 = bitcast i8* %.gc_mem to float* ; <float*> [#uses=1] store float 1.000000e+00, float* %.gc_mem1 %tmp3 = getelementptr i8* %.gc_mem, i32 4 ; <i8*> [#uses=1] %0 = bitcast i8* %tmp3 to float* ; <float*> [#uses=1] store float 2.000000e+00, float* %0 %tmp4 = getelementptr i8* %.gc_mem, i32 8 ; <i8*> [#uses=1] %1 = bitcast i8* %tmp4 to float* ; <float*> [#uses=1] store float 3.000000e+00, float* %1 %tmp5 = getelementptr i8* %.gc_mem, i32 12 ; <i8*> [#uses=1] %2 = bitcast i8* %tmp5 to float* ; <float*> [#uses=1] store float 4.000000e+00, float* %2 %tmp8 = getelementptr [4 x float]* %b, i32 0, i32 0 ; <float*> [#uses=2] call void _d_array_init_float(float* nocapture %tmp8, i32 4, float 0x7FF8000000000000) %tmp9 = getelementptr [4 x float]* %c, i32 0, i32 0 ; <float*> [#uses=1] call void _d_array_init_float(float* nocapture %tmp9, i32 4, float 0x7FF8000000000000) call void _d_array_init_float(float* nocapture %tmp8, i32 4, float 1.000000e+01) ret i32 0 } Bye, bearophile
Sep 22 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

 Well, fixed length arrays are an implicit/explicit pointer to some  
 (stack/heap) allocated memory. So returning a fixed length array usually  
 means returning a pointer to now invalid stack memory. Allowing  
 fixed-length arrays to be returned by value would be nice, but basically  
 means the compiler is wrapping the array in a struct, which is easy enough  
 to do yourself. Using wrappers also avoids breaking the logical  
 semantics of arrays (i.e. pass by reference).
As usual this discussion is developing in other directions that are both interesting and borderline too complex for me :-)

Arrays are the most common and useful data structure (besides single values/variables). And experience shows me that in some situations static arrays can lead to higher performance (for example, if you have a matrix whose number of columns is known at compile time and is a power of 2, the compiler can use just a shift to find a cell). So I'd like to see D's management of such arrays improved (for me this is a MUCH more common problem than, for example, the contravariant argument types recently discussed by Andrei. I am for improving simple things that I can understand and use every day first, and complex things later. D2 is getting too difficult for me), even if some extra annotations are necessary.

The possible ways that can be useful:
- To return small arrays (for example the ones that fit SSE/AVX registers) by value. No need to create silly wrapper structs. The compiler can show a performance warning when such an array is bigger than 1024 bytes of RAM.
- LLVM has good stack-allocated (alloca) arrays, like the ones introduced by C99. Having a way to use them in D too would be good.
- A way to return just the reference to a dynamic array when the function already takes that reference as input.
- To automatically allocate and copy returned static arrays on the heap, to keep the situation safe and avoid too many copies of large arrays (so each gets copied only once). I'm not sure about this.

Bye,
bearophile
Sep 22 2009