
digitalmars.D - Does dmd have SSE intrinsics?

reply Jeremie Pelletier <jeremiep gmail.com> writes:
While writing SSE assembly by hand in D is fun and works well, I'm wondering if
the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.

The reason is that the compiler can usually reorder the intrinsics to optimize
performance.

I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
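[Editor's note: for reference, this is roughly what the xmmintrin.h-style intrinsics being asked about look like on the C/C++ side. A minimal sketch, assuming an x86 target with SSE; the function name add4 is made up, the intrinsics are the standard Intel ones.]

```cpp
#include <xmmintrin.h>  // SSE intrinsics, as declared in xmmintrin.h

// Add two arrays of 4 floats with a single SSE addition.
// Unlike hand-written inline asm, the compiler is free to schedule
// and reorder these intrinsics together with surrounding code.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);             // load 4 floats (unaligned ok)
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // one ADDPS, then store
}
```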
Aug 26 2009
next sibling parent reply Don <nospam nospam.com> writes:
Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h
in C.
 
 The reason is that the compiler can usually reorder the intrinsics to optimize
performance.
 
 I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
I know this is an old post, but since it wasn't answered...

Make sure you know what the SSE intrinsics actually *do* in VC++/Intel! I've read many complaints about how poorly they perform on all compilers -- the penalty for allowing them to be reordered is that extra instructions are often added, which means that straightforward C code is sometimes faster!

In this regard, I'm personally excited about array operations. I think the need for SSE intrinsics and vectorisation is a result of abstraction inversion: the instruction set is higher-level than the "high level language"! Array operations allow D to catch up with asm again. When array operations get implemented properly, it'll be interesting to see how much need for SSE intrinsics remains.
Sep 21 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.
 The reason is that the compiler can usually reorder the intrinsics to optimize
performance.
 I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
 I know this is an old post, but since it wasn't answered...
 Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
 I've read many complaints about how poorly they perform on all compilers
 -- the penalty for allowing them to be reordered is that extra
 instructions are often added, which means that straightforward C code is
 sometimes faster!
 In this regard, I'm personally excited about array operations. I think
 the need for SSE intrinsics and vectorisation is a result of abstract
 inversion: the instruction set is higher-level than the "high level
 language"! Array operations allow D to catch up with asm again. When
 array operations get implemented properly, it'll be interesting to see
 how much need for SSE intrinsics remains.
What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.
Sep 21 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 What's wrong with the current implementation of array ops (other than a few
misc.
 bugs that have already been filed)?  I thought they already use SSE if
available.
The idea is to improve array operations so they become a handy way to efficiently use present and future (AVX too, http://en.wikipedia.org/wiki/Advanced_Vector_Extensions ) vector instructions. So for example if in my D code I have:

float[4] a = [1.f, 2., 3., 4.];
float[4] b = 10f;
float[4] c;
c[] = a[] + b[];

the compiler has to use a single inlined SSE instruction to implement the last line (the 4-float sum), and two instructions to load & broadcast the float value 10 into a whole XMM register. If the D code is:

float[8] a = [1.f, 2., 3., 4., 5., 6., 7., 8.];
float[8] b = [10.f, 20., 30., 40., 50., 60., 70., 80.];
float[8] c;
c[] = a[] + b[];

the current vector instructions aren't wide enough to do that in a single instruction (future AVX will be able to), so the compiler has to inline two SSE instructions. Currently such operations are implemented with calls to a function (which also tests which vector instructions are available), and that slows down code when you only have to sum 4 floats.

Another problem is that some important semantics are missing, for example some shuffling, and a few other things. With some care, most or all such operations (keeping a close eye on AVX too) can be mapped to built-in array methods...

The problem here is that you don't want to tie the D language too tightly to the currently available vector instructions, because in 5-10 years CPUs may change. So what you want is to add enough semantics that the compiler can later compile the code as best it can (with scalar instructions, with SSE1, with a future 1024-bit-wide AVX, or with something unknown today). If the language doesn't give enough semantics to the compiler, you are forced to do as GCC does: it now tries to infer vector operations from normal code, but that's a complex task and usually not as efficient as using GCC's SSE intrinsics. This is something that deserves a thread here :-)

In the end, implementing all this doesn't look hard. It's mostly a matter of designing it well (whereas auto-vectorization as in GCC is much harder to implement).

Bye,
bearophile
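[Editor's note: a hand-written C++ sketch of the lowering bearophile describes -- the broadcast plus one SSE addition for 4 floats, and two SSE additions for 8. Assumes an x86 target with SSE; the helper names are invented.]

```cpp
#include <xmmintrin.h>

// c[] = a[] + s: _mm_set1_ps is the "load & broadcast" step
// (a couple of instructions), _mm_add_ps is the single 4-float sum.
void add_scalar4(const float* a, float s, float* c) {
    __m128 vb = _mm_set1_ps(s);  // broadcast s into all 4 lanes
    _mm_storeu_ps(c, _mm_add_ps(_mm_loadu_ps(a), vb));
}

// The float[8] case needs two SSE additions (AVX could do it in one).
void add8(const float* a, const float* b, float* c) {
    _mm_storeu_ps(c,     _mm_add_ps(_mm_loadu_ps(a),     _mm_loadu_ps(b)));
    _mm_storeu_ps(c + 4, _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4)));
}
```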
Sep 21 2009
prev sibling parent reply Don <nospam nospam.com> writes:
dsimcha wrote:
 == Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.
 The reason is that the compiler can usually reorder the intrinsics to optimize
performance.
 I could always use C code to implement my SSE routines but then I'd lose the
ability to inline them in D.
 I know this is an old post, but since it wasn't answered...
 Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
 I've read many complaints about how poorly they perform on all compilers
 -- the penalty for allowing them to be reordered is that extra
 instructions are often added, which means that straightforward C code is
 sometimes faster!
 In this regard, I'm personally excited about array operations. I think
 the need for SSE intrinsics and vectorisation is a result of abstract
 inversion: the instruction set is higher-level than the "high level
 language"! Array operations allow D to catch up with asm again. When
 array operations get implemented properly, it'll be interesting to see
 how much need for SSE intrinsics remains.
What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.
(1) They don't take advantage of fixed-length arrays. In particular, operations on float[4] should be a single SSE instruction (no function call, no loop, nothing). This will make a huge difference to game and graphics programmers, I believe.

(2) The operations don't block on cache size.

(3) DMD doesn't allow you to generate code assuming minimum CPU capabilities. (In fact, when generating inline asm, the CPU type is 8086! This is in bugzilla.) This limits the possible use of (1).

It's issue (1) which is the killer.
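[Editor's note: what issue (1) would buy game code can be approximated today with a thin value-type wrapper over the SSE register. A C++ sketch, not D; the Vec4 name is made up; assumes an x86/SSE target.]

```cpp
#include <xmmintrin.h>

// A float[4]-sized value type whose operator+ is exactly one ADDPS
// once inlined: no function call, no loop, nothing.
struct Vec4 {
    __m128 v;
    Vec4 operator+(Vec4 rhs) const { return {_mm_add_ps(v, rhs.v)}; }
    float operator[](int i) const {
        float tmp[4];
        _mm_storeu_ps(tmp, v);  // spill to read a single lane
        return tmp[i];
    }
    static Vec4 load(const float* p) { return {_mm_loadu_ps(p)}; }
};
```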
Sep 21 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Don wrote:
 dsimcha wrote:
 == Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 While writing SSE assembly by hand in D is fun and works well, I'm 
 wondering
if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.
 The reason is that the compiler can usually reorder the intrinsics 
 to optimize
performance.
 I could always use C code to implement my SSE routines but then I'd 
 lose the
ability to inline them in D.
 I know this is an old post, but since it wasn't answered...
 Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
 I've read many complaints about how poorly they perform on all compilers
 -- the penalty for allowing them to be reordered is that extra
 instructions are often added, which means that straightforward C code is
 sometimes faster!
 In this regard, I'm personally excited about array operations. I think
 the need for SSE intrinsics and vectorisation is a result of abstract
 inversion: the instruction set is higher-level than the "high level
 language"! Array operations allow D to catch up with asm again. When
 array operations get implemented properly, it'll be interesting to see
 how much need for SSE intrinsics remains.
What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.
(1) They don't take advantage of fixed-length arrays. In particular, operations on float[4] should be a single SSE instruction (no function call, no loop, nothing). This will make a huge difference to game and graphics programmers, I believe. (2) The operations don't block on cache size. (3) DMD doesn't allow you to generate code assuming a minimum CPU capabilities. (In fact, when generating inline asm, the CPU type is 8086! (this is in bugzilla)) This limits the possible use of (1). It's issue (1) which is the killer.
I agree that an -arch switch of some sort would be the best thing to hit dmd. It is already most useful in gcc, which supported up to core2 when I last used it.

I wrote a linear algebra module with support for 2D, 3D and 4D vectors, quaternions, and 3x2 and 4x4 matrices, all with template structs so I can declare them with float, double, or real components. I used SSE for the bigger operations, which grew the module size considerably. This is where I first started looking for SSE intrinsics.

It would also be greatly helpful if the compiler could generate SSE code by itself; it would save a LOT of inline assembly for simple operations.
Sep 21 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Don:
 (1) They don't take advantage of fixed-length arrays. In particular, 
 operations on float[4] should be a single SSE instruction (no function 
 call, no loop, nothing). This will make a huge difference to game and 
 graphics programmers, I believe.
[...]
It's issue (1) which is the killer.
In my answer I forgot to mention another small thing. The std.gc.malloc() of D returns pointers aligned to 16 bytes (though I'd like to add a second argument to that GC malloc to specify the alignment; this can be used to save some memory when the alignment isn't necessary), while I think std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes.

In the following code, if you want to implement the last line with one vector instruction, then the a and b arrays have to be aligned to 16 bytes. I think that currently LDC doesn't align a and b to 16 bytes:

float[4] a = [1.f, 2., 3., 4.];
float[4] b = 10f;
float[4] c;
c[] = a[] + b[];

So you may need a syntax like the following, which is not handy:

align(16) float[4] a = [1.f, 2., 3., 4.];
align(16) float[4] b = 10f;
align(16) float[4] c;
c[] = a[] + b[];

A possible solution is to automatically align all static arrays allocated on the stack to 16 bytes too (by default, but it could be changed to save stack space in specific situations) :-)

A note: in the future CPU vector instructions will probably relax their alignment requirements... it's already happening.

Bye,
bearophile
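[Editor's note: the proposed align(16) maps directly onto what C++ spells alignas. A small portable sketch (no intrinsics needed) showing how a declared alignment can be verified; the names are invented.]

```cpp
#include <cstdint>

// True when p sits on a 16-byte boundary -- the requirement of
// MOVAPS-style aligned vector loads discussed in the thread.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// alignas(16) plays the role of align(16): the compiler guarantees
// each array's address is a multiple of 16, even on the stack.
struct Buffers {
    alignas(16) float a[4];
    alignas(16) float b[4];
};
```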
Sep 21 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Don:
 (1) They don't take advantage of fixed-length arrays. In particular, 
 operations on float[4] should be a single SSE instruction (no function 
 call, no loop, nothing). This will make a huge difference to game and 
 graphics programmers, I believe.
[...]
 It's issue (1) which is the killer.
In my answer I have forgotten to say another small thing. The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I may like to add a second argument to such GC malloc, to specify the alignment, this can be used to save some memory when the alignment isn't necessary), while I think the std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes. In the following code if you want to implement the last line with one vector instruction then a and b arrays have to be aligned to 16 bytes. I think that currently LDC doesn't align a and b to 16 bytes. float[4] a = [1.f, 2., 3., 4.]; float[4] b[] = 10f; float[4] c[] = a[] + b[]; So you may need a syntax like the following, that's not handy: align(16) float[4] a = [1.f, 2., 3., 4.]; align(16) float[4] b[] = 10f; align(16) float[4] c[] = a[] + b[]; A possible solution is to automatically align to 16 (by default, but it can be changed to save stack space in specific situations) all static arrays allocated on the stack too :-) A note: in future probably CPU vector instructions will relax their alignment requirements... it's already happening. Bye, bearophile
That 16-byte alignment is a restriction of the current usage of bit fields. Since every bit in the field indexes a single 16-byte block, a simple shift 4 bits to the right translates a pointer into its index in the bit field. You could align on 4-byte boundaries, but at the cost of doubling the size of bit fields, and possibly having slower collection runs.

Doesn't SSE have aligned and unaligned versions of its move instructions, like MOVAPS and MOVUPS?
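[Editor's note: yes -- in intrinsic form the pair Jeremie mentions looks like this. A C++ sketch assuming an x86/SSE target: _mm_load_ps compiles to MOVAPS and faults on a misaligned address, _mm_loadu_ps to MOVUPS and tolerates any address.]

```cpp
#include <xmmintrin.h>

// Sum 4 floats from an address known to be 16-byte aligned (MOVAPS path).
float sum4_aligned(const float* p) {   // p MUST be 16-byte aligned
    __m128 v = _mm_load_ps(p);         // MOVAPS: faults if unaligned
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

// Same sum, but any address is accepted (MOVUPS path).
float sum4_unaligned(const float* p) {
    __m128 v = _mm_loadu_ps(p);        // MOVUPS: slower on older CPUs
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```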
Sep 21 2009
parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 21 Sep 2009 18:32:50 -0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 bearophile wrote:
 Don:
 (1) They don't take advantage of fixed-length arrays. In particular,  
 operations on float[4] should be a single SSE instruction (no function  
 call, no loop, nothing). This will make a huge difference to game and  
 graphics programmers, I believe.
[...]
 It's issue (1) which is the killer.
In my answer I have forgotten to say another small thing. The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I may like to add a second argument to such GC malloc, to specify the alignment, this can be used to save some memory when the alignment isn't necessary), while I think the std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes. In the following code if you want to implement the last line with one vector instruction then a and b arrays have to be aligned to 16 bytes. I think that currently LDC doesn't align a and b to 16 bytes. float[4] a = [1.f, 2., 3., 4.]; float[4] b[] = 10f; float[4] c[] = a[] + b[]; So you may need a syntax like the following, that's not handy: align(16) float[4] a = [1.f, 2., 3., 4.]; align(16) float[4] b[] = 10f; align(16) float[4] c[] = a[] + b[]; A possible solution is to automatically align to 16 (by default, but it can be changed to save stack space in specific situations) all static arrays allocated on the stack too :-) A note: in future probably CPU vector instructions will relax their alignment requirements... it's already happening. Bye, bearophile
That 16bytes alignment is a restriction of the current usage of bit fields. Since every bit in the field indexes a single 16bytes block, a simple shift 4 bits to the right translate a pointer into its index in the bit field. You could align on 4 bytes boundaries but at the cost of doubling the size of bit fields, and possibly having slower collection runs. Doesn't SSE have aligned and unaligned versions of its move instructions? like MOVAPS and MOVUPS.
Yes, but the unaligned version is slower, even for aligned data.

Also, another issue for game/graphics/robotics programmers is the ability to return fixed-length arrays from functions, though struct wrappers mitigate this.
Sep 21 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

 Yes, but the unaligned version is slower, even for aligned data.
This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
 Also, another issue for game/graphic/robotic programmers is the ability to  
 return fixed length arrays from functions. Though struct wrappers  
 mitigates this.
Why doesn't D allow returning fixed-size arrays from functions? It's a basic feature that I find useful in many situations, and it looks more useful than most of the recent features implemented in D2.

Bye,
bearophile
Sep 22 2009
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 07:09:09 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the ability  
 to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed-length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory, so returning a fixed-length array usually means returning a pointer to now-invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but it basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).
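[Editor's note: the struct-wrapping workaround Robert describes is exactly what C++'s std::array provides. A sketch of the by-value semantics D lacked at the time; make_unit_x is an invented example name.]

```cpp
#include <array>

// A fixed-length array wrapped in a struct is returned by value,
// so no pointer to dead stack memory ever escapes the function.
std::array<float, 4> make_unit_x() {
    return {1.0f, 0.0f, 0.0f, 0.0f};
}
```

The caller receives a copy, sidestepping the dangling-pointer problem entirely at the cost of a (small, fixed-size) copy.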
Sep 22 2009
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

returnT!(S) get(S)();

where you have:

template returnT(T)
{
    static if( isStaticArrayType!(T) )
        alias typeof(T.dup) returnT;
    else
        alias T returnT;
}

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.
Sep 22 2009
next sibling parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Daniel Keep wrote:
 
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is: returnT!(S) get(S)(); where you have: template returnT(T) { static if( isStaticArrayType!(T) ) alias typeof(T.dup) returnT; else alias T returnT; } I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays. P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.
Why would you declare void variables? The point of declaring typed variables is to know what kind of storage to use, void means no storage at all. The only time I use void in variable types is for void* and void[] (which really is just a void* with a length). In fact, every single scope has an infinity of void variables, you just don't need to explicitly declare them :) 'void foo;' is the same semantically as ''.
Sep 22 2009
next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
Jeremie Pelletier wrote:

...
 Why would you declare void variables? The point of declaring typed
 variables is to know what kind of storage to use, void means no storage
 at all. The only time I use void in variable types is for void* and
 void[] (which really is just a void* with a length).
 
 In fact, every single scope has an infinity of void variables, you just
 don't need to explicitly declare them :)
 
 'void foo;' is the same semantically as ''.
exactly: thus 'return foo;' in generic code can mean 'return;' when foo is of type void. This is similar to how return foo(); is already allowed when foo itself returns void.
Sep 22 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed 
 variables is to know what kind of storage to use, void means no storage 
 at all. The only time I use void in variable types is for void* and 
 void[] (which really is just a void* with a length).
 
 In fact, every single scope has an infinity of void variables, you just 
 don't need to explicitly declare them :)
 
 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write:

ReturnType!(func) func(ParameterTupleOf!(func) params)
{
    auto result = innerObj.func(params);
    // do something interesting
    return result;
}

Except then you get the error:

voids have no value

So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
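[Editor's note: for comparison, C++ sidesteps part of this pain because `return f(...);` is legal even when f returns void -- though storing the result still hits the same wall described here. A hedged sketch with an invented name.]

```cpp
#include <utility>

// Wrap any callable transparently; compiles whether it returns a
// value or void, because C++ allows `return f(...)` when f's result
// type is void. Code that must *store* the result before returning
// still needs special-casing, exactly as described for D.
template <typename F, typename... Args>
decltype(auto) intercept(F&& f, Args&&... args) {
    // ...do something interesting before the call...
    return std::forward<F>(f)(std::forward<Args>(args)...);
}
```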
Sep 22 2009
next sibling parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed 
 variables is to know what kind of storage to use, void means no 
 storage at all. The only time I use void in variable types is for 
 void* and void[] (which really is just a void* with a length).

 In fact, every single scope has an infinity of void variables, you 
 just don't need to explicitly declare them :)

 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write: ReturnType!(func) func(ParameterTupleOf!(func) params) { auto result = innerObj.func(params); // do something interesting return result; } Except then you get the error: voids have no value So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
I don't get how void could be used to simplify generic code. You can already use type unions and variants for that, and if you need a single more generic type you can always use a void* to point to the data.

Besides, in your above example, suppose the interesting thing it's doing is to modify the result data -- how would the compiler know how to modify void? It would just push back the error to the next statement.

Why don't you just replace ReturnType!func with auto and let the compiler resolve the return type to void?
Sep 22 2009
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Jeremie Pelletier wrote:
 I don't get how void could be used to simplify generic code. You can
 already use type unions and variants for that and if you need a single
 more generic type you can always use void* to point to the data.
You can't take the address of a return value. I'm not even sure you could define a union type that would function generically without specialising on void anyway. And using a Variant is just ridiculous; it's adding runtime overhead that is completely unnecessary.
 Besides in your above example, suppose the interesting thing its doing
 is to modify the result data, how would the compiler know how to modify
 void? It would just push back the error to the next statement.
Example from actual code:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    alias ReturnType!(Fn) returnT;
    static if( is( returnT == void ) )
        Fn(args);
    else
        auto result = Fn(args);
    glCheckError();
    static if( !is( returnT == void ) )
        return result;
}

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    auto result = Fn(args);
    glCheckError();
    return result;
}

I don't CARE about the result. If I did, I wouldn't be allowing voids at all, or I would be special-casing on it anyway and it wouldn't be an issue. The point is that there is NO WAY in a generic function to NOT care what the return type is. You have to, even if it ultimately doesn't matter.
 Why don't you just replace ReturnType!func by auto and let the compiler
 resolve the return type to void?
Well, there's this thing called "D1". Quite a few people use it. Especially since D2 isn't finished yet.
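[Editor's note: Daniel's static-if special-casing has a direct C++17 analogue via `if constexpr`. A sketch with invented names; the error-check hook is marked as a placeholder rather than a real glCheckError binding.]

```cpp
#include <type_traits>
#include <utility>

// Mirrors the D glCheck pattern: call, run an error check, return the
// result -- with the void case compiled out by `if constexpr`.
template <typename F, typename... Args>
auto checked(F&& f, Args&&... args) {
    using R = std::invoke_result_t<F, Args...>;
    if constexpr (std::is_void_v<R>) {
        std::forward<F>(f)(std::forward<Args>(args)...);
        // glCheckError() would go here
    } else {
        R result = std::forward<F>(f)(std::forward<Args>(args)...);
        // glCheckError() would go here
        return result;
    }
}
```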
Sep 22 2009
parent Jeremie Pelletier <jeremiep gmail.com> writes:
Daniel Keep wrote:
 
 Jeremie Pelletier wrote:
 I don't get how void could be used to simplify generic code. You can
 already use type unions and variants for that and if you need a single
 more generic type you can always use void* to point to the data.
You can't take the address of a return value. I'm not even sure you could define a union type that would function generically without specialising on void anyway. And using a Variant is just ridiculous; it's adding runtime overhead that is completely unnecessary.
 Besides in your above example, suppose the interesting thing its doing
 is to modify the result data, how would the compiler know how to modify
 void? It would just push back the error to the next statement.
Example from actual code: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; } I don't CARE about the result. If I did, I wouldn't be allowing voids at all, or I would be special-casing on it anyway and it wouldn't be an issue. The point is that there is NO WAY in a generic function to NOT care what the return type is. You have to, even if it ultimately doesn't matter.
 Why don't you just replace ReturnType!func by auto and let the compiler
 resolve the return type to void?
Well, there's this thing called "D1". Quite a few people use it. Especially since D2 isn't finished yet.
Oops, sorry! I tend to forget the semantics and syntax of D1; I haven't used it since I first found out about D2!

I have to agree that you make a good point here: void values could be useful in such a case, so long as the value is only assigned by method calls and not modified locally. Basically, in your example, auto result would just mean "use no storage and ignore return statements on result if auto resolves to void, but keep the value around until I return result if auto resolves to any other type".

Jeremie
Sep 22 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 19:40:03 -0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed  
 variables is to know what kind of storage to use, void means no  
 storage at all. The only time I use void in variable types is for  
 void* and void[] (which really is just a void* with a length).

 In fact, every single scope has an infinity of void variables, you  
 just don't need to explicitly declare them :)

 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write: ReturnType!(func) func(ParameterTupleOf!(func) params) { auto result = innerObj.func(params); // do something interesting return result; } Except then you get the error: voids have no value So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
I don't get how void could be used to simplify generic code. You can already use type unions and variants for that and if you need a single more generic type you can always use void* to point to the data. Besides in your above example, suppose the interesting thing its doing is to modify the result data, how would the compiler know how to modify void? It would just push back the error to the next statement. Why don't you just replace ReturnType!func by auto and let the compiler resolve the return type to void?
Because auto returns suffer from forward-referencing problems:

// Bad
auto x = bar;
auto bar() { return foo; }
auto foo() { return 1.0; }

// Okay
auto foo() { return 1.0; }
auto bar() { return foo; }
auto x = bar;
Sep 22 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Why would you declare void variables? The point of declaring typed 
 variables is to know what kind of storage to use, void means no 
 storage at all. The only time I use void in variable types is for 
 void* and void[] (which really is just a void* with a length).

 In fact, every single scope has an infinity of void variables, you 
 just don't need to explicitly declare them :)

 'void foo;' is the same semantically as ''.
It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write: ReturnType!(func) func(ParameterTupleOf!(func) params) { auto result = innerObj.func(params); // do something interesting return result; } Except then you get the error: voids have no value So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
Yah, but inside "do something interesting" you need to do special casing anyway. Andrei
Sep 22 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Yah, but inside "do something interesting" you need to do special casing 
 anyway.
 
 Andrei
Sure, but if you're writing a generic library you can punt the problem to the user, who may or may not care about the return value at all. As is, it's a cost you pay whether you care or not.
Sep 23 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).
The problem is that currently you have a class of types which can be passed as arguments but cannot be returned.

For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

    returnT!(S) get(S)();

where you have:

    template returnT(T)
    {
        static if( isStaticArrayType!(T) )
            alias typeof(T.dup) returnT;
        else
            alias T returnT;
    }

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Sep 22 2009
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Daniel Keep wrote:
[snip]
  The problem is that currently you have a class of types which can be
 passed as arguments but cannot be returned.
  For example, Tango's Variant has this horrible hack where the ACTUAL
 definition of Variant.get is:
      returnT!(S) get(S)();
  where you have:
      template returnT(T)
     {
         static if( isStaticArrayType!(T) )
             alias typeof(T.dup) returnT;
         else
             alias T returnT;
     }
  I can't recall the number of times this stupid hole in the language has
 bitten me.  As for safety concerns, it's really no different to allowing
 people to return delegates.  Not a very good reason, but I *REALLY* hate
 having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.
Sep 22 2009
parent reply grauzone <none example.net> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Daniel Keep wrote:
[snip]
  The problem is that currently you have a class of types which can be
 passed as arguments but cannot be returned.
  For example, Tango's Variant has this horrible hack where the ACTUAL
 definition of Variant.get is:
      returnT!(S) get(S)();
  where you have:
      template returnT(T)
     {
         static if( isStaticArrayType!(T) )
             alias typeof(T.dup) returnT;
         else
             alias T returnT;
     }
  I can't recall the number of times this stupid hole in the language has
 bitten me.  As for safety concerns, it's really no different to allowing
 people to return delegates.  Not a very good reason, but I *REALLY* hate
 having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.
I think static arrays should be value types. Then this isn't a problem anymore, and returning a static array can be handled exactly like returning structs. Didn't Walter once say that a type shouldn't behave differently if it's wrapped in a struct? With current static array semantics, this rule is violated. Whether a static array has reference or value semantics depends on whether it's inside a struct: if you copy a struct, the embedded static array obviously loses its reference semantics. Also, I second that it should be possible to declare void variables. It'd be really useful for doing return value handling when transparently wrapping delegate calls in generic code.
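The inconsistency described here fits in a few lines. This is a sketch of the behaviour being criticised (D1-era semantics, where a bare static array parameter is passed by reference), not a proposal:

```d
void bump(float[4] a)  { a[0] = 99; }   // mutates the caller's array

struct Wrap { float[4] a; }
void bumpW(Wrap w)     { w.a[0] = 99; } // mutates only the copy

void demo()
{
    float[4] x = 0;
    bump(x);    // x[0] is now 99: reference semantics
    Wrap y;
    bumpW(y);   // y.a[0] unchanged: wrapping gave it value semantics
}
```

The same data changes semantics merely by being placed inside a struct, which is exactly the rule violation mentioned above.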
Sep 22 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Daniel Keep wrote:
[snip]
  The problem is that currently you have a class of types which can be
 passed as arguments but cannot be returned.
  For example, Tango's Variant has this horrible hack where the ACTUAL
 definition of Variant.get is:
      returnT!(S) get(S)();
  where you have:
      template returnT(T)
     {
         static if( isStaticArrayType!(T) )
             alias typeof(T.dup) returnT;
         else
             alias T returnT;
     }
  I can't recall the number of times this stupid hole in the language 
 has
 bitten me.  As for safety concerns, it's really no different to 
 allowing
 people to return delegates.  Not a very good reason, but I *REALLY* 
 hate
 having to special-case static arrays.
Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.
I think static arrays should be value types. Then this isn't a problem anymore, and returning a static array can be handled exactly like returning structs. Didn't Walter once say that a type shouldn't behave differently if it's wrapped in a struct? With current static array semantics, this rule is violated. Whether a static array has reference or value semantics depends on whether it's inside a struct: if you copy a struct, the embedded static array obviously loses its reference semantics.
Yah.
 Also, I second that it should be possible to declare void variables. 
 It'd be really useful for doing return value handling when transparently 
 wrapping delegate calls in generic code.
I think that already works. Andrei
Sep 22 2009
prev sibling next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Here's an OLD example:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        alias ReturnType!(Fn) returnT;

        static if( is( returnT == void ) )
            Fn(args);
        else
            auto result = Fn(args);

        glCheckError();

        static if( !is( returnT == void ) )
            return result;
    }

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        auto result = Fn(args);
        glCheckError();
        return result;
    }
Sep 22 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        scope(exit) glCheckError();
        return Fn(args);
    }

:o)

Andrei
Sep 22 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { scope(exit) glCheckError(); return Fn(args); } :o) Andrei
Calling into a frame handler for such a trivial routine, especially one used in real-time rendering, is definitely not a good idea, no matter how elegant the syntax is!
Sep 22 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic 
 code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { scope(exit) glCheckError(); return Fn(args); } :o) Andrei
Calling into a framehandler for such a trivial routine, especially if used with real time rendering, is definitely not a good idea, no matter how elegant its syntax is!
I guess that's what the smiley was about! Andrei
Sep 22 2009
parent Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic 
 code.
How would you use them? Andrei
Here's an OLD example: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { alias ReturnType!(Fn) returnT; static if( is( returnT == void ) ) Fn(args); else auto result = Fn(args); glCheckError(); static if( !is( returnT == void ) ) return result; } This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables: ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { auto result = Fn(args); glCheckError(); return result; }
ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args) { scope(exit) glCheckError(); return Fn(args); } :o) Andrei
Calling into a framehandler for such a trivial routine, especially if used with real time rendering, is definitely not a good idea, no matter how elegant its syntax is!
I guess that's what the smiley was about! Andrei
I thought it meant "there, problem solved!" :o)
Sep 22 2009
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-09-22 12:32:25 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Daniel Keep wrote:
 P.S. And another thing while I'm at it: why can't we declare void
 variables?  This is another thing that really complicates generic code.
How would you use them?
Here's some generic code that would benefit from void as a variable type in the D/Objective-C bridge. Basically, it keeps the result of a function call, does some cleaning, and returns the result (with value conversions if needed). Unfortunately, you need a separate path for functions that return void:

    // Call Objective-C code that may raise an exception here.
    static if (is(R == void))
        func(objcArgs);
    else
        ObjcType!(R) objcResult = func(objcArgs);

    _NSRemoveHandler2(&_localHandler);

    // Converting return value.
    static if (is(R == void))
        return;
    else
        return decapsulate!(R)(objcResult);

It could be rewritten in a simpler way if void variables were supported:

    // Call Objective-C code that may raise an exception here.
    ObjcType!(R) objcResult = func(objcArgs);

    _NSRemoveHandler2(&_localHandler);

    // Converting return value.
    return decapsulate!(R)(objcResult);

Note that returning the void result of a function call already works in D. You just can't "store" the result of such functions in a variable. That said, it's not a big hassle in this case, thanks to static if. What suffers most is code readability.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Sep 23 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile 
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the 
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
You could ease the restriction by disallowing implicit conversion from static to dynamic arrays in certain situations. A function returning a dynamic array cannot return a static array; you cannot assign the return value of a function returning a static array to a dynamic array. Or in those cases, put the static array on the heap.
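The two workarounds already in play in this thread, decaying to a heap copy with .dup and wrapping the static array in a struct, look roughly like this (a sketch with made-up function names):

```d
// Workaround 1: return a dynamic array by copying to the heap.
float[] makeVecDup()
{
    float[4] v = [1.0f, 2.0f, 3.0f, 4.0f];
    return v.dup; // heap allocation on every call
}

// Workaround 2: wrap the static array so it returns by value (RVO applies).
struct Vec4 { float[4] data; }
Vec4 makeVecWrapped()
{
    Vec4 v;
    foreach (i, ref e; v.data)
        e = i + 1.0f;
    return v;     // no heap allocation
}
```

The first is safe but allocates; the second is fast but forces a wrapper type onto every caller, which is the cost generic code keeps paying.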
Sep 22 2009
parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 19:06:22 -0400, Christopher Wright  
<dhasenan gmail.com> wrote:
 Robert Jacques wrote:
 On Tue, 22 Sep 2009 07:09:09 -0400, bearophile  
 <bearophileHUGS lycos.com> wrote:
 Robert Jacques:
[snip]
 Also, another issue for game/graphic/robotic programmers is the  
 ability to
 return fixed length arrays from functions. Though struct wrappers
 mitigates this.
Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2. Bye, bearophile
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).
You could ease the restriction by disallowing implicit conversion from static to dynamic arrays in certain situations. A function returning a dynamic array cannot return a static array; you cannot assign the return value of a function returning a static array to a dynamic array. Or in those cases, put the static array on the heap.
I'm not sure what you're referencing.
 A function returning a dynamic array cannot return a static array;
This is already true; you have to .dup the array to return it.
 you cannot assign the return value of a function returning a static  
 array to a dynamic array.
This is already sorta true; once the return value is assigned to a static array, it may then be implicitly cast to dynamic. Neither of which helps the situation.
Sep 22 2009
prev sibling parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 Robert Jacques:
 
 Yes, but the unaligned version is slower, even for aligned data.
This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
The problem is that the difference today is so extreme. On Core2:

    movaps [mem128], xmm0; // aligned, 1 micro-op
    movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!

In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access.

It all depends on how important you think performance on Core2 and earlier Intel processors is.
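In DMD-style inline asm, the two stores being compared look like this. A sketch only: the caller must guarantee that p is 16-byte aligned for the movaps version, or it will fault at runtime.

```d
void storeAligned(float* p)   // p MUST be 16-byte aligned
{
    asm { mov EAX, p; movaps [EAX], XMM0; }  // 1 micro-op on Core2
}

void storeUnaligned(float* p) // any alignment accepted
{
    asm { mov EAX, p; movups [EAX], XMM0; }  // ~9 micro-ops on Core2
}
```

The instructions are byte-for-byte interchangeable except for the alignment contract, which is why the Core2 penalty on already-aligned data is so frustrating.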
Sep 22 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Don wrote:
 bearophile wrote:
 Robert Jacques:

 Yes, but the unaligned version is slower, even for aligned data.
This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
The problem is that difference today is so extreme. On core2: movaps [mem128], xmm0; // aligned, 1 micro-op movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data! In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my Core2 quad. I now recall using a lot of movups instructions; thanks for the tip.
Sep 22 2009
parent reply #ponce <aliloko gmail.com> writes:
 In practice it's about an 8X speed difference!
 
 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's still 
 slower if it's an unaligned access.
 
 It all depends on how important you think performance on Core2 and 
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed, SSE is known to be overkill when dealing with unaligned data. In C++, writing SSE code is so painful that you either have to use intrinsics or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignment is not in standard C++. D already understands array operations like Eigen does, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and use SSE when possible (though there must be many other things to do :) ).
Sep 22 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
#ponce wrote:
 In practice it's about an 8X speed difference!

 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's still 
 slower if it's an unaligned access.

 It all depends on how important you think performance on Core2 and 
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data. In C++ writing SSE code is so painful you either have to use intrisics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignement is not in standard C++. D does already understand arrays operations like Eigen do, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many others things to do :) ).
The D memory manager already aligns data on 16-byte boundaries. The only case I can think of right now is when data is in a struct or class:

    struct
    {
        float[4] vec;  // aligned!
        int a;
        float[4] vec2; // unaligned!
    }
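When a field can't be guaranteed aligned, one portable workaround is to over-allocate and round the pointer up. A sketch with a hypothetical helper name:

```d
// Round p up to the next 16-byte boundary (a no-op if already aligned).
void* alignUp16(void* p)
{
    return cast(void*)((cast(size_t)p + 15) & ~cast(size_t)15);
}

void example()
{
    // Over-allocate by 15 bytes so an aligned 16-byte block always fits.
    ubyte[] raw = new ubyte[4 * float.sizeof + 15];
    float* vec = cast(float*)alignUp16(raw.ptr);
    // vec[0 .. 4] is now guaranteed 16-byte aligned.
}
```

This trades a few wasted bytes for a pointer you can safely hand to movaps-based code.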
Sep 22 2009
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 #ponce wrote:
 In practice it's about an 8X speed difference!

 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's still  
 slower if it's an unaligned access.

 It all depends on how important you think performance on Core2 and  
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data. In C++ writing SSE code is so painful you either have to use intrisics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignement is not in standard C++. D does already understand arrays operations like Eigen do, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many others things to do :) ).
The D memory manager already aligns data on 16 bytes boundaries. The only case I can think of right now is when data is in a struct or class: struct { float[4] vec; // aligned! int a; float[4] vec; // unaligned! }
Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (and therefore offset). And then there's the whole slicing-from-an-array issue.
Sep 22 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Robert Jacques wrote:
 On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier 
 <jeremiep gmail.com> wrote:
 
 #ponce wrote:
 In practice it's about an 8X speed difference!

 On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
 On i7, movups on aligned data is the same speed as movaps. It's 
 still slower if it's an unaligned access.

 It all depends on how important you think performance on Core2 and 
 earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad, I now recall using a lot of movups instructions, thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data. In C++ writing SSE code is so painful you either have to use intrisics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignement is not in standard C++. D does already understand arrays operations like Eigen do, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many others things to do :) ).
The D memory manager already aligns data on 16 bytes boundaries. The only case I can think of right now is when data is in a struct or class: struct { float[4] vec; // aligned! int a; float[4] vec; // unaligned! }
Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (therefore offset). And then there's the whole slicing from an array issue.
Ah yes, you are right. Then I guess it really is up to the programmer to know whether the data is aligned or not and select different code paths accordingly. Adding checks at runtime just adds to the overhead we're trying to save by using SSE in the first place. It would be great if we could declare aliases to asm instructions and use template functions with a (bool aligned = true) parameter, then set a movps alias to either movaps or movups depending on the value of aligned.
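Short of instruction aliases, a template parameter plus static if gets close to this idea today. A sketch with a hypothetical function name; the selection happens at compile time, so there is no runtime check:

```d
void store4(bool aligned = true)(float* dst)
{
    static if (aligned)
        asm { mov EAX, dst; movaps [EAX], XMM0; } // fast path
    else
        asm { mov EAX, dst; movups [EAX], XMM0; } // any alignment
}

// store4!(true)(p);   // caller guarantees 16-byte alignment
// store4!(false)(q);  // caller can't guarantee it
```

Each instantiation compiles to exactly one of the two instructions, so the aligned path pays no penalty for the unaligned one existing.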
Sep 22 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Robert Jacques wrote:
 Yes, although classes have hidden vars, which are runtime dependent, 
 changing the offset. Structs may be embedded in other things (therefore 
 offset). And then there's the whole slicing from an array issue.
Um, no. Field accesses for class variables are (pointer + offset). Successive subclasses append their fields to the object, so if you sliced an object and changed its vtbl pointer, you could get a valid instance of its superclass. If the class layout weren't determined at compile time, field accesses would be as slow as virtual function calls.
Sep 22 2009
parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 22 Sep 2009 18:56:12 -0400, Christopher Wright  
<dhasenan gmail.com> wrote:

 Robert Jacques wrote:
 Yes, although classes have hidden vars, which are runtime dependent,  
 changing the offset. Structs may be embedded in other things (therefore  
 offset). And then there's the whole slicing from an array issue.
Um, no. Field accesses for class variables are (pointer + offset). Successive subclasses append their fields to the object, so if you sliced an object and changed its vtbl pointer, you could get a valid instance of its superclass. If the class layout weren't determined at compile time, field accesses would be as slow as virtual function calls.
Clarification: I meant slicing an array of value types, i.e. if the size of the value type isn't a multiple of 16 (e.g. float3[]), then the alignment will change. As for classes, yes, the compiler knows, but the point is that you don't know the size, and therefore alignment, of your super-class. Worse, it could change with different runtimes or OSes. So trying to manually align things by introducing spacing vars, etc. is hard, error-prone and non-portable.
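The float3 case works out like this: each element is 12 bytes, so even with an aligned base pointer only every fourth element lands on a 16-byte boundary. A minimal sketch:

```d
struct float3 { float x, y, z; }
static assert(float3.sizeof == 12);

// In a float3[], element i starts i * 12 bytes past the array base, so
// i * 12 % 16 == 0 only for i = 0, 4, 8, ... A slice like arr[1 .. $]
// therefore starts unaligned even when arr.ptr itself is 16-byte aligned.
```

This is why slicing forces the unaligned code path regardless of how the allocator aligned the array.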
Sep 22 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 The D memory manager already aligns data on 16 bytes boundaries. The 
 only case I can think of right now is when data is in a struct or class:
LDC doesn't align to 16 the normal arrays inside functions: A small test program: void main() { float[4] a = [1.0f, 2.0, 3.0, 4.0]; float[4] b, c; b[] = 10.0f; c[] = a[] + b[]; } The ll code (the asm of the LLVM) LDC produces, this is the head: ldc -O3 -inline -release -output-ll vect1.d define x86_stdcallcc i32 _Dmain(%"char[][]" %unnamed) { entry: %a = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=5] %b = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=4] %c = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=4] %.gc_mem = call noalias i8* _d_newarrayvT(%object.TypeInfo* _D11TypeInfo_Af6__initZ, i32 4) ; <i8*> [#uses=5] [...] The asm it produces for the whole main (the call to the array op is inlined, while _d_array_init_float is not inlined, I don't know why): ldc -O3 -inline -release -output-s vect1.d _Dmain: pushl %esi subl $64, %esp movl $4, 4(%esp) movl $_D11TypeInfo_Af6__initZ, (%esp) call _d_newarrayvT movl $1065353216, (%eax) movl $1073741824, 4(%eax) movl $1077936128, 8(%eax) movl $1082130432, 12(%eax) movl 8(%eax), %ecx movl %ecx, 56(%esp) movl 4(%eax), %ecx movl %ecx, 52(%esp) movl (%eax), %eax movl %eax, 48(%esp) movl $1082130432, 60(%esp) leal 32(%esp), %esi movl %esi, (%esp) movl $2143289344, 8(%esp) movl $4, 4(%esp) call _d_array_init_float leal 16(%esp), %eax movl %eax, (%esp) movl $2143289344, 8(%esp) movl $4, 4(%esp) call _d_array_init_float movl %esi, (%esp) movl $1092616192, 8(%esp) movl $4, 4(%esp) call _d_array_init_float movss 48(%esp), %xmm0 addss 32(%esp), %xmm0 movss %xmm0, 16(%esp) movss 52(%esp), %xmm0 addss 36(%esp), %xmm0 movss %xmm0, 20(%esp) movss 56(%esp), %xmm0 addss 40(%esp), %xmm0 movss %xmm0, 24(%esp) movss 60(%esp), %xmm0 addss 44(%esp), %xmm0 movss %xmm0, 28(%esp) xorl %eax, %eax addl $64, %esp popl %esi ret $8 By the way, using Link-Time Optimization and interning LDC produces this LL (whole main): define x86_stdcallcc i32 _Dmain(%"char[][]" %unnamed) { entry: %b = alloca [4 x float], align 4 ; <[4 x float]*> 
[#uses=1] %c = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=1] %.gc_mem = call noalias i8* _d_newarrayvT(%object.TypeInfo* _D11TypeInfo_Af6__initZ, i32 4) ; <i8*> [#uses=4] %.gc_mem1 = bitcast i8* %.gc_mem to float* ; <float*> [#uses=1] store float 1.000000e+00, float* %.gc_mem1 %tmp3 = getelementptr i8* %.gc_mem, i32 4 ; <i8*> [#uses=1] %0 = bitcast i8* %tmp3 to float* ; <float*> [#uses=1] store float 2.000000e+00, float* %0 %tmp4 = getelementptr i8* %.gc_mem, i32 8 ; <i8*> [#uses=1] %1 = bitcast i8* %tmp4 to float* ; <float*> [#uses=1] store float 3.000000e+00, float* %1 %tmp5 = getelementptr i8* %.gc_mem, i32 12 ; <i8*> [#uses=1] %2 = bitcast i8* %tmp5 to float* ; <float*> [#uses=1] store float 4.000000e+00, float* %2 %tmp8 = getelementptr [4 x float]* %b, i32 0, i32 0 ; <float*> [#uses=2] call void _d_array_init_float(float* nocapture %tmp8, i32 4, float 0x7FF8000000000000) %tmp9 = getelementptr [4 x float]* %c, i32 0, i32 0 ; <float*> [#uses=1] call void _d_array_init_float(float* nocapture %tmp9, i32 4, float 0x7FF8000000000000) call void _d_array_init_float(float* nocapture %tmp8, i32 4, float 1.000000e+01) ret i32 0 } Bye, bearophile
Sep 22 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

 Well, fixed length arrays are an implicit/explicit pointer to some  
 (stack/heap) allocated memory. So returning a fixed length array usually  
 means returning a pointer to now invalid stack memory. Allowing  
 fixed-length arrays to be returned by value would be nice, but basically  
 means the compiler is wrapping the array in a struct, which is easy enough  
 to do yourself. Using wrappers also avoids breaking the logical  
 semantics of arrays (i.e. pass by reference).
As usual this discussion is developing in other directions that are both interesting and borderline too complex for me :-)

Arrays are the most common and useful data structure (besides single values/variables). And experience shows me that in some situations static arrays can lead to higher performance (for example, if you have a matrix whose number of columns is known at compile time and is a power of 2, the compiler can use just a shift to find a cell). So I'd like to see D's management of such arrays improved (for me this is a MUCH more common problem than, for example, the contravariant argument types recently discussed by Andrei. I am for improving simple things that I can understand and use every day first, and complex things later. D2 is getting too difficult for me), even if some extra annotations are necessary.

The possible ways that can be useful:
- To return small arrays (for example the ones that fit SSE/AVX registers) by value. No need to create silly wrapper structs. The compiler can show a performance warning when such an array is bigger than 1024 bytes of RAM.
- LLVM has good stack-allocated (alloca) arrays, like the ones introduced by C99. Having a way to use them in D too would be good.
- A way to return just the reference to a dynamic array when the function already takes that reference as input.
- To automatically allocate and copy returned static arrays on the heap, to keep the situation safe and avoid too many copies of large arrays (so each gets copied only once). I'm not sure about this.

Bye,
bearophile
Sep 22 2009