digitalmars.D - Auto-Vectorization and array/vector operations

"Steven" <stevenwilson500 gmail.com> writes:

I was trying to show someone how awesome Dlang was earlier, and 
how the vector operations are expected to take advantage of the 
CPU vector instructions, and was dumbstruck when dmd and gdc both 
failed to auto-vectorize a simple case.  I've stripped it down to 
the bare minimum and loaded the example on the interactive 
compiler: 


The reference documentation for arrays says:
Implementation note: many of the more common vector operations 
are expected to take advantage of any vector math instructions 
available on the target computer.

Does this mean that while compilers are expected to take 
advantage of them, they currently do not, even when they have 
proper alignment?  I haven't tried LDC yet, so maybe LDC does 
perform auto-vectorization and I should attempt to use LDC if I 
plan on using vector ops a lot?

import core.simd;

float[256] exampleA(float[256] a, float[256] b)
{
   float[256] c;
   // results in subss (scalar instruction)
   c[] = a[] - b[];
   return c;
}

float[256] exampleB(float[256] a, float[256] b)
{
   float8[32] va = cast(float8[32])a;
   float8[32] vb = cast(float8[32])b;
   float8[32] vc;

   // results in subps (vector instruction)
   vc[] = va[] - vb[];

   return cast(float[256])vc;
}

Jul 15 2015

"jmh530" <john.michael.hall gmail.com> writes:

On Wednesday, 15 July 2015 at 22:42:05 UTC, Steven wrote:
 I was trying to show someone how awesome Dlang was earlier, and 
 how the vector operations are expected to take advantage of the 
 CPU vector instructions, and was dumbstruck when dmd and gdc 
 both failed to auto-vectorize a simple case.  I've stripped it 
 down to the bare minimum and loaded the example on the 
 interactive compiler:

I'm not sure how the compilers handle auto-vectorization, but I 
found
http://dconf.org/2013/talks/evans_2.html
informative. It recommends not casting between float and simd 
types.
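A minimal sketch of that advice, assuming an x86_64 target where core.simd provides float4 (SSE; the float8 used in exampleB additionally needs AVX support). The idea is to keep the data in a vector type from the start instead of casting float[256] to a vector array. The function name subtract is hypothetical. Storing float4 elements directly also means each one is 16-byte aligned, which a cast from float[256] (only guaranteed float alignment) does not ensure.

```d
import core.simd;

// Minimal sketch, assuming an x86_64 target where core.simd provides
// float4 (SSE). The helper name `subtract` is hypothetical.
// The data is stored as float4 from the start, so no reinterpret cast
// from float[] is needed and each float4 element is 16-byte aligned.
void subtract(float4[] c, const(float4)[] a, const(float4)[] b)
{
    // Array operation over whole vectors, as in exampleB: each element
    // is a float4, so each element-wise subtraction can be a packed
    // subtract (subps) rather than four scalar ones.
    c[] = a[] - b[];
}

unittest
{
    // 64 x float4 holds the same 256 floats as float[256].
    float4[64] a, b, c;
    foreach (i, ref v; a)
        foreach (lane; 0 .. 4)
        {
            v.array[lane] = cast(float)(i * 4 + lane); // 0, 1, ..., 255
            b[i].array[lane] = 1.0f;                    // all ones
        }

    subtract(c[], a[], b[]);
    assert(c[0].array == [-1.0f, 0.0f, 1.0f, 2.0f]);
    assert(c[63].array == [251.0f, 252.0f, 253.0f, 254.0f]);
}
```

The unittest can be run with, e.g., dmd -unittest -main -run file.d.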

Jul 15 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Wednesday, 15 July 2015 at 22:42:05 UTC, Steven wrote:
 I was trying to show someone how awesome Dlang was earlier, and 
 how the vector operations are expected to take advantage of the 
 CPU vector instructions, and was dumbstruck when dmd and gdc 
 both failed to auto-vectorize a simple case.  I've stripped it 
 down to the bare minimum and loaded the example on the 
 interactive compiler: 


 [...]

Not sure why DMD isn't using SIMD on the first one; I haven't 
looked at that code in a while. Anyway, gdc vectorises both: 
http://goo.gl/CzD15s and that's with the gcc 4.9 backend; it can 
probably do better built against something more recent.

Jul 16 2015

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 16 July 2015 at 00:42, Steven via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 I was trying to show someone how awesome Dlang was earlier, and how the
 vector operations are expected to take advantage of the CPU vector
 instructions, and was dumbstruck when dmd and gdc both failed to
 auto-vectorize a simple case.  I've stripped it down to the bare minimum and
 loaded the example on the interactive compiler:


 The reference documentation for arrays says:
 Implementation note: many of the more common vector operations are expected
 to take advantage of any vector math instructions available on the target
 computer.

DMD leverages vector operations in the library, rather than in
the generated code.  So as long as you are doing array operations
using any of the supported types...


 Does this mean that while compilers are expected to take advantage of them,
 they currently do not, even when they have proper alignment?  I haven't
 tried LDC yet, so maybe LDC does perform auto-vectorization and I should
 attempt to use LDC if I plan on using vector ops a lot?

Auto-vectorization is deliberately strict about what triggers it.

It is possible to give the compiler hints; however, I'm not sure that
this should be done by the code generator.

See, for example: http://goo.gl/iMBbRs

Regards
Iain

Jul 16 2015

"Casper Færgemand" <shorttail hotmail.com> writes:

On Wednesday, 15 July 2015 at 22:42:05 UTC, Steven wrote:
 Does this mean that while compilers are expected to take 
 advantage of them, they currently do not, even when they have 
 proper alignment?  I haven't tried LDC yet, so maybe LDC does 
 perform auto-vectorization and I should attempt to use LDC if I 
 plan on using vector ops a lot?

If you want to use vector operations, YOU have to write the code 
for it. Addition and multiplication seem like easy things to have 
vectorized automatically, but it's complicated to do (I don't 
know of any compiler that does a convincing and reliable job of 
auto-vectorization), and it likely won't give you many of the 
other useful vector operations.

Someone from a game engine company (Unreal?) gave a nice talk 
about how SIMD pervades their code base, including memory layout, 
in order to get decent performance on the less trivial 
operations, like dot product. I don't have a link though. =/
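As an illustration of writing the SIMD code yourself, here is a minimal dot-product sketch using core.simd. It assumes an x86_64 target where float4 is available and that the inputs are already stored as float4 (the memory-layout point above). The function name dot is hypothetical, and this is not the code from the talk mentioned.

```d
import core.simd;

// Minimal sketch: dot product over data already laid out as float4.
// Assumes an x86_64 target where core.simd provides float4 (SSE).
// The function name `dot` is hypothetical.
float dot(const(float4)[] a, const(float4)[] b)
{
    assert(a.length == b.length);

    // Four partial sums, one per lane.
    float4 acc;
    foreach (lane; 0 .. 4)
        acc.array[lane] = 0.0f;

    // Packed multiply and add, four floats per iteration.
    foreach (i; 0 .. a.length)
        acc += a[i] * b[i];

    // Horizontal sum of the four lanes at the end.
    float sum = 0.0f;
    foreach (lane; 0 .. 4)
        sum += acc.array[lane];
    return sum;
}

unittest
{
    float4[2] x, y;
    foreach (i; 0 .. 2)
        foreach (lane; 0 .. 4)
        {
            x[i].array[lane] = cast(float)(i * 4 + lane + 1); // 1, 2, ..., 8
            y[i].array[lane] = 1.0f;
        }
    assert(dot(x[], y[]) == 36.0f);
}
```

The horizontal sum happens only once, after the loop; doing it per element would throw away most of the benefit of the packed multiply.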

The main reason to write the SIMD code yourself is that you know 
that it's going to work the way you want. There won't ever be a 
case where you add one more variable to some structure and the 
compiler decides it can no longer auto-vectorize a loop.
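One way to picture the memory-layout point in code: with a structure-of-arrays layout, each field is its own contiguous array of float4, so a hand-written vector loop over one field does not depend on what other fields the struct has. This is a minimal sketch assuming an x86_64 target with float4 available; ParticleAoS, ParticlesSoA and translateX are hypothetical names for illustration only.

```d
import core.simd;

// Hypothetical array-of-structs layout, for contrast: adding a field
// here changes the stride between consecutive x values.
struct ParticleAoS
{
    float x, y, z;
}

// Hypothetical structure-of-arrays layout: each field is a contiguous
// float4[], holding four particles per vector.
struct ParticlesSoA
{
    float4[] x, y, z;
}

// Add dx to the x coordinate of every particle, four at a time.
// Adding another field to ParticlesSoA does not change this loop.
void translateX(ref ParticlesSoA p, float dx)
{
    float4 d;
    foreach (lane; 0 .. 4)
        d.array[lane] = dx; // copy dx into all four lanes
    foreach (ref v; p.x)
        v += d;
}

unittest
{
    ParticlesSoA p;
    p.x = new float4[2]; // 8 particles
    foreach (ref v; p.x)
        foreach (lane; 0 .. 4)
            v.array[lane] = 1.0f;
    translateX(p, 2.5f);
    assert(p.x[1].array[3] == 3.5f);
}
```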

Jul 18 2015
