digitalmars.D - Auto-Vectorization and array/vector operations

"Steven" <stevenwilson500 gmail.com> writes:
I was trying to show someone how awesome Dlang was earlier, and 
how the vector operations are expected to take advantage of the 
CPU vector instructions, and was dumbstruck when dmd and gdc both 
failed to auto-vectorize a simple case.  I've stripped it down to 
the bare minimum and loaded the example on the interactive 
compiler: 
http://asm.dlang.org/#%7B%22version%22%3A3%2C%22filterAsm%22%3A%7B%7D%2C%22compilers%22%3A%5B%7B%22sourcez%22%3A%22JYWwDg9gTgLgBAY2gUwHQGdQBMDcAoPAMwBsIBDGAbQCYBWANgF05kAPM8Y5AQQAoTyVOkzhkANHAEUaDZgCMAlHgDeeOJNLThzBPnUB6fXCjJ0AV2Ix0cYADs45uenRrElZgF5R7uAFo4cu56xsgwZlD2ungAvgRSQrIs7JzIAEL8mgki4hqCMiKKKq7xAByUAMzUjABuZHBeCGToMBmCZZWMCmTBpRVV1XL1iE0tvR0Kcj2Z7f1R6q6GIeaW1nYOZnJgLurVCD5etT7%2BA0EE6iZhEcPNrVqyCrv40UAAA%3D%22%2C%22compiler%22%3A%22dmd2067%22%2C%22options%22%3A%22-O%20-release%20-inline%20-boundscheck%3Doff%22%7D%5D%7D

The reference documentation for arrays says:
Implementation note: many of the more common vector operations 
are expected to take advantage of any vector math instructions 
available on the target computer.

Does this mean that while compilers are expected to take 
advantage of them, they currently do not, even when they have 
proper alignment?  I haven't tried LDC yet, so maybe LDC does 
perform auto-vectorization and I should attempt to use LDC if I 
plan on using vector ops a lot?

import core.simd;

float[256] exampleA(float[256] a, float[256] b)
{
   float[256] c;
   // results in subss (scalar instruction)
   c[] = a[] - b[];
   return c;
}

float[256] exampleB(float[256] a, float[256] b)
{
   float8[32] va = cast(float8[32])a;
   float8[32] vb = cast(float8[32])b;
   float8[32] vc;

   // results in subps (vector instruction)
   vc[] = va[] - vb[];

   return cast(float[256])vc;
}
Jul 15 2015
"jmh530" <john.michael.hall gmail.com> writes:
On Wednesday, 15 July 2015 at 22:42:05 UTC, Steven wrote:
 I was trying to show someone how awesome Dlang was earlier, and 
 how the vector operations are expected to take advantage of the 
 CPU vector instructions, and was dumbstruck when dmd and gdc 
 both failed to auto-vectorize a simple case.  I've stripped it 
 down to the bare minimum and loaded the example on the 
 interactive compiler:
I'm not sure how the compilers handle auto-vectorization, but I found http://dconf.org/2013/talks/evans_2.html informative. It recommends not casting between float and SIMD types.
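As a rough sketch of what avoiding the cast could look like (my own assumption, not code taken from the talk): keep the data in SIMD-typed storage from the start. The name exampleC is hypothetical, and float4 is used instead of float8 so that plain SSE is enough. This assumes a target where core.simd defines float4, such as x86_64.

```d
import core.simd;

// Hypothetical variant of exampleB: the operands are already stored as
// float4 chunks, so no cast between float[256] and a vector array is needed.
float4[64] exampleC(float4[64] a, float4[64] b)
{
    float4[64] c;
    foreach (i; 0 .. c.length)
        c[i] = a[i] - b[i]; // one float4 subtraction per iteration
    return c;
}
```

Whether this actually compiles to packed instructions still depends on the compiler and flags, so it is worth checking the generated assembly.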
Jul 15 2015
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 15 July 2015 at 22:42:05 UTC, Steven wrote:
 I was trying to show someone how awesome Dlang was earlier, and 
 how the vector operations are expected to take advantage of the 
 CPU vector instructions, and was dumbstruck when dmd and gdc 
 both failed to auto-vectorize a simple case.  I've stripped it 
 down to the bare minimum and loaded the example on the 
 interactive compiler: 
 [...]
Not sure why DMD isn't using SIMD on the first one; I haven't looked at that code in a while. Anyway, gdc vectorises both: http://goo.gl/CzD15s and that's with the gcc 4.9 backend. It can probably do better if built against something more recent.
Jul 16 2015
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 16 July 2015 at 00:42, Steven via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 I was trying to show someone how awesome Dlang was earlier, and how the
 vector operations are expected to take advantage of the CPU vector
 instructions, and was dumbstruck when dmd and gdc both failed to
 auto-vectorize a simple case.  I've stripped it down to the bare minimum and
 loaded the example on the interactive compiler:
 [...]

 The reference documentation for arrays says:
 Implementation note: many of the more common vector operations are expected
 to take advantage of any vector math instructions available on the target
 computer.
DMD makes use of vector operations in the library, rather than in the generated code. So as long as you are doing array operations using any of the supported types...
 Does this mean that while compilers are expected to take advantage of them,
 they currently do not, even when they have proper alignment?  I haven't
 tried LDC yet, so maybe LDC does perform auto-vectorization and I should
 attempt to use LDC if I plan on using vector ops a lot?
Auto-vectorization is deliberately strict in what triggers it to occur. It is possible to give the compiler hints; however, I'm not sure that this should be done by the code generator. See, for example: http://goo.gl/iMBbRs

Regards
Iain
Jul 16 2015
"Casper Færgemand" <shorttail hotmail.com> writes:
On Wednesday, 15 July 2015 at 22:42:05 UTC, Steven wrote:
 Does this mean that while compilers are expected to take 
 advantage of them, they currently do not, even when they have 
 proper alignment?  I haven't tried LDC yet, so maybe LDC does 
 perform auto-vectorization and I should attempt to use LDC if I 
 plan on using vector ops a lot?
If you want to use vector operations, YOU have to write the code for it. Addition and multiplication seem like easy things to have vectorized automatically, but it's complicated to do (I don't know of any compiler that does a convincing and reliable job of auto-vectorization), and it likely won't give you many of the other useful vector operations.

Someone from a game engine company (Unreal?) held a nice talk about how SIMD pervades their code base, including memory layout, in order to get decent performance on the less trivial operations, like dot product. I don't have a link though. =/

The main reason to write the SIMD code yourself is that you know that it's going to work the way you want. There won't ever be a case where you add one more variable to some structure and the compiler decides it can no longer auto-vectorize a loop.
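To give a rough idea of what hand-written SIMD with a SIMD-friendly memory layout might look like, here is a minimal sketch of a dot product using core.simd. The function name dot and the choice to store the data as float4 chunks are my own assumptions for illustration, not something from the talk, and it assumes a target where core.simd defines float4 (x86_64, for instance).

```d
import core.simd;

// Hypothetical helper: dot product over data that is already laid out
// as float4 chunks, so every step operates on four lanes at once.
float dot(float4[] a, float4[] b)
{
    assert(a.length == b.length);

    // Start from zero explicitly; the default float initializer is NaN.
    float4 acc = [0.0f, 0.0f, 0.0f, 0.0f];
    foreach (i; 0 .. a.length)
        acc += a[i] * b[i]; // lane-wise multiply, then lane-wise add

    // Horizontal step: add the four lanes of the accumulator together.
    float sum = 0.0f;
    foreach (lane; acc.array)
        sum += lane;
    return sum;
}
```

The vector types are spelled out in the source, so adding a field to some unrelated structure cannot silently turn the loop back into scalar code. The cost is that the data has to be stored in multiples of four floats.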
Jul 18 2015