digitalmars.D.learn - SIMD under LDC

Igor <stojkovic.igor gmail.com> writes:

I found that I can't use the __simd function from core.simd under 
LDC, and that LDC has ldc.simd, but I couldn't find how to 
implement the equivalent of this with it:

ubyte16* masks = ...;
foreach (ref c; pixels) {
	c = __simd(XMM.PSHUFB, c, *masks);
}

I see it has a shufflevector function, but it only accepts 
constant masks, and I am using a variable one. Is this possible 
under LDC?

BTW, shuffling channels within pixels using DMD SIMD is about 5 
times faster than with normal code on my machine :)
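For contrast with the variable-mask PSHUFB above, here is a minimal sketch of the constant-mask case that ldc.simd's shufflevector does handle: the mask is a list of compile-time template arguments, so it cannot come from a run-time value such as `*masks`. It assumes LDC and the `shufflevector!(V, mask...)(a, b)` form shown in the ldc.simd documentation; the block is wrapped in `version (LDC)` so other compilers skip it, and `bgraToRgba` is a hypothetical helper name.

```d
version (LDC)
{
    import core.simd : ubyte16;
    import ldc.simd : shufflevector;

    /// Hypothetical helper: swaps bytes 0 and 2 of each 4-byte pixel
    /// (e.g. BGRA -> RGBA) in a 16-byte vector. The indices select from
    /// the concatenation of both arguments (0..31); passing `px` twice
    /// and using only 0..15 makes this a single-vector permutation.
    ubyte16 bgraToRgba(ubyte16 px)
    {
        return shufflevector!(ubyte16,
            2, 1, 0, 3,   6, 5, 4, 7,
            10, 9, 8, 11, 14, 13, 12, 15)(px, px);
    }
}
```

Because the indices are template arguments, changing the permutation at run time would require instantiating a different function, which is the limitation the question describes.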

Sep 04 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
 I found that I can't use the __simd function from core.simd under 
 LDC,

Correct, LDC does not support the core.simd interface.

 and that LDC has ldc.simd, but I couldn't find how to implement 
 the equivalent of this with it:

 ubyte16* masks = ...;
 foreach (ref c; pixels) {
 	c = __simd(XMM.PSHUFB, c, *masks);
 }

 I see it has a shufflevector function, but it only accepts 
 constant masks, and I am using a variable one. Is this possible 
 under LDC?

You have several options:
* write a regular for loop and let LDC's optimiser take care of 
the rest.
* declare the LLVM intrinsic yourself with pragma(LDC_intrinsic), 
along these lines:

import core.simd : ubyte16;
import ldc.simd : equalMask;
import std.traits : ReturnType;

alias mask_t = ReturnType!(equalMask!ubyte16);
// `align` is a D keyword, so the parameter is named `alignment`.
pragma(LDC_intrinsic, "llvm.masked.load.v16i8.p0v16i8")
    ubyte16 llvm_masked_load(ubyte16* val, int alignment,
                             mask_t mask, ubyte16 fallthru);

ubyte16* masks = ...;
foreach (ref c; pixels) {
    auto mask = equalMask!ubyte16(*masks, [0xFF, 0xFF, 0xFF, ...]);
    c = llvm_masked_load(&c, 16, mask, [0,0,0,0 ... ]);
}

The second option might not work because of type differences in 
LLVM, but it should serve as a guide to adapting the `cmpMask` IR 
code in ldc.simd to do what you want.
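A minimal sketch of the first option, assuming 4-byte pixels stored in a flat `ubyte[]` and a run-time channel order (`shuffleChannels` and `order` are hypothetical names, not from the thread). It is plain D, so it builds with DMD, GDC and LDC; whether LDC's optimiser turns it into SIMD instructions depends on the compiler version and flags.

```d
/// Hypothetical helper: permutes the four channels of every 4-byte
/// pixel in place. order[j] is the source channel for output channel j,
/// so [2, 1, 0, 3] swaps bytes 0 and 2 (BGRA <-> RGBA).
void shuffleChannels(ubyte[] pixels, in ubyte[4] order)
{
    assert(pixels.length % 4 == 0, "buffer must hold whole 4-byte pixels");
    for (size_t i = 0; i + 4 <= pixels.length; i += 4)
    {
        // Copy the pixel first so the in-place writes don't clobber
        // bytes that later output channels still need to read.
        ubyte[4] px = pixels[i .. i + 4];
        foreach (j; 0 .. 4)
            pixels[i + j] = px[order[j]];
    }
}
```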

 BTW, shuffling channels within pixels using DMD SIMD is about 5 
 times faster than with normal code on my machine :)

Don't underestimate ldc's optimiser ;)

Sep 04 2017

12345swordy <alexanderheistermann gmail.com> writes:

On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson 
wrote:
 Don't underestimate ldc's optimiser ;)

I have seen cases where the compiler fails to optimize for SIMD.

Sep 04 2017

Igor <stojkovic.igor gmail.com> writes:

On Tuesday, 5 September 2017 at 01:11:29 UTC, 12345swordy wrote:
 On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson 
 wrote:
 Don't underestimate ldc's optimiser ;)

 I have seen cases where the compiler fails to optimize for SIMD.

I tried it, and the optimized LDC build did generate SIMD 
instructions from the regular code, but it used several 
instructions to do the job, so it is about 1.4 times slower than 
the hand-written SIMD version built with DMD. That is probably 
good enough for me.

Sep 05 2017

Johan Engelen <j j.nl> writes:

On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
 I found that I can't use the __simd function from core.simd under 
 LDC, and that LDC has ldc.simd, but I couldn't find how to 
 implement the equivalent of this with it:

 ubyte16* masks = ...;
 foreach (ref c; pixels) {
 	c = __simd(XMM.PSHUFB, c, *masks);
 }

 I see it has a shufflevector function, but it only accepts 
 constant masks, and I am using a variable one. Is this possible 
 under LDC?

You can use __builtin_ia32_pshufb128 and __builtin_ia32_pshufb256 
from the ldc.gccbuiltins_x86 module (ldc/gccbuiltins_x86.di).

(also see 
https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)

Please file a feature request in our (LDC) issue tracker on GitHub 
about shufflevector with a variable mask, with some code that 
you'd expect to work. Thanks.

- Johan

Sep 05 2017

Igor <stojkovic.igor gmail.com> writes:

On Tuesday, 5 September 2017 at 18:50:34 UTC, Johan Engelen wrote:
 You can use __builtin_ia32_pshufb128 and __builtin_ia32_pshufb256 
 from the ldc.gccbuiltins_x86 module (ldc/gccbuiltins_x86.di).

 (also see 
 https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)

 Please file a feature request in our (LDC) issue tracker on GitHub 
 about shufflevector with a variable mask, with some code that 
 you'd expect to work. Thanks.

 - Johan

I'll try that this evening. Thanks! I'll also open an issue, but 
are you sure such a feature request is valid? As far as I can 
see, the LLVM shufflevector instruction only supports constant 
masks as well.

Sep 06 2017

Igor <stojkovic.igor gmail.com> writes:

On Wednesday, 6 September 2017 at 09:01:18 UTC, Igor wrote:
 I'll try that this evening. Thanks! I'll also open an issue, but 
 are you sure such a feature request is valid? As far as I can 
 see, the LLVM shufflevector instruction only supports constant 
 masks as well.

I opened a feature request on GitHub. I also tried using the 
gccbuiltins, but I got this error:

LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0
  0x2199c74e9a8: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg384
    0x2199c96fb00: v16i8 = Register %vreg384
  0x2199c74d6c0: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg385
    0x2199c74ed50: v16i8 = Register %vreg385
In function: _D7assetdb12loadBmpImageFAxaZf
Building x64\LDCDebug\DNgin.exe failed!

You can see the code I used here: 
https://github.com/igor84/dngin/blob/3c171330843af71170a6ee4ae164a76bf58c35f6/source/assetdb.d#L123

Note that to try it you will need a test.bmp in a specific 
format where header.compression == 3 (BI_BITFIELDS), like this one: 
https://drive.google.com/file/d/0B9l8IgnRaPwCU0hIWEtHUElhTTg/view?usp=sharing

Sep 06 2017

Johan Engelen <j j.nl> writes:

On Wednesday, 6 September 2017 at 20:43:01 UTC, Igor wrote:
 I opened a feature request on GitHub. I also tried using the 
 gccbuiltins, but I got this error:

 LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0

That's because SSSE3 instructions are not enabled by default, so 
the compiler isn't allowed to generate the PSHUFB instruction.
Some options you have:
1. Set a CPU that has SSSE3, e.g. compile with `-mcpu=native`.
2. Enable SSSE3 explicitly: compile with `-mattr=+ssse3`.
3. Perhaps best for your case: enable SSSE3 only for that 
function by importing the ldc.attributes module and applying the 
@target("ssse3") UDA to that function.

-Johan

Sep 07 2017

Igor <stojkovic.igor gmail.com> writes:

On Thursday, 7 September 2017 at 15:24:13 UTC, Johan Engelen 
wrote:
 That's because SSSE3 instructions are not enabled by default, 
 so the compiler isn't allowed to generate the PSHUFB 
 instruction.
 Some options you have:
 1. Set a CPU that has SSSE3, e.g. compile with `-mcpu=native`.
 2. Enable SSSE3 explicitly: compile with `-mattr=+ssse3`.
 3. Perhaps best for your case: enable SSSE3 only for that 
 function by importing the ldc.attributes module and applying the 
 @target("ssse3") UDA to that function.

 -Johan

Thanks Johan. I tried this, and now it compiles, but it crashes 
with an Access Violation in the debug build. The optimized build 
seems to work, though.

Sep 07 2017

Igor <stojkovic.igor gmail.com> writes:

On Thursday, 7 September 2017 at 16:45:40 UTC, Igor wrote:
 Thanks Johan. I tried this, and now it compiles, but it crashes 
 with an Access Violation in the debug build. The optimized build 
 seems to work, though.

I will try to reproduce this in a minimal project and open an LDC 
bug if successful.

In the meantime, can anyone tell me how to add an attribute to a 
function only if something is defined? This doesn't work:

version(USE_SIMD_WITH_LDC) {
   import ldc.attributes;
   @target("ssse3")
} void funcThatUsesSIMD() {
   ...
   version(LDC) {
     import ldc.gccbuiltins_x86;
     c = __builtin_ia32_pshufb128(c, *simdMasks);
   } else {
     c = __simd(XMM.PSHUFB, c, *simdMasks);
   }
   ...
}

Sep 11 2017

Igor <stojkovic.igor gmail.com> writes:

On Monday, 11 September 2017 at 11:55:45 UTC, Igor wrote:
 In the meantime, can anyone tell me how to add an attribute to a 
 function only if something is defined? This doesn't work:

 version(USE_SIMD_WITH_LDC) {
   import ldc.attributes;
   @target("ssse3")
 } void funcThatUsesSIMD() {
   ...
   version(LDC) {
     import ldc.gccbuiltins_x86;
     c = __builtin_ia32_pshufb128(c, *simdMasks);
   } else {
     c = __simd(XMM.PSHUFB, c, *simdMasks);
   }
   ...
 }

Regarding the crash in debug mode: the problem was that my masks 
variable wasn't properly aligned. As for the attribute, I guess 
the best I can do is this:

version(LDC) import ldc.attributes;
else private struct target { string specifier; }
@target("ssse3")
void funcThatUsesSIMD() {...}
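Putting the pieces of this thread together, a minimal sketch follows. It assumes an x86/x86_64 target when built with LDC and that `__builtin_ia32_pshufb128` takes and returns `byte16`; `shuffle16` is a hypothetical name. Under LDC it calls the builtin inside a `@target("ssse3")` function; elsewhere the dummy `target` struct keeps the attribute compiling and a scalar loop with the same PSHUFB rules does the work. Copying through `memcpy` into vector locals avoids relying on the 16-byte alignment that dereferencing a `ubyte16*` assumes. Note that `@target` adds no run-time CPU check, so the caller must only run the LDC path on SSSE3-capable CPUs.

```d
import core.stdc.string : memcpy;

version (LDC)
{
    // Assumption: x86/x86_64 target, where ldc.gccbuiltins_x86 exists.
    import core.simd : byte16;
    import ldc.attributes : target;
    import ldc.gccbuiltins_x86 : __builtin_ia32_pshufb128;
}
else
{
    // Dummy UDA so @target("ssse3") still compiles on DMD/GDC,
    // where it has no effect.
    private struct target { string specifier; }
}

/// Hypothetical helper: shuffles one 16-byte block with a run-time mask.
/// PSHUFB rule: result[i] = 0 if bit 7 of mask[i] is set,
/// otherwise src[mask[i] & 0x0F].
@target("ssse3")
ubyte[16] shuffle16(in ubyte[16] src, in ubyte[16] mask)
{
    ubyte[16] result;  // zero-initialized
    version (LDC)
    {
        // Copy into vector locals instead of dereferencing a possibly
        // misaligned pointer to a 16-byte vector type.
        byte16 s = void, m = void;
        memcpy(&s, src.ptr, 16);
        memcpy(&m, mask.ptr, 16);
        byte16 r = __builtin_ia32_pshufb128(s, m);
        memcpy(result.ptr, &r, 16);
    }
    else
    {
        // Portable scalar fallback with the same semantics.
        foreach (i; 0 .. 16)
            if (!(mask[i] & 0x80))
                result[i] = src[mask[i] & 0x0F];
    }
    return result;
}
```

Reversing a block is a quick sanity check: with `mask[i] = 15 - i` the output is the input in reverse order, and setting the high bit of any mask byte zeroes that output byte.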

Sep 11 2017
