digitalmars.D.ldc - Using SSE3 vector shuffle with LDC

KytoDragon <kytodragon e.mail.de> writes:

I have been trying to port some programs to D that heavily use
SSE instructions.
In particular, I still need _mm_shuffle_epi8, _mm_alignr_epi8 and
_mm_aesdec_si128.
LDC does not support the core.simd approach and ldc.simd only
supports a few operations, including a vector shuffle with a
fixed mask (I need a variable mask).
So how would one go about using these with LDC?

I need to be able to:
- consistently generate SSE instructions, even in debug builds.
- inline the function.

I have been unable to find a solution using either the simd 
package, inline asm or inline llvm-ir.

May 26 2019

kinke <noone nowhere.com> writes:

On Sunday, 26 May 2019 at 12:10:30 UTC, KytoDragon wrote:
 I have been trying to port some programs to D that heavily use
 SSE instructions.
 In particular, I still need _mm_shuffle_epi8, _mm_alignr_epi8
 and _mm_aesdec_si128.
 LDC does not support the core.simd approach and ldc.simd only
 supports a few operations, including a vector shuffle with a
 fixed mask (I need a variable mask).
 So how would one go about using these with LDC?

 I need to be able to:
 - consistently generate SSE instructions, even in debug builds.
 - inline the function.

 I have been unable to find a solution using either the simd 
 package, inline asm or inline llvm-ir.

There's https://github.com/AuburnSounds/intel-intrinsics which 
tries to be compatible with the Intel intrinsic names.

_mm_aesdec_si128 is available in ldc.gccbuiltins_x86 as 
__builtin_ia32_aesdec128; _mm_shuffle_epi8 as 
__builtin_ia32_pshufb128. Make sure to specify that the 
instructions are available via something like `-mattr=+ssse3` in 
the LDC command line.
I haven't found something corresponding to _mm_alignr_epi8, but 
inline asm can always be used. Here's an example for a manual 
__builtin_ia32_pshufb128 using LLVM inline assembly:

alias byte16 = __vector(byte[16]);

version (Manual)
{
     pragma(inline, true)
     byte16 _mm_shuffle_epi8(byte16 a, byte16 b)
     {
         import ldc.llvmasm;
         return __asm!byte16("pshufb $2, $1", "=x,0,x", a, b);
     }
}
else
{
     import ldc.gccbuiltins_x86 : _mm_shuffle_epi8 = __builtin_ia32_pshufb128;
}

void main()
{
     byte16 a = [ 0, 1, 2, 3, 4, 5, 6, 7,
                  8, 9, 10, 11, 12, 13, 14, 15 ];
     byte16 b = [ -1, 2, 1, 0, 7, 6, 5, 4,
                  11, 10, 9, 8, 15, 14, 13, 12 ];
     const actual = _mm_shuffle_epi8(a, b);
     byte16 expected = b;
     expected[0] = 0;
     assert(actual == expected);
}
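
The `expected[0] = 0` line in the example above follows from how pshufb treats each mask byte: if the high bit is set the result byte is zero, otherwise the low four bits select a byte of the first operand. A minimal scalar sketch of that rule, useful for checking results on any compiler, might look like this (the name `shuffleReference` is hypothetical and not part of LDC or intel-intrinsics):

```d
/// Scalar reference for the per-byte behaviour of pshufb / _mm_shuffle_epi8.
/// Hypothetical helper for illustration only; it does not emit SSE code.
byte[16] shuffleReference(const byte[16] a, const byte[16] mask)
{
    byte[16] result; // byte.init is 0, so skipped lanes stay zero
    foreach (i; 0 .. 16)
    {
        // High bit of the mask byte set: the lane is zeroed.
        if (mask[i] & 0x80)
            continue;
        // Otherwise the low four bits index into the source vector.
        result[i] = a[mask[i] & 0x0F];
    }
    return result;
}
```

With the `a` and `b` values from the example, the first mask byte is -1 (high bit set), so lane 0 becomes 0, and every other lane picks `a[b[i]]`, which equals `b[i]` because `a` holds 0 through 15.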

May 26 2019

KytoDragon <kytodragon e.mail.de> writes:

On Sunday, 26 May 2019 at 13:54:32 UTC, kinke wrote:
 There's https://github.com/AuburnSounds/intel-intrinsics which 
 tries to be compatible with the Intel intrinsic names.

 _mm_aesdec_si128 is available in ldc.gccbuiltins_x86 as 
 __builtin_ia32_aesdec128; _mm_shuffle_epi8 as 
 __builtin_ia32_pshufb128. Make sure to specify that the 
 instructions are available via something like `-mattr=+ssse3` 
 in the LDC command line.
 I haven't found something corresponding to _mm_alignr_epi8, but 
 inline asm can always be used. Here's an example for a manual 
 __builtin_ia32_pshufb128 using LLVM inline assembly:
 <skip>

Thank you! I already have the intel-intrinsics package; that one
just didn't have these specific ones. I also didn't know about the
SSE compiler option; I got that working now.
Concerning inline asm, I thought that it prevents inlining. I
will try out ldc.llvmasm.

May 26 2019

KytoDragon <kytodragon e.mail.de> writes:

After tinkering with ldc.llvmasm (and figuring out that the asm
arguments are specified in reverse order) I have got everything
working. E.g.

__m128i _mm_alignr_epi8(u8 count)(__m128i A, __m128i B) {
     return __asm!__m128i("palignr $3, $2, $1", "=x,0,x,i", A, B, count);
}

Thank you again!
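
For checking a wrapper like the one above, a scalar model of the documented _mm_alignr_epi8 behaviour can help: the two inputs are concatenated with the first operand in the high half, the 32-byte value is shifted right by `count` bytes, and the low 16 bytes are kept. This is only a sketch; the name `alignrReference` is hypothetical and the function does not generate palignr.

```d
/// Scalar reference for palignr / _mm_alignr_epi8 (hypothetical helper).
/// `a` forms the high 16 bytes and `b` the low 16 bytes of the intermediate.
ubyte[16] alignrReference(const ubyte[16] a, const ubyte[16] b, ubyte count)
{
    ubyte[32] concat;
    concat[0 .. 16] = b[];  // low half
    concat[16 .. 32] = a[]; // high half

    ubyte[16] result; // ubyte.init is 0, so bytes shifted in from beyond the end are zero
    foreach (i; 0 .. 16)
    {
        const src = i + count; // byte position after shifting right by `count`
        if (src < 32)
            result[i] = concat[src];
    }
    return result;
}
```

A count of 0 returns `b` unchanged, a count of 16 returns `a`, and a count of 32 or more yields all zeros.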

May 26 2019

kinke <noone nowhere.com> writes:

On Sunday, 26 May 2019 at 16:35:48 UTC, KytoDragon wrote:
 After tinkering with ldc.llvmasm (and figuring out that the asm
 arguments are specified in reverse order) I have got everything
 working. E.g.

 __m128i _mm_alignr_epi8(u8 count)(__m128i A, __m128i B) {
     return __asm!__m128i("palignr $3, $2, $1", "=x,0,x,i", A, B, count);
 }

 Thank you again!

Excellent. Wrt. order, yeah, LLVM uses AT&T syntax. Guillaume 
would surely welcome an intel-intrinsics PR. :)
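
To make the ordering concrete, here is a small scalar sketch using the same `__asm` template from ldc.llvmasm. It assumes LDC on x86-64 and falls back to plain D elsewhere; the function name `addViaAsm` and the version identifier `UseLlvmAsm` are made up for this illustration. Operands are numbered in the order of the constraint string (output first, then inputs), and in AT&T syntax the source comes first and the destination last.

```d
// Sketch only: addViaAsm and UseLlvmAsm are hypothetical names.
version (LDC)
{
    version (X86_64) version = UseLlvmAsm;
}

/// Adds two ints, using LLVM inline asm where it is available.
int addViaAsm(int a, int b)
{
    version (UseLlvmAsm)
    {
        import ldc.llvmasm : __asm;
        // Constraint string "=r,0,r,~{flags}":
        //   $0 = output register
        //   $1 = a, tied to the output by the "0" constraint
        //   $2 = b
        //   ~{flags} marks the flags register as clobbered by addl
        // AT&T syntax: "addl src, dst" computes dst += src.
        return __asm!int("addl $2, $0", "=r,0,r,~{flags}", a, b);
    }
    else
    {
        return a + b; // portable fallback when LLVM inline asm is unavailable
    }
}
```

The palignr wrapper earlier in the thread follows the same pattern: the immediate ($3) and the source register ($2) come before the destination ($1), which is tied to the output.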

May 26 2019

Guillaume Piolat <first.last gmail.com> writes:

On Sunday, 26 May 2019 at 16:40:58 UTC, kinke wrote:
 On Sunday, 26 May 2019 at 16:35:48 UTC, KytoDragon wrote:
 After tinkering with ldc.llvmasm (and figuring out that the
 asm arguments are specified in reverse order) I have got
 everything working. E.g.

 __m128i _mm_alignr_epi8(u8 count)(__m128i A, __m128i B) {
     return __asm!__m128i("palignr $3, $2, $1", "=x,0,x,i", A, B, count);
 }

 Thank you again!

 Excellent. Wrt. order, yeah, LLVM uses AT&T syntax. Guillaume 
 would surely welcome an intel-intrinsics PR. :)

Absolutely, SSE3 up to SSE4.2 are on the roadmap; there was just
a lack of people showing up with more needs.

May 29 2019

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Sunday, 26 May 2019 at 12:10:30 UTC, KytoDragon wrote:
 I have been trying to port some programs to D that heavily use
 SSE instructions.
 In particular, I still need _mm_shuffle_epi8, _mm_alignr_epi8
 and _mm_aesdec_si128.
 LDC does not support the core.simd approach and ldc.simd only
 supports a few operations, including a vector shuffle with a
 fixed mask (I need a variable mask).
 So how would one go about using these with LDC?

 I need to be able to:
 - consistently generate SSE instructions, even in debug builds.
 - inline the function.

 I have been unable to find a solution using either the simd 
 package, inline asm or inline llvm-ir.

Have you seen https://github.com/AuburnSounds/intel-intrinsics ?
(See also http://dconf.org/2019/talks/piolat.html)

May 26 2019
