digitalmars.D - SSE and AVX with D

"Pavel Umnikov" <pavelumnikov gmail.com> writes:
Hello everyone,

I recently moved from C++ to the D language and want to 
rewrite my current engine from scratch in it. My math and 
physics libraries make heavy use of SSE functions (and AVX 
when the CPU supports it). Can I use SSE/AVX code in D, either 
through direct SSE/AVX intrinsics or through inline assembler 
with SSE/AVX?

Thanks!
May 15 2012
Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 15-05-2012 16:27, Pavel Umnikov wrote:
> Hello everyone,
>
> I recently moved from C++ to the D language and want to rewrite my
> current engine from scratch in it. My math and physics libraries
> make heavy use of SSE functions (and AVX when the CPU supports it).
> Can I use SSE/AVX code in D, either through direct SSE/AVX
> intrinsics or through inline assembler with SSE/AVX?
>
> Thanks!

Have a look at these:

* http://dlang.org/phobos/core_cpuid.html
* http://dlang.org/iasm.html
* http://dlang.org/simd.html

-- 
- Alex
May 15 2012
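
For readers following the links above, here is a minimal sketch of what core.cpuid and core.simd offer. It assumes a 64-bit x86 target and a reasonably current compiler; the function name addVectors is made up for this example and is not part of either module.

```d
import core.cpuid : avx, sse2;
import core.simd;
import std.stdio : writeln;

// Hypothetical helper: element-wise addition of two 4-float vectors.
// With core.simd vector types the + operator works on all four lanes at once.
float4 addVectors(float4 a, float4 b)
{
    return a + b;
}

void main()
{
    // core.cpuid reports at run time which instruction sets the CPU has.
    writeln("SSE2 available: ", sse2);
    writeln("AVX available:  ", avx);

    float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 b = [10.0f, 20.0f, 30.0f, 40.0f];
    float4 c = addVectors(a, b);

    // .array exposes the vector as a plain float[4] for printing.
    writeln(c.array);
}
```

Operations that map onto an operator, such as the addition here, need no intrinsics at all. The discussion further down the thread concerns instructions that have no operator equivalent.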
Walter Bright <newshound2 digitalmars.com> writes:
On 5/15/2012 9:39 AM, jerro wrote:
> Note that core.simd currently only defines SSE intrinsics for
> instructions of the form
>
> INSTRUCTION xmm1, xmm2/m128
>
> which means that instructions such as shufps are not supported.
> You could take a look at GDC, which provides GCC builtins
> through the module gcc.builtins. To find the builtin names, you
> can look at GCC's implementation of xmmintrin.h. GDC also
> produces faster code than DMD, especially for floating-point
> code. It does not yet support AVX, though.
>
> If you want to use AVX for operations that don't have an
> operator, currently your only choice (AFAIK) is to use LDC
> and an ugly workaround that I used at
> https://github.com/jerro/pfft. You write your "intrinsics"
> in C and use clang to compile them to .bc (or write a .ll
> file manually if you know the LLVM assembly language). Then
> you compile your D code with LDC using the flags -output-bc
> and -single-obj. You merge the resulting .bc file with the
> "intrinsics" file using llvm-link, then optimize it using
> opt and convert it to assembly using llc. Here is an
> example:
>
> https://github.com/jerro/pfft/blob/master/build-ldc2.sh
>
> I have only tried this on Linux.

You can use the inline assembler for shufps, also for AVX.
May 15 2012
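
A minimal sketch of the inline-assembler route for shufps, assuming an x86-64 target and a compiler that accepts DMD-style asm blocks (DMD, and LDC as far as I know). The function name reverseLanes and the choice of scratch registers are made up for this example.

```d
import std.stdio : writeln;

version (D_InlineAsm_X86_64)
{
    // Hypothetical helper: reverses the four floats of `input` with shufps.
    float[4] reverseLanes(float[4] input)
    {
        float[4] result;
        float* src = input.ptr;
        float* dst = result.ptr;
        asm
        {
            mov RAX, src;            // address of the input floats
            mov RCX, dst;            // address of the output floats
            movups XMM0, [RAX];      // unaligned load of all four lanes
            shufps XMM0, XMM0, 0x1B; // 0x1B selects source lanes 3, 2, 1, 0
            movups [RCX], XMM0;      // unaligned store of the shuffled lanes
        }
        return result;
    }
}
else
{
    static assert(0, "This sketch needs DMD-style x86-64 inline assembly.");
}

void main()
{
    // Prints the four values in reverse lane order.
    writeln(reverseLanes([1.0f, 2.0f, 3.0f, 4.0f]));
}
```

The sketch moves the data from memory into a register and back again around a single instruction. Wrapping one instruction this way is therefore mostly overhead.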
"Pavel Umnikov" <pavelumnikov gmail.com> writes:
On Tuesday, 15 May 2012 at 14:28:51 UTC, Alex Rønne Petersen 
wrote:
> Have a look at these:
>
> * http://dlang.org/phobos/core_cpuid.html
> * http://dlang.org/iasm.html
> * http://dlang.org/simd.html

Thank you, Alex!
May 15 2012
"jerro" <a a.com> writes:
On Tuesday, 15 May 2012 at 14:32:20 UTC, Pavel Umnikov wrote:
> On Tuesday, 15 May 2012 at 14:28:51 UTC, Alex Rønne Petersen 
> wrote:
>> Have a look at these:
>>
>> * http://dlang.org/phobos/core_cpuid.html
>> * http://dlang.org/iasm.html
>> * http://dlang.org/simd.html
>
> Thank you, Alex!

Note that core.simd currently only defines SSE intrinsics for instructions of the form

INSTRUCTION xmm1, xmm2/m128

which means that instructions such as shufps are not supported. You could take a look at GDC, which provides GCC builtins through the module gcc.builtins. To find the builtin names, you can look at GCC's implementation of xmmintrin.h. GDC also produces faster code than DMD, especially for floating-point code. It does not yet support AVX, though.

If you want to use AVX for operations that don't have an operator, currently your only choice (AFAIK) is to use LDC and an ugly workaround that I used at https://github.com/jerro/pfft. You write your "intrinsics" in C and use clang to compile them to .bc (or write a .ll file manually if you know the LLVM assembly language). Then you compile your D code with LDC using the flags -output-bc and -single-obj. You merge the resulting .bc file with the "intrinsics" file using llvm-link, then optimize it using opt and convert it to assembly using llc. Here is an example:

https://github.com/jerro/pfft/blob/master/build-ldc2.sh

I have only tried this on Linux.
May 15 2012
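
The LDC workaround described in the post above can be laid out as a sequence of tool invocations. The sketch below is a dry run written in D that only prints the commands. All file names are placeholders, and every flag other than -output-bc and -single-obj is an assumption about how the tools are usually invoked, not something taken from the pfft build script.

```d
import std.array : join;
import std.stdio : writeln;

// Hypothetical helper: returns one command line per step of the workaround.
// Output file names (intrinsics.bc, app.bc, ...) are placeholders.
string[][] pipelineCommands(string dSource, string cIntrinsics)
{
    return [
        // 1. Compile the C "intrinsics" to LLVM bitcode with clang.
        ["clang", "-c", "-emit-llvm", cIntrinsics, "-o", "intrinsics.bc"],
        // 2. Compile the D code to bitcode; -output-bc and -single-obj are
        //    the flags named in the post, the rest is assumed.
        ["ldc2", "-c", "-output-bc", "-single-obj", dSource, "-of=app.bc"],
        // 3. Merge the two bitcode files.
        ["llvm-link", "app.bc", "intrinsics.bc", "-o", "merged.bc"],
        // 4. Optimize the merged module, so the "intrinsics" can be inlined.
        ["opt", "-O3", "merged.bc", "-o", "optimized.bc"],
        // 5. Lower the optimized bitcode to assembly.
        ["llc", "optimized.bc", "-o", "app.s"],
    ];
}

void main()
{
    // Dry run: print each command instead of executing it.
    foreach (cmd; pipelineCommands("app.d", "intrinsics.c"))
        writeln(cmd.join(" "));
}
```

To actually run the steps, each command array could be passed to std.process.execute and its status checked before moving on. The script linked in the post remains the authoritative version.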
"jerro" <a a.com> writes:
> You can use the inline assembler for shufps, also for AVX.

Of course you can; I forgot to mention that. I do that in parts of pfft when it is compiled with DMD (but only for SSE). However, because of the overhead of copying values from the stack to registers and back to the stack, or of calling a function, it only makes sense when the chunk of code you are replacing with inline assembly takes longer than a few cycles. This forces you to write larger chunks of code in inline assembly, which is not always practical.
May 15 2012