digitalmars.D.announce - intel-intrinsics v1.0.0

reply Guillaume Piolat <first.last gmail.com> writes:
"intel-intrinsics" is a DUB package for people interested in x86 
performance who want to write neither assembly nor an 
LDC-specific snippet... and still get the fastest possible code.

Available through DUB: 
http://code.dlang.org/packages/intel-intrinsics


*** Features of v1.0.0:

- All intrinsics in this list: 
https://software.intel.com/sites/landingpage/IntrinsicsGuide#techs=MMX,SSE,SSE2
Use existing Intel documentation and syntax

- Write the same code for both DMD and LDC, across the last 6 
versions of each. (Note that debug performance might suffer a 
lot when inlining is not enabled.)

- Use operators on SIMD vectors as if core.simd were implemented 
on DMD 32-bit

- Introduces int2 and float2 because short SIMD vectors are useful

- About 6000 LOC (for now! more to come)

- Bonus: approximate pow/exp/log. Compute 4 approximate pow at 
once.
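
To make the list above a bit more concrete, here is a minimal sketch (not taken from the package documentation) of mixing Intel-named intrinsics with plain operators on a vector. It assumes the modules are named inteli.xmmintrin and inteli.emmintrin; addAndScale is a hypothetical helper made up for illustration.

```d
import inteli.xmmintrin; // SSE intrinsics (module name assumed)
import inteli.emmintrin; // SSE2 intrinsics (module name assumed)

// Hypothetical helper, for illustration only.
float[4] addAndScale()
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f); // lanes 1, 2, 3, 4
    __m128 b = _mm_set1_ps(10.0f);                  // 10 in every lane
    __m128 sum = _mm_add_ps(a, b);                  // same name as in Intel's guide
    __m128 scaled = sum * b;                        // plain operator on a vector

    float[4] result;
    _mm_storeu_ps(result.ptr, scaled);              // unaligned store to memory
    return result;
}

void main()
{
    assert(addAndScale() == [110.0f, 120.0f, 130.0f, 140.0f]);
}
```

The same source should be accepted by both DMD and LDC if the feature list holds; only the generated code differs.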


<future>
The long-term goal for this library is to be _only about 
semantics_, and not particularly codegen(!). This is because LLVM 
IR is portable, so forcing a particular instruction is undoing 
this portability work. **This can seem odd** for an "intrinsics" 
library, but this way exact codegen options can be chosen by the 
library user, and most intrinsics can gracefully degrade to 
portable IR in theory.

In the future, "magic" LLVM intrinsics will only be used when 
built for x86, but I think all of it can become portable and not 
x86-specific. Besides, there is a trend in LLVM to remove magic 
intrinsics once they are doable with IR only.
</future>


tl;dr you can use "intel-intrinsics" today, and get quite-optimal 
code with LDC, without duplication. You may come across early 
bugs too.
http://code.dlang.org/packages/intel-intrinsics

(note: it's important to bench against vanilla D code or array 
ops too, in some cases the vanilla code wins)
Feb 05
parent reply Simen Kjærås <simen.kjaras gmail.com> writes:
On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat 
wrote:
 "intel-intrinsics" is a DUB package for people interested in 
 x86 performance that want neither to write assembly, nor a 
 LDC-specific snippet... and still have fastest possible code.
Neat. Question: On GitHub it's stated that implicit conversions aren't supported, with this example:

    __m128i b = _mm_set1_epi32(42);
    __m128 a = b; // NO, only works in LDC

Couldn't this be solved through something like this:

    struct __m128 {
        float4 value;
        alias value this;
        void opAssign(__m128i rhs) {
            value = cast(float4)rhs.value;
        }
    }

--
  Simen
Feb 05
parent Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 6 February 2019 at 07:41:25 UTC, Simen Kjærås wrote:
 struct __m128 {
     float4 value;
     alias value this;
     void opAssign(__m128i rhs) {
         value = cast(float4)rhs.value;
     }
 }

 --
   Simen
The problem is that when you emulate core.simd (DMD 32-bit on Windows requires that, if you want super-fast OPTLINK build times), you have no way to have user-defined implicit conversions. The magic vector types from the compiler (float4 / int4 / short8 / long2 / byte16) are all implicitly convertible to each other, but I don't think we can replicate this.
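
For what it's worth, Intel's own syntax already has an explicit spelling for this reinterpretation, _mm_castsi128_ps, which sidesteps the need for an implicit conversion. A minimal sketch, assuming the package exposes it through inteli.emmintrin like the other SSE2 intrinsics; bitsToFloats is a hypothetical helper:

```d
import inteli.xmmintrin; // SSE intrinsics (module name assumed)
import inteli.emmintrin; // SSE2 intrinsics (module name assumed)

// Hypothetical helper: broadcast a 32-bit pattern, then view the lanes as floats.
float[4] bitsToFloats(int bits)
{
    __m128i b = _mm_set1_epi32(bits);  // same bit pattern in all 4 integer lanes
    __m128 a = _mm_castsi128_ps(b);    // explicit reinterpretation, no value conversion

    float[4] result;
    _mm_storeu_ps(result.ptr, a);      // unaligned store to memory
    return result;
}

void main()
{
    // 0x3F800000 is the IEEE 754 bit pattern of 1.0f
    assert(bitsToFloats(0x3F800000) == [1.0f, 1.0f, 1.0f, 1.0f]);
}
```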
Feb 06