digitalmars.D - What is the better signature for this?

reply Guillaume Piolat <first.last spam.org> writes:
Consider the following "intrinsic" signature.

     __m256i _mm256_loadu_si256 (const(__m256i)* mem_addr) pure @trusted;  // (A)


The Intel intrinsic signatures have a problem: you must pass a 
pointer to an implicitly aligned `__m256i` (aka `long4`), yet the 
pointer doesn't need to be aligned for an unaligned load. So 
this is playing a bit with the type system. Inside the 
"intrinsic" implementation, nothing should rely on that 
non-existent alignment. That said, it hasn't blown up yet.

It is tempting to fix that and just take a long* or void* instead.

     __m256i _mm256_loadu_si256 (const(void)* mem_addr) pure @system;      // (B)

However, in that case, the function is not `@trusted` anymore, 
but becomes `@system`.
Indeed, it is safe to dereference a pointer, but not to index from 
it.
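
A minimal sketch of that distinction, assuming a recent D compiler with default settings. `readOne` is a hypothetical helper used only for illustration, not part of any intrinsics library.

```d
// Dereferencing a pointer is accepted in @safe code...
long readOne(const(long)* p) @safe
{
    return *p;
}

// ...while indexing from the same pointer is rejected. Both facts are
// checked at compile time here rather than asserted in prose.
static assert( __traits(compiles, (const(long)* p) @safe { return *p; }));
static assert(!__traits(compiles, (const(long)* p) @safe { return p[1]; }));

void main() @safe
{
    auto data = new long[](4);
    data[0] = 42;
    // Taking the address of a bounds-checked array element is fine in @safe.
    assert(readOne(&data[0]) == 42);
}
```

This is why a `void*` parameter pushes the function to `@system`: the callee has to read 32 bytes that the pointer type itself never promised.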

What about `float[4]` then? We can get back `@trusted`.


      __m256i _mm256_loadu_si256 (const(float[4])* mem_addr) pure @trusted; // (C)


Then, we lose compatibility with intrinsics code originally 
written in C++. Casting to `const(float[4])*` is even more 
annoying to type than casting to `const(__m256i)*`.
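
To make the call-site difference concrete, here is a rough sketch contrasting a (B)-style and a (C)-style signature. It is only an illustration under assumptions: `Lanes`, `loadB` and `loadC` are hypothetical names, `long[4]` is used as a plain-array stand-in for `__m256i`, and the body uses `memcpy` rather than a real vector load so that nothing relies on 32-byte alignment.

```d
import core.stdc.string : memcpy;

alias Lanes = long[4]; // hypothetical stand-in for __m256i (32 bytes)

// (B)-style: the parameter type says nothing about how many bytes are
// readable, so the caller carries the whole burden: @system.
Lanes loadB(const(void)* mem) nothrow @nogc @system
{
    Lanes r;
    memcpy(r.ptr, mem, Lanes.sizeof);
    return r;
}

// (C)-style: the pointee type already covers the 32 bytes being read,
// so the implementation can be vouched for as @trusted.
Lanes loadC(const(long[4])* mem) nothrow @nogc @trusted
{
    Lanes r;
    memcpy(r.ptr, mem, Lanes.sizeof);
    return r;
}

void main()
{
    long[] buf = [1, 2, 3, 4, 5];
    Lanes expected = [2, 3, 4, 5];

    // (B): any pointer converts implicitly, no cast needed.
    assert(loadB(&buf[1]) == expected);

    // (C): the caller has to spell out the cast.
    assert(loadC(cast(const(long[4])*) &buf[1]) == expected);
}
```

The source address `&buf[1]` is 8-byte aligned but in general not 32-byte aligned, which is the situation an unaligned load is meant for.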

What do you think is the better signature?
I'd prefer to go A > B > C, but figured I might be missing 
something.
Oct 09 2022
next sibling parent Guillaume Piolat <first.last spam.org> writes:
On Sunday, 9 October 2022 at 19:44:13 UTC, Guillaume Piolat wrote:
 What about `float[4]` then? We can get back `@trusted`.


      __m256i _mm256_loadu_si256 (const(float[4])* mem_addr) 
 pure @trusted; // (C)


 Then, we lose compatibility with intrinsics code originally 
 written in C++. Casting to `const(float[4])*` is even more 
 annoying to type than casting to `const(__m256i)*`.
Erratum: it is `long[4]`, not `float[4]`
Oct 09 2022
prev sibling parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Sunday, 9 October 2022 at 19:44:13 UTC, Guillaume Piolat wrote:
 Consider the following "intrinsic" signature.

 ...

 What do you think is the better signature?
 I'd prefer to go A > B > C, but figured I might be missing 
 something.
Could using the static array representation type of the vector (.array) be a useful idiom here? I ask because I don't know the constraints/preferences of veteran intrinsic programmers. That idiom does work well in other SIMD formulations but may not be well suited here.
Oct 10 2022
parent reply Guillaume Piolat <first.last spam.org> writes:
On Monday, 10 October 2022 at 12:31:04 UTC, Bruce Carneal wrote:
 On Sunday, 9 October 2022 at 19:44:13 UTC, Guillaume Piolat 
 wrote:
 Consider the following "intrinsic" signature.

 ...

 What do you think is the better signature?
 I'd prefer to go A > B > C, but figured I might be missing 
 something.
 Could using the static array representation type of the vector 
 (.array) be a useful idiom here? I ask because I don't know the 
 constraints/preferences of veteran intrinsic programmers. That 
 idiom does work well in other SIMD formulations but may not be 
 well suited here.
That is solution C. It could work. The slight problem is that functions that take `__m128i*` use it to mean "any packed integers occupying 128 bits", and it's not immediately obvious that __m128i is int4 and __m256i is long4; it's rather counterintuitive. Semantically, it could be short8 or byte16...

GCC vectors can be unaligned, and there are types for it (e.g. __m128i_u), but I don't think the other compilers can do that. That would be a prime contender.
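
As a small illustration of why the element type of `__m128i` is somewhat arbitrary, the same 128 bits can be viewed with several lane widths. This is only a sketch: `Packed128` is a hypothetical union of plain static arrays, not a real vector type, and the lane values checked assume a little-endian target such as x86.

```d
// Hypothetical 128-bit container viewed with three different lane widths.
union Packed128
{
    int[4]   i32; // the "int4" view
    short[8] i16; // the "short8" view
    byte[16] i8;  // the "byte16" view
}

static assert(Packed128.sizeof == 16);

void main()
{
    Packed128 v;
    v.i32 = [0x0001_0002, 0, 0, 0];

    version (LittleEndian)
    {
        // Bytes in memory are 02 00 01 00 ..., so the narrower views see:
        assert(v.i16[0] == 2 && v.i16[1] == 1);
        assert(v.i8[0] == 2 && v.i8[2] == 1);
    }
}
```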
Oct 10 2022
next sibling parent Johan <j j.nl> writes:
On Monday, 10 October 2022 at 12:44:18 UTC, Guillaume Piolat 
wrote:
 GCC vectors can be unaligned, and there are types for it (eg: 
 __m128i_u), but I don't think the other compilers can do that. 
 That would be a prime contender.
I think you should be able to define the unaligned type like this:

```
struct __m128u
{
    align(1) __m128 data;
    alias data this;
}
```

It works, but I am not 100% sure whether this type will always behave the same (ABI) as __m128 when used as a value, e.g. when passing it to a function (`void fun(__m128u a, __m128u b)`: passed in SIMD registers?).

Unfortunately, it currently runs into this LDC bug: https://github.com/ldc-developers/ldc/issues/4236 .

cheers,
  Johan
Oct 11 2022
prev sibling parent reply Kagamin <spam here.lot> writes:
On Monday, 10 October 2022 at 12:44:18 UTC, Guillaume Piolat 
wrote:
 That is solution C.
 It could work.
Static array works like this:

```
int Load4LE(in ref ubyte[4] b) pure
{
    return (b[3]<<24)|(b[2]<<16)|(b[1]<<8)|b[0];
}

ubyte[] data;
int val=Load4LE(data[0..4]);
```

It's safe, bounds-checked, CTFE-able, no casts.
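
The same idiom carries over to the 256-bit case discussed in this thread. The following is a sketch under assumptions: `load4` is a hypothetical name, and it returns a plain `long[4]` rather than a real `__m256i`, so it shows only the type-system side (no casts, `@safe`), not the code generation.

```d
// Takes the source as a reference to exactly four longs, so the size is
// part of the type and no pointer arithmetic is needed.
long[4] load4(ref const long[4] src) @safe pure nothrow @nogc
{
    return src;
}

void main() @safe
{
    long[] data = [10, 20, 30, 40, 50];

    // A slice with compile-time-known bounds converts to a static array
    // reference; the slice itself is bounds-checked at run time.
    long[4] v = load4(data[1 .. 5]);

    long[4] expected = [20, 30, 40, 50];
    assert(v == expected);
}
```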
 The slight problem is that functions that take `__m128i*` use 
 it to mean "any packed integers occupying 128 bits", and it's not 
 immediately obvious that __m128i is int4 and __m256i is long4; 
 it's rather counterintuitive. Semantically, it could be short8 
 or byte16...
There's no solution, only tradeoffs.
Oct 11 2022
parent Bruce Carneal <bcarneal gmail.com> writes:
On Tuesday, 11 October 2022 at 12:51:13 UTC, Kagamin wrote:
 On Monday, 10 October 2022 at 12:44:18 UTC, Guillaume Piolat 
 wrote:
 That is solution C.
 It could work.
 Static array works like this:

 ```
 int Load4LE(in ref ubyte[4] b) pure
 {
     return (b[3]<<24)|(b[2]<<16)|(b[1]<<8)|b[0];
 }

 ubyte[] data;
 int val=Load4LE(data[0..4]);
 ```

 It's safe, bounds-checked, CTFE-able, no casts.
Yes. Starting at line 57 you'll find examples of the above for a target-adaptive/generic environment: https://godbolt.org/z/qW6PYT3Yd

I've not found a way to trigger those one-instruction unaligned loads from DMD, but ldc and gdc are doing great.
Oct 11 2022