
digitalmars.D.learn - Spec for the ‘locality’ parameter to the LDC and

Cecil Ward <cecil cecilward.com> writes:
I’m trying to write a cross-platform function that gives access 
to the CPU’s prefetch instructions such as x86 
prefetcht0/t1/t2/prefetchnta and AArch64 too. I’ve found that the 
GDC and LDC compilers provide builtin magic functions for this, 
and are what I need. I am trying to put together a plain-English 
detailed spec for the respective builtin magic functions.

My questions:

Q1) I need to compare the spec for the GCC and LDC builtin magic 
functions’ "locality" parameter. Can anyone tell me if GDC and 
LDC have kept mutual compatibility here?

Q2) Could someone help me turn the GCC and LDC specs into plain 
English regarding the locality parameter? See (2) and (4) below.

Q3) Does the locality parameter determine which _level_ of the 
data cache hierarchy data is fetched into? Or is it always 
fetched into L1 data cache and the outer ones, and this parameter 
affects caches’ _future behaviour_?

Q4) Will these magic builtins work on AArch64?

Here’s what I’ve found so far

1. The GCC builtin as exposed by GDC’s D runtime:
    import gcc.simd : prefetch;
    prefetch!(rw, locality)(p);

2. GCC: __builtin_prefetch (const void *addr, ...)
“This function is used to minimize cache-miss latency by moving 
data into a cache before it is accessed. You can insert calls to 
__builtin_prefetch into code for which you know addresses of data 
in memory that is likely to be accessed soon. If the target 
supports them, data prefetch instructions are generated. If the 
prefetch is done early enough before the access then the data 
will be in the cache by the time it is accessed.
The value of addr is the address of the memory to prefetch. There 
are two optional arguments, rw and locality. The value of rw is a 
compile-time constant one or zero; one means that the prefetch is 
preparing for a write to the memory address and zero, the 
default, means that the prefetch is preparing for a read. The 
value locality must be a compile-time constant integer between 
zero and three. A value of zero means that the data has no 
temporal locality, so it need not be left in the cache after the 
access. A value of three means that the data has a high degree of 
temporal locality and should be left in all levels of cache 
possible. Values of one and two mean, respectively, a low or 
moderate degree of temporal locality. The default is three.”

3. declare void @llvm.prefetch(ptr <address>, i32 <rw>, i32 
<locality>, i32 <cache type>)

4. Regarding llvm.prefetch(), I found the following spec:
“rw is the specifier determining if the fetch should be for a 
read (0) or write (1), and locality is a temporal locality 
specifier ranging from (0) - no locality, to (3) - extremely 
local keep in cache. The cache type specifies whether the 
prefetch is performed on the data (1) or instruction (0) cache. 
The rw, locality and cache type arguments must be constant 
integers.”

5. I also found https://dlang.org/phobos/core_builtins.html, 
which is great for the syntax of the call to the LDC builtin, but 
its GDC call is no good because it lacks the parameters that I 
want. This D runtime routine might benefit from accepting all the 
parameters that GCC’s prefetch builtin takes.
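Both specs quoted above require `rw` and `locality` to be compile-time constants, which is why GDC's `prefetch!(rw, locality)(p)` takes them as template arguments. A wrapper that accepts all the parameters at run time therefore has to dispatch to calls whose arguments are literals. The following is a minimal C sketch of that idea for GCC or Clang; the name `prefetch_hint` is hypothetical and not part of any compiler's API. Clang is understood to lower each such `__builtin_prefetch` call to `llvm.prefetch` with the cache-type argument set to 1 (data cache).

```c
/* Hypothetical run-time wrapper around GCC/Clang's __builtin_prefetch.
   rw:       0 = prepare for a read, 1 = prepare for a write.
   locality: 0 (no temporal locality) .. 3 (keep in all cache levels).
   The builtin needs both as integer constants, so every valid
   combination gets its own call with literal arguments. */
void prefetch_hint(const void *p, int rw, int locality)
{
    if ((rw != 0 && rw != 1) || locality < 0 || locality > 3)
        return; /* invalid hint: do nothing rather than guess */

    switch (rw * 4 + locality) { /* 0..7, one case per combination */
    case 0: __builtin_prefetch(p, 0, 0); break;
    case 1: __builtin_prefetch(p, 0, 1); break;
    case 2: __builtin_prefetch(p, 0, 2); break;
    case 3: __builtin_prefetch(p, 0, 3); break;
    case 4: __builtin_prefetch(p, 1, 0); break;
    case 5: __builtin_prefetch(p, 1, 1); break;
    case 6: __builtin_prefetch(p, 1, 2); break;
    case 7: __builtin_prefetch(p, 1, 3); break;
    }
}
```

When the hint is known at compile time, calling `__builtin_prefetch` directly avoids the switch; if the wrapper is inlined with constant arguments, an optimizing compiler can usually fold the switch away.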

Many thanks in advance.
Aug 19 2023
Iain Buclaw <ibuclaw gdcproject.org> writes:
On Saturday, 19 August 2023 at 19:23:38 UTC, Cecil Ward wrote:
 I’m trying to write a cross-platform function that gives access 
 to the CPU’s prefetch instructions such as x86 
 prefetcht0/t1/t2/prefetchnta and AArch64 too. I’ve found that the 
 GDC and LDC compilers provide builtin magic functions for this, 
 and are what I need. I am trying to put together a 
 plain-English detailed spec for the respective builtin magic 
 functions.

 My questions:

 Q1) I need to compare the spec for the GCC and LDC builtin 
 magic functions’ "locality" parameter. Can anyone tell me if 
 GDC and LDC have kept mutual compatibility here?
I'd have thought GCC and LLVM have mutual compatibility thanks to a common target API in Intel's `_mm_prefetch()` function (and in fact, the magic locality numbers match `_MM_HINT_*` constants).
```
#define _MM_HINT_T0 1
#define _MM_HINT_T1 2
#define _MM_HINT_T2 3
#define _MM_HINT_NTA 0
```
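For reference, GCC's and Clang's own `<xmmintrin.h>` number the hints as `_MM_HINT_T0 = 3`, `_MM_HINT_T1 = 2`, `_MM_HINT_T2 = 1` and `_MM_HINT_NTA = 0`, so with those compilers `_mm_prefetch(p, hint)` passes the hint through unchanged as the `__builtin_prefetch` locality value. The values listed above (T0 = 1, T2 = 3) appear to be MSVC's numbering instead, which does not line up with the locality argument. A small C sketch, assuming an x86 target and GCC or Clang, that checks this at compile time; the function names are hypothetical:

```c
/* Assumes GCC or Clang on x86; on other targets this compiles to nothing. */
#if defined(__x86_64__) || defined(__i386__)
#include <xmmintrin.h>

/* Fails to compile if the header uses a different numbering. */
_Static_assert(_MM_HINT_T0 == 3 && _MM_HINT_T1 == 2 &&
               _MM_HINT_T2 == 1 && _MM_HINT_NTA == 0,
               "_MM_HINT_* values differ from __builtin_prefetch locality");

/* Two spellings of a read prefetch into all cache levels
   (expected to compile to prefetcht0). */
void prefetch_t0_intrinsic(const void *p) { _mm_prefetch((const char *)p, _MM_HINT_T0); }
void prefetch_t0_builtin(const void *p)   { __builtin_prefetch(p, 0, 3); }

/* Two spellings of a non-temporal prefetch
   (expected to compile to prefetchnta). */
void prefetch_nta_intrinsic(const void *p) { _mm_prefetch((const char *)p, _MM_HINT_NTA); }
void prefetch_nta_builtin(const void *p)   { __builtin_prefetch(p, 0, 0); }
#endif
```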
 Q2) Could someone help me turn the GCC and LDC specs into 
 English regarding the locality parameter? See (2) and (4) 
 below.
https://gcc.gnu.org/projects/prefetch.html
 Q3) Does the locality parameter determine which _level_ of the 
 data cache hierarchy data is fetched into? Or is it always 
 fetched into L1 data cache and the outer ones, and this 
 parameter affects caches’ _future behaviour_?
It really depends on the CPU and what features it has. The x86 SSE intrinsics are described in the x86 instruction manual, along with the meaning of T0, T1, T2 and NTA. https://www.felixcloutier.com/x86/prefetchh
 Q4) Will these magic builtins work on AArch64?
It'll work on all targets that define a prefetch instruction; on other targets it'll be a no-op. Similarly, either or both of the rw and locality arguments might be ignored.
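Because the builtin may compile to nothing, a cross-platform wrapper can take the same approach on compilers that lack it altogether. A minimal C sketch under that assumption: the macro names `HAVE_BUILTIN_PREFETCH` and `PREFETCH` and the function `fill_with_prefetch` are hypothetical, and the 16-element prefetch distance is an arbitrary, untuned choice. `__has_builtin` is available in Clang and in GCC 10 and later; older GCC versions already provide `__builtin_prefetch` itself.

```c
#include <stddef.h>

/* Detect __builtin_prefetch where the compiler can tell us. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_prefetch)
#    define HAVE_BUILTIN_PREFETCH 1
#  endif
#endif
/* Older GCC lacks __has_builtin but does have the builtin. */
#if !defined(HAVE_BUILTIN_PREFETCH) && defined(__GNUC__)
#  define HAVE_BUILTIN_PREFETCH 1
#endif

/* Hypothetical portable macro; rw and loc must still be integer constants.
   Without the builtin it only evaluates p, i.e. it is a no-op. */
#ifdef HAVE_BUILTIN_PREFETCH
#  define PREFETCH(p, rw, loc) __builtin_prefetch((p), (rw), (loc))
#else
#  define PREFETCH(p, rw, loc) ((void)(p))
#endif

/* Fill an array, hinting that upcoming elements will be written (rw = 1)
   and need not stay in the cache afterwards (locality = 0). */
void fill_with_prefetch(int *a, size_t n, int value)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            PREFETCH(&a[i + 16], 1, 0);
        a[i] = value;
    }
}
```

The bounds check keeps the prefetch address inside the array, so no out-of-range pointer is ever formed.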
Aug 20 2023
Guillaume Piolat <first.name gmail.com> writes:
On Saturday, 19 August 2023 at 19:23:38 UTC, Cecil Ward wrote:
 I’m trying to write a cross-platform function that gives access 
 to the CPU’s prefetch instructions such as x86 
 prefetcht0/t1/t2/prefetchnta and AArch64 too. I’ve found that the 
 GDC and LDC compilers provide builtin magic functions for this, 
 and are what I need. I am trying to put together a 
 plain-English detailed spec for the respective builtin magic 
 functions.
Have you seen this? https://github.com/AuburnSounds/intel-intrinsics/blob/002da84215a58f098cee671c5ba4ab6052613865/source/inteli/xmmintrin.d#L1935C9-L1935C9
Aug 22 2023