www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.ldc - bounds checking and optimization

reply Bruce Carneal <bcarneal gmail.com> writes:
The following code auto vectorizes nicely, as you'd hope:

     void abc(const(int)[] a, const(int)[] b, int[] c)  nogc 
nothrow pure  trusted {
         const bound = c.length;
         (a.length == bound) || assert(0);
         (b.length == bound) || assert(0);
         foreach(i; 0..bound)
             c.ptr[i] = a.ptr[i] + b.ptr[i];
     }

If you drop the .ptr suffixes the bounds checks are elided, 
correctly, but the code is not auto vectorized.

Similarly, if you drop the .ptr suffixes and compile  safe, the 
bounds checks are elided but the code is not auto vectorized.

It would be great if known-within-bound indexing could yield fast 
code without requiring either .ptr or  trusted.  IOW, I'm hoping 
for faster safe code.  Is that easily achieved in cases like the 
above or is it quite hard?
Jun 27 2021
parent reply Johan <j j.nl> writes:
On Sunday, 27 June 2021 at 17:49:22 UTC, Bruce Carneal wrote:
 The following code auto vectorizes nicely, as you'd hope:

     void abc(const(int)[] a, const(int)[] b, int[] c)  nogc 
 nothrow pure  trusted {
         const bound = c.length;
         (a.length == bound) || assert(0);
         (b.length == bound) || assert(0);
         foreach(i; 0..bound)
             c.ptr[i] = a.ptr[i] + b.ptr[i];
     }

 If you drop the .ptr suffixes the bounds checks are elided, 
 correctly, but the code is not auto vectorized.

 Similarly, if you drop the .ptr suffixes and compile  safe, the 
 bounds checks are elided but the code is not auto vectorized.

 It would be great if known-within-bound indexing could yield 
 fast code without requiring either .ptr or  trusted.  IOW, I'm 
 hoping for faster safe code.  Is that easily achieved in cases 
 like the above or is it quite hard?
Vectorization does happen when running it again through the optimization pipeline: D --> LLVM IR: https://d.godbolt.org/ IR --> opt --> IR: https://opt.godbolt.org/ This means that we can solve it by reordering/optimizing our optimization pipeline in LDC. It is non-trivial work and I would suggest not to undertake it before we have switched to LLVM's new PassManager (because I think that also tweaks the order of optimization passes) -Johan
Jun 29 2021
parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Tuesday, 29 June 2021 at 11:20:19 UTC, Johan wrote:
 On Sunday, 27 June 2021 at 17:49:22 UTC, Bruce Carneal wrote:
 [snip]
[snip] This means that we can solve it by reordering/optimizing our optimization pipeline in LDC. It is non-trivial work and I would suggest not to undertake it before we have switched to LLVM's new PassManager (because I think that also tweaks the order of optimization passes) -Johan
Thanks for the update Johan. I'll keep working with the "manual" version, trusted and .ptr, and watch for any updates. You guys seem pretty busy already so I hope it fits in with your other work. A related question: is there anything preventing the compiler from lifting range checks out of loops and throwing early? For example, on something like: foreach(i, ref dst; c[]) dst = a[i] + b[i]; could c.length be extracted and asserted as being leq a.length and b.length before we drop in to the loop? Finally, it seems like bounds checking optimizations of this sort belong in the front end or am I mistaken?
Jun 29 2021
parent reply Johan <j j.nl> writes:
On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
 A related question: is there anything preventing the compiler 
 from lifting range checks out of loops and throwing early?  For 
 example, on something like:

     foreach(i, ref dst; c[])
         dst = a[i] + b[i];

 could c.length be extracted and asserted as being leq a.length 
 and b.length before we drop in to the loop?
That would be observable behavior change, so not possible. Unless the behavior upon out-of-bounds access is defined as UB.
 Finally, it seems like bounds checking optimizations of this 
 sort belong in the front end or am I mistaken?
I am wary of any optimization actually performed (rather than checking for validity) in the frontend. -Johan
Jun 30 2021
next sibling parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Wednesday, 30 June 2021 at 11:43:08 UTC, Johan wrote:
 On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
 A related question: is there anything preventing the compiler 
 from lifting range checks out of loops and throwing early?  
 For example, on something like:

     foreach(i, ref dst; c[])
         dst = a[i] + b[i];

 could c.length be extracted and asserted as being leq a.length 
 and b.length before we drop in to the loop?
That would be observable behavior change, so not possible. Unless the behavior upon out-of-bounds access is defined as UB.
So a compiler that could pull bounds checking out of a loop would need to generate code that would run a potentially shortened loop and then throw if OOB. Something like: const __bound = min(a.length, b.length, c.length); const __throwAtEnd = a.length < __bound || b.length < __bound; foreach(i, ref dst; c[0..__bound]) dst = a[i] + b[i]; if (__throwAtEnd) { /* throw */ } The above is ugly enough that I imagine it puts enabling speedy safe loops of this type pretty far down on the TODO list. I now suspect that there may be an easier path to safe data parallel code: we improve code gen for sequences of operations on slices (single bounds check at the top of the basic block and operation fusing to reduce the memory touches). Thanks for the info.
Jun 30 2021
next sibling parent Elronnd <elronnd elronnd.net> writes:
On Wednesday, 30 June 2021 at 15:51:58 UTC, Bruce Carneal wrote:
 So a compiler that could pull bounds checking out of a loop 
 would need to generate code that would run a potentially 
 shortened loop and then throw if OOB.
This is effectively how loop unrolling works in hotspot.
Jun 30 2021
prev sibling parent Bruce Carneal <bcarneal gmail.com> writes:
On Wednesday, 30 June 2021 at 15:51:58 UTC, Bruce Carneal wrote:
 ...

     const __bound = min(a.length, b.length, c.length);
     const __throwAtEnd = a.length < __bound || b.length < 
 __bound;
The compiler emitted throw flag in the pseudo-code should be: const __throwAtEnd = a.length < c.length || b.length < c.length; Sorry bout the noise. I appreciate the later contributors having read through it to the intent. Glad to hear that there may be some paths open to faster safe code in the future.
Jun 30 2021
prev sibling parent "David Nadlinger" <code klickverbot.at> writes:
On 30 Jun 2021, at 12:43, Johan via digitalmars-d-ldc wrote:
 On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
 could c.length be extracted and asserted as being leq a.length and 
 b.length before we drop in to the loop?
That would be observable behavior change, so not possible. Unless the behavior upon out-of-bounds access is defined as UB.
There might be LLVM optimisation passes to recognise the range checks and coalesce them into a single one before the loop header, though – I vaguely remember seeing something like this in the tree at some point, developed with a few towards Java, and not enabled by default in the PassManagerBuilder. We might be able to reuse/adapt that logic. — David
Jun 30 2021