digitalmars.D.ldc - bounds checking and optimization

Bruce Carneal (17/17) Jun 27 2021 The following code auto vectorizes nicely, as you'd hope:

Johan (11/28) Jun 29 2021 Vectorization does happen when running it again through the

Bruce Carneal (14/23) Jun 29 2021 Thanks for the update Johan. I'll keep working with the "manual"

Johan (6/15) Jun 30 2021 That would be observable behavior change, so not possible. Unless

Bruce Carneal (16/29) Jun 30 2021 So a compiler that could pull bounds checking out of a loop would

Elronnd (2/5) Jun 30 2021 This is effectively how loop unrolling works in hotspot.
Bruce Carneal (8/12) Jun 30 2021 The compiler emitted throw flag in the pseudo-code should be:

David Nadlinger (7/12) Jun 30 2021 There might be LLVM optimisation passes to recognise the range checks

Bruce Carneal <bcarneal gmail.com> writes:

The following code auto vectorizes nicely, as you'd hope:

     void abc(const(int)[] a, const(int)[] b, int[] c)  nogc 
nothrow pure  trusted {
         const bound = c.length;
         (a.length == bound) || assert(0);
         (b.length == bound) || assert(0);
         foreach(i; 0..bound)
             c.ptr[i] = a.ptr[i] + b.ptr[i];
     }

If you drop the .ptr suffixes the bounds checks are elided, 
correctly, but the code is not auto vectorized.

Similarly, if you drop the .ptr suffixes and compile  safe, the 
bounds checks are elided but the code is not auto vectorized.

It would be great if known-within-bound indexing could yield fast 
code without requiring either .ptr or  trusted.  IOW, I'm hoping 
for faster safe code.  Is that easily achieved in cases like the 
above or is it quite hard?

Jun 27 2021

Johan <j j.nl> writes:

On Sunday, 27 June 2021 at 17:49:22 UTC, Bruce Carneal wrote:
 The following code auto vectorizes nicely, as you'd hope:

     void abc(const(int)[] a, const(int)[] b, int[] c)  nogc 
 nothrow pure  trusted {
         const bound = c.length;
         (a.length == bound) || assert(0);
         (b.length == bound) || assert(0);
         foreach(i; 0..bound)
             c.ptr[i] = a.ptr[i] + b.ptr[i];
     }

 If you drop the .ptr suffixes the bounds checks are elided, 
 correctly, but the code is not auto vectorized.

 Similarly, if you drop the .ptr suffixes and compile  safe, the 
 bounds checks are elided but the code is not auto vectorized.

 It would be great if known-within-bound indexing could yield 
 fast code without requiring either .ptr or  trusted.  IOW, I'm 
 hoping for faster safe code.  Is that easily achieved in cases 
 like the above or is it quite hard?

Vectorization does happen when running it again through the 
optimization pipeline:
D --> LLVM IR: https://d.godbolt.org/
IR --> opt --> IR: https://opt.godbolt.org/

This means that we can solve it by reordering/optimizing our 
optimization pipeline in LDC. It is non-trivial work and I would 
suggest not to undertake it before we have switched to LLVM's new 
PassManager (because I think that also tweaks the order of 
optimization passes)

-Johan

Jun 29 2021

Bruce Carneal <bcarneal gmail.com> writes:

On Tuesday, 29 June 2021 at 11:20:19 UTC, Johan wrote:
 On Sunday, 27 June 2021 at 17:49:22 UTC, Bruce Carneal wrote:
 [snip]

 [snip]

 This means that we can solve it by reordering/optimizing our 
 optimization pipeline in LDC. It is non-trivial work and I 
 would suggest not to undertake it before we have switched to 
 LLVM's new PassManager (because I think that also tweaks the 
 order of optimization passes)

 -Johan

Thanks for the update Johan.  I'll keep working with the "manual" 
version,  trusted and .ptr, and watch for any updates.  You guys 
seem pretty busy already so I hope it fits in with your other 
work.

A related question: is there anything preventing the compiler 
from lifting range checks out of loops and throwing early?  For 
example, on something like:

     foreach(i, ref dst; c[])
         dst = a[i] + b[i];

could c.length be extracted and asserted as being leq a.length 
and b.length before we drop in to the loop?

Finally, it seems like bounds checking optimizations of this sort 
belong in the front end or am I mistaken?

Jun 29 2021

Johan <j j.nl> writes:

On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
 A related question: is there anything preventing the compiler 
 from lifting range checks out of loops and throwing early?  For 
 example, on something like:

     foreach(i, ref dst; c[])
         dst = a[i] + b[i];

 could c.length be extracted and asserted as being leq a.length 
 and b.length before we drop in to the loop?

That would be observable behavior change, so not possible. Unless 
the behavior upon out-of-bounds access is defined as UB.

 Finally, it seems like bounds checking optimizations of this 
 sort belong in the front end or am I mistaken?

I am wary of any optimization actually performed (rather than 
checking for validity) in the frontend.

-Johan

Jun 30 2021

Bruce Carneal <bcarneal gmail.com> writes:

On Wednesday, 30 June 2021 at 11:43:08 UTC, Johan wrote:
 On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
 A related question: is there anything preventing the compiler 
 from lifting range checks out of loops and throwing early?  
 For example, on something like:

     foreach(i, ref dst; c[])
         dst = a[i] + b[i];

 could c.length be extracted and asserted as being leq a.length 
 and b.length before we drop in to the loop?

 That would be observable behavior change, so not possible. 
 Unless the behavior upon out-of-bounds access is defined as UB.

So a compiler that could pull bounds checking out of a loop would 
need to generate code that would run a potentially shortened loop 
and then throw if OOB.  Something like:

     const __bound = min(a.length, b.length, c.length);
     const __throwAtEnd = a.length < __bound || b.length < __bound;
     foreach(i, ref dst; c[0..__bound])
         dst = a[i] + b[i];
     if (__throwAtEnd) { /* throw */ }


The above is ugly enough that I imagine it puts enabling speedy 
 safe loops of this type pretty far down on the TODO list.

I now suspect that there may be an easier path to safe data 
parallel code: we improve code gen for sequences of operations on 
slices (single bounds check at the top of the basic block and 
operation fusing to reduce the memory touches).

Thanks for the info.

Jun 30 2021

Elronnd <elronnd elronnd.net> writes:

On Wednesday, 30 June 2021 at 15:51:58 UTC, Bruce Carneal wrote:
 So a compiler that could pull bounds checking out of a loop 
 would need to generate code that would run a potentially 
 shortened loop and then throw if OOB.

This is effectively how loop unrolling works in hotspot.

Jun 30 2021

Bruce Carneal <bcarneal gmail.com> writes:

On Wednesday, 30 June 2021 at 15:51:58 UTC, Bruce Carneal wrote:
 ...

     const __bound = min(a.length, b.length, c.length);
     const __throwAtEnd = a.length < __bound || b.length < 
 __bound;

The compiler emitted throw flag in the pseudo-code should be:

     const __throwAtEnd = a.length < c.length || b.length < 
c.length;

Sorry bout the noise.  I appreciate the later contributors having 
read through it to the intent.

Glad to hear that there may be some paths open to faster  safe 
code in the future.

Jun 30 2021

"David Nadlinger" <code klickverbot.at> writes:

On 30 Jun 2021, at 12:43, Johan via digitalmars-d-ldc wrote:
 On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
 could c.length be extracted and asserted as being leq a.length and 
 b.length before we drop in to the loop?

 That would be observable behavior change, so not possible. Unless the 
 behavior upon out-of-bounds access is defined as UB.

There might be LLVM optimisation passes to recognise the range checks 
and coalesce them into a single one before the loop header, though – I 
vaguely remember seeing something like this in the tree at some point, 
developed with a few towards Java, and not enabled by default in the 
PassManagerBuilder. We might be able to reuse/adapt that logic.

  — David

Jun 30 2021

D Programming

C/C++ Programming

Other

digitalmars.D.ldc - bounds checking and optimization