Loop Optimizations

Loops come in many forms, but a typical one looks like:

for (int i = 0; i < 10; i++)
    statement;

For us non-functional programmers, loops are one of the ubiquitous building blocks of our source code.

Given that, it's no surprise that compiler optimizers spend considerable effort trying to generate better code for loops. Compiler benchmarks tend to be loops, and the mettle of compiler technology is how good a job it does with loops. The various loop optimizations are well known, and come with their own terminology:

strength reduction
loop fusion
loop rotation
loop unrolling
auto-vectorization
loop invariants
lots more

(How all these work would be a long digression, instead I'll point the curious at the wikipedia article on loops.)

Fortran compilers are particularly known for how well they generate code for loops.

But if one steps back a bit, there's a common thread among those loop optimizations that seems peculiar. That is, considerable effort is expended attempting to reverse engineer out of the source code what the loop is. It's not enough to notice that a for-loop is of the form:

for (declaration; expression; expression)
    statement;

which ironically gets us exactly nowhere. The declarations, expressions, and statements could contain any valid code. To do any useful loop optimizations, instead we need to tease out the loop index variable, the initial value, the ending value, the stride of the increment, and figure out what the loop body statement is doing. By now, you should be seeing where this is going. Consider the loop:

int[dimension] array;
for (int i = 0; i < dimension; i++)
    array[i] = 5;

Humans, with our amazing perceptual and pattern-matching capabilities, instantly recognize this as setting the contents of array[] to 5. All a compiler initially sees, though, is a jumble of declarations, comparisons, assignments, and increments. People write books about algorithms that proceed from such a jumble to recognizing that Aha!! moment about what the loop is really doing. Once it knows this, the compiler can then set about generating the optimal code sequence to set the array's contents to 5. (The optimal way is hardly straightforward in modern CPU hardware, and often is not expressible in the source language.)

Now the peculiar thought intrudes. Isn't the compiler's job to enable the programmer to express an algorithm in a high level manner, and then generate the necessary low level implementation of it? We don't write assembler anymore, the compiler does that for us. Right? Right?

The language could in fact define high-level constructs that express implied loops. That way the programmer gets to write short and expressive code, and the compiler would be happy too! It's as if a quicksort could not be expressed in the language, so the programmer writes a bubblesort instead, and relies on the compiler to figure out that it's a bubblesort and then replace it with a quicksort.

Our programming languages need better loop constructs.

We can start by dumping the bookkeeping code. The foreach, which has appeared in several modern languages, does that nicely. (We'll be showing code in the D programming language for illustrative purposes.) The array setting code now looks like:

int[dimension] array;
foreach (ref v; array)
    v = 5;

Look, ma, no loop index variable! No begin, end, nor stride. It's all handled by the compiler; the compiler already knows about them, it no longer has to reverse engineer it. (Of course, other benefits accrue. By having the compiler handle such details, there is no longer any possibility of the programmer making a mistake specifying them. And the construct suggests that it can work with any collection, not just arrays.)

It's still only half the job. There's that annoying variable v which is a placeholder for the contents of the array for that iteration. There's analyzing the loop body statement to figure out what is happening in the loop. We need another semantic leap to get past that. Try this on for size:

int[dimension] array;
array[] = 5;

Aherm, what happened to the loop? It got replaced by the [] operator, which in D parlance means "the contents of". Now the compiler, by simply parsing the source code, knows that the contents of array should be set to 5. It's free to implement it by generating a loop, using CPU vector set instructions, or even by calling memset(), whichever the compiler dude thought would work best. The source code programmer does not and should not care.

Can this work for other kinds of loops? You betcha!

for (int i = 0; i < dimension; i++)
    a[i] = c * b[i] + d;

becomes:

a[] = c * b[] + d;

etc.

By adding a simple construct, we've replaced a big chunk of compiler optimization technology. This is not a total solution, but sure as heck marks a step forward for all of us. Anytime one needs an optimization technology that needs to reverse engineer a loop, there's an opportunity for more progress in language design.

A remarkable demonstration of how raising the abstraction level can obviate the need for many loop reverse engineering techniques can be seen with the Blitz++ library.

Tl;dr

Many compiler loop optimizations depend on reverse engineering the programmer's intent out of the low level mechanics of the loop source code. By adding some higher level constructs to the source language, we can simplify life for both the programmer and the compiler implementor.

References

"Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance" by Aart J.C. Bik
Goals of the Blitz++ library"

Acknowledgements

Thanks to Andrei Alexandrescu, Bartosz Milewski, David Held, Eric Niebler and Jason House for their comments and suggestions on this article.

Articles

Loop Optimizations

Tl;dr

References

Acknowledgements