Optimizing Code
Optimizers modify compiled code to make it more efficient. This chapter describes how the Digital Mars global optimization feature works and how to use it to produce faster and smaller programs.What's in This Chapter:
- What global optimization is, and how it differs from local optimization.
- When and how to use the global optimization feature.
- Global optimizer options in alphabetical order.
- Hints and tips for maximizing the efficiency of optimization.
About Global Optimization
A global optimizer (as opposed to a local optimizer) analyzes and optimizes all the code for a function as a whole. Local optimizers look at only one statement at a time or at short sequences of statements. A global optimizer uses a powerful technique called "data flow analysis" to gather information about each function. The more an optimizer knows about a function, the more optimizations that are possible. This technique is why global optimizers improve code more than local optimizers do.Advantages of using the global optimization feature
You can tune most C and C++ programs to a particular machine, and the global optimization feature does few transformations that you cannot duplicate at the source level. However, this kind of fine-tuning is usually not appropriate because:- Tuning code takes time. Compilers are meant to allow programmers to be more productive, and global optimization largely eliminates the need to tune programs by hand.
- Cryptically tuned code is harder to maintain.
- Code that is tuned for one machine may turn out to be poorly tuned for another. A widely applicable example of this problem is the fact that different compilers support different register values.
- Digital Mars C++ uses simple methods to expand inline functions, relying on the global optimization feature to "fix" them into more acceptable code.
- C++ is one of the few languages that permits implicit promotion of user-defined types; global optimization helps clean up the resulting temporary variables.
When to use the global optimization feature
Global optimization slows compilation but is a highly effective timesaver on frequently executed programs. Optimization also moves code to the extent that some of the correlation between source and object code is lost. Therefore, do not use global optimization in conjunction with the debugger.Effects of optimization on performance
Typical computation-bound programs can speed up as much as 30 percent when optimized. I/O-bound programs may not speed up at all. The most dramatic speed improvements are in small, frequently executed loops. Speed improves least in code that consists primarily of function calls.Optimization versus assembly language code
Globally optimized code still does not run nearly as efficiently as carefully crafted assembly language code. One reason is that the assembly language programmer knows more about the algorithm than the compiler knows. The compiler always optimizes conservatively, to ensure safe optimizations. However, you may not have time to carefully craft an entire program or program component in assembly language; an optimizing compiler then offers the most cost-effective solution.What global optimization does not do
Global optimization does not replace inefficient algorithms with efficient ones. It cannot recognize a bubble sort and replace it with a quick sort. Regardless of how good a compiler might be, improving the algorithm often produces the most dramatic increases in execution speed.Optimizing for Windows
All global optimization methods are safe for use with code that runs on any version of Microsoft Windows. Using the Global Optimization Feature. To use the global optimization feature, specify the -o option to dmc.For example,
dmc demo.c -omeans compile and optimize demo.c.
To optimize from the IDDE, use the Code Optimization subpage on the Build page of the Project Settings dialog box.
Using Optimization Flags
The -o option without specifying any modifiers defaults to optimizing for program speed.Note: The scppn.exe compiler pass uses -o to specifiy the output file, not optimization. This can be confusing, but there is generally no need to run scppn.exe independently.
Optimizing for speed vs. space
When space is at a premium, you can change the default of "speed over space" to "space over speed," like this:dmc demo.c -o+spaceThis command compiles demo.c with all optimizations applied; the speed optimization is turned off, and the space optimization is turned on. For information about what speed and space optimizations do, see "Space/speed tradeoffs" later in this chapter.
Turning off all optimization methods
To turn all optimizations off, use the flags -o-all or -o+none. The -o+all and -o-none flags are equivalent to -o. For example:dmc demo.c -o+nonecompiles demo.c without performing global optimization.
Turning off specific optimization methods
If you must explicitly disable specific methods, do so by first turning on all methods with -o and then disabling individual methods one at a time with -flag. For example: dmc demo.c -o-cnp -o-cp performs all optimizations except "constant propagation" and "copy propagation."Note:
You cannot turn on individual optimization methods.List of global optimization flags
The table below lists all the optimization flags. Complete descriptions of each flag follow the table.
flag | description |
---|---|
all | Perform all optimizations |
cnp | Constant propagation |
cp | Copy propagation |
cse | Global common subexpressions elimination |
da | Dead assignment elimination |
dc | Dead code elimination |
dv | Dead variable elimination; compute live ranges |
li | Loop invariant removal |
liv | Loop induction variable replacement |
loop | Iterate until optimizations have no further effect |
none | Do no optimizations |
reg | Assign variables to registers where possible |
space | Favor space optimizations over execution speed optimizations |
speed, time | Favor execution speed (time) optimizations over space optimizations |
vbe | Very busy expressions |
w | Turn on warnings |
all Turns on every optimization except space. The flags none and all are mutually exclusive; +all is equivalent to -none (shutting off none). Use +all to optimize your code to the fullest extent.
cnp replaces certain variables with constants. Consider the code:
A = 5; for (i = 0; i < A; i++) abc[i] = A;A always has the value 5 within the loop body. To optimize, the compiler replaces A with its value:
A = 5; for (i = 0; i < 5; i++) abc[i] = 5;Constant propagation opportunities frequently occur when loop rotation is done. For example, constant propagation converts:
while (e) expression;to:
if (e) do expression; while (e)cp Copy propagation is similar to constant propagation, except it copies variables instead of constants. For example, it replaces:
A = b; for (i = 0; i < A; i++) abc[i] = A;with:
A = b; for (i = 0; i < b; i++) abc[i] = b;Copy propagation frequently uncovers unnecessary assignments, such as the assignment to A, that can be removed.
cse Common subexpression elimination removes redundant computations. For example:
plus10 = A + B + 10; minus10 = A + B - 10; tang = (A + B) / cos(B); // A + B is computed three timesbecomes:
T = A + B; plus10 = T + 10; minus10 = T - 10; tang = T / cos(B); // A + B is computed onceda Eliminates assignments to variables that have no further use. For example:
int abc() { int f; static int g = 0; if (g != 3) { f = 3; // A dead assignment to f g = 3; // Not a dead assignment } }dc Dead code is code that can never execute. Sometimes dead code is subtle:
#define VALUE 0 if (VALUE > 10) a = dead + code; // This statement is deaddv Dead variables are automatic variables that are declared but never used. They often result from other optimizations. The live range of a variable is that portion of a function in which the variable's value must be preserved. If the live range is zero, the variable is dead. If two or more variables have nonintersecting live ranges, they can occupy the same memory location.
When allocating a large number of variables into a fixed number of registers, the compiler figures the live range of each register and compares it with the live range of the corresponding variable. If they do not intersect, the compiler assigns the variable to that register.
li If an expression appears in a loop body but its result never changes, then it is a loop invariant, and the compiler can remove its computation from the loop. For example:
while (f()) // b * c is loop invariant g(b * c);becomes:
T = b * c; while (f()) g(T);liv Loop invariant removal makes loops faster, especially those that cycle through an array. It can make code slightly larger, however. For example:
int a[ARRAY_SIZE], i; for (i = 0; i < ARRAY_SIZE; i++) a[i] = GetNextElement();Without this optimization, your code has to perform a multiplication each time it figures the address for the next array element (i * sizeof( int)). With the optimization, it remembers the address of the last element and adds the size of an element to that address.
loop Performing many optimizations makes further optimizations possible. This switch tells the compiler to continue until it discovers no further optimizations.
none The none switch shuts off all optimizations, which is the same as not using the optimization feature. The none and all optimizations are mutually exclusive; +none is equal to -all (shutting off all).
reg reg puts as many variables in registers as possible. Variables are assigned to registers based on how often they are used and how deeply nested in loops they are.
Using the register storage class is unnecessary when you use the global optimization feature. However, the compiler honors all such declarations in your code.
space speed/time Sometimes you must choose whether you want your code to run faster or to take up a minimal amount of memory. One specific example is:
while (f()) g();When optimized for space, the above example becomes:
goto L1; do { g(); L1:; } while (f());When optimized for speed, the example becomes:
if (f()) { do g(); while(f()); }Note: If you don't specify +space, the compiler defaults to optimizing for speed.
The speed and space flags are mutually exclusive:
-o-speed is the same as -o+space -o-space is the same as -o+speedvbe Very busy expressions occur along more than one path in the flow graph:
if (condition) a = b * c; else d = b * c;Here b * c is a very busy expression. The optimization is:
T = b * c; if (condition) a = T; else d = T;This optimization only saves space; it does not increase execution speed.
Optimization Tips
Here are some ideas to keep in mind when you use the global optimization feature.Use of const and volatile keywords
The const keyword defines a variable that is never modified, so the compiler does not have to make worst-case assumptions about that variable's changing.The volatile keyword tells the compiler that the variable can change asynchronously and never "optimizes out" loads and stores to volatile variables.
You can use the const keyword for static tables and the volatile keyword for hardware locations (like low-memory globals) and data that could change when an interrupt occurs.
Example:
const static int a = 5, b[3] = { 1,2,7 }; extern volatile bool flag;
If a working program fails when optimized
The compiler uses a strict interpretation of the definition of the C++ language and strives to conform to it. However, some aspects of C++ are undefined, and the compiler may interpret the same code differently, depending on whether optimization is performed. Some reasons that code may behave differently when optimized include:
The code is dependent on the order of evaluation of side effects. For example:
a = f1() + f2();
Whether f1() or f2() is executed first is undefined. You can control the order of evaluation by using explicit temporary variables, such as:
tmp = f1(); a = tmp + f2();
Now the evaluation order is guaranteed.
- The code is dereferencing uninitialized pointers. Since the compiler rearranges the physical allocation of variables when it optimizes code, uninitialized variables can contain different values.
- The code is storing data past the end or before the beginning of arrays or storing data allocated with unsuccessful calls to new or malloc().
- The code refers to data freed with realloc() or free().
- A variable that could be modified at the interrupt level is not declared with a volatile storage class modifier.
Recompiling code with the global optimization feature can uncover problems similar to those encountered when porting to another compiler. Adherence to standard "portable" coding practices minimizes any problems encountered when using the optimization feature.