Optimizing with Help from the Compiler

It is possible to have the compiler perform optimizations on the executable code it generates. This is a very cheap form of optimization, as no extra development effort is needed to find places to optimize or to perform the actual code transformations. The only cost is that compiling the sources may take somewhat longer, and this extra time is usually negligible. How and when to instruct the compiler to perform optimizations is explained in the following sections.

Optimizing with Compiler Optimization Options

Most compilers will perform only basic code translation optimizations when invoked without specific optimization instructions. It is possible, however, to instruct compilers to do more optimization on the output they generate. For this the option -On is commonly used, where n is the level of optimization (-O3 will perform more optimizations than -O2 and so on). The exact range of n for the GNU compiler depends on the version used. The lowest level of optimization is of course obtained by completely omitting the -O compiler option. One compilation instruction for generating an optimized executable from the example program at the beginning of this chapter is

gcc -O2 myfile.c -o myfile

Before looking closer at what kind of optimizations the compiler can actually make (see the section "A Closer Look at What Actually Happens," later in this chapter), this section will explain why compilers do not perform the maximum level of optimization by default. Intuitively, you would expect that the more the compiler optimizes, the better off you are, but there are some practical reasons why this is not always the case.

Reasons why the compiler does not perform maximum optimization by default:

  • The more the compiler optimizes, the more difficult debugging becomes. The compiler is able to combine segments of code which have a strong logical connection—for instance, segments that result in identical groupings of machine code. During debugging, it will seem as if single stepping causes the processor to jump from code segment to code segment at random.

  • Compiler optimizations carry a certain amount of risk. In optimizing, the compiler will attempt to combine or skip as many code segments as possible. It can happen that the compiler does not correctly assess what the programmer is attempting to do and optimizes something important out of existence. For example, variables which are initialized but never used will probably be removed.

  • Optimizing execution speed of a generated executable can cause this executable to become larger. Some optimizations make the generated executable faster but also larger; this happens, for example, when function calls are expanded inline (see Chapter 8). The compiler cannot easily assess where footprint and speed optimization priorities lie; for this it needs to be instructed explicitly. More about these instructions can be found in the following section "A Closer Look at What Actually Happens."

Taking into account the dangers specified in the preceding list, it becomes apparent that it is a good idea to make sure a program actually works before turning on the optimization option (enabling it, for instance, only for the release build), especially when a high level of optimization is attempted. The optimized executable will, of course, have to be tested again to evaluate whether any functionality was damaged and whether the program still adheres to footprint and performance requirements.

A Closer Look at What Actually Happens

The compiler can offer substantial help in optimizing executable code, especially where performance of the code is concerned. This section walks through some examples of optimizations made by the compiler and shows exactly what the compiler has done. First, here is a list of all the optimization options for the G++ and GCC compilers:

Optimization Options GCC and G++
              -fcaller-saves -fcse-follow-jumps -fcse-skip-blocks
              -fdelayed-branch -felide-constructors
              -fexpensive-optimizations -ffast-math -ffloat-store
              -fforce-addr -fforce-mem -finline-functions
              -fkeep-inline-functions -fmemoize-lookups
              -fno-default-inline -fno-defer-pop
              -fno-function-cse -fno-inline -fno-peephole
              -fomit-frame-pointer -frerun-cse-after-loop
              -fschedule-insns -fschedule-insns2
              -fstrength-reduce -fthread-jumps -funroll-all-loops
              -funroll-loops -O -O2 -O3

This list can be obtained on UNIX via the command line

man gcc

This section presents in-depth examples of what happens when you use several different optimization options. For an explanation of all the options, you are referred to the UNIX man pages on gcc.

To illustrate the optimization steps of the compiler, this section presents several Assembly files generated by the g++ compiler. It is not actually necessary to understand the Assembly code line-by-line to appreciate what happens, as the general concepts will be explained in the accompanying text.

Listing 4.10 presents the example program which is used in this section.

Code Listing 4.10. Example Program for Compiler Optimization: test.cc
#include <stdlib.h>

int main (int argc, char *argv[])
{
  int i, j=0;

  for (i = 1; i <= 10; i++)
    j+=i;

  exit(j);
}

This program simply adds the numbers 1 through 10 in a loop and returns the result (55) as its exit status.

Listing 4.11 shows the Assembly code generated without any optimizations turned on (g++ -S test.cc).

Code Listing 4.11. Generated Assembly for Listing 4.10 Without Compiler Optimization: test.s (Not Optimized)
.file    "test.cc"
gcc2_compiled.:
___gnu_compiled_cplusplus:
    .def    ___main;    .scl    2;    .type    32;    .endef
.text
    .align 4
.globl _main
    .def    _main;    .scl    2;    .type    32;    .endef
_main:
    pushl %ebp
    movl %esp,%ebp
    subl $16,%esp
    call ___main
    movl $0,-8(%ebp)
    movl $1,-4(%ebp)
    .p2align 4,,7
L2:
    cmpl $10,-4(%ebp)
    jle L5
    jmp L3
    .p2align 4,,7
L5:
    movl -4(%ebp),%eax
    addl %eax,-8(%ebp)
L4:
    incl -4(%ebp)
    jmp L2
    .p2align 4,,7
L3:
    movl -8(%ebp),%eax
    pushl %eax
    call _exit
    addl $4,%esp
    xorl %eax,%eax
    jmp L1
    .p2align 4,,7
L1:
    movl %ebp,%esp
    popl %ebp
    ret
    .def    _exit;    .scl    3;    .type    32;    .endef

Listing 4.12 shows the Assembly code generated with optimization level -O2 turned on (g++ -O2 -S test.cc).

Code Listing 4.12. Generated Assembly for Listing 4.10 with Level 2 Compiler Optimization: test.s (Optimized with -O2) by gcc
.file    "test.cc"
gcc2_compiled.:
___gnu_compiled_cplusplus:
    .def    ___main;    .scl    2;    .type    32;    .endef
.text
    .align 4
.globl _main
    .def    _main;    .scl    2;    .type    32;    .endef
_main:
    pushl %ebp
    movl %esp,%ebp
    call ___main
    xorl %edx,%edx
    movl $1,%eax
    .p2align 4,,7
L5:
    addl %eax,%edx
    incl %eax
    cmpl $10,%eax
    jle L5
    pushl %edx
    call _exit
    .def    _exit;    .scl    3;  .type    32;    .endef

Even though this is a small, trivial program, you can already see substantial differences in efficiency and footprint. The optimized version of the example program is clearly much shorter and, when you look at the loop (the code between L2 and L3 in the first Assembly listing, and L5 in the second Assembly listing), you see a noticeable decrease in the number of compare instructions (cmp) and jumps (jmp, jle). In this example, the optimized code is, in fact, easier to read than the original code, but this is certainly not always the case. Also, note that -O2 takes care of several kinds of optimization in a single compilation.

More can be done with this example. There is a loop in the program and, having seen the available optimization options at the beginning of this section, you have undoubtedly spotted the option -funroll-loops.

Unrolling loops is a technique to make software faster by doing more steps inside a loop and doing fewer actual loop iterations. The following example demonstrates this principle:

Normal Loop                Unrolled Loop
execute loop ten times     execute loop two times
    step A                     step A
end loop                       step A
                               step A
                               step A
                               step A
                           end loop

Both loops perform step A ten times. The unrolled code is definitely larger, but it is also faster because the overhead incurred by looping around (checking the end condition and jumping back to the beginning) is performed less often, thus taking up a smaller percentage of loop-execution time (Chapter 8 describes loops in more detail). Listing 4.13 shows what unrolling loops can do for the example program (g++ -O2 -funroll-loops -S test.cc).

Code Listing 4.13. Generated Assembly for Listing 4.10 with Further Compiler Optimization by gcc: test.s (Optimized with -O2 -funroll-loops)
.file    "test.cc"
gcc2_compiled.:
___gnu_compiled_cplusplus:
    .def    ___main;    .scl    2;    .type    32;    .endef
.text
    .align 4
.globl _main
    .def    _main;    .scl    2;    .type    32;    .endef
_main:
    pushl %ebp
    movl %esp,%ebp
    call ___main
    pushl $55
    call _exit
    .def    _exit;    .scl    3;    .type    32;    .endef

Apparently, this was a very good choice. The generated file is now pretty small; notice that the loop has, in fact, completely disappeared! The only evidence remaining is the placement of the number 55 (the result of the now-eliminated loop) on the stack. This means the compiler performed the loop during compilation and placed only the result in the generated code. This is possible only because the result of the loop is known at compile time; there was, in fact, a hidden constant in the program.

It seems that using compiler optimizations can already have benefits before you even begin thinking long and hard about what to do. Playing around with the other optimization options found at the beginning of this section will give more insight into what the compiler is willing and able to do.
