How it works...

The most common optimization goal for C++ code is execution speed. To optimize for speed, we develop several approaches to the same problem and then benchmark each one to determine which executes the fastest. Benchmarking tools such as Hayai, a C++ benchmarking library available on GitHub, help make this determination. To explain this, let's look at a simple example:

#include <string>
#include <vector>
#include <hayai.hpp>

std::vector<std::string> data;

// BENCHMARK(group name, benchmark name, runs, iterations per run)
BENCHMARK(vector, push_back, 10, 100)
{
    data.push_back("The answer is: 42");
}

BENCHMARK(vector, emplace_back, 10, 100)
{
    data.emplace_back("The answer is: 42");
}

When we execute the preceding code, we get the following output:

In the preceding example, we use the Hayai library to benchmark the performance difference between adding a string to a vector using push_back() versus emplace_back(). With push_back(), an object is first constructed and then either copied or moved into the vector. With emplace_back(), the object is constructed in place inside the vector, so no temporary object and no subsequent copy or move is needed. As expected, emplace_back() outperforms push_back(), which is why tools such as Clang-Tidy recommend the use of emplace_back() over push_back() whenever possible.
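To make the difference between the two calls visible without a benchmark, the following minimal sketch counts constructor calls. The traced type, the tally counter, and the two count_* functions are hypothetical names introduced only for this illustration; they are not part of Hayai or the recipe's source code.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical counters, introduced only for this illustration.
struct counts
{
    int constructed = 0;  // built directly from a C string
    int copied = 0;       // copy constructor calls
    int moved = 0;        // move constructor calls
};

counts tally;

// Hypothetical tracing type that records how each instance comes to exist.
struct traced
{
    std::string value;

    traced(const char *s) : value(s) { ++tally.constructed; }
    traced(const traced &other) : value(other.value) { ++tally.copied; }
    traced(traced &&other) noexcept : value(std::move(other.value)) { ++tally.moved; }
};

counts count_push_back()
{
    tally = counts{};
    std::vector<traced> v;
    v.reserve(1);                      // no reallocation, so only the insertion is counted
    v.push_back("The answer is: 42");  // builds a temporary, then moves it into the vector
    return tally;
}

counts count_emplace_back()
{
    tally = counts{};
    std::vector<traced> v;
    v.reserve(1);
    v.emplace_back("The answer is: 42");  // builds the element in place
    return tally;
}
```

Under these assumptions, count_push_back() records one construction plus one move, while count_emplace_back() records a single construction. Whether that extra move is measurable in practice depends on the type and the compiler, which is what the benchmark is for.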

Benchmark libraries such as Hayai are simple to use and extremely effective at helping the programmer optimize source code, and they can benchmark resource usage as well as speed. The problem with these libraries is that they are best leveraged at the unit level rather than at the integration and system level: they do not scale well as the size of the test increases, so they are poorly suited to testing an entire executable. To analyze an entire executable rather than a single function, tools such as Valgrind exist, which help you profile which functions need the most attention with respect to optimizations. From there, a benchmarking tool can be used to analyze the functions that need the most attention.
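To illustrate what a unit-level benchmark does under the hood, here is a hand-rolled sketch built on std::chrono. It is not Hayai's implementation: mean_ns_per_run, timings, and compare_insertions are hypothetical names, and a real benchmarking library adds warm-up, statistics, and reporting that this sketch omits.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: executes `body` `iterations` times per run for `runs`
// runs and returns the mean wall-clock time of one run in nanoseconds.
template <typename F>
double mean_ns_per_run(std::size_t runs, std::size_t iterations, F &&body)
{
    using clock = std::chrono::steady_clock;

    if (runs == 0) {
        return 0.0;
    }

    std::chrono::nanoseconds total{0};
    for (std::size_t r = 0; r < runs; ++r) {
        const auto start = clock::now();
        for (std::size_t i = 0; i < iterations; ++i) {
            body();
        }
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
    }
    return static_cast<double>(total.count()) / static_cast<double>(runs);
}

// Hypothetical result type for this sketch.
struct timings
{
    double push_back_ns;
    double emplace_back_ns;
    std::size_t elements;
};

// Times both insertion styles with 10 runs of 100 iterations each,
// mirroring the parameters used in the Hayai example.
timings compare_insertions()
{
    std::vector<std::string> data;
    timings t{};
    t.push_back_ns = mean_ns_per_run(10, 100, [&] { data.push_back("The answer is: 42"); });
    t.emplace_back_ns = mean_ns_per_run(10, 100, [&] { data.emplace_back("The answer is: 42"); });
    t.elements = data.size();
    return t;
}
```

Because both loops append to the same vector, reallocations fall unevenly between the two measurements, so the numbers from this sketch are only indicative. That kind of pitfall is one reason to prefer a dedicated benchmarking library over a hand-written timer.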

Valgrind is a dynamic analysis tool that's capable of detecting memory leaks and tracing the execution of a program. To see this in action, let's look at the following example:

volatile int data = 0;

void foo()
{
    data++;
}

int main()
{
    for (auto i = 0; i < 100000; i++) {
        foo();
    }
}

In the preceding example, we increment a global variable (marked volatile so that the compiler does not optimize it away) from a function named foo() and then execute this function 100,000 times. To analyze this example, run the following (which uses callgrind to report the number of instructions executed in each function of your program):

> valgrind --tool=callgrind ./recipe01_example02
> callgrind_annotate callgrind.out.*

This results in the following output:

As we can see, the foo() function is listed near the top of the preceding output (the dynamic linker's _dl_lookup_symbol_x() function, which is used to link the program prior to execution, ranks highest). Note that callgrind_annotate lists (on the left-hand side) the total number of instructions for the foo() function as 800,000. This is because the foo() function is 8 assembly instructions long and is executed 100,000 times. For example, let's look at the assembly of the foo() function using objdump (a tool capable of outputting the compiled assembly of an executable), as follows:

Using Valgrind, it is possible to profile an executable to determine which functions take the longest to execute. For example, let's look at ls:

> valgrind --tool=callgrind ls
> callgrind_annotate callgrind.out.*

This results in the following output:

As we can see, the strcmp function is called frequently. This information can be combined with benchmarking APIs at the unit level to determine whether a faster version of strcmp can be written (for example, using handwritten assembly and special CPU instructions). Using tools such as Hayai and Valgrind, it is possible to isolate which functions in your program consume the most CPU, memory, and even power, and to rewrite them for better performance while focusing your efforts on the optimizations that will provide the best return on investment.
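Before benchmarking a replacement for a hot function such as strcmp, it is worth confirming that the candidate behaves the same as the original. The following sketch assumes a hypothetical candidate_strcmp (a plain byte-wise loop, not an optimized implementation) and a hypothetical agrees_with_std helper that compares it against std::strcmp.

```cpp
#include <cstring>

// Hypothetical candidate replacement: a plain byte-wise comparison.
// A real attempt would substitute an optimized implementation here.
int candidate_strcmp(const char *lhs, const char *rhs)
{
    while (*lhs != '\0' && *lhs == *rhs) {
        ++lhs;
        ++rhs;
    }
    // Compare as unsigned char, as the C standard specifies for strcmp.
    return static_cast<unsigned char>(*lhs) - static_cast<unsigned char>(*rhs);
}

// Reduces a comparison result to -1, 0, or 1.
int sign(int v)
{
    return (v > 0) - (v < 0);
}

// std::strcmp only guarantees the sign of its result, so only signs are compared.
bool agrees_with_std(const char *lhs, const char *rhs)
{
    return sign(candidate_strcmp(lhs, rhs)) == sign(std::strcmp(lhs, rhs));
}
```

Once a candidate agrees with std::strcmp on a representative set of inputs, both versions can be placed in unit-level benchmarks to see whether the rewrite is actually faster.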
