Arithmetic Operations

Most programs perform arithmetic operations at some point. These can be as simple as a small addition or multiplication, or as complex as three-dimensional volume calculations. Whatever the case may be, arithmetic operations can often be broken down into several smaller, simpler steps involving a series of basic calculations and conversions. These steps are ideal candidates for optimization because they are generally executed repeatedly on large amounts of data, and most contain looped or recursive functionality. Many of the optimization techniques discussed in this book are well suited to arithmetic optimization: think, for instance, of rewriting recursive routines as iterative routines, using loop unrolling, creating tables of predetermined values, and simplifying mathematical equations. Because arithmetic is an important area for optimization, and one which in practice is often overlooked or implemented in a far from optimal form, the following sections give tips and examples on how to apply optimization techniques specifically to arithmetic operations.

Using Bit Operators for Arithmetic Operations

Interesting optimizations can be made by looking at bit representations of byte values. The example in Listing 13.1 demonstrates this. Often during a calculation it is important to know whether a number is a multiple of another number—that is, whether it can be divided by another number without any remainder. For this, the % operator (modulo operator) is often used as shown in Listing 13.1.

Code Listing 13.1. Counting Multiples
int i, k;
long count = 0;

for (k = 0; k < DATALEN; k++)
    for (i = 1; i < 256; i <<= 1)
    {
        if ((data[k] % i) == 0)
            count++;
    }

The example in Listing 13.1 checks, for each byte in a data block, whether it is a multiple of 1, 2, 4, 8,…128. Because this example checks for very specific (but frequently occurring) numbers, it can be optimized with a fast and simple bitwise check, as shown in Listing 13.2.

Code Listing 13.2. Bitwise Counting Multiples
int i, k;
long count = 0;

for (k = 0; k < DATALEN; k++)
    for (i = 1; i < 256; i <<= 1)
    {
        // data[k] is a multiple of i when no bits of i-1 are set in it.
        if ((data[k] & (i-1)) == 0)
            count++;
    }

The example in Listing 13.2 avoids the use of the time-consuming % operator by making use of the knowledge that when a number is a multiple of a power of two x, none of the bits that represent x-1 are set in it. This means, for instance, that a number is a multiple of eight when its lowest three bits are not set (the binary representation of eight is 1000, the binary representation of seven is 0111). The result of using a bitwise & instead of the % operator is that the example routine becomes up to five times faster. A program that demonstrates this can be found in the file 13Source01.cpp on the Web site.
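This trick is easily captured in a small helper function. The following is a minimal sketch (the name IsMultipleOfPow2 is ours, not taken from the example sources); note that it is only valid when the divisor is a power of two:

// Returns nonzero when val is a multiple of pow2.
// Only valid when pow2 is a power of two!
inline int IsMultipleOfPow2(unsigned int val, unsigned int pow2)
{
    return ((val & (pow2 - 1)) == 0);
}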

More examples of bit operator use can be found in the listings of the following sections.

Identifying Static Parts in Calculations and Conversions

Part of optimizing is making sure that no work is done twice. Information that is static, and therefore does not change between different uses of a certain piece of code, can be precalculated at program startup or perhaps even at compile time. This means time is saved whenever this information is used. This can be done for intermediate or even end results of calculations. Refer to Chapters 7, "Basic Programming Statements," and 11, "Storage Structures," for more details on using precalculated values. The following sections show examples of calculations that contain, perhaps unexpectedly, static parts, and how they can be optimized.

Determining the Number of Bits Set in a Data Block

A routine that determines the parity of a data block is a good example of a calculation containing static parts. The parity of a data block depends on the number of bits in that block that are set (a bit is set when it has the value 1). For instance, the number of bits set in a byte with the value 9 is 2 (the binary representation of 9 is 1001).

The most straightforward way to determine the number of bits set in a data block is to check the bits of every byte in the block and increase a counter for every bit that is set. Listing 13.3 shows an example of how this could be implemented.

Code Listing 13.3. Using Two for Loops to Count Set Bits in a Data Block
int i, k;
long count = 0;

for (k = 0; k < DATALEN; k++)
    for (i = 1; i < 256; i <<= 1)
    {
        if (data[k] & i)
            count++;
    }

Upon loop termination, the variable count in Listing 13.3 contains the number of set bits in the data block data. Notice that a bitwise AND is used together with the mask i to determine whether a particular bit is set.

This function can be sped up significantly when the inner for loop is eliminated. Unrolling this loop will take care of that (refer to Chapter 7 for more details on loop optimization). Listing 13.4 shows a bit-counting function that uses only a single loop.

Code Listing 13.4. Using One for Loop to Count Set Bits in a Data Block
int k;
long count = 0;

for (k = 0; k < DATALEN; k++)
{
    count += ((data[k] & 128) >> 7) +
             ((data[k] &  64) >> 6) +
             ((data[k] &  32) >> 5) +
             ((data[k] &  16) >> 4) +
             ((data[k] &   8) >> 3) +
             ((data[k] &   4) >> 2) +
             ((data[k] &   2) >> 1) +
             ((data[k] &   1));
}

As with Listing 13.3, the variable count in Listing 13.4 contains the number of set bits in the data block data upon loop termination. Notice that the inner loop of Listing 13.3 has been transformed into eight fixed bitwise AND operations. The results of these AND operations are shifted right with the >> operator, so each bit that is set increments count by exactly one.

Counting set bits with the code in Listing 13.4 is up to five times faster than counting set bits with the code in Listing 13.3. Loop unrolling is clearly a valuable calculation optimization here; it allows you to process five times as much data per second while collecting parity information. But this code can be made faster still. When you analyze the problem domain carefully, you will see that static information is being gathered dynamically. The static information is the number of bits set in a byte with a certain value. For instance, a byte with the value 1 will always have only a single bit set (the binary representation of 1 is 0001). Similarly, each byte with the value 5 has two bits set (the binary representation of 5 is 0101). It is possible, therefore, to create a predetermined list containing the number of bits set for each value a byte can have. This list contains 256 values and can be used as shown in Listing 13.5.

Code Listing 13.5. Using a Lookup Table to Count Set Bits in a Data Block
//                                 0 1 2 3 4 5 6 7 8
unsigned char lookupTable[256] = { 0,1,1,2,1,2,2,3,1, /* ...remaining entries... */ };

int i;
long count = 0;

for (i = 0; i < DATALEN; i++)
{
    count += lookupTable[data[i]];
}

In Listing 13.5, each byte of the data block is used as an index into the lookupTable in order to determine the number of bits set for its value. For instance, a data byte with the value 8 will cause the number 1 to be retrieved from the lookup table (the binary representation of 8 is 1000). The code in Listing 13.5 is up to four times faster than that of Listing 13.4. The file 13Source01.cpp contains a program that compares the speeds of the bit count solutions presented in Listings 13.3, 13.4, and 13.5. Table 13.1 shows the results of this program.

Table 13.1. Results of Counting the Number of Bits Set in a Block of 8192 Bytes
Listing                 Time
Two loops (13.3)        6210
One loop (13.4)         1160
No loop (13.5)           320

The final solution, put forward in Listing 13.5, is up to twenty times faster than the first solution, put forward in Listing 13.3. Clearly it pays to spend some time determining the static parts of calculations and how to handle them. A remaining question is whether the static allocation of a 256-byte array can be justified by the performance gained. On most systems it can (even on most memory-tight embedded systems). Notice, moreover, that transforming the solution of Listing 13.4 into a macro and using it a few times in a source file will already take up more than 256 bytes.
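If writing out all 256 table entries by hand seems error-prone, the table can also be generated once at program startup, trading a little initialization time for the static data. A minimal sketch (the function InitBitCountTable is ours, not taken from 13Source01.cpp):

unsigned char lookupTable[256];

// Fill lookupTable so that lookupTable[v] holds the number of
// bits set in the byte value v. Call this once at startup.
void InitBitCountTable()
{
    for (int v = 0; v < 256; v++)
    {
        int bits = 0;
        for (int mask = 1; mask < 256; mask <<= 1)
            if (v & mask)
                bits++;
        lookupTable[v] = (unsigned char)bits;
    }
}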

Converting Byte Values to ASCII

Often programs need to convert binary data into text strings for output. As you may expect, conversions contain a high level of static information. A conversion often used by programmers is that of binary data into hexadecimal text strings. Listing 13.6 shows the kind of conversion functionality that is often used for this purpose.

Code Listing 13.6. Standard Conversion of Byte Values to Hex String
inline void Nibble2Ascii(int nibble, char &c)
{
    if (nibble < 10)
        c = (char)(0x30 + nibble);   // '0'-'9'
    else
        c = (char)(0x37 + nibble);   // 'A'-'F'
}

void HexStr1(char *s, int val)
{
    Nibble2Ascii(val >> 4, s[0]);
    Nibble2Ascii(val % 16, s[1]);
    s[2] = 0x0;
}

When you look for static information in binary-to-hex string conversions, you will notice that the text representation of each nibble is predetermined and does not have to be calculated at all. Once again a simple array with predetermined answers will do nicely. Listing 13.7 demonstrates a conversion routine that is more than twice as fast as that of Listing 13.6.

Code Listing 13.7. Optimized Conversion of Byte Values to Hex String
char HexString[] = "0123456789ABCDEF";

void HexStr2(char *s, int val)
{
    s[0] = HexString[val >> 4];
    s[1] = HexString[val & 0xf];
    s[2] = '\0';
}

The same can be done for conversions of byte values to decimal strings. It is of course possible to convert each byte individually using a library function such as itoa() (widely available, though not part of standard C/C++):

// Convert byte j to decimal.
char s[4];
itoa( j, s, 10 );

But when large blocks of bytes need to be converted or when conversion is performed often, you may opt to build a two-dimensional array to store all the possible answers and simply select a pointer to the correct answer string on conversion.

char decLookup[256][4];    // fill this at some point
...
// Convert byte j to decimal.
char *s = decLookup[j];
This way each byte value is converted only once, and after that each following conversion is suddenly over forty times faster. The footprint consideration here is whether to place a static array of values in a source file, in which case the runtime footprint as well as the storage footprint increases with the size of the table (refer to Chapter 1, "Optimizing: What Is It All About?" for more details on footprint terminology), or to create the array dynamically at runtime. If you choose the second option, do not be fooled into thinking that the footprint of the table can be optimized by using 256 pointers to variable-length strings in order to save some bytes on the first 99 entries. As it is, the table is 4*256 bytes large, which is also the size of an array of 256 pointers on a system with 32-bit pointers.
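One way to fill the table once at startup is sketched below, using the standard sprintf function (the name InitDecLookup is ours, not taken from the example sources):

#include <stdio.h>

char decLookup[256][4];

// Fill decLookup so that decLookup[v] holds the decimal string
// for the byte value v ("0" through "255"). Call this once.
void InitDecLookup()
{
    for (int v = 0; v < 256; v++)
        sprintf(decLookup[v], "%d", v);
}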

The file 13Source01.cpp on the Web site contains a program that compares conversion speeds of several conversion methods. It also shows how standard C/C++ functions like toupper() and tolower() can be optimized in a similar way.
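A tolower() replacement, for instance, can trade the function call for a single table lookup. The sketch below shows the general idea (the names lowerTable, InitLowerTable, and FastToLower are ours; the actual implementation used for the measurements is in 13Source01.cpp):

#include <ctype.h>

unsigned char lowerTable[256];

// Precalculate the lowercase equivalent of every byte value.
void InitLowerTable()
{
    for (int v = 0; v < 256; v++)
        lowerTable[v] = (unsigned char)tolower(v);
}

// Lookup-table replacement for tolower().
inline int FastToLower(unsigned char c)
{
    return lowerTable[c];
}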

Optimizing Mathematical Equations

When calculations contain no static parts to optimize, think about optimizing the mathematical equations instead. This can yield very good results. The following example demonstrates this. Let's say you need to determine the sum of a range of numbers that increase with a given step size. Examples of such ranges are 0,2,4,6,8 (step size = 2) and 5,10,15,20 (step size = 5). The most straightforward way to do this is to add all the numbers one by one. Listing 13.8 shows an example of how this could be implemented.

Code Listing 13.8. Standard Determination of the Sum of a Range
int k;
long count = 0;

for (k = 0; k < DATALEN; k++)
    count += data[k];

Again, however, the few things known about the data can be used to create a better algorithm. The information you have is the length of the data block and the fact that the range contains numbers that increase with a given step size. Given this, the following formula can be derived for the sum:

sum = n * b + (n * s * (n - 1)) / 2

b = begin value
s = step value
n = number of data elements
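As a quick check, take the example range 5, 10, 15, 20 given earlier: with b = 5, s = 5, and n = 4, the formula yields sum = 4 * 5 + (4 * 5 * 3) / 2 = 20 + 30 = 50, which indeed equals 5 + 10 + 15 + 20.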

Listing 13.9 shows how this formula can be implemented to determine the sum of the range.

Code Listing 13.9. Determining the Sum of a Range of Numbers that Increase with a Given Step Size
long step, count;

// Derive the step size from the first two elements of the range.
step = data[1] - data[0];

count = DATALEN * data[0] +
        ((DATALEN * step * (DATALEN - 1)) >> 1);

The variable step in Listing 13.9 is only needed when you want to use a non-fixed step size that is dynamically determined by looking at the range itself.

Note that the advantage here is that determining the sum went from an O(n) algorithm (Listing 13.8) to an O(1) algorithm (Listing 13.9); refer to Chapter 5, "Measuring Time and Complexity," for more details. This means that, for Listing 13.9, the amount of time needed to determine the sum is not related to the size of the data block. It will not surprise you that this solution is therefore thousands of times faster for large blocks. The file 13Source01.cpp on the Web site contains a program that measures the performance difference between the solutions of Listings 13.8 and 13.9.

Note that a good math book can go a long way in helping you find and optimize mathematical functions.
