Variable Base Types

The standard C/C++ variables consist of base types (char, short, int, and so on), which can be grouped together to form user-defined types (struct, union, bitfield, and so on). This chapter looks at base types and grouped types separately, starting with the base types.

Size and Range of Variables

Although most compilers these days use the same memory mapping of base types, this is still not always the case. Furthermore, not all programmers are always fully aware of the implications of choosing a certain base type. This section therefore takes a closer look at the size—number of bytes—that each variable type takes up and the range of values they can contain.

Determining the size of a variable type is easily done with the C/C++ sizeof operator. Although the range of a variable can be deduced from its size, for some types (such as float or double) it is best to consult the relevant compiler documentation. Table 6.1 presents the sizes and ranges of variable types for the two example environments of this book.

Table 6.1. Variable Sizes and Ranges
Base Type             W/D     U/G     Range (Signed)                                            Range (Unsigned)
                      (Size in Bytes)
char                  1       1       -128 to 127                                               0 to 255
short                 2       2       -32,768 to 32,767                                         0 to 65,535
short int             2       2       -32,768 to 32,767                                         0 to 65,535
long                  4       4       -2,147,483,648 to 2,147,483,647                           0 to 4,294,967,295
int                   4       4       -2,147,483,648 to 2,147,483,647                           0 to 4,294,967,295
float                 4       4       3.4E ± 38 (7 digits)                                      ..
double                8       8       1.7E ± 308 (15 digits)                                    ..
long double           8       8       *1.2E ± 4932 (19 digits)*                                 ..
struct                1       1       ..                                                        ..
bit field             4       4       ..                                                        ..
union                 1       1       ..                                                        ..
long long/__int64     8       8       -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807   0 to 18,446,744,073,709,551,615

The first column of sizes is valid for Windows with Developer Studio (W/D). Some documentation for this compiler claims the long double to be ten bytes in size instead of eight, in which case the larger range would be valid. A 64-bit integer is called __int64 by Developer Studio.

The second column of sizes is valid for UNIX and Linux with the GNU compiler (U/G). With the GNU compiler the 64-bit integer is called a long long.

Careful readers might notice the struct, bit field, and union types appearing in this list. These are included because the smallest possible size of these types is not always apparent. It is possible to determine the sizes of these types by checking an instance containing the smallest possible element, for example:

struct Small    { char c; };
struct Bitfield { unsigned bit:1; };
union  Un       { char a; };

These types are discussed in more detail at the end of this chapter.

When you are using a target environment other than the two dealt with above, it is a good idea to run some size tests. A function that does just that is included in the book tools on the Web site: BaseTypeSizes() (see Chapter 5, "Measuring Time and Complexity"). To prevent cryptic descriptions of types, the remainder of this book refers to the variable types as they are specified in the U/G column. So, when a reference is made to a 64-bit integer, the type name long long is used.
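As an illustration of such a size test, here is a minimal sketch. It is not the BaseTypeSizes() function from the book tools, which is not reproduced in this section; the names DescribeType() and ReportTypeSizes() are hypothetical, and the sketch assumes a compiler that supports C++11 so that std::numeric_limits<T>::lowest() is available.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Smallest possible instances of the grouped types, as in the text.
struct Small    { char c; };
struct Bitfield { unsigned bit : 1; };
union  Un       { char a; };

// Builds one report line: type name, size in bytes, lowest and highest value.
// Unary + promotes char-sized types so they print as numbers, not characters.
template <typename T>
std::string DescribeType(const char *name)
{
    std::ostringstream out;
    out << name << ": " << sizeof(T) << " byte(s), "
        << +std::numeric_limits<T>::lowest() << " to "
        << +std::numeric_limits<T>::max();
    return out.str();
}

// Hypothetical stand-in for BaseTypeSizes(); returns the report as text so
// the caller decides where to print it.
std::string ReportTypeSizes()
{
    std::ostringstream out;
    out << DescribeType<signed char>("signed char") << '\n'
        << DescribeType<short>("short") << '\n'
        << DescribeType<int>("int") << '\n'
        << DescribeType<long>("long") << '\n'
        << DescribeType<long long>("long long") << '\n'
        << DescribeType<float>("float") << '\n'
        << DescribeType<double>("double") << '\n'
        << DescribeType<long double>("long double") << '\n'
        << "struct Small: " << sizeof(Small) << " byte(s)\n"
        << "struct Bitfield: " << sizeof(Bitfield) << " byte(s)\n"
        << "union Un: " << sizeof(Un) << " byte(s)\n";
    return out.str();
}
```

Printing the result of ReportTypeSizes() on a new target gives a quick check against Table 6.1. The values it reports are whatever the compiler at hand uses, so they may differ from the table.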

After the sizes and ranges of variable types have been determined, implementers select the variable types that best fit their variables. For the more visible variables, the design should aid in this selection. Examples include elements that occur as database fields (the number of characters possible in a name; the maximum value of a ZIP code or postal box number), global definitions (number of possible application states, range of error codes), and return types of interface functions. Always consider whether a variable may contain only positive values or both negative and positive values. As shown earlier in Table 6.1, the positive range of a type is halved when negative values are also allowed. For strictly positive variables use the unsigned keyword:

unsigned short bigAl;    // 0 to 65,535
signed short littleAl;   // -32,768 to 32,767
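Whether a design value fits a candidate type can also be checked in code. The sketch below is one way to do that; the helper FitsIn() is a hypothetical name introduced for illustration, and it deliberately excludes unsigned long long because its maximum cannot be represented in the long long parameter.

```cpp
#include <limits>
#include <type_traits>

// Returns true when 'value' lies within the range of integer type T.
// The value is passed as long long so that every supported T fits in it.
template <typename T>
constexpr bool FitsIn(long long value)
{
    static_assert(std::is_integral<T>::value, "FitsIn is for integer types");
    static_assert(std::is_signed<T>::value || sizeof(T) < sizeof(long long),
                  "unsigned long long is out of scope for this sketch");
    return value >= static_cast<long long>(std::numeric_limits<T>::min()) &&
           value <= static_cast<long long>(std::numeric_limits<T>::max());
}
```

With the sizes of Table 6.1, FitsIn<unsigned short>(65535) holds but FitsIn<short>(65535) does not, and a five-digit ZIP code of up to 99,999 needs an int because FitsIn<unsigned short>(99999) is false.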

Performance of Variables

When choosing between base type variables, it also makes sense to think about the performance implications of using the different types. When you run the three functions of Listing 6.1 (Float(), Double(), and Int()) in the timing test of Chapter 5, the reason becomes quite clear.

Code Listing 6.1. Performance of Base Types
long Float()
{
    long i;
    float j = 10.31, k = 3.0;

    for (i = 0 ; i < 10000000; i++)
    {
      j *= k;
    }
    return j;
}

long Double()
{
    long i;
    double j = 10.31, k = 3.0;

    for (i = 0 ; i < 10000000; i++)
    {
      j *= k;
    }
    return j;
}

long Int()
{
    long i;
    int j = 1031, k=3;

    for (i = 0 ; i < 10000000; i++)
    {
        j *= k;
    }
    return j;
}
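The timing test of Chapter 5 is not reproduced in this section. As a stand-in, the following sketch shows one possible way to time such loops with std::chrono. TimeMs() and AddLoop() are hypothetical names; the loop uses addition with small values rather than the multiplication of Listing 6.1 so that the result stays within range, and the accumulator is volatile only to keep an optimizing compiler from removing the loop.

```cpp
#include <chrono>

// Runs 'work' once and returns the elapsed wall-clock time in milliseconds.
template <typename Func>
double TimeMs(Func work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Repeats one arithmetic statement on a value of type T.
template <typename T>
T AddLoop(T start, T step, long iterations)
{
    volatile T j = start;   // volatile: every iteration really happens
    for (long i = 0; i < iterations; i++)
    {
        j = j + step;
    }
    return j;
}
```

Calling TimeMs() with a lambda that runs AddLoop<int>, AddLoop<float>, or AddLoop<double> for the same number of iterations gives comparable figures for the three base types. The absolute numbers depend on the compiler, its optimization settings, and the processor, so they should be measured on the actual target.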

Although the difference in speed between multiplying floats and doubles is negligible, using integers is many times faster. So it makes sense to use integers where possible and simply adjust the meaning of the value to accommodate the use of fractions. In the Int() example, effectively multiplying the variable j by 100 removed the need for a decimal point. Of course, the decimal point may be needed again (when generating application output, for instance), and in that case the value of j has to be converted. However, the idea is that this conversion will only have to be done a few times (possibly only once), whereas the rest of the application can use much faster calculations. It is still necessary to examine other calculations, though. Consider the following alternatives, each substituted in turn for the arithmetic statement in the three functions:

j *= k;         // original statement.
j -= k;         // alternative 1.
j += k;         // alternative 2.
j /= k;         // alternative 3.

Notice that the integer is a speed demon in all cases but one: the division. This is because the integer is converted several times during a division. So, to judge whether to replace floats and doubles with integers, an implementer must determine how often the different arithmetic operations are likely to be used. An application that uses a certain floating-point variable mostly in multiplication, addition, and subtraction is thus an ideal candidate for integer replacement.
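The idea of adjusting the meaning of an integer value can be sketched as follows. The sketch assumes that two decimal places are sufficient, so every value is stored multiplied by 100, as in the Int() example where 10.31 becomes 1031. The names kScale, ToScaled(), and ToText() are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Assumption: two decimal places, so values are stored multiplied by 100.
const long kScale = 100;

// Builds a scaled value from its whole and fractional parts: (10, 31) -> 1031.
long ToScaled(long whole, long hundredths)
{
    return whole * kScale + hundredths;
}

// Converts a scaled value back to text; needed only when output is generated.
std::string ToText(long scaled)
{
    const char *sign = scaled < 0 ? "-" : "";
    const long magnitude = scaled < 0 ? -scaled : scaled;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%s%ld.%02ld",
                  sign, magnitude / kScale, magnitude % kScale);
    return buf;
}
```

All intermediate arithmetic stays in integers: ToScaled(10, 31) * 3 yields 3093, and only ToText(3093) turns it into "30.93" for display. Multiplying two scaled values would require dividing the product by kScale once, which is exactly the kind of division the timing results warn about.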

Scope of Variables and Objects

When using variables and object instances, be aware of their scope. The scope of a variable or object has implications for both footprint and performance. This section discusses variable and object scope in three important contexts: lifetime, initialization, and use.

Lifetime

Generally, a variable is created within its specified scope and destroyed when that scope ends. So it makes sense to ensure that no unnecessary overhead is incurred for creating a variable. Listing 6.2 shows some examples.

Code Listing 6.2. Variable Scope Example 1
void Scope1()
{
    for (int a1 = 0; a1 < 10; a1++)
    {
        char arr[] = "pretend this is a long string";
        int b2 = (int)arr[a1];
    }
}

Apart from not actually doing anything, it seems as if this example also contains a few serious scoping flaws. Integer a1 is defined in a for statement header, whereas array arr and integer b2 are defined inside the scope of a for loop. It would appear that at least arr and b2 generate overhead by being created and destroyed for each iteration of the loop. However, the setup of this loop is pretty straightforward, and most C/C++ compilers will assign stack space to each of the three variables only once. (See Chapters 4, "Tools and Languages," and 8, "Functions," for more information about stack space for variables.) So the generated code should be no different from that shown in Listing 6.3.

Code Listing 6.3. Variable Scope Example 2
void Scope2()
{
    int a1, b2;
    char arr[] = "pretend this is a long string";

    for (a1 = 0; a1 < 10; a1++)
    {
        b2 = (int)arr[a1];
    }
}

However, when you make things a little more complicated, you will see that it makes sense to try to declare variables and objects only once. What if the compiler is unable to predefine a variable outside of its scope? This situation occurs when you allocate variables dynamically or need to call a constructor to create a type, as shown in Listing 6.4.

Code Listing 6.4. Variable Scope Example 3
void Scope3()
{
    for (int a1 = 0; a1 < 10; a1++)
    {
        char *arr = new char[20];
        Combi *b2 = new Combi;
        b2->a = (char) a1;

        delete b2;          // heap memory is released again
        delete [] arr;      // in every iteration
    }
}

In this case, variable b2 seems to be a structure or a class, and array arr is now also dynamically created. This generates a lot of overhead because reserving and releasing heap memory occurs in every loop iteration. This sort of overhead can be avoided in situations where it is possible to reuse object instances. In those cases objects such as b2 and arr are instantiated once, outside the loop, and they are only used by statements inside the loop body. Obviously, when objects are dynamically created inside a loop to be stored away for later use (filling an array, list, or database) you have no choice but to create different instances.
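The reuse described above can be sketched as follows. The definition of Combi is an assumption: the text only shows that it has a member named a. The function name Scope3Reuse() and the running sum are likewise added for illustration, so that the loop has an observable result.

```cpp
// Assumed layout; the text only shows that Combi has a member 'a'.
struct Combi
{
    char a;
};

// Variant of Scope3() that allocates once and reuses both objects.
int Scope3Reuse()
{
    char  *arr = new char[20];   // reserved once, before the loop
    Combi *b2  = new Combi;
    int sum = 0;

    for (int a1 = 0; a1 < 10; a1++)
    {
        b2->a = (char) a1;       // the loop body only uses the instances
        arr[a1] = b2->a;
        sum += arr[a1];
    }

    delete b2;                   // released once, after the loop
    delete [] arr;
    return sum;
}
```

This version performs two allocations and two releases in total, where a loop that allocates in its body performs twenty of each for the same ten iterations.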

Listing 6.4 is shown because, although wasted dynamic creations can be pretty apparent in small loops, in larger and more complex algorithms they are not. So it is good standard programming practice to examine the use of dynamic objects in limited scopes such as loops and even complete functions. Implementers often use dynamically allocated variables there just to hold temporary values, perhaps retrieved from a list or a database.

Initialization

A problem similar to the one described in the previous section can occur when variables or objects are initialized with function results. Consider the example shown in Listing 6.5.

Code Listing 6.5. Initialization Example 1
void IncreaseAllCounters(DB *baseOne)
{
    for (int i = 0; i < baseOne->GetSize(); i++)
    {
        DBElem *pElem = baseOne->NextElem();
        ~

This piece of code iterates over all the elements of object baseOne. To make sure no element is missed, the function GetSize() is called to determine the number of elements in baseOne. Because GetSize() is called in the header of the for statement, a lot of needless overhead is incurred: the loop condition is evaluated before every iteration, so GetSize() is called once per element plus once more for the final test that ends the loop. A single call would have sufficed, as shown in Listing 6.6.

Code Listing 6.6. Initialization Example 2
void IncreaseAllCounters(DB *baseOne)
{
    int bSize = baseOne->GetSize();

    for (int i = 0; i < bSize; i++)
    {
        DBElem *pElem = baseOne->NextElem();
        ~

Again, this is a simple example, but not that many programmers take the time to determine which information is static for an algorithm and calculate or retrieve it beforehand. It seems that the more complex algorithms become, the less time is spent on these kinds of matters. This is ironic because these are exactly the kind of algorithms that implementers end up trying to optimize when the application proves to be too slow.
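The difference between the two listings can be made visible by counting the calls. The DB and DBElem classes below are hypothetical stand-ins, because the real ones are not shown in the text; only the names GetSize() and NextElem() come from the listings, and the call counter exists purely for this demonstration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type with one counter to increase.
struct DBElem
{
    int counter = 0;
};

// Hypothetical database that records how often GetSize() is called.
class DB
{
public:
    explicit DB(int n) : elems(n) {}

    int GetSize() const
    {
        ++sizeCalls;
        return static_cast<int>(elems.size());
    }

    DBElem *NextElem()
    {
        DBElem *e = &elems[next];
        next = (next + 1) % elems.size();   // wrap around after the last one
        return e;
    }

    int SizeCalls() const { return sizeCalls; }

private:
    std::vector<DBElem> elems;
    std::size_t next = 0;
    mutable int sizeCalls = 0;
};

// Pattern of Listing 6.5: GetSize() sits in the loop header.
void IncreaseInHeader(DB *baseOne)
{
    for (int i = 0; i < baseOne->GetSize(); i++)
    {
        baseOne->NextElem()->counter++;
    }
}

// Pattern of Listing 6.6: the size is retrieved once, before the loop.
void IncreaseHoisted(DB *baseOne)
{
    int bSize = baseOne->GetSize();
    for (int i = 0; i < bSize; i++)
    {
        baseOne->NextElem()->counter++;
    }
}
```

For a DB with five elements, IncreaseInHeader() leaves SizeCalls() at 6 and IncreaseHoisted() leaves it at 1. How much time this saves depends on what GetSize() has to do; a compiler can only remove the repeated calls itself when it can see that the result does not change.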

Use

A slowdown similar to the one described in the previous section can occur with the use of variables. Listing 6.7 demonstrates how not to use member variables.

Code Listing 6.7. Using Member Variables Example 1
class DB
{
    ~
    int dbSize;
    int i;
    ~
};

void DB::IncreaseAllCounters(int addedValue)
{

    for (i = 0; i < dbSize; i++)
    {
        ~

The variables i and dbSize are member variables of the DB class. They can, of course, be used as shown in Listing 6.7, but accessing member variables can easily take twice as long as accessing local variables. This is because the this pointer has to be used to retrieve the base address of a member variable. A better way to use this information might be to create local variables and initialize them (once!) with member variable values, as shown in Listing 6.8.

Code Listing 6.8. Using Member Variables Example 2
void DB::IncreaseAllCounters(int addedValue)
{
    int iSize = dbSize; // use iSize in the rest of the function.
    ~

    for (i = 0; i < iSize;  i++)
    {
        ~
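A complete version of this pattern might look like the sketch below. The contents of the DB class are assumptions made for illustration: the counters vector, the constructor, and the Counter() accessor do not appear in the text. The sketch also makes the loop index a local variable, which the surrounding text suggests.

```cpp
#include <vector>

// Hypothetical DB class; only dbSize and IncreaseAllCounters() are from the text.
class DB
{
public:
    explicit DB(int n) : counters(n, 0), dbSize(n) {}

    void IncreaseAllCounters(int addedValue);
    int Counter(int index) const { return counters[index]; }

private:
    std::vector<int> counters;   // assumed payload
    int dbSize;
};

void DB::IncreaseAllCounters(int addedValue)
{
    const int iSize = dbSize;    // member read once, then kept in a local

    for (int i = 0; i < iSize; i++)   // local index instead of a member
    {
        counters[i] += addedValue;
    }
}
```

Whether the local copies actually help depends on the compiler. An optimizer is often able to keep member values in registers on its own, so the gain is worth confirming with a timing test on the target.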
