Footprint

This second part of the chapter looks at optimization from the footprint viewpoint, where several techniques to reduce footprint size are discussed together with footprint problems that can arise if preventive actions are omitted.

What Is Footprint?

Strictly speaking, footprint is a spatial measurement term. The footprint of a desktop computer, for example, is about 40×40 centimeters (about 16×16 inches); that is the desk space it occupies. When talking about software, however, footprint can best be described as the amount of memory an object (a program, data structure, or task, for example) needs to function properly. This memory can be strictly internal ROM or RAM, but it is also possible that footprint requirements necessitate the use of external memory, for example, for temporary data storage. Refer to Chapter 2 for more information.

Where executable programs are concerned, different kinds of footprint sizes can be identified, as explained in the following sections.

Storage Requirements

This is the amount of memory needed when the program is inactive: the memory required to store the executable file and the data files it needs or has acquired. From the perspective of the user, the storage requirement is simply the amount of space needed to store the program. During development, however, the story can be rather more complicated. When development is not done on the machines that are the eventual targets for running the program, storage calculations become a lot more involved. Refer to "How to Measure Footprint Size" for more information.

Runtime Memory Requirements

This is the amount of memory needed while the program is being executed. This footprint can differ from the storage footprint for several reasons. For example, the program might not need all the executable code at once, the program will probably use working memory for temporary data storage, and so on. Moreover, the memory used during startup and execution will rarely equal that of the stored files, especially for larger programs, which are made up of more than just a single executable file. Although more memory is most often needed during execution, as one might expect, it is equally possible that a program in fact uses less memory. Practical examples of how and why runtime requirements can differ from storage requirements are given in sections that follow.

Compression

Often parts of a program and its data can be stored in compressed form. When a program uses large data files, it is unlikely that this data will be stored exactly as it is used in internal memory. It can be compressed using known or proprietary compression algorithms or even stored using a structure that is more optimal for storage purposes. Also the executable code itself might be compressed. When the program consists of several modules, the main program might load and decompress other modules as they are needed.

JPEG graphical images and MP3 sound files are examples of well-known data compression techniques. Here, clearly, footprint size reduction is chosen over performance; data compression ratios of up to 90% can be achieved. Note that these techniques do allow data loss, incurred in such a way that it is barely, if at all, noticeable to human eyes or ears. However, this is of course not something we would want to do to executable code. The nature of the data being compressed plays an important role in the choice of compression technique.

Whichever form of compression is used, however, the fact remains that the runtime memory requirements will most likely differ significantly from the storage requirements. Extra memory might even be needed to perform the compression and decompression itself.
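
As a minimal sketch of the simplest case, assume the data was compressed with zlib and that its uncompressed size was stored alongside it (a common convention, not a zlib requirement; the function name is hypothetical). Note that during the call both buffers are resident at once:

    #include <zlib.h>     // uncompress()
    #include <vector>

    // Decompress a whole block at once. During the call to uncompress(),
    // the compressed input AND the uncompressed output are both resident,
    // so the runtime footprint temporarily exceeds the storage footprint.
    std::vector<unsigned char> decompressWhole(
        const std::vector<unsigned char>& compressed,
        unsigned long uncompressedSize)
    {
        std::vector<unsigned char> out(uncompressedSize);
        uLongf destLen = uncompressedSize;
        if (uncompress(out.data(), &destLen,
                       compressed.data(), compressed.size()) != Z_OK)
            out.clear();              // decompression failed
        else
            out.resize(destLen);      // actual decompressed size
        return out;
    }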

It should not be overlooked that compression and decompression take time, which means that starting a program might take longer. This highlights again the relationship between performance and footprint.

Data Structures

The structure that is used to hold and access the data might differ between storage time and runtime. Whereas data mostly needs to be small during storage, it might be far more important during runtime that it can be accessed quickly. Compact storage formats are typically compressed or tightly packed and therefore slow to access, whereas structures optimized for fast runtime access are typically uncompressed and take up considerably more space. Each goal calls for an entirely different structure.

The structure that is chosen for fast access might include redundant information in extra index files, generated before the data is accessed. For example, think of hashing tables, doubly linked lists, or sets of intermediate search results (refer to Chapters 10, "Blocks of Data," and 11, "Storage Structures"). Again it should be noted that generating this overhead will cost time. Important decisions include how much of the data to describe in these index files, when to generate them, and what level of redundancy they should provide (at some point, generating more redundancy will no longer make data access faster).
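
The following minimal sketch illustrates the idea (all names are hypothetical): the records themselves form the compact storage structure, and a redundant hash index is generated at runtime purely to speed up access.

    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Record {
        std::string key;
        std::string payload;
    };

    // The vector of records is the compact "storage" structure. The
    // hash index below is redundant, runtime-only data generated to
    // make lookups fast; it enlarges the runtime footprint but never
    // needs to be stored.
    class IndexedRecords {
    public:
        explicit IndexedRecords(std::vector<Record> records)
            : records_(std::move(records))
        {
            index_.reserve(records_.size());
            for (std::size_t i = 0; i < records_.size(); ++i)
                index_[records_[i].key] = i;   // building this costs time
        }

        const Record* find(const std::string& key) const {
            auto it = index_.find(key);
            return it == index_.end() ? nullptr : &records_[it->second];
        }

    private:
        std::vector<Record> records_;                         // compact data
        std::unordered_map<std::string, std::size_t> index_;  // redundant index
    };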

Overlay Techniques

Not all the executable code might be needed at the same time. Dividing a program into functional modules allows the use of overlay techniques. For example, consider a graphics editor. It contains a module of scanning routines that is used when the user wants to scan a picture. After the user closes the window containing the scanner interface, the scanner module is replaced by a module containing special effects routines. The modules do not necessarily have to be exactly the same size; just having only one of them in memory at a time will decrease the runtime memory requirements. The more distinctly you can identify functional modules during the design phase, the better the overlay principle will work. The footprint size actually gained depends on the choices the developers make. It is important to identify program states that can never occur at the same time and to switch between them intelligently. If it is impossible to use special effects during scanning, these two functions are a good choice for overlaying each other.

You can, of course, try to overlay groups that might in some cases be needed simultaneously—or closely after each other—but then you get a much more statistical picture of footprint size. When there is a fixed maximum to the footprint size that can be used, the worst-case combination of overlaid modules should be able to fit into it.
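
One possible realization, sketched here for a POSIX system, loads each module as a shared library on demand and unloads it when the next module takes its place. The file and function names are hypothetical, and the program must be linked with -ldl:

    #include <dlfcn.h>    // dlopen(), dlsym(), dlclose()
    #include <cstdio>

    void runModule(const char* path, const char* entryPoint)
    {
        void* module = dlopen(path, RTLD_NOW);  // load overlay into memory
        if (module == nullptr) {
            std::fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
            return;
        }
        using EntryFn = void (*)();
        EntryFn entry = reinterpret_cast<EntryFn>(dlsym(module, entryPoint));
        if (entry != nullptr)
            entry();                            // run the module
        dlclose(module);                        // unload: frees its footprint
    }

    int main()
    {
        runModule("./scanner.so", "scanPicture");   // scanner overlay
        // The scanner window is closed; its code is no longer resident.
        runModule("./effects.so", "applyEffects");  // effects overlay
    }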

Switching overlaid modules costs time and so has an impact on overall program performance.

Working Memory

A program being executed will need working memory regardless of the architectural choices made about storage, compression, and overlaying. The use of working memory is very diverse. The following list shows the most common uses; a short sketch after the list illustrates them:

  • Storing variables (pointers, counters, arrays, strings, and so on)

  • Stack space

  • Storing intermediate data

  • Storing placeholders for data (structures, linked lists, binary trees)

  • Storing handles and pointers to operating system resources

  • Storing user input (history buffers and undo and redo functionality)
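
The following small sketch maps each of these uses onto concrete declarations (all names are purely illustrative):

    #include <cstdio>
    #include <list>
    #include <string>
    #include <vector>

    void editSession()
    {
        int lineCount = 0;                            // variable (stack)
        std::vector<char> renderBuffer(64 * 1024);    // intermediate data
        std::list<std::string> paragraphs;            // placeholder structure
        std::vector<std::string> undoHistory;         // stored user input
        std::FILE* file = std::fopen("doc.txt", "r"); // OS resource handle

        // The function body itself also consumes stack space for its
        // frame, saved registers, and any nested calls.

        if (file != nullptr)
            std::fclose(file);
    }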

Cache

In certain situations, it might enhance performance to set aside some memory to act as a buffer or cache. Think, for example, of the interaction with hardware devices as described in the section "Performance of Physical Devices." Refer also to Chapter 5, "Measuring Time and Complexity," for more detail on cache use and cache misses.

Memory Fragmentation

Another subject for consideration is the fragmentation of memory. It might not exactly fit our definition of runtime memory requirements, but it certainly does affect the amount of working memory needed. While a program runs, it will continuously allocate and free pieces of memory to house objects (class instances, variables, buffers, and so on). Because the lifetimes of these objects are not all the same and often are unpredictable, fragmentation of large blocks of memory into smaller pieces will occur. It stands to reason that the longer a program runs, the more serious the fragmentation becomes. Programs designed to run longer than a few hours at a time (and that use memory fairly dynamically) could even benefit from some kind of defragmentation routines, which are usually quite expensive performance-wise. Chapter 9 discusses memory fragmentation in more detail.

The following sections give advice on how to measure and control memory requirements.

Why Is Monitoring Footprint Important?

The reasons for keeping footprint size small can be compared to driving a car through a busy town. A large luxury car might look impressive, but practically speaking it is more difficult to maneuver through small and busy streets than a small car. The same holds true for finding parking spaces and squeezing into small alleys. Where software is concerned, it is also the small and practical programs (with little overhead in interface and functionality) that are the most enjoyable to work with. Large programs that pose a heavy burden on system resources will generally only be used when the user intends to work on something for a longer period of time or has a specific goal in mind that can only be achieved with that specific program. Consider the differences between a small text editor and a desktop publishing package. To edit a few lines of a text file, it would be impractical to use a desktop publishing package, much like using a helicopter to cross a street.

The main considerations for keeping footprint sizes in check are the impact on available resources and the impact on performance, as discussed here in more detail.

Impact on Available Resources

The obvious reason for keeping storage and runtime footprints as small as possible is the cost involved. For example, the larger a piece of embedded software is, the more memory is needed in the product that incorporates it; the incurred cost is directly related to the runtime footprint size. But embedded systems are not the only ones where this consideration applies. Plainly stated, the larger the footprint of a program is, the smaller its possible market share. Not every user has a state-of-the-art machine with the largest available hard disk. A program with lower system requirements accommodates more users.

Also, a program's usability is affected by the size of the runtime footprint. If a program uses a lot of internal memory, it might force the operating system to start swapping memory to the hard disk and back. Remember also that programs virtually never have the sole use of the system. When there is little free internal memory left, an increasingly large part of the memory that is temporarily not needed is swapped out onto the hard disk. The chance that a certain operation will require data that is not found in memory increases. The result is a hard disk that makes a lot of noise and programs that halt and stutter, causing overall slowdown and annoyance.

Impact on Performance

Another reason to keep footprints small is the time loss (performance problems) incurred while handling programs. Because we have already mentioned many examples in this chapter, this section discusses only performance issues relating to the handling of larger file sizes. Think, for example, of the following:

  • Making backups of programs

  • Loading programs into memory before execution (plus the associated network traffic)

  • Exchanging data and program components between processes

  • Slow database access due to large data sizes

How to Measure Footprint Size

Measuring footprint size is an activity that differs strongly between projects. An exact definition, or even a set of definitions, is highly project dependent, as are the accuracy with which to specify the needed storage footprint (in megabytes, kilobytes, or even bytes or bits), how early in the project to make footprint estimations, and so on.

This section highlights issues concerned with estimating footprint sizes.

Measuring Storage Requirements

In most cases it's sufficient to simply add up the program file sizes, but there can be more to measuring the storage space requirement of a program. Think, for example, about the use of cross compilers. A lot more work needs to be done when a program is developed on hardware not entirely identical to the target machine, the machine that will eventually run the program in a commercial setting. To fairly estimate the requirements for the target machine, some preliminary tests might be necessary to determine any differences in machine code or executable size between the two machines. This way, it will be easier to predict the size of the final program. For example, the processor of the target machine might use twice as many bytes to describe its instructions as the machine used for development. The target machine in turn could have a much better file management system that uses only half the overhead for block allocation. To make a better initial estimate of the target footprint, it is wise to create a document or spreadsheet describing these kinds of differences at the beginning of the project. This is also useful during development to quickly recalculate target footprint size as the design of the program changes or its user requirements are readjusted.
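
For the simple case of adding up file sizes, a short sketch along the following lines suffices (C++17, using std::filesystem; the directory layout is an assumption):

    #include <cstdint>
    #include <filesystem>
    #include <iostream>

    namespace fs = std::filesystem;

    // Sum the sizes of all files that make up a program: the simple
    // case of measuring the storage footprint.
    std::uintmax_t storageFootprint(const fs::path& programDir)
    {
        std::uintmax_t total = 0;
        for (const auto& entry : fs::recursive_directory_iterator(programDir))
            if (entry.is_regular_file())
                total += entry.file_size();   // executable, data files, ...
        return total;
    }

    int main()
    {
        std::cout << storageFootprint(".") << " bytes\n";
    }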

Measuring Runtime Requirements

In addition to the problems associated with using different development and target systems, the dynamic aspects of programs further complicate the measurement of runtime requirements. This section discusses these aspects and offers advice on measuring them.

It is necessary to calculate the actual uncompressed runtime code size. When overlaying is used, this is the size of the largest possible module combination; the design should clearly state which combinations of modules can be found in memory at the same time.

Moreover, the dynamic memory usage of a program can be monitored. Basically this means running the program through realistic worst-case test scenarios and looking at what the program does with the internal memory. Although tests can be performed with existing tools, it might also be necessary to write some kind of indirection layer that catches the calls the program makes to the operating system. By having the program call a special glue layer when asking for (or returning) memory, this layer can record the sizes requested and released. It could even store information about memory locations, making it possible to look at fragmentation patterns.
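
A minimal sketch of such a glue layer in C++ replaces the global allocation operators so that every request and release is recorded. (The array forms, operator new[] and operator delete[], are omitted for brevity; a real tool would also log addresses or call sites for fragmentation analysis.)

    #include <atomic>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    static std::atomic<std::size_t> g_currentBytes{0};
    static std::atomic<std::size_t> g_peakBytes{0};

    // Each block gets a small header (sized for worst-case alignment)
    // holding the requested size, so the matching delete knows how
    // much is being released.
    static constexpr std::size_t kHeader = alignof(std::max_align_t);

    void* operator new(std::size_t size)
    {
        void* raw = std::malloc(size + kHeader);
        if (raw == nullptr)
            throw std::bad_alloc();
        *static_cast<std::size_t*>(raw) = size;

        std::size_t now = (g_currentBytes += size);
        std::size_t peak = g_peakBytes.load();
        while (now > peak &&
               !g_peakBytes.compare_exchange_weak(peak, now)) {}
        return static_cast<char*>(raw) + kHeader;
    }

    void operator delete(void* ptr) noexcept
    {
        if (ptr == nullptr)
            return;
        char* raw = static_cast<char*>(ptr) - kHeader;
        g_currentBytes -= *reinterpret_cast<std::size_t*>(raw);
        std::free(raw);
    }

    // Call at a convenient moment (or register with atexit) to inspect usage.
    void reportMemoryUsage()
    {
        std::printf("in use: %zu bytes, peak: %zu bytes\n",
                    g_currentBytes.load(), g_peakBytes.load());
    }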

Spy software could also be used to look at the behavior of the program stack, the heap size, or the usage of other scarce operating system resources (IRQs, DMA channels, serial and parallel ports, and buses).

When Do Footprint Problems Arise?

Runtime footprint problems can be inadvertently introduced during any phase of development, but they are unlikely to be discovered before either rigorous testing or field use of programs. The following sections explain how footprint problems could remain undetected for so long.

Design

Apart from bugs in programs, memory leakage, and fragmentation, it is also possible that conscious design decisions actually cause unnecessary or unexpected footprint problems. As we have seen before, part of the runtime footprint is used during the execution of the program for storing (intermediate) data. How the program handles this intermediate data determines its behavior when the data set grows. It is fairly likely that a routine used for transforming a certain set of data will, at some point, cause the original data and the transformed data to be in memory at the same time. An example of this is the aforementioned decompression routine. Designed very badly, this decompression routine might keep all the compressed data in memory while it allocates yet more memory to store all the decompressed data. This will not pose a problem for small files; in fact it is a very fast way of working due to the low level of instruction overhead. Consequently, this routine will seem to work fine during a test phase that uses only small files. However, as the program is called on to decompress ever larger files in the field, the problem becomes apparent. A better designed program would first take into account the amount of memory available and then determine the appropriate amount of input data to load.

Of course this example is rather extreme, but it does illustrate the kind of oversight that can be made during the design phase and the unexpected impact memory management schemes can have.
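
As a sketch of the better-behaved design, again assuming zlib, a streaming decompressor keeps only fixed-size chunks in memory no matter how large the input grows (the function name and buffer sizes are arbitrary):

    #include <zlib.h>     // inflateInit(), inflate(), inflateEnd()
    #include <cstdio>
    #include <vector>

    // Decompress from one file to another in fixed-size chunks; the
    // runtime footprint stays constant regardless of the file size.
    bool decompressStream(std::FILE* in, std::FILE* out)
    {
        z_stream strm{};                     // zeroed: default allocators
        if (inflateInit(&strm) != Z_OK)
            return false;

        std::vector<unsigned char> inBuf(16384), outBuf(16384);
        int rc = Z_OK;
        while (rc != Z_STREAM_END) {
            strm.avail_in = static_cast<uInt>(
                std::fread(inBuf.data(), 1, inBuf.size(), in));
            if (strm.avail_in == 0)
                break;                       // truncated input
            strm.next_in = inBuf.data();
            do {                             // drain this input chunk
                strm.avail_out = static_cast<uInt>(outBuf.size());
                strm.next_out = outBuf.data();
                rc = inflate(&strm, Z_NO_FLUSH);
                if (rc == Z_NEED_DICT || rc == Z_DATA_ERROR ||
                    rc == Z_MEM_ERROR) {
                    inflateEnd(&strm);
                    return false;
                }
                std::fwrite(outBuf.data(), 1,
                            outBuf.size() - strm.avail_out, out);
            } while (strm.avail_out == 0);
        }
        inflateEnd(&strm);
        return rc == Z_STREAM_END;
    }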

Efficient memory management is described in detail in Chapter 9.

Implementation

The design of a program gives its overall description; it is usually made with a high level of abstraction as far as individually identified components are concerned. Especially when dealing with large projects, a lot of the implementation details will be left up to the implementer. The design of a component could, for example, describe it as "knowing" where to find its peer components. The implementer of that component has to choose the specific means of implementing this relationship. He could choose a linked list of data blocks, an array, or even a tree. When choosing the linked list, he still would have to decide whether to make it doubly linked and allow some kind of sorting.

Though these choices do not seem relevant or even interesting from the design point of view, they can nevertheless have an unexpected effect on footprint size. Take, for example, a programmer who has heard that using macros and inline functions can increase performance by reducing function-call overhead. In essence his assumption is correct, but in using these techniques indiscriminately, he has failed to comprehend the downside of what he is doing. The macros and inline functions, some of which he has even managed to use in nested constructions, are expanded during compile time and cause his program to become unexpectedly large. The programmer in this example would have done better to use macros and inline functions only in those parts of the program that are performance bottlenecks and lack a more elegant solution.
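
The following hypothetical sketch shows how quickly textual expansion adds up, and the out-of-line alternative that keeps only one copy of the code in the binary:

    // CLAMP is expanded textually at every use; nesting multiplies the
    // expansion, so this single macro call becomes a sizeable
    // expression tree after preprocessing.
    #define CLAMP(x, lo, hi) \
        ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))
    #define CLAMP3(x) \
        CLAMP(CLAMP(CLAMP((x), 0, 255), 1, 254), 2, 253)

    // The out-of-line alternative: one copy of the code in the binary,
    // one call instruction per use.
    int clamp(int x, int lo, int hi)
    {
        return x < lo ? lo : (x > hi ? hi : x);
    }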

There are tools, called profilers, to help locate performance bottlenecks (see Chapter 4). The use of macros and inline functions is further discussed in Chapter 9.

Compilation

The compiler options used during compile time greatly influence the generated footprint. In fact, most compilers even have specific options for optimizing the generated code for either footprint size or performance. There are hidden dangers in the use of these options, however. Refer to Chapter 4 for more detailed information.
