System Influences

When doing timing tests as proposed in the previous section, it is important to realize that you are in fact timing system behavior while running your code. Because your piece of code is not the only thing the system is dealing with, other factors will influence your test results. The most intrusive of these factors are other programs running on the system. Their influence can of course be minimized by running as few other programs as possible while performing timing tests, and by increasing the priority of the process you are timing as much as possible. Still, some influences will remain, because they are inherent to the operating system. This section discusses these remaining system influences and how to deal with them.

Cache Misses

Cache is memory that is used as a buffer between two or more memory-using objects. The cache referred to in this section is the buffer between the CPU and the internal memory of a computer system. Most CPU architectures split this cache into two levels. Level 1 cache is located physically inside the CPU itself; level 2 cache is located outside, but close to, the CPU. In technical literature, level 1 cache is usually referred to as l1 and level 2 cache as l2. Figure 5.1 depicts the two-level cache concept.

Figure 5.1. CPU level 1 and level 2 cache.


The reasons for splitting the cache into two levels have to do with the different characteristics of the two cache locations. In general, less space is available for cache inside a CPU than outside it, but cache inside the CPU can be accessed faster. On most architectures, l2 has to be accessed at a clock frequency that is a fraction of the CPU's clock frequency. So, as a rule of thumb: l1 is smaller and faster than l2.

Usage of caching means that a CPU does not need to retrieve each unit of data from memory separately; rather, a block of data is copied from memory into cache in one action. The advantage of this is that part of the overhead of transferring data from memory to CPU (for example, finding an address and doing bus interaction) is incurred only once for a large number of memory addresses. The CPU uses this cached data until it needs data from a memory address that is not cached. At this point a cache miss is incurred and a new block of data needs to be transferred from memory to cache. Of course data is not only read; it can be changed or overwritten as well. When a cache miss occurs, it is therefore often necessary to transfer the cached data back to internal memory first, to save the changes. Each time a cache miss is incurred, processing halts while the copy actions take place. It would, therefore, be unfortunate if a cache miss caused a block of cached data to be switched for a new block, after which another cache miss switched the new block back for the original block again. To minimize this kind of scenario, further refinements of the caching concept are often introduced in system architectures, as the following two sections explain.
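
The cost of cache misses can be made visible with a simple timing experiment. The following minimal sketch sums the same array twice: once sequentially, so that each block copied into cache is fully used before the next one is fetched, and once with a stride, so that consecutive accesses touch different blocks. The array size and stride are assumptions; choose them to exceed your system's cache size and to match its block size.

#include <cstdio>
#include <ctime>

const int NUM_INTS = 4 * 1024 * 1024;  // 16MB of ints: assumed larger than l2
const int STRIDE   = 16;               // 64 bytes: assumed cache block size

int main()
{
    int *data = new int[NUM_INTS];
    for (int i = 0; i < NUM_INTS; ++i)  // initialize so the reads are defined
        data[i] = i;

    // Sequential pass: consecutive accesses fall in the same cached block.
    long long sum = 0;
    clock_t start = clock();
    for (int i = 0; i < NUM_INTS; ++i)
        sum += data[i];
    clock_t sequential = clock() - start;

    // Strided pass: the same accesses, reordered so that consecutive
    // accesses touch different blocks, incurring far more cache misses.
    start = clock();
    for (int s = 0; s < STRIDE; ++s)
        for (int i = s; i < NUM_INTS; i += STRIDE)
            sum += data[i];
    clock_t strided = clock() - start;

    printf("sequential: %ld ticks, strided: %ld ticks (sum %lld)\n",
           (long)sequential, (long)strided, sum);
    delete[] data;
    return 0;
}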

Using Cache Pages

With cache paging the available cache resource is divided into blocks of a certain size. Each block is called a page. When a cache miss occurs, only one page (or a few pages) is actually overwritten with new data, and the rest of the pages stay untouched. In this scenario, at least the page in which the cache miss occurred has to stay untouched. This does mean that for each page, a separate administration is needed to keep track of the memory addresses that the page represents and whether changes have been made to the data in the page.

Using Separate Data and Instruction Caches

With separate data and instruction caches the available cache resource is split into two functional parts: one part for caching CPU instructions, which is a copy of part of the executable image of the active process; and another part for caching data, which is a copy of a subset of the data that is used by the active process.

These strategies are, of course, very generic as they cannot take into account any specific characteristics of the software that will run on a certain system. Software designers and implementers can, however, design their software in such a way that cache misses are likely to be minimized. When software is designed to run on a specific system, even more extreme optimizations can be made by taking into account actual system characteristics—such as l1 and l2 cache sizes.
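
On some platforms, these system characteristics can even be queried at runtime. The following sketch assumes a Linux system with glibc, where sysconf() accepts nonstandard names for the cache sizes; these names do not exist on every platform, which is why they are tested with the preprocessor, and sysconf() returns -1 when a size is unknown.

#include <unistd.h>
#include <cstdio>

int main()
{
    // _SC_LEVEL1_DCACHE_SIZE and _SC_LEVEL2_CACHE_SIZE are glibc extensions.
#if defined(_SC_LEVEL1_DCACHE_SIZE) && defined(_SC_LEVEL2_CACHE_SIZE)
    printf("l1 data cache: %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
    printf("l2 cache     : %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
#else
    printf("Cache sizes cannot be queried this way on this system.\n");
#endif
    return 0;
}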

Different techniques for minimizing cache misses and page faults are presented in a later section titled "Techniques for Minimizing System Influences."

Page Faults

The concept of using paging, as discussed in the previous section, can also be applied to internal memory. In this case, internal memory acts as a cache or buffer between the CPU and a secondary storage device—for example, a hard disk. When this concept is applied to internal memory it is called virtual memory management. Figure 5.2 depicts the virtual memory concept.

Figure 5.2. Virtual memory.


When virtual memory is used, the internal memory is divided into a number of blocks of a certain size. Each block is called a page. The memory management system can move these pages to and from secondary storage. On the secondary storage, these pages are stored in something called a page file or a swap file (for maximum performance, even a swap partition can be created). In general this page file can contain more pages than can fit in memory; the size of available internal memory is thus virtually increased by the use of the larger page file. However, whenever an active process refers to data or an instruction located in a page that is not in internal memory at that time, a page fault occurs. The page being referred to has to be transferred from secondary storage to internal memory. In the worst case, another page has to be transferred from internal memory to the page file first, to make room for the new page. This is necessary when some of the data in the page being replaced has changed.
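
Page faults can usually be counted from within a program. The following sketch assumes a POSIX system, where getrusage() reports how many major page faults (those that required a transfer from secondary storage) and minor page faults the calling process has incurred so far; call it before and after the code under test and compare the counts.

#include <sys/resource.h>
#include <cstdio>

// Print the page-fault counters of the calling process so far.
void reportPageFaults(const char *label)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("%s: %ld major / %ld minor page faults\n",
               label, ru.ru_majflt, ru.ru_minflt);
}

int main()
{
    reportPageFaults("before");
    // ... run the code under test here ...
    reportPageFaults("after");
    return 0;
}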

Other OS Overhead

Apart from overhead that is directly related to your process, the OS itself also generates overhead related to other activities.

Task Switching

Unless your target is an embedded system with a very straightforward architecture, chances are that your target is multitasking.

In practice this means there are always other processes (daemons, agents, background processes, and so on) running alongside your testing process. Some will be small, whereas others might be fairly large compared to your tests. The OS gives all these processes some CPU time to do their job, based on their individual priorities. This is called task switching, and it can incur overhead ranging from simply saving CPU registers on the stack to swapping entire pages in and out of internal memory.
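
Task-switching overhead during tests can be reduced by raising the priority of the test process. How this is done is system specific; the following sketch assumes a POSIX system, where setpriority() changes the "nice" value of the calling process (lower values mean higher priority, and negative values usually require privileges). On other systems an equivalent call should exist; consult system documentation.

#include <sys/resource.h>
#include <cstdio>

int main()
{
    // Request the highest conventional priority for this process.
    // This can fail without sufficient privileges, so check the result.
    if (setpriority(PRIO_PROCESS, 0, -20) != 0)
        perror("setpriority");

    // ... run the timing test here ...
    return 0;
}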

IO

When a process on your system performs IO, this generally has a large influence on the system time spent. This is because IO to secondary storage or external devices is generally many times slower than CPU-to-memory interaction. Similarly, when your test program uses a lot of IO, it might not perform exactly the same during two separate test runs; a later run could use data that was buffered during an earlier run.

General OS Administration

An OS will, from time to time, do some administration. This can be anything from updating some kind of registry to doing active memory management. On some systems this is done only after a certain amount of idle time; however, the OS can still decide to do this just as you are about to run your test.

User Interaction

When a user interacts with the system, chances are that some kind of background process will suddenly become very active to deal with all the new input. It is generally a good idea to interact as little as possible with the system while running your tests. This includes not moving the mouse, playing with the keyboard, or inserting floppy discs or CDs as the OS might do more than you expect (like displaying ToolTips, looking for environment variables, checking status/validity, switching focus, and so on).

Techniques for Minimizing System Influences

Before starting tests, you should ensure that OS overhead is kept to a minimum by running as few other programs on your system as possible and by setting the priority of the process doing the test as high as possible. This will minimize task-switching and OS administration overhead. It might also be a good idea to start tests on a clean computer—that is, one that was just booted. This way the legacy of old programs, such as a certain level of memory fragmentation, can be minimized. Then, to get a good picture of how a certain algorithm or part of a program performs, several tests should be run. From these test results, you can eliminate those with inordinately high values (caused by cache misses, page faults, and so on) or inordinately low values (caused by buffered IO and so on), leaving a clean average. However, for certain influences—cache misses and page faults, for example—it is best to minimize their occurrence by writing better code in the first place. It is possible to design software in such a way that cache misses and page faults are likely to be minimized, even without necessarily taking specific system architecture information into account. The remainder of this section presents these kinds of techniques.
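
The following sketch shows one way to automate this procedure: it runs the code under test several times, sorts the measured times, and averages only the middle results, discarding a few extremes at both ends. The function pointer, run count, and trim count are placeholders to be filled in for your own tests.

#include <algorithm>
#include <ctime>

// Time a single run of the code under test.
clock_t timeOneRun(void (*testedCode)())
{
    clock_t start = clock();
    testedCode();
    return clock() - start;
}

// Run the test 'runs' times and average the results, discarding the
// 'trim' highest and 'trim' lowest values. Assumes runs > 2 * trim.
double trimmedAverage(void (*testedCode)(), int runs, int trim)
{
    clock_t *results = new clock_t[runs];
    for (int i = 0; i < runs; ++i)
        results[i] = timeOneRun(testedCode);

    std::sort(results, results + runs);

    double sum = 0.0;
    for (int i = trim; i < runs - trim; ++i)
        sum += results[i];

    double average = sum / (runs - 2 * trim);
    delete[] results;
    return average;
}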

Minimizing Data Cache Misses and Page Faults
  • Group Data According to Frequency of Use

    Place data that is used most frequently close together in memory. This increases the chance of having at least some of it at hand almost continuously. (A sketch following this list illustrates this technique.)

  • Group Data According to Time/Space Relations

    Place data that is often needed simultaneously or consecutively close together in memory. This kind of grouping of related data increases the chance that data is available when it is needed.

  • Access and Store Data Sequentially

    When data is accessed sequentially, chances increase that a miss or a fault will retrieve several of the data elements that are needed in the near future. If the data structure consists of some kind of linked elements (linked list, tree, and so on), this only works when sequential elements are stored close together and in sequence in memory, because otherwise a linked element can basically be anywhere in memory.

  • Perform Intelligent Prefetching

    When it is possible to determine with reasonable accuracy what data is needed in the future, a deliberate miss or fault can be generated during idle time to facilitate a kind of prefetching mechanism. This kind of strategy can be inefficient when it is impossible to determine which pages will be used for the prefetch. You will most likely want to combine this strategy with the following one.

  • Lock Frequently Used Data

    When data is needed on a frequent or continual basis during some stage of a program, it could be worthwhile to find out whether the page it is in can somehow be locked. For more information, consult system and compiler documentation.

  • Use the Smallest Working Set Possible

    On most systems, it is possible to influence the working set of a process. This could aid in battling page faults, as the working set determines how much memory will be available to a process. The closer this working set matches the maximum memory needs of a program, the closer its data and instructions are packed together. For more information, consult system and compiler documentation.
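
As an illustration of grouping data according to frequency of use, the following sketch splits a hypothetical record type into its "hot" fields, needed in an inner loop, and its "cold" fields, needed only occasionally. The record layout and sizes are invented for the example; the point is that keeping the hot fields in their own array packs many of them into each cache block or page.

#include <cstdio>

const int NUM_CUSTOMERS = 10000;

// Hot fields: used in the inner loop.
struct CustomerHot
{
    int id;
    int balance;
};

// Cold fields: used only occasionally.
struct CustomerCold
{
    char name[64];
    char address[256];
};

static CustomerHot  hot[NUM_CUSTOMERS];
static CustomerCold cold[NUM_CUSTOMERS];

// Touches only the densely packed hot array; had the hot and cold fields
// been mixed in one struct, each cache block would hold only a few balances.
long totalBalance()
{
    long sum = 0;
    for (int i = 0; i < NUM_CUSTOMERS; ++i)
        sum += hot[i].balance;
    return sum;
}

int main()
{
    cold[0].name[0] = '\0';  // cold data exists but stays out of the hot loop
    printf("total balance: %ld\n", totalBalance());
    return 0;
}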

Minimizing Instruction Cache Misses and Page Faults
  • Group Instructions According to Frequency of Use

    Place functions or sets of instructions that are used most frequently close together in memory. This increases the chance of having at least some of them at hand almost continuously.

  • Group Related Instructions Together

    Place functions or sets of instructions that are often needed simultaneously or consecutively close together in memory. Grouping related instructions in this way increases the chance that instructions are available when they are needed. In effect this means that you try to determine chains of functions that are likely to call each other, and place them together and in order.

  • Lock Frequently Used Instructions

    When a function or group of instructions is needed on a frequent or continual basis during some stage of a program, it could be worthwhile to find out whether the page they are in can somehow be locked (a sketch of page locking follows this list). For more information, consult system and compiler documentation.
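
How pages are locked is system specific; the following minimal sketch assumes a POSIX system, where mlock() pins the pages holding a given address range into internal memory so that they cannot be paged out. Locking data pages is shown here because it is the most portable case; whether code pages can be locked the same way differs per system, so consult system documentation. The lookup table is a hypothetical stand-in for your frequently used data.

#include <sys/mman.h>
#include <cstdio>

// Hypothetical frequently used lookup table.
static double table[4096];

int main()
{
    // Pin the pages holding 'table' into internal memory. This can fail
    // when the process lacks the privilege or exceeds its locked-memory
    // limit, so check the result.
    if (mlock(table, sizeof(table)) != 0)
        perror("mlock");

    // ... use the table intensively ...

    munlock(table, sizeof(table));
    return 0;
}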

Note that not all strategies given here are always possible or even desirable. Implementing a certain strategy can sometimes have more cons than pros. For each separate situation, the strategies should be weighed on their merits. Note also that some strategies do not combine well.

Most standard platforms have tools which help you determine the number of cache misses and page faults that occur during the running of a piece of software, and the time that is lost during a certain kind of cache miss (l1/l2) or page fault. For more information, consult system and compiler documentation.
