Performance

The first part of this chapter discusses optimization from the performance viewpoint. It covers not only software and hardware characteristics but also how performance is perceived by the users of a system.

What Is Performance?

What does performance actually mean in relation to software? The simple answer is that performance is an expression of the amount of work that is done during a certain period of time. The more work a program does per unit of time, the better its performance. Put differently, the performance of a program is measured by the number of input (data) units it manages to transform into output (data) units in a given time. This translates directly into the number of algorithmic steps that need to be taken to complete this transformation. For example, an algorithm that executes 10 program statements to store a name in a database performs poorly compared to one that stores the same name in five statements. Similarly, a database setup that requires 20 steps to be taken before it knows where new data is to be inserted has a higher impact on program performance than a database setup that does the same in 10 steps. But there are more things to consider than purely technical implications, which is what this section will highlight.

Of the software that is written today, a very large part is set up to be used by one or more users interactively. Think of word processors, project management tools, and paint programs. The users of these kinds of programs generally sit behind their computers and work with a single program until they have completed a certain task—for example, planned the activities of a subordinate, drawn a diagram, or written a ransom note. So let's examine how such a user defines performance; after all, in most cases he will be the one we do the optimizations for. Basically, there are only three situations in which a user actually thinks in terms of performance at all:

  • When a task takes less time than anticipated by the user.

  • When a task takes more time than anticipated by the user.

  • When the size or complexity of the task is apparent to the user.

Examining these situations can provide further guidelines for defining performance. Here follow three examples that illustrate the bullet points:

A task can take less time than anticipated by the user when, for example, this user has been working with the same program on the same computer for years and her boss finally decides to upgrade to next-generation machines. The user is still running the same program, but because the hardware can execute it faster, performance seems to be better. Also, the user has become accustomed to a certain kind of behavior. In the new situation her expectations are exceeded; she no longer has to twiddle her thumbs when saving a large file or performing a complex calculation.

A task can take more time than anticipated by the user when, for example, this user works with a program that handles a large base of sorted names. On startup, the program takes about 15 seconds to load and sort its data, without giving status updates. Even if its algorithms are highly optimized, the user views this unanticipated "delay" as poor software performance.

The size and complexity of a task can be apparent to the user when, for example, the user works with a program that searches through megabytes of text files to find the occurrences of a certain string. This action takes only seconds and, because of her technical background, the user knows what is involved with this action. Her perception of the performance of the search program is therefore favorable.

These examples demonstrate that performance is more than a simple measure of time and processing. Performance from the viewpoint of a user is more of a feeling she has about the program than the actual workload per second it manages to process. This feeling is influenced by a number of factors that lead to the following statements:

  • Unexpected and unexplained waiting times have a negative effect on the perceived performance.

  • Performance is a combination of hardware and software.

  • Performance depends on what the user is accustomed to.

  • Performance depends on the user's knowledge of what the program is doing.

  • Repetition of even a technically efficient action will still degrade perceived performance, no matter how knowledgeable the user is.

Why Optimize?

Although optimization is a logical choice for those who write time-critical or real-time programs, it has more widespread uses. All types of software can in fact benefit from optimization. This section shows four reasons why:

  • As programs leave the development environment and are put to use in the field, the amounts of data they need to handle will grow steadily. This eventually slows the program down, perhaps even to the point of it being unusable.

  • Carefully designed and implemented programs are easier to extend in the future. Consider the benefits of adding functionality to an existing program without concern about degrading its performance due to problems in the existing code.

  • Working with a fast program is more comfortable for users. In fact, speed typically becomes an issue only when the lack of it slows users down.

  • Time is money.

A tempting question that you are bound to ask sooner or later is, Why not just buy faster hardware? If your software does not seem able to cut it anymore, why not simply upgrade to a faster processor or use more or faster memory? Processing speed tends to double every 18 months, so algorithms that might have posed a problem six months or a year before might now be doable. But there are a number of reasons why optimizing software will always be needed.

A faulty design or implementation decision in an algorithm can easily slow it down 10–100 times—when sorting or storing data, for example. Waiting for hardware that is only twice as fast is not a solution.

Programs, and the data they handle, tend to grow larger and larger during those same 18 months, and users tend to run more and more applications simultaneously. This means the speed requirements for the program might increase as fast as, and sometimes even faster than, the hardware speed increases.

When programmers do not acquire the skills to optimize programs, they will find themselves needing to upgrade to new hardware over and over again.

With software that is part of a mass-market system (for example, embedded software in TVs, VCRs, set-top boxes, and so on), every cent of cost will weigh heavily. Investments in software occur only once, whereas investments in hardware are incurred with every unit produced.

While processors continue to become faster and cheaper, their designs also change, which means further investments are needed to upgrade other parts of the system.

The lower the system requirements for a certain program are, the larger the market it can reach.

Buying new hardware to solve software problems is just a temporary workaround that hides rather than solves problems.

One thing to keep in mind when talking about performance problems is that they are generally not so much the trademarks of entire programs as they are problems with specific parts of a program. The following sections of this chapter focus on those programming areas that are particularly prone to causing performance problems.

Performance of Physical Devices

When a program uses one or more physical devices, performance issues can arise in those parts of the program where interaction with these devices takes place. Physical devices are slower than normal memory because they often contain moving parts and need special protocols for access. Also, different kinds of devices operate at different speeds. Important performance decisions include determining which kind of device to use for what purpose and when and where in the program to access the devices. Chapter 12, "Optimizing IO," explains this in greater detail.

Examples of (relatively) slow physical devices include hard disks, smartcard readers, printers, scanners, disk stations, CD-ROM players, DVD players, and modems.

Here are some considerations when using physical devices:

  1. It stands to reason that the more frequently a set of data is used, the closer you will want to place it to the program. Data that is referred to constantly should therefore, if possible, be kept in internal memory. When the data set is not too large and remains constant, it could even be part of the executable file itself. When the data set is subject to change, however, or should be shared between programs, it would be wiser to store it on a local hard disk and load it into memory at runtime. It would be unwise to store it on a network drive; unless you specifically intend for the data to be accessed by different workstations or to be backed up remotely, the use of a network drive would just add unwanted overhead. The choice of device should clearly be closely related to the intended use of the data that it will store.

  2. During the design phase of a program, look closely at how and when the data is accessed. By making a temporary copy of (a block of) data, it is possible to increase the speed of accessing it. For example, consider the following scenario in which it is necessary for a program to access data being used by a physical device. This creates no problem when both the program and the device are merely reading the data and not changing it. However, when the data is being changed, some kind of locking mechanism must be used. Of course, this takes time, as the program and device have to wait for each other. This type of problem can be identified at design time, and possibly avoided. For example, with a temporary copy of the data solely for use by the device, the program could continue to make changes to the data, while the physical device uses the copy merely for output purposes (taking, if you will, a snapshot of the data; a sketch of this approach follows this list). This way the program and the device will not trip over each other. When the device has finished, the memory containing the copy of the data can be reused. If the amount of data involved is too large either to allow efficient copying or to be allocated in memory twice, the suggested technique could still be applied, but to smaller subsets of the data.

  3. It is usually a good idea to compress data before sending it when communicating with relatively fast devices over a slower connection (two computers connected via serial cable or modem, for example). When choosing the compression algorithm, be sure that the time that is won by sending less information over the connection is more than the time that is needed by the slowest end of the connection to perform the compression or decompression.
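
The snapshot idea from point 2 can be as simple as copying the shared buffer just before handing it to the device. The following is a minimal sketch; send_to_device() is a hypothetical stand-in for the actual device call (assumed here to return only when the device is done), and error handling is omitted.

    #include <cstddef>
    #include <cstring>

    // Hypothetical stand-in for the actual device call (printing, writing
    // to a port, and so on). Assumed here to return only when it is done.
    void send_to_device(const char *data, std::size_t size);

    // Hand the device a private snapshot, so the program can keep changing
    // the live buffer without locks while the device does its output.
    void output_snapshot(const char *live_data, std::size_t size)
    {
        char *snapshot = new char[size];
        std::memcpy(snapshot, live_data, size); // copy once, up front

        send_to_device(snapshot, size);         // the device reads only the copy

        delete[] snapshot;                      // the copy can now be reused or freed
    }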

Performance of System Resources

Not only physical devices but also the system resources themselves can cause noticeable slowdown (EPROM, ROM, RAM, and so on). This does not necessarily indicate an incorrect choice of hardware but it does mean that care needs to be taken first during the design phase and later during the implementation phase of a project. For example, consider moving parts of ROM to RAM when using ROM slows down the program. Although this type of copy action eats up the necessary CPU clock cycles, it will be done only once and every single access to the memory in question will benefit from a faster response. Clearly, only the intensely used parts of the ROM should be considered for this kind of treatment—and only when there is enough memory to spare. Having said that, there need not be a fragmentation impact on whatever memory management scheme was chosen, as this piece of memory will most likely be used during the entire lifetime of the program. Refer to Chapter 9, "Efficient Memory Management," for more detail.
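
As a minimal sketch of this ROM-to-RAM idea, assume a heavily used lookup table that resides in slow ROM; copying it once at startup makes every subsequent lookup a fast RAM access. The names and the table size are, of course, illustrative.

    #include <cstring>

    // Heavily used table assumed to live in (slow) ROM.
    extern const unsigned char rom_table[4096];

    // RAM copy, filled once at startup; all later lookups use this copy.
    static unsigned char ram_table[sizeof(rom_table)];

    void init_tables()
    {
        // One-time cost in CPU cycles; every later access gains from the
        // faster RAM response.
        std::memcpy(ram_table, rom_table, sizeof(rom_table));
    }

    unsigned char lookup(unsigned index)
    {
        return ram_table[index & 0x0FFF];   // 4,096-entry table
    }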

A similar enhancement can be made for RAM access versus CPU registers, although its application is somewhat more limited. Most compilers allow you to make suggestions about placing variables directly into the registers of the CPU. The advantage of this is that register access is even faster than RAM access. For RAM access, the CPU has to send a request for data on the internal bus to the memory address mappers, which in turn have to interpret the request to find the appropriate memory address and send the contained value as a response (some operating systems also use indirection schemes in memory access). Registers are part of the CPU chip itself and can therefore be accessed directly.

There is a downside of course. CPU registers can have different sizes but will rarely exceed 64 bits. Also, the number of registers per CPU is limited and a fair share will be occupied almost continuously. That is why in practice registers will be used for variables that are accessed often over a short period of time (loop counters and so on). Refer to Chapter 6, "The Standard C/C++ Variables," for more detailed information on variable use.
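
A sketch of such a register suggestion is shown below. Note that the register keyword is only a hint; most compilers make their own register allocation decisions, and recent C++ standards have deprecated and eventually removed the keyword altogether.

    // "register" merely suggests keeping the loop counter in a CPU register;
    // the compiler is free to ignore the hint (and newer C++ standards drop
    // the keyword entirely).
    long sum(const int *values, int count)
    {
        long total = 0;
        for (register int i = 0; i < count; i++)
            total += values[i];
        return total;
    }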

Another aspect to take into account is the operating system (OS) because accessing system resources is often done through operating system calls. Keep in mind that operating systems implement these calls as generically as possible to be able to run every kind of program with reasonable results. A software designer, however, has more information on the typical resource usage of his program. This knowledge can be used to write more efficient interaction. For example, when the OS uses a relatively slow memory management scheme, certain design considerations can be made to compensate. A program might benefit from a design in which allocated memory is reused internally instead of released back to the system. Chapter 9 deals specifically with these kinds of issues.
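
A minimal sketch of such internal reuse follows: freed blocks are chained onto a free list and handed out again, so the (possibly slow) system allocator is hit only when the list is empty. The class and its names are illustrative, and the sketch ignores thread safety.

    #include <cstddef>

    // Very small free-list allocator for fixed-size blocks. Released blocks
    // are kept on an internal list and handed out again instead of going back
    // to the system allocator on every release.
    class BlockPool
    {
    public:
        explicit BlockPool(std::size_t block_size)
            : block_size_(block_size < sizeof(Node) ? sizeof(Node) : block_size),
              free_list_(0) {}

        void *acquire()
        {
            if (free_list_ != 0)                 // reuse a previously freed block
            {
                Node *node = free_list_;
                free_list_ = node->next;
                return node;
            }
            return ::operator new(block_size_);  // fall back to the system
        }

        void release(void *block)
        {
            Node *node = static_cast<Node *>(block);
            node->next = free_list_;             // keep the block for reuse
            free_list_ = node;
        }

        // Blocks still on the list go back to the system only when the pool
        // itself is destroyed (cleanup omitted here for brevity).

    private:
        struct Node { Node *next; };

        std::size_t block_size_;
        Node       *free_list_;
    };

A program would create one pool per frequently used block size and call acquire() and release() in its hot spots instead of new and delete.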

Finally, consider examining the architecture documentation of the CPU(s) being used. The following practical example shows what kind of optimizations can be found. To use the Intel MMX instructions, the coprocessor needs to be set to MMX mode. This switch costs time. Then, when normal calculations need to continue, the coprocessor needs to be switched back again, causing more lost time. So to avoid unnecessary switches, instructions need to be grouped by mode as much as possible in a design that uses these two modes. Refer to Chapter 4, "Tools and Languages," for information on tools to use to determine which parts of a program cause slowdown.

Performance of Subsystems

An old proverb says a chain is only as strong as its weakest link. This holds true for software as well, particularly when it comes to performance issues. Performance problems are likely to occur when using a badly designed third-party library, or one that was optimized for a different kind of use. So before using subsystems, it is advisable to run some performance tests—if only to find out what to expect in practice. It might be possible to design around identified problems, but be prepared to rewrite a subsystem or look for a replacement; generally this is the preferred option, because otherwise future enhancements to the program will continue to suffer from an initial bad choice. Avoid creating workarounds if there is even the remotest possibility of having to replace a subsystem at some point down the line anyway. Time constraints could force a development team to use two similar subsystems side by side simply because the old one is too slow and it would take too much time to incorporate the new one in every part of the program. Clearly this is an unfortunate waste of time and resources.

The way in which a subsystem is incorporated into a program affects the performance of the link between the two. Simply calling the interface of the subsystem directly from the program causes the smallest amount of overhead and is therefore the fastest. It does mean that at least one side of the link will need its interface adapted to fit the other. When for some reason neither side can be altered—for example, because the third-party sources are unavailable—it is necessary to insert some kind of glue or wrapper layer between the two. This means that communication calls are redirected, which adds extra overhead.

However, this same kind of go-between glue layer can also be used to test the functionality and performance of a single part of a system. In this case the glue layer, now called a stub, does nothing or simply returns fixed values. It does not call another object to pass anything on. It simulates the objects being interacted with. The performance of the object being tested is no longer influenced by other parts of the system. Refer to Chapter 2, "Creating a New System," for more details on prototyping.
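
The following sketch shows both forms side by side: one implementation of a small interface forwards to a hypothetical third-party call (the glue layer), and the other is a stub that returns a fixed value so the caller can be measured in isolation. All names here are invented for the example.

    // Hypothetical third-party routine; only its existence is assumed here.
    extern "C" int thirdparty_store_name(const char *name);

    // Interface the rest of the program codes against.
    class NameStore
    {
    public:
        virtual ~NameStore() {}
        virtual bool store(const char *name) = 0;
    };

    // Glue layer: redirects each call to the third-party subsystem,
    // at the cost of one extra (virtual) call per request.
    class ThirdPartyNameStore : public NameStore
    {
    public:
        virtual bool store(const char *name)
        {
            return thirdparty_store_name(name) == 0;
        }
    };

    // Stub: no real work, fixed result; the code under test is no longer
    // influenced by the subsystem's own performance.
    class StubNameStore : public NameStore
    {
    public:
        virtual bool store(const char *)
        {
            return true;
        }
    };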

Performance of Communications

Performance problems are inevitable where communications take place. Think, for example, of communications between separate computers or different processes on the same computer. The following problems are likely to occur:

  • The sender and receiver operate at different speeds (for example, different hardware configurations or scheduling priorities).

  • The link between the sender and the receiver is slow (for example, a serial cable or modem between two fast computers).

  • The sender or receiver is slowed down because it is handling a high number of connections (for example, an ISP).

  • The sender or receiver has to wait for its peer to arrive in the correct program state (for example, a connection has to be set up or data has to be gathered before being sent).

  • The link between sender and receiver is error-prone (for example, a lot of data needs to be retransmitted).

Where possible, programs should avoid halting activity by waiting on communications (busy-wait constructions) or using polling strategies that periodically check on connections to see whether they need to be serviced. Instead, communication routines should be called on an interrupt basis or via callback functions. This way, a program can go about its business until it is signaled to activate communication routines.

The elegance of using callback functions lies in the fact that callback functions are part of the program that wants to be notified of a certain event taking place. Thus these functions can have complete access to the data and the rest of the functionality of the program. The callback function body contains those instructions that need to be carried out when the event comes about, but the function is in fact called by the object generating the event. By passing a reference to a callback function to an object, you give the event-generating object a means to contact the program.
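
A minimal sketch of this mechanism using a plain function pointer (all names are illustrative): the program registers its callback with a connection object, and the connection calls it back whenever data arrives.

    #include <cstdio>

    // Signature of the function the connection will call when data arrives.
    typedef void (*DataReadyCallback)(const char *data, int size, void *context);

    // Event-generating object (a connection, driver wrapper, and so on).
    class Connection
    {
    public:
        Connection() : callback_(0), context_(0) {}

        // The program hands the connection a means to contact it.
        void set_callback(DataReadyCallback callback, void *context)
        {
            callback_ = callback;
            context_  = context;
        }

        // Invoked from the connection's own receive logic (interrupt handler,
        // I/O thread, and so on) when data is available.
        void data_received(const char *data, int size)
        {
            if (callback_ != 0)
                callback_(data, size, context_);
        }

    private:
        DataReadyCallback callback_;
        void             *context_;
    };

    // The callback itself lives in the program and has full access to its data.
    void on_data(const char *data, int size, void *context)
    {
        (void)data; (void)context;
        std::printf("received %d bytes\n", size);
    }

The program registers the callback once with set_callback() and then simply goes about its business until the connection invokes it.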

So switching from polling or busy-wait strategies to interrupt and callback strategies offers the following advantages:

  • Programs will be smaller, as fewer states need to be incorporated.

  • Programs will be faster, as execution does not need to be halted at strategic places to check for interesting events taking place.

  • Responsiveness to events will be faster, as there is no longer a need to wait for the program to arrive in a state in which it is able to recognize events.

The use of callback functions is discussed further in Chapter 13, "Optimizing Your Code Further."

Application Look and Feel

Application look and feel, embodied by the graphical user interface (GUI), is important because users' perceptions of performance are important, as discussed earlier in this chapter. A specific performance optimization task is thus to view the program from the perspective of the user. Logical as this sounds, in practice this step is often overlooked. One reason for this is that developers and testers work with prototypes for a long time and get used to GUI inconsistencies. It is generally assumed that any look-and-feel problems will be weeded out during later phases of development because, by definition, prototypes are unfinished. The unintuitive aspects of the user interface lack, at that point, the priority to be fixed. Although developers and testers might become accustomed to the interface and overlook its problems, it is most unlikely that the user/client will be equally tolerant. Consequently, relying on prototype and beta phases to weed out GUI problems is not particularly effective.

It is a good assumption that users and programmers have completely different perspectives of a program for the following reasons:

  • The user sees only the user interface; to her this is the program.

  • Most of the time the user is unaware of exactly what the program is doing internally and has at best only an abstract concept of its overall work.

  • The programmer focuses much more on the source code, making that as good as it can be.

  • The programmer sometimes views the GUI as a necessary evil. An interface that simply makes all the functionality of the program accessible is then quickly stuck on top of the program.

  • Perhaps the most important reason of all is that the user and the programmer have different backgrounds, experiences, and goals with respect to the program. Ideas about what is logical will therefore differ.

The following sections provide more detail on how to identify, prevent, and overcome GUI problems and annoyances.

Unexplained Waiting Times

When programmers forget to add some kind of progress indicator at places in the program where large batches of work are being done, the program will, from the user's point of view, seem to halt at random. He selects a command from the program's menu and suddenly his computer seems to be stuck for a few seconds. This is very frustrating because the user is not aware of what is happening. The programmer, in turn, probably did not even notice this "look and feel" problem because he knows what the program is doing and therefore expects the slowdown.

Simply adding some text in a status bar explaining what is happening, or spawning a little window with a moving slider indicating elapsed time, will greatly enhance the appreciation the end user has for the program.
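
What such an indicator amounts to can be sketched as follows; show_progress() stands in for whatever the GUI toolkit offers (status bar text, a slider dialog, and so on).

    // Stand-in for a toolkit-specific status bar or progress dialog update.
    void show_progress(int percent);

    void process_records(int record_count)
    {
        for (int i = 0; i < record_count; i++)
        {
            // ... process record i ...

            // Update roughly once per percent, so that the progress display
            // itself does not become a performance problem.
            if (record_count >= 100 && i % (record_count / 100) == 0)
                show_progress((i * 100) / record_count);
        }
        show_progress(100);
    }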

Illogical Setup of the User Interface

Another great way to irritate users is to place user interface controls somewhere where they are not expected. This might seem unlikely but there are, in fact, countless examples to be found in even today's most popular software packages. Finding the menu path File, Tools, Change Password is a practical example of this. But it does not even have to be that obvious.

While designing user interfaces, take into account the experiences of the user. For example, when writing a program for a specific OS, it is a good idea to stay as close as possible to its standard interface. So harmonize with the OS, even if it appears less logical than you'd like, such as Print being a submenu of Edit rather than File. When the intended users are familiar with the standard interface of the OS, it is wise to take advantage of that experience, even if its setup could be improved.

Another type of experience that might be used can be found in situations where some kind of automation is done. Whenever users are forced to switch from some kind of manual system—for example, on paper—to a computerized system, they will already need to adapt pretty heavily. Designing a user interface that looks like, and follows the same logical steps as, their old system will benefit them greatly. This also holds true when upgrading or switching computer systems.

Problematic Interface Access

The perception a user has of the performance of a program is mostly determined by the speed at which (new) information appears on her screen. Though some delay may be expected when calling up stored data, it is unlikely that any delay will be accepted when accessing menu screens. Menus and submenus should therefore appear instantaneously. When a menu contains stored data, at the very least the menu itself should be drawn immediately (be it a box, a pop-up, and so on), after which the stored data can be added as it becomes available.

Not Sufficiently Aiding the Learning Curve

Here is where a lot of "look and feel" problems can be solved. A good example of a user-friendly program is one that can follow the learning curve of the user. A first-time user will, for example, benefit enormously from having access to an integrated help service. This could be a help menu with the ability to search for key words and phrases or perhaps even the ability to automatically generate pop-up windows with information on what the user is doing and how he is likely to want to proceed. This first-time user is also likely to use the mouse to access the user interface. After using the program a while though, this extra help is no longer needed, and pop-ups and nested menus get in the way of fast access. The user is now more prone to use hotkeys to quickly access functionality, and he will want to turn off any automatically generated help and unnecessary notifications.

When Do Performance Problems Arise?

This section deals with performance problems that can arise during use (or abuse) of existing programs. Performance problems of new systems are the subject of Chapter 2.

This chapter has shown that performance depends heavily on user perception and that certain areas in systems are particularly sensitive to performance loss. Many performance problems found in the field, however, arise because insufficient attention is paid to future use of the programs during the design and implementation phases. Often a program is closely tailored to fit the current intended use, causing it to run into performance problems almost as soon as the slightest alteration is made to the way it is used—more often than not this is because developers work under strict time constraints.

Consider a simplified example of a program that uses a database of names. Although it might work fine for its initial use of approximately 1,000 names, does that provide any certainty for its behavior if another customer decides to use it for a base of 100,000 names? It all depends on how efficiently the sorting, storage, and retrieval algorithms were initially implemented.
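
A sketch of how such a difference typically shows up: a linear lookup in an unsorted list grows in step with the number of names, whereas a lookup in sorted storage barely notices the growth from 1,000 to 100,000 entries.

    #include <algorithm>
    #include <string>
    #include <vector>

    // Linear lookup: adequate for a thousand names, but its cost grows in
    // direct proportion to the size of the data set.
    bool find_linear(const std::vector<std::string> &names, const std::string &who)
    {
        for (std::vector<std::string>::size_type i = 0; i < names.size(); i++)
            if (names[i] == who)
                return true;
        return false;
    }

    // Lookup in sorted storage: around 17 comparisons for 100,000 names,
    // so growth of the data set hardly affects the response time.
    bool find_sorted(const std::vector<std::string> &sorted_names, const std::string &who)
    {
        return std::binary_search(sorted_names.begin(), sorted_names.end(), who);
    }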

The following sections highlight different performance problems that can arise during the lifetime of a program.

Extending Program Functionality

Performance problems often arise when the functionality of a program needs to be extended. The market demands continuous updates of commercially successful software with newer and improved versions. In fact, many users consider programs without regular updates to be dead and, therefore, a bad investment.

The most common upgrades or extensions include the following:

  • New and better user interfaces including "professional" editions

  • Added capabilities or options

  • Support for more simultaneous users

  • Support for new platforms

  • Upgrades to reflect changes in the nature of the data

  • Added filters to increase interaction with other programs

  • Support for devices

  • Network and Internet capabilities

However, keep in mind that it is neither wise nor beneficial to keep enhancing software with things it was not originally designed for. What you end up with is a very unstable and unmaintainable product. The initial design (the framework of the initial functionality) should be geared toward supporting future enhancements. This also means that the technical documentation should be clearly written and up to date, perhaps even split up into functional groupings that follow the design. This is a must if future enhancements are made by people other than the original developers. To add functionality properly, the programmer making the enhancements should be able to easily identify where his enhancement should go and how it should connect to the existing framework.

Code Reuse

Problems generated by reuse of existing code are closely related to those mentioned in the previous paragraph. Reuse of tested code can still cause grief even with successful identification of how to integrate new functionality in an existing, well-designed framework. Think, for example, of a program that contains a sorting routine that cleverly sorts a number of names before they are printed in a window. A programmer adding new functionality might decide to use this routine to sort records of address information to save some precious time. Although the sorting routine might have been more than adequate for its initial use, it can have severe shortcomings with respect to performing its new task. It is therefore prudent to investigate the consequences of reusing existing code, not least by defining new test cases. And again, good documentation plays an important role here. Refer to Chapter 3, "Modifying an Existing System," for more details.

Test Cases and Target Systems

On the whole, programmers are most comfortable when they are designing and writing software, so they generally resist both documenting and testing. It is therefore not unusual for testing to be reduced to merely checking whether new code will run. The question then is whether the test cases and test data used really represent any and all situations that can be found in the field. Does the programmer even know what kind of data sets will be used in the field and whether it is possible to sufficiently simulate field situations in the development environment? The first step in solving such problems is having a good set of requirements that the programmers can use. If that proves insufficient, it might be necessary to use example data sets, or test cases, from the client for whom the program is being written. Or you might need to move the test setup to the client's site to be able to integrate properly in a "real" field environment. Another common mistake is to develop on machines that are faster or more advanced than those used by the client, meaning that the test and development teams do not get a correct impression of the delays the users will suffer.

Side Effects of Long-Term Use

It is possible that programs slow down when they are used over a longer period of time. Some common problems that can be hidden quite well when programs only run for short periods of time include

  • Disk fragmentation due to usage of files

  • Spawned processes that never terminate

  • Allocated data that does not get freed (memory leaks)

  • Memory fragmentation

  • Files that are opened but never closed

  • Interrupts that are never cleared

  • Log files that grow too large

  • Semaphores that are claimed but never freed (locking problems)

  • Queues and arrays that exceed their maximum size

  • Buffers that wrap around when full

  • Counters that wrap to negative numbers

  • Tasks that are not handled often enough because their priority is set too low

These cases will, of course, take effect in the field and users will complain about performance degradation. These kinds of problems are usually difficult to trace back to their source in the development environment.

It is possible to do manual checks by

  • Stepping through the code with the debugger to see what really happens (more about this in Chapter 4)

  • Printing tracing numbers and pointers and checking their values (see the sketch after this list)

  • Checking programs with profilers (refer to Chapter 4), taking care to note the relations between parts of the program (should function x really take twice as long as function y?, and so on)
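
One way to make a problem such as "allocated data that does not get freed" visible during such checks is to count allocations and releases and print the balance from time to time. The sketch below replaces the global new and delete operators for that purpose; it is deliberately simplistic (no thread safety, no per-site bookkeeping).

    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // Global counters; a steadily growing difference during a long test run
    // points at a memory leak.
    static long g_allocated = 0;
    static long g_released  = 0;

    void *operator new(std::size_t size)
    {
        void *p = std::malloc(size);
        if (p == 0)
            throw std::bad_alloc();
        g_allocated++;
        return p;
    }

    void operator delete(void *p)
    {
        if (p != 0)
        {
            g_released++;
            std::free(p);
        }
    }

    // Call periodically (or from a test hook) to trace the balance.
    void report_heap_balance()
    {
        std::printf("allocated %ld, released %ld, outstanding %ld\n",
                    g_allocated, g_released, g_allocated - g_released);
    }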

Flexibility Versus Performance

Although the design should take into account all kinds of future extensions and updates, there is of course a limit to what one can do and predict during the initial design. Sadly, there often simply is not enough time to design a very high degree of flexibility, let alone implement it. So a tradeoff must be made between flexibility for future enhancements and better performance in the present. No firm guidelines can be given here, as every situation (every client and every software package) is different. However, keep in mind that a client is unlikely to appreciate a very slow program that is exceptionally well equipped to be extended in the future. Similarly, it is unwise to release a program that in testing has already been shown to meet its performance requirements only under the most optimal conditions. Developers need to decide where to put the performance/footprint accent, using their knowledge of the target systems and the processes involved.
