Analyzing Target Areas

The next two sections go deeper into the mechanics of the second step of the optimization process: analyzing the candidate areas and selecting targets for optimization.

Design Flaws, Faulty Algorithms, and Bugs

When the information from step 1 is available (profiler data, problem analysis, and so on), it is prudent to have another look at the design. In essence, compare the timing information found in step 1 with that estimated at design time. Questions to consider include: Which parts of the system were expected to be slow? How slow were they expected to be? It now becomes even clearer how important quantifiable requirements and design estimations are (refer to Chapter 2, "Creating a New System"). Time previously spent on those two topics is now paying off.

The following sections examine some common situations which can be found when comparing the results of step 1 with the requirements and design.

Unpredictably Excessive Processing Time

When parts of the system exceed the amount of time initially anticipated, quite likely a coding problem—introduced in the implementation phase—is at the heart of this behavior. This could simply be a bug or perhaps even a problem with the chosen algorithm.

Some typical examples of the kinds of bugs found in the field are

  • Loops that do not terminate in time

  • Functions that are called too often

  • Counters that wrap around (a sketch of this bug follows the list)

  • Incorrect placement (or the omission) of such program elements as ";", "}", "break", "if", "else", and so on

  • Semaphores and other kinds of (critical section) locks that are not correctly paired or that end up in a deadlock

  • Variables that are incorrectly initialized before use or that are not updated when they need to be

  • Input data that is not checked for corruption or for valid values

  • Input data that is not available, causing the program to wait or to use buffered data (or perhaps even uninitialized memory)

  • Debug code that is accidentally being executed (missing or incorrectly placed #ifdefs, compiler debug options that are unintentionally turned on)
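
To make one of these concrete, consider the counter wrap-around case. The following is a minimal, hypothetical C++ sketch, not taken from any particular system: the commented-out loop uses a counter type that is too narrow for its bound, so the counter wraps around and the loop never terminates, while the corrected loop uses a sufficiently wide counter.

  #include <cstdint>
  #include <iostream>

  int main()
  {
      // Buggy version (commented out): an 8-bit counter can never reach 300,
      // so it wraps from 255 back to 0 and the loop never terminates.
      // for (std::uint8_t i = 0; i < 300; ++i) { /* work */ }

      // Corrected version: a counter wide enough for the bound terminates.
      long sum = 0;
      for (std::uint16_t i = 0; i < 300; ++i)
          sum += i;

      std::cout << "sum = " << sum << '\n';   // 44850
      return 0;
  }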

Some typical examples of algorithm problems found in the field are

  • Inefficient calculations (calculating sine/cosine and so on at runtime instead of using precalculation tables; see the sketch following this list)

  • Inefficient functions for searching, replacing, and sorting data

  • Resource managers that perform poorly (memory managers, file interaction classes, and so on)

  • Caching mechanisms that do not work properly or are not optimally tuned

  • Interrupts that are never generated or that are serviced at moments when they are no longer useful (because they were blocked earlier)

  • Blocks of data that are marked as deleted but which are never actually deleted, making searches and insertions in data structures slower

  • Inefficient conversions of data formats

  • Inefficient storage methods (writing text character by character rather than in blocks)
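
To illustrate the precalculation-table idea from the first item in this list, here is a minimal C++ sketch. The one-degree table resolution and the nearest-degree rounding are arbitrary assumptions made for the example; a real system would pick a resolution based on the accuracy its requirements demand.

  #include <array>
  #include <cmath>
  #include <cstddef>
  #include <iostream>

  // Hypothetical precalculation table: sine sampled at whole degrees.
  struct SineTable
  {
      std::array<double, 360> values{};

      SineTable()
      {
          const double pi = std::acos(-1.0);
          for (std::size_t deg = 0; deg < values.size(); ++deg)
              values[deg] = std::sin(deg * pi / 180.0);
      }

      // Nearest-degree lookup instead of a full sin() call per request.
      double operator()(int degrees) const
      {
          int idx = ((degrees % 360) + 360) % 360;
          return values[static_cast<std::size_t>(idx)];
      }
  };

  int main()
  {
      static const SineTable sine;   // built once at startup, reused afterwards
      std::cout << "table: " << sine(30) << "  exact: "
                << std::sin(30.0 * std::acos(-1.0) / 180.0) << '\n';
  }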

For most bugs, you can turn directly to the code and begin repairs. The same is true for some algorithm problems; most, however, will likely need some kind of redesign. In such cases, those parts of the system were not well designed to begin with, so more damage than good can come from quickly hacking in a patch or workaround. Sometimes, more time simply needs to be spent on parts of the system that might have seemed trivial at the start.

Predictable Processing Time

More drastic measures might be necessary to optimize performance and footprint when processing times are approximately what was expected but are still too slow. Although it is possible to make existing algorithms somewhat more efficient, the source of the problem usually lies in the design. It is necessary to redesign the entire process, not merely the algorithms. To do so, go back and reacquaint yourself with the original problem domain that your system is supposed to solve.

These performance problems are present at a very high abstraction level (the design), and are thus introduced not by the implementers but by the architects. This means that optimizations will be very time consuming and most likely will result in fairly extensive code changes or additions. These design flaws can generally be traced back to poor, or constantly changing, requirement specifications. In fact, without complete requirements, developers can create software that completely misses its mark. Consider the following scenario. A pen manufacturing company gives developers incomplete requirements for its database, and they create a beautiful and elegant database that is lightning fast at finding specific product entries. However, this manufacturer produces only twenty distinctly different products and accesses the database through client entries, of which there are thousands. This example shows that without clear requirements that reflect what is expected of the system, the design and implementation might miss the correct focus.

The architect designs the system based on impressions from reading the requirements (and from any interaction with the actual clients or end users). The implementers further interpret the requirements as they write the code. Any considerations overlooked at the beginning of this process will likely be lost.

Unpredictably Reduced Processing Time

Keep in mind that reduced processing time can be a red herring. Remember that you are optimizing existing code because it suffers from footprint or performance problems, so investigate all unexpected behavior, including parts of the system that are unexpectedly fast or use less memory than anticipated. These characteristics are likely indications of problems. For example, data might be processed incompletely or incorrectly, which might even be the cause of performance problems in other parts of the system. One thing is certain: any existing code problems will crop up sooner or later to make your life miserable (if you want to know exactly when, just ask Murphy). Preventive maintenance is essential.

Looking at the Data Model

Chapter 2 provides an overview of typical life cycles for different types of software objects. This section examines the different characteristics of the program data that can be managed by these objects. How and where this data is stored has an impact on both footprint and performance, as does the choice of when to store, process, and transmit the data.

Many performance and footprint issues are directly related to data access. Performance of most systems will drop drastically when data is needed but cannot be accessed immediately. Likewise, footprint size will become problematic when you try to store too much data close at hand. Intelligent data models are therefore very important in creating an efficient program or system (refer to Chapter 11, "Storage Structures," for technical details and examples). When profiling information indicates that data access seems to be a bottleneck, take another look at the data models used. Note that the (plural) term data models is used here purposely. Systems generally use vastly different types of data (program configuration data, user data, stored input, and so on). Often, it will be unwise to use the same data model for different types of data, even though this might save on development time. Identifying the separate types of data is important.

Distinguish different data types by these key characteristics:

  • The frequency with which the data is accessed

  • The specific program modules that access the data

  • The block size of the data

  • The number of blocks expected to be stored simultaneously at any given time

  • The time of creation or receipt of the data

  • The specific program modules that create or receive the data

  • The typical lifetime of the data in the system

When the different data types are identified, consider how to store and access each type. This is where data models come in.
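
As a small, hypothetical illustration of separate data models for separate data types, the sketch below keeps frequently looked-up configuration data in a hash map and bulk, sequentially scanned measurement samples in a contiguous vector. The type names and fields are invented for the example; the point is only that two different access patterns justify two different storage structures.

  #include <cstdint>
  #include <string>
  #include <unordered_map>
  #include <vector>

  // Configuration data: small, read very often, looked up by name.
  // A hash map gives cheap keyed access.
  using ConfigModel = std::unordered_map<std::string, std::string>;

  // Measurement samples: arrive in bulk, scanned sequentially, short-lived.
  // A contiguous vector keeps them cache-friendly and cheap to append.
  struct Sample
  {
      std::uint64_t timestamp;
      double value;
  };
  using SampleLog = std::vector<Sample>;

  int main()
  {
      ConfigModel config{{"log_level", "info"}, {"max_clients", "20"}};
      SampleLog samples;
      samples.reserve(10000);            // pre-size for an expected burst
      samples.push_back({0, 42.0});

      return config.count("log_level") == 1 && !samples.empty() ? 0 : 1;
  }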

Storage Considerations

It is important to determine, per data type, whether you are dealing with many small blocks, a few large blocks, or perhaps a combination of both. Sometimes it is advisable to group data blocks together into larger blocks. This way, the data can be stored and retrieved more efficiently, especially when data that is likely to be needed simultaneously is grouped together. Conversely, it is sometimes advisable to split up larger blocks of data for exactly the same reason; handling blocks that are too large can prove equally inefficient. For example, performance can drop suddenly when the OS needs to start swapping memory to and from storage to find a large enough contiguous address space. However, consider keeping data completely in cache when it is accessed often. Refer to Chapter 5, "Measuring Time and Complexity."
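
A minimal sketch of grouping many small pieces of data into larger blocks before they reach storage is shown below; the 64 KB block size and the file name are illustrative assumptions, not recommendations.

  #include <cstddef>
  #include <fstream>
  #include <string>
  #include <vector>

  // Collects small records in memory and writes them out one block at a time.
  class BlockWriter
  {
  public:
      explicit BlockWriter(const std::string& path, std::size_t blockSize = 64 * 1024)
          : out_(path, std::ios::binary), blockSize_(blockSize)
      {
          buffer_.reserve(blockSize_);
      }

      ~BlockWriter() { flush(); }

      // Small records accumulate in the buffer ...
      void append(const std::string& record)
      {
          buffer_.insert(buffer_.end(), record.begin(), record.end());
          if (buffer_.size() >= blockSize_)
              flush();                    // ... and leave in one large write.
      }

      void flush()
      {
          if (!buffer_.empty())
          {
              out_.write(buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
              buffer_.clear();
          }
      }

  private:
      std::ofstream out_;
      std::size_t blockSize_;
      std::vector<char> buffer_;
  };

  int main()
  {
      BlockWriter writer("records.bin");
      for (int i = 0; i < 1000; ++i)
          writer.append("small record\n");   // grouped, not written one by one
  }

The same buffering idea applies to any storage or transmission interface, not just a file stream; the gain comes from paying the per-write overhead once per block instead of once per record.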

Processing Considerations

You might want to split up or shift the moments at which you process the data. It might be possible to divide the processing step into several substeps. This way, you can store intermediate processing results and make use of otherwise idle CPU time (effectively evening out the CPU workload). You might even consider shifting the whole processing of the data to an earlier or later part of the program.
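
The following sketch shows one way to divide a processing step into substeps: the object processes one chunk per call and keeps its intermediate result between calls, so the work can be scheduled whenever the CPU would otherwise be idle. The chunk size and the summation task are stand-ins for real work.

  #include <algorithm>
  #include <cstddef>
  #include <iostream>
  #include <numeric>
  #include <utility>
  #include <vector>

  class IncrementalSum
  {
  public:
      explicit IncrementalSum(std::vector<double> data, std::size_t chunk = 256)
          : data_(std::move(data)), chunk_(chunk) {}

      bool done() const { return pos_ >= data_.size(); }

      // Process one chunk and keep the intermediate result; call this from an
      // idle hook or between higher-priority tasks.
      void step()
      {
          std::size_t end = std::min(pos_ + chunk_, data_.size());
          partial_ = std::accumulate(data_.begin() + static_cast<std::ptrdiff_t>(pos_),
                                     data_.begin() + static_cast<std::ptrdiff_t>(end),
                                     partial_);
          pos_ = end;
      }

      double result() const { return partial_; }

  private:
      std::vector<double> data_;
      std::size_t chunk_;
      std::size_t pos_ = 0;
      double partial_ = 0.0;   // intermediate result kept between substeps
  };

  int main()
  {
      IncrementalSum job(std::vector<double>(10000, 1.0));
      while (!job.done())
          job.step();                       // spread over many "idle" moments
      std::cout << job.result() << '\n';    // 10000
  }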

Transaction Considerations

After obtaining profiler information, consider these questions so that you have a more complete picture of the situation (refer to Chapter 4): How and when does the data come into the system? How and when do you transmit or display the processed data?

When the data arrives in bursts, a different data model is needed from when it trickles in at a low rate. The extent to which data arrival can be predicted also plays a role. Sometimes it is possible to use idle time when data comes in. When input arrives via keyboard, for example, it is unlikely to be taxing for any kind of system.
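
For bursty input, one possible model is a fixed-size ring buffer that absorbs arrivals immediately and is drained during idle time. The sketch below assumes a known maximum burst size; the capacity and element type are illustrative only.

  #include <array>
  #include <cstddef>
  #include <optional>

  template <typename T, std::size_t Capacity>
  class RingBuffer
  {
  public:
      bool push(const T& item)                  // called when data arrives
      {
          if (count_ == Capacity)
              return false;                     // burst larger than the buffer
          buf_[(head_ + count_) % Capacity] = item;
          ++count_;
          return true;
      }

      std::optional<T> pop()                    // called when there is idle time
      {
          if (count_ == 0)
              return std::nullopt;
          T item = buf_[head_];
          head_ = (head_ + 1) % Capacity;
          --count_;
          return item;
      }

  private:
      std::array<T, Capacity> buf_{};
      std::size_t head_ = 0;
      std::size_t count_ = 0;
  };

  int main()
  {
      RingBuffer<int, 128> input;
      for (int i = 0; i < 100; ++i)
          input.push(i);                        // a burst arrives
      int processed = 0;
      while (input.pop())
          ++processed;                          // drained later, at leisure
      return processed == 100 ? 0 : 1;
  }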

The answers to these questions provide different details for the data models than those the profiler can give, because the profiler cannot account for expectations and predictability. After you map out these data interactions, concentrate on leveling out the CPU load. Even a small amount of leveling will be beneficial. The same holds true for memory usage. Spreading out memory usage over time affects not only the runtime footprint but also performance. An added advantage is that memory freed up during this action can be used, for instance, to increase caches and buffers to boost performance.
