Metrics data, organized in the time domain, present a window into the real world. Our purpose here is to see what the present holds in the context of the past. We also wish to connect events, as a thread connects beads, and discern meaningful patterns from which the future can be forecast. We will also see how control charts can be devised to support decision making. Because software projects run a predetermined path known as the life cycle, with a finite start and a finite end, time domain analysis is only natural. It enables project teams to become sensitive to reality, responsive to situations, and self-organizing through continuous learning.
Plotting data in chronological order, as in Exhibit 1, brings out hidden temporal patterns. A causal factor for attrition, the motivational level of employees, is measured here as a commitment index and gathered every quarter. We recognize first the simple linear trend, and later more intricate nonlinear trends. While the linear trend captures a broad, long-term behavioral pattern, the local characteristics are captured in increasing levels of detail by power, polynomial, and moving average trends. All of them are effective in suppressing noise, but their forecasting scope and efficiency vary. Each analysis offers an adaptive perception, different from the rest. The overall problem, of course, is a steady decline in commitment, but the pattern of decline, the seasonality, and the similarity with known trends provide knowledge.
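As a sketch of these trend analyses, the following fragment fits a linear trend, a polynomial trend, and a moving average to a hypothetical series of quarterly commitment-index readings (the values are illustrative, not taken from Exhibit 1):

```python
import numpy as np

# Hypothetical quarterly commitment-index readings (declining, with noise).
quarters = np.arange(1, 13)
commitment = np.array([8.2, 8.0, 7.9, 7.5, 7.6, 7.1,
                       6.9, 7.0, 6.5, 6.3, 6.4, 6.0])

# Linear trend: captures the broad, long-term behavioral pattern.
slope, intercept = np.polyfit(quarters, commitment, 1)

# Second-order polynomial trend: captures more local curvature.
poly2 = np.polyfit(quarters, commitment, 2)

# 4-quarter moving average: suppresses noise and exposes seasonality.
window = 4
moving_avg = np.convolve(commitment, np.ones(window) / window, mode="valid")

print(f"linear slope per quarter: {slope:.3f}")  # negative => steady decline
```

Each fit suppresses noise differently: the linear slope summarizes the whole window in one number, while the moving average tracks local behavior at the cost of forecasting reach.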
Using time series analysis, events can be predicted based on historical trends. The bug arrival pattern shown here is an important input for maintenance projects to decide the following:
Exhibit 1. Measuring commitment trends.
Forecasting requires that we identify structures in the data, which might repeat.
Software failure intensity data can be plotted and the trend used to predict failure, as indicated in Exhibit 2. In fixed assets and facilities management, asset downtime data can be plotted in time sequence, and the trend derived and used to forecast spare-parts, manpower, and tool requirements for fixing failure events. With the information made available by forecasting, one stands to plan better and even avoid those marginal losses that would otherwise be incurred without the benefit of advance information.
Exhibit 3. Signature profiles of bug population.
Beyond the bug arrival statistics, signatures of bug population are captured periodically, as illustrated in Exhibit 3, and used in prediction. The signatures become yet another dimension in forecasting. Here signature refers to a bar graph showing distribution of bugs among the known categories as percentages. The distribution pattern keeps changing. Risk tracking, risk exposure magnitude, and risk distribution may be carried out in a similar fashion. Defect magnitude and defect signature are known to have been tracked in a similar way by IBM in their ODC framework of defect management.
Prediction may be done by seeing patterns across projects or locally within a project. For instance, the customer satisfaction index may be tracked in an organization, as shown in Exhibit 4, project after project, and the trend used in decision making. The prediction window here is quite large and may run into years. Each project runs within a time window inside which predictions are made. Time to complete a project and cost at completion are both predicted from the earned value graph (EVG), which cumulatively tracks value and cost as a time series.
Exhibit 4. Prediction windows.
Exhibit 5. X-bar chart on TTR.
Within a project, there could be smaller process windows where very short time series curves operate. Reliability growth curve (RGC) tracks defects within the inspection window of the project. Failure intensity curve, being a reliability model, operates in a window that begins with in-process inspection but goes beyond delivery and penetrates into deeper time zones of alpha, beta, and acceptance tests and application runs.
Every metric operates in a time window, which also becomes the prediction window. The window patterns are eventually called models.
A process behavior is characterized, in simple terms, by the mean value and the standard deviation. The first refers to the location of the process and the second represents the variation of the process. The weekly average (X-bar value) of time to repair (TTR) bugs in a maintenance project is by itself a good indicator of the process. Such a plot is called the X-bar chart, shown in Exhibit 5(a). When process variations are quite large, central tendency is more meaningfully expressed by the median. Therefore, monitoring median charts is recommended in these conditions. Exhibit 5(b) shows the plot of median values for the same set of data.
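A minimal sketch of the X-bar and median series, using invented weekly TTR sub-groups (the data are hypothetical, not from Exhibit 5):

```python
import statistics

# Hypothetical time-to-repair (TTR) values in hours, one sub-group per week.
weekly_ttr = [
    [4.0, 6.5, 5.0, 30.0],   # week 1: one extreme repair skews the mean
    [5.5, 4.5, 6.0, 5.0],    # week 2
    [7.0, 6.0, 8.0, 6.5],    # week 3
]

xbar = [statistics.mean(week) for week in weekly_ttr]       # X-bar chart points
medians = [statistics.median(week) for week in weekly_ttr]  # median chart points

# With large variation (week 1), the median is the steadier central measure.
print(xbar[0], medians[0])
```

Week 1 illustrates the point made above: the single 30-hour repair drags the mean far above the median, so the median chart gives the less distorted picture of the typical repair.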
Exhibit 6. Range–standard deviation chart.
Process variation is represented by standard deviation. Exhibit 6(a) illustrates the weekly values for standard deviation, in the form of an S chart. There are occasions when process range is used as a measure of variation in place of standard deviation, which is represented in Exhibit 6(b).
When accompanied by another chart showing how the range (maximum minus minimum) varies every week, the pair is called the X-bar–R chart, which has long been popular on the work floor. A simpler alternative is to plot the mean, minimum, and maximum values on the same graph and construct the MMM chart.
The weekly data set is known as sub-group (the sub-groups could stand for a group of projects, a group of components, etc.). In our example, the MMM chart is plotted for sub-groups, each corresponding to one week.
The chart could be modified to plot (μ + σ) and (μ − σ) instead of the maximum and minimum values to express variations.
In the MMM chart shown in Exhibit 7, we try to see the process central value and boundary and observe how they fall with time, showing a declining trend. The MMM format allows forecasting and pattern recognition.
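The MMM series, and the (μ ± σ) variant mentioned above, can be computed per sub-group as follows; the sub-group values are hypothetical:

```python
import statistics

# Hypothetical weekly sub-groups of a process metric (e.g., TTR in hours).
subgroups = [
    [6.5, 7.0, 8.2, 6.0],
    [5.8, 6.4, 7.1, 5.5],
    [5.0, 5.9, 6.3, 4.8],
]

# The three series plotted on an MMM chart: mean, minimum, maximum per week.
mmm = [(statistics.mean(w), min(w), max(w)) for w in subgroups]

# Variant from the text: (mu - sigma) and (mu + sigma) bands instead of min/max.
bands = [(statistics.mean(w) - statistics.stdev(w),
          statistics.mean(w) + statistics.stdev(w)) for w in subgroups]

for mean, lo, hi in mmm:
    print(f"mean={mean:.2f} min={lo} max={hi}")
```

Plotting the three series week after week shows the process central value and boundary together, which is what makes the MMM format suitable for forecasting and pattern recognition.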
Park et al., Fenton and Pfleeger, Adrian Burr and Mal Owen, and Thomas Thelin are among the earliest to have applied the traditional forms of control charts to software engineering processes. Many software development houses have adapted control charts in one form or another. An established tool in manufacturing, the control chart is an emergent technology in software development.
Exhibit 8. Tracking growth against point estimate.
In a control chart, process results are plotted in time and compared with an expected value. Examples for the expected values are
In Exhibit 8, the estimated value of cumulative lines of code is plotted against the month, and the actually delivered lines of code are compared with the estimate. The perceived gap between the estimated and actual values makes the process owner see the problem and act to bring the process result back toward the estimate. Control here means adhering to a budget or a plan. In essence, the control chart is a decision support tool, an early warning radar that alerts the user.
The estimated value, instead of being a point, could have a range, taking a cue from real-life process variations. Hence, there exists an upper limit and a lower limit for the estimated value, for a given confidence level. If σ represents the standard deviation and the limits are set at 3σ, for instance, the associated confidence level is 99.7 percent.
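The 3σ → 99.7 percent correspondence can be checked with the normal error function (a sketch; the assumption of a normal distribution is ours, and is relaxed later via Chebyshev's theorem):

```python
import math

def normal_confidence(k: float) -> float:
    """Fraction of a normal population lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} sigma -> {normal_confidence(k):.1%}")
```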
Exhibit 9. Tracking growth against interval estimate.
As shown in Exhibit 9, the actual values are plotted in the background of the estimated mean value and the limits. Now one sees a problem if the actual values cross the limits because we have already given a tolerance band to deviations from the expected mean value.
Those data points, which lie outside the tolerance band, are known as outliers. The first improvement one can think of is to prevent outliers, the next improvement being reduction of the allowed variation band.
The acceptable limits (point estimates) on defect levels are marked in the life cycle phase control chart shown in Exhibit 10. The actual data is superimposed on the expectation levels. Perhaps this type of control chart is most natural for life cycle projects. One can plot the following metrics values in this control chart format:
These life cycle phase control charts provide an opportunity to disseminate process goals and deploy them phasewise. One can define ranges around each estimate to be more realistic about goal setting. The expected values and process goals change with time and improve as the organization makes progress in its processes. There is perhaps no expected value that can be stationary and permanent.
Errors/KSLOC | Test | Errors/KSLOC | Test
4    | UT | 1   | AT
3.5  | UT | 1.2 | AT
5    | UT | 1.5 | AT
4.1  | UT | 1.8 | AT
5.7  | UT | 0.9 | AT
2    | ST | 0.4 | CP
1.5  | ST | 0.4 | CP
1.75 | ST | 0.6 | CP
1.8  | ST | 0.6 | CP
3.5  | ST | 0.8 | CP
Exhibit 10. Defect profile control chart.
We must recall that uncertainties are associated with each measured value. Each data point is not a deterministic entity, but probabilistic in nature. If we plot the probability densities of measured values, as in Exhibit 11, each data point is not a single point but a distribution. Let us try to answer the following questions. Have distributions A, B, C, D, and E crossed the limits? Should we read red alert or early warning?
The answer: these are blurred crossings, not abrupt jumps. Statistically, they represent process diffusion.
We may relate control limits to the assumed confidence levels of judgment and appreciate the tentative nature of limits. We can move up or down the control limits and opt for yet another reference point as UCL. We can fix the UCL and LCL at chosen points on the process distribution curve and accept the corresponding confidence level for decision making. Crossing the limit is a question of degree, which depends on assumptions and perceptions and not so much on the seemingly rigorous mathematical expressions that are used to compute the limits.
When the type of distribution is not known we can apply Chebyshev's theorem, according to which, for any population or sample, at least (1 − 1/k²) of the observations in the dataset fall within k standard deviations of the mean, where k ≥ 1. This is illustrated in Exhibit 12 as a relationship between standard deviation and the corresponding confidence level.
Chebyshev’s theorem provides a lower bound to the proportion of measurements that are within a certain number of standard deviations from the mean. This lower bound estimate can be very helpful when the distribution of a particular population is unknown or mathematically intractable.
Because the software development process is essentially a human process, one cannot expect a standard distribution pattern. Therefore, we should adopt an estimation method that does not depend on the data distribution pattern and at the same time reasonably represents the actual situation. Depending on the confidence level required, one could set the process capability baseline limits at 1.5σ, 2σ, or 3σ for 56, 75, and 89 percent confidence levels, respectively.
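Chebyshev's bound, and the 56, 75, and 89 percent figures quoted above, can be reproduced in a few lines:

```python
def chebyshev_bound(k: float) -> float:
    """Minimum fraction of any distribution within k standard deviations (k >= 1)."""
    return 1.0 - 1.0 / (k * k)

# The limits discussed in the text and their distribution-free confidence levels.
for k in (1.5, 2.0, 3.0):
    print(f"{k} sigma limits -> at least {chebyshev_bound(k):.0%}")
```

Note that these are lower bounds: a well-behaved (e.g., near-normal) process will keep far more than 89 percent of its points within 3σ.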
When sub-group data are not available, it is often impossible to construct an X-bar–R chart. In this case the alternative is to construct an X moving range (XmR) chart, in which successive data points are grouped to form a sub-group.
Exhibit 11. Blurred crossings.
Exhibit 12. Selecting confidence limits for control chart.
Control limits for this chart are derived from control chart constants. The limits are given in Equation 6.1:

Lower control limit = X-bar − E2 × R-bar
Center line = X-bar
Upper control limit = X-bar + E2 × R-bar   (6.1)

where E2 is 2.659 for a sub-group size of n = 2.
Let us consider an application of the XmR chart to the effort variance process. Because this data becomes available less frequently, at project closure we can characterize the process and arrive at its baseline value through the XmR chart, as shown in Exhibit 13.
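A sketch of the Equation 6.1 computation, applied to the effort variance percentages tabulated for Exhibit 13:

```python
# Effort variance percentages from Exhibit 13 (27 project observations).
values = [4.3, 3.8, 4.8, 3.8, 3.7, 3.5, 5.8, 4.9, 5.1, 5.3, 5.5, 5.1, 5.2, 5.8,
          5.6, 5.0, 4.2, 8.6, 4.2, 4.1, 5.2, 4.2, 7.6, 4.6, 7.4, 4.2, 4.2]

E2 = 2.659  # control chart constant for a sub-group size of n = 2

x_bar = sum(values) / len(values)

# Moving ranges: absolute difference between successive observations.
moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
r_bar = sum(moving_ranges) / len(moving_ranges)

lcl = x_bar - E2 * r_bar   # lower control limit (Equation 6.1)
ucl = x_bar + E2 * r_bar   # upper control limit (Equation 6.1)
print(f"LCL={lcl:.2f}  CL={x_bar:.2f}  UCL={ucl:.2f}")
```

The limits so obtained form the process capability baseline for effort variance; any observation falling outside them is a candidate for causal analysis.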
Exhibit 14 shows the process capability baselines with popular control limits. If tighter control on a metric such as effort variance percent is wanted, one could choose 1.5σ limits; on the contrary, if the project manager does not want too many causal analyses to be made or if the process is in the inception stage, one could choose 3σ control limits, wherein nearly 89 out of 100 times the process value will be within the 3σ control limit.
The process history, if available, can be used to set control limits such as demonstrated in Exhibit 15, where frequency distribution of historical data reveals the existence of natural process limits, the valley points dropping off the principal peak. UNPL refers to upper natural process limit and LNPL refers to lower natural process limit.
This approach allows us to use empirical frequency distributions, which are perhaps more relevant and accurate than the elegant assumptions made in the traditional computations of limits.
The Shewhart control chart, introduced in the 1920s, decomposes process variation into two components: random variation (within predictable bounds) and systematic variation (anomalies). Random variations, when the cause system is constant, approach some distribution function and hence remain predictable, or statistically stable. Systematic variations are due to assignable causes: unusual events, freak incidents, process drifts, and environmental threats.
Shewhart demonstrated how control charts could be used to identify and distinguish the two types of process variation, to achieve process efficiency and the ensuing economic benefits.
Exhibit 13. X m R chart on effort variance.
Exhibit 14. Control chart with confidence limits.
Exhibit 15. History-based limits.
Exhibit 16 shows how a training manager uses the Shewhart Control Chart to identify (and later solve) two problems: extraordinary cost for Training ID 7 and the average cost (μ) greater than the budget.
Armand V. Feigenbaum permits control limits to be specified pragmatically, from past experience and informed guesswork.
Tests for statistical control have been in use for a long time. The classical tests or decision rules to be applied while reading the control charts are presented in the following list, along with an illustration in Exhibit 17.
If the metric shows trend, such as delivered defect density (DDD) in Exhibit 18, the control charts may be partitioned to make a clearer presentation of the problem. The trend line helps in forecasting and risk estimation. The baseline helps in process analysis, estimation, and setting process guidelines.
Sometimes the metric is a product of two major components, each showing its own independent characteristics. Defects found by design review, for instance, are a product of defect injected and review effectiveness, shown in Equation 6.2.
Defects Found = Defects Injected * Review Effectiveness (6.2)
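As a quick numeric check of Equation 6.2 (the counts here are hypothetical), a low defects-found figure can come from either factor, which is why the two process owners read the chart differently:

```python
# Equation 6.2: both factors shape the observed metric.
defects_injected = 40          # hypothetical count injected during design
review_effectiveness = 0.65    # hypothetical fraction caught by review

defects_found = defects_injected * review_effectiveness
print(defects_found)

# The same low "defects found" value can arise from clean design
# (few injected) or from a weak review (low effectiveness).
assert 10 * 0.65 < 26 and 40 * 0.17 < 26
```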
The UCL in the control chart of defects/KLOC, as shown in Exhibit 19, is more relevant to the designers, who have to keep the defect level below the UCL. The LCL, on the other hand, appeals to the reviewers, who should find more defects than the LCL indicates. In the defect control chart in Exhibit 19, the following references are marked for proper interpretation:
Training ID | Cost of Training | Training ID | Cost of Training
1  | 400 | 11 | 435
2  | 333 | 12 | 400
3  | 455 | 13 | 370
4  | 400 | 14 | 250
5  | 385 | 15 | 435
6  | 500 | 16 | 455
7  | 833 | 17 | 417
8  | 435 | 18 | 455
9  | 385 | 19 | 500
10 | 313 | 20 | 435
Exhibit 16. Controlling the cost of training.
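The Exhibit 16 analysis can be reproduced as follows; the use of a 3σ rule to flag the extraordinary cost is our assumption about the decision rule, and the budget figure is hypothetical:

```python
import statistics

# Training cost data from Exhibit 16.
costs = [400, 333, 455, 400, 385, 500, 833, 435, 385, 313,
         435, 400, 370, 250, 435, 455, 417, 455, 500, 435]

mean = statistics.mean(costs)
sd = statistics.stdev(costs)
ucl = mean + 3 * sd

# Problem 1: trainings whose cost lies beyond the 3-sigma upper limit.
outliers = [i + 1 for i, c in enumerate(costs) if c > ucl]

# Problem 2: average cost exceeding a (hypothetical) budget figure.
budget = 420
over_budget = mean > budget

print(f"mean={mean:.2f}  UCL={ucl:.2f}  outlier IDs={outliers}")
```

Training ID 7 (cost 833) falls beyond the computed upper limit, matching the first problem the training manager identified.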
Effort Variance % (27 observations):
4.3, 3.8, 4.8, 3.8, 3.7, 3.5, 5.8, 4.9, 5.1, 5.3, 5.5, 5.1, 5.2, 5.8,
5.6, 5.0, 4.2, 8.6, 4.2, 4.1, 5.2, 4.2, 7.6, 4.6, 7.4, 4.2, 4.2
Exhibit 17. Tests for Statistical Process Control (SPC) charts.
Exhibit 18. Trend and baselines.
The control chart in Exhibit 19 is cluttered, and one has to strain to read, analyze, and interpret it. When the chart is used to give process feedback, the process owners may receive mixed signals: one signal demands a minimum production of defects, while another demands just the opposite.
In-Process Defects (26 observations):
6.6, 7.6, 12.4, 12.3, 24.7, 8.7, 5.6, 5.1, 1.7, 15, 2.9, 1.7, 1, 0,
5.6, 18.2, 0.9, 10.7, 1.2, 13.2, 10.2, 7.5, 0.8, 0.1, 4.9, 2.3
Exhibit 19. In-process defect control chart.
Exhibit 20. Splitting a double-side limit into two single-side limits.
This problem may be solved, and an effective presentation made to each process owner, by constructing two separate control charts, each delivered with the appropriate control limits, as indicated in Exhibit 20. After the split, the new control charts look simple and clear, with just one decision rule marked. The process owner, whether designer or reviewer, gets a clear signal. The process defects are marked as circles in both cases. With defects clearly marked and the goal (specification limit) clearly specified, each process owner can go into causal analysis of process violations and initiate corrective measures. The purpose of this control chart is to provide effective feedback and facilitate corrective action.
There are several control chart forms in use, including the ones we have used so far. Below is a brief list for a quick reference. The exact formulas for computations may be found elsewhere.
When we have a large number of data points that can be organized as sub-groups according to some real-life order, and when the sub-group sizes are used in determining the control limits, the following charts may be useful.
If instead of sub-groups we have just an individual data point for every process delivery, we can artificially create a sub-group by selecting data points from a moving average window, and plot a graph with control limits calculated in the traditional way.
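Creating artificial sub-groups from individual points via a moving window can be sketched as follows (the helper name and data are illustrative):

```python
def moving_subgroups(points, window=3):
    """Create artificial sub-groups from individual points via a moving window."""
    return [points[i:i + window] for i in range(len(points) - window + 1)]

# Individual data points, one per process delivery.
data = [5.1, 4.8, 5.6, 5.0, 4.9]
groups = moving_subgroups(data)
print(groups)  # [[5.1, 4.8, 5.6], [4.8, 5.6, 5.0], [5.6, 5.0, 4.9]]
```

Each window then plays the role of a sub-group, and control limits are calculated from the windows in the traditional way.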
When all we desire is to characterize the process and generate some performance baseline on a chosen metric, the following forms may be used. These forms can be used across life cycle phases or across subgroups.
If we wish to compare actual values with estimates, then the following may be used:
Most performance models are constructed this way. A few of them are illustrated in this section.
The design review process of each individual can be tracked using the metric called number of pages reviewed per hour. The bar graph in Exhibit 21 shows the individual’s review performance against the average group performance and with respect to maximum and minimum performance.
A simple way to take a holistic and balanced view of processes is to track all related process metrics on a radar chart, marking the target values and the achieved values. Cost drivers, performance drivers, and defect drivers in software development can be plotted on the radar chart for effective process control. Tracking of multiple goals, all competing for resources, is presented in the radar chart format in Exhibit 22. The following is a list of metrics used to represent and measure goals:
All these are measured quantitatively on a 0 to 10 scale (ratio scale). Targets and achievement in each direction are plotted. This is a control chart because it compares reality with expectation and allows one to see deviations. It gives deeper meaning and allows one to visualize a balanced picture or model on goal achievement.
Control charts in modern times have taken a totally new form. They are embedded in metric databases and analysis modules, which perform dynamic functions.
A defect-tracking tool uses a defect database as the platform and tracks bug closure. If the time taken exceeds a preset limit, the software generates a message to the tester. Even if the bug lives long after the message, the software escalates the issue and the message is now flashed to the project manager. The tester or the manager does not see a physical control chart but gets the results.
The limit setting can be a choice from the manager, where his experience and judgment prevail. Or the limit setting can be done by the software logic, which will use an appropriate decision rule and raise an alarm. The decision-making algorithm can be simple algebra or a sophisticated knowledge engine that learns and works with intelligence.
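The escalation logic described above could be sketched as follows; the limit values, names, and structure are hypothetical assumptions, not taken from any particular defect-tracking tool:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset limits, of the kind a manager or the tool logic might set.
TESTER_LIMIT_DAYS = 5     # alert the tester after this many days open
MANAGER_LIMIT_DAYS = 10   # escalate to the project manager after this many

@dataclass
class Bug:
    bug_id: int
    age_days: int  # days the bug has remained open

def escalation_target(bug: Bug) -> Optional[str]:
    """Decide who, if anyone, should receive the alert for an open bug."""
    if bug.age_days > MANAGER_LIMIT_DAYS:
        return "project_manager"   # the bug lived long after the first message
    if bug.age_days > TESTER_LIMIT_DAYS:
        return "tester"            # first, quieter warning
    return None                    # still within the preset limit

print(escalation_target(Bug(101, 12)))
```

Neither recipient ever sees a physical control chart; the decision rule is embedded in the tool, which is the point made in the text.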
Person | Pages/hr | Person | Pages/hr
A | 2.8 | N | 6.4
B | 7.6 | O | 1.3
C | 2.8 | P | 8.1
D | 9.8 | Q | 6.3
E | 3.3 | R | 4.8
F | 1.8 | S | 4.9
G | 7.7 | T | 5.9
H | 8.4 | U | 4.0
I | 2.3 | V | 1.9
J | 3.5 | W | 5.9
K | 2.5 | X | 9.5
L | 9.3 | Y | 3.8
M | 6.8 | Z | 2.7
Exhibit 21. Review performance comparison.
Metric | Target | Achieved
CUST SAT | 9 | 4
PROD | 8 | 6
EMP SAT | 7.5 | 4
RFT | 7 | 7
DRE | 9 | 3
TNF | 6 | 4
Exhibit 22. Goal control radar.
The graph is printed, on demand, as a report from the tool along with other statistics. In a similar way, metrics data analysis tools can generate dynamic control charts on all metrics. These charts can be published in the monthly process capability baseline reports.
There are many forms of control charts but they all must be structured well for effective application. Here are some suggestions.
A control chart can be plotted on any metric; choose the metric that communicates better. For instance, a training manager can choose cost of absenteeism instead of number of people absent, because the former makes senior management take the control chart seriously.
The data should be in chronological order. Most software development processes follow the learning curve, both first order and second order. Before process stability is achieved, the learning curve is encountered. Chronological order gives control charts the vital meaning and power.
A decision rule must be provided to enable problem recognition. The rule could be expressed in the following ways:
The reader must be made familiar with the rules for interpretation. The chart must be designed with the most likely readers in mind, and every effort must be made to make the chart provide effective communication to a human system (biofeedback).
Provide support data as annotations for significant data points. For example, a defect distribution pie chart can be provided as a companion to a defect control chart.
Annotate identified hot spots or trends with causal analysis findings. We learn from such annotations. Wherever possible, suggested corrective action may be indicated.
Metrics data, when presented in time series, offers a new form that helps in understanding the process. A well-structured time series chart can evolve into a model once it captures a pattern that can be applied as a historic lesson. Time series analysis for trend or process control is also a time series model of the process, inasmuch as it can increase one's understanding of process behavior and support forecasting.
But the outstanding issue in software projects is whether a process goes according to a plan or estimate. The need for statistically derived, self-organizing goals, should it arise, is only secondary. The term control chart may then be replaced with the term decision support chart, and the concept of the control limit substituted with that of the decision threshold. What-if analysis can be done on a control chart by shifting the limits and seeing each time how many events are picked up and earmarked for investigation. The problem set will shift according to the location of the threshold line.
There are reasons why metrics control charts end up issuing suggestive clues but not convincing proof about process problems:
But all a project manager is looking for is a set of clues, not final proof. A decision support chart can coexist with ambiguity but the classical control chart cannot.
If known problems are not solved, nobody wants to use a control chart to detect new problems. If trouble can be spotted without having to use a control chart, avoid control charts. Going one step further, if without the aid of control limits we can spot outliers using the naked eye, let us not draw control limits.
The connection of control charts with action is now legendary. The best control chart is the one on which somebody acts.