As a result of the general survey, we decided to examine the design and coding MRs in depth. The following were the goals we wanted to achieve in this part of the study:
Determine the kinds of faults that occurred in design and coding
Determine the difficulty both in finding or reproducing these faults and in fixing them
Determine the underlying causes of the faults
Determine how the faults might have been prevented
Compare the difficulty in finding and fixing interface and implementation faults
There were two reasons for choosing this part of the general set of MRs. First, it seemed to be exceedingly difficult to separate the two kinds of faults. Second, catching these kinds of faults earlier in the process would provide a significant reduction in overall fault cost; that is, the cost of finding faults before system integration is significantly less than finding them in the laboratory testing environment. Our internal cost data is consistent with Boehm’s [Boehm 1981] (see also Chapter 10 by Barry Boehm). Thus, gaining insight into these problems will yield significant (and cost beneficial) results.
In the two subsections that follow, we summarize the survey questionnaire, present the results of our statistical analysis, and summarize our findings with regard to interface and implementation faults.
The respondents were asked to indicate the difficulty of finding and fixing the problem, determine the actual and underlying causes, indicate the best means of either preventing or avoiding the problem, and give their level of confidence in their responses. It should be kept in mind that the people surveyed were those who owned the MR at the time it was closed (i.e., completed).
For each MR, rank it according to how difficult it was to reproduce the failure and locate the fault.
Easy—could produce at will.
Moderate—happened some of the time (intermittent).
Difficult—needed theories to figure out how to reproduce the error.
Very Difficult—exceedingly hard to reproduce.
For each MR, indicate how much time was needed to design and code the fix, and to document and test it. (Note that what would be an easy fix in a single-programmer system takes considerably more time in a large, multiperson project with a complex laboratory test environment.)
Easy—less than one day
Moderate—1 to 5 days
Difficult—6 to 30 days
Very difficult—greater than 30 days
For each MR, consider the following 22 possible types and select the one that most closely applies to the immediate cause (that is, the fault type).
Language pitfalls—for example, pointer problems, or the use of “=” instead of “==”.
Protocol—violated rules about interprocess communication.
Low-level logic—for example, loop termination problems, pointer initialization, etc.
CMS complexity—for example, due to software change management system complexity.
Internal functionality—either inadequate functionality or changes and/or additions were needed to existing functionality within the module or subsystem.
External functionality—either inadequate functionality or changes and/or additions were needed to existing functionality outside the module or subsystem.
Primitives misused—the design or code depended on primitives that were not used correctly.
Primitives unsupported—the design or code depended on primitives that were not adequately developed (that is, the primitives do not work correctly).
Change coordination—either did not know about previous changes or depended on concurrent changes.
Interface complexity—interfaces were badly structured or incomprehensible.
Design/code complexity—the implementation was badly structured or incomprehensible.
Error handling—incorrect handling of, or recovery from, exceptions.
Race conditions—incorrect coordination in the sharing of data.
Performance—for example, real-time constraints, resource access, or response-time constraints.
Resource allocation—incorrect resource allocation and deallocation.
Dynamic data design—incorrect design of dynamic data resources or structures.
Dynamic data use—incorrect use of dynamic data structures (for example, initialization, maintaining constraints, etc.).
Static data design—incorrect design of static data structures (for example, their location, partitioning, redundancy, etc.).
Unknown interactions—unknowingly involved other functionality or parts of the system.
Unexpected dependencies—unexpected interactions or dependencies on other parts of the system.
Concurrent work—unexpected dependencies on concurrent work in other releases.
Other—describe the fault.
Because the fault may be only a symptom, provide what you regard to be the underlying root cause for each problem.
None given—no underlying causes given.
Incomplete/omitted requirements—the source of the fault stemmed from either incomplete or unstated requirements.
Ambiguous requirements—the requirements were (informally) stated, but they were open to more than one interpretation. The interpretation selected was evidently incorrect.
Incomplete/omitted design—the source of the fault stemmed from either incomplete or unstated design specifications.
Ambiguous design—the design was (informally) given, but was open to more than one interpretation. The interpretation selected was evidently incorrect.
Earlier incorrect fix—the fault was induced by an earlier, incorrect fix (that is, the fault was not the result of new development).
Lack of knowledge—there was something that I needed to know, but did not know that I needed to know it.
Incorrect modification—I suspected that the solution was incorrect, but could not determine how to correctly solve the problem.
Submitted under duress—the solution was submitted under duress, knowing that it was incorrect (generally due to schedule pressure, etc.).
Other—describe the underlying cause.
For this fault, consider possible ways to prevent or avoid it, and select the most useful or appropriate choice for preventing, avoiding, or detecting the fault.
Formal requirements—use precise, unambiguous requirements (or design) in a formal notation (which may be either graphical or textual).
Requirements/design templates—provide more specific requirements (or design) document templates.
Formal interface specifications—use a formal notation for describing the module interfaces.
Training—provide discussions, training seminars, and formal courses.
Application walk-throughs—determine, informally, the interactions among the various application-specific processes and data objects.
Expert person/documentation—provide an “expert” person or clear documentation when needed.
Design/code currency—keep design documents up to date with code changes.
Guideline enforcement—enforce code inspection guidelines and the use of static analysis tools such as lint.
Better test planning—provide better test planning and/or execution (for example, automatic regression testing).
Others—describe the means of prevention.
Confidence levels requested of the respondents were: very high, high, moderate, low, and very low. We discarded the small number of responses that had a confidence level of either low or very low.
Out of all the questionnaires, 68% were returned. Of those, we dropped the responses that were either low or very low in confidence (6%). The remainder of the questionnaires were then subjected to Chi-Square analysis [Siegel et al. 1988] to test for independence (and for interdependence) of various paired sets of data. In Chi-Square analysis, the lower the total chi-square value, the more independent the two sets of data; the higher the value, the more interdependent the two sets of data. The p-value indicates the significance of the analysis: the lower the number, the less likely the relationships are due to chance. In Table 25-2, the prevention data and the find data are the most independent (the total chi-square is the lowest), and that lack of relationship is significant (the p-value is less than the standard .05 and indicates that the odds are less than 1 in 20 that the relationship happened by chance). The fault-cause, fault-prevention, and cause-prevention pairs are the most interdependent, as their total chi-square values are the largest three of the entire set and the significance of these relationships is very high (the odds are less than 1 in 10,000 of being by chance).
The fact that the relationships between the faults and their underlying causes, faults and means of prevention, and means of prevention and the underlying causes are the most significantly interdependent is a good thing: 1) faults should be strongly related to their underlying causes, and 2) both faults and their underlying causes should be strongly related to their means of prevention. This indicates that the respondents were consistent in their responses and the data aligns with what one would logically expect.
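The mechanics of the test can be sketched in a few lines of Python; the helper below and the 2×2 contingency tables it is run on are illustrative, not the study's data.

```python
# A minimal sketch of the Chi-Square independence test summarized in
# Table 25-2. The contingency tables below are made-up illustrations,
# not the study's data.

def chi_square(table):
    """Total chi-square statistic for a contingency table.

    Each expected cell count is (row total * column total) / grand total;
    the statistic sums (observed - expected)^2 / expected over all cells.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / grand) ** 2
        / (row_totals[i] * col_totals[j] / grand)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

# Perfectly proportional counts are fully independent: chi-square = 0.
assert chi_square([[10, 20], [30, 60]]) == 0.0

# Skewed counts drive the statistic up, indicating interdependence.
assert chi_square([[40, 10], [10, 40]]) == 36.0
```

A low statistic (relative to the degrees of freedom) thus indicates independence, and a high one interdependence, which is how the Find/Prevention and Fault/Cause pairs are read in the text.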
Table 25-2. Chi-Square analysis summary
| Variables | Degrees of freedom | Total Chi-Square | p |
|---|---|---|---|
| Find, Fix | 6 | 51.489 | .0001 |
| Fault, Find | 63 | 174.269 | .0001 |
| Fault, Fix | 63 | 204.252 | .0001 |
| Cause, Find | 27 | 94.493 | .0001 |
| Cause, Fix | 27 | 55.232 | .0011 |
| Fault, Cause | 189 | 403.136 | .0001 |
| Prevention, Find | 27 | 41.021 | .041 |
| Prevention, Fix | 27 | 97.886 | .0001 |
| Fault, Prevention | 189 | 492.826 | .0001 |
| Cause, Prevention | 81 | 641.417 | .0001 |
Table 25-3 provides a cross-tabulation of the difficulty in finding and fixing the design and coding faults. Of these faults, 78% took five days or less to fix. In general, the easier-to-find faults were easier to fix; the more difficult-to-find faults were more difficult to fix.
Table 25-3. Find versus fix comparison
| Find/fix |  | <1 day | 1-5 days | 6-30 days | >30 days |
|---|---|---|---|---|---|
|  |  | 30.1% | 48.8% | 18.0% | 3.6% |
| easy | 67.5% | 23.7% | 32.1% | 10.0% | 1.7% |
| moderate | 23.4% | 4.2% | 12.5% | 5.6% | 1.1% |
| difficult | 7.7% | 1.7% | 3.4% | 2.1% | .5% |
| very difficult | 1.4% | .5% | .3% | .3% | .3% |
One of the interesting things about Chi-Square analysis is that it is based on the difference between expected and observed values of the paired data. The expected value in this case is the product of the observed find value and the observed fix value. If the two sets of data are independent of each other, the expected percentages will match or be very close to the observed percentages; otherwise, the two sets of data are not independent.
The first row of data gives the observed percentages of how long it took to fix the MR; the first column gives the observed percentages of how hard it was to find/duplicate the problem. The expected value of easy to find and fixable in a day or less is 67.5% × 30.1% = 20.3%, whereas the observed value of 23.7% is 17% more than that expected value.
There were more faults that were easy to find and took less than one day to fix than were expected by the Chi-Square analysis. Interestingly, there were fewer than expected easy to find faults (expected: 12%) that took 6 to 30 days to fix (observed: 10%).
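The expected-versus-observed comparison above can be checked directly; the figures below are read from Table 25-3.

```python
# Expected vs. observed cell for Table 25-3 under independence:
# the expected cell percentage is the product of the two marginals.
# All three figures are read from the table.

find_easy = 0.675      # marginal: easy to find
fix_under_day = 0.301  # marginal: fixed in less than one day
observed = 0.237       # observed cell: easy to find AND fixed in < 1 day

expected = find_easy * fix_under_day   # about 20.3%
excess = observed / expected - 1       # about 17% above expectation

assert round(expected, 3) == 0.203
assert round(excess, 2) == 0.17
```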
Although the scales for the effort to find and the effort to fix the faults are not directly comparable, the following relationship is suggestive. Collapsing the previous table yields an interesting insight in Table 25-4 that seems counter to the common wisdom that “once you have found the problem, it is easy to fix it”: a significant number of “easy/moderate to find” faults require a relatively long time to fix.
Table 25-5 shows the fault types of the MRs as ordered by their frequency in the survey independent of any other factors. For the sake of brevity in the subsequent tables, we use the fault type number to represent the fault types.
The first five fault types account for 60% of the faults. That “internal functionality” is the leading fault by such a large margin is somewhat surprising; that “interface complexity” is such a significant problem is not surprising at all. However, that the first five fault types are leading faults is consistent with the nature of the evolution of the system. Adding significant amounts of new functionality to a system easily accounts for problems with “internal functionality,” “low-level logic,” and “external functionality.”
The fact that the system is a very large, complicated real-time system easily accounts for the problems with “interface complexity,” “unexpected dependencies,” “design/code complexity,” “change coordination,” and “concurrent work.”
C has well-known “language pitfalls” that account for the rank of that fault in the middle of the set. Similarly, “race conditions” are a reasonably significant problem because of the lack of suitable language facilities in C.
That “performance” faults are relatively insignificant is probably due to the fact that this is not an early release of the system, and performance was always a significant concern of code inspections.
There are two interesting relationships to consider in the ordering of the various faults: the effect that the difficulty in finding the faults has on the ordering, and the effect that the difficulty of fixing them has on it. The purpose of weighting is to adjust the observed frequency by how easy or hard the faults are to find or to fix. From the standpoint of “getting the most bang for the buck,” the frequency of a fault is a good prima facie indicator of the importance of one fault relative to another.
Table 25-5. Fault types ordered by frequency
| Fault type | Observed % | Fault type description |
|---|---|---|
| 5 | 25.0% | internal functionality |
| 10 | 11.4% | interface complexity |
| 20 | 8.0% | unexpected dependencies |
| 3 | 7.9% | low-level logic |
| 11 | 7.7% | design/code complexity |
| 22 | 5.8% | other |
| 9 | 4.9% | change coordination |
| 21 | 4.4% | concurrent work |
| 13 | 4.3% | race conditions |
| 6 | 3.6% | external functionality |
| 1 | 3.5% | language pitfalls |
| 12 | 3.3% | error handling |
| 7 | 2.4% | primitives misused |
| 17 | 2.1% | dynamic data use |
| 15 | 1.5% | resource allocation |
| 18 | 1.0% | static data design |
| 14 | .9% | performance |
| 19 | .7% | unknown interactions |
| 8 | .6% | primitives unsupported |
| 2 | .4% | protocol |
| 4 | .3% | CMS complexity |
| 16 | .3% | dynamic data design |
Table 25-6 is an attempt to capture the weighted difficulty of finding the various faults. The weighting is done by multiplying the proportion of observed values for each fault with multiplicative weights of 1, 2, 3, and 4 for each find category, respectively, and summing the results.
Obviously it would have been better to have had some duration assigned to the effort to find faults and then correlated the weighting with those durations, as we do subsequently in weighting by effort to fix faults. The weights used are intended to be suggestive, not definitive. We experimented with several different weightings, and the results were pretty much the same. Thus we used the simplest approach.
Better yet would have been effort data associated with each MR that could be used to get a more realistic picture of actual difficulty. But this type of data is seldom available, and an approximation is needed instead.
For example, if a fault was easy to find in 66% of the cases, moderate in 23%, difficult in 11%, and very difficult in 0%, the weight is 145 = (66 * 1) + (23 * 2) + (11 * 3) + (0 * 4). Table 25-6 shows the fault types weighted by difficulty to find, from easiest to most difficult.
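The worked example above can be sketched as a small helper; the weights 1 through 4 are the ones given in the text.

```python
# The find-weighting scheme: multiply the percentage of responses in
# each find category (easy, moderate, difficult, very difficult) by
# the weights 1, 2, 3, and 4, respectively, and sum the results.

FIND_WEIGHTS = (1, 2, 3, 4)

def find_weight(e, m, d, vd):
    """Weight one fault's find-difficulty profile (percentages)."""
    return sum(p * w for p, w in zip((e, m, d, vd), FIND_WEIGHTS))

# The worked example from the text: 66/23/11/0 yields a weight of 145.
assert find_weight(66, 23, 11, 0) == 145
```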
Table 25-6. Determining the find weighting
| Fault type | Find proportion e/m/d/vd | Weight | Fault type description |
|---|---|---|---|
| 4 | 100/0/0/0 | 100 | CMS complexity |
| 18 | 100/0/0/0 | 100 | static data design |
| 7 | 88/8/4/0 | 120 | primitives misused |
| 2 | 75/25/0/0 | 125 | protocol |
| 20 | 78/16/5/1 | 129 | unexpected dependencies |
| 21 | 70/23/2/4 | 130 | concurrent work |
| 3 | 73/22/5/0 | 132 | low-level logic |
| 22 | 82/12/2/5 | 132 | other |
| 5 | 74/19/6/1 | 134 | internal functionality |
| 6 | 67/31/3/0 | 139 | external functionality |
| 1 | 68/26/2/2 | 141 | language pitfalls |
| 10 | 66/23/11/0 | 145 | interface complexity |
| 9 | 65/20/12/2 | 149 | change coordination |
| 8 | 67/17/17/0 | 152 | primitives unsupported |
| 19 | 88/8/4/0 | 157 | unknown interactions |
| 16 | 67/0/33/0 | 157 | dynamic data design |
| 17 | 52/38/10/0 | 158 | dynamic data use |
| 15 | 47/47/7/0 | 162 | resource allocation |
| 12 | 55/30/12/3 | 163 | error handling |
| 11 | 55/29/16/1 | 165 | code complexity |
| 14 | 56/11/11/22 | 199 | performance |
| 13 | 12/67/21/0 | 209 | race conditions |
Typically, performance faults and race conditions are very difficult to isolate and reproduce. We would expect that “code complexity” and “error handling” faults also would be difficult to find and reproduce. Not surprisingly, “language pitfalls” and “interface complexity” are reasonably hard to detect.
In the Chi-Square analysis, “internal functionality,” “unexpected dependencies,” and “other” tended to be easier to find than expected. “Code complexity” and “performance” tended to be harder to find than expected. There tended to be more significant deviations where the population was larger.
If we weight the proportions by multiplying the number of occurrences of each fault by its weight from Table 25-6 and dividing by the total weighted number of occurrences, we get only a slight change in the ordering of the faults, with “internal functionality,” “code complexity,” and “race conditions” (faults 5, 11, and 13) changing slightly more than the rest of the faults.
Table 25-7 represents the results of weighting the difficulty of fixing the various faults by factoring in the actual time needed to fix the faults. The multiplicative scheme uses the values 1, 3, 15, and 30 for the four average times in fixing a fault. The calculations are performed as in the example of weighting the difficulty of finding the faults.
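The fix-weighting scheme differs from the find weighting only in its multipliers, which stand in for representative durations (in days) of the four fix buckets.

```python
# The fix-weighting scheme: the same calculation as the find weighting,
# but with multipliers 1, 3, 15, and 30, representing rough durations
# (in days) for the four fix buckets (<1, 1-5, 6-30, >30 days).

FIX_WEIGHTS = (1, 3, 15, 30)

def fix_weight(e, m, d, vd):
    """Weight one fault's fix-effort profile (percentages per bucket)."""
    return sum(p * w for p, w in zip((e, m, d, vd), FIX_WEIGHTS))

# "Dynamic data design" in Table 25-7: 67/33/0/0 yields weight 166.
assert fix_weight(67, 33, 0, 0) == 166
```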
The weighting according to the difficulty in fixing the fault causes some interesting shifts in the ordering. “Language pitfalls,” “low-level logic,” and “internal functionality” (faults 1, 3, and 5) drop significantly in their relative importance. This coincides with one’s intuition about these kinds of faults. “Design/code complexity,” “resource allocation,” and “unexpected dependencies” (faults 11, 15, and 20) rise significantly in their relative importance; “interface complexity,” “race conditions,” and “performance” (faults 10, 13, 14) also rise, but not significantly so.
Table 25-7. Determining the fix weighting
| Fault type | Proportion e/m/d/vd | Weight | Fault type description |
|---|---|---|---|
| 16 | 67/33/0/0 | 166 | dynamic data design |
| 4 | 67/33/0/0 | 166 | CMS complexity |
| 8 | 50/50/0/0 | 200 | primitives unsupported |
| 18 | 50/50/0/0 | 200 | static data design |
| 1 | 63/31/6/0 | 244 | language pitfalls |
| 3 | 59/37/3/1 | 245 | low-level logic |
| 2 | 25/75/0/0 | 250 | protocol |
| 17 | 38/48/14/0 | 392 | dynamic data use |
| 9 | 37/49/14/0 | 394 | change coordination |
| 5 | 27/59/14/0 | 414 | internal functionality |
| 22 | 40/43/12/5 | 496 | other |
| 7 | 46/37/8/8 | 497 | primitives misused |
| 10 | 17/57/26/1 | 608 | interface complexity |
| 21 | 25/43/30/2 | 661 | concurrent work |
| 6 | 22/50/22/6 | 682 | external functionality |
| 13 | 16/56/21/7 | 709 | race conditions |
| 12 | 21/52/18/9 | 717 | error handling |
| 19 | 29/43/14/14 | 785 | unknown interactions |
| 20 | 24/39/33/5 | 786 | unexpected dependencies |
| 11 | 22/39/27/12 | 904 | design/code complexity |
| 15 | 0/47/27/27 | 1356 | resource allocation |
| 14 | 11/22/44/22 | 1397 | performance |
Table 25-8 shows the top fix-weighted faults. According to our weighting schemes, these four faults account for 55.2% of the effort expended to fix all the faults and 51% of the effort to find them, but represent 52.1% of the faults by frequency count. Collectively, they are somewhat harder to fix than the rest of the faults and slightly easier to find. We again note that although the two scales are not strictly comparable, the comparison is an interesting one nonetheless.
Table 25-8. Faults weighted by fix difficulty
| Fault type | Weighted % | Brief description |
|---|---|---|
| 5 | 18.7% | internal functionality |
| 10 | 12.6% | interface complexity |
| 11 | 12.6% | code complexity |
| 20 | 11.3% | unexpected dependencies |
In the Chi-Square analysis, “language pitfalls” and “low-level logic” took fewer days to fix than expected. “Interface complexity” and “internal functionality” took 1 to 5 days to fix more often than expected, and “design/code complexity” and “unexpected dependencies” took longer to fix (that is, 6 to more than 30 days) than expected. These deviations reinforce our weighted assessment of the effort to fix the faults.
In Table 25-9, we show the underlying causes of the MRs as ordered by their frequency in the survey, independent of any other factors.
Table 25-9. Underlying causes of faults
| Underlying causes | Observed % | Brief description |
|---|---|---|
| 4 | 25.2% | incomplete/omitted design |
| 1 | 20.5% | none given |
| 7 | 17.8% | lack of knowledge |
| 5 | 9.8% | ambiguous design |
| 6 | 7.3% | earlier incorrect fix |
| 9 | 6.8% | submitted under duress |
| 2 | 5.4% | incomplete/omitted requirements |
| 10 | 4.1% | other |
| 3 | 2.0% | ambiguous requirements |
| 8 | 1.1% | incorrect modification |
The high proportion of “none given” as an underlying cause requires some explanation. One of the reasons for this is that faults such as “language pitfalls,” “low-level logic,” “race conditions,” and “change coordination” tend to be both the fault and the underlying cause (7.8%—or 33% of the faults in the “none given” underlying cause category in Table 25-12 below). In addition, one could easily imagine that some of the faults, such as “interface complexity” and “design/code complexity,” could also be considered both the fault and the underlying cause (3.4%—or 16% of the faults in the “none given” underlying cause category in Table 25-12). On the other hand, we were surprised that no cause was given for a substantial part of the “internal functionality” faults (3.3%—or 16% of the faults in the “none given” category in Table 25-12). One would expect there to be some underlying cause for that particular fault.
Table 25-10 shows the relative difficulty in finding the faults associated with the underlying causes. The resulting ordering is particularly nonintuitive: the MRs with no underlying cause are the second most difficult to find; those submitted under duress are the most difficult to find.
Table 25-10. Weighting of the underlying causes by find effort
| Underlying causes | Proportion e/m/d/vd | Weight | Brief description |
|---|---|---|---|
| 8 | 91/9/0/0 | 109 | incorrect modification |
| 7 | 74/18/7/1 | 135 | lack of knowledge |
| 3 | 60/40/0/0 | 140 | ambiguous requirements |
| 5 | 66/27/7/0 | 141 | ambiguous design |
| 2 | 70/17/13/0 | 143 | incomplete/omitted requirements |
| 4 | 68/25/7/1 | 143 | incomplete/omitted design |
| 6 | 73/12/10/5 | 147 | earlier incorrect fix |
| 10 | 76/12/0/12 | 148 | other |
| 1 | 63/25/11/1 | 150 | none given |
| 9 | 50/46/4/0 | 158 | submitted under duress |
In the Chi-Square analysis of finding underlying causes, faults caused by “lack of knowledge” tended to be easier to find than expected, whereas faults caused by “submitted under duress” tended to be moderately hard to find more often than expected. This latter finding is interesting, as we know very little about faults “submitted under duress.”
In Table 25-11, we weight the underlying causes by the effort to fix the faults they represent. This yields a few shifts in the proportion of effort: “incomplete/omitted design” increased significantly; “ambiguous requirements” and “incomplete/omitted requirements” increased less significantly; “none given” decreased significantly; and “ambiguous design” and “other” decreased less significantly. However, the relative ordering of the various underlying causes is unchanged.
Table 25-11. Weighting of the underlying causes by fix effort
| Underlying causes | Proportion e/m/d/vd | Weight | Brief description |
|---|---|---|---|
| 10 | 37/42/12/10 | 340 | other |
| 1 | 43/43/12/2 | 412 | none given |
| 5 | 29/55/14/2 | 464 | ambiguous design |
| 7 | 30/50/17/3 | 525 | lack of knowledge |
| 6 | 34/45/17/4 | 544 | earlier incorrect fix |
| 9 | 18/57/25/0 | 564 | submitted under duress |
| 8 | 18/55/27/0 | 588 | incorrect modification |
| 4 | 23/50/22/5 | 653 | incomplete/omitted design |
| 2 | 26/44/24/6 | 698 | incomplete/omitted requirements |
| 3 | 25/30/24/6 | 940 | ambiguous requirements |
The relative weighting of the effort to fix these kinds of underlying causes seems to coincide with one’s intuition very nicely.
In the Chi-Square analysis of fixing underlying causes, faults caused by “none given” tended to take less time to fix than expected, whereas faults caused by “incomplete/omitted design” and “submitted under duress” tended to take more time to fix than expected.
In Table 25-12, we present the cross-tabulation of faults and their underlying causes. Faults are represented by the rows, underlying causes by the columns. The numbers in the matrix are the percentages of the total population of faults. Thus, 1.5% of the total faults were fault 1 with the underlying cause 1. The expected number of faults for fault 1 and underlying cause 1 can be computed by multiplying the total faults for each of those categories: 20.5% * 3.5% = .7%. In this example, the actual number of faults was higher than expected.
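The expected-cell computation for the cross-tabulation works the same way as before; the figures below are the ones used in the worked example in the text.

```python
# Expected vs. observed cell in the fault-by-cause cross-tabulation
# (Table 25-12), using the example from the text: fault 1 ("language
# pitfalls", 3.5% of all faults) with cause 1 ("none given", 20.5%).

fault_total = 0.035   # row marginal for fault 1
cause_total = 0.205   # column marginal for cause 1
observed = 0.015      # observed cell: 1.5% of all faults

expected = fault_total * cause_total   # about 0.7% under independence
assert round(expected, 3) == 0.007
assert observed > expected   # more such faults than independence predicts
```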
Table 25-12. Cross-tabulating fault types and underlying causes
| Fault type | Observed % | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  | 20.5% | 5.4% | 2.0% | 25.2% | 9.8% | 7.3% | 17.8% | 1.1% | 6.8% | 4.1% |
| 1 language pitfalls | 3.5% | 1.5 | .0 | .0 | .2 | .1 | .2 | .8 | .1 | .5 | .1 |
| 2 protocol | .4% | .0 | .0 | .1 | .2 | .0 | .0 | .1 | .0 | .0 | .0 |
| 3 low-level logic | 7.9% | 3.7 | .3 | .1 | .6 | .3 | 1.2 | .7 | .0 | .6 | .4 |
| 4 CMS complexity | .3% | .1 | .0 | .0 | .0 | .0 | .1 | .1 | .0 | .0 | .0 |
| 5 internal functionality | 25.0% | 3.3 | 1.3 | .6 | 7.7 | 2.8 | 2.0 | 5.2 | .3 | 1.2 | .6 |
| 6 external functionality | 3.6% | .7 | .3 | .1 | .4 | .5 | .6 | .7 | .0 | .3 | .0 |
| 7 primitives misused | 2.4% | .4 | .0 | .0 | .5 | .0 | .1 | .8 | .0 | .0 | .6 |
| 8 primitives unsupported | .6% | .0 | .2 | .0 | .1 | .0 | .1 | .1 | .0 | .1 | .0 |
| 9 change coordination | 4.9% | 1.1 | .0 | .0 | .8 | 1.0 | .6 | .8 | .1 | .3 | .2 |
| 10 interface complexity | 11.4% | 2.1 | .6 | .2 | 4.1 | 1.4 | 1.1 | 1.4 | .2 | .0 | .3 |
| 11 design/code complexity | 7.7% | 1.3 | .0 | .3 | 3.0 | 1.6 | .2 | 1.0 | .0 | .0 | .3 |
| 12 error handling | 3.3% | .9 | .3 | .0 | .8 | .0 | .1 | .7 | .0 | .4 | .1 |
| 13 race conditions | 4.3% | 1.4 | .2 | .0 | 1.3 | .5 | .1 | .3 | .0 | .4 | .1 |
| 14 performance | .9% | .2 | .0 | .1 | .2 | .0 | .0 | .3 | .0 | .0 | .1 |
| 15 resource allocation | 1.5% | .5 | .0 | .0 | .3 | .1 | .0 | .4 | .1 | .0 | .1 |
| 16 dynamic data design | .3% | .0 | .0 | .0 | .1 | .0 | .0 | .1 | .0 | .1 | .0 |
| 17 dynamic data use | 2.1% | .7 | .1 | .0 | .2 | .1 | .0 | .6 | .0 | .4 | .0 |
| 18 static data design | 1.0% | .3 | .1 | .1 | .2 | .1 | .0 | .1 | .0 | .1 | .0 |
| 19 unknown interactions | .7% | .0 | .1 | .1 | .0 | .2 | .0 | .2 | .0 | .1 | .0 |
| 20 unexpected dependencies | 8.0% | .5 | .8 | .3 | 2.7 | .5 | .1 | 1.4 | .0 | 1.7 | .0 |
| 21 concurrent work | 4.4% | .6 | .3 | .0 | 1.2 | .2 | .4 | .9 | .2 | .4 | .2 |
| 22 other | 5.8% | 1.2 | .8 | .0 | .6 | .4 | .4 | 1.1 | .1 | .2 | 1.0 |
For the sake of brevity, we consider only the most frequently occurring faults and their major underlying causes. “Incomplete/omitted design” (cause 4) is the primary underlying cause in all of these major faults. “Ambiguous design” (cause 5), “lack of knowledge” (cause 7), and “none given” (cause 1) were also significant contributors to the presence of these faults.
For “internal functionality” (fault 5), “incomplete/omitted design” (cause 4) was felt to have been the cause of 31% (that is, 7.7% / 25%) of the occurrences of this fault, a percentage higher than expected; “lack of knowledge” (cause 7) was thought to have caused 21% of the occurrences of this fault, higher than expected; and “none given” was listed as the third underlying cause, representing 13% of the occurrences.
For “interface complexity” (fault 10), again, “incomplete/omitted design” was seen to be the primary cause in the occurrence of this fault (36%), higher than expected; “lack of knowledge” and “ambiguous design” were seen as the second and third primary causes of this fault (13% and 12%, respectively).
For “unexpected dependencies” (fault 20), not surprisingly, “incomplete/omitted design” was felt to have been the primary cause of this fault (in 34% of the cases); “submitted under duress” (cause 9) contributed to 21% of the occurrences, a percentage higher than expected; and “lack of knowledge” was the tertiary cause of this fault, representing 18% of the occurrences.
For “design/code complexity” (fault 11), again, “incomplete/omitted design” was felt to have been the primary cause in 39% of the occurrences of this fault, a percentage higher than expected; “ambiguous design” was the second most frequent underlying cause of this fault, causing 21% of the faults (also a higher percentage than expected); and “none given” was listed as the third underlying cause, representing 17% of the occurrences.
Again, for the sake of brevity, we consider only the most frequently occurring underlying causes and the faults to which they were most applicable.
For “incomplete/omitted design” (cause 4), as we noted previously, “internal functionality,” “interface complexity,” “code/design complexity,” and “unexpected dependencies” were the major applicable faults (31%, 12%, 12%, and 11%, respectively), with the first three occurring with higher than expected frequency.
For “none given” (cause 1), “low-level logic” (fault 3) was the leading fault, representing 18% of the occurrences (a percentage higher than expected); “internal functionality” (fault 5) was the second major fault, representing 16% of the occurrences (a percentage lower than expected); “interface complexity” (fault 10) was the third leading fault, representing 10% of the occurrences; and “language pitfalls” was the fourth leading fault, representing 8% of the occurrences (a percentage higher than expected).
For “lack of knowledge” (cause 7), “internal functionality” was the leading fault, representing 29% of the occurrences (a percentage higher than expected); “interface complexity” was next with 8% of the occurrences (a percentage lower than expected); “unexpected dependencies” was third with 8% of the occurrences; and “other” (fault 22) was fourth with 6%.
For “ambiguous design” (cause 5), “internal functionality” represented 29% of the occurrences; “code/design complexity” (fault 11) was the second fault, representing 16% of the occurrences (a percentage higher than expected); “interface complexity” was third with 14%; and “change coordination” (fault 9) was fourth, representing 10% of the occurrences (a percentage higher than expected).
Table 25-13 shows the means of prevention of the MRs, as ordered by their occurrence independent of any other factors. We note that the means selected may well reflect a particular approach of the responder in selecting one means over another (for example, see the discussion later in this section about formal versus informal means of prevention).
Table 25-13. Means of error prevention
| Means of prevention | Observed % | Brief description |
|---|---|---|
| 5 | 24.5% | application walk-throughs |
| 6 | 15.7% | expert person/documentation |
| 8 | 13.3% | guideline enforcement |
| 2 | 10.0% | requirements/design templates |
| 9 | 9.9% | better test planning |
| 1 | 8.8% | formal requirements |
| 3 | 7.2% | formal interface specifications |
| 10 | 6.9% | other |
| 4 | 2.2% | training |
| 7 | 1.5% | design/code currency |
It is interesting to note that the application-specific means of prevention (“application walk-throughs”) is considered the most effective. This selection of application walk-throughs as the most useful means of error prevention appears to confirm the observation of Curtis, Krasner, and Iscoe [Curtis et al. 1988] that a thin spread of application knowledge is the most significant problem in building large systems.
Further, it is worth noting that informal means of prevention rank higher than formal ones. On the one hand, this may reflect the general bias in the United States against formal methods. On the other hand, the informal means are a nontechnical solution to providing the information that may be supplied by formal representations (and which provide a more technical solution with perhaps higher attendant adoption costs).
The level of effort to find the faults for which these are the means of prevention does not change the order found in Table 25-13, with the exception of “requirements/design templates,” which seems to apply to the easier-to-find faults, and “guideline enforcement,” which seems to apply more to the harder-to-find faults.
In the Chi-Square analysis, the relationship between finding faults and preventing them is the most independent of the relationships, reported here with p=.041. “Application walk-throughs” applied to faults that were marginally easier to find than expected, whereas “guideline enforcement” applied to faults that were less easy to find than expected.
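The Chi-Square independence test used in these analyses compares each observed cell count against the count expected if the two factors were unrelated. A minimal sketch in Python follows; the 2×2 counts are purely illustrative, not the study's data:

```python
# Chi-Square statistic for a contingency table, computed from scratch.
# Rows might be find-difficulty classes, columns means of prevention.

def chi_square(table):
    """Return the Chi-Square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns
            exp = row_totals[i] * col_totals[j] / grand
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical counts: rows = easier/harder to find, cols = two means
observed = [[30, 10],
            [15, 25]]
print(round(chi_square(observed), 2))  # → 11.43
```

A large statistic (relative to the Chi-Square distribution's critical value for the table's degrees of freedom) indicates dependence between the factors, which is how relationships such as the p=.041 result above are judged.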
In Table 25-14, the means of prevention is weighted by the effort to fix the associated faults.
Table 25-14. Means of prevention weighted by fix effort
Prevention | Proportion | Weight | Brief description
---|---|---|---
8 | 38/52/7/3 | 389 | guideline enforcement
9 | 35/52/12/1 | 401 | better test planning
7 | 40/40/20/0 | 460 | design/code currency
5 | 33/50/17/1 | 468 | application walk-throughs
10 | 49/36/6/9 | 517 | other
2 | 10/52/30/1 | 654 | requirements/design templates
3 | 26/43/26/4 | 675 | formal interface specifications
6 | 22/48/24/6 | 706 | expert person/documentation
1 | 20/50/22/8 | 740 | formal requirements
4 | 23/36/23/18 | 1016 | training
It is interesting to note that the faults considered to be prevented by training are the hardest to fix. The formal methods also apply to classes of faults that take a long time to fix.
Weighting the means of prevention by the effort to fix their corresponding faults yields a few shifts in proportion: “application walk-throughs,” “better test planning,” and “guideline enforcement” decreased in proportion; “expert person/documentation” and “formal requirements” increased; and “formal interface specifications” and “other” increased less markedly. As a result, the ordering changes slightly to 5, 6, 2, 1, 8, 10, 3, 9, 4, 7: “expert person/documentation” and “formal requirements” (numbers 6 and 1) are weighted significantly higher; “requirements/design templates,” “formal interface specifications,” “training,” and “other” (numbers 2, 3, 4, and 10) are somewhat higher; and “guideline enforcement” and “better test planning” (numbers 8 and 9) are significantly lower.
In the Chi-Square analysis, faults prevented by “application walk-throughs,” “guideline enforcement,” and “other” tended to take fewer days to fix than expected, whereas faults prevented by “formal requirements,” “requirements/design templates,” and “expert person/documentation” took longer to fix than expected.
In Table 25-15, we present the cross-tabulation of faults and their means of prevention. Again, the faults are represented by the rows, and the means of prevention are represented by the columns. The data is analogous to the preceding cross-tabulation of faults and underlying causes.
For the sake of brevity, we consider only the most frequently occurring faults and their major means of prevention. “Application walk-throughs” were felt to be an effective means of preventing these most significant faults. “Expert person/documentation,” “formal requirements,” and “formal interface specifications” were also significant means of preventing these faults.
“Application walk-throughs” (prevention 5) were thought to be the most effective means of prevention, applicable to 27% of the occurrences of this fault; “expert person/documentation” (prevention 6) was felt to be the second most effective means, applicable to 18% of the fault occurrences; and “requirements/design templates” were thought to be applicable to 14% of the fault occurrences, a percentage higher than expected.
Table 25-15. Cross-tabulating faults and means of prevention
Fault | % | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---|---|---|---
 | | 8.8% | 10.0% | 7.2% | 2.2% | 24.5% | 15.7% | 1.5% | 13.3% | 9.9% | 6.9%
1 language pitfalls | 3.5% | .0 | .1 | .1 | .0 | 1.0 | .3 | .1 | 1.3 | .4 | .2
2 protocol | .4% | .1 | .2 | .0 | .0 | .1 | .0 | .0 | .0 | .0 | .0
3 low-level logic | 7.9% | .1 | .0 | .1 | .2 | 2.3 | .3 | .2 | 3.2 | .8 | .7
4 CMS complexity | .3% | .0 | .0 | .0 | .0 | .0 | .1 | .0 | .1 | .1 | .0
5 internal functionality | 25.0% | 1.9 | 3.5 | 1.5 | .4 | 6.6 | 4.4 | .2 | 3.3 | 3.1 | .1
6 external functionality | 3.6% | .6 | .3 | .4 | .0 | .1 | .7 | .0 | .5 | .9 | .1
7 primitives misused | 2.4% | .1 | .1 | .2 | .0 | .8 | .3 | .0 | .1 | .2 | .6
8 primitives unsupported | .6% | .1 | .0 | .0 | .0 | .3 | .0 | .0 | .0 | .1 | .1
9 change coordination | 4.9% | .4 | .9 | .3 | .4 | .8 | .3 | .3 | .3 | .7 | .5
10 interface complexity | 11.4% | 2.1 | .3 | 2.1 | .0 | 3.0 | 1.7 | .1 | 1.2 | .7 | .2
11 design/code complexity | 7.7% | .8 | .5 | .1 | .4 | 2.2 | 2.4 | .2 | .3 | .4 | .4
12 error handling | 3.3% | .2 | .2 | .3 | .1 | .6 | .6 | .0 | .4 | .5 | .4
13 race conditions | 4.3% | .8 | .0 | .4 | .0 | 1.2 | .4 | .2 | .4 | .2 | .7
14 performance | .9% | .0 | .0 | .0 | .2 | .2 | .3 | .0 | .0 | .0 | .2
15 resource allocation | 1.5% | .1 | .1 | .1 | .0 | .3 | .3 | .0 | .3 | .3 | .0
16 dynamic data design | .3% | .0 | .0 | .0 | .0 | .1 | .0 | .0 | .1 | .0 | .1
17 dynamic data use | 2.1% | .0 | .0 | .2 | .0 | .8 | .5 | .0 | .5 | .0 | .1
18 static data design | 1.0% | .1 | .1 | .0 | .0 | .2 | .2 | .0 | .0 | .3 | .1
19 unknown interactions | .7% | .1 | .0 | .2 | .0 | .0 | .2 | .0 | .0 | .2 | .0
20 unexpected dependencies | 8.0% | .6 | 2.2 | 1.1 | .1 | 2.3 | .6 | .0 | .4 | .6 | .1
21 concurrent work | 4.4% | .4 | .7 | .0 | .2 | 1.2 | 1.1 | .1 | .3 | .0 | .4
22 other | 5.8% | .3 | .8 | .1 | .2 | .4 | 1.0 | .1 | .6 | .4 | 1.9
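The "higher/lower than expected" judgments made throughout this section compare each observed cell percentage against the value implied by independence of the row and column marginals. That expected value is simply their product:

```python
# Expected cell percentage under independence: the product of the
# row and column marginal percentages.

def expected_cell(row_pct, col_pct):
    """Both arguments are percentages of the grand total."""
    return row_pct * col_pct / 100.0

# From Table 25-15: "internal functionality" is 25.0% of faults and
# "application walk-throughs" is 24.5% of preventions, so the expected
# cell value is about 6.1% -- versus 6.6% observed in the table,
# i.e., slightly higher than expected.
print(round(expected_cell(25.0, 24.5), 1))  # → 6.1
```

Large gaps between observed and expected cell values are what drive up the Chi-Square statistic and signal dependence between the two factors.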
Again, “application walk-throughs” were considered to be the most effective, applicable to 26% of the cases; “formal requirements” (prevention 1) and “formal interface specifications” were felt to be equally effective, with each preventing 18% of the fault occurrences (in both cases, a percentage higher than expected).
“Application walk-throughs” were felt to be the most effective means of preventing this fault, applicable to 29% of the occurrences; “requirements/design templates” were considered the second most effective and applicable to 28% of the fault occurrences (a percentage higher than expected); and “formal interface specifications” were considered applicable to 14% of the fault occurrences, a percentage higher than expected.
“Expert person/documentation” was felt to be the most effective means of preventing this fault, applicable to 31% of the cases (higher than expected); “application walk-throughs” were the second most effective means, applicable to 29% of the occurrences; and “formal requirements” was third, applicable to 10% of the fault occurrences.
Again, for the sake of brevity, we consider only the most frequently occurring means of prevention and the faults to which they were most applicable. Not surprisingly, these means were most applicable to “internal functionality” and “interface complexity,” the most prevalent faults. Counterintuitively, they are also strongly recommended as applicable to “low-level logic.”
“Internal functionality” (fault 5) was considered as the primary target in 27% of the uses of this means of prevention; “interface complexity” (fault 10) was felt to be the secondary target, representing 12% of the uses of this means; and “low-level logic” (fault 3) and “unexpected dependencies” (fault 20) were next with 9% each.
Again, “internal functionality” is the dominant target for this means, representing 29% of the possible applications; “design/code complexity” is the second most applicable target, representing 15% of the possible applications (a percentage higher than expected); and “interface complexity” represented 11% of the uses (higher than expected).
“Internal functionality” and “low-level logic” were the dominant targets for this means of prevention, representing 25% and 24%, respectively (the latter being higher than expected); “language pitfalls” (fault 1) was seen as the third most relevant fault, representing 10% of the possible applications (higher than expected); and “interface complexity” was the fourth with 9% of the possible applications of this means of prevention.
In Table 25-16, it is interesting to note that the Chi-Square analysis shows many deviations (that is, there is a wider variance between the actual values and the expected values in correlating underlying causes and means of prevention). This indicates that there are strong dependencies between the underlying causes and their means of prevention. Intuitively, this type of relationship is just what we would expect.
Table 25-16. Cross-tabulating means of prevention and underlying causes
Prevention | % | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---|---|---|---
 | | 20.5% | 5.4% | 2.0% | 25.2% | 9.8% | 7.3% | 17.8% | 1.1% | 6.8% | 4.1%
1 formal requirements | 8.8% | .4 | 2.3 | .9 | 3.5 | .8 | .3 | .5 | .1 | .0 | .0
2 reqs/design templates | 10.0% | .4 | 1.7 | .1 | 3.7 | 1.9 | .1 | .8 | .0 | 1.3 | .0
3 formal interface specs | 7.2% | .8 | .3 | .1 | 2.7 | .8 | .3 | 2.0 | .0 | .2 | .0
4 training | 2.2% | .4 | .0 | .1 | .7 | .1 | .3 | .6 | .0 | .0 | .0
5 application walk-thrus | 24.5% | 7.5 | .2 | .3 | 7.3 | 3.1 | 1.8 | 3.1 | .0 | .5 | .7
6 expert person/doc | 15.7% | 1.5 | .4 | .4 | 3.5 | 1.8 | 1.0 | 5.8 | .6 | .3 | .4
7 design/code currency | 1.5% | .4 | .0 | .0 | .6 | .2 | .1 | .2 | .0 | .0 | .0
8 guideline enforcement | 13.3% | 4.0 | .1 | .0 | .6 | .2 | 1.6 | 2.5 | .0 | 3.7 | .6
9 better test planning | 9.9% | 2.8 | .2 | .0 | 1.7 | .8 | 1.6 | 1.9 | .3 | .2 | .4
10 others | 6.9% | 2.3 | .2 | .1 | .9 | .1 | .2 | .4 | .1 | .6 | 2.0
We first summarize the means of prevention associated with the major underlying causes. “Application walk-throughs,” “expert person/documentation,” and “guideline enforcement” were considered important in addressing these major underlying causes.
“Application walk-throughs” (prevention 5) was thought to be applicable to 28% of the faults with this underlying cause (a percentage higher than expected); “requirements/design templates” (prevention 2) and “expert person/documentation” (prevention 6) were next in importance with 14% each (the first being higher than expected); and “formal requirements” (prevention 1) was felt to be applicable to 12% of the faults with this underlying cause (a percentage higher than expected).
Again, “application walk-throughs” was thought to be applicable to 37% of the faults with these underlying causes; “guideline enforcement” (prevention 8), “better test planning” (prevention 9), and “other” (prevention 10) were felt to be applicable to 19%, 14%, and 10% of the faults, respectively. In all four of these cases, the percentages were higher than expected.
“Expert person/documentation” was thought to be applicable to 32% of the faults with this underlying cause, a percentage higher than expected; “application walk-throughs,” “guideline enforcement,” and “formal interface specifications” were felt to be applicable to 17%, 14%, and 11% of the faults with this underlying cause, respectively, though “application walk-throughs” had a lower percentage than expected, whereas “formal interface specifications” had a higher percentage than expected.
The following summarizes the major underlying causes addressed by the most frequently considered means of prevention. “Lack of knowledge,” “none given,” “incomplete/omitted design,” and “ambiguous design” were the major underlying causes for which these means of prevention were considered important. It is somewhat counterintuitive that the “none given” category is so prominent a target for these primary means of prevention.
“None given” (cause 1) and “incomplete/omitted design” (cause 4) were thought to be appropriate targets for this means of prevention in 31% and 30% of the cases, respectively (higher than expected); “ambiguous design” (cause 5) and “lack of knowledge” (cause 7) both were felt to apply to 13% of the cases (though the first was higher than expected and the second lower).
“Lack of knowledge” was considered the major target for this means of prevention, accounting for 37% of the cases (a higher than expected value); “incomplete/omitted design” and “ambiguous design” were thought to be appropriate in 23% and 11% of the cases, respectively; and “none given” was thought appropriate in 10% of the cases (lower than expected).
“None given” and “incorrect modification” were felt to be the most appropriate targets for this means of prevention in 30% and 28% of the cases, respectively (both higher than expected); “lack of knowledge” and “incorrect earlier fix” were appropriate in 19% and 12% of the cases, respectively (the latter was higher than expected).
The definition of an interface fault that we use here is that of Basili and Perricone [Basili and Perricone 1984] and Perry and Evangelist [Perry and Evangelist 1985], [Perry and Evangelist 1987]: interface faults are “those that are associated with structures existing outside the module’s local environment but which the module used.” Using this definition, we roughly characterize “language pitfalls” (1), “low-level logic” (3), “internal functionality” (5), “design/code complexity” (11), “performance” (14), and “other” (22) as implementation faults. The remainder are considered interface faults. We say “roughly” because there are some cases where the implementation categories may contain some interface problems; remember that some of the “design/code complexity” faults were considered preventable by formal interface specifications. Table 25-17 shows our interface versus implementation fault comparison.
Table 25-17. Interface/implementation fault comparison
 | Interface | Implementation
---|---|---
Frequency | 49% | 51%
Find weighted | 50% | 50%
Fix weighted | 56% | 44%
Interface faults occur with slightly less frequency than implementation faults, but require about the same effort to find and more effort to fix.
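The split and weighting behind this comparison can be sketched as follows. The partition of fault categories follows the rough classification given above; the per-fault efforts in the example are hypothetical, since the study's raw effort data is not reproduced here:

```python
# Implementation fault categories per the rough classification above;
# the remaining categories of the 22-fault scheme count as interface.
IMPLEMENTATION_FAULTS = {1, 3, 5, 11, 14, 22}

def fault_kind(code):
    return "implementation" if code in IMPLEMENTATION_FAULTS else "interface"

def weighted_share(faults):
    """faults: (code, frequency, effort) triples; returns the percentage
    of total weighted effort falling on each side of the split."""
    totals = {"interface": 0.0, "implementation": 0.0}
    for code, freq, effort in faults:
        totals[fault_kind(code)] += freq * effort
    grand = sum(totals.values())
    return {k: round(100.0 * v / grand, 1) for k, v in totals.items()}

# Hypothetical sample: fault 10 (interface complexity) is assumed
# costlier to fix than fault 5 (internal functionality).
sample = [(5, 25.0, 2.0), (10, 11.4, 5.0)]
print(weighted_share(sample))
```

With uniform efforts the weighted shares reduce to the raw frequencies; the 56%/44% fix-weighted split above reflects interface faults drawing disproportionately more fix effort than their 49% frequency alone would suggest.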
Table 25-18 compares interface and implementation faults with respect to their underlying causes. Underlying causes “other,” “ambiguous requirements,” “none given,” “earlier incorrect fix,” and “ambiguous design” tended to be the underlying causes more for implementation faults than for interface faults. Underlying causes “incomplete/omitted requirements,” “incorrect modification,” and “submitted under duress” tended to be the causes more for interface faults than for implementation faults.
Note that underlying causes that involved ambiguity tended to result more in implementation faults than in interface faults, whereas underlying causes involving incompleteness or omission of information tended to result more in interface faults than in implementation faults.
Table 25-18. Interface/implementation faults and underlying causes
Underlying cause | Interface | Implementation
---|---|---
 | 49% | 51%
1 none given | 45.2% | 54.8%
2 incomplete/omitted requirements | 79.6% | 20.4%
3 ambiguous requirements | 44.5% | 55.5%
4 incomplete/omitted design | 50.8% | 49.2%
5 ambiguous design | 47.0% | 53.0%
6 earlier incorrect fix | 45.1% | 54.9%
7 lack of knowledge | 49.2% | 50.8%
8 incorrect modification | 54.5% | 45.5%
9 submitted under duress | 63.1% | 36.9%
10 other | 39.1% | 60.9%
Table 25-19 compares interface and implementation faults with respect to the means of prevention. Not surprisingly, means 1 and 3 (“formal requirements” and “formal interface specifications”) were more applicable to interface faults than to implementation faults. Means of prevention 8, 4, and 6 (“guideline enforcement,” “training,” and “expert person/documentation”) were considered more applicable to implementation faults than to interface faults.
Table 25-19. Interface/implementation faults and means of prevention
Means of prevention | Interface | Implementation
---|---|---
 | 49% | 51%
1 formal requirements | 64.8% | 35.2%
2 requirements/design templates | 51.5% | 48.5%
3 formal interface specifications | 73.6% | 26.4%
4 training | 36.4% | 63.6%
5 application walk-throughs | 48.0% | 52.0%
6 expert person/documentation | 44.3% | 55.7%
7 design/code currency | 46.7% | 53.3%
8 guideline enforcement | 33.1% | 66.9%
9 better test planning | 48.0% | 52.0%
10 others | 49.3% | 50.7%