Characteristics of Faulty Files

Once we had evidence of bug concentration, we could start looking for characteristics of the buggy code that would allow us to identify it. The software in an ongoing project provides two classes of properties that can potentially be used to characterize particular code units. The first class consists of static structural code properties that can be extracted directly from the source code. They include the programming language, the number of lines of code in a file, the number of method calls, and various software complexity metrics.

The second class consists of process properties, which relate to the history of the system’s development and testing. They include information such as the number of changes made and faults detected in previous releases, and the length of time that a particular code unit has been part of the system.

The goal of our early research was to find properties of the files from both of these classes that had a strong correlation with the occurrence of faults in the files. The first two systems that we studied provided enough evidence for us to build preliminary models that did a creditable job of predicting which files would be the most likely to have the largest number of faults in future releases of those systems.
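As a minimal sketch of what "a strong correlation" between a file property and fault counts could mean in practice, the following computes a Pearson correlation coefficient over per-file values. The text does not say which statistical measure was used, so the choice of Pearson here is an assumption, and the function name `pearson` is purely illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    Here xs would hold one property per file (for example, lines of code)
    and ys the fault count observed for the same files, in the same order.
    This measure is an assumption; the source does not name one.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A value near 1 would suggest that files scoring high on the property tend to have more faults, a value near -1 the opposite, and a value near 0 no linear relationship. A rank-based measure may be a better fit when fault counts are heavily skewed toward a few files.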

The third system we examined, the Voice Response system, was similar to the first two in size, time in the field, and its use of a multideveloper team, but it followed a development process that differed from the usual sequence of phases and did not release versions at regular intervals. Its defects could not be associated with any particular development phase.

This Voice Response system was released continuously, with new builds being created and released whenever bugs were fixed or new functionality was introduced. Without the discipline of regular releases and their associated substantial system test phase, it wasn’t obvious that the models developed for the first two systems would be applicable to this system. We were pleasantly surprised to discover that, with some variation, these models performed very well. The “releases” shown in Tables 9-1 and 9-2 for Voice Response are simply consecutive three-month periods in the system’s life, and the bugs reported for each release are just those bugs that were first reported during that three-month period. This different development paradigm might explain why this system had a larger number of faults per release than most of the others. Many of its MRs reported changes made by developers relatively early in a module’s life, corresponding to a traditional project’s usual phase of unit testing.
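The grouping of bug reports into consecutive three-month "releases" can be sketched as follows. This is only an illustration of the idea, not the tooling used in the study: the function names, the use of calendar months measured from the system's first day, and the representation of each bug as the date it was first reported are all assumptions.

```python
from collections import Counter
from datetime import date


def synthetic_release(first_day: date, reported: date,
                      months_per_release: int = 3) -> int:
    """Return the 1-based index of the period that contains `reported`.

    Periods are consecutive blocks of `months_per_release` calendar months
    counted from `first_day` (a hypothetical start of the system's life).
    """
    if reported < first_day:
        raise ValueError("report predates the start of the system's life")
    # Whole months elapsed between the two dates.
    months = ((reported.year - first_day.year) * 12
              + (reported.month - first_day.month))
    if reported.day < first_day.day:
        months -= 1  # the most recent month has not fully elapsed yet
    return months // months_per_release + 1


def faults_per_release(first_day: date, report_dates) -> Counter:
    """Count bugs per synthetic release, keyed by release index."""
    return Counter(synthetic_release(first_day, d) for d in report_dates)
```

Each bug is counted once, in the period of its first report, which mirrors the rule described above. Periods with no reported bugs simply do not appear as keys in the result.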

Based on our experiments with the first three systems, we settled on a standard prediction model that we applied to the three Maintenance Support systems, with excellent results.
