Where to Go from Here

A distribution of bugs and changes like the one in Figure 27-2 is already valuable in itself: you literally see where most activity takes place and where the most problems occur. The next interesting question is what causes this distribution. As a manager, you will be able to explain many of the effects you see, particularly the extremes, such as the modules with the most bugs or the most changes. You may also be surprised, though, by some effects that did not make it into the headlines: effects distributed all across your project and possibly related to some recurring causes.

In our experience so far, the problems on every project have their own specific causes that are related to changes and bugs (which of course motivates the need for mining project histories). Typical starting points for investigation include:

The problem domain of the component

Analyzing the APIs that components use, Schröter et al. showed that Eclipse internal compiler code is seven times as error-prone as Eclipse GUI components [Schröter et al. 2006]. Evidence like this not only helps to identify critical areas of your application but can also be used to predict the quality of new components that have no history yet, just by looking at the APIs they use.

The complexity of the source code

Complicated tasks increase the likelihood of mistakes. The more complex source code is, the more difficult it is to make changes that are defect-free. There are many ways to define code complexity and many research studies showing that code complexity can indeed be an excellent defect predictor [Nagappan et al. 2006b], [Zimmermann et al. 2007].

The change history of the component

Components that were recently (or massively) changed carry a higher risk of bugs. Hassan and Holt introduced a caching algorithm for fault-prone modules that highlights the 10 subsystems most likely to contain a bug [Hassan and Holt 2005]. To build this top-10 list, the authors use only history information that reveals the modules that were most frequently modified, most recently modified, most frequently fixed, and most recently fixed. As a manager or developer, you can use tools like the “top-10 list” to focus your testing resources on the modules that are most likely to fail.
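
The four history criteria named above can be sketched in a few lines. This is a simplified illustration, not the caching algorithm of Hassan and Holt: it merely ranks modules by each criterion. The `Change` record, its fields, and the function name `top_lists` are hypothetical, and we assume each change has already been labeled as a fix or not.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    """One change from the version archive (hypothetical record layout)."""
    module: str
    day: date
    is_fix: bool  # assumed to be derived from the bug database


def top_lists(changes, n=10):
    """Return four ranked lists of module names, each at most n entries long."""

    def most_frequent(items):
        counts = Counter(c.module for c in items)
        return [module for module, _ in counts.most_common(n)]

    def most_recent(items):
        # Remember the latest change date seen for each module.
        latest = {}
        for c in items:
            if c.module not in latest or c.day > latest[c.module]:
                latest[c.module] = c.day
        ranked = sorted(latest, key=lambda m: latest[m], reverse=True)
        return ranked[:n]

    fixes = [c for c in changes if c.is_fix]
    return {
        "most_frequently_modified": most_frequent(changes),
        "most_recently_modified": most_recent(changes),
        "most_frequently_fixed": most_frequent(fixes),
        "most_recently_fixed": most_recent(fixes),
    }
```

A module that shows up in several of the four lists is a natural first candidate for additional testing.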

The test history of the component

The more thoroughly a program is tested, the less likely it is to contain a bug. Hutchins et al. showed that test sets with high coverage levels (more than 90%) show better fault detection than randomly chosen test sets of the same size [Hutchins et al. 1994]. Later, Nagappan et al. used coverage data sets to build successful defect prediction models [Nagappan et al. 2006a]. So mining test data sets and system health information in your software project can certainly help to detect risky but not yet well-tested parts of your source code.
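
One simple way to look for "risky but not yet well-tested" code is to combine change counts with coverage figures. The following is a minimal sketch under our own assumptions: the input layout, the function name `undertested_hotspots`, and both thresholds are hypothetical and would need tuning for a real project.

```python
def undertested_hotspots(modules, min_changes=10, max_coverage=0.6):
    """Flag modules that change often but have low test coverage.

    `modules` maps a module name to a pair (change_count, coverage),
    where coverage is a fraction between 0 and 1. Both thresholds are
    illustrative assumptions, not recommended values.
    """
    flagged = [
        (name, changes, coverage)
        for name, (changes, coverage) in modules.items()
        if changes >= min_changes and coverage < max_coverage
    ]
    # Most-changed modules first; lower coverage breaks ties.
    return sorted(flagged, key=lambda entry: (-entry[1], entry[2]))
```

The result is only a list of candidates for closer inspection; whether a flagged module really is risky still has to be judged in context.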

The people involved in component development

Humans write software, and humans tend to err. Therefore, the authors of source code are a consideration in determining the quality of a component. At Microsoft, Nagappan et al. investigated whether components written by distributed development teams are riskier than locally developed components [Nagappan et al. 2008]. Surprisingly, it made no significant difference. But as Ekanayake et al. showed, the number of authors editing the same source code file and the number of defects they fixed influence the defect prediction quality [Ekanayake et al. 2009].

By combining these data sources, you will find correlations, but not all of these translate into causations. When we tried to relate the bug density in Eclipse to individual developers, for instance, we found that Erich Gamma, the head architect, appeared among the developers with the highest bug density overall. When we asked him how this could be, Gamma replied: “There are always problems that you cannot solve, so you pass them on to your supervisor, who is supposed to be better than you. Some problems go up the entire hierarchy, to the group leader, to the department head, and so on, and nobody wants to touch them, until they end up on my desk. And as I have no one to delegate them to, I have to take care of them. In fact, most of what I do all day is really risky.”

With this in mind, you are now ready to mine for correlations—and to turn these into causations by providing a theory on how specific effects in your product come to be. Be aware, though, that every bit of evidence thus gained raises more questions—and the need for more evidence. An infrastructure for automated mining will make it easy for you to obtain this evidence and to support your decisions with real information.
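
As a starting point for such mining, a rank correlation between two per-module measurements (say, number of changes and number of bugs) can be computed with the standard library alone. The sketch below uses Spearman's rank correlation, which is our choice for illustration and not a method prescribed above; the function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys):
        raise ValueError("both sequences must have the same length")
    return pearson(ranks(xs), ranks(ys))
```

A value near 1 or -1 only says that two measurements move together across modules. As the Eclipse anecdote shows, explaining why they do still requires a theory and, often, a conversation with the people involved.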
