The Effectiveness of TDD

We analyzed the TDD trials that reported quantitative results of the TDD pill’s effects on productivity, internal and external quality, and test quality. Direct comparison of the quantitative results across trials was impossible, since the trials measured TDD’s effectiveness in different ways. Instead, we assigned each trial a summary value of “better,” “worse,” “mixed,” or “inconclusive/no-difference,” determined by the quantitative results reported for the TDD pill compared with a control and by the report author’s interpretation of the trial results. In trials with a summary value of “better,” a majority of quantitative measures favor the TDD pill over the control treatment; in trials with a summary value of “worse,” a majority of measures favor the control treatment. Trials with a summary value of “inconclusive/no-difference” were inconclusive or reported no observed differences. Finally, in trials with a summary value of “mixed,” some measures favor TDD while others do not. In all cases, the summary assignment was guided by the report author’s interpretation of the study findings because, in many cases, the reports omitted details of the trial that would have enabled an objective external evaluation.
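The classification rule described above can be sketched in code. This is one plausible reading of the rule, not the study’s actual procedure: each quantitative measure in a trial is scored as favoring TDD, favoring the control, or showing no difference, and the trial receives a single label.

```python
def summarize(measures):
    """Assign a summary value to a trial.

    measures: list with one entry per quantitative measure, each
    'tdd' (favors the TDD pill), 'control' (favors the control),
    or 'none' (no observed difference).
    """
    n = len(measures)
    tdd = measures.count("tdd")
    control = measures.count("control")
    if tdd > n / 2:          # majority of measures favor TDD
        return "better"
    if control > n / 2:      # majority favor the control treatment
        return "worse"
    if tdd and control:      # some favor TDD, others the control
        return "mixed"
    return "inconclusive/no-difference"

print(summarize(["tdd", "tdd", "none"]))  # better
print(summarize(["tdd", "control"]))      # mixed
print(summarize(["none", "none"]))        # inconclusive/no-difference
```

Note that a real assignment also weighed the report author’s interpretation, which no mechanical rule can capture.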

In the following sections we do our best to draw some conclusions about the value of TDD from the trials.

Internal Quality

Available evidence from the trials suggests that TDD does not have a consistent effect on internal quality. Although TDD appears to yield better results over the control group for certain types of metrics (complexity and reuse), other metrics (coupling and cohesion) are often worse in the TDD treatment. Another observation from the trial data is that TDD yields production code that is less complex at the method/class level, but more complex at the package/project level. This inconsistent effect is more visible in more rigorous trials (i.e., L2 and L3 trials). The differences in internal quality may be due to other factors, such as motivation, skill, experience, and learning effects. Table 12-3 classifies the trials according to internal quality metrics.

Note

In the following tables, the first number in each cell reports all trials, whereas the number in parentheses reports only L2 and L3 trials.
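As a sketch of this convention (using made-up trial records, not the study’s data), each table cell can be derived by counting all matching trials and, in parentheses, only those at rigor levels L2 or L3:

```python
# Illustrative records only: (trial type, rigor level, summary value).
trials = [
    ("pilot study", "L3", "better"),
    ("pilot study", "L1", "mixed"),
    ("industrial use", "L0", "better"),
]

def cell(trials, trial_type, summary):
    """Format one table cell: all trials, then L2/L3 trials in parentheses."""
    rows = [t for t in trials if t[0] == trial_type and t[2] == summary]
    rigorous = [t for t in rows if t[1] in ("L2", "L3")]
    return f"{len(rows)} ({len(rigorous)})"

print(cell(trials, "pilot study", "better"))  # 1 (1)
print(cell(trials, "pilot study", "mixed"))   # 1 (0)
```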

Table 12-3. Effects on internal quality

Type                  | BETTER | WORSE | MIXED | INC/NO-DIFF | Total
Controlled experiment | 1 (0)  | 0 (0) | 0 (0) | 2 (2)       | 3 (2)
Pilot study           | 1 (1)  | 1 (1) | 3 (1) | 2 (2)       | 7 (5)
Industrial use        | 3 (1)  | 1 (1) | 0 (0) | 0 (0)       | 4 (2)
Total                 | 5 (2)  | 2 (2) | 3 (1) | 4 (4)       | 14 (9)

External Quality

There is some evidence to suggest that TDD improves external quality. Although the outcomes of controlled experiments are mostly inconclusive, industrial use and pilot studies strongly favor TDD. However, the supporting evidence from industrial use and controlled experiments disappears after filtering out the less rigorous studies (i.e., L0 and L1 trials). Furthermore, the evidence from pilot studies and controlled experiments is contradictory once L0 and L1 trials are filtered out. If all studies are counted equally, however, the evidence suggests that the TDD pill can improve external quality. Table 12-4 classifies the trials according to external quality metrics.

Table 12-4. Effects on external quality

Type                  | BETTER | WORSE | MIXED | INC/NO-DIFF | Total
Controlled experiment | 1 (0)  | 2 (2) | 0 (0) | 3 (3)       | 6 (5)
Pilot study           | 6 (5)  | 1 (1) | 0 (0) | 2 (2)       | 9 (8)
Industrial use        | 6 (0)  | 0 (0) | 0 (0) | 1 (1)       | 7 (1)
Total                 | 13 (5) | 3 (3) | 0 (0) | 6 (6)       | 22 (14)

Productivity

The productivity dimension engenders the most controversial discussion of TDD. Although many admit that adopting TDD may require a steep learning curve that initially decreases productivity, there is no consensus on the long-term effects. One line of argument expects productivity to increase with TDD; reasons include easy context switching between simple tasks, improved external quality (i.e., fewer errors are made, and errors are detected quickly), improved internal quality (i.e., fixing errors is easier due to simpler design), and improved test quality (i.e., the chance of introducing new errors is low thanks to automated tests). The opposing line argues that TDD incurs too much overhead and will negatively impact productivity because too much time and focus are spent on authoring tests rather than on adding new functionality. The measures used in TDD trials to evaluate productivity included development and maintenance effort, the amount of code or features produced over time, and the amount of code or features produced per unit of development effort.

The available evidence from the trials suggests that TDD does not have a consistent effect on productivity. The evidence from controlled experiments suggests an improvement in productivity when TDD is used. However, the pilot studies provide mixed evidence, some in favor of and others against TDD. In the industrial studies, the evidence suggests that TDD yields worse productivity. Even when considering only the more rigorous studies (L2 and L3), the evidence is equally split for and against a positive effect on productivity. Table 12-5 classifies the trials according to effects on productivity.

Table 12-5. Effects on productivity

Type                  | BETTER | WORSE | MIXED | INC/NO-DIFF | Total
Controlled experiment | 3 (1)  | 0 (0) | 0 (0) | 1 (1)       | 4 (2)
Pilot study           | 6 (5)  | 4 (4) | 0 (0) | 4 (3)       | 14 (12)
Industrial use        | 1 (0)  | 5 (1) | 0 (0) | 1 (0)       | 7 (1)
Total                 | 10 (6) | 9 (5) | 0 (0) | 6 (4)       | 25 (15)

Test Quality

Because test cases precede all development activities in TDD, testing the correctness of an evolving system is expected to be made easier by a growing library of automated tests. Further, the testing process is expected to be of high quality due to the fine granularity of the tests produced. In the trials, test quality was captured by test density, test coverage, test productivity, or test effort.

There is some evidence to suggest that TDD improves test quality. Most of the evidence comes from pilot studies and is in favor of TDD, even after filtering out less rigorous studies. Controlled experiments suggest that TDD fares at least as well as the control treatments. There is insufficient evidence from industrial use to reach a conclusion.

Therefore, the test quality associated with TDD seems at least not worse and often better than alternative approaches. Here we would have expected stronger results: since encouraging test case development is one of the primary active ingredients of TDD, the overall evidence should have favored TDD in promoting the test quality measures reported in these studies.

Table 12-6 classifies the trials according to test quality.

Table 12-6. Effects on test quality

Type                  | BETTER | WORSE | MIXED | INC/NO-DIFF | Total
Controlled experiment | 2 (1)  | 0 (0) | 0 (0) | 3 (3)       | 5 (4)
Pilot study           | 7 (5)  | 1 (1) | 0 (0) | 1 (1)       | 9 (7)
Industrial use        | 1 (0)  | 1 (1) | 0 (0) | 1 (0)       | 3 (1)
Total                 | 10 (6) | 2 (2) | 0 (0) | 5 (4)       | 17 (12)