So far we have analyzed sets of metrics—complexity metrics, code churn, code coverage, etc.—in isolation. In this section we address how the metrics may be combined to form stronger predictors of failures.
Figure 23-3 shows a simple network of engineers working together on different binaries. Similarly, Figure 23-3 shows the code dependencies between various networks. Finally, Figure 23-3 combines both pieces of information, integrating people, churn (in terms of edits/contributions), and dependencies into one network.
For Windows Vista we generate such a network integrating the people, churn contribution, and dependency information. Several social-network measures [Bird et al. 2009], detailed next, are computed for the Windows Vista social network (similar to the network in Figure 23-3).
Ego network measures [Borgatti et al. 2002] are based on the neighborhood of a particular node. The node being evaluated is called ego, and its neighborhood comprises ego, the set of nodes connected to ego, and the complete set of edges among these nodes.
Size: Number of nodes in the ego network
Ties: Number of edges in the ego network
Pairs: Number of possible directed edges in the ego network
Density: Proportion of possible ties that are actually present (Ties/Pairs)
Weak Components: Number of weakly connected components
Normalized Weak Components: Number of weakly connected components normalized by size, i.e., (Weak Components/Size)
Two Step Reach: Proportion of nodes that are within two hops of ego
Reach Efficiency: Two Step Reach normalized by the size of the network (higher reach efficiency indicates that ego’s primary contacts are influential in the network)
Brokerage: Number of pairs of nodes that are connected only by ego (thus ego acts as the sole broker for the pair)
Normalized Brokerage: Brokerage normalized by the number of pairs
Ego Betweenness: Betweenness of ego within its ego network
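To make the definitions above concrete, the following is a minimal sketch of how most of these measures could be computed for one node. It is not the tooling used in the Vista or Eclipse studies. The function name `ego_measures`, the toy edge list, and several conventions are assumptions made for illustration: neighbors are found ignoring edge direction, weak components are counted among ego's contacts after removing ego (with ego present the neighborhood is always a single component), Two Step Reach is taken over all other nodes in the full network, and Reach Efficiency divides by the ego network size. Ego betweenness is omitted to keep the sketch short.

```python
from itertools import combinations


def ego_measures(edges, ego):
    """Ego-network measures for `ego` in a directed graph of (source, target) pairs."""
    edges = {(u, v) for u, v in edges if u != v}  # de-duplicate, drop self-loops
    all_nodes = {n for edge in edges for n in edge} | {ego}

    # Direction-free adjacency: a neighbor is any node with an edge to or from a node.
    adj = {n: set() for n in all_nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    alters = adj[ego]            # ego's direct contacts
    members = alters | {ego}     # nodes of the ego network

    size = len(members)
    ties = sum(1 for u, v in edges if u in members and v in members)
    pairs = size * (size - 1)    # possible directed edges
    density = ties / pairs if pairs else 0.0

    # Weak components among the alters once ego is removed (assumed convention).
    unseen = set(alters)
    weak_components = 0
    while unseen:
        weak_components += 1
        stack = [unseen.pop()]
        while stack:
            reachable = adj[stack.pop()] & unseen
            unseen -= reachable
            stack.extend(reachable)

    # Nodes within two hops of ego, as a share of all other nodes in the network.
    within_two = set(alters)
    for alter in alters:
        within_two |= adj[alter]
    within_two.discard(ego)
    others = len(all_nodes) - 1
    two_step_reach = len(within_two) / others if others else 0.0

    # Alter pairs with no edge in either direction: only ego connects them.
    brokerage = sum(1 for a, b in combinations(alters, 2) if b not in adj[a])

    return {
        "size": size,
        "ties": ties,
        "pairs": pairs,
        "density": density,
        "weak_components": weak_components,
        "normalized_weak_components": weak_components / size,
        "two_step_reach": two_step_reach,
        "reach_efficiency": two_step_reach / size,
        "brokerage": brokerage,
        "normalized_brokerage": brokerage / pairs if pairs else 0.0,
    }


# Hypothetical toy network; the node names carry no meaning.
example_edges = [("A", "B"), ("A", "C"), ("D", "A"), ("B", "C"),
                 ("D", "E"), ("E", "F"), ("G", "H")]
measures = ego_measures(example_edges, "A")
for name, value in measures.items():
    print(f"{name}: {value}")
```

In the toy network, ego `A` has three contacts, two of which (`B` and `C`) are linked to each other, so `A` is the sole broker for the remaining two pairs. Other tools may define Size, Ties, and Pairs over ego's contacts only, which would change the normalized values.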
The preceding social network measures are computed for the complete Vista network (which includes the developers, contributions, and dependencies). Prediction models are then built using these measures as input. We observe that the precision and recall of the models are much higher when the dependency network is also used for prediction. Similar results were observed for multiple versions of Eclipse (the open source IDE from IBM) [Bird et al. 2009]. Table 23-2 shows the precision and recall values together with the F-scores and model fit, which also indicate the increased ability of the socio-technical approach to predict failures. The “combined” model in Table 23-2 denotes a model built simply by adding together the “Contribution” network (people working together) and the “Dependency” network (between various pieces of code); it provides a contrast to the socio-technical network. For further reading on combining code complexity, churn, and coverage metrics to predict failures, see [Nagappan et al. 2006b].
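As a reminder of how the reported quantities relate, here is a small sketch of precision, recall, and F-score computed from prediction counts. The function names and the example counts are hypothetical and do not come from the study. Note also that if the table values are averages over several runs (an assumption; the text does not say), a tabulated F-score need not equal the harmonic mean of the tabulated precision and recall.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from counts of failure-prone predictions."""
    predicted = true_pos + false_pos   # everything flagged as failure-prone
    actual = true_pos + false_neg      # everything that really was failure-prone
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_score(precision, recall, beta=1.0):
    """F-score; with beta=1 this is the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 80 correctly flagged, 20 false alarms, 20 missed.
p, r = precision_recall(true_pos=80, false_pos=20, false_neg=20)
print(p, r, f_score(p, r))
```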
Table 23-2. Overall socio-technical network model efficacy across different releases of Eclipse [Bird et al. 2009]
Release | Network | Precision | Recall | F-score | Nagelkerke R² |
---|---|---|---|---|---|
2.0 | Dependency | 0.667 | 0.779 | 0.705 | 0.532 |
2.0 | Contribution | 0.808 | 0.854 | 0.824 | 0.702 |
2.0 | Combined | 0.826 | 0.814 | 0.813 | 0.909 |
2.0 | Socio-technical | 0.755 | 0.859 | 0.800 | 0.747 |
2.1 | Dependency | 0.693 | 0.753 | 0.710 | 0.626 |
2.1 | Contribution | 0.675 | 0.780 | 0.719 | 0.607 |
2.1 | Combined | 0.755 | 0.777 | 0.758 | 0.805 |
2.1 | Socio-technical | 0.747 | 0.809 | 0.770 | 0.689 |
3.0 | Dependency | 0.631 | 0.737 | 0.673 | 0.494 |
3.0 | Contribution | 0.681 | 0.683 | 0.673 | 0.353 |
3.0 | Combined | 0.745 | 0.756 | 0.743 | 0.616 |
3.0 | Socio-technical | 0.767 | 0.777 | 0.769 | 0.600 |
3.1 | Dependency | 0.579 | 0.718 | 0.634 | 0.391 |
3.1 | Contribution | 0.639 | 0.646 | 0.629 | 0.295 |
3.1 | Combined | 0.693 | 0.796 | 0.735 | 0.689 |
3.1 | Socio-technical | 0.820 | 0.800 | 0.806 | 0.668 |
3.2 | Dependency | 0.698 | 0.780 | 0.731 | 0.495 |
3.2 | Contribution | 0.614 | 0.720 | 0.654 | 0.371 |
3.2 | Combined | 0.835 | 0.866 | 0.846 | 0.816 |
3.2 | Socio-technical | 0.793 | 0.784 | 0.785 | 0.572 |
3.3 | Dependency | 0.693 | 0.743 | 0.711 | 0.433 |
3.3 | Contribution | 0.725 | 0.669 | 0.688 | 0.356 |
3.3 | Combined | 0.742 | 0.780 | 0.754 | 0.686 |
3.3 | Socio-technical | 0.820 | 0.831 | 0.823 | 0.727 |