The wisdom of the crowds in predictive modeling for software engineering

L.L. Minku    University of Leicester, Leicester, United Kingdom

Abstract

The “wisdom of the crowds” phenomenon has long been observed in cognitive, social, and political sciences. The idea is that the combination of the judgments given by a large group of people on a certain matter is often better than the judgment given by any individual within this group. This chapter shows that this idea can also be used by predictive models in software engineering in order to improve predictions, deal with changes, transfer knowledge, deal with multiple goals, and gain insights into software processes.

Keywords

Machine learning; Predictive models; Ensembles; Bagging; Concept drift; Transfer learning; Multi-objective learning; Software effort estimation; Software bug prediction

The Wisdom of the Crowds

The “wisdom of the crowds” phenomenon has long been observed in cognitive, social, and political sciences. The idea is that the combination of the judgments given by a large group of people on a certain matter is often better than the judgment given by any individual within this group.

Classically, the wisdom of the crowds was studied in continuous estimation problems, even though it has also been studied in other types of problems. A landmark study was performed in 1906, when a contest was set up at a country fair in Plymouth (UK) to estimate the weight of a slaughtered and dressed ox. The person whose guess was closest to the actual weight of 1198 pounds would win a prize. Around 800 people, including both experts and people with no expertise in judging the weight of cattle, participated in the contest. Interestingly, even though the individual estimates varied widely and were frequently wrong, statistician Francis Galton found that the average of all the participants’ guesses was 1197 pounds. The collective “wisdom” of the Plymouth crowd was remarkably close to the actual weight of the ox, and better than the estimations given by any of the experts [1]!

The reason why the combined guess was so close to the actual weight was that, even though individual guesses were frequently completely wrong, the overestimations of the weight given by some people cancelled out the underestimations given by others, resulting in an excellent combined guess.
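This cancellation effect can be illustrated with a small simulation. Only the 1198-pound true weight and the crowd size of roughly 800 come from the account above; the noise model (unbiased Gaussian errors) is an illustrative assumption:

```python
import random
import statistics

random.seed(42)

TRUE_WEIGHT = 1198  # pounds, as in Galton's ox-weighing contest

# Simulate 800 noisy guesses: individuals may be off by hundreds of pounds,
# but the errors are unbiased, so over- and underestimates cancel out.
guesses = [TRUE_WEIGHT + random.gauss(0, 100) for _ in range(800)]

crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)
worst_individual_error = max(abs(g - TRUE_WEIGHT) for g in guesses)

print(f"crowd error: {crowd_error:.1f} pounds")
print(f"worst individual error: {worst_individual_error:.1f} pounds")
```

The average of hundreds of unbiased guesses lands far closer to the truth than the typical individual guess does.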

So… How is That Related to Predictive Modeling for Software Engineering?

Existing data on software projects and processes can be used to create predictive models able to help us with several different software engineering tasks, such as prediction of the effort required to develop software projects, prediction of whether a given software module is likely to contain bugs, prediction of whether a commit is likely to induce a crash, prediction of the energy likely to be consumed by a software program, etc. However, the accuracy of the predictions given by single models is sometimes not ideal.

Similar to the wisdom of the crowds, in order to improve predictive accuracy, we can combine the predictions given by a crowd (ensemble) of different models, instead of using a single model! Numeric predictions (eg, effort or energy estimations) given by different individual models can be combined by taking their average, allowing errors to cancel each other out. Categorical predictions (eg, whether or not a software module is likely to be buggy, or whether or not a commit is likely to induce a crash) can be combined by choosing the category “voted” on by the majority of the individual models. In this case, the correct categories predicted by some of the models can compensate for the incorrect categories predicted by the others.
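Both combination rules described above are simple to state in code. The following is a minimal sketch with hypothetical effort and bug-prediction values; the function names are illustrative, not from a specific library:

```python
from collections import Counter
from statistics import mean

def combine_numeric(predictions):
    """Combine numeric predictions (eg, effort estimates) by averaging,
    so that over- and underestimations can cancel each other out."""
    return mean(predictions)

def combine_categorical(predictions):
    """Combine categorical predictions (eg, buggy / clean) by majority vote,
    so that correct votes can compensate for incorrect ones."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical models estimate a project's effort in person-hours:
effort = combine_numeric([120, 90, 105])
print(effort)  # → 105

# Three hypothetical models classify a software module:
label = combine_categorical(["buggy", "buggy", "clean"])
print(label)  # → buggy
```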

Examples of Ensembles and Factors Affecting Their Accuracy

The predictive accuracy of an ensemble tends to improve more if the individual models are not only themselves accurate, but also diverse, ie, if they make different mistakes. Without diversity, the combined prediction would simply repeat the mistakes of the individual predictions, rather than having individual mistakes cancel each other out or correct predictions compensate for incorrect ones. Therefore, ensemble learning algorithms employ different techniques to create diverse (and not only accurate) individual models.

An example of an ensemble learning algorithm is bagging [2]. Given a learning algorithm for creating single predictive models and a data set, bagging creates diverse predictive models by feeding a different uniform sample (drawn with replacement) of the data set to the learning algorithm when creating each model. Another example is the heterogeneous ensemble, where each individual model is created with a different learning algorithm in order to produce different models [3]. Both types of ensemble have been shown to improve predictive accuracy (and stability) in software engineering tasks [4,5].
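The core of bagging fits in a few lines. Below is a sketch using a toy base learner (a one-parameter linear effort model) and hypothetical size/effort data; real applications would plug in a stronger learner such as a regression tree:

```python
import random
from statistics import mean

def bagging(train_fn, data, n_models=10, rng=None):
    """Bagging sketch: train each model on a bootstrap sample (uniform
    sampling with replacement, same size as the original data set)."""
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap sample
        models.append(train_fn(sample))
    return models

def ensemble_predict(models, x):
    """Average the individual models' numeric predictions."""
    return mean(m(x) for m in models)

def train_linear(sample):
    """Toy base learner: fits effort = slope * size through the origin."""
    num = sum(size * effort for size, effort in sample)
    den = sum(size * size for size, _ in sample)
    slope = num / den
    return lambda x: slope * x

# Hypothetical (size in KLOC, effort in person-months) project data:
data = [(10, 25), (20, 52), (30, 74), (40, 101), (50, 128)]
models = bagging(train_linear, data, n_models=25)
estimate = ensemble_predict(models, 35)  # effort for a 35-KLOC project
print(round(estimate, 1))
```

Because each model sees a different bootstrap sample, the models disagree slightly, and averaging their outputs smooths out the idiosyncrasies of any single sample.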

Besides individual models’ predictive accuracy and diversity, another factor that can influence the predictive accuracy of ensembles is their size, ie, the number of individual models composing the ensemble. An ensemble that is too small (eg, two models) may not be enough to improve predictive accuracy. A large ensemble may use extra computational resources unnecessarily, or even cause reductions in predictive accuracy if it becomes too large, eg, 10,000 + models [6]. Even though some early studies suggested that ensembles with as few as 10 models were sufficient to improve predictive accuracy [7], other studies suggested that predictive accuracy can be further improved by using more than 10 models, eg, 25 models [8]. The ensemble size beyond which further improvements in predictive accuracy cease to be achieved is likely to depend both on the predictive task and on the learning algorithms involved.

Crowds for Transferring Knowledge and Dealing With Changes

Sometimes there is not much data for building predictive models within a given environment, hindering the accuracy of predictive approaches. There may be more data available from other environments, but these data are not always directly relevant for predictions in the targeted environment. For example, the effort required to develop a certain software project within a given company may be different from the effort required to develop the same project in another company because these companies adopt different management strategies. So, a software effort estimation model created with data from one of these companies may not be directly applicable to the other.

Even though data from different environments are not always compatible, they may become more or less relevant over time. This is because environments may change over time, becoming more or less similar to other environments. Changes in the environment are referred to as “concept drifts” by the machine learning community. They may affect how well predictive models represent the environment, triggering the need for updating predictive models. As an example, a software company may adopt a new software development process, resulting in its productivity becoming more similar to the productivity of other companies that adopt this process. If the company wishes to keep its software effort estimation models accurate, it must update them to reflect the new situation. If we can identify when data from other environments become useful and how much more useful they are, we can benefit from them to obtain better and more up-to-date predictive models.

Ensembles are useful in this context because they can maintain several different models representing different environments. When changes affect the adequacy of a given model to the targeted environment, we can identify which other models would be most adequate for the new situation, based on a few new data examples from the targeted environment. We can then emphasize the predictions of the most appropriate models. This is useful for improving predictive accuracy when there are not enough new examples from the targeted environment to create a whole new model representing the new situation of this environment. Given that it may take a lot of time and effort to collect enough new data from the targeted environment, ensemble approaches can be particularly useful to transfer knowledge from different models in changing environments [9].
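One simple way to emphasize the most adequate models, sketched below with hypothetical models and data, is to weight each model inversely to its error on the few recent examples from the targeted environment. The names and the specific weighting rule are illustrative assumptions, not the approach of [9]:

```python
def weigh_models(models, recent_data, eps=1e-9):
    """Give higher weight to models with lower mean absolute error on a
    few recent examples from the (possibly changed) targeted environment."""
    weights = []
    for m in models:
        err = sum(abs(m(x) - y) for x, y in recent_data) / len(recent_data)
        weights.append(1.0 / (err + eps))
    total = sum(weights)
    return [w / total for w in weights]

def weighted_predict(models, weights, x):
    """Combine predictions, emphasizing the currently most adequate models."""
    return sum(w * m(x) for w, m in zip(models and weights, models))

# Models learned in three environments with different productivities
# (effort = productivity factor * size):
models = [lambda x: 2.0 * x, lambda x: 3.0 * x, lambda x: 5.0 * x]

# After a process change, a few new examples suggest productivity near 3x:
recent = [(10, 31), (20, 59), (15, 46)]
weights = weigh_models(models, recent)
estimate = weighted_predict(models, weights, 25)
print([round(w, 3) for w in weights], round(estimate, 1))
```

With only three new examples, the second model dominates the combination, so the ensemble tracks the new situation long before enough data exists to train a fresh model from scratch.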

Moreover, it is possible to use a few incoming examples from the targeted environment to learn functions that are able to map the predictions given by models representing different environments to the context of the targeted environment. An ensemble of such mapped models can transfer knowledge from models representing environments that do not directly match the targeted environment. This can greatly reduce the number of examples that need to be collected from within the targeted environment, being particularly useful when data collection is expensive within the targeted environment [10].
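The idea of a mapping function can be sketched as follows. Here the mapping is a single multiplicative correction fitted on a handful of local examples; the actual mapped models of [10] are more elaborate, and all names and numbers below are illustrative:

```python
def fit_mapping(model, local_data):
    """Learn a multiplicative correction from a few (input, actual) pairs
    collected in the targeted environment."""
    ratios = [actual / model(x) for x, actual in local_data]
    factor = sum(ratios) / len(ratios)
    return lambda x: factor * model(x)

# A model trained in another company systematically underestimates our effort:
def other_company_model(size):
    return 2.0 * size

# Three local examples suffice to fit the mapping:
local = [(10, 30), (20, 62), (30, 88)]
mapped = fit_mapping(other_company_model, local)
estimate = mapped(40)
print(round(estimate, 1))
```

The cross-company model contributes its learned shape, while the mapping adapts its scale to the targeted environment, so very little local data is needed.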

Crowds for Multiple Goals

Ensembles of models can also be used to deal with different goals in software engineering predictive tasks. For example, in software bug prediction, one may wish to identify as many buggy software modules as possible while, at the same time, rarely flagging a non-buggy module as buggy. These are two different and often conflicting goals. In order to deal with them, we can create models that each emphasize a different goal. Combined, these models can provide a good trade-off among the different goals [11].
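As a toy illustration of this trade-off (not the multi-objective algorithm of [11]), consider two hypothetical bug predictors operating on a suspiciousness score: one thresholds aggressively to catch more bugs, the other conservatively to avoid false alarms. Averaging their votes yields an intermediate behavior:

```python
def recall_model(score):
    """Emphasizes finding bugs: flags anything even mildly suspicious."""
    return 1.0 if score > 0.2 else 0.0

def precision_model(score):
    """Emphasizes avoiding false alarms: flags only very suspicious modules."""
    return 1.0 if score > 0.8 else 0.0

def combined(score):
    """Average the two votes: clear cases get a unanimous 0.0 or 1.0,
    while borderline cases get an intermediate confidence of 0.5."""
    return (recall_model(score) + precision_model(score)) / 2

for s in (0.1, 0.5, 0.9):
    print(s, combined(s))  # 0.1 → 0.0, 0.5 → 0.5, 0.9 → 1.0
```

Modules where the goal-specific models disagree are surfaced with intermediate confidence, which is itself useful information for prioritizing inspection effort.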

A Crowd of Insights

Ensembles have the potential to provide several insights into software engineering. Besides the insights given by the predictions themselves, different models representing different environments can give us insights into the differences between these environments. In particular, when used for software effort estimation, ensemble models can themselves be visualized in order to reveal how a company’s productivity compares to the productivity of other companies [10]. This can then be used to monitor how the productivity of a company changes over time in comparison to other companies, and to help identify areas where improvement is needed. The differences between the predictions given by different models for different sets of input values can also potentially lead to insights into how the best choices vary from one environment to another, and whether it is worth trying to migrate from one environment to another.

Ensembles as Versatile Tools

In summary, ensembles are versatile tools that can help us to deal with different issues in predictive modeling for software engineering. They can help us to improve predictive accuracy (and stability) across data sets, to deal with changes and transfer knowledge, to handle different goals, and to gain insights into software processes.

Even though some ensemble approaches have potential drawbacks, depending on the problem at hand, other ensemble approaches can actually help us to overcome these drawbacks. For example, a potential drawback of ensembles is the possible increase in the computational resources (eg, running time and memory) required for creating ensembles in comparison to single models. Even though many ensemble approaches do not increase the time and memory complexity of the learning algorithms used by their individual models, their increase in required computational resources may still become a problem for very large data sets. That said, several other ensemble approaches are specifically designed to reduce the computational resources that would be required by single models when data sets are very large. This can be achieved, for example, by creating individual models with disjoint subsets of the data, as done by chunk-based incremental learning ensembles [12].
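A chunk-based ensemble in the spirit of SEA [12] can be sketched as follows: each incoming chunk trains one small model on a disjoint subset of the stream, so no model ever processes the full (potentially huge) data set. The toy learner and the data below are illustrative assumptions:

```python
from statistics import mean

def train_chunk(chunk):
    """Toy learner: predicts effort as the chunk's mean effort-per-size
    ratio times the project size."""
    ratio = mean(effort / size for size, effort in chunk)
    return lambda x: ratio * x

def chunked_ensemble(stream, chunk_size=3):
    """Train one model per disjoint chunk of the data stream."""
    models = []
    for i in range(0, len(stream), chunk_size):
        models.append(train_chunk(stream[i:i + chunk_size]))
    return models

# Hypothetical stream of (size in KLOC, effort in person-months) examples:
stream = [(10, 25), (20, 52), (30, 74), (15, 40), (25, 66), (35, 90)]
models = chunked_ensemble(stream)
estimate = mean(m(40) for m in models)  # combined estimate for a 40-KLOC project
print(round(estimate, 1))
```

Each model's training cost depends only on the chunk size, not on the total amount of data, which is what keeps the approach cheap on very large data sets.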

Another potential drawback of ensembles is lack of explanatory power. As the ensemble size increases, it becomes difficult to “read” ensembles in order to explain how exactly their predictions are made. This is not a problem if practitioners are more interested in how helpful the predictions themselves are, rather than in how the predictions are made. However, if practitioners wish to understand the models behind the predictions, lack of explanatory power can become a problem. Even though lack of explanatory power is a drawback of many ensemble approaches, some ensemble approaches do not hinder explanatory power, or can even help us to understand certain conditions better. For example, ensembles where a single model is created for each different environment can be useful for understanding the relationship and the differences between environments [10].

To learn more about ensembles and their applications to software engineering, I recommend Polikar’s [13] and Menzies et al.’s [14] manuscripts, respectively.

References

[1] Galton F. Vox Populi. Nature. 1907;75:450–451. http://galton.org/essays/1900-1911/galton-1907-vox-populi.pdf [accessed 04.01.16].

[2] Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–140.

[3] Perrone M.P., Cooper L.N. When networks disagree: ensemble methods for hybrid neural networks. In: Mammone R.J., ed. Neural networks for speech and image processing. UK: Chapman Hall; 1993:126–142.

[4] Kocaguneli E., Menzies T., Keung J. On the value of ensemble effort estimation. IEEE Trans Softw Eng. 2012;38(6):1403–1416.

[5] Minku L.L., Yao X. Ensembles and locality: insight on improving software effort estimation. Inform Softw Technol. 2013;55(8):1512–1528.

[6] Grove A.J., Schuurmans D. Boosting in the limit: maximizing the margin of learned ensembles. In: Proceedings of the fifteenth national conference on artificial intelligence; 1998:692–699.

[7] Hansen L.K., Salamon P. Neural network ensembles. IEEE Trans Pattern Anal Mach Intellig. 1990;12(10):993–1001.

[8] Opitz D., Maclin R. Popular ensemble methods: an empirical study. J Artif Intellig Res. 1999;11:169–198.

[9] Minku L.L., Yao X. Can cross-company data improve performance in software effort estimation? In: Proceedings of the 8th international conference on predictive models in software engineering (PROMISE'2012); 2012:69–78.

[10] Minku L.L., Yao X. How to make best use of cross-company data in software effort estimation? In: Proceedings of the 36th international conference on software engineering (ICSE'14); 2014:446–456.

[11] Minku L.L., Yao X. Software effort estimation as a multi-objective learning problem. ACM Trans Softw Eng Methodol. 2013;22(4):article no. 35.

[12] Street W.N., Kim Y.S. A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining; 2001:377–382.

[13] Polikar R. Ensemble based systems in decision making. IEEE Circ Syst Mag. 2006;6(3):21–45.

[14] Menzies T., Kocaguneli E., Minku L.L., Peters F., Turhan B. Sharing data and models in software engineering, part IV: sharing models. USA: Morgan Kaufmann; 2014.
