Software development projects vary along many dimensions: the domain of the project, the expertise of the developers, the size of the project, and the programming languages in which the system's source code is written, to name just a few. To investigate our questions of interest, we want the projects we analyze to vary across at least some of these dimensions. We also want projects for which archival information about the software development is available. Many projects meet these criteria. We chose to include the following three in our analyses:
Evolution (http://projects.gnome.org/evolution), an integrated email, address book, and calendar application included in the GNOME desktop (http://gnome.org)
Mozilla Firefox (http://www.mozilla.com/firefox), a popular cross-platform web browser
Mylyn (http://www.eclipse.org/mylyn), a task-focused interface for Eclipse (http://www.eclipse.org) that is included in the standard Eclipse distribution
Table 21-1 illustrates the variability among these projects, summarizing each in terms of the length of its development, the primary language of its source code, and its number of modules (see What Is a Module?), lines of code, and changes (see What Is a Change?). The counts include only changes that were analyzed in our study.
Table 21-1. An overview of the three systems we analyzed
Project | First release | Primary language | Modules | Approximate SLOC[a] | Changes |
---|---|---|---|---|---|
Evolution | December 2001 | C | 43 | 300,000 | 1,939 |
Firefox | November 2004 | C++ | 45 | 4,000,000 | 11,710 |
Mylyn | November 2006 | Java | 18 | 675,000 | 3,055 |

[a] Source lines of code were measured using cloc (http://cloc.sourceforge.net).
Aggregate statistics are a useful starting point for understanding the systems being analyzed, but they tell only part of the story. In particular, these statistics treat the archives of these systems as static, hiding the dynamics of how developers change each system over time. To better understand how the development of these systems compares, we also examine the rate at which each system changes.
We can characterize the rate of change within each system by looking at the number of lines of code modified per day. Figure 21-1 shows the activity on each project over time. Each dot in these graphs corresponds to a single day of data recorded in the project’s code repository; the height of each dot on the vertical axis represents the number of lines of code that were committed to the repository on that day. We see that Evolution is characterized by a period of slow change initially, followed by a long period of sustained activity. Firefox changes less frequently and shows almost no activity after a certain point, as developers completed work on the 3.5 release branch and moved on to the next version of the project. Mylyn appears to grow in periodic bursts over time.
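As a rough illustration of the per-day measure described above, the following Python sketch sums the lines changed across all commits made on the same day. The record shape (a commit date paired with a line count), the function name, and the sample numbers are assumptions made for illustration; they are not the format or the data used in the study.

```python
from collections import defaultdict
from datetime import date


def lines_changed_per_day(commits):
    """Sum lines changed across all commits made on the same day.

    `commits` is an iterable of (commit_date, lines_changed) pairs. This
    record shape is a hypothetical one chosen for the example.
    """
    totals = defaultdict(int)
    for commit_date, lines_changed in commits:
        totals[commit_date] += lines_changed
    # One (day, total) entry per day with activity, in chronological order.
    return sorted(totals.items())


# Made-up commit records, not data from any of the three projects.
commits = [
    (date(2007, 3, 1), 120),
    (date(2007, 3, 1), 30),
    (date(2007, 3, 4), 500),
]
daily = lines_changed_per_day(commits)
```

Each entry in `daily` would correspond to one dot in a plot of this kind: the day gives the horizontal position and the total gives the height. Days with no commits produce no entry.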
We can reach similar conclusions about the rates of change of these systems by looking at the cumulative sum of this data. Figure 21-2 is such a plot for Evolution, clearly showing the initial period of slow change and later sustained activity. Plots for the other systems look similar.
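A cumulative view of this kind can be sketched in a few lines of Python. The function name and the daily totals below are hypothetical, chosen only to mimic a slow start followed by sustained activity; they are not measurements from Evolution or the other systems.

```python
from itertools import accumulate


def cumulative_lines_changed(daily_totals):
    """Return the running total of lines changed, one value per day."""
    return list(accumulate(daily_totals))


# Made-up daily totals: a few quiet days, then heavier activity.
daily_totals = [10, 15, 0, 400, 380, 420]
cumulative = cumulative_lines_changed(daily_totals)
# cumulative is [10, 25, 25, 425, 805, 1225]
```

In such a series, a period of slow change shows up as a nearly flat stretch, while sustained activity shows up as a steady climb.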