What Is a Change?

Before we can analyze the relationship between the modularity of a system and the changes developers make to that system, we first need a consistent definition of a change.

All of the projects we studied use Bugzilla (http://www.bugzilla.org) as a central bug-tracking system. Bug reports, despite their name, are not limited to tracking defects; they are also used to track feature requests (referred to as “enhancements” in Bugzilla). Thus, each report captures a single logical change to the system, made either to fix a defect or to add new functionality.

Table 21-2 shows the ratio of these two types of reports for the changes that we studied. For all three projects, the majority of the changes are fixes for defects in the system, although more than a quarter of the Mylyn changes are actually enhancements.

Table 21-2. Ratio of change types for each system

Project      Total changes    Defects (%)       Enhancements (%)
Evolution    1,939            1,792 (92.4%)     147 (7.6%)
Firefox      11,710           11,198 (95.6%)    512 (4.4%)
Mylyn        3,055            2,247 (73.6%)     808 (26.4%)
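As a quick check on the arithmetic, the percentages in Table 21-2 can be recomputed from the raw counts. The following is a minimal sketch; the function name change_type_shares is ours and is used only for illustration.

```python
# Counts copied from Table 21-2: project -> (defects, enhancements).
counts = {
    "Evolution": (1792, 147),
    "Firefox": (11198, 512),
    "Mylyn": (2247, 808),
}


def change_type_shares(defects, enhancements):
    """Return (total changes, % defects, % enhancements) for one project."""
    total = defects + enhancements
    return total, 100 * defects / total, 100 * enhancements / total


for project, (defects, enhancements) in counts.items():
    total, defect_pct, enhancement_pct = change_type_shares(defects, enhancements)
    print(f"{project}: {total:,} changes, "
          f"{defect_pct:.1f}% defects, {enhancement_pct:.1f}% enhancements")
```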

Each of these three projects uses a source control system that organizes and tracks changes using a notion of atomic commit operations. Each commit may modify multiple files and includes various metadata, including an identifier for the committer (typically a username or email address), the date and time at which the commit occurred, and a free-text description of the changes that were made.

A single change may correspond to more than one commit in the source control system. Developers may subdivide the work of one logical change into several smaller subtasks, completing and committing the code changes for each subtask before moving on to the next one. In some cases, multiple developers may contribute code to a new feature, each submitting their own part of the work as a separate commit. These commits are all part of the same logical change to the system, even though they happen to be committed to the source code repository separately.

We therefore define our unit of change as the aggregation of all commits associated with one bug report. Commits are grouped together by extracting bug IDs from the description field using the case-insensitive regular expression /bug #?(\d+)/, then taking the union of the code changes from all commits with the same bug ID. For example, these are the description fields of three commits from the Firefox project that combine to form a single logical change:

  • Bug 385423. Refactor textrun cache so that all textrun clients use a single global word-based cache. Responsibility for stripping out problematic characters (e.g. newlines) is given to the word cache. r=vlad,smontagu

  • Bug 385423. Force ZWSP, PSEP and LSEP to be treated as zero-width invisible and not passed into platform textrun creation. Avoids potential bugs and forces consistent handling. r=vlad

  • [OS/2] Fix build break in gfxOS2Fonts.cpp (mimic gfxPangoFonts change that supposedly came from Bug 385423)
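The grouping step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tooling used in the study: we assume each commit is available as a (description, files) pair, we take only the first bug ID found in a description, and the names file_a.cpp and file_b.cpp are hypothetical placeholders for the files each commit touched.

```python
import re
from collections import defaultdict

# Case-insensitive pattern from the text: "bug", a space, an optional "#", then digits.
BUG_ID = re.compile(r"bug #?(\d+)", re.IGNORECASE)


def group_commits_by_bug(commits):
    """Group commits into logical changes keyed by bug ID.

    `commits` is an iterable of (description, files) pairs (an assumed shape).
    Returns a dict mapping bug ID to the union of modified files, plus the
    number of commits whose description did not match the pattern.
    """
    changes = defaultdict(set)
    unmatched = 0
    for description, files in commits:
        match = BUG_ID.search(description)
        if match is None:
            unmatched += 1  # no bug ID, so the commit is excluded
            continue
        changes[match.group(1)].update(files)
    return dict(changes), unmatched


# Abbreviated versions of the three descriptions above, plus one commit
# without a bug ID. The file sets are hypothetical.
commits = [
    ("Bug 385423. Refactor textrun cache", {"file_a.cpp", "file_b.cpp"}),
    ("Bug 385423. Force ZWSP, PSEP and LSEP", {"file_b.cpp"}),
    ("[OS/2] Fix build break (came from Bug 385423)", {"gfxOS2Fonts.cpp"}),
    ("Update build notes", {"NOTES"}),
]
changes, unmatched = group_commits_by_bug(commits)
```

Because the pattern is not anchored to the start of the description, the third commit is grouped with the first two even though its bug ID appears in a parenthetical remark.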

Commits with descriptions that did not match the specified regular expression were excluded from our analysis. For both Firefox and Mylyn, we were able to match approximately three-quarters of all commits (74.7% and 75.8%, respectively). For Evolution, only 19.5% of commits were matched, because most commits for this project did not explicitly reference a bug ID and therefore could not be analyzed.

Since our analysis will explore the scope of these changes (in particular, the number of modules consulted and modified by a developer as part of each change), it is useful to have a sense of the size of each change. Figure 21-3 shows the partial distribution of the number of lines of code modified in each change for Firefox; the distributions for the other two projects have a similar shape. Changes affecting over 1,000 lines are not shown in the histogram in order to save space, but are still counted in the summary statistics given in Table 21-3.

Figure 21-3. Distribution of the number of lines of code modified in each change to Firefox

Table 21-3. Lines of code modified in each change

Project      Median    Mean     % changes affecting over 100 lines
Evolution    24        157.2    2.0%
Firefox      26        383.3    3.6%
Mylyn        62        248.5    8.6%

We observe that most changes to all three systems are very small, with a few very large changes (e.g., refactoring a large component or moving code from one directory to another) in the right tail.
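To see why the mean sits so far above the median in a distribution of this shape, consider a small made-up sample. The numbers below are purely illustrative and are not drawn from any of the three projects; the function name summarize and the threshold parameter are likewise our own.

```python
from statistics import mean, median


def summarize(lines_changed, threshold):
    """Return (median, mean, % of changes modifying more than `threshold` lines)."""
    over = sum(1 for n in lines_changed if n > threshold)
    return median(lines_changed), mean(lines_changed), 100 * over / len(lines_changed)


# Hypothetical change sizes: nine small changes and one very large one.
sample = [3, 8, 12, 20, 24, 30, 41, 55, 90, 4000]
med, avg, pct_over = summarize(sample, threshold=100)
print(f"median={med}, mean={avg:.1f}, over threshold={pct_over:.1f}%")
```

A single large change in the right tail is enough to pull the mean an order of magnitude above the median, which is the pattern visible in all three rows of Table 21-3.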

For the Mylyn project, additional information about each change is available via “task context” files that are attached to most of the project’s bug reports. As we briefly mentioned earlier, Mylyn is a task-focused interface for the Eclipse IDE (http://www.eclipse.org). Among other features, Mylyn includes support for tracking all of the software artifacts a developer interacts with as he is working on a task. For each task, Mylyn maintains a set of “interesting” code elements (e.g., fields and methods) that includes not only those elements that have been modified but also those that have been navigated to in the editor [Kersten et al. 2006]. Many of the views within Eclipse can then be filtered to show only these interesting elements, allowing the developer to focus on only the subset of the larger system that is relevant to the task at hand.

As a matter of policy, whenever a developer is working to resolve a bug on the Mylyn project, she begins by opening a new task inside Mylyn that is linked directly to the bug report. When the developer has finished working on the bug, she publishes the task context information collected by Mylyn to the bug report. This posting allows other developers to import the task context into their own Mylyn instances if they wish to examine the system from the filtered perspective of the developer who worked on the bug.

Figure 21-4 shows a sample task context. Items shown in bold are so interesting to the developer as part of the task that they have become landmarks. The slider at the top left of the screenshot can be used to show only those elements whose interest exceeds a threshold set by the position of the slider. Developers do not typically work with the view shown in the screenshot, relying instead on filtering in other views in the IDE to show only the program elements related to the task context. The view in Figure 21-4 is used primarily to inspect the context before attaching it to a bug report.

Figure 21-4. A task context attached to a change as viewed from within Mylyn

Each task context stores timestamps that we can use to approximate the total amount of time the developer spent working on a change. Timestamps are recorded for many different types of “interaction events,” including selection events (when the developer clicks on a source code element), edit events (when the developer modifies text in the editor), and command events (as the developer invokes commands in the IDE). We expect these events to be generated frequently while a developer is actively working. We can therefore compute an estimate for the time spent working on a change by simply sorting the interaction events chronologically, taking the difference between each pair of consecutive events and summing all of these differences. In order to account for breaks in a developer’s work, we do not count the time that elapses between any pair of consecutive events that occur more than five minutes apart.
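The time estimate described above can be sketched as follows. This is a minimal illustration, assuming the interaction-event timestamps have already been parsed into datetime objects; the function name estimate_time_spent and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Gaps longer than this are treated as breaks and are not counted.
MAX_GAP = timedelta(minutes=5)


def estimate_time_spent(timestamps, max_gap=MAX_GAP):
    """Sum the gaps between consecutive events, skipping gaps over `max_gap`."""
    ordered = sorted(timestamps)
    total = timedelta()
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= max_gap:
            total += gap
    return total


# Hypothetical events: gaps of 2, 4, 24, and 1 minutes.
events = [
    datetime(2008, 1, 7, 9, 0),
    datetime(2008, 1, 7, 9, 2),
    datetime(2008, 1, 7, 9, 6),
    datetime(2008, 1, 7, 9, 30),
    datetime(2008, 1, 7, 9, 31),
]
time_spent = estimate_time_spent(events)
```

In this example the 24-minute gap is treated as a break, so only the 2-, 4-, and 1-minute gaps contribute to the estimate.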

Thus when we analyze a change to Mylyn, we have two related sources of information: the commit logs from the source control system, which show the modifications a developer made to the code; and the task context files attached to the bug report, which show the path the developer took through the code in order to make those modifications. Using timestamps stored in the task context, we can also approximate how much time the developer spent working on the bug. This allows us, for at least one system, to answer questions about how a developer interacts with the system’s modules in a much richer way than just looking at the changes that are ultimately made.
