Commit Emails and Gumption Sinks

This next example shows what can happen when a team doesn't pay enough attention to tool usage. It's about how a seemingly trivial interface decision can have large effects on how people behave. First, some background.

Most open source projects have a commit email list. The list receives an email every time a change enters the master repository, and the mail is generated automatically by the repository. Typically, each one shows the author of the change, the time the change was made, the associated log message, and the line-by-line change itself (expressed in the "patch" format mentioned earlier, except that for historical reasons, in this context the patch is called a "diff"). The email may also include URLs to provide a permanent reference for the change or for some of its subparts.

Here's a commit email from the Subversion project:

From: [email protected]
Subject: svn commit: r30009 - trunk/subversion/libsvn_wc
To: [email protected]
Date: Sat, 22 Mar 2008 13:54:38 -0700

Author: dionisos
Date: Sat Mar 22 13:54:37 2008
New Revision: 30009

Log:
Fix issue #3135 (property update on locally deleted file breaks WC).

* subversion/libsvn_wc/update_editor.c (merge_file): Only fill WC file
   related entry cache-fields if the cache will serve any use (ie
   when the entry is schedule-normal).

Modified:
   trunk/subversion/libsvn_wc/update_editor.c

Modified: trunk/subversion/libsvn_wc/update_editor.c
URL: http://svn.collab.net/viewvc/svn/trunk/subversion/libsvn_wc/update_editor.
c?pathrev=30009&r1=30008&r2=30009
==============================================================================
--- trunk/subversion/libsvn_wc/update_editor.c Sat Mar 22 07:33:07 2008 (r30008)
+++ trunk/subversion/libsvn_wc/update_editor.c Sat Mar 22 13:54:37 2008 (r30009)
@@ -2621,8 +2621,10 @@ merge_file(svn_wc_notify_state_t *conten
   SVN_ERR(svn_wc_  _loggy_entry_modify(&log_accum, adm_access,
                                       fb->path, &tmp_entry, flags, pool));
 
-  /* Log commands to handle text-timestamp and working-size */
-  if (!is_locally_modified)
+  /* Log commands to handle text-timestamp and working-size,
+     if the file is - or will be - unmodified and schedule-normal */
+  if (!is_locally_modified &&
+      (fb->added || entry->schedule == svn_wc_schedule_normal))
     {
       /* Adjust working copy file unless this file is an allowed
          obstruction. */

When done right, commit emails are a powerful collaboration tool for software projects. They're a perfect marriage of automated information flow and human participation. Each change arrives in the developer's mailbox packaged as a well-understood, discrete unit: an email. The developer can view the change using a comfortable and familiar interface (her mailreader), and if she sees something that looks questionable, she can reply, quoting just the parts of the change that interest her, and her reply will automatically be put into a thread that connects the change to everyone's comments on it. Thus, by taking advantage of the data management conventions already in place for email, people (and other programs) can conveniently track the fallout from any given change. [24]

However, another project I'm a member of, GNU Emacs, [25] does things a little differently. Partly for historical reasons, and partly because of the way its version control system [26] works, each commit to GNU Emacs generates twocommit emails: one showing the log message, and the other containing the diff itself.

The log message email looks like this:

From: Juanma Barranquero <[email protected]>
Subject: [Emacs-commit] emacs/lisp info.el
To: [email protected]
Date: Sat, 08 Mar 2008 00:09:29 +0000

CVSROOT:    /cvsroot/emacs
Module name:    emacs
Changes by:    Juanma Barranquero <lektu>    08/03/08 00:09:29

Modified files:
    lisp           : info.el

Log message:
    (bookmark-make-name-function, bookmark-get-bookmark-record):
    Pacify byte-compiler.

CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/emacs/lisp/info.el?cvsroot=emacs&r1=1.519&r2=1.520

And the diff email looks like this:

From: Juanma Barranquero <[email protected]>
Subject: [Emacs-diffs] Changes to emacs/lisp/info.el,v
To: [email protected]
Date: Sat, 08 Mar 2008 00:09:29 +0000

CVSROOT:    /cvsroot/emacs
Module name:    emacs
Changes by:    Juanma Barranquero <lektu>    08/03/08 00:09:29

Index: info.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/info.el,v
retrieving revision 1.519
retrieving revision 1.520
diff -u -b -r1.519 -r1.520
--- info.el    7 Mar 2008 19:31:59 -0000    1.519
+++ info.el    8 Mar 2008 00:09:29 -0000    1.520
@@ -3375,6 +3375,8 @@
 
 (defvar tool-bar-map)
 (defvar bookmark-make-record-function)
+(defvar bookmark-make-name-function)
+(declare-function bookmark-get-bookmark-record "bookmark" (bookmark))
 
 ;; Autoload cookie needed by desktop.el
 ;;;###autoload

Together, those two emails contain the same information as a single one like the one I showed earlier from the Subversion project. But the key word is together. They are not together; they are separate. Although the programmer committed a single change, [27] there is no single email containing everything a reviewer would need to understand and review that change. To review a change, you need the log message so that you can understand the general intent of the change, and the diff so that you can see whether the actual code edits match that intent.

It turns out that if people don't have both of these things in one place, they're much less likely to review changes.

Or so it seems from my highly rigorous [28] survey comparing the two projects. In February 2008, there were 207 unique threads (from 908 messages) on the Subversion development list. Of these, 50 were follow-up threads to commit emails. So by one reasonable measurement, 24% of development list attention goes to commit review (or if you want to count messages instead of threads, then a little over 5%). Note that follow-ups to Subversion commit emails are automatically directed to the main development list via a Reply-to header, so the development list is the right data set to use.

Meanwhile, in February 2008, the Emacs development list had 491 unique threads (from 3,158 messages), of which, apparently, zero were review mails.

Stunned at this result, I loosened my filters for detecting review mails and scanned again. This time I came up with, at most, 49 emails. But spot-checking those 49 showed that most seemed to be reviews of patches posted to the list from elsewhere, rather than reviews based on commit emails; only two were definite review mails. However, even if we count all 49 (which is almost certainly overly permissive), that's at most 10% of the development list traffic being commit reviews (or 1.5%, if we count messages instead of threads). Because Emacs does not automatically redirect commit email follow-ups to the main development list, I also searched in the commits list and the diffs list archives. I found no review follow-ups there in February, and exactly two in March, both of which were from me.

Now, the two projects have different commit rates and different traffic levels on their development lists. But we can partially control for this by approaching the question from the other side: what percentage of commits gets reviewed? In February 2008, Subversion had 274 unique commits, and Emacs had 807. [29] Thus, the ratio of review threads to commits for Subversion is about 18%, and for Emacs is somewhere between 0% and 6% (but probably tending low, around 0.2%, if there really were only two true review mails).

Something is happening here, something that makes one project much more likely than the other to do peer review. What is it?

I cannot prove it, of course, but I think it's simply the fact that each Emacs commit arrives separated into two emails: one for the log message and another for the diff. There is no way, using a normal mail-reading interface without extensive customizations, to view the log message and the diff at the same time. Thus, there is no convenient way to review a change. It's not that review is impossible, or even hard. It's neither: if I wanted to, I could review all the Emacs commits, and so could the other developers. But each review would require shuffling back and forth between two emails, or clicking on the URL in the shorter email and waiting for a page to load. In practice, it's too much trouble, and I can never be bothered. Apparently, neither can anyone else.

There's a sobering lesson here: adding a few seconds of overhead to a common task is enough to make that task uncommon. Your team isn't lazy, just human. Put light switches at roughly shoulder level and people will happily turn off lights when they leave a room; put the switches at knee level, and your electricity bill will skyrocket.

What is the cost to a project of not getting code review? Pretty high, I think. Looking over the Subversion commits for that month, 55 indicate that they follow up to a previous specific commit, and 35 bear a special marker (see http://subversion.tigris.org/hacking.html#crediting) indicating that they fix problems found by someone else. In my experience with the project, such suggestions are usually the result of commit review. Thus, probably somewhere between 12% and 20% of the commits made in Subversion result from review of previous commits. I cannot easily come up with the comparable number for Emacs, because Emacs does not have standardized attribution conventions the way Subversion does. But I've been watching Emacs development for a long time, and while it's clear that some percentage of commits there results from reviews of previous commits (or from just stumbling across problematic code in the course of other work), I do not believe it reaches 12% to 20%.

Besides, the benefits of timely commit review cannot be measured solely in further code changes. Commit review sustains morale, sharpens people's skills (because they're all learning from each other), reinforces a team's ability to work together (because everyone gets accustomed to receiving constructive criticism from everyone else), and spurs participation (because review happens in public, thus encouraging others to do the same). To deprive a team of these benefits because of a trivial user interface choice is a costly mistake.



[25] GNU Emacs is a text editing tool favored by many programmers, and is one of the oldest continuously maintained free software programs around. See http://www.gnu.org/software/emacs/ for more.

[27] Some people use the word changeset for what I'm talking about here: the situation where perhaps multiple files were modified, but the modifications are all part of a single logical group. We can call that overall group a "change" or a "changeset"; the two words mean the same thing in this context.

[28] Read: "extremely anecdotal."

[29] If you're checking these numbers against primary sources, note that when counting Emacs commits, you shouldn't count commits that affect only ChangeLog files, because (due to certain oddities of the way the Emacs project uses its version control system) those are duplicates of other commits.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset