Chapels in the Bazaar of Open Source Software

The previous two studies have looked at commercial software. The past two decades have seen a new style of development, termed free or open source software (OSS), that occurs primarily over the Internet between parties who have never met and who often do not share a financial interest. It differs from “traditional” development in a number of ways. Although OSS contributors may be paid by their own employers to work on the projects, the OSS project itself rarely pays contributors directly. Thus, most people can come and go as they please. Since no one company “runs” the project, there is no mandated organizational structure.

Eric Raymond, who emerged as a spokesperson for this movement by writing a series of essays, has posited that this form of development can be characterized as an ad-hoc bazaar in which contributors meander around the code base, talk to others somewhat at random, and work on whatever they please. In contrast, he characterizes a controlled and planned development process in an industrial setting as a cathedral model where workers are directed to work on clearly delineated tasks and everything is planned up front. Of course, the dichotomy involves a bit of hyperbole. Even characterizing open source development and industrial development as two distinct styles is quite a generalization.

We’ve seen that studies of industrial projects tend to follow Conway’s Corollary, but what about open source software? One of the reasons that Brooks’ Law holds is the communication overhead involved in large teams. If OSS is really a bazaar that lacks structure, then how does it overcome the quadratic increase in possible communication paths as the project grows? We decided to investigate the social structure of a number of prominent OSS projects to see whether they really do resemble a bazaar, and additionally to look at the relationship between the organizational structure and the architecture of the system [Bird et al. 2008].
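To make the "quadratic increase" concrete: if every participant could in principle talk to every other participant, the number of distinct pairs among n people is n(n - 1)/2. The short sketch below simply evaluates that formula; the function name and the sample sizes are illustrative choices, not figures from the study.

```python
def communication_paths(n: int) -> int:
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Illustrative group sizes only: each tenfold increase in people
# yields roughly a hundredfold increase in possible paths.
for size in (5, 50, 500):
    print(size, communication_paths(size))
```

A group of 5 has 10 possible paths, while a group of 500 has 124,750, which is why an unstructured bazaar of thousands of participants would be hard to sustain.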

Unlike commercial entities, OSS projects lack a mandated and clearly defined organizational structure. To recover the social structure of each OSS project, we gathered the historical archives of the developer mailing lists. These lists comprise discussions regarding decisions, fixes, policies, and nearly all communication required to run an OSS project. Some OSS projects also use IRC, but those that we selected require important discussions to be conducted on the mailing lists. The participants on these lists are the project developers (i.e., those with write access to the source code repository), contributors who submit patches, and others who simply want to participate in discussions regarding features, bugs, and everything else.

Many of the active participants in these projects use multiple email addresses, and we wanted to attribute every address that a particular person uses to that one person rather than to several separate entities. To address this issue, we resolved email aliases using heuristics and manual inspection. We also matched email addresses to source code repository accounts for the developers.

We reconstructed the social organization by examining interactions on these mailing lists to create social networks. If Alice posted a message to the developer mailing list and Bob read it and posted a response, there is evidence of information flow and possible collaboration between Alice and Bob. We therefore created an organizational network of all project mailing list participants based on these interactions. The network edges are weighted by the number of interactions between each pair of participants.
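The construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tooling used in the study: the alias table, the message tuples, and the function names are all hypothetical, and the sketch treats edges as undirected, which the text does not specify.

```python
from collections import Counter

# Hypothetical alias table: every address a person uses maps to one identity.
# In the study this mapping came from heuristics plus manual inspection.
ALIASES = {
    "alice@example.org": "alice",
    "alice@work.example.com": "alice",
    "bob@example.org": "bob",
    "carol@example.org": "carol",
}


def resolve(address):
    """Map an email address to a single identity (falls back to the address)."""
    key = address.lower()
    return ALIASES.get(key, key)


def build_network(messages):
    """Build weighted, undirected edges from reply relationships.

    messages: iterable of (message_id, sender_address, in_reply_to_id or None).
    Returns a Counter mapping frozenset({person_a, person_b}) -> reply count.
    """
    messages = list(messages)
    author = {mid: resolve(sender) for mid, sender, _ in messages}
    weights = Counter()
    for mid, _, parent in messages:
        if parent is None or parent not in author:
            continue  # thread starter, or a reply to a message we do not have
        a, b = author[mid], author[parent]
        if a != b:  # ignore people replying to themselves
            weights[frozenset((a, b))] += 1
    return weights


# Hypothetical thread: Bob and Carol reply to Alice, Alice replies to Bob
# from a second address, then replies to herself.
thread = [
    ("m1", "alice@example.org", None),
    ("m2", "bob@example.org", "m1"),
    ("m3", "alice@work.example.com", "m2"),
    ("m4", "carol@example.org", "m1"),
    ("m5", "alice@example.org", "m3"),
]
network = build_network(thread)
for pair, weight in sorted(network.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), weight)
```

Because both of Alice's addresses resolve to one identity, her two exchanges with Bob accumulate on a single edge of weight 2 instead of being split across two nodes.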

We examined the Perl and Python projects, the Apache webserver, the PostgreSQL database system, and the Ant Java build system. The number of mailing list participants ranged from 1,329 for Python to 3,621 for Perl, and the number of actual project developers ranged from 25 for Perl to 92 for Python. Table 11-2 shows statistics for the projects that we studied.

Table 11-2. Open source project statistics

Project            Apache      Ant         Python      Perl        PostgreSQL
Begin date         1995-02-27  2000-01-12  1999-04-21  1999-03-01  1998-01-03
End date           2005-07-13  2006-08-31  2006-07-27  2007-06-20  2007-03-01
Messages           101,250     73,157      66,541      112,514     132,698
List participants  2,017       1,960       1,329       3,621       3,607
Developers         57          40          92          25          29

We drew on the social network along with the commit activity of developers to answer two important questions:

  • Do teams of participants form spontaneously and organically in the social network revealed by communication in the OSS projects?

  • What is the relationship between the technical structure of the software and the teams of participants?

To answer the first question, we turn to the study of complex networks in physics. One active research topic in this community is the detection of community structure, the existence of densely connected subnetworks within a network. Community structure detection techniques attempt to partition a network into groups of nodes such that the connections within groups are dense and the connections between groups are sparse. The level of community structure is quantified by a measure called modularity. Modularity can range from 0, which represents a completely random network, to 1, which represents a network of a number of disconnected cliques. Prior research has found that naturally occurring networks with clear clusters (i.e., networks that are modular) have values of modularity ranging from 0.3 to 0.7.
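The text does not give a formula for modularity. As a hedged illustration, the sketch below assumes the commonly used Newman-Girvan definition for a weighted, undirected network: for each group, take the fraction of total edge weight that falls inside the group and subtract the fraction expected if edges were placed at random with the same node degrees. The function name and the toy network are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (assumed definition).

    edges: dict mapping (u, v) -> weight for an undirected graph, no self-loops.
    community: dict mapping each node -> community label.
    """
    total = sum(edges.values())  # total edge weight m
    if total == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree
    )


# Toy network: two triangles joined by a single edge (c-d).
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_teams = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_team = {node: 1 for node in two_teams}

print(round(modularity(toy_edges, two_teams), 3))  # split into the two triangles
print(round(modularity(toy_edges, one_team), 3))   # everyone in a single group
```

Splitting the toy network into its two triangles gives a modularity of 5/14 (about 0.357), just above the 0.3 level mentioned in the text, whereas lumping everyone into one group gives 0.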

Figure 11-2 shows the boxplots of the modularity for three-month snapshots in each of these projects. The vast majority of modularity values shown are well above the 0.3 threshold, indicating strong community structure. Put more concretely, although the networks are too large and complex to easily visualize, the communication network is modular, with well-defined groups of developers who interact with each other much more than with developers in other groups. This is evidence that these projects are not disorganized, chaotic groups of developers with everyone talking to everyone else in a haphazard manner. It appears that, when left to their own devices, OSS projects (at least the ones we studied) actually are characterized by organized teams of developers. The key difference between industrial teams and these teams is that the OSS teams are formed organically rather than having their structure imposed by management. Further, we observed that the makeup of the teams was more dynamic than that of industrial software teams, with teams rarely lasting more than six months before disbanding and reforming in different ways.

Figure 11-2. Average congruence by release

The mere existence of tightly connected subgraphs within the communication network for these projects is not in and of itself a validation of Conway’s Law or our corollary. We are also interested in the relationship between these groups of participants who are talking to one another and the technical tasks that they are working on. Is the social structure tied to the software structure? To answer this question, we looked at the development activity, characterized by the files, functions, and classes modified by the developers involved in each team. Based on our examination of the development behavior, we present two observations.

First, those who talk together work on related parts of the system. If two developers were found to be in the same software team by community structure detection algorithms, they were much more likely to be working on the same files and functions. Further, these same functions and files were often mentioned in the bodies of the email communication between members of the same team. Thus, the communication patterns are reflective of collaborative development behavior.
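One way to check a claim of this kind is to compare how much the sets of modified files overlap for pairs of developers in the same team versus pairs in different teams. The sketch below uses Jaccard similarity as a stand-in measure; the text does not say which measure the study used, and the developer names, file names, and team labels are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


def mean_overlap(files_by_dev, team_of):
    """Return (mean overlap within teams, mean overlap across teams)."""
    same_team, cross_team = [], []
    for d1, d2 in combinations(sorted(files_by_dev), 2):
        score = jaccard(files_by_dev[d1], files_by_dev[d2])
        if team_of[d1] == team_of[d2]:
            same_team.append(score)
        else:
            cross_team.append(score)
    return mean(same_team), mean(cross_team)


# Hypothetical commit data: files touched by each developer in one period.
files_by_dev = {
    "dev_a": {"ssl/a.c", "ssl/b.c"},
    "dev_b": {"ssl/b.c", "ssl/c.c"},
    "dev_c": {"docs/x.sgml"},
    "dev_d": {"docs/x.sgml", "ssl/c.c"},
}
# Hypothetical teams, as a community detection step might report them.
team_of = {"dev_a": 1, "dev_b": 1, "dev_c": 2, "dev_d": 2}

within, across = mean_overlap(files_by_dev, team_of)
print(f"within teams: {within:.3f}, across teams: {across:.3f}")
```

In this invented data the within-team overlap is five times the cross-team overlap; a large gap in that direction on real data would be consistent with the observation that those who talk together work on related parts of the system.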

Second, we looked closely at the actual development to see what was really happening. We present a few examples here and refer the reader to our paper for a more in-depth analysis.

In the Apache webserver project, from May to July of 2003, one team consisted of Rowe and Thorpe (developers) as well as Deaves, Adkins, and Chandran. They discussed a number of bug fixes to mod_ssl, the Apache interface to the Secure Sockets Layer (SSL). Topics in the discussion included issues with incorrect input/output code; module loading, unloading, and initialization; and integration of mod_ssl with the server. Nearly all the discussion was about the SSL code, and virtually all of the files modified by people in this group during this time period are in the modules/ssl directory, as shown here:

modules/arch/win32/mod_isapi.h
modules/ssl/mod_ssl.c
modules/ssl/mod_ssl.h
modules/ssl/ssl_engine_config.c
modules/ssl/ssl_engine_init.c
modules/ssl/ssl_engine_io.c
modules/ssl/ssl_engine_kernel.c
modules/ssl/ssl_engine_pphrase.c
modules/ssl/ssl_toolkit_compat.h
modules/ssl/ssl_util.c
modules/ssl/ssl_util_ssl.c
modules/ssl/ssl_util_ssl.h
support/ab.c
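The concentration in this listing can be checked directly. The sketch below groups the thirteen paths listed above by their leading directory components and reports the most common one; the helper name and the choice of a two-component depth are illustrative assumptions.

```python
from collections import Counter
from pathlib import PurePosixPath

# The files listed above for the Apache team, May to July 2003.
MODIFIED_FILES = [
    "modules/arch/win32/mod_isapi.h",
    "modules/ssl/mod_ssl.c",
    "modules/ssl/mod_ssl.h",
    "modules/ssl/ssl_engine_config.c",
    "modules/ssl/ssl_engine_init.c",
    "modules/ssl/ssl_engine_io.c",
    "modules/ssl/ssl_engine_kernel.c",
    "modules/ssl/ssl_engine_pphrase.c",
    "modules/ssl/ssl_toolkit_compat.h",
    "modules/ssl/ssl_util.c",
    "modules/ssl/ssl_util_ssl.c",
    "modules/ssl/ssl_util_ssl.h",
    "support/ab.c",
]


def directory_counts(paths, depth=2):
    """Count files per directory, keeping at most `depth` leading components."""
    counts = Counter()
    for path in paths:
        parent_parts = PurePosixPath(path).parent.parts[:depth]
        counts["/".join(parent_parts)] += 1
    return counts


counts = directory_counts(MODIFIED_FILES)
directory, count = counts.most_common(1)[0]
print(f"{directory}: {count} of {len(MODIFIED_FILES)} files")
```

Eleven of the thirteen files fall under modules/ssl, with one each under modules/arch and support.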

In another example, from October to December of 2002 in PostgreSQL, one team worked solely on embedded SQL in C, while another focused on updating the SGML documentation source. In the following time period, a group emerged whose activity and discussion concerned the development and testing of the PostgreSQL JDBC driver (with source code and test code changes spanning the code base within the JDBC subtree), and another much smaller group worked on Unicode support.

We found many more examples of teams of project participants forming around tasks that were directly related to portions of the code base. The discussion of approaches, delegation of work, and decision making in the communication was reflective of the actual development work (and in some cases, contributed patches) that was going on. Clearly, Conway’s Corollary is at work in these projects.

So why do these findings matter? First, this shows that the strong ties between software structure and social structure appear to transcend different development processes and that even open source software projects seem to work well when these relationships are followed. Second, we’ve learned that OSS solves the problem of scalability and delegation of work in ways that are similar to more traditional development styles. Teams made up of a few individuals (we saw team sizes range from three to seven participants on average) work together to accomplish tasks within limited portions of the software. The common claim (though still open to debate) that successful OSS projects are self-adapting and self-optimizing communities is supported by their conformance to Conway’s Corollary, as shown by our study. Also, although it might have been thought that OSS is able to somehow avoid the effects of Brooks’ Law and Conway’s Law, it appears that OSS projects are subject to both.
