What Is a Module?

To discuss how developers work with the modules in a software system, we need a consistent definition of what a module is and how the code in one module is separated from the code in other modules in the system.

Although it would be convenient to define modules by reference to explicit declarations in the source code, this approach presents two difficulties for our analysis. First, the projects we selected for our study are implemented in a variety of programming languages, each of which treats modularity differently. Evolution is implemented in C, which provides minimal support for defining modules within source code. Firefox primarily uses C++, a language that supports modules at the language level through namespace declarations, but Firefox itself does not make consistent use of this language feature. Further, Firefox also includes code written in many other languages, such as JavaScript and XUL (the XML User Interface Language), which have completely different modularity constructs. Mylyn, a Java application, is the only one of the three systems that has clearly defined modules within source code, which it accomplishes using Java packages.

Second, the systems we chose to study are large and long-lived enough that analysis at the fine-grained level of the programming language could be overwhelmed by development issues, such as turnover in the development team and refactorings of the code. To reduce the likelihood that such issues would affect our analysis, we chose to focus on coarser-grained modules, which are more representative of architectural modules. It would be interesting to focus subsequent analysis on a finer-grained definition of module.

Rather than trying to define a module in terms of these programming-language-specific features, we instead turn to the directory structure used by each system to store its code. As we will see, all three of the projects we analyzed organize their filesystems in such a way that a consistent definition of a module can be applied to each. Further, this representation of modules as directories makes it trivial to identify the boundaries that separate each module from other modules in the system.

Both Evolution and Firefox use a simple directory structure that matches the way other data is typically stored on a computer: each top-level directory represents a major system component, with subdirectories dividing each component into several smaller subcomponents. Here’s how that looks for one small part of the Firefox repository:

layout         netwerk         parser
  /base          /base           /expat
  /generic       /cache          /htmlparser 
  /inspector     /cookie         /xml
  ...           ...

Each subdirectory may contain additional subdirectories dividing source code from test files and documentation.

Two possible definitions for a module emerge from this structure. At a high level, each top-level directory (e.g., layout) could be considered a module; at a lower level, each subdirectory of these top-level directories (e.g., layout/base) could be considered a separate module.

For Evolution, the smallest of the three projects in our study, we define a module as a top-level directory. For Firefox, which is an order of magnitude larger, we define a module as a subdirectory of a top-level directory. Applying the former definition to Firefox (i.e., using only top-level directories) results in extremely large modules that mask many of the interesting interactions within the system. Conversely, if we were to break Evolution down into subdirectory-level modules, there would be too much noise in the resulting data to perform a meaningful analysis.
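
The two definitions differ only in how many leading directory components of a file's path are kept. The following Python sketch illustrates that idea; it is not the tooling used in the study. The function name module_of, the MODULE_DEPTH table, and the file name example.cpp are hypothetical, and paths are assumed to be repository-relative with forward slashes.

```python
from pathlib import PurePosixPath

# Hypothetical settings: 1 = top-level directory (Evolution),
# 2 = subdirectory of a top-level directory (Firefox).
MODULE_DEPTH = {"evolution": 1, "firefox": 2}


def module_of(path: str, depth: int) -> str:
    """Return the module that a repository-relative file path belongs to.

    The module is the first `depth` directory components of the path.
    A file that sits above the requested depth falls back to whatever
    directories it has; a file in the repository root yields "".
    """
    directories = PurePosixPath(path).parts[:-1]  # drop the file name
    return "/".join(directories[:depth])


if __name__ == "__main__":
    sample = "layout/base/example.cpp"  # hypothetical file name
    print(module_of(sample, MODULE_DEPTH["firefox"]))
    print(module_of(sample, MODULE_DEPTH["evolution"]))
```

Under these assumptions, the same file maps to layout/base with the Firefox definition and to layout with the Evolution definition.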

Mylyn has a very different directory structure. Java source files are arranged in a nested directory structure that matches their package declarations, resulting in a much deeper folder hierarchy than we find with the C/C++ systems. These Java packages are in turn grouped together to form Eclipse projects (the top level of organization within the Eclipse IDE), which themselves are grouped by naming conventions. Part of the Mylyn repository looks like this:

org.eclipse.mylyn.bugzilla.core
  /src/org/eclipse/mylyn/internal/bugzilla/core
    /history
    /service
org.eclipse.mylyn.bugzilla.ui
  /src/org/eclipse/mylyn/internal/bugzilla/ui
    /action
    /editor
    /search
    ...
org.eclipse.mylyn.context.core
  /src/org/eclipse/mylyn/context
    /core
    /internal/core

As with the other systems, a couple of definitions for a module are readily apparent from this structure. The top-level directories, representing Eclipse projects, correspond roughly to the subdirectories in Evolution and Firefox. At an even higher level, we can group these directories together by the first component after mylyn in their names. For example, both org.eclipse.mylyn.bugzilla.core and org.eclipse.mylyn.bugzilla.ui can be seen as part of a larger bugzilla component. Indeed, if we look at the Java package structure, both of these directories contain only code belonging to subpackages of org.eclipse.mylyn.bugzilla, so this choice of grouping matches the language semantics.

It is the latter top-tier model that we use here, rather than the Firefox second-tier model: we define a module in Mylyn as all the top-level directories that correspond to the same immediate subpackage of org.eclipse.mylyn. This choice allows the core logic and UI widgets, which typically reside in separate directories in this structure, to be viewed as a single module in the system.
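
As a rough illustration of this grouping rule, the Python sketch below maps a file path to its Mylyn module by taking the first name component after org.eclipse.mylyn in the top-level directory. The function name mylyn_module and the file name Example.java are hypothetical, and the sketch assumes forward-slash, repository-relative paths; it is not the study's actual tooling.

```python
from typing import Optional

PREFIX = "org.eclipse.mylyn."


def mylyn_module(path: str) -> Optional[str]:
    """Return the Mylyn module for a repository-relative file path.

    The module is the first dot-separated component that follows
    "org.eclipse.mylyn." in the top-level (Eclipse project) directory.
    Paths outside that naming convention yield None.
    """
    project = path.split("/", 1)[0]  # top-level directory name
    if not project.startswith(PREFIX):
        return None
    return project[len(PREFIX):].split(".", 1)[0]


if __name__ == "__main__":
    # Example.java is a hypothetical file name.
    core = "org.eclipse.mylyn.bugzilla.core/src/org/eclipse/mylyn/internal/bugzilla/core/Example.java"
    ui = "org.eclipse.mylyn.bugzilla.ui/src/org/eclipse/mylyn/internal/bugzilla/ui/Example.java"
    print(mylyn_module(core), mylyn_module(ui))
```

Both sample paths resolve to the same bugzilla module, which is how the core logic and UI directories end up being treated as one unit.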
