Chapter 9. Keep Your Codebase Small

Program complexity grows until it exceeds the capability of the programmer who must maintain it.

7th Law of Computer Programming

Guideline:

  • Keep your codebase as small as feasible.

  • Do this by avoiding codebase growth and actively reducing system size.

  • This improves maintainability because having a small product, project, and team is a success factor.

A codebase is a collection of source code that is stored in one repository, can be compiled and deployed independently, and is maintained by one team. A system has at least one codebase. Larger systems sometimes have more than one codebase. A typical example is packaged software. There may be a codebase for the standard functionality, and there are different, independently maintained codebases for customer- or market-specific plugins.

Given two systems with the same functionality, in which one has a small codebase and the other has a large codebase, you surely would prefer the small system. In a small system it is easier to search through, analyze, and understand code. If you modify something, it is easier to tell whether the change has effects elsewhere in the system. This ease of maintenance leads to fewer mistakes and lower costs. That much is obvious.

Motivation

Software development and maintenance become increasingly hard with growing system size. Building larger systems requires larger teams and longer-lasting projects, which bring additional overhead and risks of (project) failure. The rest of this section discusses the adverse effects of large software systems.

A Project That Sets Out to Build a Large Codebase Is More Likely to Fail

There is a strong correlation between project size and project risks. A large project leads to a larger team, complex design, and longer project duration. As a result, there is more complex communication and coordination among stakeholders and team members, less overview of the software design, and a larger number of requirements that change during the project. This all increases the chance of reduced quality, project delays, and project failure. The probabilities in the graph in Figure 9-1 are cumulative: for example, for all projects over 500 man-years of development effort, more than 90% are identified as “poor project quality.” A subset of these are projects with delays (80–90% of the total) and failed projects (50% of the total).

Figure 9-1 illustrates the relationship between project size and project failure: it shows that as the size of a project increases, the chances of project failure (i.e., project is terminated or does not deliver results), of project delay, and of a project delivered with poor quality are increasingly high.

Figure 9-1. Probability of project failures by project size (see footnote 1)

Large Codebases Are Harder to Maintain

Figure 9-2 illustrates how codebase size affects maintainability.

Figure 9-2. Distribution of system maintainability in SIG benchmark among different volume groups

The graph is based on a set of codebases of over 1,500 systems in the SIG Software Analysis Warehouse. Volume is measured as the amount of development effort in man-years to reproduce the system (see also “How SIG Rates Codebase Volume”). Each bar shows the distribution of systems in different levels of maintainability (benchmarked in stars). As the graph shows, over 30% of systems in the smallest volume category manage to reach 4- or 5-star maintainability, while in the largest volume category only a tiny percentage reaches this level.

Large Systems Have Higher Defect Density

You may expect that a larger system has more defects in absolute numbers. But the defect density (defined as the number of defects per 1,000 lines of code) also increases substantially as systems grow larger. Figure 9-3 shows the relationship between code volume and the number of defects per 1,000 lines of code. Since defect density rises as code volume grows, larger systems have more defects both in absolute and in relative terms.
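As a small worked illustration of the definition above, the following Java sketch computes defect density as defects per 1,000 lines of code. The class and method names are hypothetical, and the numbers in the usage note are invented purely to show the arithmetic; they are not taken from Figure 9-3.

```java
// Hypothetical helper, not part of any library.
final class DefectDensity {
    private DefectDensity() {}

    // Defect density = number of defects per 1,000 lines of code.
    static double perThousandLines(int defects, int linesOfCode) {
        if (linesOfCode <= 0) {
            throw new IllegalArgumentException("linesOfCode must be positive");
        }
        return defects * 1000.0 / linesOfCode;
    }
}
```

With these made-up inputs, a 20,000-line system with 50 defects has a density of 2.5, while a 200,000-line system with 1,200 defects has a density of 6.0. The larger system has more defects in absolute terms and also more per 1,000 lines.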

Figure 9-3. Impact of code volume on the number of defects (see footnote 2)

How to Apply the Guideline

All other things being equal, a system that has less functionality will be smaller than a system that has more functionality. Then, the implementation of that functionality may be either concise or verbose. Therefore, achieving a small codebase first requires keeping the functionality of a system limited, and then requires attention to keep the amount of code limited.

Functional Measures

Functionality-related measures are not always within your span of control, but whenever new or adapted functionality is being discussed with developers, the following should be considered:

Fight scope creep:

In projects, scope creep is a common phenomenon in which requirements extend during development. This may lead to “nice-to-have functionality” that adds growth to the system without adding much value to the business or the user. Fight scope creep by confronting the business with the price of additional functionality, in terms of project delays or higher future maintenance costs.

Standardize functionality:

By standardization of functionality we mean consistency in the behavior and interactions of the program. First of all, this is intended to avoid the implementation of the same core functionality in multiple, slightly different ways. Secondly, standardization of functionality offers possibilities for reuse of code—assuming the code itself is written in a reusable way.

Technical Measures

For the technical implementation, the goal is to use less code to implement the same functionality. You can achieve this mainly by reusing code through reference (instead of writing it again or copying and pasting it) or by avoiding coding altogether and using existing libraries or frameworks instead.

Do not copy and paste code:

Referring to existing code is always preferable to copying and pasting code in pieces that will need to be maintained individually. If there are multiple copies of a piece of code, maintenance needs to occur in multiple places, too. Mistakes easily crop up if an update in one piece of logic requires individual adjustment (or not) and testing of multiple, scattered copies. Note that the intention of the guideline presented in Chapter 4 is precisely to avoid copying and pasting.
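A minimal Java sketch of referring to code instead of copying it. All class and method names here (CustomerNames, SignupService, InvoiceService) are hypothetical and chosen only for illustration. The normalization rule is defined once, and both callers refer to it, so a change to the rule is made and tested in one place.

```java
import java.util.Locale;

// Single definition of the rule that would otherwise be copied around.
final class CustomerNames {
    private CustomerNames() {}

    // Trims, collapses runs of whitespace, and lowercases the name.
    static String normalize(String rawName) {
        return rawName.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}

// First caller: refers to the shared rule instead of holding a copy.
final class SignupService {
    boolean isSameCustomer(String a, String b) {
        return CustomerNames.normalize(a).equals(CustomerNames.normalize(b));
    }
}

// Second caller: same rule, same single place to maintain it.
final class InvoiceService {
    String invoiceKey(String customerName, int year) {
        return CustomerNames.normalize(customerName) + "-" + year;
    }
}
```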

Refactor existing code:

While refactoring has many merits for code maintainability, it can have an immediate and visible effect in reducing the codebase. Typically, refactoring involves revisiting code, simplifying its structure, removing code redundancies, and improving the amount of reuse. This may be as simple as removing unused/obsolete functionality. See, for example, the refactoring patterns in Chapter 4.

Use third-party libraries and frameworks:

Many applications share the same types of behavior, for which a vast number of frameworks and libraries exist: for example, UI behavior (e.g., jQuery), database access (e.g., Hibernate), security measures (e.g., Spring Security), logging (e.g., SLF4J), or utilities (e.g., Google Guava). Using third-party libraries is especially helpful for such generic functionality. If functionality is used and maintained by other parties, why invent your own? Using third-party code also avoids unnecessary over-engineering. It is well worth considering adjusting functionality to fit third-party code instead of building a custom solution.
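The effect on code volume can be sketched in Java as follows. To keep the example runnable without extra dependencies, it uses the JDK's own String.join as a stand-in for a third-party utility library; that substitution is an assumption of this sketch, and the class CsvLine is hypothetical. The point is only that generic functionality maintained by someone else replaces code you would otherwise own.

```java
import java.util.List;

final class CsvLine {
    private CsvLine() {}

    // Hand-written version: code that you must test and maintain yourself.
    static String joinByHand(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(fields.get(i));
        }
        return sb.toString();
    }

    // Same behavior, delegated to an existing, externally maintained implementation.
    static String joinWithLibrary(List<String> fields) {
        return String.join(",", fields);
    }
}
```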

Important

Do not make changes to the source code of a third-party library. If you do, essentially you have made the library code part of your own codebase. In particular, updates of changed libraries are cumbersome and can easily lead to bugs. Typically, difficulties arise when developers try to update the library to a newer version, since they need to analyze what has been changed in the library code and how that impacts the locally changed code.

Split up a large system:

Splitting up a large system into multiple smaller systems is a way to minimize the issues that come with larger systems. A prerequisite is that the system can be divided into parts that are independent, from a functional, technical, and lifecycle perspective. To the users, the systems (or plugins) must be clearly separated. Technically, the code in the different systems must be loosely coupled; that is, their code is related via interfaces instead of direct dependencies. Systems are only really independent if their lifecycles are decoupled (i.e., they are developed and released independently). Note that the split systems may well have some mutual or shared dependencies.

There is an additional advantage. It might turn out that some of the new subsystems can be replaced by a third-party package, completely removing the need to have any codebase for this subsystem.

An example is a Linux distribution such as Ubuntu. The Linux kernel is a codebase that lives at kernel.org and is maintained by a large team of volunteers headed by Linus Torvalds. Next to the actual Linux kernel, a distribution contains thousands of other software applications, each of which has its own codebase. These are the types of plugins that we mean here.

Decoupling (on a code level) is discussed in more detail in the chapters that deal with loose coupling, particularly Chapter 7.

Common Objections to Keeping the Codebase Small

The measures described in this chapter are applicable to all phases of software development. They support the primary maintainability goal of achieving a small codebase.

There are generally two familiar strategies with which you can actively pursue the goal of a small codebase: avoiding the problem (avoiding further codebase growth) or fixing the problem (reducing the size of the codebase).

The biggest long-term gains are achieved when you are working on a system that is already quite small or in an early stage of development. Technical adjustments such as refactoring and reuse of functionality are easier with a small system and will be beneficial for all further coding.

The most visible improvements will appear once a system is big and parts of it can be removed—for example, when functionality is being replaced by third-party code or after a system has been split into multiple parts.

Objection: Reducing the Codebase Size Is Impeded by Productivity Measures

“I cannot possibly reduce the size of my system, since my programming productivity is being measured in terms of added code volume.”

If this is the case, we suggest escalating this issue. Measuring development productivity in terms of added code volume is a bad practice. It provides a negative incentive, as it encourages the bad habit of copying and pasting code. Referring to existing code is better because it makes the code easier to analyze, test, and change.

We understand that the number of code additions can help managers monitor progress and predict timelines. However, productivity should be measured in terms of value added, not lines of code added. Experienced developers can often add functionality with a minimum number of additional lines of code, and they will refactor the code whenever they see an opportunity, often resulting in reduction of the code size.

Objection: Reducing the Codebase Size Is Impeded by the Programming Language

“I work with a language that is more verbose than others, so I cannot achieve a small codebase.”

In most projects, the programming language is a given. It may very well be true that in some programming languages, it is impossible to get a small codebase (SQL-based languages come to mind). However, you can always strive to get a smaller codebase than you currently have, in the same programming language. Every codebase benefits from decreasing its size, even those in low-level languages with little possibility for abstraction.

Objection: System Complexity Forces Code Copying

“My system is so complicated that we can only add functionality by copying large pieces of existing code. Hence, it is impossible to keep the codebase small.”

Difficulty in understanding existing code, and hence the fear of touching that code, is a common reason that programmers resort to copying and pasting. This is particularly the case if the code has an insufficient number of automated tests.

The best approach here is to find the functionality that is most like the one that you are trying to add. By analyzing that code, you should find some common functionality; otherwise, you would not consider copying that code in the first place. If that original functionality can be split up into multiple parts, then ideally you end up with a piece of code that can be referred to independently by the new functionality, avoiding duplication and taming codebase growth. Write unit tests for the new units to verify that you understand the inner workings of the unit. Besides, it is recommended practice; see Chapter 10.
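The approach above might look as follows in Java. This is a sketch under assumptions: the Reports class, its methods, and the amount format are all hypothetical. The formatting logic is imagined to have been inlined in the existing monthlyLine method. It is extracted into formatAmount so that the new yearlyLine functionality can refer to it instead of copying it.

```java
import java.util.Locale;

final class Reports {
    private Reports() {}

    // Extracted unit: the common functionality found by analyzing the existing code.
    static String formatAmount(long cents) {
        if (cents < 0) {
            throw new IllegalArgumentException("cents must not be negative");
        }
        return String.format(Locale.ROOT, "%d.%02d EUR", cents / 100, cents % 100);
    }

    // Existing functionality, now referring to the extracted unit.
    static String monthlyLine(int month, long cents) {
        return "Month " + month + ": " + formatAmount(cents);
    }

    // New functionality, added without duplicating the formatting code.
    static String yearlyLine(int year, long cents) {
        return "Year " + year + ": " + formatAmount(cents);
    }
}
```

A few unit tests on formatAmount (for example, that 5 cents is rendered as "0.05 EUR") then serve as a check that the inner workings of the extracted unit are understood before more code starts to depend on it.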

Objection: Splitting the Codebase Is Impossible Because of Platform Architecture

“We cannot split the system into smaller parts because we are building for a platform where all functionality is tied to a common codebase.”

Yes, platform-based software tends to grow large over time because it assimilates new functionality and rarely reduces functionality. One way to dramatically decrease the size of the codebase is to decouple the system into a plug-in architecture. This leads to multiple codebases that are each smaller than the original one. There is a codebase for the common core, and one or more codebases for the plugins. If those plugins are technically decoupled, they allow for separate release cycles. That means that small changes in functionality do not need an update of the whole system. Keep in mind that those small updates still need full integration/regression tests to ensure that the system as a whole still functions as expected.
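A minimal sketch of such a plug-in architecture in Java, with hypothetical names throughout (ReportPlugin, ReportCore, SumPlugin). The core codebase knows only the plugin interface. A concrete plugin could live in its own codebase and be released separately, as long as it honors that interface.

```java
import java.util.ArrayList;
import java.util.List;

// Core codebase: the only contract a plugin must honor.
interface ReportPlugin {
    String name();
    String render(List<Integer> values);
}

// Core codebase: depends on the interface, never on a concrete plugin.
final class ReportCore {
    private final List<ReportPlugin> plugins = new ArrayList<>();

    void register(ReportPlugin plugin) {
        plugins.add(plugin);
    }

    List<String> renderAll(List<Integer> values) {
        List<String> out = new ArrayList<>();
        for (ReportPlugin plugin : plugins) {
            out.add(plugin.name() + ": " + plugin.render(values));
        }
        return out;
    }
}

// Plugin codebase: could be developed and released independently of the core.
final class SumPlugin implements ReportPlugin {
    @Override
    public String name() {
        return "sum";
    }

    @Override
    public String render(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return Integer.toString(total);
    }
}
```

In this sketch plugins are registered by hand. A real system might instead discover them at runtime, for instance with the JDK's java.util.ServiceLoader; that choice is outside the scope of this example.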

Objection: Splitting the Codebase Leads to Duplication

“Splitting the codebase forces me to duplicate code.”

There may be cases in which decoupling a system into separate parts (such as plugins/extensions) requires (interfaces to) common functionality or data structures to be duplicated in those extensions.

In such a case, duplication is a bigger problem than having a large codebase, and the guideline of Chapter 4 prevails over this guideline of achieving a small codebase. It is then preferable to code common functionality either as a separate extension or as part of a common codebase.

Objection: Splitting the Codebase Is Impossible Because of Tight Coupling

“I cannot split up my system since the system parts are tightly coupled.”

Then decouple the system first. To achieve this, you can write specific interfaces that act as a uniform entry point to functionality. This can be achieved with web services, REST APIs, or other tooling that provides that functionality (e.g., middleware or an ESB).
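The idea of an interface as a uniform entry point can be sketched in plain Java. This is an in-process illustration under assumptions, not a web service or REST API, and all names (InventoryApi, InMemoryInventory, OrderService) are hypothetical. The ordering code depends only on the interface, so the inventory part can be changed, replaced, or moved behind a remote API without touching it.

```java
import java.util.HashMap;
import java.util.Map;

// Uniform entry point: the only thing other subsystems may depend on.
interface InventoryApi {
    int unitsInStock(String sku);
}

// One possible implementation, hidden behind the interface.
final class InMemoryInventory implements InventoryApi {
    private final Map<String, Integer> stock = new HashMap<>();

    void receive(String sku, int units) {
        stock.merge(sku, units, Integer::sum);
    }

    @Override
    public int unitsInStock(String sku) {
        return stock.getOrDefault(sku, 0);
    }
}

// A different subsystem: coupled to InventoryApi, not to InMemoryInventory.
final class OrderService {
    private final InventoryApi inventory;

    OrderService(InventoryApi inventory) {
        this.inventory = inventory;
    }

    boolean canFulfil(String sku, int quantity) {
        return quantity > 0 && inventory.unitsInStock(sku) >= quantity;
    }
}
```

Because OrderService sees only the interface, it can also be tested against a stub such as `new OrderService(sku -> 10)`, without the real inventory code being present.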

Note

Keep in mind that the goal is to have subsystems that can be maintained independently, not necessarily systems that operate independently.

1 Source: The Economics of Software Quality by Capers Jones and Olivier Bonsignour (Addison-Wesley Professional 2012). The original data is simplified into man-years (200 function points/year for Java).

2 Source: Steve McConnell, Code Complete, 2nd edition (Microsoft Press, 2004), p. 652.
