Chapter 3. Write Simple Units of Code

Each problem has smaller problems inside.

Martin Fowler

Guideline:

  • Limit the number of branch points per unit to 4.

  • Do this by splitting complex units into simpler ones and avoiding complex units altogether.

  • This improves maintainability because keeping the number of branch points low makes units easier to modify and test.

Complexity is an often disputed quality characteristic. Code that appears complex to an outsider or novice developer can appear straightforward to a developer that is intimately familiar with it. To a certain extent, what is “complex” is in the eye of the beholder. There is, however, a point where code becomes so complex that modifying it becomes extremely risky and very time-consuming task, let alone testing the modifications afterward. To keep code maintainable, we must put a limit on complexity. Another reason to measure complexity is knowing the minimum number of tests we need to be sufficiently certain that the system acts predictably. Before we can define such a code complexity limit, we must be able to measure complexity.

A common way to objectively assess complexity is to count the number of possible paths through a piece of code. The idea is that the more paths can be distinguished, the more complex a piece of code is. We can determine the number of paths unambiguously by counting the number of branch points. A branch point is a statement where execution can take more than one direction depending on a condition. Examples of branch points in Java code are if and switch statements (a complete list follows later). Branch points can be counted for a complete codebase, a class, a package, or a unit. The number of branch points of a unit is equal to the minimum number of paths needed to cover all branches created by all branch points of that unit. This is called branch coverage. However, when you consider all paths through a unit from the first line of the unit to a final statement, combinatory effects are possible. The reason is that it may matter whether a branch follows another in a particular order. All possible combinations of branches are the execution paths of the unit—that is, the maximum number of paths through the unit.

Consider a unit containing two consecutive if statements. Figure 3-1 depicts the control flow of the unit and shows the difference between branch coverage and execution path coverage.

blms 0301
Figure 3-1. Branch coverage and execution path coverage

Suppose the point to the left of the first if statement modifies a database, and the point to the right of the second if statement reads from that database. These are side effects and require us to test the “zigzag” paths as well (the dotted lines in Figure 3-1).

In summary, the number of branch points is the number of paths that cover all branches created by branch points. It is the minimum number of paths and can be zero (for a unit that has no branch points). The number of execution paths is a maximum, and can be very large due to combinatorial explosion. Which one to choose?

The answer is to take the number of branch points plus one. This is called cyclomatic complexity or McCabe complexity. Consequently, the guideline “limit the number of branch points per unit to 4” is equal to “limit code McCabe complexity to 5.” This is the minimum number of test cases that you need to cover a unit such that every path has a part not covered by the other paths. The cyclomatic (McCabe) complexity of a unit is at least one, which is easy to understand as follows. Consider a unit with no branch points. According to the definition, its cyclomatic complexity is one (number of branch points plus one). It also fits intuitively: a unit with no branch points has one execution path, and needs at least one test.

For the sake of completeness: only for units with one exit point, the cyclomatic or McCabe complexity is equal to the number of branch points plus one. It becomes more complex for units with more than one exit point. Do not worry about that: focus on limiting the number of branch points to four.

Tip

The minimum number of tests needed to cover all independent execution paths of a unit is equal to the number of branch points plus one.

Now consider the following example. Given a nationality, the getFlagColors method returns the correct flag colors:

public List<Color> getFlagColors(Nationality nationality) {
    List<Color> result;
    switch (nationality) {
    case DUTCH:
        result = Arrays.asList(Color.RED, Color.WHITE, Color.BLUE);
        break;
    case GERMAN:
        result = Arrays.asList(Color.BLACK, Color.RED, Color.YELLOW);
        break;
    case BELGIAN:
        result = Arrays.asList(Color.BLACK, Color.YELLOW, Color.RED);
        break;
    case FRENCH:
        result = Arrays.asList(Color.BLUE, Color.WHITE, Color.RED);
        break;
    case ITALIAN:
        result = Arrays.asList(Color.GREEN, Color.WHITE, Color.RED);
        break;
    case UNCLASSIFIED:
    default:
        result = Arrays.asList(Color.GRAY);
        break;
    }
    return result;
}

The switch statement in the method body needs to handle all cases of the nationality enumeration type and return the correct flag colors. As there are five possible nationalities and the unclassified/default case, the number of isolated paths to be tested (control flow branches) is six.

On first sight, the getFlagColors method might seem harmless. Indeed, the method is quite readable, and its behavior is as expected. Still, if we want to test the behavior of this method, we would need six unique test cases (one for each nationality plus one for the default/unclassified case). Writing automated tests might seem excessive for the getFlagColors method, but suppose a developer adds the flag of Luxembourg (which is very similar to the Dutch flag) as a quick fix:

...
case DUTCH:
    result = Arrays.asList(Color.RED, Color.WHITE, Color.BLUE);
case LUXEMBOURGER:
    result = Arrays.asList(Color.RED, Color.WHITE, Color.LIGHT_BLUE);
    break;
case GERMAN:
....

Being in a hurry, the developer copied the constructor call for the Dutch flag and updated the last argument to the right color. Unfortunately, the break statement escaped the developer’s attention, and now all Dutch nationalities will see the flag from Luxembourg on their profile page!

This example looks like a forged scenario, but we know from our consultancy practice that this is what happens to complex code in reality. These types of simple mistakes are also responsible for many “trivial” bugs that could be easily prevented.

To understand why complex code is such a problem for maintenance, it is important to realize that code that starts out quite straightforward tends to grow much more complex over time. Consider the following snippet taken from the codebase of the open source build server Jenkins. There are 20 control flow branches in this code snippet. Imagine having to modify or test this method:

/**
 * Retrieve a user by its ID, and create a new one if requested.
 * @return
 *      An existing or created user. May be {@code null} if
 *      a user does not exist and {@code create} is false.
 */
private static User getOrCreate(String id, String fullName,
    boolean create) {
    String idkey = idStrategy().keyFor(id);

    byNameLock.readLock().lock();
    User u;
    try {
        u = byName.get(idkey);
    } finally {
        byNameLock.readLock().unlock();
    }
    final File configFile = getConfigFileFor(id);
    if (!configFile.isFile() && !configFile.getParentFile().isDirectory()) {
        // check for legacy users and migrate if safe to do so.
        File[] legacy = getLegacyConfigFilesFor(id);
        if (legacy != null && legacy.length > 0) {
            for (File legacyUserDir : legacy) {
                final XmlFile legacyXml = new XmlFile(XSTREAM,
                    new File(legacyUserDir, "config.xml"));
                try {
                    Object o = legacyXml.read();
                    if (o instanceof User) {
                        if (idStrategy().equals(id, legacyUserDir.getName())
                            && !idStrategy()
                                .filenameOf(legacyUserDir.getName())
                                .equals(legacyUserDir.getName())) {
                            if (!legacyUserDir
                                .renameTo(configFile.getParentFile())) {
                                LOGGER.log(Level.WARNING,
                                    "Failed to migrate user record " +
                                    "from {0} to {1}",
                                    new Object[] {legacyUserDir,
                                        configFile.getParentFile()});
                            }
                            break;
                        }
                    } else {
                        LOGGER.log(FINE,
                            "Unexpected object loaded from {0}: {1}",
                            new Object[] {legacyUserDir, o});
                    }
                } catch (IOException e) {
                    LOGGER.log(Level.FINE,
                        String.format(
                            "Exception trying to load user from {0}: {1}",
                            new Object[] {legacyUserDir, e.getMessage()}),
                        e);
                }
            }
        }
    }
    if (u == null && (create || configFile.exists())) {
        User tmp = new User(id, fullName);
        User prev;
        byNameLock.readLock().lock();
        try {
            prev = byName.putIfAbsent(idkey, u = tmp);
        } finally {
            byNameLock.readLock().unlock();
        }
        if (prev != null) {
            u = prev; // if someone has already put a value in the map, use it
            if (LOGGER.isLoggable(Level.FINE)
                && !fullName.equals(prev.getFullName())) {
                LOGGER.log(Level.FINE,
                    "mismatch on fullName (‘" + fullName + "’ vs. ‘"
                        + prev.getFullName() + "’) for ‘" + id + "’",
                    new Throwable());
            }
        } else
            if (!id.equals(fullName) && !configFile.exists()) {
            // JENKINS-16332: since the fullName may not be recoverable
            // from the id, and various code may store the id only, we
            // must save the fullName
            try {
                u.save();
            } catch (IOException x) {
                LOGGER.log(Level.WARNING, null, x);
            }
        }
    }
    return u;
}

Motivation

Based on the code examples in the previous section, keeping your units simple is important for two main reasons:

  • A simple unit is easier to understand, and thus modify, than a complex one.

  • Simple units ease testing.

Simple Units Are Easier to Modify

Units with high complexity are generally hard to understand, which makes them hard to modify. The first code example of the first section was not overly complicated, but it would be when it checks for, say, 15 or more nationalities. The second code example covers many use cases for looking up or creating users. Understanding the second code example in order to make a functional change is quite a challenge. The time it takes to understand the code makes modification harder.

Simple Units Are Easier to Test

There is a good reason you should keep your units simple: to make the process of testing easier. If there are six control flow branches in a unit, you will need at least six test cases to cover all of them. Consider the getFlagColors method: six tests to cover five nationalities plus the default case would prevent trivial bugs from being introduced by maintenance work.

How to Apply the Guideline

As explained at the beginning of this chapter, we need to limit the number of branch points to four. In Java the following statements and operators count as branch points:

  • if

  • case

  • ?

  • &&, ||

  • while

  • for

  • catch

So how can we limit the number of branch points? Well, this is mainly a matter of identifying the proper causes of high complexity. In a lot of cases, a complex unit consists of several code blocks glued together, where the complexity of the unit is the sum of its parts. In other cases, the complexity arises as the result of nested if-then-else statements, making the code increasingly harder to understand with each level of nesting. Another possibility is the presence of a long chain of if-then-else statements or a long switch statement, of which the getFlagColors method in the introduction is an example.

Each of these cases has its own problem, and thus, its own solution. The first case, where a unit consists of several code blocks that execute almost independently, is a good candidate for refactoring using the Extract Method pattern. This way of reducing complexity is similar to Chapter 2. But what to do when faced with the other cases of complexity?

Dealing with Conditional Chains

A chain of if-then-else statements has to make a decision every time a conditional if is encountered. An easy-to-handle situation is the one in which the conditionals are mutually exclusive; that is, they each apply to a different situation. This is also the typical use case for a switch statement, like the switch from the getFlagColors method.

There are many ways to simplify this type of complexity, and selecting the best solution is a trade-off that depends on the specific situation. For the getFlagColors method we present two alternatives to reduce complexity. The first is the introduction of a Map data structure that maps nationalities to specific Flag objects. This refactoring reduces the complexity of the getFlagColors method from McCabe 7 to McCabe 2.

private static Map<Nationality, List<Color>> FLAGS =
    new HashMap<Nationality, List<Color>>();

static {
    FLAGS.put(DUTCH, Arrays.asList(Color.RED, Color.WHITE, Color.BLUE));
    FLAGS.put(GERMAN, Arrays.asList(Color.BLACK, Color.RED, Color.YELLOW));
    FLAGS.put(BELGIAN, Arrays.asList(Color.BLACK, Color.YELLOW, Color.RED));
    FLAGS.put(FRENCH, Arrays.asList(Color.BLUE, Color.WHITE, Color.RED));
    FLAGS.put(ITALIAN, Arrays.asList(Color.GREEN, Color.WHITE, Color.RED));
}

public List<Color> getFlagColors(Nationality nationality) {
    List<Color> colors = FLAGS.get(nationality);
    return colors != null ? colors : Arrays.asList(Color.GRAY);
}

A second, more advanced way to reduce the complexity of the getFlagColors method is to apply a refactoring pattern that separates functionality for different flags in different flag types. You can do this by applying the Replace Conditional with Polymorphism pattern: each flag will get its own type that implements a general interface. The polymorphic behavior of the Java language will ensure that the right functionality is called during runtime.

For this refactoring, we start with a general Flag interface:

public interface Flag {
    List<Color> getColors();
}

and specific flag types for different nationalities, such as for the Dutch:

public class DutchFlag implements Flag {
    public List<Color> getColors() {
        return Arrays.asList(Color.RED, Color.WHITE, Color.BLUE);
    }
}

and the Italian:

public class ItalianFlag implements Flag {
    public List<Color> getColors() {
        return Arrays.asList(Color.GREEN, Color.WHITE, Color.RED);
    }
}

The getFlagColors method now becomes even more concise and less error-prone:

private static final Map<Nationality, Flag> FLAGS =
    new HashMap<Nationality, Flag>();

static {
    FLAGS.put(DUTCH, new DutchFlag());
    FLAGS.put(GERMAN, new GermanFlag());
    FLAGS.put(BELGIAN, new BelgianFlag());
    FLAGS.put(FRENCH, new FrenchFlag());
    FLAGS.put(ITALIAN, new ItalianFlag());
}

public List<Color> getFlagColors(Nationality nationality) {
    Flag flag = FLAGS.get(nationality);
    flag = flag != null ? flag : new DefaultFlag();
    return flag.getColors();
}

This refactoring offers the most flexible implementation. For example, it allows the flag type hierarchy to grow over time by implementing new flag types and testing these types in isolation. A drawback of this refactoring is that it introduces more code spread out over more classes. The developer much choose between extensibility and conciseness.

Dealing with Nesting

Suppose a unit has a deeply nested conditional, as in the following example. Given a binary search tree root node and an integer, the calculateDepth method determines whether the integer occurs in the tree. If so, the method returns the depth of the integer in the tree; otherwise, it throws a TreeException:

public static int calculateDepth(BinaryTreeNode<Integer> t, int n) {
    int depth = 0;
    if (t.getValue() == n) {
        return depth;
    } else {
        if (n < t.getValue()) {
            BinaryTreeNode<Integer> left = t.getLeft();
            if (left == null) {
                throw new TreeException("Value not found in tree!");
            } else {
                return 1 + calculateDepth(left, n);
            }
        } else {
            BinaryTreeNode<Integer> right = t.getRight();
            if (right == null) {
                throw new TreeException("Value not found in tree!");
            } else {
                return 1 + calculateDepth(right, n);
            }
        }
    }
}

To improve readability, we can get rid of the nested conditional by identifying the distinct cases and insert return statements for these. In terms of refactoring, this is called the Replace Nested Conditional with Guard Clauses pattern. The result will be the following method:

public static int calculateDepth(BinaryTreeNode<Integer> t, int n) {
    int depth = 0;
    if (t.getValue() == n)
        return depth;
    if (n < t.getValue() && t.getLeft() != null)
        return 1 + calculateDepth(t.getLeft(), n);
    if (n > t.getValue() && t.getRight() != null)
        return 1 + calculateDepth(t.getRight(), n);
    throw new TreeException("Value not found in tree!");
}

Although the unit is now easier to understand, its complexity has not decreased. In order to reduce the complexity, you should extract the nested conditionals to separate methods. The result will be as follows:

public static int calculateDepth(BinaryTreeNode<Integer> t, int n) {
    int depth = 0;
    if (t.getValue() == n)
        return depth;
    else
        return traverseByValue(t, n);
}

private static int traverseByValue(BinaryTreeNode<Integer> t, int n) {
    BinaryTreeNode<Integer> childNode = getChildNode(t, n);
    if (childNode == null) {
        throw new TreeException("Value not found in tree!");
    } else {
        return 1 + calculateDepth(childNode, n);
    }
}

private static BinaryTreeNode<Integer> getChildNode(
    BinaryTreeNode<Integer> t, int n) {
    if (n < t.getValue()) {
        return t.getLeft();
    } else {
        return t.getRight();
    }
}

This actually does decrease the complexity of the unit. Now we have achieved two things: the methods are easier to understand, and they are easier to test in isolation since we can now write unit tests for the distinct functionalities.

Common Objections to Writing Simple Units of Code

Of course, when you are writing code, units can easily become complex. You may argue that high complexity is bound to arise or that reducing unit complexity in your codebase will not help to increase the maintainability of your system. Such objections are discussed next.

Objection: High Complexity Cannot Be Avoided

“Our domain is very complex, and therefore high code complexity is unavoidable.”

When you are working in a complex domain—such as optimizing logistical problems, real-time visualizations, or anything that demands advanced application logic—it is natural to think that the domain’s complexity carries over to the implementation, and that this is an unavoidable fact of life.

We argue against this common interpretation. Complexity in the domain does not require the technical implementation to be complex as well. In fact, it is your responsibility as a developer to simplify problems such that they lead to simple code. Even if the system as a whole performs complex functionality, it does not mean that units on the lowest level should be complex as well. In cases where a system needs to process many conditions and exceptions (such as certain legislative requirements), one solution may be to implement a default, simple process and model the exceptions explicitly.

It is true that the more demanding a domain is, the more effort the developer must expend to build technically simple solutions. But it can be done! We have seen many highly maintainable systems solving complex business problems. In fact, we believe that the only way to solve complex business problems and keep them under control is through simple code.

Objection: Splitting Up Methods Does Not Reduce Complexity

“Replacing one method with McCabe 15 by three methods with McCabe 5 each means that overall McCabe is still 15 (and therefore, there are 15 control flow branches overall). So nothing is gained.”

Of course, you will not decrease the overall McCabe complexity of a system by refactoring a method into several new methods. But from a maintainability perspective, there is an advantage to doing so: it will become easier to test and understand the code that was written. So, as we already mentioned, newly written unit tests allow you to more easily identify the root cause of your failing tests.

Important

Put your code in simple units (at most four branch points) that have carefully chosen names describing their function and cases.

See Also

See also Chapter 2 on refactoring patterns for splitting units up in smaller units.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset