Chapter 2. Write Short Units of Code

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

Martin Fowler

Guideline:

  • Limit the length of code units to 15 lines of code.

  • Do this by not writing units that are longer than 15 lines of code in the first place, or by splitting long units into multiple smaller units until each unit has at most 15 lines of code.

  • This improves maintainability because small units are easy to understand, easy to test, and easy to reuse.

Units are the smallest groups of code that can be maintained and executed independently. In Java, units are methods or constructors. A unit is always executed as a whole. It is not possible to invoke just a few lines of a unit. Therefore, the smallest piece of code that can be reused and tested is a unit.

Consider the code snippet presented next. Given a customer identifier in a URL, this code generates a list of all of the customer’s bank accounts along with the balance of each account. The list is returned as a string formatted according to the JSON standard, and also includes the overall total balance. It checks the validity of the bank account numbers using a checksum and skips invalid numbers. See the sidebar “The 11-Check for Bank Account Numbers” for an explanation of the checksum used.

public void doGet(HttpServletRequest req, HttpServletResponse resp)
    throws ServletException, IOException {
    resp.setContentType("application/json");
    try {
        Connection conn = DriverManager.
            getConnection(this.conf.getProperty("handler.jdbcurl"));
        ResultSet results =
            conn.createStatement()
                .executeQuery(
                    "SELECT account, balance FROM ACCTS WHERE id="
                        + req.getParameter(conf.
                            getProperty("request.parametername")));
        float totalBalance = 0;
        resp.getWriter().print("{"balances":[");
        while (results.next()) {
            // Assuming result is 9-digit bank account number,
            // validate with 11-test:
            int sum = 0;
            for (int i = 0; i < results.getString("account")
                .length(); i++) {
                sum = sum + (9 - i)
                        * Character.getNumericValue(results.getString(
                        "account").charAt(i));
            }
            if (sum % 11 == 0) {
                totalBalance += results.getFloat("balance");
                resp.getWriter().print(
                    "{"" + results.getString("account") + "":"
                        + results.getFloat("balance") + "}");
            }
            if (results.isLast()) {
                resp.getWriter().println("],");
            } else {
                resp.getWriter().print(",");
            }
        }
        resp.getWriter().println(""total":" + totalBalance + "}");
    } catch (SQLException e) {
        System.out.println("SQL exception: " + e.getMessage());
    }
}

Understanding this unit requires you to keep track of a large number of details. First, there is the JDBC connection. Then the checksum is validated in a for loop inside the while loop, which iterates over all records returned by the SQL query. There are also the details of JSON formatting, and the details of Java servlets, to keep in mind.

The for loop in the middle of the unit implements the checksum validation. While conceptually not that difficult, this type of code requires testing. That is easier said than done, since you can only test the code by invoking the doGet method. That requires first creating HttpServletRequest and HttpServletResponse objects. It also requires making a database server available and populating it with account numbers to test. After the call to doGet, all you have is a JSON-formatted string hidden in the HttpServletResponse object. To test whether the total balance is correct, you have to extract this value from the JSON-formatted string.

The checksum code is also hard to reuse. The only way to execute the checksum code is to call doGet. Consequently, any future code that wants to reuse the checksum code needs to have a SQL database available just to provide the bank account number to check.

Long units tend to be hard to test, reuse, and understand. In the example just given, the root cause is that doGet is mixing (at least) four responsibilities: handling an HTTP GET request, accessing data from a database, executing some business logic, and transferring the result to the data transfer format of choice—in this case, JSON. Together, doGet has 39 lines of code (every line that is not empty and does not contain only a comment counts as a line of code). A shorter unit would simply not accommodate that many responsibilities.

Motivation

The advantages of short units are that they are easy to test, easy to analyze, and easy to reuse.

Short Units Are Easy to Test

Units encapsulate the application logic of your system, and typically much testing effort is spent on validating the application logic’s correctness. This is because the Java compiler will not detect errors in the application logic automatically, and neither will your editor or IDE (integrated development environment; e.g., Eclipse). Code with a single responsibility is easier to test. In general, short units may do only one thing, while long units do multiple things and tend to have more responsibilities. A unit with one responsibility is easier to test, since it implements a single indivisible task. That allows the test to be isolated (specific to the unit) and simple. Chapter 10 discusses testing in more detail.

Short Units Are Easy to Analyze

It takes less time to read all the code in a short unit in order to analyze how the unit works internally than it does in a long unit. This may not be apparent when you are writing new code, but it makes all the difference when you are modifying existing code. This is not an exceptional situation, since maintenance begins the day after the project is started.

Short Units Are Easy to Reuse

A unit should always be invoked in at least one method (otherwise, the unit is dead code). In a system, you can reuse a unit by invoking it in more than one method. Small units are better candidates for reuse than long units. Long units tend to offer specific details or provide a specific combination of functionalities. As a result, they have more specialized functionality than short units. This makes reuse hard, because it is not very likely that the specific functionality of a long unit is suitable. In contrast, short units tend to be more generic. This makes reuse easier, because it is more likely to fit your needs. Reusing code also helps keep the total code volume low (see Chapter 9).

Note

Copying and pasting a unit is not what we mean when we speak about reuse. That type of reuse leads to duplication, which is to be avoided at all times (see Chapter 4).

How to Apply the Guideline

Following this guideline is not difficult when you know the right techniques, but it requires discipline. This section presents two techniques that we find particularly important. When writing a new unit, never let it grow beyond 15 lines of code. That means that well before you reach 15 lines of code, you need to start thinking about how to add further functionality. Does it really belong in the unit you are writing, or should it go into its own unit? When a unit grows beyond 15 lines of code despite your efforts, you need to shorten it.

When Writing a New Unit

Assume you are writing a class that represents a level in JPacman, the codebase we use for a number of examples in this book. See the sidebar “About JPacman” for an introduction to it. This class provides public start and stop methods that are called from buttons in the user interface of the game. A level maintains a list of observers: classes that need to be informed whenever the level has finished.

The most basic version of the start method checks whether the game is already in progress. If so, it silently returns; otherwise, it updates inProgress, a private member to keep track of its state:

public void start() {
    if (inProgress) {
        return;
    }
    inProgress = true;
}

So far our unit contains only four lines of code. At this point, we can add a unit test for our unit. When you use TDD (test-driven development), you already have a unit test at this point. Unit testing is discussed in Chapter 10.

When Extending a Unit with New Functionality

When you extend your system with new functionality, you will see that units start to grow longer. Discipline is required to adhere to a strict size limit. The next thing the start method needs is functionality that updates all observers of the level to inform them about the current state. Here is how that works if we add code that tells all observers the level has been lost if the player has died, and that tells all observers that the level is won if any pellets are left:

public void start() {
    if (inProgress) {
        return;
    }
    inProgress = true;
    // Update observers if player died:
    if (!isAnyPlayerAlive()) {
        for (LevelObserver o : observers) {
            o.levelLost();
        }
    }
    // Update observers if all pellets eaten:
    if (remainingPellets() == 0) {
        for (LevelObserver o : observers) {
            o.levelWon();
        }
    }
}

Adding the code to update observers made our unit grow to 16 lines of code (and 18 lines in total, including the 2 lines that contain comments). After testing the behavior of this new code, you are probably already thinking about the next functionality to implement. However, you need to refactor first to follow the guideline of this chapter.

Using Refactoring Techniques to Apply the Guideline

This section discusses two refactoring techniques to apply the guideline and achieve shorter units of code.

Refactoring technique: Extract Method

One refactoring technique that works in this case is Extract Method. In the following snippet, this technique is applied to extract a method from the former snippet:

public void start() {
    if (inProgress) {
        return;
    }
    inProgress = true;
    updateObservers();
}

private void updateObservers() {
    // Update observers if player died:
    if (!isAnyPlayerAlive()) {
        for (LevelObserver o : observers) {
            o.levelLost();
        }
    }
    // Update observers if all pellets eaten:
    if (remainingPellets() == 0) {
        for (LevelObserver o : observers) {
            o.levelWon();
        }
    }
}

As you can see, the unit (the method called start) that had grown to 16 lines of code is now back to 7 lines of code, well below the limit of 15 lines. A new unit (method), called updateObservers, has been added. It has 12 lines of code, which is also under the limit of 15 lines. There is an additional benefit. Starting or resuming a level is not the only place where observers need to be updated; they also need to be informed after every move (of the player or any of the ghosts). Implementing that is easy now: just call updateObservers from move, the method that controls the movement of the player and all ghosts.

The new method still has two responsibilities, as indicated by the comments. We can refactor the code further, applying Extract Method two more times:

private void updateObservers() {
     updateObserversPlayerDied();
     updateObserversPelletsEaten();
}

private void updateObserversPlayerDied() {
    if (!isAnyPlayerAlive()) {
        for (LevelObserver o : observers) {
            o.levelLost();
        }
    }
}

private void updateObserversPelletsEaten() {
    if (remainingPellets() == 0) {
        for (LevelObserver o : observers) {
            o.levelWon();
        }
    }
}

There is no need for the comments anymore: they have been replaced by the names of the new methods. Using short units makes source code self-explanatory, as the names of the methods take over the role of comments. There is a price, however: the total number of code lines has increased, from 16 to 25.

Writing maintainable code is always a trade-off between different guidelines. When splitting a unit into multiple units, you might increase the total number of code lines. That seems to contradict the guideline of keeping the codebase small (see Chapter 9). However, you have decreased the length and complexity of units that need to be tested and understood. Therefore, maintainability has improved. While keeping the codebase small is a good practice, the advantages of short units far outweigh the increase in overall volume—especially given the marginal volume increase in this case.

That writing maintainable code is always a trade-off is also evident in the choices made by the JPacman authors. In the source code as it appears on GitHub, Extract Method has been applied once, resulting in the 12-line version of updateObservers. The authors of JPacman have not chosen to split updateObservers into updateObserversPlayerDied and updateObserversPelletsEaten.

Refactoring technique: Replace Method with Method Object

In this example, it was easy to apply the Extract Method refactoring technique. The reason is that the groups of code lines that were extracted did not access any local variables, nor did they return any value. Sometimes, you want to extract a method that does access local variables. It is always possible to pass local variables as parameters to the extracted method. However, this may lead to long parameter lists, which are a problem in themselves (see Chapter 5). Return values can be even more troublesome, as in Java you can return only a single value from a method. In these cases, you can use a second refactoring technique, called Replace Method with Method Object.

JPacman contains a snippet for which this refactoring is applicable. Consider the following 18-line method from the class BoardFactory:

public Board createBoard(Square[][] grid) {
    assert grid != null;

    Board board = new Board(grid);

    int width = board.getWidth();
    int height = board.getHeight();
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            Square square = grid[x][y];
            for (Direction dir : Direction.values()) {
                int dirX = (width + x + dir.getDeltaX()) % width;
                int dirY = (height + y + dir.getDeltaY()) % height;
                Square neighbour = grid[dirX][dirY];
                square.link(neighbour, dir);
            }
        }
    }

    return board;
}

The four lines inside the innermost for loop are a candidate for the Extract Method technique. However, together these four lines use six local variables—width, height, x, y, dir, and square—and one pseudovariable, the grid parameter. If you apply the Extract Method technique, you will have to pass seven parameters to the extracted method:

private void setLink(Square square, Direction dir, int x, int y, int width,
    int height, Square[][] grid) {
    int dirX = (width + x + dir.getDeltaX()) % width;
    int dirY = (height + y + dir.getDeltaY()) % height;
    Square neighbour = grid[dirX][dirY];
    square.link(neighbour, dir);
}

The refactored createBoard method would look like this:

public Board createBoard(Square[][] grid) {
    assert grid != null;

    Board board = new Board(grid);

    int width = board.getWidth();
    int height = board.getHeight();
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            Square square = grid[x][y];
            for (Direction dir : Direction.values()) {
                setLink(square, dir, x, y, width, height, grid);
            }
        }
    }

    return board;
}

Let us try the Replace Method with Method Object technique instead. In this technique, we create a new class that will take over the role of createBoard, the method we are refactoring:

class BoardCreator {
    private Square[][] grid;
    private Board board;
    private int width;
    private int height;

    BoardCreator(Square[][] grid) {
        assert grid != null;
        this.grid = grid;
        this.board = new Board(grid);
        this.width = board.getWidth();
        this.height = board.getHeight();
    }

    Board create() {
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                Square square = grid[x][y];
                for (Direction dir : Direction.values()) {
                    setLink(square, dir, x, y);
                }
            }
        }
        return this.board;
    }

    private void setLink(Square square, Direction dir, int x, int y) {
        int dirX = (width + x + dir.getDeltaX()) % width;
        int dirY = (height + y + dir.getDeltaY()) % height;
        Square neighbour = grid[dirX][dirY];
        square.link(neighbour, dir);
    }
}

In this new class, three local variables (board, width, and height) and one parameter (grid) of the createBoard method have been turned into (private) fields of the new class. These fields are accessible to all methods of the new class. Hence, they no longer need to be passed around as parameters. The four lines of the innermost for loop now appear in a new method, setLink, that has four parameters, not seven.

We’re almost done. To complete the refactoring, we have to change the original createBoard method as follows:

public Board createBoard(Square[][] grid) {
    return new BoardCreator(grid).create();
}

Not only have we ended up only with methods shorter than 15 lines of code and avoided creating methods with long parameter lists, but the code is actually easier to read, test, and reuse.

Common Objections to Writing Short Units

While writing short units may sound simple, software developers often find it quite difficult in practice. The following are typical objections to the principle explained in this chapter.

Objection: Having More Units Is Bad for Performance

“Writing short units means having more units, and therefore more method calls. That will never perform.”

Indeed, theoretically, there is a performance penalty for having more units. There will be more method invocations (compared to having fewer, longer units). For each invocation, a bit of work needs to be done by the Java Virtual Machine (JVM). In practice, this is almost never a problem. In the worst case, we are talking about microseconds. Unless a unit is executed hundreds of thousands of times in a loop, the performance penalty of a method invocation is not noticeable. Also, the JVM is very good at optimizing the overhead of method invocations.

Except for very specific cases in enterprise software development, you can focus on maintainability without sacrificing performance. An example is when a method is invoked hundreds of thousands of times in the case of certain algorithms. This is probably one of the very few cases in a programmer’s life where you can have your cake and eat it too. We are not saying that there are no performance issues in enterprise software development; however, they seldom, if ever, are caused by excessive method calling.

Tip

Do not sacrifice maintainability to optimize for performance, unless solid performance tests have proven that you actually have a performance problem and your performance optimization actually makes a difference.

Objection: Code Is Harder to Read When Spread Out

“Code becomes harder to read when spread out over multiple units.”

Well, psychology says that is not the case. People have a working memory of about seven items, so someone who is reading a unit that is significantly longer than seven lines of code cannot process all of it. The exception is probably the original author of a piece of source code while he or she is working on it (but not a week later).

Tip

Write code that is easy to read and understand for your successors (and for your future self).

Guideline Encourages Improper Formatting

“Your guideline encourages improper source code formatting.”

Do not try to comply with guideline by cutting corners in the area of formatting. We are talking about putting multiple statements or multiple curly brackets on one line. It makes the code slightly harder to read and thus decreases its maintainability. Resist the temptation to do so.

Consider what purpose the guideline really serves. We simply cannot leave unit length unconstrained. That would be akin to removing speed limits in traffic because they discourage being on time. It is perfectly possible to obey speed limits and arrive on time: just leave home a bit earlier. It is equally possible to write short units. Our experience is that 15 lines of properly formatted code is enough to write useful units.

As proof, Table 2-1 presents some data from a typical Java 2 Enterprise Edition system, consisting of Java source files but also some XSD and XSLT. The system, currently in production at a SIG client, provides reporting functionality for its owner. The Java part consists of about 28,000 lines of code (a medium-sized system). Of these 28,000 lines of code, just over 17,000 lines are in units. There are just over 3,000 units in this codebase.

Table 2-1. Distribution of unit length in a real-world enterprise system
Unit length Number of units (absolute) Number of units (relative) Number of lines (absolute) Number of lines (relative)

1-15

3,071

95.4%

14,032

81.3%

16 or more

149

4.6%

3,221

18.7%

Total

3,220

100%

17,253

100%

Out of the 3,220 units in this system, 3,071 (95.4%) are at most 15 lines of code, while 149 units (4.6% of all units) are longer. This shows that it is very possible in practice to write short units—at least for a vast majority of units.

Important

Agree on formatting conventions in your team. Keep units short and comply with these conventions.

This Unit Is Impossible to Split Up

“My unit really cannot be split up.”

Sometimes, splitting a method is indeed difficult. Take, for instance, a properly formatted switch statement in Java. For each case of the switch statement, there is a line for the case itself, at least one line to do anything useful, and a line for the break statement. So, anything beyond four cases becomes very hard to fit into 15 lines of code, and a case statement cannot be split. In Chapter 3, we present some guidelines on how to deal specifically with switch statements.

However, it is true that sometimes a source code statement simply cannot be split. A typical example in enterprise software is SQL query construction. Consider the following example (adapted from a real-world system analyzed by the authors of this book):

public static void printDepartmentEmployees(String department) {
    Query q = new Query();
    for (Employee e : q.addColumn("FamilyName")
        .addColumn("Initials")
        .addColumn("GivenName")
        .addColumn("AddressLine1")
        .addColumn("ZIPcode")
        .addColumn("City")
        .addTable("EMPLOYEES")
        .addWhere("EmployeeDep='" + department + "'")
        .execute()) {
        System.out.println("<div name='addressDiv'" + e.getFamilyName()
            + ", " + e.getInitials() + "<br />" + e.getAddressLine1()
            + "<br />" + e.getZipCode() + e.getCity() + "</div>");
    }
}

This example has 16 lines of code. However, there are just three statements. The second statement contains an expression that spans nine lines. Indeed, you cannot extract just this statement; neither the Extract Method nor the Replace Method with Method Object technique is applicable, at least not directly. However, the nine-line expression starting with q.addColumn("FamilyName") can be extracted into a new method. But before doing that (and seeing the newly created method grow to over 15 lines when the query gets more complex in the future), rethink the architecture. Is it wise to create a SQL query piece by piece as in this snippet? Should the HTML markup really appear here? A templating solution such as JSP or JSF may be more suitable for the job at hand.

So, if you are faced with a unit that seems impossible to refactor, do not ignore it and move on to another programming task, but indeed raise the issue with your team members and team lead.

Important

When a refactoring seems possible but doesn’t make sense, rethink the architecture of your system.

There Is No Visible Advantage in Splitting Units

“Putting code in doSomethingOne, doSomethingTwo, doSomethingThree has no benefit over putting the same code all together in one long doSomething.”

Actually, it does, provided you choose better names than doSomethingOne, doSomethingTwo, and so on. Each of the shorter units is, on its own, easier to understand than the long doSomething. More importantly, you may not even need to consider all the parts, especially since each of the method names, when chosen carefully, serves as documentation indicating what the unit of code is supposed to do. Moreover, the long doSomething typically will combine multiple tasks. That means that you can only reuse doSomething if you need the exact same combination. Most likely, you can reuse each of doSomethingOne, doSomethingTwo, and so on much more easily.

Important

Put code in short units (at most 15 lines of code) that have carefully chosen names that describe their function.

See Also

See Chapters 3, 4, and 5 for additional refactoring techniques. For a discussion on how to test methods, see Chapter 10.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset