Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Martin Fowler
Limit the length of code units to 15 lines of code.
Do this by not writing units that are longer than 15 lines of code in the first place, or by splitting long units into multiple smaller units until each unit has at most 15 lines of code.
This improves maintainability because small units are easy to understand, easy to test, and easy to reuse.
Units are the smallest groups of code that can be maintained and executed independently. In Java, units are methods or constructors. A unit is always executed as a whole. It is not possible to invoke just a few lines of a unit. Therefore, the smallest piece of code that can be reused and tested is a unit.
Consider the code snippet presented next. Given a customer identifier in a URL, this code generates a list of all of the customer’s bank accounts along with the balance of each account. The list is returned as a string formatted according to the JSON standard, and also includes the overall total balance. It checks the validity of the bank account numbers using a checksum and skips invalid numbers. See the sidebar “The 11-Check for Bank Account Numbers” for an explanation of the checksum used.
public
void
doGet
(
HttpServletRequest
req
,
HttpServletResponse
resp
)
throws
ServletException
,
IOException
{
resp
.
setContentType
(
"application/json"
);
try
{
Connection
conn
=
DriverManager
.
getConnection
(
this
.
conf
.
getProperty
(
"handler.jdbcurl"
));
ResultSet
results
=
conn
.
createStatement
()
.
executeQuery
(
"SELECT account, balance FROM ACCTS WHERE id="
+
req
.
getParameter
(
conf
.
getProperty
(
"request.parametername"
)));
float
totalBalance
=
0
;
resp
.
getWriter
().
(
"{"balances":["
);
while
(
results
.
next
())
{
// Assuming result is 9-digit bank account number,
// validate with 11-test:
int
sum
=
0
;
for
(
int
i
=
0
;
i
<
results
.
getString
(
"account"
)
.
length
();
i
++)
{
sum
=
sum
+
(
9
-
i
)
*
Character
.
getNumericValue
(
results
.
getString
(
"account"
).
charAt
(
i
));
}
if
(
sum
%
11
==
0
)
{
totalBalance
+=
results
.
getFloat
(
"balance"
);
resp
.
getWriter
().
(
"{""
+
results
.
getString
(
"account"
)
+
"":"
+
results
.
getFloat
(
"balance"
)
+
"}"
);
}
if
(
results
.
isLast
())
{
resp
.
getWriter
().
println
(
"],"
);
}
else
{
resp
.
getWriter
().
(
","
);
}
}
resp
.
getWriter
().
println
(
""total":"
+
totalBalance
+
"}"
);
}
catch
(
SQLException
e
)
{
System
.
out
.
println
(
"SQL exception: "
+
e
.
getMessage
());
}
}
Understanding this unit requires you to keep track of a large number of details. First, there is the JDBC connection. Then the checksum is validated in a for
loop inside the while
loop, which iterates over all records returned by the SQL query. There are also the details of JSON formatting, and the details of Java servlets, to keep in mind.
The for
loop in the middle of the unit implements the checksum validation. While conceptually not that difficult, this type of code requires testing. That is easier said than done, since you can only test the code by invoking the doGet
method. That requires first creating HttpServletRequest
and HttpServletResponse
objects. It also requires making a database server available and populating it with account numbers to test. After the call to doGet
, all you have is a JSON-formatted string hidden in the HttpServletResponse
object. To test whether the total balance is correct, you have to extract this value from the JSON-formatted string.
The checksum code is also hard to reuse. The only way to execute the checksum code is to call doGet
. Consequently, any future code that wants to reuse the checksum code needs to have a SQL database available just to provide the bank account number to check.
Long units tend to be hard to test, reuse, and understand. In the example just given, the root cause is that doGet
is mixing (at least) four responsibilities: handling an HTTP GET request, accessing data from a database, executing some business logic, and transferring the result to the data transfer format of choice—in this case, JSON. Together, doGet
has 39 lines of code (every line that is not empty and does not contain only a comment counts as a line of code). A shorter unit would simply not accommodate that many responsibilities.
The advantages of short units are that they are easy to test, easy to analyze, and easy to reuse.
Units encapsulate the application logic of your system, and typically much testing effort is spent on validating the application logic’s correctness. This is because the Java compiler will not detect errors in the application logic automatically, and neither will your editor or IDE (integrated development environment; e.g., Eclipse). Code with a single responsibility is easier to test. In general, short units may do only one thing, while long units do multiple things and tend to have more responsibilities. A unit with one responsibility is easier to test, since it implements a single indivisible task. That allows the test to be isolated (specific to the unit) and simple. Chapter 10 discusses testing in more detail.
It takes less time to read all the code in a short unit in order to analyze how the unit works internally than it does in a long unit. This may not be apparent when you are writing new code, but it makes all the difference when you are modifying existing code. This is not an exceptional situation, since maintenance begins the day after the project is started.
A unit should always be invoked in at least one method (otherwise, the unit is dead code). In a system, you can reuse a unit by invoking it in more than one method. Small units are better candidates for reuse than long units. Long units tend to offer specific details or provide a specific combination of functionalities. As a result, they have more specialized functionality than short units. This makes reuse hard, because it is not very likely that the specific functionality of a long unit is suitable. In contrast, short units tend to be more generic. This makes reuse easier, because it is more likely to fit your needs. Reusing code also helps keep the total code volume low (see Chapter 9).
Copying and pasting a unit is not what we mean when we speak about reuse. That type of reuse leads to duplication, which is to be avoided at all times (see Chapter 4).
Following this guideline is not difficult when you know the right techniques, but it requires discipline. This section presents two techniques that we find particularly important. When writing a new unit, never let it grow beyond 15 lines of code. That means that well before you reach 15 lines of code, you need to start thinking about how to add further functionality. Does it really belong in the unit you are writing, or should it go into its own unit? When a unit grows beyond 15 lines of code despite your efforts, you need to shorten it.
Assume you are writing a class that represents a level in JPacman, the codebase we use for a number of examples in this book. See the sidebar “About JPacman” for an introduction to it. This class provides public start
and stop
methods that are called from buttons in the user interface of the game. A level maintains a list of observers: classes that need to be informed whenever the level has finished.
The most basic version of the start
method checks whether the game is already in progress. If so, it silently returns; otherwise, it updates inProgress
, a private member to keep track of its state:
public
void
start
()
{
if
(
inProgress
)
{
return
;
}
inProgress
=
true
;
}
So far our unit contains only four lines of code. At this point, we can add a unit test for our unit. When you use TDD (test-driven development), you already have a unit test at this point. Unit testing is discussed in Chapter 10.
When you extend your system with new functionality, you will see that units start to grow longer. Discipline is required to adhere to a strict size limit. The next thing the start
method needs is functionality that updates all observers of the level to inform them about the current state. Here is how that works if we add code that tells all observers the level has been lost if the player has died, and that tells all observers that the level is won if any pellets are left:
public
void
start
()
{
if
(
inProgress
)
{
return
;
}
inProgress
=
true
;
// Update observers if player died:
if
(!
isAnyPlayerAlive
())
{
for
(
LevelObserver
o
:
observers
)
{
o
.
levelLost
();
}
}
// Update observers if all pellets eaten:
if
(
remainingPellets
()
==
0
)
{
for
(
LevelObserver
o
:
observers
)
{
o
.
levelWon
();
}
}
}
Adding the code to update observers made our unit grow to 16 lines of code (and 18 lines in total, including the 2 lines that contain comments). After testing the behavior of this new code, you are probably already thinking about the next functionality to implement. However, you need to refactor first to follow the guideline of this chapter.
This section discusses two refactoring techniques to apply the guideline and achieve shorter units of code.
One refactoring technique that works in this case is Extract Method. In the following snippet, this technique is applied to extract a method from the former snippet:
public
void
start
()
{
if
(
inProgress
)
{
return
;
}
inProgress
=
true
;
updateObservers
();
}
private
void
updateObservers
()
{
// Update observers if player died:
if
(!
isAnyPlayerAlive
())
{
for
(
LevelObserver
o
:
observers
)
{
o
.
levelLost
();
}
}
// Update observers if all pellets eaten:
if
(
remainingPellets
()
==
0
)
{
for
(
LevelObserver
o
:
observers
)
{
o
.
levelWon
();
}
}
}
As you can see, the unit (the method called start
) that had grown to 16 lines of code is now back to 7 lines of code, well below the limit of 15 lines. A new unit (method), called updateObservers
, has been added. It has 12 lines of code, which is also under the limit of 15 lines. There is an additional benefit. Starting or resuming a level is not the only place where observers need to be updated; they also need to be informed after every move (of the player or any of the ghosts). Implementing that is easy now: just call updateObservers
from move
, the method that controls the movement of the player and all ghosts.
The new method still has two responsibilities, as indicated by the comments. We can refactor the code further, applying Extract Method two more times:
private
void
updateObservers
()
{
updateObserversPlayerDied
();
updateObserversPelletsEaten
();
}
private
void
updateObserversPlayerDied
()
{
if
(!
isAnyPlayerAlive
())
{
for
(
LevelObserver
o
:
observers
)
{
o
.
levelLost
();
}
}
}
private
void
updateObserversPelletsEaten
()
{
if
(
remainingPellets
()
==
0
)
{
for
(
LevelObserver
o
:
observers
)
{
o
.
levelWon
();
}
}
}
There is no need for the comments anymore: they have been replaced by the names of the new methods. Using short units makes source code self-explanatory, as the names of the methods take over the role of comments. There is a price, however: the total number of code lines has increased, from 16 to 25.
Writing maintainable code is always a trade-off between different guidelines. When splitting a unit into multiple units, you might increase the total number of code lines. That seems to contradict the guideline of keeping the codebase small (see Chapter 9). However, you have decreased the length and complexity of units that need to be tested and understood. Therefore, maintainability has improved. While keeping the codebase small is a good practice, the advantages of short units far outweigh the increase in overall volume—especially given the marginal volume increase in this case.
That writing maintainable code is always a trade-off is also evident in the choices made by the JPacman authors. In the source code as it appears on GitHub, Extract Method has been applied once, resulting in the 12-line version of updateObservers
. The authors of JPacman have not chosen to split updateObservers
into updateObserversPlayerDied
and updateObserversPelletsEaten
.
In this example, it was easy to apply the Extract Method refactoring technique. The reason is that the groups of code lines that were extracted did not access any local variables, nor did they return any value. Sometimes, you want to extract a method that does access local variables. It is always possible to pass local variables as parameters to the extracted method. However, this may lead to long parameter lists, which are a problem in themselves (see Chapter 5). Return values can be even more troublesome, as in Java you can return only a single value from a method. In these cases, you can use a second refactoring technique, called Replace Method with Method Object.
JPacman contains a snippet for which this refactoring is applicable. Consider the following 18-line method from the class BoardFactory
:
public
Board
createBoard
(
Square
[][]
grid
)
{
assert
grid
!=
null
;
Board
board
=
new
Board
(
grid
);
int
width
=
board
.
getWidth
();
int
height
=
board
.
getHeight
();
for
(
int
x
=
0
;
x
<
width
;
x
++)
{
for
(
int
y
=
0
;
y
<
height
;
y
++)
{
Square
square
=
grid
[
x
][
y
];
for
(
Direction
dir
:
Direction
.
values
())
{
int
dirX
=
(
width
+
x
+
dir
.
getDeltaX
())
%
width
;
int
dirY
=
(
height
+
y
+
dir
.
getDeltaY
())
%
height
;
Square
neighbour
=
grid
[
dirX
][
dirY
];
square
.
link
(
neighbour
,
dir
);
}
}
}
return
board
;
}
The four lines inside the innermost for
loop are a candidate for the Extract Method technique. However, together these four lines use six local variables—width
, height
, x
, y
, dir
, and square
—and one pseudovariable, the grid
parameter. If you apply the Extract Method technique, you will have to pass seven parameters to the extracted method:
private
void
setLink
(
Square
square
,
Direction
dir
,
int
x
,
int
y
,
int
width
,
int
height
,
Square
[][]
grid
)
{
int
dirX
=
(
width
+
x
+
dir
.
getDeltaX
())
%
width
;
int
dirY
=
(
height
+
y
+
dir
.
getDeltaY
())
%
height
;
Square
neighbour
=
grid
[
dirX
][
dirY
];
square
.
link
(
neighbour
,
dir
);
}
The refactored createBoard
method would look like this:
public
Board
createBoard
(
Square
[][]
grid
)
{
assert
grid
!=
null
;
Board
board
=
new
Board
(
grid
);
int
width
=
board
.
getWidth
();
int
height
=
board
.
getHeight
();
for
(
int
x
=
0
;
x
<
width
;
x
++)
{
for
(
int
y
=
0
;
y
<
height
;
y
++)
{
Square
square
=
grid
[
x
][
y
];
for
(
Direction
dir
:
Direction
.
values
())
{
setLink
(
square
,
dir
,
x
,
y
,
width
,
height
,
grid
);
}
}
}
return
board
;
}
Let us try the Replace Method with Method Object technique instead. In this technique, we create a new class that will take over the role of createBoard
, the method we are refactoring:
class
BoardCreator
{
private
Square
[][]
grid
;
private
Board
board
;
private
int
width
;
private
int
height
;
BoardCreator
(
Square
[][]
grid
)
{
assert
grid
!=
null
;
this
.
grid
=
grid
;
this
.
board
=
new
Board
(
grid
);
this
.
width
=
board
.
getWidth
();
this
.
height
=
board
.
getHeight
();
}
Board
create
()
{
for
(
int
x
=
0
;
x
<
width
;
x
++)
{
for
(
int
y
=
0
;
y
<
height
;
y
++)
{
Square
square
=
grid
[
x
][
y
];
for
(
Direction
dir
:
Direction
.
values
())
{
setLink
(
square
,
dir
,
x
,
y
);
}
}
}
return
this
.
board
;
}
private
void
setLink
(
Square
square
,
Direction
dir
,
int
x
,
int
y
)
{
int
dirX
=
(
width
+
x
+
dir
.
getDeltaX
())
%
width
;
int
dirY
=
(
height
+
y
+
dir
.
getDeltaY
())
%
height
;
Square
neighbour
=
grid
[
dirX
][
dirY
];
square
.
link
(
neighbour
,
dir
);
}
}
In this new class, three local variables (board
, width
, and height
) and one parameter (grid
) of the createBoard
method have been turned into (private) fields of the new class. These fields are accessible to all methods of the new class. Hence, they no longer need to be passed around as parameters. The four lines of the innermost for
loop now appear in a new method, setLink
, that has four parameters, not seven.
We’re almost done. To complete the refactoring, we have to change the original createBoard
method as follows:
public
Board
createBoard
(
Square
[][]
grid
)
{
return
new
BoardCreator
(
grid
).
create
();
}
Not only have we ended up only with methods shorter than 15 lines of code and avoided creating methods with long parameter lists, but the code is actually easier to read, test, and reuse.
While writing short units may sound simple, software developers often find it quite difficult in practice. The following are typical objections to the principle explained in this chapter.
“Writing short units means having more units, and therefore more method calls. That will never perform.”
Indeed, theoretically, there is a performance penalty for having more units. There will be more method invocations (compared to having fewer, longer units). For each invocation, a bit of work needs to be done by the Java Virtual Machine (JVM). In practice, this is almost never a problem. In the worst case, we are talking about microseconds. Unless a unit is executed hundreds of thousands of times in a loop, the performance penalty of a method invocation is not noticeable. Also, the JVM is very good at optimizing the overhead of method invocations.
Except for very specific cases in enterprise software development, you can focus on maintainability without sacrificing performance. An example is when a method is invoked hundreds of thousands of times in the case of certain algorithms. This is probably one of the very few cases in a programmer’s life where you can have your cake and eat it too. We are not saying that there are no performance issues in enterprise software development; however, they seldom, if ever, are caused by excessive method calling.
“Code becomes harder to read when spread out over multiple units.”
Well, psychology says that is not the case. People have a working memory of about seven items, so someone who is reading a unit that is significantly longer than seven lines of code cannot process all of it. The exception is probably the original author of a piece of source code while he or she is working on it (but not a week later).
Write code that is easy to read and understand for your successors (and for your future self).
“Your guideline encourages improper source code formatting.”
Do not try to comply with guideline by cutting corners in the area of formatting. We are talking about putting multiple statements or multiple curly brackets on one line. It makes the code slightly harder to read and thus decreases its maintainability. Resist the temptation to do so.
Consider what purpose the guideline really serves. We simply cannot leave unit length unconstrained. That would be akin to removing speed limits in traffic because they discourage being on time. It is perfectly possible to obey speed limits and arrive on time: just leave home a bit earlier. It is equally possible to write short units. Our experience is that 15 lines of properly formatted code is enough to write useful units.
As proof, Table 2-1 presents some data from a typical Java 2 Enterprise Edition system, consisting of Java source files but also some XSD and XSLT. The system, currently in production at a SIG client, provides reporting functionality for its owner. The Java part consists of about 28,000 lines of code (a medium-sized system). Of these 28,000 lines of code, just over 17,000 lines are in units. There are just over 3,000 units in this codebase.
Unit length | Number of units (absolute) | Number of units (relative) | Number of lines (absolute) | Number of lines (relative) |
---|---|---|---|---|
1-15 |
3,071 |
95.4% |
14,032 |
81.3% |
16 or more |
149 |
4.6% |
3,221 |
18.7% |
Total |
3,220 |
100% |
17,253 |
100% |
Out of the 3,220 units in this system, 3,071 (95.4%) are at most 15 lines of code, while 149 units (4.6% of all units) are longer. This shows that it is very possible in practice to write short units—at least for a vast majority of units.
“My unit really cannot be split up.”
Sometimes, splitting a method is indeed difficult. Take, for instance, a properly formatted switch
statement in Java. For each case of the switch
statement, there is a line for the case itself, at least one line to do anything useful, and a line for the break
statement. So, anything beyond four cases becomes very hard to fit into 15 lines of code, and a case
statement cannot be split. In Chapter 3, we present some guidelines on how to deal specifically with switch
statements.
However, it is true that sometimes a source code statement simply cannot be split. A typical example in enterprise software is SQL query construction. Consider the following example (adapted from a real-world system analyzed by the authors of this book):
public
static
void
printDepartmentEmployees
(
String
department
)
{
Query
q
=
new
Query
();
for
(
Employee
e
:
q
.
addColumn
(
"FamilyName"
)
.
addColumn
(
"Initials"
)
.
addColumn
(
"GivenName"
)
.
addColumn
(
"AddressLine1"
)
.
addColumn
(
"ZIPcode"
)
.
addColumn
(
"City"
)
.
addTable
(
"EMPLOYEES"
)
.
addWhere
(
"EmployeeDep='"
+
department
+
"'"
)
.
execute
())
{
System
.
out
.
println
(
"<div name='addressDiv'"
+
e
.
getFamilyName
()
+
", "
+
e
.
getInitials
()
+
"<br />"
+
e
.
getAddressLine1
()
+
"<br />"
+
e
.
getZipCode
()
+
e
.
getCity
()
+
"</div>"
);
}
}
This example has 16 lines of code. However, there are just three statements. The second statement contains an expression that spans nine lines. Indeed, you cannot extract just this statement; neither the Extract Method nor the Replace Method with Method Object technique is applicable, at least not directly. However, the nine-line expression starting with q.addColumn("FamilyName")
can be extracted into a new method. But before doing that (and seeing the newly created method grow to over 15 lines when the query gets more complex in the future), rethink the architecture. Is it wise to create a SQL query piece by piece as in this snippet? Should the HTML markup really appear here? A templating solution such as JSP or JSF may be more suitable for the job at hand.
So, if you are faced with a unit that seems impossible to refactor, do not ignore it and move on to another programming task, but indeed raise the issue with your team members and team lead.
When a refactoring seems possible but doesn’t make sense, rethink the architecture of your system.
“Putting code in doSomethingOne
, doSomethingTwo
, doSomethingThree
has no benefit over putting the same code all together in one long doSomething
.”
Actually, it does, provided you choose better names than doSomethingOne
, doSomethingTwo
, and so on. Each of the shorter units is, on its own, easier to understand than the long doSomething
. More importantly, you may not even need to consider all the parts, especially since each of the method names, when chosen carefully, serves as documentation indicating what the unit of code is supposed to do. Moreover, the long doSomething
typically will combine multiple tasks. That means that you can only reuse doSomething
if you need the exact same combination. Most likely, you can reuse each of doSomethingOne
, doSomethingTwo
, and so on much more easily.
See Chapters 3, 4, and 5 for additional refactoring techniques. For a discussion on how to test methods, see Chapter 10.