Each problem has smaller problems inside.
Martin Fowler
Limit the number of branch points per unit to 4.
Do this by splitting complex units into simpler ones and avoiding complex units altogether.
This improves maintainability because keeping the number of branch points low makes units easier to modify and test.
Complexity is an often disputed quality characteristic. Code that appears complex to an outsider or novice developer can appear straightforward to a developer that is intimately familiar with it. To a certain extent, what is “complex” is in the eye of the beholder. There is, however, a point where code becomes so complex that modifying it becomes extremely risky and very time-consuming task, let alone testing the modifications afterward. To keep code maintainable, we must put a limit on complexity. Another reason to measure complexity is knowing the minimum number of tests we need to be sufficiently certain that the system acts predictably. Before we can define such a code complexity limit, we must be able to measure complexity.
A common way to objectively assess complexity is to count the number of possible paths through a piece of code. The idea is that the more paths can be distinguished, the more complex a piece of code is. We can determine the number of paths unambiguously by counting the number of branch points. A branch point is a statement where execution can take more than one direction depending on a condition. Examples of branch points in Java code are if
and switch
statements (a complete list follows later). Branch points can be counted for a complete codebase, a class, a package, or a unit. The number of branch points of a unit is equal to the minimum number of paths needed to cover all branches created by all branch points of that unit. This is called branch coverage. However, when you consider all paths through a unit from the first line of the unit to a final statement, combinatory effects are possible. The reason is that it may matter whether a branch follows another in a particular order. All possible combinations of branches are the execution paths of the unit—that is, the maximum number of paths through the unit.
Consider a unit containing two consecutive if
statements. Figure 3-1 depicts the control flow of the unit and shows the difference between branch coverage and execution path coverage.
Suppose the point to the left of the first if
statement modifies a database, and the point to the right of the second if
statement reads from that database. These are side effects and require us to test the “zigzag” paths as well (the dotted lines in Figure 3-1).
In summary, the number of branch points is the number of paths that cover all branches created by branch points. It is the minimum number of paths and can be zero (for a unit that has no branch points). The number of execution paths is a maximum, and can be very large due to combinatorial explosion. Which one to choose?
The answer is to take the number of branch points plus one. This is called cyclomatic complexity or McCabe complexity. Consequently, the guideline “limit the number of branch points per unit to 4” is equal to “limit code McCabe complexity to 5.” This is the minimum number of test cases that you need to cover a unit such that every path has a part not covered by the other paths. The cyclomatic (McCabe) complexity of a unit is at least one, which is easy to understand as follows. Consider a unit with no branch points. According to the definition, its cyclomatic complexity is one (number of branch points plus one). It also fits intuitively: a unit with no branch points has one execution path, and needs at least one test.
For the sake of completeness: only for units with one exit point, the cyclomatic or McCabe complexity is equal to the number of branch points plus one. It becomes more complex for units with more than one exit point. Do not worry about that: focus on limiting the number of branch points to four.
The minimum number of tests needed to cover all independent execution paths of a unit is equal to the number of branch points plus one.
Now consider the following example. Given a nationality, the getFlagColors
method returns the correct flag colors:
public
List
<
Color
>
getFlagColors
(
Nationality
nationality
)
{
List
<
Color
>
result
;
switch
(
nationality
)
{
case
DUTCH:
result
=
Arrays
.
asList
(
Color
.
RED
,
Color
.
WHITE
,
Color
.
BLUE
);
break
;
case
GERMAN:
result
=
Arrays
.
asList
(
Color
.
BLACK
,
Color
.
RED
,
Color
.
YELLOW
);
break
;
case
BELGIAN:
result
=
Arrays
.
asList
(
Color
.
BLACK
,
Color
.
YELLOW
,
Color
.
RED
);
break
;
case
FRENCH:
result
=
Arrays
.
asList
(
Color
.
BLUE
,
Color
.
WHITE
,
Color
.
RED
);
break
;
case
ITALIAN:
result
=
Arrays
.
asList
(
Color
.
GREEN
,
Color
.
WHITE
,
Color
.
RED
);
break
;
case
UNCLASSIFIED:
default
:
result
=
Arrays
.
asList
(
Color
.
GRAY
);
break
;
}
return
result
;
}
The switch
statement in the method body needs to handle all cases of the nationality enumeration type and return the correct flag colors. As there are five possible nationalities and the unclassified/default case, the number of isolated paths to be tested (control flow branches) is six.
On first sight, the getFlagColors
method might seem harmless. Indeed, the method is quite readable, and its behavior is as expected. Still, if we want to test the behavior of this method, we would need six unique test cases (one for each nationality plus one for the default/unclassified case). Writing automated tests might seem excessive for the getFlagColors
method, but suppose a developer adds the flag of Luxembourg (which is very similar to the Dutch flag) as a quick fix:
...
case
DUTCH:
result
=
Arrays
.
asList
(
Color
.
RED
,
Color
.
WHITE
,
Color
.
BLUE
);
case
LUXEMBOURGER:
result
=
Arrays
.
asList
(
Color
.
RED
,
Color
.
WHITE
,
Color
.
LIGHT_BLUE
);
break
;
case
GERMAN:
....
Being in a hurry, the developer copied the constructor call for the Dutch flag and updated the last argument to the right color. Unfortunately, the break
statement escaped the developer’s attention, and now all Dutch nationalities will see the flag from Luxembourg on their profile page!
This example looks like a forged scenario, but we know from our consultancy practice that this is what happens to complex code in reality. These types of simple mistakes are also responsible for many “trivial” bugs that could be easily prevented.
To understand why complex code is such a problem for maintenance, it is important to realize that code that starts out quite straightforward tends to grow much more complex over time. Consider the following snippet taken from the codebase of the open source build server Jenkins. There are 20 control flow branches in this code snippet. Imagine having to modify or test this method:
/**
* Retrieve a user by its ID, and create a new one if requested.
* @return
* An existing or created user. May be {@code null} if
* a user does not exist and {@code create} is false.
*/
private
static
User
getOrCreate
(
String
id
,
String
fullName
,
boolean
create
)
{
String
idkey
=
idStrategy
().
keyFor
(
id
);
byNameLock
.
readLock
().
lock
();
User
u
;
try
{
u
=
byName
.
get
(
idkey
);
}
finally
{
byNameLock
.
readLock
().
unlock
();
}
final
File
configFile
=
getConfigFileFor
(
id
);
if
(!
configFile
.
isFile
()
&&
!
configFile
.
getParentFile
().
isDirectory
())
{
// check for legacy users and migrate if safe to do so.
File
[]
legacy
=
getLegacyConfigFilesFor
(
id
);
if
(
legacy
!=
null
&&
legacy
.
length
>
0
)
{
for
(
File
legacyUserDir
:
legacy
)
{
final
XmlFile
legacyXml
=
new
XmlFile
(
XSTREAM
,
new
File
(
legacyUserDir
,
"config.xml"
));
try
{
Object
o
=
legacyXml
.
read
();
if
(
o
instanceof
User
)
{
if
(
idStrategy
().
equals
(
id
,
legacyUserDir
.
getName
())
&&
!
idStrategy
()
.
filenameOf
(
legacyUserDir
.
getName
())
.
equals
(
legacyUserDir
.
getName
()))
{
if
(!
legacyUserDir
.
renameTo
(
configFile
.
getParentFile
()))
{
LOGGER
.
log
(
Level
.
WARNING
,
"Failed to migrate user record "
+
"from {0} to {1}"
,
new
Object
[]
{
legacyUserDir
,
configFile
.
getParentFile
()});
}
break
;
}
}
else
{
LOGGER
.
log
(
FINE
,
"Unexpected object loaded from {0}: {1}"
,
new
Object
[]
{
legacyUserDir
,
o
});
}
}
catch
(
IOException
e
)
{
LOGGER
.
log
(
Level
.
FINE
,
String
.
format
(
"Exception trying to load user from {0}: {1}"
,
new
Object
[]
{
legacyUserDir
,
e
.
getMessage
()}),
e
);
}
}
}
}
if
(
u
==
null
&&
(
create
||
configFile
.
exists
()))
{
User
tmp
=
new
User
(
id
,
fullName
);
User
prev
;
byNameLock
.
readLock
().
lock
();
try
{
prev
=
byName
.
putIfAbsent
(
idkey
,
u
=
tmp
);
}
finally
{
byNameLock
.
readLock
().
unlock
();
}
if
(
prev
!=
null
)
{
u
=
prev
;
// if someone has already put a value in the map, use it
if
(
LOGGER
.
isLoggable
(
Level
.
FINE
)
&&
!
fullName
.
equals
(
prev
.
getFullName
()))
{
LOGGER
.
log
(
Level
.
FINE
,
"mismatch on fullName (‘"
+
fullName
+
"’ vs. ‘"
+
prev
.
getFullName
()
+
"’) for ‘"
+
id
+
"’"
,
new
Throwable
());
}
}
else
if
(!
id
.
equals
(
fullName
)
&&
!
configFile
.
exists
())
{
// JENKINS-16332: since the fullName may not be recoverable
// from the id, and various code may store the id only, we
// must save the fullName
try
{
u
.
save
();
}
catch
(
IOException
x
)
{
LOGGER
.
log
(
Level
.
WARNING
,
null
,
x
);
}
}
}
return
u
;
}
Based on the code examples in the previous section, keeping your units simple is important for two main reasons:
A simple unit is easier to understand, and thus modify, than a complex one.
Simple units ease testing.
Units with high complexity are generally hard to understand, which makes them hard to modify. The first code example of the first section was not overly complicated, but it would be when it checks for, say, 15 or more nationalities. The second code example covers many use cases for looking up or creating users. Understanding the second code example in order to make a functional change is quite a challenge. The time it takes to understand the code makes modification harder.
There is a good reason you should keep your units simple: to make the process of testing easier. If there are six control flow branches in a unit, you will need at least six test cases to cover all of them. Consider the getFlagColors
method: six tests to cover five nationalities plus the default case would prevent trivial bugs from being introduced by maintenance work.
As explained at the beginning of this chapter, we need to limit the number of branch points to four. In Java the following statements and operators count as branch points:
if
case
?
&&
, ||
while
for
catch
So how can we limit the number of branch points? Well, this is mainly a matter of identifying the proper causes of high complexity. In a lot of cases, a complex unit consists of several code blocks glued together, where the complexity of the unit is the sum of its parts. In other cases, the complexity arises as the result of nested if-then-else
statements, making the code increasingly harder to understand with each level of nesting. Another possibility is the presence of a long chain of if-then-else
statements or a long switch
statement, of which the getFlagColors
method in the introduction is an example.
Each of these cases has its own problem, and thus, its own solution. The first case, where a unit consists of several code blocks that execute almost independently, is a good candidate for refactoring using the Extract Method pattern. This way of reducing complexity is similar to Chapter 2. But what to do when faced with the other cases of complexity?
A chain of if-then-else
statements has to make a decision every time a conditional if
is encountered. An easy-to-handle situation is the one in which the conditionals are mutually exclusive; that is, they each apply to a different situation. This is also the typical use case for a switch
statement, like the switch from the getFlagColors
method.
There are many ways to simplify this type of complexity, and selecting the best solution is a trade-off that depends on the specific situation. For the getFlagColors
method we present two alternatives to reduce complexity. The first is the introduction of a Map
data structure that maps nationalities to specific Flag
objects. This refactoring reduces the complexity of the getFlagColors
method from McCabe 7 to McCabe 2.
private
static
Map
<
Nationality
,
List
<
Color
>>
FLAGS
=
new
HashMap
<
Nationality
,
List
<
Color
>>();
static
{
FLAGS
.
put
(
DUTCH
,
Arrays
.
asList
(
Color
.
RED
,
Color
.
WHITE
,
Color
.
BLUE
));
FLAGS
.
put
(
GERMAN
,
Arrays
.
asList
(
Color
.
BLACK
,
Color
.
RED
,
Color
.
YELLOW
));
FLAGS
.
put
(
BELGIAN
,
Arrays
.
asList
(
Color
.
BLACK
,
Color
.
YELLOW
,
Color
.
RED
));
FLAGS
.
put
(
FRENCH
,
Arrays
.
asList
(
Color
.
BLUE
,
Color
.
WHITE
,
Color
.
RED
));
FLAGS
.
put
(
ITALIAN
,
Arrays
.
asList
(
Color
.
GREEN
,
Color
.
WHITE
,
Color
.
RED
));
}
public
List
<
Color
>
getFlagColors
(
Nationality
nationality
)
{
List
<
Color
>
colors
=
FLAGS
.
get
(
nationality
);
return
colors
!=
null
?
colors
:
Arrays
.
asList
(
Color
.
GRAY
);
}
A second, more advanced way to reduce the complexity of the getFlagColors
method is to apply a refactoring pattern that separates functionality for different flags in different flag types. You can do this by applying the Replace Conditional with Polymorphism pattern: each flag will get its own type that implements a general interface. The polymorphic behavior of the Java language will ensure that the right functionality is called during runtime.
For this refactoring, we start with a general Flag
interface:
public
interface
Flag
{
List
<
Color
>
getColors
();
}
and specific flag types for different nationalities, such as for the Dutch:
public
class
DutchFlag
implements
Flag
{
public
List
<
Color
>
getColors
()
{
return
Arrays
.
asList
(
Color
.
RED
,
Color
.
WHITE
,
Color
.
BLUE
);
}
}
and the Italian:
public
class
ItalianFlag
implements
Flag
{
public
List
<
Color
>
getColors
()
{
return
Arrays
.
asList
(
Color
.
GREEN
,
Color
.
WHITE
,
Color
.
RED
);
}
}
The getFlagColors
method now becomes even more concise and less error-prone:
private
static
final
Map
<
Nationality
,
Flag
>
FLAGS
=
new
HashMap
<
Nationality
,
Flag
>();
static
{
FLAGS
.
put
(
DUTCH
,
new
DutchFlag
());
FLAGS
.
put
(
GERMAN
,
new
GermanFlag
());
FLAGS
.
put
(
BELGIAN
,
new
BelgianFlag
());
FLAGS
.
put
(
FRENCH
,
new
FrenchFlag
());
FLAGS
.
put
(
ITALIAN
,
new
ItalianFlag
());
}
public
List
<
Color
>
getFlagColors
(
Nationality
nationality
)
{
Flag
flag
=
FLAGS
.
get
(
nationality
);
flag
=
flag
!=
null
?
flag
:
new
DefaultFlag
();
return
flag
.
getColors
();
}
This refactoring offers the most flexible implementation. For example, it allows the flag type hierarchy to grow over time by implementing new flag types and testing these types in isolation. A drawback of this refactoring is that it introduces more code spread out over more classes. The developer much choose between extensibility and conciseness.
Suppose a unit has a deeply nested conditional, as in the following example. Given a binary search tree root node and an integer, the calculateDepth
method determines whether the integer occurs in the tree. If so, the method returns the depth of the integer in the tree; otherwise, it throws a TreeException
:
public
static
int
calculateDepth
(
BinaryTreeNode
<
Integer
>
t
,
int
n
)
{
int
depth
=
0
;
if
(
t
.
getValue
()
==
n
)
{
return
depth
;
}
else
{
if
(
n
<
t
.
getValue
())
{
BinaryTreeNode
<
Integer
>
left
=
t
.
getLeft
();
if
(
left
==
null
)
{
throw
new
TreeException
(
"Value not found in tree!"
);
}
else
{
return
1
+
calculateDepth
(
left
,
n
);
}
}
else
{
BinaryTreeNode
<
Integer
>
right
=
t
.
getRight
();
if
(
right
==
null
)
{
throw
new
TreeException
(
"Value not found in tree!"
);
}
else
{
return
1
+
calculateDepth
(
right
,
n
);
}
}
}
}
To improve readability, we can get rid of the nested conditional by identifying the distinct cases and insert return
statements for these. In terms of refactoring, this is called the Replace Nested Conditional with Guard Clauses pattern. The result will be the following method:
public
static
int
calculateDepth
(
BinaryTreeNode
<
Integer
>
t
,
int
n
)
{
int
depth
=
0
;
if
(
t
.
getValue
()
==
n
)
return
depth
;
if
(
n
<
t
.
getValue
()
&&
t
.
getLeft
()
!=
null
)
return
1
+
calculateDepth
(
t
.
getLeft
(),
n
);
if
(
n
>
t
.
getValue
()
&&
t
.
getRight
()
!=
null
)
return
1
+
calculateDepth
(
t
.
getRight
(),
n
);
throw
new
TreeException
(
"Value not found in tree!"
);
}
Although the unit is now easier to understand, its complexity has not decreased. In order to reduce the complexity, you should extract the nested conditionals to separate methods. The result will be as follows:
public
static
int
calculateDepth
(
BinaryTreeNode
<
Integer
>
t
,
int
n
)
{
int
depth
=
0
;
if
(
t
.
getValue
()
==
n
)
return
depth
;
else
return
traverseByValue
(
t
,
n
);
}
private
static
int
traverseByValue
(
BinaryTreeNode
<
Integer
>
t
,
int
n
)
{
BinaryTreeNode
<
Integer
>
childNode
=
getChildNode
(
t
,
n
);
if
(
childNode
==
null
)
{
throw
new
TreeException
(
"Value not found in tree!"
);
}
else
{
return
1
+
calculateDepth
(
childNode
,
n
);
}
}
private
static
BinaryTreeNode
<
Integer
>
getChildNode
(
BinaryTreeNode
<
Integer
>
t
,
int
n
)
{
if
(
n
<
t
.
getValue
())
{
return
t
.
getLeft
();
}
else
{
return
t
.
getRight
();
}
}
This actually does decrease the complexity of the unit. Now we have achieved two things: the methods are easier to understand, and they are easier to test in isolation since we can now write unit tests for the distinct functionalities.
Of course, when you are writing code, units can easily become complex. You may argue that high complexity is bound to arise or that reducing unit complexity in your codebase will not help to increase the maintainability of your system. Such objections are discussed next.
“Our domain is very complex, and therefore high code complexity is unavoidable.”
When you are working in a complex domain—such as optimizing logistical problems, real-time visualizations, or anything that demands advanced application logic—it is natural to think that the domain’s complexity carries over to the implementation, and that this is an unavoidable fact of life.
We argue against this common interpretation. Complexity in the domain does not require the technical implementation to be complex as well. In fact, it is your responsibility as a developer to simplify problems such that they lead to simple code. Even if the system as a whole performs complex functionality, it does not mean that units on the lowest level should be complex as well. In cases where a system needs to process many conditions and exceptions (such as certain legislative requirements), one solution may be to implement a default, simple process and model the exceptions explicitly.
It is true that the more demanding a domain is, the more effort the developer must expend to build technically simple solutions. But it can be done! We have seen many highly maintainable systems solving complex business problems. In fact, we believe that the only way to solve complex business problems and keep them under control is through simple code.
“Replacing one method with McCabe 15 by three methods with McCabe 5 each means that overall McCabe is still 15 (and therefore, there are 15 control flow branches overall). So nothing is gained.”
Of course, you will not decrease the overall McCabe complexity of a system by refactoring a method into several new methods. But from a maintainability perspective, there is an advantage to doing so: it will become easier to test and understand the code that was written. So, as we already mentioned, newly written unit tests allow you to more easily identify the root cause of your failing tests.
Put your code in simple units (at most four branch points) that have carefully chosen names describing their function and cases.
See also Chapter 2 on refactoring patterns for splitting units up in smaller units.