2

Using TDD to Create Good Code

We’ve seen that bad code is bad news: bad for business, bad for users, and bad for developers. Test-driven development (TDD) is a core software engineering practice that helps us keep bad code out of our systems.

The goal of this chapter is to learn the specifics of how TDD helps us to create well-engineered, correct code, and how it helps us to keep it that way. By the end, we will understand the basic principles behind good code and how TDD helps us create it. It is important for us to understand why TDD works in order to motivate us and so that we have a response to give to colleagues about why we recommend that they use it as well.

In this chapter, we’re going to cover the following main topics:

  • Designing good quality code
  • Revealing design flaws
  • Preventing logic flaws
  • Protecting against future defects
  • Documenting our code

Designing good quality code

Good quality code doesn’t happen by accident. It is intentional. It is the result of thousands of small decisions, each one shaping how easy our code is to read, test, compose, and change. We must choose between quick-and-dirty hacks, where we have no idea what edge cases are covered, and more robust approaches, where we are confident that no matter how the user misuses our code, it will work as expected.

Every line of source code involves at least one of these decisions. That’s an awful lot of deciding that we have to do.

You’ll notice that we haven’t mentioned TDD so far. As we will see, TDD does not design your code for you. It doesn’t remove that essential engineering sensibility and creative input that is needed to turn requirements into code. To be honest, I’m grateful for that – it’s the part that I enjoy.

However, that does cause a lot of early failure with TDD, which is worth noting. Expecting to implement the TDD process and get good quality code out without your own design input will simply not work. TDD, as we will see, is a tool that allows you to get rapid feedback on these design decisions. You can change your mind and adapt while the code is still cheap and quick to change, but it is still your design decisions that are playing out.

So, what is good code? What are we aiming for?

Good code, for me, is all about readability. I optimize for clarity. I want to be kind to my future self and my long-suffering colleagues by engineering code that is clear and safe to work with. I want to create clear and simple code that is free of hidden traps.

While there is a huge range of advice on what makes good code, the basics are straightforward:

  • Say what you mean, mean what you say
  • Take care of the details in private
  • Avoid accidental complexity

It’s worth a quick review of what I mean by those things.

Say what you mean, mean what you say

Here’s an interesting experiment. Take a piece of source code (in any language) and strip out everything that is not part of the language specification, then see if you can figure out what it does. To make things really stand out, we will replace all method names and variable identifiers with the symbol ???.

Here’s a quick example:

public boolean ??? (int ???) {
    if ( ??? > ??? ) {
        return ???;
    }
    return ???;
}

Any ideas what this code does? No, me neither. I haven’t a clue.

I can tell by its shape that it is some kind of assessment method that passes something in and returns true/false. Maybe it implements a threshold or limit. It uses a multipath return structure, where we check something, then return an answer as soon as we know what that answer is.

While the shape of the code and the syntax tell us something, it’s not telling us much. It is definitely not enough. Nearly all the information we share about what our code does is a result of the natural language identifiers we choose. Names are absolutely vital to good code. They are beyond important. They are everything. They can reveal intent, explain outcomes, and describe why a piece of data is important to us, but they can’t do any of this if we do a bad job choosing our names.
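
To see how much work names do, here is a sketch of the same shape with identifiers filled in. Every name and the threshold value are hypothetical, invented only for illustration; the ??? snippet gives no hint of what its real names were.

```java
// Hypothetical names and limit: an illustration, not the original code.
class SpeedCheck {
    private static final int SPEED_LIMIT_KMH = 50; // assumed value

    // Same structure as the ??? example, but the intent is now readable.
    static boolean isSpeeding(int speedKmh) {
        if (speedKmh > SPEED_LIMIT_KMH) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSpeeding(60)); // prints "true"
        System.out.println(isSpeeding(50)); // prints "false"
    }
}
```

Nothing about the syntax changed. Only the names did, and they now carry the whole story.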

I use two guidelines for names, one for naming active code – methods and functions – and one for variables:

  • Method – Say what it does. What is the outcome? Why would I call this?
  • Variable – Say what it contains. Why would I access this?

A common mistake with method naming is to describe how it works internally, instead of describing what the outcome is. A method called addTodoItemToItemQueue is committing us to one specific implementation of a method that we don’t really care about. Either that or it is misinformation. We can improve the name by calling it add(Todo item). This name tells us why exactly we should call this method. It leaves us free to revise how it is coded later.
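
As a minimal sketch of that renaming, assume a hypothetical Todo type and TodoList class. Neither is defined in the text beyond the add(Todo item) signature, so treat the details as illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, for illustration only.
class Todo {
    final String description;

    Todo(String description) {
        this.description = description;
    }
}

class TodoList {
    // Whatever stores the items (list, queue, database) is a private detail.
    private final List<Todo> items = new ArrayList<>();

    // Named for the outcome. A name like addTodoItemToItemQueue would
    // tie callers to one particular implementation.
    void add(Todo item) {
        items.add(item);
    }

    int size() {
        return items.size();
    }
}
```

Calling code reads as todoList.add(new Todo("Buy milk")), and the ArrayList could later be swapped for something else without touching any call site.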

The classic mistake with variable names is to say what they are made of. For example, the variable name String string helps nobody, whereas String firstName tells me clearly that this variable is somebody’s first name. It tells me why I would want to read or write that variable.

Perhaps more importantly, it tells us what not to write in that variable. Having one variable serve multiple purposes in the same scope is a real headache. Been there, done that, never going back.

It turns out that code is storytelling, pure and simple. We tell the story of what problem we are solving and how we have decided to solve it to human programmers. We can throw any old code into a compiler and the computer will make it work but we must take more care if we want humans to understand our work.

Take care of the details in private

Taking care of the details in private is a simple way to describe the computer science concepts of abstraction and information hiding. These are fundamental ideas that allow us to break complex systems into smaller, simpler parts.

The way I think about abstraction is the same way I think about hiring an electrician for my house.

I know that my electric water heater needs to be fixed but I don’t want to know how. I don’t want to learn how to do it. I don’t want to have to figure out what tools are needed and buy them. I want to have nothing whatsoever to do with it, beyond asking that it gets done when I need it done. So, I’ll call the electrician and ask them to do it. I’m more than happy to pay for a good job, as long as I don’t have to do it myself.

This is what abstraction means. The electrician abstracts the job of fixing my water heater. Complex stuff gets done in response to my simple requests.

Abstraction happens everywhere in good software.

Every time you make some kind of detail less important, you have abstracted it. A method has a simple signature but the code inside it may be complex. This is an abstraction of an algorithm. A local variable might be declared as type String. This is an abstraction of the memory management of each text character and the character encoding. A microservice that will send discount vouchers to our top customers who haven’t visited the site in a while is an abstraction of a business process. Abstraction is everywhere in programming, across all major paradigms – object-oriented programming (OOP), procedural, and functional.

The idea of splitting software into components, each of which takes care of something for us, is a massive quality driver. We centralize decisions, meaning that we don’t make mistakes in duplicated code. We can test a component thoroughly in isolation. We design out problems caused by hard-to-write code just by writing it once and having an easy-to-use interface.

Avoid accidental complexity

This is my personal favorite destroyer of good code – complex code that simply never needed to exist.

There are always many ways of writing a piece of code. Some of them use complicated features or go all around the houses; they use convoluted chains of actions to do a simple thing. All versions of the code get the same result but some just do it in a more complicated way by accident.

My goal for code is to tell at first sight the story of what problem I am solving, leaving the details about how I am solving it for closer analysis. This is quite different from how I learned how to code originally. I choose to emphasize domain over mechanism. The domain here means using the same language as the user, for example, expressing the problem in business terms, not just raw computer code syntax. If I am writing a banking system, I want to see money, ledgers, and transactions coming to the forefront. The story the code is telling has to be that of banking.

Implementation details such as message queues and databases are important but only as far as they describe how we are solving the problem today. They may need to change later. Whether they change or not, we still want the primary story to be about transactions going into an account and not message queues talking to REST services.

As our code gets better at telling the story of the problem we are solving, we make it easier to write replacement components. Swapping out a database for another vendor’s product is simplified because we know exactly what purpose it is serving in our system.

This is what we mean by hiding details. At some level, it is important to see how we wired up the database, but only after we have seen why we even needed one in the first place.

To give you a concrete example, here is a piece of code similar to some code that I found in a production system:

public boolean isTrue (Boolean b) {
    boolean result = false;
    if ( b == null ) {
        result = false;
    }
    else if ( b.equals(Boolean.TRUE)) {
        result = true;
    }
    else if ( b.equals(Boolean.FALSE)) {
        result = false;
    }
    else {
        result = false;
    }
    return result;
}

You can see the problem here. Yes, there is a need for a method like this. It is a low-level mechanism that converts a Java true/false object into its equivalent primitive type and does it safely. It covers all edge cases relating to a null value input, as well as valid true/false values.

However, it has problems. This code is cluttered. It is unnecessarily hard to read and test. It has high cyclomatic complexity (CYC). CYC is an objective measure of how complex a piece of code is, based on the number of independent execution paths possible in a section of code.

The previous code is unnecessarily verbose and over-complicated. I’m pretty sure it has a dead-code path – meaning a path containing unreachable code – on that final else, as well.

Looking at the logic needed, there are only three interesting input conditions: null, true, and false. It certainly does not need all those else/if chains to decode that. Once you’ve got that null-to-false conversion out of the way, you really only need to inspect one value before you can fully decide what to return.

A better equivalent would be the following:

    public boolean isTrue (Boolean b) {
        return Boolean.TRUE.equals(b);
    }

This code does the same thing with a lot less fuss. It does not have the same level of accidental complexity as the previous code. It reads better. It is easier to test with fewer paths needing testing. It has a better cyclomatic complexity figure, which means fewer places for bugs to hide. It tells a better story about why the method exists. To be perfectly honest, I might even refactor this method by inlining it. I’m not sure the method adds any worthwhile extra explanation to the implementation.
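
The three interesting inputs can be pinned down with three small checks. This is a sketch in plain Java with a hand-rolled check helper so that it runs on its own. The class names are assumptions, the method is made static for brevity, and a real project would more likely use a test framework such as JUnit.

```java
class BooleanConversion {
    // The simplified method from the text, made static for brevity.
    static boolean isTrue(Boolean b) {
        return Boolean.TRUE.equals(b);
    }
}

class BooleanConversionChecks {
    // Minimal stand-in for a test framework assertion.
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError(description);
        }
    }

    static void runAll() {
        check(!BooleanConversion.isTrue(null), "null converts to false");
        check(BooleanConversion.isTrue(Boolean.TRUE), "TRUE converts to true");
        check(!BooleanConversion.isTrue(Boolean.FALSE), "FALSE converts to false");
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("All checks passed");
    }
}
```

One check per interesting input is all it takes, which is a fair sign that the simpler method has no paths left over.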

This method was a simple example. Just imagine seeing this scaled up to thousands of lines of copy-pasted, slightly-changed code. You can see why accidental complexity is a killer. This cruft builds up over time and grows exponentially. Everything becomes harder to read and harder to safely change.

Yes, I have seen that. I will never stop being sad about it when I do. We can do better than this. As professional software engineers, we really should.

This section has been a lightning tour of good design fundamentals. They apply across all styles of programming. However, if we can do things right, we can also do things wrong. In the next section, we’ll take a look at how TDD tests can help us prevent bad designs.

Revealing design flaws

Bad design is truly bad. It is the root cause of software being hard to change and hard to work with. You can never quite be sure whether your changes are going to work because you can never quite be sure what a bad design is really doing. Changing that kind of code is scary and often gets put off. Whole sections of code can be left to rot with only a /* Here be dragons! */ comment to show for it.

The first major benefit of TDD is that it forces us to think about the design of a component. We do that before we think about how we implement it. By doing things in this order, we are far less likely to drift into a bad design by mistake.

The way we consider the design first is to think about the public interfaces of a component. We think about how that component will be used and how it will be called. We don’t yet consider how we will make any implementations actually work. This is outside-in thinking. We consider the usage of the code from outside callers before we consider any inside implementation.

This is quite a different approach to take for many of us. Typically, when we need code to do something, we start by writing the implementation. After that, we will ripple out whatever is needed in method signatures, without a thought about the call site. This is inside-out thinking. It works, of course, but it often leads to complex calling code. It locks us into implementation details that just aren’t important.

Outside-in thinking means we get to dream up the perfect component for its users. Then, we will bend the implementation to work with our desired code at the call site. Ultimately, this is far more important than the implementation. This is, of course, abstraction being used in practice.

We can ask questions like the following:

  • Is it easy to set up?
  • Is it easy to ask it to do something?
  • Is the outcome easy to work with?
  • Is it difficult to use it the wrong way?
  • Have we made any incorrect assumptions about it?

You can see that by asking the right sort of questions, we’re going to get the right sort of results.

By writing tests first, we cover all these questions. We decide upfront how we are going to set up our component, perhaps deciding on a clear constructor signature for an object. We decide how we are going to make the calling code look and what the call site will be. We decide how we will consume any results returned or what the effect will be on collaborating components.
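
A sketch of what those three decisions can look like in a test follows. The Account component, its method names, and the use of pence are all hypothetical, chosen only for illustration. The test is shown first because it would be written first; the Account class below it is just enough code to make the test pass.

```java
class AccountTest {
    static void depositIncreasesBalance() {
        // Set up: the constructor signature we decided we wanted.
        Account account = new Account(1_000);

        // Call site: how we want asking for the behaviour to read.
        account.deposit(250);

        // Outcome: how we want to consume the result.
        if (account.balanceInPence() != 1_250) {
            throw new AssertionError("deposit should increase the balance");
        }
    }

    public static void main(String[] args) {
        depositIncreasesBalance();
        System.out.println("depositIncreasesBalance passed");
    }
}

// Hypothetical component, written after the test to satisfy it.
class Account {
    private int balanceInPence;

    Account(int openingBalanceInPence) {
        this.balanceInPence = openingBalanceInPence;
    }

    void deposit(int amountInPence) {
        balanceInPence += amountInPence;
    }

    int balanceInPence() {
        return balanceInPence;
    }
}
```

If any of the three steps in the test had been awkward to write, that would have been the cue to rethink the design before any implementation existed.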

This is the heart of software design. TDD does not do this for us, nor does it force us to do a good job. We could still come up with terrible answers for all those questions and simply write a test to lock those poor answers into place. I’ve seen that happen on numerous occasions in real code as well.

TDD provides that early opportunity to reflect on our decisions. We are literally writing the first example of a working, executable call site for our code before we even think about how it will work. We are totally focused on how this new component is going to fit into the bigger picture.

The test itself provides immediate feedback on how well our decisions have worked out. It gives three tell-tale signals that we could and should improve. We’ll save the details for a later chapter but the test code itself clearly shows when your component is either hard to set up, hard to call, or its outputs are hard to work with.

Analyzing the benefits of writing tests before production code

There are three times you can choose to write tests: before the code, after the code, or never.

Obviously, never writing any tests sends us back to the dark ages of development. We’re winging it. We write code assuming it will work, then leave it all to a manual test stage later. If we’re lucky, we will discover functional errors at this stage, before our customers do.

Writing tests just after we complete a small chunk of code is a much better option. We get much faster feedback. Our code isn’t necessarily any better though, because we write it with the same mindset as we would if we had no tests at all. The same kinds of functional errors will be present. The good news is that we will then write tests to uncover them.

This is a big improvement, but it still isn’t the gold standard, as it leads to a couple of subtle problems:

  • Missing tests
  • Leaky abstractions

Missing tests – undetected errors

Missing tests happen because of human nature. When we are busy writing code, we are juggling many ideas in our heads at once. We focus on specific details at the expense of others. I always find that I mentally move on a bit too quickly after a line of code. I just assume that it’s going to be okay. Unfortunately, when I come to write my tests, that means I’ve forgotten some key points.

Suppose I end up writing some code like this:

public boolean isAllowed18PlusProducts( Integer age ) {
    return (age != null)  && age.intValue() > 18;
}

I’ll probably have quickly started with the > 18 check, then moved on mentally and remembered that the age could be null. I will have added the && clause to check whether it is or not. That makes sense. My experience tells me that this particular snippet of code needs to be more than a basic check; it needs to be robust.

When I write my test, I’ll remember to write a test for what happens when I pass in null, as that is fresh in my mind. Then, I will write another test for what happens with a higher age, say 21. Again, good.

Chances are that I will forget about writing a test for the edge case of an age value of 18. That’s really important here but my mind has moved on from that detail already. All it will take is one Slack message from a colleague about what’s for lunch, and I will most likely forget all about that test and start coding the next method.

The preceding code has a subtle bug in it. It is supposed to return true for any age that is 18 or above. It doesn’t. It returns true only for 19 and above. The greater-than symbol should have been a greater-than-or-equal-to symbol but I missed this detail.

Not only did I miss the nuance in the code but I missed out a vital test. I wrote two important tests but I needed three.

Because I wrote the other tests, I get no warning at all about this. You don’t get a failing test that you haven’t written.

We can avoid this by writing a failing test for every piece of code, then adding only enough code to make that test pass. That workflow would have been more likely to steer us toward thinking through the four tests needed to drive out null handling and the three boundary cases relating to age. It cannot guarantee it, of course, but it can drive the right kind of thinking.
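
Here is a sketch of those four tests alongside the corrected method. It uses plain Java with a small check helper so that it runs without a test framework. The class and test names are assumptions, and the method is made static for brevity.

```java
class AgeCheck {
    // Corrected comparison: >= rather than >, so that 18 itself is allowed.
    static boolean isAllowed18PlusProducts(Integer age) {
        return (age != null) && age >= 18;
    }
}

class AgeCheckTests {
    // Minimal stand-in for a test framework assertion.
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError(description);
        }
    }

    static void rejectsMissingAge() {
        check(!AgeCheck.isAllowed18PlusProducts(null), "null age is rejected");
    }

    static void rejectsSeventeen() {
        check(!AgeCheck.isAllowed18PlusProducts(17), "17 is rejected");
    }

    static void allowsExactlyEighteen() {
        check(AgeCheck.isAllowed18PlusProducts(18), "18 is allowed");
    }

    static void allowsNineteen() {
        check(AgeCheck.isAllowed18PlusProducts(19), "19 is allowed");
    }

    public static void main(String[] args) {
        rejectsMissingAge();
        rejectsSeventeen();
        allowsExactlyEighteen();
        allowsNineteen();
        System.out.println("All four tests passed");
    }
}
```

With the original > comparison, allowsExactlyEighteen is the test that fails, which is exactly the warning that was missing.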

Leaky abstractions – exposing irrelevant details

Leaky abstractions are a different problem. This is where we focus so much on the inside of the method that we forget to think about our dream call site. We just ripple out whatever is easiest to code.

We might be writing an interface where we store UserProfile objects. We might proceed code-first, pick ourselves a JDBC library that we like, code up the method, then find that it needs a database connection.

We might simply add a Connection parameter to fix that:

interface StoredUserProfiles {
    UserProfile load( Connection conn, int userId );
}

At first sight, there’s nothing much wrong with it. However, look at that first parameter: it’s the JDBC-specific Connection object. We have locked our interface into having to use JDBC. Or at the very least, having to supply some JDBC-related thing as a first parameter. We didn’t even mean to do that. We simply hadn’t thought about it thoroughly.

If we think about the ideal abstraction, it should load the corresponding UserProfile object for the given userId. It should not know how it is stored. The JDBC-specific Connection parameter should not be there.

If we think outside-in and consider the design before the implementation, we are less likely to go down this route.

Leaky abstractions like this create accidental complexity. They make code harder to understand by forcing future readers to wonder why we are insisting on JDBC use when we never meant to do so. We just forgot to design it out.

Writing tests first helps prevent this. It leads us to think about the ideal abstractions as a first step so we can write the test for them.

Once we have that test coded up, we have locked in our decision on how the code will be used. Then, we can figure out how to implement that without any unwanted details leaking out.
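
As a sketch of where that can lead, here is the interface with the Connection parameter designed out, plus one possible implementation. The fields of UserProfile, the InMemoryUserProfiles class, and the choice to return null for an unknown user are all assumptions made for illustration; the text does not specify them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal UserProfile; the text does not define its fields.
class UserProfile {
    final int userId;
    final String displayName;

    UserProfile(int userId, String displayName) {
        this.userId = userId;
        this.displayName = displayName;
    }
}

// The interface as a caller would like it: no storage details at all.
interface StoredUserProfiles {
    UserProfile load(int userId);
}

// One possible implementation. A JDBC-backed version could be handed its
// Connection or DataSource through its constructor instead.
class InMemoryUserProfiles implements StoredUserProfiles {
    private final Map<Integer, UserProfile> profiles = new HashMap<>();

    void add(UserProfile profile) {
        profiles.put(profile.userId, profile);
    }

    @Override
    public UserProfile load(int userId) {
        return profiles.get(userId); // assumption: null when not found
    }
}
```

Callers now depend only on load(int userId). Whether the profiles live in memory, behind JDBC, or somewhere else entirely is a private detail of whichever implementation gets plugged in.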

The previously explained techniques are simple but cover most of the basics of good design. Use clear names. Use simple logic. Use abstraction to hide implementation details, so that we emphasize what problem we are solving, rather than how we are solving it. In the next section, let’s review the most obvious benefit of TDD: preventing flaws in our logic.

Preventing logic flaws

The idea of logic errors is perhaps what everybody thinks of first when we talk about testing: did it work right?

I can’t disagree here – this is really important. As far as users, revenues, our Net Promoter Score®™, and market growth go, if your code doesn’t work right, it doesn’t sell. It’s that simple.

Understanding the limits of manual testing

We know from bitter experience that the simplest logic flaws are often the easiest to create. The examples that we can all relate to are those off-by-one errors, that NullPointerException from an uninitialized variable, and that exception thrown by a library that wasn’t in the documentation. They are all so simple and small. It seems like it would be so obvious for us to realize that we were making these mistakes, yet we all know they are often the hardest to spot. When we humans concentrate on the big picture of our code, sometimes these critical details just go unnoticed.

We know that manual testing can reveal these logic flaws but we also know from experience that manual test plans are fragile. It is possible to miss steps out or rush and miss important errors. We might simply assume that something does not need testing on this release because we did not change that section of code. You guessed it – that doesn’t always work out so well for us. Bugs can arise in sections of code that seem totally unrelated to the bug if some underlying assumption has changed.

Manual testing costs money, which is money that can now not be spent on adding shiny new features instead.

Manual testing also gets blamed for delaying ship dates. Now, this is spectacularly unfair to our manual test colleagues. The development team – obviously writing code without TDD tests – stumble over their own bugs until there are only a couple of days left to ship. Then, we hand over the code to the testers, who have to run a huge test document in next to no time. They sometimes get blamed for delaying the release, even though the real cause was development taking longer than it should.

Yet, we never truly had a release. If we define a release as including tested code, which we should, then it is clear that the necessary testing never happened. You can’t ethically release code when you don’t even know whether it works. If you do, your users will be quick to complain.

It’s no wonder some of my testing colleagues get so grumpy by the end of a sprint.

Solving problems by automating the tests

TDD has this totally covered. These logic errors simply cannot arise, which sounds like fantasy, but it really is true.

Before you type any production code, you have already written a failing test. Once you add your new code, you rerun the test. If you somehow typed in a logic error, the test still fails and you know about it right away. That’s the magic here: your mistake happens but is highlighted right away. This enables you to fix it when it is fresh in your mind. It also means you cannot forget about fixing it later on.

You can often go to the exact line that’s wrong and make the change. It’s 10 seconds of work, not months of waiting for a test silo to get to work and fill out a JIRA bug ticket.

The kinds of unit tests we are talking about are also fast to run – very fast. Many of them run within a millisecond. Compare that to the total time to write a test plan document, run the whole app, set up stored data, operate the user interface (UI), record output, then write up a bug ticket. It is incomparably better, isn’t it?

You can see how this is a bug-squashing superpower. We are making significant time savings within the code-test-debug cycle. This reduces development costs and increases delivery velocity. These are big wins for our team and our users.

Every time you write a test before code, you have kept bugs out of that code. You follow the most basic rule that you do not check in code with failing tests. You make them pass.

It shouldn’t need saying but you also don’t cheat around that failing test by deleting it, ignoring it, or making it always pass by using some technical hack. However, I am saying all this because I have seen exactly that done in real code.

We’ve seen how writing tests first helps prevent adding bugs in our new code but TDD is even better than that: it helps prevent adding bugs in code that we will add in the future, which we will cover in the next section.

Protecting against future defects

As we grow our code by writing tests first, we could always simply delete each test after it has passed. I’ve seen some students do that when I’ve taught them TDD because I hadn’t yet explained that we shouldn’t. Regardless, we don’t delete tests once they pass. We keep them all.

Tests grow into large regression suites, automatically testing every feature of the code we have built. By frequently running all the tests, we gain safety and confidence in the entire code base.

As team members add features to this code base, keeping all the tests passing shows that nobody has accidentally broken something. It is quite possible in software to add a perfectly innocent change somewhere, only to find that some seemingly unrelated thing has now stopped working. This will be because of the relationship between those two pieces that we previously did not understand.

The tests have now caused us to learn more about our system and our assumptions. They have prevented a defect from being written into the code base. These are both great benefits but the bigger picture is that our team has the confidence to make changes safely and know they have tests automatically looking after them.

This is true agility, the freedom to change. Agility was never about JIRA tickets and sprints. It was always about the ability to move quickly, with confidence, through an ever-changing landscape of requirements. Having tens of thousands of fast-running automated tests is probably the biggest enabling practice we have.

The ability of tests to give team members confidence to work quickly and effectively is a huge benefit of TDD. You may have heard the phrase move fast and break things, famous from the early days of Facebook. TDD allows us to move fast and not break things.

As we’ve seen, tests are great at providing fast feedback on design and logic correctness, as well as providing a defense against future bugs, but one huge extra benefit is that tests document our code.

Documenting our code

Everybody likes helpful, clear documentation, but not when it is out of date and unrelated to the current code base.

There is a general principle in software that the more separation there is between two related ideas, the more pain they will bring. As an example, think of some code that reads some obscure file format that nobody remembers. All works well, so long as you are reading files in that old format. Then you upgrade the application, that old file format is no longer supported, and everything breaks. The code was separated from the data content in those old files. The files didn’t change but the code did. We didn’t even realize what was going on.

It’s the same with documentation. The worst documentation is often contained in the glossiest productions. These are artifacts written a long time after the code was created by teams with separate skillsets – copywriting, graphic design, and so on. Documentation updates are the first thing to get dropped from a release when time gets tight.

The solution is to bring documentation closer to the code. Get it produced by people closer to the code who know how it works in detail. Get it read by people who need to work directly with that code.

As with all other aspects of Extreme Programming (XP), the most obvious major win is to make it so close to the code that it is the code. Part of this involves using our good design fundamentals to write clear code and our test suite also plays a key role.

Our TDD tests are code, not manual test documents. They are usually written in the same language and repo as the main code base. They will be written by the same people who are writing the production code – the developers.

The tests are executable. As a form of documentation, you know that something that can run has to be up to date. Otherwise, the compiler will complain, and the code will not run.

Tests also form the perfect example of how to use our production code. They clearly define how it should be set up, what dependencies it has, what its interesting methods and functions are, what its expected effects are, and how it will report errors. Everything you would want to know about that code is in the tests.
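
A small sketch of tests reading as documentation follows. The Percentage component is hypothetical and invented for illustration, and the tests are plain Java methods so that the example runs without a test framework. Reading only the test class tells us how to construct the object, how to call it, and how it reports bad input.

```java
// Hypothetical component, for illustration only.
class Percentage {
    private final int value;

    Percentage(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException(
                    "Percentage must be between 0 and 100: " + value);
        }
        this.value = value;
    }

    int of(int amount) {
        return amount * value / 100;
    }
}

class PercentageTest {
    // Documents normal use: how to set up and call the component.
    static void calculatesShareOfAmount() {
        Percentage tenPercent = new Percentage(10);
        if (tenPercent.of(200) != 20) {
            throw new AssertionError("10% of 200 should be 20");
        }
    }

    // Documents error reporting: out-of-range values are rejected.
    static void rejectsValuesAboveOneHundred() {
        try {
            new Percentage(101);
        } catch (IllegalArgumentException expected) {
            return; // this is the documented behaviour
        }
        throw new AssertionError("expected IllegalArgumentException");
    }

    public static void main(String[] args) {
        calculatesShareOfAmount();
        rejectsValuesAboveOneHundred();
        System.out.println("Percentage behaves as documented");
    }
}
```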

It may be surprising at first. Testing and documentation are not normally confused with each other. Because of how TDD works, there is a huge overlap between the two. Our test is a detailed description of what our code should do and how we can make it do that for us.

Summary

In this chapter, we’ve learned that TDD helps us create good designs, write correct logic, prevent future defects, and provide executable documentation for our code. Understanding what TDD will do for our projects is important if we are to use it effectively and persuade our teams to use it as well. There are many advantages to TDD and yet it is not used as often as it should be in real-world projects.

In the next chapter, we will look into some common objections to TDD, learn why they are not valid, and how we can help our colleagues overcome them.

Questions and answers

  1. What is the connection between testing and clean code?

There is not a direct one, which is why we need to understand how to write clean code. TDD adds value by forcing us to think about how our code will be used before we write it, when it is easiest to clean up. It also allows us to refactor our code, changing its structure without changing its function, with certainty that we have not broken that function.

  2. Can tests replace documentation?

Well-written tests replace some but not all documentation. They become a detailed and up-to-date executable specification for our code. What they cannot replace are documents such as user manuals, operations manuals, or contractual specifications for public application programming interfaces (APIs).

  3. What are the problems with writing production code before tests?

If we write production code first, then add tests later, we are more likely to face the following problems:

  • Missing broken edge cases on conditionals
  • Leaking implementation details through interfaces
  • Forgetting important tests
  • Having untested execution paths
  • Creating difficult-to-use code
  • Forcing more rework when design flaws are revealed later in the process

Further reading

A formal definition of cyclomatic complexity can be found in the Wikipedia article linked below. Basically, every conditional statement adds to the complexity, as it creates a new possible execution path:

https://en.wikipedia.org/wiki/Cyclomatic_complexity
