Writing a unit test

Unit tests take their name from the fact that they are used to test small units of code. To explain how to write a unit test, let's take a look at a simple snippet:

# data.py
def get_clean_data(source):
    # load_data and clean_data are helpers not shown here
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data

The get_clean_data function is responsible for getting data from source, cleaning it, and returning it to the caller. How do we test this function?

One way of doing this is to call it and then make sure that load_data was called once, with source as its only argument. Then we have to verify that clean_data was called once, with the return value of load_data. Finally, we need to make sure that the return value of clean_data is also what get_clean_data returns.

To do this, we would need to set up the source and run this code, and that may be a problem. One of the golden rules of unit testing is that anything that crosses the boundaries of your application needs to be simulated. We don't want to talk to a real data source, and we don't want to actually run real functions if they communicate with anything that is not contained in our application. A few examples would be a database, a search service, an external API, and a file in the filesystem.

We need these restrictions to act as a shield, so that we can always run our tests safely without the fear of destroying something in a real data source.

Another reason is that it may be quite difficult for a single developer to reproduce the whole architecture on their box. It may require setting up databases, APIs, services, files and folders, and so on, which can be difficult, time-consuming, or sometimes not even possible.

Very simply put, an application programming interface (API) is a set of tools for building software applications. An API expresses a software component in terms of its operations, input and output, and underlying types. For example, if you create software that needs to interface with a data provider service, it's very likely that you will have to go through their API in order to gain access to the data.

Therefore, in our unit tests, we need to simulate all those things in some way. Unit tests need to be run by any developer without the need for the whole system to be set up on their box.
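As one illustration of simulating a boundary, here is a minimal sketch of faking the filesystem with Python's unittest.mock.mock_open. The function read_setting and the file name settings.cfg are hypothetical, invented only for this example. The test replaces builtins.open for the duration of the with block, so no real file is ever created, read, or modified.

```python
from unittest.mock import mock_open, patch


# Hypothetical application code that crosses a boundary: it reads a file.
def read_setting(path):
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


def test_read_setting():
    # mock_open builds a fake file handle whose read() returns read_data.
    fake_open = mock_open(read_data="debug=true\n")
    # Replace the built-in open only inside this block.
    with patch("builtins.open", fake_open):
        value = read_setting("settings.cfg")

    # The code under test saw the fake content...
    assert value == "debug=true"
    # ...and tried to open exactly the path we passed, with the expected encoding.
    fake_open.assert_called_once_with("settings.cfg", encoding="utf-8")


test_read_setting()  # raises AssertionError if any check fails
```

Because the name open is looked up at call time, patching it in the builtins module is enough for read_setting to pick up the fake.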

A different approach, which I always favor when it's possible to do so, is to avoid fake objects and use special-purpose test objects instead. For example, if your code talks to a database, instead of faking all the functions and methods that talk to the database and programming the fake objects so that they return what the real ones would, I'd much rather spawn a test database, set up the tables and data I need, and then patch the connection settings so that my tests run real code against the test database, thereby doing no harm at all. In-memory databases are excellent options for these cases.
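The test-database approach can be sketched with Python's built-in sqlite3 module and an in-memory SQLite database. Everything here is an assumption made for illustration: the DATABASE setting, get_connection, count_active_users, and the users table are hypothetical, and SQLite simply stands in for whatever database a real application would use. The test patches the connection setting by saving and restoring a module-level variable, then runs the real SQL against a throwaway database.

```python
import sqlite3
from contextlib import closing

# Hypothetical connection setting; a real application would read it from config.
DATABASE = "app.db"


def get_connection():
    # uri=True lets DATABASE be either a plain filename or a "file:" URI.
    return sqlite3.connect(DATABASE, uri=True)


def count_active_users():
    # Real application code: it runs real SQL against whatever DATABASE points to.
    with closing(get_connection()) as conn:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM users WHERE active = 1"
        ).fetchone()
    return count


def test_count_active_users():
    global DATABASE
    original = DATABASE
    # Patch the connection setting: point the app at a named in-memory database.
    # With cache=shared, every connection using this URI sees the same data
    # for as long as at least one of those connections stays open.
    DATABASE = "file:test_db?mode=memory&cache=shared"
    try:
        with closing(sqlite3.connect(DATABASE, uri=True)) as setup:
            setup.execute("CREATE TABLE users (name TEXT, active INTEGER)")
            setup.executemany(
                "INSERT INTO users VALUES (?, ?)",
                [("alice", 1), ("bob", 0), ("carol", 1)],
            )
            setup.commit()
            # The code under test opens its own connection to the same database.
            assert count_active_users() == 2
    finally:
        DATABASE = original  # restore the real setting


test_count_active_users()  # raises AssertionError if the query result is wrong
```

When the setup connection closes, the in-memory database disappears, so every run starts from a clean slate and nothing touches app.db.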

Django is one of the frameworks that allow you to spawn a database for testing. Within the django.test package, you can find several tools that help you write your tests so that you won't have to simulate the interaction with a database. By writing tests this way, you will also be able to check transactions, encodings, and other database-related aspects of programming. Another advantage of this approach is the ability to check against behavior that can change from one database to another.

Sometimes, though, it's still not possible, and we need to use fakes, so let's talk about them.
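As a first taste of fakes, the three checks described earlier for get_clean_data can be sketched with Python's unittest.mock. This is a minimal, self-contained sketch under some assumptions: load_data and clean_data are hypothetical stand-ins that raise if they are ever really called, and because everything lives in one file, the test swaps the module-level names with patch.dict(globals(), ...). With a real data.py, you would more commonly write patch("data.load_data") and patch("data.clean_data").

```python
from unittest.mock import Mock, patch, sentinel


# Hypothetical stand-ins for the helpers; they raise so that the test
# fails loudly if a real implementation is ever reached.
def load_data(source):
    raise RuntimeError("load_data should not run in a unit test")


def clean_data(data):
    raise RuntimeError("clean_data should not run in a unit test")


def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data


def test_get_clean_data():
    # Fakes that record how they are called; sentinel objects are unique
    # values we can trace through the function.
    fake_load = Mock(return_value=sentinel.raw_data)
    fake_clean = Mock(return_value=sentinel.cleaned)

    # Swap the global names that get_clean_data looks up at call time.
    with patch.dict(globals(), {"load_data": fake_load, "clean_data": fake_clean}):
        result = get_clean_data(sentinel.source)

    # 1. load_data was called once, with source as its only argument
    fake_load.assert_called_once_with(sentinel.source)
    # 2. clean_data was called once, with the return value of load_data
    fake_clean.assert_called_once_with(sentinel.raw_data)
    # 3. the return value of clean_data is what get_clean_data returns
    assert result is sentinel.cleaned


test_get_clean_data()  # raises AssertionError if any check fails
```

Note that the test never checks what cleaning actually does; it only verifies how get_clean_data wires its collaborators together, which is exactly the behavior of this unit.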