Unit tests take their name from the fact that they are used to test small units of code. To explain how to write a unit test, let's take a look at a simple snippet:
# data.py
def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data
The get_clean_data function is responsible for getting data from source, cleaning it, and returning it to the caller. How do we test this function?
One way of doing this is to call it and then make sure that load_data was called once with source as its only argument. Then we have to verify that clean_data was called once, with the return value of load_data. And, finally, we would need to make sure that the return value of clean_data is what is returned by the get_clean_data function as well.
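The three checks just described can be sketched with the standard library's unittest.mock. This is only an illustration under some assumptions: the bodies of load_data and clean_data are hypothetical stand-ins (the snippet above does not show them), and the replacements are swapped in with patch.dict on the module globals so that the example fits in a single file. In a project where the code lives in data.py, patching "data.load_data" and "data.clean_data" would likely be the more usual choice.

```python
from unittest.mock import Mock, patch


def load_data(source):
    # Hypothetical stand-in: the real function would talk to an external source.
    raise RuntimeError("load_data must not run inside a unit test")


def clean_data(data):
    # Hypothetical stand-in for the real cleaning logic.
    raise RuntimeError("clean_data must not run inside a unit test")


def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data


def test_get_clean_data():
    fake_load = Mock(return_value="raw data")
    fake_clean = Mock(return_value="clean data")

    # Replace the two module-level names only for the duration of the call.
    # patch.dict restores the original functions when the block exits.
    replacements = {"load_data": fake_load, "clean_data": fake_clean}
    with patch.dict(globals(), replacements):
        result = get_clean_data("some-source")

    # 1. load_data was called once, with source as its only argument.
    fake_load.assert_called_once_with("some-source")
    # 2. clean_data was called once, with the return value of load_data.
    fake_clean.assert_called_once_with("raw data")
    # 3. get_clean_data returns whatever clean_data returned.
    assert result == "clean data"


test_get_clean_data()
```

Note that the test never touches a real data source: both collaborators are replaced, and only the wiring inside get_clean_data is exercised.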
To do this, we need to set up the source and run this code, and this may be a problem. One of the golden rules of unit testing is that anything that crosses the boundaries of your application needs to be simulated. We don't want to talk to a real data source, and we don't want to actually run real functions if they are communicating with anything that is not contained in our application. A few examples would be a database, a search service, an external API, and a file in the filesystem.
We need these restrictions to act as a shield, so that we can always run our tests safely without the fear of destroying something in a real data source.
Another reason is that it may be quite difficult for a single developer to reproduce the whole architecture on their box. It may require setting up databases, APIs, services, files and folders, and so on, which can be difficult, time-consuming, or sometimes not even possible.
Therefore, in our unit tests, we need to simulate all those things in some way. Unit tests need to be run by any developer without the need for the whole system to be set up on their box.
A different approach, which I always favor when it's possible to do so, is to simulate entities without using fake objects, but using special-purpose test objects instead. For example, if your code talks to a database, instead of faking all the functions and methods that talk to the database and programming the fake objects so that they return what the real ones would, I'd much rather spawn a test database, set up the tables and data I need, and then patch the connection settings so that my tests are running real code, against the test database, thereby doing no harm at all. In-memory databases are excellent options for these cases.
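As a rough sketch of the test-database idea, the example below uses an in-memory SQLite database from the standard library. The users table, the count_active_users function, and the make_test_connection helper are all hypothetical names made up for illustration. For brevity, the connection is passed in directly rather than being obtained through patched connection settings.

```python
import sqlite3


def count_active_users(connection):
    # Hypothetical code under test: it runs a real query on the connection.
    cursor = connection.execute("SELECT COUNT(*) FROM users WHERE active = 1")
    return cursor.fetchone()[0]


def make_test_connection():
    # ":memory:" creates a throwaway database that lives only in RAM,
    # so nothing in a real data source can be damaged.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE users (name TEXT, active INTEGER)")
    connection.executemany(
        "INSERT INTO users (name, active) VALUES (?, ?)",
        [("alice", 1), ("bob", 0), ("carol", 1)],
    )
    connection.commit()
    return connection


def test_count_active_users():
    connection = make_test_connection()
    try:
        # Real SQL runs here, against the test data set up above.
        assert count_active_users(connection) == 2
    finally:
        connection.close()


test_count_active_users()
```

Because the query itself is executed, a mistake in the SQL would make the test fail, which is something a fake object programmed to return a fixed number could not catch.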
Sometimes, though, it's still not possible, and we need to use fakes, so let's talk about them.