Chapter 7. Testing, Profiling, and Dealing with Exceptions

 

"Code without tests is broken by design."

 
 --Jacob Kaplan-Moss

Jacob Kaplan-Moss is one of the core developers of the Django web framework. We're going to explore it in the next chapters. I strongly agree with this quote of his. I believe code without tests shouldn't be deployed to production.

Why are tests so important? Well, for one, they give you predictability. Or, at least, they help you achieve high predictability. Unfortunately, there is always some bug that sneaks into our code. But we definitely want our code to be as predictable as possible. What we don't want is to have a surprise, our code behaving in an unpredictable way. Would you be happy to know that the software that checks on the sensors of the plane that is taking you on holidays sometimes goes crazy? No, probably not.

Therefore, we need to test our code: we need to check that its behavior is correct, that it works as expected when it deals with edge cases, that it doesn't hang when the components it's talking to are down, that its performance is well within the acceptable range, and so on.

This chapter is all about this topic: making sure that your code is prepared to face the scary outside world, that it is fast enough, and that it can deal with unexpected or exceptional conditions.

We're going to explore testing, including a brief introduction to test-driven development (TDD), which is one of my favorite working methodologies. Then, we're going to explore the world of exceptions, and finally we're going to talk a little bit about performance and profiling. Deep breath, and here we go...

Testing your application

There are many different kinds of tests, so many in fact that companies often have a dedicated department, called quality assurance (QA), made up of individuals that spend their day testing the software the company developers produce.

To start making an initial classification, we can divide tests into two broad categories: white-box and black-box tests.

White-box tests are those which exercise the internals of the code; they inspect it down to a very fine level of granularity. On the other hand, black-box tests are those which consider the software under test as if it were within a box, the internals of which are ignored. Even the technology or the language used inside the box is not important for black-box tests. What they do is plug input into one end of the box and verify the output at the other end, and that's it.

Note

There is also an in-between category, called gray-box testing, that involves testing a system in the same way we do with the black-box approach, but having some knowledge about the algorithms and data structures used to write the software and only partial access to its source code.

There are many different kinds of tests in these categories, each of which serves a different purpose. Just to give you an idea, here are a few:

  • Front-end tests make sure that the client side of your application is exposing the information that it should, all the links, the buttons, the advertising, everything that needs to be shown to the client. It may also verify that it is possible to walk a certain path through the user interface.
  • Scenario tests make use of stories (or scenarios) that help the tester work through a complex problem or test a part of the system.
  • Integration tests verify the behavior of the various components of your application when they are working together sending messages through interfaces.
  • Smoke tests are particularly useful when you deploy a new update on your application. They check whether the most essential, vital parts of your application are still working as they should and that they are not on fire. This term comes from when engineers tested circuits by making sure nothing was smoking.
  • Acceptance tests, or user acceptance testing (UAT), are what a developer does with a product owner (for example, in a Scrum environment) to determine if the work that was commissioned was carried out correctly.
  • Functional tests verify the features or functionalities of your software.
  • Destructive tests take down parts of your system, simulating a failure, in order to establish how well the remaining parts of the system perform. These kinds of tests are performed extensively by companies that need to provide an extremely reliable service, such as Amazon, for example.
  • Performance tests aim to verify how well the system performs under a specific load of data or traffic so that, for example, engineers can get a better understanding of which are the bottlenecks in the system that could bring it down to its knees in a heavy load situation, or those which prevent scalability.
  • Usability tests, and the closely related user experience (UX) tests, aim to check if the user interface is simple and easy to understand and use. They aim to provide input to the designers so that the user experience is improved.
  • Security and penetration tests aim to verify how well the system is protected against attacks and intrusions.
  • Unit tests help the developer to write the code in a robust and consistent way, providing the first line of feedback and defense against coding mistakes, refactoring mistakes, and so on.
  • Regression tests provide the developer with useful information about a feature being compromised in the system after an update. Some of the causes for a system being said to have a regression are an old bug coming back to life, an existing feature being compromised, or a new issue being introduced.

Many books and articles have been written about testing, and I have to point you to those resources if you're interested in finding out more about all the different kinds of tests. In this chapter, we will concentrate on unit tests, since they are the backbone of software crafting and form the vast majority of tests that are written by a developer.

Testing is an art, an art that you don't learn from books, I'm afraid. You can learn all the definitions (and you should), and try and collect as much knowledge about testing as you can but I promise you, you will be able to test your software properly only when you have done it for long enough in the field.

When you are having trouble refactoring a bit of code, because every little thing you touch makes a test blow up, you learn how to write less rigid and limiting tests, which still verify the correctness of your code but, at the same time, allow you the freedom and joy to play with it, to shape it as you want.

When you are being called too often to fix unexpected bugs in your code, you learn how to write tests more thoroughly, how to come up with a more comprehensive list of edge cases, and strategies to cope with them before they turn into bugs.

When you are spending too much time reading tests and trying to refactor them in order to change a small feature in the code, you learn to write simpler, shorter, and better focused tests.

I could go on with this when you... you learn..., but I guess you get the picture. You need to get your hands dirty and build experience. My suggestion? Study the theory as much as you can, and then experiment using different approaches. Also, try to learn from experienced coders; it's very effective.

The anatomy of a test

Before we concentrate on unit tests, let's see what a test is, and what its purpose is.

A test is a piece of code whose purpose is to verify something in our system. It may be that we're calling a function passing two integers, that an object has a property called donald_duck, or that when you place an order on some API, after a minute you can see it dissected into its basic elements, in the database.

A test is typically comprised of three sections:

  • Preparation: This is where you set up the scene. You prepare all the data, the objects, the services you need in the places you need them so that they are ready to be used.
  • Execution: This is where you execute the bit of logic that you're checking against. You perform an action using the data and the interfaces you have set up in the preparation phase.
  • Verification: This is where you verify the results and make sure they are according to your expectations. You check the returned value of a function, or that some data is in the database, some is not, some has changed, a request has been made, something has happened, a method has been called, and so on.
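As a minimal sketch of these three phases, here is a test for a made-up total_price function. The function, its arguments, and the numbers are assumptions invented purely for illustration (prices are in cents to keep the arithmetic exact), and I'm using the standard library's unittest module here.

```python
import unittest


def total_price(items, tax_percent):
    """Hypothetical function under test: sum the prices (in cents), add tax."""
    subtotal = sum(price for _name, price in items)
    return subtotal + subtotal * tax_percent // 100


class TotalPriceTestCase(unittest.TestCase):

    def test_total_price_includes_tax(self):
        # preparation: set up the data the function needs
        items = [("book", 1000), ("pen", 250)]

        # execution: run the one piece of logic being checked
        result = total_price(items, tax_percent=20)

        # verification: compare the outcome with the expectation
        self.assertEqual(1500, result)
```

Keeping the three phases visually separate, even with just a blank line or a comment, makes it much easier to see at a glance what a test is about.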

Testing guidelines

Like software, tests can be good or bad, with the whole range of shades in the middle. In order to write good tests, here are some guidelines:

  • Keep them as simple as possible: It's okay to violate some good coding rules, such as hardcoding values or duplicating code. Tests need first and foremost to be as readable as possible and easy to understand. When tests are hard to read or understand, you can never be sure if they are actually making sure your code is performing correctly.
  • Tests should verify one thing and one thing only: It's very important that you keep them short and contained. It's perfectly fine to write multiple tests to exercise a single object or function. Just make sure that each test has one and only one purpose.
  • Tests should not make any unnecessary assumption when verifying data: This is tricky to understand at first, but say you are testing the return value of a function and it is an unordered list of numbers (like [2, 3, 1]). If the order in that list is random, in the test you may be tempted to sort it and compare it with [1, 2, 3]. If you do, you will introduce an extra assumption on the ordering of the result of your function call, and this is bad practice. You should always find a way to verify things without introducing any assumptions or any feature that doesn't belong in the use case you're describing with your test.
  • Tests should exercise the what, rather than the how: Tests should focus on checking what a function is supposed to do, rather than how it is doing it. For example, focus on the fact that it's calculating the square root of a number (the what), instead of on the fact that it is calling math.sqrt to do it (the how). Unless you're writing performance tests or you have a particular need to verify how a certain action is performed, try to avoid this type of testing and focus on the what. Testing the how leads to restrictive tests and makes refactoring hard. Moreover, the type of test you have to write when you concentrate on the how is more likely to degrade the quality of your testing code base when you amend your software frequently (more on this later).
  • Tests should assume the least possible in the preparation phase: Say you have 10 tests that are checking how a data structure is manipulated by a function. And let's say this data structure is a dict with five key/value pairs. If you put the complete dict in each test, the moment you have to change something in that dict, you also have to amend all ten tests. On the other hand, if you strip down the test data as much as you can, you will find that, most of the time, it's possible to have the majority of tests checking only a partial version of the data, and only a few running with a full version of it. This means that when you need to change your data, you will have to amend only those tests that are actually exercising it.
  • Tests should run as fast as possible: A good test codebase could end up being much longer than the code being tested itself. It varies according to the situation and the developer but whatever the length, you'll end up having hundreds, if not thousands, of tests to run, which means the faster they run, the faster you can get back to writing code. When using TDD, for example, you run tests very often, so speed is essential.
  • Tests should use up the least possible amount of resources: The reason for this is that every developer who checks out your code should be able to run your tests, no matter how powerful their box is. Whether it is a skinny virtual machine or a neglected Jenkins box, your tests should run without chewing up too many resources.

    Note

    A Jenkins box is a machine that runs Jenkins, software that is capable of, amongst many other things, running your tests automatically. Jenkins is frequently used in companies where developers use practices like continuous integration, extreme programming, and so on.
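To make the guideline about unnecessary assumptions a little more concrete, here is one possible way to compare an unordered result without sorting it. The sketch relies on assertCountEqual from the standard library's unittest module, which checks that two sequences hold the same elements the same number of times, regardless of order. The unique_numbers function is a hypothetical stand-in for any function whose result order is not part of its contract.

```python
import unittest


def unique_numbers(values):
    """Hypothetical function under test: the order of the result is unspecified."""
    return list(set(values))


class UniqueNumbersTestCase(unittest.TestCase):

    def test_unique_numbers(self):
        result = unique_numbers([2, 3, 1, 3, 2])

        # Same elements, same counts, any order: the test says nothing
        # about how the result happens to be ordered.
        self.assertCountEqual([1, 2, 3], result)
```

This is just one option. The point is that the test describes exactly what the use case promises, and nothing more.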

Unit testing

Now that you have an idea about what testing is and why we need it, let's finally introduce the developer's best friend: the unit test.

Before we proceed with the examples, allow me a word of caution: I'll try to give you the fundamentals about unit testing, but I don't follow any particular school of thought or methodology to the letter. Over the years, I have tried many different testing approaches, eventually coming up with my own way of doing things, which is constantly evolving. To put it as Bruce Lee would have:

"Absorb what is useful, discard what is useless and add what is specifically your own".

Writing a unit test

In order to explain how to write a unit test, let's help ourselves with a simple snippet:

data.py

def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data

The function get_clean_data is responsible for getting data from source, cleaning it, and returning it to the caller. How do we test this function?

One way of doing this is to call it and then make sure that load_data was called once with source as its only argument. Then we have to verify that clean_data was called once, with the return value of load_data. And, finally, we would need to make sure that the return value of clean_data is what is returned by the get_clean_data function as well.

In order to do this, we need to set up the source and run this code, and this may be a problem. One of the golden rules of unit testing is that anything that crosses the boundaries of your application needs to be simulated. We don't want to talk to a real data source, and we don't want to actually run real functions if they are communicating with anything that is not contained in our application. A few examples would be a database, a search service, an external API, a file in the filesystem, and so on.

We need these restrictions to act as a shield, so that we can always run our tests safely without the fear of destroying something in a real data source.

Another reason is that it may be quite difficult for a single developer to reproduce the whole architecture on their box. It may require the setting up of databases, APIs, services, files and folders, and so on and so forth, and this can be difficult, time consuming, or sometimes not even possible.

Note

Very simply put, an application programming interface (API) is a set of tools for building software applications. An API expresses a software component in terms of its operations, inputs and outputs, and underlying types. For example, if you create software that needs to interface with a data provider service, it's very likely that you will have to go through their API in order to gain access to the data.

Therefore, in our unit tests, we need to simulate all those things in some way. Unit tests need to be run by any developer without the need for the whole system to be set up on their box.

A different approach, which I always favor when it's possible to do so, is to simulate entities without using fake objects, but using special purpose test objects instead. For example, if your code talks to a database, instead of faking all the functions and methods that talk to the database and programming the fake objects so that they return what the real ones would, I'd much rather spawn a test database, set up the tables and data I need, and then patch the connection settings so that my tests are running real code, against the test database, thereby doing no harm at all. In-memory databases are excellent options for these cases.

Note

One of the frameworks that allow you to spawn a database for testing is Django. Within the django.test package you can find several tools that help you write your tests so that you won't have to simulate the dialog with a database. By writing tests this way, you will also be able to check on transactions, encodings, and all other database-related aspects of programming. Another advantage of this approach is the ability to check against things that can change from one database to another.
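Outside of Django, the same idea can be sketched with nothing but the standard library, using an in-memory SQLite database. Please take the following as an illustration under stated assumptions: the users table, its columns, and the count_active_users function are all made up for the example.

```python
import sqlite3
import unittest


def count_active_users(connection):
    """Hypothetical function under test: runs a real query on the connection."""
    cursor = connection.execute("SELECT COUNT(*) FROM users WHERE active = 1")
    return cursor.fetchone()[0]


class CountActiveUsersTestCase(unittest.TestCase):

    def setUp(self):
        # A fresh in-memory database for every test: no real data is touched.
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE users (name TEXT, active INTEGER)")
        self.connection.executemany(
            "INSERT INTO users VALUES (?, ?)",
            [("ann", 1), ("bob", 0), ("cid", 1)])

    def tearDown(self):
        self.connection.close()

    def test_count_active_users(self):
        self.assertEqual(2, count_active_users(self.connection))
```

The test runs real SQL through real code, yet the database lives only in memory and disappears as soon as the connection is closed.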

Sometimes, though, it's still not possible, and we need to use fakes, therefore let's talk about them.

Mock objects and patching

First of all, in Python, these fake objects are called mocks. Before Python 3.3, the mock library was a third-party package that basically every project would install via pip but, from version 3.3 onward, it has been included in the standard library as the unittest.mock module, and rightfully so, given its importance and how widespread it is.

The act of replacing a real object or function (or in general, any piece of data structure) with a mock is called patching. The mock library provides the patch tool, which can act as a function or class decorator, and even as a context manager (more on this in Chapter 8, The Edges – GUIs and Scripts), that you can use to mock things out. Once you have replaced everything you don't want to run for real with suitable mocks, you can move on to the second phase of the test and run the code you are exercising. After the execution, you will be able to check those mocks to verify that your code has worked correctly.
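Here is a small sketch of patch used both as a function decorator and as a context manager, applied to the get_clean_data function from before. The bodies of load_data and clean_data are my own assumptions (the snippet above never showed them): I'm pretending that load_data reads lines from a file, so that the thing crossing the boundary is the built-in open function. The mock_open helper, also from unittest.mock, creates a mock that behaves like open.

```python
from unittest.mock import mock_open, patch


def load_data(source):
    # Assumed implementation: crosses the application boundary (filesystem).
    with open(source) as stream:
        return stream.read().splitlines()


def clean_data(data):
    # Assumed implementation: strip whitespace and drop empty lines.
    return [line.strip() for line in data if line.strip()]


def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data


FAKE_FILE = " alpha \n\n beta\n"


# patch as a function decorator; because the replacement object is given
# explicitly, no extra argument is passed to the decorated function
@patch("builtins.open", mock_open(read_data=FAKE_FILE))
def check_with_decorator():
    assert get_clean_data("ignored.txt") == ["alpha", "beta"]


# patch as a context manager: the mock is active only inside the with block
def check_with_context_manager():
    with patch("builtins.open", mock_open(read_data=FAKE_FILE)) as open_mock:
        result = get_clean_data("ignored.txt")
    assert result == ["alpha", "beta"]
    # after the execution, the mock can tell us how it was used
    open_mock.assert_called_once_with("ignored.txt")


check_with_decorator()
check_with_context_manager()
```

No file called ignored.txt needs to exist: while the patch is active, every call to open is answered by the mock.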

Assertions

The verification phase is done through the use of assertions. An assertion is a function (or method) that you can use to verify equality between objects, as well as other conditions. When a condition is not met, the assertion will raise an exception that will make your test fail. You can find a list of assertions in the unittest module documentation, and their corresponding Pythonic version in the nose third-party library, which provides a few advantages over the sheer unittest module, starting from an improved test discovery strategy (which is the way a test runner detects and discovers the tests in your application).

A classic unit test example

Mocks, patches, and assertions are the basic tools we'll be using to write tests. So, finally, let's see an example. I'm going to write a function that takes a list of integers and filters out all those which aren't positive.

filter_funcs.py

def filter_ints(v):
    return [num for num in v if is_positive(num)]

def is_positive(n):
    return n > 0

In the preceding example, we define the filter_ints function, which basically uses a list comprehension to retain all the numbers in v that are positive, discarding zeros and negative ones. I hope, by now, any further explanation of the code would be insulting.

What is interesting, though, is to start thinking about how we can test it. Well, how about we call filter_ints with a list of numbers and we make sure that is_positive is called for each of them? Of course, we would have to test is_positive as well, but I will show you later on how to do that. Let's write a simple test for filter_ints now.

Note

Just to be sure we're on the same page, I am putting the code for this chapter in a folder called ch7, which lies within the root of our project. At the same level of ch7, I have created a folder called tests, in which I have placed a folder called test_ch7. In this folder I have one test file, called test_filter_funcs.py.

Basically, within the tests folder, I will recreate the tree structure of the code I'm testing, prepending everything with test_. This way, finding tests is really easy, and so is keeping them tidy.

tests/test_ch7/test_filter_funcs.py

from unittest import TestCase  # 1
from unittest.mock import patch, call  # 2
from nose.tools import assert_equal  # 3
from ch7.filter_funcs import filter_ints  # 4

class FilterIntsTestCase(TestCase):  # 5

    @patch('ch7.filter_funcs.is_positive')  # 6
    def test_filter_ints(self, is_positive_mock):  # 7
        # preparation
        v = [3, -4, 0, 5, 8]

        # execution
        filter_ints(v)  # 8

        # verification
        assert_equal(
            [call(3), call(-4), call(0), call(5), call(8)],
            is_positive_mock.call_args_list
        )  # 9

My, oh my, so little code, and yet so much to say. First of all: #1. The TestCase class is the base class that we use to have a contained entity in which to run our tests. It's not just a bare container; it provides you with methods to write tests more easily.

On #2, we import patch and call from the unittest.mock module. patch is responsible for substituting an object with a Mock instance, thereby giving us the ability to check on it after the execution phase has been completed. call provides us with a nice way of expressing a call (to a function, for example) together with its arguments.

On #3, you can see that I prefer to use assertions from nose, rather than the ones that come with the unittest module. To give you an example, assert_equal(...) would become self.assertEqual(...) if I didn't use nose. I don't enjoy typing self. for any assertion, if there is a way to avoid it, and I don't particularly enjoy camel case, therefore I always prefer to use nose to make my assertions.

assert_equal is a function that takes two parameters (and an optional third one that acts as a message) and verifies that they are the same. If they are equal, nothing happens, but if they differ, then an AssertionError exception is raised, telling us something is wrong. When I write my tests, I always put the expected value as the first argument, and the real one as the second. This convention saves me time when I'm reading tests.
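To make this behavior concrete, here is a rough sketch of what an assertion function of this kind does. It's a simplified stand-in that I wrote for illustration, not the actual implementation you will find in nose or unittest.

```python
def assert_equal(first, second, msg=None):
    """Simplified stand-in: raise AssertionError when the values differ."""
    if first != second:
        raise AssertionError(msg or "{!r} != {!r}".format(first, second))


# Equal values: nothing happens.
assert_equal([3, 5, 8], [3, 5, 8])

# Different values: an AssertionError is raised (expected value first).
try:
    assert_equal([3, 5, 8], [3, 5, 9], "filter_ints returned the wrong list")
except AssertionError as err:
    failure_message = str(err)
```

The optional third argument simply replaces the default message, which comes in handy when the two values alone don't explain what went wrong.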

On #4, we import the function we want to test, and then (#5) we proceed to create the class where our tests will live. Each method of this class whose name starts with test_ will be interpreted as a test. As you can see, we need to decorate test_filter_ints with patch (#6). Understanding this part is crucial: we need to patch the object where it is actually used. In this case, the path is very simple: ch7.filter_funcs.is_positive.

Tip

Patching can be very tricky, so I urge you to read the Where to patch section in the mock documentation: https://docs.python.org/3/library/unittest.mock.html#where-to-patch.

When we decorate a function using patch, like in our example, we get an extra argument in the test signature (#7), which I like to name after the patched function, plus a _mock suffix, just to make it clear that the object has been patched (or mocked out).

Finally, we get to the body of the test, and we have a very simple preparation phase in which we set up a list with at least one representative of all the integer number categories (negative, zero, and positive).

Then, in #8, we perform the execution phase, which runs the filter_ints function, without collecting its results. If all has gone as expected, the fake is_positive function must have been called with each of the integers in v.

We can verify this by comparing a list of call objects to the call_args_list attribute on the mock (#9). This attribute is the list of all the calls performed on the object since its creation.
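If you want to see this recording behavior in isolation, here is a tiny sketch that creates a mock by hand, instead of going through patch. The name is_positive_mock is reused only for continuity with the test above.

```python
from unittest.mock import Mock, call

# A stand-alone mock, created by hand instead of through patch.
is_positive_mock = Mock(return_value=True)

for number in [3, -4, 0]:
    is_positive_mock(number)

# Every call is recorded, in order, as a call object.
assert is_positive_mock.call_args_list == [call(3), call(-4), call(0)]
assert is_positive_mock.call_count == 3

# call objects capture keyword arguments as well.
is_positive_mock(n=5)
assert is_positive_mock.call_args_list[-1] == call(n=5)
```

Notice that the mock happily returns True for every input, including -4 and 0: it records how it was called, but it knows nothing about the logic of the function it replaces.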

Let's run this test. First of all, make sure that you install nose ($ pip freeze will tell you if you have it already):

$ pip install nose

Then, change into the root of the project (mine is called learning.python), and run the tests like this:

$ nosetests tests/test_ch7/
.
------------------------------------------------------------
Ran 1 test in 0.006s
OK

The output shows one dot (each dot is a test), a separation line, and the time taken to run the whole test suite. It also says OK at the end, which means that our tests were all successful.

Making a test fail

Good, so just for fun let's make one fail. In the test file, change the last call from call(8) to call(9), and run the tests again:

$ nosetests tests/test_ch7/
F
============================================================
FAIL: test_filter_ints (test_filter_funcs.FilterIntsTestCase)
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.4/unittest/mock.py", line 1125, in patched
    return func(*args, **keywargs)
  File "/home/fab/srv/learning.python/tests/test_ch7/test_filter_funcs.py", line 21, in test_filter_ints
    is_positive_mock.call_args_list
AssertionError: [call(3), call(-4), call(0), call(5), call(9)] != [call(3), call(-4), call(0), call(5), call(8)]
------------------------------------------------------------
Ran 1 test in 0.008s
FAILED (failures=1)

Wow, we made the beast angry! So much wonderful information, though. This tells you that the test test_filter_ints (with the path to it), was run and that it failed (the big F at the top, where the dot was before). It gives you a Traceback, that tells you that in the test_filter_funcs.py module, at line 21, when asserting on is_positive_mock.call_args_list, we have a discrepancy. The test expects the list of calls to end with a call(9) instance, but the real list ends with a call(8). This is nothing less than wonderful.

If you have a test like this, can you imagine what would happen if you refactored and introduced a bug into your function by mistake? Well, your tests will break! They will tell you that you have screwed something up, and they will give you the details. So, you go and check out what you broke.

Interface testing

Let's add another test that checks on the returned value. It's another method in the class, so I won't reproduce the whole code again:

tests/test_ch7/test_filter_funcs.py

def test_filter_ints_return_value(self):
    v = [3, -4, 0, -2, 5, 0, 8, -1]

    result = filter_ints(v)

    assert_list_equal([3, 5, 8], result)

This test is a bit different from the previous one. Firstly, we cannot mock the is_positive function, otherwise we wouldn't be able to check on the result. Secondly, we don't check on calls, but only on the result of the function when input is given.

I like this test much more than the previous one. This type of test is called an interface test because it checks on the interface (the set of inputs and outputs) of the function we're testing. It doesn't use any mocks, which is why I use this technique much more than the previous one. Let's run the new test suite and then let's see why I like interface testing more than those with mocks.

$ nosetests tests/test_ch7/
..
------------------------------------------------------------
Ran 2 tests in 0.006s
OK

Two tests ran, all good (I changed that 9 back to an 8 in the first test, of course).

Comparing tests with and without mocks

Now, let's see why I don't really like mocks and use them only when I have no choice. Let's refactor the code in this way:

filter_funcs_refactored.py

def filter_ints(v):
    v = [num for num in v if num != 0]  # 1
    return [num for num in v if is_positive(num)]

The code for is_positive is the same as before. But the logic in filter_ints has now changed in such a way that is_positive will never be called with a 0, since all zeros are filtered out in #1. This leads to an interesting result, so let's run the tests again:

$ nosetests tests/test_ch7/test_filter_funcs_refactored.py 
F.
============================================================
FAIL: test_filter_ints (test_filter_funcs_refactored.FilterIntsTestCase)
------------------------------------------------------------
... omit ...
AssertionError: [call(3), call(-4), call(0), call(5), call(8)] != [call(3), call(-4), call(5), call(8)]
------------------------------------------------------------
Ran 2 tests in 0.002s
FAILED (failures=1)

One test succeeded but the other one, the one with the mocked is_positive function, failed. The AssertionError message shows us that we now need to amend the list of expected calls, removing call(0), because it is no longer performed.

This is not good. We have changed neither the interface of the function nor its behavior. The function is still keeping to its original contract. What we've done by testing it with a mocked object is limit ourselves. In fact, we now have to amend the test in order to use the new logic.

This is just a simple example but it shows one important flaw in the whole mock mechanism. You must keep your mocks up-to-date and in sync with the code they are replacing, otherwise you risk having issues like the preceding one, or even worse: your tests may keep passing because they are using mocked objects that perform fine, while the real ones, now out of sync, are actually failing.
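That worse scenario can be sketched in a few lines. The send_report function below is hypothetical, and so is the story around it: imagine that its subject argument was added in a later refactoring, and nobody remembered to update the test that uses the mock.

```python
from unittest.mock import Mock


def send_report(recipient, subject):
    """Hypothetical real function: 'subject' was added in a later refactoring."""
    return "report for {}: {}".format(recipient, subject)


# The mock was set up when the function took a single argument,
# and nobody updated the test after the signature changed.
send_report_mock = Mock(return_value="report for ann")

# The stale call still works against the mock, so the test stays green...
assert send_report_mock("ann") == "report for ann"

# ...while the very same call against the real function now fails.
try:
    send_report("ann")
    real_call_failed = False
except TypeError:
    real_call_failed = True

assert real_call_failed
```

A plain Mock accepts any arguments you throw at it, which is exactly why it can drift away from the real thing without anybody noticing.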

So use mocks only when necessary, only when there is no other way of testing your functions. When you cross the boundaries of your application in a test, try to use a replacement, like a test database, or a fake API, and only when it's not possible, resort to mocks. They are very powerful, but also very dangerous when not handled properly.

So, let's remove that first test and keep only the second one, so that I can show you another issue you could run into when writing tests. The whole test module now looks like this:

tests/test_ch7/test_filter_funcs_final.py

from unittest import TestCase
from nose.tools import assert_list_equal
from ch7.filter_funcs import filter_ints

class FilterIntsTestCase(TestCase):
    def test_filter_ints_return_value(self):
        v = [3, -4, 0, -2, 5, 0, 8, -1]
        result = filter_ints(v)
        assert_list_equal([3, 5, 8], result)

If we run it, it will pass.

A brief chat about triangulation. Now let me ask you: what happens if I change my filter_ints function to this?

filter_funcs_triangulation.py

def filter_ints(v):
    return [3, 5, 8]

If you run the test suite, the test we have will still pass! You may think I'm crazy but I'm showing you this because I want to talk about a concept called triangulation, which is very important when doing interface testing with TDD.

The whole idea is to remove cheating code, or badly performing code, by pinpointing it from different angles (like going to one vertex of a triangle from the other two) in a way that makes it impossible for our code to cheat, and the bug is exposed. We can simply modify the test like this:

tests/test_ch7/test_filter_funcs_final_triangulation.py

def test_filter_ints_return_value(self):
    v1 = [3, -4, 0, -2, 5, 0, 8, -1]
    v2 = [7, -3, 0, 0, 9, 1]

    assert_list_equal([3, 5, 8], filter_ints(v1))
    assert_list_equal([7, 9, 1], filter_ints(v2))

I have moved the execution section directly into the assertions, and you can see that we're now pinpointing our function from two different angles, thereby requiring that the real code be in it. It's no longer possible for our function to cheat.

Triangulation is a very powerful technique that teaches us to always try to exercise our code from many different angles, to cover all possible edge cases to expose any problems.
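To see triangulation at work in one place, here is a self-contained sketch that runs the same checks against an honest implementation and a cheating one. The passes_all helper and the CASES list are names I made up for the example, and the honest filter_ints inlines the num > 0 check instead of calling is_positive, just to keep the snippet short.

```python
def filter_ints(v):
    """Honest implementation (is_positive inlined for brevity)."""
    return [num for num in v if num > 0]


def cheating_filter_ints(v):
    """Returns a hardcoded answer, regardless of the input."""
    return [3, 5, 8]


CASES = [
    ([3, -4, 0, -2, 5, 0, 8, -1], [3, 5, 8]),
    ([7, -3, 0, 0, 9, 1], [7, 9, 1]),
]


def passes_all(func, cases):
    """Return True if func maps every input to its expected output."""
    return all(func(v) == expected for v, expected in cases)


# With a single case, both implementations look correct...
assert passes_all(filter_ints, CASES[:1])
assert passes_all(cheating_filter_ints, CASES[:1])

# ...with two cases, only the honest one survives.
assert passes_all(filter_ints, CASES)
assert not passes_all(cheating_filter_ints, CASES)
```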

Boundaries and granularity

Let's now add a test for the is_positive function. I know it's a one-liner, but it presents us with the opportunity to discuss two very important concepts: boundaries and granularity.

That 0 in the body of the function is a boundary, and the > in the inequality is how we behave with regard to this boundary. Typically, when you set a boundary, you divide the space into three areas: what lies before the boundary, after the boundary, and on the boundary itself. In the example, before the boundary we find the negative numbers, the boundary is the element 0 and, after the boundary, we find the positive numbers. We need to test each of these areas to be sure we're testing the function correctly. So, let's see one possible solution (I will add the test to the class, but I won't show the repeated code):

tests/test_ch7/test_filter_funcs_is_positive_loose.py

def test_is_positive(self):
    assert_equal(False, is_positive(-2))  # before boundary
    assert_equal(False, is_positive(0))  # on the boundary
    assert_equal(True, is_positive(2))  # after the boundary

You can see that we are exercising one number for each different area around the boundary. Do you think this test is good? Think about it for a minute before reading on.

The answer is no, this test is not good. Not good enough, anyway. If I change the body of the is_positive function to read return n > 1, I would expect my test to fail, but it won't. -2 is still False, as well as 0, and 2 is still True. Why does that happen? It is because we haven't taken granularity properly into account. We're dealing with integers, so what is the minimum granularity when we move from one integer to the next one? It's 1. Therefore, when we surround the boundary, taking all three areas into account is not enough. We need to do it with the minimum possible granularity. Let's change the test:

tests/test_ch7/test_filter_funcs_is_positive_correct.py

def test_is_positive(self):
    assert_equal(False, is_positive(-1))
    assert_equal(False, is_positive(0))
    assert_equal(True, is_positive(1))

Ah, now it's better. Now if we change the body of is_positive to read return n > 1, the third assertion will fail, which is what we want. Can you think of a better test?

tests/test_ch7/test_filter_funcs_is_positive_better.py

def test_is_positive(self):
    assert_equal(False, is_positive(0))
    for n in range(1, 10 ** 4):
        assert_equal(False, is_positive(-n))
        assert_equal(True, is_positive(n))

This test is even better. We test every integer from 1 to 9,999 (both positive and negative) and 0. It gives us better coverage than just the three values across the boundary. So, keep this in mind. Zoom closely around each boundary with minimal granularity, but try to expand as well, finding a good compromise between optimal coverage and execution speed. We would love to check the first billion integers, but we can't wait days for our tests to run.
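To make the granularity argument concrete, here is a small sketch that runs the loose and the tight sets of checks against both the correct function and the faulty n > 1 version discussed above. The names is_positive_mutant, loose_checks, and tight_checks are hypothetical helpers I made up for this illustration.

```python
def is_positive(n):
    return n > 0


def is_positive_mutant(n):
    # The faulty version from the discussion: boundary moved from 0 to 1.
    return n > 1


def loose_checks(func):
    """Probe the three areas, but two steps away from the boundary."""
    return (func(-2), func(0), func(2)) == (False, False, True)


def tight_checks(func):
    """Probe the three areas with the minimum granularity (1)."""
    return (func(-1), func(0), func(1)) == (False, False, True)


# The loose checks cannot tell the two functions apart.
assert loose_checks(is_positive)
assert loose_checks(is_positive_mutant)

# The tight checks catch the mutant, because 1 > 1 is False.
assert tight_checks(is_positive)
assert not tight_checks(is_positive_mutant)
```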

A more interesting example

Okay, this was as gentle an introduction as I could give you, so let's move on to something more interesting. Let's write and test a function that flattens a nested dictionary structure. For a couple of years, I have worked very closely with Twitter and Facebook APIs. Handling such humongous data structures is not easy, especially since they're often deeply nested. It turns out that it's much easier to flatten them in a way that you can work on them without losing the original structural information, and then recreate the nested structure from the flat one. To give you an example, we want something like this:

data_flatten.py

nested = {
    'fullname': 'Alessandra',
    'age': 41,
    'phone-numbers': ['+447421234567', '+447423456789'],
    'residence': {
        'address': {
            'first-line': 'Alexandra Rd',
            'second-line': '',
        },
        'zip': 'N8 0PP',
        'city': 'London',
        'country': 'UK',
    },
}

flat = {
    'fullname': 'Alessandra',
    'age': 41,
    'phone-numbers': ['+447421234567', '+447423456789'],
    'residence.address.first-line': 'Alexandra Rd',
    'residence.address.second-line': '',
    'residence.zip': 'N8 0PP',
    'residence.city': 'London',
    'residence.country': 'UK',
}

A structure like flat is much simpler to manipulate. Before writing the flattener, let's make some assumptions. The keys are strings. We leave every data structure as it is unless it's a dictionary, in which case we flatten it. We use the dot as the separator, but we want to be able to pass a different one to our function. Here's the code:

data_flatten.py

def flatten(data, prefix='', separator='.'):
    """Flattens a nested dict structure. """
    if not isinstance(data, dict):
        return {prefix: data} if prefix else data

    result = {}
    for (key, value) in data.items():
        result.update(
            flatten(
                value,
                _get_new_prefix(prefix, key, separator),
                separator=separator))
    return result

def _get_new_prefix(prefix, key, separator):
    return (separator.join((prefix, str(key)))
            if prefix else str(key))

The preceding example is not difficult, but it's also not trivial, so let's go through it. First, we check whether data is a dictionary. If it's not a dictionary, then it's data that doesn't need to be flattened; therefore, we simply return either data or, if prefix is not an empty string, a dictionary with one key/value pair: prefix/data.

If instead data is a dict, we prepare an empty result dict to return, then we loop over data's items, which, as I'm sure you will remember, are 2-tuples (key, value). For each (key, value) pair, we recursively call flatten on the value, passing the new prefix built from the key, and we update the result dict with what's returned by that call. Recursion is excellent when running through nested structures.
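Since the motivation was to be able to recreate the nested structure from the flat one, here is a minimal sketch of the inverse operation. The unflatten function is hypothetical (it's not part of the code above), and it assumes that all keys are strings and that no key contains the separator. The flatten code is repeated only to keep the snippet self-contained.

```python
def flatten(data, prefix='', separator='.'):
    """Flattens a nested dict structure (same logic as above)."""
    if not isinstance(data, dict):
        return {prefix: data} if prefix else data

    result = {}
    for (key, value) in data.items():
        result.update(
            flatten(
                value,
                _get_new_prefix(prefix, key, separator),
                separator=separator))
    return result


def _get_new_prefix(prefix, key, separator):
    return (separator.join((prefix, str(key)))
            if prefix else str(key))


def unflatten(flat, separator='.'):
    """Hypothetical inverse of flatten, for string keys only."""
    nested = {}
    for (compound_key, value) in flat.items():
        # 'a.b.c' -> parents ['a', 'b'], last 'c'
        *parents, last = compound_key.split(separator)
        node = nested
        for part in parents:
            # Walk down, creating intermediate dicts as needed.
            node = node.setdefault(part, {})
        node[last] = value
    return nested


nested = {
    'fullname': 'Alessandra',
    'phone-numbers': ['+447421234567', '+447423456789'],
    'residence': {
        'address': {'first-line': 'Alexandra Rd'},
        'city': 'London',
    },
}

flat = flatten(nested)
assert flat['residence.address.first-line'] == 'Alexandra Rd'
assert unflatten(flat) == nested  # the round trip preserves the structure
```

The round trip only holds under the stated assumptions. Keys that aren't strings come back as strings, and empty nested dictionaries don't survive flattening at all.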

At a glance, can you understand what the _get_new_prefix function does? Let's use the inside-out technique once again. I see a ternary operator that returns the stringified key when prefix is an empty string. On the other hand, when prefix is a non-empty string, we use the separator to join the prefix with the stringified version of key. Notice that the inner parentheses in the call to join aren't redundant; we need them. Can you figure out why?
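If you'd like to check your answer, the following sketch shows what happens with and without the inner pair in the call to join. The values of prefix and key are just sample data.

```python
separator = '.'
prefix, key = 'residence', 'zip'

# str.join takes exactly ONE argument, an iterable of strings.
# The inner pair builds that iterable: the tuple (prefix, str(key)).
assert separator.join((prefix, str(key))) == 'residence.zip'

# Without the inner pair we would pass two positional arguments,
# which str.join rejects.
try:
    separator.join(prefix, str(key))
    raised = False
except TypeError:
    raised = True

assert raised
```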

Let's write a couple of tests for this function:

tests/test_ch7/test_data_flatten.py

# ... imports omitted ...
class FlattenTestCase(TestCase):

    def test_flatten(self):
        test_cases = [
            ({'A': {'B': 'C', 'D': [1, 2, 3], 'E': {'F': 'G'}},
              'H': 3.14,
              'J': ['K', 'L'],
              'M': 'N'},
             {'A.B': 'C',
              'A.D': [1, 2, 3],
              'A.E.F': 'G',
              'H': 3.14,
              'J': ['K', 'L'],
              'M': 'N'}),
            (0, 0),
            ('Hello', 'Hello'),
            ({'A': None}, {'A': None}),
        ]
        for (nested, flat) in test_cases:
            assert_equal(flat, flatten(nested))

    def test_flatten_custom_separator(self):
        nested = {'A': {'B': {'C': 'D'}}}
        assert_equal(
            {'A#B#C': 'D'}, flatten(nested, separator='#'))

Let's start from test_flatten. I defined a list of 2-tuples (nested, flat), each of which represents a test case (I highlighted nested to ease reading). I have one big dict with three levels of nesting, and then some smaller data structures that won't change when passed to the flatten function. These test cases are probably not enough to cover all edge cases, but they should give you a good idea of how you could structure a test like this. With a simple for loop, I cycle through each test case and assert that the result of flatten(nested) is equal to flat.
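As a sketch of the kind of edge cases that are still uncovered, the following snippet probes a few of them. The flatten code is repeated so that the snippet runs on its own, and the choice of cases is mine, not an exhaustive list.

```python
def flatten(data, prefix='', separator='.'):
    """Flattens a nested dict structure (same logic as above)."""
    if not isinstance(data, dict):
        return {prefix: data} if prefix else data

    result = {}
    for (key, value) in data.items():
        result.update(
            flatten(
                value,
                _get_new_prefix(prefix, key, separator),
                separator=separator))
    return result


def _get_new_prefix(prefix, key, separator):
    return (separator.join((prefix, str(key)))
            if prefix else str(key))


# An empty dict flattens to an empty dict.
assert flatten({}) == {}

# An empty *nested* dict silently disappears: there is nothing to
# iterate on, so its key never reaches the result.
assert flatten({'A': {}}) == {}

# Keys that aren't strings get stringified along the way.
assert flatten({1: {2: 'x'}}) == {'1.2': 'x'}

# A key that already contains the separator can collide with a
# flattened path: two entries go in, only one comes out.
collided = flatten({'A.B': 1, 'A': {'B': 2}})
assert len(collided) == 1
```

Whether these behaviors are acceptable depends on the assumptions you made before writing the function, which is exactly why it pays to write them down and test them.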

Tip

One thing to say about this example is that, when you run it, it will show you that two tests have been run. This is somewhat misleading: technically only two tests ran, but one of them contains multiple test cases. It would be nicer to have them run in a way that they were recognized as separate. This is possible through the use of libraries such as nose-parameterized, which I encourage you to check out. You can find it at https://pypi.python.org/pypi/nose-parameterized.
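If you'd rather stay within the standard library, a related (but different) technique is the subTest context manager of unittest.TestCase, available since Python 3.4. Note that this is not what nose-parameterized does: the runner still counts a single test, but each failing case is reported separately and the loop keeps going after a failure. The following is a minimal sketch with a reduced list of test cases; the flatten code is repeated to keep it self-contained.

```python
import unittest


def flatten(data, prefix='', separator='.'):
    """Flattens a nested dict structure (same logic as above)."""
    if not isinstance(data, dict):
        return {prefix: data} if prefix else data

    result = {}
    for (key, value) in data.items():
        result.update(
            flatten(
                value,
                _get_new_prefix(prefix, key, separator),
                separator=separator))
    return result


def _get_new_prefix(prefix, key, separator):
    return (separator.join((prefix, str(key)))
            if prefix else str(key))


class FlattenSubTestCase(unittest.TestCase):

    def test_flatten(self):
        test_cases = [
            ({'A': {'B': 'C', 'E': {'F': 'G'}}},
             {'A.B': 'C', 'A.E.F': 'G'}),
            (0, 0),
            ('Hello', 'Hello'),
            ({'A': None}, {'A': None}),
        ]
        for (nested, flat) in test_cases:
            # Each case gets its own labelled report on failure.
            with self.subTest(nested=nested):
                self.assertEqual(flat, flatten(nested))


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    FlattenSubTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```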

I also provided a second test to make sure the custom separator feature worked. As you can see, I used only one data structure, which is much smaller. We don't need to go big again, nor to test other edge cases. Remember, tests should make sure of one thing and one thing only, and test_flatten_custom_separator just takes care of verifying whether or not we can feed the flatten function a different separator.

I could keep blathering on about tests for about another book if only I had the space, but unfortunately, we need to stop here. I haven't told you about doctests (tests written in the documentation using a Python interactive shell style), and about another half a million things that could be said about this subject. You'll have to discover that for yourself.

Take a look at the documentation for the unittest module, the nose and nose-parameterized libraries, and pytest (http://pytest.org/), and you will be fine. In my experience, mocking and patching seem to be quite hard to get a good grasp of for developers who are new to them, so allow yourself a little time to digest these techniques. Try and learn them gradually.
