Putting it all together

Let's take a short detour to try out some of the tools we've introduced on a slightly larger example. Textbooks typically avoid such pragmatism, especially in the early chapters, but we think it's fun to apply new ideas to practical situations. To avoid getting off on the wrong stylistic foot, we'll need to introduce a few "black-box" components to get the job done, but you'll learn about them in detail later, so don't worry.

We're going to write a longer snippet at the REPL, and briefly introduce the with statement. Our code will fetch the text of some classic literature from the web using a Python standard library function called urlopen(). Here's the code entered at the REPL in full:

>>> from urllib.request import urlopen
>>> with urlopen('http://sixty-north.com/c/t.txt') as story:
...     story_words = []
...     for line in story:
...         line_words = line.split()
...         for word in line_words:
...             story_words.append(word)
...

We'll work through this code, explaining each line in turn.

To get access to urlopen() we need to import the function from the request module, which itself resides within the standard library urllib package.
We're going to call urlopen() with the URL to the story text. We use a Python construct called a with-block to manage the resource obtained from the URL, since fetching the resource from the web requires operating system sockets and suchlike. We'll be talking more about with statements in a later chapter, but for now it's enough to know that using a with statement with objects which use external resources is good practice to avoid so-called resource leaks. The with statement calls the urlopen() function and binds the response object to a variable named story.
Notice that the with statement is terminated by a colon, which introduces a new block, so within the block we must indent four spaces. We create an empty list which ultimately will hold all of the words from the retrieved text.
We open a for-loop which will iterate through the story. Recall that for-loops request items one-by-one from the expression on the right of the in keyword (in this case story) and assign them in turn to the name on the left (in this case line). It so happens that the type of the HTTP response object referred to by story yields successive lines of text from the response body when iterated over in this way, so the for-loop retrieves one line of text at a time from the story. The for statement is also terminated by a colon because it introduces the body of the for-loop, which is a new block and hence a further level of indentation.
Then, for each line of text, we use the split() method to divide it into words on whitespace boundaries, resulting in a list of words we call line_words.
Now we use a second for-loop nested inside the first to iterate over this list of words.
We append() each word in turn to the accumulating story_words list.

Finally, we enter a blank line at the three-dot prompt to close all open blocks: in this case the inner for-loop, the outer for-loop, and the with-block are all terminated. The block is then executed and, after a short delay, Python returns us to the regular triple-arrow prompt. If at this point Python gives you an error, such as a SyntaxError or IndentationError, you should go back, review what you entered, and carefully re-enter the code until Python accepts the whole block without complaint. If you get an HTTPError or a URLError, then you were unable to fetch the resource over the Internet, and you should check that you typed the URL correctly, check your network connection, or try again later.
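
The splitting and appending steps described above can also be tried on their own, without touching the network. The following is a minimal sketch rather than part of the original session; the sample line is a hypothetical stand-in for one line of the story, with a doubled space and a trailing newline added to show how split() treats whitespace.

```python
# A hypothetical line of text, standing in for one line of the story
line = "it was the  age of wisdom\n"

# Called with no arguments, split() divides on runs of whitespace and
# discards leading and trailing whitespace, including the newline
line_words = line.split()

# Accumulate the words one at a time, as the inner for-loop does
story_words = []
for word in line_words:
    story_words.append(word)

print(story_words)
```

Because split() treats any run of whitespace as a single separator, neither the doubled space nor the newline produces an empty entry in the list.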

We can look at the words we've collected by asking Python to evaluate the value of story_words:

>>> story_words
[b'It', b'was', b'the', b'best', b'of', b'times', b'it', b'was', b'the', b'worst', b'of', b'times', b'it', b'was', b'the', b'age', b'of', b'wisdom', b'it', b'was', b'the', b'age', b'of', b'foolishness', b'it', b'was', b'the', b'epoch', b'of', b'belief', b'it', b'was', b'the', b'epoch', b'of', b'incredulity', b'it', b'was', b'the', b'season', b'of', b'Light', b'it', b'was', b'the', b'season', b'of', b'Darkness', b'it', b'was', b'the', b'spring', b'of', b'hope', b'it', b'was', b'the', b'winter', b'of', b'despair', b'we', b'had', b'everything', b'before', b'us', b'we', b'had', b'nothing', b'before', b'us', b'we', b'were', b'all', b'going', b'direct', b'to', b'Heaven', b'we', b'were', b'all', b'going', b'direct', b'the', b'other', b'way', b'in', b'short', b'the', b'period', b'was', b'so', b'far', b'like', b'the', b'present', b'period', b'that', b'some', b'of', b'its', b'noisiest', b'authorities', b'insisted', b'on', b'its', b'being', b'received', b'for', b'good', b'or', b'for', b'evil', b'in', b'the', b'superlative', b'degree', b'of', b'comparison', b'only']

This sort of exploratory programming at the REPL is very common for Python, as it allows us to figure out what bits of code do before we decide to use them. In this case notice that each of the single-quoted words is prefixed by a lower-case letter b meaning that we have a list of bytes objects where we would have preferred a list of str objects. This is because the HTTP request transferred raw bytes to us over the network.
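
The difference between the two types is easy to see in isolation. This is a small illustrative sketch using two hypothetical literals, not values taken from the session above.

```python
raw = b'times'    # a bytes literal, marked by the b prefix
text = 'times'    # an ordinary str literal

print(type(raw))
print(type(text))

# A bytes object never compares equal to a str object in Python 3,
# even when both appear to spell the same word
print(raw == text)
```

This is why a list of bytes objects is inconvenient if we later want to compare the words against ordinary strings.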

To get a list of strings we should decode the byte stream in each line from UTF-8 into Unicode strings. We can do this by inserting a call to the decode() method of the bytes object, and then operating on the resulting Unicode string. The Python REPL supports a simple command history, and by careful use of the up and down arrow keys, we can re-enter our snippet, although there's no need to re-import urlopen, so we can skip the first line:

>>> with urlopen('http://sixty-north.com/c/t.txt') as story:
...     story_words = []
...     for line in story:
...         line_words = line.decode('utf-8').split()
...         for word in line_words:
...             story_words.append(word)
...

It is the fourth line here we have changed – you can just edit it using the left and right arrow keys to insert the requisite call to decode() when you get to that part of the command history. When we re-run the block and take a fresh look at story_words, we should see we have a list of strings:

>>> story_words
['It', 'was', 'the', 'best', 'of', 'times', 'it',
'was', 'the', 'worst', 'of', 'times', 'it', 'was', 'the', 'age', 'of',
'wisdom', 'it', 'was', 'the', 'age', 'of', 'foolishness', 'it', 'was',
'the', 'epoch', 'of', 'belief', 'it', 'was', 'the', 'epoch', 'of',
'incredulity', 'it', 'was', 'the', 'season', 'of', 'Light', 'it',
'was', 'the', 'season', 'of', 'Darkness', 'it', 'was', 'the',
'spring', 'of', 'hope', 'it', 'was', 'the', 'winter', 'of', 'despair',
'we', 'had', 'everything', 'before', 'us', 'we', 'had', 'nothing',
'before', 'us', 'we', 'were', 'all', 'going', 'direct', 'to',
'Heaven', 'we', 'were', 'all', 'going', 'direct', 'the', 'other',
'way', 'in', 'short', 'the', 'period', 'was', 'so', 'far', 'like',
'the', 'present', 'period', 'that', 'some', 'of', 'its', 'noisiest',
'authorities', 'insisted', 'on', 'its', 'being', 'received', 'for',
'good', 'or', 'for', 'evil', 'in', 'the', 'superlative', 'degree',
'of', 'comparison', 'only']
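
If the network is unavailable, the same loop can be exercised against an in-memory stand-in. The sketch below assumes that io.BytesIO is an acceptable substitute for the HTTP response for this purpose: like the response object, it yields successive lines of bytes when iterated over and can be used in a with statement. The sample data is a hypothetical two-line extract, not the full text served at the URL.

```python
import io

# Hypothetical stand-in for the response body: two short lines of bytes
raw = b"It was the best of times\nit was the worst of times\n"

with io.BytesIO(raw) as story:
    story_words = []
    for line in story:
        # Decode each line from UTF-8 bytes into a str before splitting
        line_words = line.decode('utf-8').split()
        for word in line_words:
            story_words.append(word)

print(story_words)
```

Every element of the resulting list is a str, so no b prefix appears when the list is printed.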

We've just about reached the limit of what's comfortable to enter and revise at the Python REPL, so in the next chapter we'll look at how to move this code into a file where it can be more easily worked with in a text editor.
