Iterators

In typical design pattern parlance, an iterator is an object with a next() method and a done() method; the latter returns True if there are no items left in the sequence. In a programming language without built-in support for iterators, the iterator would be looped over like this:

while not iterator.done():
    item = iterator.next()
    # do something with the item

In Python, iteration is a special feature, so the next method gets a special name, __next__. This method can be called through the next(iterator) built-in. Rather than providing a done method, the iterator protocol raises StopIteration to notify the loop that the sequence is exhausted. Finally, we have the much more readable for item in iterator syntax to access the items in an iterator instead of messing around with a while loop. Let's look at these in more detail.

The iterator protocol

The abstract base class Iterator, in the collections.abc module, defines the iterator protocol in Python. As mentioned, an iterator must have a __next__ method that the for loop (and other features that support iteration) can call to get the next element from the sequence. In addition, every iterator must also fulfill the Iterable interface. Any class that provides an __iter__ method is iterable; that method must return an Iterator instance that will cover all the elements in that class. Since an iterator is already looping over elements, its __iter__ method traditionally returns the iterator itself.
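The relationship between the two interfaces can be checked directly with isinstance. The following is a minimal sketch that uses a built-in list rather than a custom class; the variable names are illustrative only:

```python
from collections.abc import Iterable, Iterator

words = ["alpha", "beta"]
word_iterator = iter(words)  # ask the iterable for an iterator

# A list provides __iter__ but not __next__: iterable, not an iterator.
list_is_iterable = isinstance(words, Iterable)
list_is_iterator = isinstance(words, Iterator)

# The object returned by iter() provides both methods.
iterator_is_iterator = isinstance(word_iterator, Iterator)
iterator_is_iterable = isinstance(word_iterator, Iterable)

# Calling iter() on an iterator hands back the very same object.
returns_itself = iter(word_iterator) is word_iterator

print(list_is_iterable, list_is_iterator)
print(iterator_is_iterator, iterator_is_iterable, returns_itself)
```

These checks work by looking for the __iter__ and __next__ methods, so a class does not need to inherit from the abstract base classes to pass them.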

This might sound a bit confusing, so have a look at the following example, but note that this is a very verbose way to solve this problem. It clearly explains iteration and the two protocols in question, but we'll be looking at several more readable ways to get this effect later in this chapter:

class CapitalIterable:
    def __init__(self, string):
        self.string = string

    def __iter__(self):
        return CapitalIterator(self.string)


class CapitalIterator:
    def __init__(self, string):
        self.words = [w.capitalize() for w in string.split()]
        self.index = 0

    def __next__(self):
        if self.index == len(self.words):
            raise StopIteration()

        word = self.words[self.index]
        self.index += 1
        return word

    def __iter__(self):
        return self

This example defines a CapitalIterable class whose job is to loop over each of the words in a string and output them with the first letter capitalized. Most of the work of that iterable is passed to the CapitalIterator implementation. The canonical way to interact with this iterator is as follows:

>>> iterable = CapitalIterable('the quick brown fox jumps over the lazy dog')
>>> iterator = iter(iterable)
>>> while True:
...     try:
...         print(next(iterator))
...     except StopIteration:
...         break
...     
The
Quick
Brown
Fox
Jumps
Over
The
Lazy
Dog

This example first constructs an iterable and retrieves an iterator from it. The distinction may need explanation: the iterable is an object with elements that can be looped over. Normally, these elements can be looped over multiple times, maybe even at the same time or in overlapping code. The iterator, on the other hand, represents a specific location in that iterable; some of the items have been consumed and some have not. Two different iterators might be at different places in the list of words, but any one iterator can mark only one place.
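To make the idea of independent positions concrete, here is a small sketch that uses a plain list of words as the iterable (an assumption made to keep the example self-contained) and advances two iterators by different amounts:

```python
words = ["the", "quick", "brown", "fox"]

# Each call to iter() on the list yields a fresh, independent iterator.
first = iter(words)
second = iter(words)

# Advance the first iterator past two items.
next(first)
next(first)

# The two iterators now mark different places in the same list.
first_position = next(first)    # third word
second_position = next(second)  # first word

print(first_position, second_position)
```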

Each time next() is called on the iterator, it returns another token from the iterable, in order. Eventually, the iterator will be exhausted (won't have any more elements to return), in which case StopIteration is raised, and we break out of the loop.
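Exhaustion can be observed on a very short sequence. The sketch below uses a one-element list; it also shows the optional second argument of the next() built-in, which returns a default value instead of raising StopIteration:

```python
iterator = iter(["only"])

first = next(iterator)  # consumes the single element

# A second call finds nothing left, so StopIteration is raised.
try:
    next(iterator)
    raised = False
except StopIteration:
    raised = True

# Supplying a default suppresses the exception on an exhausted iterator.
fallback = next(iterator, None)

print(first, raised, fallback)
```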

Of course, we already know a much simpler syntax for constructing an iterator from an iterable:

>>> for i in iterable:
...     print(i)
...     
The
Quick
Brown
Fox
Jumps
Over
The
Lazy
Dog

As you can see, the for statement, in spite of not looking terribly object-oriented, is actually a shortcut to some obviously object-oriented design principles. Keep this in mind as we discuss comprehensions, as they, too, appear to be the polar opposite of an object-oriented tool. Yet, they use the exact same iteration protocol as for loops and are just another kind of shortcut.
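As a rough illustration of that equivalence, the sketch below capitalizes the same words three ways: with explicit protocol calls, with a for statement, and with a list comprehension. The helper name collect_with_protocol is hypothetical, and the while loop only approximates what the for statement does internally:

```python
def collect_with_protocol(iterable):
    """Capitalize each item using explicit iter()/next() calls."""
    iterator = iter(iterable)
    results = []
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # the iterator is exhausted
        results.append(item.capitalize())
    return results


words = "the quick brown fox".split()

from_protocol = collect_with_protocol(words)

from_for_loop = []
for word in words:
    from_for_loop.append(word.capitalize())

from_comprehension = [word.capitalize() for word in words]

print(from_protocol == from_for_loop == from_comprehension)
```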
