Using iterators

A key input for any data science program is data. Datasets vary in size: some fit into memory and some do not. The record access mechanism also varies from one data format to another, and different algorithms may need to process chunks of varying length. For example, say that you are writing a stochastic gradient descent algorithm and want to pass chunks of 5,000 records in each epoch. It would be very convenient to have an abstraction that handles accessing the data, understanding the data format, looping through the data, and providing the caller with the required data. This results in clean code. Most of the time, the interesting part lies in what we do with the data and not in how we access it. Python's iterators give us an elegant way to handle all of these requirements.

Getting ready

An iterator in Python implements the iterator pattern. It allows us to go over a sequence one element at a time without materializing the whole sequence.
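As a minimal sketch of this one-at-a-time access, the snippet below walks a built-in list with iter() and next(). The variable names are illustrative only, and the code is written to run on both Python 2.7 and Python 3.

```python
numbers = [10, 20, 30]   # illustrative data
it = iter(numbers)       # ask the list for an iterator over its elements

first = next(it)         # 10: values are handed out one at a time
second = next(it)        # 20
remaining = list(it)     # [30]: only what has not been consumed yet

print(first)
print(second)
print(remaining)
```

Note that the iterator remembers its position: list(it) picks up only the elements that next() has not already returned.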

How to do it…

Let's create a simple iterator called SimpleCounter and write some code that shows how to use it effectively:

# 1.	Let us write a simple iterator.
class SimpleCounter(object):
    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        'Returns itself as an iterator object'
        return self

    def next(self):
        'Returns the next value until current exceeds end'
        if self.current > self.end:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

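# 2.	Now let us try to access the iterator
c = SimpleCounter(1,3)
print c.next()
print c.next()
print c.next()
# A fourth c.next() call here would raise StopIteration,
# because the counter is exhausted after 1, 2, and 3.

# 3.	Another way to access
# Create a fresh counter; the one above is already exhausted.
c = SimpleCounter(1,3)
for entry in iter(c):
    print entry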

How it works…

In step 1, we defined a class named SimpleCounter. The __init__ constructor takes two parameters, start and end, which define the beginning and end of our sequence. Note the two methods, __iter__ and next. Any object in Python that is meant to be an iterator should support these two methods. The __iter__ method returns the object itself as the iterator. The next method returns the next value in the sequence and raises StopIteration when there are no more values. (In Python 3, this method is named __next__.)
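The recipe's code targets Python 2, where the method is called next. Python 3 looks for __next__ instead. The following sketch shows one way to support both. PortableCounter is a hypothetical name used only for this illustration, not part of the recipe.

```python
class PortableCounter(object):
    """Hypothetical variant of SimpleCounter that runs on Python 2 and 3."""

    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Python 3 looks up __next__; stop once current passes end.
        if self.current > self.end:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

    next = __next__  # Python 2 looks up next, so alias it


counter = PortableCounter(1, 3)
values = list(counter)     # [1, 2, 3]
leftover = list(counter)   # []: the same object is now exhausted
print(values)
print(leftover)
```

The second list() call returns an empty list because the counter keeps its position and has nothing left to hand out.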

As shown in step 2, we can access the successive elements in the iterator by calling its next() method. Python also provides us with a convenient built-in function, iter(), which returns an iterator for an object and can be used in a for loop to access elements sequentially, as shown in step 3. The for loop calls next() internally on each pass.

A point to note is that an iterator object can be used only once. After running the preceding code, we will try to access the iterator as follows:

print next(c)

Because the sequence has been exhausted, this call, like any further call to c.next(), raises a StopIteration exception:

    raise StopIteration
StopIteration
>>>

The for loop handles this exception and exits once the data has been exhausted.
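To make this behavior concrete, the sketch below spells out roughly what a for loop does behind the scenes, using iter() and next() directly. The function name manual_loop is an assumption made for illustration, and the code runs on both Python 2.7 and Python 3.

```python
def manual_loop(iterable):
    """Collect items roughly the way a for loop would visit them."""
    collected = []
    it = iter(iterable)           # step 1: obtain an iterator
    while True:
        try:
            item = next(it)       # step 2: ask for the next value
        except StopIteration:     # step 3: exhaustion ends the loop quietly
            break
        collected.append(item)    # the loop body would run here
    return collected


result = manual_loop([1, 2, 3])
print(result)   # [1, 2, 3]
```

The StopIteration exception never reaches the caller here, which is why an ordinary for loop over an iterator ends without an error.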

There's more…

Let's see another example of an iterator. Say that our program needs to access a very large file, but only works through it one line at a time:

# some_file_of_interest is a placeholder for the path to your file
f = open(some_file_of_interest)
for l in iter(f):
    print l
f.close()

In Python, a file object is an iterator; it supports the iter() and next() functions. Hence, instead of loading the whole file in memory, we can work with a single line at a time.
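The following sketch checks these claims on a small throwaway file. The file contents and variable names are made up for the example, and the temporary file is created only so that the snippet is self-contained.

```python
import os
import tempfile

# Write a small throwaway file so the example needs no external data.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("first\nsecond\nthird\n")

with open(path) as f:
    same_object = iter(f) is f    # a file object is its own iterator
    first_line = next(f)          # reads just one line
    # The loop continues from where next() stopped.
    rest = [line.rstrip("\n") for line in f]

os.remove(path)                   # clean up the temporary file
print(same_object)
print(repr(first_line))
print(rest)
```

Using the with statement here is a design choice: it closes the file even if an error occurs partway through the loop.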

An iterator gives you the power to write custom code in order to access your data sources in a manner that your application demands.
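As one possible illustration, the sketch below returns to the chunking scenario from the introduction and hands out records in fixed-size batches. ChunkIterator is a hypothetical class written for this example, and a chunk size of 5 stands in for the 5,000 records mentioned earlier.

```python
class ChunkIterator(object):
    """Hypothetical iterator that returns lists of up to chunk_size records."""

    def __init__(self, records, chunk_size):
        self.source = iter(records)   # any iterable of records
        self.chunk_size = chunk_size

    def __iter__(self):
        return self

    def __next__(self):
        chunk = []
        # Pull records from the source until the chunk is full.
        for record in self.source:
            chunk.append(record)
            if len(chunk) == self.chunk_size:
                break
        if not chunk:                 # the source has nothing left
            raise StopIteration
        return chunk

    next = __next__  # Python 2 spelling of the same method


chunks = list(ChunkIterator(range(12), 5))
print(chunks)   # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

The caller sees only complete batches (plus a shorter final one) and never has to know how the underlying records are stored or read.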

The following link provides more information about how iterators can be used in various ways in Python:

Infinite iterators, count(), cycle() and repeat() in itertools:

https://docs.python.org/2/library/itertools.html#itertools.cycle
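Because these three iterators never run out on their own, a common approach is to take a bounded slice of them with itertools.islice. The short sketch below does this with illustrative arguments.

```python
from itertools import count, cycle, islice, repeat

first_counts = list(islice(count(10), 3))   # [10, 11, 12]
cycled = list(islice(cycle("ab"), 5))       # ['a', 'b', 'a', 'b', 'a']
repeated = list(islice(repeat("x"), 3))     # ['x', 'x', 'x']

print(first_counts)
print(cycled)
print(repeated)
```

Calling list() directly on count(), cycle(), or an unbounded repeat() would never finish, so the islice limit is what makes these calls safe.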
