Generator expressions

Generator expressions are a cross between comprehensions and generator functions. They use a syntax similar to that of comprehensions, but they create a generator object which produces the specified sequence lazily. The syntax closely mirrors that of list comprehensions:

( expr(item) for item in iterable )

A generator expression is delimited by parentheses instead of the square brackets used for list comprehensions.
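To make the parallel concrete, here is a small sketch comparing a list comprehension with the equivalent generator expression. The names squares_list and squares_gen are purely illustrative.

```python
# A list comprehension builds every element immediately.
squares_list = [x * x for x in range(5)]

# The same expression in parentheses creates a generator object instead.
squares_gen = (x * x for x in range(5))

print(squares_list)              # [0, 1, 4, 9, 16]
print(type(squares_gen).__name__)  # generator

# The generator produces values on demand, one at a time.
print(next(squares_gen))         # 0
print(next(squares_gen))         # 1

# Consuming the rest yields only the elements not yet produced.
print(list(squares_gen))         # [4, 9, 16]
```

Only the delimiters differ, yet the first line holds a finished list while the second holds a recipe that is evaluated step by step.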

Generator expressions are useful for situations where you want the lazy evaluation of generators with the declarative concision of comprehensions. For example, this generator expression yields the first one million square numbers:

>>> million_squares = (x*x for x in range(1, 1000001))

At this point, none of the squares have been created; we've just captured the specification of the sequence into a generator object:

>>> million_squares
<generator object <genexpr> at 0x1007a12d0>
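One way to see this laziness directly is to give the expression a side effect. The sketch below uses a hypothetical helper, noisy_square(), that records each value it is asked to square, so we can observe exactly when work is done.

```python
calls = []  # record of every argument noisy_square() has been called with

def noisy_square(x):
    # Hypothetical helper: log the call, then compute the square.
    calls.append(x)
    return x * x

lazy = (noisy_square(x) for x in range(1, 4))

print(calls)       # [] (no squares computed yet)
print(next(lazy))  # 1
print(calls)       # [1] (only the first element has been computed)
```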

We can force evaluation of the generator by using it to create a (long!) list:

>>> list(million_squares)
. . .
999982000081, 999984000064, 999986000049, 999988000036, 999990000025,
999992000016, 999994000009, 999996000004, 999998000001, 1000000000000]

This list consumes a significant chunk of memory: in this case about 40 MB for the list object and the integer objects it contains.
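A rough way to check this is sys.getsizeof(). Note that it measures only the object you pass it, so for a list it counts the array of references but not the integers they point to; the sketch sums those separately. Exact figures depend on the Python version and platform, though on a typical 64-bit CPython build the combined list-plus-integers total should land near the 40 MB quoted.

```python
import sys

n = 1_000_000

squares_gen = (x * x for x in range(1, n + 1))
squares_list = [x * x for x in range(1, n + 1)]

# getsizeof() reports only the object itself, not the objects it refers to.
gen_bytes = sys.getsizeof(squares_gen)
list_bytes = sys.getsizeof(squares_list)
ints_bytes = sum(sys.getsizeof(v) for v in squares_list)

print(f"generator:       {gen_bytes:,} bytes")
print(f"list container:  {list_bytes:,} bytes")
print(f"list + integers: {list_bytes + ints_bytes:,} bytes")
```

The generator's size stays small and constant no matter how long the sequence it describes.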

Generator objects only run once

Notice that a generator object is just an iterator and, once exhausted in this way, will yield no more items. Repeating the previous statement returns an empty list:

>>> list(million_squares)
[]

Generators are single-use objects. Each time we call a generator function, we create a new generator object. To recreate a generator from a generator expression, we must evaluate the expression itself once more.
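A convenient pattern, sketched here with a hypothetical function named squares(), is to wrap the generator expression in a function so that every call re-evaluates the expression and returns a fresh generator.

```python
def squares(n):
    # Each call evaluates the generator expression anew,
    # returning a brand-new generator object.
    return (x * x for x in range(1, n + 1))

first = squares(3)
print(list(first))      # [1, 4, 9]
print(list(first))      # [] (this generator is exhausted)

second = squares(3)     # a new, independent generator
print(list(second))     # [1, 4, 9]
print(first is second)  # False
```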

Iteration without memory

Let's raise the stakes by computing the sum of the first ten million squares using the built-in sum() function, which accepts an iterable series of numbers. If we were to use a list comprehension, we could expect this to consume around 400 MB of memory. Using a generator expression, memory usage will be insignificant:

>>> sum(x*x for x in range(1, 10000001))
333333383333335000000

This produces a result in a second or so and uses almost no memory.
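Two things can be checked here. First, the result agrees with the closed form for the sum of squares, n(n + 1)(2n + 1) / 6. Second, the memory difference can be measured with the standard tracemalloc module. The sketch below uses a smaller n to keep it quick; the helper name peak_memory() is illustrative, and the exact byte counts will vary between Python versions.

```python
import tracemalloc

# The closed form reproduces the REPL result for n = 10,000,000.
assert 10_000_000 * 10_000_001 * 20_000_001 // 6 == 333333383333335000000

def peak_memory(func):
    # Run func() under tracemalloc and return (result, peak bytes allocated).
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

n = 100_000
total_list, peak_list = peak_memory(lambda: sum([x * x for x in range(1, n + 1)]))
total_gen, peak_gen = peak_memory(lambda: sum(x * x for x in range(1, n + 1)))

# Both approaches compute the same value...
assert total_list == total_gen == n * (n + 1) * (2 * n + 1) // 6

# ...but only the list version holds every square in memory at once.
print(f"list comprehension peak:   {peak_list:,} bytes")
print(f"generator expression peak: {peak_gen:,} bytes")
```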

Optional parentheses

If you look carefully, you'll see that in this case we didn't supply separate enclosing parentheses for the generator expression in addition to those needed for the sum() function call. Letting the function call's parentheses double as those of the generator expression is an elegant touch that aids readability. This works only when the generator expression is the sole argument to the call. You can include the second set of parentheses if you wish.
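The sketch below shows both spellings, and what happens when the call takes a second argument: the generator expression then needs its own parentheses, and leaving them out is a SyntaxError. Here compile() is used so the failing form can be demonstrated without stopping the script.

```python
# Sole argument: the call's parentheses double as the generator's.
a = sum(x * x for x in range(10))
# Redundant extra parentheses are also allowed.
b = sum((x * x for x in range(10)))
print(a, b)  # 285 285

# With a second argument (sum's start value), extra parentheses are required.
c = sum((x * x for x in range(10)), 1000)
print(c)     # 1285

raised = False
try:
    compile("sum(x * x for x in range(10), 1000)", "<example>", "eval")
except SyntaxError as exc:
    raised = True
    print("SyntaxError:", exc.msg)
```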

Using an if-clause in generator expressions

As with comprehensions, you can include an if-clause at the end of the generator expression. Reusing our admittedly inefficient is_prime() predicate, we can determine the sum of those integers from the first thousand that are prime like this:

>>> sum(x for x in range(1001) if is_prime(x))
76127
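The is_prime() predicate is defined elsewhere in the book, so to keep this sketch self-contained it uses a simple trial-division stand-in; treat that definition as an assumption rather than the book's exact code.

```python
from math import isqrt

def is_prime(x):
    # Hypothetical stand-in for is_prime(): plain trial division.
    if x < 2:
        return False
    for i in range(2, isqrt(x) + 1):
        if x % i == 0:
            return False
    return True

# The if-clause filters items before they reach sum().
total = sum(x for x in range(1001) if is_prime(x))
print(total)  # 76127
```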

Note that this is not the same thing as computing the sum of the first 1000 primes, which is a more awkward question because we don't know in advance how many integers we need to test before we clock up a thousand primes.
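Laziness offers one way around that awkwardness. In this sketch, itertools.count() supplies an unbounded stream of candidates, the generator expression filters it down to primes, and itertools.islice() stops after exactly 1000 of them, however far the search has to go. The trial-division is_prime() is again a hypothetical stand-in for the book's predicate.

```python
from itertools import count, islice
from math import isqrt

def is_prime(x):
    # Hypothetical stand-in for is_prime(): plain trial division.
    if x < 2:
        return False
    for i in range(2, isqrt(x) + 1):
        if x % i == 0:
            return False
    return True

# count(2) never ends, so this generator expression is infinite;
# islice() takes just the first 1000 items it produces.
primes = (x for x in count(2) if is_prime(x))
first_thousand = list(islice(primes, 1000))

print(first_thousand[-1])    # 7919, the 1000th prime
print(sum(first_thousand))
```

Because nothing is computed until islice() asks for it, the code never needs to know the upper bound in advance.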
