Generator expressions

Documentation

  1. Generator expressions in the Python Tutorial
  2. Generator expressions in the Python Language Reference
  3. Generator expressions and list comprehensions in the Functional Programming HOWTO

List comprehension vs. generator expression

The following code creates two lists, oldList and newList.

oldList = [10, 110, 50, 120, 30]
newList = [item for item in oldList if item >= 100]  #list comprehension
print(f"type(newList) = {type(newList)}")

for item in newList:
    print(item)
type(newList) = <class 'list'>
110
120

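The following code prints the same numbers but creates only one list, because the generator expression yields the items one at a time instead of storing them in a new list.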

oldList = [10, 110, 50, 120, 30]
it = (item for item in oldList if item >= 100)   #generator expression
print(f"type(it) = {type(it)}")

for item in it:
    print(item)
type(it) = <class 'generator'>
110
120

The code above does the same thing as the following generator function:

def f(oldList):                    #generator function
    for item in oldList:
        if item >= 100:
            yield item

oldList = [10, 110, 50, 120, 30]
it = f(oldList)
print(f"type(it) = {type(it)}")

for item in it:
    print(item)
type(it) = <class 'generator'>
110
120

As we have seen, a generator function is often just a yield statement inside a for loop, or a yield statement inside an if statement inside a for loop. In such cases, a generator expression is a convenient abbreviation for the generator function.
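
The simpler case mentioned above, a yield statement inside a for loop with no if statement, can be sketched the same way. The names doubled, it1, and it2 are hypothetical and were chosen only for this illustration.

```python
def doubled(oldList):                  # generator function: yield inside a for loop
    for item in oldList:
        yield 2 * item

oldList = [10, 110, 50, 120, 30]
it1 = doubled(oldList)                 # iterator from the generator function
it2 = (2 * item for item in oldList)   # iterator from the generator expression

result1 = list(it1)                    # drain each iterator into a list to compare
result2 = list(it2)
print(result1 == result2)
print(result1)
```

Both iterators produce the same items: 20, 220, 100, 240, 60.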

Get only the items that you need

If you want only the first number that is ≥ 100, call next only once.

oldList = [10, 110, 50, 120, 30]
it = (item for item in oldList if item >= 100)

try:
    i = next(it)   #means i = it.__next__()
    print(f"The first number greater than or equal to 100 is {i}.")
except StopIteration:
    print("All the numbers in the oldList are < 100.")
The first number greater than or equal to 100 is 110.

A more compact way of doing the same thing:

oldList = [10, 110, 50, 120, 30]

try:
    i = next(item for item in oldList if item >= 100)
    print(f"The first number greater than or equal to 100 is {i}.")
except StopIteration:
    print("All the numbers in the oldList are < 100.")
The first number greater than or equal to 100 is 110.
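
The built-in next function also accepts an optional second argument: a default value that it returns instead of raising StopIteration when the iterator is exhausted. The following is a minimal sketch, assuming that None never appears in the list and can therefore serve as the "not found" marker.

```python
oldList = [10, 110, 50, 120, 30]

# The generator expression needs its own parentheses here because it is
# not the only argument. None is returned if no item is >= 100
# (assumption: None is never a real item in the list).
i = next((item for item in oldList if item >= 100), None)

if i is None:
    print("All the numbers in the oldList are < 100.")
else:
    print(f"The first number greater than or equal to 100 is {i}.")
```

This avoids the try statement, at the cost of reserving one value to mean "nothing found".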

For this job, a list comprehension is worse than a generator expression for three reasons:

  1. The list created by the list comprehension takes up memory. The generator expression creates no list.
  2. It takes time for the list comprehension to create the whole list.
  3. Most of the list is wasted, since only the first item is used.

The list comprehension version, for comparison:

oldList = [10, 110, 50, 120, 30]
newList = [item for item in oldList if item >= 100]

try:
    i = newList[0]
    print(f"The first number >= 100 is {i}.")
except IndexError:
    print("All the numbers in the oldList are < 100.")
The first number >= 100 is 110.
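
One way to see the second and third reasons in action is to record which items each approach actually examines. This is only a sketch; the helper isBig and the list examined are hypothetical names introduced for the illustration.

```python
examined = []                # records every item that gets tested

def isBig(item):
    "Return True if item >= 100, remembering that the item was tested."
    examined.append(item)
    return item >= 100

oldList = [10, 110, 50, 120, 30]

# Generator expression: testing stops as soon as the first match is found.
first = next(item for item in oldList if isBig(item))
generatorExamined = list(examined)

# List comprehension: every item is tested before [0] is applied.
examined.clear()
first2 = [item for item in oldList if isBig(item)][0]
listExamined = list(examined)

print(f"generator expression examined {len(generatorExamined)} items")
print(f"list comprehension examined {len(listExamined)} items")
```

With this list, the generator expression tests only 10 and 110, while the list comprehension tests all five items.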

The old-fashioned way:

oldList = [10, 110, 50, 120, 30]

for i in oldList:
    if i >= 100:
        print(f"The first number >= 100 is {i}.")
        break
else:   #Arrive here if the break statement was never executed.
    print("All the numbers in the oldList are < 100.")
The first number >= 100 is 110.

Three ways to create the same iterator

Create the iterator manually:

class FloatRange(object):
    "A range of n+1 equally spaced floats."

    def __init__(self, start, end, n):
        self.start = start
        self.end = end
        self.n = n

    def __iter__(self):
        return FloatRange_iterator(self.start, self.end, self.n)


class FloatRange_iterator(object):
    def __init__(self, start, end, n):
        self.start = start
        self.end = end
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n + 1:
            raise StopIteration
        result = self.start + (self.end - self.start) * self.i / self.n
        self.i += 1
        return result


for f in FloatRange(0.0, 1.0, 10):
    print(f)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0

With a generator function:

def FloatRange(start, end, n):
    for i in range(n + 1):
        yield start + (end - start) * i / n


for f in FloatRange(0.0, 1.0, 10):
    print(f)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0

With a generator expression:

def FloatRange(start, end, n):
    return (start + (end - start) * i / n for i in range(n + 1))


for f in FloatRange(0.0, 1.0, 10):
    print(f)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
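
The three versions yield the same floats, but they differ in one respect that a single for loop does not reveal. The manual FloatRange object builds a new FloatRange_iterator every time it is iterated, whereas the generator versions return an iterator that can be consumed only once. The sketch below repeats the manual classes so that it runs on its own; floatRangeGen is a hypothetical name used here to keep the two versions apart.

```python
class FloatRange:
    "A range of n+1 equally spaced floats (the manual version)."

    def __init__(self, start, end, n):
        self.start = start
        self.end = end
        self.n = n

    def __iter__(self):
        # A fresh iterator is created for every pass.
        return FloatRange_iterator(self.start, self.end, self.n)


class FloatRange_iterator:
    def __init__(self, start, end, n):
        self.start = start
        self.end = end
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n + 1:
            raise StopIteration
        result = self.start + (self.end - self.start) * self.i / self.n
        self.i += 1
        return result


def floatRangeGen(start, end, n):    # hypothetical name for the generator version
    return (start + (end - start) * i / n for i in range(n + 1))


manual = FloatRange(0.0, 1.0, 10)
gen = floatRangeGen(0.0, 1.0, 10)

sameValues = list(manual) == list(gen)   # first pass: both give the same floats
manualAgain = list(manual)               # second pass over the iterable works
genAgain = list(gen)                     # the generator is already exhausted

print(sameValues)
print(len(manualAgain))
print(len(genAgain))
```

The first pass gives identical values; on the second pass the manual object yields all 11 floats again, while the exhausted generator yields nothing.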

Two very different objects

In the code below, li is a list containing 11 floats. By contrast, it is an iterator whose __next__ method returns the next float each time it is called; on the twelfth call, __next__ raises a StopIteration exception.

start = 0.0
end = 1.0
n = 10

li = [start + (end - start) * i / n for i in range(n + 1)]
it = (start + (end - start) * i / n for i in range(n + 1))

for f in li:   #or for f in it:
    print(f)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
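
The difference between the two objects shows up as soon as you do anything other than loop over them once. The following sketch reuses the definitions above and probes both objects with len, indexing, and next.

```python
start = 0.0
end = 1.0
n = 10

li = [start + (end - start) * i / n for i in range(n + 1)]
it = (start + (end - start) * i / n for i in range(n + 1))

print(len(li))           # a list knows its length
print(li[3])             # and supports indexing

try:
    len(it)              # a generator has no length
except TypeError:
    print("it has no len().")

for _ in range(n + 1):   # the first eleven calls each return a float
    f = next(it)

try:
    next(it)             # the twelfth call
except StopIteration:
    print("The twelfth call raised StopIteration.")
```

After the loop, f holds the last float, 1.0, and the iterator has nothing left to give, whereas li can still be indexed and looped over as often as you like.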