Chapter 9. The Iterator Pattern
We've discussed how many of Python's built-ins and idioms that seem, at first blush, to be non-object-oriented are actually providing access to major objects under the hood. In this chapter, we'll discuss how the for loop that seems so structured is actually a lightweight wrapper around a set of object-oriented principles. We'll also see a variety of extensions to this syntax that automatically create even more types of object. We will cover:
What design patterns are
The iterator protocol—one of the most powerful design patterns
List, set, and dictionary comprehensions
Generators and coroutines
When engineers and architects decide to build a bridge, a tower, or a building, they follow certain principles to ensure structural integrity. There are various possible designs for bridges (suspension or cantilever, for example), but if the engineer uses neither one of the standard designs nor a brilliant new one, the resulting bridge is likely to collapse.
Design patterns are an attempt to bring this same formal definition for correctly designed structures to software engineering. There are many different design patterns to solve different general problems. People who create design patterns first identify a common problem faced by developers in a wide variety of situations. They then suggest what might be considered the ideal solution for that problem, in terms of object-oriented design.
Knowing a design pattern and choosing to use it in our software does not, however, guarantee that we are creating a "correct" solution. In 1907, the...
In typical design pattern parlance, an iterator is an object with a next() method and a done() method; the latter returns True if there are no items left in the sequence. In a programming language without built-in support for iterators, the iterator would be looped over like this:
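A minimal sketch of such a loop follows. The ClassicIterator class and the collect function are hypothetical names invented for this illustration; only the next() and done() methods come from the description above.

```python
class ClassicIterator:
    """Hypothetical pattern-style iterator exposing next() and done()."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0

    def done(self):
        # True once every item has been handed out
        return self._index >= len(self._items)

    def next(self):
        # Return the current item and advance the position
        item = self._items[self._index]
        self._index += 1
        return item


def collect(iterator):
    """Drive the iterator by hand with an explicit while loop."""
    results = []
    while not iterator.done():
        results.append(iterator.next())
    return results


print(collect(ClassicIterator(["a", "b", "c"])))
```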
In Python, iteration is a special feature, so the method gets a special name, __next__. This method can be accessed using the next(iterator) built-in. Rather than a done method, the iterator protocol raises StopIteration to notify the loop that it has completed. Finally, we have the much more readable for item in iterator syntax to actually access items in an iterator instead of messing around with a while loop. Let's look at these in more detail.
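As a rough sketch of the protocol just described, the hypothetical Countdown class below implements __next__ and raises StopIteration when it runs out. The two helper functions (also invented for this illustration) consume it first with an explicit while loop and next(), then with the equivalent for loop.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself so it can be used directly in a for loop
        return self

    def __next__(self):
        if self.current <= 0:
            # Signals the loop that there are no items left
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def manual_loop(iterator):
    """Roughly what a for loop does on our behalf."""
    results = []
    while True:
        try:
            results.append(next(iterator))
        except StopIteration:
            break
    return results


def for_loop(iterable):
    """The same traversal using the for item in iterator syntax."""
    results = []
    for item in iterable:
        results.append(item)
    return results


print(manual_loop(Countdown(3)))
print(for_loop(Countdown(3)))
```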
The abstract base class Iterator, in the collections.abc module, defines the iterator protocol in Python. As mentioned, it must have a __next__ method that...
Comprehensions are simple, but powerful, syntaxes that allow us to transform or filter an iterable object in as little as one line of code. The resultant object can be a perfectly normal list, set, or dictionary, or it can be a generator expression that can be efficiently consumed in one go.
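To make the four forms concrete, here is a short sketch that applies each of them to the same small list. The words list and the variable names are made up for this illustration.

```python
words = ["apple", "Banana", "apple", "cherry"]

# List comprehension: transform every item, keeping order and duplicates
lengths = [len(w) for w in words]

# List comprehension with a filter: keep only items that pass the test
lowercase_only = [w for w in words if w.islower()]

# Set comprehension: same syntax with braces, duplicates collapse
unique_words = {w.lower() for w in words}

# Dictionary comprehension: build key/value pairs in one pass
length_by_word = {w: len(w) for w in words}

# Generator expression: items are produced lazily and consumed once by sum()
total_length = sum(len(w) for w in words)

print(lengths, lowercase_only, unique_words, length_by_word, total_length)
```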
List comprehensions are one of the most powerful tools in Python, so people tend to think of them as advanced. They're not. Indeed, I've taken the liberty of littering previous examples with comprehensions and assuming you'd understand them. While it's true that advanced programmers use comprehensions a lot, it's not because they're advanced, it's because they're trivial, and handle some of the most common operations in software development.
Let's have a look at one of those common operations; namely, converting a list of items into a list of related items. Specifically, let's assume we just read a list of strings from a file, and now we want to convert it to a list of...
Generator expressions are actually a sort of comprehension too; they compress the more advanced (this time it really is more advanced!) generator syntax into one line. The greater generator syntax looks even less object-oriented than anything we've seen, but we'll discover that once again, it is a simple syntax shortcut to create a kind of object.
Let's take the log file example a little further. If we want to delete the WARNING column from our output file (since it's redundant: this file contains only warnings), we have several options, at various levels of readability. We can do it with a generator expression:
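The sketch below is one way this might look. It assumes a tab-separated log format in which the level appears as its own column, and it uses in-memory io.StringIO objects with invented sample lines in place of real input and output files; the function name strip_warning_column is likewise hypothetical.

```python
import io


def strip_warning_column(lines):
    """Keep only WARNING lines and drop the (assumed tab-separated) level column."""
    return (
        line.replace("\tWARNING", "")
        for line in lines
        if "WARNING" in line
    )


# Invented sample data standing in for the input log file
infile = io.StringIO(
    "Jan 26, 2015 11:25:25\tDEBUG\tThis is a debugging message.\n"
    "Jan 26, 2015 11:25:46\tWARNING\tThis is a warning.\n"
)
outfile = io.StringIO()

# Nothing is filtered until the for loop pulls lines through the generator
for line in strip_warning_column(infile):
    outfile.write(line)

print(outfile.getvalue(), end="")
```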
That's perfectly readable, though I wouldn't want to make the expression much more complicated than that. We...
Coroutines are extremely powerful constructs that are often confused with generators. Many authors inappropriately describe coroutines as "generators with a bit of extra syntax." This is an easy mistake to make, as, way back in Python 2.5, when coroutines were introduced, they were presented as "we added a send method to the generator syntax." This is further complicated by the fact that when you create a coroutine in Python, the object returned is a generator. The difference is actually a lot more nuanced and will make more sense after you've seen a few examples.
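As a first, deliberately small illustration, the hypothetical running_total coroutine below receives values through send and hands back an updated total each time. It is only a sketch of the generator-based style described here, not of any later syntax.

```python
import types


def running_total():
    """Hypothetical coroutine: accumulates every value passed in via send()."""
    total = 0
    while True:
        # yield hands the current total out, then pauses until the next send()
        value = yield total
        total += value


accumulator = running_total()

# Calling the function returns a generator object, as noted above
is_generator = isinstance(accumulator, types.GeneratorType)

# The coroutine must be advanced to its first yield before send() can be used
initial = next(accumulator)

after_first = accumulator.send(10)
after_second = accumulator.send(5)
accumulator.close()

print(is_generator, initial, after_first, after_second)
```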
Note
While coroutines in Python are currently tightly coupled to the generator syntax, they are only superficially related to the iterator protocol we have been discussing. The upcoming (as this is published) Python 3.5 release makes coroutines a truly standalone object and will provide a new syntax to work with them.
The other thing to bear in mind is that coroutines are pretty hard to understand. They are not used all...
One of the fields in which Python is the most popular these days is data science. Let's implement a basic machine learning algorithm! Machine learning is a huge topic, but the general idea is to make predictions or classifications about future data by using knowledge gained from past data. Uses of such algorithms abound, and data scientists are finding new ways to apply machine learning every day. Some important machine learning applications include computer vision (such as image classification or facial recognition), product recommendation, identifying spam, and speech recognition. We'll look at a simpler problem: given an RGB color definition, what name would humans identify that color as?
There are more than 16 million colors in the standard RGB color space, and humans have come up with names for only a fraction of them. While there are thousands of names (some quite ridiculous; just go to any car dealership or makeup store), let's build a classifier that attempts to divide...
If you don't use comprehensions in your daily coding very often, the first thing you should do is search through some existing code and find some for loops. See if any of them can be trivially converted to a generator expression or a list, set, or dictionary comprehension.
Test the claim that list comprehensions are faster than for loops. This can be done with the built-in timeit module. Use the help documentation for the timeit.timeit function to find out how to use it. Basically, write two functions that do the same thing, one using a list comprehension, and one using a for loop. Pass each function into timeit.timeit, and compare the results. If you're feeling adventurous, compare generators and generator expressions as well. Testing code using timeit can become addictive, so bear in mind that code does not need to be hyperfast unless it's being executed an immense number of times, such as on a huge input list or file.
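One possible starting point for this exercise is sketched below. The two function names and the workload (squaring a range of integers) are arbitrary choices for illustration; the timings depend on your machine and Python version, so no expected numbers are shown.

```python
import timeit


def squares_with_loop(n=1000):
    """Build the list with an explicit for loop and append()."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_with_comprehension(n=1000):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


# timeit.timeit accepts a callable and runs it `number` times,
# returning the total elapsed time in seconds
loop_time = timeit.timeit(squares_with_loop, number=200)
comprehension_time = timeit.timeit(squares_with_comprehension, number=200)

print(f"for loop:      {loop_time:.4f}s")
print(f"comprehension: {comprehension_time:.4f}s")
```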
Play around with generator functions. Start with basic iterators...
In this chapter, we learned that design patterns are useful abstractions that provide "best practice" solutions for common programming problems. We covered our first design pattern, the iterator, as well as numerous ways that Python uses and abuses this pattern for its own nefarious purposes. The original iterator pattern is extremely object-oriented, but it is also rather ugly and verbose to code around. However, Python's built-in syntax abstracts the ugliness away, leaving us with a clean interface to these object-oriented constructs.
Comprehensions and generator expressions can combine container construction with iteration in a single line. Generator objects can be constructed using the yield syntax. Coroutines look like generators on the outside but serve a much different purpose.
We'll cover several more design patterns in the next two chapters.