Modern Python Standard Library Cookbook

4.4 (7 reviews total)
By Alessandro Molina
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Containers and Data Structures

About this book

The Python 3 Standard Library is a vast array of modules that you can use for developing various kinds of applications. It contains an exhaustive list of libraries, and this book will help you choose the best one to address specific programming problems in Python.

The Modern Python Standard Library Cookbook begins with recipes on containers and data structures and guides you in performing effective text management in Python. You will find Python recipes for command-line operations, networking, filesystems and directories, and concurrent execution. You will learn about Python security essentials in Python and get to grips with various development tools for debugging, benchmarking, inspection, error reporting, and tracing. The book includes recipes to help you create graphical user interfaces for your application. You will learn to work with multimedia components and perform mathematical operations on date and time. The recipes will also show you how to deploy different searching and sorting algorithms on your data.

By the end of the book, you will have acquired the skills needed to write clean code in Python and develop applications that meet your needs.

Publication date:
August 2018
Publisher
Packt
Pages
366
ISBN
9781788830829

 

Chapter 1. Containers and Data Structures

In this chapter, we will cover the following recipes:

  • Counting frequencies—count occurrences of any hashable value
  • Dictionary with fallback—have a fallback value for any missing key
  • Unpacking multiple—keyword arguments—how to use ** more than once
  • Ordered dictionaries—maintaining order of keys in a dictionary
  • MultiDict—dictionary with multiple values per key
  • Prioritizing entries—efficiently get the top of sorted entries
  • Bunch—dictionaries that behave like objects
  • Enumerations—handle a known set of states
 

Introduction


Python has a very easy and flexible set of built-in containers. As a Python developer, there is little you can't achieve with a dict or a list. The convenience of Python dictionaries and lists is such that developers often forget that those have limits. Like any data structure, they are optimized and designed for specific use cases and might be inefficient in some conditions, or even unable to handle them.

Ever tried to put a key in a dictionary twice? Well you can't, because Python dictionaries are designed as hash tables with unique keys, but the MultiDict recipe will show you how to do that. Ever tried to grab the lowest/highest values out of a list without traversing it whole? The list itself can't, but in the Prioritized entries recipe, we will see how to achieve that.

The limits of standard Python containers are well known to Python experts. For that reason, the standard library has grown over the years to overcome those limits, and frequently there are patterns so common that their name is widely recognized, even though they are not formally defined.

 

Counting frequencies


A very common need in many kinds of programs is to count the occurrences of a value or of an event, which means counting frequency. Be it the need to count words in text, count likes on a blog post, or track scores for players of a video game, in the end counting frequency means counting how many we have of a specific value.

The most obvious solution for such a need would be to keep around counters for the things we need to count. If there are two, three, or four, maybe we can just track them in some dedicated variables, but if there are hundreds, it's certainly not feasible to keep around such a large amount of variables and we will quickly end up with a solution based on a container to collect all those counters.

How to do it...

Here are the steps for this recipe:

  1. Suppose we want to track the frequency of words in text; the standard library comes to our rescue and provides us with a very good way to track counts and frequencies, which is through the dedicated collections.Counter object.
  2. The collections.Counter object not only keeps track of frequencies, but provides some dedicated methods to retrieve the most common entries, entries that appear at last once and quickly count any iterable.
  3. Any iterable you provide to the Counter is "counted" for its frequency of values:
>>> txt = "This is a vast world you can't traverse world in a day"
>>>
>>> from collections import Counter
>>> counts = Counter(txt.split())
  1. The result would be exactly what we expect, a dictionary with the frequencies of the words in our phrase:
Counter({'a': 2, 'world': 2, "can't": 1, 'day': 1, 'traverse': 1, 
         'is': 1, 'vast': 1, 'in': 1, 'you': 1, 'This': 1})
  1. Then, we can easily query for the most frequent words:
>>> counts.most_common(2)
[('world', 2), ('a', 2)]

 

  1. Get the frequency of a specific word:
>>> counts['world']
2

Or, get back the total number of occurrences:

>>> sum(counts.values())
12
  1. And we can even apply some set operations on counters, such as joining them, subtracting them, or checking for intersections:
>>> Counter(["hello", "world"]) + Counter(["hello", "you"])
Counter({'hello': 2, 'you': 1, 'world': 1})
>>> Counter(["hello", "world"]) & Counter(["hello", "you"])
Counter({'hello': 1})

How it works...

Our counting code relies on the fact that Counter is just a special kind of dictionary, and that dictionaries can be built by providing an iterable. Each entry in the iterable will be added to the dictionary.

In the case of a counter, adding an element means incrementing its count; for every "word" in our list, we add that word multiple times (one every time it appears in the list), so its value in the Counter continues to get incremented every time the word is encountered.

There's more...

Relying on Counter is actually not the only way to track frequencies; we already know that Counter is a special kind of dictionary, so reproducing the Counter behavior should be quite straightforward.

Probably every one of us came up with a dictionary in this form:

counts = dict(hello=0, world=0, nice=0, day=0)

 

Whenever we face a new occurrence of hello, world, nice, or day, we increment the associated value in the dictionary and call it a day:

for word in 'hello world this is a very nice day'.split():
    if word in counts:
        counts[word] += 1

By relying on dict.get, we can also easily adapt it to count any word, not just those we could foresee:

for word in 'hello world this is a very nice day'.split():
    counts[word] = counts.get(word, 0) + 1

But the standard library actually provides a very flexible tool that we can use to improve this code even further, collections.defaultdict.

defaultdict is a plain dictionary that won't throw KeyError for any missing value, but will call a function we can provide to generate the missing value.

So, something such as defaultdict(int) will create a dictionary that provides 0 for any key that it doesn't have, which is very convenient for our counting purpose:

from collections import defaultdict

counts = defaultdict(int)
for word in 'hello world this is a very nice day'.split():
    counts[word] += 1

The result will be exactly what we expect:

defaultdict(<class 'int'>, {'day': 1, 'is': 1, 'a': 1, 'very': 1, 'world': 1, 'this': 1, 'nice': 1, 'hello': 1})

As for each word, the first time we face it, we will call int to get the starting value and then add 1 to it. As int gives 0 when called without any argument, that achieves what we want.

While this roughly solves our problem, it's far from being a complete solution for counting—we track frequencies, but on everything else, we are on our own. What if we want to know the most frequent entry in our bag of words?

The convenience of Counter is based on the set of additional features specialized for counting that it provides; it's not just a dictionary with a default numeric value, it's a class specialized in keeping track of frequencies and providing convenient ways to access them.

 

 

Dictionary with fallback


When working with configuration values, it's common to look them up in multiple places—maybe we load them from a configuration file—but we can override them with an environment variable or a command-line option, and in case the option is not provided, we can have a default value.

This can easily lead to long chains of if statements like these:

value = command_line_options.get('optname')
if value is None:
    value = os.environ.get('optname')
if value is None:
    value = config_file_options.get('optname')
if value is None:
    value = 'default-value'

This is annoying, and while for a single value it might be just annoying, it will tend to grow into a huge, confusing list of conditions as more options get added.

Command-line options are a very frequent use case, but the problem is related to chained scopes resolution. Variables in Python are resolved by looking at locals(); if they are not found, the interpreter looks at globals(), and if they are not yet found, it looks for built-ins.

How to do it...

For this step, you need to go through the following steps:

  1. The alternative for chaining default values of dict.get, instead of using multiple if instances, probably wouldn't improve much the code and if we want to add one additional scope, we would have to add it in every single place where we are looking up the values.
  2. collections.ChainMap is a very convenient solution to this problem; we can provide a list of mapping containers and it will look for a key through them all.

 

  1. Our previous example involving multiple different if instances can be converted to something like this:
import os
from collections import ChainMap

options = ChainMap(command_line_options, os.environ, config_file_options)
value = options.get('optname', 'default-value')
  1. We can also get rid of the last .get call by combining ChainMap with defaultdict. In this case, we can use defaultdict to provide a default value for every key:
import os
from collections import ChainMap, defaultdict

options = ChainMap(command_line_options, os.environ, config_file_options,
                   defaultdict(lambda: 'default-value'))
value = options['optname']
value2 = options['other-option']
  1. Print value and value2 will result in the following:
optvalue
default-value

optname will be retrieved from the command_line_options containing it, while other-option will end up being resolved by defaultdict.

How it works...

The ChainMap class receives multiple dictionaries as arguments; whenever a key is requested to ChainMap, it's actually going through the provided dictionaries one by one to check whether the key is available in any of them. Once the key is found, it is returned, as if it was a key owned by ChainMap itself.

The default value for options that are not provided is implemented by having defaultdict as the last dictionary provided to ChainMap. Whenever a key is not found in any of the previous dictionaries, it gets looked up in defaultdict, which uses the provided factory function to return a default value for all keys.

There's more...

Another great feature of ChainMap is that it allows updating too, but instead of updating the dictionary where it found the key, it always updates the first dictionary. The result is the same, as on next lookup of that key, we would have the first dictionary override any other value for that key (as it's the first place where the key is checked). The advantage is that if we provide an empty dictionary as the first mapping provided to ChainMap, we can change those values without touching the original container:

>>> population=dict(italy=60, japan=127, uk=65)
>>> changes = dict()
>>> editablepop = ChainMap(changes, population)

>>> print(editablepop['japan'])
127
>>> editablepop['japan'] += 1
>>> print(editablepop['japan'])
128

But even though we changed the population of Japan to 128 million, the original population didn't change:

>>> print(population['japan'])
127

And we can even use changes to find out which values were changed and which values were not:

>>> print(changes.keys()) 
dict_keys(['japan']) 
>>> print(population.keys() - changes.keys()) 
{'italy', 'uk'}

It's important to know, by the way, that if the object contained in the dictionary is mutable and we directly mutate it, there is little ChainMap can do to avoid mutating the original object. So if, instead of numbers, we store lists in the dictionaries, we will be mutating the original dictionary whenever we append values to the dictionary:

>>> citizens = dict(torino=['Alessandro'], amsterdam=['Bert'], raleigh=['Joseph'])
>>> changes = dict() 
>>> editablecits = ChainMap(changes, citizens) 
>>> editablecits['torino'].append('Simone') 
>>> print(editablecits['torino']) ['Alessandro', 'Simone']
>>> print(changes)
{}
>>> print(citizens)
{'amsterdam': ['Bert'], 
 'torino': ['Alessandro', 'Simone'], 
 'raleigh': ['Joseph']}
 

Unpacking multiple keyword arguments


Frequently, you ended up in a situation where you had to provide arguments to a function from a dictionary. If you've ever faced that need, you probably also ended up in a case where you had to take the arguments from multiple dictionaries.

Generally, Python functions accept arguments from a dictionary through unpacking (the ** syntax), but so far, it hasn't been possible to use unpacking twice in the same call, nor was there an easy way to merge two dictionaries.

How to do it...

The steps for this recipe are:

  1. Given a function, f, we want to pass the arguments from two dictionaries, d1 and d2 as follows:
>>> def f(a, b, c, d):
...     print (a, b, c, d)
...
>>> d1 = dict(a=5, b=6)
>>> d2 = dict(b=7, c=8, d=9)
  1. collections.ChainMap can help us achieve what we want; it can cope with duplicated entries and works with any Python version:
>>> f(**ChainMap(d1, d2))
5 6 8 9
  1. In Python 3.5 and newer versions, you can also create a new dictionary by combining multiple dictionaries through the literal syntax, and then pass the resulting dictionary as the argument of the function:
>>> f(**{**d1, **d2})
5 7 8 9
  1. In this case, the duplicated entries are accepted too, but are handled in reverse order of priority to ChainMap (so right to left). Notice how b has a value of 7, instead of the 6 it had with ChainMap, due to the reversed order of priorities.

This syntax might be harder to read due to the amount of unpacking operators involved, and with ChainMap it is probably more explicit what's happening for a reader.

How it works...

As we already know from the previous recipe, ChainMap looks up keys in all the provided dictionaries, so it's like the sum of all the dictionaries. The unpacking operator (**) works by inviting all keys to the container and then providing an argument for each key.

As ChainMap has keys resulting from the sum of all the provided dictionaries keys, it will provide the keys contained in all the dictionaries to the unpacking operator, thus allowing us to provide keyword arguments from multiple dictionaries.

There's more...

Since Python 3.5 through PEP 448, it's now possible to unpack multiple mappings to provide keyword arguments:

>>> def f(a, b, c, d):
...     print (a, b, c, d)
...
>>> d1 = dict(a=5, b=6)
>>> d2 = dict(c=7, d=8)
>>> f(**d1, **d2)
5 6 7 8

This solution is very convenient, but has two limits:

  • It's only available in Python 3.5+
  • It chokes on duplicated arguments

If you don't know where the mappings/dictionaries you are unpacking come from, it's easy to end up with the issue of duplicated arguments:

>>> d1 = dict(a=5, b=6)
>>> d2 = dict(b=7, c=8, d=9)
>>> f(**d1, **d2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: f() got multiple values for keyword argument 'b'

In the previous example, the b key is declared in both d1 and d2, and that causes the function to complain that it received duplicate arguments.

 

Ordered dictionaries


One of the most surprising aspects of Python dictionaries for new users is that their order is unpredictable and can change from environment to environment. So, the order of keys you expected on your system might be totally different on your friend's computer.

This frequently causes unexpected failures during tests; if a continuous integration system is involved, the ordering of dictionary keys on the system running the tests can be different from the ordering on your system, which might lead to random failures.

Suppose you have a snippet of code that generates an HTML tag with some attributes:

>>> attrs = dict(style="background-color:red", id="header")
>>> '<span {}>'.format(' '.join('%s="%s"' % a for a in attrs.items()))
'<span id="header" style="background-color:red">'

It might surprise you that on some systems you end up with this:

'<span id="header" style="background-color:red">'

While on others, the result might be this:

'<span style="background-color:red" id="header">'

So, if you expect to be able to compare the resulting string to check whether your function did the right thing when generating this tag, you might be disappointed.

How to do it...

Keys ordering is a very convenient feature and in some cases, it's actually necessary, so the Python standard library comes to help and provides the collections.OrderedDict container.

 

 

In the case of collections.OrderedDict, the keys are always in the order they were inserted in:

>>> attrs = OrderedDict([('id', 'header'), ('style', 'background-color:red')])
>>> '<span {}>'.format(' '.join('%s="%s"' % a for a in attrs.items()))
'<span id="header" style="background-color:red">'

How it works...

OrderedDict stores both a mapping of the keys to their values and a list of keys that is used to preserve the order of them.

So whenever your look for a key, the lookup goes through the mapping, but whenever you want to list the keys or iterate over the container, you go through the list of keys to ensure they are processed in the order they were inserted in.

The main problem when using OrderedDict is that Python on versions before 3.6 didn't guarantee any specific order of keyword arguments:

>>> attrs = OrderedDict(id="header", style="background-color:red")

This would have again introduced a totally random order of keys even though OrderedDict was used. Not because OrderedDict didn't preserve the order of those keys, but because it would have received them in a random order.

Thanks to PEP 468, the order of arguments is now guaranteed in Python 3.6 and newer versions (the order of dictionaries is still not, remember; so far it's just by chance that they are ordered). So if you are using Python 3.6 or newer, our previous example would work as expected, but if you are on older versions of Python, you would end up with a random order.

Thankfully, this is an issue that is easily solved. Like standard dictionaries, OrderedDict supports any iterable as the source of its content. As long as the iterable provides a key and a value, it can be used to build OrderedDict.

So by providing the keys and values in a tuple, we can provide them at construction time and preserve the order in any Python version:

>>> OrderedDict((('id', 'header'), ('style', 'background-color:red')))
OrderedDict([('id', 'header'), ('style', 'background-color:red')])

There's more...

Python 3.6 introduced a guarantee of preserving the order of dictionary keys as a side effect of some changes to dictionaries, but it was considered an internal implementation detail and not a language guarantee. Since Python 3.7, it became an official feature of the language so it's actually safe to rely on dictionary ordering if you are using Python 3.6 or newer.

 

MultiDict


If you have ever need to provide a reverse mapping, you have probably discovered that Python lacks a way to store more than a value for each key in a dictionary. This is a very common need, and most languages provide some form of multimap container.

Python tends to prefer having a single way of doing things, and as storing multiple values for the key means just storing a list of values for a key, it doesn't provide a specialized container.

The issue with storing a list of values is that to be able to append to values to our dictionary, the list must already exist.

How to do it...

Proceed with the following steps for this recipe:

  1. As we already know, defaultdict will create a default value by calling the provided callable for every missing key. We can provide the list constructor as a callable:
>>> from collections import defaultdict
>>> rd = defaultdict(list)
  1. So, we insert keys into our multimap by using rd[k].append(v) instead of the usual rd[k] = v:
>>> for name, num in [('ichi', 1), ('one', 1), ('uno', 1), ('un', 1)]:
...   rd[num].append(name)
...
>>> rd
defaultdict(<class 'list'>, {1: ['ichi', 'one', 'uno', 'un']})

How it works...

MultiDict works by storing a list for each key. Whenever a key is accessed, the list containing all the values for that key is retrieved.

In the case of missing keys, an empty list will be provided so that values can be added for that key.

This works because every time defaultdict faces a missing key, it will insert it with a value generated by calling list. And calling list will actually provide an empty list. So, doing rd[v] will always provide a list, empty or not, depending on whether v was an already existing key or not. Once we have our list, adding a new value is just a matter of appending it.

There's more...

Dictionaries in Python are associative containers where keys are unique. A key can appear a single time and has exactly one value.

If we want to support multiple values per key, we can actually solve the need by saving list as the value of our key. This list can then contain all the values we want to keep around for that key:

>>> rd = {1: ['one', 'uno', 'un', 'ichi'],
...       2: ['two', 'due', 'deux', 'ni'],
...       3: ['three', 'tre', 'trois', 'san']}
>>> rd[2]
['two', 'due', 'deux', 'ni']

If we want to add a new translation to 2 (Spanish, for example), we would just have to append the entry:

>>> rd[2].append('dos')
>>> rd[2]
['two', 'due', 'deux', 'ni', 'dos']

The problem arises when we want to introduce a new key:

>>> rd[4].append('four')
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
KeyError: 4

For key 4, no list exists, so there is nowhere we can append it. So, our snippet to automatically reverse the mapping can't be easily adapted to handle multiple values, as it would fail with key errors the first time it tries to insert a value:

>>> rd = {}
>>> for k,v in d.items():
...     rd[v].append(k)
Traceback (most recent call last):
    File "<stdin>", line 2, in <module>
KeyError: 1

Checking for every single entry, whether it's already in the dictionary or not, and acting accordingly is not very convenient. While we can rely on the setdefault method of dictionaries to hide that check, we can get a far more elegant solution by using collections.defaultdict.

 

Prioritizing entries


Picking the first/top entry of a set of values is a pretty frequent need; this usually means defining one value that has priority over the other and involves sorting.

But sorting can be expensive and re-sorting every time you add an entry to your values is certainly not a very convenient way to pick the first entry out of a set of values with some kind of priority.

How to do it...

Heaps are a perfect match for everything that has priorities, such as a priority queue:

import time
import heapq

class PriorityQueue:
    def __init__(self):
        self._q = []

    def add(self, value, priority=0):
        heapq.heappush(self._q, (priority, time.time(), value))

    def pop(self):
        return heapq.heappop(self._q)[-1]

 

 

Then, our PriorityQueue can be used to retrieve entries given a priority:

>>> def f1(): print('hello')
>>> def f2(): print('world')
>>>
>>> pq = PriorityQueue()
>>> pq.add(f2, priority=1)
>>> pq.add(f1, priority=0)
>>> pq.pop()()
hello
>>> pq.pop()()
world

 

How it works...

PriorityQueue works by storing everything in an heap. Heaps are particularly efficient at retrieving the top/first element of a sorted set without having to actually sort the whole set.

Our priority queue stores all the values in a three-element tuple: priority, time.time(), and value.

The first entry of our tuple is priority (lower is better). In the example, we recorded f1 with a better priority than f2, which ensures than when we use heap.heappop to fetch tasks to process, we get f1 and then f2, so that we end up with the hello world message and not world hello.

The second entry, timestamp, is used to ensure that tasks that have the same priority are processed in their insertion order. The oldest task will be served first as it will have the smallest timestamp.

Then, we have the value itself, which is the function we want call for our task.

There's more...

A very common approach to sorting is to keep a list of entries in a tuple, where the first element is key for which we are sorting and the second element is the value itself.

For a scoreboard, we can keep each player's name and how many points they got:

scores = [(123, 'Alessandro'),
          (143, 'Chris'),
          (192, 'Mark']

 

Storing those values in tuples works because comparing two tuples is performed by comparing each element of the first tuple with the element in the same index position in the other tuple:

>>> (10, 'B') > (10, 'A')
True
>>> (11, 'A') > (10, 'B')
True

It's very easy to understand what's going on if you think about strings. 'BB' > 'BB' is the same as ('B', 'B') > ('B', 'A'); in the end, a string is just a list of characters.

We can use this property to sort our scores and retrieve the winner of a competition:

>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

The major problem with this approach is that every time we add an entry to our list, we have to sort it again, or our scoreboard would became meaningless:

>>> scores.append((137, 'Rick'))
>>> scores[-1]
(137, 'Rick')
>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

This is very inconvenient because it's easy to miss re-sorting somewhere if we have multiple places appending to the list, and sorting the whole list every time can be expensive.

The Python standard library offers a data structure that is a perfect match when we're interested in finding out the winner of a competition.

In the heapq module, we have a fully working implementation of a heap data structure, a particular kind of tree where each parent is smaller than its children. This provides us with a tree that has a very interesting property: the root element is always the smallest one.

And being implemented on top of a list, it means that l[0] is always the smallest element in a heap:

>>> import heapq
>>> l = []
>>> heapq.heappush(l, (192, 'Mark'))
>>> heapq.heappush(l, (123, 'Alessandro'))
>>> heapq.heappush(l, (137, 'Rick'))
>>> heapq.heappush(l, (143, 'Chris'))
>>> l[0]
(123, 'Alessandro')

You might have noticed, by the way, that the heap finds the loser of our tournament, not the winner, and we were interested in finding the best player, with the highest value.

This is a minor problem we can easily solve by storing all scores as negative numbers. If we store each score as * -1, the head of the heap will always be the winner:

>>> l = []
>>> heapq.heappush(l, (-143, 'Chris'))
>>> heapq.heappush(l, (-137, 'Rick'))
>>> heapq.heappush(l, (-123, 'Alessandro'))
>>> heapq.heappush(l, (-192, 'Mark'))
>>> l[0]
(-192, 'Mark')
 

Bunch


Python is very good at shapeshifting objects. Each instance can have its own attributes and it's absolutely legal to add/remove the attributes of an object at runtime.

Once in a while, our code needs to deal with data of unknown shapes. For example, in the case of a user-submitted data, we might not know which fields the user is providing; maybe some of our users have a first name, some have a surname, and some have one or more middle name fields.

If we are not processing this data ourselves, but are just providing it to some other function, we really don't care about the shape of the data; as long as our objects have those attributes, we are fine.

A very common case is when working with protocols, if you are an HTTP server, you might want to provide to the application running behind you a request object. This object has a few known attributes, such as host and path, and it might have some optional attributes, such as a query string or a content type. But, it can also have any attribute the client provided, as HTTP is pretty flexible regarding headers, and our clients could have provided an x-totally-custom-header that we might have to expose to our code.

When representing this kind of data, Python developers often tend to look at dictionaries. In the end, Python objects themselves are built on top of dictionaries and they fit the need to map arbitrary values to names.

So, we will probably end up with something like the following:

>>> request = dict(host='www.example.org', path='/index.html')

A side effect of this approach is pretty clear once we have to pass this object around, especially to third-party code. Functions usually work with objects, and while they don't require a specific kind of object as duck-typing is the standard in Python, they will expect certain attributes to be there.

Another very common example is when writing tests, Python being a duck-typed language, it's absolutely reasonable to want to provide a fake object instead of providing a real instance of the object, especially when we need to simulate the values of some properties (as declared with @property), so we don't want or can't afford to create real instances of the object.

In such cases, using a dictionary is not viable as it will only provide access to its values through the request['path'] syntax and not through request.path, as probably expected by the functions we are providing our object to.

Also, the more we end up accessing this value, the more it's clear that the syntax using dot notation conveys the feeling of an entity that collaborates to the intent of the code, while a dictionary conveys the feeling of plain data.

As soon as we remember that Python objects can change shape at any time, we might be tempted to try creating an object instead of a dictionary. Unfortunately, we won't be able to provide the attributes at initialization time:

>>> request = object(host='www.example.org', path='/index.html')
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: object() takes no parameters

Things don't improve much if we try to assign those attributes after the object is built:

>>> request = object()
>>> request.host = 'www.example.org'
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'host'

 

 

How to do it...

With a little effort, we can create a class that leverages dictionaries to contain any attribute we want and allow access both as a dictionary and through properties:

>>> class Bunch(dict):
...    def __getattribute__(self, key):
...        try: 
...            return self[key]
...        except KeyError:
...            raise AttributeError(key)
...    
...    def __setattr__(self, key, value): 
...        self[key] = value
...
>>> b = Bunch(a=5)
>>> b.a
5
>>> b['a']
5

How it works...

The Bunch class inherits dict, mostly as a way to provide a context where values can be stored, then most of the work is done by __getattribute__ and __setattr__. So, for any attribute that is retrieved or set on the object, they will just retrieve or set a key in self (remember we inherited from dict, so self is in fact a dictionary).

This allows the Bunch class to store and retrieve any value as an attribute of the object. The convenient feature is that it can behave both as an object and as a dict in most contexts.

For example, it is possible to find out all the values that it contains, like any other dictionary:

>>> b.items()
dict_items([('a', 5)])

It is also able to access those as attributes:

>>> b.c = 7
>>> b.c
7
>>> b.items()
dict_items([('a', 5), ('c', 7)])

 

 

There's more...

Our bunch implementation is not yet complete, as it will fail any test for class name (it's always named Bunch) and any test for inheritance, thus failing at faking other objects.

The first step is to make Bunch able to shapeshift not only its properties, but also its name. This can be achieved by creating a new class dynamically every time we create Bunch. The class will inherit from Bunch and will do nothing apart from providing a new name:

>>> class BunchBase(dict):
...    def __getattribute__(self, key):
...        try: 
...            return self[key]
...        except KeyError:
...            raise AttributeError(key)
...    
...    def __setattr__(self, key, value): 
...        self[key] = value
...
>>> def Bunch(_classname="Bunch", **attrs):
...     return type(_classname, (BunchBase, ), {})(**attrs)
>>>

The Bunch function moved from being the class itself to being a factory that will create objects that all act as Bunch, but can have different classes. Each Bunch will be a subclass of BunchBase, where the _classname name can be provided when Bunch is created:

>>> b = Bunch("Request", path="/index.html", host="www.example.org")
>>> print(b)
{'path': '/index.html', 'host': 'www.example.org'}
>>> print(b.path)
/index.html
>>> print(b.host)
www.example.org

This will allow us to create as many kinds of Bunch objects as we want, and each will have its own custom type:

>>> print(b.__class__)
<class '__main__.Request'>

The next step is to make our Bunch actually look like any other type that it has to impersonate. That is needed for the case where we want to use Bunch in place of another object. As Bunch can have any kind of attribute, it can take the place of any kind of object, but to be able to, it has to pass type checks for custom types.

 

 

We need to go back to our Bunch factory and make the Bunch objects not only have a custom class name, but also appear to be inherited from a custom parent.

To better understand what's going on, we will declare an example Person type; this type will be the one our Bunch objects will try to fake:

class Person(object):
    def __init__(name, surname):
        self.name = name
        self.surname = surname

    @property
    def fullname(self):
        return '{} {}'.format(self.name, self.surname)

Specifically, we are going to print Hello Your Name through a custom print function that only works for Person:

def hello(p):
    if not isinstance(p, Person):
        raise ValueError("Sorry, can only greet people")
    print("Hello {}".format(p.fullname))

We want to change our Bunch factory to accept the class and create a new type out of it:

def Bunch(_classname="Bunch", _parent=None, **attrs):
    parents = (_parent, ) if parent else tuple()
    return type(_classname, (BunchBase, ) + parents, {})(**attrs)

Now, our Bunch objects will appear as instances of a class named what we wanted, and will always appear as a subclass of _parent:

>>> p = Bunch("Person", Person, fullname='Alessandro Molina')
>>> hello(p)
Hello Alessandro Molina

Bunch can be a very convenient pattern; in both its complete and simplified versions, it is widely used in many frameworks with various implementations that all achieve pretty much the same result.

The showcased implementation is interesting because it gives us a clear idea of what's going on. There are ways to implement Bunch that are very smart, but might make it hard to guess what's going on and customize it.

Another possible way to implement the Bunch pattern is by patching the __dict__ class, which contains all the attributes of the class:

class Bunch(dict):
    def __init__(self, **kwds):
        super().__init__(**kwds)
        self.__dict__ = self

In this form, whenever Bunch is created, it will populate its values as a dict (by calling super().__init__, which is the dict initialization) and then, once all the attributes provided are stored in dict, it swaps the __dict__ object, which is the dictionary that contains all object attributes, with self. This makes the dict that was just populated with all the values also the dict that contains all the attributes of the object.

Our previous implementation worked by replacing the way we looked for attributes, while this implementation replaces the place where we look for attributes.

 

Enumerations


Enumeration is a common way to store values that can only represent a few states. Each symbolic name is bound to a specific value, usually numeric, that represents the states the enumeration can have.

Enumerations are very common in other programming languages, but until recently, Python didn't have any explicit support for enumerations.

How to do it...

Typically, enumerations are implemented by mapping symbolic names to numeric values; this is allowed in Python through enum.IntEnum:

>>> from enum import IntEnum
>>> 
>>> class RequestType(IntEnum):
...     POST = 1
...     GET = 2
>>>
>>> request_type = RequestType.POST
>>> print(request_type)
RequestType.POST

How it works...

IntEnum is an integer, apart from the fact that all possible values are created when the class is defined. IntEnum inherits from int, so its values are real integers.

During the RequestType definition, all the possible values for enum are declared within the class body and the values are verified against duplicates by the metaclass.

Also, enum provides support for a special value, auto, which means just put in a value, I don't care. As you usually only care whether it's POST or GET, you usually don't care whether POST is 1 or 2.

Last but not least, enumerations cannot be subclassed if they define at least one possible value.

There's more...

IntEnum values behave like int in most cases, which is usually convenient, but they can cause problems if the developer doesn't pay attention to the type.

For example, a function might unexpectedly perform the wrong thing if another enumeration or an integer value is provided, instead of the proper enumeration value:

>>> def do_request(kind):
...    if kind == RequestType.POST:
...        print('POST')
...    else:
...        print('OTHER')

As an example, invoking do_request with RequestType.POST or 1 will do exactly the same thing:

>>> do_request(RequestType.POST)
POST
>>> do_request(1)
POST

When we want to avoid treating our enumerations as numbers, we can use enum.Enum, which provides enumerated values that are not considered plain numbers:

>>> from enum import Enum
>>> 
>>> class RequestType(Enum):
...     POST = 1
...     GET = 2
>>>
>>> do_request(RequestType.POST)
POST
>>> do_request(1)
OTHER

So generally, if you need a simple set of enumerated values or possible states that rely on enum, Enum is safer, but if you need a set of numeric values that rely on enum, IntEnum will ensure that they behave like numbers.

About the Author

  • Alessandro Molina

    Alessandro Molina has been a Python developer since 2001, and has always been interested in Python as a web development platform. He has worked as a CTO and a team leader of Python teams for the past 10 years and is currently the core developer of the TurboGears2 web framework and maintainer of Beaker Caching/Session framework. He authored the DEPOT file storage framework and the DukPy JavaScript interpreter for Python and has collaborated with various Python projects related to web development, such as FormEncode, ToscaWidgets, and the Ming MongoDB ORM.

    Browse publications by this author

Latest Reviews

(7 reviews total)
Good book
My books have been OK so far. More 80% off might be better
Bon ouvrage sur python modernes

Recommended For You

Expert Python Programming - Third Edition

Refine your Python programming skills and build professional grade applications with this comprehensive guide

By Michał Jaworski and 1 more
Mastering Python for Finance - Second Edition

Take your financial skills to the next level by mastering cutting-edge mathematical and statistical financial applications

By James Ma Weiming
Mastering Python 3.x - Third Edition [Video]

Build high-quality Python programs

By Daniel Arbuckle
Mastering Object-Oriented Python - Second Edition

Gain comprehensive insights into programming practices, and code portability and reuse to build flexible and maintainable apps using object-oriented principles

By Steven F. Lott