Search icon
Subscription
0
Cart icon
Close icon
You have no products in your basket yet
Save more on your purchases!
Savings automatically calculated. No voucher code required
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
$9.99 | ALL EBOOKS & VIDEOS
Over 7,000 tech titles at $9.99 each with AI-powered learning assistants on new releases
Modern Python Standard Library Cookbook
Modern Python Standard Library Cookbook

Modern Python Standard Library Cookbook: Over 100 recipes to fully leverage the features of the standard library in Python

By Alessandro Molina
$43.99 $9.99
Book Aug 2018 366 pages 1st Edition
eBook
$43.99 $9.99
Print
$54.99
Subscription
$15.99 Monthly
eBook
$43.99 $9.99
Print
$54.99
Subscription
$15.99 Monthly

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now
Table of content icon View table of contents Preview book icon Preview Book

Modern Python Standard Library Cookbook

Chapter 1. Containers and Data Structures

In this chapter, we will cover the following recipes:

  • Counting frequencies—count occurrences of any hashable value
  • Dictionary with fallback—have a fallback value for any missing key
  • Unpacking multiple—keyword arguments—how to use ** more than once
  • Ordered dictionaries—maintaining order of keys in a dictionary
  • MultiDict—dictionary with multiple values per key
  • Prioritizing entries—efficiently get the top of sorted entries
  • Bunch—dictionaries that behave like objects
  • Enumerations—handle a known set of states

Introduction


Python has a very easy and flexible set of built-in containers. As a Python developer, there is little you can't achieve with a dict or a list. The convenience of Python dictionaries and lists is such that developers often forget that those have limits. Like any data structure, they are optimized and designed for specific use cases and might be inefficient in some conditions, or even unable to handle them.

Ever tried to put a key in a dictionary twice? Well you can't, because Python dictionaries are designed as hash tables with unique keys, but the MultiDict recipe will show you how to do that. Ever tried to grab the lowest/highest values out of a list without traversing it whole? The list itself can't, but in the Prioritized entries recipe, we will see how to achieve that.

The limits of standard Python containers are well known to Python experts. For that reason, the standard library has grown over the years to overcome those limits, and frequently there are patterns so common that their name is widely recognized, even though they are not formally defined.

Counting frequencies


A very common need in many kinds of programs is to count the occurrences of a value or of an event, which means counting frequency. Be it the need to count words in text, count likes on a blog post, or track scores for players of a video game, in the end counting frequency means counting how many we have of a specific value.

The most obvious solution for such a need would be to keep around counters for the things we need to count. If there are two, three, or four, maybe we can just track them in some dedicated variables, but if there are hundreds, it's certainly not feasible to keep around such a large amount of variables and we will quickly end up with a solution based on a container to collect all those counters.

How to do it...

Here are the steps for this recipe:

  1. Suppose we want to track the frequency of words in text; the standard library comes to our rescue and provides us with a very good way to track counts and frequencies, which is through the dedicated collections.Counter object.
  2. The collections.Counter object not only keeps track of frequencies, but provides some dedicated methods to retrieve the most common entries, entries that appear at last once and quickly count any iterable.
  3. Any iterable you provide to the Counter is "counted" for its frequency of values:
>>> txt = "This is a vast world you can't traverse world in a day"
>>>
>>> from collections import Counter
>>> counts = Counter(txt.split())
  1. The result would be exactly what we expect, a dictionary with the frequencies of the words in our phrase:
Counter({'a': 2, 'world': 2, "can't": 1, 'day': 1, 'traverse': 1, 
         'is': 1, 'vast': 1, 'in': 1, 'you': 1, 'This': 1})
  1. Then, we can easily query for the most frequent words:
>>> counts.most_common(2)
[('world', 2), ('a', 2)]

 

  1. Get the frequency of a specific word:
>>> counts['world']
2

Or, get back the total number of occurrences:

>>> sum(counts.values())
12
  1. And we can even apply some set operations on counters, such as joining them, subtracting them, or checking for intersections:
>>> Counter(["hello", "world"]) + Counter(["hello", "you"])
Counter({'hello': 2, 'you': 1, 'world': 1})
>>> Counter(["hello", "world"]) & Counter(["hello", "you"])
Counter({'hello': 1})

How it works...

Our counting code relies on the fact that Counter is just a special kind of dictionary, and that dictionaries can be built by providing an iterable. Each entry in the iterable will be added to the dictionary.

In the case of a counter, adding an element means incrementing its count; for every "word" in our list, we add that word multiple times (one every time it appears in the list), so its value in the Counter continues to get incremented every time the word is encountered.

There's more...

Relying on Counter is actually not the only way to track frequencies; we already know that Counter is a special kind of dictionary, so reproducing the Counter behavior should be quite straightforward.

Probably every one of us came up with a dictionary in this form:

counts = dict(hello=0, world=0, nice=0, day=0)

 

Whenever we face a new occurrence of hello, world, nice, or day, we increment the associated value in the dictionary and call it a day:

for word in 'hello world this is a very nice day'.split():
    if word in counts:
        counts[word] += 1

By relying on dict.get, we can also easily adapt it to count any word, not just those we could foresee:

for word in 'hello world this is a very nice day'.split():
    counts[word] = counts.get(word, 0) + 1

But the standard library actually provides a very flexible tool that we can use to improve this code even further, collections.defaultdict.

defaultdict is a plain dictionary that won't throw KeyError for any missing value, but will call a function we can provide to generate the missing value.

So, something such as defaultdict(int) will create a dictionary that provides 0 for any key that it doesn't have, which is very convenient for our counting purpose:

from collections import defaultdict

counts = defaultdict(int)
for word in 'hello world this is a very nice day'.split():
    counts[word] += 1

The result will be exactly what we expect:

defaultdict(<class 'int'>, {'day': 1, 'is': 1, 'a': 1, 'very': 1, 'world': 1, 'this': 1, 'nice': 1, 'hello': 1})

As for each word, the first time we face it, we will call int to get the starting value and then add 1 to it. As int gives 0 when called without any argument, that achieves what we want.

While this roughly solves our problem, it's far from being a complete solution for counting—we track frequencies, but on everything else, we are on our own. What if we want to know the most frequent entry in our bag of words?

The convenience of Counter is based on the set of additional features specialized for counting that it provides; it's not just a dictionary with a default numeric value, it's a class specialized in keeping track of frequencies and providing convenient ways to access them.

 

Dictionary with fallback


When working with configuration values, it's common to look them up in multiple places—maybe we load them from a configuration file—but we can override them with an environment variable or a command-line option, and in case the option is not provided, we can have a default value.

This can easily lead to long chains of if statements like these:

value = command_line_options.get('optname')
if value is None:
    value = os.environ.get('optname')
if value is None:
    value = config_file_options.get('optname')
if value is None:
    value = 'default-value'

This is annoying, and while for a single value it might be just annoying, it will tend to grow into a huge, confusing list of conditions as more options get added.

Command-line options are a very frequent use case, but the problem is related to chained scopes resolution. Variables in Python are resolved by looking at locals(); if they are not found, the interpreter looks at globals(), and if they are not yet found, it looks for built-ins.

How to do it...

For this step, you need to go through the following steps:

  1. The alternative for chaining default values of dict.get, instead of using multiple if instances, probably wouldn't improve much the code and if we want to add one additional scope, we would have to add it in every single place where we are looking up the values.
  2. collections.ChainMap is a very convenient solution to this problem; we can provide a list of mapping containers and it will look for a key through them all.

 

  1. Our previous example involving multiple different if instances can be converted to something like this:
import os
from collections import ChainMap

options = ChainMap(command_line_options, os.environ, config_file_options)
value = options.get('optname', 'default-value')
  1. We can also get rid of the last .get call by combining ChainMap with defaultdict. In this case, we can use defaultdict to provide a default value for every key:
import os
from collections import ChainMap, defaultdict

options = ChainMap(command_line_options, os.environ, config_file_options,
                   defaultdict(lambda: 'default-value'))
value = options['optname']
value2 = options['other-option']
  1. Print value and value2 will result in the following:
optvalue
default-value

optname will be retrieved from the command_line_options containing it, while other-option will end up being resolved by defaultdict.

How it works...

The ChainMap class receives multiple dictionaries as arguments; whenever a key is requested to ChainMap, it's actually going through the provided dictionaries one by one to check whether the key is available in any of them. Once the key is found, it is returned, as if it was a key owned by ChainMap itself.

The default value for options that are not provided is implemented by having defaultdict as the last dictionary provided to ChainMap. Whenever a key is not found in any of the previous dictionaries, it gets looked up in defaultdict, which uses the provided factory function to return a default value for all keys.

There's more...

Another great feature of ChainMap is that it allows updating too, but instead of updating the dictionary where it found the key, it always updates the first dictionary. The result is the same, as on next lookup of that key, we would have the first dictionary override any other value for that key (as it's the first place where the key is checked). The advantage is that if we provide an empty dictionary as the first mapping provided to ChainMap, we can change those values without touching the original container:

>>> population=dict(italy=60, japan=127, uk=65)
>>> changes = dict()
>>> editablepop = ChainMap(changes, population)

>>> print(editablepop['japan'])
127
>>> editablepop['japan'] += 1
>>> print(editablepop['japan'])
128

But even though we changed the population of Japan to 128 million, the original population didn't change:

>>> print(population['japan'])
127

And we can even use changes to find out which values were changed and which values were not:

>>> print(changes.keys()) 
dict_keys(['japan']) 
>>> print(population.keys() - changes.keys()) 
{'italy', 'uk'}

It's important to know, by the way, that if the object contained in the dictionary is mutable and we directly mutate it, there is little ChainMap can do to avoid mutating the original object. So if, instead of numbers, we store lists in the dictionaries, we will be mutating the original dictionary whenever we append values to the dictionary:

>>> citizens = dict(torino=['Alessandro'], amsterdam=['Bert'], raleigh=['Joseph'])
>>> changes = dict() 
>>> editablecits = ChainMap(changes, citizens) 
>>> editablecits['torino'].append('Simone') 
>>> print(editablecits['torino']) ['Alessandro', 'Simone']
>>> print(changes)
{}
>>> print(citizens)
{'amsterdam': ['Bert'], 
 'torino': ['Alessandro', 'Simone'], 
 'raleigh': ['Joseph']}

Unpacking multiple keyword arguments


Frequently, you ended up in a situation where you had to provide arguments to a function from a dictionary. If you've ever faced that need, you probably also ended up in a case where you had to take the arguments from multiple dictionaries.

Generally, Python functions accept arguments from a dictionary through unpacking (the ** syntax), but so far, it hasn't been possible to use unpacking twice in the same call, nor was there an easy way to merge two dictionaries.

How to do it...

The steps for this recipe are:

  1. Given a function, f, we want to pass the arguments from two dictionaries, d1 and d2 as follows:
>>> def f(a, b, c, d):
...     print (a, b, c, d)
...
>>> d1 = dict(a=5, b=6)
>>> d2 = dict(b=7, c=8, d=9)
  1. collections.ChainMap can help us achieve what we want; it can cope with duplicated entries and works with any Python version:
>>> f(**ChainMap(d1, d2))
5 6 8 9
  1. In Python 3.5 and newer versions, you can also create a new dictionary by combining multiple dictionaries through the literal syntax, and then pass the resulting dictionary as the argument of the function:
>>> f(**{**d1, **d2})
5 7 8 9
  1. In this case, the duplicated entries are accepted too, but are handled in reverse order of priority to ChainMap (so right to left). Notice how b has a value of 7, instead of the 6 it had with ChainMap, due to the reversed order of priorities.

This syntax might be harder to read due to the amount of unpacking operators involved, and with ChainMap it is probably more explicit what's happening for a reader.

How it works...

As we already know from the previous recipe, ChainMap looks up keys in all the provided dictionaries, so it's like the sum of all the dictionaries. The unpacking operator (**) works by inviting all keys to the container and then providing an argument for each key.

As ChainMap has keys resulting from the sum of all the provided dictionaries keys, it will provide the keys contained in all the dictionaries to the unpacking operator, thus allowing us to provide keyword arguments from multiple dictionaries.

There's more...

Since Python 3.5 through PEP 448, it's now possible to unpack multiple mappings to provide keyword arguments:

>>> def f(a, b, c, d):
...     print (a, b, c, d)
...
>>> d1 = dict(a=5, b=6)
>>> d2 = dict(c=7, d=8)
>>> f(**d1, **d2)
5 6 7 8

This solution is very convenient, but has two limits:

  • It's only available in Python 3.5+
  • It chokes on duplicated arguments

If you don't know where the mappings/dictionaries you are unpacking come from, it's easy to end up with the issue of duplicated arguments:

>>> d1 = dict(a=5, b=6)
>>> d2 = dict(b=7, c=8, d=9)
>>> f(**d1, **d2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: f() got multiple values for keyword argument 'b'

In the previous example, the b key is declared in both d1 and d2, and that causes the function to complain that it received duplicate arguments.

Ordered dictionaries


One of the most surprising aspects of Python dictionaries for new users is that their order is unpredictable and can change from environment to environment. So, the order of keys you expected on your system might be totally different on your friend's computer.

This frequently causes unexpected failures during tests; if a continuous integration system is involved, the ordering of dictionary keys on the system running the tests can be different from the ordering on your system, which might lead to random failures.

Suppose you have a snippet of code that generates an HTML tag with some attributes:

>>> attrs = dict(style="background-color:red", id="header")
>>> '<span {}>'.format(' '.join('%s="%s"' % a for a in attrs.items()))
'<span id="header" style="background-color:red">'

It might surprise you that on some systems you end up with this:

'<span id="header" style="background-color:red">'

While on others, the result might be this:

'<span style="background-color:red" id="header">'

So, if you expect to be able to compare the resulting string to check whether your function did the right thing when generating this tag, you might be disappointed.

How to do it...

Keys ordering is a very convenient feature and in some cases, it's actually necessary, so the Python standard library comes to help and provides the collections.OrderedDict container.

 

 

In the case of collections.OrderedDict, the keys are always in the order they were inserted in:

>>> attrs = OrderedDict([('id', 'header'), ('style', 'background-color:red')])
>>> '<span {}>'.format(' '.join('%s="%s"' % a for a in attrs.items()))
'<span id="header" style="background-color:red">'

How it works...

OrderedDict stores both a mapping of the keys to their values and a list of keys that is used to preserve the order of them.

So whenever your look for a key, the lookup goes through the mapping, but whenever you want to list the keys or iterate over the container, you go through the list of keys to ensure they are processed in the order they were inserted in.

The main problem when using OrderedDict is that Python on versions before 3.6 didn't guarantee any specific order of keyword arguments:

>>> attrs = OrderedDict(id="header", style="background-color:red")

This would have again introduced a totally random order of keys even though OrderedDict was used. Not because OrderedDict didn't preserve the order of those keys, but because it would have received them in a random order.

Thanks to PEP 468, the order of arguments is now guaranteed in Python 3.6 and newer versions (the order of dictionaries is still not, remember; so far it's just by chance that they are ordered). So if you are using Python 3.6 or newer, our previous example would work as expected, but if you are on older versions of Python, you would end up with a random order.

Thankfully, this is an issue that is easily solved. Like standard dictionaries, OrderedDict supports any iterable as the source of its content. As long as the iterable provides a key and a value, it can be used to build OrderedDict.

So by providing the keys and values in a tuple, we can provide them at construction time and preserve the order in any Python version:

>>> OrderedDict((('id', 'header'), ('style', 'background-color:red')))
OrderedDict([('id', 'header'), ('style', 'background-color:red')])

There's more...

Python 3.6 introduced a guarantee of preserving the order of dictionary keys as a side effect of some changes to dictionaries, but it was considered an internal implementation detail and not a language guarantee. Since Python 3.7, it became an official feature of the language so it's actually safe to rely on dictionary ordering if you are using Python 3.6 or newer.

MultiDict


If you have ever need to provide a reverse mapping, you have probably discovered that Python lacks a way to store more than a value for each key in a dictionary. This is a very common need, and most languages provide some form of multimap container.

Python tends to prefer having a single way of doing things, and as storing multiple values for the key means just storing a list of values for a key, it doesn't provide a specialized container.

The issue with storing a list of values is that to be able to append to values to our dictionary, the list must already exist.

How to do it...

Proceed with the following steps for this recipe:

  1. As we already know, defaultdict will create a default value by calling the provided callable for every missing key. We can provide the list constructor as a callable:
>>> from collections import defaultdict
>>> rd = defaultdict(list)
  1. So, we insert keys into our multimap by using rd[k].append(v) instead of the usual rd[k] = v:
>>> for name, num in [('ichi', 1), ('one', 1), ('uno', 1), ('un', 1)]:
...   rd[num].append(name)
...
>>> rd
defaultdict(<class 'list'>, {1: ['ichi', 'one', 'uno', 'un']})

How it works...

MultiDict works by storing a list for each key. Whenever a key is accessed, the list containing all the values for that key is retrieved.

In the case of missing keys, an empty list will be provided so that values can be added for that key.

This works because every time defaultdict faces a missing key, it will insert it with a value generated by calling list. And calling list will actually provide an empty list. So, doing rd[v] will always provide a list, empty or not, depending on whether v was an already existing key or not. Once we have our list, adding a new value is just a matter of appending it.

There's more...

Dictionaries in Python are associative containers where keys are unique. A key can appear a single time and has exactly one value.

If we want to support multiple values per key, we can actually solve the need by saving list as the value of our key. This list can then contain all the values we want to keep around for that key:

>>> rd = {1: ['one', 'uno', 'un', 'ichi'],
...       2: ['two', 'due', 'deux', 'ni'],
...       3: ['three', 'tre', 'trois', 'san']}
>>> rd[2]
['two', 'due', 'deux', 'ni']

If we want to add a new translation to 2 (Spanish, for example), we would just have to append the entry:

>>> rd[2].append('dos')
>>> rd[2]
['two', 'due', 'deux', 'ni', 'dos']

The problem arises when we want to introduce a new key:

>>> rd[4].append('four')
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
KeyError: 4

For key 4, no list exists, so there is nowhere we can append it. So, our snippet to automatically reverse the mapping can't be easily adapted to handle multiple values, as it would fail with key errors the first time it tries to insert a value:

>>> rd = {}
>>> for k,v in d.items():
...     rd[v].append(k)
Traceback (most recent call last):
    File "<stdin>", line 2, in <module>
KeyError: 1

Checking for every single entry, whether it's already in the dictionary or not, and acting accordingly is not very convenient. While we can rely on the setdefault method of dictionaries to hide that check, we can get a far more elegant solution by using collections.defaultdict.

Prioritizing entries


Picking the first/top entry of a set of values is a pretty frequent need; this usually means defining one value that has priority over the other and involves sorting.

But sorting can be expensive and re-sorting every time you add an entry to your values is certainly not a very convenient way to pick the first entry out of a set of values with some kind of priority.

How to do it...

Heaps are a perfect match for everything that has priorities, such as a priority queue:

import time
import heapq

class PriorityQueue:
    def __init__(self):
        self._q = []

    def add(self, value, priority=0):
        heapq.heappush(self._q, (priority, time.time(), value))

    def pop(self):
        return heapq.heappop(self._q)[-1]

 

 

Then, our PriorityQueue can be used to retrieve entries given a priority:

>>> def f1(): print('hello')
>>> def f2(): print('world')
>>>
>>> pq = PriorityQueue()
>>> pq.add(f2, priority=1)
>>> pq.add(f1, priority=0)
>>> pq.pop()()
hello
>>> pq.pop()()
world

 

How it works...

PriorityQueue works by storing everything in an heap. Heaps are particularly efficient at retrieving the top/first element of a sorted set without having to actually sort the whole set.

Our priority queue stores all the values in a three-element tuple: priority, time.time(), and value.

The first entry of our tuple is priority (lower is better). In the example, we recorded f1 with a better priority than f2, which ensures than when we use heap.heappop to fetch tasks to process, we get f1 and then f2, so that we end up with the hello world message and not world hello.

The second entry, timestamp, is used to ensure that tasks that have the same priority are processed in their insertion order. The oldest task will be served first as it will have the smallest timestamp.

Then, we have the value itself, which is the function we want call for our task.

There's more...

A very common approach to sorting is to keep a list of entries in a tuple, where the first element is key for which we are sorting and the second element is the value itself.

For a scoreboard, we can keep each player's name and how many points they got:

scores = [(123, 'Alessandro'),
          (143, 'Chris'),
          (192, 'Mark']

 

Storing those values in tuples works because comparing two tuples is performed by comparing each element of the first tuple with the element in the same index position in the other tuple:

>>> (10, 'B') > (10, 'A')
True
>>> (11, 'A') > (10, 'B')
True

It's very easy to understand what's going on if you think about strings. 'BB' > 'BB' is the same as ('B', 'B') > ('B', 'A'); in the end, a string is just a list of characters.

We can use this property to sort our scores and retrieve the winner of a competition:

>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

The major problem with this approach is that every time we add an entry to our list, we have to sort it again, or our scoreboard would became meaningless:

>>> scores.append((137, 'Rick'))
>>> scores[-1]
(137, 'Rick')
>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

This is very inconvenient because it's easy to miss re-sorting somewhere if we have multiple places appending to the list, and sorting the whole list every time can be expensive.

The Python standard library offers a data structure that is a perfect match when we're interested in finding out the winner of a competition.

In the heapq module, we have a fully working implementation of a heap data structure, a particular kind of tree where each parent is smaller than its children. This provides us with a tree that has a very interesting property: the root element is always the smallest one.

And being implemented on top of a list, it means that l[0] is always the smallest element in a heap:

>>> import heapq
>>> l = []
>>> heapq.heappush(l, (192, 'Mark'))
>>> heapq.heappush(l, (123, 'Alessandro'))
>>> heapq.heappush(l, (137, 'Rick'))
>>> heapq.heappush(l, (143, 'Chris'))
>>> l[0]
(123, 'Alessandro')

You might have noticed, by the way, that the heap finds the loser of our tournament, not the winner, and we were interested in finding the best player, with the highest value.

This is a minor problem we can easily solve by storing all scores as negative numbers. If we store each score as * -1, the head of the heap will always be the winner:

>>> l = []
>>> heapq.heappush(l, (-143, 'Chris'))
>>> heapq.heappush(l, (-137, 'Rick'))
>>> heapq.heappush(l, (-123, 'Alessandro'))
>>> heapq.heappush(l, (-192, 'Mark'))
>>> l[0]
(-192, 'Mark')

Bunch


Python is very good at shapeshifting objects. Each instance can have its own attributes and it's absolutely legal to add/remove the attributes of an object at runtime.

Once in a while, our code needs to deal with data of unknown shapes. For example, in the case of a user-submitted data, we might not know which fields the user is providing; maybe some of our users have a first name, some have a surname, and some have one or more middle name fields.

If we are not processing this data ourselves, but are just providing it to some other function, we really don't care about the shape of the data; as long as our objects have those attributes, we are fine.

A very common case is when working with protocols, if you are an HTTP server, you might want to provide to the application running behind you a request object. This object has a few known attributes, such as host and path, and it might have some optional attributes, such as a query string or a content type. But, it can also have any attribute the client provided, as HTTP is pretty flexible regarding headers, and our clients could have provided an x-totally-custom-header that we might have to expose to our code.

When representing this kind of data, Python developers often tend to look at dictionaries. In the end, Python objects themselves are built on top of dictionaries and they fit the need to map arbitrary values to names.

So, we will probably end up with something like the following:

>>> request = dict(host='www.example.org', path='/index.html')

A side effect of this approach is pretty clear once we have to pass this object around, especially to third-party code. Functions usually work with objects, and while they don't require a specific kind of object as duck-typing is the standard in Python, they will expect certain attributes to be there.

Another very common example is when writing tests, Python being a duck-typed language, it's absolutely reasonable to want to provide a fake object instead of providing a real instance of the object, especially when we need to simulate the values of some properties (as declared with @property), so we don't want or can't afford to create real instances of the object.

In such cases, using a dictionary is not viable as it will only provide access to its values through the request['path'] syntax and not through request.path, as probably expected by the functions we are providing our object to.

Also, the more we end up accessing this value, the more it's clear that the syntax using dot notation conveys the feeling of an entity that collaborates to the intent of the code, while a dictionary conveys the feeling of plain data.

As soon as we remember that Python objects can change shape at any time, we might be tempted to try creating an object instead of a dictionary. Unfortunately, we won't be able to provide the attributes at initialization time:

>>> request = object(host='www.example.org', path='/index.html')
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: object() takes no parameters

Things don't improve much if we try to assign those attributes after the object is built:

>>> request = object()
>>> request.host = 'www.example.org'
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'host'

 

 

How to do it...

With a little effort, we can create a class that leverages dictionaries to contain any attribute we want and allow access both as a dictionary and through properties:

>>> class Bunch(dict):
...    def __getattribute__(self, key):
...        try: 
...            return self[key]
...        except KeyError:
...            raise AttributeError(key)
...    
...    def __setattr__(self, key, value): 
...        self[key] = value
...
>>> b = Bunch(a=5)
>>> b.a
5
>>> b['a']
5

How it works...

The Bunch class inherits dict, mostly as a way to provide a context where values can be stored, then most of the work is done by __getattribute__ and __setattr__. So, for any attribute that is retrieved or set on the object, they will just retrieve or set a key in self (remember we inherited from dict, so self is in fact a dictionary).

This allows the Bunch class to store and retrieve any value as an attribute of the object. The convenient feature is that it can behave both as an object and as a dict in most contexts.

For example, it is possible to find out all the values that it contains, like any other dictionary:

>>> b.items()
dict_items([('a', 5)])

It is also able to access those as attributes:

>>> b.c = 7
>>> b.c
7
>>> b.items()
dict_items([('a', 5), ('c', 7)])

 

 

There's more...

Our bunch implementation is not yet complete, as it will fail any test for class name (it's always named Bunch) and any test for inheritance, thus failing at faking other objects.

The first step is to make Bunch able to shapeshift not only its properties, but also its name. This can be achieved by creating a new class dynamically every time we create Bunch. The class will inherit from Bunch and will do nothing apart from providing a new name:

>>> class BunchBase(dict):
...    def __getattribute__(self, key):
...        try: 
...            return self[key]
...        except KeyError:
...            raise AttributeError(key)
...    
...    def __setattr__(self, key, value): 
...        self[key] = value
...
>>> def Bunch(_classname="Bunch", **attrs):
...     return type(_classname, (BunchBase, ), {})(**attrs)
>>>

The Bunch function moved from being the class itself to being a factory that will create objects that all act as Bunch, but can have different classes. Each Bunch will be a subclass of BunchBase, where the _classname name can be provided when Bunch is created:

>>> b = Bunch("Request", path="/index.html", host="www.example.org")
>>> print(b)
{'path': '/index.html', 'host': 'www.example.org'}
>>> print(b.path)
/index.html
>>> print(b.host)
www.example.org

This will allow us to create as many kinds of Bunch objects as we want, and each will have its own custom type:

>>> print(b.__class__)
<class '__main__.Request'>

The next step is to make our Bunch actually look like any other type that it has to impersonate. That is needed for the case where we want to use Bunch in place of another object. As Bunch can have any kind of attribute, it can take the place of any kind of object, but to be able to, it has to pass type checks for custom types.

 

 

We need to go back to our Bunch factory and make the Bunch objects not only have a custom class name, but also appear to be inherited from a custom parent.

To better understand what's going on, we will declare an example Person type; this type will be the one our Bunch objects will try to fake:

class Person(object):
    def __init__(name, surname):
        self.name = name
        self.surname = surname

    @property
    def fullname(self):
        return '{} {}'.format(self.name, self.surname)

Specifically, we are going to print Hello Your Name through a custom print function that only works for Person:

def hello(p):
    if not isinstance(p, Person):
        raise ValueError("Sorry, can only greet people")
    print("Hello {}".format(p.fullname))

We want to change our Bunch factory to accept the class and create a new type out of it:

def Bunch(_classname="Bunch", _parent=None, **attrs):
    parents = (_parent, ) if parent else tuple()
    return type(_classname, (BunchBase, ) + parents, {})(**attrs)

Now, our Bunch objects will appear as instances of a class named what we wanted, and will always appear as a subclass of _parent:

>>> p = Bunch("Person", Person, fullname='Alessandro Molina')
>>> hello(p)
Hello Alessandro Molina

Bunch can be a very convenient pattern; in both its complete and simplified versions, it is widely used in many frameworks with various implementations that all achieve pretty much the same result.

The showcased implementation is interesting because it gives us a clear idea of what's going on. There are ways to implement Bunch that are very smart, but might make it hard to guess what's going on and customize it.

Another possible way to implement the Bunch pattern is by patching the __dict__ class, which contains all the attributes of the class:

class Bunch(dict):
    def __init__(self, **kwds):
        super().__init__(**kwds)
        self.__dict__ = self

In this form, whenever Bunch is created, it will populate its values as a dict (by calling super().__init__, which is the dict initialization) and then, once all the attributes provided are stored in dict, it swaps the __dict__ object, which is the dictionary that contains all object attributes, with self. This makes the dict that was just populated with all the values also the dict that contains all the attributes of the object.

Our previous implementation worked by replacing the way we looked for attributes, while this implementation replaces the place where we look for attributes.

Enumerations


Enumeration is a common way to store values that can only represent a few states. Each symbolic name is bound to a specific value, usually numeric, that represents the states the enumeration can have.

Enumerations are very common in other programming languages, but until recently, Python didn't have any explicit support for enumerations.

How to do it...

Typically, enumerations are implemented by mapping symbolic names to numeric values; this is allowed in Python through enum.IntEnum:

>>> from enum import IntEnum
>>> 
>>> class RequestType(IntEnum):
...     POST = 1
...     GET = 2
>>>
>>> request_type = RequestType.POST
>>> print(request_type)
RequestType.POST

How it works...

IntEnum is an integer, apart from the fact that all possible values are created when the class is defined. IntEnum inherits from int, so its values are real integers.

During the RequestType definition, all the possible values for enum are declared within the class body and the values are verified against duplicates by the metaclass.

Also, enum provides support for a special value, auto, which means just put in a value, I don't care. As you usually only care whether it's POST or GET, you usually don't care whether POST is 1 or 2.

Last but not least, enumerations cannot be subclassed if they define at least one possible value.

There's more...

IntEnum values behave like int in most cases, which is usually convenient, but they can cause problems if the developer doesn't pay attention to the type.

For example, a function might unexpectedly perform the wrong thing if another enumeration or an integer value is provided, instead of the proper enumeration value:

>>> def do_request(kind):
...    if kind == RequestType.POST:
...        print('POST')
...    else:
...        print('OTHER')

As an example, invoking do_request with RequestType.POST or 1 will do exactly the same thing:

>>> do_request(RequestType.POST)
POST
>>> do_request(1)
POST

When we want to avoid treating our enumerations as numbers, we can use enum.Enum, which provides enumerated values that are not considered plain numbers:

>>> from enum import Enum
>>> 
>>> class RequestType(Enum):
...     POST = 1
...     GET = 2
>>>
>>> do_request(RequestType.POST)
POST
>>> do_request(1)
OTHER

So generally, if you need a simple set of enumerated values or possible states that rely on enumEnum is safer, but if you need a set of numeric values that rely on enumIntEnum will ensure that they behave like numbers.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Strategic recipes for effective application development in Python
  • Techniques to create GUIs and implement security through cryptography
  • Best practices for developing readily scalable, production-ready applications

Description

The Python 3 Standard Library is a vast array of modules that you can use for developing various kinds of applications. It contains an exhaustive list of libraries, and this book will help you choose the best one to address specific programming problems in Python. The Modern Python Standard Library Cookbook begins with recipes on containers and data structures and guides you in performing effective text management in Python. You will find Python recipes for command-line operations, networking, filesystems and directories, and concurrent execution. You will learn about Python security essentials in Python and get to grips with various development tools for debugging, benchmarking, inspection, error reporting, and tracing. The book includes recipes to help you create graphical user interfaces for your application. You will learn to work with multimedia components and perform mathematical operations on date and time. The recipes will also show you how to deploy different searching and sorting algorithms on your data. By the end of the book, you will have acquired the skills needed to write clean code in Python and develop applications that meet your needs.

What you will learn

Store multiple values per key in associative containers Create interactive character-based user interfaces Work with native time and display data for your time zone Read/write SGML family languages, both as a SAX and DOM parser to meet file sizes and other requirements Group equivalent items using itertools and sorted features together Use partials to create unary functions out of multi-argument functions Implement hashing algorithms to store passwords in a safe way

Product Details

Country selected

Publication date : Aug 31, 2018
Length 366 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781788830829
Category :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now

Product Details


Publication date : Aug 31, 2018
Length 366 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781788830829
Category :

Table of Contents

21 Chapters
Title Page Chevron down icon Chevron up icon
Copyright and Credits Chevron down icon Chevron up icon
Packt Upsell Chevron down icon Chevron up icon
Contributors Chevron down icon Chevron up icon
Preface Chevron down icon Chevron up icon
1. Containers and Data Structures Chevron down icon Chevron up icon
2. Text Management Chevron down icon Chevron up icon
3. Command Line Chevron down icon Chevron up icon
4. Filesystem and Directories Chevron down icon Chevron up icon
5. Date and Time Chevron down icon Chevron up icon
6. Read/Write Data Chevron down icon Chevron up icon
7. Algorithms Chevron down icon Chevron up icon
8. Cryptography Chevron down icon Chevron up icon
9. Concurrency Chevron down icon Chevron up icon
10. Networking Chevron down icon Chevron up icon
11. Web Development Chevron down icon Chevron up icon
12. Multimedia Chevron down icon Chevron up icon
13. Graphical User Interfaces Chevron down icon Chevron up icon
14. Development Tools Chevron down icon Chevron up icon
1. Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by


No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.