Concurrency and one of its manifestations, parallel processing, are among the broadest topics in the area of software engineering. Concurrency is such a huge topic that dozens of books could be written and we would still not be able to discuss all of its important aspects and models. The purpose of this chapter is to show you why concurrency may be required in your application, when to use it, and what Python's most important concurrency models are.
We will discuss some of the language features, built-in modules, and third-party packages that allow you to implement these models in your code. But we won't cover them in much detail. Treat the content of this chapter as an entry point for your own research and reading. We will try to guide you through the basic ideas and help in deciding if you really need concurrency. Hopefully, after reading this chapter you will be able to tell which approach suits your needs best.
In this chapter, we will cover the...
The following are the Python packages that are used in this chapter, which you can download from PyPI:
Information on how to install packages is included in Chapter 2, Modern Python Development Environments.
The code files for this chapter can be found at https://github.com/PacktPublishing/Expert-Python-Programming-Fourth-Edition/tree/main/Chapter%206.
Before we delve into various implementations of concurrency available to Python programmers, let's discuss what concurrency actually is.
What is concurrency?
Concurrency is often confused with actual methods of implementing it. Some programmers also think that it is a synonym for parallel processing. This is the reason why we need to start by properly defining concurrency. Only then will we be able to properly understand various concurrency models and their key differences.
First and foremost, concurrency is not the same as parallelism. Concurrency is also not a matter of application implementation. Concurrency is a property of a program, algorithm, or problem, whereas parallelism is just one of the possible approaches to problems that are concurrent.
In his 1978 paper Time, Clocks, and the Ordering of Events in Distributed Systems, Leslie Lamport defines the concept of concurrency as follows:
"Two events are concurrent if neither can causally affect the other."
By extrapolating events to programs, algorithms, or problems, we can say that something is concurrent if it can be fully or...
Developers often consider multithreading to be a very complex topic. While this statement is totally true, Python provides high-level classes and functions that greatly help in using threads. CPython has some inconvenient implementation details that make threads less effective than in other programming languages like C or Java. But that doesn't mean that they are completely useless in Python.
There is still quite a large range of problems that can be solved effectively and conveniently with Python threads.
In this section, we will discuss those limitations of multithreading in CPython, as well as the common concurrent problems for which Python threads are still a viable solution.
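As a rough illustration of the high-level helpers mentioned above, the following sketch uses ThreadPoolExecutor from the standard concurrent.futures module to overlap several blocking waits. The fake_io_call and run_concurrently names are hypothetical, and time.sleep only stands in for a real blocking I/O call, so treat this as a sketch under those assumptions rather than a recipe.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay):
    """Hypothetical stand-in for a blocking network or disk operation."""
    time.sleep(delay)  # the thread waits here without using the CPU
    return delay


def run_concurrently(delays):
    """Run one fake I/O call per thread and report the total elapsed time."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        # map() returns results in the same order as the inputs
        results = list(pool.map(fake_io_call, delays))
    elapsed = time.perf_counter() - started
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_concurrently([0.2, 0.2, 0.2, 0.2])
    print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the elapsed time should be close to the longest single delay and not to the sum of all delays. The same code would show little or no speedup if fake_io_call did pure-Python computation instead of waiting.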
What is multithreading?
Thread is short for a thread of execution. A programmer can split their work into threads that run simultaneously. Threads are still bound to the parent process and can easily communicate because they share the same memory context. The execution...
Let's be honest, multithreading is challenging. Dealing with threads in a sane and safe manner required a tremendous amount of code compared to the synchronous approach. We had to set up a thread pool and communication queues, gracefully handle exceptions from threads, and also worry about thread safety when trying to provide a rate limiting capability. Dozens of lines of code are needed just to execute one function from some external library in parallel! And we rely on the promise from the external package creator that their library is thread-safe. That sounds like a high price for a solution that is practical only for I/O-bound tasks.
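To give a sense of the moving parts listed above, here is a minimal sketch of a hand-rolled pool of worker threads that communicate through queues and pass exceptions back to the caller. It is not the chapter's original example: the worker and run_in_threads names are hypothetical, and rate limiting is left out to keep the sketch short.

```python
import queue
import threading


def worker(tasks, results, func):
    """Pull items from the tasks queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            tasks.task_done()
            break
        try:
            results.put((item, func(item), None))
        except Exception as err:
            # Hand the exception back instead of letting the thread die silently
            results.put((item, None, err))
        finally:
            tasks.task_done()


def run_in_threads(func, items, num_workers=4):
    """Apply func to every item using a fixed number of worker threads."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, func))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()  # block until every queued item has been processed
    for thread in threads:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected  # list of (item, value, error) tuples, in completion order
```

Even this stripped-down version needs sentinels, two queues, and explicit error forwarding, which is the overhead the paragraph above complains about. The results arrive in completion order, so a caller that needs input order has to sort or index them.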
An alternative approach that allows you to achieve parallelism is multiprocessing. Separate Python processes that do not constrain each other with the GIL allow for better resource utilization. This is especially important for applications running on multicore processors that are performing really CPU-intensive...
Asynchronous programming has gained a lot of traction in the last few years. In Python 3.5, we finally got some syntax features that solidified the concepts of asynchronous execution. But this does not mean that asynchronous programming wasn't possible before Python 3.5. Many libraries and frameworks were available much earlier, and most of them originated in old versions of Python 2. There is even a whole alternate implementation of Python called Stackless Python that concentrates on this single programming approach.
The easiest way to think about asynchronous programming in Python is to imagine something similar to threads, but without system scheduling involved. This means that an asynchronous program can concurrently process information, but the execution context is switched internally and not by the system scheduler.
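The idea of switching execution context internally can be sketched with the standard asyncio module. In the following example, the worker and main coroutine names are hypothetical; each worker records a step and then explicitly hands control back to the event loop with await asyncio.sleep(0). All of this runs in a single thread, and the system scheduler plays no part in deciding which worker runs next.

```python
import asyncio


async def worker(name, steps, log):
    """Record each step, then yield control to the event loop."""
    for step in range(steps):
        log.append(f"{name}:{step}")
        # A context switch can only happen at an await like this one
        await asyncio.sleep(0)


async def main():
    log = []
    # Both workers are scheduled on the same event loop, in the same thread
    await asyncio.gather(worker("a", 2, log), worker("b", 2, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The entries of the two workers alternate in the log because each one gives up control after every step. If the await line were removed, the first worker would run to completion before the second one started.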
But, of course, we don't use threads to concurrently handle the work in an asynchronous program. Many asynchronous...
It was a long journey, but we successfully struggled through most of the common approaches to concurrent programming that are available for Python programmers.
After explaining what concurrency really is, we jumped into action and dissected one of the typical concurrent problems with the help of multithreading. After identifying the basic deficiencies of our code and fixing them, we turned to multiprocessing to see how it would work in our case. We found that multiple processes with the multiprocessing module are a lot easier to use than plain threads coming with the threading module. But just after that, we realized that we can use the same API for threads too, thanks to the multiprocessing.dummy module. So, the decision between multiprocessing and multithreading is now only a matter of which solution better suits the problem and not which solution has a better interface.
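The shared interface can be sketched in a few lines. The square and run_with_pool names below are hypothetical; the only thing that changes between the thread-based and the process-based run is the Pool class passed in.

```python
from multiprocessing import Pool as ProcessPool
from multiprocessing.dummy import Pool as ThreadPool


def square(number):
    """Trivial task; defined at module level so worker processes can import it."""
    return number * number


def run_with_pool(pool_class, numbers, workers=2):
    """Map square() over numbers using whichever pool class is given."""
    with pool_class(workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on import
    numbers = range(5)
    print("threads:  ", run_with_pool(ThreadPool, numbers))
    print("processes:", run_with_pool(ProcessPool, numbers))
```

Both calls return the same list. With processes, the function and its arguments must be picklable, which is one practical reason the choice still depends on the problem.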
And speaking about problem fit, we finally tried asynchronous programming, which should be the...