Chapter 5. Using Multiprocessing and ProcessPoolExecutor
In the previous chapter, we studied how to use the threading module to solve two case problems. In this chapter, we will study how to use the multiprocessing module, which implements an interface similar to that of threading. Here, however, we will use the process paradigm.
This chapter covers the following topics:
Understanding the concept of a process
Understanding multiprocessing communication
Using multiprocessing to obtain Fibonacci series terms with multiple inputs
Crawling the Web using ProcessPoolExecutor
Understanding the concept of a process
We can think of a process in an operating system as a container for a program in execution and its resources. Everything related to a running program can be managed through the process that represents it: its data area, its child processes, its states, and its communication with other processes.
Understanding the process model
Processes have associated information and resources that allow their manipulation and control. The operating system has a structure called the Process Control Block (PCB), which stores information referring to processes. For instance, the PCB might store the following information:
Process ID: This is a unique unsigned integer value that identifies a process within the operating system
Program counter: This contains the address of the next program instruction to be executed
I/O information: This is a list of open files and devices associated with the process
Memory allocation: This stores information about...
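The process ID kept by the operating system can be observed from Python itself. The following is a minimal sketch, not taken from the original text, that uses the standard os and multiprocessing modules to report the current process ID, its parent's ID, and its name. The helper name describe_current_process is hypothetical.

```python
import os
from multiprocessing import current_process


def describe_current_process():
    """Return a few identifying facts about the process running this code."""
    return {
        "pid": os.getpid(),              # process ID assigned by the OS
        "parent_pid": os.getppid(),      # ID of the process that started this one
        "name": current_process().name,  # name given by the multiprocessing module
    }


if __name__ == "__main__":
    for field, value in describe_current_process().items():
        print(f"{field}: {value}")
```

Running the script twice will usually print different process IDs, since the operating system assigns a new one to each run.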
Implementing multiprocessing communication
The multiprocessing module (http://docs.python.org/3/library/multiprocessing.html) allows two ways of communication among processes, both based on the message passing paradigm. As seen previously, the message passing paradigm does not rely on explicit synchronization mechanisms, because copies of data are exchanged among processes.
Using multiprocessing.Pipe
A pipe is a mechanism that establishes communication between two endpoints (two processes in communication). It creates a channel through which processes can exchange messages.
Tip
The official Python documentation recommends using a separate pipe for each pair of endpoints, since there is no guarantee of safe reading when another process reads from the same end simultaneously.
To illustrate the use of multiprocessing.Pipe, we will implement a Python program that creates two processes, A and B. Process A sends a random integer value in the range from 1 to 10 to process B, and process B will display it...
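The exchange described above can be sketched as follows. This is a minimal illustration rather than the chapter's own listing: the function names producer_task and consumer_task and the printed messages are assumptions. Pipe() returns a pair of connection objects, one for each end of the channel.

```python
import random
from multiprocessing import Pipe, Process


def producer_task(conn):
    """Process A: send one random integer from 1 to 10, then close its end."""
    value = random.randint(1, 10)
    conn.send(value)
    print(f"Value {value} sent by process A")
    conn.close()


def consumer_task(conn):
    """Process B: receive the value, display it, and return it."""
    value = conn.recv()
    print(f"Value {value} received by process B")
    conn.close()
    return value


if __name__ == "__main__":
    # One connection object per endpoint of the pipe.
    producer_conn, consumer_conn = Pipe()

    process_a = Process(target=producer_task, args=(producer_conn,))
    process_b = Process(target=consumer_task, args=(consumer_conn,))

    process_a.start()
    process_b.start()

    process_a.join()
    process_b.join()
```

Each process receives only its own end of the pipe, which keeps the sketch in line with the tip above about not sharing an endpoint between readers.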
Using multiprocessing to compute Fibonacci series terms with multiple inputs
Let's implement the case study of processing a Fibonacci series for multiple inputs using the processes approach instead of threads.
The multiprocessing_fibonacci.py code makes use of the multiprocessing module, and in order to run, it imports some essential modules as we can observe in the following code:
Some imports have been mentioned in the previous chapters; nevertheless, some of the following imports do deserve special attention:
cpu_count: This is a function that returns the number of CPUs in a machine
current_process: This is a function that returns information on the current process, for example, its name
Manager: This is a type of object that allows sharing Python objects among different processes by means of proxies (for more information, see http://docs...
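To show how the three imports listed above can work together, here is a minimal sketch. It is not the multiprocessing_fibonacci.py listing from the chapter: the names fibonacci, compute_term, and worker_task, the sample inputs, and the use of a None sentinel to stop each worker are all assumptions made for illustration.

```python
from multiprocessing import Manager, Process, Queue, cpu_count, current_process


def fibonacci(n):
    """Return the n-th Fibonacci term iteratively (fibonacci(0) == 0)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def compute_term(n, results):
    """Compute the n-th term and store it in the shared mapping under key n."""
    value = fibonacci(n)
    results[n] = value
    print(f"{current_process().name}: fibonacci({n}) = {value}")


def worker_task(task_queue, results):
    """Take inputs from the queue until a None sentinel arrives."""
    while True:
        n = task_queue.get()
        if n is None:
            break
        compute_term(n, results)


if __name__ == "__main__":
    inputs = [5, 10, 15, 20, 25]  # hypothetical sample inputs
    # Never start more workers than there are inputs to process.
    number_of_workers = min(cpu_count(), len(inputs))

    with Manager() as manager:
        results = manager.dict()  # proxy to a dict held by the manager process
        task_queue = Queue()

        for n in inputs:
            task_queue.put(n)
        for _ in range(number_of_workers):
            task_queue.put(None)  # one sentinel per worker

        workers = [
            Process(target=worker_task, args=(task_queue, results))
            for _ in range(number_of_workers)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()

        print(dict(results))
```

The workers never share memory directly: inputs travel through the queue, and results are written through the Manager proxy, so the parent process can read them after the workers finish.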
Crawling the Web using ProcessPoolExecutor
Just as the concurrent.futures module offers ThreadPoolExecutor, which facilitates the creation and manipulation of multiple threads, it offers ProcessPoolExecutor for processes. The ProcessPoolExecutor class, which is also part of the concurrent.futures package, was used to implement our parallel Web crawler. To implement this case study, we created a Python module named process_pool_executor_web_crawler.py.
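As a rough idea of how ProcessPoolExecutor distributes work, consider the following sketch. It is not the process_pool_executor_web_crawler.py module: to stay self-contained it reads hypothetical in-memory pages instead of downloading them, and the names PAGES, LINK_REGEX, and extract_links_task are assumptions.

```python
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical in-memory pages standing in for downloaded HTML.
PAGES = {
    "http://example.com/a": '<a href="http://example.com/b">B</a>',
    "http://example.com/b": (
        '<a href="http://example.com/a">A</a> '
        '<a href="http://example.com/c">C</a>'
    ),
}

# Simplified pattern: matches only absolute links in double-quoted href attributes.
LINK_REGEX = re.compile(r'<a\s+href="(http[^"]+)"')


def extract_links_task(url):
    """Return the URL together with the links found in its page."""
    html = PAGES.get(url, "")
    return url, LINK_REGEX.findall(html)


if __name__ == "__main__":
    # Each URL is handed to a worker process; map() yields results in input order.
    with ProcessPoolExecutor(max_workers=2) as executor:
        for url, links in executor.map(extract_links_task, PAGES):
            print(f"{url} -> {links}")
```

Because the task receives everything it needs as an argument and returns its result, nothing has to be shared between the worker processes. A real crawler would replace the dictionary lookup with an HTTP request.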
The code begins with the imports known from the previous examples, such as requests, Manager, and so on. As for the definition of the tasks, little has changed compared to the thread-based example from the previous chapter, except that we now send the data to be manipulated by means of function arguments; refer to the following signatures:
The group_urls_task function is defined as follows:
The crawl_task function is defined as...
In this chapter, we observed the general concepts about processes and implemented case studies using the multiple processes approach to compute the Fibonacci series terms and the Web crawler in a parallel way.
In the next chapter, we will look at multiple processes using the parallel Python module, which is not a built-in module within Python. We will learn about the concept of inter-process communication and how to use pipes to communicate between processes.