Parallel Programming with Python


Product type Book
Published in Jun 2014
ISBN-13 9781783288397
Pages 124 pages
Edition 1st Edition


Chapter 5. Using Multiprocessing and ProcessPoolExecutor

In the previous chapter, we studied how to use the threading module to solve two case studies. In this chapter, we will study how to use the multiprocessing module, which offers an interface similar to that of threading but is based on processes rather than threads.

This chapter covers the following topics:

  • Understanding the concept of a process

  • Understanding multiprocessing communication

  • Using multiprocessing to obtain Fibonacci series terms with multiple inputs

  • Crawling the Web using ProcessPoolExecutor

Understanding the concept of a process


In an operating system, a process can be understood as a container for a program in execution and its resources. Everything related to a running program can be managed through the process that represents it: its data area, its child processes, its states, and its communication with other processes.
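To make the idea of a process as a container more concrete, here is a minimal sketch showing that a child process gets its own data area. The module-level `counter` variable and the functions `increment` and `run` are hypothetical names chosen for illustration. The child changes its own copy of `counter`, while the parent's copy stays at 0.

```python
from multiprocessing import Process

# Module-level data: every process works on its own copy of this variable.
counter = 0


def increment():
    """Runs in the child process and changes only the child's copy."""
    global counter
    counter += 100
    print("child sees counter =", counter)


def run():
    child = Process(target=increment)
    child.start()  # the OS creates a new process with its own data area
    child.join()   # wait for the child to finish
    # The parent's counter is untouched; exitcode 0 means a normal exit.
    return counter, child.exitcode


if __name__ == "__main__":
    print("parent sees", run())
```

The `if __name__ == "__main__":` guard matters here. On platforms that start children by spawning a fresh interpreter, the child re-imports the module, and without the guard it would try to create processes of its own.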

Understanding the process model

Processes have associated information and resources that allow them to be manipulated and controlled. The operating system keeps a structure called the Process Control Block (PCB), which stores information about each process. For instance, the PCB might store the following information:

  • Process ID: This is the unique unsigned integer value that identifies a process within the operating system

  • Program counter: This contains the address of the next program instruction to be executed

  • I/O information: This is a list of open files and devices associated with the process

  • Memory allocation: This stores information about...
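Of the PCB fields listed above, the process ID is the one most easily observed from Python. The following sketch, with hypothetical function names `report_pid` and `compare_pids`, starts a few children, has each one report `os.getpid()` through a queue, and checks that the IDs are distinct from one another and from the parent's ID. It also shows that `Process.pid`, read from the parent, matches what each child reports.

```python
import os
from multiprocessing import Process, Queue


def report_pid(queue):
    """Runs in a child: send back this process's own ID."""
    queue.put(os.getpid())


def compare_pids(n_children=3):
    queue = Queue()
    children = [Process(target=report_pid, args=(queue,)) for _ in range(n_children)]
    for child in children:
        child.start()
    # Collect one PID per child before joining, so no child blocks on the queue.
    reported = sorted(queue.get() for _ in children)
    for child in children:
        child.join()
    # Process.pid, seen from the parent, matches what each child reported.
    known = sorted(child.pid for child in children)
    return os.getpid(), known, reported


if __name__ == "__main__":
    parent_pid, known, reported = compare_pids()
    print("parent:", parent_pid, "children:", reported)
```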

Implementing multiprocessing communication


The multiprocessing module (http://docs.python.org/3/library/multiprocessing.html) offers two ways for processes to communicate, both based on the message passing paradigm. As seen previously, message passing does not require the programmer to use explicit synchronization mechanisms, because processes exchange copies of data rather than sharing it.
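The two communication channels that multiprocessing provides are pipes (multiprocessing.Pipe) and queues (multiprocessing.Queue). The following sketch uses a Queue to illustrate the "copies of data" point: the child modifies the list it receives and sends it back, but the parent's original list is unchanged. The function names `worker` and `roundtrip` are hypothetical.

```python
from multiprocessing import Process, Queue


def worker(inbox, outbox):
    """Child process: modify the list it received and send it back."""
    data = inbox.get()            # a copy of the parent's list, not the list itself
    data.append("changed by child")
    outbox.put(data)              # another copy travels back to the parent


def roundtrip():
    inbox, outbox = Queue(), Queue()
    original = ["from parent"]
    child = Process(target=worker, args=(inbox, outbox))
    child.start()
    inbox.put(original)
    returned = outbox.get()       # read before join() so the child can flush its queue
    child.join()
    return original, returned


if __name__ == "__main__":
    print(roundtrip())
```

Reading from `outbox` before calling `join()` follows the multiprocessing documentation's advice: a process that has put items on a queue may not terminate until those items have been consumed.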

Using multiprocessing.Pipe

A pipe is a mechanism that establishes communication between two endpoints (two communicating processes). It creates a channel through which processes exchange messages.

Tip

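The official Python documentation warns that data in a pipe may become corrupted if two processes (or threads) read from or write to the same end of the pipe at the same time. Processes using different ends of the pipe simultaneously are safe, so use one pipe for each pair of endpoints.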

To illustrate the use of the multiprocessing.Pipe object, we will implement a Python program that creates two processes, A and B. Process A sends a random integer value between 1 and 10 to process B, and process B will display it...
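A minimal sketch of the A-to-B exchange described above might look like the following. It is not the book's program: the function names `producer_task`, `consumer_task`, and `run_demo`, the `None` sentinel that ends the stream, and the extra Queue that B uses to hand its results back to the parent (only so they can be inspected) are all illustrative assumptions. `Pipe(duplex=False)` returns a receive-only end and a send-only end, so each process uses a different end, as the documentation advises.

```python
import random
from multiprocessing import Pipe, Process, Queue


def producer_task(conn, count):
    """Process A: send `count` random integers in [1, 10], then a None sentinel."""
    for _ in range(count):
        conn.send(random.randint(1, 10))
    conn.send(None)  # sentinel: tells the consumer to stop
    conn.close()


def consumer_task(conn, report):
    """Process B: receive and display values until the sentinel arrives."""
    received = []
    while True:
        value = conn.recv()
        if value is None:
            break
        print(f"B received {value}")
        received.append(value)
    report.put(received)  # hand the values back to the parent for inspection


def run_demo(count=5):
    # With duplex=False, the first end can only receive and the second only send.
    receiver_end, sender_end = Pipe(duplex=False)
    report = Queue()
    a = Process(target=producer_task, args=(sender_end, count), name="A")
    b = Process(target=consumer_task, args=(receiver_end, report), name="B")
    a.start()
    b.start()
    received = report.get()  # read before join() to avoid blocking on the queue
    a.join()
    b.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```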

Using multiprocessing to compute Fibonacci series terms with multiple inputs


Let's implement the case study of computing Fibonacci series terms for multiple inputs, this time using processes instead of threads.

The multiprocessing_fibonacci.py module uses the multiprocessing module and, in order to run, imports some essential modules, as shown in the following code:

import sys, time, random, re, requests  # requests is a third-party package
import concurrent.futures
from multiprocessing import cpu_count, current_process, Manager

Some of these imports were covered in the previous chapters; however, the following ones deserve special attention:

  • cpu_count: This is a function that returns the number of CPUs in the machine

  • current_process: This is a function that returns an object representing the current process, which provides information such as its name

  • Manager: This is a type of object that allows sharing Python objects among different processes by means of proxies (for more information, see http://docs...
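The following sketch shows one way these three imports can work together. It is an illustration under stated assumptions, not the contents of multiprocessing_fibonacci.py: the names `fibonacci_task` and `compute_all` are hypothetical, each term is computed iteratively with fib(0) = 0 and fib(1) = 1, and the tasks are run with a `concurrent.futures.ProcessPoolExecutor` sized by `cpu_count()`. Results are written into a dictionary proxy created by `Manager`, which the worker processes can all update.

```python
import concurrent.futures
from multiprocessing import Manager, cpu_count, current_process


def fibonacci_task(n, results):
    """Compute the n-th Fibonacci term iteratively and store it in a shared dict."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    # current_process().name identifies which worker did the computation.
    print(f"{current_process().name}: fib({n}) = {a}")
    results[n] = a


def compute_all(inputs):
    with Manager() as manager:
        results = manager.dict()  # proxy to a dict living in the manager process
        with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count()) as executor:
            futures = [executor.submit(fibonacci_task, n, results) for n in inputs]
            for future in concurrent.futures.as_completed(futures):
                future.result()  # re-raise any exception from a worker
        return dict(results)  # copy the data out before the manager shuts down


if __name__ == "__main__":
    print(compute_all([5, 10, 20, 30]))
```

Copying the proxy into a plain `dict` before leaving the `with Manager()` block is deliberate: once the manager process shuts down, the proxy can no longer be read.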

Crawling the Web using ProcessPoolExecutor


Just as the concurrent.futures module offers ThreadPoolExecutor, which facilitates the creation and manipulation of multiple threads, it offers ProcessPoolExecutor for working with multiple processes. We used the ProcessPoolExecutor class, also part of the concurrent.futures package, to implement our parallel Web crawler. To implement this case study, we created a Python module named process_pool_executor_web_crawler.py.

The code starts with imports familiar from the previous examples, such as requests, Manager, and so on. The task definitions have changed little compared to the thread-based example from the previous chapter, except that the data to be manipulated is now passed in as function arguments; refer to the following signatures:

The group_urls_task function is defined as follows:

def group_urls_task(urls, result_dict, html_link_regex)

The crawl_task function is defined as...
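The sketch below illustrates the idea of passing the shared dictionary and the compiled regular expression to each task as arguments when the tasks run in a ProcessPoolExecutor. It is not the book's group_urls_task or crawl_task: the task name `extract_links_task`, the `PAGES` sample data, and the simple `href` regular expression are hypothetical. The pages are supplied as strings instead of being downloaded with requests, so the example runs offline.

```python
import concurrent.futures
import re
from multiprocessing import Manager

# Hypothetical sample pages; a real crawler would download them with requests.
PAGES = {
    "http://example.com/a": '<a href="http://example.com/b">B</a>',
    "http://example.com/b": '<a href="http://example.com/c">C</a> <a href="http://example.com/a">A</a>',
}


def extract_links_task(url, html, result_dict, html_link_regex):
    """Store every href found in `html` under `url` in the shared dict."""
    result_dict[url] = html_link_regex.findall(html)


def crawl(pages):
    # Compiled patterns can be pickled, so the regex can be passed as an argument.
    html_link_regex = re.compile(r'<a\s+href="([^"]+)"')
    with Manager() as manager:
        result_dict = manager.dict()
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            futures = [
                executor.submit(extract_links_task, url, html, result_dict, html_link_regex)
                for url, html in pages.items()
            ]
            for future in concurrent.futures.as_completed(futures):
                future.result()  # surface any exception raised in a worker
        return dict(result_dict)


if __name__ == "__main__":
    for url, links in crawl(PAGES).items():
        print(url, "->", links)
```

Because every piece of data a task needs arrives through its arguments, the task functions can live at module level and be pickled by the executor. This is what allows the same task code to run in separate processes rather than in threads.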

Summary


In this chapter, we covered general concepts about processes and implemented two case studies using multiple processes: computing Fibonacci series terms and crawling the Web in parallel.

In the next chapter, we will look at multiple processes using the Parallel Python module, which is not part of Python's standard library. We will learn about the concept of interprocess communication and how to use pipes to communicate between processes.
