Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Parallel Programming with Python
Parallel Programming with Python

Parallel Programming with Python: Develop efficient parallel systems using the robust Python environment.

$15.99 per month
Book Jun 2014 124 pages 1st Edition
eBook
$15.99 $10.99
Print
$24.99
Subscription
$15.99 Monthly
eBook
$15.99 $10.99
Print
$24.99
Subscription
$15.99 Monthly

What do you get with a Packt Subscription?

Free for first 7 days. $15.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details


Publication date : Jun 25, 2014
Length 124 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781783288397
Category :
Table of content icon View table of contents Preview book icon Preview Book

Parallel Programming with Python

Chapter 1. Contextualizing Parallel, Concurrent, and Distributed Programming

Parallel programming can be defined as a model that aims to create programs that are compatible with environments prepared to execute code instructions simultaneously. It has not been too long since techniques of parallelism began to be used to develop software. Some years ago, processors had a single Arithmetic Logic Unit (ALU) among other components, which could only execute one instruction at a time during a time space. For years, only a clock that measured in hertz to determine the number of instructions a processor could process within a given interval of time was taken into consideration. The more the number of clocks, the more the instructions potentially executed in terms of KHz (thousands of operations per second), MHz (millions of operations per second), and the current GHz (billions of operations per second).

Summing up, the more instructions per cycle given to the processor, the faster the execution. During the '80s, a revolutionary processor came to life, Intel 80386, which allowed the execution of tasks in a pre-emptive manner, that is, it was possible to periodically interrupt the execution of a program to provide processor time to another program; this meant pseudo-parallelism based on time-slicing.

In the late '80s, there came Intel 80486 that implemented a pipelining system, which in practice, divided the stage of execution into distinct substages. In practical terms, in a cycle of the processor, we could have different instructions being carried out simultaneously in each substage.

All the advances mentioned in the preceding section resulted in several improvements in performance, but it was not enough, as we were faced with a delicate issue that would end up as the so-called Moore's law (http://www.mooreslaw.org/).

The quest for high taxes of clock ended up colliding with physical limitations; processors would consume more energy, thereby generating more heat. Moreover, there was another as important issue: the market for portable computers was speeding up in the '90s. So, it was extremely important to have processors that could make the batteries of these pieces of equipment last long enough away from the plug. Several technologies and families of processors from different manufacturers were born. As regards servers and mainframes, Intel® deserves to be highlighted with its family of products Core®, which allowed to trick the operating system by simulating the existence of more than one processor even though there was a single physical chip.

In the Core® family, the processor got severe internal changes and featured components called core, which had their own ALU and caches L2 and L3, among other elements to carry out instructions. Those cores, also known as logical processors, allowed us to parallel the execution of different parts of the same program, or even different programs, simultaneously. The age core enabled lower energy use with power processing superior to its predecessors. As cores work in parallel, simulating independent processors, we can have a multi-core chip and an inferior clock, thereby getting superior performance compared to a single-core chip with higher clock, depending on the task.

So much evolution has, of course, changed the way we approach software designing. Today, we must think of parallelism to design systems that make rational use of resources without wasting them, thereby providing a better experience to the user and saving energy not only in personal computers, but also at processing centers. More than ever, parallel programming is in the developers' daily lives and, apparently, it will never go back.

This chapter covers the following topics:

  • Why use parallel programming?

  • Introducing the common forms of parallelization

  • Communicating in parallel programming

  • Identifying parallel programming problems

  • Discovering Python's programming tools

  • Taking care of Python Global Interpreter Lock (GIL)

Why use parallel programming?


Since computing systems have evolved, they have started to provide mechanisms that allow us to run independent pieces of a specific program in parallel with one another, thus enhancing the response and the general performance. Moreover, we can easily verify that the machines are equipped with more processors and these with plenty of more cores. So, why not take advantage of this architecture?

Parallel programming is a reality in all contexts of system development, from smart phones and tablets, to heavy duty computing in research centers. A solid basis in parallel programming will allow a developer to optimize the performance of an application. This results in enhancement of user experience as well as consumption of computing resources, thereby taking up less processing time for the accomplishment of complex tasks.

As an example of parallelism, let us picture a scenario in which an application that, amongst other tasks, selects information from a database, and this database has considerable size. Consider as well, the application being sequential, in which tasks must be run one after another in a logical sequence. When a user requests data, the rest of the system will be blocked until the data return is not concluded. However, making use of parallel programming, we will be allowed to create a new worker that which will seek information in this database without blocking other functions in the application, thus enhancing its use.

Exploring common forms of parallelization


There is a certain confusion when we try to define the main forms of paralleling systems. It is common to find quotations on parallel and concurrent systems as if both meant the same thing. Nevertheless, there are slight differences between them.

Within concurrent programming, we have a scenario in which a program dispatches several workers and these workers dispute to use the CPU to run a task. The stage at which the dispute takes place is controlled by the CPU scheduler, whose function is to define which worker is apt for using the resource at a specific moment. In most cases, the CPU scheduler runs the task of raking processes so fast that we might get the impression of pseudo-parallelism. Therefore, concurrent programming is an abstraction from parallel programming.

Note

Concurrent systems dispute over the same CPU to run tasks.

The following diagram shows a concurrent program scheme:

Concurrent programming scheme.

Parallel programming can be defined as an approach in which program data creates workers to run specific tasks simultaneously in a multicore environment without the need for concurrency amongst them to access a CPU.

Note

Parallel systems run tasks simultaneously.

The following figure shows the concept of parallel systems:

Parallel programming scheme.

Distributed programming aims at the possibility of sharing the processing by exchanging data through messages between machines (nodes) of computing, which are physically separated.

Distributed programming is becoming more and more popular for many reasons; they are explored as follows:

  • Fault-tolerance: As the system is decentralized, we can distribute the processing to different machines in a network, and thus perform individual maintenance of specific machines without affecting the functioning of the system as a whole.

  • Horizontal scalability: We can increase the capacity of processing in distributed systems in general. We can link new equipment with no need to abort applications being executed. We can say that it is cheaper and simpler compared to vertical scalability.

  • Cloud computing: With the reduction in hardware costs, we need the growth of this type of business where we can obtaining huge machine parks acting in a cooperative way and running programs in a transparent way for their users.

Note

Distributed systems run tasks within physically-separated nodes.

The following figure shows a distributed system scheme:

Distributed programming scheme.

Communicating in parallel programming


In parallel programming, the workers that are sent to perform a task often need to establish communication so that there can be cooperation in tackling a problem. In most cases, this communication is established in such a way that data can be exchanged amongst workers. There are two forms of communication that are more widely known when it comes to parallel programming: shared state and message passing. In the following sections, a brief description of both will be presented.

Understanding shared state

One the most well-known forms of communication amongst workers is shared state. Shared state seems straightforward to use but has many pitfalls because an invalid operation made to the shared resource by one of the processes will affect all of the others, thereby producing bad results. It also makes it impossible for the program to be distributed between multiple machines for obvious reasons.

Illustrating this, we will make use of a real-world case. Suppose you are a customer of a specific bank, and this bank has only one cashier. When you go to the bank, you must head to a queue and wait for your chance. Once in the queue, you notice that only one customer can make use of the cashier at a time, and it would be impossible for the cashier to attend two customers simultaneously without potentially making errors. Computing provides means to access data in a controlled way, and there are several techniques, such as mutex.

Mutex can be understood as a special process variable that indicates the level of availability to access data. That is, in our real-life example, the customer has a number, and at a specific moment, this number will be activated and the cashier will be available for this customer exclusively. At the end of the process, this customer will free the cashier for the next customer, and so on.

Note

There are cases in which data has a constant value in a variable while the program is running, and the data is shared only for reading purposes. So, access control is not necessary because it will never present integrity problems.

Understanding message passing

Message passing is used when we aim to avoid data access control and synchronizing problems originating from shared state. Message passing consists of a mechanism for message exchange in running processes. It is very commonly used whenever we are developing programs with distributed architecture, where the message exchanges within the network they are placed are necessary. Languages such as Erlang, for instance, use this model to implement communication in its parallel architecture. Once data is copied at each message exchange, it is impossible that problems occur in terms of concurrence of access. Although memory use seems to be higher than in shared memory state, there are advantages to the use of this model. They are as follows:

  • Absence of data access concurrence

  • Messages can be exchange locally (various processes) or in distributed environments

  • This makes it less likely that scalability issues occur and enables interoperability of different systems

  • In general, it is easy to maintain according to programmers

Identifying parallel programming problems


There are classic problems that brave keyboard warriors can face while battling in the lands where parallel programming ghosts dwell. Many of these problems occur more often when inexperienced programmers make use of workers combined with shared state. Some of these issues will be described in the following sections.

Deadlock

Deadlock is a situation in which two or more workers keep indefinitely waiting for the freeing of a resource, which is blocked by a worker of the same group for some reason. For a better understanding, we will use another real-life case. Imagine the bank whose entrance has a rotating door. Customer A heads to the side, which will allow him to enter the bank, while customer B tries to exit the bank by using the entrance side of this rotating door so that both customers would be stuck forcing the door but heading nowhere. This situation would be hilarious in real life but tragic in programming.

Note

Deadlock is a phenomenon in which processes wait for a condition to free their tasks, but this condition will never occur.

Starvation

This is the issue whose side effects are caused by unfair raking of one or more processes that take much more time to run a task. Imagine a group of processes, A, which runs heavy tasks and has data processor priority. Now, imagine that a process A with high priority constantly consumes the CPU, while a lower priority process B never gets the chance. Hence, one can say that process B is starving for CPU cycles.

Note

Starvation is caused by badly adjusted policies of process ranking.

Race conditions

When the result of a process depends on a sequence of facts, and this sequence is broken due to the lack of synchronizing mechanisms, we face race conditions. They result from problems that are extremely difficult to filter in larger systems. For instance, a couple has a joint account; the initial balance before operations is $100. The following table shows the regular case, in which there are mechanisms of protection and the sequence of expected facts, as well as the result:

Husband

Wife

Account balance (dollars)

  

100

Read balance

 

100

Adds 20

 

100

Concludes operation

 

120

 

Read balance

120

 

Withdraws 10

120

 

Concludes operation

110

Presents baking operations without the chance of race conditions occurrence

In the following table, the problematic scenario is presented. Suppose that the account does not have mechanisms of synchronization and the order of operations is not as expected.

Husband

Wife

Account balance (dollars)

  

100

Read balance

 

100

Withdraws 100

 

100

 

Reads balance

100

 

Withdraws 10

100

Concludes operation updating balance

 

0

 

Concludes operation updating balance

90

Analogy to balance the problem in a joint account and race conditions

There is a noticeable inconsistency in the final result due to the unexpected lack of synchronization in the operations sequence. One of the parallel programming characteristics is non-determinism. It is impossible to foresee the moment at which two workers will be running, or even which of them will run first. Therefore, synchronization mechanisms are essential.

Note

Non-determinism, if combined with lack of synchronization mechanisms, may lead to race condition issues.

Discovering Python's parallel programming tools


The Python language, created by Guido Van Rossum, is a multi-paradigm, multi-purpose language. It has been widely accepted worldwide due to its powerful simplicity and easy maintenance. It is also known as the language that has batteries included. There is a wide range of modules to make its use smoother. Within parallel programming, Python has built-in and external modules that simplify implementation. This work is based on Python 3.x.

The Python threading module

The Python threading module offers a layer of abstraction to the module _thread, which is a lower-level module. It provides functions that help the programmer during the hard task of developing parallel systems based on threads. The threading module's official papers can be found at http://docs.python.org/3/library/threading.html?highlight=threading#module-threadin.

The Python multiprocessing module

The multiprocessing module aims at providing a simple API for the use of parallelism based on processes. This module is similar to the threading module, which simplifies alternations between the processes without major difficulties. The approach that is based on processes is very popular within the Python users' community as it is an alternative to answering questions on the use of CPU-Bound threads and GIL present in Python. The multiprocessing module's official papers can be found at http://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#multiprocessing.

The parallel Python module

The parallel Python module is external and offers a rich API for the creation of parallel and distributed systems making use of the processes approach. This module promises to be light and easy to install, and integrates with other Python programs. The parallel Python module can be found at http://parallelpython.com. Among some of the features, we may highlight the following:

  • Automatic detection of the optimal configuration

  • The fact that a number of worker processes can be changed during runtime

  • Dynamic load balance

  • Fault tolerance

  • Auto-discovery of computational resources

Celery – a distributed task queue

Celery is an excellent Python module that's used to create distributed systems and has excellent documentation. It makes use of at least three different types of approach to run tasks in concurrent form—multiprocessing, Eventlet, and Gevent. This work will, however, concentrate efforts on the use of the multiprocessing approach. Also, the link between one and another is a configuration issue, and it remains as a study so that the reader is able to establish comparisons with his/her own experiments.

The Celery module can be obtained on the official project page at http://celeryproject.org.

Taking care of Python GIL


GIL is a mechanism that is used in implementing standard Python, known as CPython, to avoid bytecodes that are executed simultaneously by different threads. The existence of GIL in Python is a reason for fiery discussion amongst users of this language. GIL was chosen to protect the internal memory used by the CPython interpreter, which does not implement mechanisms of synchronization for the concurrent access by threads. In any case, GIL results in a problem when we decide to use threads, and these tend to be CPU-bound. I/O Threads, for example, are out of GIL's scope. Maybe the mechanism brings more benefits to the evolution of Python than harm to it. Evidently, we could not consider only speed as a single argument to determine whether something is good or not.

There are cases in which the approach to the use of processes for tasks sided with message passing brings better relations among maintainability, scalability, and performance. Even so, there are cases in which there will be a real need for threads, which would be subdued to GIL. In these cases, what could be done is write such pieces of code as extensions in C language, and embed them into the Python program. Thus, there are alternatives; it is up to the developer to analyze the real necessity. So, there comes the question: is GIL, in a general way, a villain? It is important to remember that, the PyPy team is working on an STM implementation in order to remove GIL from Python. For more details about the project, visit http://pypy.org/tmdonate.html.

Summary


In this chapter, we learned some parallel programming concepts, and learned about some models, their advantages, and disadvantages. Some of the problems and potential issues when thinking of parallelism have been presented in a brief explanations. We also had a short introduction to some Python modules, built-in and external, which makes a developer's life easier when building up parallel systems.

In the next chapter, we will be studying some techniques to design parallel algorithms.

Left arrow icon Right arrow icon

Key benefits

What you will learn

Explore techniques to parallelize problems Integrate the Parallel Python module to implement Python code Execute parallel solutions on simple problems Achieve communication between processes using Pipe and Queue Use Celery Distributed Task Queue Implement asynchronous I/O using the Python asyncio module Create threadsafe structures

What do you get with a Packt Subscription?

Free for first 7 days. $15.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details


Publication date : Jun 25, 2014
Length 124 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781783288397
Category :

Table of Contents

16 Chapters
Parallel Programming with Python Chevron down icon Chevron up icon
Credits Chevron down icon Chevron up icon
About the Author Chevron down icon Chevron up icon
Acknowledgments Chevron down icon Chevron up icon
About the Reviewers Chevron down icon Chevron up icon
www.PacktPub.com Chevron down icon Chevron up icon
Preface Chevron down icon Chevron up icon
Contextualizing Parallel, Concurrent, and Distributed Programming Chevron down icon Chevron up icon
Designing Parallel Algorithms Chevron down icon Chevron up icon
Identifying a Parallelizable Problem Chevron down icon Chevron up icon
Using the threading and concurrent.futures Modules Chevron down icon Chevron up icon
Using Multiprocessing and ProcessPoolExecutor Chevron down icon Chevron up icon
Utilizing Parallel Python Chevron down icon Chevron up icon
Distributing Tasks with Celery Chevron down icon Chevron up icon
Doing Things Asynchronously Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Filter icon Filter
Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Filter reviews by


No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.