Parallel Programming with Python

Parallel Programming with Python: Develop efficient parallel systems using the robust Python environment.

Product Details


Publication date : Jun 25, 2014
Length 124 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781783288397

Chapter 1. Contextualizing Parallel, Concurrent, and Distributed Programming

Parallel programming can be defined as a model that aims to create programs compatible with environments prepared to execute code instructions simultaneously. Parallelism techniques have not been used in software development for very long. Some years ago, processors had a single Arithmetic Logic Unit (ALU), among other components, which could execute only one instruction at a time. For years, the only measure taken into consideration was the clock, measured in hertz, which determined the number of instructions a processor could process within a given interval of time. The higher the clock rate, the more instructions could potentially be executed, in terms of kHz (thousands of cycles per second), MHz (millions of cycles per second), and the current GHz (billions of cycles per second).

Summing up, the more instructions the processor can execute per unit of time, the faster the execution. During the '80s, a revolutionary processor came to life, the Intel 80386, which allowed tasks to be executed in a pre-emptive manner; that is, it was possible to periodically interrupt the execution of a program to give processor time to another program. This meant pseudo-parallelism based on time-slicing.

In the late '80s came the Intel 80486, which implemented a pipelining system that, in practice, divided the execution stage into distinct substages. In practical terms, within one processor cycle, different instructions could be carried out simultaneously, one in each substage.

All the advances mentioned in the preceding paragraphs resulted in several performance improvements, but they were not enough, as we were faced with a delicate issue that came to be associated with the so-called Moore's law (http://www.mooreslaw.org/).

The quest for high clock rates ended up colliding with physical limitations; processors would consume more energy, thereby generating more heat. Moreover, there was another equally important issue: the market for portable computers was speeding up in the '90s. So, it was extremely important to have processors that could make the batteries of this equipment last long enough away from the plug. Several technologies and families of processors from different manufacturers were born. As regards servers and mainframes, Intel® deserves to be highlighted for its Core® family of products, which made it possible to trick the operating system by simulating the existence of more than one processor even though there was a single physical chip.

In the Core® family, the processor underwent significant internal changes and featured components called cores, each with its own ALU and L2 and L3 caches, among other elements needed to carry out instructions. Those cores, also known as logical processors, allow us to run different parts of the same program, or even different programs, in parallel. The age of cores enabled lower energy use with processing power superior to that of its predecessors. As cores work in parallel, simulating independent processors, a multi-core chip with a lower clock rate can outperform a single-core chip with a higher clock rate, depending on the task.

So much evolution has, of course, changed the way we approach software design. Today, we must think of parallelism to design systems that make rational use of resources without wasting them, thereby providing a better experience to the user and saving energy not only in personal computers, but also in data centers. More than ever, parallel programming is part of developers' daily lives and, apparently, there is no going back.

This chapter covers the following topics:

  • Why use parallel programming?

  • Exploring common forms of parallelization

  • Communicating in parallel programming

  • Identifying parallel programming problems

  • Discovering Python's parallel programming tools

  • Taking care of Python Global Interpreter Lock (GIL)

Why use parallel programming?


As computing systems have evolved, they have started to provide mechanisms that allow us to run independent pieces of a program in parallel with one another, thus improving responsiveness and overall performance. Moreover, we can easily verify that machines are equipped with more processors, and these with many more cores. So, why not take advantage of this architecture?

Parallel programming is a reality in all contexts of system development, from smartphones and tablets to heavy-duty computing in research centers. A solid grounding in parallel programming allows a developer to optimize the performance of an application. This improves both the user experience and the use of computing resources, so that complex tasks take up less processing time.

As an example of parallelism, let us picture a scenario in which an application, among other tasks, selects information from a database of considerable size. Consider as well that the application is sequential, so tasks must run one after another in a logical sequence. When a user requests data, the rest of the system is blocked until the data has been returned. However, by making use of parallel programming, we can create a new worker that fetches the information from the database without blocking other functions in the application, thus improving its usability.
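
The scenario above can be sketched roughly as follows. This is only an illustration: `fetch_records` is a hypothetical stand-in for the slow database query (the `time.sleep` call simulates its latency), and `handle_request` is an assumed name for the part of the application that must stay responsive. The worker is created with `concurrent.futures.ThreadPoolExecutor` from the standard library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_records(query, delay=0.2):
    """Hypothetical stand-in for a slow database query."""
    time.sleep(delay)  # simulates waiting for the database
    return [f"{query}-row-{i}" for i in range(3)]


def handle_request(query):
    """Run the query in a worker while the caller keeps doing other work."""
    other_work_done = 0
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fetch_records, query)
        # The main flow is not blocked while the worker waits on the query.
        while not future.done():
            other_work_done += 1
            time.sleep(0.01)
        rows = future.result()
    return rows, other_work_done


if __name__ == "__main__":
    rows, steps = handle_request("customers")
    print(rows)
    print(f"other work steps completed while waiting: {steps}")
```

The number of steps completed while waiting depends on the machine, so no particular value should be expected.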

Exploring common forms of parallelization


There is some confusion when we try to define the main forms of parallel systems. It is common to find references to parallel and concurrent systems as if both meant the same thing. Nevertheless, there are slight differences between them.

Within concurrent programming, we have a scenario in which a program dispatches several workers, and these workers compete to use the CPU to run a task. The competition is controlled by the CPU scheduler, whose function is to decide which worker may use the resource at a specific moment. In most cases, the CPU scheduler ranks and switches processes so fast that we get the impression of pseudo-parallelism. Therefore, concurrent programming is an abstraction of parallel programming.

Note

Concurrent systems compete for the same CPU to run tasks.

The following diagram shows a concurrent program scheme:

Concurrent programming scheme.

Parallel programming can be defined as an approach in which a program creates workers to run specific tasks simultaneously in a multicore environment, without those workers having to compete with one another for access to a CPU.

Note

Parallel systems run tasks simultaneously.

The following figure shows the concept of parallel systems:

Parallel programming scheme.

Distributed programming aims to share processing by exchanging data through messages between physically separate computing machines (nodes).

Distributed programming is becoming more and more popular for many reasons; they are explored as follows:

  • Fault-tolerance: As the system is decentralized, we can distribute the processing to different machines in a network, and thus perform individual maintenance of specific machines without affecting the functioning of the system as a whole.

  • Horizontal scalability: We can increase the processing capacity of distributed systems by linking new equipment, with no need to abort the applications being executed. This is generally cheaper and simpler than vertical scalability.

  • Cloud computing: With the reduction in hardware costs, we have seen the growth of this type of business, in which huge machine parks act cooperatively and run programs transparently for their users.

Note

Distributed systems run tasks on physically separated nodes.

The following figure shows a distributed system scheme:

Distributed programming scheme.

Communicating in parallel programming


In parallel programming, the workers that are sent to perform a task often need to establish communication so that there can be cooperation in tackling a problem. In most cases, this communication is established in such a way that data can be exchanged amongst workers. There are two forms of communication that are more widely known when it comes to parallel programming: shared state and message passing. In the following sections, a brief description of both will be presented.

Understanding shared state

One of the best-known forms of communication amongst workers is shared state. Shared state seems straightforward to use but has many pitfalls, because an invalid operation made on the shared resource by one of the processes will affect all of the others, thereby producing bad results. It also prevents the program from being distributed across multiple machines, since separate machines do not share memory.

To illustrate this, we will use a real-world case. Suppose you are a customer of a specific bank, and this bank has only one cashier. When you go to the bank, you must join a queue and wait for your turn. Once in the queue, you notice that only one customer can use the cashier at a time, and it would be impossible for the cashier to attend to two customers simultaneously without potentially making errors. Computing provides means to access data in a controlled way, and there are several techniques for this, such as the mutex.

A mutex can be understood as a special process variable that indicates whether data is available for access. In our real-life example, the customer holds a number; at a specific moment, this number is called and the cashier becomes available to this customer exclusively. At the end of the process, this customer frees the cashier for the next customer, and so on.
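
As a minimal sketch of the cashier analogy, the following uses `threading.Lock` from the standard library as the mutex. The `Cashier` class and the `run_customers` helper are hypothetical names introduced only for this illustration.

```python
import threading


class Cashier:
    """Shared resource that serves one customer at a time."""

    def __init__(self):
        self._lock = threading.Lock()  # the mutex guarding the cashier
        self.served = 0

    def serve(self):
        with self._lock:  # acquired here, released automatically on exit
            current = self.served
            self.served = current + 1


def run_customers(cashier, customers=8, visits=1000):
    """Let several customer threads use the same cashier concurrently."""

    def customer():
        for _ in range(visits):
            cashier.serve()

    threads = [threading.Thread(target=customer) for _ in range(customers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return cashier.served


if __name__ == "__main__":
    print(run_customers(Cashier()))
```

Because every update happens while the lock is held, the final count equals the number of customers multiplied by the number of visits.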

Note

There are cases in which data has a constant value in a variable while the program is running, and the data is shared only for reading purposes. So, access control is not necessary because it will never present integrity problems.

Understanding message passing

Message passing is used when we aim to avoid the data access control and synchronization problems that originate from shared state. Message passing consists of a mechanism for exchanging messages between running processes. It is very commonly used when developing programs with a distributed architecture, where message exchange across the network is necessary. Languages such as Erlang, for instance, use this model to implement communication in their parallel architecture. Since data is copied at each message exchange, problems of concurrent access cannot occur. Although memory use seems to be higher than with shared state, there are advantages to this model. They are as follows:

  • Absence of concurrent data access

  • Messages can be exchanged locally (between various processes) or in distributed environments

  • Scalability issues are less likely to occur, and interoperability between different systems is possible

  • In general, programmers find it easy to maintain
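
The idea can be sketched with two `queue.Queue` channels and a worker thread, all from the standard library. The names `worker`, `square_all`, and `STOP` are assumptions made for this illustration. Note that `queue.Queue` passes object references between threads rather than copies; `multiprocessing.Queue`, which offers a similar interface, pickles each message and therefore copies data between processes.

```python
import queue
import threading

STOP = object()  # sentinel message that ends the conversation


def worker(inbox, outbox):
    """Receive numbers as messages and reply with their squares."""
    while True:
        message = inbox.get()  # blocks until a message arrives
        if message is STOP:
            outbox.put(STOP)
            break
        outbox.put(message * message)


def square_all(numbers):
    """Send each number to the worker and collect the replies."""
    requests, replies = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(requests, replies))
    thread.start()
    for number in numbers:
        requests.put(number)
    requests.put(STOP)

    results = []
    while True:
        reply = replies.get()
        if reply is STOP:
            break
        results.append(reply)
    thread.join()
    return results


if __name__ == "__main__":
    print(square_all([1, 2, 3]))
```

The two sides never touch each other's variables; everything they need to know travels through the queues.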

Identifying parallel programming problems


There are classic problems that brave keyboard warriors can face while battling in the lands where parallel programming ghosts dwell. Many of these problems occur more often when inexperienced programmers make use of workers combined with shared state. Some of these issues will be described in the following sections.

Deadlock

Deadlock is a situation in which two or more workers wait indefinitely for a resource to be freed, while that resource is held by another worker of the same group. For a better understanding, we will use another real-life case. Imagine a bank whose entrance has a revolving door. Customer A heads to the side that allows him to enter the bank, while customer B tries to exit the bank through the entrance side of the same revolving door; both customers end up stuck, pushing the door but going nowhere. This situation would be hilarious in real life but is tragic in programming.

Note

Deadlock is a phenomenon in which processes wait for a condition to free their tasks, but this condition will never occur.
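
A small sketch of the revolving-door situation follows, using two `threading.Lock` objects acquired in opposite order by two threads. All names here (`demonstrate_deadlock`, `customer`, `side_in`, `side_out`) are hypothetical. So that the example terminates, the second acquisition uses a timeout; without it, both threads would wait forever.

```python
import threading


def demonstrate_deadlock(timeout=0.2):
    """Two workers each hold one lock and wait for the other's lock."""
    side_in, side_out = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}

    def customer(name, first, second):
        with first:
            barrier.wait()  # both customers now hold their first lock
            # Without the timeout, this call would block forever.
            acquired = second.acquire(timeout=timeout)
            results[name] = acquired
            if acquired:
                second.release()
            barrier.wait()  # keep holding `first` until both attempts end

    a = threading.Thread(target=customer, args=("A", side_in, side_out))
    b = threading.Thread(target=customer, args=("B", side_out, side_in))
    a.start()
    b.start()
    a.join()
    b.join()
    return results


if __name__ == "__main__":
    print(demonstrate_deadlock())
```

Neither customer obtains the second lock, because each one is waiting for a resource the other refuses to release. A common way to avoid this is to have every worker acquire locks in the same order.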

Starvation

Starvation is an issue whose side effects are caused by the unfair ranking of one or more processes that take much more time to run a task. Imagine a group of processes, A, which runs heavy tasks and has processor priority. Now, imagine that a process A with high priority constantly consumes the CPU, while a lower-priority process B never gets the chance. Hence, one can say that process B is starving for CPU cycles.

Note

Starvation is caused by badly adjusted policies of process ranking.
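
The following toy simulation illustrates the idea under strong simplifying assumptions: a hypothetical strict-priority scheduler, one job executed per tick, and a new high-priority job from group A arriving on every tick. It is not how a real operating system scheduler works; `run_scheduler` is a name made up for this sketch.

```python
import heapq


def run_scheduler(ticks):
    """Strict-priority scheduling: a lower number means higher priority."""
    ready = [(1, "B")]  # low-priority process B is ready from the start
    executed = []
    for tick in range(ticks):
        # A new high-priority job from group A arrives on every tick.
        heapq.heappush(ready, (0, f"A{tick}"))
        _, name = heapq.heappop(ready)  # always pick the highest priority
        executed.append(name)
    return executed


if __name__ == "__main__":
    print(run_scheduler(5))
```

Process B stays in the ready queue for the whole run and is never chosen, no matter how many ticks are simulated.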

Race conditions

When the result of a process depends on a sequence of events, and this sequence is broken due to the lack of synchronization mechanisms, we face race conditions. They result in problems that are extremely difficult to track down in larger systems. For instance, a couple has a joint account; the initial balance before operations is $100. The following table shows the regular case, in which there are protection mechanisms and the expected sequence of events, as well as the result:

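Husband              | Wife                 | Account balance (dollars)
---------------------+----------------------+--------------------------
                     |                      | 100
Read balance         |                      | 100
Adds 20              |                      | 100
Concludes operation  |                      | 120
                     | Read balance         | 120
                     | Withdraws 10         | 120
                     | Concludes operation  | 110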

Banking operations without the possibility of race conditions

In the following table, the problematic scenario is presented. Suppose that the account does not have mechanisms of synchronization and the order of operations is not as expected.

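Husband                               | Wife                                  | Account balance (dollars)
--------------------------------------+---------------------------------------+--------------------------
                                      |                                       | 100
Read balance                          |                                       | 100
Withdraws 100                         |                                       | 100
                                      | Reads balance                         | 100
                                      | Withdraws 10                          | 100
Concludes operation updating balance  |                                       | 0
                                      | Concludes operation updating balance  | 90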

Analogy between the balance problem in a joint account and race conditions

There is a noticeable inconsistency in the final result, caused by the lack of synchronization in the sequence of operations: the balance ends at $90 even though a total of $110 was withdrawn from an account holding $100. One of the characteristics of parallel programming is non-determinism. It is impossible to foresee the moment at which two workers will be running, or even which of them will run first. Therefore, synchronization mechanisms are essential.

Note

Non-determinism, if combined with lack of synchronization mechanisms, may lead to race condition issues.
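
The joint account case can be replayed in code. In this sketch, `unsynchronized_interleaving` reproduces the problematic ordering step by step in a single thread, so that the outcome is repeatable, while `synchronized_withdrawals` runs two real threads whose read-and-update steps are protected by a `threading.Lock`. The `Account` class and both function names are hypothetical, and the account deliberately has no overdraft check.

```python
import threading


class Account:
    """Simplified joint account with no overdraft check."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:  # read and update happen as one protected step
            current = self.balance
            self.balance = current - amount


def unsynchronized_interleaving():
    """Replay the problematic ordering with no synchronization at all."""
    account = Account(100)
    husband_view = account.balance        # husband reads 100
    wife_view = account.balance           # wife reads 100 as well
    account.balance = husband_view - 100  # husband concludes: balance is 0
    account.balance = wife_view - 10      # wife concludes: balance is 90
    return account.balance


def synchronized_withdrawals():
    """Run both withdrawals in threads, each protected by the lock."""
    account = Account(100)
    threads = [
        threading.Thread(target=account.withdraw, args=(amount,))
        for amount in (100, 10)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance


if __name__ == "__main__":
    print(unsynchronized_interleaving())
    print(synchronized_withdrawals())
```

The unsynchronized replay ends with a balance of 90, silently losing the husband's withdrawal. The lock-protected version applies both withdrawals and ends at -10, exposing the overdraft that a real account would have to reject.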

Discovering Python's parallel programming tools


The Python language, created by Guido van Rossum, is a multi-paradigm, multi-purpose language. It has been widely adopted worldwide due to its powerful simplicity and ease of maintenance. It is also known as the language that comes with batteries included: a wide range of modules makes its use smoother. Within parallel programming, Python has built-in and external modules that simplify implementation. This work is based on Python 3.x.

The Python threading module

The Python threading module offers a layer of abstraction over the _thread module, which is a lower-level module. It provides functions that help the programmer during the hard task of developing parallel systems based on threads. The threading module's official documentation can be found at http://docs.python.org/3/library/threading.html?highlight=threading#module-threadin.
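
A minimal sketch of the module's basic building blocks follows: `threading.Thread` with a target function, `start()`, `join()`, and `threading.current_thread()`. The helper names `report` and `run_workers` are assumptions made for this example.

```python
import threading


def report(results, index):
    """Store the name of the thread that executed this call."""
    results[index] = threading.current_thread().name


def run_workers(count=3):
    """Start `count` named threads and wait for all of them to finish."""
    results = [None] * count
    threads = [
        threading.Thread(target=report, args=(results, i), name=f"worker-{i}")
        for i in range(count)
    ]
    for thread in threads:
        thread.start()  # begin executing `report` in a new thread
    for thread in threads:
        thread.join()   # block until that thread has finished
    return results


if __name__ == "__main__":
    print(run_workers())
```

Each thread writes only to its own slot in the list, so no lock is needed in this particular case.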

The Python multiprocessing module

The multiprocessing module aims to provide a simple API for parallelism based on processes. Its API is similar to that of the threading module, which makes switching between threads and processes straightforward. The process-based approach is very popular within the Python community, as it offers an answer to the problems of running CPU-bound threads under the GIL present in Python. The multiprocessing module's official documentation can be found at http://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#multiprocessing.
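
As a rough illustration, the sketch below spreads a CPU-bound function over a `multiprocessing.Pool`. The functions `is_prime`, `count_primes_below`, and `main` are hypothetical examples, and the pool size of two is an arbitrary choice. The `if __name__ == "__main__":` guard matters here, because child processes may import the main module again when they start.

```python
from multiprocessing import Pool


def is_prime(number):
    """Trial division; deliberately simple and CPU-bound."""
    if number < 2:
        return False
    for divisor in range(2, int(number ** 0.5) + 1):
        if number % divisor == 0:
            return False
    return True


def count_primes_below(limit):
    """Count the primes in range(limit)."""
    return sum(1 for number in range(limit) if is_prime(number))


def main():
    limits = [10_000, 20_000, 30_000, 40_000]
    with Pool(processes=2) as pool:
        # Each limit is handled by one of the worker processes.
        counts = pool.map(count_primes_below, limits)
    for limit, count in zip(limits, counts):
        print(f"primes below {limit}: {count}")


if __name__ == "__main__":
    main()
```

`Pool.map` returns the results in the same order as the inputs, regardless of which process finished first.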

The parallel Python module

The Parallel Python module is external and offers a rich API for creating parallel and distributed systems using the process-based approach. This module promises to be light and easy to install, and integrates with other Python programs. It can be found at http://parallelpython.com. Among its features, we may highlight the following:

  • Automatic detection of the optimal configuration

  • The number of worker processes can be changed at runtime

  • Dynamic load balancing

  • Fault tolerance

  • Auto-discovery of computational resources

Celery – a distributed task queue

Celery is an excellent Python module that is used to create distributed systems and has excellent documentation. It supports at least three different approaches to running tasks concurrently: multiprocessing, Eventlet, and Gevent. This work will, however, concentrate on the multiprocessing approach. Switching from one approach to another is a matter of configuration, and is left as an exercise so that readers can draw comparisons from their own experiments.

The Celery module can be obtained on the official project page at http://celeryproject.org.

Taking care of Python GIL


The GIL is a mechanism used in the standard implementation of Python, known as CPython, to prevent bytecode from being executed simultaneously by different threads. The existence of the GIL in Python is a reason for fiery discussion amongst users of this language. The GIL was chosen to protect the internal memory used by the CPython interpreter, which does not implement synchronization mechanisms for concurrent access by threads. In any case, the GIL becomes a problem when we decide to use threads and these tend to be CPU-bound. I/O-bound threads, for example, are largely outside the GIL's scope, since the lock is released while a thread waits on I/O. Maybe the mechanism brings more benefits to the evolution of Python than harm to it. Evidently, we cannot consider speed alone as the argument that determines whether something is good or not.

There are cases in which using processes for tasks, combined with message passing, offers a better balance of maintainability, scalability, and performance. Even so, there are cases in which there will be a real need for threads, which are subject to the GIL. In these cases, what can be done is to write such pieces of code as extensions in the C language and embed them into the Python program. Thus, there are alternatives; it is up to the developer to analyze the real necessity. So, there comes the question: is the GIL, in a general way, a villain? It is important to remember that the PyPy team is working on a software transactional memory (STM) implementation in order to remove the GIL from Python. For more details about the project, visit http://pypy.org/tmdonate.html.
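
To observe the effect on a given machine, one possible experiment is to run the same CPU-bound function with threads and then with processes, and compare the elapsed times. The sketch below uses `ThreadPoolExecutor` and `ProcessPoolExecutor` from `concurrent.futures`; the names `burn_cpu` and `timed_run`, the job count, and the iteration count are assumptions chosen for illustration. Timings vary with hardware and interpreter build, so none are shown here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def burn_cpu(iterations):
    """Pure-Python loop that keeps the interpreter busy."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def timed_run(executor_class, jobs, iterations):
    """Run `jobs` copies of burn_cpu and return (seconds, results)."""
    start = time.perf_counter()
    with executor_class(max_workers=jobs) as executor:
        results = list(executor.map(burn_cpu, [iterations] * jobs))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for executor_class in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, _ = timed_run(executor_class, jobs=4, iterations=2_000_000)
        print(f"{executor_class.__name__}: {elapsed:.2f}s")
```

On a multi-core machine running CPython with the GIL, the process-based run is typically the faster of the two for this kind of workload, although process start-up costs can outweigh the gain for small tasks.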

Summary


In this chapter, we learned some parallel programming concepts and looked at some models, along with their advantages and disadvantages. Some of the problems and potential issues that arise when thinking about parallelism were briefly explained. We also had a short introduction to some Python modules, built-in and external, which make a developer's life easier when building parallel systems.

In the next chapter, we will be studying some techniques to design parallel algorithms.


What you will learn

  • Explore techniques to parallelize problems

  • Integrate the Parallel Python module to implement Python code

  • Execute parallel solutions on simple problems

  • Achieve communication between processes using Pipe and Queue

  • Use Celery Distributed Task Queue

  • Implement asynchronous I/O using the Python asyncio module

  • Create threadsafe structures

Table of Contents

16 Chapters
Parallel Programming with Python
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Contextualizing Parallel, Concurrent, and Distributed Programming
Designing Parallel Algorithms
Identifying a Parallelizable Problem
Using the threading and concurrent.futures Modules
Using Multiprocessing and ProcessPoolExecutor
Utilizing Parallel Python
Distributing Tasks with Celery
Doing Things Asynchronously
Index
