You're reading from Mastering matplotlib

Product type: Book
Published in: Jun 2015
Reading level: Intermediate
ISBN-13: 9781783987542
Edition: 1st Edition
Author: Duncan M. McGreggor

Duncan M. McGreggor, having programmed with GOTOs in the 1980s, has made up for that through community service by making open source contributions for more than 20 years. He has spent a major part of the past 10 years dealing with distributed and scientific computing (in languages ranging from Python, Common Lisp, and Julia to Clojure and Lisp Flavored Erlang). In the 1990s, after serving as a linguist in the US Army, he spent considerable time working on projects related to MATLAB and Mathematica, which was a part of his physics and maths studies at the university. Since the mid 2000s, matplotlib and NumPy have figured prominently in many of the interesting problems that he has solved for his customers. With the most recent addition of the IPython Notebook, matplotlib and the suite of the Python scientific computing libraries remain some of his most important professional tools.

Chapter 9. Clustering for matplotlib

In the final chapter of this book, we will address a topic that has been alluded to several times: clustering and parallel programming for matplotlib. Our motivation for discussing it is nearly identical to the one that drove our investigation into working with large datasets. Although matplotlib itself neither makes direct use of large datasets nor provides an API for use with clusters, advanced users of the library will very likely encounter situations in which they want to use matplotlib in such environments.

Not to put too fine a point on this, we live in a new world of computing. This was presented exceptionally well in the oft-quoted article, The Free Lunch Is Over, by Herb Sutter. With the drastic limitations faced by the semiconductor industry, yearly gains in computing power are no longer a result of faster chips. Instead, we get this benefit through the addition of cores in a single machine. Unfortunately, common practices in programming...

Clustering and parallel programming


The term clustering may have a number of operational definitions, depending on the situation that one is facing or the organization that one is working with. In this chapter, we will use the term in a very general sense to indicate a system of computing nodes across which a task may be split and on which its parts may be executed in parallel. We won't specify what the nodes are, as they may be anything from a collection of processes on a single machine or across a computer network to virtual machines or physical computers on a network.
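
As a minimal sketch of this general definition, the following example treats local worker processes as the "nodes": a task is split into parts, each part is executed on a node, and the partial results are combined. The names `split`, `sum_of_squares`, and `run_on_nodes` are hypothetical and are chosen only for illustration; the sum-of-squares workload is an assumption and stands in for any computation that can be divided into independent parts.

```python
from concurrent.futures import ProcessPoolExecutor


def split(items, parts):
    """Split items into at most `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks receive one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The unit of work that a single node performs on its chunk."""
    return sum(x * x for x in chunk)


def run_on_nodes(executor, items, parts):
    """Split the task, run the parts on the executor, and combine the results."""
    return sum(executor.map(sum_of_squares, split(items, parts)))


if __name__ == "__main__":
    # Assumed setup: four worker processes on the local machine act as nodes.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(run_on_nodes(pool, list(range(1000)), parts=4))
```

Because `run_on_nodes` only relies on the executor's `map` method, the same function works with any `concurrent.futures` executor, which mirrors the point that the definition does not depend on what the nodes actually are.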

The word "cluster" alludes to a logical collection, but in our definition there is a more important word—parallel. For our purposes, clusters exist to make running code in parallel more efficient or convenient. The topic of parallel computing is a vast one and has an interesting history. However, it rose to greater prominence in 2003 due to the physical limitations that were encountered by the chip-making industry...

The custom ZeroMQ cluster


In this section, we will create a task pipeline, which is a messaging pattern that we mentioned briefly when discussing the Disco project's answer to MapReduce. Task pipelines can be viewed as a generalization of MapReduce in that they support data flows through a sequence of steps, where each step provides the results from its processing to the next step in the flow. We will accomplish this by using ZeroMQ to create a messaging topology that is suitable for the execution of embarrassingly parallel code in a number of worker processes.
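
The shape of such a pipeline can be sketched with the Python standard library alone. Note that this is not the ZeroMQ implementation: `queue.Queue` objects and threads stand in for ZeroMQ sockets and worker processes, so the example shows only the data flow (a distributing step, a pool of workers, and a collecting step). The names `distributor`, `worker`, `collector`, and `run_pipeline` are hypothetical.

```python
import queue
import threading

# Unique sentinel that tells a stage no more items will arrive.
STOP = object()


def distributor(tasks, task_queue, worker_count):
    """First step: push every task, then one stop sentinel per worker."""
    for task in tasks:
        task_queue.put(task)
    for _ in range(worker_count):
        task_queue.put(STOP)


def worker(task_queue, result_queue, func):
    """Middle step: pull tasks, process them, and pass results to the next step."""
    while True:
        task = task_queue.get()
        if task is STOP:
            # Forward the sentinel so the collector knows this worker is done.
            result_queue.put(STOP)
            break
        result_queue.put(func(task))


def collector(result_queue, worker_count):
    """Final step: gather results until every worker has signalled completion."""
    results, finished = [], 0
    while finished < worker_count:
        item = result_queue.get()
        if item is STOP:
            finished += 1
        else:
            results.append(item)
    return results


def run_pipeline(tasks, func, worker_count=3):
    """Wire the three steps together and return the collected results."""
    task_queue, result_queue = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(task_queue, result_queue, func))
        for _ in range(worker_count)
    ]
    for thread in workers:
        thread.start()
    distributor(tasks, task_queue, worker_count)
    results = collector(result_queue, worker_count)
    for thread in workers:
        thread.join()
    return results


if __name__ == "__main__":
    # Results arrive in completion order, so sort them before printing.
    print(sorted(run_pipeline(range(10), lambda n: n * n)))
```

Each task here is independent of the others, which is what makes the workload embarrassingly parallel: workers never need to communicate with one another, only with the steps before and after them.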

Note

The descriptive term embarrassingly parallel was adopted in online parallelization discussions after its use in an article named Matrix Computation on Distributed Memory Multiprocessors that was written by Cleve Moler. The parallelizing of serially biased code is notoriously difficult, and problems that were obviously or easily parallelizable were described by using this phrase.

ZeroMQ is an asynchronous messaging framework, which...

Clustering with IPython


As explained in the IPython documentation for parallel computing, IPython has built-in support for parallelism. This came as a result of the architectural overhaul that IPython received when the project finished migrating to ZeroMQ in 2011. The resulting architecture can be summarized with the following components, all of which are present in the IPython.parallel package (distributed separately as ipyparallel since IPython 4.0):

  • The IPython engine: This is a Python interpreter that accepts Python commands over a network connection. Multiple engines form the basis of IPython's parallel computing capabilities.

  • The IPython hub: This is the process that keeps track of engine connections, schedulers, clients, task requests, and results. Its primary purpose is to facilitate queries of the cluster state.

  • The IPython schedulers: All actions that can be performed on an engine go through a scheduler. The schedulers also provide a fully asynchronous interface to a set of engines.

  • The controller client: This is the user interface...

More clustering


There are other interesting options for parallelizing Python code and running it on clusters when you need to visualize the results of computationally intensive problems. Of particular interest is MIT's StarCluster project, which runs on Amazon AWS (EC2) and supports the Open Grid Engine (formerly known as the Sun Grid Engine). Furthermore, StarCluster has an IPython cluster plugin, which lets you easily run IPython.parallel code on EC2. For more information on this, be sure to refer to this chapter's IPython Notebook, the StarCluster documentation, and the IPython documentation for parallel computations.

Summary


With this chapter, we reached the culmination of our adventure through the advanced topics in the world of matplotlib. Many of the topics covered in the latter half of this book crossed into other domains, as is often the case with the advanced usage of any software. These topics explored systems that did not have a direct and obvious connection with the narrow scope of matplotlib as a library. Rather, they reflected the usage patterns that are requested of software engineers working on real-world problems.

This bears further reflection. Often, computing problems in both research settings and start-ups are initially tackled quickly to get the results and examine the data as soon as possible. The next round of usage might require the addition of a bit more functionality or some other code tweaks. After a few months, you may be in any of the following situations:

  • You may end up with one or two functions of enormous size with no obvious or clear path towards something that would be more...
