In this chapter, we are going to look at why IPython should be considered a viable tool for building high-performance and parallel systems.
This chapter covers the following topics:
The need for speed
Fortran as a solution
Choosing between IPython and Fortran
An example case—the Fast Fourier Transform
High-performance computing and the cloud
Going parallel
Computers have never been fast enough. From their very beginnings in antiquity as abaci to the building-sized supercomputers of today, the cry has gone up "Why is this taking so long?"
This is not an idle complaint. Humanity's ability to control the world depends on its ability to model it and to simulate different courses of action within that model. A medieval trader, before embarking on a trading mission, would pull out his map (his model of the world) and plot a course (a simulation of his journey). To do otherwise was to invite disaster. It took a long period of time and a specialized skill set to use these tools. A good navigator was an important team member. To go where no maps existed was a perilous journey.
The same is true today, except that the models have become larger and the simulations more intricate. Testing a new nuclear missile by actually launching it is ill-advised. Instead, a model of the missile is built in software and a simulation of its launching is run on a computer. Design flaws can be exposed in the computer (where they are harmless), and not in reality.
Modeling a missile is much more complex than modeling the course of a ship. There are more moving parts, the relevant laws of physics are more complicated, the tolerance for error is lower, and so on and so forth. This would not be possible without employing more sophisticated tools than the medieval navigator had access to. In the end, it is our tools' abilities that limit what we can do.
It is the nature of problems to expand to fill the limits of our capability to solve them. When computers were first invented, they seemed like the answer to all our problems. It did not take long before new problems arose.
After the initial successes of the computer (breaking German codes and calculating logarithms), the field ran into two problems. Firstly, the machine itself was slow—or at least slower than desired—for the new problems at hand. Secondly, it took too long to write the instructions (code) that the machine would execute to solve the problem.
Making the machine itself faster was largely an engineering problem. The underlying substrate went from steam and valves to electromechanical relays to vacuum tubes to integrated circuits. Each change in the substrate improved the rate at which instructions could be executed. This form of progress, while interesting, is outside of the scope of this book.
Once computers evolved past needing their programs to be wired up, programmers were free to start expressing their algorithms as text, in a programming language. While typing is faster than running wires, it has its own issues. Fortran was one of the first languages to address them successfully.
Early languages were generally not very human-friendly. It took specialized training to be able to write (and read) programs written in these languages. Programmers would often add comments to their code, either within the code itself or in external documentation, but the problem was deeper. The languages themselves were cryptic.
For example, the following code in x86 assembly language determines whether a year is a leap year or not (from http://rosettacode.org/wiki/Leap_year#X86_Assembly):
align 16
; Input year as signed dword in EAX
IsLeapYear:
    test eax, 11b
    jz .4
    retn              ; 75% : ZF=0, not a leap year
.4:
    mov ecx, 100
    cdq
    idiv ecx
    test edx, edx
    jz .100
    cmp edx, edx
    retn              ; 24% : ZF=1, leap year
.100:
    test eax, 11b
    retn              ; 1% : ZF=?, leap year if EAX%400=0
This is the first problem Fortran addressed. Fortran set out to be more readable. An important goal was that mathematical equations in code should look like mathematical expressions written by human beings. This was an important step in enabling coders to express algorithms in terms that they themselves understood, as opposed to a format the machine could directly work with. By comparison, a Fortran function to determine whether a year is a leap year reads easily (from http://rosettacode.org/wiki/Leap_year#Fortran):
pure elemental function leap_year(y) result(is_leap)
  implicit none
  logical :: is_leap
  integer, intent(in) :: y

  is_leap = (mod(y,4)==0 .and. .not. mod(y,100)==0) .or. (mod(y,400)==0)
end function leap_year
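For comparison, the same test is also easy to read in Python (a minimal sketch written for this discussion, not taken from Rosetta Code):

def leap_year(y):
    # A year is a leap year if it is divisible by 4,
    # except century years, which must also be divisible by 400.
    return (y % 4 == 0 and y % 100 != 0) or (y % 400 == 0)

print(leap_year(2000), leap_year(1900))   # True False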
The first languages were specific to the machine they were meant to run on. A program written on one machine would not run on another. This led to the wheel being reinvented often. Consider a sorting algorithm. Many programs need to sort their data, so sorting algorithms would be needed on many different computers. Unfortunately, an implementation of quicksort on one machine, in that machine's language, would not run on another machine, in its language. This resulted in many, many reimplementations of the same algorithm.
Also, a programmer who knew how to write code on one machine had to relearn everything to use another. Not only was it difficult for talented individuals to go where they were needed, but also buying a new machine meant retraining the entire staff. The first thing the staff then did was rewrite all the existing (working) code so that it would run on the new machine. It was a tremendous waste of talent and time.
This is the second problem Fortran addressed—how can a program be expressed so that it runs on more than one machine (that is, how can programs be made portable)? The goal was that if a program was written in Fortran on one machine, then it would run on any other machine that supported Fortran.
To this end, Fortran compilers were developed. A compiler translates a program in one language (Fortran in this case) to another language (the language of the machine the program would run on).
While readability and portability were important, no one was going to use Fortran if the resulting program ran slowly on their computer. Early coders expended immense amounts of time and effort making their code run as quickly as possible. Problems were big and computers were slow and time was money.
This is the third problem Fortran addressed, and its solution is the primary reason Fortran is still in use today: Fortran programs run fast. The details are outside the scope of this book, but the result is clear. Algorithms expressed in Fortran run quickly. Fortran was designed that way. Implementations are judged on their efficiency, compilers generate clean code, and coders always have an eye on performance. Other languages have surpassed it in terms of readability, portability, and other measures of quality, but it is a rare language that measures up in terms of efficiency.
It is important to understand the environment in which Fortran programs ran when the language was first developed. While we are used to a computer running multiple programs simultaneously today (multitasking), early computers ran only one program at a time. The programs would sit in a queue, in order. The operating system would take the first program, run it from beginning to end, then do the same for the next program, and so on. This form of job scheduling is known as a batch system.
Batch systems are very efficient. At the very bottom of things, a processor can only do one thing at a time. A multitasking system just switches what the processor is doing from one thing to another very quickly, so it looks like multiple things are happening at once. This makes for a smoother user experience; however, multitasking systems can spend a lot of time doing this switching.
Batch systems can devote this switching time to running the program. In the end, the program runs faster (although the user experience is degraded). Fortran, with its emphasis on speed, was a natural fit for batch systems.
We will start by taking a look at each language in general, and follow that with a discussion of the cost factors that impact a software project and how each language affects them. No two software development projects are the same, and so the factors discussed next (along with many, many others) should serve as guidelines for the choice of language. This chapter is not an attempt to promote IPython at the expense of Fortran; rather, it aims to show that IPython is a superior choice when implementing certain important types of systems.
Many of the benefits and drawbacks of Fortran are linked to its longevity. For the kinds of things that have not changed over the decades, Fortran excels (for example, numerical computing, which is what the language was originally designed for). Newer developments (for example, text processing, objects) have been added to the language in its various revisions.
The benefits of Fortran are as follows:
Compilation makes for efficient runtime performance
Existence of many tested and optimized libraries for scientific computing
Highly portable
Optimized for scientific computing (especially matrix operations)
Stable language definition with a well-organized system for revisions
The drawbacks of Fortran are as follows:
Text processing is an add-on
Object-orientation is a recent addition
Shrinking pool of new talent
IPython/Python is the new kid in town. It began in 2001 when Fernando Perez decided that he wanted some additional features out of Python. In particular, he wanted a more powerful command line and integration with a lab-notebook-style interface. The end result was a development environment that placed greater emphasis on ongoing interaction with the system than what traditional batch processing provided.
The nearly 45-year delay between the advent of Fortran and IPython's birth provided IPython the advantage of being able to natively incorporate ideas about programming that have arisen since Fortran was created (for example, object-orientation and sophisticated data structuring operations). However, its relative newness puts it behind in terms of installed code base and libraries. IPython, as an extension of Python, shares its benefits and drawbacks to a large extent.
The benefits of IPython are as follows:
Good at non-numeric computing
More concise
Many object-oriented features
Ease of adoption
Useful libraries
Sophisticated data structuring capabilities
Testing and documentation frameworks
Built-in visualization tools
Ease of interaction while building and running systems
The drawbacks of IPython are as follows:
Its interpreted nature makes for slower runtime
Fewer libraries (although the ones that exist are of high quality)
Some of these benefits deserve more extensive treatment here, while others merit entire chapters.
Object-oriented programming (OOP) was designed for writing simulations. While some simulations reduce to computational application of physical laws (for example, fluid dynamics), other types of simulation (for example, traffic patterns and neural networks) require modeling the entities involved at a more abstract level. This is more easily accomplished with a language that supports classes and objects (such as Python) than with a purely imperative language.
The ability to match a program structure to a problem's structure makes it easier to write, test, and debug a system. The OOP paradigm is simply superior when simulating a large number of individually identifiable, complex elements.
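As a small, hypothetical illustration of this point, the entities in a traffic simulation map naturally onto a Python class (the names here are invented for the example):

class Vehicle:
    """One individually identifiable entity in a toy traffic simulation."""
    def __init__(self, position, speed):
        self.position = position
        self.speed = speed

    def step(self, dt):
        # Advance this vehicle by one time step.
        self.position += self.speed * dt

# The simulation loop simply asks every entity to update itself.
vehicles = [Vehicle(position=10.0 * i, speed=5.0) for i in range(3)]
for _ in range(100):
    for v in vehicles:
        v.step(dt=0.1)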
It is easy to learn Python. It is currently the most popular introductory teaching language among the top 39 computer science departments in the United States (http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-us-universities/fulltext):

Note that Fortran is not on the list.
This is no accident, nor is Python merely a "teaching language." Rather, it is a well-designed language with an easy-to-learn syntax and a gentle learning curve. It is much easier to learn Python than Fortran, and it is also easier to move from Fortran to Python than the reverse. This has led to increasing use of Python in many areas.
The trend toward teaching Python has meant that there is a much larger pool of potential developers who know Python. This is an important consideration when staffing a project.
TIOBE Software ranks the popularity of programming languages based on skilled engineers, courses, and third-party vendors. Their rankings for October 2015 put Python in fifth place and growing. Fortran is 22nd (behind COBOL, which is 21st).
IEEE uses its own methods, and they produced the following graph:

The column on the left is the 2015 ranking, and the column on the right is the 2014 ranking, for comparison. Fortran came in 29th, with a Spectrum ranking of 39.5.
The growing number of Python coders has led to an increasing number of libraries written in and for Python. SciPy, NumPy, and Sage are leading the way, with new open source libraries coming out on a regular basis. The usefulness of a language is heavily dependent on its libraries, and while Python cannot boast the depth in this field that Fortran can, the sheer number of Python developers means that it is closing the gap rapidly.
If all developers were equal in talent, worked for free, development time were no object, all code were bug-free, and every program only needed to run once before being thrown away, Fortran would be the clear winner given its efficiency and installed library base.
This is not how commercial software is developed. To a first approximation, a software project's cost can be broken down into the cost of several parts:
Requirements and specification gathering
Development
Execution
Testing and maintenance
There is no clear differentiation between IPython and Fortran when it comes to the difficulty of producing good requirements and specifications. These activities are largely language-independent. While the availability of prewritten software packages may impact parts of the specification, both languages are equally capable of reducing requirements and specifications to a working system.
As discussed previously, Python code tends to be more concise, leading to higher programmer productivity. Combine this with the growing numbers of developers already fluent in Python and Python is the clear winner in terms of reducing development time.
If it is costly to run on the target system (which is true for many supercomputers), or the program takes a long time to run (which is true for some large-scale simulations such as weather prediction), then the runtime efficiency of Fortran is unmatched. This consideration looms especially large when development on a program has largely concluded and the majority of the time spent on it is in waiting for it to complete its run.
There are many different styles of testing: unit, coverage, mocks, web, and GUI, to name just a few. Good tests are hard to write, and the effort put into them is often unappreciated. Most programmers will avoid writing tests if they can. To that end, it is important to have a set of good, easy-to-use testing tools.
Python has the advantage in this area, particularly because of quality unit testing frameworks such as unittest, nose, and Pythoscope. The introspection capabilities of the Python language make writing and using testing frameworks much easier than with the tools available for Fortran.
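As a brief example of how lightweight this can be, here is a sketch using the standard library's unittest module to test the leap_year function sketched earlier in this chapter (repeated here so the example is self-contained):

import unittest

def leap_year(y):
    return (y % 4 == 0 and y % 100 != 0) or (y % 400 == 0)

class TestLeapYear(unittest.TestCase):
    def test_century_years(self):
        # Century years are leap years only when divisible by 400.
        self.assertTrue(leap_year(2000))
        self.assertFalse(leap_year(1900))

    def test_ordinary_years(self):
        self.assertTrue(leap_year(2012))
        self.assertFalse(leap_year(2013))

if __name__ == '__main__':
    unittest.main()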
You could always just skip testing (it is, after all, expensive and unpopular), or do it the old-fashioned way: try a few values and check whether they work. This leads to an important consideration governing how much testing to do: the cost of being wrong. This type of cost is especially important in scientific and engineering computing. While the legal issues surrounding software liability are in flux, moral and practical considerations are important. No one wants to be the developer who was responsible for lethally overdosing chemotherapy patients because of a bug. There are types of programming for which this is not important (word processors come to mind), but any system that involves human safety or financial risk incurs a high cost when something goes wrong.
Maintenance costs are similar to testing costs in that maintenance programming tends to be unpopular and allows new errors to creep into previously correct code. Python's conciseness reduces maintenance costs by reducing the number of lines of code that need to be maintained. The superior testing tools allow the creation of comprehensive regression testing suites to minimize the chances of errors being introduced during maintenance.
There are alternatives to the stark IPython/Fortran choice: cross-language development and prototyping.
Python began as a scripting language. As such, it was always meant to be able to interoperate with other languages. This can be a great advantage in several situations:
A divided development team: If some of your developers know only Fortran and some know only Python, it can be worth it to partition the system between the groups and define a well-structured interface between them. Functionality can then be assigned to the appropriate team:
Runtime-intensive sections to the Fortran group
Process coordination, I/O, and others to the Python group
Useful existing libraries: It always seems like there is a library that does exactly what is needed, but it is written in another language. Python's heritage as a scripting language means that there are many tools that can be used to make this process easier. Of particular interest in this context is F2Py (part of NumPy), which makes interfacing with Fortran code easier (a short sketch follows this list).
Specialized functionality: Even without a pre-existing library, it may be advantageous to write some performance-sensitive modules in Fortran. This can raise development, testing, and maintenance costs, but it can sometimes be worth it. Conversely, IPython provides specialized functionality in several areas (testing, introspection, and graphics) that Fortran projects could use.
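As a rough sketch of how such a bridge works with F2Py (the file name norm.f90, the module name fnorm, and the routine vecnorm are hypothetical, and the exact build command varies by installation), suppose an existing Fortran routine computes a vector norm:

subroutine vecnorm(x, n, res)
  implicit none
  integer, intent(in) :: n
  real(8), intent(in) :: x(n)
  real(8), intent(out) :: res
  res = sqrt(sum(x * x))
end subroutine vecnorm

Building it with f2py -c -m fnorm norm.f90 produces an extension module that Python can import and call like any other:

import numpy as np
import fnorm   # extension module produced by f2py

x = np.arange(4.0)
print(fnorm.vecnorm(x))   # f2py infers n from the length of x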
It is often the case that it is not clear before writing a program how useful that program will turn out to be. Experience with the finished product would provide important feedback, but building the entire system would be prohibitively costly.
Similarly, there may be several different ways to build a system. Without clear guidelines to start with, the only way to decide between alternatives is to build several different versions and see which one is the best.
These cases share the problem of needing the system to be complete before being able to decide whether to build the system in the first place.
The solution is to build a prototype—a partially functional system that nevertheless incorporates important features of the finished product as envisioned. The primary virtue of a prototype is its short development time and concomitant low cost. It is often the case that the prototype (or prototypes) will be thrown away after a short period of evaluation. Errors, maintainability, and software quality in general matter only insofar as they affect the evaluation of the prototype (say, for use in estimating the schedule for the entire project).
Python excels as a prototyping language. It is flexible and easy to work with (reducing development time) while being powerful enough to implement sophisticated algorithms. Its interpreted nature is not an issue, as prototypes are generally not expected to be efficient (only quick and cheap).
It is possible to adopt an approach known as Evolutionary Prototyping. In this approach, an initial prototype is built and evaluated. Based on this evaluation, changes are decided upon. The changes are made to the original prototype, yielding an improved version. This cycle repeats until the software is satisfactory. Among other advantages, this means that a working version of the system is always available for benchmarking, testing, and so on. The results of the ongoing evaluations may point out functionality that would be better implemented in one language or another, and these changes can be made as described in the section on cross-language development.
In this section, we will look at a small test program for a common scientific algorithm as written in Fortran and Python. Issues related to efficiency and general software engineering will be addressed.
Rosetta Code (http://rosettacode.org/wiki/Rosetta_Code) is an excellent site that contains solutions to many problems in different programming languages. Although there is no guarantee that the code samples contained on the site are optimal (in whatever sense the word "optimal" is being used), its goal is to present a solution usable by visitors who are learning a new language. As such, the code is generally clear and well-organized. The following examples are from the site. All code is covered under the GNU Free Documentation License 1.2.
From http://rosettacode.org/wiki/Fast_Fourier_transform#Fortran:
module fft_mod
  implicit none
  integer, parameter :: dp=selected_real_kind(15,300)
  real(kind=dp), parameter :: pi=3.141592653589793238460_dp
contains

  ! In place Cooley-Tukey FFT
  recursive subroutine fft(x)
    complex(kind=dp), dimension(:), intent(inout) :: x
    complex(kind=dp) :: t
    integer :: N
    integer :: i
    complex(kind=dp), dimension(:), allocatable :: even, odd

    N = size(x)

    if (N .le. 1) return

    allocate(odd((N+1)/2))
    allocate(even(N/2))

    ! divide
    odd  = x(1:N:2)
    even = x(2:N:2)

    ! conquer
    call fft(odd)
    call fft(even)

    ! combine
    do i = 1, N/2
      t = exp(cmplx(0.0_dp, -2.0_dp*pi*real(i-1,dp)/real(N,dp), kind=dp)) * even(i)
      x(i)     = odd(i) + t
      x(i+N/2) = odd(i) - t
    end do

    deallocate(odd)
    deallocate(even)

  end subroutine fft

end module fft_mod

program test
  use fft_mod
  implicit none
  complex(kind=dp), dimension(8) :: data = (/1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0/)
  integer :: i

  call fft(data)

  do i = 1, 8
    write(*,'("(", F20.15, ",", F20.15, "i )")') data(i)
  end do

end program test
From http://rosettacode.org/wiki/Fast_Fourier_transform#Python:
from cmath import exp, pi

def fft(x):
    N = len(x)
    if N <= 1: return x
    even = fft(x[0::2])
    odd  = fft(x[1::2])
    T = [exp(-2j*pi*k/N)*odd[k] for k in xrange(N/2)]
    return [even[k] + T[k] for k in xrange(N/2)] + \
           [even[k] - T[k] for k in xrange(N/2)]

print( ' '.join("%5.3f" % abs(f)
                for f in fft([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])) )
It would be difficult to compare the performance of these programs. The time required to run a program can be influenced by many things outside of the inherent properties of the language:
Skilled Fortran and Python programmers could find optimizations at the code level
Optimizing compilers vary in quality
Underlying libraries (for example, numpy) could be substituted in and affect performance (see the short example after this list)
Critical sections could be coded in a compiled language (for example, Cython) or even assembly language, yielding a major speedup without affecting most of the lines of code
The architecture of the machine itself could have an impact
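To illustrate the point about libraries made above, the hand-written recursion can be replaced with a call into NumPy's FFT routine, which runs in compiled code (a minimal sketch; the measured speedup will vary with array size and build):

import numpy as np

signal = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

# numpy.fft.fft executes in compiled code rather than interpreted Python
spectrum = np.fft.fft(signal)
print(np.abs(spectrum))

In an IPython session, the %timeit magic makes it easy to compare the two versions side by side.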
The question of how fast code runs is independent of the question of how long it takes to write, debug, and maintain. It is notoriously difficult to estimate how much time and effort will be required to write a program before development has started. This uncertainty remains throughout the development cycle, with many projects going well over time and budget. Even when coding is complete, it can be difficult to tell how efficient the entire process was. A means of measuring effort would help answer these questions.
There are two primary ways to measure the amount of effort required to write software.
Complexity-based metrics focus on one of two things:
Code-level complexity (number of variables, loop nesting, branching complexity, and cyclomatic complexity)
Functionality (based on intuitive ideas of how difficult implementing different pieces of functionality might be)
Complexity-based measures have the advantage that they tend to match intuitive ideas of what complexity is and what types of code are complex (that is, difficult to write and debug). The primary drawback is that such measures often seem arbitrary. Especially before a project is started, it can be difficult to tell how much effort will be required to write a particular piece of functionality. Too many things can change between project specification and coding. This effect is even greater on large projects, where the separation between specification and implementation can be years long.
Size-based metrics focus on a property that can be expressed on a linear scale, for example:
Lines of code (LOC, or thousands of LOC, also known as KLOC)
Lines of machine code (post-compilation)
Cycles consumed (code that uses more cycles is probably more important and harder to write)
Size-based metrics have the advantage that they are easy to gather, understand, and objectively measure. In addition, LOC seems to be a decent correlate of project cost—the more lines of code in a project, the more it costs to write. The most expensive part of a software project is paying the coders, and the more lines of code they have to write, the longer it probably takes them. If the lines of code could be estimated upfront, they would be a tool for estimating the cost.
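For instance, a crude LOC count is easy to gather with a few lines of Python (a rough sketch; real tools also account for comments, docstrings, and language-specific rules, and the file name below is hypothetical):

def count_loc(path):
    """Count non-blank, non-comment lines in a Python source file."""
    with open(path) as src:
        return sum(1 for line in src
                   if line.strip() and not line.strip().startswith('#'))

print(count_loc('fft_serial.py'))   # 'fft_serial.py' is a hypothetical file name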
The primary drawback is that it is unclear how valid such measures really are. It is often the case that better (clearer, faster, and easier to maintain) code will grade out as "smaller" under a size-based metric. In addition, such code is often easier to write, making the development team look even more productive. Bloated, buggy, and inefficient code can make a team look good under these metrics, but can be a disaster for the project overall.
As for the class of projects this book is concerned with, much of it involves taking mathematics-based models and translating them into executable systems. In this case, we can consider the complexity of the problem as fixed by the underlying model and concentrate on size-based measures: speed and lines of code. Speed was addressed previously, so the main concern left is LOC. As illustrated previously, Python programs tend to be shorter than Fortran programs. For a more detailed look, visit http://blog.wolfram.com/2012/11/14/code-length-measured-in-14-languages/.
Admittedly, such measures are fairly arbitrary. It is possible to write programs in such a way as to minimize or maximize the number of lines required, often to the detriment of overall quality. Absent such incentives, however, a given programmer tends to produce roughly the same number of lines of code per day regardless of the language being used. This makes the relative conciseness of Python an important consideration when choosing a language to develop in.
In the past, most HPC and parallel programming was done on a limited number of expensive machines. As such, the most important criterion by which programs were measured was execution speed. Fortran was an excellent solution to the problem of writing fast, efficient programs. This environment was acceptable to the community that needed to perform these types of calculations, and it gradually separated from mainstream commercial computing, which developed other concerns.
The birth of cloud computing (and cheaper hardware in general) and the evolution of big data have caused some in the commercial mainstream to reconsider using large, parallel systems. This reconsideration has brought commercial concerns to the fore: development and maintenance costs, testing, training, and other things. In this environment, some (small) trade-off in speed is worth it for significant gains in other areas. Python/IPython has demonstrated that it can provide these gains at a minimal runtime performance cost.
At this point, we have to leave consumer computing aside for a while. As computing hardware became more affordable, the need for most people to have programs run as efficiently as possible diminished. Other criteria entered the picture: graphical interfaces, multitasking, interactivity, and so on. Usability became more important than raw speed.
This, however, was not true for everybody. There remained a small (but devoted) group of users/programmers for whom efficiency was not just the most important thing. It was the only thing. These groups hung out in nuclear labs and intelligence agencies and had money to spend on exotic hardware and highly skilled coders. Thus was shaped High Performance Computing (HPC).
True to the nature of HPC, its implementations have been chosen with efficiency in mind. HPC systems are highly parallel, batch-oriented, and run Fortran. Running their programs quickly matters so much to the users of HPC systems that they have ignored any and all advances in the field that did not result in faster programs.
This was a satisfactory relationship for some time. The types of problems of interest to the HPC community (complicated physical modeling and advanced mathematics) had little overlap with the rest of computer science. HPC was a niche with a very high barrier to entry. After all, there were just not that many massively parallel computers to go around.
In a sense then, programming HPC systems was an island. On the island, there were ongoing research programs centered on important HPC-centric questions. Tools were built, skills were developed, and a community of practice developed to the point that approaching HPC from the outside could be daunting. Advances occurred outside of HPC also, but those inside it had their own concerns.
As time passed, the HPC island drifted further and further from mainstream computing. New areas opened up: web computing, mobile computing, agile methods, and many others. HPC took what it needed from these areas, but nothing really affected it. Until something finally did…
Amazon had a problem. During the Christmas season, it used a lot of computer power. For the rest of the year, these computers would sit idle. If there were some way to allow people to rent time on these idle machines, Amazon could make money. The result was an API that allowed people to store data on those machines (the Amazon Simple Storage Service, or S3) and an API that allowed people to run programs on the same machines (the Amazon Elastic Compute Cloud, or EC2). Together, these made up the start of the Amazon Cloud.
While not the first system to rent out excess capacity (CompuServe started off the same way several decades earlier), Amazon Cloud was the first large-scale system that provided the general public paid access to virtually unlimited storage and computing power.
It is not clear whether anybody realized what this meant at first. There are a lot of uses of clouds—overflow capacity, mass data storage, and redundancy, among others—that have a wide appeal. For our purposes, the cloud meant one thing: now everybody has access to a supercomputer. HPC will never be the same again.
The current relationship between HPC and highly parallel architectures is relatively new. It was only in the 1990s that HPC left the realm of very fast single-processor machines for massively parallel architectures. In one sense, this was unfortunate, as the old Cray machines were aesthetic marvels:

The image is in the public domain: https://commons.wikimedia.org/wiki/File:Cray2.jpeg
It was largely inevitable, however, as single-processor systems were bumping up against physical limitations involving transistor density and cooling.
The change in architecture did not bring with it a change in the problems to be solved. To this end, the generic supercomputer physical architecture evolved toward:
Commodity processors—not custom-fast but top-of-the-line and homogeneous
Commodity RAM—ditto
High-end hard drives—lots of smaller, low-latency models (now turning into solid state drives)
Super-fast interconnect networks
Moving from single to multiple processors brought issues with locality. Every time a program running on one processor needed data from another processor (or disk), processing could come to a halt as the data was being retrieved. The physical architecture of the supercomputer is meant to minimize the latency associated with non-local data access.
Given the position of HPC centers as early adopters of parallel architectures, "parallel programming" came to be largely synonymous with "HPC programming." This is largely a historical accident, and new paradigms have opened up parallel computing to constituencies outside of the HPC world. As such, this book will use the two terms interchangeably.
We now turn to one of the new paradigms, cloud computing, and discuss its similarities to and differences from standard HPC.
There are some differences between a "real" supercomputer and what most clouds offer. In particular, a cloud's physical architecture will contain:
Commodity processors—not necessarily fast, but they make up for it in sheer numbers
Commodity RAM—ditto
Commodity hard drives—smaller, but larger in aggregate
Slow(er) interconnect networks
In addition, clouds are generally heterogeneous and easily scaled. While an initial cloud is likely to have many subsystems with the same processor, RAM, hard drives, and so on, over time new subsystems will be added, with newer (or at least different) technology. The loose coupling of cloud systems encourages this sort of organic growth.
Differences in architecture mean that some algorithms will run well on supercomputers, while others favor clouds. A lot of software that runs on supercomputers will not run on clouds, period (and vice versa). This is not always just a matter of recompiling for a new target platform or using different libraries. The underlying algorithm may not be suited to a particular paradigm.
If speed is imperative and you have the budget, there is still no substitute for a special-purpose HPC system. If cost, ease of access, redundancy, and massive parallelism are desired, a cloud fits the bill.
That is not to say the two worlds (HPC and cloud) are completely distinct. Despite these architectural differences, it is worth noting that an Amazon EC2 C3 instance cluster is listed at 134 on the top 500 list of fastest HPC systems as of June 2015. Even on HPC's own terms, cloud computers offer respectable performance.
The core audience for this book then consists of members of both of these groups:
Python programmers looking to expand into HPC/parallel-style programming
HPC/parallel programmers looking to employ Python
Each group has the skills the other wants. HPC programmers understand scientific computing, efficiency, and parallelism. Python programmers are skilled in interactivity, usability, correctness, powerful development tools, ease of debugging, and other capabilities that mainstream computing values. New technology means that future systems will need to incorporate elements from both skill sets.
The previous sections are applicable to either serial or parallel computing. Even in the most parallelizable number crunching program, a great deal of serial code is written, so these observations are very applicable. After a certain point, however, parallel concerns come to dominate. We will start this section by introducing some terminology, before looking at a simple example.
Wall-clock time is the amount of time that passes from the beginning of execution of a program to the end of its execution, as measured by looking at a clock on the wall. Wall-clock time is the measure people usually care about.
Cycle time is the time obtained by summing up the number of cycles taken by the program during its execution. For example, if a CPU is running at 1 MHz, each cycle takes 0.000001 seconds. So if it takes 2,500,000 cycles for a program to run, then it means the program took up 2.5 seconds of cycle time.
In a batch system with a single processor, the times are always the same. In a multitasking system, wall-clock time is often longer than cycle-time as the program may spend wall-clock time waiting to run without using any cycles.
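The difference between the two can be observed from Python itself (a minimal sketch using the standard library; the exact numbers will vary from run to run and machine to machine):

import time

wall_start = time.perf_counter()    # wall-clock time
cpu_start = time.process_time()     # CPU ("cycle-like") time for this process

time.sleep(1.0)                           # waiting consumes wall-clock time, but few cycles
total = sum(i * i for i in range(10**6))  # computing consumes both

print("wall-clock:", time.perf_counter() - wall_start)
print("cpu time:  ", time.process_time() - cpu_start)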
With more than one processor, comparing the two times for an algorithm becomes more complicated. While not always true, many programs can be divided into pieces such that running the program on two or more processors simultaneously reduces the wall-clock time, even if the cycle time goes up. Since wall-clock time is the measure that matters, for these algorithms the answer to the question "Are more processors better?" is yes.
One can quantify this effect as follows:
Note
Given a particular algorithm A
Call the wall-clock time for A when using n processors W(A, n).
Similarly, the cycle time for A using n processors is C(A, n).
We can define the speedup of W(A, n) as S_W(A, n) = W(A, 1) / W(A, n).
Similarly, we can define the speedup of C(A, n) as S_C(A, n) = C(A, 1) / C(A, n).
In general, when using a batch system:
Note
For most algorithms, W(A, n) < C(A, n) when n > 1.
For most algorithms, S_W(A, n) < n when n > 1. For example, using two processors does not make the program run twice as fast. In general, adding more processors to run a program yields diminishing returns.
There are some algorithms for which S_W(A, n) ≈ n. These are known as embarrassingly parallel algorithms. In this case, adding more processors results in a nearly linear speedup, which is where machines with many processors really shine.
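As a concrete, hypothetical illustration: if an algorithm takes W(A, 1) = 120 seconds on one processor and W(A, 4) = 40 seconds on four, its speedup is S_W(A, 4) = 120 / 40 = 3, short of the ideal value of 4. An embarrassingly parallel algorithm with the same single-processor time would finish in roughly 120 / 4 = 30 seconds, for a speedup close to 4.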
In summary, the answer to the question, "Are more processors better?" is that it depends on the algorithm. Luckily for parallel computing, many algorithms show some amount of speedup and many important problems can be solved using these algorithms.
Consider the Collatz conjecture. Given the following function:
f(n) = n / 2 (when n is even)
f(n) = 3n + 1 (when n is odd)
The conjecture is that for any positive integer, repeated application of f(n) will always reach the number 1. It is believed that the conjecture is true, but there is currently no proof. We are concerned with how long it takes to reach 1, that is, how many applications of f(n) are required for a given n. We would like to find the average for all n from 1 to 100.
The sequence of numbers generated for any n is called a hailstone sequence. For example, the hailstone sequence for n = 6 is 6, 3, 10, 5, 16, 8, 4, 2, 1. We are interested in the average length of hailstone sequences.
A regular (serial) Python program for computing the answer might look as follows:
def f(n):
    curr = n
    tmp = 1
    while curr != 1:
        tmp = tmp + 1
        if curr % 2 == 1:
            curr = 3 * curr + 1
        else:
            curr = curr/2
    return tmp

def main( ):
    sum = 0
    for i in range(1, 101):
        sum = sum + f(i)
    avg = sum / 100.0
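As a quick sanity check of f (a usage sketch, not part of the listing above), the hailstone sequence for 6 shown earlier has nine terms:

print(f(6))   # prints 9: the sequence 6, 3, 10, 5, 16, 8, 4, 2, 1 has nine terms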
Tip
Detailed steps to download the code bundle are mentioned in the Preface of this book. Please have a look.
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-IPython-4. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
A schematic of the processing would look like this:

Without going into too much detail, it is easy to see that the running time of the preceding program can be expressed as the sum of three parts:
Setup (definition of f, initialization of sum, and so on)
Loop body (the sum of the amount of time to compute 100 hailstone sequences, one at a time)
Teardown (calculate the average)
It is obvious that the running time of the program will be dominated by one of the loops. There is not much to be done about the while loop inside of f. Each iteration after the first depends on the result of a previous iteration. There is no way to, for example, do the tenth iteration without having already done the ninth, eighth, and so on.
The for loop inside of main has more potential for parallelization. In this case, every iteration is independent. That is:
Each iteration computes its own value, f(i)
The computation of each f(i) does not depend on any other iteration
The values can easily be combined (via summation)
This algorithm can be converted to a parallel equivalent with a few extra commands. As they stand, these functions are pseudo-code—equivalent IPython functions will be described in later chapters:
getProcs(num): Returns a list of num processors
proc.setFun(fun, arg): Assigns a function fun with an argument arg to the proc processor
procs.executeAll( ): Executes fun on all processors in procs in parallel
proc.fetchValue( ): Returns the value computed on the proc processor when the calculation is complete
With these additions, a parallel equivalent might look as follows:
def f(n):
    curr = n
    tmp = 1
    while curr != 1:
        tmp = tmp + 1
        if curr % 2 == 1:
            curr = 3 * curr + 1
        else:
            curr = curr/2
    return tmp

def main( ):
    sum = 0
    procs = getProcs(100)
    i = 1
    for proc in procs:
        proc.setFun(f, i)
        i = i + 1
    procs.executeAll( )
    for proc in procs:
        sum = sum + proc.fetchValue( )
    avg = sum / 100.0
A schematic of the processing would look as follows:

While the parallel version is slightly longer (20 lines of code compared to 15), it is also faster, given enough processors. The intuitive reason is that the invocations of f are not queued up waiting for a single processor. With a single processor, the invocation of f(i) has to wait in line behind all the previous invocations of f(a), where 1 ≤ a < i, even though there is no dependency between them. The single processor is an unnecessary bottleneck. In this case, as no call to f depends on any other call, this algorithm is embarrassingly parallel.
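As a point of reference for readers who want to run something similar today, the same idea can be expressed with Python's standard multiprocessing module; this is a sketch using multiprocessing.Pool rather than the IPython machinery described in later chapters:

from multiprocessing import Pool

def f(n):
    # Length of the hailstone sequence starting at n.
    curr, tmp = n, 1
    while curr != 1:
        tmp += 1
        curr = 3 * curr + 1 if curr % 2 == 1 else curr // 2
    return tmp

if __name__ == '__main__':
    with Pool() as pool:                       # one worker per CPU core by default
        lengths = pool.map(f, range(1, 101))   # each f(i) runs independently
    print(sum(lengths) / 100.0)

The pool distributes the independent calls to f across the available cores, just as the pseudo-code above distributes them across processors.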
When a series of function calls, (f1, f2, …, fn), is queued up for an algorithm A, it is easy to see that the cycle time required to complete all n function calls is:
C(A, 1) = C(f1, 1) + C(f2, 1) + … + C(fn, 1)
In the embarrassingly parallel case, the cycle time becomes (potentially) much smaller:
C(A, n) = max(C(f1, 1), C(f2, 1), …, C(fn, 1))
This results in a speedup (ignoring setup and teardown) of:
(C(f1, 1) + C(f2, 1) + … + C(fn, 1)) / max(C(f1, 1), C(f2, 1), …, C(fn, 1))
In the case where all fi use the same number of cycles, this simplifies to the following:
speedup = n
Several issues important to parallel programming have been glossed over in the preceding discussion. These issues are important enough to have all of Chapter 3, Stepping Up to IPython for Parallel Computing, devoted to them.
In this chapter, we looked at the basics of parallel computing and situated IPython in relation to its primary competitor, Fortran.
We started with the history of computing and saw how each advancement was driven by the need to solve more difficult problems and simulate more complex phenomena. Computers are simply the latest in the line of computational tools and have brought with them their own difficulties.
Fortran provided answers to problems of readability, portability, and efficiency within the computing environments that existed in early machines. These early machines prized runtime efficiency above everything else, and Fortran was geared toward this end.
Decades have passed since the earliest machines, and cycles have become cheaper. This has meant that other criteria have become important in mainstream commercial computing. In particular, the cost of creating and maintaining software has become an increasingly important consideration. This has led to increased emphasis on programmer productivity, testability, and maintainability. This chapter presented examples of how Python/IPython, while not originally designed for runtime efficiency, takes these new considerations into account.
The final step in the quest for efficiency—parallel programming—was introduced. Some of the terminology used in the field was presented, and some examples illustrated basic parallel concepts.
The following chapters will attempt to expand on the case for using IPython for projects in general, and for parallel projects in particular. While choosing a tool is often a personal (and not always rational) process, the author hopes that a fair presentation of the capabilities of IPython, in particular its strengths in parallel and scientific computing, will persuade developers and managers to adopt it for their next project.