There is no doubt that the work of scientists in the twenty-first century is more comprehensive and interdisciplinary than in previous generations. Members of scientific communities connect in larger teams and work together on mission-oriented goals across their fields. This research paradigm is also reflected in the computational resources that researchers employ. Researchers are no longer restricted to one type of commercial software, operating system, or vendor. Inspired by open source contributions made available and tested by research institutions and open source communities, research work often spans various platforms and technologies.

This book presents one of the most widely recognized open source programming environments to date: a system based on two libraries of the computer language Python, **NumPy** and **SciPy**. In the following sections, we will guide you through examples from science and engineering on the usage of this system.

The ideal programming environment for computational mathematics enjoys the following characteristics:

It must be based on a computer language that allows the user to work quickly and integrate systems effectively. Ideally, the computer language should be portable to all platforms: Windows, Mac OS X, Linux, Unix, Android, and so on. This is key to fostering cooperation among scientists with different resources and levels of access.

It must contain a powerful set of libraries that allow the acquisition, storing, and handling of large datasets in a simple and effective manner. This is central to simulation and to numerical computation at a large scale.

Smooth integration with other computer languages, as well as third-party software.

Besides running the compiled code, the programming environment should allow the possibility of interactive sessions as well as scripting capabilities for quick experimentation.

Different coding paradigms should be supported—imperative, object-oriented, and/or functional coding styles.

It should be open source software that gives users access to the source code and allows them to modify basic algorithms if so desired. With commercial software, improved algorithms are included at the discretion of the seller, and usually at a cost to the end user. In the open source universe, the community usually makes these improvements and releases new versions as they are published, at no cost.

The set of applications should not be restricted to mere numerical computations; it should be powerful enough to allow symbolic computations as well.

Among the best-known environments for numerical computations used by the scientific community is **MATLAB**, which is commercial, expensive, and does not allow any tampering with the code. **Maple** and **Mathematica** are more geared towards symbolic computation, although they can match many of the numerical computations from MATLAB. These are, however, also commercial, expensive, and closed to modifications. A decent alternative to MATLAB, based on a similar mathematical engine, is the **GNU Octave system**. Most MATLAB code is easily portable to Octave, which is open source. Unfortunately, the accompanying programming environment is not very user friendly, and it is also very much restricted to numerical computations.

One environment that combines the best of all worlds is Python with the open source libraries NumPy and SciPy for numerical operations. The first property that attracts users to Python is, without a doubt, its code readability. The syntax is extremely clear and expressive. It has the advantage of supporting code written in different paradigms: object-oriented, functional, or old-school imperative. Python code can be packaged and run as standalone executable programs through the `py2exe`, `pyinstaller`, and `cx_Freeze` libraries, but it can also be used interactively or as a scripting language. This is a great advantage when developing tools for symbolic computation. Python is therefore the basis of a firm competitor to Maple and Mathematica: the open source mathematics software **Sage** (**System for Algebra and Geometry Experimentation**).

NumPy is an open source extension to Python that adds support for multidimensional arrays of large sizes. This support allows the desired acquisition, storage, and complex manipulation of data mentioned previously. NumPy alone is a great tool to solve many numerical computations.
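As a minimal sketch of what this array support looks like in practice, the following builds a small two-dimensional array and applies a few whole-array operations to it. The variable name `data` and the sample values are made up for illustration.

```python
import numpy

# A hypothetical 3x4 table of measurements (values are illustrative only).
data = numpy.arange(12, dtype=float).reshape(3, 4)

column_means = data.mean(axis=0)   # one mean per column, no explicit loop
large_entries = data[data > 6.0]   # boolean masking selects entries above 6

print(data.shape)
print(column_means)
print(large_entries)
```

The point of the sketch is that slicing, masking, and reductions operate on the whole array at once, which is what makes NumPy practical for large datasets.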

On top of NumPy, we have yet another open source library, SciPy. This library contains algorithms and mathematical tools to manipulate NumPy objects with very definite scientific and engineering objectives.
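To give a flavor of how SciPy routines operate on NumPy objects, here is a short sketch that uses two of its modules, `scipy.integrate` and `scipy.linalg`. The integrand, the matrix `A`, and the vector `b` are arbitrary choices for illustration.

```python
import numpy
from scipy import integrate, linalg

# Numerical integration: the integral of sin(x) over [0, pi] equals 2.
value, abs_error = integrate.quad(numpy.sin, 0.0, numpy.pi)

# Linear algebra on NumPy arrays: solve the 2x2 system A x = b.
A = numpy.array([[3.0, 1.0], [1.0, 2.0]])
b = numpy.array([9.0, 8.0])
x = linalg.solve(A, b)

print(value)
print(x)
```

Both routines accept and return plain NumPy arrays or Python floats, so results from one module can be passed directly to another.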

The combination of Python, NumPy, and SciPy (henceforth referred to as "SciPy" for brevity) has been the environment of choice of many applied mathematicians for years; we work on a daily basis with both pure mathematicians and hardcore engineers. One of the challenges of this trade is to bring together, on a single workstation, the scientific output of professionals with different visions, techniques, tools, and software. SciPy is an excellent solution for coordinating computations in a smooth, reliable, and coherent manner.

Constantly, we are required to produce scripts with, for example, combinations of experiments written and performed in SciPy itself, C/C++, Fortran, and/or MATLAB. Often, we receive large amounts of data from some signal acquisition devices. From all this heterogeneous material, we employ Python to retrieve and manipulate the data, and once finished with the analysis, to produce high-quality documentation with professional-looking diagrams and visualization aids. SciPy allows performing all these tasks with ease.
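As one small illustration of this kind of data exchange, the sketch below writes an array in MATLAB's `.mat` format and reads it back with `scipy.io`. The array `signal` is an invented stand-in for data that would normally come from a MATLAB session or an acquisition device, and an in-memory buffer is used instead of a real file so that the example is self-contained.

```python
import io

import numpy
from scipy import io as sio

# Hypothetical signal samples standing in for data produced elsewhere.
signal = numpy.array([0.0, 0.5, 1.0, 0.5, 0.0])

# Write a MATLAB-style .mat file into an in-memory buffer, then read it back.
buffer = io.BytesIO()
sio.savemat(buffer, {"signal": signal})
buffer.seek(0)
loaded = sio.loadmat(buffer)

# loadmat returns two-dimensional arrays, so flatten before comparing.
recovered = loaded["signal"].ravel()
print(recovered)
```

With a real file, the buffer would simply be replaced by a file name.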

This is partly because many dedicated software tools easily extend the core features of SciPy. For example, although graphing and plotting are usually taken care of with the Python library **matplotlib**, there are also other packages available, such as **Biggles** (http://biggles.sourceforge.net/), **Chaco** (https://pypi.python.org/pypi/chaco), **HippoDraw** (https://github.com/plasmodic/hippodraw), **MayaVi** for **3D** rendering (http://mayavi.sourceforge.net/), the **Python Imaging Library** or **PIL** (http://pythonware.com/products/pil/), and the online analytics and data visualization tool **Plotly** (https://plot.ly/).

Interfacing with non-Python packages is also possible. For example, the interaction of SciPy with the R statistical package can be done with **RPy** (http://rpy.sourceforge.net/rpy2.html). This allows for much more robust data analysis.

At the time of this book, the stable production releases of Python were 2.7.9 and 3.4.2. Still, Python 2.7 is more convenient if the user needs to communicate with third-party applications. No new releases are planned for Python 2; Python 3 is considered the present and the future of Python. For the purposes of SciPy applications, we do recommend you hold on to the 2.7 version, as there are still some packages using SciPy that have not been ported to Python 3 yet. Nevertheless, the companion software of this book was tested to work on both Python 2.7 and Python 3.4.

The Python software package can be downloaded from the official site (https://www.python.org/downloads/) and can be installed on all major systems such as Windows, Mac OS X, Linux, and Unix. It has also been ported to other platforms, including Palm OS, iOS, PlayStation, PSP, Psion, and so on.

The following screenshot shows two popular options for coding in Python on an iPad—**PythonMath** and **Sage Math**. While the first application allows only the use of simple math libraries, the second permits the user to load and use both NumPy and SciPy remotely.

**PythonMath** and **Sage** **Math** bring Python coding to iOS devices. **Sage Math** allows importing NumPy and SciPy.

We shall not go into detail about the installation of Python on your system, since we already assume familiarity with this language. In case of doubt, we advise browsing the excellent book *Expert Python Programming*, *Tarek Ziadé*, *Packt Publishing*, where detailed explanations are given for installing many of the different implementations on different systems. It is usually a good idea to follow the directions given on the official Python website. We will also assume familiarity with carrying out interactive sessions in Python, as well as writing standalone scripts.

The latest libraries for both NumPy and SciPy can be downloaded from the official SciPy site (http://scipy.org/). They both require a Python Version 2.4 or newer, so we should be in good shape at this point. We may choose to download the package from SourceForge (http://sourceforge.net/projects/scipy/), **Gohlke** (http://www.lfd.uci.edu/~gohlke/pythonlibs/) or **Git** repositories (for instance, the **superpack** from http://stronginference.com/ScipySuperpack/).

It is also possible in some systems to use prepackaged executable bundles that simplify the process, such as the **Anaconda** (https://store.continuum.io/cshop/anaconda/) or the **Enthought** (https://www.enthought.com/products/epd/) Python distributions. Here, we will show you how to download and install SciPy on various platforms in the most common cases.

Before installing SciPy on Mac OS X, take the following points into account; they help the installation go smoothly.

In Mac OS X, if `MacPorts` is installed, the process could not be easier. Open a terminal as superuser, and at the prompt (`%`), issue the following command:

**% port search scipy**

This presents a list of all ports that either install SciPy or use SciPy as a requirement. For Python 2.7, we need to install `py27-scipy` by issuing the following command:

**% port install py27-scipy**

A few minutes later, the libraries are properly installed and ready to use. Note how `MacPorts` also installs all needed requirements for us (including the NumPy libraries) without any extra effort on our part.

Under any other Unix/Linux system, if either no ports are available or if the user prefers to install from the packages downloaded from either SourceForge or Git, it is enough to perform the following steps:

Unzip the NumPy and SciPy packages following the recommendation of the official pages. This creates two folders, one for each library.

Within a terminal session, change directories to the folder where the NumPy libraries are stored, which contains the `setup.py` file. Find out which Fortran compiler you are using (one of `gnu`, `gnu95`, or `fcompiler`), and at the prompt, issue the following command:

**% python setup.py build --fcompiler=<compiler>**

Once built, and in the same folder, issue the installation command. This should be all:

**% python setup.py install**

You can install SciPy on Windows in several ways. The following are some recommended options:

Under Microsoft Windows, we recommend you install from the binary installers provided by the Anaconda or Enthought Python Distributions. Please, however, be aware of the memory requirements. Alternatively, you can download and install the SciPy stack, or the individual libraries.

The procedure for the installation of the SciPy libraries is exactly the same, that is, downloading and building before installing under Unix/Linux or downloading and running under Microsoft Windows. Note that different implementations of Python might have different requirements before installing NumPy and SciPy.
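Whichever route is taken, a quick way to confirm which interpreter and library versions are actually being picked up is to print them from Python. The following is only a sketch; the dictionary name `versions` is our own.

```python
import platform

import numpy
import scipy

# Collect the interpreter and library versions that this session imports.
versions = {
    "Python": "%s %s" % (platform.python_implementation(),
                         platform.python_version()),
    "NumPy": numpy.__version__,
    "SciPy": scipy.__version__,
}

for name in sorted(versions):
    print("%-6s %s" % (name, versions[name]))
```

If either import fails, the installation of that library did not succeed for the interpreter in use.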

As you might know, computer systems are not infallible. Accordingly, before starting to compute with SciPy, one needs to be sure it is working correctly. To that end, SciPy developers have included a test suite that any user can execute to check that the installed SciPy is working properly. That way, much debugging time can be saved whenever an error occurs while using any function provided by SciPy.

To run the test suite, at the Python prompt, one can run the following commands:

    >>> import scipy
    >>> scipy.test()


The reader should be aware that the execution of this test will take some time to finish. If the final summary reports no errors or failures, then at the basic level your SciPy installation is fine. If, instead, the summary reports errors or failed tests, review them carefully. A place to get help is the SciPy mailing list (http://mail.scipy.org/pipermail/scipy-user/), to which one can subscribe. The companion software for this chapter includes a Python script that the reader can use to run these tests.

SciPy is organized as a family of modules. We like to think of each module as a different field of mathematics. And as such, each has its own particular techniques and tools. You can find a list of some of the different modules included in SciPy at http://docs.scipy.org/doc/scipy-0.14.0/reference/py-modindex.html.

Let's use some of its functions to solve a simple problem.

The following table shows the IQ test scores of 31 individuals:

| 114 | 100 | 104 | 89 | 102 | 91 | 114 | 114 |
| 103 | 105 | 108 | 130 | 120 | 132 | 111 | 128 |
| 118 | 119 | 86 | 72 | 111 | 103 | 74 | 112 |
| 107 | 103 | 98 | 96 | 112 | 112 | 93 | |

A stem plot of the distribution of these 31 scores (refer to the IPython Notebook for this chapter) shows that there are no major departures from normality, and thus we assume the distribution of the scores to be close to normal. Now, estimate the mean IQ score for this population, using a 99 percent confidence interval.

We start by loading the data into memory, as follows:

    >>> import numpy
    >>> scores = numpy.array([114, 100, 104, 89, 102, 91, 114, 114, 103,
    ...                       105, 108, 130, 120, 132, 111, 128, 118, 119,
    ...                       86, 72, 111, 103, 74, 112, 107, 103, 98, 96,
    ...                       112, 112, 93])
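The stem plot mentioned earlier lives in the companion notebook. As a rough text-only stand-in, and assuming that "stem plot" means a stem-and-leaf display, the following sketch groups the scores by tens. The helper `stem_and_leaf` is a hypothetical function written for this illustration, not part of NumPy or SciPy.

```python
from collections import defaultdict

import numpy

scores = numpy.array([114, 100, 104, 89, 102, 91, 114, 114, 103,
                      105, 108, 130, 120, 132, 111, 128, 118, 119,
                      86, 72, 111, 103, 74, 112, 107, 103, 98, 96,
                      112, 112, 93])


def stem_and_leaf(values):
    """Map each stem (value // 10) to the sorted list of its leaves (value % 10)."""
    stems = defaultdict(list)
    for value in sorted(int(v) for v in values):
        stems[value // 10].append(value % 10)
    return dict(stems)


plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print("%3d | %s" % (stem, "".join(str(leaf) for leaf in plot[stem])))
```

Each printed row shows a stem followed by its leaves, so the longest rows mark where the scores concentrate.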

At this point, if we type `dir(scores)` and hit the *return* key, or type `scores` followed by a dot (`.`) and press the *tab* key in a shell that supports tab completion, the system lists all the methods that the data inherits from the NumPy library, as is customary in Python. Technically, we could go ahead and compute the required mean, `xmean`, and the corresponding confidence interval according to the formula `xmean ± zcrit * sigma / sqrt(n)`, where `sigma` and `n` are respectively the standard deviation and size of the data, and `zcrit` is the critical value corresponding to the confidence level (http://en.wikipedia.org/wiki/Confidence_interval). In this case, we could look up a table in any statistics book to obtain an approximation to its value, *zcrit = 2.576*. The remaining values may be computed in our session and properly combined, as follows:

    >>> import scipy
    >>> xmean = scipy.mean(scores)
    >>> sigma = scipy.std(scores)
    >>> n = scipy.size(scores)
    >>> xmean, xmean - 2.576*sigma / scipy.sqrt(n), \
    ...     xmean + 2.576*sigma / scipy.sqrt(n)

The output is shown as follows:

**(105.83870967741936, 99.343223715529746, 112.33419563930897)**

We have thus computed the estimated mean IQ score (with value `105.83870967741936`) and the interval of confidence (from about `99.34` to approximately `112.33`). We have done so using purely SciPy-based operations while following a known formula. But instead of making all these computations by hand and looking for critical values on tables, we could just ask SciPy.
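As a first step in that direction, the critical value itself need not come from a printed table: it can be computed with `scipy.stats.norm.ppf`, the inverse cumulative distribution function of the standard normal distribution. The sketch below repeats the computation that way. It uses the NumPy versions of `mean`, `std`, and `sqrt`, since recent SciPy releases deprecate the copies of these functions in the top-level `scipy` namespace; the names `confidence`, `half_width`, `lower`, and `upper` are our own.

```python
import numpy
from scipy import stats

scores = numpy.array([114, 100, 104, 89, 102, 91, 114, 114, 103,
                      105, 108, 130, 120, 132, 111, 128, 118, 119,
                      86, 72, 111, 103, 74, 112, 107, 103, 98, 96,
                      112, 112, 93])

confidence = 0.99
# Two-sided critical value of the standard normal distribution (about 2.576).
zcrit = stats.norm.ppf(1.0 - (1.0 - confidence) / 2.0)

xmean = numpy.mean(scores)
sigma = numpy.std(scores)   # population standard deviation, as in the session above
n = scores.size

half_width = zcrit * sigma / numpy.sqrt(n)
lower, upper = xmean - half_width, xmean + half_width
print(xmean, lower, upper)
```

Changing `confidence` to another level, such as 0.95, recomputes the critical value without any further lookup.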

Note how the `scipy.stats` module needs to be loaded before we use any of its functions:

    >>> from scipy import stats
    >>> result = scipy.stats.bayes_mvs(scores)

The variable `result` contains the solution to our problem with some additional information. Note that `result` is a tuple with three elements, as the `help` documentation suggests:

**>>> help(scipy.stats.bayes_mvs)**

The output of this command depends on the installed version of SciPy; run the companion IPython Notebook for this chapter, or the command in a Python console, to see the actual output on your system.

Our solution is the first element of the tuple `result`; to see its contents, type:

**>>> result[0]**

The output is shown as follows:

**(105.83870967741936, (101.48825534263035, 110.18916401220837))**

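Note how this output gives us the same average as before but a narrower interval. The difference is not a matter of accuracy: by default, `bayes_mvs` returns a 90 percent credible interval (its `alpha` parameter defaults to `0.9`), computed from Student's t distribution rather than from the normal critical value. Passing `alpha=0.99` requests the 99 percent interval instead. The output might also be formatted differently depending on the SciPy version available on your computer.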
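To obtain the 99 percent interval from `bayes_mvs`, its `alpha` parameter (default `0.9`) can be set explicitly. The following sketch does so and cross-checks the result against a Student's t interval built with `scipy.stats.t.interval` and `scipy.stats.sem`. The names `mean90`, `mean99`, `t_low`, and `t_high` are our own, and the cross-check reflects our reading of how `bayes_mvs` behaves for a sample of this size rather than a statement from its documentation.

```python
import numpy
from scipy import stats

scores = numpy.array([114, 100, 104, 89, 102, 91, 114, 114, 103,
                      105, 108, 130, 120, 132, 111, 128, 118, 119,
                      86, 72, 111, 103, 74, 112, 107, 103, 98, 96,
                      112, 112, 93])

# Default call: alpha=0.9, that is, a 90 percent interval for the mean.
mean90 = stats.bayes_mvs(scores)[0]
# Explicitly request the 99 percent interval asked for in the problem.
mean99 = stats.bayes_mvs(scores, alpha=0.99)[0]

# Cross-check: Student's t interval with n - 1 degrees of freedom,
# centered at the sample mean and scaled by the standard error of the mean.
n = scores.size
t_low, t_high = stats.t.interval(0.99, n - 1,
                                 loc=numpy.mean(scores),
                                 scale=stats.sem(scores))

print(mean90[1])
print(mean99[1])
print((t_low, t_high))
```

The 99 percent interval is wider than the 90 percent one, as expected, and slightly wider than the normal-based interval computed by hand, because the t distribution has heavier tails.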

There is a wealth of information online, either from the official pages of SciPy (although its reference guides are somewhat incomplete, as a work in progress), or from many other contributors who present tutorials on forums, YouTube, or personal sites. Several developers also publish examples of their work in great detail online.

As we previously saw, it is also possible to obtain help from our interactive Python sessions. The NumPy and SciPy libraries make heavy use of **docstrings**, which makes it simple to request help on usage and recommendations with the usual Python help system. For example, if in doubt about the usage of the `bayes_mvs` routine, the user can issue the following command:

    >>> import scipy.stats
    >>> help(scipy.stats.bayes_mvs)

After executing this command, the system provides the necessary information. Equivalently, both NumPy and SciPy come bundled with their own help system, `info`. For instance, look at the following command:

    >>> import numpy
    >>> numpy.info('random')

This will offer a summary of all information parsed from the contents of all docstrings from the NumPy library associated with the given keyword (note it must be quoted). The user may navigate the output scrolling up and down, without the possibility of further interaction.

This is convenient when we already know which function we want to use but are unsure of its usage. But what should we do if we do not know whether such a procedure exists and only suspect that it may? The usual Python way is to invoke the `dir()` command on a module, which lists all possible attributes.
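Since `dir()` returns an ordinary list of strings, it can be filtered for a keyword. The sketch below does this for `scipy.stats`; the helper `find_names` is a hypothetical convenience function written for this example, not part of SciPy.

```python
import scipy.stats


def find_names(module, keyword):
    """Return the public attribute names of `module` that contain `keyword`."""
    keyword = keyword.lower()
    return [name for name in dir(module)
            if not name.startswith("_") and keyword in name.lower()]


matches = find_names(scipy.stats, "kurt")
print(matches)
```

The exact list depends on the installed SciPy version, but any name it returns can then be passed to `help()` for details.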

Interactive Python sessions make it easier to search for such information with the possibility of navigating and performing further searches inside the output of help sessions. For instance, type in the following command at prompt:

    >>> import scipy.stats
    >>> help(scipy.stats)

The output of this command depends on the installed version of SciPy; run the companion IPython Notebook for this chapter, or the command in a Python console, to see the actual output on your system.

Note the colon (**:**) at the end of the screen—this is an old-school prompt. The system is in stand-by mode, expecting the user to issue a command (in the form of a single key). This also indicates that there are a few more pages of help following the given text. If we intend to read the rest of the help file, we may press spacebar to scroll to the next page.

In this way, we can visit the following manual pages on this topic. It is also possible to navigate the manual pages scrolling one line of text at a time using the up and down arrow keys. When we are ready to quit the help session, we simply press (the keyboard letter) *Q*.

It is also possible to search the help contents for a given string. In that case, at the prompt, we press the (*/*) slash key. The prompt changes from a colon into a slash, and we proceed to input the keyword we would like to search for.

For example, is there a SciPy function that computes the **Pearson kurtosis** of a given dataset? At the slash prompt, we type in `kurtosis` and press *enter*. The help system takes us to the first occurrence of that string. To access successive occurrences of the string `kurtosis`, we press the *N* key (for next) until we find what we require. At that stage, we proceed to quit this help session (by pressing *Q*) and request more information on the function itself:

**>>> help(scipy.stats.kurtosis)**

The output of this command depends on the installed version of SciPy; run the companion IPython Notebook for this chapter, or the command in a Python console, to see the actual output on your system.
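Once the function has been located, using it is straightforward. As a brief sketch, applied here to the IQ scores from earlier in the chapter: `scipy.stats.kurtosis` returns Fisher's (excess) kurtosis by default, and Pearson's definition is selected with `fisher=False`. The variable names `excess` and `pearson` are our own.

```python
import numpy
from scipy import stats

scores = numpy.array([114, 100, 104, 89, 102, 91, 114, 114, 103,
                      105, 108, 130, 120, 132, 111, 128, 118, 119,
                      86, 72, 111, 103, 74, 112, 107, 103, 98, 96,
                      112, 112, 93])

# fisher=True (the default) gives excess kurtosis: 0 for a normal distribution.
excess = stats.kurtosis(scores)
# fisher=False gives Pearson's kurtosis: 3 for a normal distribution.
pearson = stats.kurtosis(scores, fisher=False)

print(excess, pearson)
```

The two values always differ by exactly 3, so either convention can be recovered from the other.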

At this point, we would like to introduce you to another resource that we will be using to generate graphs, namely the matplotlib libraries. It may be downloaded from its official web page, http://matplotlib.org/, and installed following the standard Python commands. There is good online documentation on the official web page, and we encourage the reader to dig deeper than the few commands that we will use in this book. For instance, the excellent monograph *Matplotlib for Python Developers*, *Sandro Tosi*, *Packt Publishing*, provides all that we would need and more. Other plotting libraries are available (commercial or otherwise) that aim at very different and specific applications. The degree of sophistication and ease of use of matplotlib makes it one of the best options to generate graphics in scientific computing.

Once installed, it may be imported using `import matplotlib`. Among all its modules, we will focus on `pyplot`, which provides a comfortable interface with the plotting libraries. For example, if we desire to plot a cycle of the sine function, we could execute the following code snippet:

    >>> import numpy
    >>> import matplotlib.pyplot as plt
    >>> x = numpy.linspace(0, 2*numpy.pi, 32)
    >>> fig = plt.figure()
    >>> plt.plot(x, numpy.sin(x))
    >>> plt.show()
    >>> fig.savefig('sine.png')

We obtain the following plot:

Let us explain each command from the previous session. The first two commands import `numpy` and `matplotlib.pyplot` as usual. We define an array `x` of 32 uniformly spaced floating point values from 0 to 2π. The `plt.figure()` command creates space in memory to store the subsequent plots and puts in place an object of the `matplotlib.figure.Figure` form. The `plt.plot(x, numpy.sin(x))` command computes the sine of the values in `x` and creates an object of the `matplotlib.lines.Line2D` form containing the plot of `x` against `numpy.sin(x)`, together with a set of axes attached to it and labeled according to the ranges of the variables. This object is stored in the previous `Figure` object and is displayed on the screen via the `plt.show()` command. The last command in the session, `fig.savefig()`, saves the `Figure` object to whatever valid image format we desire (in this case, a **Portable Network Graphics** (**PNG**) image). From now on, in any code that deals with matplotlib commands, we will leave the option of showing/saving open.

There are, of course, commands that control the style of axes, aspect ratio between axes, labeling, colors, legends, the possibility of managing several figures at the same time (subplots), and many more features to display all sorts of data. We will be discovering these as we progress with examples throughout the book.
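As a small preview of such features, the following sketch draws two curves side by side with axis labels, colors, a line style, legends, and a fixed aspect ratio. It selects the non-interactive `Agg` backend so that it runs without a display, and it saves to a temporary directory; the file name and the styling choices are arbitrary and only for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy

x = numpy.linspace(0, 2 * numpy.pi, 64)

# Two side-by-side sets of axes in a single figure.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

ax_left.plot(x, numpy.sin(x), color="blue", label="sin(x)")
ax_left.set_xlabel("x")
ax_left.set_ylabel("value")
ax_left.legend()

ax_right.plot(x, numpy.cos(x), color="red", linestyle="--", label="cos(x)")
ax_right.set_xlabel("x")
ax_right.set_aspect("equal")  # same scale on both axes
ax_right.legend()

# The output location is an arbitrary choice for this sketch.
output_path = os.path.join(tempfile.gettempdir(), "sine_cosine.png")
fig.savefig(output_path)
plt.close(fig)
print(output_path)
```

Working through the `Axes` objects returned by `plt.subplots`, rather than through module-level `plt` calls, makes it explicit which panel each command affects.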

This book comes with a set of IPython Notebooks that will help you interactively test the code snippets shown in each chapter, and modify or adapt them to your needs. We should warn, however, that these IPython Notebooks will make sense only if read alongside the book.

In this regard, this book assumes familiarity with Python and with some of its development environments, such as the IPython Notebook. Consequently, we will only refer to the documentation on the official website for IPython Notebook (http://ipython.org/notebook.html). You can find additional help at http://ipython.org/ipython-doc/stable/notebook/index.html. Note that IPython Notebook is also available through **Wakari** (https://wakari.io/), as a standalone or as part of the Anaconda package, or through Enthought. If you're new to IPython Notebook, get started by looking at the example collection and reading the documentation.

To use the files for this book, open a terminal and go to the directory where the file you want to open is stored (it should have the form `filename.ipynb`). At the command line, in that terminal, type:

**ipython notebook filename.ipynb**

After hitting the *enter* key, the file should be displayed in the default web browser. In case that does not happen, please note that the IPython Notebook is officially supported on the browsers Chrome, Safari, and Firefox. For additional details, refer to the *Browser Compatibility* section of the documentation, currently at http://ipython.org/ipython-doc/stable/install/install.html.

Once the `.ipynb` file has been opened, press and hold the *shift* key and hit *enter* to start executing the notebook cell by cell. Another way to execute the notebook cell by cell is via the player icon on the menu near the left of the cell labeled as **markdown**. Alternatively, from the **Cell** menu (on the top of the browser) you could choose among several options to execute the contents of the notebook.

To leave the notebook, you could choose **Close and halt** from the **File** menu on top of the browser, below the label **Notebook**. Options to save the notebook can also be found under the **File** menu. To completely close the notebook browser, you need to hit the keys *ctrl* and *C* simultaneously on the terminal where the notebook was started and follow the instructions after that.

In this chapter, you have learned the benefits of using the combination of Python, NumPy, SciPy, and matplotlib as a programming environment for any scientific endeavor that requires mathematics; in particular, anything related to numerical computations. You have explored the environment, learned how to download, install, and test the required libraries, used them for some quick computations, and figured out a few good ways to search for help.

In Chapter 2, *Working with the NumPy Array As a First Step to SciPy*, we will guide you through basic object creation in SciPy, including the best methods to manipulate data, or obtain information from it.