Chapter 4. Connecting NumPy with the Rest of the World
In this chapter, we will cover:
Using the buffer protocol
Using the array interface
Exchanging data with MATLAB and Octave
Installing RPy2
Interfacing with R
Installing JPype
Sending a NumPy array to JPype
Installing Google App Engine
Deploying NumPy code in the Google cloud
Running NumPy code in a Python Anywhere web console
Setting up PiCloud
This chapter is about interoperability. We have to keep reminding ourselves that NumPy is not alone in the scientific (Python) software ecosystem. Working together with SciPy and Matplotlib is pretty easy. Protocols exist for interoperability with other Python packages. Outside of the Python ecosystem, languages such as Java, R, C, and FORTRAN are pretty popular. We will go into the details of exchanging data with these environments.
Also, we will discuss how to get our NumPy code in the cloud. This is a continuously evolving technology, in a fast-moving space. Many options are available to you, of which Google App Engine, PiCloud, and Python Anywhere will be covered.
Note
The danger here is to seem subjective, so please be assured that the author is in no way affiliated with any of these companies.
Using the buffer protocol
C-based Python objects have a so-called "buffer interface". Through it, Python objects can expose their data for direct access without the need to copy it. The buffer protocol enables us to communicate with other Python software such as the Python Imaging Library (PIL). We will see an example of saving a PIL image from a NumPy array.
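Before turning to PIL, the "no copy" property can be seen with nothing but the standard library and NumPy. The following is a minimal sketch, not the recipe's own listing; the variable names are chosen for illustration only.

```python
import numpy as np

# A bytearray is a C-based object that exposes its memory via the buffer protocol.
raw = bytearray(8)

# np.frombuffer wraps the existing memory instead of copying it.
view = np.frombuffer(raw, dtype=np.uint8)
view[0] = 255  # write through the NumPy view; raw[0] changes as well

# The protocol works in the other direction too: memoryview reads a
# NumPy array's buffer without copying it.
array = np.arange(4, dtype=np.uint8)
mv = memoryview(array)
array[1] = 42  # the change is visible through mv
```

Because both objects look at the same memory, a change made through one is immediately visible through the other.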
Install PIL and SciPy, if necessary. Check the See Also section of this recipe for instructions.
First, we need a NumPy array with which to play.
Create an array from image data.
In previous chapters, we saw how to load the "Lena" sample image of Lena Soderberg. We will create an array filled with zeroes, and populate the alpha channel with the image data:
Save the data as a PIL image.
Now, we will use the PIL API to save the data as an RGBA image:
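The two steps above could be sketched as follows. This is not the original listing: it assumes the Pillow fork of PIL is installed, and it substitutes a synthetic gradient for the sample image. The helper names make_rgba and save_rgba are hypothetical, as is the choice of an unsigned 8-bit dtype.

```python
import numpy as np


def make_rgba(gray):
    """Return an (H, W, 4) uint8 array of zeros whose alpha channel holds `gray`."""
    data = np.zeros((gray.shape[0], gray.shape[1], 4), dtype=np.uint8)
    data[:, :, 3] = gray
    return data


def save_rgba(data, path):
    """Save `data` as an image; PIL reads the array's memory via the buffer protocol."""
    from PIL import Image  # imported lazily; assumes Pillow is installed

    height, width = data.shape[:2]
    # PIL expects the size as (width, height); the trailing arguments
    # describe the raw memory layout.
    img = Image.frombuffer(
        "RGBA", (width, height), np.ascontiguousarray(data), "raw", "RGBA", 0, 1
    )
    img.save(path)


# Stand-in for the sample image: a 64x64 horizontal gradient.
gray = np.tile(np.linspace(0, 255, 64).astype(np.uint8), (64, 1))
rgba = make_rgba(gray)
```

Calling save_rgba(rgba, "gradient.png") would then write the file; the file name is only an example.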
Using the array interface
The array interface is yet another mechanism to communicate with other Python applications. This protocol, as its name suggests, is only applicable to array-like objects. A demonstration is in order. Let's use PIL again, but without saving files.
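To see what the protocol asks of an object, here is a minimal sketch that needs only NumPy. The Pixels class is hypothetical and stands in for any array-like object, such as a PIL image; it describes its memory in a dictionary that NumPy knows how to read.

```python
import numpy as np


class Pixels:
    """Hypothetical container that shares its bytes through __array_interface__."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self._buffer = bytearray(width * height)

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": (self.height, self.width),
            "typestr": "|u1",      # unsigned 1-byte integers
            "data": self._buffer,  # an object exposing the buffer protocol
        }


pixels = Pixels(width=4, height=2)
array = np.asarray(pixels)  # NumPy consults __array_interface__ here

# The array shares memory with the container, so a change to the
# underlying buffer shows up in the array.
pixels._buffer[5] = 9
```

After the last line, array[1, 1] reads 9, because offset 5 in a buffer that is 4 bytes wide falls in row 1, column 1.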
We will be reusing part of the code from the previous recipe, so the prerequisites are similar. We will skip the first step of the previous recipe here, and assume it is already known.
The following steps will let us explore the array interface:
The PIL image array interface attribute.
The PIL image object has an __array_interface__ attribute. Let's inspect its contents. The value of this attribute is a dictionary:
This code prints the following information:
The NumPy array...
Exchanging data with MATLAB and Octave
MATLAB and its open source alternative Octave are popular mathematical applications. The scipy.io package has the savemat function, which allows you to store NumPy arrays in a .mat file as a value of a dictionary.
Installing MATLAB or Octave is outside the scope of this book. The Octave website has some pointers for installing: http://www.gnu.org/software/octave/download.html. Check the See Also section of this recipe for instructions on installing SciPy, if necessary.
Once you have installed MATLAB or Octave, you need to follow the subsequent steps to store NumPy arrays:
Call savemat.
Create a NumPy array, and call savemat to store the array in a .mat file. This function has two parameters: a file name and a dictionary containing variable names and values.
Load the .mat file.
Navigate to the directory where you created the file. Load the file, and check the array...
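The Python side of these steps might look like the sketch below. The file name a.mat and the variable name array are assumptions made for illustration, and the file is written to a temporary directory so that the example cleans up easily. Since MATLAB or Octave may not be at hand, the sketch reads the file back with scipy.io.loadmat instead.

```python
import os
import tempfile

import numpy as np
from scipy import io

array = np.arange(7)

# savemat takes a file name and a dictionary mapping variable names to values.
path = os.path.join(tempfile.mkdtemp(), "a.mat")
io.savemat(path, {"array": array})

# Read the file back in Python to confirm the round trip.
loaded = io.loadmat(path)["array"]
restored = loaded.ravel()  # loadmat returns MATLAB-style arrays with at least two dimensions
```

In Octave or MATLAB, loading the same file from its directory should make a variable with the chosen name available in the workspace.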
Installing RPy2
R is a popular scripting language used for statistics and data analysis. RPy2 is an interface between R and Python. We will install RPy2 in this recipe.
If you want to install RPy2, choose one of the following options:
Installing with pip or easy_install.
RPy2 is available on PyPI, so we can install it with either of the following two commands:
or
Installing from source.
We can install RPy2 from the source tar.gz:
Interfacing with R
RPy2 can only be used to call R from Python, and not the other way around. We will import some sample R datasets, and plot the data of one of them.
Install RPy2 if necessary. See the previous recipe.
Let's start by loading a sample R dataset.
Load a data set into an array.
Load the datasets with the RPy2 importr function. This function can import R packages. In this example, we will import the datasets R package. Create a NumPy array from the mtcars dataset:
Plot the dataset.
Plot the dataset with Matplotlib:
The following image shows the data, which is a two-dimensional array:
The complete code for this recipe is as follows:
Installing JPype
Jython is the default interoperability solution for Python and Java. However, Jython runs on the Java Virtual Machine and therefore cannot access NumPy modules, because they are mostly written in C.
JPype is an open-source project that tries to solve this problem. The interfacing occurs on the native level between the Python and Java virtual machines. Let's install JPype.
Follow the ensuing steps for installing JPype:
Sending a NumPy array to JPype
In this recipe, we will start a JVM and send a NumPy array to it. We will print the received array using standard Java calls. Obviously, you will need to have Java installed.
First, we need to start the JVM from JPype.
Start the JVM.
JPype is conveniently able to find the default JVM path:
Print hello world.
For the sake of tradition, let's print hello world:
Send a NumPy array.
Create a NumPy array, convert it to a Python list, and pass it to JPype. Now, it's trivial to print the array elements:
Shut down the JVM.
After we are done, we will shut down the JVM:
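The four steps above could be sketched as follows. This is an illustration under assumptions rather than the original listing: it presumes that JPype and a Java runtime are installed, and the helper names to_java_ready and print_in_jvm are hypothetical. JPype is imported inside the function, so the list conversion can be tried even without a JVM.

```python
import numpy as np


def to_java_ready(values):
    """Convert a NumPy array to a list of plain Python floats."""
    return np.asarray(values, dtype=float).tolist()


def print_in_jvm(values):
    """Start a JVM, print a greeting and each value with Java calls, then shut down."""
    import jpype  # imported lazily; assumes JPype and Java are installed

    jpype.startJVM(jpype.getDefaultJVMPath())
    try:
        out = jpype.java.lang.System.out
        out.println("hello world")
        # Build a one-dimensional Java double[] from the Python list.
        java_array = jpype.JArray(jpype.JDouble, 1)(to_java_ready(values))
        for item in java_array:
            out.println(item)
    finally:
        # Always shut the JVM down, even if printing fails.
        jpype.shutdownJVM()


values = to_java_ready(np.array([1, 2, 3]))
```

The try/finally block is a design choice that guards against forgetting the shutdown step; calling print_in_jvm(np.array([1, 2, 3])) would run the whole sequence.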
Only one JVM can run at a time in JPype. If we forget to shut down the JVM, it could lead to unexpected errors. The program output is as...
Installing Google App Engine
Google App Engine (GAE) enables you to build web applications in the Google cloud. NumPy has been officially supported since 2012. You need to have a Google account to use GAE.
The first step is to download GAE.
Download GAE.
Download GAE for your operating system from https://developers.google.com/appengine/downloads.
From this page, you can download documentation and the GAE Eclipse plugin as well. If you are developing with Eclipse, you should definitely install it.
The development environment.
GAE comes with a development environment, which simulates the production cloud. At the time of writing, GAE officially supported only Python 2.5 and 2.7. GAE will try to find Python on your system; however, it may be necessary to set the path yourself, for instance, if you have multiple Python versions. You can change this setting in the Preferences dialog of the launcher application.
There are two important scripts in the SDK:
Deploying NumPy code in the Google cloud
Deploying GAE applications is pretty easy. For NumPy, an extra configuration step is required, but that will take only minutes.
Let's create a new application.
Create a new application.
Create a new application with the launcher (File | New Application). Name it numpycloud. This will create a folder with the same name containing the following files:
app.yaml: YAML application configuration file
favicon.ico: Icon image
index.yaml: Auto generated file
main.py: Main entry point for the web application
Add NumPy to the libraries.
First, we need to let GAE know that we want to use NumPy. Add the following lines to the app.yaml configuration file in the libraries section:
The configuration file should have the following contents:
Running NumPy code in a Python Anywhere web console
In Chapter 1, we already saw a Python Anywhere console in action, without having an account. This recipe will require you to have an account, but don't worry: it's free, at least if you don't need too many resources.
Signing up is a pretty straightforward process and will not be covered here. NumPy is already installed along with a long list of other Python software. For a complete list, see https://www.pythonanywhere.com/batteries_included/.
We will set up a simple script that gets price data from Google Finance every minute, and does simple statistics with the prices using NumPy.
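The statistics half of such a script might look like the sketch below. The download step is left out on purpose, since it depends on an external service; the summarize function is a hypothetical helper, and the numbers passed to it are placeholders for illustration, not real quotes.

```python
import numpy as np


def summarize(prices):
    """Return simple descriptive statistics for a sequence of prices."""
    prices = np.asarray(prices, dtype=float)
    return {
        "mean": prices.mean(),
        "std": prices.std(),
        "min": prices.min(),
        "max": prices.max(),
    }


# Placeholder values for illustration only; these are not real quotes.
stats = summarize([10.0, 12.0, 11.0, 13.0])
```

A scheduled script could call summarize on each freshly downloaded batch of prices and print or store the resulting dictionary.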
Once we have signed up, we can log in and have a look at the Python Anywhere dashboard:
Write the code.
The complete code for this example is as follows:
Setting up PiCloud
PiCloud is another cloud computing provider, which uses the Amazon EC2 infrastructure. However, they do offer environments with preinstalled Python software, including NumPy. These environments are just EC2 instances that we can ssh into. In this recipe, we will be using the Python 2.7 - Ubuntu Natty 11.04 environment. For the installed packages in this environment, see http://www.picloud.com/docs/base_environment/2/installed/. PiCloud follows the freemium model, meaning that you can start out for free and pay later on if you require more resources.
After the mandatory sign-up, log in to PiCloud.
Create an environment.
We start out without any environments. In order to create an environment, first click on the Environments tab. Next, click on the "create a new environment" button. Select the Python 2.7 base environment. Currently, you can choose between Python 2.7 and 2.6 environments, both on Ubuntu.
Creating an environment takes a few minutes...