Scientific Computing APIs for Python

In this article, by Hemant Kumar Mehta author of the book Mastering Python Scientific Computing we will have comprehensive discussion of features and capabilities of various scientific computing APIs and toolkits in Python. Besides the basics, we will also discuss some example programs for each of the APIs. As symbolic computing is relatively different area of computerized mathematics, we have kept a special sub section within the SymPy section to discuss basics of computerized algebra system. In this article, we will cover following topics:

  • Scientific numerical computing using NumPy and SciPy
  • Symbolic Computing using SymPy

(For more resources related to this topic, see here.)

Numerical Scientific Computing in Python

The scientific computing mainly demands for facility of performing calculations on algebraic equations, matrices, differentiations, integrations, differential equations, statistics, equation solvers and much more. By default Python doesn't come with these functionalities. However, development of NumPy and SciPy has enabled us to perform these operations and much more advanced functionalities beyond these operations. NumPy and SciPy are very powerful Python packages that enable the users to efficiently perform the desired operations for all types of scientific applications.

NumPy package

NumPy is the basic Python package for the scientific computing. It provides facility of multi-dimensional arrays and basic mathematical operations such as linear algebra. Python provides several data structure to store the user data, while the most popular data structures are lists and dictionaries. The list objects may store any type of Python object as an element. These elements can be processed using loops or iterators. The dictionary objects store the data in key, value format.

The ndarrays data structure

The ndaarays are also similar to the list but highly flexible and efficient. The ndarrays is an array object to represent multidimensional array of fixed-size items. This array should be homogeneous. It has an associated object of dtype to define the data type of elements in the array. This object defines type of the data (integer, float, or Python object), size of data in bytes, byte ordering (big-endian or little-endian). Moreover, if the type of data is record or sub-array then it also contains details about them. The actual array can be constructed using any one of the array, zeros or empty methods.

Another important aspect of ndarrays is that the size of arrays can be dynamically modified. Moreover, if the user needs to remove some elements from the arrays then it can be done using the module for masked arrays. In a number of situations, scientific computing demands deletion/removal of some incorrect or erroneous data. The numpy.ma module provides the facility of masked array to easily remove selected elements from arrays. A masked array is nothing but the normal ndarrays with a mask. Mask is another associated array with true or false values. If for a particular position mask has true value then the corresponding element in the main array is valid and if the mask is false then the corresponding element in the main array is invalid or masked. In such case while performing any computation on such ndarrays the masked elements will not be considered.

File handling

Another important aspect of scientific computing is storing the data into files and NumPy supports reading and writing on both text as well as binary files. Mostly, text files are good way for reading, writing and data exchange as they are inherently portable and most of the platforms by default have capabilities to manipulate them. However, for some of the applications sometimes it is better to use binary files or the desired data for such application can only be stored in binary files. Sometimes the size of data and nature of data like image, sound etc. requires them to store in binary files. In comparison to text files binary files are harder to manage as they have specific formats. Moreover, the size of binary files are comparatively very small and the read/ write operations are very fast then the read/ write text files. This fast read/ write is most suitable for the application working on large datasets. The only drawback of binary files manipulated with NumPy is that they are accessible only through NumPy.

Python has text file manipulation functions such as open, readlines and writelines functions. However, it is not performance efficient to use these functions for scientific data manipulation. These default Python functions are very slow in reading and writing the data in file. NumPy has high performance alternative that load the data into ndarrays before actual computation.  In NumPy, text files can be accessed using numpy.loadtxt and numpy.savetxt functions.  The loadtxt function can be used to load the data from text files to the ndarrays. NumPy also has a separate functions to manipulate the data in binary files. The function for reading and writing are numpy.load and numpy.save respectively.

Sample NumPy programs

The NumPy array can be created from a list or tuple using the array, this method can transform sequences of sequences into two dimensional array.

import numpy as np

x = np.array([4,432,21], int)

print x                            #Output [  4 432  21]

x2d = np.array( ((100,200,300), (111,222,333), (123,456,789)) )

print x2d

Output:

[  4 432  21]

[[100 200 300]

[111 222 333]

[123 456 789]]

Basic matrix arithmetic operation can be easily performed on two dimensional arrays as used in the following program.  Basically these operations are actually applied on elements hence the operand arrays must be of equal size, if the size is not matching then performing these operations will cause a runtime error. Consider the following example for arithmetic operations on one dimensional array.

import numpy as np

x = np.array([4,5,6])

y = np.array([1,2,3])

print x + y                      # output [5 7 9]

print x * y                      # output [ 4 10 18]

print x - y                       # output [3 3 3]   

print x / y                       # output [4 2 2]

print x % y                    # output [0 1 0]

There is a separate subclass named as matrix to perform matrix operations. Let us understand matrix operation by following example which demonstrates the difference between array based multiplication and matrix multiplication. The NumPy matrices are 2-dimensional and arrays can be of any dimension.

import numpy as np

x1 = np.array( ((1,2,3), (1,2,3), (1,2,3)) )

x2 = np.array( ((1,2,3), (1,2,3), (1,2,3)) )

print "First 2-D Array: x1"

print x1

print "Second 2-D Array: x2"

print x2

print "Array Multiplication"

print x1*x2

 

mx1 = np.matrix( ((1,2,3), (1,2,3), (1,2,3)) )

mx2 = np.matrix( ((1,2,3), (1,2,3), (1,2,3)) )

print "Matrix Multiplication"

print mx1*mx2

 

Output:

First 2-D Array: x1

[[1 2 3]

 [1 2 3]

 [1 2 3]]

Second 2-D Array: x2

[[1 2 3]

 [1 2 3]

 [1 2 3]]

Array Multiplication

[[1 4 9]

 [1 4 9]

 [1 4 9]]

Matrix Multiplication

[[ 6 12 18]

 [ 6 12 18]

 [ 6 12 18]]

Following is a simple program to demonstrate simple statistical functions given in NumPy:

import numpy as np

x = np.random.randn(10)   # Creates an array of 10 random elements

print x

mean = x.mean()

print mean

std = x.std()

print std

var = x.var()

print var

First Sample Output:

[ 0.08291261  0.89369115  0.641396   -0.97868652  0.46692439 -0.13954144

 -0.29892453  0.96177167  0.09975071  0.35832954]

0.208762357623

0.559388806817

0.312915837192

Second Sample Output:

[ 1.28239629  0.07953693 -0.88112438 -2.37757502  1.31752476  1.50047537

  0.19905071 -0.48867481  0.26767073  2.660184  ]

0.355946458357

1.35007701045

1.82270793415

The above programs are some simple examples of NumPy.

SciPy package

SciPy extends Python and NumPy support by providing advanced mathematical functions such as differentiation, integration, differential equations, optimization, interpolation, advanced statistical functions, equation solvers etc. SciPy is written on top of the NumPy array framework. SciPy has utilized the arrays and the basic operations on the arrays provided in NumPy and extended it to cover most of the mathematical aspects regularly required by scientists and engineers for their applications. In this article we will cover examples of some basic functionality.

Optimization package

The optimization package in SciPy provides facility to solve univariate and multivariate minimization problems. It provides solutions to minimization problems using a number of algorithms and methods. The minimization problem has wide range of application in science and commercial domains. Generally, we perform linear regression, search for function's minimum and maximum values, finding the root of a function, and linear programming for such cases. All these functionalities are supported by the optimization package. 

Interpolation package

A number of interpolation methods and algorithms are provided in this package as built-in functions. It provides facility to perform univariate and multivariate interpolation, one dimensional and two dimensional Splines etc. We use univariate interpolation when data is dependent of one variable and if data is around more than one variable then we use multivariate interpolation. Besides these functionalities it also provides additional functionality for Lagrange and Taylor polynomial interpolators.

Integration and differential equations in SciPy

Integration is an important mathematical tool for scientific computations. The SciPy integrations sub-package provides functionalities to perform numerical integration. SciPy provides a range of functions to perform integration on equations and data. It also has ordinary differential equation integrator. It provides various functions to perform numerical integrations using a number of methods from mathematics using numerical analysis.

Stats module

SciPy Stats module contains a functions for most of the probability distributions and wide range or statistical functions. Supported probability distributions include various continuous distribution, multivariate distributions and discrete distributions. The statistical functions range from simple means to the most of the complex statistical concepts, including skewness, kurtosis chi-square test to name a few.

Clustering package and Spatial Algorithms in SciPy

Clustering analysis is a popular data mining technique having wide range of application in scientific and commercial applications. In Science domain biology, particle physics, astronomy, life science, bioinformatics are few subjects widely using clustering analysis for problem solution. Clustering analysis is being used extensively in computer science for computerized fraud detection, security analysis, image processing etc. The clustering package provides functionality for K-mean clustering, vector quantization, hierarchical and agglomerative clustering functions.

The spatial class has functions to analyze distance between data points using triangulations, Voronoi diagrams, and convex hulls of a set of points. It also has KDTree implementations for performing nearest-neighbor lookup functionality.

Image processing in SciPy          

SciPy provides support for performing various image processing operations including basic reading and writing of image files, displaying images, simple image manipulations operations such as cropping, flipping, rotating etc. It has also support for image filtering functions such as mathematical morphing, smoothing, denoising and sharpening of images. It also supports various other operations such as image segmentation by labeling pixels corresponding to different objects, Classification, Feature extraction for example edge detection etc.

Sample SciPy programs

In the subsequent subsections we will discuss some example programs using SciPy modules and packages. We start with a simple program performing standard statistical computations. After this, we will discuss a program performing finding a minimal solution using optimizations. At last we will discuss image processing programs.

Statistics using SciPy

The stats module of SciPy has functions to perform simple statistical operations and various probability distributions. The following program demonstrates simple statistical calculations using SciPy stats.describe function. This single function operates on an array and returns number of elements, minimum value, maximum value, mean, variance, skewness and kurtosis.

import scipy as sp

import scipy.stats as st

s = sp.randn(10)

n, min_max, mean, var, skew, kurt = st.describe(s)

print("Number of elements: {0:d}".format(n))

print("Minimum: {0:3.5f} Maximum: {1:2.5f}".format(min_max[0], min_max[1]))

print("Mean: {0:3.5f}".format(mean))

print("Variance: {0:3.5f}".format(var))

print("Skewness : {0:3.5f}".format(skew))

print("Kurtosis: {0:3.5f}".format(kurt))

Output:

Number of elements: 10

Minimum: -2.00080 Maximum: 0.91390

Mean: -0.55638

Variance: 0.93120

Skewness : 0.16958

Kurtosis: -1.15542

Optimization in SciPY

Generally, in mathematical optimization a non convex function called Rosenbrock function is used to test the performance of the optimization algorithm. The following program is demonstrating the minimization problem on this function. The Rosenbrock function of N variable is given by following equation and it has minimum value 0 at xi =1.

Mastering Python Scientific Computing

The program for the above function is:

import numpy as np

from scipy.optimize import minimize

 

# Definition of Rosenbrock function

def rosenbrock(x):

     return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

 

x0 = np.array([1, 0.7, 0.8, 2.9, 1.1])

res = minimize(rosenbrock, x0, method = 'nelder-mead', options = {'xtol': 1e-8, 'disp': True})

 

print(res.x)

Output is:

Optimization terminated successfully.

         Current function value: 0.000000

         Iterations: 516

         Function evaluations: 827

[ 1.  1.  1.  1.  1.]

The last line is the output of print(res.x) where all the elements of array are 1.

Image processing using SciPy

Following two programs are developed to demonstrate the image processing functionality of SciPy. First of these program is simply displaying the standard test image widely used in the field of image processing called Lena. The second program is applying geometric transformation on this image. It performs image cropping and rotation by 45 %.

The following program is displaying Lena image using matplotlib API. The imshow method renders the ndarrays into an image and the show method displays the image.

from scipy import misc

l = misc.lena()

misc.imsave('lena.png', l)

import matplotlib.pyplot as plt

plt.gray()

plt.imshow(l)

plt.show()

Output: The output of the above program is the following screen shot:

Mastering Python Scientific Computing

The following program is performing geometric transformation. This program is displaying transformed images and along with the original image as a four axis array.

import scipy

from scipy import ndimage

import matplotlib.pyplot as plt

import numpy as np

 

lena = scipy.misc.lena()

lx, ly = lena.shape

crop_lena = lena[lx/4:-lx/4, ly/4:-ly/4]

crop_eyes_lena = lena[lx/2:-lx/2.2, ly/2.1:-ly/3.2]

rotate_lena = ndimage.rotate(lena, 45)

 

# Four axes, returned as a 2-d array

f, axarr = plt.subplots(2, 2)

axarr[0, 0].imshow(lena, cmap=plt.cm.gray)

axarr[0, 0].axis('off')

axarr[0, 0].set_title('Original Lena Image')

axarr[0, 1].imshow(crop_lena, cmap=plt.cm.gray)

axarr[0, 1].axis('off')

axarr[0, 1].set_title('Cropped Lena')

axarr[1, 0].imshow(crop_eyes_lena, cmap=plt.cm.gray)

axarr[1, 0].axis('off')

axarr[1, 0].set_title('Lena Cropped Eyes')

axarr[1, 1].imshow(rotate_lena, cmap=plt.cm.gray)

axarr[1, 1].axis('off')

axarr[1, 1].set_title('45 Degree Rotated Lena')

 

plt.show()

Output:

Mastering Python Scientific Computing

The SciPy and NumPy are core of Python's support for scientific computing as they provide solid functionality of numerical computing.

Symbolic computations using SymPy

Computerized computations performed over the mathematical symbols without evaluating or changing their meaning is called as symbolic computations. Generally the symbolic computing is also called as computerized algebra and such computerized system are called computer algebra system. The following subsection has a brief and good introduction to SymPy.

Computer Algebra System (CAS)

Let us discuss the concept of CAS. CAS is a software or toolkit to perform computations on mathematical expressions using computers instead of doing it manually. In the beginning, using computers for these applications was named as computer algebra and now this concept is called as symbolic computing. CAS systems may be grouped into two types. First is the general purpose CAS and the second type is the CAS specific to particular problem. The general purpose systems are applicable to most of the area of algebraic mathematics while the specialized CAS are the systems designed for the specific area such as group theory or number theory. Most of the time, we prefer the general purpose CAS to manipulate the mathematical expressions for scientific applications.

Features of a general purpose CAS

Various desired features of a general purpose computer algebra system for scientific applications are as:

  • A user interface to manipulate mathematical expressions.
  • An interface for programming  and debugging
  • Such systems requires simplification of various mathematical expressions hence, a simplifier is a most essential component of such computerized algebra system.
  • The general purpose CAS system must support exhaustive set of functions to perform various mathematical operations required by any algebraic computations
  • Most of the applications perform extensive computations an efficient memory management is highly essential.
  • The system must provide support to perform mathematical computations on high precision numbers and large quantities.

A brief idea of SymPy

SymPy is an open source and Python based implementation of computerized algebra system (CAS). The philosophy behind the SymPy development is to design and develop a CAS having all the desired features yet its code as simple as possible so that it will be highly and easily extensible. It is written completely in Python and do not requires any external library.

The basic idea about using SymPy is the creation and manipulation of expressions. Using SymPy, the user represents mathematical expressions in Python language using SymPy classes and objects. These expressions are composed of numbers, symbols, operators, functions etc. The functions are the modules to perform a mathematical functionality such as logarithms, trigonometry etc.

The development of SymPy was started by Ondřej Čertíkin August 2006. Since then, it has been grown considerably with the contributions more than hundreds of the contributors. This library now consists of 26 different integrated modules. These modules have capability to perform computations required for basic symbolic arithmetic, calculus, algebra, discrete mathematics, quantum physics, plotting and printing with the option to export the output of the computations to LaTeX and other formats.

The capabilities of SymPy can be divided into two categories as core capability and advanced capabilities as SymPy library is divided into core module with several advanced optional modules. The various supported functionality by various modules are as follows:

Core capabilities

The core capability module supports basic functionalities required by any mathematical algebra operations to be performed. These operations include basic arithmetic like multiplications, addition, subtraction and division, exponential etc. It also supports simplification of expressions to simplify complex expressions. It provides the functionality of expansion of series and symbols. Core module also supports functions to perform operations related to trigonometry, hyperbola, exponential, roots of equations, polynomials, factorials and gamma functions, logarithms etc. and a number of special functions for B-Splines, spherical harmonics, tensor functions, orthogonal polynomials etc.

There is strong support also given for pattern matching operations in the core module. Core capabilities of the SymPy also include the functionalities to support substitutions required by algebraic operations. It not only supports the high precision arithmetic operations over integers, rational and gloating point numbers but also non-commutative variables and symbols required in polynomial operations.

Polynomials

Various functions to perform polynomial operations belong to the polynomial module. These functions includes basic polynomial operations such as division, greatest common divisor (GCD) least common multiplier (LCM), square-free factorization, representation of  polynomials with symbolic coefficients, some special operations like computation of resultant, deriving trigonometric identities, Partial fraction decomposition, facilities for Gröbner basis over polynomial rings and fields.

Calculus

Various functionalities supporting different operations required by basic and advanced calculus are provided in this module. It supports functionalities required by limits, there is a limit function for this. It also supports differentiation and integrations and series expansion, differential equations and calculus of finite differences. SymPy is also having special support for definite integrals and integral transforms. In differential it supports numerical differential, composition of derivatives and fractional derivatives. 

Solving equations

Solver is the name of the SymPy module providing equations solving functionality. This module supports solving capabilities for complex polynomials, roots of polynomials and solving system of polynomial equations. There is a function to solve the algebraic equations. It not only provides support for solutions for differential equations including ordinary differential equations, some forms of partial differential equations, initial and boundary values problems etc. but also supports solution of difference equations. In mathematics, difference equation is also called recurrence relations, that is an equation that recursively defines a sequence or multidimensional array of values.

Discrete math

Discrete mathematics includes those mathematical structures which are discrete in nature rather than the continuous mathematics like calculus. It deals with the integers, graphs, statements from logic theory etc. This module has full support for binomial coefficient, products, summations etc.

This module also supports various functions from number theory including residual theory, Euler's Totient, partition and a number of functions dealing with prime numbers and their factorizations. SymPy also supports creation and manipulations of logic expressions using symbolic and Boolean values.

Matrices

SymPy has a strong support for various operations related to the matrices and determinants. Matrix belongs to linear algebra category of mathematics. It supports creation of matrix, basic matrix operations like multiplication, addition, matrix of zeros and ones, creation of random matrix and performing operations on matrix elements. It also supports special functions line computation of Hessian matrix for a function, Gram-Schmidt process on set of vectors, Computation of Wronskian for matrix of functions etc.

It has also full support for Eigenvalues/eigenvectors, matrix inversion, solution of matrix and determinants.  For computing determinants of the matrix, it also supports Bareis' fraction-free algorithm and berkowitz algorithms besides the other methods. For matrices it also supports nullspace calculation, cofactor expansion tools, derivative calculation for matrix elements, calculation of dual of matrix etc.

Geometry

SymPy is also having module that supports various operations associated with the two-dimensional (2-D) geometry. It supports creation of various 2-D entity or objects such as point, line, circle, ellipse, polygon, triangle, ray, segment etc. It also allows us to perform query on these entities such as area of some of the suitable objects line ellipse/ circle or triangle, intersection points of lines etc. It also supports other queries line tangency determination, finding similarity and intersection of entities.

Plotting

There is a very good module that allows us to draw two-dimensional and three-dimensional plots. At present, the plots are rendered using the matplotlib package. It also supports other packages such as TextBackend, Pyglet, textplot etc.  It has a very good interactive interface facility of customizations and plotting of various geometric entities.

The plotting module has the following functions:

  • Plotting 2-D line plots
  • Plotting of 2-D parametric plots.
  • Plotting of 2-D implicit and region plots.
  • Plotting of 3-D plots of functions involving two variables.
  • Plotting of 3-D line and surface plots etc.

Physics

There is a module to solve the problem from Physics domain. It supports functionality for mechanics including classical and quantum mechanics, high energy physics. It has functions to support Pauli Algebra, quantum harmonic oscillators in 1-D and 3-D. It is also having functionality for optics. There is a separate module that integrates unit systems into SymPy. This will allow users to select the specific unit system for performing his/ her computations and conversion between the units. The unit systems are composed of units and constant for computations.

Statistics

The statistics module introduced in SymPy to support the various concepts of statistics required in mathematical computations. Apart from supporting various continuous and discrete statistical distributions, it also supports functionality related to the symbolic probability. Generally, these distributions support functions for random number generations in SymPy.

Printing

SymPy is having a module for provide full support for Pretty-Printing. Pretty-print is the idea of conversions of various stylistic formatting into the text files such as source code, text files and markup files or similar content. This module produces the desired output by printing using ASCII and or Unicode characters.

It supports various printers such as LATEX and MathML printer. It is also capable of producing source code in various programming languages such as c, Python or FORTRAN. It is also capable of producing contents using markup languages like HTML/ XML.

SymPy modules

The following list has formal names of the modules discussed in above paragraphs:

  • Assumptions: assumption engine
  • Concrete: symbolic products and summations
  • Core: basic class structure: Basic, Add, Mul, Pow etc.
  • functions: elementary and special functions
  • galgebra: geometric algebra
  • geometry: geometric entities
  • integrals: symbolic integrator
  • interactive: interactive sessions (e.g. IPython)
  • logic: boolean algebra, theorem proving
  • matrices: linear algebra, matrices
  • mpmath: fast arbitrary precision numerical math
  • ntheory: number theoretical functions
  • parsing: Mathematica and Maxima parsers
  • physics: physical units, quantum stuff
  • plotting: 2D and 3D plots using Pyglet
  • polys: polynomial algebra, factorization
  • printing: pretty-printing, code generation
  • series: symbolic limits and truncated series
  • simplify: rewrite expressions in other forms
  • solvers: algebraic, recurrence, differential
  • statistics: standard probability distributions
  • utilities: test framework, compatibility stuf

There are numerous symbolic computing systems available in various mathematical toolkits. There are some proprietary software such as Maple/ Mathematica and there are some open source alternatives also such as Singular/ AXIOM. However, these products have their own scripting language, difficult to extend their functionality and having slow development cycle. Whereas SymPy is highly extensible, designed and developed in Python language and open source API that supports speedy development life cycle.

Simple exemplary programs

These are some very simple examples to get idea about the capacity of SymPy. These are less than ten lines of SymPy source codes which covers topics ranging from basis symbol manipulations to limits, differentiations and integrations. We can test the execution of these programs on SymPy live running SymPy online on Google App Engine available on http://live.sympy.org/.

Basic symbol manipulation

The following code is defines three symbols, an expression on these symbols and finally prints the expression.

import sympy

a = sympy.Symbol('a')

b = sympy.Symbol('b')

c = sympy.Symbol('c')

e = ( a * b * b + 2 * b * a * b) + (a * a + c * c)

print e

Output:

a**2 + 3*a*b**2 + c**2     (here ** represents power operation).

Expression expansion in SymPy

The following program demonstrates the concept of expression expansion. It defines two symbols and a simple expression on these symbols and finally prints the expression and its expanded form.

import sympy

a = sympy.Symbol('a')

b = sympy.Symbol('b')

e = (a + b) ** 4

print e

print e.expand()

Output:

(a + b)**4

a**4 + 4*a**3*b + 6*a**2*b**2 + 4*a*b**3 + b**4

Simplification of expression or formula

The SymPy has facility to simplify the mathematical expressions. The following program is having two expressions to simplify and displays the output after simplifications of the expressions.

import sympy

x = sympy.Symbol('x')

a = 1/x + (x*exp(x) - 1)/x

simplify(a)

simplify((x ** 3 +  x ** 2 - x - 1)/(x ** 2 + 2 * x + 1))

Output:

ex

x – 1

Simple integrations

The following program is calculates the integration of two simple functions.

import sympy

from sympy import integrate

x = sympy.Symbol('x')

integrate(x ** 3 + 2 * x ** 2 + x, x)

integrate(x / (x ** 2 + 2 * x), x)

Output:

x**4/4+2*x**3/3+x**2/2

log(x + 2)

Summary

In this article, we have discussed the concepts, features and selective sample programs of various scientific computing APIs and toolkits. The article started with a discussion of NumPy and SciPy. After covering NymPy, we have discussed concepts associated with symbolic computing and SymPy.

In the remaining article we have discussed the Interactive computing and data analysis & visualization alog with their APIs or toolkits. IPython is the python toolkit for interactive computing. We have also discussed the data analysis package Pandas and the data visualization API names Matplotlib.

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Mastering Python Scientific Computing

Explore Title