You're reading from Scientific Computing with Python 3
File I/O (input and output) is essential in a number of scenarios. For example:
- Working with measured or scanned data. Measurements are stored in files that need to be read to be analyzed.
- Interacting with other programs. Save results to files so that they can be imported into other applications, and vice versa.
- Storing information for future reference or comparisons.
- Sharing data and results with others, possibly on other platforms using other software.
In this section, we will cover how to handle file I/O in Python.
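In Python, a file object represents the contents of a physical file stored on disk. (In Python 3 there is no built-in type named file; open returns an object from the io module, for example io.TextIOWrapper in text mode.) A new file object may be created using the following syntax: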
myfile = open('measurement.dat','r') # creating a new file object from an existing file
The contents of the file may be accessed, for instance, with this:
print(myfile.read())
Usage of file objects requires some care. The problem is that a file has to be closed before it can be reread or used by other...
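A common way to make sure a file is closed is the with statement. It closes the file as soon as its block ends, even if an error occurs inside the block. The following is a minimal sketch that assumes a scratch file in a temporary directory; the name measurement.dat is reused only for illustration. It also shows the write and readlines methods:

```python
import os
import tempfile

# Hypothetical scratch location; measurement.dat is only an example name
path = os.path.join(tempfile.mkdtemp(), 'measurement.dat')

# 'w' opens the file for writing and truncates it if it already exists
with open(path, 'w') as myfile:
    myfile.write('1.0 2.0\n')
    myfile.write('3.0 4.0\n')
print(myfile.closed)  # True: leaving the with block closed the file

# 'r' opens the file for reading; readlines returns one string per line
with open(path, 'r') as myfile:
    lines = myfile.readlines()
print(lines)  # ['1.0 2.0\n', '3.0 4.0\n']
```

Because the first file is closed before the second open, the data written is guaranteed to be flushed to disk when it is read back.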
NumPy has built-in methods for reading and writing NumPy array data to text files. These are numpy.loadtxt and numpy.savetxt.
Writing an array to a text file is simple:
savetxt(filename,data)
Two useful parameters, given as strings, are fmt and delimiter. They control the number format and the delimiter between columns. The defaults are a space for the delimiter and %.18e for the format, which is exponential notation with 18 digits after the decimal point. The formatting parameters are used as follows:
from numpy import savetxt
x = range(100) # 100 integers
savetxt('test.txt',x,delimiter=',') # use comma instead of space (only visible for data with several columns)
savetxt('test.txt',x,fmt='%d') # integer format instead of float with e
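A one-dimensional x like the one above is written one value per line, so the delimiter never appears in that file. The minimal sketch below writes a 2D array instead, so that both fmt and delimiter have a visible effect. The array and the file name test.csv are hypothetical. Reading the file back requires passing the same delimiter to loadtxt:

```python
import os
import tempfile
import numpy as np

data = np.arange(12).reshape(3, 4)                   # 3 rows, 4 columns of integers
path = os.path.join(tempfile.mkdtemp(), 'test.csv')  # hypothetical file name

np.savetxt(path, data, fmt='%d', delimiter=',')      # integers, comma-separated

with open(path) as f:
    first_line = f.readline()
print(repr(first_line))   # '0,1,2,3\n'

restored = np.loadtxt(path, delimiter=',')           # same delimiter as when writing
print(restored.dtype)     # float64, the default dtype of loadtxt
```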
Reading a text file into an array is done with loadtxt:
from numpy import loadtxt
filename = 'test.txt'
data = loadtxt(filename)
Because each row of an array must have the same length, each row in the text file must have the same number of elements. Similar to savetxt, the default values...
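The equal-row-length requirement can be checked with a small sketch. It writes a hypothetical file whose second row is shorter than the first. In current NumPy versions, loadtxt rejects such a file with a ValueError; the exact message differs between versions. Selecting only columns that exist in every row with the usecols parameter avoids the problem:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'ragged.txt')  # hypothetical file
with open(path, 'w') as f:
    f.write('1 2 3\n4 5\n')   # the second row has only two elements

try:
    np.loadtxt(path)
    error = None
except ValueError as exc:     # rows of unequal length are rejected
    error = exc
print(type(error).__name__)   # ValueError

# Reading only the columns present in every row avoids the error
part = np.loadtxt(path, usecols=(0, 1))
print(part)
```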
The read and write methods you just saw convert data to strings before writing. Complex types (such as objects and classes) cannot be written this way. With Python’s pickle module, you can save almost any object, and also several objects, to a file.
Data can be saved in plaintext (ASCII) format or using a slightly more efficient binary format. There are two main functions: dump, which saves a pickled representation of a Python object to a file, and load, which retrieves a pickled object from the file. The basic usage is like this:
import pickle
from numpy import random
with open('file.dat','wb') as myfile:
    a = random.rand(20,20)
    b = 'hello world'
    pickle.dump(a,myfile) # first call: first object
    pickle.dump(b,myfile) # second call: second object

import pickle
with open('file.dat','rb') as myfile:
    numbers = pickle.load(myfile) # restores the array
    text = pickle.load(myfile) # restores the string
Note the order in which the two objects...
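The following minimal sketch pickles an instance of a user-defined class, which the text-based savetxt could not store. The class Measurement and the file location are hypothetical. The objects come back in the order they were dumped. Once the file is exhausted, pickle.load raises EOFError, which can be used to read an unknown number of objects:

```python
import os
import pickle
import tempfile

class Measurement:
    """Hypothetical user-defined class, used only to show pickling."""
    def __init__(self, label, values):
        self.label = label
        self.values = values

path = os.path.join(tempfile.mkdtemp(), 'file.dat')
with open(path, 'wb') as myfile:       # pickle requires binary mode
    pickle.dump(Measurement('run 1', [0.1, 0.2]), myfile)
    pickle.dump({'units': 'V'}, myfile)

restored = []
with open(path, 'rb') as myfile:
    while True:
        try:
            restored.append(pickle.load(myfile))  # same order as dumped
        except EOFError:                          # no more objects in the file
            break

print(restored[0].label, restored[0].values)  # run 1 [0.1, 0.2]
print(restored[1])                            # {'units': 'V'}
```

Unpickling an instance needs the class definition to be available under the same name when load is called. Here it is, because both steps run in the same module.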
Objects in dictionaries can be accessed by keys. There is a similar way to access particular data in a file by first assigning it a key. This is possible by using the module shelve:
from contextlib import closing
import shelve as sv
from numpy import array
# opens a data file (creates it first if necessary)
with closing(sv.open('datafile')) as data:
    A = array([[1,2,3],[4,5,6]])
    data['my_matrix'] = A # here we created a key
In the section File handling, we saw that the built-in open command generates a context manager, and we saw why this is important for handling external resources, such as files. In Python versions before 3.4, sv.open did not create a context manager by itself, so the closing command from the contextlib module was needed to transform it into an appropriate context manager. Since Python 3.4, shelf objects also support the with statement directly, and wrapping them in closing still works. Consider the following example of restoring the file:
from contextlib import closing
import shelve as sv
with closing(sv.open('datafile')) as data: # opens a data file ...
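Here is a minimal, self-contained sketch of the full store-and-restore cycle. It assumes a hypothetical shelf named datafile in a temporary directory. Depending on the dbm backend, shelve may create one or more files with extra extensions next to that name. The second open uses flag='r', so the restored shelf is read-only:

```python
import os
import shelve
import tempfile
from contextlib import closing
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'datafile')  # hypothetical location

# store the matrix under the key 'my_matrix'
with closing(shelve.open(path)) as data:
    data['my_matrix'] = np.array([[1, 2, 3], [4, 5, 6]])

# restore it later by the same key
with closing(shelve.open(path, flag='r')) as data:
    A = data['my_matrix']
print(A.shape)  # (2, 3)
```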
SciPy has the ability to read and write data in Matlab’s .mat file format using the module scipy.io. The commands are loadmat and savemat. To load data, use the following syntax:
import scipy.io
data = scipy.io.loadmat('datafile.mat')
The variable data now contains a dictionary with keys corresponding to the variable names saved in the .mat file, plus a few metadata keys such as '__header__'. The variables are in NumPy array format. Saving to .mat files involves creating a dictionary with all the variables you want to save (variable name and value). The command is then savemat:
import scipy.io
# x and y are assumed to be existing NumPy arrays
data = {}
data['x'] = x
data['y'] = y
scipy.io.savemat('datafile.mat',data)
This saves the NumPy arrays x and y so that they appear under the same names when read into Matlab.
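A minimal round-trip sketch follows, with hypothetical arrays x and y and a temporary file. It illustrates a detail that often surprises users: Matlab has no one-dimensional arrays. With savemat's default oned_as='row', a 1D array is stored as a 1-by-N row vector, so it comes back from loadmat with shape (1, N). Passing squeeze_me=True to loadmat removes such singleton dimensions:

```python
import os
import tempfile
import numpy as np
import scipy.io

x = np.linspace(0, 1, 5)   # hypothetical data
y = x**2
path = os.path.join(tempfile.mkdtemp(), 'datafile.mat')

scipy.io.savemat(path, {'x': x, 'y': y})
data = scipy.io.loadmat(path)

# keys starting with '__' hold metadata such as the file header
print(sorted(k for k in data if not k.startswith('__')))  # ['x', 'y']
print(data['x'].shape)  # (1, 5): a row vector, not a 1D array
```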
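SciPy comes with some basic functions for handling images. In the module scipy.misc, the function imread reads images to NumPy arrays, and the function imsave saves an array as an image. (These functions, like imresize, relied on the Pillow library. They were deprecated in SciPy 1.0 and have since been removed, so current code typically uses Pillow or imageio directly.) The following will read a JPEG image to an array, print the shape and type, then create a new array with a resized image, and write the new image to file: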
import scipy.misc as sm
# read image to array
im = sm.imread("test.jpg")
print(im.shape) # (128, 128, 3)
print(im.dtype) # uint8
# resize image
im_small = sm.imresize(im, (64,64))
print(im_small.shape) # (64, 64, 3)
# write result to new image file
sm.imsave("test_small.jpg", im_small)
Note the data type. Images are almost always stored with pixel values in the range 0 to 255 as 8-bit unsigned integers. The third shape value shows how many color channels the image has. In this case, 3 means it is a color image with the channels stored along the last axis in this order: red im[:,:,0], green im[:,:,1], blue im[:,:,2]. A gray scale...
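The scipy.misc image functions are no longer available in current SciPy releases. Below is a minimal sketch of the same read, resize, and write steps using Pillow (the PIL package), assuming it is installed. Instead of reading test.jpg, it generates a hypothetical 128x128 test image. It uses PNG because JPEG compression would alter the pixel values. Note that Pillow's resize expects the new size as (width, height):

```python
import os
import tempfile
import numpy as np
from PIL import Image

folder = tempfile.mkdtemp()

# Hypothetical test image: 128x128 RGB, red increasing from left to right
im = np.zeros((128, 128, 3), dtype=np.uint8)
im[:, :, 0] = np.arange(128, dtype=np.uint8)
Image.fromarray(im).save(os.path.join(folder, 'test.png'))

# read image to array
im = np.asarray(Image.open(os.path.join(folder, 'test.png')))
print(im.shape, im.dtype)  # (128, 128, 3) uint8

# resize image; Pillow takes (width, height)
im_small = np.asarray(Image.fromarray(im).resize((64, 64)))
print(im_small.shape)      # (64, 64, 3)

# channels live on the last axis
red = im[:, :, 0]          # shape (128, 128)

# write result to new image file
small_path = os.path.join(folder, 'test_small.png')
Image.fromarray(im_small).save(small_path)
```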
File handling is inevitable when dealing with measurements and other sources of large amounts of data. Communication with other programs and tools is also done via file handling.
You learned to see a file as a Python object like any other, with important methods such as readlines and write. We showed how files can be opened in modes that allow only read or only write access.
The way you write to a file often influences the speed of the process. We saw how data is stored by pickling or by using the shelve module.