About this book

PyTorch is extremely powerful and yet easy to learn. It provides advanced features, such as supporting multiprocessor, distributed, and parallel computation. This book is an excellent entry point for those wanting to explore deep learning with PyTorch to harness its power.

This book will introduce you to the PyTorch deep learning library and teach you how to train deep learning models without any hassle. We will set up the deep learning environment using PyTorch, and then train and deploy different types of deep learning models, such as CNN, RNN, and autoencoders.

You will learn how to optimize models by tuning hyperparameters and how to use PyTorch in multiprocessor and distributed environments. We will discuss long short-term memory network (LSTMs) and build a language model to predict text.

By the end of this book, you will be familiar with PyTorch's capabilities and be able to utilize the library to train your neural networks with relative ease.

Publication date:
December 2018


Introduction to PyTorch

This is a step-by-step introduction to deep learning using the PyTorch framework. PyTorch is a great entry point into deep learning and if you have some knowledge of Python then you will find PyTorch an intuitive, productive, and enlightening experience. The ability to rapidly prototype experiments and test ideas is a core strength of PyTorch. Together with the possibility of being able to turn experiments into productive, deployable resources, the learning curve challenge is abundantly rewarded.

PyTorch is a relatively easy and fun way to understand deep learning concepts. You may be surprised at how few lines of code it takes to solve common problems of classification, such as hand-writing recognition and image classification. Having said that PyTorch is easy cannot override the fact that deep learning is, in many ways, hard. It involves some complicated math and some intractable logical conundrums. This should not, however, distract from the fun and useful part of this enterprise. There is no doubt machine learning can provide deep insights and solve important problems in the world around us but to get there can take some work.

This book is an attempt, not to gloss over important ideas, but to explain them in a way that is jargon free and succinct. If the idea of solving complicated differential equations makes you break out in a cold sweat, you are not alone. This might be related to some high school trauma of a bad-tempered math teacher furiously demanding you cite Euler's formula or the trigonometric identities. This is a problem because math itself should be fun, and insight arises not from the laborious memorizing of formulas but through understanding relationships and foundational concepts.

Another thing that can make deep learning appear difficult is that it has a diverse and dynamic frontier of research. This may be confusing for the novice because it does not present an obvious entry point. If you understand some principles and want to test your ideas, it can be a bewildering task to find a suitable set of tools. The combinations of development language, framework, deployment architecture, and so on, present a non-trivial decision process.

The science of machine learning has matured to the point that a set of general purpose algorithms for solving problems such has classification and regression have emerged. Subsequently, several frameworks have been created to harness the power of these algorithms and use them for general problem solving. This means that the entry point is at such a level that these technologies are now in the hands of the non-computer science professional. Experts in a diverse array of domains can now use these ideas to advance their endeavors. By the end of this book, and with a little dedication, you will be able to build and deploy useful deep learning models to help solve the problems you are interested in.

In this chapter, we will discuss the following topics:

  • What is PyTorch?
  • Installing PyTorch
  • Basic operations
  • Loading data

What is PyTorch?

PyTorch is a dynamic tensor-based, deep learning framework for experimentation, research, and production. It can be used as a GPU-enabled replacement for NumPy or a flexible, efficient platform for building neural networks. The dynamic graph creation and tight Python integration makes PyTorch a standout in deep learning frameworks.

If you are at all familiar with the deep learning ecosystem, then frameworks such as Theano and TensorFlow, or higher-level derivatives such as Keras, are amongst the most popular. PyTorch is a relative newcomer to the deep learning framework set. Despite this, it is now being used extensively by Google, Twitter, and Facebook. It stands out from other frameworks in that both Theano and TensorFlow encode computational graphs in static structures that need to be run in self-contained sessions. In contrast, PyTorch can dynamically implement computational graphs. The consequence for a neural net is that the network can change behavior as it is being run, with little or no overhead. In TensorFlow and Theano, to change behavior, you effectively have to rebuild the network from scratch.

This dynamic implementation comes about through a process called tape-based auto-diif, allowing PyTorch expressions to be automatically differentiated. This has numerous advantages. Gradients can be calculated on the fly and since the computational graph is dynamic, it can be changed at each function call, allowing it to be used in interesting ways in loops and under conditional calls that can respond, for example, to input parameters or intermediate results. This dynamic behavior and great flexibility has made PyTorch a favored experimental platform for deep learning.

Another advantage of PyTorch is that it is closely integrated with the Python language. For Python coders, it is very intuitive and it interoperates seamlessly with other Python packages, such as NumPy and SciPy. PyTorch is very easy to experiment with. It makes an ideal tool for not only building and running useful models, but also as a way to understand deep learning principles by direct experimentation.

As you would expect, PyTorch can be run on multiple graphical processing units (GPUs). Deep learning algorithms can be computationally expensive. This is especially true for big datasets. PyTorch has strong GPU support, with intelligent memory sharing of tensors between processes. This basically means there is an efficient and user-friendly way to distribute the processing load across the CPU and GPUs. This can make a big difference to the time it takes to test and run large complex models.

Dynamic graph generation, tight Python language integration, and a relatively simple API makes PyTorch an excellent platform for research and experimentation. However, versions prior to PyTorch 1 had deficits that prevented it from excelling in production environments. This deficiency is being addressed in PyTorch 1.

Research is an important application for deep learning, but increasingly, deep learning is being embedded in applications that run live on the web, on a device, or in a robot. Such an application may service thousands of simultaneous queries and interact with massive, dynamic data. Although Python is one of the best languages for humans to work with, specific efficiencies and optimizations are available in other languages, most commonly C++ and Java. Even though the best way to build a particular deep learning model may be with PyTorch, this may not be the best way to deploy it. This is no longer a problem because now with PyTorch 1, we can export Python free representations of PyTorch models.

This has come about through a partnership between Facebook, the major stakeholder of PyTorch, and Microsoft, to create the Open Neural Network Exchange (ONNX) to assist developers in converting neural net models between frameworks. This has led to the merging of PyTorch with the more production-ready framework, CAFFE2. In CAFFE2, models are represented by a plain text schema, making them language agnostic. This means they are more easily deployed to Android, iOS, or Rasberry Pi devices.

With this in mind, PyTorch version 1 has expanded its API included production-ready capabilities, such as optimizing code for Android and iPhone, a just in time (JIT) C++ compiler, and several ways to make Python free representations of your models.

In summary, PyTorch has the following characteristics:

  • Dynamic graph representation
  • Tightly integrated with the Python programming language
  • A mix of high-and low-level APIs
  • Straightforward implementation on multiple GPUs
  • Able to build Python-free model representation for export and production
  • Scales to massive data using the Caffe framework

Installing PyTorch

PyTorch will run on macOS X, 64 bit Linux, and 64 bit Windows. Be aware that Windows does not currently offer (easy) support for the use of GPUs in PyTorch. You will need to have either Python 2.7 or Python 3.5 / 3.6 installed on your computer before you install PyTorch, remembering to install the correct version for each Python version. Unless you have a reason not to, it is recommended that you install the Anaconda distribution of Python. This this is available from: https://anaconda.org/anaconda/python.

Anaconda includes all the dependencies of PyTorch, as well as technical, math, and scientific libraries essential to your work in deep learning. These will be used throughout the book, so unless you want to install them all separately, install Anaconda.

The following is a list of the packages and tools that we will be using in this book. They are all installed with Anaconda:

  • NumPy: A math library primarily used for working with multidimensional arrays
  • Matplotlib: A plotting and visualization library
  • SciPy: A package for scientific and technical computing
  • Skit-Learn: A library for machine learning
  • Pandas: A library for working with data
  • IPython: A notebook-style code editor used for writing and running code in a browser

Once you have Anaconda installed, you can now install PyTorch. Go to the PyTorch website at https://pytorch.org/.

The installation matrix on this website is pretty self-explanatory. Simply select your operating system, Python version, and, if you have GPUs, your CUDA version, and then run the appropriate command.

As always, it is good practice to ensure your operating system and dependent packages are up to date before installing PyTorch. Anaconda and PyTorch run on Windows, Linux, and macOS, although Linux is probably the most used and consistent operating system. Throughout this book, I will be using Python 3.7 and Anaconda 3.6.5 running on Linux

Code in this book was written on the Jupyter Notebook and these notebooks are available from the book's website.

You can either choose to set up your PyTorch environment locally on your own machine or remotely on a cloud server. They each have their pros and cons. Working locally has the advantage that it is generally easier and quicker to get started. This is especially true if you are not familiar with SSH and the Linux terminal. It is simply a matter of installing Anaconda and PyTorch, and you are on your way. Also, you get to choose and control your own hardware, and while this is an upfront cost, it is often cheaper in the long run. Once you start expanding hardware requirements, cloud solutions can become expensive. Another advantage of working locally is that you can choose and customize your integrated development envionment (IDE). In fact, Anaconda has its own excellent desktop IDE called Spyder.

There are a few things you need to keep in mind when building your own deep learning hardware and you require GPU acceleration:

  • Use NVIDIA CUDA-compliant GPUs (for example, GTX 1060 or GTX 1080)
  • A chipset that has at least 16 PCIe lanes
  • At least 16 GB of RAM

Working on the cloud does offer the flexibility to work from any machine as well as more easily experiment with different operating systems, platforms, and hardware. You also have the benefit of being able to share and collaborate more easily. It is generally cheap to get started, costing a few dollars a month, or even free, but as your projects become more complex and data intensive, you will need to pay for more capacity.

Let's look briefly at the installation procedures for two cloud server hosts: Digital Ocean and Amazon Web Services.

Digital Ocean

Digital Ocean offers one of the simplest entry points into cloud computing. It offers predictable simple payment structures and straightforward server administration. Unfortunately, Digital Ocean does not currently support GPUs. The functionality revolves around droplets, pre-built instances of virtual private servers. The following are the steps required to set up a droplet:

  1. Sign up for an account with Digital Ocean. Go to https://www.digitalocean.com/.
  2. Click on the Create button and choose New Droplet.
  3. Select the Ubuntu distribution of Linux and choose the two gigabyte plan or above.
  4. Select the CPU optimization if required. The default values should be fine to get started.
  5. Optionally, set up public/private key encryption.
  6. Set up an SSH client (for example, PuTTY) using the information contained in the email sent to you.
  7. Connect to your droplet via your SSH client and curl the latest Anaconda installer. You can find the address location of the installer for your particular environment at https://repo.continuum.io/.
  8. Install PyTorch using this command:
conda install pytorch torchvision -c pytorch

Once you have spun up your droplet, you can access the Linux command through an SSH client. From Command Prompt, you can curl the latest Anaconda installer available from: https://www.anaconda.com/download/#linux.

An installation script is also available from the continuum archive at https://repo.continuum.io/archive/. Full step-by-step instructions are available from the Digital Ocean tutorials section.

Tunneling in to IPython

IPython is an easy and convenient way to edit code through a web browser. If you are working on a desktop computer, you can just launch IPython and point your browser to localhost:8888. This is the port that the IPython server, Jupyter, runs on. However, if you are working on a cloud server, then a common way to work with code is to tunnel in to IPython using SSH. Tunneling in to IPython involves the following steps:

  1. In your SSH client, set your destination port to localhost:8888. In PuTTY, go to Connection | SSH | Tunnels.
  2. Set the source port to anything above 8000 to avoid conflicting with other services. Click Add. Save these settings and open the connection. Log in to your droplet as usual.
  3. Start the IPython server by typing jupyter notebook into Command Prompt of your server instance.
  4. Access IPython by pointing your browser to localhost: source port; for example, localhost:8001.
  5. Start the IPython server.

Note that you may need a token to access the server for the first time. This is available from the command output once you start Jupyter. You can either copy the URL given in this output directly into your browser's address bar, changing the port address to your local source port address, for example: 8001, or you can elect to paste the token, the part after token=, into the Jupyter start-up page and replace it with a password for future convenience. You now should be able to open, run, and save IPython notebooks.

Amazon Web Services (AWS)

AWS is the original cloud computing platform, most noted for its highly-scalable architecture. It offers a vast array of products. What we need to begin is an EC2 instance. This can be accessed form the Services tab of the AWS control panel. From there, select EC2 and then Launch Instance. From here, you can choose the machine image you require. AWS provide several types of machine images specifically for deep learning. Feel free to experiment with any of these but the one we are going to use here is the deep learning AMI for Ubuntu version 10. It comes with pre-installed environments for PyTorch and TensorFlow. After selecting this, you get to choose other options. The default T2 micro with 2 GB of memory should be fine to experiment with; however, if you want GPU acceleration, you will need to use the T2 medium instance type. Finally, when you launch your instance, you will be prompted to create and download your public-private key pair. You can then use your SSH client to connect to the server instance and tunnel in to the Jupyter Notebook as per the previous instructions. Once again, check the documentation for the finer details. Amazon has a pay-per-resource model, so it is important you monitor what resources you are using to ensure you do not receive any unnecessary or unexpected charges.


Basic PyTorch operations

Tensors are the workhorse of PyTorch. If you know linear algebra, they are equivalent to a matrix. Torch tensors are effectively an extension of the numpy.array object. Tensors are an essential conceptual component in deep learning systems, so having a good understanding of how they work is important.

In our first example, we will be looking at tensors of size 2 x 3. In PyTorch, we can create tensors in the same way that we create NumPy arrays. For example, we can pass them nested lists, as shown in the following code:

Here we have created two tensors, each with dimensions of 2 x 3. You can see that we have created a simple linear function (more about linear functions in Chapter 2, Deep Learning Fundamentals) and applied it to x and y and printed out the result. We can visualize this with the following diagram:

As you may know from linear algebra, matrix multiplication and addition occur element-wise so that for the first element of x, let's write this as X00. This is multiplied by two and added to the first element of y, written as Y00, giving F00 = 9. X01 = 2 and Y01 = 8 so f01 = 4 + 12. Notice that the indices start at zero.

If you have never seen any linear algebra, don't worry too much about this, as we are going to brush up on these concepts in Chapter 2, Deep Learning Fundamentals, and you will get to practice with Python indexing shortly. For now, just consider our 2 x 3 tensors as tables with numbers in them.

Default value initialization

There are many cases where we need to initialize torch tensors to default values. Here, we create three 2 x 3 tensors, filling them with zeros, ones, and random floating point numbers:

An important point to consider when we are initializing random arrays is the so-called seed of reproducibility. See what happens when you run the preceding code several times. You get a different array of random numbers each time. Often in machine learning, we need to be able to reproduce results. We can achieve this by using a random seed. This is demonstrated in the following code:

Notice that when you run this code many times, the tensor values stay the same. If you remove the seed by deleting the first line, the tensor values will be different each time the code is run. It does not matter what number you use to seed the random number generator, as long as it is consistently, achieves reproducible results.

Converting between tensors and NumPy arrays

Converting a NumPy array is as simple as performing an operation on it with a torch tensor. The following code should make this clear:

We can see the result of the type torch tensor. In many cases, we can use NumPy arrays interchangeably with tensors and always be sure the result is a tensor. However, there are times when we need to explicitly create a tensor from an array. This is done with the torch.from_numpy function:

To convert from a tensor to a NumPy array, simply call the torch.numpy() function:

Notice that we use Python's built-in type() function, as in type(object), rather than the tensor.type() we used previously. The NumPy arrays do not have a type attribute. Another important thing to understand is that NumPy arrays and PyTorch tensors share the same memory space. For example, see what happens when we change a variables value as demonstrated by the following code:

Note also that when we print a tensor, it returns a tuple consisting of the tensor itself and also its dtype, or data type attribute. It's important here because there are certain dtype arrays that cannot be turned into tensors. For example, consider the following code:

This will generate an error message telling us that only supported dtype are able to be converted into tensors. Clearly, int8 is not one of these supported types. We can fix this by converting our int8 array to an int64 array before passing it to torch.from_numpy. We do this with the numpy.astype function, as the following code demonstrates:

It is also important to understand how numpy dtype arrays convert to torch dtype. In the previous example, numpy int32 converts to IntTensor. The following table lists the torch dtype and their numpy equivalents:

Numpy type


Torch type



torch.int64 torch.float


64 bit integer


torch.int32 torch.int


32 bit signed integer




8 bit unsigned integer

float64 double

torch.float64 torch.double


64 bit floating point


torch.float32 torch.float


32 bit floating point

torch.int16 torch.short


16 bit signed integer



6 bit signed integer

The default dtype for tensors is FloatTensor; however, we can specify a particular data type by using the tensor's dtype attribute. For an example, see the following code:

Slicing and indexing and reshaping

torch.Tensor have most of the attributes and functionality of NumPy. For example, we can slice and index tensors in the same way as NumPy arrays:

Here, we have printed out the first element of x, written as x0, and in the second example, we have printed out a slice of the second element of x; in this case, x11 and x12.

If you have not come across slicing and indexing, you may want to look at this again. Note that indexing begins at 0, not 1, and we have kept our subscript notation consistent with this. Notice also that the slice [1][0:2] is the elements x10 and x11, inclusive. It excludes the ending index, index 2, specified in the slice.

We can can create a reshaped copy of an existing tensor using the view() function. The following are three examples:

It is pretty clear what (3,2) and (6,1) do, but what about the –1 in the first example? This is useful if you know how many columns you require, but do not know how many rows this will fit into. Indicating –1 here is telling PyTorch to calculate the number of rows required. Using it without another dimension simply creates a tensor of a single row. You could rewrite example two mentioned previously, as follows, if you did not know the input tensor's shape but know that it needs to have three rows:

An important operation is swapping axes or transposing. For a two-dimensional tensor, we a can use tensor.transpose(), passing it the axis we want to transpose. In this example, the original 2 x 3 tensor becomes a 3 x 2 tensor. The rows simply become the columns:

In PyTorch, transpose() can only swap two axes at once. We could use transpose in multiple steps; however, a more convenient way is to use permute(), passing it the axes we want to swap. The following example should make this clear:

When we are considering tensors in two dimensions, we can visualize them as flat tables. When we move to higher dimensions, this visual representation becomes impossible. We simply run out of spatial dimensions. Part of the magic of deep learning is that it does not matter much in terms of the mathematics involved. Real-world features are each encoded into a dimension of a data structure. So, we may be dealing with tensors of potentially thousands of dimensions. Although it might be disconcerting, most of the ideas that can be illustrated in two or three dimensions work just as well in higher dimensions.

In place operations

It is important to understand the difference between in place and assignment operations. When, for example, we use transpose(x), a value is returned but the value of x does not change. In all the examples up until now, we have been performing operations by assignment. That is, we have been assigning a variable to the result of an operation, or simply printing it to the output, as in the preceding example. In either case, the original variable remains untouched. Alternatively, we may need to apply an operation in place. We can, of course, assign a variable to itself, such as in x = x.transpose(0,1); however, a more convenient way to do this is with in place operations. In general, in place operations in PyTorch have a trailing underscore. For an example, see the following code:

As another example, here is the linear function we started this chapter with using in place operations on y:


Loading data

Most of the time you will spend on a deep learning project will be spent working with data and one of the main reasons that a deep learning project will fail is because of bad, or poorly understood data. This issue is often overlooked when we are working with well-known and well-constructed datasets. The focus here is on learning the models. The algorithms that make deep learning models work are complex enough themselves without this complexity being compounded by something that is only partially known, such as an unfamiliar dataset. Real-world data is noisy, incomplete, and error prone. These axes of confoundedness mean that if a deep learning algorithm is not giving sensible results, after errors of logic in the code are eliminated, bad data, or errors in our understanding of the data, are the likely culprit.

So putting aside our wrestle with data, and with an understanding that deep learning can provide valuable real-world insights, how do we learn deep learning? Our starting point is to eliminate as many of the variables that we can. This can be achieved by using data that is well known and representative of a specific problem; say, for example, classification. This enables us to have both a starting point for deep learning tasks, as well as a standard to test model ideas.

One of the most well-known datasets is the MNIST dataset of hand-written digits, where the usual task is to correctly classify each of the digits, from zero through nine. The best models get an error rate of around 0.2%. We could apply this well-performing model with a few adjustments, to any visual classification task, with varying results. It is unlikely we will get results anywhere near 0.2% and the reason is because the data is different. Understanding how to tweek a deep learning model to take into account these sometimes subtle differences in data, is one of the key skills of a successful deep learning practitioner.

Consider an image classification task of facial recognition from color photographs. The task is still classification but the differences in that data type and structure dictate how the model will need to change to take this into account. How this is done is at the heart of machine learning. For example, if we are working with color images, as opposed to black and white images, we will need two extra input channels. We will also need output channels for each of the possible classes. In a handwriting classification task, we need 10 output channels; one channel for each of the digits. For a facial recognition task, we would consider having an output channel for each target face (say, for criminals in a police database).

Clearly, an important consideration is data types and structures. The way image data is structured in an image is vastly different to that of, say, an audio signal, or output from a medical device. What if we are trying to classify people's names by the sound of their voice, or classify a disease by its symptoms? They are all classification tasks; however, in each specific case, the models that represent each of these will be vastly different. In order to build suitable models in each case, we will need to become intimately acquainted with the data we are using.

It is beyond the scope of this book to discuss the nuances and subtleties of each data type, format, and structure. What we can do is give you a brief insight into the tools, techniques, and best practice of data handling in PyTorch. Deep learning datasets are often very large and it is an important consideration to see how they are handled in memory. We need to be able to transform data, output data in batches, shuffle data, and perform many other operations on data before we feed it to a model. We need to be able to do all these things without loading the entire dataset into memory, since many datasets are simply too large. PyTorch takes an object approach when working with data, creating class objects for each specific activity. We will examine this in more detail in the coming sections.

PyTorch dataset loaders

Pytorch includes data loaders for several datasets to help you get started. The torch.dataloader is the class used for loading datasets. The following is a list of the included torch datasets and a brief description:


Handwritten digits 1–9. A subset of NIST dataset of handwritten characters. Contains a training set of 60,000 test images and a test set of 10,000.

Fashion- MNIST

A drop-in dataset for MNIST. Contains images of fashion items; for example, T-shirt, trousers, pullover.


Based on NIST handwritten characters, including letters and numbers and split for 47, 26, and 10 class classification problems.


Over 100,000 images classified into everyday objects; for example, person, backpack, and bicycle. Each image can have more than one class.


Used for large-scale scene classification of images; for example, bedroom, bridge, church.


Large-scale visual recognition dataset containing 1.2 million images and 1,000 categories. Implemented with ImageFolder class, where each class is in a folder.


60,000 low-res (32 32) color images in 10 mutually exclusive classes; for example, airplane, truck, and car.


Similar to CIFAR but with higher resolution and larger number of unlabeled images.


600,000 images of street numbers obtained from Google Street View. Used for recognition of digits in real-world settings.


Learning Local Image descriptors. Consists of gray scale images composed of 126 patches accompanied with a descriptor text file. Used for pattern recognition.

Here is a typical example of how we load one of these datasets into PyTorch:

CIFAR10 is a torch.utils.dataset object. Here, we are passing it four arguments. We specify a root directory relative to where the code is running, a Boolean, train, indicating if we want the test or training set loaded, a Boolean that, if set to True, will check to see if the dataset has previously been downloaded and if not download it, and a callable transform. In this case, the transform we select is ToTensor(). This is an inbuilt class of torchvision.transforms that makes the class return a tensor. We will discuss transforms in more detail later in the chapter.

The contents of the dataset can be retrieved by a simple index lookup. We can also check the length of the entire dataset with the len function. We can also loop through the dataset in order. The following code demonstrates this:

Displaying an image

The CIFAR10 dataset object returns a tuple containing an image object and a number representing the label of the image. We see from the size of the image data, that each sample is a 3 x 32 x 32 tensor, representing three color values for each of the 322 pixels in the image. It is important to know that this is not quite the same format used for matplotlib. A tensor treats an image in the format of [color, height, width], whereas a numpy image is in the format [height, width, color]. To plot an image, we need to swap axes using the permute() function, or alternatively convert it to a NumPy array and using the transpose function. Note that we do not need to convert the image to a NumPy array, as matplotlib will display the correctly permuted tensor. The following code should make this clear:


We will see that in a deep learning model, we may not always want to load images one at a time or load them in the same order each time. For this, and other reasons, it is often better to use the torch.utils.data.DataLoader object. DataLoader provides a multipurpose iterator to sample the data in a specified way, such as in batches, or shuffled. It is also a convenient place to assign workers in multiprocessor environments.

In the following example, we sample the dataset in batches of four samples each:

Here DataLoader returns a tuple of two tensors. The first tensor contains the image data of all four images in the batch. The second tensor are the images labels. Each batch consists of four image label, pairs, or samples. Calling next() on the iterator generates the next set of four samples. In machine learning terminology, each pass over the entire dataset is called an epoch. This technique is used extensively, as we will see to train and test deep learning models.

Creating a custom dataset

The Dataset class is an abstract class representing a dataset. Its purpose is to have a consistent way of representing the specific characteristics of a dataset. When we are working with unfamiliar datasets, creating a Dataset object is a good way to understand and represent the structure of the data. It is used with a data loader class to draw samples from a dataset in a clean and efficient manner. The following diagram illustrates how these classes are used:

Common actions we perform with a Dataset class include checking the data for consistency, applying transform methods, dividing the data into training and test sets, and loading individual samples.

In the following example, we are using a small toy dataset consisting of images of objects that are classified as either toys or not toys. This is representative of a simple image classification problem where a model is trained on a set of labeled images. A deep learning model will need the data with various transformations applied in a consistent manner. Samples may need to be drawn in batches and the dataset shuffled. Having a framework for representing these data tasks greatly simplifies and enhances deep learning models.

The complete dataset is available at http://www.vision.caltech.edu/pmoreels/Datasets/Giuseppe_Toys_03/.

For this example, I have created a smaller subset of the dataset, together with a labels.csv file. This is available in the data/GiuseppeToys folder in the GitHub repository for this book. The class representing this dataset is as follows:

The __init__ function is where we initialize all the properties of the class. Since it is only called once when we first create the instance to do all the things, we perform all the housekeeping functions, such as reading CSV files, setting the variables, and checking data for consistency. We only perform operations that occur across the entire dataset, so we do not download the payload (in this example, an image), but we make sure that the critical information about the dataset, such as directory paths, filenames, and dataset labels are stored in variables.

The __len__ function simply allows us to call Python's built-in len() function on the dataset. Here, we simply return the length of the list of label tuples, indicating the number of images in the dataset. We want to make sure that stays as simple and reliable as possible because we depend on it to correctly iterate through the dataset.

The __getitem__ function is an built-in Python function that we override in our Dataset class definition. This gives the Dataset class the functionality of Python sequence types, such as the use of indexing and slicing. This method gets called often—every time we do an index lookup—so make sure it only does what it needs to do to retrieve the sample.

To harness this functionality into our own dataset, we need to create an instance of our custom dataset as follows:


As well as the ToTensor() transform, the torchvision package includes a number of transforms specifically for Python imaging library images. We can apply multiple transforms to a dataset object using the compose function as follows:

Compose objects are essentially a list of transforms that can then be passed to the dataset as a single variable. It is important to note that the image transforms can only be applied to PIL image data, not tensors. Since transforms in a compose are applied in the order that they are listed, it is important that the ToTensor transform occurs last. If it is placed before the PIL transforms in the Compose list, an error will be generated.

Finally, we can check that it all works by using DataLoader to load a batch of images with transforms, as we did before:


We can see that the main function of the dataset object is to take a sample from a dataset, and the function of DataLoader is to deliver a sample, or a batch of samples, to a deep learning model for evaluation. One of the main things to consider when writing our own dataset object is how do we build a data structure in accessible memory from data that is organized in files on a disk. A common way we might want to organize data is in folders named by class. Let's say that, for this example, we have three folders named toy, notoy, and scenes, contained in a parent folder, images. Each of these folders represent the label of the files contained within them. We need to be able to load them while retaining them as separate labels. Happily, there is a class for this, and like most things in PyTorch, it is very easy to use. The class is torchvision.datasets.ImageFolder and it is used as follows:

Within the data/GiuseppeToys/images folder, there are three folders, toys, notoys, and scenes, containing images with their folder names indicating labels. Notice that the retrieved labels using DataLoader are represented by integers. Since, in this example, we have three folders, representing three labels, DataLoader returns integers 1 to 3, representing the image labels.

Concatenating datasets

It is clear that the need will arise to join datasets—we can do this with the torch.utils.data.ConcatDataset class. ConcatDataset takes a list of datasets and returns a concatenated dataset. In the following example, we add two more transforms, removing the blue and green color channel. We then create two more dataset objects, applying these transforms and, finally, concatenating all three datasets into one, as shown in the following code:



In this chapter, we have introduced some of the features and operations of PyTorch. We gave an overview of the installation platforms and procedures. You have hopefully gained some knowledge of tensor operations and how to perform them in PyTorch. You should be clear about the distinction between in place and by assignment operations and should also now understand the fundamentals of indexing and slicing tensors. In the second half of this chapter, we looked at loading data into PyTorch. We discussed the importance of data and how to create a dataset object to represent custom datasets. We looked at the inbuilt data loaders in PyTorch and discussed representing data in folders using the ImageFolder object. Finally, we looked at how to concatenate datasets.

In the next chapter, we will take a whirlwind tour of deep learning fundamentals and their place in the machine learning landscape. We will get you up to speed with the mathematical concepts involved, including looking at linear systems and common techniques for solving them.

About the Author

  • David Julian

    David Julian is a freelance technology consultant and educator. He has worked as a consultant for government, private, and community organizations on a variety of projects, including using machine learning to detect insect outbreaks in controlled agricultural environments (Urban Ecological Systems Ltd., Bluesmart Farms), designing and implementing event management data systems (Sustainable Industry Expo, Lismore City Council), and designing multimedia interactive installations (Adelaide University). He has also written Designing Machine Learning Systems With Python for Packt Publishing and was a technical reviewer for Python Machine Learning and Hands-On Data Structures and Algorithms with Python - Second Edition, published by Packt.

    Browse publications by this author

Latest Reviews

(2 reviews total)
Nützlich, aber kein hoher Mehrwert gegenüber dem, was man online findet. Aber die e-Books sind fast unbrauchbar, weil Formeln, Bilder und Tabellen nicht richtig dargestellt werden. Bilder und Tabellen sind noch mit Mühen zu gebrauchen, aber super-winzige, nicht zoombare Formeln sind Mist. Und sogar einzel griechische Buchstaben im Fließtext sind wohl als Formeln kodiert, und haben dasselbe Problem.
Useful information, good price
Deep Learning with PyTorch Quick Start Guide
Unlock this book and the full library for $5 a month*
Start now