Getting Started with Deep Learning Using PyTorch
Deep learning (DL) has revolutionized industry after industry. It was once famously described by Andrew Ng on Twitter:
Electricity transformed countless industries; artificial intelligence (AI) will now do the same.
AI and DL are often used as synonyms, but there are substantial differences between the two. Let's demystify the terminology used in the industry so that you, as a practitioner, will be able to differentiate between signal and noise.
In this chapter, we will cover the following topics:
- AI itself and its origination
- Machine learning in the real world
- Applications of deep learning
- Why deep learning now?
- Deep learning framework: PyTorch
Artificial intelligence
Countless articles discussing AI are published every day. The trend has increased in the last two years. There are several definitions of AI floating around the web, my favorite being the automation of intellectual tasks normally performed by humans.
The history of AI
The term artificial intelligence was first coined by John McCarthy in 1956, when he held the first academic conference on the subject. However, the question of whether machines can think started much earlier than that. In the early days of AI, machines were able to solve problems that were difficult for humans to solve.
For example, the Enigma machine was used during World War II to encrypt military communications. Alan Turing helped build a machine that cracked the Enigma code, a task so challenging for a human that an analyst could spend weeks on it; the machine was able to crack the code in hours.
Computers have a tough time solving problems that are intuitive to us, such as differentiating between dogs and cats, telling whether your friend is angry at you for arriving late at a party (emotions), differentiating between a truck and a car, taking notes during a seminar (speech recognition), or converting notes to another language for your friend who does not understand your language (for example, French to English). Most of these tasks are intuitive to us, but we were unable to program or hard code a computer to do these kinds of tasks. Most of the intelligence in early AI machines was hard coded, such as a computer program playing chess.
In the early years of AI, a lot of researchers believed that AI could be achieved by hard coding rules. This kind of AI is called symbolic AI and was useful in solving well-defined, logical problems, but it was almost incapable of solving complex problems such as image recognition, object detection, object segmentation, language translation, and natural-language-understanding tasks. Newer approaches to AI, such as machine learning and DL, were developed to solve these kinds of problems.
To better understand the relationships among AI, ML, and DL, let's visualize them as concentric circles: AI, the idea that came first, is the largest circle; machine learning, which blossomed later, sits inside it; and DL, which is driving today's AI explosion, fits inside both:

Machine learning
Machine learning (ML) is a sub-field of AI that has become popular in the last 10 years; at times, the two terms are used interchangeably. AI has many other sub-fields aside from machine learning. ML systems are built by showing them lots of examples, unlike symbolic AI, where we hard code rules to build the system. At a high level, machine learning systems look at tons of data and come up with rules to predict outcomes for unseen data:

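As a tiny illustration of learning a rule from examples (the data here is made up, and the "rule" is just a straight line fitted with NumPy), a classic ML workflow fits a model to known examples and then applies it to unseen inputs:

```python
import numpy as np

# Made-up examples: hours studied -> exam score
hours  = np.array([1, 2, 3, 4, 5], dtype=float)
scores = np.array([52, 57, 61, 68, 71], dtype=float)

# "Learn" a rule (a straight line) from the examples
slope, intercept = np.polyfit(hours, scores, deg=1)

# Use the learned rule to predict an outcome for unseen data
print(slope * 6 + intercept)   # predicted score for 6 hours of study
```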
Most ML algorithms perform well on structured data, for tasks such as sales prediction, recommendation systems, and marketing personalization. An important factor for any ML algorithm is feature engineering, and data scientists need to spend a lot of time getting the features right for ML algorithms to perform well. In certain domains, such as computer vision and natural language processing (NLP), feature engineering is challenging because the data suffers from high dimensionality.
Until recently, problems like these were challenging for organizations to solve using typical machine-learning techniques, such as linear regression, random forests, and so on, because of feature engineering and high dimensionality. Consider an image of size 224 x 224 x 3 (height x width x channels), where the 3 represents the values of the red, green, and blue color channels in a color image. To store this image in computer memory, we need 150,528 values for a single image. If you want to build a classifier on top of 1,000 images of size 224 x 224 x 3, the input grows to 1,000 x 150,528 values. A special branch of machine learning called deep learning allows you to handle these problems using modern techniques and hardware.
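To make this arithmetic concrete, here is a tiny sketch (using NumPy purely for illustration) of how quickly the number of values grows:

```python
import numpy as np

# A single color image: height x width x channels
image = np.zeros((224, 224, 3))
print(image.size)           # 224 * 224 * 3 = 150,528 values per image

# For 1,000 such images, the input grows to
print(1000 * image.size)    # 150,528,000 values
```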
Examples of machine learning in real life
The following are some cool products that are powered by machine learning:
- Example 1: Google Photos uses a specific form of machine learning called deep learning for grouping photos
- Example 2: Recommendation systems, which are a family of ML algorithms, are used for recommending movies, music, and products by major companies such as Netflix, Amazon, and iTunes
Deep learning
Traditional ML algorithms rely on hand-crafted features to train models, while DL algorithms learn to extract these features automatically.
For example, in a DL algorithm that predicts whether an image contains a face, the first layer might detect edges, the second layer might detect shapes such as noses and eyes, and the final layer might detect face shapes or more complex structures. Each layer trains based on the previous layer's representation of the data. It's OK if you find this explanation hard to understand; the later chapters of the book will help you to intuitively build and inspect such networks:

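To give an early flavor of what stacking layers looks like in code, here is a minimal, purely illustrative PyTorch sketch; the layer sizes are arbitrary, and the comments only describe the kinds of features such layers tend to learn:

```python
import torch
from torch import nn

# A tiny illustrative stack: early layers tend to pick up edges,
# deeper layers combine them into more complex shapes.
face_detector = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level features (edges)
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features (shapes)
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1),                             # final decision: face or not
)

scores = face_detector(torch.randn(4, 3, 224, 224))  # 4 random "RGB images"
print(scores.shape)                                   # torch.Size([4, 1])
```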
The use of DL has grown tremendously in the last few years with the rise of GPUs, big data, cloud providers such as Amazon Web Services (AWS) and Google Cloud, and frameworks such as Torch, TensorFlow, Caffe, and PyTorch. In addition to this, large companies share algorithms trained on huge datasets, thus helping startups to build state-of-the-art systems on several use cases with little effort.
Applications of deep learning
Some popular applications that were made possible using DL are as follows:
- Near-human-level image classification
- Near-human-level speech recognition
- Machine translation
- Autonomous cars
- Siri, Google Voice, and Alexa have become more accurate in recent years
- A Japanese farmer sorting cucumbers
- Lung cancer detection
- Language translation approaching human-level accuracy
The following screenshot shows a short example of summarization, where the computer takes a large paragraph of text and summarizes it in a few lines:

In the following image, a computer is given a plain image without being told what it shows; using object detection and some help from a dictionary, it generates a caption stating that two young girls are playing with Lego toys. Isn't that brilliant?

Hype associated with deep learning
People in the media, and others who are not practitioners of AI and DL, have been suggesting that things like the storyline of the film Terminator 2: Judgment Day could become reality as AI/DL advances. Some of them even talk about a time in which we will be controlled by robots that decide what is good for humanity. At present, the capabilities of AI are exaggerated far beyond what it can actually do. Currently, most DL systems are deployed in very controlled environments and are given limited decision boundaries.
My guess is that only when these systems can learn to make intelligent decisions, rather than merely performing pattern matching, and when hundreds or thousands of DL algorithms can work together, might we see robots that behave like the ones in science fiction movies. In reality, we are nowhere near artificial general intelligence, where machines can do anything without being told to do so. The current state of DL is more about finding patterns in existing data to predict future outcomes. As DL practitioners, we need to differentiate between signal and noise.
The history of deep learning
Though deep learning has become popular in recent years, the theory behind it has been evolving since the 1940s. The following table shows some of the most popular techniques used today in DL applications and their approximate timeline:
| Technique | Year |
| --- | --- |
| Neural networks | 1943 |
| Backpropagation | Early 1960s |
| Convolutional neural networks | 1979 |
| Recurrent neural networks | 1980 |
| Long Short-Term Memory | 1997 |
Deep learning has been given several names over the years. It was known as cybernetics in the 1940s through the 1960s, connectionism in the 1980s and 1990s, and is now known as deep learning or neural networks. We will use DL and neural networks interchangeably. Neural networks are often described as algorithms inspired by the workings of the human brain. However, as practitioners of DL, we need to understand that the field is primarily grounded in strong theory from mathematics (linear algebra and calculus), statistics (probability), and software engineering.
Why now?
Why has DL become so popular now? Some of the crucial reasons are as follows:
- Hardware availability
- Data and algorithms
- Deep learning frameworks
Hardware availability
Deep learning requires complex mathematical operations to be performed on millions, sometimes billions, of parameters. Existing CPUs take a long time to perform these kinds of operations, although they have improved over the last several years. A different kind of hardware, the graphics processing unit (GPU), can complete these huge mathematical operations, such as matrix multiplication, orders of magnitude faster.
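As a rough illustration of the kind of operation involved (the actual speedup depends entirely on your hardware), the following sketch times a large matrix multiplication on the CPU and, if a CUDA-capable GPU is available, on the GPU:

```python
import time
import torch

a = torch.randn(4000, 4000)
b = torch.randn(4000, 4000)

start = time.time()
c_cpu = a @ b                                  # matrix multiplication on the CPU
print(f"CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()                   # wait for pending GPU work before timing
    start = time.time()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()
    print(f"GPU: {time.time() - start:.3f}s")
```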
GPUs were initially built for the gaming industry by companies such as Nvidia and AMD. It turned out that this hardware is extremely efficient not only at rendering high-quality video games, but also at speeding up DL algorithms. On a recent GPU from Nvidia, the 1080 Ti, an image-classification system can be trained on the ImageNet dataset in a few days, a task that previously could have taken around a month.
If you are planning to buy hardware for running deep learning, I would recommend an Nvidia GPU with as much memory as your budget allows. Remember, your computer's memory and GPU memory are two different things. The 1080 Ti comes with 11 GB of memory and costs around $700.
You can also use various cloud providers, such as AWS, Google Cloud, or Floyd (this company offers GPU machines optimized for DL). Using a cloud provider is economical if you are just starting with DL, or if you are setting up machines for organizational use, where you may have more financial freedom.
The following image shows some benchmarks comparing the performance of CPUs and GPUs:

Data and algorithms
Data is the most important ingredient for the success of deep learning. Due to the wide adoption of the internet and the growing use of smartphones, several companies, such as Facebook and Google, have been able to collect a lot of data in various formats, particularly text, images, videos, and audio. In the field of computer vision, the ImageNet competition has played a huge role by providing a dataset of 1.4 million images across 1,000 categories.
The images are hand-annotated with these categories, and hundreds of teams compete every year. Some of the architectures that were successful in the competition are VGG, ResNet, Inception, DenseNet, and many more. These architectures are used today in industry to solve various computer vision problems. Some of the other popular datasets that are often used in the deep learning space to benchmark various algorithms are as follows:
- MNIST
- COCO dataset
- CIFAR
- The Street View House Numbers
- PASCAL VOC
- Wikipedia dump
- 20 Newsgroups
- Penn Treebank
- Kaggle
The development of techniques such as batch normalization, improved activation functions, skip connections, Long Short-Term Memory (LSTM), dropout, and many more has made it possible in recent years to train very deep networks faster and more successfully. In the coming chapters of this book, we will get into the details of each technique and how it helps in building better models.
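As a small preview of how these techniques appear as ordinary building blocks in PyTorch (the layer sizes and dropout rate here are arbitrary), a residual-style block might combine several of them:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """A small block combining batch normalization, an activation,
    dropout, and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)      # batch normalization
        self.act = nn.ReLU()                    # activation function
        self.drop = nn.Dropout2d(p=0.1)         # dropout

    def forward(self, x):
        out = self.drop(self.act(self.bn(self.conv(x))))
        return out + x                          # skip connection

block = ResidualBlock(16)
print(block(torch.randn(2, 16, 32, 32)).shape)  # torch.Size([2, 16, 32, 32])
```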
Deep learning frameworks
In the earlier days, people needed to have expertise in C++ and CUDA to implement DL algorithms. With a lot of organizations now open sourcing their deep learning frameworks, people with knowledge of a scripting language, such as Python, can start building and using DL algorithms. Some of the popular deep learning frameworks used today in the industry are TensorFlow, Caffe2, Keras, Theano, PyTorch, Chainer, DyNet, MXNet, and CNTK.
The adoption of deep learning would not have been this huge if it had not been for these frameworks. They abstract away a lot of underlying complications and allow us to focus on the applications. We are still in the early days of DL where, with a lot of research, breakthroughs are happening every day across companies and organizations. As a result of this, various frameworks have their own pros and cons.
PyTorch
PyTorch, and most of the other deep learning frameworks, can be used for two different things:
- Replacing NumPy-like operations with GPU-accelerated operations (see the short sketch after this list)
- Building deep neural networks
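For the first point, here is a minimal sketch of tensor operations that mirror NumPy but can run on a GPU; it assumes nothing more than that a CUDA-capable GPU may or may not be present:

```python
import torch

# Create tensors much as you would create NumPy arrays
x = torch.rand(1000, 1000)
y = torch.rand(1000, 1000)

# Move the computation to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
x, y = x.to(device), y.to(device)

z = x @ y + x.mean()       # NumPy-like operations, GPU-accelerated when possible
print(z.device, z.shape)
```

The same code runs unchanged on the CPU when no GPU is present.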
What makes PyTorch increasingly popular is its ease of use and simplicity. Unlike most other popular deep learning frameworks, which use static computation graphs, PyTorch uses dynamic computation graphs, which allow greater flexibility when building complex architectures.
PyTorch extensively uses Python concepts, such as classes, structures, conditionals, and loops, allowing us to build DL algorithms in a purely object-oriented fashion. Most of the other popular frameworks bring their own programming style, which sometimes makes it complex to write new algorithms, and they do not support intuitive debugging. In the later chapters, we will discuss computation graphs in detail.
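To illustrate what dynamic computation and plain Python constructs mean in practice, here is a hedged sketch of a model whose forward pass contains an ordinary Python conditional; the sizes and the condition itself are arbitrary:

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.small = nn.Linear(10, 2)
        self.big = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, x):
        # An ordinary Python if statement: the graph is built on the fly,
        # so the path taken can change from one forward pass to the next.
        if x.shape[0] > 8:
            return self.big(x)
        return self.small(x)

net = TinyNet()
print(net(torch.randn(4, 10)).shape)    # small path -> torch.Size([4, 2])
print(net(torch.randn(16, 10)).shape)   # big path   -> torch.Size([16, 2])
```

Because the graph is rebuilt on every forward pass, ordinary Python debugging tools such as print statements and pdb work inside forward.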
Though PyTorch was released recently and is still in its beta version, it has become immensely popular among data scientists and deep learning researchers for its ease of use, better performance, easier-to-debug nature, and the strong, growing support from companies such as Salesforce.
As PyTorch was primarily built for research, it is not recommended for production use in scenarios with very strict latency requirements. However, this is changing with a new project called Open Neural Network Exchange (ONNX) (https://onnx.ai/), which focuses on taking a model developed in PyTorch and deploying it on a production-ready platform such as Caffe2. At the time of writing, it is too early to say much about this project, as it has only just been launched. The project is backed by Facebook and Microsoft.
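As a rough sketch of what such an export might look like (the model and the file name are placeholders, and the ONNX tooling was still evolving at the time of writing), PyTorch provides a torch.onnx.export function:

```python
import torch
from torch import nn

# A placeholder model and a dummy input that defines the expected input shape
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
dummy_input = torch.randn(1, 10)

# Export to the ONNX format; the resulting file can then be loaded
# by an ONNX-compatible runtime such as Caffe2.
torch.onnx.export(model, dummy_input, "model.onnx")
```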
Throughout the rest of the book, we will learn about the various Lego blocks (smaller concepts or techniques) for building powerful DL applications in the areas of computer vision and NLP.
Summary
In this introductory chapter, we explored what artificial intelligence, machine learning, and deep learning are, and we discussed the differences among the three. We also looked at applications powered by them in our day-to-day lives. We dug deeper into why DL has only now become so popular. Finally, we gave a gentle introduction to PyTorch, a deep learning framework.
In the next chapter, we will train our first neural network in PyTorch.