Deep learning is a machine learning technique that is getting a lot of attention from the public and researchers. In this chapter, we will explore what deep learning is and how large companies are using it to solve complex problems. We'll look at what makes this technique so exciting and what concepts drive deep learning.
We will then talk about Microsoft Cognitive Toolkit (CNTK), what it is, and how it fits into the bigger picture of deep learning. We'll also discuss what makes CNTK unique compared to other frameworks.
In this chapter, we'll also show you how to get CNTK installed on your computer. We will explore installation on both Windows and Linux. If you have a compatible graphics card, you'll also want to check out the instructions on how to configure your graphics card for use with CNTK, as it will significantly speed up the calculations that are needed to train deep learning models.
In this chapter we will cover the following topics:
- The relationship between AI, machine learning, and deep learning
- How does deep learning work?
- What is CNTK?
- Installing CNTK
In order to understand what deep learning is, we have to explore what Artificial Intelligence (AI) is and how it relates to machine learning and deep learning. Conceptually, deep learning is a form of machine learning, whilst machine learning is a form of AI.
In computer science, artificial intelligence is a form of intelligence demonstrated by machines. AI is a term that was invented in the 1950s by scientists doing research in computer science. AI encompasses a large set of algorithms that show behavior that is more intelligent than the standard software we build for our computers.
Some algorithms demonstrate intelligent behavior but aren't capable of improving themselves. One group of algorithms, called machine learning algorithms, can learn from sample data that you show them and generate models that you then use on similar data to make predictions.
Within the group of machine learning algorithms there's the sub-category of deep learning algorithms. This group of algorithms uses models that are inspired by the structure and function of a biological brain found in humans or animals.
Both machine learning and deep learning learn from sample data that you provide. When we build regular programs, we write business rules by using different language constructs, such as if-statements, loops, and functions. The rules are fixed. In machine learning, we feed samples and an expected answer into an algorithm that then learns the rules that connect the samples to the expected answers:
There are two major components in machine learning: machine learning models and machine learning algorithms.
When you use machine learning to build a program, you first choose a machine learning model. A machine learning model is a mathematical equation containing trainable parameters that transforms input into a predicted answer. This model shapes the rules that the computer will learn. For example: predicting the miles per gallon for a car requires that you model reality in a certain way. Classifying whether a credit card transaction is fraudulent requires a different model.
The representation of the input could be the properties of a car turned into a vector. The output of the model could be the miles per gallon for a car. In the case of credit card fraud, the input could be the properties of the user account and the transaction that was done. The output representation could be a score between 0 and 1 where a value close to 1 means that the transaction should be rejected.
The mathematical transformation in the machine learning model is controlled by a set of parameters that need to be trained for the transformation to produce the correct output representation.
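To make the idea of a model as an equation with trainable parameters more concrete, here is a minimal sketch of the two examples mentioned above. The function names, the feature vectors, and the choice of a linear model with a sigmoid for the fraud score are assumptions made for illustration; the weights and bias are the parameters that training would have to find.

```python
import math

def predict_mpg(features, weights, bias):
    """Linear model: weighted sum of the car properties plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def fraud_score(features, weights, bias):
    """Same weighted sum, squashed into a score between 0 and 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameter values; in practice these come out of training.
mpg = predict_mpg([1.0, 2.0], weights=[0.5, 0.25], bias=1.0)
score = fraud_score([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

Both functions share the same structure. Only the parameters and the final transformation differ, which is why the choice of model shapes what the computer can learn.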
This is where the second part, the machine learning algorithm, comes into play. To find the best values for the parameters in the machine learning model, we need to perform a multi-step process:
- Initially, the computer will choose a random value for each of the unknown parameters in your model
- It will then use sample data to make an initial prediction
- This prediction is fed into a loss function together with the expected output to get feedback regarding how well the model is performing
- This feedback is then used by the machine learning algorithm to find better values for the parameters in the model
These steps are repeated many times to find the best possible values for the parameters in the model. If all goes well, you end up with a model that is capable of making accurate predictions for many complicated situations.
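The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the model is a single-parameter line, y = w * x, the loss is the squared error, and the sample data and the names train and learning_rate are invented for the example.

```python
import random

def train(samples, steps=200, learning_rate=0.01, seed=42):
    """Fit y = w * x to (x, expected) pairs by repeating the four steps."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)               # step 1: random starting value
    for _ in range(steps):
        gradient = 0.0
        for x, expected in samples:
            prediction = w * x                # step 2: make a prediction
            error = prediction - expected     # step 3: loss is error ** 2
            gradient += 2.0 * error * x       # slope of the loss for this sample
        gradient /= len(samples)
        w -= learning_rate * gradient         # step 4: pick a better value
    return w

# Samples generated from y = 3 * x, so training should end up near w = 3.
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = train(samples)
```

With these samples the loop settles close to 3, the value that was used to generate the expected answers.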
The fact that we can learn rules from examples is a useful concept. There are many situations where we can't use simple rules to solve a particular problem. For example: credit card fraud cases come in many shapes and sizes. Sometimes a hacker slowly breaks the system injecting smaller hacks over time and then stealing the money. Other times hackers simply try to steal a lot of money in one attempt. A rule-based program would become too hard to maintain because it would need to contain a lot of code to handle all different fraud cases. Machine learning is an elegant way to solve this problem. It understands how to handle different kinds of credit card fraud without a lot of code. And it is also capable of making a judgment on cases that it hasn't seen before, within reasonable boundaries.
Machine learning models are very powerful. You can use them in many cases where rule-based programs fall short. Machine learning is a good first alternative whenever you find a problem that can't be solved with a regular rule-based program. Machine learning models do, however, come with their limitations.
The mathematical transformation in machine learning models is very basic. For example: when you want to classify whether a credit transaction should be marked as fraud, you can use a linear model. A logistic regression model is a great model for this kind of use case; it creates a decision boundary function that separates fraud cases from non-fraud cases. Most of the fraud cases will be above the line and correctly marked as such. But no machine learning model is perfect and some of the cases will not be correctly marked as fraud by the model as you can see in the following image.
If your data happens to be perfectly linearly separable, all cases would be correctly classified by the model. But when you have to deal with more complex types of data, the basic machine learning models fall short. And there are more reasons why machine learning is limited in what it can do:
- Many algorithms assume that there's no interaction between features in the input
- Machine learning models are, in many cases, based on linear algorithms that don't handle non-linearity very well
- Often, you are dealing with a lot of features, and classic machine learning algorithms have a harder time dealing with high dimensionality in the input data
The limitations discovered in machine learning caused scientists to look for other ways to build more complex models that allowed them to handle non-linear relationships and cases where there's a lot of interaction between the input of a model. This led to the invention of the artificial neural network.
An artificial neural network is a graph composed of several layers of artificial neurons. It's inspired by the structure and function of the biological brain found in humans and animals.
To understand the power of deep learning and how to use CNTK to build neural networks, we need to look at how a neural network works and how it is trained to detect patterns in samples you feed it.
A neural network is made out of different layers. Each layer contains multiple neurons.
A typical neural network is made of several layers of artificial neurons. The first layer in a neural network is called the input layer. This is where we feed input into the neural network. The last layer of a neural network is called the output layer. This is where the transformed data is coming out of the neural network. The output of a neural network represents the prediction made by the network.
In between the input and output layer of the neural network, you can find one or more hidden layers. The layers in between the input and output are hidden because we don't typically observe the data going through these layers.
Neural networks are mathematical constructs. The data passed through a neural network is encoded as floating-point numbers. This means that everything you want to process with a neural network has to be encoded as vectors of floating-point numbers.
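As a small illustration of what encoding as vectors of floating-point numbers can look like, the following sketch turns a car record into such a vector. The field names, the list of fuel types, and the use of one-hot encoding for the categorical field are assumptions made for the example.

```python
# Hypothetical set of categories for a non-numeric field.
FUEL_TYPES = ["petrol", "diesel", "electric"]

def encode_car(weight_kg, horsepower, fuel_type):
    """Turn a car record into a flat vector of floats."""
    # One-hot encoding: 1.0 for the matching category, 0.0 for the others.
    one_hot = [1.0 if fuel_type == name else 0.0 for name in FUEL_TYPES]
    return [float(weight_kg), float(horsepower)] + one_hot

vector = encode_car(1200, 95, "diesel")
```

Numeric fields are simply converted to floats, while the text field becomes three numbers, one per category.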
The core of a neural network is the artificial neuron. The artificial neuron is the smallest unit in a neural network that we can train to recognize patterns in data.
The artificial neuron inside a neural network works in much the same way as a biological neuron, but doesn't use chemical signals. Each artificial neuron inside the neural network has one or more inputs. Each of these inputs gets a weight.
The number provided for each input of the neuron is multiplied by its weight. The results of these multiplications are then added up to produce a total activation value for the neuron.
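The weighted sum described here is short enough to write out. The following is a minimal sketch with made-up input values and weights; the function name activation_value is an assumption.

```python
def activation_value(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights))

# 1.0 * 0.5 + 2.0 * -1.0 + 3.0 * 2.0
total = activation_value([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
```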
This activation signal is then passed through an activation function. The activation function performs a non-linear transformation on this signal. For example: it uses a rectified linear function to process the input signal:
The rectified linear function will convert negative activation signals to zero but performs an identity (pass-through) transformation on the signal when it is a positive number.
One other popular activation function is the sigmoid function. It behaves slightly differently from the rectified linear function in that it squashes every input into the range between 0 and 1: large negative values end up close to 0 and large positive values end up close to 1. Around zero there is a smooth slope where the signal is transformed in a roughly linear fashion, with an input of 0 producing an output of 0.5.
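Both activation functions can be written in a few lines of plain Python. This sketch uses the standard definitions of the two functions; it is meant for illustration and is not how CNTK implements them.

```python
import math

def relu(x):
    """Rectified linear function: negatives become 0, positives pass through."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid function: squashes any input into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

relu_outputs = [relu(v) for v in (-2.0, 0.0, 3.5)]
sigmoid_outputs = [sigmoid(v) for v in (-10.0, 0.0, 10.0)]
```

The sigmoid outputs for -10 and 10 come close to 0 and 1 without reaching them, while an input of 0 lands exactly in the middle.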
Activation functions in artificial neurons play an important role in the neural network. It's because of these non-linear transformation functions that the neural network is capable of working with non-linear relationships in the data.
By combining layers of neurons together we create a stacked function that has non-linear transformations and trainable weights so it can learn to recognize complex relationships. To visualize this, let's transform the neural network from previous sections into a mathematical formula. First, let's take a look at the formula for a single layer:
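Using the symbols explained in this section, a single layer can be written as follows, where the output name $y$ is an assumption made for this sketch:

```latex
y = f(w \cdot X + b)
```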
The X variable is a vector that represents the input for the layer in the neural network. The w parameter represents a vector of weights for each of the elements in the input vector, X. In many neural network implementations, an additional term, b, is added. This is called the bias, and it increases or decreases the overall level of input required to activate the neuron. Finally, there's a function, f, which is the activation function for the layer.
Now that you've seen the formula for a single layer, let's put together additional layers to create the formula for the neural network:
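For a network with two layers, the stacked formula can be sketched like this. The subscripts that number the layers and the output name $y$ are assumptions made for illustration; for a layer with more than one neuron, each $w_i$ becomes a matrix of weights and each $b_i$ a vector:

```latex
y = f_2\bigl(w_2 \cdot f_1(w_1 \cdot X + b_1) + b_2\bigr)
```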
Notice how the formula has changed. We now have the formula for the first layer wrapped in another layer function. This wrapping or stacking of functions continues when we add more layers to the neural network. Each layer introduces more parameters that need to be optimized to train the neural network. It also allows the neural network to learn more complex relationships from the data we feed into it.
To make a prediction with a neural network, we need to fill all of the parameters in the neural network. Let's assume we know those because we trained it before. What's left is the input value for the neural network.
The input is a vector of floating-point numbers that is a representation of the input of our neural network. The output is a vector that forms a representation of the predicted output of the neural network.
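Making a prediction then comes down to pushing the input vector through each layer in turn. The sketch below does this for a tiny network with one hidden layer. All weights and biases are made-up values standing in for trained parameters, and the helper names dense and predict are assumptions.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def dense(inputs, weights, biases, activation):
    """One layer: every row of weights belongs to one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical "trained" parameters: 2 inputs -> 2 hidden neurons -> 1 output.
W1, B1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]
W2, B2 = [[2.0, 1.0]], [0.5]

def predict(inputs):
    hidden = dense(inputs, W1, B1, relu)       # first layer
    return dense(hidden, W2, B2, identity)     # output layer wraps the first

output = predict([3.0, 1.0])
```

The hidden layer produces [2.0, 1.0] for this input, and the output layer turns that into a single-element output vector.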
We've talked about making predictions with neural networks. We haven't yet talked about how to optimize the parameters in a neural network. Let's go over each of the components in a neural network and explore how they work together when we train it:
A neural network has several layers that are connected together. Each layer will have a set of trainable parameters that we want to optimize. Optimizing a neural network is done using a technique called backpropagation. We aim to minimize the output of a loss function by gradually optimizing the values for the w1, w2, and w3 parameters in the preceding diagram.
The loss function for a neural network can take many shapes. Typically, we choose a function that expresses the difference between the expected output, Y, and the real output produced by the neural network. For example: we could use the following loss function:
Firstly, the neural network is initialized. We can do this with random values for all of the parameters in the model.
After we initialize the neural network, we feed data into the neural network to make a prediction. We then feed the prediction together with the expected output into a loss function to measure how close the model is to what we expect it to be.
The feedback from the loss function is used to feed an optimizer. The optimizer uses a technique called gradient descent to find out how to optimize each of the parameters.
Gradient descent is a key ingredient of neural network optimization and works because of an interesting property of the loss function. When you visualize the output of the loss function for one set of input with different values for the parameters in the neural network, you end up with a plot that looks similar to this:
At the beginning of the backpropagation process, we start somewhere on one of the slopes in this mountain landscape. Our aim is to walk down the mountain toward a point where the values for the parameters are at their best. This is the point where the output of the loss function is minimized as much as possible.
For us to find the way down the mountain slope, we need to find a function that expresses the slope at the current spot on the mountain slope. We do this by creating a derived function from the loss function. This derived function gives us the gradients for the parameters in the model.
When we perform one pass of the backpropagation process, we take one step down the mountain using the gradients for the parameters. Because the gradients point uphill, we can subtract them from the parameters to do this. But taking the full gradient as a step is a dangerous way of following the slope down the mountain: if we move too fast, we might overshoot the optimum spot. Therefore, neural network optimizers have a setting called the learning rate. The learning rate scales the size of each step and so controls the rate of descent.
Because we can only take small steps in the gradient-descent algorithm, we need to repeat this process many times to reach the optimum values for the neural network parameters.
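The effect of the learning rate is easy to see on a toy problem. The sketch below assumes a loss with a single parameter, (w - 3) squared, whose lowest point lies at w = 3. The loss, the starting point, and both learning rates are made up for illustration.

```python
def gradient(w):
    """Slope of the assumed loss (w - 3) ** 2 at the current position."""
    return 2.0 * (w - 3.0)

def descend(w, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w

careful = descend(0.0, learning_rate=0.1, steps=100)    # small steps
reckless = descend(0.0, learning_rate=1.1, steps=100)   # steps that are too big
```

With the small learning rate, the parameter settles close to 3. With the large one, every step jumps across the valley and lands further away than before, so the value moves away from the optimum instead of toward it.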
Building a neural network from scratch is a big undertaking—something I would not advise anyone to start with unless you're looking for a programming challenge. There are some great libraries that can help you build neural networks without the need to fully understand the mathematical formulas.
Microsoft Cognitive Toolkit (CNTK) is an open source library that contains all the basic building blocks to build a neural network.
CNTK is implemented using C++ and Python, but it is also available in C# and Java. Training is mainly done in C++ or Python, and you can easily load your models in C# or Java for making predictions after you've trained your neural network.
There is also a variant of CNTK that uses a proprietary language called BrainScript. But for the purpose of this book, we'll only look at Python for the basic features of the framework. Later on, in Chapter 7, Deploying Models to Production, we'll discuss how to use C# or Java to load and use a trained model.
CNTK is a library that has both a low-level and high-level API for building neural networks. The low-level API is meant for scientists looking to build the next generation of neural network components, while the high-level API is meant for building production-quality neural networks.
On top of these basic building blocks, CNTK features a set of components that will make it easier to feed data into your neural network. It also contains various components to monitor and debug neural networks.
Finally, CNTK features a C# and Java API. You can use both of these languages to load trained models and make predictions from within your web application, microservices, or even Windows Store apps. In addition, you can use C# to train models should you want to do this.
Although it is possible to work with CNTK from Java and C#, it is important to know that at this point not all features in the Python version of CNTK are available to the C# and Java APIs. For example: models trained for object detection in Python do not work in C# with version 2.6 of CNTK.
At the core of CNTK, you'll find a low-level API that contains a set of mathematical operators to build neural network components. The low-level API also includes the automatic differentiation needed to optimize the parameters in your neural network.
Microsoft built the components with high performance in mind. For example: it included specific code to train neural networks on graphics cards. Graphics cards contain specialized processors, called GPUs, that are capable of processing large volumes of vector and matrix math at very high speeds. By using a GPU, you can typically speed up the training process of a neural network by a factor of 10 or more.
When you want to build a neural network for production use, you typically use the high-level API. The high-level API contains all kinds of different building blocks of a neural network.
For example: there's a basic dense layer to build the most basic kind of neural network. But you will also find more advanced layer types in the high-level API, such as the layer types needed to process images or time series data.
The high-level API also contains different optimizers to train neural networks, so you don't have to manually build a gradient-descent optimizer. In CNTK, the optimization process is implemented using learners and trainers, where the learner defines which kind of gradient-descent algorithm to use while the trainer defines how to implement the basics of backpropagation.
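To give a first impression of how layers, a learner, and a trainer fit together, here is a minimal sketch. The network shape (four input features, eight hidden neurons, three classes), the learning rate, and the function name build_trainer are assumptions; the function returns None when the cntk package isn't installed, so the snippet can be tried on any machine.

```python
import importlib.util

def build_trainer():
    """Return a CNTK Trainer for a small classifier, or None without CNTK."""
    if importlib.util.find_spec("cntk") is None:
        return None
    import cntk as C

    features = C.input_variable(4)            # assumed: four input features
    labels = C.input_variable(3)              # assumed: three output classes
    model = C.layers.Sequential([
        C.layers.Dense(8, activation=C.ops.relu),
        C.layers.Dense(3),
    ])
    z = model(features)

    loss = C.losses.cross_entropy_with_softmax(z, labels)
    metric = C.metrics.classification_error(z, labels)

    # The learner picks the gradient-descent variant, here plain SGD.
    learner = C.learners.sgd(z.parameters, 0.01)
    # The trainer ties the model, the loss, and the learner together.
    return C.train.Trainer(z, (loss, metric), [learner])

trainer = build_trainer()
```

The actual training calls are left out here on purpose, since they depend on how the data is fed into the network.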
In Chapter 2, Building Neural Networks with CNTK, we'll explore how to use the high-level API to build and train a neural network. In Chapter 5, Working with Images, and Chapter 6, Working with Time Series Data, you'll learn how to use some of the more advanced layer types to process images and time series data with CNTK.
Once you've built a neural network, you want to make sure that it works correctly. CNTK offers a number of components to measure the performance of neural networks.
You will often find yourself looking for ways to monitor how well the training process for your model is doing. CNTK includes components that will generate log data from your model and the associated optimizer, which you can use to monitor the training process.
When you use deep learning, you often need a large dataset to train neural networks. It is not uncommon to use gigabytes of data to train your model. Included with CNTK is a set of components that allow you to feed data into the neural network for training.
Microsoft did its best to build specialized readers that will load data into memory in batches so you don't need a terabyte of RAM to train your network. We'll talk about these readers in greater depth in Chapter 3, Getting Data into Your Neural Network.
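The idea behind loading data in batches can be shown with a plain Python generator. This is a conceptual sketch only and does not use the CNTK reader API; the comma-separated input format and the name minibatches are assumptions.

```python
import io

def minibatches(lines, batch_size):
    """Yield lists of parsed rows, holding only one batch in memory at a time."""
    batch = []
    for line in lines:
        batch.append([float(value) for value in line.split(",")])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                      # last, possibly smaller, batch
        yield batch

# An in-memory file stands in for a large dataset on disk.
data = io.StringIO("1,2\n3,4\n5,6\n")
batches = list(minibatches(data, batch_size=2))
```

Because the generator reads the source line by line, the memory needed depends on the batch size and not on the size of the dataset.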
The main CNTK library is built in Python on top of a C++ core. You can use both C++ and Python to train models. When you want to use your models in production, you have a lot more choice. You can use your trained model from C++ or Python, but most developers will want to use Java or C#. Python is much slower than these languages when it comes to runtime performance. Also, C# and Java are more widely used in corporate environments.
You can download the C# and Java version of CNTK as a separate library from NuGet or Maven central. In Chapter 7, Deploying Models to Production, we'll discuss how to use CNTK from these languages to host a trained model inside a microservice environment.
Now that we've seen how neural networks work and what CNTK is, let's take a look at how to install it on your computer. CNTK is supported on both Windows and Linux, so we'll walk through each of them.
We will be using the Anaconda version of Python on Windows to run CNTK. Anaconda is a redistribution of Python that includes additional packages, such as SciPy and scikit-learn, which are used by CNTK to perform various calculations.
You can download Anaconda from the public website: https://www.anaconda.com/download/.
After you've downloaded the setup files, start the installation and follow the instructions to install Anaconda on your computer. You can find the installation instructions at https://docs.anaconda.com/anaconda/install/.
Anaconda will install a number of utilities on your computer. It will install a new command prompt that will automatically include all the Anaconda executables in your PATH variable. You can quickly manage your Python environment from this command prompt, install packages and, of course, run Python scripts.
Optionally, you can install Visual Studio Code with your Anaconda installation. Visual Studio Code is a code editor similar to Sublime and Atom and contains a large number of plugins that make it easier to write program code in different programming languages, such as Python.
CNTK 2.6 supports Python 3.6 only, which means that not all distributions of Anaconda will work correctly. You can get an older version of Anaconda through the Anaconda archives at https://repo.continuum.io/archive/. Alternatively, you can downgrade the Python version in your Anaconda installation if you haven't got a version with Python 3.6 included. To install Python 3.6 in your Anaconda environment, open a new Anaconda prompt and execute the following command:
conda install python=3.6
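If you're not sure which Python version your Anaconda prompt picks up, a short check like the following can help. It's a sketch based on the version requirement stated above; the function name is_supported_python is made up.

```python
import sys

def is_supported_python(version_info=sys.version_info):
    """True when the major and minor version are exactly 3.6."""
    return tuple(version_info[:2]) == (3, 6)

print("Running Python", sys.version.split()[0])
print("Matches the 3.6 requirement:", is_supported_python())
```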
Anaconda comes with a slightly outdated version of the Python package manager, pip. This can cause problems when we try to install the CNTK package. So, before we install the CNTK package, let's upgrade the pip executable.
To upgrade the pip executable, open the Anaconda prompt and execute the following command:
python -m pip install --upgrade pip
This will remove the old pip executable and install a new version in its place.
There are a number of ways to get the CNTK package on your computer. The most common way is to install the package through the pip executable:
pip install cntk
This will download the CNTK package from the package manager website and install it on your machine. pip will automatically check for missing dependencies and install those as well.
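To check that the installation worked, you can try to import the package from Python. The following sketch reports the installed version, or None when the package can't be found; the function name cntk_version is an assumption.

```python
import importlib.util

def cntk_version():
    """Return the installed CNTK version string, or None if it is missing."""
    if importlib.util.find_spec("cntk") is None:
        return None
    import cntk
    return cntk.__version__

version = cntk_version()
print("CNTK version:", version if version else "not installed")
```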
There are several alternative methods to install CNTK on your machine. The website has a neat set of documentation that explains the other installation methods in great detail: https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine.
Installing CNTK on Linux is slightly different from installing it on Windows. Just as on Windows, we will use Anaconda to run the CNTK package. But instead of a graphical installer for Anaconda, there's a terminal-based installer on Linux. The installer will work on most Linux distributions. We limited the description to Ubuntu, a widely used Linux distribution.
Before we can install Anaconda, we need to make sure that the system is fully up to date. To check this, execute the following two commands inside a terminal:
sudo apt update
sudo apt upgrade
Advanced Package Tool (APT) is used to install all sorts of packages inside Ubuntu. In the code sample, we first ask apt to update the references to the various package repositories. We then ask it to install the latest updates.
After the computer is updated, we can start the installation of Anaconda. First, navigate to https://www.anaconda.com/download/ to get the URL for the latest Anaconda installation files. You can right-click on the download link and copy the URL to your clipboard.
Now open up a terminal window and execute the following command:
wget -O anaconda-installer.sh url
Make sure to replace the url placeholder with the URL you copied from the Anaconda website. Press Enter to execute the command.
Once the installation file is downloaded, you can install Anaconda by running the following command:
bash anaconda-installer.sh
This will start the installer. Follow the instructions on the screen to install Anaconda on your computer. By default, Anaconda gets installed in a folder called anaconda3 inside your home directory.
As is the case with the Windows version of CNTK 2.6, it only supports Python 3.6. You can either get an older distribution of Anaconda through their archives at https://repo.continuum.io/archive/, or downgrade your Python version by executing the following command in your terminal:
conda install python=3.6
Once we have Anaconda installed, we need to upgrade pip to the latest version. pip is used to install packages inside Python. It is the tool we're going to use to install CNTK:
python -m pip install --upgrade pip
The final step in the installation process is to install CNTK. This is done through pip using the following command:
pip install cntk
Should you want to, you can also install CNTK by downloading a wheel file directly or using an installer with Anaconda included. You can find more information on alternative installation methods for CNTK at https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine.
We looked at how to install the basic version of CNTK for use with your CPU. While the CNTK package is fast, it will run quicker on a GPU. But not all machines support this setup, and that's why I put the description of how to use your GPU into a separate section.
Before you attempt to install CNTK for use with a GPU, make sure you have a supported graphics card. Currently, CNTK supports NVIDIA graphics cards with at least CUDA compute capability 3.0. CUDA is the programming API from NVIDIA that allows developers to run non-graphical programs on their graphics cards. You can check whether your graphics card supports CUDA on this website: https://developer.nvidia.com/cuda-gpus.
To use your graphics card with CNTK on Windows, you need to have the latest GeForce or Quadro drivers for your graphics card (depending on which one you have). Aside from the latest drivers, you need to install the CUDA toolkit Version 9.0 for Windows.
You can download the CUDA toolkit from the NVIDIA website: https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64. Once downloaded, run the installer and follow the instructions on the screen.
CNTK uses a layer on top of CUDA, called cuDNN, for neural-network-specific primitives. You can download the cuDNN binaries from the NVIDIA website at https://developer.nvidia.com/rdp/form/cudnn-download-survey. In contrast to the CUDA toolkit, you need to register an account on the website before you can download the cuDNN binaries.
Not all cuDNN binaries work with every version of CUDA. The website mentions which version of cuDNN is compatible with which version of the CUDA toolkit. For CUDA 9.0, you need to download cuDNN 7.4.1.
Once you have downloaded the cuDNN binaries, extract the zip file into the root folder of your CUDA toolkit installation. Typically, the CUDA toolkit is located at C:\program files\NVIDIA GPU Computing Toolkit\CUDA\v9.0.
The final step to enable GPU usage inside CNTK is to install the CNTK-GPU package. Open the Anaconda prompt in Windows and execute the following command:
pip install cntk-gpu
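After installing the GPU package, it can be useful to ask CNTK which devices it sees. The sketch below returns the devices reported by cntk.device.all_devices(), or None when the package isn't installed; the function name list_cntk_devices is an assumption.

```python
import importlib.util

def list_cntk_devices():
    """Return the devices CNTK can use, or None if CNTK is not installed."""
    if importlib.util.find_spec("cntk") is None:
        return None
    import cntk
    return cntk.device.all_devices()

devices = list_cntk_devices()
print("Devices visible to CNTK:", devices)
```

If the setup succeeded, the output should list a GPU entry next to the CPU.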
Using your graphics card with CNTK on Linux requires that you run the proprietary drivers for NVIDIA. When you install the CUDA toolkit on your Linux machine, you get asked to install the latest drivers for your graphics card automatically. While you are not required to install the drivers through the CUDA toolkit installer, we strongly recommend you do, as the drivers will match the binaries of the CUDA toolkit. This reduces the risk of a failing installation or other errors later on.
You can download the CUDA toolkit from the NVIDIA website: https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal.
Please make sure you select the appropriate Linux distribution and version. The link automatically selects Ubuntu 16.04 and uses a local runfile.
Once you've downloaded the binaries to disk, you can run the installer by opening a terminal and executing the downloaded runfile with sh. The filename of the runfile depends on the exact version you downloaded.
Follow the onscreen instructions to install the CUDA toolkit on your machine.
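Once you have the CUDA toolkit installed, you need to modify your Bash profile script. Open the $HOME/.bashrc file in your favorite text editor and include the following lines at the end of the script. The lines shown here assume that the toolkit was installed in the default /usr/local/cuda-9.0 folder, so adjust the paths if you chose a different location:
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
The first line includes the CUDA binaries in the PATH variable so CNTK can access them. The second line in the script includes the CUDA libraries in your library path so CNTK can load them when needed.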
Save the changes to the file and close the editor. Please make sure you restart your terminal window to ensure that the new settings are loaded.
The final step is to download and install the cuDNN binaries. CNTK uses a layer on top of CUDA, called cuDNN, for neural-network-specific primitives. You can download the cuDNN binaries from the NVIDIA website here: https://developer.nvidia.com/rdp/form/cudnn-download-survey. In contrast to the CUDA toolkit, you need to register an account on the website before you can download the cuDNN binaries.
Not all cuDNN binaries work with every version of CUDA. The website mentions which version of cuDNN is compatible with which version of the CUDA toolkit. For CUDA 9.0, you need to download cuDNN 7.4.1. Download the version for Linux and extract it to the /usr/local/cuda-9.0 folder using the following command:
tar -xzvf cudnn-9.0-linux-x64-v7.4.1.5.tgz -C /usr/local/cuda-9.0/
The filename may differ slightly; change the path to the filename as needed.
In this chapter, we learned about deep learning and its relationship to machine learning and AI. We looked at the basic concepts behind deep learning and how to train a neural network using gradient descent. We then talked about CNTK, what it is, and what features the library offers to build deep learning models. We finally spent some time discussing how to install CNTK on Windows and Linux and how to use your GPU should you want to.
In the next chapter, we will learn how to build basic neural networks with CNTK so we get a better understanding of how the concepts in this chapter work in code. We will also discuss the different ways we can use various components in our deep learning model for different scenarios.