You're reading from Python Data Science Essentials - Third Edition

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789537864
Edition: 3rd Edition
Author (1)
Alberto Boschetti

Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges ranging from natural language processing (NLP) and behavioral analysis to machine learning and distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meet-ups, conferences, and other events.

Deep Learning Beyond the Basics

In this chapter, we introduce deep models and work through three examples of how to build them. More specifically, you'll learn the following:

  • The basics of deep learning
  • How to optimize a deep net
  • The speed/complexity/accuracy problem
  • How to classify images with a CNN
  • How to use a pre-trained network for classification and transfer learning
  • How to operate on sequences using an LSTM

We will be using the Keras package (https://keras.io/), a high-level API that makes neural networks for deep learning much easier to approach and understand. Keras takes a Lego-like approach, where the bricks are the building blocks of a neural network.

Approaching deep learning

Deep learning extends the classical machine learning approach based on neural networks: instead of building networks of a few layers (so-called shallow networks), we can stack hundreds of layers to create a more elaborate, but more powerful, learner. It is one of the most popular artificial intelligence (AI) methods today because it is very effective and helps to solve many pattern recognition problems, such as object or sequence identification, that seemed intractable with standard machine learning tools.

The idea of neural networks came from the human central nervous system, where multiple nodes (or neurons) that are able to process simple information are connected together to create a network capable of processing complex information. In fact, neural networks are so named because they can learn the weights of the model...
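The idea of simple nodes combining into a network can be sketched in a few lines of plain Python. This is only a minimal illustration, not the chapter's Keras code: the weights, the biases, and the helper names `neuron` and `layer` are made up for the example, and the sigmoid activation is just one common choice.

```python
import math

def neuron(inputs, weights, bias):
    """One node: a weighted sum of its inputs passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

x = [1.0, 1.0]
hidden = layer(x, hidden_w, hidden_b)       # simple units process the raw input
output = layer(hidden, output_w, output_b)  # the next layer combines their results
print(hidden, output)
```

Training a network means adjusting values such as `hidden_w` and `output_w` so that `output` moves closer to the desired target; here they are fixed by hand.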

Classifying images with CNN

Let's now apply a deep neural network to an image-classification problem. Here, we will try to predict a traffic sign from its image. For this task, we will use a CNN (convolutional neural network), which is able to exploit the spatial correlation between nearby pixels in an image, and is the state of the art in deep learning when working on this kind of problem.
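To make "spatial correlation between nearby pixels" concrete, here is a rough sketch, in plain Python, of the sliding-window operation at the heart of a convolutional layer. The function name `convolve2d`, the toy image, and the edge-detecting kernel are assumptions made for illustration; a real CNN learns its kernels during training and relies on an optimized library implementation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the
    element-wise products at each position. As in most deep learning
    frameworks, the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy 4x4 image: dark left half (0), bright right half (1)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, which is where the edge sits: each output value depends on a small neighborhood of pixels, not on the whole image.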

The dataset is available here: http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset. We would like to thank the team for having released the dataset free of charge, and reference the publication dealing with this dataset:
J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, pages 1453–1460. 2011.

First...

Using pre-trained models

As you saw in the previous example, increasing the complexity of the network increases the time and the memory needed to train it. Sometimes, we have to accept that we don't have a machine powerful enough to try all the combinations. What can we do in that situation? Basically, we can do two things:

  • Simplify the network, that is, remove parameters and variables
  • Use a pre-trained network, which has already been trained by someone with a powerful enough machine
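As a back-of-the-envelope illustration of the first option, the number of trainable parameters in a stack of fully connected layers can be counted directly. The helper `dense_param_count` and the layer sizes below are made up for this sketch; they are not taken from the chapter's network.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    Each layer contributes (inputs * outputs) weights plus one bias per output.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Hypothetical networks for a 32x32 grayscale image (1024 inputs) and 10 classes
big = dense_param_count([1024, 512, 256, 10])
small = dense_param_count([1024, 128, 10])
print(big, small)  # 658698 132490
```

In this made-up case, dropping one hidden layer and narrowing the other cuts the parameter count by roughly a factor of five, which reduces training time and memory but also leaves the network with less capacity.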

In both situations, we will work in sub-optimal conditions, since the deep network won't be as powerful as the one we could have used. More specifically, in the first case, the network won't be very accurate because we have fewer parameters; in the second case, well, we have to cope with someone else's decisions and training set. Although it's not very easy to do, p...

Working with temporal sequences

The last example in this chapter is about dealing with temporal sequences; more specifically, we will see how to deal with text, which is a variable-length sequence of words.

Some data-science algorithms deal with text using the bag-of-words approach; that is, they don't care where the words are or how they're placed in the text, only about their presence or absence (and maybe their frequency). In contrast, a special class of deep networks is specifically designed to operate on sequences, where the order is important.
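A small sketch shows what the bag-of-words view throws away. The two reviews and the helper `bag_of_words` are invented for this example; they only illustrate the idea and are not part of the chapter's code.

```python
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, ignoring where each word appears."""
    return Counter(text.lower().split())

# Two made-up reviews that use the same words in a different order
review_a = "the plot was good not bad"
review_b = "the plot was bad not good"

same_bag = bag_of_words(review_a) == bag_of_words(review_b)
same_sequence = review_a.split() == review_b.split()
print(same_bag, same_sequence)  # True False
```

The two reviews express opposite opinions, yet their bags of words are identical. A model that reads the words in order can tell them apart.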

Some examples are as follows:

  • Predict a future stock price, given its historical data: In this case, the input is a sequence of numbers, and the output is a number

  • Predict whether the market will go up or down: In this case, given a sequence of numbers, we want to predict a class (up or down)

  • Translate an English...
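The first two cases share the same data preparation: slice the series into fixed-length windows and pair each window with what comes next. The sketch below assumes a made-up price list and a hypothetical helper named `make_windows`; it only prepares the data and does not train a model.

```python
def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows."""
    samples = []
    for start in range(len(series) - window):
        history = series[start:start + window]
        next_value = series[start + window]
        samples.append((history, next_value))
    return samples

# Made-up daily prices
prices = [10, 11, 13, 12, 14]

# Regression: a sequence of numbers in, a number out
regression = make_windows(prices, window=3)

# Classification: same input, but the target is "up" or "down"
# relative to the last price seen in the window
classification = [
    (history, "up" if next_value > history[-1] else "down")
    for history, next_value in regression
]
print(regression)      # [([10, 11, 13], 12), ([11, 13, 12], 14)]
print(classification)  # [([10, 11, 13], 'down'), ([11, 13, 12], 'up')]
```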

Summary

In this chapter, we saw the essentials and some advanced models for deep networks. We were introduced to how neural networks work and the difference between shallow networks and deep learning. Then, we learned how to build a deep CNN capable of classifying images of traffic signs. We also predicted the class of an image using a pre-trained network. Finally, we detected the sentiment of movie reviews from their text.

Deep learning models are indeed very powerful, though at the cost of having many degrees of freedom to handle and many coefficients to train, which requires having at hand large amounts of data.

In the next chapter, we'll see how Spark helps when the amount of data becomes too large to be handled and processed by a single computer.
