Home

Data

Deep Learning By Example

Book

eBook $35.99 $24.99

Print $43.99

Subscription $15.99 $10 p/m for three months

BUY NOW

$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!

What do you get with a Packt Subscription?

This book & 7000+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook + Subscription?

Download this book in EPUB and PDF formats, plus a monthly download credit

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook?

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Access this title in our online reader

Online reader with customised display settings for better reading experience

What do you get with video?

Download this video in MP4 format

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with video?

Stream this video

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with Audiobook?

Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF

What do you get with Exam Trainer?

Flashcards, Mock exams, Exam Tips, Practice Questions

Access these resources with our interactive certification platform

Mobile compatible-Practice whenever, wherever, however you want

BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!

eBook $35.99 $24.99

Print $43.99

Subscription $15.99 $10 p/m for three months

What do you get with a Packt Subscription?

This book & 7000+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook + Subscription?

Download this book in EPUB and PDF formats, plus a monthly download credit

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook?

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Access this title in our online reader

Online reader with customised display settings for better reading experience

What do you get with video?

Download this video in MP4 format

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with video?

Stream this video

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with Audiobook?

Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF

What do you get with Exam Trainer?

Flashcards, Mock exams, Exam Tips, Practice Questions

Access these resources with our interactive certification platform

Mobile compatible-Practice whenever, wherever, however you want

About this book

Deep learning is a popular subset of machine learning, and it allows you to build complex models that are faster and give more accurate predictions. This book is your companion to take your first steps into the world of deep learning, with hands-on examples to boost your understanding of the topic. This book starts with a quick overview of the essential concepts of data science and machine learning which are required to get started with deep learning. It introduces you to Tensorflow, the most widely used machine learning library for training deep learning models. You will then work on your first deep learning problem by training a deep feed-forward neural network for digit classification, and move on to tackle other real-world problems in computer vision, language processing, sentiment analysis, and more. Advanced deep learning models such as generative adversarial networks and their applications are also covered in this book. By the end of this book, you will have a solid understanding of all the essential concepts in deep learning. With the help of the examples and code provided in this book, you will be equipped to train your own deep learning models with more confidence.

Publication date:: February 2018
Publisher: Packt
Pages: 450
ISBN: 9781788399906
Download code from GitHub

Data Science - A Birds' Eye View

Data science or machine learning is the process of giving the machines the ability to learn from a dataset without being told or programmed. For instance, it is extremely hard to write a program that can take a hand-written digit as an input image and outputs a value from 0-9 according to the image that's written. The same applies to the task of classifying incoming emails as spam or non-spam. For solving such tasks, data scientists use learning methods and tools from the field of data science or machine learning to teach the computer how to automatically recognize digits, by giving it some explanatory features that can distinguish one digit from another. The same for the spam/non-spam problem, instead of using regular expressions and writing hundred of rules to classify the incoming email, we can teach the computer through specific learning algorithms how to distinguish between spam and non-spam emails.

For the spam filtering application, you can code it by a rule-based approach, but it won't be good enough to be used in production, like the one in your mailing server. Building a learning system is an ideal solution for that.

You are probably using applications of data science on a daily basis, often without knowing it. For example, your country might be using a system to detect the ZIP code of your posted letter in order to automatically forward it to the correct area. If you are using Amazon, they often recommend things for you to buy and they do this by learning what sort of things you often search for or buy.

Building a learned/trained machine learning algorithm will require a base of historical data samples from which it's going to learn how to distinguish between different examples and to come up with some knowledge and trends from that data. After that, the learned/trained algorithm could be used for making predictions on unseen data. The learning algorithm will be using raw historical data and will try to come up with some knowledge and trends from that data.

In this chapter, we are going to have a bird's-eye view of data science, how it works as a black box, and the challenges that data scientists face on a daily basis. We are going to cover the following topics:

Understanding data science by an example
Design procedure of data science algorithms
Getting to learn
Implementing the fish recognition/detection model
Different learning types
Data size and industry needs

Understanding data science by an example

To illustrate the life cycle and challenges of building a learning algorithm for specific data, let us consider a real example. The Nature Conservancy is working with other fishing companies and partners to monitor fishing activities and preserve fisheries for the future. So they are looking to use cameras in the future to scale up this monitoring process. The amount of data that will be produced from the deployment of these cameras will be cumbersome and very expensive to process manually. So the conservancy wants to develop a learning algorithm to automatically detect and classify different species of fish to speed up the video reviewing process.

Figure 1.1 shows a sample of images taken by conservancy-deployed cameras. These images will be used to build the system.

Figure 1.1: Sample of the conservancy-deployed cameras' output

So our aim in this example is to separate different species such as tunas, sharks, and more that fishing boats catch. As an illustrative example, we can limit the problem to only two classes, tuna and opah.

Figure 1.2: Tuna fish type (left) and opah fish type (right)

After limiting our problem to contain only two types of fish, we can take a sample of some random images from our collection and start to note some physical differences between the two types. For example, consider the following physical differences:

Length: You can see that compared to the opah fish, the tuna fish is longer
Width: Opah is wider than tuna
Color: You can see that the opah fish tends to be more red while the tuna fish tends to be blue and white, and so on

We can use these physical differences as features that can help our learning algorithm(classifier) to differentiate between these two types of fish.

Explanatory features of an object are something that we use in daily life to discriminate between objects that surround us. Even babies use these explanatory features to learn about the surrounding environment. The same for data science, in order to build a learned model that can discriminate between different objects (for example, fish type), we need to give it some explanatory features to learn from (for example, fish length). In order to make the model more certain and reduce the confusion error, we can increase (to some extent) the explanatory features of the objects.

Given that there are physical differences between the two types of fish, these two different fish populations have different models or descriptions. So the ultimate goal of our classification task is to get the classifier to learn these different models and then give an image of one of these two types as an input. The classifier will classify it by choosing the model (tuna model or opah model) that corresponds best to this image.

In this case, the collection of tuna and opah fish will act as the knowledge base for our classifier. Initially, the knowledge base (training samples) will be labeled/tagged, and for each image, you will know beforehand whether it's tuna or opah fish. So the classifier will use these training samples to model the different types of fish, and then we can use the output of the training phase to automatically label unlabeled/untagged fish that the classifier didn't see during the training phase. This kind of unlabeled data is often called unseen data. The training phase of the life cycle is shown in the following diagram:

Supervised data science is all about learning from historical data with known target or output, such as the fish type, and then using this learned model to predict cases or data samples, for which we don't know the target/output.

Figure 1.3: Training phase life cycle

Let's have a look at how the training phase of the classifier will work:

Pre-processing: In this step, we will try to segment the fish from the image by using the relevant segmentation technique.
Feature extraction: After segmenting the fish from the image by subtracting the background, we will measure the physical differences (length, width, color, and so on) of each image. At the end, you will get something like Figure 1.4.

Finally, we will feed this data into the classifier in order to model different fish types.

As we have seen, we can visually differentiate between tuna and opah fish based on the physical differences (features) that we proposed, such as length, width, and color.

We can use the length feature to differentiate between the two types of fish. So we can try to differentiate between the fish by observing their length and seeing whether it exceeds some value (length*) or not.

So, based on our training sample, we can derive the following rule:

If length(fish)> length* then label(fish) = Tuna
Otherwise label(fish) = Opah

In order to find this length* we can somehow make length measurements based on our training samples. So, suppose we get these length measurements and obtain the histogram as follows:

Figure 1.4: Histogram of the length measurements for the two types of fish

In this case, we can derive a rule based on the length feature and differentiate the tuna and opah fish. In this particular example, we can tell that length* is 7. So we can update the preceding rule to be:

If length(fish)> 7 then label(fish) = Tuna
Otherwise label(fish) = Opah

As you may notice, this is not a promising result because of the overlap between the two histograms, as the length feature is not a perfect one to use solely for differentiating between the two types. So we can try to incorporate more features such as the width and then combine them. So, if we somehow manage to measure the width of our training samples, we might get something like the histogram as follows:

Figure 5: Histogram of width measurements for the two types of fish

As you can see, depending on one feature only will not give accurate results and the output model will do lots of misclassifications. Instead, we can somehow combine the two features and come up with something that looks reasonable.

So if we combine both features, we might get something that looks like the following graph:

Figure 1.6 : Combination between the subset of the length and width measurements for the two types of fish

Combining the readings for the length and width features, we will get a scatter plot like the one in the preceding graph. We have the red dots to represent the tuna fish and the green dots to represent the opah fish, and we can suggest this black line to be the rule or the decision boundary that will differentiate between the two types of fish.

For example, if the reading of one fish is above this decision boundary, then it's a tuna fish; otherwise, it will be predicted as an opah fish.

We can somehow try to increase the complexity of the rule to avoid any errors and get a decision boundary like the one in the following graph:

Figure 1.7: Increasing the complexity of the decision boundary to avoid misclassifications over the training data

The advantage of this model is that we get almost 0 misclassifications over the training samples. But actually this is not the objective of using data science. The objective of data science is to build a model that will be able to generalize and perform well over the unseen data. In order to find out whether we built a model that will generalize or not, we are going to introduce a new phase called the testing phase, in which we give the trained model an unlabeled image and expect the model to assign the correct label (Tuna and Opah) to it.

Data science's ultimate objective is to build a model that will work well in production, not over the training set. So don't be happy when you see your model is performing well on the training set, like the one in figure 1.7. Mostly, this kind of model will fail to work well in recognizing the fish type in the image. This incident of having your model work well only over the training set is called overfitting, and most practitioners fall into this trap.

Instead of coming up with such a complex model, you can drive a less complex one that will generalize in the testing phase. The following graph shows the use of a less complex model in order to get fewer misclassification errors and to generalize the unseen data as well:

Figure 1.8: Using a less complex model in order to be able to generalize over the testing samples (unseen data)

Design procedure of data science algorithms

Different learning systems usually follow the same design procedure. They start by acquiring the knowledge base, selecting the relevant explanatory features from the data, going through a bunch of candidate learning algorithms while keeping an eye on the performance of each one, and finally the evaluation process, which measures how successful the training process was.

In this section, we are going to address all these different design steps in more detail:

Figure 1.11: Model learning process outline

Data pre-processing

This component of the learning cycle represents the knowledge base of our algorithm. So, in order to help the learning algorithm give accurate decisions about the unseen data, we need to provide this knowledge base in the best form. Thus, our data may need a lot of cleaning and pre-processing (conversions).

Data cleaning

Most datasets require this step, in which you get rid of errors, noise, and redundancies. We need our data to be accurate, complete, reliable, and unbiased, as there are lots of problems that may arise from using bad knowledge base, such as:

Inaccurate and biased conclusions
Increased error
Reduced generalizability, which is the model's ability to perform well over the unseen data that it didn't train on previously

Data pre-processing

In this step, we apply some conversions to our data to make it consistent and concrete. There are lots of different conversions that you can consider while pre-processing your data:

Renaming (relabeling): This means converting categorical values to numbers, as categorical values are dangerous if used with some learning methods, and also numbers will impose an order between the values
Rescaling (normalization): Transforming/bounding continuous values to some range, typically [-1, 1] or [0, 1]
New features: Making up new features from the existing ones. For example, obesity-factor = weight/height

Feature selection

The number of explanatory features (input variables) of a sample can be enormous wherein you get x_i=(x_i¹, x_i², x_i³, ... , x_i^d) as a training sample (observation/example) and d is very large. An example of this can be a document classification task3, where you get 10,000 different words and the input variables will be the number of occurrences of different words.

This enormous number of input variables can be problematic and sometimes a curse because we have many input variables and few training samples to help us in the learning procedure. To avoid this curse of having an enormous number of input variables (curse of dimensionality), data scientists use dimensionality reduction techniques in order to select a subset from the input variables. For example, in the text classification task they can do the following:

Extracting relevant inputs (for instance, mutual information measure)
Principal component analysis (PCA)
Grouping (cluster) similar words (this uses a similarity measure)

Model selection

This step comes after selecting a proper subset of your input variables by using any dimensionality reduction technique. Choosing the proper subset of the input variable will make the rest of the learning process very simple.

In this step, you are trying to figure out the right model to learn.

If you have any prior experience with data science and applying learning methods to different domains and different kinds of data, then you will find this step easy as it requires prior knowledge of how your data looks and what assumptions could fit the nature of your data, and based on this you choose the proper learning method. If you don't have any prior knowledge, that's also fine because you can do this step by guessing and trying different learning methods with different parameter settings and choose the one that gives you better performance over the test set.

Also, initial data analysis and visualization will help you to make a good guess about the form of the distribution and nature of your data.

Learning process

By learning, we mean the optimization criteria that you are going to use to select the best model parameters. There are various optimization criteria for that:

Mean square error (MSE)
Maximum likelihood (ML) criterion
Maximum a posterior probability (MAP)

The optimization problem may be hard to solve, but the right choice of model and error function makes a difference.

Evaluating your model

In this step, we try to measure the generalization error of our model on the unseen data. Since we only have the specific data without knowing any unseen data beforehand, we can randomly select a test set from the data and never use it in the training process so that it acts like valid unseen data. There are different ways you can to evaluate the performance of the selected model:

Simple holdout method, which is dividing the data into training and testing sets
Other complex methods, based on cross-validation and random subsampling

Our objective in this step is to compare the predictive performance for different models trained on the same data and choose the one with a better (smaller) testing error, which will give us a better generalization error over the unseen data. You can also be more certain about the generalization error by using a statistical method to test the significance of your results.

Getting to learn

Building a machine learning system comes with some challenges and issues; we will try to address them in this section. Many of these issues are domain specific and others aren't.

Challenges of learning

The following is an overview of the challenges and issues that you will typically face when trying to build a learning system.

Feature extraction – feature engineering

Feature extraction is one of the crucial steps toward building a learning system. If you did a good job in this challenge by selecting the proper/right number of features, then the rest of the learning process will be easy. Also, feature extraction is domain dependent and it requires prior knowledge to have a sense of what features could be important for a particular task. For example, the features for our fish recognition system will be different from the ones for spam detection or identifying fingerprints.

The feature extraction step starts from the raw data that you have. Then build derived variables/values (features) that are informative about the learning task and facilitate the next steps of learning and evaluation (generalization).

Some tasks will have a vast number of features and fewer training samples (observations) to facilitate the subsequent learning and generalization processes. In such cases, data scientists use dimensionality reduction techniques to reduce the vast number of features to a smaller set.

Noise

In the fish recognition task, you can see that the length, weight, fish color, as well as the boat color may vary, and there could be shadows, images with low resolution, and other objects in the image. All these issues affect the significance of the proposed explanatory features that should be informative about our fish classification task.

Work-arounds will be helpful in this case. For example, someone might think of detecting the boat ID and mask out certain parts of the boat that most likely won't contain any fish to be detected by our system. This work-around will limit our search space.

Overfitting

As we have seen in our fish recognition task, we have tried to enhance our model's performance by increasing the model complexity and perfectly classifying every single instance of the training samples. As we will see later, such models do not work over unseen data (such as the data that we will use for testing the performance of our model). Having trained models that work perfectly over the training samples but fail to perform well over the testing samples is called overfitting.

If you sift through the latter part of the chapter, we build a learning system with an objective to use the training samples as a knowledge base for our model in order to learn from it and generalize over the unseen data. Performance error of the trained model is of no interest to us over the training data; rather, we are interested in the performance (generalization) error of the trained model over the testing samples that haven't been involved in the training phase.

Selection of a machine learning algorithm

Sometimes you are unsatisfied with the execution of the model that you have utilized for a particular errand and you need an alternate class of models. Each learning strategy has its own presumptions about the information it will utilize as a learning base. As an information researcher, you have to discover which suspicions will fit your information best; by this you will have the capacity to acknowledge to attempt a class of models and reject another.

Prior knowledge

As discussed in the concepts of model selection and feature extraction, the two issues can be dealt with, if you have prior knowledge about:

The appropriate feature
Model selection parts

Having prior knowledge of the explanatory features in the fish recognition system enabled us to differentiate amid different types of fish. We can go promote by endeavoring to envision our information and get some feeling of the information types of the distinctive fish classifications. On the basis of this prior knowledge, apt family of models can be chosen.

Missing values

Missing features mainly occur because of a lack of data or choosing the prefer-not-to-tell option. How can we handle such a case in the learning process? For example, imagine we find the width of specific a fish type is missing for some reason. There are many ways to handle these missing features.

Implementing the fish recognition/detection model

To introduce the power of machine learning and deep learning in particular, we are going to implement the fish recognition example. No understanding of the inner details of the code will be required. The point of this section is to give you an overview of a typical machine learning pipeline.

Our knowledge base for this task will be a bunch of images, each one of them is labeled as opah or tuna. For this implementation, we are going to use one of the deep learning architectures that made a breakthrough in the area of imaging and computer vision in general. This architecture is called Convolution Neural Networks (CNNs). It is a family of deep learning architectures that use the convolution operation of image processing to extract features from the images that can explain the object that we want to classify. For now, you can think of it as a magic box that will take our images, learn from it how to distinguish between our two classes (opah and tuna), and then we will test the learning process of this box by feeding it with unlabeled images and see if it's able to tell which type of fish is in the image.

Different types of learning will be addressed in a later section, so you will understand later on why our fish recognition task is under the supervised learning category.

In this example, we will be using Keras. For the moment, you can think of Keras as an API that makes building and using deep learning way easier than usual. So let's get started! From the Keras website we have:

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Knowledge base/dataset

As we mentioned earlier, we need a historical base of data that will be used to teach the learning algorithm about the task that it's supposed to do later. But we also need another dataset for testing its ability to perform the task after the learning process. So to sum up, we need two types of datasets during the learning process:

The first one is the knowledge base where we have the input data and their corresponding labels such as the fish images and their corresponding labels (opah or tuna). This data will be fed to the learning algorithm to learn from it and try to discover the patterns/trends that will help later on for classifying unlabeled images.
The second one is mainly for testing the ability of the model to apply what it learned from the knowledge base to unlabeled images or unseen data, in general, and see if it's working well.

As you can see, we only have the data that we will use as a knowledge base for our learning method. All of the data we have at hand will have the correct output associated with it. So we need to somehow make up this data that does not have any correct output associated with it (the one that we are going to apply the model to).

While performing data science, we'll be doing the following:

Training phase: We present our data from our knowledge base and train our learning method/model by feeding the input data along with its correct output to the model.
Validation/test phase: In this phase, we are going to measure how well the trained model is doing. We also use different model property techniques in order to measure the performance of our trained model by using (R-square score for regression, classification errors for classifiers, recall and precision for IR models, and so on).

The validation/test phase is usually split into two steps:

In the first step, we use different learning methods/models and choose the best performing one based on our validation data (validation step)
Then we measure and report the accuracy of the selected model based on the test set (test step)

Now let's see how we get this data to which we are going to apply the model and see how well trained it is.

Since we don't have any training samples without the correct output, we can make up one from the original training samples that we will be using. So we can split our data samples into three different sets (as shown in Figure 1.9):

Train set: This will be used as a knowledge base for our model. Usually, will be 70% from the original data samples.
Validation set: This will be used to choose the best performing model among a set of models. Usually this will be 10% of the original data samples.
Test set: This will be used to measure and report the accuracy of the selected model. Usually, it will be as big as the validation set.

Figure 1.9: Splitting data into train, validation, and test sets

In case you have only one learning method that you are using, you can cancel the validation set and re-split your data to be train and test sets only. Usually, data scientists use 75/25 as percentages, or 70/30.

Data analysis pre-processing

In this section we are going to analyze and preprocess the input images and have it in an acceptable format for our learning algorithm, which is the convolution neural networks here.

So let's start off by importing the required packages for this implementation:

import numpy as np
np.random.seed(2018)

import os
import glob
import cv2
import datetime
import pandas as pd
import time
import warnings
warnings.filterwarnings("ignore")

from sklearn.cross_validation import KFold
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping
from keras.utils import np_utils
from sklearn.metrics import log_loss
from keras import __version__ as keras_version

In order to use the images provided in the dataset, we need to get them to have the same size. OpenCV is a good choice for doing this, from the OpenCV website:

OpenCV (Open Source Computer Vision Library) is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform.

You can install OpenCV by using the python package manager by issuing, pip install opencv-python

# Parameters
# ----------
# img_path : path
#    path of the image to be resized
def rezize_image(img_path):
   #reading image file
   img = cv2.imread(img_path)
   #Resize the image to to be 32 by 32
   img_resized = cv2.resize(img, (32, 32), cv2.INTER_LINEAR)
return img_resized

Now we need to load all the training samples of our dataset and resize each image, according to the previous function. So we are going to implement a function that will load the training samples from the different folders that we have for each fish type:

# Loading the training samples and their corresponding labels
def load_training_samples():
    #Variables to hold the training input and output variables
    train_input_variables = []
    train_input_variables_id = []
    train_label = []
    # Scanning all images in each folder of a fish type
    print('Start Reading Train Images')
    folders = ['ALB', 'BET', 'DOL', 'LAG', 'NoF', 'OTHER', 'SHARK', 'YFT']
    for fld in folders:
       folder_index = folders.index(fld)
       print('Load folder {} (Index: {})'.format(fld, folder_index))
       imgs_path = os.path.join('..', 'input', 'train', fld, '*.jpg')
       files = glob.glob(imgs_path)
       for file in files:
           file_base = os.path.basename(file)
           # Resize the image
           resized_img = rezize_image(file)
           # Appending the processed image to the input/output variables of the classifier
           train_input_variables.append(resized_img)
           train_input_variables_id.append(file_base)
           train_label.append(folder_index)
     return train_input_variables, train_input_variables_id, train_label

As we discussed, we have a test set that will act as the unseen data to test the generalization ability of our model. So we need to do the same with testing images; load them and do the resizing processing:

def load_testing_samples():
    # Scanning images from the test folder
    imgs_path = os.path.join('..', 'input', 'test_stg1', '*.jpg')
    files = sorted(glob.glob(imgs_path))
    # Variables to hold the testing samples
    testing_samples = []
    testing_samples_id = []
    #Processing the images and appending them to the array that we have
    for file in files:
       file_base = os.path.basename(file)
       # Image resizing
       resized_img = rezize_image(file)
       testing_samples.append(resized_img)
       testing_samples_id.append(file_base)
    return testing_samples, testing_samples_id

Now we need to invoke the previous function into another one that will use the load_training_samples() function in order to load and resize the training samples. Also, it will add a few lines of code to convert the training data into NumPy format, reshape that data to fit into our classifier, and finally convert it to float:

def load_normalize_training_samples():
    # Calling the load function in order to load and resize the training samples
    training_samples, training_label, training_samples_id = load_training_samples()
    # Converting the loaded and resized data into Numpy format
    training_samples = np.array(training_samples, dtype=np.uint8)
    training_label = np.array(training_label, dtype=np.uint8)
    # Reshaping the training samples
    training_samples = training_samples.transpose((0, 3, 1, 2))
    # Converting the training samples and training labels into float format
    training_samples = training_samples.astype('float32')
    training_samples = training_samples / 255
    training_label = np_utils.to_categorical(training_label, 8)
    return training_samples, training_label, training_samples_id

We also need to do the same with the test:

def load_normalize_testing_samples():
    # Calling the load function in order to load and resize the testing samples
    testing_samples, testing_samples_id = load_testing_samples()
    # Converting the loaded and resized data into Numpy format
    testing_samples = np.array(testing_samples, dtype=np.uint8)
    # Reshaping the testing samples
    testing_samples = testing_samples.transpose((0, 3, 1, 2))
    # Converting the testing samples into float format
    testing_samples = testing_samples.astype('float32')
    testing_samples = testing_samples / 255
    return testing_samples, testing_samples_id

Model building

Now it's time to create the model. As we mentioned, we are going to use a deep learning architecture called CNN as a learning algorithm for this fish recognition task. Again, you are not required to understand any of the previous or the upcoming code in this chapter as we are only demonstrating how complex data science tasks can be solved by using only a few lines of code with the help of Keras and TensorFlow as a deep learning platform.

Also note that CNN and other deep learning architectures will be explained in greater detail in later chapters:

Figure 1.10: CNN architecture

So let's go ahead and create a function that will be responsible for creating the CNN architecture that will be used in our fish recognition task:

def create_cnn_model_arch():
    pool_size = 2 # we will use 2x2 pooling throughout
    conv_depth_1 = 32 # we will initially have 32 kernels per conv. layer...
    conv_depth_2 = 64 # ...switching to 64 after the first pooling layer
    kernel_size = 3 # we will use 3x3 kernels throughout
    drop_prob = 0.5 # dropout in the FC layer with probability 0.5
    hidden_size = 32 # the FC layer will have 512 neurons
    num_classes = 8 # there are 8 fish types
    # Conv [32] -> Conv [32] -> Pool
    cnn_model = Sequential()
    cnn_model.add(ZeroPadding2D((1, 1), input_shape=(3, 32, 32), dim_ordering='th'))
    cnn_model.add(Convolution2D(conv_depth_1, kernel_size, kernel_size, activation='relu', 
      dim_ordering='th'))
    cnn_model.add(ZeroPadding2D((1, 1), dim_ordering='th'))
    cnn_model.add(Convolution2D(conv_depth_1, kernel_size, kernel_size, activation='relu',
      dim_ordering='th'))
    cnn_model.add(MaxPooling2D(pool_size=(pool_size, pool_size), strides=(2, 2),
      dim_ordering='th'))
    # Conv [64] -> Conv [64] -> Pool
    cnn_model.add(ZeroPadding2D((1, 1), dim_ordering='th'))
    cnn_model.add(Convolution2D(conv_depth_2, kernel_size, kernel_size, activation='relu',
      dim_ordering='th'))
    cnn_model.add(ZeroPadding2D((1, 1), dim_ordering='th'))
    cnn_model.add(Convolution2D(conv_depth_2, kernel_size, kernel_size, activation='relu',
      dim_ordering='th'))
    cnn_model.add(MaxPooling2D(pool_size=(pool_size, pool_size), strides=(2, 2),
     dim_ordering='th'))
    # Now flatten to 1D, apply FC then ReLU (with dropout) and finally softmax(output layer)
    cnn_model.add(Flatten())
    cnn_model.add(Dense(hidden_size, activation='relu'))
    cnn_model.add(Dropout(drop_prob))
    cnn_model.add(Dense(hidden_size, activation='relu'))
    cnn_model.add(Dropout(drop_prob))
    cnn_model.add(Dense(num_classes, activation='softmax'))
    # initiating the stochastic gradient descent optimiser
    stochastic_gradient_descent = SGD(lr=1e-2, decay=1e-6, momentum=0.9, nesterov=True)    cnn_model.compile(optimizer=stochastic_gradient_descent,  # using the stochastic gradient descent optimiser
                  loss='categorical_crossentropy')  # using the cross-entropy loss function
    return cnn_model

Before starting to train the model, we need to use a model assessment and validation method to help us assess our model and see its generalization ability. For this, we are going to use a method called k-fold cross-validation. Again, you are not required to understand this method or how it works as we are going to explain this method later in much detail.

So let's start and and create a function that will help us assess and validate the model:

def create_model_with_kfold_cross_validation(nfolds=10):
    batch_size = 16 # in each iteration, we consider 32 training examples at once
    num_epochs = 30 # we iterate 200 times over the entire training set
    random_state = 51 # control the randomness for reproducibility of the results on the same platform
    # Loading and normalizing the training samples prior to feeding it to the created CNN model
    training_samples, training_samples_target, training_samples_id = 
      load_normalize_training_samples()
    yfull_train = dict()
    # Providing Training/Testing indices to split data in the training samples
    # which is splitting data into 10 consecutive folds with shuffling
    kf = KFold(len(train_id), n_folds=nfolds, shuffle=True, random_state=random_state)
    fold_number = 0 # Initial value for fold number
    sum_score = 0 # overall score (will be incremented at each iteration)
    trained_models = [] # storing the modeling of each iteration over the folds
    # Getting the training/testing samples based on the generated training/testing indices by 
      Kfold
    for train_index, test_index in kf:
       cnn_model = create_cnn_model_arch()
       training_samples_X = training_samples[train_index] # Getting the training input variables
       training_samples_Y = training_samples_target[train_index] # Getting the training output/label variable
       validation_samples_X = training_samples[test_index] # Getting the validation input variables
       validation_samples_Y = training_samples_target[test_index] # Getting the validation output/label variable
       fold_number += 1
       print('Fold number {} from {}'.format(fold_number, nfolds))
       callbacks = [
           EarlyStopping(monitor='val_loss', patience=3, verbose=0),
       ]
       # Fitting the CNN model giving the defined settings
       cnn_model.fit(training_samples_X, training_samples_Y, batch_size=batch_size,
         nb_epoch=num_epochs,
             shuffle=True, verbose=2, validation_data=(validation_samples_X,
               validation_samples_Y),
             callbacks=callbacks)
       # measuring the generalization ability of the trained model based on the validation set
       predictions_of_validation_samples = 
         cnn_model.predict(validation_samples_X.astype('float32'), 
         batch_size=batch_size, verbose=2)
       current_model_score = log_loss(Y_valid, predictions_of_validation_samples)
       print('Current model score log_loss: ', current_model_score)
       sum_score += current_model_score*len(test_index)
       # Store valid predictions
       for i in range(len(test_index)):
           yfull_train[test_index[i]] = predictions_of_validation_samples[i]
       # Store the trained model
       trained_models.append(cnn_model)
    # incrementing the sum_score value by the current model calculated score
    overall_score = sum_score/len(training_samples)
    print("Log_loss train independent avg: ", overall_score)
    #Reporting the model loss at this stage
    overall_settings_output_string = 'loss_' + str(overall_score) + '_folds_' + str(nfolds) + 
      '_ep_' + str(num_epochs)
    return overall_settings_output_string, trained_models

Now, after building the model and using k-fold cross-validation method in order to assess and validate the model, we need to report the results of the trained model over the test set. In order to do this, we are also going to use k-fold cross-validation but this time over the test to see how good our trained model is.

So let's define the function that will take the trained CNN models as an input and then test them using the test set that we have:

def test_generality_crossValidation_over_test_set( overall_settings_output_string, cnn_models):
    batch_size = 16 # in each iteration, we consider 32 training examples at once
    fold_number = 0 # fold iterator
    number_of_folds = len(cnn_models) # Creating number of folds based on the value used in the training step
    yfull_test = [] # variable to hold overall predictions for the test set
    #executing the actual cross validation test process over the test set
    for j in range(number_of_folds):
       model = cnn_models[j]
       fold_number += 1
       print('Fold number {} out of {}'.format(fold_number, number_of_folds))
       #Loading and normalizing testing samples
       testing_samples, testing_samples_id = load_normalize_testing_samples()
       #Calling the current model over the current test fold
       test_prediction = model.predict(testing_samples, batch_size=batch_size, verbose=2)
       yfull_test.append(test_prediction)
    test_result = merge_several_folds_mean(yfull_test, number_of_folds)
    overall_settings_output_string = 'loss_' + overall_settings_output_string \ + '_folds_' +
      str(number_of_folds)
    format_results_for_types(test_result, testing_samples_id, overall_settings_output_string)

Model training and testing

Now we are ready to start the model training phase by calling the main function create_model_with_kfold_cross_validation() for building and training the CNN model using 10-fold cross-validation; then we can call the testing function to measure the model's ability to generalize to the test set:

if __name__ == '__main__':
  info_string, models = create_model_with_kfold_cross_validation()
  test_generality_crossValidation_over_test_set(info_string, models)

Fish recognition – all together

After explaining the main building blocks for our fish recognition example, we are ready to see all the code pieces connected together and see how we managed to build such a complex system with just a few lines of code. The full code is placed in the Appendix section of the book.

Different learning types

According to Arthur Samuel (https://en.wikipedia.org/wiki/Arthur_Samuel), data science gives computers the ability to learn without being explicitly programmed. So, any piece of software that will consume training examples in order to make decisions over unseen data without explicit programming is considered learning. Data science or learning comes in three different forms.

Figure 1.12 shows the commonly used types of data science/machine learning:

Figure 1.12: Different types of data science/machine learning.

Supervised learning

The majority of data scientists use supervised learning. Supervised learning is where you have some explanatory features, which are called input variables (X), and you have the labels that are associated with the training samples, which are called output variables (Y). The objective of any supervised learning algorithm is to learn the mapping function from the input variables (X) to the output variables (Y):

So the supervised learning algorithm will try to learn approximately the mapping from the input variables (X) to the output variables (Y), such that it can be used later to predict the Y values of an unseen sample.

Figure 1.13 shows a typical workflow for any supervised data science system:

Figure 1.13: A typical supervised learning workflow/pipeline. The top part shows the training process that starts with feeding the raw data into a feature extraction module where we will select meaningful explanatory feature to represent our data. After that, the extracted/selected explanatory feature gets combined with the training set and we feed it to the learning algorithm in order to learn from it. Then we do some model evaluation to tune the parameters and get the learning algorithm to get the best out of the data samples.

This kind of learning is called supervised learning because you are getting the label/output of each training sample associated with it. In this case, we can say that the learning process is supervised by a supervisor. The algorithm makes decisions on the training samples and is corrected by the supervisor, based on the correct labels of the data. The learning process will stop when the supervised learning algorithm achieves an acceptable level of accuracy.

Supervised learning tasks come in two different forms; regression and classification:

Classification: A classification task is when the label or the output variable is a category, such as tuna or Opah or spam and non spam
Regression: A regression task is when the output variable is a real value, such as house prices or height

Unsupervised learning

Unsupervised learning is viewed as the second most common kind of learning that is utilized by information researchers. In this type of learning, only the explanatory features or the input variables (X) are given, without any corresponding label or output variable.

The target of unsupervised learning algorithms is to take in the hidden structures and examples in the information. This kind of learning is called unsupervised in light of the fact that there aren't marks related with the training samples. So it's a learning procedure without corrections, and the algorithm will attempt to find the basic structure on its own.

Unsupervised learning can be further broken into two forms—clustering and association tasks:

Clustering: A clustering task is where you want to discover similar groups of training samples and group them together, such as grouping documents by topic
Association: An association rule learning task is where you want to discover some rules that describe the relationships in your training samples, such as people who watch movie X also tend to watch movie Y

Figure 1.14 shows a trivial example of unsupervised learning where we have got scattered documents and we are trying to group similar ones together:

Figure 1.14: Shows how unsupervised use similarity measure such as Euclidean distance to group similar documents to together and draw a decision boundaries for them

Semi-supervised learning

Semi-supervised learning is a type of learning that sits in between supervised and unsupervised learning, where you have got training examples with input variables (X), but only some of them are labeled/tagged with the output variable (Y).

A good example of this type of learning is Flickr (https://www.flickr.com/), where you have got lots of images uploaded by users but only some of them are labeled (such as sunset, ocean, and dog) and the rest are unlabeled.

To solve the tasks that fall into this type of learning, you can use one of the following or a combination of them:

Supervised learning: Learn/train the learning algorithm to give predictions about the unlabeled data and then feed the entire training samples back to learn from it and predict the unseen data
Unsupervised learning: Use the unsupervised learning algorithms to learn the underlying structure of the explanatory features or the input variables as if you don't have any tagged training samples

Reinforcement learning

The last type of learning in machine learning is reinforcement learning, in which there's no supervisor but only a reward signal.

So the reinforcement learning algorithm will try to make a decision and then a reward signal will be there to tell whether this decision is right or wrong. Also, this supervision feedback or reward signal may not come instantaneously but get delayed for a few steps. For example, the algorithm will take a decision now, but only after many steps will the reward signal tell whether decision was good or bad.

Data size and industry needs

Data is the information base of our learning calculations; any uplifting and imaginative thoughts will be nothing with the absence of information. So in the event that you have a decent information science application with the right information, at that point you are ready to go.

Having the capacity to investigate and extricate an incentive from your information is obvious these days notwithstanding to the structure of your information, however since enormous information is turning into the watchword of the day then we require information science apparatuses and advancements that can scale with this immense measure of information in an unmistakable learning time. These days everything is producing information and having the capacity to adapt to it is a test. Huge organizations, for example, Google, Facebook, Microsoft, IBM, and so on, manufacture their own adaptable information science arrangements keeping in mind the end goal to deal with the tremendous amount of information being produced once a day by their clients.

TensorFlow, is a machine intelligence/data science platform that was released as an open source library on November 9, 2016 by Google. It is a scalable analytics platform that enables data scientists to build complex systems with a vast amount of data in visible time and it also enables them to use greedy learning methods that require lots of data to get a good performance.

Summary

In this chapter, we went through building a learning system for fish recognition; we also saw how we can build complex applications, such as fish recognition, using a few lines of code with the help of TensorFlow and Keras. This coding example was not meant to be understood from your side, rather to demonstrate the visibility of building complex systems and how data science in general and specifically deep learning became an easy-to-use tool.

We saw the challenges that you might encounter in your daily life as a data scientist while building a learning system.

We also looked at the typical design cycle for building a learning system and explained the overall idea of each component involved in this cycle.

Finally, we went through different learning types, having big data generated daily by big and small companies, and how this vast amount of data raises a red alert to build scalable tools to be able to analyze and extract value from this data.

At this point, the reader may be overwhelmed by all the information mentioned so far, but most of what we explained in this chapter will be addressed in other chapters, including data science challenges and the fish recognition example. The whole purpose of this chapter was to get an overall idea about data science and its development cycle, without any deep understanding of the challenges and the coding example. The coding example was mentioned in this chapter to break the fear of most newcomers in the field of data science and show them how complex systems such as fish recognition can be done in a few lines of code.

Next up, we will start our by example journey, by addressing the basic concepts of data science through an example. The next part will mainly focus on preparing you for the later advanced chapters, by going through the famous Titanic example. Lots of concepts will be addressed, including different learning methods for regression and classification, different types of performance errors and which one to care about most, and more about tackling some of the data science challenges and handling different forms of the data samples.

It was a great value and I look forward to learning all of the techniques in the books.

El contenido es el esperado.