
You're reading from Deep Learning with TensorFlow and Keras, Third Edition

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803232911
Edition: 3rd Edition
Authors (3):
Amita Kapoor

Amita Kapoor is an accomplished AI consultant and educator, with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

Antonio Gulli

Antonio Gulli has a passion for establishing and managing global technological talent for innovation and execution. His core expertise is in cloud computing, deep learning, and search engines. Currently, Antonio works for Google in the Cloud Office of the CTO in Zurich, working on Search, Cloud Infra, Sovereignty, and Conversational AI.

Sujit Pal

Sujit Pal is a Technology Research Director at Elsevier Labs, an advanced technology group within the Reed-Elsevier Group of companies. His interests include semantic search, natural language processing, machine learning, and deep learning. At Elsevier, he has worked on several initiatives involving search quality measurement and improvement, image classification and duplicate detection, and annotation and ontology development for medical and scientific corpora.


Neural Network Foundations with TF

In this chapter, we learn the basics of TensorFlow, an open-source library developed by Google for machine learning and deep learning. In addition, we introduce the basics of neural networks and deep learning, two areas of machine learning that have had incredible Cambrian growth during the last few years. The idea behind this chapter is to provide all the tools needed to do basic but fully hands-on deep learning.

We will learn:

  • What TensorFlow and Keras are
  • An introduction to neural networks
  • What the perceptron and multi-layer perceptron are
  • A real example: recognizing handwritten digits

All the code files for this chapter can be found at https://packt.link/dltfchp1.

Let’s begin!

What is TensorFlow (TF)?

TensorFlow is a powerful open-source software library developed by the Google Brain Team for deep neural networks, the topic covered in this book. It was first made available under the Apache 2.0 License in November 2015 and has since grown rapidly; as of May 2022, its GitHub repository (https://github.com/tensorflow/tensorflow) has more than 129,000 commits, with roughly 3,100 contributors. This in itself provides a measure of the popularity of TensorFlow.

Let us first learn what exactly TensorFlow is and why it is so popular among deep neural network researchers and engineers. Google calls it “an open-source software library for machine intelligence,” but since there are so many other deep learning libraries like PyTorch (https://pytorch.org/), Caffe (https://caffe.berkeleyvision.org/), and MXNet (https://mxnet.apache.org/), what makes TensorFlow special? Most other deep learning libraries, like TensorFlow, have auto-differentiation (a...

What is Keras?

Keras is a beautiful API for composing building blocks to create and train deep learning models. Historically, Keras could be integrated with multiple deep learning engines, including Google TensorFlow, Microsoft CNTK, Apache MXNet, and Theano. Starting with TensorFlow 2.0, Keras, the API developed by François Chollet, has been adopted as the standard high-level API, largely simplifying coding and making programming more intuitive.
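To give a feel for this building-block style, here is a minimal sketch of defining and compiling a small model with the Keras Sequential API; the layer sizes, activations, and optimizer are illustrative choices, not taken from the chapter:

import tensorflow as tf
from tensorflow.keras import layers, models

# Stack layers like building blocks: one hidden layer, one output layer.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(1, activation="sigmoid"),
])

# Compiling attaches an optimizer, a loss function, and the metrics to track.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()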

Introduction to neural networks

Artificial neural networks (briefly, “nets” or ANNs) represent a class of machine learning models loosely inspired by studies about the central nervous systems of mammals. Each ANN is made up of several interconnected “neurons,” organized in “layers.” Neurons in one layer pass messages to neurons in the next layer (they “fire,” in jargon terms), and this is how the network computes things. Initial studies started in the late 1950s with the introduction of the “perceptron” [1], a two-layer network used for simple operations, and were further expanded in the late 1960s with the introduction of the “back-propagation” algorithm used for efficient multi-layer network training (according to [2] and [3]). Some studies argue that these techniques have roots dating further back than normally cited [4].

Neural networks were a topic of intensive academic studies up until the...

Perceptron

The “perceptron” is a simple algorithm that, given an input vector x of m values (x1, x2, ..., xm), often called input features or simply features, outputs either a 1 (“yes”) or a 0 (“no”). Mathematically, we define a function:

f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}

where w is a vector of weights, \mathbf{w} \cdot \mathbf{x} = \sum_{j=1}^{m} w_j x_j is the dot product, and b is the bias. If you remember elementary geometry, wx + b defines a boundary hyperplane that changes position according to the values assigned to w and b. Note that a hyperplane is a subspace whose dimension is one fewer than that of its ambient space. See Figure 1.2 for an example:

Figure 1.2: An example of a hyperplane

In other words, this is a very simple but effective algorithm! For example, given three input features, the amounts of red, green, and blue in a color, the perceptron could try to decide whether the color is “white” or not.
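As an illustration only (the weights and bias below are made-up numbers, not values from the chapter), a perceptron deciding “white or not” from red, green, and blue intensities could look like this:

import numpy as np

# Hypothetical weights and bias, chosen by hand for illustration.
w = np.array([1.0, 1.0, 1.0])  # one weight per channel: red, green, blue
b = -2.5                       # the bias shifts the decision boundary

def perceptron(x):
    # Fire (output 1) only if the weighted sum plus the bias is positive.
    return 1 if np.dot(w, x) + b > 0 else 0

print(perceptron(np.array([0.9, 0.95, 0.9])))  # bright color -> 1 ("white")
print(perceptron(np.array([0.2, 0.1, 0.3])))   # dark color  -> 0 ("not white")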

Note that the perceptron cannot express a “maybe”...

Multi-layer perceptron: our first example of a network

In this chapter, we present our first example of a network with multiple dense layers. Historically, “perceptron” was the name given to the model having one single linear layer, and as a consequence, if it has multiple layers, we call it a Multi-Layer Perceptron (MLP). Note that the input and the output layers are visible from the outside, while all the other layers in the middle are hidden – hence the name hidden layers. In this context, a single layer is simply a linear function and the MLP is therefore obtained by stacking multiple single layers one after the other:


Figure 1.3: An example of a multi-layer perceptron

In Figure 1.3 each node in the first hidden layer receives an input and “fires” (0,1) according to the values of the associated linear function. Then the output of the first hidden layer is passed to the second layer where another linear function is applied, the results...
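A minimal sketch of an MLP like the one in Figure 1.3, written with tf.keras; the layer widths are arbitrary choices for illustration, and the input size of 784 matches the flattened 28x28 images used later in the chapter:

from tensorflow.keras import layers, models

# Two hidden layers followed by a 10-way output, e.g. for digit classification.
mlp = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),  # first hidden layer
    layers.Dense(128, activation="relu"),                      # second hidden layer
    layers.Dense(10, activation="softmax"),                    # output layer
])
mlp.summary()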

A real example: recognizing handwritten digits

In this section we will build a network that can recognize handwritten numbers. To achieve this goal, we use MNIST (http://yann.lecun.com/exdb/mnist/), a database of handwritten digits made up of a training set of 60,000 examples, and a test set of 10,000 examples. The training examples are annotated by humans with the correct answer. For instance, if the handwritten digit is the number “3,” then 3 is simply the label associated with that example.

In machine learning, when a dataset with correct answers is available, we say that we can perform a form of supervised learning. In this case we can use training examples for improving our net. Testing examples also have the correct answer associated with each digit. In this case, however, the idea is to pretend that the label is unknown, let the network do the prediction, and then later on reconsider the label to evaluate how well our neural network has learned to recognize...
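As a hedged sketch of how this dataset is typically fetched with tf.keras (scaling the pixel values to [0, 1] is a common convention, not something mandated by the text):

from tensorflow.keras import datasets

# Download MNIST: 60,000 training and 10,000 test images of 28x28 grayscale pixels.
(X_train, Y_train), (X_test, Y_test) = datasets.mnist.load_data()

# Scale pixel intensities from [0, 255] to [0, 1] so that training behaves better.
X_train, X_test = X_train / 255.0, X_test / 255.0

print(X_train.shape, Y_train.shape)  # (60000, 28, 28) (60000,)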

Regularization

In this section we will review a few best practices for improving the training phase. In particular, regularization and batch normalization will be discussed.

Adopting regularization to avoid overfitting

Intuitively, a good machine learning model should achieve low error on training data. Mathematically, this is equivalent to minimizing the loss function on the training data given the model:

\min \; \{ \text{loss}(\text{Training Data} \mid \text{Model}) \}

However, this might not be enough. A model can become excessively complex in order to capture all the relations inherently expressed by the training data. This increase in complexity might have two negative consequences. First, a complex model might require a significant amount of time to be executed. Second, a complex model might achieve very good performance on training data but perform quite badly on validation data. This is because the model is able to contrive relationships between many parameters in the specific training context, but these relationships in...
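One common way to apply regularization in tf.keras is to attach a weight penalty to a layer. The sketch below adds an L2 penalty to a Dense layer; the strength 0.01 is an arbitrary value chosen for illustration:

from tensorflow.keras import layers, regularizers

# A Dense layer whose weights contribute an L2 penalty term to the total loss.
regularized_layer = layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=regularizers.l2(0.01),  # penalty strength is illustrative
)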

Playing with Google Colab: CPUs, GPUs, and TPUs

Google offers a truly intuitive tool for training neural networks and for playing with TensorFlow at no cost. Colab can be freely accessed at https://colab.research.google.com/, and if you are familiar with Jupyter notebooks, you will find the web-based environment very familiar. Colab stands for Colaboratory and is a Google research project created to help disseminate machine learning education and research. We will see the difference between CPUs, GPUs, and TPUs in Chapter 15, Tensor Processing Unit.

For now, it’s important to know that CPUs are generic processing units, while GPUs and TPUs are accelerators, specific processing units suitable for deep learning. Let’s see how it works, starting with the screenshot shown in Figure 1.23:

Figure 1.23: An example of notebooks in Colab

By accessing Colab, we can either check a listing of notebooks generated in the past or we...
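Inside a Colab notebook (or any TensorFlow environment), a quick way to see which accelerator the runtime provides is to list the physical devices; this is just a convenience check, not something the chapter requires:

import tensorflow as tf

# List the accelerators TensorFlow can see in the current runtime.
print("GPUs:", tf.config.list_physical_devices("GPU"))
print("TPUs:", tf.config.list_physical_devices("TPU"))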

Sentiment analysis

What is the code we used to test Colab? It is an example of sentiment analysis developed on top of the IMDB dataset. The IMDB dataset contains the text of 50,000 movie reviews from the Internet Movie Database. Each review is either positive or negative (for example, thumbs up or thumbs down). The dataset is split into 25,000 reviews for training and 25,000 reviews for testing. Our goal is to build a classifier that can predict the binary judgment given the text. We can easily load IMDB via tf.keras; the sequences of words in the reviews have already been converted to sequences of integers, where each integer represents a specific word in a dictionary. We also have a convenient way of padding sentences to max_len, so that we can use all sentences, whether short or long, as inputs to a neural network with an input vector of fixed size:

import tensorflow as tf
from tensorflow.keras import datasets, layers, models, preprocessing
import tensorflow_datasets as tfds
max_len...
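The listing above is cut off; as a hedged reconstruction of the data-loading step it describes, the sketch below caps the vocabulary and pads every review to the same length. The specific values (200 tokens per review, a 10,000-word vocabulary) are assumptions chosen for illustration, not necessarily the book's settings:

import tensorflow as tf
from tensorflow.keras import datasets, preprocessing

max_len = 200      # assumed: pad or truncate every review to 200 tokens
n_words = 10000    # assumed: keep only the 10,000 most frequent words

# Each review is already encoded as a sequence of integer word indices.
(X_train, y_train), (X_test, y_test) = datasets.imdb.load_data(num_words=n_words)

# Pad short reviews (and truncate long ones) so all inputs have the same fixed length.
X_train = preprocessing.sequence.pad_sequences(X_train, maxlen=max_len)
X_test = preprocessing.sequence.pad_sequences(X_test, maxlen=max_len)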

Predicting output

Once a net is trained, it can of course be used for making predictions. In TensorFlow, this is very simple. We can use this method:

# Making predictions.
predictions = model.predict(X)

For a given input, several types of output can be computed: model.evaluate() computes the loss values (and any compiled metrics), and model.predict() returns the class probabilities for a classifier. Earlier TensorFlow releases also offered model.predict_classes() for category outputs and model.predict_proba() for class probabilities on Sequential models, but these have since been removed; the equivalent today is to take the argmax of model.predict().
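A short sketch of those calls, assuming a trained model and test data (X_test, Y_test) prepared as in the earlier sections, and a model compiled with an accuracy metric:

import numpy as np

# Loss and metrics on labeled data (order matches the compiled metrics).
loss, accuracy = model.evaluate(X_test, Y_test)

# Class probabilities: one row per example, one column per class.
probabilities = model.predict(X_test)

# Predicted class labels: the index of the highest probability per row.
predicted_classes = np.argmax(probabilities, axis=-1)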

A practical overview of backpropagation

Multi-layer perceptrons learn from training data through a process called backpropagation. In this section we give an intuition of how it works; more details can be found in Chapter 14, The Math Behind Deep Learning. The process can be described as a way of progressively correcting mistakes as soon as they are detected. Let’s see how this works.

Remember that each neural network layer has an associated set of weights that determine the output values for a given set of inputs. Additionally, remember that a neural network can have multiple hidden layers.

At the beginning, all the weights have some random assignment. Then the neural network is activated for each input in the training set: values are propagated forward from the input stage through the hidden stages to the output stage where a prediction is made.

Note that we keep Figure 1.27 below simple by only representing a few values with green dotted lines but in reality, all the values...
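Although model.fit() runs backpropagation for us automatically, TensorFlow exposes the underlying machinery through tf.GradientTape. The sketch below, using a placeholder one-layer model and a random batch rather than the chapter's network, shows a single forward pass, the gradient computation, and one weight update:

import tensorflow as tf

# A tiny placeholder model and a random batch, just to show the mechanics.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

x = tf.random.normal((32, 784))                          # fake batch of 32 flattened images
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)  # fake labels in 0-9

with tf.GradientTape() as tape:
    predictions = model(x)          # forward pass through the layers
    loss = loss_fn(y, predictions)  # measure how wrong the predictions are

# Backward pass: gradients of the loss with respect to every trainable weight.
gradients = tape.gradient(loss, model.trainable_variables)

# Correct the weights a little in the direction that reduces the loss.
optimizer.apply_gradients(zip(gradients, model.trainable_variables))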

What have we learned so far?

In this chapter, we have learned the basics of neural networks. More specifically, we have learned what a perceptron is and what a multi-layer perceptron is, how to define neural networks in TensorFlow, how to progressively improve metrics once a good baseline is established, and how to fine-tune the hyperparameter space. In addition to that, we also have a good idea of useful activation functions (sigmoid and ReLU) available, and how to train a network with backpropagation algorithms based on either GD, SGD, or more sophisticated approaches, such as Adam and RMSProp.

Toward a deep learning approach

While playing with handwritten digit recognition, we came to the conclusion that the closer we get to an accuracy of 99%, the more difficult it is to improve. If we want more improvement, we definitely need a new idea. What are we missing? Think about it.

The fundamental intuition is that, in our examples so far, we have not made use of the local spatial structure of images, that is, the fact that an image can be described as a matrix with data locality. In particular, this piece of code transforms the bitmap representing each written digit into a flat vector where the local spatial structure (the fact that some pixels are closer to each other) is gone:

# X_train is 60000 rows of 28x28 values; we reshape it to 60000 x 784.
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)

However, this is not how our brain works. Remember that our vision is based on multiple cortex levels, each one recognizing...

Summary

In this chapter we learned what TensorFlow and Keras are and introduced neural networks with the perceptron and the multi-layer perceptron. Then, we saw a real example of recognizing handwritten digits with several optimizations.

The next chapter is devoted to regression and classification.

References

  1. Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev., vol. 65, pp. 386–408.
  2. Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proc. IEEE, vol. 78, pp. 1550–1560.
  3. Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Comput., vol. 18, pp. 1527–1554.
  4. Schmidhuber, J. (2015). Deep learning in neural networks: an overview. Neural Networks: Off. J. Int. Neural Netw. Soc., vol. 61, pp. 85–117.
  5. Leven, S. (1996). The roots of backpropagation: From ordered derivatives to neural networks and political forecasting. Neural Networks, vol. 9.
  6. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, vol. 323.
  7. Herculano-Houzel, S. (2009). The human brain in numbers: a linearly scaled-up primate...