Recurrent Neural Networks

In the previous chapter, we marveled at the visual cortex and leveraged some insights from the way it processes visual signals to inform the architecture of Convolutional Neural Networks (CNNs), which form the base of many state-of-the-art computer vision systems. However, we do not understand the world around us with vision alone. Sound, for one, also plays a very important role. More specifically, we humans love to communicate and express intricate thoughts and ideas through sequences of symbolic reductions and abstract representations. Our built-in hardware allows us to interpret vocalizations, or demarcations thereof, forming the base of human thought and collective understanding, upon which more complex representations (such as human languages) may be composed. In essence, these sequences of symbols are reduced representations of...

Modeling sequences

Perhaps you want to get the right translation for your order in a restaurant while visiting a foreign country. Maybe you want your car to perform a sequence of movements automatically so that it is able to park by itself. Or maybe you want to understand how different sequences of adenine, guanine, thymine, and cytosine molecules in the human genome lead to differences in biological processes occurring in the human body. What's the commonality between these examples? Well, these are all sequence modeling tasks. In such tasks, the training examples (be they vectors of words, a set of car movements generated by on-board controls, or configurations of A, G, T, and C molecules) are essentially multiple time-dependent data points of possibly varying length.

Sentences, for example, are composed of words, and the spatial configuration of these words alludes not only...

Using RNNs for sequential modeling

The field of natural language understanding is a common area where recurrent neural networks (RNNs) tend to excel. You may imagine tasks such as recognizing named entities and classifying the predominant sentiment in a given piece of text. However, as we mentioned, RNNs are applicable to a broad spectrum of tasks that involve modeling time-dependent sequences of data. Generating music is also a sequence modeling task as we tend to distinguish music from a cacophony by modeling the sequence of notes that are played in a given tempo.

RNN architectures are even applicable for some visual intelligence tasks, such as video activity recognition. Recognizing whether a person is cooking, running, or robbing a bank in a given video is essentially modeling sequences of human movements and matching them to specific classes. In fact, RNNs have been deployed...

Summarizing different types of sequence processing tasks

Now, we have familiarized ourselves with the basic idea of what a recurrent layer does and have gone over some specific use cases (from speech recognition and machine translation to image captioning) where variations of such time-dependent models may be used. The following diagram provides a visual summary of some of the sequential tasks we discussed, along with the type of RNN that's suited for the job:

Next, we will dive deeper into the governing equations, as well as the learning mechanism behind RNNs.

How do RNNs learn?

As we saw previously, for virtually all neural nets, you can break down the learning mechanism into two separate parts. The forward...

Predicting an output per time step

Next, we will look at the equation that leverages the activation value we just calculated to produce a prediction, $\hat{y}^{(t)}$, at the given time step (t). This is represented like so:

$$\hat{y}^{(t)} = g\left(W_{ya}\, a^{(t)} + b_y\right)$$

This tells us that our layer's prediction at a given time step is determined by computing the dot product of yet another temporally shared output weight matrix, $W_{ya}$, with the activation output $a^{(t)}$ we just computed using the earlier equation, adding an output bias $b_y$, and passing the result through an output activation function $g$.
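To make these two equations concrete, here is a minimal NumPy sketch of a single forward step through a recurrent layer. The dimensions are illustrative, and the output activation is taken to be a softmax, as used later in this chapter; this is a sketch, not the book's implementation:

import numpy as np

n_x, n_a, n_y = 44, 128, 44                  # illustrative input, hidden, and output sizes

# Temporally shared parameters (randomly initialized for the sketch)
Wax = np.random.randn(n_a, n_x) * 0.01       # input-to-activation weights
Waa = np.random.randn(n_a, n_a) * 0.01       # activation-to-activation (recurrent) weights
Wya = np.random.randn(n_y, n_a) * 0.01       # activation-to-output weights
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_step(x_t, a_prev):
    """One time step: returns the new activation a_t and the prediction y_hat_t."""
    a_t = np.tanh(Wax @ x_t + Waa @ a_prev + ba)
    y_hat_t = softmax(Wya @ a_t + by)
    return a_t, y_hat_t

x_t = np.zeros((n_x, 1)); x_t[3] = 1         # a one-hot encoded input character
a_prev = np.zeros((n_a, 1))                  # initial activation
a_t, y_hat_t = rnn_step(x_t, a_prev)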

Due to the sharing of the weight parameters, information from previous time steps is preserved and passed through the recurrent layer to inform the current prediction. For example, the prediction at time step three leverages information from the previous time steps, as shown by the green arrow here:

To formalize these computations, we mathematically show the relation between the predicted output at...

Backpropagation through time

Essentially, we are backpropagating our errors through several time steps, reflecting the length of a sequence. As we know, the first thing we need in order to backpropagate our errors is a loss function. We can use any variation of the cross-entropy loss, depending on whether we are performing a binary task per sequence (that is, entity or not, per word, using binary cross-entropy) or a categorical one (that is, the next word out of the category of words in our vocabulary, using categorical cross-entropy). The loss function here computes the cross-entropy loss between a predicted output $\hat{y}^{(t)}$ and the actual value $y^{(t)}$ at time step $t$:

$$\mathcal{L}^{(t)}\left(\hat{y}^{(t)}, y^{(t)}\right) = -\left[\, y^{(t)} \log \hat{y}^{(t)} + \left(1 - y^{(t)}\right) \log\left(1 - \hat{y}^{(t)}\right) \right]$$

This function essentially lets us perform an element-wise loss computation of each predicted and actual output, at each time step for our recurrent layer. Hence, we generate a loss value at each prediction...
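These per-time-step losses are then summed over the whole sequence to give the overall loss whose gradients are backpropagated through time:

$$\mathcal{L}\left(\hat{y}, y\right) = \sum_{t=1}^{T_y} \mathcal{L}^{(t)}\left(\hat{y}^{(t)}, y^{(t)}\right)$$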

Exploding and vanishing gradients

Backpropagating the model's errors in a deep neural network, however, comes with its own complexities. This holds equally true for RNNs, which face their own versions of the vanishing and exploding gradient problems. As we discussed earlier, the activation of neurons at a given time step depends on the following equation:

$$a^{(t)} = \tanh\left(W_{ax}\, x^{(t)} + W_{aa}\, a^{(t-1)} + b_a\right)$$

We saw how $W_{ax}$ and $W_{aa}$ are two separate weight matrices that the RNN layer shares through time. These matrices are multiplied by the input at the current time step and by the activation from the previous time step, respectively. The resulting dot products are then summed, along with a bias term, and passed through a tanh activation function to compute the activation of the neurons at the current time step (t). We then used this activation matrix to compute the predicted output at the current time step, $\hat{y}^{(t)}$, before...
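One common, practical way to keep the exploding side of this problem in check is gradient clipping, which Keras exposes directly on its optimizers. The following is a brief sketch; the clipping thresholds and learning rate are illustrative values, not ones prescribed by this chapter:

from keras.optimizers import RMSprop

# Rescale the gradients whenever their overall norm exceeds 1.0, so that a single
# unusually large gradient cannot destabilize the weight update
clipped_optimizer = RMSprop(lr=0.01, clipnorm=1.0)

# Alternatively, clip each gradient element to the range [-0.5, 0.5]:
# clipped_optimizer = RMSprop(lr=0.01, clipvalue=0.5)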

GRUs

The GRU can be considered the younger sibling of the LSTM, which we will look at in Chapter 6, Long Short-Term Memory Networks. In essence, both leverage similar concepts to model long-term dependencies, such as remembering whether the subject of a sentence is plural when generating the sequences that follow. Soon, we will see how memory cells and flow gates can be used to address the vanishing gradient problem, while better modeling long-term dependencies in sequence data. The underlying difference between GRUs and LSTMs lies in their computational complexity. Simply put, LSTMs are more complex architectures that, while computationally expensive and time-consuming to train, perform very well at breaking down the training data into meaningful and generalizable representations. GRUs, on the other hand, while computationally less intensive, are limited in their representational...
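One concrete way to appreciate this difference in complexity is to compare the parameter counts of a GRU layer and an LSTM layer of the same width. The following is a quick sketch; the layer sizes are illustrative and simply mirror the models built later in this chapter:

from keras.models import Sequential
from keras.layers import GRU, LSTM

def parameter_count(layer_class, units=128, timesteps=40, features=44):
    # Build a one-layer model just to let Keras compute the parameter count
    model = Sequential()
    model.add(layer_class(units, input_shape=(timesteps, features)))
    return model.count_params()

# The LSTM maintains four sets of gate weights to the GRU's three, so it carries
# roughly a third more parameters for the same layer width
print('GRU parameters: ', parameter_count(GRU))
print('LSTM parameters:', parameter_count(LSTM))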

Building character-level language models in Keras

Now, we have a good command of the basic learning mechanisms of different types of RNNs, both simple and complex. We also know a bit about different sequence processing use cases, as well as the different RNN architectures that allow us to model these sequences. Let's combine all of this knowledge and put it to use. Next up, we will test these different models on a hands-on task and see how each of them does.

We will explore the simple use case of building a character-level language model, much like the autocorrect models almost everybody is familiar with, implemented in word processor applications on almost all devices. A key difference will be that we will train our RNN to derive a language model from Shakespeare's Hamlet. Hence, our network will take a sequence of characters from Shakespeare's Hamlet as input...
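To give a sense of what this involves, the following is a minimal sketch of how such training data could be prepared; the file path and sampling stride are assumptions here, and the 40-character window simply matches the sequence length used later in this chapter:

import numpy as np

# Hypothetical path to a plain-text copy of Hamlet; point this at your own file
text = open('hamlet.txt').read().lower()

characters = sorted(set(text))                     # the unique characters in the corpus
char_to_idx = {c: i for i, c in enumerate(characters)}

seq_len, step = 40, 3                              # 40-character windows, sampled every 3 characters
sequences, next_chars = [], []
for i in range(0, len(text) - seq_len, step):
    sequences.append(text[i: i + seq_len])         # the input sequence
    next_chars.append(text[i + seq_len])           # the character the model should predict

# One-hot encode the inputs (x) and the targets (y)
x = np.zeros((len(sequences), seq_len, len(characters)), dtype=np.bool_)
y = np.zeros((len(sequences), len(characters)), dtype=np.bool_)
for i, seq in enumerate(sequences):
    for t, char in enumerate(seq):
        x[i, t, char_to_idx[char]] = 1
    y[i, char_to_idx[next_chars[i]]] = 1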

Statistics of character modeling

We often treat words and numbers as belonging to different realms. As it happens, they are not so far apart: everything can be deconstructed using the universal language of mathematics. This is quite a fortunate property of our reality, and not just for the pleasure of modeling statistical distributions over sequences of characters. However, since we are on the topic, we will go ahead and define the concept of a language model. In essence, language models follow Bayesian logic, relating the probability of posterior events (tokens to come) to prior occurrences (tokens that came before). With such an assumption, we are able to construct a feature space corresponding to the statistical distribution of words over a period of time. The RNNs we will build shortly will each construct a unique feature space of probability distributions. Then...
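Formally, a character-level language model factorizes the probability of a whole sequence of characters into a product of conditional probabilities, each conditioned on the characters that came before it:

$$P(c_1, c_2, \ldots, c_T) = \prod_{t=1}^{T} P\left(c_t \mid c_1, \ldots, c_{t-1}\right)$$

It is this conditional distribution over the next character that our recurrent networks will learn to approximate.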

The purpose of controlling stochasticity

The main concept behind sampling is how you choose to control stochasticity (or randomness) when selecting the next character from the probability distribution over the possible characters to come. Various applications may call for different approaches.

Greedy sampling

If you are trying to train an RNN for automatic text completion and correction, you will probably be better off going with a greedy sampling strategy. This simply means that, at each sampling step, you will choose the next character in the sequence based on the character that was assigned the highest probability by our softmax output. This ensures that your network will output predictions that likely correspond to...
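A common way to implement this choice is a small helper that reweights the softmax output by a temperature before drawing the next character. The version below is a sketch adapted from standard Keras text-generation examples, so the exact helper used in this book may differ:

import numpy as np

def sample(preds, temperature=1.0):
    # Reweight the softmax output: temperatures close to 0 approach greedy sampling
    # (always the most likely character), while higher temperatures add randomness
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.argmax(np.random.multinomial(1, preds, 1))

# Purely greedy sampling is simply the argmax of the softmax output:
# next_index = np.argmax(preds)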

Testing different RNN models

Now that we have our training data preprocessed and ready in tensor format, we can try a slightly different approach than in previous chapters. Normally, we would go ahead and build a single model and then proceed to train it. Instead, we will construct several models, each reflecting a different RNN architecture, and train them successively to see how each of them does at the task of generating character-level sequences. In essence, each of these models will leverage a different learning mechanism and induce its own language model, based on the sequences of characters it sees. Then, we can sample the language models that are learned by each network. In fact, we can even sample our networks between training epochs to see how each network is doing at generating Shakespearean phrases at the level of each epoch. Before we continue to build our networks, we...

Building a SimpleRNN

The SimpleRNN model in Keras is a basic RNN layer, like the ones we discussed earlier. While it has many parameters, most of them are set with excellent defaults that will get you by for many different use cases. Since we have initialized the RNN layer as the first layer of our model, we must pass it an input shape, corresponding to the length of each sequence (which we chose to be 40 characters earlier) and the number of unique characters in our dataset (which was 44). While this model is computationally compact to run, it gravely suffers from the vanishing gradients problem we spoke of. As a result, it has some trouble modeling long-term dependencies:

from keras.models import Sequential
from keras.layers import Dense, Bidirectional, Dropout
from keras.layers import SimpleRNN, GRU, BatchNormalization
from keras.optimizers import RMSprop
'''Fun...
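As a point of reference, a minimal SimpleRNN version of such a model could look as follows. This is a sketch using the imports above and the seq_len and characters variables described in the text, not necessarily the exact implementation used in the book:

def SimpleRNN_model():
    model = Sequential()
    # A single recurrent layer followed by a softmax over the character vocabulary
    model.add(SimpleRNN(128, input_shape=(seq_len, len(characters))))
    model.add(Dense(len(characters), activation='softmax'))
    return model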

Building GRUs

Excellent at mitigating the vanishing gradients problem, the GRU is a good choice for modeling long-term dependencies such as grammar, punctuation, and word morphology:

def GRU_stacked_model():
    model = Sequential()
    model.add(GRU(128, input_shape=(seq_len, len(characters)), return_sequences=True))
    model.add(GRU(128))
    model.add(Dense(len(characters), activation='softmax'))
    return model

Just like the SimpleRNN, we define the dimensions of the input at the first layer and return a 3D tensor output to the second GRU layer, which will help retain more complex time-dependent representations that are present in our training data. We also stack two GRU layers on top of each other to see what the increased representational power of our model produces:

Hopefully, this architecture results in realistic albeit novel sequences of text that even a...
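Whichever of these architectures we pick, compiling and fitting follows the usual Keras pattern. The snippet below is a sketch; the learning rate, batch size, and epoch count are illustrative values rather than the ones used for the book's experiments:

model = GRU_stacked_model()
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01))

# x and y are the one-hot encoded sequences and next-character targets prepared earlier
model.fit(x, y, batch_size=128, epochs=20)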

On processing reality sequentially

The notion of changing the order of processing a sequence is quite an intriguing one. We humans certainly seem to prefer a certain order of learning things over another. The second sentence that's been reproduced in the following image simply makes no sense to us, even though we know exactly what each individual word within the sentence means. Similarly, many of us have a hard time reciting the letters of the alphabet backward, even though we are extremely familiar with each letter, and compose much more complex concepts with them, such as words, ideas, and even Keras code:

It is very likely that our sequential preferences have to do with the nature of our reality, which is sequential and forward-moving by definition. At the end of the day, the configuration of the 10^11 neurons in our brain has been engineered by time and natural forces...

Bi-directional layer in Keras

Therefore, the bi-directional layer in Keras processes a sequence of data in both its normal and reversed order, which allows us to pick up on words that come later in the sequence to inform our prediction at the current time step.

Essentially, the bi-directional layer duplicates any layer that's fed to it and uses one copy to process information in the normal sequential order, while the other processes data in the reverse order. Pretty neat, no? We can intuitively visualize what a bi-directional layer actually does by going through a simple example. Suppose you were modeling the two-word sequence Whats up, with a bi-directional GRU:

To do this, you will nest the GRU in a bi-directional layer, which allows Keras to generate the two versions of the model described previously. In the preceding image, we stacked two bi-directional layers on top of each...
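A minimal sketch of how such a stacked bi-directional GRU could be expressed in Keras is shown below, assuming the same seq_len and characters variables as before; the layer widths are illustrative:

from keras.models import Sequential
from keras.layers import Dense, GRU, Bidirectional

def bidirectional_GRU_model():
    model = Sequential()
    # Each Bidirectional wrapper duplicates its GRU: one copy reads the sequence
    # forward, the other reads it in reverse, and their outputs are concatenated
    model.add(Bidirectional(GRU(128, return_sequences=True),
                            input_shape=(seq_len, len(characters))))
    model.add(Bidirectional(GRU(128)))
    model.add(Dense(len(characters), activation='softmax'))
    return model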

Visualizing output values

For the sake of entertainment, we will display some of the more interesting results from our own training experiments to conclude this chapter. The first screenshot shows the output generated by our SimpleRNN model at the end of the first epoch (note that the output prints the first epoch as epoch 0; this is simply an implementation detail, denoting the first index position in a range of n epochs). As we can see, even after the very first epoch, the SimpleRNN seems to have picked up on word morphology and generates real English words at low sampling thresholds.

This is just as we expected. Similarly, higher-entropy samples (with a threshold of 1.2, for example) produce more stochastic results and generate (from a subjective perspective) interesting-sounding words (such as eresdoin, harereus, and nimhte):

...

Summary

In this chapter, we learned about recurrent neural networks and their aptness at processing sequential, time-dependent data. The concepts that you have learned can now be applied to any time-series dataset that you may stumble upon. While this holds true for use cases such as stock market data and other naturally time-series data, it would be unreasonable to expect fantastic results from feeding your network real-time price changes alone. This is simply because the elements that affect the market price of stocks (such as investor perception, information networks, and available resources) are not reflected in such data to nearly the level that would allow proper statistical modeling. The key is to represent all relevant information in the most learnable manner possible, so that your network can successfully encode valuable representations from it.

While we did extensively explore the learning mechanisms...

Further reading

Exercise

  • Train each model on the Hamlet text and use their history objects to compare their relative losses. Which one converges faster? What do they learn? (A starting sketch follows this list.)
  • Examine the samples that are generated at different entropy distributions, at each epoch, to see how each RNN improves upon its language model through time.
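As a starting point for the first exercise, one possible way to compare the training curves is sketched below; the history variable names are hypothetical placeholders for the History objects returned by each model's fit() call:

import matplotlib.pyplot as plt

# Hypothetical History objects returned by fit() for each trained model
histories = {'SimpleRNN': simple_history, 'GRU': gru_history, 'Bi-GRU': bi_gru_history}

for name, history in histories.items():
    plt.plot(history.history['loss'], label=name)

plt.xlabel('Epoch')
plt.ylabel('Training loss')
plt.legend()
plt.show()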