You're reading from Recurrent Neural Networks with Python Quick Start Guide

Product typeBook

Published inNov 2018

Reading LevelIntermediate

PublisherPackt

ISBN-139781789132335

Edition1st Edition

Languages

Python

Tools

TensorFlow

Concepts

Neural Networks

Author (1)

Simeon Kostadinov

Understanding how recurrent neural networks work

With the use of a memory state, the RNN architecture perfectly addresses every sequence-based problem. In this section of the chapter, we will go over a full explanation of how this works. You will obtain knowledge about the general characteristics of a neural network as well as what makes RNNs special. This section emphasizes on the theoretical side (including mathematical equations), but I can assure you that once you grasp the fundamentals, any practical example will go smoothly.

To make the explanations understandable, let's discuss the task of generating text and, in particular, producing a new chapter based on one of my favorite book series, The Hunger Games, by Suzanne Collins.

Basic neural network overview

At the highest level, a neural network, which solves supervised problems, works as follows:

Obtain training data (such as images for image recognition or sentences for generating text)
Encode the data (neural networks work with numbers so a numeric representation of the data is required)
Build the architecture of your neural network model
Train the model until you are satisfied with the results
Evaluate your model by making a fresh new prediction

Let's see how these steps are applied for an RNN.

Obtaining data

For the problem of generating a new book chapter based on the book series The Hunger Games, you can extract the text from all books in The Hunger Games series (The Hunger Games, Mockingjay,and Catching Fire) by copying and pasting it. To do that, you need to find the books, content online.

Encoding the data

We use word embeddings (https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/) for this purpose. Word embedding is a collective name of all techniques where words or phrases from a vocabulary are mapped to vectors of real numbers. Some methods include one-hot encoding, word2vec, and GloVe. You will learn more about them in the forthcoming chapters.

Building the architecture

Each neural network consists of three sets of layers—input, hidden, and output. There is always one input and one output layer. If the neural network is deep, it has multiple hidden layers:

The difference between an RNN and the standard feedforward network comes in the cyclical hidden states. As seen in the following diagram, recurrent neural networks use cyclical hidden states. This way, data propagates from one time step to another, making each one of these steps dependent on the previous:

A common practice is to unfold the preceding diagram for better and more fluent understanding. After rotating the illustration vertically and adding some notations and labels, based on the example we picked earlier (generating a new chapter based on The Hunger Games books), we end up with the following diagram:

This is an unfolded RNN with one hidden layer. The identically looking sets of (input + hidden RNN unit + output) are actually the different time steps (or cycles) in the RNN. For example, the combination of + RNN + illustrates what is happening at time step . At each time step, these operations perform as follows:

The network encodes the word at the current time step (for example, t-1) using any of the word embedding techniques and produces a vector (The produced vector can be or depending on the specific time step)
Then, , the encoded version of the input word I at time step t-1, is plugged into the RNN cell (located in the hidden layer). After several equations (not displayed here but happening inside the RNN cell), the cell produces an output and a memory state . The memory state is the result of the input and the previous value of that memory state . For the initial time step, one can assume that is a zero vector
Producing the actual word (volunteer) at time step t-1 happens after decoding the output using a text corpus specified at the beginning of the training
Finally, the network moves multiple time steps forward until reaching the final step where it predicts the word

You can see how each one of {…, , , , …} holds information about all the previous inputs. This makes RNNs very special and really good at predicting the next unit in a sequence. Let's now see what mathematical equations sit behind the preceding operations.

Text corpus—an array of all words in the example vocabulary.

Training the model

All the magic in this model lies behind the RNN cells. In our simple example, each cell presents the same equations, just with a different set of variables. A detailed version of a single cell looks like this:

First, let's explain the new terms that appear in the preceding diagram:

Weights (, , ): A weight is a matrix (or a number) that represents the strength of the value it is applied to. For example, determines how much of the input should be considered in the following equations.
If consists of high values, then should have significant influence on the end result. The weight values are often initialized randomly or with a distribution (such as normal/Gaussian distribution). It is important to be noted that , , and are the same for each step. Using the backpropagation algorithm, they are being modified with the aim of producing accurate predictions

Biases (, ): An offset vector (different for each layer), which adds a change to the value of the output
Activation function (tanh): This determines the final value of the current memory state and the output . Basically, the activation functions map the resultant values of several equations similar to the following ones into a desired range: (-1, 1) if we are using the tanh function, (0, 1) if we are using sigmoid function, and (0, +infinity) if we are using ReLu (https://ai.stackexchange.com/questions/5493/what-is-the-purpose-of-an-activation-function-in-neural-networks)

Now, let's go over the process of computing the variables. To calculate and , we can do the following:

As you can see, the memory state is a result of the previous value and the input . Using this formula helps in retaining information about all the previous states.

The input is a one-hot representation of the word volunteer. Recall from before that one-hot encoding is a type of word embedding. If the text corpus consists of 20,000 unique words and volunteer is the 19th word, then is a 20,000-dimensional vector where all elements are 0 except the one at the 19^th position, which has a value of 1, which suggests that we only taking into account this particular word.

The sum between , , and is passed to the tanh activation function, which squashes the result between -1 and 1 using the following formula:

In this, e = 2.71828 (Euler's number) and z is any real number.

The output at time step t is calculated using and the softmax function. This function can be categorized as an activation with the exception that its primary usage is at the output layer when a probability distribution is needed. For example, predicting the correct outcome in a classification problem can be achieved by picking the highest probable value from a vector where all the elements sum up to 1. Softmax produces this vector, as follows:

In this, e = 2.71828 (Euler's number) and z is a K-dimensional vector. The formula calculates probability for the value at the i^th position in the vector z.

After applying the softmax function, becomes a vector of the same dimension as (the corpus size 20,000) with all its elements having a total sum of 1. With that in mind, finding the predicted word from the text corpus becomes straightforward.

Evaluating the model

Once an assumption for the next word in the sequence is made, we need to assess how good this prediction is. To do that, we need to compare the predicted word with the actual word from the training data (let's call it ). This operation can be accomplished using a loss (cost) function. These types of functions aim to find the error between predicted and actual values. Our choice will be the cross-entropy loss function, which looks like this:

Since we are not going to give a detailed explanation of this formula, you can treat it as a black box. If you are curious about how it works, I recommend reading the article Improving the way neural networks work by Michael Nielson (http://neuralnetworksanddeeplearning.com/chap3.html#introducing_the_cross-entropy_cost_function). A useful thing to know is that the cross-entropy function performs really well on classification problems.

After computing the error, we came to one of the most complex and, at the same time, powerful techniques in deep learning, called backpropagation.

In simple terms, we can state that the backpropagation algorithm traverses backward through all (or several) time steps while updating the weights and biases of the network. After repeating this procedure, and a certain amount of training steps, the network learns the correct parameters and will be able to yield better predictions.

To clear out any confusion, training and time steps are completely different terms. In one time step, we get a single element from the sequence and predict the next one. A training step is composed of multiple time steps where the number of time steps depends on how large the sequence for this training step is. In addition, time steps are only used in RNNs, but training ones are a general neural network concept.

After each training step, we can see that the value from the loss function decreases. Once it crosses a certain threshold, we can state that the network has successfully learned to predict new words in the text.

The last step is to generate the new chapter. This can happen by choosing a random word as a start (such as: games) and then predicting the next words using the preceding formulas with the pre-trained weights and biases. Finally, we should end up with somewhat meaningful text.

You have been reading a chapter from

Recurrent Neural Networks with Python Quick Start Guide

Published in: Nov 2018Publisher: PacktISBN-13: 9781789132335

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Simeon Kostadinov

Simeon Kostadinoff works for a startup called Speechify which aims to help people go through their readings faster by converting any text into speech. Simeon is Machine Learning enthusiast who writes a blog and works on various projects on the side. He enjoys reading different research papers and implement some of them in code. He was ranked number 1 in mathematics during his senior year of high school and thus he has deep passion about understanding how the deep learning models work under the hood. His specific knowledge in Recurrent Neural Networks comes from several courses that he has taken at Stanford University and University of Birmingham. They helped in understanding how to apply his theoretical knowledge into practice and build powerful models. In addition, he recently became a Stanford Scholar Initiative which includes working in a team of Machine Learning researchers on a specific deep learning research paper.
Read more about Simeon Kostadinov

Other recommended products

Related to this chapter

Hands-On Deep Learning with TensorFlow

With deep learning going mainstream, making sense of data and getting accurate results using deep networks is possible. Dan Van Boxel is your guide to exploring the possibilities with deep learning; he will enable you to understand data like never before. With the efficiency and simplicity of TensorFlow, you will be able to process your data and gain insights that will change how you look at data.

BookJul 2017174 pages

Deep Learning with Microsoft Cognitive Toolkit Quick Start Guide

Cognitive Toolkit is one of the most popular and recently open sourced deep learning toolkit by Microsoft. Cognitive Toolkit is used to train fast and effective deep learning models. This book will be a quick introduction to using Cognitive Toolkit and will teach you how to train and validate different types of neural networks.

BookMar 2019208 pages

Deep Learning with Theano

This book covers a complete overview of Deep Learning with Theano, a Python-based library that makes optimizing numerical expressions easy. Practical code examples address supervised, unsupervised, generative and reinforcement learning for image recognition, natural language processing, or game strategy, with best performing nets and principles.

BookJul 2017300 pages

Natural Language Processing with TensorFlow

TensorFlow is the leading framework for deep learning algorithms critical to artificial intelligence, and natural language processing (NLP) makes much of the data used by deep learning applications accessible to them. This book brings the two together and teaches deep learning developers how to work with today’s vast amount of unstructured data.

BookMay 2018472 pages

Deep Learning with Hadoop

BookFeb 2017206 pages

Mastering TensorFlow 1.x

We cover advanced deep learning concepts (such as transfer learning, generative adversarial models, and reinforcement learning), and implement them using TensorFlow and Keras. We cover how to build and deploy at scale with distributed models. You will learn to build TensorFlow models using R, Keras, TensorFlow Learn, TensorFlow Slim and Sonnet

BookJan 2018474 pages

Deep Learning for Natural Language Processing

Starting with the basics, this book teaches you how to choose from the various text pre-processing techniques and select the best model from the several neural network architectures for NLP issues.

BookJun 2019372 pages

Neural Network Programming with Tensorflow

If you’re aware of the buzz surrounding the terms such as machine learning, artificial intelligence or deep learning, you might know what neural networks are. TensorFlow is a popular framework which can be used to implement efficient neural networks and deep learning models. This book will show you how to leverage the power of TensorFlow to train efficient neural networks. You will start with understanding the fundamentals and basic math for neural networks and why TensorFlow is a popular choice of tool for programming neural networks. During the course of the book, you will be working on real-world datasets to get a hands-on understanding of neural network programming. By the end of this book, you will have a fair understanding of how you can leverage the power of TensorFlow to train neural networks of varying complexities, without any hassle. While you are learning about various neural network implementations you will learn the underlying mathematics and linear algebra and how it maps to the appropriate TensorFlow constructs.

BookNov 2017274 pages

Neural Networks with Keras Cookbook

This book presents solutions to the majority of the challenges you will face while training neural networks to solve deep learning problems. It covers the trending deep learning architectures used in industry and tackles a variety of use cases in computer vision, text processing, audio analysis, recommender systems, and game bots

BookFeb 2019568 pages

Hands-On Natural Language Processing with PyTorch 1.x

Developers working with NLP will be able to put their knowledge to work with this practical guide to PyTorch. You will learn to use PyTorch offerings and how to understand and analyze text using Python. You will learn to extract the underlying meaning in the text using deep neural networks and modern deep learning algorithms.

BookJul 2020276 pages

Hands-On Deep Learning Algorithms with Python

This book introduces basic-to-advanced deep learning algorithms used in a production environment by AI researchers and principal data scientists; it explains algorithms intuitively, including the underlying math, and shows how to implement them using popular Python-based deep learning libraries such as TensorFlow.

BookJul 2019512 pages

Hands-On Natural Language Processing with Python

This book teaches you to leverage deep learning models in performing various NLP tasks along with showcasing the best practices in dealing with the NLP challenges. The book equips you with practical knowledge to implement deep learning in your linguistic applications using NLTk and Python's popular deep learning library, TensorFlow.

BookJul 2018312 pages

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages