You're reading from Deep Learning with TensorFlow

Product type: Book
Published in: Apr 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786469786
Edition: 1st Edition
Authors (3):
Giancarlo Zaccone

Giancarlo Zaccone has over fifteen years' experience of managing research projects in the scientific and industrial domains. He is a software and systems engineer at the European Space Agency (ESTEC), where he mainly deals with the cybersecurity of satellite navigation systems. Giancarlo holds a master's degree in physics and an advanced master's degree in scientific computing. Giancarlo has already authored the following titles, available from Packt: Python Parallel Programming Cookbook (First Edition), Getting Started with TensorFlow, Deep Learning with TensorFlow (First Edition), and Deep Learning with TensorFlow (Second Edition).

Md. Rezaul Karim

Md. Rezaul Karim is a researcher, author, and data science enthusiast with a strong computer science background, coupled with 10 years of research and development experience in machine learning, deep learning, and data mining algorithms to solve emerging bioinformatics research problems by making them explainable. He is passionate about applied machine learning, knowledge graphs, and explainable artificial intelligence (XAI). Currently, he is working as a research scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a researcher at the Insight Centre for Data Analytics, Ireland. Previously, he worked as a lead software engineer at Samsung Electronics, Korea.

Ahmed Menshawy

Ahmed Menshawy is a Research Engineer at Trinity College Dublin, Ireland. He has more than 5 years of working experience in the areas of ML and NLP. He holds an MSc in Advanced Computer Science. He started his career as a teaching assistant at the Department of Computer Science, Helwan University, Cairo, Egypt, where he taught several advanced courses, including machine learning and image processing. He was involved in implementing a state-of-the-art system for Arabic text-to-speech, and he was the main ML specialist at the industrial research and development lab at IST Networks, based in Egypt.


Reinforcement Learning

Reinforcement Learning is based on an interesting psychological theory:

Applying a reward immediately after the occurrence of a response increases its probability of reoccurring, while providing punishment after the response will decrease the probability (Thorndike, 1911).

A reward received immediately after the execution of a correct behavior increases the likelihood that this behavior will be repeated, while a punishment applied after an undesired behavior decreases the likelihood of that error recurring. Therefore, once a goal has been established, Reinforcement Learning seeks to maximize the rewards received in order to achieve the designated goal.

RL finds applications in different contexts in which supervised learning is inefficient.

A very short list includes the following:

  • Advertising: helps in learning to rank, using one-shot learning for emerging items, and new users will...

Basic concepts of Reinforcement Learning

Reinforcement Learning (RL) aims to create systems that will learn and, at the same time, adapt to changes in the environment in which they are located, using a reward that is assigned to each action performed.

Software systems that process information in this way are called intelligent agents.

These agents decide to take an action based on the following:

  • State of the system
  • Learning algorithm used

To change the system state and maximize its long-term rewards, an agent selects the action to be performed by continuously monitoring its environment.

To obtain a large reward and, therefore, optimize the Reinforcement Learning procedure, the agent must prefer actions that, in the past, have produced a good reward.

Actions are discovered by trying those that have never been selected before. Therefore, the agent must exploit what it already knows, both to obtain the maximum reward, and also...
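
A common way to balance this trade-off between exploring new actions and exploiting known ones is an epsilon-greedy rule. The following is a minimal sketch; the function name and the epsilon value are illustrative only and do not come from the book:

import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon, explore: pick a random action
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    # Otherwise, exploit: pick the action with the highest estimated value
    return int(np.argmax(q_values))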

Q-learning algorithm

Solving a Reinforcement Learning problem requires estimating, during the learning process, an evaluation function. This function must be able to assess, through the sum of the rewards, how good (or otherwise) a policy is. The basic idea of Q-learning is that the algorithm learns the optimal evaluation function over the whole space of states and actions (S × A).

The so-called Q-function provides a mapping of the form Q: S × A => V, where V is the value of the future rewards of an action a ∈ A executed in the state s ∈ S.

Once it has learned the optimal function Q, the agent will, of course, be able to recognize which action will lead to the highest future reward in a given state s.

One of the most common ways of implementing the Q-learning algorithm involves the use of a table. Each cell of the table is a value Q(s, a) = V, initialized to 0.
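
For reference, the rule generally used to update these table entries after taking action a in state s, receiving reward r, and landing in state s' is the standard Q-learning update, where α is the learning rate and γ is the discount factor (the formula itself is not shown in this excerpt):

Q(s, a) ← Q(s, a) + α [r + γ · max_a' Q(s', a') − Q(s, a)]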

The agent can perform any action a ∈ A, where A is...

Introducing the OpenAI Gym framework

To implement a Q-learning algorithm, we'll use the OpenAI Gym framework, a TensorFlow-compatible toolkit for developing and comparing Reinforcement Learning algorithms.

OpenAI Gym consists of two main parts:

  • The Gym open source library: A collection of problems and environments that can be used to test Reinforcement Learning algorithms. All these environments have a shared interface, allowing you to write RL algorithms.
  • The OpenAI Gym service: A site and API allowing people to meaningfully compare the performance of their trained agents.
See more references at https://gym.openai.com.

To get started, you'll need to have Python 2.7 or Python 3.5. To install Gym, use the pip installer:

sudo pip install gym

Once installed, you can list Gym's environments as follows:

>>> from gym import envs 
>>> print(envs.registry.all())

The output list...
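
Before moving on to FrozenLake, the following minimal sketch shows how the shared interface of a Gym environment is typically used; CartPole-v0 is just an arbitrary example environment, and the loop length is illustrative:

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
for _ in range(100):
    # Sample a random action from the environment's action space
    action = env.action_space.sample()
    # step() returns the next observation, the reward, a done flag, and debug info
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()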

FrozenLake-v0 implementation problem

Here we report a basic Q-learning implementation for the FrozenLake-v0 problem.

Import the following two basic libraries:

import gym 
import numpy as np

Then, we load the FrozenLake-v0 environment:

environment = gym.make('FrozenLake-v0')

Then, we build the Q-learning table; it has dimensions S × A, where S is the size of the observation space and A is the size of the action space:

S = environment.observation_space.n 
A = environment.action_space.n

The FrozenLake environment provides a state for each block, and four actions (that is, the four directions of movement), giving us a 16x4 table of Q-values to initialize:

Q = np.zeros([S,A])

Then, we define the α (alpha) parameter for the training rule and the discount factor γ (gamma):

alpha = .85 
gamma = .99

We fix the total number of episodes (trials):

num_episodes = 2000

Then, we initialize the rList, where we...
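
The excerpt is cut off at this point. As a rough sketch, and assuming that rList collects the total reward obtained in each episode (the inner-loop variable names below are our own, not necessarily the book's), the training loop typically looks like this:

rList = []
for i in range(num_episodes):
    s = environment.reset()
    rAll = 0
    done = False
    j = 0
    while j < 99:
        j += 1
        # Choose an action greedily from the Q-table, with decaying random noise for exploration
        a = np.argmax(Q[s, :] + np.random.randn(1, A) * (1. / (i + 1)))
        # Apply the action and observe the new state, reward, and termination flag
        s1, r, done, _ = environment.step(a)
        # Update the Q-table entry with the Q-learning rule
        Q[s, a] = Q[s, a] + alpha * (r + gamma * np.max(Q[s1, :]) - Q[s, a])
        rAll += r
        s = s1
        if done:
            break
    rList.append(rAll)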

Q-learning with TensorFlow

In the previous example, we saw how it is relatively simple, using a 16x4 grid, to update the Q-table at each step of the learning process. It is easy to imagine that the use of this table can serve for simple problems, but in real-world problems, we need a more sophisticated mechanism to update the system state. This is the point where deep learning steps in. Neural networks are exceptionally good at coming up with good features for highly structured data.

In this final section, we'll look at how to manage a Q-function with a neural network, which takes the state and action as input, and outputs the corresponding Q-value.

To do that, we'll build a one-layer network that takes the state, encoded as a [1x16] vector, and learns the best move (action) by mapping the possible actions onto a vector of length four.

A recent application of deep Q-networks has been successful at playing...

Source code for the Q-learning neural network

The following is the full code for the example shown previously:

import gym 
import numpy as np
import random
import tensorflow as tf
import matplotlib.pyplot as plt

#Define the FrozenLake environment
env = gym.make('FrozenLake-v0')

#Set up the TensorFlow placeholders and variables
tf.reset_default_graph()
inputs1 = tf.placeholder(shape=[1,16],dtype=tf.float32)
W = tf.Variable(tf.random_uniform([16,4],0,0.01))
Qout = tf.matmul(inputs1,W)
predict = tf.argmax(Qout,1)
nextQ = tf.placeholder(shape=[1,4],dtype=tf.float32)

#define the loss and optimization functions
loss = tf.reduce_sum(tf.square(nextQ - Qout))
trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
updateModel = trainer.minimize(loss)

#Initialize the variables
init = tf.global_variables_initializer()

#prepare the q-learning parameters
gamma = .99
e = 0.1
num_episodes = 6000
jList = []
rList...
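
The listing is truncated at this point. The following is a hedged sketch of how the training loop typically continues, assuming that jList and rList track the number of steps and the total reward per episode; the remaining variable names are our own, not necessarily the book's:

rList = []  # assumed: list of total rewards per episode
with tf.Session() as sess:
    sess.run(init)
    for i in range(num_episodes):
        s = env.reset()
        rAll = 0
        done = False
        j = 0
        while j < 99:
            j += 1
            # Choose an action greedily (with probability e of a random action) from the Q-network
            a, allQ = sess.run([predict, Qout], feed_dict={inputs1: np.identity(16)[s:s+1]})
            if np.random.rand(1) < e:
                a[0] = env.action_space.sample()
            # Apply the action and observe the new state, reward, and termination flag
            s1, r, done, _ = env.step(a[0])
            # Compute the target Q-values by feeding the new state through the network
            Q1 = sess.run(Qout, feed_dict={inputs1: np.identity(16)[s1:s1+1]})
            targetQ = allQ
            targetQ[0, a[0]] = r + gamma * np.max(Q1)
            # Train the network towards the target Q-values
            sess.run(updateModel, feed_dict={inputs1: np.identity(16)[s:s+1], nextQ: targetQ})
            rAll += r
            s = s1
            if done:
                # Reduce the chance of a random action as training progresses
                e = 1. / ((i / 50) + 10)
                break
        jList.append(j)
        rList.append(rAll)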

Summary

This chapter covers the basic principles of Reinforcement Learning and the fundamental Q-learning algorithm.

The distinctive feature of Q-learning is its capacity to choose between immediate rewards and delayed rewards. Q-learning at its simplest uses tables to store data. This very quickly loses viability as the state/action space of the system it is monitoring/controlling increases.

We can overcome this problem by using a neural network as a function approximator, which takes the state and action as input, and outputs the corresponding Q-value.

Following this idea, we implemented a Q-learning neural network using the TensorFlow framework and the OpenAI Gym toolkit for developing and comparing Reinforcement Learning algorithms.

Our journey into Deep Learning with TensorFlow ends here.

Deep learning is a very productive research area; there are many books, courses, and online resources that may help you...
