Deep Q-Network

So far, we've approached and developed reinforcement learning algorithms that learn a value function, V, for each state, or an action-value function, Q, for each state-action pair. These methods store and update each value separately in a table (or an array). Such approaches do not scale because, for large state and action spaces, the number of entries in the table grows exponentially with the dimensionality of the state and can easily exceed the available memory capacity.
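To get a rough feel for the scale of the problem, consider the following back-of-the-envelope calculation (the numbers are purely illustrative and not taken from the chapter):

```python
# Purely illustrative: a state made of 10 discrete variables, each with
# 100 possible values, and 18 actions (the size of the full Atari action set).
n_states = 100 ** 10            # 10^20 distinct states
n_actions = 18
bytes_per_value = 4             # one float32 Q-value per (state, action) pair

table_bytes = n_states * n_actions * bytes_per_value
print(f"{table_bytes / 1e18:.0f} exabytes")   # about 7,200 exabytes
```

No lookup table of that size can be stored, let alone visited often enough to learn accurate values.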

In this chapter, we will introduce the use of function approximation in reinforcement learning algorithms to overcome this problem. In particular, we will focus on deep neural networks that are applied to Q-learning. In the first part of this chapter, we'll explain how to extend Q-learning with function approximation to store Q values, and we'll explore some major difficulties that we may face...
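As a preview of where this is going, the following is a minimal sketch of a neural network that approximates the Q-function by mapping a state vector to one Q-value per action. It uses PyTorch purely for illustration; the framework, architecture, and layer sizes used in the chapter may differ:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, a) for all actions at once: the input is a
    state vector, the output is one Q-value per discrete action."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Greedy action selection: pick the action with the highest predicted Q-value.
q_net = QNetwork(obs_dim=4, n_actions=2)
state = torch.rand(1, 4)
action = q_net(state).argmax(dim=1).item()
```

Instead of storing one number per state-action pair, the network's weights are shared across all states, which is what allows generalization to states that have never been visited.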

Deep neural networks and Q-learning

The Q-learning algorithm, as we saw in Chapter 4, Q-Learning and SARSA Applications, has many qualities that enable its application in real-world contexts. A key ingredient of this algorithm is its use of the Bellman equation for learning the Q-function. The Bellman equation, as used by Q-learning, updates a Q-value from the values of subsequent state-action pairs. This means the algorithm can learn at every step, without waiting until the trajectory is completed. Also, every state or state-action pair has its own value, stored in a lookup table from which it is saved and retrieved. Designed in this way, Q-learning converges to the optimal values as long as all the state-action pairs are repeatedly sampled. Furthermore, the method uses two policies: a non-greedy behavior policy to gather experience...
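As a refresher, the tabular update just described can be written in a few lines. The following is a generic sketch; the array sizes and hyperparameters are illustrative, not the chapter's:

```python
import numpy as np

n_states, n_actions = 16, 4                 # sizes of a toy problem
Q = np.zeros((n_states, n_actions))         # the lookup table of Q-values
alpha, gamma = 0.1, 0.99                    # learning rate and discount factor

def q_learning_update(s, a, r, s_next, done):
    # One-step Bellman update: the target bootstraps from the greedy value
    # of the next state, regardless of the action actually taken there
    # (this is what makes Q-learning off-policy).
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```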

DQN

DQN, which was introduced for the first time in the paper Human-level control through deep reinforcement learning by Mnih and others at DeepMind, is the first scalable reinforcement learning algorithm to combine Q-learning with deep neural networks. To overcome the instabilities of this combination, DQN adopts two novel techniques that turned out to be essential for the stability of the algorithm.

DQN has proven itself to be the first artificial agent capable of learning a diverse array of challenging tasks. Furthermore, it has learned how to control many of these tasks using only high-dimensional raw pixels as input and an end-to-end RL approach.

The solution

The key innovations brought by DQN involve a replay buffer to overcome the data...
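At its core, a replay buffer is a fixed-capacity store of past transitions from which mini-batches are sampled at random. The sketch below is a generic illustration under that assumption, not necessarily the exact class used later in the chapter:

```python
import numpy as np
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions. Sampling mini-batches at
    random breaks the temporal correlation between consecutive steps."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling with replacement, a common simplification.
        idxs = np.random.randint(0, len(self.buffer), size=batch_size)
        batch = [self.buffer[i] for i in idxs]
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```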

DQN applied to Pong

Equipped with all the technical knowledge about Q-learning, deep neural networks, and DQN, we can finally put it to work and start to warm up the GPU. In this section, we will apply DQN to an Atari environment, Pong. We have chosen Pong over the other Atari environments because it is simpler to solve and thus requires less time, computational power, and memory. That being said, if you have a decent GPU available, you can apply the exact same configuration to almost all the other Atari games (some may require a little bit of fine-tuning). For the same reason, we adopted a lighter configuration than the original DQN paper, both in terms of the capacity of the function approximator (that is, fewer weights) and the hyperparameters, such as a smaller buffer size. This does not compromise the results on Pong, but it might degrade the performance...
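For orientation, the following sketch shows a convolutional Q-network in the style of the original DQN paper, mapping a stack of preprocessed 84x84 grayscale frames to one Q-value per action. It uses PyTorch for illustration, and the layer sizes follow the paper rather than the lighter configuration adopted here, so treat the numbers as assumptions:

```python
import torch
import torch.nn as nn

class AtariQNetwork(nn.Module):
    """Convolutional Q-network in the style of the original DQN paper:
    the input is a stack of 4 preprocessed 84x84 grayscale frames and
    the output is one Q-value per action. Layer sizes are assumptions."""
    def __init__(self, n_actions: int, in_frames: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # Pixel values are rescaled to [0, 1] before the convolutions.
        return self.head(self.conv(frames / 255.0))

# Pong exposes 6 discrete actions in the Gym Atari interface.
q_net = AtariQNetwork(n_actions=6)
print(q_net(torch.zeros(1, 4, 84, 84)).shape)   # torch.Size([1, 6])
```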

DQN variations

Following the amazing results of DQN, many researchers have studied it and proposed integrations and changes to improve its stability, efficiency, and performance. In this section, we will present three of these improved algorithms, explain the idea and solution behind each, and provide their implementations. The first is Double DQN, or DDQN, which deals with the Q-value over-estimation problem we mentioned when discussing the DQN algorithm. The second is Dueling DQN, which decouples the Q-value function into a state value function and a state-action advantage function. The third is n-step DQN, an old idea borrowed from TD algorithms that extends the length of the bootstrap target, spanning the spectrum between one-step learning and MC learning.

Double DQN

The over...
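To make the fix for over-estimation concrete before going into the details, here is a hedged sketch contrasting the standard DQN target with the Double DQN target, in which the online network selects the greedy action and the target network evaluates it (the helper functions and variable names are illustrative, not the chapter's):

```python
# Hypothetical helpers contrasting the two bootstrap targets. `online_net`
# and `target_net` are Q-networks returning one value per action; `reward`
# and `done` are float tensors; `next_obs` is a batch of next observations.

def dqn_target(reward, next_obs, done, target_net, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the
    # greedy next action, which tends to over-estimate Q-values.
    next_q = target_net(next_obs).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q

def double_dqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network chooses the next action, while the
    # target network evaluates it, decoupling selection from evaluation.
    best_action = online_net(next_obs).argmax(dim=1, keepdim=True)
    next_q = target_net(next_obs).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```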

Summary

In this chapter, we went further into RL algorithms and talked about how these can be combined with function approximators so that RL can be applied to a broader variety of problems. Specifically, we described how function approximation and deep neural networks can be used in Q-learning and the instabilities that derive from it. We demonstrated that, in practice, deep neural networks cannot be combined with Q-learning without any modifications.

The first algorithm that was able to use deep neural networks in combination with Q-learning was DQN. It integrates two key ingredients to stabilize learning and control complex tasks such as Atari 2600 games. The two ingredients are the replay buffer, which is used to store the old experience, and a separate target network, which is updated less frequently than the online network. The former is employed to exploit the off-policy...
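As a quick illustration of the second ingredient, the sketch below shows one common way to maintain a separate target network and synchronize it with the online network only occasionally; the synchronization interval is an illustrative assumption:

```python
import copy

def make_target(q_net):
    # The target network starts as an exact copy of the online network
    # and is never trained directly (gradients are disabled).
    target_net = copy.deepcopy(q_net)
    for p in target_net.parameters():
        p.requires_grad_(False)
    return target_net

def maybe_sync(step, q_net, target_net, sync_every=10_000):
    # Copy the online weights into the target network every `sync_every`
    # steps, so the bootstrap target changes only occasionally.
    if step % sync_every == 0:
        target_net.load_state_dict(q_net.state_dict())
```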

Questions

  1. What is the cause of the deadly triad problem?
  2. How does DQN overcome instabilities?
  3. What's the moving target problem?
  4. How is the moving target problem mitigated in DQN?
  5. What's the optimization procedure that's used in DQN?
  6. What's the definition of a state-action advantage value function?

Further reading
