The TextWorld Environment

In the previous chapter, you saw how reinforcement learning (RL) methods can be applied to natural language processing (NLP) problems, in particular, to improve the chatbot training process. Continuing our journey into the NLP domain, in this chapter we will use RL to solve text-based interactive fiction games in TextWorld, an environment published by Microsoft Research.

In this chapter, we will:

  • Cover a brief historical overview of interactive fiction
  • Study the TextWorld environment
  • Implement a simple baseline deep Q-network (DQN) method, and then try to improve it by implementing a command generator using recurrent neural networks (RNNs). This will provide a good illustration of how RL can be applied to complicated environments with rich observation spaces

Interactive fiction

As you have already seen, computer games are not only entertaining for humans, but also provide challenging problems for RL researchers due to their complicated observation and action spaces, the long sequences of decisions to be made during gameplay, and their natural reward systems.

Arcade games on platforms like the Atari 2600 are just one of the many genres in the gaming industry. From a historical perspective, the Atari 2600 peaked in popularity during the late 70s and early 80s. Then followed the era of the Z80 and its clones, which evolved into the period of the PC-compatible platforms and consoles we have now.

Over time, computer games have become more complex, colorful, and detailed in their graphics, which inevitably increases hardware requirements. This trend makes it harder for RL researchers and practitioners to apply RL methods to more recent games; for example, almost anybody can train an RL agent to solve an Atari game, but for StarCraft II, DeepMind...

The environment

At the time of writing, the TextWorld environment supports only the Linux and macOS platforms and internally relies on the Inform 7 system (http://inform7.com). There are two webpages for the project: one is the Microsoft Research page (https://www.microsoft.com/en-us/research/project/textworld/), which contains general information about the environment; the other is on GitHub (https://github.com/microsoft/TextWorld) and describes installation and usage. Let's start with installation.

Installation

The installation instructions suggest that you can install the package by just typing pip install textworld in your Python virtual environment, but at the time of writing, this step is broken by a changed URL for the Inform 7 engine. Hopefully, this will be fixed in the next TextWorld release, but if you experience any issues, you can set up the version that I've tested for this example by running pip install git+https://github.com/microsoft/TextWorld@f1ac489fefeb6a48684ed1f89422b84b7b4a6e4b...
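
Once the package is installed, a quick sanity check is to generate a small game and interact with it through TextWorld's Gym-style wrapper. The following is a minimal sketch: the game file path, seed, and requested information fields are illustrative choices, not the exact setup used in the rest of the chapter.

    # Generate a small game beforehand, for example with the tw-make utility:
    #   tw-make tw-simple --rewards dense --goal detailed --seed 1 \
    #       --output games/simple.ulx
    import gym
    import textworld.gym
    from textworld import EnvInfos

    # Request extra structured information along with the raw text:
    # scene description, inventory, and the list of admissible commands.
    infos = EnvInfos(description=True, inventory=True,
                     admissible_commands=True)
    env_id = textworld.gym.register_game("games/simple.ulx",
                                         request_infos=infos)
    env = gym.make(env_id)

    obs, infos = env.reset()
    print(obs)                            # initial scene text
    print(infos["admissible_commands"])   # commands valid in this state

    # Act by issuing a free-form text command.
    obs, score, done, infos = env.step("go north")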

Baseline DQN

In this problem, the major challenge lies in the inconvenient observation and action spaces. Text sequences are problematic on their own, as we discussed in the previous chapter: variable sequence lengths can cause vanishing and exploding gradients in RNNs, slow training, and convergence issues. In addition, our TextWorld environment provides several such sequences that we need to handle separately. The scene description string, for example, might have a completely different meaning to the agent than the inventory string, which describes our possessions.
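
To make this concrete, here is a minimal PyTorch sketch (illustrative, not the book's exact code) of encoding several variable-length token sequences, each with its own recurrent encoder, and concatenating the results into one fixed-size state vector; all dimensions and names are assumptions.

    import torch
    import torch.nn as nn

    class SequenceEncoder(nn.Module):
        """Embeds padded token sequences and returns the last LSTM state."""
        def __init__(self, vocab_size, emb_dim=32, hid_dim=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)

        def forward(self, tokens, lengths):
            # Packing avoids running the RNN over the padding positions.
            packed = nn.utils.rnn.pack_padded_sequence(
                self.emb(tokens), lengths.cpu(), batch_first=True,
                enforce_sorted=False)
            _, (h_n, _) = self.rnn(packed)
            return h_n[-1]                # shape: (batch, hid_dim)

    # One encoder per observation field, since the scene description and
    # the inventory carry different meanings for the agent.
    desc_enc = SequenceEncoder(vocab_size=1000)
    inv_enc = SequenceEncoder(vocab_size=1000)

    desc = torch.randint(1, 1000, (4, 20))   # padded description tokens
    inv = torch.randint(1, 1000, (4, 7))     # padded inventory tokens
    lengths = lambda n, l: torch.full((n,), l, dtype=torch.long)
    state = torch.cat([desc_enc(desc, lengths(4, 20)),
                       inv_enc(inv, lengths(4, 7))], dim=1)  # (4, 256)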

As mentioned, another obstacle is the action space. As you saw in the previous section, TextWorld can provide us with a list of commands that we can execute in every state. This significantly reduces the action space we need to choose from, but there are other complications. One of them is that the list of admissible commands changes from state to state (as different locations might...
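
One common way to handle a state-dependent command list (again, a sketch rather than the book's exact architecture) is to score every admissible command against the current state with a single network, so the output size no longer depends on a fixed action count:

    import torch
    import torch.nn as nn

    class CommandScorer(nn.Module):
        """Scores (state, command) pairs, producing one Q-value per command."""
        def __init__(self, state_dim, cmd_dim, hid=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + cmd_dim, hid), nn.ReLU(),
                nn.Linear(hid, 1))

        def forward(self, state, cmds):
            # state: (state_dim,); cmds: (n_cmds, cmd_dim), already encoded.
            expanded = state.unsqueeze(0).expand(cmds.size(0), -1)
            return self.net(torch.cat([expanded, cmds], dim=1)).squeeze(1)

    scorer = CommandScorer(state_dim=256, cmd_dim=128)
    q_vals = scorer(torch.randn(256), torch.randn(5, 128))  # 5 admissible cmds
    best = q_vals.argmax().item()   # index of the command to execute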

The command generation model

In this part of the chapter, we will extend our baseline model with an extra submodule that generates the commands that our DQN network should evaluate. In the baseline model, commands were taken from the admissible commands list, which came from the extended information provided by the environment. But perhaps we can generate commands from the observation instead, using the same techniques that we covered in the previous chapter.

The architecture of our new model is shown in Figure 15.12.

Figure 15.12: The architecture of the DQN with command generation

In comparison with Figure 15.3 from earlier in the chapter, there are several changes here. First of all, our preprocessor pipeline no longer accepts a command sequence as input. The second difference is that the preprocessor's output is not only passed to the DQN model, but also forks into the "Commands generator" submodule.

The responsibility of this new submodule is to produce...
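
To sketch the idea (illustrative code, not the book's implementation), such a generator can be a small LSTM decoder that is initialized from the encoded observation and emits command tokens one by one until an end-of-command marker:

    import torch
    import torch.nn as nn

    class CommandGenerator(nn.Module):
        """Decodes command tokens step by step from an encoded observation."""
        def __init__(self, vocab_size, obs_dim=256, emb_dim=32):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.cell = nn.LSTMCell(emb_dim, obs_dim)
            self.out = nn.Linear(obs_dim, vocab_size)

        @torch.no_grad()
        def generate(self, obs_vec, begin_tok=1, end_tok=2, max_len=10):
            # obs_vec: (1, obs_dim) -- the preprocessor output initializes
            # the decoder's hidden state.
            h, c = obs_vec, torch.zeros_like(obs_vec)
            tok = torch.tensor([begin_tok])
            result = []
            for _ in range(max_len):
                h, c = self.cell(self.emb(tok), (h, c))
                tok = self.out(h).argmax(dim=1)   # greedy decoding
                if tok.item() == end_tok:
                    break
                result.append(tok.item())
            return result

    gen = CommandGenerator(vocab_size=1000)
    command_tokens = gen.generate(torch.randn(1, 256))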

Summary

In this chapter, you have seen how DQN can be applied to interactive fiction games, an interesting and challenging domain at the intersection of RL and NLP. You learned how to handle complex textual data with NLP tools and experimented with fun and challenging interactive fiction environments, which offer lots of opportunities for future experimentation.

In the next chapter, we will continue our exploration of "RL in the wild" and check the applicability of RL methods to web automation.
