The TextWorld Environment

In the previous chapter, you saw how reinforcement learning (RL) methods can be applied to natural language processing (NLP) problems, in particular, to improve the chatbot training process. Continuing our journey into the NLP domain, in this chapter we will use RL to solve text-based interactive fiction games in TextWorld, an environment published by Microsoft Research.

In this chapter, we will:

  • Cover a brief historical overview of interactive fiction
  • Study the TextWorld environment
  • Implement a simple baseline deep Q-network (DQN) method, and then try to improve it by implementing a command generator using recurrent neural networks (RNNs). This will provide a good illustration of how RL can be applied to complicated environments with rich observation spaces

Interactive fiction

As you have already seen, computer games are not only entertaining for humans, but also provide challenging problems for RL researchers due to their complicated observation and action spaces, the long sequences of decisions to be made during gameplay, and their natural reward systems.

Arcade games on platforms like the Atari 2600 are just one of the many genres in the gaming industry. From a historical perspective, the Atari 2600 peaked in popularity during the late 70s and early 80s. Then followed the era of the Z80 and its clones, which evolved into the period of the PC-compatible platforms and consoles we have now.

Over time, computer games have become more complex, colorful, and detailed in their graphics, which inevitably increases hardware requirements. This trend makes it harder for RL researchers and practitioners to apply RL methods to more recent games; for example, almost anybody can train an RL agent to solve an Atari game, but for StarCraft II, DeepMind...

The environment

At the time of writing, the TextWorld environment supports only the Linux and macOS platforms and internally relies on the Inform 7 system (http://inform7.com). There are two webpages for the project: one is the Microsoft Research page (https://www.microsoft.com/en-us/research/project/textworld/), which contains general information about the environment; the other is on GitHub (https://github.com/microsoft/TextWorld) and describes installation and usage. Let's start with installation.

Installation

The installation instructions suggest that you can install the package by just typing pip install textworld in your Python virtual environment, but at the time of writing, this step is broken by a changed URL for the Inform 7 engine. Hopefully, this will be fixed in the next TextWorld release, but if you experience any issues, you can set up the version that I've tested for this example by running pip install git+https://github.com/microsoft/TextWorld@f1ac489fefeb6a48684ed1f89422b84b7b4a6e4b...
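
Once the package is installed, a quick sanity check is to generate a small game and interact with it through TextWorld's Gym-style wrapper. The following is a minimal sketch: the game file path, seed, and requested information fields are illustrative choices, not the exact setup used in the rest of the chapter.

    # Generate a small game beforehand, for example with the tw-make utility:
    #   tw-make tw-simple --rewards dense --goal detailed --seed 1 \
    #       --output games/simple.ulx
    import gym
    import textworld.gym
    from textworld import EnvInfos

    # Request extra structured information along with the raw text:
    # scene description, inventory, and the list of admissible commands.
    infos = EnvInfos(description=True, inventory=True,
                     admissible_commands=True)
    env_id = textworld.gym.register_game("games/simple.ulx",
                                         request_infos=infos)
    env = gym.make(env_id)

    obs, infos = env.reset()
    print(obs)                            # initial scene text
    print(infos["admissible_commands"])   # commands valid in this state

    # Act by issuing a free-form text command.
    obs, score, done, infos = env.step("go north")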

Baseline DQN

In this problem, the major challenge lies in the inconvenient observation and action spaces. Text sequences are problematic on their own, as we discussed in the previous chapter: variable sequence lengths can cause vanishing and exploding gradients in RNNs, slow training, and convergence issues. In addition, our TextWorld environment provides several such sequences that we need to handle separately. The scene description string, for example, might have a completely different meaning to the agent than the inventory string, which describes our possessions.
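
To make this concrete, here is a minimal PyTorch sketch (illustrative, not the book's exact code) of encoding several variable-length token sequences, each with its own recurrent encoder, and concatenating the results into one fixed-size state vector; all dimensions and names are assumptions.

    import torch
    import torch.nn as nn

    class SequenceEncoder(nn.Module):
        """Embeds padded token sequences and returns the last LSTM state."""
        def __init__(self, vocab_size, emb_dim=32, hid_dim=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)

        def forward(self, tokens, lengths):
            # Packing avoids running the RNN over the padding positions.
            packed = nn.utils.rnn.pack_padded_sequence(
                self.emb(tokens), lengths.cpu(), batch_first=True,
                enforce_sorted=False)
            _, (h_n, _) = self.rnn(packed)
            return h_n[-1]                # shape: (batch, hid_dim)

    # One encoder per observation field, since the scene description and
    # the inventory carry different meanings for the agent.
    desc_enc = SequenceEncoder(vocab_size=1000)
    inv_enc = SequenceEncoder(vocab_size=1000)

    desc = torch.randint(1, 1000, (4, 20))   # padded description tokens
    inv = torch.randint(1, 1000, (4, 7))     # padded inventory tokens
    lengths = lambda n, l: torch.full((n,), l, dtype=torch.long)
    state = torch.cat([desc_enc(desc, lengths(4, 20)),
                       inv_enc(inv, lengths(4, 7))], dim=1)  # (4, 256)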

As mentioned, another obstacle is the action space. As you saw in the previous section, TextWorld can provide us with a list of commands that we can execute in every state. This significantly reduces the action space we need to choose from, but there are other complications. One of them is that the list of admissible commands changes from state to state (as different locations might...
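
One common way to handle a state-dependent command list (again, a sketch rather than the book's exact architecture) is to score every admissible command against the current state with a single network, so the output size no longer depends on a fixed action count:

    import torch
    import torch.nn as nn

    class CommandScorer(nn.Module):
        """Scores (state, command) pairs, producing one Q-value per command."""
        def __init__(self, state_dim, cmd_dim, hid=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + cmd_dim, hid), nn.ReLU(),
                nn.Linear(hid, 1))

        def forward(self, state, cmds):
            # state: (state_dim,); cmds: (n_cmds, cmd_dim), already encoded.
            expanded = state.unsqueeze(0).expand(cmds.size(0), -1)
            return self.net(torch.cat([expanded, cmds], dim=1)).squeeze(1)

    scorer = CommandScorer(state_dim=256, cmd_dim=128)
    q_vals = scorer(torch.randn(256), torch.randn(5, 128))  # 5 admissible cmds
    best = q_vals.argmax().item()   # index of the command to execute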

The command generation model

In this part of the chapter, we will extend our baseline model with an extra submodule that generates the commands that our DQN network should evaluate. In the baseline model, commands were taken from the admissible commands list, which came from the extended information provided by the environment. But perhaps we can generate commands from the observation instead, using the same techniques that we covered in the previous chapter.

The architecture of our new model is shown in Figure 15.12.

Figure 15.12: The architecture of the DQN with command generation

In comparison with Figure 15.3 from earlier in the chapter, there are several changes here. First of all, our preprocessor pipeline no longer accepts a command sequence as input. The second difference is that the preprocessor's output is not only passed to the DQN model, but also forks into the "Commands generator" submodule.

The responsibility of this new submodule is to produce...
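
To sketch the idea (illustrative code, not the book's implementation), such a generator can be a small LSTM decoder that is initialized from the encoded observation and emits command tokens one by one until an end-of-command marker:

    import torch
    import torch.nn as nn

    class CommandGenerator(nn.Module):
        """Decodes command tokens step by step from an encoded observation."""
        def __init__(self, vocab_size, obs_dim=256, emb_dim=32):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.cell = nn.LSTMCell(emb_dim, obs_dim)
            self.out = nn.Linear(obs_dim, vocab_size)

        @torch.no_grad()
        def generate(self, obs_vec, begin_tok=1, end_tok=2, max_len=10):
            # obs_vec: (1, obs_dim) -- the preprocessor output initializes
            # the decoder's hidden state.
            h, c = obs_vec, torch.zeros_like(obs_vec)
            tok = torch.tensor([begin_tok])
            result = []
            for _ in range(max_len):
                h, c = self.cell(self.emb(tok), (h, c))
                tok = self.out(h).argmax(dim=1)   # greedy decoding
                if tok.item() == end_tok:
                    break
                result.append(tok.item())
            return result

    gen = CommandGenerator(vocab_size=1000)
    command_tokens = gen.generate(torch.randn(1, 256))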

Summary

In this chapter, you have seen how DQN can be applied to interactive fiction games, an interesting and challenging domain at the intersection of RL and NLP. You learned how to handle complex textual data with NLP tools and experimented with fun and challenging interactive fiction environments, which offer lots of opportunities for future experimentation.

In the next chapter, we will continue our exploration of "RL in the wild" and check the applicability of RL methods to web automation.
