Web Navigation

We will now take a look at another practical application of reinforcement learning (RL): web navigation and browser automation.

In this chapter, we will:

  • Discuss web navigation in general and the practical application of browser automation
  • Explore how web navigation can be solved with an RL approach
  • Take a deep look at one very interesting, but commonly overlooked and somewhat abandoned, RL benchmark implemented by OpenAI, called Mini World of Bits (MiniWoB)

Web navigation

When the web was invented, it started as several text-only web pages interconnected by hyperlinks. If you're curious, the first web page is still available at http://info.cern.ch/, with text and links. The only things you can do there are read the text and click links to move between pages.

Several years later, in 1995, the Internet Engineering Task Force (IETF) published the HTML 2.0 specification, which added many extensions to the original version invented by Tim Berners-Lee. Among these extensions were forms and form elements, which allowed web page authors to add interactivity to their websites. Users could enter and change text, toggle checkboxes, select items from drop-down lists, and press buttons. The set of controls was similar to a minimalistic set of graphical user interface (GUI) application controls. The difference was that this happened inside the browser's window, and both the data and the user interface (UI) controls that users interacted with were defined by the server...

OpenAI Universe

The core idea underlying OpenAI Universe (available at https://github.com/openai/universe) is to wrap general GUI applications into RL environments using the same core classes provided by Gym. To achieve this, it uses the VNC protocol to connect to a VNC server running inside a Docker container (Docker is a standard way to run lightweight containers), exposing mouse and keyboard actions to the RL agent and providing the GUI application's image as an observation.
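
To give a feel for the API, here is a minimal sketch of creating a MiniWoB environment through Universe. The environment name and configuration values are illustrative, and since Universe is no longer maintained, the exact details may differ:

import gym
import universe  # importing universe registers the wob.* environments in gym

# Create a MiniWoB environment wrapped by Universe; configure() starts
# (or connects to) a Docker container with a VNC server inside.
env = gym.make('wob.mini.ClickDialog-v0')
env.configure(remotes=1)  # one local container

# Observations arrive as a list, one entry per remote; entries can be
# None while the VNC connection is still being established.
obs_n = env.reset()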

The reward is provided by a small external rewarder daemon running inside the same container, which gives the agent a scalar reward value based on its judgment. It is possible to launch several containers, locally or over the network, to gather episode data in parallel, in the same way that we started several Atari emulators to speed up the convergence of the asynchronous advantage actor-critic (A3C) method in Chapter 13, Asynchronous Advantage Actor-Critic. The architecture is illustrated...
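
Since the environment is vectorized over remotes, gathering episode data in parallel looks roughly like the following sketch; the number of containers and the fixed click coordinates are placeholders:

import gym
import universe

env = gym.make('wob.mini.ClickDialog-v0')
env.configure(remotes=4)  # four containers, each with its own rewarder

obs_n = env.reset()
for _ in range(100):
    # One action list per remote; a click is a pointer press (button
    # mask 1) followed by a release (mask 0) at the same coordinates.
    action_n = [[universe.spaces.PointerEvent(80, 105, 1),
                 universe.spaces.PointerEvent(80, 105, 0)]
                for _ in obs_n]
    obs_n, reward_n, done_n, info = env.step(action_n)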

The simple clicking approach

As the first demo, let's implement a simple A3C agent that decides where it should click, given an image observation. This approach can solve only a small subset of the full MiniWoB suite, and we will discuss its restrictions later. For now, it will allow us to get a better understanding of the problem.

As with the previous chapter, due to its size, I won't put the complete source code here. We will focus on the most important functions and I will provide the rest as an overview. The complete source code is available in the GitHub repository.

Grid actions

When we talked about Universe's architecture and organization, it was mentioned that the richness and flexibility of the action space create a lot of challenges for the RL agent. MiniWoB's active area inside the browser is just 160×210 pixels (exactly the same dimensions as the Atari emulator), but even with such a small area, our agent could be asked to move...
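
One common way to tame this action space is to discretize the active area into a coarse grid of click targets and let the policy choose a grid cell. The sketch below shows the idea; the 10-pixel bin size and the zero offsets are my assumptions for illustration, not necessarily the constants used in the book's code:

import universe

# Discretize the 160x210 active area into 10x10-pixel cells (assumed
# bin size), which gives 16 * 21 = 336 discrete click actions.
WIDTH, HEIGHT, BIN = 160, 210, 10
GRID_W, GRID_H = WIDTH // BIN, HEIGHT // BIN

def action_to_vnc_events(action_idx, x_ofs=0, y_ofs=0):
    """Map a flat action index to a press/release click at the center
    of the corresponding grid cell; the offsets account for where the
    active area sits inside the browser window."""
    gx, gy = action_idx % GRID_W, action_idx // GRID_W
    x = x_ofs + gx * BIN + BIN // 2
    y = y_ofs + gy * BIN + BIN // 2
    return [universe.spaces.PointerEvent(x, y, 1),  # button down
            universe.spaces.PointerEvent(x, y, 0)]  # button up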

Human demonstrations

The idea behind demonstrations is simple: to help our agent discover the best way to solve the task, we show it examples of the actions that we think the problem requires. Those examples don't have to be the best solution, or even 100% accurate, but they should be good enough to show the agent promising directions to explore.

In fact, this is a very natural thing to do, as all human learning is based on prior examples given by a teacher, parents, or other people. These examples can come in written form (for example, recipe books) or as demonstrations that you need to repeat several times to get right (for example, dance classes). Such forms of training are much more effective than random search. Just imagine how complicated and lengthy it would be to learn how to brush your teeth by trial and error alone. Of course, there is a danger in learning from demonstrations, which could be wrong or not the most efficient way to...
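
One straightforward way to use demonstrations is behavior cloning: train the policy head with supervised learning on the recorded (observation, action) pairs, either as pretraining or mixed into the regular RL updates. Below is a minimal sketch of one such step; the network interface and data layout are assumptions, not the book's exact code:

import torch.nn.functional as F

def demo_train_step(net, optimizer, demo_obs, demo_actions):
    """One supervised step on demonstration data: treat the recorded
    clicks as class labels and minimize the cross-entropy between them
    and the policy logits."""
    optimizer.zero_grad()
    logits, _ = net(demo_obs)  # assumes net returns (policy_logits, value)
    loss = F.cross_entropy(logits, demo_actions)
    loss.backward()
    optimizer.step()
    return loss.item()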

Adding text descriptions

As the last example of this chapter, we will add the problem's text description to the observations of our model. I have already mentioned that some problems include vital information in the text description, such as the index of the tab that needs to be clicked or the list of entries that the agent needs to check. The same information is shown on top of the image observation, but pixels are not always the best representation of simple text.

To take this text into account, we need to extend our model's input from an image alone to an image plus text data. We worked with text in the previous chapter, so a recurrent neural network (RNN) is quite an obvious choice (maybe not the best one for such a toy problem, but it is flexible and scalable).
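
The overall model could look like the following sketch: a convolutional branch encodes the screenshot, an embedding plus LSTM branch encodes the tokenized description, and the two feature vectors are concatenated before the policy and value heads. All layer sizes here are illustrative rather than the book's exact hyperparameters:

import torch
import torch.nn as nn

class MultimodalA3C(nn.Module):
    def __init__(self, input_shape, n_actions, vocab_size,
                 emb_size=32, rnn_size=64):
        super().__init__()
        # Image branch: a small CNN over the screenshot
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=5, stride=5),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2),
            nn.ReLU(),
        )
        conv_out = self._conv_out_size(input_shape)
        # Text branch: token embeddings fed into an LSTM
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.LSTM(emb_size, rnn_size, batch_first=True)
        # Policy and value heads over the concatenated features
        self.policy = nn.Linear(conv_out + rnn_size, n_actions)
        self.value = nn.Linear(conv_out + rnn_size, 1)

    def _conv_out_size(self, shape):
        with torch.no_grad():
            return int(self.conv(torch.zeros(1, *shape)).numel())

    def forward(self, image, tokens):
        img_feats = self.conv(image).flatten(start_dim=1)
        _, (h_n, _) = self.rnn(self.emb(tokens))  # final hidden state
        feats = torch.cat([img_feats, h_n[-1]], dim=1)
        return self.policy(feats), self.value(feats)

In real code, descriptions of different lengths within a batch would need padding or packed sequences; for the sketch, fixed-length token tensors keep things simple.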

Implementation

I'm not going to cover this example in detail, but will just focus on the most important points of the implementation. (The whole code is in Chapter16/wob_click_mm_train.py.) In comparison to our clicker...

Things to try

In this chapter, we only started playing with MiniWoB by touching upon the six easiest environments from the full set of 80 problems, so there is plenty of uncharted territory ahead. If you want to practice, there are several items you can experiment with:

  • Testing the robustness of demonstrations to noisy clicks.
  • Implementing training of the value head of A3C based on demonstration data.
  • Implementing more sophisticated mouse control, such as moving the mouse N pixels left/right/up/down.
  • Using some pretrained optical character recognition (OCR) network (or training your own!) to extract text information from the observations.
  • Taking other problems and trying to solve them. There are some quite tricky and fun problems, such as sorting items using drag-and-drop or repeating a pattern using checkboxes.
  • Checking MiniWoB++ (https://stanfordnlp.github.io/miniwob-plusplus/) from the Stanford NLP Group. It will require learning and writing new wrappers; as mentioned...

Summary

In this chapter, you saw the practical application of RL methods for browser automation and used the MiniWoB benchmark from OpenAI. This chapter concludes part three of the book. The next part will be devoted to more complicated and recent methods related to continuous action spaces, non-gradient methods, and other more advanced methods of RL.

In the next chapter, we will take a look at continuous control problems, which are an important subfield of RL, both theoretically and practically.
