You're reading from Mastering Reinforcement Learning with Python

Product type: Book
Published in: Dec 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781838644147
Edition: 1st

Author: Enes Bilgin

Enes Bilgin works as a senior AI engineer and a tech lead in Microsoft's Autonomous Systems division. He is a machine learning and operations research practitioner and researcher with experience in building production systems and models for top tech companies using Python, TensorFlow, and Ray/RLlib. He holds an M.S. and a Ph.D. in systems engineering from Boston University and a B.S. in industrial engineering from Bilkent University. In the past, he has worked as a research scientist at Amazon and as an operations research scientist at AMD. He also held adjunct faculty positions at the McCombs School of Business at the University of Texas at Austin and at the Ingram School of Engineering at Texas State University.

Chapter 10: Introducing Machine Teaching

Much of the excitement about reinforcement learning is due to its similarity to human learning: an RL agent learns from experience. This is also why many consider it the path to artificial general intelligence. On the other hand, if you think about it, reducing human learning to mere trial and error would be a gross underestimation. We don't discover everything we know in science, art, and engineering from scratch when we are born! Instead, we build on knowledge and civilization that have evolved over thousands of years! We transfer this knowledge among us through various structured and unstructured forms of teaching. This capability makes it possible for us to gain skills relatively quickly and to advance our common knowledge.

When we think from this perspective, what we are doing with machine learning looks quite inefficient: we dump a bunch of raw data into algorithms, or expose them to an environment in the...

Introduction to machine teaching

Machine teaching (MT) is a general approach, and a collection of methods, for efficiently transferring knowledge from a teacher, a subject matter expert, to a machine learning model. With it, we aim to make training much more efficient, and even feasible for tasks that would otherwise be impossible. Let's talk about what MT is in more detail, why we need it, and what its components are.

Understanding the need for machine teaching

Did you know that the United States is expected to spend about 1.25 trillion dollars, around 5% of its gross domestic product, on education in 2021? This should speak to the existential significance of education for our society and civilization (and many would argue that we should spend more).

We humans have built such a giant system, which we expect people to spend many years in, because we don't expect ourselves to decipher the alphabet or math on our own. And it is not just that: we continuously...

Engineering the reward function

Reward function engineering means crafting the reward dynamics of the environment in an RL problem so that they reflect the objective you have in mind for your agent and lead the agent toward it. How you define your reward function can make training easy, difficult, or even impossible for the agent. Therefore, in most RL projects, significant effort is dedicated to designing the reward. In this section, we cover some specific cases where you will need to do this and how, then provide a specific example, and finally discuss the challenges that come with engineering the reward function.
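
To make reward engineering concrete, here is a minimal sketch of potential-based reward shaping for the mountain car problem, in plain Python. The potential function, the constants, and the function names are illustrative assumptions, not the book's implementation:

```python
# A minimal sketch of potential-based reward shaping for mountain car.
# The potential function and constants below are illustrative assumptions.

GAMMA = 0.99  # discount factor used by the agent

def potential(pos, vel):
    """Heuristic potential: value progress to the right and momentum."""
    return pos + 10 * abs(vel)

def shaped_reward(base_reward, state, next_state):
    """Add F(s, s') = gamma * phi(s') - phi(s) to the sparse base reward.

    Shaping with a potential difference of this form is known to preserve
    the optimal policy (Ng et al., 1999), so it densifies the learning
    signal without changing the agent's objective.
    """
    phi_s = potential(*state)
    phi_next = potential(*next_state)
    return base_reward + GAMMA * phi_next - phi_s
```

A step that gains position or speed now earns strictly more than the flat per-step penalty of the original environment, giving the agent a gradient toward the goal long before it ever reaches the flag.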

When to engineer the reward function

Multiple times in this book, including in the previous section where we discussed concepts, we have mentioned how sparse rewards pose a problem for learning. One way of dealing with this is to shape the reward to make it non-sparse. The sparse reward case, therefore, is a common reason why we may want to do reward function...

Curriculum learning

When we learn a new skill, we start with the basics. Bouncing and dribbling are the first steps in learning basketball; doing alley-oops is not something to teach in the first lesson. One proceeds gradually to advanced lessons after feeling comfortable with the earlier ones. This idea of following a curriculum, from basic to advanced levels, is the basis of the whole education system. The question is whether machine learning models can benefit from the same approach. It turns out that they can!

In the context of RL, when we create a curriculum, we similarly start with "easy" environment configurations for the agent. This way, the agent can get an idea of what success means early on, rather than spending a lot of time blindly exploring the environment in the hope of stumbling upon success. We then gradually increase the difficulty whenever the agent exceeds a certain reward threshold. Each of these difficulty levels is...
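
The lesson-progression logic can be sketched as a small scheduler in plain Python. The environment configuration fields, the thresholds, and the class name below are hypothetical examples, not the book's actual lesson plan:

```python
# A minimal sketch of a reward-threshold curriculum, ordered easy to hard.
# The config fields and thresholds are hypothetical examples.

CURRICULUM = [
    {"initial_pos_range": (-0.1, 0.1), "reward_threshold": 90.0},   # start near the goal
    {"initial_pos_range": (-0.4, 0.1), "reward_threshold": 70.0},
    {"initial_pos_range": (-0.6, -0.4), "reward_threshold": 50.0},  # full difficulty
]

class CurriculumScheduler:
    """Advances to the next lesson when the agent clears a reward threshold."""

    def __init__(self, curriculum):
        self.curriculum = curriculum
        self.level = 0

    @property
    def config(self):
        return self.curriculum[self.level]

    def report(self, mean_episode_reward):
        # Move up one difficulty level, unless we are already at the last one.
        if (mean_episode_reward >= self.config["reward_threshold"]
                and self.level < len(self.curriculum) - 1):
            self.level += 1
        return self.level
```

In training, `report` would be called periodically with the recent mean episode reward, and the returned level would select the environment configuration for the next round of rollouts.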

Warm starts with demonstrations

A popular technique for showing the agent a way to success is to train it on data coming from a reasonably successful controller, such as a human. In RLlib, this can be done by saving human play data from the mountain car environment:

Chapter10/mcar_demo.py

        ...
        new_obs, r, done, info = env.step(a)
        # Build the batch
        batch_builder.add_values(
            t=t,
            eps_id=eps_id,
            agent_index=0,
            obs=prep.transform(obs),
          ...
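
The same record-and-save pattern can be sketched without RLlib's helpers; `EpisodeRecorder` below is a hypothetical stand-in for RLlib's `SampleBatchBuilder`/`JsonWriter` pair, with field names mirroring the snippet above:

```python
import json

# A plain-Python stand-in for RLlib's SampleBatchBuilder/JsonWriter pair.
# Field names mirror the snippet above; everything else is an assumption.

class EpisodeRecorder:
    def __init__(self):
        self.rows = []

    def add_values(self, **kwargs):
        # Store one timestep of experience (t, eps_id, obs, actions, ...).
        self.rows.append(kwargs)

    def build_and_reset(self):
        # Return the collected episode and start a fresh one.
        batch, self.rows = self.rows, []
        return batch

def save_batches(batches, path):
    # One JSON object per line, similar in spirit to RLlib's offline format.
    with open(path, "w") as f:
        for batch in batches:
            f.write(json.dumps(batch) + "\n")
```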

Action masking

One final machine teaching approach we will use is action masking. With it, we can prevent the agent from taking certain actions at certain steps, based on conditions we define. For mountain car, assume that we have the intuition of building momentum before trying to climb the hill. So, we want the agent to apply force to the left if the car is already moving left around the valley. Under these conditions, we will mask all the actions except left.

    def update_avail_actions(self):
        self.action_mask = np.array([1.0] * \
                           self.action_space.n)
        pos, vel = self.wrapped.unwrapped.state
        # 0: left, 1: no action, 2: right
       ...
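
A possible completion of this masking logic is sketched below as a standalone function; the valley position and the velocity/position thresholds are illustrative guesses, not the book's exact values:

```python
# A hedged completion of update_avail_actions as a standalone function.
# VALLEY_POS and the 0.2 window are illustrative guesses.

VALLEY_POS = -0.5  # approximate bottom of the valley in mountain car

def update_avail_actions(pos, vel, n_actions=3):
    # 0: left, 1: no action, 2: right
    action_mask = [1.0] * n_actions
    # If the car is already moving left around the valley, allow only
    # "left" so the agent keeps building momentum for the climb.
    if vel < 0 and abs(pos - VALLEY_POS) < 0.2:
        action_mask = [0.0] * n_actions
        action_mask[0] = 1.0
    return action_mask
```

In the actual environment wrapper, this mask would be appended to the observation, and the policy would set the logits of masked actions to a large negative number before sampling.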

Summary

In this chapter, we covered an emerging paradigm in artificial intelligence, machine teaching, which is about effectively conveying the expertise of a subject matter expert (the teacher) to the training of a machine learning model. We discussed how this resembles the way humans are educated: by building on others' knowledge. The advantage of this approach is that it greatly increases data efficiency in machine learning and, in some cases, makes learning possible where it would have been impossible without a teacher. We discussed various methods in this paradigm, including reward function engineering, curriculum learning, learning from demonstrations, action masking, and concept networks. We observed how some of these methods significantly improved on the vanilla use of Ape-X DQN.

Besides its benefits, machine teaching also comes with challenges and potential downsides. First, it is usually non-trivial to come up with a good reward shaping scheme, curriculum, or set of action masking conditions. This...
