You're reading from Mastering Reinforcement Learning with Python

Product type: Book
Published in: Dec 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781838644147
Edition: 1st Edition
Author (1)
Enes Bilgin

Enes Bilgin works as a senior AI engineer and a tech lead in Microsoft's Autonomous Systems division. He is a machine learning and operations research practitioner and researcher with experience in building production systems and models for top tech companies using Python, TensorFlow, and Ray/RLlib. He holds an M.S. and a Ph.D. in systems engineering from Boston University and a B.S. in industrial engineering from Bilkent University. In the past, he has worked as a research scientist at Amazon and as an operations research scientist at AMD. He also held adjunct faculty positions at the McCombs School of Business at the University of Texas at Austin and at the Ingram School of Engineering at Texas State University.

What this book covers

Chapter 1, Introduction to Reinforcement Learning, provides an introduction to RL, presents motivating examples and success stories, and looks at RL applications in industry. It then gives some fundamental definitions to refresh your memory of RL concepts and concludes with a section on software and hardware setup.

Chapter 2, Multi-Armed Bandits, covers a relatively simple RL setting, bandit problems without context, which nonetheless has many applications in industry as an alternative to traditional A/B testing. The chapter also describes a fundamental RL trade-off: exploration versus exploitation. It then presents three approaches to tackle this trade-off and compares them against A/B testing.

Chapter 3, Contextual Bandits, takes the discussion on multi-armed bandits to an advanced level by adding context to the decision-making process and involving deep neural networks in decision making. We adapt a real dataset from the U.S. Census to an online advertising problem. We conclude the chapter with a section on the applications of bandit problems in industry and business.

Chapter 4, Makings of a Markov Decision Process, builds the mathematical theory behind sequential decision processes that are solved using RL. We start with Markov chains, where we describe types of states, ergodicity, and transient and steady-state behavior. Then we go into Markov reward and decision processes. Along the way, we introduce return, discount, policy, value functions, and Bellman optimality, which are key concepts in RL theory that will be frequently referred to in later chapters. We conclude the chapter with a discussion on partially observed Markov decision processes. Throughout the chapter, we use a grid world example to illustrate the concepts.

Chapter 5, Solving the Reinforcement Learning Problem, presents and compares dynamic programming, Monte Carlo, and temporal-difference methods, which are fundamental to understanding how to solve a Markov decision process. Key approaches such as policy evaluation, policy iteration, and value iteration are introduced and illustrated. Throughout the chapter, we solve an example inventory replenishment problem. Along the way, we motivate the need for deep RL methods. We conclude the chapter with a discussion on the importance of simulation in reinforcement learning.

Chapter 6, Deep Q-Learning at Scale, starts with a discussion on why it is challenging to use deep neural networks in reinforcement learning and how modern deep Q-learning addresses those challenges. After thorough coverage of scalable deep Q-learning methods, we introduce Ray, a distributed computing framework, with which we implement a parallelized deep Q-learning variant. We finish the chapter by introducing RLlib, Ray's own scalable RL library.

Chapter 7, Policy-Based Methods, introduces another important class of RL approaches: policy-based methods. You will first learn how they differ from Q-learning and why they are needed. As we build the theory for contemporary policy-based methods, we also show how you can use RLlib to apply them to a sample problem.

Chapter 8, Model-Based Methods, presents how learning a model of the environment can help an RL agent to plan its actions efficiently. In the chapter, we implement and use variants of cross-entropy methods and present Dyna, an RL framework that combines model-free and model-based approaches.

Chapter 9, Multi-Agent Reinforcement Learning, shifts gears, moves into multi-agent settings, and presents the challenges that come with them. In the chapter, we train tic-tac-toe agents through self-play, which you can also play against for fun.

Chapter 10, Introducing Machine Teaching, introduces an emerging concept in RL that focuses on leveraging the subject matter expertise of a human "teacher" to make learning easy for RL agents. We present how reward function engineering, curriculum learning, demonstration learning, and action masking can help with training autonomous agents effectively.

Chapter 11, Achieving Generalization and Overcoming Partial Observability, discusses why the generalization capabilities of trained RL policies matter for successful real-world implementations. To this end, the chapter focuses on the simulation-to-real gap, connects generalization and partial observability, and introduces domain randomization and memory mechanisms. We also present the CoinRun environment and results on how traditional regularization methods can also help with generalization in RL.

Chapter 12, Meta-Reinforcement Learning, introduces approaches that allow an RL agent to adapt to a new environment once it is deployed for its task. This is one of the most important research directions towards achieving resilient autonomy through RL.

Chapter 13, Exploring Advanced Topics, brings you up to speed with some of the most recent developments in RL, including state-of-the-art distributed RL (SEED RL), an approach that cracked all the Atari benchmarks (Agent57), and RL without simulation (offline RL).

Chapter 14, Solving Robot Learning, goes into implementations of the methods covered in the earlier chapters by training a robot hand to grasp objects using manual and automated curriculum learning in PyBullet, a popular physics simulation library for Python.

Chapter 15, Supply Chain Management, gives you hands-on experience in modeling and solving an inventory replenishment problem. Along the way, we perform hyperparameter tuning for our RL agent. The chapter concludes with a discussion on how RL can be applied to vehicle routing problems.

Chapter 16, Personalization, Marketing, and Finance, goes beyond bandit models for personalization and discusses a news recommendation problem while introducing dueling bandit gradient descent and action embeddings along the way. The chapter also discusses marketing and finance applications of RL and introduces the TensorTrade library for the latter.

Chapter 17, Smart City and Cybersecurity, starts with solving a traffic light control scenario as a multi-agent RL problem using the Flow framework. It then describes how RL can be applied to two other problems: providing ancillary service to a power grid and detecting cyberattacks in it.

Chapter 18, Challenges and Future Directions in Reinforcement Learning, wraps up the book by recapping the challenges in RL and connecting them to recent developments and research in the field. Finally, we present practical suggestions for readers who want to further deepen their RL expertise.
