Play Video Games with Generative AI: GAIL

In the preceding chapters, we have seen how generative AI can produce both simple (restricted Boltzmann machines) and sophisticated (variational autoencoders, generative adversarial networks) images, musical notes (MuseGAN), and novel text (BERT, GPT-3).

In all these prior examples, we have focused on generating complex data using deep neural networks. However, neural networks can also be used to learn rules for how an entity (such as a video game character or a vehicle) should respond to an environment to optimize a reward; as we will describe in this chapter, this field is known as reinforcement learning (RL). While RL is not intrinsically tied to either deep learning or generative AI, the union of these fields has created a powerful set of techniques for optimizing complex behavioral functions.

In this chapter, we will show you how to apply GANs to learn optimal policies for different figures to navigate within...

Reinforcement learning: Actions, agents, spaces, policies, and rewards

Recall from Chapter 1, An Introduction to Generative AI: "Drawing" Data from Models, that most discriminative AI examples involve applying a continuous or discrete label to a piece of data. In the image examples we have discussed in this book, this could mean applying a deep neural network to determine the digit represented by an MNIST image, or whether a CIFAR-10 image contains a horse. In these cases, the model produces a single output: a prediction that minimizes some error. In reinforcement learning, we also make such point predictions, but over many steps, with the goal of optimizing the total reward accumulated across those repeated decisions.
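To make this step-by-step structure concrete, the following minimal sketch shows the agent-environment loop in which reward accumulates over an episode. It assumes the classic OpenAI Gym API (which we use later in this chapter); the random policy is just a placeholder for the learned policies we discuss below.

```python
import gym

# A minimal agent-environment loop; the random action is a placeholder
# for a learned policy.
env = gym.make("CartPole-v1")  # any Gym environment will do
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()         # random action (no learning yet)
    obs, reward, done, info = env.step(action)
    total_reward += reward                     # RL maximizes this cumulative sum
print("Episode return:", total_reward)
```

Reinforcement learning replaces the random sampling above with a policy that is trained to maximize the episode return.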

Figure 12.1: Atari video game examples [1]

As a concrete example, consider a video game with a player controlling a spaceship to shoot down alien vessels. The spaceship navigated by the player in this example is the agent; the set of pixels on the screen at any point in...

Running GAIL on PyBullet Gym

For our code example in this chapter, we will train a virtual agent to navigate a simulated environment – in many RL papers, this environment is simulated using the MuJoCo framework (http://www.mujoco.org/). MuJoCo stands for Multi-Joint dynamics with Contact – it is a physics engine that lets you create artificial agents (such as a pendulum or a bipedal humanoid), where the "reward" might be the agent's ability to move through the simulated environment.

While MuJoCo is a popular framework for developing reinforcement learning benchmarks – for example, by the research group OpenAI (see https://github.com/openai/baselines for some of these implementations) – it is closed source and requires a license for use. For our experiments, we will instead use PyBullet Gymperium (https://github.com/benelot/pybullet-gym), a drop-in replacement for MuJoCo that allows us to run a physics simulator and import agents trained in MuJoCo...
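As a rough sketch of what this looks like in practice, the snippet below creates one of the PyBullet replacements for a familiar MuJoCo task and drives it with random actions to verify the setup. The environment name follows the ...PyBulletEnv-v0 pattern that pybullet-gym registers; substitute whichever agent you want to train.

```python
import gym
import pybulletgym  # importing this package registers the PyBullet environments

# A PyBullet drop-in for the MuJoCo HalfCheetah locomotion task.
env = gym.make("HalfCheetahPyBulletEnv-v0")
obs = env.reset()
for _ in range(1000):
    # Random actions, just to confirm the simulator runs end to end.
    obs, reward, done, _ = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```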

Summary

In this chapter, we explored another application of generative models in reinforcement learning. First, we described how RL allows us to learn the behavior of an agent in an environment, and how deep neural networks have allowed Q-learning to scale to complex environments with extremely large observation and action spaces.
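To recap the core of that idea, a deep Q-network regresses toward the Bellman target r + γ · max_a' Q(s', a'). The following is a minimal, illustrative TensorFlow 2 sketch – the network size, optimizer settings, and n_actions are assumptions for illustration, not values from the chapter:

```python
import tensorflow as tf

n_actions = 4   # assumed size of a discrete action space
gamma = 0.99    # discount factor

# A small Q-network mapping an observation to one value per action.
q_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_actions),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

def q_learning_step(obs, actions, rewards, next_obs, dones):
    """One gradient step toward r + gamma * max_a' Q(s', a').

    dones is a float tensor (1.0 where the episode ended) so that
    terminal states contribute no bootstrapped value.
    """
    targets = rewards + gamma * (1.0 - dones) * tf.reduce_max(
        q_net(next_obs), axis=1)          # computed outside the tape: fixed target
    with tf.GradientTape() as tape:
        q_values = tf.reduce_sum(
            q_net(obs) * tf.one_hot(actions, n_actions), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_values))
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss
```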

We then discussed inverse reinforcement learning and how it varies from RL by "inverting" the problem and attempting to "learn by example." We discussed how the comparison between a proposed policy and an expert policy can be scored using entropy, and how a particular regularized version of this entropy loss takes a form similar to the GAN problem we studied in Chapter 6; the resulting method is called GAIL (Generative Adversarial Imitation Learning). We saw that GAIL is but one of many possible formulations of this general idea, each using a different loss function. Finally, we implemented GAIL using the pybullet-gym physics simulator and OpenAI Gym.
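To make the GAN analogy concrete, here is a hedged TensorFlow 2 sketch of the GAIL discriminator update. The architecture, optimizer settings, and surrogate-reward choice are illustrative assumptions, and expert_sa and policy_sa are hypothetical batches of concatenated state-action vectors:

```python
import tensorflow as tf

# A discriminator over (state, action) pairs: high logit = "expert-like".
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(1),
])
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
opt = tf.keras.optimizers.Adam(3e-4)

def discriminator_step(expert_sa, policy_sa):
    """As in a GAN: expert pairs are labeled 1, policy pairs 0."""
    with tf.GradientTape() as tape:
        expert_logits = discriminator(expert_sa)
        policy_logits = discriminator(policy_sa)
        loss = (bce(tf.ones_like(expert_logits), expert_logits) +
                bce(tf.zeros_like(policy_logits), policy_logits))
    grads = tape.gradient(loss, discriminator.trainable_variables)
    opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    return loss
```

The trained discriminator then supplies a surrogate reward at each (s, a) – for example, -log(1 - D(s, a)) – which a policy-gradient method such as TRPO or PPO maximizes in place of the environment's true reward.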

...

References

  1. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv. https://arxiv.org/abs/1312.5602
  2. Bareketain, P. (2019, March 10). Understanding Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning. Medium. https://medium.com/@parnianbrk/understanding-stabilising-experience-replay-for-deep-multi-agent-reinforcement-learning-84b4c04886b5
  3. Wikipedia user waldoalverez, under a CC BY-SA 4.0 license (https://creativecommons.org/licenses/by-sa/4.0/).
  4. Amit, R., Meir, R., & Ciosek, K. (2020). Discount Factor as a Regularizer in Reinforcement Learning. Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119. http://proceedings.mlr.press/v119/amit20a/amit20a.pdf
  5. Matiisen, T. (2015, December 19). Demystifying Deep Reinforcement Learning. Computational Neuroscience Lab. https://neuro.cs...