Python Machine Learning - Third Edition


Product type Book
Published in Dec 2019
Publisher Packt
ISBN-13 9781789955750
Pages 772 pages
Edition 3rd Edition
Authors (2): Sebastian Raschka, Vahid Mirjalili

Table of Contents (21) Chapters

Preface
1. Giving Computers the Ability to Learn from Data
2. Training Simple Machine Learning Algorithms for Classification
3. A Tour of Machine Learning Classifiers Using scikit-learn
4. Building Good Training Datasets – Data Preprocessing
5. Compressing Data via Dimensionality Reduction
6. Learning Best Practices for Model Evaluation and Hyperparameter Tuning
7. Combining Different Models for Ensemble Learning
8. Applying Machine Learning to Sentiment Analysis
9. Embedding a Machine Learning Model into a Web Application
10. Predicting Continuous Target Variables with Regression Analysis
11. Working with Unlabeled Data – Clustering Analysis
12. Implementing a Multilayer Artificial Neural Network from Scratch
13. Parallelizing Neural Network Training with TensorFlow
14. Going Deeper – The Mechanics of TensorFlow
15. Classifying Images with Deep Convolutional Neural Networks
16. Modeling Sequential Data Using Recurrent Neural Networks
17. Generative Adversarial Networks for Synthesizing New Data
18. Reinforcement Learning for Decision Making in Complex Environments
19. Other Books You May Enjoy
20. Index

Reinforcement Learning for Decision Making in Complex Environments

In the previous chapters, we focused on supervised and unsupervised machine learning. We also learned how to leverage artificial neural networks and deep learning to tackle problems encountered with these types of machine learning. As you'll recall, supervised learning focuses on predicting a category label or continuous value from a given input feature vector. Unsupervised learning focuses on extracting patterns from data, making it useful for data compression (Chapter 5, Compressing Data via Dimensionality Reduction), clustering (Chapter 11, Working with Unlabeled Data – Clustering Analysis), or approximating the training set distribution for generating new data (Chapter 17, Generative Adversarial Networks for Synthesizing New Data).

In this chapter, we turn our attention to a separate category of machine learning, reinforcement learning (RL), which is different from the previous categories as...

Introduction – learning from experience

In this section, we will first introduce RL as a branch of machine learning and see how it differs from other machine learning tasks. After that, we will cover the fundamental components of an RL system. Then, we will look at the mathematical formulation of RL based on the Markov decision process.

Understanding reinforcement learning

Until this point, this book has primarily focused on supervised and unsupervised learning. Recall that in supervised learning, we rely on labeled training examples, which are provided by a supervisor or a human expert, and the goal is to train a model that can generalize well to unseen, unlabeled test examples. This means that the supervised learning model should learn to assign the same labels or values to a given input example as the supervisor or human expert would. On the other hand, in unsupervised learning, the goal is to learn or capture the underlying structure of a dataset...

The theoretical foundations of RL

Before we jump into some practical examples and start training an RL model, which we will be doing later in this chapter, let's first understand some of the theoretical foundations of RL. The following sections examine the mathematical formulation of Markov decision processes, episodic versus continuing tasks, some key RL terminology, and dynamic programming using the Bellman equation. Let's start with Markov decision processes.

Markov decision processes

The problems that RL deals with are typically formulated as Markov decision processes (MDPs). The standard approach for solving MDP problems is dynamic programming, but RL offers some key advantages over dynamic programming.
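To make the dynamic programming approach concrete, here is a minimal sketch of value iteration on a tiny MDP. The three-state MDP, the action names, the rewards, and the discount factor are all made up for illustration and do not come from the book. The key point is that the transition table `P` must be fully known in advance, which is exactly the assumption dynamic programming relies on.

```python
# Hypothetical 3-state MDP (an assumption for illustration only).
# P[state][action] is a list of (probability, next_state, reward) tuples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.9, 1, 0.0), (0.1, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.9, 2, 1.0), (0.1, 1, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing terminal state
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=GAMMA, tol=1e-10):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V)
print(policy)
```

In this toy problem, the greedy policy chooses "go" in states 0 and 1, since that is the only way to collect the reward on entering state 2.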

Dynamic programming

Dynamic programming refers to a set of computer algorithms and programming methods that were developed by Richard Bellman in the 1950s. In a sense, dynamic programming is about...

Reinforcement learning algorithms

In this section, we will cover a series of learning algorithms. We will start with dynamic programming, which assumes that the transition dynamics (that is, the environment dynamics) are known. However, in most RL problems, this is not the case. To work around the unknown environment dynamics, RL techniques were developed that learn through interacting with the environment. These techniques include Monte Carlo (MC), temporal difference (TD) learning, and the increasingly popular Q-learning and deep Q-learning approaches. The following figure describes the course of advancing RL algorithms, from dynamic programming to Q-learning:

In the following sections of this chapter, we will step through each of these RL algorithms. We will start with dynamic programming, before moving on to MC, and finally on to TD and its branches of on-policy SARSA (state–action–reward–state–action) and off-policy Q-learning. We will also move into deep Q-learning while...
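The difference between on-policy SARSA and off-policy Q-learning comes down to the bootstrap target used in the TD update. The sketch below shows the two standard tabular update rules side by side. The function names, the dictionary-based Q-table, and the values of `alpha` and `gamma` are illustrative assumptions, not the book's code.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])


def make_q():
    """Hypothetical two-state Q-table, used only to compare the updates."""
    return {0: {"L": 0.0, "R": 0.0}, 1: {"L": 1.0, "R": 2.0}}


Q_sarsa = make_q()
sarsa_update(Q_sarsa, s=0, a="R", r=0.0, s_next=1, a_next="L")

Q_qlearn = make_q()
q_learning_update(Q_qlearn, s=0, a="R", r=0.0, s_next=1)

print(Q_sarsa[0]["R"], Q_qlearn[0]["R"])
```

With the same transition, SARSA uses the value of the next action actually chosen ("L"), while Q-learning uses the larger value of "R", so the two updated estimates differ.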

Implementing our first RL algorithm

In this section, we will cover the implementation of the Q-learning algorithm to solve the grid world problem. To do this, we use the OpenAI Gym toolkit.
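As a rough preview of the idea, the following is a minimal, self-contained sketch of tabular Q-learning on a small grid. It deliberately does not use OpenAI Gym and is not the book's implementation. The 4x4 grid, the reward scheme (+1 at the goal, -0.04 per step), and all hyperparameters are assumptions chosen for illustration.

```python
import random

N = 4                                    # grid is N x N (assumed size)
START, GOAL = (0, 0), (N - 1, N - 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), N - 1)
    col = min(max(state[1] + dc, 0), N - 1)
    next_state = (row, col)
    done = next_state == GOAL
    reward = 1.0 if done else -0.04      # assumed reward scheme
    return next_state, reward, done


def greedy_action(Q, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(Q[state].values())
    return rng.choice([a for a, q in Q[state].items() if q == best])


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(N) for c in range(N)}
    for _episode in range(episodes):
        state = START
        for _t in range(200):            # cap the episode length
            # Epsilon-greedy behavior policy.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = greedy_action(Q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action.
            best_next = 0.0 if done else max(Q[next_state].values())
            target = reward + gamma * best_next
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if done:
                break
    return Q


def greedy_path(Q, max_steps=50):
    """Follow the learned greedy policy from the start state."""
    rng = random.Random(0)
    state, path = START, [START]
    for _t in range(max_steps):
        state, _reward, done = step(state, greedy_action(Q, state, rng))
        path.append(state)
        if done:
            break
    return path


Q = train()
path = greedy_path(Q)
print(path)
```

The agent never looks at the transition rules inside `step`; it only observes the resulting state and reward, which is what distinguishes this from the dynamic programming setting.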

Introducing the OpenAI Gym toolkit

OpenAI Gym is a specialized toolkit for facilitating the development of RL models. OpenAI Gym comes with several predefined environments. Some basic examples are CartPole and MountainCar, where the tasks are to balance a pole and to move a car up a hill, respectively, as the names suggest. There are also many advanced robotics environments for training a robot to fetch, push, and reach for items on a bench or training a robotic hand to orient blocks, balls, or pens. Moreover, OpenAI Gym provides a convenient, unified framework for developing new environments. More information can be found on its official website: https://gym.openai.com/.
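To give a feel for what a unified environment interface looks like, here is a small stand-in written in plain Python. It does not import gym; the `CorridorEnv` class and `run_episode` helper are hypothetical and only loosely modeled on the classic `reset()`/`step()` convention, in which `step()` returns an observation, a reward, a done flag, and an info dictionary. Newer Gym releases return different tuples from `reset()` and `step()`, so treat this as a sketch of the idea and not of the exact API.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, reach the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right) and report the outcome."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Generic agent-environment loop that works with any such environment."""
    obs = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps


env = CorridorEnv()
total_reward, steps = run_episode(env, policy=lambda obs: 1)  # always move right
print(total_reward, steps)
```

Because `run_episode` depends only on `reset()` and `step()`, the same loop can drive any environment that follows this contract, which is the practical benefit of a shared interface.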

To follow the OpenAI Gym code examples in the next sections, you need to install the gym library, which can be easily...

Chapter and book summary

In this chapter, we covered the essential concepts in RL, starting from the very foundations, and how RL can support decision making in complex environments.

We learned about agent-environment interactions and Markov decision processes (MDPs), and we considered three main approaches for solving RL problems: dynamic programming, Monte Carlo (MC) learning, and temporal difference (TD) learning. We discussed that dynamic programming assumes full knowledge of the environment dynamics, an assumption that rarely holds for real-world problems.

Then, we saw how the MC- and TD-based algorithms learn by allowing an agent to interact with the environment and generate a simulated experience. After discussing the underlying theory, we implemented the Q-learning algorithm as an off-policy subcategory of the TD algorithm for solving the grid world example. Finally, we covered the concept of function approximation and deep Q-learning in particular, which can...
