
Markov Decision Process

In this chapter, we will talk about another extension of Markov models known as the Markov Decision Process (MDP). In the case of MDPs, we introduce a reward into our model, and any sequence of states taken by the process results in a specific reward. We will also introduce the concept of discounts, which allows us to control how short-sighted or far-sighted we want our agent to be. The goal of the agent is to maximize the total reward it can get.
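To get a quick feel for how the discount rate trades off near-term against future rewards, consider the following minimal sketch (the reward sequence and discount values are made up purely for illustration):

rewards = [1, 1, 1, 10]  # a hypothetical reward sequence; the large reward arrives late

for gamma in (0.1, 0.9):
    # Discounted return: G = sum over t of gamma**t * r_t
    G = sum(gamma ** t * r for t, r in enumerate(rewards))
    print(f"gamma={gamma}: discounted return = {G:.3f}")

With gamma = 0.1, the late reward of 10 contributes almost nothing to the return, so a short-sighted agent would ignore it; with gamma = 0.9, it dominates the return.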

In this chapter, we will be covering the following topics:

  • Reinforcement learning
  • The Markov reward process
  • Markov decision processes
  • Code example

Reinforcement learning

Reinforcement learning is a different paradigm in machine learning, where an agent tries to learn to behave optimally in a defined environment by making decisions/actions and observing the outcomes of those decisions. So, in the case of reinforcement learning, the agent does not learn from some given dataset; rather, by interacting with the environment, it tries to learn by observing the effects of its actions. The environment is defined in such a way that the agent gets rewards if its actions get it closer to the goal.
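This interaction loop can be sketched in a few lines of code. The toy environment below is hypothetical (a four-state chain where the agent is rewarded for reaching the rightmost state), and the agent here simply acts at random rather than learning, but it shows the decide-act-observe cycle:

import random

def step(state, action):
    # A made-up chain environment: states 0..3, goal at state 3.
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    reward = 1 if next_state == 3 else 0  # reward for reaching the goal
    return next_state, reward

state, total_reward = 0, 0
for _ in range(10):
    action = random.choice(["left", "right"])  # a non-learning, random policy
    state, reward = step(state, action)
    total_reward += reward
print("total reward collected:", total_reward)

A learning agent would replace the random choice with a policy that it improves based on the rewards it observes.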

Humans are known to learn in this way. For example, consider a child in front of a fireplace where the child is the agent and the space around the child is the environment. Now, if the child moves its hand towards the fire, it feels the warmth, which feels good and, in a way, the child (or the agent) is rewarded for the action of moving...

The Markov reward process

In the previous sections, we gave an informal introduction to MDPs. In this section, we will define the problem statement formally and look at the algorithms for solving it.

An MDP is used to define the environment in reinforcement learning and almost all reinforcement learning problems can be defined using an MDP.

For understanding MDPs, we first need the concept of the Markov reward process (MRP). An MRP is a stochastic process that extends a Markov chain by adding a reward rate to each state. We can also define an additional variable to keep track of the accumulated reward over time. Formally, an MRP is defined by the tuple (S, P, R, γ), where S is a finite state space, P is the state transition probability function, R is a reward function, and γ is the discount rate:

R_s = E[R_{t+1} | S_t = s]

where E denotes the expectation. The term R_s here denotes the expected reward in state s.
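To make the definition concrete: the value of each state in an MRP, that is, the expected discounted reward accumulated from that state onwards, satisfies the Bellman equation v = R + γPv, which is linear and can therefore be solved directly. The following sketch uses a made-up two-state MRP purely for illustration:

import numpy as np

# A made-up two-state MRP.
P = np.array([[0.7, 0.3],   # transition probabilities from state 0
              [0.4, 0.6]])  # transition probabilities from state 1
R = np.array([1.0, 2.0])    # expected immediate reward R_s for each state
gamma = 0.9                 # discount rate

# The Bellman equation v = R + gamma * P v is linear, so the state
# values can be computed by solving (I - gamma * P) v = R.
v = np.linalg.solve(np.eye(2) - gamma * P, R)
print("state values:", v)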

In the case of...

Code example

In the following code example, we implement a simple MDP class:

import numpy as np
import random


class MDP(object):
    """
    Defines a Markov Decision Process containing:

    - States, s
    - Actions, a
    - Rewards, r(s,a)
    - Transition Matrix, t(s,a,_s)

    Includes a set of abstract methods that an extending class will
    need to implement.
    """

    def __init__(self, states=None, actions=None, rewards=None, transitions=None,
                 discount=.99, tau=.01, epsilon=.01):
        """
        Parameters:
        -----------
        states: 1-D array
            The states of the environment.

        actions: 1-D array
            The possible actions by the agent.

        rewards: 2-D array
            The rewards corresponding to each action at each state of the environment.

        transitions: 3-D array
            The transition probabilities between the states of the environment.

        ...
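The rest of the class is truncated here. To give a rough idea of what solving an MDP looks like in practice, the following standalone sketch (not the book's implementation; the rewards and transitions are made up for the example) computes state values by value iteration, using arrays shaped as in the docstring above:

import numpy as np

def value_iteration(rewards, transitions, discount=0.99, epsilon=0.01):
    # rewards: 2-D array r[s, a]; transitions: 3-D array t[s, a, s'],
    # where each t[s, a, :] sums to 1.
    values = np.zeros(rewards.shape[0])
    while True:
        # Q(s, a) = r(s, a) + discount * sum over s' of t(s, a, s') * V(s')
        q = rewards + discount * transitions.dot(values)
        new_values = q.max(axis=1)  # greedy backup over actions
        if np.abs(new_values - values).max() < epsilon:
            return new_values
        values = new_values

# A made-up two-state, two-action MDP for illustration.
r = np.array([[0.0, 1.0],
              [1.0, 0.0]])
t = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])
print(value_iteration(r, t, discount=0.9))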

Summary

In this chapter, we started with a short introduction to reinforcement learning. We talked about agents, rewards, and our learning goals in reinforcement learning. We then introduced the MRP, which is one of the main concepts underlying MDPs. With an understanding of MRPs in place, we introduced the concepts of MDPs, along with a code example.
