Home Data Hands-On Markov Models with Python

Hands-On Markov Models with Python

By Ankur Ankan , Abinash Panda
books-svg-icon Book
eBook $29.99 $20.98
Print $38.99
Subscription $15.99 $10 p/m for three months
$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
eBook $29.99 $20.98
Print $38.99
Subscription $15.99 $10 p/m for three months
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
About this book
Hidden Markov Model (HMM) is a statistical model based on the Markov chain concept. Hands-On Markov Models with Python helps you get to grips with HMMs and different inference algorithms by working on real-world problems. The hands-on examples explored in the book help you simplify the process flow in machine learning by using Markov model concepts, thereby making it accessible to everyone. Once you’ve covered the basic concepts of Markov chains, you’ll get insights into Markov processes, models, and types with the help of practical examples. After grasping these fundamentals, you’ll move on to learning about the different algorithms used in inferences and applying them in state and parameter inference. In addition to this, you’ll explore the Bayesian approach of inference and learn how to apply it in HMMs. In further chapters, you’ll discover how to use HMMs in time series analysis and natural language processing (NLP) using Python. You’ll also learn to apply HMM to image processing using 2D-HMM to segment images. Finally, you’ll understand how to apply HMM for reinforcement learning (RL) with the help of Q-Learning, and use this technique for single-stock and multi-stock algorithmic trading. By the end of this book, you will have grasped how to build your own Markov and hidden Markov models on complex datasets in order to apply them to projects.
Publication date:
September 2018
Publisher
Packt
Pages
178
ISBN
9781788625449

 

Introduction to the Markov Process

In this chapter, we will develop the basic concepts that we need to understand Hidden Markov Models (HMM). We will cover the following topics:

  • Random processes
  • Markov processes
  • Markov chains or discrete-time Markov processes
  • Continuous-time Markov chains
 

Random variables

As we always do in statistics, let's start with a simple example of rolling a dice. If we consider rolling a fair dice, the outcome of the dice can be anything from 1 to 6, and is random. To represent such situations (the outcome of rolling the dice in this case), in mathematics we use the concept of random variables. We come across a lot of such variables in our everyday lives. Another example could be ordering food at a restaurant. In this case, the outcome could be any food item on the menu. In general terms, a random variable is a variable whose possible values are outcomes of a random phenomenon. The possible states of the outcomes are also known as the domain of the random variable, and the outcome is based on the probability distribution defined over the domain of the random variable.

Coming back to rolling the dice, the domain of the random variable outcome, O, is given by domain(O) = (1, 2, 3, 4, 5, 6), and the probability distribution is given by a uniform distribution P(o) = 1/6 ∀ ∈ domain(O). Similarly, in the case of the restaurant example, for the random variable choosing a dish, the domain would be every item on the menu, and the probability distribution would depend on your food preference. In both of the previous examples, the domain of the random variable has discrete variables; such random variables are known as discrete random variables. But it's also possible for the domain to be a continuous space. For example, consider the random variable representing the stock price of Google tomorrow. The domain of this random variable will be all positive real numbers with most of the probability mass distributed around ±5% of today's price. Such random variables are known as continuous random variables.

 

Random processes

In the previous section, we discussed random variables that are able to mathematically represent the outcomes of a single random phenomenon. But what if we want to represent these random events over some period of time or the length of an experiment? For example, let's say we want to represent the stock prices for a whole day at intervals of every one hour, or we want to represent the height of a ball at intervals of every one second after being dropped from some height in a vacuum. For such situations, we would need a set of random variables, each of which will represent the outcome at the given instance of time. These sets of random variables that represent random variables over a period of time are also known as random processes. It is worth noting that the domains of all these random variables are the same. Therefore, we can also think of the process as just changing the states.

Here, we have been talking about random variables at different instances of time, but it doesn't need to be time-based in every case. It could be just some other event. But since, in most cases, it is usually time, and it is much easier to talk about random processes in terms of time, we will use time to represent any such event. The same concepts will apply to creating a model if it varies over some other event instead of time.

Now let's discuss the previous two examples in more detail. Starting with the example of dropping the ball from a height in a vacuum, if we know the exact value of gravity and the height from which the ball is being dropped, we will be able to determine the exact location of the ball at every interval of one second using Newton's laws of motion.

Such random processes, in which we can deterministically find the state of each random variable given the initial conditions (in this case, dropping the ball, zero initial velocity) and the parameters of the system (in this case, the value of gravity), are known as deterministic random processes (commonly called deterministic processes).

Now let's go to the second example; representing the stock price over time. In this case, even if we know the current price and the exact probability distribution of the price at the next one hour mark, we won't be able to deterministically compute the value. These random processes, in which we can't determine the state of a process, even if we are given the initial conditions and all the parameters of the system, are known as stochastic random processes (commonly called processes). A very good way of understanding or getting a feel for a stochastic process is to think of it as being the opposite of a deterministic process.

 

Markov processes

A stochastic process is called a Markov process if the state of the random variable at the next instance of time depends only on the outcome of the random variable at the current time. In simplistic mathematical terms, for a stochastic process, S = {R1, R2, . . ., Rn} = {R}t=1, . . ., n, to be a Markov process, it must satisfy the following condition:

According to the previous condition, the probability distribution for any variable at any given instance in a Markov process is a conditional distribution, which is conditioned only on the random variable at the last time instance. This property of a system, such that the future states of the system depend only on the current state of the system, is also known as the Markov property. Systems satisfying the Markov property are also known as memoryless systems since they don't need to remember the previous states to compute the distribution of the next state, or, in other words, the next state depends only on the current state of the system.

A very common example used to explain the Markov process is a drunk man walking along a street. We consider that, since the man is drunk, he can either take a step backward, a step forward, or stay in his current position, which is given by some distribution of these, let's say [0.4, 0.4, 0.2]. Now, given the position of the man at any given instance in time, his position at the next instance depends only on his current position and the parameters of the system (his step size and the probability distribution of possible actions). Therefore, this is an example of a Markov process.

In the previous example, let's assume that the drunk man takes an action (steps forward/backward or stays in his position) at fixed intervals of time and his step size is always the same. With these considerations, the Markov process in our example has a discrete state space. Also, since the man takes steps after fixed intervals of time, we can think of it as a discrete time. But Markov processes don't need to have discrete state space or discrete time intervals. Considering discrete and continuous time as well as discrete and continuous state space, we can categorize Markov processes into four main categories:

  • Discrete time and discrete state space
  • Discrete time and continuous state space
  • Continuous time and discrete state space
  • Continuous time and continuous state space

We will discuss each of these categories of Markov process in more detail in the following sections.

 

Installing Python and packages

Installation on Windows

Miniconda can be installed on a Windows system by just double-clicking on the downloaded .exe file and following the installation instructions. After installation, we will need to create a conda environment and install all the required packages in the environment. To create a new Python 3.4 environment with the name hmm, run the following command:

conda create -n hmm python=3.4

After creating the environment, we will need to activate it and install the required packages in it. This can be done using the following commands:

activate hmm
conda install numpy scipy

Installation on Linux

On Linux, after downloading the Miniconda file, we will need to give it execution permissions and then install it. This can be done using the following commands:

chmod +x Miniconda.sh
./Miniconda.sh

After executing the file, we can simply follow the installation instructions. Once installed, we will need to create a new environment and install the required packages. We can create a new Python 3.4 environment with the name hmm using the following commands:

conda create -n hmm python=3.4

Once the environment has been created, we will need to activate it and install the packages inside it using the following:

source activate hmm
conda install numpy scipy
 

Markov chains or discrete-time Markov processes

A Markov chain is a type of Markov process in which the time is discrete. However, there is a lot of disagreement among researchers on what categories of Markov process should be called Markov chain. But, most commonly, it is used to refer to discrete-state-space Markov processes. Therefore, a Markov chain is a stochastic process over a discrete state space satisfying the Markov property. More formally, we can say that a discrete-time Markov chain is a sequence of random variables X1, X2, X3, ... that satisfy the Markov property, namely that the probability of moving from the current state to the next state depends only on the present state and not on any of the previous states. In terms of the probability distribution, we can say that, given that the system is at time instance n, the conditional distribution of the states at the next time instance, n + 1, is conditionally independent of the state of the system at time instances {1, 2, . . ., n-1}, given the state of the random variable at time instance n. This can be written as follows:

Markov chains are often represented using directed graphs. The nodes in the directed graphs represent the different possible states of the random variables, and the edges represent the probability of the system going from one state to the other in the next time instance. Let's take a simple example of predicting the weather to understand this representation better. We will consider that there are three possible states of the random variable Weather={Sunny, Rainy, Snowy}, and possible Markov chains for this can be represented as shown in Figure 1.1:

Figure 1.1: A simple Markov chain on the random variable, representing the random variable Weather={Sunny, Rainy, Snowy} and showing the probability of the random variable switching to other states in the next time instance

One of the main points to understand in Markov chains is that we are modeling the outcomes of a sequence of random variables over time. This is sometimes confusing for people since the model is represented using a single graph, which doesn't mention anything about time. So, the name state transitions is not a particularly good name for this, since the state is not changing for any random variable; rather, we are trying to determine the state of the next random variable given the observed state of our current random variable. Coming back to our example, we can see that the nodes of the graph represent the different possible states of the random variable Weather, and the edges between them show the probability of the next random variable taking the different possible states, given the state of the current random variable. The self-loops show the probability of the model staying in its current state. In the previous Markov chain, let's say we know that the observed state of the current random variable is Sunny, then the probability that the random variable at the next time instance will also take the value Sunny is 0.8. It could also take the value Rainy with a probability of 0.19, or Snowy with a probability of 0.01. One thing to note here is that the sum of all the probability values on all the outward edges from any state should equal 1, since it's an exhaustive event.

Now, let's try to code this simple Markov chain. We will start by defining a simple MarkovChain class, and we will keep on adding methods to this class as we go through this chapter:

import numpy as np

class MarkovChain(object):
def __init__(self, transition_prob):
"""
Initialize the MarkovChain instance.

Parameters
----------
transition_prob: dict
A dict object representing the transition probabilities in
Markov Chain. Should be of the form: {'state1': {'state1':
0.1, 'state2': 0.4}, 'state2': {...}}
"""
self.transition_prob = transition_prob
self.states = list(transition_prob.keys())

def next_state(self, current_state):
"""
Returns the state of the random variable at the next time
instance.

Parameters
----------
current_state: str
The current state of the system.
"""
return np.random.choice(
self.states, p=[self.transition_prob[current_state][next_state]
for next_state in self.states])

def generate_states(self, current_state, no=10):
"""
Generates the next states of the system.

Parameters
----------
current_state: str
The state of the current random variable.

no: int
The number of future states to generate.
"""
future_states = []
for i in range(no):
next_state = self.next_state(current_state)
future_states.append(next_state)
current_state = next_state
return future_states

Now, we can try out our example with this MarkovChain class:

>>> transition_prob = {'Sunny': {'Sunny': 0.8, 'Rainy': 0.19, 
'Snowy': 0.01},
'Rainy': {'Sunny': 0.2, 'Rainy': 0.7,
'Snowy': 0.1},
'Snowy': {'Sunny': 0.1, 'Rainy': 0.2,
'Snowy': 0.7}}

>>> weather_chain = MarkovChain(transition_prob=transition_prob)
>>> weather_chain.next_state(current_state='Sunny')
'Sunny'
>>> weather_chain.next_state(current_state='Snowy')
'Snowy'
>>> weather_chain.generate_states(current_state='Snowy', no=10)
['Snowy', 'Snowy', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Rainy', 'Rainy', 'Snowy', 'Snowy']
In the previous code example, you might find your outputs to be different from what's shown here. This is because the Markov chain is probabilistic in nature and it picks on the next state based on a probability distribution, which can give different outputs on different runs.

So far in the discussion, we have considered that the probability space of the variables doesn't change over different instances of time. This is known as a time-homogeneous Markov chain, but it is also possible to have a time-inhomogeneous Markov chain, which also has a lot of applications but is outside the scope of this book.

Parameterization of Markov chains

In the code for the Markov chain in the previous section, we used a dictionary to parameterize the Markov chain that had the probability values of all the possible state transitions. Another way of representing state transitions is using a transition matrix. The transition matrix, as the name suggests, uses a tabular representation for the transition probabilities. The transition matrix for the example in Figure 1.1 is shown in the following table.

The following table shows the transition matrix for the Markov chain shown in Figure 1.1. The probability values represent the probability of the system going from the state in the row to the states mentioned in the columns:

States Sunny Rainy Snowy
Sunny 0.8 0.19 0.01
Rainy 0.2 0.7 0.1
Snowy 0.1+ 0.2 0.7

The transition matrix represents the same information as in the dictionary, but in a more compact way. For this reason, the transition matrix is the standard way of representing Markov chains. Let's modify our MarkovChain class so that it can accept a transition matrix:

import numpy as np

class MarkovChain(object):
def __init__(self, transition_matrix, states):
"""
Initialize the MarkovChain instance.

Parameters
----------
transition_matrix: 2-D array
A 2-D array representing the probabilities of change of
state in the Markov Chain.

states: 1-D array
An array representing the states of the Markov Chain. It
needs to be in the same order as transition_matrix.
"""
self.transition_matrix = np.atleast_2d(transition_matrix)
self.states = states
self.index_dict = {self.states[index]: index for index in
range(len(self.states))}
self.state_dict = {index: self.states[index] for index in
range(len(self.states))}

def next_state(self, current_state):
"""
Returns the state of the random variable at the next time
instance.

Parameters
----------
current_state: str
The current state of the system.
"""
return np.random.choice(
self.states,
p=self.transition_matrix[self.index_dict[current_state], :])

def generate_states(self, current_state, no=10):
"""
Generates the next states of the system.

Parameters
----------
current_state: str
The state of the current random variable.

no: int
The number of future states to generate.
"""
future_states = []
for i in range(no):
next_state = self.next_state(current_state)
future_states.append(next_state)
current_state = next_state
return future_states

Running this code should also give similar results to what we got in the previous section. Using a transition matrix might not seem like a good idea because it requires us to create extra variables to store the indices. But, in cases when we have hundreds of states, using a transition matrix is much more efficient than using the simple dictionary implementation. In the case of a transition matrix, we can simply use NumPy indexing to get the probability values in the next_state method, whereas we were looping over all the state names in the previous implementation:

>>> transition_matrix = [[0.8, 0.19, 0.01],
[0.2, 0.7, 0.1],
[0.1, 0.2, 0.7]]
>>> weather_chain = MarkovChain(transition_matrix=transition_matrix,
states=['Sunny', 'Rainy', 'Snowy'])
>>> weather_chain.next_state(current_state='Sunny')
'Sunny'
>>> weather_chain.next_state(current_state='Snowy')
'Sunny'
>>> weather_chain.generate_states(current_state='Snowy', no=10)
['Snowy', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy',
'Rainy', 'Rainy', 'Sunny', 'Sunny']

Properties of Markov chains

In this section, we will talk about the different properties of Markov chains, namely reducibility, periodicity, transience and recurrence, ergodicity, and steady-state analysis and limiting distributions. We will also try some simple examples of our MarkovChain class to show these properties.

Reducibility

A Markov chain is said to be irreducible if we can reach any state of the given Markov chain from any other state. In terms of states, state j is said to be accessible from another state i if a system that started at state i has a non-zero probability of getting to the state j. In more formal terms, state j is said to be accessible from state i if an integer nij ≥ 0 exists such that the following condition is met:

The nij here is basically the number of steps it takes to go from state i to j, and it can be different for different pairs of values for i and j. Also, for a given state i, if all the values for nij = 0, it means that all the states of the Markov chain are directly accessible from it. The accessibility relation is reflexive and transitive, but not necessary symmetric. We can take a simple example to understand this property:

Figure 1.2: An example of an irreducible Markov chain

In the previous example, it can be clearly seen that all of the states are accessible from all other states and hence are irreducible.

Note in the examples in Figure 1.2 and Figure 1.3 that we haven't represented edges if probability values are 0. This helps to keep the model less complicated and easier to read.

In the following example, we can see that state D is not accessible from A, B, or C. Also, state C is not accessible from either A or B. But all the states are accessible from state D, and states A and B are accessible from C:

Figure 1.3: An example of a reducible Markov chain

We can also add a couple of methods to our MarkovChain class to check which states in our chain are reachable and whether our chain is irreducible:

from itertools import combinations

def is_accessible(self, i_state, f_state):
"""
Check if state f_state is accessible from i_state.

Parameters
----------
i_state: str
The state from which the accessibility needs to be checked.

f_state: str
The state to which accessibility needs to be checked.
"""
reachable_states = [i_state]
for state in reachable_states:
if state == self.index_dict[f_state]:
return True
else:
reachable_states.append(np.nonzero(
self.transition_matrix[self.index_dict[i_state], :])[0])
return False

def is_irreducible(self):
"""
Check if the Markov Chain is irreducible.
"""
for (i, j) in combinations(self.states, self.states):
if not self.is_accessible(i, j):
return False
return True

Let's give our examples a try using the examples in Figure 1.2 and Figure 1.3:

>>> transition_irreducible = [[0.5, 0.5, 0, 0],
[0.25, 0, 0.5, 0.25],
[0.25, 0.5, 0, 0.25],
[0, 0, 0.5, 0.5]]
>>> transition_reducible = [[0.5, 0.5, 0, 0],
[0, 1, 0, 0],
[0.25, 0.5, 0, 0],
[0, 0, 0.25, 0.75]]
>>> markov_irreducible = MarkovChain(transition_matrix=transition_irreducible,
states=['A', 'B', 'C', 'D'])
>>> markov_reducible = MarkovChain(transition_matrix=transition_reducible,
states=['A', 'B', 'C', 'D'])
>>> markov_irreducible.is_accessible(i_state='A', f_state='D')
True
>>> markov_irreducible.is_accessible(i_state='B', f_state='D')
True
>>> markov_irreducible.is_irreducible()
True
>>> markov_reducible.is_accessible(i_state='A', f_state='D')
False
>>> markov_reducible.is_accessible(i_state='D', f_state='A')
True
>>> markov_reducible.is_accessible(i_state='C', f_state='D')
False
>>> markov_reducible.is_irreducible()
False

Periodicity

State i is said to have period k if any possible path to return to state i would be a multiple of k steps. Formally, it is defined like this:

Here, gcd means the greatest common divisor (GCD). Basically, k is the GCD of the length/number of steps of all possible paths from state i back to itself. If there are no possible paths from state i back to itself, then the period for it is not defined. We also need to note that k has nothing to do with the number of steps required to return to the starting state. For example, let's say that for any given state the number of steps required to return to it are (4, 6, 8, 12, 16). In this case k=2, but the minimum number of steps required to return is 4, and 2 doesn't even appear in the list of possible numbers of steps.

For any given state in the Markov chain, if k=1, the state is said to be aperiodic. A Markov chain is called aperiodic if all of its states are aperiodic. One major thing to note is that, in the case of an irreducible Markov chain, a single aperiodic state is enough to imply that all the states are aperiodic. Let's take a simple example and check the periodicity of different states:

Figure 1.4: Markov chain is also periodic

In the previous example, we can easily see that for state A the possible paths to return are A -> B -> C -> A or A -> B -> C -> D -> E -> C -> A. For these two paths, the path lengths are 3 and 6, respectively, and hence state A has a period of 3. Similarly, B, C, D, and E also each has a period of 3 in the Markov chain, and hence the Markov chain is also periodic:

Figure 1.5: Example of Markov Chain with aperiodic states.

In this example, we added a couple of extra edges, due to which the possible path lengths for A are now 3, 5, 7, ...; and for B are 2, 3, 4, 5, ... And, since the GCD of these path lengths is 1, states A and B are both now aperiodic. Similarly, we can compute the period of other nodes, each of which is also 1, and hence the Markov chain is also aperiodic.

Let's now add a couple of new methods to our MarkovChain class to compute the period of different states and check whether our model is aperiodic:

def get_period(self, state):
"""
Returns the period of the state in the Markov Chain.

Parameters
----------
state: str
The state for which the period needs to be computed.
"""
return gcd([len(i) for i in all_possible_paths])

def is_aperiodic(self):
"""
Checks if the Markov Chain is aperiodic.
"""
periods = [self.get_period(state) for state in self.states]
for period in periods:
if period != 1:
return False
return True

Let's now try out our methods on our examples. In this example, we will randomly assign probability values to different transitions:

>>> transition_periodic = [[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0.5, 0, 0, 0.5, 0],
[0, 0, 0, 0, 1],
[0, 0, 1, 0, 0]]
>>> transition_aperiodic = [[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0.5, 0.25, 0, 0.25, 0],
[0, 0, 0, 0, 1],
[0, 0, 0.5, 0.5, 0]]
>>> markov_periodic = MarkovChain(transition_matrix=transition_periodic,
states=['A', 'B', 'C', 'D', 'E'])
>>> markov_aperiodic = MarkovChain(transition_matrix=transition_aperiodic,
states=['A', 'B', 'C', 'D', 'E'])

>>> markov_periodic.get_period('A')
3
>>> markov_periodic.get_period('C')
3
>>> markov_aperiodic.is_periodic()
False

>>> markov_aperiodic.get_period('A')
1
>>> markov_aperiodic.get_period('B')
1
>>> markov_aperiodic.is_periodic()
True

Transience and recurrence

Given that we start at state i, it is called transient if there is a non-zero probability that we will never return to state i. To define this in more formal terms, let's consider a random variable Ti as the first return time to state i:

Let's now define another term, , as the probability of the system returns to state i after n steps:

Now we can define that any given state i is transient if the following condition is met:

In the preceding equation, we are basically checking whether the total sum of probabilities of returning to state i in step sizes less than is less than 1. If the total sum is less than 1, it would mean that the probability of Ti to be is greater than 0 which would mean that the state i is transient. The given state i is called recurrent if it is not transient:

Figure 1.6:

In the preceding example, we can see that states A and B are transient because A doesn't have any incoming edge. B does have an incoming edge, but it's incoming from another transient state and therefore it is also transient. Hence, once the system leaves state A or B, it won't be able to come back.

It is really simple to check whether a given state is transient or not. We can simply check whether there are any incoming edges from other states or not. If not, the state is transient. Let's write a simple method to check this for our MarkovChain class:

def is_transient(self, state):
"""
Checks if a state is transient or not.

Parameters
----------
state: str
The state for which the transient property needs to be checked.
"""
if all(self.transition_matrix[~self.index_dict[state], self.index_dict[state]] == 0):
return True
else:
return False

Now we can use this method in our example in Figure 1.6 to check which nodes are transient:

>>> transient_matrix = [[0, 0.5, 0.5, 0],
[0, 0, 0.25, 0.75],
[0, 0, 0, 1],
[0, 0, 0.5, 0.5]]
>>> transient_markov = MarkovChain(transition_matrix=transient_matrix,
states=['A', 'B', 'C', 'D']) >>> transient_markov.is_transient('A')
True
>>> transient_markov.is_transient('B')
True
>>> transient_markov.is_transient('C')
False

In the following subsections, we will talk about the statistical properties of the random variable Ti.

Mean recurrence time

The first-return time for the initial state i is also known as the hitting time. It was represented using the random variable Ti in the previous section. The mean recurrence time of state i is defined as its expected return time:

If the mean recurrence time, Mi, is finite, the state is called positive recurrent, otherwise it is called null recurrent.

Expected number of visits

As is evident from the name, the expected number of visits for any state i is the number of times the system is expected to be in that state. Also, a given state i is recurrent if and only if the expected number of visits to i is infinite:

Absorbing states

State i is said to be an absorbing state if it is impossible for a system to leave that state once it reaches it. For a state to be an absorbing state, the probability of staying in the same state should be 1, and all the other probabilities should be 0:

In a Markov chain, if all the states are absorbing, then we call it an absorbing Markov chain:

Figure 1.7: An example showing an absorbing state C, since the probability of transitioning from state C to C is 1

Again, we can add a very simple method to check for absorbing states in our MarkovChain class:

def is_absorbing(self, state):
"""
Checks if the given state is absorbing.

Parameters
----------
state: str
The state for which we need to check whether it's absorbing
or not.
"""
state_index = self.index_dict[state]
if self.transition_matrix[state_index, state_index]

We can again check whether our state in the example is absorbing by creating a Markov chain and using the is_absorbing method:

>>> absorbing_matrix = [[0, 1, 0],
[0.5, 0, 0.5],
[0, 0, 1]]
>>> absorbing_chain = MarkovChain(transition_matrix=absorbing_matrix,
states=['A', 'B', 'C'])
>>> absorbing_chain.is_absorbing('A')
False
>>> absorbing_chain.is_absorbing('C')
True

Ergodicity

State i is said to be ergodic if it is recurrent, has a period of 1, and has a finite mean recurrence time. If all the states of a Markov chain are ergodic, then it's an ergodic Markov chain. In general terms, a Markov chain is ergodic if there is a number N, such that any state in the system can be reached from any other state in any number of steps greater than or equal to the number N. Therefore, in the case of a fully connected transition matrix, where all transitions have a non-zero probability, this condition is fulfilled with N=1.

Steady-state analysis and limiting distributions

In a Markov chain, vector π is called the stationary distribution if ∀ j ∈ s satisfies the following conditions:

The stationary distribution is one of the most important properties of Markov chains, and we will talk about it in much more detail in later sections of this chapter.

 

Continuous-time Markov chains

Continuous-time Markov chains are quite similar to discrete-time Markov chains except for the fact that in the continuous case we explicitly model the transition time between the states using a positive-value random variable. Also, we consider the system at all possible values of time instead of just the transition times.

Exponential distributions

The random variable x is said to have an exponential distribution with a rate of distribution of λ if its probability density function is defined as follows:

Here, the rate of distribution λ needs to be greater than 0. We can also compute the expectation of X as this:

We see that the expectation of X is inversely proportional to the rate of learning. This means that an exponential distribution with a higher rate of learning would have a lower expectation. The exponential distribution is often used to model problems that involve modelling time until some event happens. A simple example could be modelling the time before an alarm clock goes off, or the time before a server comes to your table in a restaurant. And, as we know , the higher the learning rate, the sooner we would expect the event to happen, and hence the name learning rate.

We can also compute the second moment and the variance of the exponential distribution:

And, using the first moment and the second moment, we can compute the variance of the distribution:

Figure 1.x: Probability distribution of exponential distribution

Now we will move on to some of the properties of the exponential distribution that are relevant to our example:

  • Memoryless: Figure 1.x shows a plot of an exponential distribution. In the diagram, we can clearly see that the graph after any given point (a in this case) is an exact copy of the original distribution. We can also say that an exponential distribution that is conditioned on (X > a) still stays exponential. If we think about this property in terms of our examples, it means that if we had an alarm clock, and at any time, t, we check that it still hasn't gone off, we can still determine the distribution over the time ahead of t, which will be the same exponential distribution. This property of the exponential distribution is known as being memoryless, since at any given point in time, if you know the current state of the system (in this example, that the alarm hasn't gone off), you can determine the probability distribution over time in the future. This property of exponential distributions is quite similar to Markov chains, as you may recall from previous sections.
  • Probability of minimum value: Let's say we have n independent exponential distributions over the random variables X0, . . ., Xn with learning rates λ0, ..., λn, respectively. For these distributions, we can prove that the distribution of min(X0, . . ., Xn) is also an exponential distribution with learning rate :

We will use both of these properties of the exponential distribution in our example for the continuous time Markov chain in a later section.

Poisson process

The Poisson process is a continuous process, and there can be multiple interpretations of it, which lead to different possible definitions. In this section, we will start with the formal definition and build up to a more simple, intuitive definition. A continuous-time stochastic process N(t):t > 0 is a Poisson process with a rate λ > 0 if the following conditions are met:

  • N(0) = 0
  • It has stationary and independent increments
  • The distribution of N(t) is Poisson with mean λt:

First of all, we need to define what the stationary and independent increments are. For a continuous-time stochastic process, X(t): ≥ 0, an increment is defined as the difference in state of the system between two time instances; that is, given two time instances s and t with s < t, the increment from time s to time t is X(t) - X(s). As the name suggests, a process is said to have a stationary increment if its distribution for the increment depends only on the time difference.

In other words, a process is said to have a stationary increment if the distribution of X(t1) - X(s1) is equal to X(t2) - X(s2) if t1 > s1,t2 > s2 and t1 - s1 = t2 -s2. A process is said to have an independent increment if any two increments in disjointed time intervals are independent; that is, if t1 > s1 > t2 > s2, then the increments X(t2) - X(s2) and X(t1) - X(s1) are independent.

Now let's come back to defining the Poisson process. The Poisson process is essentially a counting process that counts the number of events that have occurred before time t. This count of the number of events before time t is given by N(t), and, similarly, the number of events occurring between time intervals t and t + s is given by N(t + s) - N(t). The value N(t + s) - N(t) is Poisson-distributed with a mean λs. We can see that the Poisson process has stationary increments in fixed time intervals, but as , the value of N(t) will also approach infinity; that is, . Another thing worth noting is that, as the value of λ increases, the number of events happening will also increase, and that is why λ is also known as the rate of the process.

This brings us to our second simplified definition of the Poisson process. A continuous-time stochastic process N(t): t ≥ 0 is called a Poisson process with the rate of learning λ > 0 if the following conditions are met:

  • N(0) = 0
  • It is a counting process; that is, N(T) gives the count of the number of events that have occurred before time t
  • The times between the events are distributed independently and identically, with an exponential distribution having a learning rate of λ

Continuous-time Markov chain example

Now, since we have a basic understanding of exponential distributions and the Poisson process, we can move on to the example to build up a continuous-time Markov chain. In this example, we will try to show how the properties of exponential distributions can be used to build up generic continuous-time Markov chains. Let's consider a hotel reception where n receptionists are working in parallel. Also consider that the guests arrive according to a Poisson process, with rate λ, and the service time for each guest is represented using an exponential random variable with learning rate µ. Also, if all the receptionists are busy when a new guest arrives, he/she will depart without getting any service. Now let's consider that a new guest arrives and finds all the receptionists are busy, and let's try to compute the expected number of busy receptionists in the next time interval.

Let's start by assuming that Tk represents the number of k busy receptionists in the next time instance. We can also use Tk to represent the expected number of busy receptionists found by the next arriving guest if k receptionists are busy at the current time instance. These two representations of Tk are equivalent because of the memoryless property of exponential distributions.

Firstly, T0 is clearly 0, because if there are currently 0 busy receptionists, the next arrival will also find 0 busy receptionists for sure. Now, considering T1, if there are currently i busy receptionists, the next arriving guest will find 1 busy receptionist if the time to the next arrival is less than the remaining service time for the busy receptionist. From the memoryless property, we know that the next arrival time is exponentially distributed with a learning rate of λ, and the remaining service time is also exponentially distributed with a learning rate of µ. Therefore, the probability that the next guest will find one receptionist busy is and hence the following is true:

In general, we consider the situation that k receptionists are busy. We can obtain an expression for Tk by conditioning on what happens first. When we have k receptionists busy, we can think of basically k+1 independent exponential distributions: k exponentials with a learning rate of µ for the remaining service time for each receptionist, and 1 exponential distribution with a learning rate of λ for the next arriving guest. In our case, we want to condition on whether a service completion happens first or a new guest arrives first. The time for a service completion will be the minimum of the k exponentials. This first completion time is also exponentially distributed with a learning rate of . Now, the probability of having a service completion before the next guest arrives is . Similarly, the probability of the next thing happening being a guest arrival is .

Now, based on this, we can say that if the next event is service completion, then the expected number of busy receptionists will be Tk-1. Otherwise, if a guest arrives first, there will be k busy receptionists. Therefore we have the following:

We need to just solve this recursion now. T2 will be given by this equation:

If we continue this same pattern, we will get T3 as follows:

We can see a pattern in the values of T1 and T2, and therefore we can write a general term for it as follows:

Let's point out our observations on the previous example:

  • At any given time instance, if there are i busy receptionists, for i < n there are i + 1 independent exponential distributions, with i of them having rate µ, and 1 of them having rate λ. The time until the process makes a jump is exponential, and its rate is given by iµ + λ. If all the receptionists are busy, then only the n exponential distributions corresponding to the service time can trigger a jump, and the time until the process makes a jump is exponential with rate .
  • When the process jumps from state i, for i < n, it jumps to state i + 1 with probability , and jumps to state i - 1 with probability of .
  • When the process makes a jump from state i, we can start up a whole new set of distributions corresponding to the state we jumped to. This is because, even though some of the old exponential distributions haven't triggered, it's equivalent to resetting or replacing those distributions.
Every time we jump to state i, regardless of when the time is, the distribution of how long we stay in state i and the probabilities of where we jump to next when we leave state i are the same. In other words, the process is time-homogeneous.

The preceding description of a continuous-time stochastic process corresponds to a continuous-time Markov chain. In the next section, we will try to define it in a more formal way.

Continuous-time Markov chain

In the previous section, we showed an example of a continuous-time Markov chain to give an indication of how it works. Let's now move on to formally define it. In a continuous-time Markov chain with a discrete state space S, for each state i ∈ S we have an associated set of ni independent exponential distributions with rates qi, j1, ..., qi,jni, where j1, ..., jni is the set of possible states the process may jump to when it leaves state i. And, when the process enters state i, the amount of time it spends in state i is exponentially distributed with rate vi = qij1+...+qijni, and when it leaves state i it will go to state jl with probability for l = 1, ...,ni.

We can also extend the Markov property from the discrete-time case to continuous time.

For a continuous-time stochastic process (X(t) : t ≥ 0) with state space S, we say it has the Markov property if the following condition is met:

Here, 0 ≤ t1 ≤ t2 ≤. . . .tn-1 ≤ s ≤ t is any non-decreasing sequence of n + 1 times, and i1, i2, . . ., in-1, i, j ∈ S are any n + 1 states in the state space, for any integer n ≥ 1.

Similarly, we can extend time-homogeneity to the case of continuous-time Markov chains. We say that a continuous-time Markov chain is time homogenous if, for any s ≤ t, and any states i, j ∈ S, the following condition is met:

As in the case of discrete-time Markov chains, a continuous-time Markov chain does not need to be time-homogeneous, but non-homogeneous Markov chains are out of scope for this book. For more details on non-homogeneous Markov chains, you can refer to Cheng-Chi Huang's thesis on the topic: https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=8613&context=rtd.

Now let's define the transition probability for a continuous-time Markov chain. Just as the rates qij in a continuous-time Markov chain are the counterpart of the transition probabilities pij in a discrete-time Markov chain, there is a counterpart to the n-step transition probabilities pij(t) for a time-homogeneous, continuous-time Markov chain, which is defined as follows:

 

Summary

In this chapter, we gave a detailed introduction to Markov chains. We talked about different types of Markov chains, mainly chains with a discrete state space, with either discrete time or continuous time. We also introduced the concepts of time-homogeneous and non-time-homogeneous Markov chains. We discussed the different properties of Markov chains in detail, and provided relevant examples and code.

Markov chains and their properties are the basic concepts on which HMMs are built. In the next chapter, we will discuss HMMs in much more detail.

About the Authors
  • Ankur Ankan

    Ankur Ankan is a BTech graduate from IIT (BHU), Varanasi. He is currently working in the field of data science. He is an open source enthusiast and his major work includes starting pgmpy with four other members. In his free time, he likes to participate in Kaggle competitions.

    Browse publications by this author
  • Abinash Panda

    Abinash Panda has been a data scientist for more than 4 years. He has worked at multiple early-stage start-ups and helped them build their data analytics pipelines. He loves to munge, plot, and analyze data. He has been a speaker at Python conferences. These days, he is busy co-founding a start-up. He has contributed to books on probabilistic graphical models by Packt Publishing.

    Browse publications by this author
Latest Reviews (2 reviews total)
Exhaustive on theory and application, with many coded examples in different fields.
good starter for the topic
Hands-On Markov Models with Python
Unlock this book and the full library FREE for 7 days
Start now