Reader small image

You're reading from  Mastering Predictive Analytics with R

Product typeBook
Published inJun 2015
Reading LevelExpert
Publisher
ISBN-139781783982806
Edition1st Edition
Languages
Tools
Right arrow
Authors (2):
Rui Miguel Forte
Rui Miguel Forte
author image
Rui Miguel Forte

Why do you think this reviewer is suitable for this book? Mr. Rui Miguel Forte has authored a book for Packt titled “Mastering Predictive Analytics with R”. The book has received a 5 star rating. He has 3 years experience as a Data Scientist. He has knowledge of Scala, Python, R, PHP. • Has the reviewer published any articles or blogs on this or a similar tool/technology ? [Provide Links and References] A brief of Unsupervised learning has been covered in his book “Mastering Predictive Analytics with R” https://www.safaribooksonline.com/library/view/mastering-predictive-analytics/9781783982806/ https://www.linkedin.com/profile/view?id=AAkAAAC5YUIBYL7LyLCWZ6LsR0ENJxByC2jU9AU&authType=NAME_SEARCH&authToken=c1Pg&locale=en_US&trk=tyah&trkInfo=clickedVertical%3Amynetwork%2CclickedEntityId%3A12149058%2CauthType%3ANAME_SEARCH%2Cidx%3A1-1-1%2CtarId%3A1444032603690%2Ctas%3ARui%20Miguel%20Forte • Feedback on the Outline (in case outline has been shared with the reviewer) The author said the outline is good to go. • Did the reviewer share any concerns or questions regarding the reviewing process? (related to the schedule, commitment, or any additional comments) No
Read more about Rui Miguel Forte

Rui Miguel Forte
Rui Miguel Forte
author image
Rui Miguel Forte

Rui Miguel Forte is currently the chief data scientist at Workable. He was born and raised in Greece and studied in the UK. He is an experienced data scientist, having over 10 years of work experience in a diverse array of industries spanning mobile marketing, health informatics, education technology, and human resources technology. His projects have included predictive modeling of user behavior in mobile marketing promotions, speaker intent identification in an intelligent tutor, information extraction techniques for job applicant resumes and fraud detection for job scams. He currently teaches R, MongoDB, and other data science technologies to graduate students in the Business Analytics MSc program at the Athens University of Economics and Business. In addition, he has lectured in a number of seminars, specialization programs, and R schools for working data science professionals in Athens. His core programming knowledge is in R and Java, and he has extensive experience working with a variety of database technologies such as Oracle, PostgreSQL, MongoDB, and HBase. He holds a Master’s degree in Electrical and Electronic Engineering from Imperial College London and is currently researching machine learning applications in information extraction and natural language processing.
Read more about Rui Miguel Forte

View More author details
Right arrow

Chapter 8. Probabilistic Graphical Models

Probabilistic graphical models, or simply graphical models as we will refer to them in this chapter, are models that use the representation of a graph to describe the conditional independence relationships between a series of random variables. This topic has received an increasing amount of attention in recent years and probabilistic graphical models have been successfully applied to tasks ranging from medical diagnosis to image segmentation. In this chapter, we'll present some of the necessary background that will pave the way to understanding the most basic graphical model, the Naïve Bayes classifier. We will then look at a slightly more complicated graphical model, known as the Hidden Markov Model, or HMM for short. To get started in this field, we must first learn about graphs and why they are useful.

A little graph theory


Graph theory is a branch of mathematics that deals with mathematical objects known as graphs. Here, a graph does not have the everyday meaning that we are more used to talking about, in the sense of a diagram or plot with an x and y axis. In graph theory, a graph consists of two sets. The first is a set of vertices, which are also referred to as nodes. We typically use integers to label and enumerate the vertices. The second set consists of edges between these vertices.

Thus, a graph is nothing more than a description of some points and the connections between them. The connections can have a direction so that an edge goes from the source or tail vertex to the target or head vertex. In this case, we have a directed graph. Alternatively, the edges can have no direction, so that the graph is undirected.

A common way to describe a graph is via the adjacency matrix. If we have V vertices in the graph, an adjacency matrix is a V×V matrix whose entries are 0 if the vertex represented...

Bayes' Theorem


Suppose we are interested in two events, A and B. In this case, event A might represent the event that a patient has appendicitis and event B might represent a patient having a high white blood cell count. The conditional probability of event A given event B is essentially the probability that event A will occur when we know that event B has already happened.

Formally, we define the conditional probability of event A given event B as the joint probability of both events occurring divided by the probability of event B occurring:

Note that this is consistent with the way in which we define statistical independence. Statistical independence occurs when the joint probability of two events occurring is just the product of the individual probabilities of the two events. If we substitute this in our previous equation, we have:

This makes sense intuitively because if we know that two events are independent of each other, knowing that event B has occurred does not change the probability...

Conditional independence


We know from statistics that the notion of statistical independence says that the joint probability of two random variables A and B is just the product of their (marginal) probabilities. Sometimes, two variables may not be statistically independent of each other to begin with, but observing a third variable, C, might result in them becoming independent of each other. In short, we say that events A and B are conditionally independent given C, and we can express this as:

For example, suppose that J represents the probability of being given a job offer at a particular company and G represents the probability of being accepted into graduate school at a particular university. Both of these might depend on a variable U, a person's performance on their undergraduate degree. This can be summarized in a graph as:

When we don't know U, a person's performance on their undergraduate degree, knowing that they were accepted into graduate school might increase our belief in their...

Bayesian networks


Bayesian networks are a type of graphical model that involves a directed acyclic graph structure. We often refer to the tail node of a directed edge in a graphical model as the parent and the head node as the child or descendant. In fact, we generalize this latter notion so that if there is a path from node A to node B in the model, node B is a descendant of node A. We can distinguish the special case of node A connected to node B by saying that the latter is a direct descendant.

The parent relationship and the descendant relationship are mutually exclusive in a Bayesian network because it has no cycles. Bayesian networks have the distinguishing property that given its parents, every node in the network is conditionally independent of all other nodes in the network that are not its descendants. This is sometimes referred to as the local Markov property. It is an important property because it means that we can easily factorize the joint probability function of all the random...

The Naïve Bayes classifier


We now have the necessary tools to learn about our first and simplest graphical model, the Naïve Bayes classifier. This is a directed graphical model that contains a single parent node and a series of child nodes representing random variables that are dependent only on this node with no dependencies between them. Here is an example:

We usually interpret our single parent node as the causal node, so in our particular example, the value of the Sentiment node will influence the value of the sad node, the fun node, and so on. As this is a Bayesian network, the local Markov property can be used to explain the core assumption of the model. Given the Sentiment node, all other nodes are independent of each other.

In practice, we use the Naïve Bayes classifier in a context where we can observe and measure the child nodes and attempt to estimate the parent node as our output. Thus, the child nodes will be the input features of our model, and the parent node will be the output...

Hidden Markov models


A Hidden Markov model, often abbreviated to HMM, which we will use here, is a Bayesian network with a repeating structure that is commonly used to model and predict sequences. In this section, we'll see two applications of this model: one to model DNA gene sequences, and another to model the sequences of letters that make up English text. The basic diagram for an HMM is shown here:

As we can see in the diagram, the sequence flows from left to right and we have a pair of nodes for every entry in the sequence that we are trying to model. Nodes labeled Ci are known as latent states, hidden states, or merely states, as they are typically nodes that are not observable. The nodes labeled Oi are observed states or observations. We will use the terms states and observations.

Now, as this is a Bayesian network, we can immediately identify some key properties. All the observations are independent of each other given their corresponding state. Also, every state is independent of...

Predicting promoter gene sequences


The first application we will study in detail comes from the field of biology. There, we learn that the basic building blocks of DNA molecules are actually four fundamental molecules known as nucleotides. These are called Thymine, Cytosine, Adenine, and Guanine, and it is the order in which these molecules appear in a DNA strand that encodes the genetic information carried by the DNA.

An interesting problem in molecular biology is finding promoter sequences within a larger DNA strand. These are special sequences of nucleotides that play an important role in regulating a genetic process known as gene transcription. This is the first step in the mechanism by which information in the DNA is read.

The molecular biology (promoter gene sequences) data set, hosted by the UCI Machine Learning repository at https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Promoter+Gene+Sequences) contains a number of gene sequences from DNA belonging to the bacterium E....

Predicting letter patterns in English words


In this section, we will model the patterns of letters that form English words. Beyond having different words, and sometimes alphabets, languages differ from each other in the patterns of letters that are used to form words. English words have a characteristic distribution of letters and letter sequences, and in this section we will try to model the process of word formation in a very simplistic way by using a hidden Markov model.

The emitted symbols of our model will be the letters themselves, but this time we don't know what the states could be as we are using unlabeled data. For this reason, we are going to provide just the number of states that we want our model to have, and then use the Baum-Welch algorithm to train the parameters of our HMM.

All we need for this task is a corpus of text in English. Earlier in this chapter, we studied movie reviews with the Naïve Bayes classifier, so we will use these for convenience, although other sources...

Summary


In this chapter, we introduced ourselves to one of the very active areas of research in machine learning, namely the field of probabilistic graphical models. These models involve using a graphical structure to encode conditional independence relations between random variables. We saw how Bayes' Theorem, a very simple formula that essentially tells us how we can predicate cause by observing effect, can be used to build a simple classifier known as the Naïve Bayes classifier. This is a simple model where we are trying to predict an output class that best explains a set of observed features, all of which are assumed to be independent of each other given the output class.

We used this model to predict user sentiment on a set of movie reviews where the features were the words that were present in the reviews. Although we obtained reasonable accuracy, we found that the assumptions in our model are quite strict and prevent us from doing substantially better. Often, a Naïve Bayes model is...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Mastering Predictive Analytics with R
Published in: Jun 2015Publisher: ISBN-13: 9781783982806
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Rui Miguel Forte

Why do you think this reviewer is suitable for this book? Mr. Rui Miguel Forte has authored a book for Packt titled “Mastering Predictive Analytics with R”. The book has received a 5 star rating. He has 3 years experience as a Data Scientist. He has knowledge of Scala, Python, R, PHP. • Has the reviewer published any articles or blogs on this or a similar tool/technology ? [Provide Links and References] A brief of Unsupervised learning has been covered in his book “Mastering Predictive Analytics with R” https://www.safaribooksonline.com/library/view/mastering-predictive-analytics/9781783982806/ https://www.linkedin.com/profile/view?id=AAkAAAC5YUIBYL7LyLCWZ6LsR0ENJxByC2jU9AU&authType=NAME_SEARCH&authToken=c1Pg&locale=en_US&trk=tyah&trkInfo=clickedVertical%3Amynetwork%2CclickedEntityId%3A12149058%2CauthType%3ANAME_SEARCH%2Cidx%3A1-1-1%2CtarId%3A1444032603690%2Ctas%3ARui%20Miguel%20Forte • Feedback on the Outline (in case outline has been shared with the reviewer) The author said the outline is good to go. • Did the reviewer share any concerns or questions regarding the reviewing process? (related to the schedule, commitment, or any additional comments) No
Read more about Rui Miguel Forte

author image
Rui Miguel Forte

Rui Miguel Forte is currently the chief data scientist at Workable. He was born and raised in Greece and studied in the UK. He is an experienced data scientist, having over 10 years of work experience in a diverse array of industries spanning mobile marketing, health informatics, education technology, and human resources technology. His projects have included predictive modeling of user behavior in mobile marketing promotions, speaker intent identification in an intelligent tutor, information extraction techniques for job applicant resumes and fraud detection for job scams. He currently teaches R, MongoDB, and other data science technologies to graduate students in the Business Analytics MSc program at the Athens University of Economics and Business. In addition, he has lectured in a number of seminars, specialization programs, and R schools for working data science professionals in Athens. His core programming knowledge is in R and Java, and he has extensive experience working with a variety of database technologies such as Oracle, PostgreSQL, MongoDB, and HBase. He holds a Master’s degree in Electrical and Electronic Engineering from Imperial College London and is currently researching machine learning applications in information extraction and natural language processing.
Read more about Rui Miguel Forte