Probabilistic graphical models, or simply graphical models as we will refer to them in this chapter, are models that use a graph to describe the conditional independence relationships among a set of random variables. This topic has received an increasing amount of attention in recent years, and probabilistic graphical models have been successfully applied to tasks ranging from medical diagnosis to image segmentation. In this chapter, we'll present some of the necessary background that will pave the way to understanding the most basic graphical model, the Naïve Bayes classifier. We will then look at a slightly more complicated graphical model, known as the Hidden Markov Model, or HMM for short. To get started in this field, we must first learn about graphs and why they are useful.
You're reading from Mastering Predictive Analytics with R
Graph theory is a branch of mathematics that deals with mathematical objects known as graphs. Here, a graph does not have its everyday meaning of a diagram or plot with an x and y axis. In graph theory, a graph consists of two sets. The first is a set of vertices, which are also referred to as nodes; we typically use integers to label and enumerate them. The second is a set of edges between these vertices.
Thus, a graph is nothing more than a description of some points and the connections between them. The connections can have a direction so that an edge goes from the source or tail vertex to the target or head vertex. In this case, we have a directed graph. Alternatively, the edges can have no direction, so that the graph is undirected.
A common way to describe a graph is via the adjacency matrix. If we have V vertices in the graph, an adjacency matrix is a V×V matrix whose entries are 0 if the vertices represented by their row and column indices are not connected by an edge, and 1 if they are. In a directed graph, the entry in row i and column j is 1 only when there is an edge from vertex i to vertex j, so the matrix need not be symmetric.
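As a small illustration (this example is our own, not from the chapter's code), here is how a four-vertex directed graph could be described by its adjacency matrix in R:

```r
# Edge list of a small directed graph: each row is (tail, head).
edges <- matrix(c(1, 2,
                  1, 3,
                  2, 4,
                  3, 4), ncol = 2, byrow = TRUE)

V <- 4
adj <- matrix(0, nrow = V, ncol = V)
adj[edges] <- 1     # entry [i, j] becomes 1 when there is an edge i -> j
adj
#      [,1] [,2] [,3] [,4]
# [1,]    0    1    1    0
# [2,]    0    0    0    1
# [3,]    0    0    0    1
# [4,]    0    0    0    0
```

For an undirected graph, each edge would set both [i, j] and [j, i], making the matrix symmetric.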
Suppose we are interested in two events, A and B. In this case, event A might represent the event that a patient has appendicitis and event B might represent a patient having a high white blood cell count. The conditional probability of event A given event B is essentially the probability that event A will occur when we know that event B has already happened.
Formally, we define the conditional probability of event A given event B as the joint probability of both events occurring divided by the probability of event B occurring:

P(A | B) = P(A, B) / P(B)
Note that this is consistent with the way in which we define statistical independence. Statistical independence occurs when the joint probability of two events is just the product of the individual probabilities of the two events, that is, P(A, B) = P(A) × P(B). If we substitute this in our previous equation, we have:

P(A | B) = P(A) × P(B) / P(B) = P(A)
This makes sense intuitively: if two events are independent of each other, then knowing that event B has occurred does not change the probability that event A will occur.
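To make this concrete, here is a small numeric sketch in R for the appendicitis example; the probabilities are made up purely for illustration:

```r
# Made-up joint probabilities for the two events:
# A = the patient has appendicitis, B = a high white blood cell count.
p_A_and_B <- 0.008   # P(A and B)
p_B       <- 0.05    # P(B)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_A_given_B <- p_A_and_B / p_B
p_A_given_B          # 0.16

# If A and B were independent, P(A and B) = P(A) * P(B),
# and the conditional probability would collapse to just P(A):
p_A <- 0.01
(p_A * p_B) / p_B    # 0.01, that is, P(A)
```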
We know from statistics that statistical independence means that the joint probability of two random variables A and B is just the product of their (marginal) probabilities. Sometimes, two variables may not be statistically independent of each other to begin with, but observing a third variable, C, might result in them becoming independent of each other. In short, we say that events A and B are conditionally independent given C, and we can express this as:

P(A, B | C) = P(A | C) × P(B | C)
For example, suppose that J represents the event of being given a job offer at a particular company and G represents the event of being accepted into graduate school at a particular university. Both of these might depend on a variable U, a person's performance on their undergraduate degree. This can be summarized in a graph as:
When we don't know U, a person's performance on their undergraduate degree, knowing that they were accepted into graduate school might increase our belief in their undergraduate performance, and through it, the probability that they will also receive a job offer. Once we observe U directly, however, learning G tells us nothing further about J: the two become conditionally independent.
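A short R sketch with made-up numbers shows both effects at once: J and G are dependent marginally, yet independent once we condition on their common cause U:

```r
# Made-up numbers: U (a strong undergraduate record) is a common cause of
# J (receiving a job offer) and G (being accepted into graduate school).
p_U <- 0.3                  # P(U = 1)
p_J_given_U <- c(0.8, 0.2)  # P(J = 1 | U = 1), P(J = 1 | U = 0)
p_G_given_U <- c(0.9, 0.3)  # P(G = 1 | U = 1), P(G = 1 | U = 0)

# Marginal probabilities of J and G, summing U out:
p_J <- p_J_given_U[1] * p_U + p_J_given_U[2] * (1 - p_U)   # 0.38
p_G <- p_G_given_U[1] * p_U + p_G_given_U[2] * (1 - p_U)   # 0.48

# Joint probability of J and G, again summing U out:
p_JG <- p_J_given_U[1] * p_G_given_U[1] * p_U +
        p_J_given_U[2] * p_G_given_U[2] * (1 - p_U)        # 0.258

# Marginally, J and G are dependent: P(J, G) != P(J) * P(G)
p_JG - p_J * p_G    # 0.0756, not zero

# But given U they are independent by construction:
# P(J, G | U) = P(J | U) * P(G | U)
```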
Bayesian networks are a type of graphical model that involves a directed acyclic graph structure. We often refer to the tail node of a directed edge in a graphical model as the parent and the head node as the child or descendant. In fact, we generalize this latter notion so that if there is a path from node A to node B in the model, node B is a descendant of node A. We can distinguish the special case where node A is connected to node B by a single edge by saying that the latter is a direct descendant.
The parent relationship and the descendant relationship are mutually exclusive in a Bayesian network because the graph has no cycles. Bayesian networks have the distinguishing property that, given its parents, every node in the network is conditionally independent of all other nodes in the network that are not its descendants. This is sometimes referred to as the local Markov property. It is an important property because it means that we can easily factorize the joint probability function of all the random variables in the network into a product of local terms, one per node, with each node conditioned only on its parents:

P(X1, ..., Xn) = P(X1 | parents(X1)) × ... × P(Xn | parents(Xn))
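As a minimal R sketch (with made-up probability tables), here is this factorization for a three-node chain A -> B -> C, where each node's term conditions only on its parent:

```r
# A chain-structured Bayesian network A -> B -> C over binary variables,
# with made-up conditional probability tables.
p_A <- 0.6                               # P(A = 1)
p_B_given_A <- c(`0` = 0.1, `1` = 0.7)   # P(B = 1 | A = a)
p_C_given_B <- c(`0` = 0.2, `1` = 0.9)   # P(C = 1 | B = b)

# The local Markov property gives the factorization
# P(A, B, C) = P(A) * P(B | A) * P(C | B)
joint <- function(a, b, c) {
  pa <- if (a == 1) p_A else 1 - p_A
  pb <- if (b == 1) p_B_given_A[as.character(a)] else 1 - p_B_given_A[as.character(a)]
  pc <- if (c == 1) p_C_given_B[as.character(b)] else 1 - p_C_given_B[as.character(b)]
  unname(pa * pb * pc)
}

# The eight joint probabilities must sum to 1:
grid <- expand.grid(a = 0:1, b = 0:1, c = 0:1)
sum(mapply(joint, grid$a, grid$b, grid$c))   # 1
```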
We now have the necessary tools to learn about our first and simplest graphical model, the Naïve Bayes classifier. This is a directed graphical model that contains a single parent node and a series of child nodes representing random variables that are dependent only on this node with no dependencies between them. Here is an example:
We usually interpret our single parent node as the causal node, so in our particular example, the value of the Sentiment node will influence the value of the sad node, the fun node, and so on. As this is a Bayesian network, the local Markov property explains the core assumption of the model: given the Sentiment node, all other nodes are conditionally independent of each other.
In practice, we use the Naïve Bayes classifier in a context where we can observe and measure the child nodes and attempt to estimate the parent node as our output. Thus, the child nodes will be the input features of our model, and the parent node will be the output class that we are trying to predict.
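One common R implementation of this model is naiveBayes() from the e1071 package (an assumption on our part; the chapter's own examples may use a different package). The built-in iris data stands in for a real data set here:

```r
library(e1071)

# The output class (the parent node) is modeled from the observed
# features (the child nodes), assumed independent given the class:
model <- naiveBayes(Species ~ ., data = iris)

# Predict the class that best explains a new set of observed features:
predict(model, iris[c(1, 51, 101), ])
```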
The first application we will study in detail comes from the field of biology. There, we learn that the basic building blocks of DNA molecules are actually four fundamental molecules known as nucleotides. These are called Thymine, Cytosine, Adenine, and Guanine, and it is the order in which these molecules appear in a DNA strand that encodes the genetic information carried by the DNA.
An interesting problem in molecular biology is finding promoter sequences within a larger DNA strand. These are special sequences of nucleotides that play an important role in regulating a genetic process known as gene transcription. This is the first step in the mechanism by which information in the DNA is read.
The molecular biology (promoter gene sequences) data set, hosted by the UCI Machine Learning repository at https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Promoter+Gene+Sequences), contains a number of gene sequences from DNA belonging to the bacterium E. coli.
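Before such sequences can be fed to a classifier, each one must be turned into a set of features. One simple sketch (using a made-up strand; the chapter's actual preprocessing may differ) is to treat every position in the sequence as a categorical feature taking one of the four nucleotide values:

```r
# A made-up example strand; real sequences come from the UCI data set.
dna_seq <- "tactagcaatacgcttgcgttcggtg"

# One categorical feature per position, with values in {t, c, a, g}:
features <- strsplit(dna_seq, "")[[1]]
names(features) <- paste0("pos", seq_along(features))
head(features, 5)
# pos1 pos2 pos3 pos4 pos5
#  "t"  "a"  "c"  "t"  "a"
```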
In this section, we will model the patterns of letters that form English words. Beyond having different words, and sometimes alphabets, languages differ from each other in the patterns of letters that are used to form words. English words have a characteristic distribution of letters and letter sequences, and in this section we will try to model the process of word formation in a very simplistic way by using a hidden Markov model.
The emitted symbols of our model will be the letters themselves, but this time we don't know what the states could be as we are using unlabeled data. For this reason, we are going to provide just the number of states that we want our model to have, and then use the Baum-Welch algorithm to train the parameters of our HMM.
All we need for this task is a corpus of text in English. Earlier in this chapter, we studied movie reviews with the Naïve Bayes classifier, so we will use these for convenience, although other sources of English text would serve equally well.
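As a minimal sketch of the training step, the CRAN HMM package (an assumption; any package providing Baum-Welch would do) can fit a two-state model to a stream of observed letters. The starting parameters below are arbitrary, and the toy observation string stands in for a real corpus:

```r
library(HMM)

# A toy observation sequence of letters; in practice this would come
# from a large corpus of English text.
obs <- strsplit("themodelwilllearnletterpatterns", "")[[1]]

# Arbitrary non-uniform starting parameters for a 2-state model; a fully
# uniform start would leave Baum-Welch stuck at a symmetric fixed point.
set.seed(1)
emis <- matrix(runif(2 * 26), nrow = 2)
emis <- emis / rowSums(emis)               # emission rows must sum to 1

hmm <- initHMM(States  = c("S1", "S2"),
               Symbols = letters,          # the 26 lower-case letters
               startProbs    = c(0.5, 0.5),
               transProbs    = matrix(c(0.6, 0.4,
                                        0.3, 0.7), nrow = 2, byrow = TRUE),
               emissionProbs = emis)

# Baum-Welch iteratively re-estimates transition and emission probabilities:
trained <- baumWelch(hmm, obs, maxIterations = 50)
trained$hmm$transProbs     # estimated state-transition matrix
```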
In this chapter, we introduced ourselves to one of the most active areas of research in machine learning, namely the field of probabilistic graphical models. These models involve using a graphical structure to encode conditional independence relations between random variables. We saw how Bayes' Theorem, a very simple formula that essentially tells us how we can predict cause by observing effect, can be used to build a simple classifier known as the Naïve Bayes classifier. This is a simple model in which we try to predict the output class that best explains a set of observed features, all of which are assumed to be independent of each other given the output class.
We used this model to predict user sentiment on a set of movie reviews where the features were the words that were present in the reviews. Although we obtained reasonable accuracy, we found that the assumptions in our model are quite strict and prevent us from doing substantially better. Often, a Naïve Bayes model is best used as a baseline against which more sophisticated models can be compared.