You're reading from  Building Statistical Models in Python

Product type: Book
Published in: Aug 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804614280
Edition: 1st Edition
Authors (3):
Huy Hoang Nguyen

Huy Hoang Nguyen is a Mathematician and a Data Scientist with far-ranging experience, championing advanced mathematics and strategic leadership, and applied machine learning research. He holds a Master's in Data Science and a PhD in Mathematics. His previous work was related to Partial Differential Equations, Functional Analysis and their applications in Fluid Mechanics. He transitioned from academia to the healthcare industry and has performed different Data Science projects from traditional Machine Learning to Deep Learning.

Paul N Adams

Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering and classification. Paul holds a Master of Science in Data Science from Southern Methodist University.

Stuart J Miller

Stuart Miller is a Machine Learning Engineer with degrees in Data Science, Electrical Engineering, and Engineering Physics. Stuart has worked at several Fortune 500 companies, including Texas Instruments and StateFarm, where he built software that utilized statistical and machine learning techniques. Stuart is currently an engineer at Toyota Connected helping to build a more modern cockpit experience for drivers using machine learning.


Discriminant Analysis

In the previous chapter, we discussed discrete regression models, including classification using logistic regression. In this chapter, we begin with an overview of probability, covering conditional probability and independence. We then show how these concepts form the basis for Bayes’ Theorem, which underpins the approach known as Bayesian statistics. After that, we dive into Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), two powerful classifiers that model data using the Bayesian approach to probability modeling.

In this chapter, we’re going to cover the following main topics:

  • Bayes’ Theorem
  • LDA
  • QDA

Bayes’ Theorem

In this section, we discuss Bayes’ Theorem, which is used in the classification models described later in this chapter. We start with the basics of probability. Then, we look at dependent events and discuss how Bayes’ Theorem relates to them.

Probability

Probability is a measure of how likely an event or outcome is to occur. Generally, we can group events into two types: independent events and dependent events. As the names suggest, an independent event is not affected or influenced by the occurrence of other events, while a dependent event is.
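The distinction between independent and dependent events can be checked by direct enumeration. The following is a minimal sketch, not taken from the book: the two-toss coin experiment, the four-card deck, and the helper `prob` are all hypothetical illustrations. It relies on the standard facts that two events A and B are independent exactly when P(A and B) = P(A)P(B), and that P(B | A) = P(A and B) / P(A).

```python
from fractions import Fraction
from itertools import permutations, product


def prob(event, space):
    """Probability of an event over a finite, equally likely sample space."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))


# Independent events: two tosses of a fair coin (hypothetical example).
tosses = list(product("HT", repeat=2))


def first_heads(outcome):
    return outcome[0] == "H"


def second_heads(outcome):
    return outcome[1] == "H"


def both_heads(outcome):
    return first_heads(outcome) and second_heads(outcome)


# Independence holds when P(A and B) equals P(A) * P(B).
coin_independent = (
    prob(both_heads, tosses) == prob(first_heads, tosses) * prob(second_heads, tosses)
)

# Dependent events: two cards drawn without replacement from a
# hypothetical deck of two red and two black cards.
deck = ["R1", "R2", "B1", "B2"]
draws = list(permutations(deck, 2))


def first_red(outcome):
    return outcome[0].startswith("R")


def second_red(outcome):
    return outcome[1].startswith("R")


def both_red(outcome):
    return first_red(outcome) and second_red(outcome)


p_first_red = prob(first_red, draws)
p_second_red = prob(second_red, draws)
p_both_red = prob(both_red, draws)

# Conditional probability: P(second red | first red).
p_second_red_given_first_red = p_both_red / p_first_red
cards_independent = p_both_red == p_first_red * p_second_red

print("Coin tosses independent:", coin_independent)
print("Card draws independent:", cards_independent)
print("P(second red | first red):", p_second_red_given_first_red)
```

Exact fractions are used instead of simulation so that the comparison is not blurred by sampling noise. Removing the first card changes the composition of the deck, which is why the conditional probability of a second red card differs from its unconditional probability.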

Let’s think about some examples of these events. For the first example, think about a fair coin toss. A coin toss can result in one of two states: heads or tails. If the coin is fair, there...

Linear Discriminant Analysis

In the previous chapter, we discussed logistic regression, a classification model that uses a linear model to estimate directly the probability of the target class given the input variables. One alternative to this approach is LDA. LDA models the probability of class membership given the distribution of the input variables within each class, using decision boundaries constructed with Bayes’ Theorem, which we discussed previously. With K classes, Bayes’ Theorem gives the posterior probability of class membership as P(Y = k | X = x), that is, the probability that an observation x of the input variable X belongs to the kth class.

Before proceeding, note that LDA makes three key assumptions:
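To make the posterior P(Y = k | X = x) concrete, here is a minimal one-dimensional sketch using only the Python standard library. It is an illustration under stated assumptions, not the book's implementation: the toy data, the function names (`fit_lda_1d`, `gaussian_pdf`, `posterior`), and the use of empirical class frequencies as priors are all hypothetical choices. Each class is modeled as a normal distribution with its own mean and a single pooled variance shared by all classes. In practice, scikit-learn's `LinearDiscriminantAnalysis` class covers the multivariate case.

```python
import math
from statistics import fmean


def fit_lda_1d(x, y):
    """Estimate class priors, class means, and one pooled variance."""
    classes = sorted(set(y))
    n = len(x)
    priors, means = {}, {}
    pooled_ss = 0.0
    for k in classes:
        xk = [xi for xi, yi in zip(x, y) if yi == k]
        priors[k] = len(xk) / n  # empirical prior (an assumption of this sketch)
        means[k] = fmean(xk)
        pooled_ss += sum((xi - means[k]) ** 2 for xi in xk)
    # Pooled variance: within-class sum of squares over (n - number of classes).
    variance = pooled_ss / (n - len(classes))
    return priors, means, variance


def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


def posterior(x_new, priors, means, variance):
    """Bayes' Theorem: P(Y = k | X = x) is proportional to prior * likelihood."""
    joint = {k: priors[k] * gaussian_pdf(x_new, means[k], variance) for k in priors}
    total = sum(joint.values())
    return {k: value / total for k, value in joint.items()}


# Hypothetical toy data: one feature, two classes.
x = [1.0, 1.5, 2.0, 2.5, 4.0, 4.5, 5.0, 5.5]
y = [0, 0, 0, 0, 1, 1, 1, 1]

priors, means, variance = fit_lda_1d(x, y)
post = posterior(2.0, priors, means, variance)
predicted = max(post, key=post.get)

# With equal priors and a shared variance, the boundary is the midpoint of the means.
midpoint = (means[0] + means[1]) / 2
post_mid = posterior(midpoint, priors, means, variance)

print("Posterior at x = 2.0:", post)
print("Predicted class:", predicted)
print("Posterior at the midpoint:", post_mid)
```

Because every class shares the same variance, the quadratic terms in x cancel when two class posteriors are compared, which is what makes the decision boundary linear.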

  • Each input variable is normally distributed.
  • Across all target classes, there is equal covariance among the predictors...

Quadratic Discriminant Analysis

In the last section, we discussed LDA, which assumes that the data within each class is drawn from a multivariate Gaussian distribution and that the covariance matrix is the same across classes. In this section, we consider another type of discriminant analysis, QDA, which relaxes the covariance assumption: the covariance matrix need not be identical across classes, and each class has its own covariance matrix. QDA still requires the observations within each class to follow a multivariate Gaussian distribution with a class-specific mean vector. We assume that an observation from the kth class satisfies the following:

X ~ N(μ_k, Σ_k)

We’ll thus consider a generative classifier, as follows:

p(X | y = k, θ) = N(X | μ_k, Σ_k)

Its corresponding class posterior is then this:

p(y = k | X, ...
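The generative model above can be sketched in one dimension with the Python standard library. This is a hedged illustration, not the book's code: the toy data and the names `fit_qda_1d`, `gaussian_pdf`, and `predict` are hypothetical, and the posterior is computed with Bayes' Theorem using empirical class priors, which may differ in notation from the formula in the text. The only change from an LDA-style fit is that each class keeps its own variance (the one-dimensional analogue of Σ_k). For multivariate data, scikit-learn provides `QuadraticDiscriminantAnalysis`.

```python
import math
from statistics import fmean, variance


def fit_qda_1d(x, y):
    """Estimate a prior, mean, and class-specific variance for each class."""
    n = len(x)
    params = {}
    for k in sorted(set(y)):
        xk = [xi for xi, yi in zip(x, y) if yi == k]
        params[k] = {
            "prior": len(xk) / n,  # empirical prior (an assumption of this sketch)
            "mean": fmean(xk),  # class-specific mean, mu_k
            "var": variance(xk),  # class-specific sample variance, Sigma_k in 1-D
        }
    return params


def gaussian_pdf(x, mean, var):
    """Density N(x | mean, var) of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def predict(x_new, params):
    """Return the class with the largest prior * likelihood."""
    scores = {
        k: p["prior"] * gaussian_pdf(x_new, p["mean"], p["var"])
        for k, p in params.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: class 0 is tightly clustered, class 1 is widely spread.
x = [-0.5, -0.2, 0.0, 0.2, 0.5, -4.0, -2.0, 0.5, 2.5, 4.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

params = fit_qda_1d(x, y)
predictions = {x_new: predict(x_new, params) for x_new in (-3.0, 0.1, 3.0)}
print(predictions)
```

With this data, class 1 is predicted on both sides of class 0, a region that no single linear threshold could produce. That is the practical effect of allowing each class its own variance.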

Summary

In this chapter, we began with an overview of probability. We covered the differences between conditional and independent probability and how Bayes’ Theorem leverages these concepts to provide a unique approach to probability modeling. Next, we discussed LDA, its assumptions, and how the algorithm applies Bayesian statistics to perform both classification and supervised dimension reduction. Finally, we covered QDA, an alternative to LDA when linear decision boundaries are not effective.

In the next chapter, we will introduce the fundamentals of time-series analysis, including an overview of the depths and limitations of this approach to answering statistical questions.
