You're reading from Mastering pandas. - Second Edition

Product type: Book

Published in: Oct 2019

Reading Level: Intermediate

Publisher

ISBN-13: 9781789343236

Edition: 2nd Edition

Languages

Python

Tools

Pandas

Concepts

Scientific Computing

Author (1)

Ashish Kumar

A Brief Tour of Bayesian Statistics and Maximum Likelihood Estimates

In this chapter, we take a brief tour of an alternative approach to statistical inference called Bayesian statistics. It is not intended to be a full primer, but will simply serve as an introduction to the Bayesian approach. We will also explore the associated Python-related libraries and learn how to use pandas and matplotlib to help with the data analysis. The various topics that will be discussed are as follows:

Introduction to Bayesian statistics
The mathematical framework for Bayesian statistics
Probability distributions
Bayesian versus frequentist statistics
Introduction to PyMC and Monte Carlo simulations
Bayesian analysis example – switchpoint detection

Introduction to Bayesian statistics

The field of Bayesian statistics is built on the work of Reverend Thomas Bayes, an 18th-century statistician, philosopher, and Presbyterian minister. His famous Bayes' theorem, which forms the theoretical underpinnings of Bayesian statistics, was published posthumously in 1763 as a solution to the problem of inverse probability. For more details on this topic, refer to http://en.wikipedia.org/wiki/Thomas_Bayes.

Inverse probability problems were all the rage in the early 18th century, and were often formulated as follows.

Suppose you play a game with a friend. There are 10 green balls and 7 red balls in bag 1 and 4 green balls and 7 red balls in bag 2. Your friend tosses a coin (without telling you the result), picks a ball from one of the bags at random, and shows it to you. The ball is red. What is the probability that the ball was drawn...
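The question is cut off in this excerpt, so the following is only a sketch under two stated assumptions: that the coin is fair (heads selects bag 1, tails selects bag 2), and that the question asks for the probability that the red ball came from bag 1. It computes that probability exactly from the ball counts given above and checks it with a brute-force simulation. The names `BAGS`, `exact_posterior_bag1`, and `simulate_posterior_bag1` are illustrative and do not come from the book.

```python
import random
from fractions import Fraction

# Ball counts taken from the problem statement.
BAGS = {1: {"green": 10, "red": 7}, 2: {"green": 4, "red": 7}}


def p_red(bag):
    """Probability of drawing a red ball from the given bag."""
    counts = BAGS[bag]
    return Fraction(counts["red"], counts["red"] + counts["green"])


def exact_posterior_bag1():
    """P(bag 1 | red), assuming a fair coin picks the bag."""
    prior = Fraction(1, 2)  # assumption: fair coin
    numerator = prior * p_red(1)
    evidence = prior * p_red(1) + prior * p_red(2)
    return numerator / evidence


def simulate_posterior_bag1(n_trials, seed=0):
    """Estimate P(bag 1 | red) by playing the game many times."""
    rng = random.Random(seed)
    red_total = 0
    red_from_bag1 = 0
    for _ in range(n_trials):
        bag = 1 if rng.random() < 0.5 else 2   # coin toss picks the bag
        if rng.random() < float(p_red(bag)):   # draw one ball at random
            red_total += 1
            if bag == 1:
                red_from_bag1 += 1
    # Keep only the games in which a red ball was shown.
    return red_from_bag1 / red_total


exact = exact_posterior_bag1()
estimate = simulate_posterior_bag1(200_000)
print(f"exact = {exact}, simulated = {estimate:.4f}")
```

Under these assumptions the exact answer is 11/28, and the simulated fraction should land close to it.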

The mathematical framework for Bayesian statistics

Bayesian methods are an alternative way of making a statistical inference. We will first look at Bayes' theorem, the fundamental equation from which all Bayesian inference is derived.

A few definitions regarding probability are in order before we begin:

A,B: These are events that can occur with a certain probability.
P(A) and P(B): These are the probabilities that events A and B occur, respectively.
P(A|B): This is the probability of A happening, given that B has occurred. This is known as a conditional probability.
P(AB) = P(A and B): This is the probability of A and B occurring together.

We begin with the following basic assumption:

P(AB) = P(B) * P(A|B)

The preceding equation relates the joint probability P(AB) to the conditional probability P(A|B) and what is known as the marginal probability, P(B). If...
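As a quick check of the relation P(AB) = P(B) * P(A|B), the sketch below enumerates a standard 52-card deck and counts outcomes directly. The card example is an assumption chosen for illustration (A = "the card is a king", B = "the card is a face card") and is not taken from the book.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))  # 52 equally likely outcomes


def is_king(card):      # event A
    return card[0] == "K"


def is_face(card):      # event B
    return card[0] in {"J", "Q", "K"}


def prob(event, space):
    """Probability of an event over a space of equally likely outcomes."""
    return Fraction(sum(1 for card in space if event(card)), len(space))


p_b = prob(is_face, deck)                                   # marginal P(B)
p_ab = prob(lambda c: is_king(c) and is_face(c), deck)      # joint P(AB)

# Conditioning on B means restricting the sample space to face cards.
face_cards = [card for card in deck if is_face(card)]
p_a_given_b = prob(is_king, face_cards)                     # P(A|B)

print(p_ab, p_b, p_a_given_b)
print(p_ab == p_b * p_a_given_b)
```

Using exact fractions rather than floats makes the equality test reliable.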

Probability distributions

In this section, we will briefly examine the properties of various probability distributions. Many of these distributions are used for Bayesian analysis, so a brief synopsis is needed before we can proceed. We will also illustrate how to generate and display these distributions using matplotlib. To avoid repeating the import statements in every code snippet, we present the following standard set of Python imports, which must be run before any of the code snippets in the sections that follow. You only need to run these imports once per session. The imports are as follows:

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import colors
# IPython/Jupyter magic: render plots inline (omit in a plain Python script)
%matplotlib inline
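As an illustration of generating and displaying a distribution, here is a minimal sketch for the standard normal distribution. The choice of distribution, the sample size, the seed, and the helper name `normal_pdf` are assumptions made for this example. The block repeats the imports it needs so that it can also run on its own.

```python
import numpy as np
import matplotlib.pyplot as plt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


# Generate: draw random samples and evaluate the density on a grid.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)
x = np.linspace(-4, 4, 401)
pdf = normal_pdf(x)

# Display: normalized histogram of the samples with the density on top.
fig, ax = plt.subplots()
ax.hist(samples, bins=50, density=True, alpha=0.5, label="samples")
ax.plot(x, pdf, label="N(0, 1) density")
ax.set_xlabel("x")
ax.set_ylabel("density")
ax.legend()
```

In a notebook the figure renders inline. In a plain script, call `plt.show()` or `fig.savefig(...)` afterwards.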

...

Bayesian statistics versus frequentist statistics

In statistics today, there are two schools of thought as to how we interpret data and make statistical inferences. The classical and more dominant approach to date has been what is termed the frequentist approach (refer to Chapter 7, A Tour of Statistics – The Classical Approach). We are looking at the Bayesian approach in this chapter.

What is probability?

At the heart of the debate between the Bayesian and frequentist worldview is the question of how we define probability.

In the frequentist worldview, probability is a notion that is derived from the frequencies of repeated events—for example, when we define the probability of getting heads when a fair coin...
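The frequentist notion of probability as a long-run frequency can be illustrated with a small simulation. This is a sketch that assumes a simulated fair coin, with an arbitrary seed and number of tosses. It tracks the fraction of heads as the number of tosses grows.

```python
import numpy as np

n_tosses = 100_000
rng = np.random.default_rng(0)

# 1 = heads, 0 = tails, each with probability 0.5 (fair coin).
tosses = rng.integers(0, 2, size=n_tosses)

# Relative frequency of heads after 1, 2, ..., n_tosses tosses.
running_freq = np.cumsum(tosses) / np.arange(1, n_tosses + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} tosses: fraction of heads = {running_freq[n - 1]:.4f}")
```

The early fractions fluctuate noticeably, while the later ones settle near 0.5, which is the sense in which the frequentist view defines the probability of heads.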

Conducting Bayesian statistical analysis

Conducting a Bayesian statistical analysis involves the following steps:

Specifying a probability model: In this step, we fully describe the model using a probability distribution. Based on the distribution of a sample that we have taken, we try to fit a model to it and attempt to assign probabilities to unknown parameters.
Calculating a posterior distribution: The posterior distribution is a distribution that we calculate in light of observed data. In this case, we will directly apply Bayes' formula. It will be specified as a function of the probability model that we specified in the previous step.
Checking our model: This is a necessary step where we review our model and its outputs before we make inferences. Bayesian inference methods use probability distributions to assign probabilities to possible outcomes.
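The steps above can be sketched for a very small problem: inferring the bias of a coin from a handful of tosses. Everything specific here is an assumption made for illustration: the data (7 heads in 10 tosses), the uniform prior, and the use of a grid approximation rather than any particular library. The model check is reduced to a couple of basic sanity checks on the posterior.

```python
import numpy as np

# Hypothetical observed data: 7 heads in 10 tosses.
n_tosses, n_heads = 10, 7

# Step 1: specify a probability model.
# theta is the unknown probability of heads. The prior is uniform on
# [0, 1] and the likelihood is binomial (constant factor omitted).
theta = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(theta)
likelihood = theta**n_heads * (1.0 - theta) ** (n_tosses - n_heads)

# Step 2: calculate the posterior with Bayes' formula on the grid.
unnormalized = prior * likelihood
posterior = unnormalized / unnormalized.sum()

# Step 3: check the model output before making inferences.
posterior_mean = float((theta * posterior).sum())
posterior_mode = float(theta[np.argmax(posterior)])
print(f"posterior sums to {posterior.sum():.6f}")
print(f"posterior mean = {posterior_mean:.4f}, mode = {posterior_mode:.3f}")
```

For a uniform prior the exact posterior is a Beta(8, 4) distribution with mean 8/12, so the grid result can be compared against a known answer.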

...

Monte Carlo estimation of the likelihood function and PyMC

Bayesian statistics isn't just another method. It is an entirely different paradigm for practicing statistics. It uses probability models for making inferences, given the data that has been collected. This can be expressed in a fundamental expression as P(H|D).

Here, H is our hypothesis, that is, the thing we're trying to prove, and D is our data or observations.

As a reminder of our previous discussion, the diachronic form of Bayes' theorem is as follows:

P(H|D) = P(H) * P(D|H) / P(D)

Here, P(H) is an unconditional prior probability that represents what we know before we conduct our trial. P(D|H) is our likelihood function, or probability of obtaining the data we observe, given that our hypothesis is true.

P(D) is the probability of the data, also known as the normalizing constant. This can be obtained by integrating the numerator...
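The idea of obtaining P(D) by integration can be illustrated with a plain Monte Carlo estimate: draw parameter values from the prior and average the likelihood over those draws. This sketch does not use PyMC. The coin-toss data (7 heads in 10 tosses), the uniform prior, the seed, and the sample size are assumptions chosen because the exact answer, 1 / (n + 1), is known for this case.

```python
from math import comb

import numpy as np

# Hypothetical data D: 7 heads in 10 tosses.
n_tosses, n_heads = 10, 7


def likelihood(theta):
    """Binomial likelihood P(D | theta) for the observed data."""
    return (
        comb(n_tosses, n_heads)
        * theta**n_heads
        * (1.0 - theta) ** (n_tosses - n_heads)
    )


# Draw theta from the prior P(H): uniform on [0, 1].
rng = np.random.default_rng(0)
theta_samples = rng.uniform(0.0, 1.0, size=200_000)

# P(D) is the prior-weighted average of the likelihood.
p_data_mc = float(likelihood(theta_samples).mean())
p_data_exact = 1.0 / (n_tosses + 1)  # closed form for a uniform prior

print(f"Monte Carlo P(D) = {p_data_mc:.5f}")
print(f"exact P(D)       = {p_data_exact:.5f}")
```

Averaging over prior draws works for this one-parameter toy problem. Markov chain methods such as those provided by PyMC exist because this simple approach becomes inefficient as models grow.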

References

For a more in-depth look at other Bayesian statistics topics that we touched upon, please take a look at the following references:

Probabilistic Programming and Bayesian Methods for Hackers: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
Bayesian Data Analysis, Third Edition, Andrew Gelman: http://www.amazon.com/Bayesian-Analysis-Chapman-Statistical-Science/dp/1439840954
The Bayesian Choice, Christian P Robert (this is more theoretical): http://www.springer.com/us/book/9780387952314
PyMC documentation: http://pymc-devs.github.io/pymc/index.html

Summary

In this chapter, we undertook a whirlwind tour of one of the hottest trends in statistics and data analysis in the past few years—the Bayesian approach to statistical inference. We covered a lot of ground here.

We examined what the Bayesian approach to statistics entails and discussed the various reasons why the Bayesian view is a compelling one, such as the fact that it values facts over belief. We explained the key statistical distributions and showed how we can use the various statistical packages to generate and plot them in matplotlib.

We tackled a rather difficult topic without too much oversimplification and demonstrated how we can use the PyMC package and Monte Carlo simulation methods to showcase the power of Bayesian statistics to formulate models, perform trend analysis, and make inferences on a real-world dataset (Facebook user posts). The concept of...

Author (1)

Ashish Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience in implementing and deploying data science and machine learning solutions for challenging industry problems in both hands-on and leadership roles. Natural language processing, IoT analytics, R Shiny product development, and ensemble ML methods are his core areas of expertise. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the next hip beach around and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.
