You're reading from  The Statistics and Calculus with Python Workshop

Product type: Book
Published in: Aug 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800209763
Edition: 1st Edition
Authors (6):
Peter Farrell

Peter Farrell learned to program from the Logo code in Seymour Papert's Mindstorms. A student introduced him to Python and he never looked back. In 2015, he self-published Hacking Math Class with Python on applying Python programming to learning and teaching high-school math. In 2019, No Starch Press published his second book, Math Adventures with Python. In his books, he also presents 21st-century topics, such as Cellular Automata, 3D Graphics, and Genetic Algorithms. Currently, he teaches Python and Math in the Dallas, Texas area.

Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the 'Big Three' global management consulting firms, leading advanced analytics projects in industries such as banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

Ajinkya Sudhir Kolhe

Ajinkya Sudhir Kolhe is a programmer working for a tech company in the Bay Area. He holds an M.S. in Computer Science and has 5+ years of experience in the tech industry. His areas of interest include problem solving, analytics, and applications in Python.

Quan Nguyen

Quan Nguyen, the author of the first edition of this book, is a Python programmer with a strong passion for machine learning. He holds a dual degree in mathematics and computer science, with a minor in philosophy, earned from DePauw University. Quan is deeply involved in the Python community and has authored multiple Python books, contributing to the Python Software Foundation and regularly sharing insights on DataScience portal. He is currently pursuing a Ph.D. in computer science at Washington University in St. Louis.

Alexander Joseph Sarver

Alexander Joseph Sarver is an ambitious data scientist and content creator with 6 years of mathematical teaching experience.

Marios Tsatsos

Marios Tsatsos has 8+ years of experience in research in Physics, analytical thinking, modeling, problem solving and decision making.

9. Intermediate Statistics with Python

Overview

In this chapter, we move on to some intermediate statistical concepts. We will learn what the law of large numbers tells us about the value of the sample mean as the sample gets larger.

By the end of this chapter, you will be able to apply the central limit theorem to describe the distribution of the sample mean, create confidence intervals to describe the possible value of the average with some degree of confidence, use hypothesis testing to evaluate conclusions based on the evidence that our sample provides, and use regression equations to analyze data.

Introduction

In previous chapters, we described and explored data using descriptive statistics and visual techniques. We also looked at probability, randomness, and using simulations of random variables to solve problems. We also examined the idea of distributions, which plays a much bigger role later in this chapter.

When applying statistical ideas, there are some important methodological questions to answer, such as "how large should I make my sample?" or "how confident can we be in the results?". In this chapter, we will look at how to apply two of the most important theorems in statistics, starting with their practical implications before moving on to solving common problems using the techniques derived from them.

In this chapter, we will explain what the law of large numbers is and clarify how sample size affects the sample mean. The central...

Law of Large Numbers

There are many schemes and systems that people claim can make you a big winner at the casino. What these people fail to see is why casinos are such lucrative money-makers: the odds are always on the casino's side, ensuring that the casino comes out ahead in the long run. What the casinos depend on is something called the law of large numbers.

Before we figure out how the casinos always make themselves winners in the long run, we need to define several terms. The first is sample average, or sample mean. The sample mean is what everybody thinks of when they think of the average. You calculate the sample mean by adding up the results and dividing by the number of results. Let's say we flip a coin 10 times and it comes up heads 7 times. We calculate the sample mean, or the average number of heads per flip, like so:

Figure 9.1: Formula for sample mean
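
As a minimal sketch of the calculation described above, the coin-flip example can be written in Python using only the standard library. The helper name sample_mean, the encoding of heads as 1 and tails as 0, and the fair-coin simulation are assumptions made for illustration; they are not taken from the book's own code. The second half of the sketch repeats the calculation for larger and larger samples to show the sample mean settling near 0.5, which is the behavior the law of large numbers describes.

```python
import random


def sample_mean(results):
    """Add up the results and divide by the number of results."""
    return sum(results) / len(results)


# The example from the text: 10 flips, 7 heads (1 = heads, 0 = tails).
ten_flips = [1] * 7 + [0] * 3
print(sample_mean(ten_flips))  # 0.7

# Law of large numbers: simulate a fair coin (an assumption for this sketch)
# and watch the sample mean move toward the population mean of 0.5.
random.seed(42)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    flips = [random.randint(0, 1) for _ in range(n)]
    print(n, sample_mean(flips))
```

Small samples can land far from 0.5, as the 7-out-of-10 example does, while the largest samples should sit very close to it.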

The sample average is typically denoted...

Central Limit Theorem

By way of a quick review of the previous section, the law of large numbers tells us that the larger our sample gets, the more closely our sample mean matches the population average. While this tells us what value we should expect the sample mean to take, it tells us nothing about its distribution. For that, we need the central limit theorem. The central limit theorem (CLT) states that if we have a large enough sample size, the distribution of the sample mean is approximately normal, with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of n, where n is the sample size. This is important because not only do we know the typical value that our sample mean can take, but we know the shape and variance of its distribution as well.
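
The CLT statement above can be checked with a short simulation. The sketch below is an illustration under stated assumptions rather than the book's own exercise: the population is a fair six-sided die (a decidedly non-normal distribution with mean 3.5 and standard deviation sqrt(35/12)), and the sample size of 30 and the 5,000 repetitions are arbitrary choices. It compares the mean and standard deviation of the simulated sample means with the values the CLT predicts.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample (assumed for this sketch)
num_samples = 5000  # how many samples we draw

# Population: a fair six-sided die, which is uniform rather than normal.
pop_mean = 3.5
pop_sd = math.sqrt(35 / 12)

# Draw many samples and record the mean of each one.
sample_means = [
    sum(random.randint(1, 6) for _ in range(n)) / n
    for _ in range(num_samples)
]

# The CLT predicts: mean = population mean, sd = population sd / sqrt(n).
predicted_sd = pop_sd / math.sqrt(n)
print("mean of sample means:", statistics.mean(sample_means))
print("predicted mean:      ", pop_mean)
print("sd of sample means:  ", statistics.stdev(sample_means))
print("predicted sd:        ", predicted_sd)
```

Plotting a histogram of sample_means would also show the approximately bell-shaped curve, even though the die itself is uniform.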

Normal Distribution and the CLT

In Chapter 8, Foundational Probability Concepts and Their Applications, we looked at a type of continuous distribution known as normal...

Confidence Intervals

As we saw with the previous simulations, our sample mean can vary from sample to sample. While, in a simulation, we have the luxury of taking 10,000 samples, we cannot do that in the real world; it would be far too expensive and time-consuming. Typically, we are given only enough resources to gather one sample. So how can we be confident in the results of our sample? Is there any way we can account for this variability when reporting our sample mean?

The good news is that the CLT gives us an idea of the variance in our sample mean. We can apply the CLT and take sampling variability into account by using a confidence interval. More generally, a confidence interval is a range of values, built around a statistic (such as a sample mean) and based on its distribution, that is likely to contain the true value of the corresponding population parameter with some stated degree of confidence. We are not always going to be calculating confidence intervals for just the sample mean; the idea applies...
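
As a rough sketch of how the CLT turns into a confidence interval for the mean, the code below computes the sample mean plus or minus a critical value times the standard error. Several things here are assumptions for illustration: the function name mean_confidence_interval, the simulated sample of 50 values, and the use of a normal (z) critical value. With small samples, a t-distribution critical value is the more usual choice, and the excerpt does not show which approach the book takes.

```python
import math
import random
import statistics


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    x_bar = statistics.mean(data)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # Critical value, roughly 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin_of_error = z * standard_error
    return x_bar - margin_of_error, x_bar + margin_of_error


# Hypothetical sample: 50 draws from a population with mean 100 and sd 15.
random.seed(1)
sample = [random.gauss(100, 15) for _ in range(50)]

lower, upper = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level, for example to 0.99, widens the interval, while a larger sample narrows it because the standard error shrinks.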

Hypothesis Testing

In the previous section, we ran simulations where the sample mean changed from sample to sample, despite sampling from the same population. But how will we know if a sample mean we calculate is significantly different from a preconceived value or even a different sample? How will we know if a difference is just variability in action, or if the measures are genuinely different? The answer lies in conducting a hypothesis test.

A hypothesis test is a statistical test that is designed to determine whether a statistic is significantly different from what we expect. Examples of hypothesis tests include checking whether the sample mean is significantly different from a pre-established standard, or comparing two different samples to see whether they are statistically different or the same.
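
To make the first kind of test concrete, here is a minimal sketch of a two-sided, one-sample z-test that compares a sample mean against a pre-established standard. The function name one_sample_z_test, the standard of 500, and the simulated measurements are all hypothetical, and the normal approximation is an assumption that suits reasonably large samples; a t-test is the more common choice for small ones.

```python
import math
import random
import statistics


def one_sample_z_test(data, mu0):
    """Two-sided z-test of whether the mean of data differs from mu0."""
    n = len(data)
    x_bar = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # Test statistic: how many standard errors x_bar lies from mu0.
    z = (x_bar - mu0) / standard_error
    # p-value: probability of a result at least this extreme if the
    # true mean really were mu0.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 40 measurements checked against a standard of 500.
random.seed(7)
measurements = [random.gauss(497, 5) for _ in range(40)]

z, p_value = one_sample_z_test(measurements, mu0=500)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference is larger than sampling variability alone would usually produce, while a large one means the sample is consistent with the standard.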

Parts of a Hypothesis Test

There are three main parts to any hypothesis test: the hypotheses, the test statistic, and the p-value. The hypotheses are what you are conducting the tests on...

Summary

In this chapter, we examined the law of large numbers and how the stability of the sample mean is affected by sample size. Through the CLT, we examined the theoretical underpinnings of confidence intervals and hypothesis testing. Confidence intervals were used to describe the uncertainty around sample statistics, such as the sample mean and sample proportion, in terms of a margin of error. Hypothesis testing was conducted to evaluate two opposing hypotheses using the evidence of a collected sample.

The next chapter begins your study of calculus, where you will examine such topics as the instantaneous rate of change and finding the slope of a curved line. After studying that, we will look at integration, which is finding the area underneath a curve. Finally, we will use derivatives to find optimal values of complicated equations and graphs.
