You're reading from The Statistics and Calculus with Python Workshop

Product type: Book

Published in: Aug 2020

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781800209763

Edition: 1st Edition

Languages: Python

Concepts: Statistics

Authors (6):

Peter Farrell

Alvaro Fuentes

Ajinkya Sudhir Kolhe

Quan Nguyen

Alexander Joseph Sarver

Marios Tsatsos


9. Intermediate Statistics with Python

Overview

In this chapter, we will progress through to some intermediate statistical concepts. We will learn what the law of large numbers tells us about the value of the sample mean as a sample gets larger.

By the end of this chapter, you will be able to apply the central limit theorem to describe the distribution of the sample mean, create confidence intervals to describe the possible value of the average with some degree of confidence, use hypothesis testing to evaluate conclusions based on the evidence that our sample provides, and use regression equations to analyze data.

Introduction

In previous chapters, we described and explored data using descriptive statistics and visual techniques. We also looked at probability, randomness, and using simulations of random variables to solve problems. Finally, we examined the idea of distributions, which plays a much bigger role later in this chapter.

When applying statistical ideas, there are some important questions to answer concerning methodology, such as "how large should I make my sample?" or "how confident can we be in the results?" In this chapter, we will look at how to apply two of the most important theorems in statistics, starting with their practical implications before moving on to solving common problems using the techniques that are derived from these ideas.

In this chapter, we will explain what the law of large numbers is and clarify how sample size affects the sample mean. The central...

Law of Large Numbers

There are many schemes and systems that people claim can make you a big winner at the casino. What these people fail to see is the reason why casinos are lucrative money-makers: the odds are always on the casino's side, ensuring that the casino will come out ahead in the long run. What the casinos have come to depend on is something called the law of large numbers.

Before we figure out how the casinos always make themselves winners in the long run, we need to define several terms. The first is sample average, or sample mean. The sample mean is what everybody thinks of when they think of the average. You calculate the sample mean by adding up the results and dividing by the number of results. Let's say we flip a coin 10 times and it comes up heads 7 times. We calculate the sample mean, or the average number of heads per flip, like so:

Figure 9.1: Formula for sample mean
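
As a minimal sketch of the calculation described above, the ten flips can be encoded as 1 for heads and 0 for tails. The particular order of outcomes in `flips` is an assumption made for illustration; only the count of 7 heads in 10 flips comes from the text.

```python
# Ten coin flips encoded as 1 = heads, 0 = tails (the ordering is illustrative).
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

# Sample mean: add up the results and divide by the number of results.
sample_mean = sum(flips) / len(flips)

print(sample_mean)  # 0.7
```

Because each flip is either 0 or 1, the sample mean here is simply the proportion of flips that came up heads.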

The sample average is typically denoted...
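
One way to see how sample size affects the sample mean is a short simulation of fair coin flips. The sketch below uses only Python's standard library; the helper name `sample_mean_of_flips`, the seed, and the chosen sample sizes are assumptions made for illustration rather than anything taken from the chapter.

```python
import random


def sample_mean_of_flips(n_flips, rng):
    """Return the proportion of heads in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(42)  # fixed seed so the run is repeatable
sample_sizes = [10, 100, 1_000, 10_000, 100_000]

# One sample mean per sample size.
means = {n: sample_mean_of_flips(n, rng) for n in sample_sizes}

for n, mean in means.items():
    print(f"n = {n:>7,}: sample mean = {mean:.4f}")
```

For small samples the printed means can land noticeably away from 0.5, while the largest samples should sit very close to it, which is the behavior the law of large numbers describes.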

Central Limit Theorem

By way of a quick review of the previous section, the law of large numbers tells us that as our sample gets larger, our sample mean gets closer to the population average. While this tells us what we should expect the value of the sample mean to be, it does not tell us anything about its distribution. For that, we need the central limit theorem. The central limit theorem (CLT) states that if we have a large enough sample size, the distribution of the sample mean is approximately normal, with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of n, where n is the sample size. This is important because not only do we know the typical value that our sample mean can take, but we know the shape and variance of its distribution as well.
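
The CLT's claim about the mean and standard deviation of the sample mean can be checked with a small simulation. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the sample size of 50, and the 2,000 repetitions are all assumptions chosen for the example, and it uses the standard library rather than any particular package from the book.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50                  # size of each sample
n_samples = 2_000       # number of repeated samples

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
pop_mean, pop_sd = 1.0, 1.0

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = pop_sd / math.sqrt(n)  # population sd divided by sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"sd of sample means:   {sd_of_means:.3f} (CLT prediction {predicted_sd:.3f})")
```

Even though the underlying population is far from normal, the simulated sample means should center near the population mean with a spread close to the predicted value.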

Normal Distribution and the CLT

In Chapter 8, Foundational Probability Concepts and Their Applications, we looked at a type of continuous distribution known as normal...

Confidence Intervals

As we saw with the previous simulations, our sample mean can vary from sample to sample. While, in a simulation, we have the luxury of taking 10,000 samples, we cannot do that in the real world; it would be far too expensive and time-consuming. Typically, we are given only enough resources to gather one sample. So how can we be confident in the results of our sample? Is there any way we can account for this variability when reporting our sample mean?

The good news is that the CLT gives us an idea of the variance in our sample mean. We can apply the CLT and take sampling variability into account by using a confidence interval. More generally, a confidence interval is a range of values built around a statistic (an example of a statistic is a sample mean) and based on its sampling distribution, with a stated degree of confidence that the range contains the true value being estimated, such as the population mean. We are not always going to be calculating confidence intervals for just the sample mean; the idea applies...
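
A minimal sketch of a CLT-based confidence interval for a mean follows. The data are simulated, hypothetical measurements, and the 95% level and normal critical value are assumptions for illustration; for small samples, a t-based interval is often used instead.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(1)  # fixed seed so the run is repeatable
sample = [rng.gauss(50, 10) for _ in range(100)]  # hypothetical measurements

n = len(sample)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation

confidence = 0.95
# Two-sided critical value from the standard normal distribution (about 1.96).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error: critical value times the estimated standard error of the mean.
margin_of_error = z_crit * sample_sd / math.sqrt(n)
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"{confidence:.0%} CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The interval is the sample mean plus or minus the margin of error, so a larger sample or a lower confidence level produces a narrower interval.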

Hypothesis Testing

In the previous section, we ran simulations where the sample mean changed from sample to sample, despite sampling from the same population. But how will we know if a sample mean we calculate is significantly different from a preconceived value or even from a different sample? How will we know if a difference is just sampling variability in action, or if the measures really are different? The answer lies in conducting a hypothesis test.

A hypothesis test is a statistical test that is designed to determine whether a statistic is significantly different from what we expect. Examples of hypothesis tests include checking whether the sample mean is significantly different from a pre-established standard, or comparing two different samples to see whether they are statistically different.

Parts of a Hypothesis Test

There are three main parts to any hypothesis test: the hypotheses, the test statistic, and the p-value. The hypotheses are what you are conducting the tests on...
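
The three parts named above can be sketched with a one-sample, two-sided z-test. Everything specific here is hypothetical: the function name `one_sample_z_test`, the `weights` data, and the standard value of 10 are invented for illustration. The normal approximation is used only to keep the example within the standard library; with a sample this small, a t-test would normally be preferred.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_z_test(sample, mu0):
    """Two-sided z-test of H0: mean == mu0 against H1: mean != mu0.

    Returns the test statistic and the p-value (normal approximation).
    """
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.fmean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypotheses: H0 says the mean weight is 10; H1 says it is not.
weights = [10.2, 9.9, 10.4, 10.1, 10.3, 10.0, 10.5, 10.2, 10.1, 10.3]

z, p_value = one_sample_z_test(weights, mu0=10)
print(f"test statistic z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value means a sample mean this far from the hypothesized value would be unlikely if the null hypothesis were true, which is evidence against the null hypothesis.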

Summary

In this chapter, we examined the law of large numbers and how the stability of the sample mean statistic is affected by sample size. Through the CLT, the theoretical underpinnings of confidence intervals and hypothesis testing were examined. Confidence intervals were used to describe sample statistics, such as sample mean, sample proportion, and margin of error. Hypothesis testing was conducted to evaluate two opposing hypotheses using the evidence of a collected sample.

The next chapter begins your study of calculus, where you will examine such topics as the instantaneous rate of change and finding the slope of a curved line. After studying that, we will look at integration, which is finding the area underneath a curve. Finally, we will use derivatives to find optimal values of complicated equations and graphs.


Authors (6)

Peter Farrell

Peter Farrell learned to program from the Logo code in Seymour Papert's Mindstorms. A student introduced him to Python and he never looked back. In 2015, he self-published Hacking Math Class with Python on applying Python programming to learning and teaching high-school math. In 2019, No Starch Press published his second book, Math Adventures with Python. In his books, he also presents 21st-century topics, such as Cellular Automata, 3D Graphics, and Genetic Algorithms. Currently, he teaches Python and Math in the Dallas, Texas area.

Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the 'Big Three' global management consulting firms, leading advanced analytics projects in industries such as banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

Ajinkya Sudhir Kolhe

Ajinkya Sudhir Kolhe is a programmer working for a tech company in the Bay Area. He holds an M.S. in Computer Science and has 5+ years of experience in the tech industry. His areas of interest include problem solving, analytics, and applications in Python.

Quan Nguyen

Quan Nguyen, the author of the first edition of this book, is a Python programmer with a strong passion for machine learning. He holds a dual degree in mathematics and computer science, with a minor in philosophy, earned from DePauw University. Quan is deeply involved in the Python community and has authored multiple Python books, contributing to the Python Software Foundation and regularly sharing insights on DataScience portal. He is currently pursuing a Ph.D. in computer science at Washington University in St. Louis.

Alexander Joseph Sarver

Alexander Joseph Sarver is an ambitious data scientist and content creator with 6 years of mathematical teaching experience.

Marios Tsatsos

Marios Tsatsos has 8+ years of experience in research in Physics, analytical thinking, modeling, problem solving and decision making.
