Building Statistical Models in Python

Product type: Book
Published in: Aug 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804614280
Edition: 1st Edition
Languages: Python
Concepts: Statistics
Authors (3): Huy Hoang Nguyen, Paul N Adams, Stuart J Miller


Hypothesis Testing

In this chapter, we will begin drawing statistical conclusions from data. We will combine the sampling and experimental design concepts from Chapter 1, Sampling and Generalization, with the distributions from Chapter 2, Distributions of Data. Our primary use of statistical modeling is to answer questions of interest from data, and hypothesis testing provides a formal framework for answering those questions with measures of uncertainty. First, we will cover the goals and structure of hypothesis testing. Next, we will discuss the errors that can occur in hypothesis tests and define the expected error rate. Then, we will walk through the hypothesis testing process using the z-test. Finally, we will discuss statistical power analysis.

In this chapter, we’re going to cover the following main topics:

The goal of hypothesis testing
Type I and type II errors
Basics of the z-test – the z-score, z-statistic, critical values, and p-values

The goal of hypothesis testing

Put simply, the goal of hypothesis testing is to decide whether the data we have is sufficient to support a particular hypothesis. The hypothesis test provides a formal framework for testing a hypothesis based on our data rather than attempting to decide based on visual inspection. In this section, we will discuss the process of hypothesis testing. In the next section, Basics of the z-test – the z-score, z-statistic, critical values, and p-values, we will put the process to work by walking through an example in detail with the z-test.

Overview of a hypothesis test for the mean

To understand the hypothesis testing process, let’s start with a simple example. Suppose we have a factory with machines that produce widgets, and we expect our machines to produce widgets at a certain rate (30 widgets per hour). We start by constructing two hypotheses, the null hypothesis and the alternative hypothesis. The null hypothesis and alternative hypothesis...
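To make the widget example concrete, here is a minimal sketch (not the book's own code) of testing the null hypothesis H0: μ = 30 against the two-sided alternative Ha: μ ≠ 30 with a one-sample z-test. The hourly counts, the assumed known population standard deviation `SIGMA = 2.0`, and the significance level `ALPHA = 0.05` are all hypothetical values chosen for illustration. Only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical hourly widget counts from one machine (illustrative data only)
counts = [28, 31, 29, 27, 30, 28, 29, 26, 31, 28,
          29, 27, 30, 28, 27, 29, 28, 30, 27, 29,
          28, 26, 29, 30, 27, 28, 29, 27, 28, 30,
          29, 28]

MU_0 = 30      # H0: mu = 30 widgets per hour
SIGMA = 2.0    # assumed known population standard deviation (hypothetical)
ALPHA = 0.05   # significance level, fixed before looking at the data

x_bar = mean(counts)
# z-statistic: distance of the sample mean from MU_0 in standard errors
z = (x_bar - MU_0) / (SIGMA / sqrt(len(counts)))
# Two-sided alternative Ha: mu != 30, so both tails count
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"x_bar={x_bar:.3f}, z={z:.3f}, p={p_value:.4f} -> {decision}")
```

With these made-up counts the sample mean is well below 30, so the test rejects H0. With real data, the hypotheses and alpha should be chosen before the data is examined.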

Type I and Type II errors

While data can give us a good idea of the characteristics of a distribution, a hypothesis test can still reach the wrong conclusion. Errors can occur because we are taking a random sample from a population: randomization makes sampling bias less likely, but it does not guarantee that any particular random sample is representative of the population. There are two possible errors that could occur as a result of a hypothesis test:

Type I error: Rejecting the null hypothesis when it is actually true (a false positive)
Type II error: Failing to reject the null hypothesis when it is actually false (a false negative)
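Both error types can be quantified for a two-sided one-sample z-test with a known standard deviation. In this sketch, the function name `prob_fail_to_reject` and the values σ = 2, n = 32, and a true mean of 29 are hypothetical. The probability of keeping H0 is computed from the normal distribution of the z-statistic. When H0 is true, its complement is the type I error rate, which equals alpha by construction. When H0 is false, it is the type II error rate, usually written beta.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def prob_fail_to_reject(mu_0, mu_true, sigma, n, alpha=0.05):
    """Probability that a two-sided one-sample z-test of H0: mu = mu_0
    fails to reject H0 when the true population mean is mu_true."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Under the true mean, the z-statistic is Normal(shift, 1)
    shift = (mu_true - mu_0) / (sigma / sqrt(n))
    # H0 is kept whenever the z-statistic falls inside [-z_crit, z_crit]
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)

alpha = 0.05
# H0 true (mu_true == mu_0): rejecting it is a type I error
p_keep_when_true = prob_fail_to_reject(30, 30, sigma=2, n=32, alpha=alpha)
type_i = 1 - p_keep_when_true
# H0 false (hypothetical true mean of 29): keeping H0 is a type II error
beta = prob_fail_to_reject(30, 29, sigma=2, n=32, alpha=alpha)
print(f"P(type I error)  = {type_i:.3f}")
print(f"P(type II error) = {beta:.3f}")
```

The type I error rate is controlled directly through alpha. The type II error rate depends on how far the true mean is from the hypothesized one, on σ, and on the sample size.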

Type I errors

A type I error occurs when a hypothesis test rejects the null hypothesis even though the null hypothesis is actually true. For example, say we have a distribution of data with a population mean of 30. We state our null hypothesis as $H_0: \mu = 30$. We take a random sample for our test, but the random...
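A short simulation illustrates this situation. When the population mean really is 30, repeatedly sampling and testing H0: μ = 30 at alpha = 0.05 should produce false rejections in roughly 5% of the tests. This sketch assumes a normal population with a known standard deviation of 2 and samples of size 40. Both are hypothetical values. A fixed random seed keeps the run reproducible.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(42)                # fixed seed so the run is reproducible
MU, SIGMA, N = 30, 2.0, 40     # H0 is true: the population mean really is 30
ALPHA, TRIALS = 0.05, 10_000
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-tailed critical value

false_rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU) / (SIGMA / sqrt(N))
    if abs(z) > z_crit:        # rejecting a true H0 is a type I error
        false_rejections += 1

type_i_rate = false_rejections / TRIALS
print(f"Observed type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

Changing `ALPHA` moves the observed rejection rate accordingly, which is why alpha is described as the expected error rate of the test.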

Basics of the z-test – the z-score, z-statistic, critical values, and p-values

In this section, we will discuss a type of hypothesis test called the z-test. It is a statistical procedure that uses sample data, assumed to be normally distributed, to determine whether a statement about the value of a population parameter should be rejected. The test can be performed on the following:

One sample (a left-tailed z-test, right-tailed z-test, or two-tailed z-test)
Two samples (a two-sample z-test)
Proportions (a one-proportion z-test or two-proportion z-test)
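The following sketch collects the standard z-statistic formulas for each of these cases. It also includes a helper that turns a z-statistic into a left-tailed, right-tailed, or two-tailed p-value. The function names are hypothetical. The two-sample version assumes both population standard deviations are known. The two-proportion version uses the pooled proportion under H0: p1 = p2, a common convention that may differ from the book's exact approach.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()

def p_value(z, alternative="two-sided"):
    """Convert a z-statistic to a p-value for the chosen tail."""
    if alternative == "less":          # left-tailed test
        return _N.cdf(z)
    if alternative == "greater":       # right-tailed test
        return 1 - _N.cdf(z)
    return 2 * (1 - _N.cdf(abs(z)))   # two-tailed test

def one_sample_z(x_bar, mu_0, sigma, n):
    # Sample mean vs. hypothesized mean, in standard errors
    return (x_bar - mu_0) / (sigma / sqrt(n))

def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2, diff_0=0.0):
    # Difference of means vs. hypothesized difference (known sigmas)
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (x1_bar - x2_bar - diff_0) / se

def one_proportion_z(successes, n, p_0):
    # Standard error uses the hypothesized proportion p_0
    p_hat = successes / n
    return (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

def two_proportion_z(s1, n1, s2, n2):
    # Pooled proportion under H0: p1 == p2
    p_pool = (s1 + s2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (s1 / n1 - s2 / n2) / se
```

For example, `p_value(one_sample_z(28.5, 30, 2, 36), "less")` runs a left-tailed test of a sample mean of 28.5 against μ0 = 30 with σ = 2 and n = 36 (illustrative numbers).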

The test assumes that the population standard deviation is known and that the sample size is large enough. In practice, a common rule of thumb is a sample size larger than 30.

Before going into different types of z-tests, we will discuss the z-score and z-statistic.

The z-score and z-statistic

To measure how far a particular value is from a mean, we could use the z-score or the z-statistic...
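Here is a small numeric sketch with a hypothetical population mean of 30 and standard deviation of 2. The z-score measures how many standard deviations a single observation lies from the mean. The z-statistic measures how many standard errors, σ/√n, a sample mean lies from a hypothesized mean. The two-tailed critical value at alpha = 0.05 comes from the inverse CDF of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 30, 2.0   # hypothetical population mean and standard deviation

# z-score: distance of a single observation from the mean, in standard deviations
x = 33
z_score = (x - MU) / SIGMA

# z-statistic: distance of a sample mean from the hypothesized mean,
# in standard errors (sigma / sqrt(n))
x_bar, n = 30.6, 36
z_stat = (x_bar - MU) / (SIGMA / sqrt(n))

# Two-tailed critical value at alpha = 0.05
z_crit = NormalDist().inv_cdf(0.975)
print(z_score, round(z_stat, 3), round(z_crit, 3))
```

Here the single observation of 33 gives a z-score of 1.5. The sample mean of 30.6 from 36 observations gives a z-statistic of about 1.8. That is just below the two-tailed critical value of about 1.96, so a two-sided test at alpha = 0.05 would fail to reject H0: μ = 30.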

Summary

In this chapter, we introduced the concept of a hypothesis test. We started with a basic outline of a hypothesis test with the four key steps:

State the hypothesis
Perform the test
Determine whether to reject or fail to reject the null hypothesis
Draw a statistical conclusion with a scope of inference

Then we talked about the potential errors that can occur, false positives (type I errors) and false negatives (type II errors). We also defined the expected error rate (alpha) of a test and the power of a test (1 - beta, where beta is the type II error rate).

We also discussed the statistical procedure called the z-test, a type of hypothesis test that uses sample data assumed to be normally distributed. Along the way, we introduced the z-score and z-statistic and covered different types of z-tests, such as one-sample and two-sample z-tests for means or proportions.

Finally, we discussed the concept and motivation behind power analysis, which can be used to determine the probability of correctly rejecting the null hypothesis when it is false...
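Power is the probability that a test rejects H0 when a specified alternative is true, that is, 1 - beta. The following is a minimal sketch for a two-sided one-sample z-test with known σ. The function names, the effect size of 1 widget per hour, and σ = 2 are hypothetical. The sample-size function uses the standard normal approximation $n = \left\lceil \left( (z_{1-\alpha/2} + z_{1-\beta})\,\sigma / \delta \right)^2 \right\rceil$, which ignores the negligible rejection probability in the opposite tail.

```python
from math import ceil, sqrt
from statistics import NormalDist

_N = NormalDist()

def power_one_sample_z(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean
    differs from the hypothesized mean by delta."""
    z_crit = _N.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma / sqrt(n))
    # Probability that |Z| exceeds z_crit when Z ~ Normal(shift, 1)
    return (1 - _N.cdf(z_crit - shift)) + _N.cdf(-z_crit - shift)

def sample_size_one_sample_z(delta, sigma, power=0.8, alpha=0.05):
    """Smallest n reaching the target power (normal approximation that
    ignores the opposite rejection tail)."""
    z_a = _N.inv_cdf(1 - alpha / 2)
    z_b = _N.inv_cdf(power)
    return ceil(((z_a + z_b) * sigma / abs(delta)) ** 2)

n_needed = sample_size_one_sample_z(delta=1.0, sigma=2.0)
achieved = power_one_sample_z(1.0, 2.0, n_needed)
print(n_needed, round(achieved, 3))
```

With these inputs the formula asks for 32 observations, and the exact power at that sample size comes out slightly above the 0.8 target. When delta is 0, the "power" reduces to alpha, the type I error rate.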


Authors (3)

Huy Hoang Nguyen

Huy Hoang Nguyen is a Mathematician and a Data Scientist with far-ranging experience in advanced mathematics, strategic leadership, and applied machine learning research. He holds a Master's in Data Science and a PhD in Mathematics. His previous work was related to Partial Differential Equations, Functional Analysis, and their applications in Fluid Mechanics. He transitioned from academia to the healthcare industry and has carried out Data Science projects ranging from traditional Machine Learning to Deep Learning.

Paul N Adams

Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering and classification. Paul holds a Master of Science in Data Science from Southern Methodist University.

Stuart J Miller

Stuart Miller is a Machine Learning Engineer with degrees in Data Science, Electrical Engineering, and Engineering Physics. Stuart has worked at several Fortune 500 companies, including Texas Instruments and State Farm, where he built software that used statistical and machine learning techniques. Stuart is currently an engineer at Toyota Connected, helping to build a more modern cockpit experience for drivers using machine learning.