Building Statistical Models in Python

Product type Book
Published in Aug 2023
Publisher Packt
ISBN-13 9781804614280
Pages 420 pages
Edition 1st Edition
Authors (3): Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Table of Contents

Preface

Part 1: Introduction to Statistics

  • Chapter 1: Sampling and Generalization
  • Chapter 2: Distributions of Data
  • Chapter 3: Hypothesis Testing
  • Chapter 4: Parametric Tests
  • Chapter 5: Non-Parametric Tests

Part 2: Regression Models

  • Chapter 6: Simple Linear Regression
  • Chapter 7: Multiple Linear Regression

Part 3: Classification Models

  • Chapter 8: Discrete Models
  • Chapter 9: Discriminant Analysis

Part 4: Time Series Models

  • Chapter 10: Introduction to Time Series
  • Chapter 11: ARIMA Models
  • Chapter 12: Multivariate Time Series

Part 5: Survival Analysis

  • Chapter 13: Time-to-Event Variables – An Introduction
  • Chapter 14: Survival Models

Index

Other Books You May Enjoy

Hypothesis Testing

In this chapter, we will begin drawing statistical conclusions from data, bringing together sampling and experiment design from Chapter 1, Sampling and Generalization, and distributions from Chapter 2, Distributions of Data. Our primary use of statistical modeling is to answer questions of interest from data. Hypothesis testing provides a formal framework for answering such questions with measures of uncertainty. First, we will cover the goals and structure of hypothesis testing. Next, we will talk about the errors that can occur in hypothesis tests and define the expected error rate. Then, we will walk through the hypothesis testing process using the z-test. Finally, we will discuss statistical power analysis.

In this chapter, we’re going to cover the following main topics:

  • The goal of hypothesis testing
  • Type I and type II errors
  • Basics of the z-test – the z-score, z-statistic, critical values, and p-values
  • ...

The goal of hypothesis testing

Put simply, the goal of hypothesis testing is to decide whether the data we have is sufficient to support a particular hypothesis. The hypothesis test provides a formal framework for testing a hypothesis based on our data rather than attempting to decide based on visual inspection. In this section, we will discuss the process of hypothesis testing. In a later section, Basics of the z-test – the z-score, z-statistic, critical values, and p-values, we will put the process to work by walking through an example in detail with the z-test.

Overview of a hypothesis test for the mean

To understand the hypothesis testing process, let’s start with a simple example. Suppose we have a factory with machines that produce widgets, and we expect our machines to produce widgets at a certain rate (30 widgets per hour). We start by constructing two hypotheses, the null hypothesis and the alternative hypothesis. The null hypothesis and alternative hypothesis...

Type I and Type II errors

While data can give us a good idea of the characteristics of a distribution, it is possible for a hypothesis test to result in an error. Errors can occur because we are taking a random sample from a population. While randomization makes it less likely that a sample contains sampling bias, there is no guarantee that a random sample will be representative of the population. There are two possible errors that could occur as a result of a hypothesis test:

  • Type I error: Rejecting the null hypothesis when it is actually true
  • Type II error: Failure to reject the null hypothesis when it is actually false
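Both error rates can be estimated by simulation. The following is a minimal sketch, not the chapter's own code: it assumes a two-tailed z-test with a known standard deviation, and the values of `sigma`, `n`, `trials`, and the alternative mean of 32 are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist, mean


def reject_null(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test: return True if H0 (population mean == mu0) is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mean, mu0=30.0, sigma=5.0, n=40, trials=2000, seed=0):
    """Fraction of simulated random samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += reject_null(sample, mu0, sigma)
    return rejections / trials


# H0 is true (the true mean is 30): every rejection is a type I error.
type_i_rate = rejection_rate(true_mean=30.0)

# H0 is false (the true mean is 32, an assumed value): every failure
# to reject is a type II error.
type_ii_rate = 1 - rejection_rate(true_mean=32.0)

print(f"estimated type I error rate:  {type_i_rate:.3f}")
print(f"estimated type II error rate: {type_ii_rate:.3f}")
```

With the null hypothesis true, the estimated type I error rate should land close to the chosen significance level. The type II error rate depends on how far the true mean is from the hypothesized one and on the sample size.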

Type I errors

A type I error occurs when a hypothesis test results in rejecting the null hypothesis, but the null hypothesis is actually true. For example, say we have a distribution of data with a population mean of 30. We state our null hypothesis as $H_0: \bar{x} = 30$. We take a random sample for our test, but the random...

Basics of the z-test – the z-score, z-statistic, critical values, and p-values

In this section, we will discuss a type of hypothesis test called the z-test. It is a statistical procedure that uses sample data, assumed to be normally distributed, to determine whether a statement about the value of a population parameter should be rejected. The test can be performed on the following:

  • One sample (a left-tailed z-test, right-tailed z-test, or two-tailed z-test)
  • Two samples (a two-sample z-test)
  • Proportions (a one-proportion z-test or two-proportion z-test)

The test assumes that the population standard deviation is known and that the sample size is large enough. As a rule of thumb, a sample size larger than 30 is usually considered sufficient.
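The left-tailed, right-tailed, and two-tailed variants differ only in which tail of the standard normal distribution is used to turn a z-statistic into a p-value or a critical value. The sketch below illustrates that difference using only the Python standard library. The function names `p_value` and `critical_value` are hypothetical helpers introduced here, not functions from the book or from any library.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(z, tail="two"):
    """p-value of a z-statistic for a left-, right-, or two-tailed test."""
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")


def critical_value(alpha=0.05, tail="two"):
    """Cutoff on the z scale beyond which the null hypothesis is rejected.

    Returns a single number for one-tailed tests and a (lower, upper)
    pair for the two-tailed test.
    """
    if tail == "left":
        return STANDARD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return STANDARD_NORMAL.inv_cdf(1 - alpha)
    if tail == "two":
        upper = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError(f"unknown tail: {tail!r}")


# Illustrative z-statistic (an assumed value, not taken from the chapter).
z = 1.8
for tail in ("left", "right", "two"):
    print(tail, round(p_value(z, tail), 4), critical_value(0.05, tail))
```

The same z-statistic can lead to different decisions depending on the tail: a value that is significant in a right-tailed test at a given alpha may not be significant in a two-tailed test at the same alpha.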

Before going into different types of z-tests, we will discuss the z-score and z-statistic.

The z-score and z-statistic

To measure how far a particular value is from a mean, we could use the z-score or the z-statistic...
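The chapter's own definitions are cut off here, so the following sketch uses the standard textbook forms as an assumption: the z-score measures a single value's distance from the mean in units of the standard deviation, while the z-statistic measures a sample mean's distance from the hypothesized mean in units of the standard error. The function names are illustrative.

```python
from math import sqrt


def z_score(x, mu, sigma):
    """Distance of a single value x from the mean mu, in standard deviations."""
    return (x - mu) / sigma


def z_statistic(sample_mean, mu, sigma, n):
    """Distance of a sample mean from mu, in standard errors (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / sqrt(n))


# Illustrative numbers (assumed, not from the chapter).
print(z_score(35, mu=30, sigma=5))               # one value vs. the mean
print(z_statistic(31, mu=30, sigma=5, n=100))    # sample mean vs. the mean
```

Because the standard error shrinks as the sample size grows, the same difference from the mean produces a larger z-statistic for a larger sample.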

Summary

In this chapter, we introduced the concept of a hypothesis test. We started with a basic outline of a hypothesis test with the four key steps:

  • State the hypothesis
  • Perform the test
  • Determine whether to reject or fail to reject the null hypothesis
  • Draw a statistical conclusion with a scope of inference
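The four steps can be sketched in code for the widget example from earlier in the chapter. This is a minimal illustration under stated assumptions: a two-tailed one-sample z-test, and hypothetical values for the sample mean, the known standard deviation, and the sample size (none of these numbers come from the book).

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses.
# H0: the mean production rate is 30 widgets per hour.
# Ha: the mean production rate is not 30 widgets per hour (two-tailed, assumed).
mu0 = 30.0
alpha = 0.05

# Hypothetical summary of the observed data (made-up values for illustration).
sample_mean = 31.2   # average widgets per hour in the sample
sigma = 4.0          # population standard deviation, assumed known
n = 36               # number of hourly observations

# Step 2: perform the test.
z = (sample_mean - mu0) / (sigma / sqrt(n))
p = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 3: determine whether to reject or fail to reject H0.
reject = p < alpha

# Step 4: draw a statistical conclusion with a scope of inference.
decision = "reject" if reject else "fail to reject"
print(f"z = {z:.2f}, p = {p:.4f}: {decision} H0 at alpha = {alpha}")
print("The conclusion applies only to the machines and period that were sampled.")
```

How far the conclusion generalizes in step 4 depends on how the sample was drawn, which is why the scope of inference is stated separately from the reject or fail-to-reject decision.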

Then we talked about the potential errors that can occur, namely false positives (type I errors) and false negatives (type II errors), and defined the expected error rate (alpha) of a test and the power (1 − beta) of a test.

We also discussed the statistical procedure called the z-test. This is a type of hypothesis test using sample data assumed to be normally distributed. The z-score and z-statistic were also introduced in the section on different types of z-tests, such as one-sample or two-sample z-tests for means or proportions.

Finally, we discussed the concept and motivation behind power analysis, which can be used to identify the probability of correctly rejecting the null hypothesis...
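For a z-test, power can be computed in closed form. The sketch below assumes a right-tailed one-sample z-test with a known standard deviation and a true mean above the hypothesized mean. The function names and the example numbers are hypothetical and are not taken from the book.

```python
from math import ceil, sqrt
from statistics import NormalDist


def power_right_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu_true.

    Assumes a right-tailed z-test with known sigma.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha)
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # effect in standard errors
    return 1 - nd.cdf(z_crit - shift)


def sample_size_for_power(mu0, mu_true, sigma, power=0.8, alpha=0.05):
    """Smallest n reaching the target power (requires mu_true > mu0)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)
    z_beta = nd.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / (mu_true - mu0)) ** 2)


# Illustrative values: hypothesized mean 30, true mean 32, sigma 5.
n_needed = sample_size_for_power(mu0=30, mu_true=32, sigma=5)
print("sample size for 80% power:", n_needed)
print("power at that sample size:",
      round(power_right_tailed(30, 32, 5, n_needed), 3))
```

When the true mean equals the hypothesized mean, the function returns alpha, since in that case every rejection is a type I error.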
