Building Statistical Models in Python


Product type Book
Published in Aug 2023
Publisher Packt
ISBN-13 9781804614280
Pages 420 pages
Edition 1st Edition
Authors (3): Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Table of Contents (22 chapters)

  • Preface
  • Part 1: Introduction to Statistics
  • Chapter 1: Sampling and Generalization
  • Chapter 2: Distributions of Data
  • Chapter 3: Hypothesis Testing
  • Chapter 4: Parametric Tests
  • Chapter 5: Non-Parametric Tests
  • Part 2: Regression Models
  • Chapter 6: Simple Linear Regression
  • Chapter 7: Multiple Linear Regression
  • Part 3: Classification Models
  • Chapter 8: Discrete Models
  • Chapter 9: Discriminant Analysis
  • Part 4: Time Series Models
  • Chapter 10: Introduction to Time Series
  • Chapter 11: ARIMA Models
  • Chapter 12: Multivariate Time Series
  • Part 5: Survival Analysis
  • Chapter 13: Time-to-Event Variables – An Introduction
  • Chapter 14: Survival Models
  • Index
  • Other Books You May Enjoy

Parametric Tests

In the previous chapter, we introduced the concept of a hypothesis test and showed several applications of the z-test. The z-test is a type of hypothesis test in a family of hypothesis tests called parametric tests. Parametric tests are powerful hypothesis tests, but the application of parametric tests requires certain assumptions to be met by the data. While the z-test is a useful test, it is limited by the required assumptions. In this chapter, we will discuss several more parametric tests, which will expand our parametric tool set. More specifically, we will discuss the various applications of the t-test, how to perform tests when more than two subgroups of data are present, and the hypothesis test for Pearson’s correlation coefficient. We will complete the chapter with a discussion on power analysis for parametric tests.

In this chapter, we’re going to cover the following main topics:

  • Assumptions of parametric tests
  • T-test—a...

Assumptions of parametric tests

Parametric tests make assumptions about population data, so the statistics practitioner must analyze the data before modeling. This is especially important when working with sample data, because sample statistics serve as estimates of the population parameters when the true population parameters are unknown. These are the three primary assumptions of parametric hypothesis tests:

  • Normally distributed population data
  • Samples are independent
  • Equal population variances (when comparing two or more groups)

In this chapter, we discuss the z-test, t-test, ANOVA, and Pearson’s correlation. These tests are used on continuous data. In addition to these assumptions, Pearson’s correlation requires the data to contain paired samples. In other words, the groups being compared must contain the same number of samples, because Pearson’s correlation is based on pairwise comparisons.
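As a rough illustration of how the assumptions listed above might be checked in practice, the sketch below applies the Shapiro-Wilk test for normality and Levene's test for equal variances using SciPy. The choice of these two tests, the helper name `check_assumptions`, the simulated data, and the 0.05 threshold are all assumptions made for this example and are not taken from the chapter. Independence of samples usually follows from how the data was collected and is not checked here.

```python
import numpy as np
from scipy import stats


def check_assumptions(a, b, alpha=0.05):
    """Hypothetical helper: True means no evidence against the assumption."""
    # Shapiro-Wilk: the null hypothesis is that the sample is normally distributed.
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    # Levene: the null hypothesis is that the groups have equal variances.
    _, p_var = stats.levene(a, b)
    return {
        "normal_a": bool(p_norm_a > alpha),
        "normal_b": bool(p_norm_b > alpha),
        "equal_variance": bool(p_var > alpha),
    }


# Simulated data, for illustration only.
rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=52.0, scale=5.0, size=40)

result = check_assumptions(group_a, group_b)
print(result)
```

A `False` entry only signals that the corresponding test rejected its null hypothesis at the chosen threshold; it is a prompt for a closer look (for example, a Q-Q plot), not proof that a parametric test is unusable.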

While these assumptions are ideal, there are...

T-test – a parametric hypothesis test

In the last chapter, the z-test for means was applied when population standard deviations were known. However, in the real world, it is difficult, and often virtually impossible, to obtain the population standard deviation. In this section, we will discuss another hypothesis test called the t-test, which is used when the population standard deviations are unknown. The mean and the standard deviation of a population are estimated by taking the mean and the standard deviation of sample data representative of this population.

Broadly speaking, the method for the t-test for means is very similar to the one for the z-test for means, but the calculations for the test statistic and p-value are not the same as for the z-test. The test statistic is computed by the following formula:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Here, $\bar{x}$, $\mu$, $s$, and $n$ are the sample mean...
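To make the formula concrete, here is a minimal sketch that computes the one-sample t statistic by hand and compares it with `scipy.stats.ttest_1samp`. The sample values and the hypothesized mean `mu_0` are made up for illustration. The two-sided p-value is taken from the t distribution with n − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Made-up sample and hypothesized population mean, for illustration only.
sample = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.1])
mu_0 = 5.0

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation (divides by n - 1)

# t = (x_bar - mu) / (s / sqrt(n))
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# The same test using SciPy's built-in one-sample t-test.
t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=mu_0)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

The manual and library results should agree up to floating-point error, which is a quick way to confirm that the formula has been implemented as intended.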

Tests with more than two groups and ANOVA

In the previous chapter and previous sections, we covered tests between two groups. In this section, we will cover two methods for testing differences between groups, as follows:

  • Pairwise tests with the Bonferroni correction
  • ANOVA

When testing for differences between more than two groups, we will have to use multiple tests, which will affect our Type I error rate. There are several methods to control the error rate. We will see how to utilize the Bonferroni correction to control the Type I error rate. We will also discuss ANOVA in this section, which is used to test for a difference in the means of multiple groups.
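As a small sketch of the ANOVA idea, the example below runs a one-way ANOVA on three groups with `scipy.stats.f_oneway`. The group names and measurements are invented for illustration, and the use of SciPy here is an assumption of this example and not necessarily the approach the chapter takes.

```python
from scipy import stats

# Invented measurements for three hypothetical groups.
group_1 = [20.1, 19.8, 20.5, 20.0, 19.9]
group_2 = [20.2, 20.0, 19.7, 20.3, 20.1]
group_3 = [22.0, 21.8, 22.3, 21.9, 22.1]

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)

alpha = 0.05
reject_null = bool(p_value < alpha)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}, reject H0: {reject_null}")
```

A significant result only indicates that at least one group mean differs from the others; it does not say which one, which is why pairwise follow-up tests are often run afterward.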

Multiple tests for significance

In the previous sections, we looked at making a comparison between two groups. In this section, we will consider how to perform tests when there are more than two groups present. Let’s again consider the factory example where we have several models (model A, model B, and...
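The following sketch shows one way pairwise tests with the Bonferroni correction could look in code: each pair of groups is compared with a two-sample t-test, and the significance level is divided by the number of comparisons. The group names and values are hypothetical and are not the factory data from the chapter. `scipy.stats.ttest_ind` is used with its default pooled-variance setting.

```python
from itertools import combinations

from scipy import stats

# Hypothetical groups, for illustration only.
groups = {
    "group_1": [20.1, 19.8, 20.5, 20.0, 19.9],
    "group_2": [20.2, 20.0, 19.7, 20.3, 20.1],
    "group_3": [22.0, 21.8, 22.3, 21.9, 22.1],
}

alpha = 0.05
pairs = list(combinations(groups, 2))  # every unordered pair of group names
alpha_adjusted = alpha / len(pairs)    # Bonferroni: split alpha across all tests

results = {}
for name_a, name_b in pairs:
    # Two-sample t-test (pooled variance by default).
    _, p_value = stats.ttest_ind(groups[name_a], groups[name_b])
    results[(name_a, name_b)] = (p_value, bool(p_value < alpha_adjusted))

for (name_a, name_b), (p_value, significant) in results.items():
    print(f"{name_a} vs {name_b}: p = {p_value:.3g}, significant: {significant}")
```

With three groups there are three comparisons, so each test is evaluated at 0.05 / 3 instead of 0.05, which keeps the overall Type I error rate at or below the original level.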

Summary

This chapter covered parametric tests. Starting with the assumptions of parametric tests, we identified and applied methods for testing whether these assumptions are violated and discussed scenarios where robustness can be assumed when the required assumptions are not met. We then looked at one of the most popular alternatives to the z-test, the t-test. We iterated through multiple applications of this test, covering the one-sample and two-sample versions, including the pooled, paired, and Welch’s non-pooled variants of the two-sample analysis. Next, we explored ANOVA techniques, where we looked at using data from multiple groups to identify statistically significant differences between them. This included one of the most popular adjustments to the p-value for when a large number of groups is present: the Bonferroni correction, which helps prevent inflating the Type I error when performing multiple tests. We then looked at performing correlation analysis...

References

[1] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
