You're reading from Building Statistical Models in Python

Product type: Book

Published in: Aug 2023

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781804614280

Edition: 1st Edition

Languages: Python

Concepts: Statistics

Authors (3):

Huy Hoang Nguyen

Paul N Adams

Stuart J Miller


Parametric Tests

In the previous chapter, we introduced the concept of a hypothesis test and showed several applications of the z-test. The z-test belongs to a family of hypothesis tests called parametric tests. Parametric tests are powerful, but they can only be applied when the data meets certain assumptions. While the z-test is useful, it is limited by its required assumptions. In this chapter, we will discuss several more parametric tests to expand our parametric tool set. More specifically, we will discuss the various applications of the t-test, how to perform tests when more than two subgroups of data are present, and the hypothesis test for Pearson's correlation coefficient. We will complete the chapter with a discussion of power analysis for parametric tests.

In this chapter, we’re going to cover the following main topics:

Assumptions of parametric tests
T-test—a...

Assumptions of parametric tests

Parametric tests make assumptions about population data, so the statistics practitioner must analyze the data before modeling. This is especially important when working with sample data, because sample statistics are used as estimates of the population parameters when the true population parameters are unknown. These are the three primary assumptions of parametric hypothesis tests:

Normally distributed population data
Samples are independent
Equal population variances (when comparing two or more groups)
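A minimal sketch of screening samples against the first and third assumptions, assuming SciPy is available. It uses the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality and Levene's test (`scipy.stats.levene`) for equal variances; these are common choices, not necessarily the ones this chapter uses. The data is simulated and hypothetical, and `check_assumptions` is an illustrative helper. Independence cannot be tested this way; it follows from how the data was collected. Note that failing to reject a null hypothesis here does not prove the assumption holds.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two simulated samples standing in for real measurements.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=52.0, scale=5.0, size=40)


def check_assumptions(*groups, alpha=0.05):
    """Hypothetical helper: screen groups for normality and equal variances."""
    results = {}
    for i, g in enumerate(groups, start=1):
        # Shapiro-Wilk H0: the sample comes from a normal distribution.
        _, p = stats.shapiro(g)
        results[f"group_{i}_normality_not_rejected"] = bool(p > alpha)
    # Levene H0: all groups share the same population variance.
    _, p = stats.levene(*groups)
    results["equal_variances_not_rejected"] = bool(p > alpha)
    return results


results = check_assumptions(group_a, group_b)
print(results)
```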

In this chapter, we discuss the z-test, t-test, ANOVA, and Pearson's correlation. These tests are used on continuous data. In addition to these assumptions, Pearson's correlation requires paired samples: each observation of one variable must be matched with an observation of the other, so there must be an equal number of samples in each group being compared.
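The pairing requirement can be made explicit in code. This is a sketch assuming SciPy's `scipy.stats.pearsonr`; `paired_pearson` is a hypothetical wrapper that rejects inputs of unequal length before computing the coefficient, and the hours/scores values are invented for illustration.

```python
import numpy as np
from scipy import stats


def paired_pearson(x, y):
    """Hypothetical helper: Pearson's r for paired observations.

    Each x[i] must be paired with y[i], so both inputs need the same length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("Pearson's correlation needs paired samples of equal length")
    r, p_value = stats.pearsonr(x, y)
    return r, p_value


# Hypothetical paired measurements taken on the same five units.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r, p = paired_pearson(hours, scores)
print(f"r = {r:.3f}, p = {p:.4f}")
```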

While these assumptions are ideal, there are...

T-test – a parametric hypothesis test

In the last chapter, the z-test for means was applied when the population standard deviations were known. In the real world, however, the population standard deviation is difficult, and often virtually impossible, to obtain. In this section, we will discuss another hypothesis test, the t-test, which is used when the population standard deviations are unknown. The mean and standard deviation of a population are estimated by the mean and standard deviation of sample data representative of that population.
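A small sketch of those estimates with NumPy, using a hypothetical sample. One detail worth checking in practice: `numpy.std` divides by n by default (`ddof=0`), whereas the sample standard deviation used in the t-test divides by n - 1, so `ddof=1` has to be passed explicitly. (A pandas `Series.std()` already defaults to `ddof=1`.)

```python
import numpy as np

# Hypothetical sample drawn from a population whose parameters are unknown.
sample = np.array([9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.3])

x_bar = sample.mean()      # estimate of the population mean
s = sample.std(ddof=1)     # sample standard deviation: divides by n - 1
s_ddof0 = sample.std()     # NumPy default: divides by n, so it is smaller

print(f"x_bar = {x_bar:.4f}, s = {s:.4f}, std with ddof=0 = {s_ddof0:.4f}")
```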

Broadly speaking, the method for the t-test for means is very similar to the one for the z-test for means, but the calculations for the test statistic and p-value are not the same as for the z-test. The test statistic is computed by the following formula:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Here, $\bar{x}$, $\mu$, $s$, and $n$ are the sample mean...
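The formula can be checked against SciPy's one-sample t-test. This sketch assumes a hypothetical sample and a hypothesized mean `mu_0 = 10.0`. For a one-sample t-test, the two-sided p-value comes from the t-distribution with n - 1 degrees of freedom rather than from the standard normal distribution used by the z-test; the last lines show that treating the same statistic as a z-score would understate the p-value, because the t-distribution has heavier tails.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized population mean.
sample = np.array([9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.3])
mu_0 = 10.0

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation (n - 1 denominator)

# t = (x_bar - mu_0) / (s / sqrt(n))
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# The same test using SciPy's one-sample t-test.
result = stats.ttest_1samp(sample, popmean=mu_0)

# For comparison only: treating t as a standard normal z statistic.
p_if_normal = 2 * stats.norm.sf(abs(t_manual))

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
print(f"normal approximation: p = {p_if_normal:.4f}")
```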

Tests with more than two groups and ANOVA

In the previous chapter and previous sections, we covered tests between two groups. In this section, we will cover two methods for testing differences between groups, as follows:

Pairwise tests with the Bonferroni correction
ANOVA

When testing for differences among more than two groups, we have to run multiple tests, which inflates our Type I error rate. There are several methods to control this error rate. We will see how to use the Bonferroni correction to control the Type I error rate. We will also discuss ANOVA in this section, which is used to test for a difference in the means of multiple groups.
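A short arithmetic sketch of why multiple tests inflate the Type I error rate. If m tests were independent and each run at level alpha, the probability of at least one false positive (the family-wise error rate) would be 1 - (1 - alpha)^m. Pairwise tests that share groups are not actually independent, so treat these numbers as an illustration. The Bonferroni correction tests each hypothesis at alpha / m, which keeps the family-wise error rate at or below alpha regardless of dependence. The function names are hypothetical.

```python
def family_wise_error_rate(alpha, m):
    """P(at least one false positive) for m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Bonferroni correction: test each of the m hypotheses at alpha / m."""
    return alpha / m


alpha = 0.05
for k in (3, 4, 5):  # number of groups
    m = k * (k - 1) // 2  # number of pairwise comparisons among k groups
    corrected = bonferroni_alpha(alpha, m)
    print(
        f"{k} groups -> {m} tests: "
        f"uncorrected FWER = {family_wise_error_rate(alpha, m):.3f}, "
        f"per-test alpha = {corrected:.4f}, "
        f"corrected FWER = {family_wise_error_rate(corrected, m):.4f}"
    )
```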

Multiple tests for significance

In the previous sections, we looked at making a comparison between two groups. In this section, we will consider how to perform tests when there are more than two groups present. Let’s again consider the factory example where we have several models (model A, model B, and...
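The factory data is not shown in this excerpt, so the following sketch uses simulated, hypothetical measurements for three models to contrast the two approaches. It runs every pairwise two-sample t-test with SciPy's `ttest_ind` (pooled variance, its default) and judges each p-value against the Bonferroni-adjusted level, then runs a single one-way ANOVA with `scipy.stats.f_oneway` on all three groups.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical measurements for three factory models (simulated, not real data).
rng = np.random.default_rng(seed=42)
groups = {
    "model A": rng.normal(loc=100.0, scale=3.0, size=30),
    "model B": rng.normal(loc=101.0, scale=3.0, size=30),
    "model C": rng.normal(loc=104.0, scale=3.0, size=30),
}

alpha = 0.05
pairs = list(combinations(groups, 2))
adjusted_alpha = alpha / len(pairs)  # Bonferroni: divide alpha by the number of tests

# Pairwise two-sample t-tests, each judged against the adjusted level.
for name_1, name_2 in pairs:
    result = stats.ttest_ind(groups[name_1], groups[name_2])
    verdict = "reject H0" if result.pvalue < adjusted_alpha else "fail to reject H0"
    print(f"{name_1} vs {name_2}: p = {result.pvalue:.4f} -> {verdict}")

# One-way ANOVA: a single test of H0 that all group means are equal.
anova = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {anova.statistic:.3f}, p = {anova.pvalue:.4g}")
```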

Summary

This chapter covered parametric tests. Starting with the assumptions of parametric tests, we identified and applied methods for testing for violations of these assumptions and discussed scenarios where robustness can be assumed when the required assumptions are not met. We then looked at one of the most popular alternatives to the z-test, the t-test. We worked through multiple applications of this test, covering the one-sample and two-sample versions, including the pooled, paired, and Welch's non-pooled two-sample analyses. Next, we explored ANOVA techniques, where we used data from multiple groups to identify statistically significant differences between them. This included one of the most popular adjustments to the p-value for when many groups are compared, the Bonferroni correction, which helps prevent inflating the Type I error rate when performing multiple tests. We then looked at performing correlation analysis...
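For quick reference, the t-test variants recapped above map onto SciPy calls roughly as follows. This is a sketch on simulated, hypothetical data, assuming SciPy's `ttest_1samp`, `ttest_ind` (with `equal_var=True` for the pooled test and `equal_var=False` for Welch's test), and `ttest_rel` for paired samples. A paired t-test is equivalent to a one-sample t-test on the within-pair differences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Hypothetical independent samples with unequal sizes and spreads.
sample_1 = rng.normal(loc=20.0, scale=2.0, size=25)
sample_2 = rng.normal(loc=21.0, scale=4.0, size=35)

# Hypothetical paired data: the same 25 units measured before and after a change.
before = rng.normal(loc=50.0, scale=5.0, size=25)
after = before + rng.normal(loc=1.0, scale=1.0, size=25)

one_sample = stats.ttest_1samp(sample_1, popmean=20.0)
pooled = stats.ttest_ind(sample_1, sample_2, equal_var=True)   # pooled variance
welch = stats.ttest_ind(sample_1, sample_2, equal_var=False)   # Welch's non-pooled
paired = stats.ttest_rel(after, before)                        # paired differences

for name, res in [("one-sample", one_sample), ("pooled", pooled),
                  ("Welch", welch), ("paired", paired)]:
    print(f"{name:>10}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```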



Authors (3)

Huy Hoang Nguyen

Huy Hoang Nguyen is a Mathematician and a Data Scientist with far-ranging experience, championing advanced mathematics and strategic leadership, and applied machine learning research. He holds a Master's in Data Science and a PhD in Mathematics. His previous work was related to Partial Differential Equations, Functional Analysis and their applications in Fluid Mechanics. He transitioned from academia to the healthcare industry and has performed different Data Science projects from traditional Machine Learning to Deep Learning.

Paul N Adams

Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering and classification. Paul holds a Master of Science in Data Science from Southern Methodist University.

Stuart J Miller

Stuart Miller is a Machine Learning Engineer with degrees in Data Science, Electrical Engineering, and Engineering Physics. Stuart has worked at several Fortune 500 companies, including Texas Instruments and State Farm, where he built software that utilized statistical and machine learning techniques. Stuart is currently an engineer at Toyota Connected helping to build a more modern cockpit experience for drivers using machine learning.
