You're reading from Building Statistical Models in Python

Product type: Book

Published in: Aug 2023

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781804614280

Edition: 1st Edition

Languages: Python

Concepts: Statistics

Authors (3):

Huy Hoang Nguyen

Paul N Adams

Stuart J Miller


Preface

Statistics is a discipline that applies analytical methods to answer questions and solve problems using data, in both academic and industry settings. Many methods have been around for centuries, while others are much more recent. Statistical analyses and their results are fairly straightforward to present to both technical and non-technical audiences. Furthermore, producing results with statistical analysis does not necessarily require large amounts of data or compute resources and can be done fairly quickly, especially with a programming language such as Python, which is moderately easy to work with.

Artificial intelligence (AI) and advanced machine learning (ML) tools have become more prominent and popular in recent years as compute power has become more accessible. Even so, performing statistical analysis as a precursor to larger-scale AI and ML projects enables a practitioner to assess feasibility and practicality before committing larger compute resources and developing project architecture.

This book provides a wide variety of tools that are commonly used to test hypotheses and provide basic predictive capabilities to analysts and data scientists alike. The reader will walk through the basic concepts and terminology required for understanding the statistical tools in this book prior to exploring the different tests and conditions under which they are applicable. Further, the reader will gain knowledge for assessing the performance of the tests. Throughout, examples will be provided in the Python programming language to get readers started understanding their data using the tools presented, which will be applicable to some of the most common questions faced in the data analytics industry. The topics we will walk through include:

An introduction to statistics
Regression models
Classification models
Time series models
Survival analysis

Understanding the tools provided in these sections will give the reader a firm foundation from which further independent growth in the statistics domain can more easily be achieved.

Who this book is for

Professionals in most industries can benefit from the tools in this book. The tools provided are useful primarily at a higher level of inferential analysis, but can be applied to deeper levels depending on the industry in which the practitioner wishes to apply them. The target audiences of this book are:

Industry professionals with limited statistical or programming knowledge who would like to learn to use data for testing hypotheses they have in their business domain
Data analysts and scientists who wish to broaden their statistical knowledge and find a set of tools and their implementations for performing various data-oriented tasks

The ground-up approach of this book seeks to provide entry into the knowledge base for a wide audience and therefore should neither discourage novice-level practitioners nor exclude advanced-level practitioners from the benefits of the materials presented.

What this book covers

Chapter 1, Sampling and Generalization, describes the concepts of sampling and generalization. The discussion of sampling covers several common methods for sampling data from a population and discusses the implications for generalization. This chapter also discusses how to set up the software required for this book.

Chapter 2, Distributions of Data, provides a detailed introduction to types of data, common distributions used to describe data, and statistical measures. This chapter also covers common transformations used to change distributions.

Chapter 3, Hypothesis Testing, introduces the concept of statistical tests as a method for answering questions of interest. This chapter covers the steps to perform a test, the types of errors encountered in testing, and how to select power using the Z-test.

Chapter 4, Parametric Tests, further discusses statistical tests, providing detailed descriptions of common parametric statistical tests, the assumptions of parametric tests, and how to assess the validity of parametric tests. This chapter also introduces the concept of multiple tests and provides details on corrections for multiple tests.

Chapter 5, Non-parametric Tests, discusses how to perform statistical tests when the assumptions of parametric tests are violated, using a class of tests that do not rely on those assumptions, called non-parametric tests.

Chapter 6, Simple Linear Regression, introduces the concept of a statistical model with the simple linear regression model. This chapter begins by discussing the theoretical foundations of simple linear regression and then discusses how to interpret the results of the model and assess the validity of the model.

Chapter 7, Multiple Linear Regression, builds on the previous chapter by extending the simple linear regression model into additional dimensions. This chapter also discusses issues that occur when modeling with multiple explanatory variables, including multicollinearity, feature selection, and dimension reduction.

Chapter 8, Discrete Models, introduces the concept of classification and develops a model for classifying variables into discrete levels of a categorical response variable. This chapter starts by developing the model for binary classification and then extends the model to multivariate classification. Finally, the Poisson and negative binomial models are covered.

Chapter 9, Discriminant Analysis, discusses several additional models for classification, including linear discriminant analysis and quadratic discriminant analysis. This chapter also introduces Bayes’ Theorem.

Chapter 10, Introduction to Time Series, introduces time series data, discussing the time series concept of autocorrelation and the statistical measures for time series. This chapter also introduces the white noise model and stationarity.

Chapter 11, ARIMA Models, discusses models for univariate time series. This chapter starts by discussing models for stationary time series and then extends the discussion to non-stationary time series. Finally, this chapter provides a detailed discussion of model evaluation.

Chapter 12, Multivariate Time Series, builds on the previous two chapters by introducing the concept of a multivariate time series and extending ARIMA models to multiple explanatory variables. This chapter also discusses time series cross-correlation.

Chapter 13, Survival Analysis, introduces survival data, also called time-to-event data. This chapter discusses the concept of censoring and the impact of censoring survival data. Finally, the chapter discusses the survival function, hazard, and hazard ratio.

Chapter 14, Survival Models, building on the previous chapter, provides an overview of several models for survival data, including the Kaplan-Meier model, the Exponential model, and the Cox Proportional Hazards model.

To get the most out of this book

You will need access to download and install open-source code packages implemented in the Python programming language and accessible through PyPI.org or the Anaconda Python distribution. A background in statistics is helpful but not necessary; however, this book assumes you have a decent background in basic algebra. Each unit of this book is independent of the other units, but the chapters within each unit build upon each other. Thus, we advise you to begin each unit with that unit’s first chapter to understand the content.

Software/hardware covered in the book	Operating system requirements
Python version ≥ 3.8	Windows, macOS, or Linux
Statsmodels 0.13.2
SciPy 1.8.1
lifelines 0.27.4
scikit-learn 1.1.1
pmdarima 2.02
Sktime 0.15.0
Pandas 1.4.3
Matplotlib 3.5.2
Numpy 1.23.0
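
To compare a local environment against the versions listed above, a short check along the following lines may help. This is a minimal sketch, not part of the book's code: the `LISTED` dictionary and the `installed_versions` helper are names introduced here for illustration, the package names are assumed to match their PyPI distribution names, and the version strings are copied as printed in the table.

```python
from importlib import metadata  # standard library in Python 3.8 and later

# Versions as printed in the table above (hypothetical dictionary name).
LISTED = {
    "statsmodels": "0.13.2",
    "scipy": "1.8.1",
    "lifelines": "0.27.4",
    "scikit-learn": "1.1.1",
    "pmdarima": "2.02",  # copied as printed
    "sktime": "0.15.0",
    "pandas": "1.4.3",
    "matplotlib": "3.5.2",
    "numpy": "1.23.0",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print the listed version next to whatever is installed locally.
    for name, version in installed_versions(LISTED).items():
        status = version if version is not None else "not installed"
        print(f"{name}: listed {LISTED[name]}, installed {status}")
```

Newer releases of these packages may also work; the script only reports what is installed and does not enforce the listed versions.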

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Building-Statistical-Models-in-Python. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

A = [3,5,4]
B = [43,41,56,78,54]
permutation_testing(A,B,n_iter=10000)
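
The call above relies on a `permutation_testing` function that is not defined in this section. As a minimal sketch of what such a helper might look like, assuming it runs a two-sided permutation test on the difference in sample means, consider the following. The book's own implementation may differ, and the `seed` parameter is an addition made here for reproducibility.

```python
import random


def permutation_testing(a, b, n_iter=10000, seed=0):
    """Estimate a two-sided permutation p-value for the difference in means.

    Hypothetical helper: pools both samples, reshuffles the pooled values
    n_iter times, and counts how often the reshuffled difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    return extreme / n_iter


A = [3, 5, 4]
B = [43, 41, 56, 78, 54]
p_value = permutation_testing(A, B, n_iter=10000)
print(p_value)
```

For these two samples, only 1 of the 56 possible ways to split the pooled values into groups of three and five is as extreme as the observed split, so the estimate should land near 1/56, or roughly 0.018.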

Any command-line input or output is written as follows:

pip install SomePackage

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can also get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/978-1-80461-428-0

Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits to your email directly


Authors (3)

Huy Hoang Nguyen

Huy Hoang Nguyen is a Mathematician and a Data Scientist with far-ranging experience, championing advanced mathematics and strategic leadership, and applied machine learning research. He holds a Master's in Data Science and a PhD in Mathematics. His previous work was related to Partial Differential Equations, Functional Analysis and their applications in Fluid Mechanics. He transitioned from academia to the healthcare industry and has performed different Data Science projects from traditional Machine Learning to Deep Learning.

Paul N Adams

Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering and classification. Paul holds a Master of Science in Data Science from Southern Methodist University.

Stuart J Miller

Stuart Miller is a Machine Learning Engineer with degrees in Data Science, Electrical Engineering, and Engineering Physics. Stuart has worked at several Fortune 500 companies, including Texas Instruments and StateFarm, where he built software that utilized statistical and machine learning techniques. Stuart is currently an engineer at Toyota Connected helping to build a more modern cockpit experience for drivers using machine learning.