Building Statistical Models in Python


Product type: Book
Published: Aug 2023
Publisher: Packt
ISBN-13: 9781804614280
Pages: 420
Edition: 1st Edition
Authors (3): Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Table of Contents (22)

  • Preface
  • Part 1: Introduction to Statistics
  • Chapter 1: Sampling and Generalization
  • Chapter 2: Distributions of Data
  • Chapter 3: Hypothesis Testing
  • Chapter 4: Parametric Tests
  • Chapter 5: Non-Parametric Tests
  • Part 2: Regression Models
  • Chapter 6: Simple Linear Regression
  • Chapter 7: Multiple Linear Regression
  • Part 3: Classification Models
  • Chapter 8: Discrete Models
  • Chapter 9: Discriminant Analysis
  • Part 4: Time Series Models
  • Chapter 10: Introduction to Time Series
  • Chapter 11: ARIMA Models
  • Chapter 12: Multivariate Time Series
  • Part 5: Survival Analysis
  • Chapter 13: Time-to-Event Variables – An Introduction
  • Chapter 14: Survival Models
  • Index
  • Other Books You May Enjoy

Survival Models

In Chapter 13, Time-to-Event Variables, we introduced the topics of survival analysis, censoring, and time-to-event (TTE) variables. In this chapter, we will provide an in-depth overview and walkthrough of the implementation of these techniques with respect to three primary model frameworks:

  • Kaplan-Meier model
  • Exponential model
  • Cox Proportional Hazards model

We will discuss how each approach provides probabilistic insight into the survival and hazard risk of study subjects using univariate Kaplan-Meier and exponential approaches as well as the multivariate Cox Proportional Hazards regression model. We’ll walk through examples using real data and discuss the results so that the reader understands how to assess performance and translate test output into useful information. Finally, we will show how to use the trained models to provide forecast probabilities for unseen data.

Technical requirements

In this chapter, we use an additional Python library for survival analysis: lifelines. Please install the following version of this library to run the provided code. Instructions for installing libraries can be found in Chapter 1, Sampling and Generalization:

  • lifelines==0.27.4

More information about lifelines can be found at this link:

https://lifelines.readthedocs.io/en/latest/index.html

Kaplan-Meier model

The first model for survival analysis we will discuss is the Kaplan-Meier model (also called the Kaplan-Meier estimator). We will start this section with a discussion of the model definition and learn how it is built. Then, we will close this section with an example of how to use this model in Python using the lifelines library. Let’s get started.

Model definition

The Kaplan-Meier estimator is defined by the following formula:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i}$$

Here, $n_i$ is the number of subjects at risk just before time $t_i$, $d_i$ is the number of death events at time $t_i$, and $\hat{S}(t)$ (the survival function) is the probability that life is longer than $t$. The Π symbol used in the formula is like the symbol Σ; however, Π indicates multiplication. This means that the preceding formula will result in a multiplication...
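To make the product in the estimator concrete, here is a minimal sketch that computes it by hand in plain Python. The function name `kaplan_meier` and the toy data are hypothetical and chosen only for illustration; this is not the book's code, and it assumes right-censored data encoded as 1 (event observed) or 0 (censored).

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t_i, S_hat(t_i))] for each distinct observed event time.

    durations: observed time for each subject
    events: 1 if the event (death) was observed, 0 if right-censored
    """
    # d_i: number of observed events at each distinct event time
    deaths = Counter(t for t, e in zip(durations, events) if e)
    survival = 1.0
    curve = []
    for t_i in sorted(deaths):
        # n_i: subjects still at risk just before t_i
        n_i = sum(1 for t in durations if t >= t_i)
        d_i = deaths[t_i]
        # Multiply in the factor (n_i - d_i) / n_i for this event time
        survival *= (n_i - d_i) / n_i
        curve.append((t_i, survival))
    return curve


# Hypothetical toy data: six subjects, two of them censored
durations = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(durations, events)
for t_i, s in km_curve:
    print(f"t={t_i}: S_hat={s:.3f}")
```

Censored subjects contribute to the at-risk count $n_i$ up to their censoring time but never to $d_i$. With lifelines installed, `KaplanMeierFitter().fit(durations, event_observed=events)` should yield the same survival estimates.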

Exponential model

In the last section, we studied the non-parametric Kaplan-Meier survival model. We will now turn to parametric modeling with the exponential model and will discuss a semi-parametric model, the Cox Proportional Hazards model, in the next section. Before considering the exponential model, we will review what the exponential distribution is and why we mention it in this section. This distribution is based on the Poisson process, in which events occur independently over time and the event rate, λ, is the number of occurrences per unit of time, as follows:

$$\lambda = \frac{Y}{t}$$
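As a rough illustration of the event rate, the sketch below computes λ from a count of occurrences and an elapsed time. It then evaluates the survival function of an exponential distribution with that rate, $S(t) = e^{-\lambda t}$, which is the standard waiting-time result for a Poisson process. The function names and the numbers are hypothetical, and the constant-rate assumption is exactly what the exponential model imposes.

```python
import math


def event_rate(occurrences, elapsed_time):
    """Event rate lambda: occurrences per unit of time."""
    return occurrences / elapsed_time


def exponential_survival(rate, t):
    """P(T > t) when the waiting time T is exponential with the given rate."""
    return math.exp(-rate * t)


# Hypothetical example: 12 events observed over 48 months
lam = event_rate(occurrences=12, elapsed_time=48.0)
print(f"lambda = {lam} events per month")

# Probability of no event within the first 4 months under this rate
print(f"S(4) = {exponential_survival(lam, 4):.4f}")
```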

The Poisson distribution is a discrete probability distribution describing the number of events occurring in a specified time period. It is defined as follows. Let Y be the number of occurrences in time t. Y follows the Poisson distribution with parameter λ if its probability mass function is given by the following formula:

f(Y) = Pr(y = Y) = e ...

Cox Proportional Hazards regression model

Survival analysis, also called TTE analysis, as we discussed in Chapter 13, Time-to-Event Variables, is an analytical approach that uses probability to estimate the time remaining before an event occurs based on previous observations. We have seen how, with appropriate covariates, this can be helpful in applications such as estimating life expectancy, mechanical failure, and customer churn, which in turn helps with prioritizing needs and allocating resources more efficiently. As we discussed in depth in Chapter 13, censoring is what sets survival analysis apart from other statistical questions that can be solved using techniques such as regression. Consequently, and because dropping an observation due to censoring will almost certainly mislead our model and provide results we cannot trust, we insert what is known as an event status indicator to help account for whether an event will occur or fail to occur prior to estimating...
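One way to picture the event status indicator is to derive it from raw study records. The sketch below is illustrative only: the helper `to_duration_and_event`, the study end date, and the two subjects are all hypothetical, and it assumes right-censoring at a fixed study end. Each subject is kept rather than dropped and receives a duration plus a flag of 1 (event observed) or 0 (censored).

```python
from datetime import date


def to_duration_and_event(start, event_date, study_end):
    """Return (duration_in_days, event_observed) for one subject."""
    if event_date is not None and event_date <= study_end:
        # Event happened during the study: duration runs up to the event
        return (event_date - start).days, 1
    # No event seen by the end of the study: right-censored observation
    return (study_end - start).days, 0


STUDY_END = date(2023, 6, 30)  # hypothetical end of the observation window

# Hypothetical subjects: (enrollment date, event date or None)
subjects = [
    (date(2023, 1, 1), date(2023, 3, 2)),  # event observed
    (date(2023, 2, 1), None),              # still event-free at study end
]

rows = [to_duration_and_event(s, e, STUDY_END) for s, e in subjects]
for duration, observed in rows:
    print(f"duration={duration} days, event_observed={observed}")
```

Columns of this shape are what lifelines expects when fitting, for example through the `duration_col` and `event_col` arguments of `CoxPHFitter.fit`.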

Summary

In this chapter, we discussed three survival analysis models in depth: the Kaplan-Meier, exponential, and Cox Proportional Hazards regression models. Using these frameworks, we modeled survival functions and estimated survival probabilities and hazard ratios for various right-censored TTE studies. For the multivariate case, we used Cox Proportional Hazards regression to model hazard ratios for covariate analysis on dependent variables. For all models, we demonstrated how to use confidence intervals, as well as the corresponding p-values, to assess significance. At this point, the reader should be able to identify the scenarios in which each model would outperform the others, and to fit and apply that model to obtain the necessary results.
