You're reading from Building Statistical Models in Python

Product type: Book

Published in: Aug 2023

Reading Level: Intermediate

Publisher: Packt

ISBN-13: 9781804614280

Edition: 1st Edition

Languages: Python

Concepts: Statistics

Authors (3):

Huy Hoang Nguyen

Paul N Adams

Stuart J Miller


Survival Models

In Chapter 13, Time-to-Event Variables, we introduced the topics of survival analysis, censoring, and time-to-event (TTE) variables. In this chapter, we will provide an in-depth overview and walkthrough of the implementation of these techniques with respect to three primary model frameworks:

Kaplan-Meier model
Exponential model
Cox Proportional Hazards model

We will discuss how each model provides probabilistic insight into the survival and hazard risk of study subjects, using the univariate Kaplan-Meier and exponential models as well as the multivariate Cox Proportional Hazards regression model. We'll walk through examples using real data and discuss the results so that the reader understands how to assess performance and translate model output into useful information. Finally, we will show how to use the trained models to forecast probabilities for unseen data.

Technical requirements

In this chapter, we use an additional Python library for survival analysis: lifelines. Please install the following version of this library to run the provided code. Instructions for installing libraries can be found in Chapter 1, Sampling and Generalization:

lifelines==0.27.4

More information about lifelines can be found at this link:

https://lifelines.readthedocs.io/en/latest/index.html
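Because the examples assume the pinned lifelines version, a quick check of what is actually installed can save some debugging. The following is a minimal sketch using only the standard library's importlib.metadata; the function name check_lifelines_version is our own hypothetical helper, not part of lifelines.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pinned in the technical requirements for this chapter.
REQUIRED_LIFELINES = "0.27.4"


def check_lifelines_version(required: str = REQUIRED_LIFELINES):
    """Hypothetical helper: return (matches, installed_version).

    installed_version is None when lifelines is not installed.
    """
    try:
        installed = version("lifelines")
    except PackageNotFoundError:
        return False, None
    return installed == required, installed


if __name__ == "__main__":
    matches, installed = check_lifelines_version()
    if installed is None:
        print("lifelines is not installed; run: pip install lifelines==" + REQUIRED_LIFELINES)
    elif not matches:
        print(f"lifelines {installed} found; the examples assume {REQUIRED_LIFELINES}")
    else:
        print(f"lifelines {installed} matches the pinned version")
```

A different version will often still work, but output formatting and defaults can change between releases, so matching the pin is the safest way to reproduce the chapter's results.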

Kaplan-Meier model

The first model for survival analysis we will discuss is the Kaplan-Meier model (also called the Kaplan-Meier estimator). We will start this section with a discussion of the model definition and how it is built. Then, we will close this section with an example of how to use this model in Python with the lifelines library. Let’s get started.

Model definition

The Kaplan-Meier estimator is defined by the following formula:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i}$$

Here, $t_i$ are the times at which at least one event is observed, $n_i$ is the number of subjects at risk just before time $t_i$, $d_i$ is the number of death events at time $t_i$, and $\hat{S}(t)$ (the survival function) is the probability that life is longer than $t$. The Π symbol plays the same role as Σ, except that Π denotes a product rather than a sum. This means that the preceding formula will result in a multiplication...
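To make the product in the estimator concrete, here is a minimal pure-Python sketch that computes it directly from durations and an event indicator. The function name kaplan_meier and the six-subject dataset are hypothetical, chosen only for illustration; we follow the usual convention that a subject censored at time $t_i$ is still counted as at risk at $t_i$.

```python
def kaplan_meier(durations, events):
    """Return [(t_i, S_hat(t_i)), ...] at each distinct event time.

    durations: observed time for each subject (event or censoring time)
    events:    1 if the event was observed, 0 if right-censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t_i in event_times:
        # n_i: subjects still at risk just before t_i (those censored at t_i count)
        n_i = sum(1 for t in durations if t >= t_i)
        # d_i: events observed exactly at t_i
        d_i = sum(1 for t, e in zip(durations, events) if t == t_i and e)
        # Multiply in the conditional probability of surviving past t_i
        survival *= (n_i - d_i) / n_i
        curve.append((t_i, survival))
    return curve


# Hypothetical data: 1 = event observed, 0 = censored
durations = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
for t_i, s in kaplan_meier(durations, events):
    print(f"t = {t_i:>2}  S_hat = {s:.4f}")
# prints S_hat = 0.8333, 0.6667, 0.4444, 0.0000 at t = 3, 5, 8, 12
```

Censored subjects never add a factor of their own, but they shrink later risk sets, which is how the estimator uses their partial information. With lifelines, KaplanMeierFitter().fit(durations, event_observed=events) should produce the same step values in its survival_function_ attribute; the hand-rolled version only serves to make the formula visible.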

Exponential model

In the last section, we studied the non-parametric Kaplan-Meier survival model. We now move to parametric modeling with the exponential model, and in the next section we will discuss a semi-parametric model, the Cox Proportional Hazards model. Before considering the exponential model, we will review what the exponential distribution is and why it is relevant here. This distribution is closely tied to the Poisson process: it describes the waiting time between events in such a process. In a Poisson process, events occur independently over time at a constant event rate, λ, which is calculated as the number of occurrences per unit of time, as follows:

$$\lambda = \frac{Y}{t}$$
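Applied to survival data, one natural reading of this ratio takes $Y$ as the number of observed events and $t$ as the total follow-up time summed over all subjects, with censored subjects contributing their time but no event. This ratio is the maximum-likelihood estimate of the rate for an exponential model with right censoring, and the fitted survival function is $S(t) = e^{-\lambda t}$. The sketch below uses hypothetical data and helper names of our own. If you use lifelines' ExponentialFitter instead, note that, as we understand its documentation, its lambda_ parameter is a scale with $S(t) = e^{-t/\lambda}$, that is, the reciprocal of the rate used here.

```python
import math


def exponential_rate(durations, events):
    """Rate estimate lambda = Y / t for right-censored data.

    Y is the number of observed events; t is the total follow-up
    time summed over all subjects, censored ones included.
    """
    total_time = sum(durations)
    if total_time <= 0:
        raise ValueError("total follow-up time must be positive")
    return sum(events) / total_time


def exponential_survival(t, rate):
    """S(t) = exp(-rate * t) under a constant hazard."""
    return math.exp(-rate * t)


durations = [3, 5, 5, 8, 10, 12]   # hypothetical follow-up times
events = [1, 1, 0, 1, 0, 1]        # 1 = event observed, 0 = censored
lam = exponential_rate(durations, events)   # 4 events / 43 time units
print(f"lambda_hat = {lam:.4f}, S(5) = {exponential_survival(5, lam):.4f}")
```

Dividing only by the event subjects' times, or dropping censored subjects entirely, would overstate the rate; counting every subject's exposure is what keeps the estimate honest.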

The Poisson distribution is a discrete probability distribution describing the number of events that occur in a specified time period. It is defined as follows. Let Y be the number of occurrences in time t. Y follows the Poisson distribution with parameter λ if its probability mass function is given by the following formula:

$f(y) = \Pr(Y = y) = e$ ...
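The link between the two distributions can be checked numerically. The formula above is cut off, so the following sketch assumes the common parameterization in which the expected count over an interval of length $t$ is $\lambda t$, giving $\Pr(Y = y) = e^{-\lambda t}(\lambda t)^y / y!$. Under that assumption, the probability of zero events in $[0, t]$ is $e^{-\lambda t}$, which is exactly the exponential survival function: surviving past $t$ means no event has occurred yet. The rate and interval values are hypothetical.

```python
import math


def poisson_pmf(y, rate, t):
    """Pr(Y = y) for a Poisson count over an interval of length t.

    Assumes the expected count over the interval is rate * t.
    """
    if y < 0:
        return 0.0
    mu = rate * t
    return math.exp(-mu) * mu ** y / math.factorial(y)


rate, t = 0.5, 4.0   # hypothetical: 0.5 events per unit time, 4 time units
for y in range(5):
    print(y, round(poisson_pmf(y, rate, t), 4))

# Probability of zero events in [0, t] equals the exponential survival exp(-rate * t)
print(math.isclose(poisson_pmf(0, rate, t), math.exp(-rate * t)))
```

This is the reason the exponential model is the natural parametric counterpart to a constant event rate: its hazard is the Poisson rate λ itself.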

Cox Proportional Hazards regression model

Survival analysis, also called TTE analysis, as we discussed in Chapter 13, Time-to-Event Variables, is an analytical approach that uses probability to estimate the time remaining before an event occurs based on previous observations. We have seen how, with appropriate covariates, this can be helpful in applications such as estimating life expectancy, mechanical failure, and customer churn, where it can help prioritize needs and allocate resources more efficiently. As we discussed in depth in Chapter 13, censoring is what distinguishes survival analysis from other statistical problems that can be solved using techniques such as regression. Consequently, and because dropping censored observations would almost certainly bias our model and produce results we cannot trust, we add what is known as an event status indicator, which records whether each subject's event was observed or censored, prior to estimating...
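To see how the event status indicator keeps censored subjects in the analysis, here is a minimal sketch of the Cox partial log-likelihood for a single covariate, maximized with Newton-Raphson. Each observed event (indicator = 1) contributes a term comparing that subject's risk score with everyone still at risk at that time; censored subjects (indicator = 0) contribute no term of their own but remain in the risk sets until they are censored. Assumptions: one covariate, the Breslow form of the likelihood, hypothetical data, and helper names of our own; this is an illustration, not how lifelines implements the model.

```python
import math


def _risk_sums(beta, x, durations, t_i):
    """S0, S1, S2 over the risk set {j : t_j >= t_i} for one covariate."""
    s0 = s1 = s2 = 0.0
    for x_j, t_j in zip(x, durations):
        if t_j >= t_i:
            w = math.exp(beta * x_j)
            s0 += w
            s1 += w * x_j
            s2 += w * x_j * x_j
    return s0, s1, s2


def cox_partial_loglik(beta, x, durations, events):
    """Breslow partial log-likelihood; only rows with event == 1 add a term."""
    ll = 0.0
    for x_i, t_i, e_i in zip(x, durations, events):
        if e_i:
            s0, _, _ = _risk_sums(beta, x, durations, t_i)
            ll += beta * x_i - math.log(s0)
    return ll


def cox_score_info(beta, x, durations, events):
    """Score (first derivative) and information (negative second derivative)."""
    score = info = 0.0
    for x_i, t_i, e_i in zip(x, durations, events):
        if e_i:
            s0, s1, s2 = _risk_sums(beta, x, durations, t_i)
            mean_x = s1 / s0
            score += x_i - mean_x
            info += s2 / s0 - mean_x ** 2
    return score, info


def fit_cox_one_covariate(x, durations, events, tol=1e-10, max_iter=50):
    """Newton-Raphson on the concave partial log-likelihood, starting at 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, x, durations, events)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: event = 0 marks right-censored subjects, kept in the risk sets
durations = [5, 6, 6, 2, 4, 4, 7, 8, 3, 9]
events = [1, 0, 1, 1, 1, 0, 1, 0, 1, 1]
group = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]

beta_hat = fit_cox_one_covariate(group, durations, events)
print(f"beta_hat = {beta_hat:.4f}, hazard ratio = {math.exp(beta_hat):.4f}")
```

The hazard ratio exp(beta_hat) compares the hazard of group 1 with group 0. lifelines' CoxPHFitter, via fit(df, duration_col=..., event_col=...), fits the same kind of model for many covariates; per its documentation it handles tied event times with Efron's method rather than the Breslow form used here, so results can differ slightly when events are tied (this toy data has no tied event times).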

Summary

In this chapter, we discussed three survival analysis models in depth: the Kaplan-Meier, the exponential, and the Cox Proportional Hazards regression models. Using these frameworks, we modeled survival functions and estimated survival probabilities and hazard ratios for various right-censored TTE studies. For the multivariate case, we used Cox Proportional Hazards regression to estimate hazard ratios for multiple covariates. For all models, we demonstrated how to use confidence intervals, along with the corresponding p-values, to assess significance. At this point, the reader should be able to identify the scenarios in which each model is most appropriate, and to fit and apply that model to obtain the needed results.


Authors (3)

Huy Hoang Nguyen

Huy Hoang Nguyen is a Mathematician and a Data Scientist with far-ranging experience in advanced mathematics, strategic leadership, and applied machine learning research. He holds a Master's in Data Science and a PhD in Mathematics. His previous work was related to Partial Differential Equations, Functional Analysis, and their applications in Fluid Mechanics. He transitioned from academia to the healthcare industry and has worked on Data Science projects ranging from traditional Machine Learning to Deep Learning.

Paul N Adams

Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering and classification. Paul holds a Master of Science in Data Science from Southern Methodist University.

Stuart J Miller

Stuart Miller is a Machine Learning Engineer with degrees in Data Science, Electrical Engineering, and Engineering Physics. Stuart has worked at several Fortune 500 companies, including Texas Instruments and State Farm, where he built software that utilized statistical and machine learning techniques. Stuart is currently an engineer at Toyota Connected, helping to build a more modern cockpit experience for drivers using machine learning.