You're reading from Building Statistical Models in Python

Product type: Book
Published in: Aug 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804614280
Edition: 1st Edition
Authors (3):
Huy Hoang Nguyen

Huy Hoang Nguyen is a Mathematician and a Data Scientist with far-ranging experience, championing advanced mathematics and strategic leadership, and applied machine learning research. He holds a Master's in Data Science and a PhD in Mathematics. His previous work was related to Partial Differential Equations, Functional Analysis and their applications in Fluid Mechanics. He transitioned from academia to the healthcare industry and has performed different Data Science projects from traditional Machine Learning to Deep Learning.

Paul N Adams

Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering and classification. Paul holds a Master of Science in Data Science from Southern Methodist University.

Stuart J Miller

Stuart Miller is a Machine Learning Engineer with degrees in Data Science, Electrical Engineering, and Engineering Physics. Stuart has worked at several Fortune 500 companies, including Texas Instruments and StateFarm, where he built software that utilized statistical and machine learning techniques. Stuart is currently an engineer at Toyota Connected helping to build a more modern cockpit experience for drivers using machine learning.

Time-to-Event Variables – An Introduction

In this short chapter, we will introduce another branch of statistics called survival analysis, which is concerned with survival times and censoring. Survival analysis is also called time-to-event analysis; a time-to-event variable is a particular type of statistical outcome that requires different techniques from those we studied in the last few chapters. A time-to-event analysis studies, for example, whether a participant experiences an event of interest during the study timeframe. In other words, we study the proportion of a sample surviving past a specific time point, the rate at which the surviving proportion fails or dies, or whether survival differs between treatment groups. The term survival originally comes from the medical field, where it describes the time from treatment until death. However, survival analysis is readily applicable to many fields including engineering (where it is referred...

What is censoring?

In the field of statistics, censoring refers to a situation where the full extent or precise value of a measurement or observation is not entirely known. In survival analysis, this happens when we have some information about an observation's survival time but do not know exactly when the event happened. Censoring is a key issue in survival analysis and distinguishes time-to-event analysis from the other statistical analyses covered in the previous chapters. Censoring happens for several reasons; for example, a person withdraws from a study or exits before a follow-up, or the event in question has already happened before the study starts. Censoring is usually assumed to be non-informative; that is, an observation is censored for some reason unrelated to its failure time. In other words, censoring is not related to the probability of the event occurring. Informative censoring happens when the reason an observation is lost to follow-up is related to the outcome being studied, for example, when a patient drops out because their condition is worsening. Three types...
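One way to make "not entirely known" concrete is to store each censored observation as bounds on its unknown true event time T. The sketch below is our own illustration rather than the book's code: the class `CensoredTime` and the three helper functions are hypothetical names, times are assumed to be measured from a time origin of 0, and the helpers correspond to the right, left, and interval censoring listed in this chapter's summary.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CensoredTime:
    """What we know about a true event time T: it lies in the interval (lower, upper]."""
    lower: float
    upper: float


def right_censored(last_seen: float) -> CensoredTime:
    # The object withdrew or was lost to follow-up at `last_seen` without the event,
    # so we only know that T > last_seen.
    return CensoredTime(lower=last_seen, upper=math.inf)


def left_censored(first_seen: float) -> CensoredTime:
    # The event had already happened when the object was first observed,
    # so we only know that T <= first_seen (times assumed to start at 0).
    return CensoredTime(lower=0.0, upper=first_seen)


def interval_censored(last_negative: float, first_positive: float) -> CensoredTime:
    # The event happened between two follow-up visits: last_negative < T <= first_positive.
    if first_positive <= last_negative:
        raise ValueError("the second visit must come after the first")
    return CensoredTime(lower=last_negative, upper=first_positive)


print(right_censored(3.5))          # CensoredTime(lower=3.5, upper=inf)
print(left_censored(2.0))           # CensoredTime(lower=0.0, upper=2.0)
print(interval_censored(1.0, 4.0))  # CensoredTime(lower=1.0, upper=4.0)
```

In every case the exact event time is missing; only an interval that must contain it is recorded.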

Survival data

Survival data records whether or not an object in a study experiences an event, together with its follow-up time. Time zero, or the time origin, is the time when the study starts. Depending on the purpose of a study, the time origin can differ. For instance, in a prostate cancer study, researchers might recruit male participants aged 40 and older, whereas a study of the ages at which puberty develops would recruit male and female participants aged 12 and older. If the research takes place over a period of time (which could be several months or years), then recording the time origin and follow-up time during the study is vitally important.
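When participants enter a study on different calendar dates, follow-up time is typically measured from each object's own time origin, not from a shared calendar date. The following is a minimal sketch assuming the time origin is the enrollment date; the function `follow_up_days` and all dates are hypothetical.

```python
from datetime import date


def follow_up_days(time_origin: date, end_of_follow_up: date) -> int:
    """Days elapsed from an object's time origin (here: enrollment) to the end of follow-up."""
    if end_of_follow_up < time_origin:
        raise ValueError("end of follow-up precedes the time origin")
    return (end_of_follow_up - time_origin).days


# Hypothetical participants who enrolled on different dates but were last seen on the same day.
enrollment = {"A": date(2020, 1, 15), "B": date(2020, 6, 1)}
last_contact = {"A": date(2021, 1, 15), "B": date(2021, 1, 15)}

for pid in enrollment:
    print(pid, follow_up_days(enrollment[pid], last_contact[pid]))
# A 366
# B 228
```

Both participants share the same final calendar date, yet their follow-up times differ because their time origins differ; this is why the origin must be recorded for every object.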

Lastly, we discuss the relationship between survival and censoring times and how we record survival data with censoring. Suppose that for each object in a sample, we know its true event time T and true censoring time C. Then, the survival time of an object is the period of time until an event occurs and the censoring...
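A common convention for recording survival data with censoring is to store the observed time Y = min(T, C) together with a status indicator that is 1 when the event was observed (T ≤ C) and 0 when the object was censored. In practice T and C are never both known, which is exactly why only the pair (Y, status) is kept. The sketch below assumes this convention (with ties counted as observed events); the function name and the data are hypothetical.

```python
def record_survival_data(event_times, censoring_times):
    """Turn true event times T and censoring times C into (observed time, status) lists.

    Observed time Y = min(T, C); status = 1 if the event was observed (T <= C), else 0.
    """
    if len(event_times) != len(censoring_times):
        raise ValueError("T and C must have one entry per object")
    observed = [min(t, c) for t, c in zip(event_times, censoring_times)]
    status = [1 if t <= c else 0 for t, c in zip(event_times, censoring_times)]
    return observed, status


# Hypothetical times in years for four objects.
T = [2.0, 5.5, 1.2, 4.0]  # true event times
C = [3.0, 4.0, 5.0, 4.0]  # true censoring times (e.g. withdrawal or end of study)
Y, status = record_survival_data(T, C)
print(Y)       # [2.0, 4.0, 1.2, 4.0]
print(status)  # [1, 0, 1, 1]
```

The second object is censored at year 4.0: we record that it survived at least that long, but not when its event actually occurred.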

Survival Function, Hazard and Hazard Ratio

Let us first discuss the survival function, which is defined as

S(t) = P(T > t)

and represents the probability that the object survives past time t. The survival function is non-increasing in t, where t ranges from 0 to ∞. When t = 0, S(0) = 1, and when t = ∞, S(∞) = 0. In theory, S(t) is a smooth function, but in practice, events are recorded on a discrete time scale (days, weeks, years).
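Because events are recorded on a discrete scale, an estimate of S(t) built from data is a step function. As a minimal sketch, assuming every object's event time is observed (no censoring), S(t) can be estimated as the proportion of objects whose event time exceeds t. The function name and data are hypothetical; with censored data this simple proportion no longer works, which is why estimators such as the Kaplan-Meier model are used instead.

```python
def empirical_survival(event_times, t):
    """Estimate S(t) = P(T > t) as the share of objects whose event time exceeds t.

    Assumes every event time was observed, i.e. there is no censoring.
    """
    if not event_times:
        raise ValueError("need at least one event time")
    survivors = sum(1 for time in event_times if time > t)
    return survivors / len(event_times)


# Hypothetical, fully observed event times (in years) for five objects.
times = [0.5, 1.0, 2.0, 2.0, 4.0]
for t in [0, 1, 2, 3, 4]:
    print(t, empirical_survival(times, t))
# 0 1.0
# 1 0.6
# 2 0.2
# 3 0.2
# 4 0.0
```

The estimate starts at 1, never increases, stays flat between event times, and drops only at them.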

Figure 13.4 – Survival function illustrated

In this example, we return to the cancer study that spanned 5 years. At time zero, when the study started, the survival probability was 1 (100%), but by year 5, it had fallen to about 0.2 (20%).

Now we consider the Stanford heart transplant dataset, which contains information on 103 patients who participated in an experimental heart transplant program (see Figure 13.5). The patients were...

Summary

In this chapter, we provided an overview of time-to-event variable analysis and how it differs from the other statistical analyses we studied in the previous chapters. We covered the intuition behind censoring (left, right, and interval censoring) and discussed Type I and Type II censoring. We also discussed non-informative and informative censoring. We then discussed survival data, the relationship between survival and censoring times, and how we record survival data with censoring. Finally, the survival function, hazard, and hazard ratio were introduced in the last section of this chapter.

In the next chapter, we will consider the non-parametric Kaplan-Meier model, the parametric exponential model, and also the semiparametric Cox Proportional Hazards model. We will perform real data analysis in Python by applying these models for survival analysis.
