You're reading from Practical Predictive Analytics
Survival analysis covers a broad range of topics. In this chapter, we will cover the following:
- Survival analysis
- Time-based variables and regression
- R survival objects
- Customer attrition or churn
- Survival curves
- Cox regression
- Plotting methods
- Variable selection
- Model concordance
Often, predictive analytics problems involve tracking important events along a customer's journey and predicting when these events will occur. Survival analysis is a form of analysis based on the concept of time to event: the number of units of time that elapse until something happens. The event can be just about anything: a car crash, a stock market crash, or a devastating phenomenon.
Survival analysis originated in the study of patients who developed terminal diseases, such as cancer, hence the term survival. However, conceptually, it can even be applied to marketing applications in which you...
In this chapter, we will look at a dataset of hypothetical customers who subscribe to an online service and who responded to a customer satisfaction survey before the study began. The survey responses were matched to transactional and demographic data to produce a simple analysis dataset. It contains an event variable (churn), which indicates whether or not a customer unsubscribed from the service, along with transaction data (number of purchases last month), demographic data (gender, educational level), and the overall satisfaction score from the survey:
| Variable | Description |
|---|---|
| | Average dollar amount of previous purchases |
| | Number of purchases in the month before the study begins |
| | Overall satisfaction with the service supplied on a Likert scale |
| | Follow-up satisfaction score |
| | Male or female |
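To make the layout of such a dataset concrete, here is a minimal sketch that simulates a handful of customers and wraps the time and event columns in an R survival object. Every column name here (`Tenure`, `Churn`, `AvgPurchase`, `PurchasesLastMonth`, `Satisfaction`, `Satisfaction2`, `Gender`, `Education`), the 1 to 5 Likert range, and the education labels are assumptions made for illustration. The values are randomly generated and are not the chapter's data.

```r
# Hypothetical illustration only: simulated values and assumed column names.
library(survival)  # provides Surv(); a recommended package shipped with R

set.seed(42)
n <- 8

customers <- data.frame(
  Tenure             = sample(1:12, n, replace = TRUE),  # months observed
  Churn              = rbinom(n, 1, 0.4),                # 1 = unsubscribed, 0 = still subscribed
  AvgPurchase        = round(runif(n, 10, 100), 2),      # average dollar amount
  PurchasesLastMonth = rpois(n, 2),                      # count in the month before the study
  Satisfaction       = sample(1:5, n, replace = TRUE),   # Likert score (1-5 range assumed)
  Satisfaction2      = sample(1:5, n, replace = TRUE),   # follow-up score
  Gender             = factor(sample(c("F", "M"), n, replace = TRUE)),
  Education          = factor(sample(c("HighSchool", "College", "Graduate"),
                                     n, replace = TRUE))
)

# A survival object pairs each time with its event indicator.
# Censored observations (no churn observed) print with a trailing "+".
churn_surv <- with(customers, Surv(Tenure, Churn == 1))

print(customers)
print(churn_surv)
```

The `Surv()` object is what survival modeling functions such as `survfit()` and `coxph()` expect on the left-hand side of a formula.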
    > summary(survfit(CoxModel.2))
    Call: survfit(formula = CoxModel.2)

     time n.risk n.event survival std.err lower 95% CI upper 95% CI
        1   1488      15    0.994 0.00157        0.991        0.997
        2   1455      52    0.973 0.00359        0.966        0.980
        3   1393      34    0.958 0.00461        0.949        0.967
        4   1342      20    0.950 0.00518        0.940        0.960
        5   1315      39    0.932 0.00624        0.920        0.945
        6   1245      42    0.913 0.00736        0.898        0.927
        7   1156      24    0.898 0.00801        0.883        0.914
        8   1020      32    0.877 0.00902        0.859        0.895
        9    850      40    0.846 0.01052        0.825        0.866
       10    665      51    0.797 0.01293        0.772        0.822
       11    435      54    0.721 0.01688        0.688        0.755
       12    225      55    0.569 0.02518        0.522        0.621

    > summary(survfit(CoxModel.1))
    Call: survfit(formula = CoxModel.1)

     time n.risk n.event survival std.err lower 95% CI upper 95% CI
        1   1488      15    0.993 0.00185...
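Survival tables like the `summary(survfit(...))` output above come from fitting a Cox model and then summarizing its estimated survival curve. The formulas behind `CoxModel.1` and `CoxModel.2` are not shown in this excerpt, so the following is only a sketch on simulated data. The model name `cox_sketch`, the single `satisfaction` predictor, and the simulated effect sizes are all assumptions.

```r
# Sketch under stated assumptions: simulated data, hypothetical model.
library(survival)

set.seed(123)
n <- 300

# Simulated satisfaction scores (1-5 range assumed for illustration)
satisfaction <- sample(1:5, n, replace = TRUE)

# In this simulation, higher satisfaction lowers the churn hazard
event_time <- rexp(n, rate = 0.15 * exp(-0.3 * (satisfaction - 3)))

# Follow each customer for at most 12 months; later churn is censored
tenure <- pmin(ceiling(event_time), 12)
churn  <- as.integer(event_time <= 12)

sim <- data.frame(tenure, churn, satisfaction)

# Fit the Cox proportional hazards model
cox_sketch <- coxph(Surv(tenure, churn) ~ satisfaction, data = sim)

# Estimated survival curve from the fitted model, one row per event time
fit_summary <- summary(survfit(cox_sketch))
print(fit_summary)
```

Each row reports the number of customers still at risk at that time (`n.risk`), the number who churned (`n.event`), and the estimated probability of remaining subscribed past that time (`survival`) with its standard error and 95% confidence bounds. When no `newdata` argument is supplied, `survfit()` on a Cox model estimates the curve at the mean values of the covariates.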