You're reading from Practical Predictive Analytics
Decision trees are a good predictive model to start with, and they have many advantages: interpretability, built-in variable selection, the ability to capture variable interactions, and the flexibility to choose the level of complexity of the tree.
Decision tree methods are usually considered classification methods, so the typical use case for a decision tree is predicting a class or category. However, there are also certain types of decision trees, known as regression trees, where the output is a continuous variable. In this way, we can begin developing models that mix numeric and categorical variables.
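As a minimal sketch of the two flavors of tree (using scikit-learn rather than the book's own code), the example below fits a classification tree that predicts a category, and a regression tree that predicts a continuous value; `max_depth` is the knob that controls the tree's complexity:

```python
# Sketch: classification tree vs. regression tree with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

iris = load_iris()

# Classification tree: the output is a class label (species).
clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # max_depth limits complexity
clf.fit(iris.data, iris.target)
print(clf.predict(iris.data[:2]))  # predicts class labels

# Regression tree: the output is a continuous variable
# (here, petal width predicted from the other three measurements).
X = iris.data[:, :3]
y = iris.data[:, 3]
reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(X, y)
print(reg.predict(X[:2]))  # predicts continuous values
```

A shallower `max_depth` yields a simpler, more interpretable tree at the cost of some accuracy; a deeper one fits the training data more closely but risks overfitting.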
Decision trees are heavily used in marketing and advertising, and in any industry where there is a need to segment customers into different groups. They are also used in healthcare for disease and risk classification.
Cluster analysis has many uses. At its most basic level, a cluster is a group of people or objects that share similar characteristics. In the marketing and sales industries, clustering is important, since customers (or potential customers) can be grouped by characteristics such as average spending, frequency of purchase, and recency of purchase, and assigned to a cluster that summarizes, in a single label, the levels of all of the attributes that define it. So, for our RFM example, cluster A might represent frequent purchasers who spend a lot of money, and spend often (every marketer's dream). Cluster B could represent people who are just average consumers across all three of those RFM metrics, and there might even be a cluster Z which represents things that seem to be impossible, such as customers who buy Halloween costumes only on Tuesdays.
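The RFM idea above can be sketched in a few lines with k-means (this uses scikit-learn and made-up customer data purely for illustration; in practice you would standardize the columns first, since they are on very different scales):

```python
# Hypothetical RFM clustering sketch; the numbers are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Columns: recency (days since last purchase), frequency (orders/year),
# monetary (average spend). Two simulated customer segments:
rfm = np.vstack([
    rng.normal([10, 40, 200], [3, 5, 20], (50, 3)),   # frequent big spenders
    rng.normal([60, 12, 50], [10, 3, 10], (50, 3)),   # average consumers
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(rfm)
labels = km.labels_  # each customer is assigned a single cluster label
```

Each customer's three RFM measurements collapse into one cluster label, which is exactly the "cluster A / cluster B" segmentation described above.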
Data analysts can often get good results by using tools such as SQL, or by having great insights into customers...
We have already seen some examples in which we use a straight line to separate classes.
As the dimensionality, or feature space, of a model increases, there may be many different ways to separate classes, in both linear and non-linear ways.
In the case of support vector machines, the data is first transformed into a higher-dimensional space using a mapping function known as a kernel, and an optimal hyperplane is used to segment that higher-dimensional space. A hyperplane has one dimension less than the space it divides, so a straight line segments a two-dimensional space, and a flat two-dimensional sheet segments a three-dimensional space. The hyperplane itself is linear in the transformed space, but the boundary it traces back in the original feature space can be either linear or non-linear, depending on the kernel.
The hyperplane is defined by support vectors: the important training tuples that lie closest to the boundary of each class. They are the most critical points in the data, since they alone support (determine) the position of the hyperplane...
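A short sketch of both ideas (again in scikit-learn, as an illustration rather than the book's own code): an RBF kernel lets the SVM separate two interleaved half-moons that no straight line could, and the fitted model exposes the handful of support vectors that define its boundary:

```python
# Sketch: non-linear class boundary via a kernel, plus the support vectors.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional space,
# where a linear hyperplane can separate the two classes.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Only these training points determine the boundary.
print(svm.support_vectors_.shape)
```

Points far from the boundary could be removed without changing the fitted model; the support vectors are the ones that matter.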
In this chapter, we added three more algorithms to our arsenal; these three, along with regression, form the core basic algorithms that can cover a lot of ground in terms of the typical problems a predictive analyst will face. We saw that a good knowledge of decision tree methodologies allows you to start developing models quickly; decision trees are easily interpretable, and they are the basis for more advanced techniques such as random forests. We then moved on to clustering, which allows you to begin to grasp the concepts of similarity and dissimilarity, and we introduced distance measures. We ended with a basic introduction to support vector machines, demonstrated in the context of text mining.
In the next chapter, we will begin to look at some examples of creating models that predict how long a customer will stay with a company, or how long it will be until a patient develops a certain medical condition.