You're reading from  Practical Predictive Analytics

Product type: Book
Published in: Jun 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785886188
Edition: 1st Edition
Author (1)
Ralph Winters

Ralph Winters started his career as a database researcher for a music performing rights organization (he composed as well!), and then branched out into healthcare survey research, finally landing in the analytics and information technology world. He has provided his statistical and analytics expertise to many large Fortune 500 companies in the financial, direct marketing, insurance, healthcare, and pharmaceutical industries. He has worked on many diverse types of predictive analytics projects involving customer retention, anti-money laundering, voice of the customer text mining analytics, and health care risk and customer choice models. He is currently a data architect for a healthcare services company, working in the data and advanced analytics group. He enjoys working collaboratively with a smart team of business analysts, technologists, and actuaries, as well as with other data scientists. Ralph considers himself a practical person. In addition to authoring Practical Predictive Analytics for Packt Publishing, he has contributed two tutorials illustrating the use of predictive analytics in medicine and healthcare to Practical Predictive Analytics and Decisioning Systems for Medicine (Miner et al., Elsevier, September 2014), and presented Practical Text Mining with SQL using Relational Databases at the 2013 11th Annual Text and Social Analytics Summit in Cambridge, MA. Ralph resides in New Jersey with his loving wife Katherine, amazing daughters Claire and Anna, and his four-legged friends, Bubba and Phoebe, who can be unpredictable. Ralph's web site can be found at ralphwinters.com

Chapter 5. Introduction to Decision Trees, Clustering, and SVM

"My interest is in the future because I am going to spend the rest of my life there."

- Charles F. Kettering

Decision tree algorithms


Decision trees are considered a good predictive model to start with, and have many advantages. These include interpretability, variable selection, the ability to capture variable interactions, and the flexibility to choose how complex the tree should be.

Decision tree methods are considered classification methods, so the typical use case for a decision tree is predicting a class or category. However, there are also certain types of decision trees, known as regression trees, where the output is a continuous variable. In this way, we can begin developing models that use a mix of numeric and categorical variables.
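To make the difference between a classification tree and a regression tree concrete, here is a minimal sketch of a one-level tree in plain Python. It assumes Gini impurity as the split criterion (the text above does not name one), and the `spend`, `churned`, and `tenure_months` values are hypothetical toy data invented for illustration.

```python
from collections import Counter
from statistics import mean


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: monthly spend and whether the customer churned.
spend = [10, 20, 30, 40, 50, 60]
churned = ["yes", "yes", "yes", "no", "no", "no"]
# Hypothetical continuous target for the regression case.
tenure_months = [3, 4, 5, 20, 22, 24]

threshold, impurity = best_split(spend, churned)


def classify(x):
    """Classification leaf: majority class among training rows on the same side."""
    side = [lab for v, lab in zip(spend, churned) if (v <= threshold) == (x <= threshold)]
    return Counter(side).most_common(1)[0][0]


def predict_tenure(x):
    """Regression leaf: mean of the continuous target on the same side."""
    side = [t for v, t in zip(spend, tenure_months) if (v <= threshold) == (x <= threshold)]
    return mean(side)


print(threshold, impurity)
print(classify(25), predict_tenure(25))
```

The same split serves both cases; only the leaf prediction changes, from a majority class to a mean. A full tree would apply `best_split` recursively within each side, which this sketch leaves out.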

Decision trees are heavily used in marketing and advertising, and in any industry where there is a need to segment customers into different groups. They are also used in healthcare for disease and risk classification.

Advantages of decision trees

Decision trees have many advantages. They can be easily understood by both technical and business people and...

Cluster analysis


Cluster analysis has many uses. At its most basic level, a cluster is a group of people or objects that share similar characteristics. In the marketing and sales industries, clustering is important, since customers (or potential customers) can be grouped by characteristics such as recency of purchases, frequency of purchase, and average spending (monetary value), often abbreviated as RFM. Each customer is assigned to a cluster that acts as a single measure summarizing the levels of all the attributes that make up that cluster. So, for our RFM example, cluster A might represent frequent purchasers who spend a lot of money and buy often (every marketer's dream). Cluster B could represent people who are just average consumers across all three of those RFM metrics, and there might even be a cluster Z that represents things that seem to be impossible, such as customers who buy Halloween costumes only on Tuesdays.
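The RFM grouping described above can be sketched with a small k-means routine in plain Python. This is an illustration under stated assumptions, not the chapter's own method: the six customers, their 1-to-5 RFM scores, the choice of two clusters, and the starting centroids are all hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid updates until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster happens to be empty.
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
            )
        centroids = new_centroids
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical customers scored 1-5 on (recency, frequency, monetary); 5 is best.
rfm = [
    (5, 5, 5), (5, 4, 5), (4, 5, 4),   # heavy, recent, frequent buyers
    (3, 3, 3), (3, 2, 3), (2, 3, 3),   # average across all three metrics
]

# Deterministic start: one centroid taken from each apparent group.
labels, centroids = kmeans(rfm, centroids=[rfm[0], rfm[3]])
print(labels)
print(centroids)
```

Because the three attributes here share the same 1-to-5 scale, no rescaling is needed. With raw values (days, counts, currency) the attributes would usually be standardized first, since otherwise the one with the largest range dominates the distance.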

Data analysts can often get good results by using tools such as SQL, or by having great insights in customers...

Support vector machines


We have already seen some examples in which we use a straight line to separate classes.

As the dimensionality, or feature space, of a model increases, there may be many different ways to separate classes, in both linear and non-linear ways.

In the case of support vector machines, data is first transformed into a higher dimensional space using a mapping function known as a kernel, and an optimal hyperplane is used to segment the higher dimensional space. A hyperplane has one dimension fewer than the space it separates, so a straight line is used to segment a two-dimensional space, and a two-dimensional sheet of paper is used to segment a three-dimensional space. Depending on the kernel, the resulting boundary in the original feature space can be either linear or non-linear.
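The idea of mapping data into a higher dimensional space can be illustrated with a small sketch in plain Python. It is not a support vector machine: the mapping `phi`, the toy data, and the hyperplane weights `w` and `b` are hand-picked assumptions chosen only to show that classes which cannot be split by a single cut point in one dimension can be split by a straight line after the mapping.

```python
def separable_by_threshold(xs, ys):
    """True if a single cut point on the 1-D line puts each class on its own side."""
    labels_in_order = [y for _, y in sorted(zip(xs, ys))]
    # Separable in 1-D only if the sorted labels change value at most once.
    changes = sum(a != b for a, b in zip(labels_in_order, labels_in_order[1:]))
    return changes <= 1


def phi(x):
    """Hypothetical explicit mapping from 1-D to 2-D: x -> (x, x squared)."""
    return (x, x * x)


def side(z, w=(0.0, 1.0), b=-2.5):
    """Classify a mapped point by the sign of w.z + b (a hand-picked hyperplane)."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return "outer" if score > 0 else "inner"


# Hypothetical toy data: the "inner" class sits between two groups of "outer" points.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = ["outer", "outer", "inner", "inner", "inner", "outer", "outer"]

print(separable_by_threshold(xs, ys))        # no single cut point works in 1-D
predictions = [side(phi(x)) for x in xs]     # the line x2 = 2.5 works in 2-D
print(predictions == ys)
```

In the mapped space the separating hyperplane is the straight line where the second coordinate equals 2.5. Seen from the original one-dimensional space, that same boundary is the non-linear rule "x squared greater than 2.5". A real SVM would additionally search for the hyperplane with the widest margin and would typically apply the kernel implicitly rather than computing `phi` explicitly.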

A hyperplane is defined by its support vectors, which are the important training tuples that mark the boundaries of each class. They are the most critical points in the data, because they are the points that support the definition of the hyperplane...

Summary


In this chapter, we added three more algorithms to our arsenal. These three, along with regression, form the core algorithms that cover a lot of ground in terms of the typical problems a predictive analyst will face. We saw that a good knowledge of decision tree methodologies allows you to start developing models quickly, and that decision trees are easily interpretable and are the basis for more advanced techniques such as random forests. We then went on to clustering. Clustering allows you to begin to grasp the concepts of similarity and dissimilarity, and we introduced distance measures. We ended with a basic introduction to support vector machines, which were demonstrated in the context of text mining.

In the next chapter, we will begin to look at some examples of creating models that predict how long a customer will stay with a company, or how long it will be until a patient develops a certain medical condition.
