Machine Learning with R - Third Edition


Product type Book
Published in Apr 2019
Publisher Packt
ISBN-13 9781788295864
Pages 458 pages
Edition 3rd Edition
Author: Brett Lantz

Table of Contents (18) Chapters

Machine Learning with R - Third Edition
Contributors
Preface
Other Books You May Enjoy
Leave a review - let other readers know what you think
  • Introducing Machine Learning

  • Managing and Understanding Data

  • Lazy Learning – Classification Using Nearest Neighbors

  • Probabilistic Learning – Classification Using Naive Bayes

  • Divide and Conquer – Classification Using Decision Trees and Rules

  • Forecasting Numeric Data – Regression Methods

  • Black Box Methods – Neural Networks and Support Vector Machines

  • Finding Patterns – Market Basket Analysis Using Association Rules

  • Finding Groups of Data – Clustering with k-means

  • Evaluating Model Performance

  • Improving Model Performance

  • Specialized Machine Learning Topics

  • Index

Chapter 5. Divide and Conquer – Classification Using Decision Trees and Rules

When deciding between job offers with various levels of pay and benefits, many people begin by making lists of pros and cons, then eliminate options using simple rules. For instance, they might say, "If I must commute more than an hour, I will be unhappy," or "If I make less than $50K, I won't be able to support my family." In this way, the complex and difficult decision of predicting one's future happiness can be reduced to a series of simple decisions.

This chapter covers decision trees and rule learners—two machine learning methods that also make complex decisions from sets of simple choices. These methods present their knowledge in the form of logical structures that can be understood with no statistical knowledge. This aspect makes these models particularly useful for business strategy and process improvement.

By the end of this chapter, you will learn:

  • How trees and rules "greedily" partition data into interesting segments...

Understanding decision trees


Decision tree learners are powerful classifiers that use a tree structure to model the relationships among the features and the potential outcomes. As illustrated in the following figure, this structure earned its name because it mirrors the way a literal tree begins at a wide trunk and splits into narrower and narrower branches as it is followed upward. In much the same way, a decision tree classifier uses a structure of branching decisions that channel examples into a final predicted class value.

To better understand how this works in practice, let's consider the following tree, which predicts whether a job offer should be accepted. A job offer under consideration begins at the root node, where it is then passed through decision nodes that require choices to be made based on the attributes of the job. These choices split the data across branches that indicate potential outcomes of a decision. They are depicted here as yes or no outcomes, but...
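The walk from the root node through decision nodes to a predicted class can be sketched in a few lines. The following is a minimal illustration in Python (rather than the book's R), and it is not the tree from the figure: the two tests reuse the salary and commute thresholds mentioned in the chapter introduction, while the names `TREE`, `classify`, `salary`, and `commute_hours` are hypothetical.

```python
# Hypothetical decision tree for the job-offer example.
# Dictionaries are decision nodes; plain strings are leaf nodes (predicted classes).
TREE = {
    "test": lambda offer: offer["salary"] >= 50_000,      # root node
    "yes": {
        "test": lambda offer: offer["commute_hours"] > 1,  # second decision node
        "yes": "decline",
        "no": "accept",
    },
    "no": "decline",
}


def classify(node, offer):
    """Follow yes/no branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        node = node["yes"] if node["test"](offer) else node["no"]
    return node


print(classify(TREE, {"salary": 62_000, "commute_hours": 0.5}))  # accept
```

Each example follows exactly one path, so the prediction can be explained by listing the tests it passed along the way.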

Example – identifying risky bank loans using C5.0 decision trees


The global financial crisis of 2007-2008 highlighted the importance of transparency and rigor in banking practices. As the availability of credit was limited, banks tightened their lending systems and turned to machine learning to more accurately identify risky loans.

Decision trees are widely used in the banking industry due to their high accuracy and ability to formulate a statistical model in plain language. Since governments in many countries carefully monitor the fairness of lending practices, executives must be able to explain why one applicant was rejected for a loan while another was approved. This information is also useful for customers hoping to determine why their credit rating is unsatisfactory.

It is likely that automated credit scoring models are used for credit card mailings and instant online approval processes. In this section, we will develop a simple credit approval model using C5.0 decision trees. We will...

Understanding classification rules


Classification rules represent knowledge in the form of logical if-else statements that assign a class to unlabeled examples. They are specified in terms of an antecedent and a consequent, which form a statement that says "if this happens, then that happens." The antecedent comprises certain combinations of feature values, while the consequent specifies the class value to assign if the rule's conditions are met. A simple rule might state, "if the hard drive is making a clicking sound, then it is about to fail."
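As a rough sketch of the antecedent/consequent structure, the hard drive rule can be written as data plus a small matching function. This is illustrative Python rather than the book's R; the feature name `sound`, the fallback class `"healthy"`, and the helper `apply_rules` are assumptions made for the example.

```python
# Each rule pairs an antecedent (required feature values) with a consequent (class).
RULES = [
    ({"sound": "clicking"}, "about to fail"),
]
DEFAULT_CLASS = "healthy"  # assumed fallback when no rule fires


def apply_rules(example, rules=RULES, default=DEFAULT_CLASS):
    """Return the consequent of the first rule whose antecedent matches."""
    for antecedent, consequent in rules:
        if all(example.get(feature) == value for feature, value in antecedent.items()):
            return consequent
    return default


print(apply_rules({"sound": "clicking"}))  # about to fail
```

An antecedent with several feature values matches only when all of them hold, which is how a rule can express a combination of conditions.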

Rule learners are a closely related sibling of decision tree learners and are often used for similar types of tasks. Like decision trees, they can be used for applications that generate knowledge for future action, such as:

  • Identifying conditions that lead to hardware failure in mechanical devices

  • Describing the key characteristics of groups of people for customer segmentation

  • Finding conditions that precede large drops or increases in the prices of...

Example – identifying poisonous mushrooms with rule learners


Each year, many people fall ill and sometimes even die from ingesting poisonous wild mushrooms. Since many mushrooms are very similar to each other in appearance, occasionally even experienced mushroom gatherers are poisoned.

Unlike the identification of harmful plants, such as poison oak or poison ivy, there are no clear rules like "leaves of three, let them be" for identifying whether a wild mushroom is poisonous or edible. Complicating matters, many traditional rules such as "poisonous mushrooms are brightly colored" provide dangerous or misleading information. If simple, clear, and consistent rules were available for identifying poisonous mushrooms, they could save the lives of foragers.

One of the strengths of rule learning algorithms is that they generate easy-to-understand rules, so they seem like an appropriate fit for this classification task. However, the rules will only be as useful as they are accurate.

Step...

Summary


This chapter covered two classification methods that use so-called "greedy" algorithms to partition the data according to feature values. Decision trees use a divide and conquer strategy to create flowchart-like structures, while rule learners separate and conquer data to identify logical if-else rules. Both methods produce models that can be interpreted without a statistical background.

One popular and highly configurable decision tree algorithm is C5.0. We used the C5.0 algorithm to create a tree to predict whether a loan applicant will default. Using options for boosting and cost-sensitive errors, we were able to improve our accuracy and avoid risky loans that could cost the bank more money.

We also used two rule learners, 1R and RIPPER, to develop rules for identifying poisonous mushrooms. The 1R algorithm used a single feature to achieve 99 percent accuracy in identifying potentially fatal mushroom samples. On the other hand, the set of eight rules generated by the more sophisticated...
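To make the single-feature idea behind 1R concrete, here is a minimal Python sketch (not the book's R code) of the commonly described procedure: for every feature, predict the majority class for each of its values, then keep the feature whose rule set makes the fewest training errors. The function name `one_r` and the six toy rows are invented for illustration and are not the mushroom data used in the chapter.

```python
from collections import Counter, defaultdict


def one_r(rows, features, target):
    """Return (feature, value->class rule, training errors) for the best single feature."""
    best = None
    for feature in features:
        # Count class labels separately for each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        # Predict the majority class for every value; the rest are errors.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Made-up toy rows, for illustration only.
rows = [
    {"odor": "foul", "cap_color": "brown", "type": "poisonous"},
    {"odor": "none", "cap_color": "brown", "type": "edible"},
    {"odor": "foul", "cap_color": "red", "type": "poisonous"},
    {"odor": "none", "cap_color": "red", "type": "edible"},
    {"odor": "almond", "cap_color": "brown", "type": "edible"},
    {"odor": "foul", "cap_color": "white", "type": "poisonous"},
]

feature, rule, errors = one_r(rows, ["cap_color", "odor"], "type")
print(feature, errors)
```

On these toy rows the sketch selects `odor`, because its per-value majority classes misclassify none of the six examples, while `cap_color` misclassifies two.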
