
IBM SPSS Modeler Essentials: Effective techniques for building powerful data mining and predictive analytics solutions

By Jesus Salcedo, Keith McCormick
Book Dec 2017 238 pages 1st Edition
eBook
€22.99 €15.99
Print
€28.99
Subscription
€14.99 Monthly

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Product Details


Publication date: Dec 26, 2017
Length: 238 pages
Edition: 1st Edition
Language: English
ISBN-13: 9781788291118


Chapter 1. Introduction to Data Mining and Predictive Analytics

IBM SPSS Modeler is an interactive data mining workbench composed of multiple tools and technologies to support the entire data mining process. This first chapter introduces the concepts of data mining; CRISP-DM, which is a recipe for doing data mining the right way; and a case study outlining the data mining process. The chapter topics are as follows:

  • Introduction to data mining
  • CRISP-DM overview
  • The data mining process (as a case study)

Introduction to data mining


In this chapter, we will place IBM SPSS Modeler and its use in a broader context. Modeler was developed as a tool to perform data mining. Although the phrase predictive analytics is more common now, when Modeler was first developed in the 1990s, this type of analytics was almost universally called data mining. The use of the phrase data mining has evolved a bit since then to emphasize the exploratory aspect, especially in the context of big data and sometimes with a particular emphasis on the mining of private data that has been collected. This will not be our use of the term. Data mining can be defined in the following way:

Data mining is the search of data, accumulated during the normal course of doing business, in order to find and confirm the existence of previously unknown relationships that can produce positive and verifiable outcomes through the deployment of predictive models when applied to new data.

Several points are worth emphasizing:

  • The data is not new
  • The data that can solve the problem was not collected solely to perform data mining
  • The data miner is not testing known relationships (neither hypotheses nor hunches) against the data
  • The patterns must be verifiable
  • The resulting models must be capable of something useful
  • The resulting models must actually work when deployed on new data

In the late 1990s, a process was developed called the Cross Industry Standard Process for Data Mining (CRISP-DM). We will be drawing heavily from that tradition in this chapter, and CRISP-DM can be a powerful way to organize your work in Modeler. It is our use of this process in organizing this book's material that prompts us to use the term data mining. It is worth noting that the team that first developed Modeler, originally called Clementine, and the team that wrote CRISP-DM have some members in common.

CRISP-DM overview


CRISP-DM is considered to be the de facto standard for conducting a data mining project. Starting with the Business Understanding phase and ending with the Deployment phase, this six-phase process has a total of 24 tasks. Do not settle for knowing only the names of the six phases; it is well worth the effort to familiarize yourself with all 24 tasks. The diagram shown next illustrates the six phases of the CRISP-DM process model, and the following pages discuss each of these phases.

Business Understanding

The Business Understanding phase is focused on good problem definition and ensuring that you are solving the business's problem. You must begin from a business perspective and business knowledge, and proceed by converting this knowledge into a data mining problem definition. You will not be performing the actual Business Understanding in Modeler, as such, but Modeler allows you to organize supporting material such as Word documents and PowerPoint presentations as part of a Modeler project file. You don't need to organize this material in a project file, but you do need to remember to do a proper job at this phase. For more detailed information on each task within a phase, refer to the CRISP-DM document itself. It is free and readily available on the internet.

The four tasks in this phase are:

  • Determine business objectives
  • Assess situation
  • Determine data mining goals
  • Produce project plan

Data Understanding

Modeler has numerous resources for exploring your data in preparation for the other phases. We will demonstrate a number of these in Chapter 3, Importing Data into Modeler; Chapter 4, Data Quality and Exploration; and Chapter 8, Looking for Relationships Between Fields. The Data Understanding phase includes activities for getting familiar with the data as well as data collection and data quality. The four Data Understanding tasks are:

  • Collect initial data
  • Describe data
  • Explore data
  • Verify data quality

Data Preparation

The Data Preparation phase covers all activities to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data Preparation is often described as the most labor-intensive phase for the data analyst. It is critically important that Data Preparation is done well, and a substantial amount of this book is dedicated to it. We cover cleaning, selecting, integrating, and constructing data in Chapter 5, Cleaning and Selecting Data; Chapter 6, Combining Data Files; and Chapter 7, Deriving New Fields. However, a book dedicated to the basics of data mining can really only start you on your journey when it comes to Data Preparation, since there are so many ways in which you can improve and prepare data. When you are ready for a more advanced treatment of this topic, there are two resources that go into Data Preparation in much more depth, and both have extensive Modeler software examples: The IBM SPSS Modeler Cookbook (Packt Publishing) and Effective Data Preparation (Cambridge University Press).

The five Data Preparation tasks are:

  • Select data
  • Clean data
  • Construct data
  • Integrate data
  • Format data

Modeling

The Modeling phase is probably what you expect it to be: the phase where the modeling algorithms move to the forefront. In many ways, this is the easiest phase, as the algorithms do a lot of the work if you have done an excellent job on the prior phases and a good job translating the business problem into a data mining problem. Despite the fact that the algorithms are doing the heavy lifting in this phase, it is generally considered the most intimidating, and it is understandable why: there is an overwhelming number of algorithms to choose from. Even in a well-curated workbench such as Modeler, there are dozens of choices. Open source options such as R have hundreds of choices. While this book is not an algorithms guide, and even though it is impossible to offer a chapter on each algorithm, Chapter 9, Introduction to Modeling Options in IBM SPSS Modeler, should be very helpful in understanding, at a high level, what options are available in Modeler. Also, in Chapter 10, Decision Tree Models, we go through a thorough demonstration of one modeling technique, decision trees, to orient you to modeling in Modeler.

The four tasks in this phase are:

  • Select modeling technique
  • Generate test design
  • Build model
  • Assess model

Evaluation

At this stage in the project, you have built a model (or models) that appears to be of high quality from a data analysis perspective. Before proceeding to final deployment of the model, it is important to evaluate the model more thoroughly, to be certain that it properly achieves the business objectives.

Evaluation is frequently confused with model assessment, the last task of the Modeling phase. The Assess model task is all about the data analysis perspective and includes metrics such as model accuracy. The authors of CRISP-DM considered calling this phase business evaluation because it has to be conducted in the language of the business, using the metrics of the business as indicators of success. Given the nature of this book, and its emphasis on the point-and-click operation of Modeler, there will be virtually no opportunity to practice this phase, but in real-world projects it is a critical phase.

The three tasks in this phase are:

  • Evaluate results
  • Review process
  • Determine next steps
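
The difference between assessing a model on analytic metrics and evaluating it on business metrics can be illustrated with a small calculation. The following is only a sketch in plain Python, not something produced by Modeler; the two models, the counts, and the revenue and cost figures are all hypothetical numbers chosen for illustration.

```python
def campaign_profit(buyers_contacted, non_buyers_contacted,
                    revenue_per_sale, cost_per_contact):
    """Profit of contacting everyone a model flags as a likely buyer."""
    contacts = buyers_contacted + non_buyers_contacted
    return buyers_contacted * revenue_per_sale - contacts * cost_per_contact


# Hypothetical results for two candidate models on the same test data.
model_a = {"buyers_contacted": 80, "non_buyers_contacted": 120}
model_b = {"buyers_contacted": 60, "non_buyers_contacted": 20}

# Hypothetical business figures: revenue per sale and cost per contact.
profit_a = campaign_profit(**model_a, revenue_per_sale=50, cost_per_contact=10)
profit_b = campaign_profit(**model_b, revenue_per_sale=50, cost_per_contact=10)

print("Model A profit:", profit_a)
print("Model B profit:", profit_b)
```

Under these assumed figures, the model that finds fewer buyers is the more profitable one, which is the kind of conclusion that only appears when results are expressed in the metrics of the business.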

Deployment

Creation of the model is generally not the end of the project. Depending on the requirements, the Deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. Given the software focus of this book and the spirit of sticking to the basics, we will really only cover using models for the scoring of new data. Real-world deployment is much more complex, and a complex deployment can more than double the length of a project. Modeler's capabilities in this area go far beyond what we will be able to show in this book. The final chapter of this book, Chapter 11, Model Assessment and Scoring, briefly talks about some of these issues.

However, it is not unusual for the deployment team to be different from the modeling team, and the responsibility may fall to team members with more of an IT focus. The IBM software stack offers dedicated tools for complex deployment scenarios; IBM Collaboration and Deployment Services provides such advanced features.

The four tasks in the Deployment phase are:

  • Plan deployment
  • Plan monitoring and maintenance
  • Produce final report
  • Review project
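
As a compact summary, the six phases and the tasks listed under each of them above can be written down as a simple data structure. This is only a sketch in plain Python (the variable and function names are our own), and it is handy mainly as a checklist: counting the entries confirms the total of 24 tasks mentioned earlier.

```python
# CRISP-DM phases and their tasks, in the order given in this chapter.
CRISP_DM = {
    "Business Understanding": [
        "Determine business objectives",
        "Assess situation",
        "Determine data mining goals",
        "Produce project plan",
    ],
    "Data Understanding": [
        "Collect initial data",
        "Describe data",
        "Explore data",
        "Verify data quality",
    ],
    "Data Preparation": [
        "Select data",
        "Clean data",
        "Construct data",
        "Integrate data",
        "Format data",
    ],
    "Modeling": [
        "Select modeling technique",
        "Generate test design",
        "Build model",
        "Assess model",
    ],
    "Evaluation": [
        "Evaluate results",
        "Review process",
        "Determine next steps",
    ],
    "Deployment": [
        "Plan deployment",
        "Plan monitoring and maintenance",
        "Produce final report",
        "Review project",
    ],
}


def total_tasks(process):
    """Count the tasks across all phases."""
    return sum(len(tasks) for tasks in process.values())


for number, (phase, tasks) in enumerate(CRISP_DM.items(), start=1):
    print(f"{number}. {phase}: {len(tasks)} tasks")
print("Total tasks:", total_tasks(CRISP_DM))
```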

Learning more about CRISP-DM

Here are five great resources to learn more about CRISP-DM:

The data mining process (as a case study)


As Chapter 9, Introduction to Modeling Options in IBM SPSS Modeler, will illustrate, there are many different types of data mining projects. For example, you may wish to create customer segments based on products purchased or service usage, so that you can develop targeted advertising campaigns. Or you may want to determine where to better position products in your store, based on customer purchase patterns. Or you may want to predict which students will drop out of school, so that you can provide additional services before this happens.

In this book, we will be using a dataset where we are trying to predict which people have incomes above or below $50,000. We may be trying to do this because we know that people with incomes above $50,000 are much more likely to purchase our products, given that previous work found that income was the most important predictor regarding product purchase. The point is that regardless of the actual data that we are using, the principles that we will be showing apply to a very wide range of data mining problems, whether you are trying to determine which customers will purchase a product, when you will need to replace an elevator, how many hotel rooms will be booked on a given date, or what additional complications might occur during surgery.

As was mentioned previously, Modeler supports the entire data mining process. The figure shown next illustrates exactly how Modeler can be used to compartmentalize each aspect of the CRISP-DM process model:

In Chapter 2, The Basics of Using IBM SPSS Modeler, you will become familiar with the Modeler graphical user interface. In this chapter, we will be using screenshots to illustrate how Modeler represents various data mining activities. The following figures in this chapter therefore just provide an overview of how different tasks will look within Modeler. For the moment, do not worry about how each image was created; you will see exactly how to create each of these in later chapters.

First and foremost, every data mining project will need to begin with well-defined business objectives. This is crucial for determining what you are trying to accomplish or learn from a project, and how to translate this into data mining goals. Once this is done, you will need to assess the current business situation and develop a project plan that is reasonable given the data and time constraints.

Once business and data mining objectives are well defined, you will need to collect the appropriate data. Chapter 3, Importing Data into Modeler, will focus on how to bring data into Modeler. Remember that data mining typically uses data that was collected during the normal course of doing business; it is therefore crucial that the data you are using can really address the business and data mining goals.

Once you have data, it is very important to describe and assess its quality. Chapter 4, Data Quality and Exploration, will focus on how to assess data quality using the Data Audit node.
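
Modeler performs this kind of check through its graphical interface, so the following is not Modeler code. As a rough illustration of what assessing data quality involves, this plain Python sketch counts valid and missing values per field and reports the range of numeric fields. The `audit` function, the field names, and the sample records are all hypothetical.

```python
def audit(records, fields):
    """Summarize each field: valid count, missing count, numeric min and max."""
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat None and empty strings as missing (an assumption of this sketch).
        present = [v for v in values if v not in (None, "")]
        numeric = [v for v in present if isinstance(v, (int, float))]
        report[field] = {
            "valid": len(present),
            "missing": len(values) - len(present),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return report


# Hypothetical sample records with two quality problems.
records = [
    {"age": 39, "education": "Bachelors", "income": "<=50K"},
    {"age": None, "education": "HS-grad", "income": ">50K"},
    {"age": 52, "education": "", "income": "<=50K"},
]

summary = audit(records, ["age", "education", "income"])
for field, stats in summary.items():
    print(field, stats)
```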

Once the Data Understanding phase has been completed, it is time to move on to the Data Preparation phase. The Data Preparation phase is by far the most time-consuming and creative part of a data mining project. This is because, as was mentioned previously, we are using data that was collected during the normal course of doing business. The data will therefore not be clean: it will have errors, it will include information that is not relevant, it will have to be restructured into an appropriate format, and you will need to create many new variables that extract important information. Due to the importance of this phase, we have devoted several chapters to addressing these issues. Chapter 5, Cleaning and Selecting Data, will focus on how to select the appropriate cases by using the Select node, and how to clean data by using the Distinct and Reclassify nodes.
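
The three operations named here (selecting cases, removing duplicates, and recoding categories) are general data preparation ideas that can be sketched outside Modeler. The plain Python below is such a sketch, not a reproduction of how the Modeler nodes work; the function names simply echo the node names, and the records and category mapping are hypothetical.

```python
def select(records, condition):
    """Keep only the records for which the condition is true."""
    return [record for record in records if condition(record)]


def distinct(records, key_fields):
    """Keep the first record for each combination of key field values."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def reclassify(records, field, mapping):
    """Recode the categories of one field; unmapped values are left unchanged."""
    return [{**record, field: mapping.get(record[field], record[field])}
            for record in records]


# Hypothetical sample records, including one exact duplicate.
records = [
    {"id": 1, "age": 39, "education": "Bachelors"},
    {"id": 1, "age": 39, "education": "Bachelors"},
    {"id": 2, "age": 16, "education": "10th"},
    {"id": 3, "age": 45, "education": "HS-grad"},
    {"id": 4, "age": 28, "education": "Masters"},
]

adults = select(records, lambda record: record["age"] >= 18)
unique_adults = distinct(adults, ["id"])
groups = {"Bachelors": "College", "Masters": "College", "HS-grad": "High school"}
cleaned = reclassify(unique_adults, "education", groups)

for record in cleaned:
    print(record)
```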

Chapter 6, Combining Data Files, will continue to focus on the Data Preparation phase by using both the Append and Merge nodes to integrate various data files.
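
Conceptually, appending stacks records from files that share the same fields, while merging brings together fields from different files for the same case. The plain Python sketch below illustrates both ideas under simplifying assumptions: the merge shown is an inner join on a single key whose values are unique in the second dataset. The function names and sample data are hypothetical, and this is not how the Modeler nodes are implemented.

```python
def append(*datasets):
    """Stack the records of several datasets one after another."""
    combined = []
    for dataset in datasets:
        combined.extend(dataset)
    return combined


def merge(left, right, key):
    """Inner join: keep left records that have a match in right on the key."""
    lookup = {record[key]: record for record in right}
    return [{**record, **lookup[record[key]]}
            for record in left if record[key] in lookup]


# Hypothetical files: two batches of people and a separate education file.
january = [{"id": 1, "age": 39}, {"id": 2, "age": 50}]
february = [{"id": 3, "age": 28}]
education = [{"id": 1, "education": "Bachelors"},
             {"id": 3, "education": "Masters"}]

all_people = append(january, february)
joined = merge(all_people, education, "id")

print(all_people)
print(joined)
```

In this sketch, the person with `id` 2 drops out of `joined` because there is no matching education record, which is the defining behavior of an inner join.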

Finally, Chapter 7, Deriving New Fields, will focus on constructing additional fields by using the Derive node.
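
Constructing a field means computing a new value for every record from the fields that already exist. As a minimal sketch in plain Python (not Modeler's own expression language), the hypothetical `derive` function below adds one numeric field and one true/false flag; the field names and values are made up for illustration.

```python
def derive(records, new_field, formula):
    """Return copies of the records with one additional computed field."""
    return [{**record, new_field: formula(record)} for record in records]


# Hypothetical sample records.
records = [
    {"capital_gain": 2174, "capital_loss": 0, "hours_per_week": 40},
    {"capital_gain": 0, "capital_loss": 1902, "hours_per_week": 55},
]

# A numeric field computed from two existing fields.
with_net = derive(records, "net_capital",
                  lambda r: r["capital_gain"] - r["capital_loss"])

# A flag field that is True when a condition holds.
with_flag = derive(with_net, "works_overtime",
                   lambda r: r["hours_per_week"] > 40)

for record in with_flag:
    print(record)
```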

At this point we will be ready to begin exploring relationships within the data. In Chapter 8, Looking for Relationships Between Fields, we will use the Distribution, Matrix, Histogram, Means, Plot, and Statistics nodes to uncover and understand simple relationships between variables.
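
One of the simplest ways to look at the relationship between two categorical fields is a cross-tabulation with row percentages. The sketch below shows that idea in plain Python with hypothetical data; it illustrates the general technique and makes no claim about the output of any particular Modeler node.

```python
from collections import Counter


def crosstab(records, row_field, column_field):
    """Count the records for each (row value, column value) pair."""
    return Counter((r[row_field], r[column_field]) for r in records)


def row_percentages(table, row_value):
    """Percentage of each column value within one row of the table."""
    row = {column: count for (row_key, column), count in table.items()
           if row_key == row_value}
    total = sum(row.values())
    return {column: 100 * count / total for column, count in row.items()}


# Hypothetical sample records.
records = [
    {"education": "College", "income": ">50K"},
    {"education": "College", "income": ">50K"},
    {"education": "College", "income": "<=50K"},
    {"education": "High school", "income": "<=50K"},
    {"education": "High school", "income": "<=50K"},
    {"education": "High school", "income": "<=50K"},
    {"education": "High school", "income": ">50K"},
]

table = crosstab(records, "education", "income")
for level in ("College", "High school"):
    print(level, row_percentages(table, level))
```

If the row percentages differ clearly between the rows, as they do in this made-up sample, the two fields are related and the first field may be useful for predicting the second.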

Once the Data Preparation phase has been completed, we will move on to the Modeling phase. Chapter 9, Introduction to Modeling Options in IBM SPSS Modeler, will introduce the various types of models available in Modeler and then provide an overview of the predictive models. It will also discuss how to select a modeling technique. Chapter 10, Decision Tree Models, will cover the theory behind decision tree models and focus specifically on how to build a CHAID model. We will also use a Partition node to generate a test design; this is extremely important because only through replication can we determine whether we have a verifiable pattern.
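
The idea behind a test design is to set aside part of the data so that a pattern found in one part can be checked against the other. The following plain Python sketch shows a random split with a fixed seed, so that the same records land in the same partition on every run. The function name, the 70/30 proportion, and the seed value are assumptions made for illustration and are not Modeler defaults.

```python
import random


def partition(records, training_fraction=0.5, seed=1234567):
    """Randomly assign each record to a training or a testing partition."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    training, testing = [], []
    for record in records:
        if rng.random() < training_fraction:
            training.append(record)
        else:
            testing.append(record)
    return training, testing


# Hypothetical dataset of 1,000 cases.
records = [{"id": i} for i in range(1000)]

train_set, test_set = partition(records, training_fraction=0.7)
print("Training cases:", len(train_set))
print("Testing cases:", len(test_set))
```

A model would be built on `train_set` only; if its performance on `test_set` is similar, the pattern has been replicated on data the model has not seen.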

Chapter 11, Model Assessment and Scoring, is the final chapter in this book and it will provide readers with the opportunity to assess and compare models using the Analysis node. The Evaluation node will also be introduced as a way to evaluate model results.
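
Assessing a classification model usually starts by comparing the predicted category with the actual category for each case. As a hedged illustration in plain Python, the sketch below computes overall accuracy and a table of actual versus predicted counts (commonly called a confusion matrix). The actual and predicted values are invented, and the code is not meant to reproduce the output of any Modeler node.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count cases for each (actual, predicted) pair of categories."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Proportion of cases where the prediction matches the actual value."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical actual and predicted income categories for eight cases.
actual = [">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K", ">50K", "<=50K"]
predicted = [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K"]

matrix = confusion_matrix(actual, predicted)
for (actual_value, predicted_value), count in sorted(matrix.items()):
    print(f"actual {actual_value}, predicted {predicted_value}: {count}")
print("Accuracy:", accuracy(actual, predicted))
```

Computing the same figures separately for the training and testing partitions is one simple way to compare models and to see whether a model holds up on data it was not built on.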

Finally, we will spend some time discussing how to score new data and export those results to another application using the Flat File node.
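
Scoring means applying an existing model to new records, and exporting means writing the records and their predictions in a format that another application can read. The plain Python sketch below writes comma-separated text using the standard `csv` module. The `score` function is a hypothetical stand-in rule, not a real CHAID model, and the field names and records are made up.

```python
import csv
import io


def score(record):
    """Hypothetical stand-in for a trained model (not a real CHAID tree)."""
    if record["education"] == "College" and record["age"] >= 35:
        return ">50K"
    return "<=50K"


def export_scores(records, handle):
    """Write each record plus its predicted value as comma-separated text."""
    fieldnames = list(records[0].keys()) + ["predicted_income"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({**record, "predicted_income": score(record)})


# Hypothetical new data to be scored.
new_data = [
    {"id": 101, "age": 42, "education": "College"},
    {"id": 102, "age": 23, "education": "High school"},
]

buffer = io.StringIO()  # an in-memory stand-in for an output file
export_scores(new_data, buffer)
output = buffer.getvalue()
print(output)
```

Replacing the `io.StringIO` buffer with a file opened for writing would produce a flat file on disk.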

Summary


In this chapter, you were introduced to the notion of data mining and the CRISP-DM process model. You were also provided with an overview of the data mining process, along with previews of what to expect in the upcoming chapters.

In the next chapter, you will learn about the different components of the Modeler graphical user interface. You will also learn how to build streams. Finally, you will be introduced to various help options.


Key benefits

  • Get up and running with IBM SPSS Modeler without going into too much depth
  • Identify interesting relationships within your data and build effective data mining and predictive analytics solutions
  • A quick, easy-to-follow guide to give you a fundamental understanding of SPSS Modeler, written by the best in the business

Description

IBM SPSS Modeler allows users to quickly and efficiently use predictive analytics and gain insights from their data. With almost 25 years of history, Modeler is the most established and comprehensive data mining workbench available. Since it is popular in corporate settings, widely available in university settings, and highly compatible with all the latest technologies, it is the perfect way to start your data science and machine learning journey. This book takes a detailed, step-by-step approach to introducing data mining using the de facto standard process, CRISP-DM, and Modeler’s easy-to-learn “visual programming” style. You will learn how to read data into Modeler, assess data quality, prepare your data for modeling, find interesting patterns and relationships within your data, and export your predictions. Using a single case study throughout, this intentionally short and focused book sticks to the essentials. The authors have drawn upon their decades of teaching thousands of new users to choose those aspects of Modeler that you should learn first, so that you get off to a good start using proven best practices. This book provides an overview of various popular data modeling techniques and presents a detailed case study of how to use CHAID, a decision tree model. Assessing a model’s performance is as important as building it; this book will also show you how to do that. Finally, you will see how you can score new data and export your predictions. By the end of this book, you will have a firm understanding of the basics of data mining and how to effectively use Modeler to build predictive models.

What you will learn

  • Understand the basics of data mining and familiarize yourself with Modeler’s visual programming interface
  • Import data into Modeler and learn how to properly declare metadata
  • Obtain summary statistics and audit the quality of your data
  • Prepare data for modeling by selecting and sorting cases, identifying and removing duplicates, combining data files, and modifying and creating fields
  • Assess simple relationships using various statistical and graphing techniques
  • Get an overview of the different types of models available in Modeler
  • Build a decision tree model and assess its results
  • Score new data and export predictions

Table of Contents

19 Chapters
Title Page
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Dedication
Preface
Introduction to Data Mining and Predictive Analytics
The Basics of Using IBM SPSS Modeler
Importing Data into Modeler
Data Quality and Exploration
Cleaning and Selecting Data
Combining Data Files
Deriving New Fields
Looking for Relationships Between Fields
Introduction to Modeling Options in IBM SPSS Modeler
Decision Tree Models
Model Assessment and Scoring

