IBM SPSS Modeler Cookbook

If you’ve already had some experience with IBM SPSS Modeler, this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

Keith McCormick et al.

Book Details

ISBN-13: 9781849685467
Paperback: 382 pages

Book Description

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork.

IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art.

Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace.

Go beyond the basics and get the full power of your data mining workbench with this practical guide.

Table of Contents

Chapter 1: Data Understanding
Introduction
Using an empty aggregate to evaluate sample size
Evaluating the need to sample from the initial data
Using CHAID stumps when interviewing an SME
Using a single cluster K-means as an alternative to anomaly detection
Using an @NULL multiple Derive to explore missing data
Creating an Outlier report to give to SMEs
Detecting potential model instability early using the Partition node and Feature Selection node
Chapter 2: Data Preparation – Select
Introduction
Using the Feature Selection node creatively to remove or decapitate perfect predictors
Running a Statistics node on anti-join to evaluate the potential missing data
Evaluating the use of sampling for speed
Removing redundant variables using correlation matrices
Selecting variables using the CHAID Modeling node
Selecting variables using the Means node
Selecting variables using single-antecedent Association Rules
Chapter 3: Data Preparation – Clean
Introduction
Binning scale variables to address missing data
Using a full data model/partial data model approach to address missing data
Imputing in-stream mean or median
Imputing missing values randomly from uniform or normal distributions
Using random imputation to match a variable's distribution
Searching for similar records using a Neural Network for inexact matching
Using neuro-fuzzy searching to find similar names
Producing longer Soundex codes
Chapter 4: Data Preparation – Construct
Introduction
Building transformations with multiple Derive nodes
Calculating and comparing conversion rates
Grouping categorical values
Transforming high skew and kurtosis variables with a multiple Derive node
Creating flag variables for aggregation
Using Association Rules for interaction detection/feature creation
Creating time-aligned cohorts
Chapter 5: Data Preparation – Integrate and Format
Introduction
Speeding up merge with caching and optimization settings
Merging a lookup table
Shuffle-down (nonstandard aggregation)
Cartesian product merge using key-less merge by key
Multiplying out using Cartesian product merge, user source, and derive dummy
Changing large numbers of variable names without scripting
Parsing nonstandard dates
Parsing and performing a conversion on a complex stream
Sequence processing
Chapter 6: Selecting and Building a Model
Introduction
Evaluating balancing with Auto Classifier
Building models with and without outliers
Using Neural Network for Feature Selection
Creating a bootstrap sample
Creating bagged logistic regression models
Using KNN to match similar cases
Using Auto Classifier to tune models
Next-Best-Offer for large datasets
Chapter 7: Modeling – Assessment, Evaluation, Deployment, and Monitoring
Introduction
How (and why) to validate as well as test
Using classification trees to explore the predictions of a Neural Network
Correcting a confusion matrix for an imbalanced target variable by incorporating priors
Using aggregate to write cluster centers to Excel for conditional formatting
Creating a classification tree financial summary using aggregate and an Excel Export node
Reformatting data for reporting with a Transpose node
Changing formatting of fields in a Table node
Combining generated filters
Chapter 8: CLEM Scripting
Introduction
Building iterative Neural Network forecasts
Quantifying variable importance with Monte Carlo simulation
Implementing champion/challenger model management
Detecting outliers with the jackknife method
Optimizing K-means cluster solutions
Automating time series forecasts
Automating HTML reports and graphs
Rolling your own modeling algorithm – Weibull analysis

What You Will Learn

  • Use and understand the industry-standard CRISP-DM process for data mining.
  • Assemble data simply, quickly, and correctly using the full power of extraction, transformation, and loading (ETL) tools.
  • Control the amount of time you spend organizing and formatting your data.
  • Develop predictive models that stand up to the demands of real-life applications.
  • Take your modeling to the next level beyond default settings and learn the tips that the experts use.
  • Learn why the best model is not always the most accurate one.
  • Master deployment techniques that put your discoveries to work, making the most of your business's most critical resources.
  • Challenge yourself with scripting for ultimate control and automation; it’s easier than you think!

Recommended for You

Practical Data Science Cookbook
$29.99
$21.00
Practical Data Analysis
$29.99
$21.00
Building Machine Learning Systems with Python
$29.99
$6.00
Big Data Analytics with R and Hadoop
$29.99
$21.00
Python Machine Learning
$35.99
$25.20
Mastering Predictive Analytics with R
$39.99
$28.00