You're reading from IBM SPSS Modeler Essentials

Product type: Book

Published in: Dec 2017

Publisher: Packt

ISBN-13: 9781788291118

Edition: 1st Edition

Tools: IBM SPSS

Concepts: Predictive Analytics

Authors (2):

Jesus Salcedo

Keith McCormick

Chapter 4. Data Quality and Exploration

The previous chapter introduced the general data structure that is used in Modeler. You learned how to read and display data, and you were introduced to the concepts of measurement level and field roles. Now that you know how to bring data into Modeler, the next step is to assess the quality of the data. In this chapter, you will:

Get an overview of the Data Audit node options
Go over the results of the Data Audit node
Be introduced to missing data
Discuss ways to address missing data

Once your data is in Modeler, you are ready to start exploring it and becoming familiar with its characteristics. You should review the distribution of each field, both to get to know the dataset and to identify potential problems. For continuous fields, you will want to inspect the range of values. For categorical fields, you will want to take a look at the number of distinct values. You will also have to consider...
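Outside Modeler, the checks described above can be sketched in a few lines of Python. This is only a rough analogy and not how the Data Audit node works internally: the sample data, the field names (`age`, `income`, `gender`), and the `audit` helper are all hypothetical, invented here for illustration. The sketch reports the range of each continuous field and the number of distinct values of each categorical field, and it counts blank entries as a first hint of missing data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a small demographic data file.
SAMPLE = """id,age,income,gender
1,34,52000,F
2,41,,M
3,29,61000,F
4,230,48000,f
5,,39000,M
"""


def audit(rows, continuous, categorical):
    """Summarise fields: range for continuous, distinct values for categorical."""
    report = {}
    for field in continuous:
        # Skip blank entries so they do not break the numeric conversion.
        values = [float(r[field]) for r in rows if r[field].strip()]
        report[field] = {
            "valid": len(values),
            "blank": len(rows) - len(values),
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    for field in categorical:
        counts = Counter(r[field] for r in rows if r[field].strip())
        report[field] = {
            "valid": sum(counts.values()),
            "blank": len(rows) - sum(counts.values()),
            "distinct": len(counts),
            "counts": dict(counts),
        }
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = audit(rows, continuous=["age", "income"], categorical=["gender"])
for field, summary in report.items():
    print(field, summary)
```

In this made-up sample, the maximum of `age` (230) is a value that looks valid in isolation but is out of range next to the others, and `gender` shows three distinct values because `F` and `f` are coded inconsistently. Both are the kind of problem that reviewing ranges and distinct values is meant to surface.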

Data Audit node options

When data is first read into Modeler, it is important to check that it was read correctly. Typically, a Table node can give you a sense of the data and alert you to some potential issues. However, the Data Audit node is a better alternative to the Table node, as it provides a more thorough look at the data.

Before modeling takes place, it is important to see how records are distributed within the fields in the dataset. This information can help you identify values that appear to be valid on the surface but, when compared to the rest of the data, are either out of range or inappropriate. Let's begin by opening a stream that has the modifications we made in the previous chapter:

Open the Data Quality and Exploration stream.

Note

This simple stream contains the Demographic data file that has been linked to the Var. File source, along with the modifications we previously made in the Types tab. In order for this stream to function...

Summary

This chapter focused on understanding your data. You learned about the different options available in the Data Audit node. You also learned how to look over the results of the Data Audit node to get a better feel for your data and to identify potential problems your data may have. Finally, you were introduced to the topic of missing data, and several ways to address this issue were discussed.

In the next chapter, we will begin to fix some of the problems that we found in the data. Specifically, we will use the Select node to choose the appropriate sample for our analysis and the Reclassify node to modify fields so that their distributions are appropriate for modeling. We will also use other nodes to rectify the remaining concerns.

Authors (2)

Jesus Salcedo

Jesus Salcedo has a PhD in psychometrics from Fordham University. He is an independent statistical consultant and has been using SPSS products for over 20 years. He is a former SPSS Curriculum Team Lead and Senior Education Specialist who has written numerous SPSS training courses and trained thousands of users.

Keith McCormick

Keith McCormick is a career-long practitioner of predictive analytics and data science. He has engaged in statistical modeling, data mining, and mentoring others in the area for more than 20 years. He has particular expertise in helping organizations perform their first predictive analytics project or build their first predictive analytics practice, and has done so in a variety of industries including healthcare, banking, telecommunications, non-profit, direct mail, pharmaceuticals, and retail. Keith is also an established author and speaker with four books in print or under contract. Although his consulting work is not restricted to any one tool, his writing and speaking have made him particularly well known in the IBM SPSS Statistics and IBM SPSS Modeler communities.