
You're reading from IBM SPSS Modeler Essentials

Product type: Book
Published in: Dec 2017
Publisher: Packt
ISBN-13: 9781788291118
Edition: 1st Edition
Authors (2):
Jesus Salcedo

Jesus Salcedo has a PhD in psychometrics from Fordham University. He is an independent statistical consultant and has been using SPSS products for over 20 years. He is a former SPSS Curriculum Team Lead and Senior Education Specialist who has written numerous SPSS training courses and trained thousands of users.

Keith McCormick

Keith McCormick is a career-long practitioner of predictive analytics and data science. He has engaged in statistical modeling, data mining, and mentoring others in the area for more than 20 years. He has particular expertise in helping organizations perform their first predictive analytics project or build their first predictive analytics practice, and has done so in a variety of industries including healthcare, banking, telecommunications, non-profit, direct mail, pharmaceuticals, and retail. Keith is also an established author and speaker with four books in print or under contract. Although his consulting work is not restricted to any one tool, his writing and speaking have made him particularly well known in the IBM SPSS Statistics and IBM SPSS Modeler communities.

Chapter 4. Data Quality and Exploration

The previous chapter introduced the general data structure that is used in Modeler. You learned how to read and display data, and you were introduced to the concepts of measurement levels and field roles. Now that you know how to bring data into Modeler, the next step is to assess the quality of the data. In this chapter, you will:

  • Get an overview of the Data Audit node options
  • Go over the results of the Data Audit node
  • Be introduced to missing data
  • Discuss ways to address missing data

Once your data is in Modeler, you are ready to start exploring it and to become familiar with its characteristics. You should review the distribution of each field, not only to become familiar with the dataset but also to identify potential problems that may arise. For continuous fields, you will want to inspect the range of values. For categorical fields, you will want to look at the number of distinct values. You will also have to consider...
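Inside Modeler, the Data Audit node reports summaries of this kind for you. To make the idea concrete outside the tool, here is a minimal Python sketch (not Modeler's implementation or its scripting API) that profiles one field at a time: the range and mean for a continuous field, the number of distinct values for a categorical field, and counts of valid and missing values for both. It assumes a missing value is represented as `None`; the function name `audit_field` and the sample records are hypothetical.

```python
from statistics import mean


def audit_field(values, continuous):
    """Summarize one field, loosely in the spirit of a data audit.

    Assumption for this sketch: None marks a missing value.
    """
    valid = [v for v in values if v is not None]
    summary = {"valid": len(valid), "missing": len(values) - len(valid)}
    if not valid:
        # Nothing left to summarize once missing values are excluded.
        return summary
    if continuous:
        # For continuous fields, the range is the first thing to inspect.
        summary.update(min=min(valid), max=max(valid), mean=mean(valid))
    else:
        # For categorical fields, count the distinct values.
        summary["distinct"] = len(set(valid))
    return summary


# Hypothetical records: field name -> (values, is_continuous)
records = {
    "age": ([34, 51, None, 29, 420], True),
    "region": (["North", "South", "South", None, "East"], False),
}
for name, (values, continuous) in records.items():
    print(name, audit_field(values, continuous))
```

In this hypothetical data, a maximum age of 420 is exactly the kind of value that a quick look at a field's range exposes, even though each individual record looks unremarkable in a table.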

Data Audit node options


When data is first read into Modeler, it is important to check that it was read correctly. A Table node, which displays the records themselves, can give you a sense of the data and alert you to some potential issues. However, the Data Audit node is a better alternative to a Table node because it provides a more thorough look at the data.

Before modeling takes place, it is important to see how records are distributed within the fields in the dataset. This information can help you identify values that, on the surface, appear to be valid but, when compared with the rest of the data, are either out of range or inappropriate. Let's begin by opening a stream that has the modifications we made in the previous chapter:

  1. Open the Data Quality and Exploration stream.

Note

This simple stream contains the Demographic data file that has been linked to the Var. File source, along with the modifications we previously made in the Types tab. In order for this stream to function...
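Values that look valid on their own but are out of range compared with the rest of the data can be flagged with a simple statistical rule. The following Python sketch uses Tukey's interquartile-range fences, one common rule among several. The function name `flag_outliers`, the multiplier `k=1.5`, and the sample ages are assumptions for illustration; this does not describe how the Data Audit node itself computes outliers.

```python
from statistics import quantiles  # requires Python 3.8 or later


def flag_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is a conventional choice, assumed here for illustration.
    """
    q1, _, q3 = quantiles(values, n=4)  # first, second, third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ages: every entry is a plausible-looking number,
# but one is far out of line with the rest.
ages = [23, 35, 41, 29, 52, 38, 47, 31, 44, 410]
print(flag_outliers(ages))
```

A quartile-based rule is used here rather than a standard-deviation rule because a single extreme value inflates both the mean and the standard deviation, which can mask that same value in a small sample; quartiles are far less sensitive to it.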

Summary


This chapter focused on understanding your data. You learned about the different options available in the Data Audit node. You also learned how to look over the results of the Data Audit node to get a better feel for your data and to identify potential problems your data may have. Finally, you were introduced to the topic of missing data, and several ways to address this issue were discussed.

In the next chapter, we will begin to fix some of the problems that we found in the data. Specifically, we will use the Select node to choose the appropriate sample for our analysis, and the Reclassify node to modify fields so that their distributions are appropriate for modeling. We will also use other nodes to address additional concerns.
