
You're reading from Data Analysis with IBM SPSS Statistics

Product type: Book
Published in: Sep 2017
Publisher: Packt
ISBN-13: 9781787283817
Edition: 1st Edition
Authors (2):
Ken Stehlik-Barry

Kenneth Stehlik-Barry, PhD, joined SPSS as Manager of Training in 1980 after using SPSS for his own research for several years. Working with others at SPSS, including Anthony Babinec, he developed a series of courses related to the use of SPSS and taught these courses to numerous SPSS users. He also managed the technical support and statistics groups at SPSS. Along with Norman Nie, the founder of SPSS, and Jane Junn, a political scientist, he co-authored Education and Democratic Citizenship. Dr. Stehlik-Barry has used SPSS extensively to analyze data from SPSS and IBM customers to discover valuable patterns that can be used to address pertinent business issues. He received his PhD in Political Science from Northwestern University and currently teaches in the Master of Science in Predictive Analytics program there.

Anthony Babinec

Anthony J. Babinec joined SPSS as a Statistician in 1978 after assisting Norman Nie, SPSS founder, in a research methods class at the University of Chicago. Anthony developed SPSS courses and trained many SPSS users. He also wrote many examples found in SPSS documentation and worked in technical support. Anthony led a business development effort to find products implementing then-emerging technologies such as CHAID decision trees and neural networks, and helped SPSS customers successfully apply them. Anthony uses SPSS in consulting engagements and teaches IBM customers how to use its advanced features. He received his BA and MA in Sociology with a specialization in Advanced Statistics from the University of Chicago and teaches classes at the Institute for Statistics Education. He is on the Board of Directors of the Chicago Chapter of the American Statistical Association, where he has served in different positions including President.


Dealing with Missing Data and Outliers

The earlier chapters showed you how to read common file formats and define Variable Properties. In any project, as you pull together the data that helps you address your business question or research question, you must spend some time gaining an understanding of your data via a data audit. Simple procedures such as Frequencies, Descriptives, or Examine can give you a summary understanding of each variable via statistical and graphical means. In addition, the data audit should focus on unusual/extreme values and the nature and extent of missing data.

The topics covered in this chapter include the following:

Outliers:

  • Frequencies for a histogram and percentile values
  • Descriptives for standardized scores
  • The Examine procedure for extreme values and boxplot
  • Detecting multivariate outliers using the Regression procedure

Missing data:

  • Missing...

Outliers

An outlier is an observation that lies an unusual distance from other observations. There is a judgmental element in deciding what counts as unusual, and it helps to work with a subject-matter expert when deciding this. In exploratory data analysis, there are two activities that are linked:

  • Examining the overall shape of the graphed data for important features
  • Examining the data for unusual observations that are far from the mass or general trend of the data

Outliers are data points that deserve a closer look. The values could be real data values that were accurately recorded, or they could be misrecorded or otherwise flawed. You need to determine which is the case in your situation and decide what action to take.

In this section, we consider statistical and graphical ways of summarizing the distribution of a variable and detecting unusual/extreme values. IBM SPSS...
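As a rough illustration of two of the statistical checks named in this chapter (standardized scores and the boxplot rule), here is a minimal Python sketch. It is not SPSS syntax and not the authors' procedure. The `ages` data, the function names, the z cutoff of 3, and the 1.5 × IQR multiplier are all assumptions chosen for illustration, and quantile definitions vary between tools, so the fences need not match SPSS output exactly.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]

def iqr_fences(values, k=1.5):
    """Return (low, high) fences placed k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, z_cutoff=3.0, k=1.5):
    """Flag each value by both rules so the two can be compared side by side."""
    low, high = iqr_fences(values, k)
    return [
        {
            "value": v,
            "z": round(z, 2),
            "beyond_z": abs(z) > z_cutoff,
            "beyond_fence": v < low or v > high,
        }
        for v, z in zip(values, z_scores(values))
    ]

# Hypothetical data: fifteen plausible ages plus one suspicious entry.
ages = [21, 22, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 29, 30, 31, 95]

for row in flag_outliers(ages):
    if row["beyond_z"] or row["beyond_fence"]:
        print(row)
```

Both rules only nominate candidates for review. Whether a flagged value is an error or a legitimate extreme remains a judgment call. In small samples a single extreme value also inflates the standard deviation, which limits how large its own z-score can become, so the fence-based rule is often the more sensitive of the two.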

Missing data

Just as you ought to assess outliers and extreme values in the variables being analyzed, you should also assess the missing responses in those variables. For a given variable, what number or fraction of responses is missing? By what mechanism or mechanisms do missing values arise? Is the missingness in a variable related to values on another variable, or perhaps to values of that same variable? Fully addressing these questions in the context of your data can be hard work, and a full discussion is beyond the scope of this book. Here, we briefly address why missing data matters and show some analyses that you can do.
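The first and third questions above can be sketched in a few lines of Python. This is an illustration outside SPSS, under stated assumptions: the `records` data, the variable names, the use of `None` as the missing-value marker, and both helper functions are hypothetical.

```python
def missing_summary(rows, variables):
    """Count valid and missing responses, and the fraction missing, per variable."""
    n = len(rows)
    summary = {}
    for var in variables:
        n_missing = sum(1 for row in rows if row.get(var) is None)
        summary[var] = {
            "valid": n - n_missing,
            "missing": n_missing,
            "fraction_missing": n_missing / n,
        }
    return summary

def missing_rate_by_group(rows, target, group):
    """Fraction of cases missing on `target` within each value of `group`."""
    counts = {}  # group value -> (total cases, cases missing on target)
    for row in rows:
        key = row.get(group)
        total, missing = counts.get(key, (0, 0))
        counts[key] = (total + 1, missing + (row.get(target) is None))
    return {key: missing / total for key, (total, missing) in counts.items()}

# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 34, "income": 52000, "region": "north"},
    {"age": 41, "income": 61000, "region": "north"},
    {"age": 29, "income": None, "region": "north"},
    {"age": 55, "income": 73000, "region": "north"},
    {"age": 38, "income": None, "region": "south"},
    {"age": None, "income": None, "region": "south"},
    {"age": 47, "income": 58000, "region": "south"},
    {"age": 62, "income": None, "region": "south"},
]

print(missing_summary(records, ["age", "income", "region"]))
print(missing_rate_by_group(records, target="income", group="region"))
```

A marked difference in missing rates between groups, as in this made-up data, suggests that missingness on one variable is related to values on another. It does not by itself identify the mechanism behind the missing values.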

Why should you be concerned about missing data?

There are two reasons:

  • Statistical efficiency
  • Bias

Statistical efficiency has to do with the relationship between sample size and precision. If your data is a random sample from a population, then along...
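To make the link between sample size and precision concrete: for a simple random sample, the standard error of a mean is the standard deviation divided by the square root of the number of valid cases. The short Python sketch below uses a hypothetical standard deviation of 10 and hypothetical sample sizes, and shows how the standard error grows as cases are lost to missing data.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 10.0  # assumed standard deviation, for illustration only

# Suppose 400 cases were collected but fewer have valid values.
for n_valid in (400, 300, 200, 100):
    print(f"n = {n_valid:3d}  standard error = {standard_error(sd, n_valid):.3f}")
```

Because precision depends on the square root of the sample size, losing three quarters of the cases (400 down to 100) doubles the standard error rather than quadrupling it.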

Summary

In the early stages of working with a dataset, you gain data understanding by at least selectively performing outlier analysis and missing value analysis. IBM SPSS Statistics offers many useful facilities for outlier analysis. In this chapter, we looked at ways of generating histograms, percentiles, z-scores, and boxplots to gain an understanding of outliers. In addition, most procedures in IBM SPSS Statistics produce a simple summary table of valid and missing cases. We also saw how to look at missing value patterns and perform mean substitution. In the next chapter, we turn to visually exploring the data through charts.
