You're reading from  Data Analysis with IBM SPSS Statistics

Product type: Book
Published in: Sep 2017
Publisher: Packt
ISBN-13: 9781787283817
Edition: 1st Edition
Authors (2):
Ken Stehlik-Barry

Kenneth Stehlik-Barry, PhD, joined SPSS as Manager of Training in 1980 after using SPSS for his own research for several years. Working with others at SPSS, including Anthony Babinec, he developed a series of courses related to the use of SPSS and taught these courses to numerous SPSS users. He also managed the technical support and statistics groups at SPSS. Along with Norman Nie, the founder of SPSS, and Jane Junn, a political scientist, he co-authored Education and Democratic Citizenship. Dr. Stehlik-Barry has used SPSS extensively to analyze data from SPSS and IBM customers to discover valuable patterns that can be used to address pertinent business issues. He received his PhD in Political Science from Northwestern University and currently teaches in the Masters of Science in Predictive Analytics program there.

Anthony Babinec

Anthony J. Babinec joined SPSS as a Statistician in 1978 after assisting Norman Nie, SPSS founder, in a research methods class at the University of Chicago. Anthony developed SPSS courses and trained many SPSS users. He also wrote many examples found in SPSS documentation and worked in technical support. Anthony led a business development effort to find products implementing then-emerging new technologies such as CHAID decision trees and neural networks and helped SPSS customers successfully apply them. Anthony uses SPSS in consulting engagements and teaches IBM customers how to use its advanced features. He received his BA and MA in Sociology with a specialization in Advanced Statistics from the University of Chicago and teaches classes at the Institute for Statistics Education. He is on the Board of Directors of the Chicago Chapter of the American Statistical Association, where he has served in different positions including President.

Statistics for Individual Data Elements

Prior to beginning analysis, it is essential to assess the data in terms of its quality and potential to yield insights. This is done initially by examining individual fields within the data and cross-checking key elements to determine the integrity of the data. This chapter will cover techniques that you can employ to establish the foundation for subsequent investigation of patterns. It will also help to introduce several of the most basic features of the SPSS Statistics software that you will make use of regularly. We will cover the following procedures in this chapter:

  • Descriptives
  • Frequencies
  • Explore

Getting the sample data

The examples in this section will use a subset of the General Social Survey from 2016 with only 28 fields out of the original 896. After downloading and opening the General Social Survey file for 2016 in the SPSS Statistics format, you can run the following code to create a file that will produce the same results shown in this chapter. Remember to change the directory reference on the SAVE OUTFILE line of the SPSS code to reflect the directory on your machine where you want to have this new file saved:

* create GSS2016small with 28 fields.
* change the directory reference below as needed.
SAVE OUTFILE='C:\GSS Data\GSS2016sm28.sav'
/keep = happy marital hapmar age
VOTE12 PRES12 educ speduc natpark natroad NATENRGY cappun natmass natchld natsci
partyid degree incom16 satfin size spdeg polviews
rincom16 res16 childs wrkstat sex region /COMPRESSED.
...

Descriptive statistics for numeric fields

The Descriptives procedure in SPSS Statistics provides you with an easy way to get a comprehensive picture of all the numeric fields in a dataset. As was noted in Chapter 2, Accessing and Organizing Data, the way in which a field is coded determines how it can be used in SPSS Statistics. Data fields coded with characters will not be available in the Descriptives dialog because it produces only numeric summary statistics. Text fields in your data will need to be examined using a different approach, which will be covered in the next section of this chapter.

To obtain a table with all the numeric fields from your data along with some basic information such as the count, mean, and standard deviation, select Descriptive Statistics under the Analyze menu and click on the second choice, Descriptives. Highlight the first field--which in this dataset...
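The same request can also be written in command syntax instead of being built in the dialog. The following is a minimal sketch, not the book's own code: it assumes the reduced GSS file is the active dataset, and the five fields named (age, educ, speduc, childs, size) are an illustrative selection from the KEEP list shown earlier rather than the full set of numeric fields.

```spss
* Sketch only: assumes the reduced GSS file is the active dataset.
* The five fields below are an illustrative choice, not the full list.
DESCRIPTIVES VARIABLES=age educ speduc childs size
  /STATISTICS=MEAN STDDEV MIN MAX.
```

The count (N) is reported for each field alongside the requested statistics, so fields with many missing responses stand out immediately.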

Discovering coding issues using frequencies

The frequency distribution for INCOME in the following screenshot demonstrates another reason why it is important to examine the pattern for individual data fields before diving into analytics more deeply. Navigate to Analyze | Descriptive Statistics | Frequencies, and select Respondents Income to build this table:

The values coded in the data are displayed in Figure 3 along with the associated value labels. This was done on the Edit | Options | Output screen by specifying Values and Labels in the dropdown at the lower left under Pivot Table Labeling.
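For readers who prefer syntax, the table and the labeling option can be sketched as follows. This is an illustration under stated assumptions: the field labeled Respondents Income is assumed to be rincom16 from the KEEP list, and the reduced GSS file is assumed to be the active dataset.

```spss
* Sketch only: assumes Respondents Income is stored as rincom16.
* Show both the coded values and their labels in pivot tables.
SET TNUMBERS=BOTH.
FREQUENCIES VARIABLES=rincom16
  /ORDER=ANALYSIS.
* Switch back to showing labels only, if preferred.
SET TNUMBERS=LABELS.
```

Setting the display through syntax keeps the choice visible in the job itself, which helps when the same frequency table needs to be reproduced on another machine.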

People are often reluctant to divulge their income, so surveys typically ask them to select an income category like the groupings in this table. Notice, however, that the groups (numbered 1 through 26) represent unequal bands of income. The groups coded 3 through 7 represent a range of...

Explore procedure

To thoroughly examine the distribution of scale or interval level fields, you can employ the Explore procedure in SPSS Statistics. The output provided by Explore is more detailed than that of Descriptives or Frequencies, and includes more information on extreme values that may influence statistical measures in an undesirable manner. Navigate to Analyze | Descriptive Statistics | Explore to open the dialog box in the following figure and put the HIGHEST YEAR OF SCHOOL COMPLETED field in the upper box labeled Dependent List. Select OK to request the default output that Explore generates:

The first section of results produced by Explore contains a set of descriptive statistics related to the distribution of the values. In addition to the mean, a 5% trimmed mean is calculated to show how removing the top and bottom 5% of the values influences the mean.
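The Explore dialog corresponds to the EXAMINE command. The sketch below is an approximation rather than the exact syntax the dialog pastes: it assumes HIGHEST YEAR OF SCHOOL COMPLETED is the field educ from the KEEP list, and it adds the EXTREME keyword, which is not part of the default request, to list the highest and lowest values explicitly.

```spss
* Sketch only: assumes HIGHEST YEAR OF SCHOOL COMPLETED is the field educ.
* DESCRIPTIVES gives the mean, 5% trimmed mean, and related statistics.
* EXTREME is an addition to the defaults and lists the extreme values.
EXAMINE VARIABLES=educ
  /PLOT BOXPLOT STEMLEAF
  /STATISTICS DESCRIPTIVES EXTREME
  /MISSING LISTWISE
  /NOTOTAL.
```

Comparing the mean and the trimmed mean in the resulting Descriptives table is a quick way to see whether a few extreme values are pulling the average.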

If the mean...

Summary

You will find the techniques covered in this chapter valuable not only initially when working with a new set of data, but throughout the analytic journey as patterns are investigated and further exploration of the results is undertaken.

Understanding the structure of the data in detail is critical before moving on to more sophisticated analytical methods, as those methods often condense the relationships found into a handful of summary statistics. The diagnostics accompanying these statistics provide a means of assessing how well they capture the patterns, but appreciating in advance where issues are likely to be present helps focus the examination of the results.

The next chapter will expand on the topic of outliers touched on here and address the issue of missing values. Both of these situations occur regularly when dealing with real data and there are several approaches that...
