Descriptive statistics for numeric fields

The Descriptives procedure in SPSS Statistics provides you with an easy way to get a comprehensive picture of all the numeric fields in a dataset. As was noted in Chapter 2, Accessing and Organizing Data, the way in which a field is coded determines how it can be used in SPSS Statistics. Data fields coded with characters will not be available for use in the Descriptives dialog, as it produces summary statistics only. Text fields in your data will need to be examined using a different approach, which will be covered in the next section of this chapter.

To obtain a table with all the numeric fields from your data along with some basic information such as the count, mean, and standard deviation, select Descriptive Statistics under the Analyze menu and click on the second choice, Descriptives. Highlight the first field (Age in this dataset), scroll down to the last field listed on the left, VOTE OBAMA OR ROMNEY [PRES12], and use Shift-Click to select all fields.

Click on the arrow in the middle of the dialog to move the list to the box on the right, as shown in the following image, and then click on OK:

The descriptive statistics for the 28 fields in this dataset are displayed in the following screenshot. One of the first pieces of information to check is the N, which indicates how many of the rows contain a valid code for each field. For the 2016 General Social Survey data, the maximum value of N is 2,867, and it is evident that most of the fields are close to this number, with a few exceptions. Some questions in the survey are dependent on a person's marital status, such as Happiness of Marriage and the items related to spouse's education, so it makes sense that the N for these fields would be lower.

A check of the Marital Status field specifically (using the Frequencies procedure) can be used to confirm the number of married individuals in this dataset. The VOTE OBAMA OR ROMNEY field also has a smaller N value, but this question is only asked of individuals who voted in the 2012 election. Checking the DID R VOTE IN 2012 ELECTION field is a way to confirm that this N is correct.
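The cross-check described above can be sketched outside SPSS Statistics in a few lines of Python. This is only an illustration under stated assumptions: the rows, the field names (marital, hapmar, vote12, pres12), and the codes (1 = married, 1 = voted) are hypothetical and are not the actual GSS coding. The idea is simply that the valid N of a follow-up question should match the frequency of the qualifying category in its filter question.

```python
from collections import Counter

# Hypothetical toy rows; None stands for a missing value
# (question not asked or not answered). Not real GSS data.
rows = [
    {"marital": 1, "hapmar": 1, "vote12": 1, "pres12": 1},
    {"marital": 1, "hapmar": 2, "vote12": 2, "pres12": None},
    {"marital": 5, "hapmar": None, "vote12": 1, "pres12": 2},
    {"marital": 3, "hapmar": None, "vote12": 1, "pres12": 1},
    {"marital": 1, "hapmar": 1, "vote12": None, "pres12": None},
]


def valid_n(rows, field):
    """Count the rows that hold a non-missing value for the field."""
    return sum(1 for row in rows if row[field] is not None)


def frequencies(rows, field):
    """Tally each non-missing code, like a simple frequency table."""
    return Counter(row[field] for row in rows if row[field] is not None)


married = frequencies(rows, "marital")[1]  # assumed code: 1 = married
voted = frequencies(rows, "vote12")[1]     # assumed code: 1 = voted

# The follow-up questions should have as many valid rows as the
# qualifying category of their filter question.
print("hapmar valid N:", valid_n(rows, "hapmar"), "married:", married)
print("pres12 valid N:", valid_n(rows, "pres12"), "voted:", voted)
```

If the two numbers in a pair disagree, some respondents either skipped a question they were eligible for or answered one they should not have been asked, which is worth investigating before any further analysis.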

For some fields, such as age and years of school completed, the minimum, maximum, and mean values provide useful information as they can be interpreted directly. In this survey, only individuals in the 18 to 89 age range were included and the mean age of the group was 49.
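As a rough illustration of what each row of the table summarizes, the following Python sketch computes the same five figures for a single field. The ages are made-up values, not GSS data, and the sketch assumes that missing values are skipped and that the standard deviation uses the sample (n - 1) formula.

```python
import statistics

# Hypothetical ages; None marks a missing value. Not real GSS data.
ages = [18, 34, None, 49, 62, 89, None, 41]


def describe(values):
    """Return N, minimum, maximum, mean, and sample standard deviation
    for the non-missing values, mirroring the columns of the table."""
    valid = [v for v in values if v is not None]
    return {
        "N": len(valid),
        "Minimum": min(valid),
        "Maximum": max(valid),
        "Mean": statistics.mean(valid),
        "Std. Deviation": statistics.stdev(valid),  # n - 1 denominator
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For a field such as age, every figure here can be read directly: the minimum and maximum show the range of respondents included, and the mean gives the typical age.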

In general, however, the numeric values used for questions such as marital status or region are associated with categories relevant to the item so the minimum, maximum, and mean are not particularly useful except to provide a sense of the range of values in the data. At the bottom of the table, there is Valid N (listwise), which indicates how many of the 2,867 individuals surveyed had a valid value for each of the 28 questions in the table. This number can be very helpful, especially when selecting fields to use in multivariate analysis.

Here, it is useful to note that while the smallest N value for the 28 fields is 1,195, only 422 of those surveyed had a valid value on all the questions. This illustrates how missing information can dramatically reduce the number of rows available for use in analysis. Strategies to deal with missing data will be covered in a later chapter, but descriptive statistics are an important means of identifying the magnitude of the challenge before embarking on a more detailed investigation of the data.
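The gap between the smallest per-field N and the listwise N can be reproduced on a toy example. The Python sketch below uses hypothetical rows and field names (age, educ, hapmar), not the GSS data; it counts valid values field by field and then counts the rows that are complete on every field, which is what Valid N (listwise) reports.

```python
# Hypothetical toy rows; None marks a missing value. Not real GSS data.
rows = [
    {"age": 25, "educ": 12, "hapmar": None},
    {"age": 40, "educ": None, "hapmar": 1},
    {"age": 61, "educ": 16, "hapmar": 2},
    {"age": None, "educ": 14, "hapmar": None},
    {"age": 33, "educ": 18, "hapmar": 1},
]
fields = ["age", "educ", "hapmar"]

# Valid N for each field on its own.
per_field_n = {
    f: sum(1 for row in rows if row[f] is not None) for f in fields
}

# Valid N (listwise): rows with a valid value on every listed field.
listwise_n = sum(
    1 for row in rows if all(row[f] is not None for f in fields)
)

print("Per-field N:", per_field_n)
print("Smallest per-field N:", min(per_field_n.values()))
print("Valid N (listwise):", listwise_n)
```

Because the missing values fall on different rows for different fields, the listwise count can never exceed the smallest per-field N and is often well below it, which is the pattern seen in the survey data.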


Data Analysis with IBM SPSS Statistics
