Accessing and organizing data overview

Once you have read the data into IBM SPSS Statistics, you should at least do a cursory check of it. Do you see numeric data? String data? Is the data in the expected scale and range? Is the data complete?

Of course, even if your data is not very large in either the number of rows or the number of columns, it can be difficult to assess by simple visual inspection. For this reason, you might use SPSS Statistics to produce a tabular summary of variables showing counts and percentages. Doing so produces tables showing all the data codes in the designated variables. Once you have defined SPSS Variable Properties such as value labels, you can control the tabular display to show data values (the data codes), value labels, or both.
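To make the idea of a tabular summary concrete, here is a minimal sketch in Python rather than SPSS syntax. It counts each data code in one variable and reports its share of all cases. The function name frequency_table and the sample codes are assumptions made up for illustration; they are not part of SPSS Statistics.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by data code."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, 100.0 * n / total) for value, n in sorted(counts.items())]


# Hypothetical data codes for a single categorical variable.
sex_codes = [1, 2, 2, 1, 2, 3, 2, 1]

print("value count percent")
for value, count, percent in frequency_table(sex_codes):
    print(f"{value:>5} {count:>5} {percent:>7.1f}")
```

A table like this lists every code that actually occurs, so an unexpected code (such as the 3 here) stands out immediately.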

A further consideration is how the data values are represented for categorical variables. Let's consider Respondent's Sex as an example.

Your categorical values in an Excel spreadsheet could be string values such as male or female. If so, then IBM SPSS Statistics can read these values.

However, it is common practice in the survey research community to use numeric codes to represent categories. In general, use sequential numbers starting from 1 to enumerate the categories. In this example, the data codes would be 1 and 2, although the assignment of codes to male and female is arbitrary. Say that males are represented by code 1 and females by code 2.

A drawback of using numeric codes is that tabular summaries such as a summary table of counts will list the number of 1s and 2s, but the reader would not know that 1 represents male and 2 represents female. The way to handle this situation is to use value labels, one of a number of Variable Properties you can define after successfully reading the data.
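The role of value labels can be sketched outside SPSS as a simple lookup from data code to label. The Python below is an illustration only: the names VALUE_LABELS and labeled_counts, the show parameter, and the sample data are all hypothetical, and the code does not reproduce how SPSS Statistics stores or displays labels.

```python
from collections import Counter

# Hypothetical value labels for Respondent's Sex: data code -> label.
VALUE_LABELS = {1: "male", 2: "female"}


def labeled_counts(codes, labels, show="both"):
    """Count each code and display it as a value, a label, or both."""
    rows = []
    for code, n in sorted(Counter(codes).items()):
        label = labels.get(code, str(code))  # fall back to the raw code
        if show == "values":
            key = str(code)
        elif show == "labels":
            key = label
        else:
            key = f"{code} {label}"
        rows.append((key, n))
    return rows


sex_codes = [1, 2, 2, 1, 2]
for key, n in labeled_counts(sex_codes, VALUE_LABELS, show="both"):
    print(f"{key:<10} {n}")
```

Keeping the labels separate from the stored codes is the design point: the data stays compact and numeric, while the table shown to the reader can switch between codes and labels.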

Another consideration is: what if Respondent's Sex is not known for a specific individual? If the variable is a string variable, you could represent an unknown value of Respondent's Sex as a string value such as 'unknown', or you might represent the absence of information with a string of blanks such as ' '.

If Respondent's Sex is a numeric field, an unknown value could be represented by a distinct number code such as 3, assuming that males and females would be represented by 1 and 2, respectively. In either situation, you would like your summary tables and statistics to take into account the absence of information indicated in the values 'unknown' or 3. The way to handle this situation is to use the missing values command. There is more on this next.

Value labels and missing values are two examples of variable properties, which are properties internal to IBM SPSS Statistics that are associated with each variable in the data. You can save these properties along with the data. Once added, these properties inform the analysis and display of data in IBM SPSS Statistics. For example, for a variable indicating Sex of Respondent, value labels could provide the gender labels 'male' and 'female', clarifying which data code represents which gender. Or, by defining certain data codes as missing values, you would ensure that SPSS Statistics excludes those cases from the calculation of valid percents, for example.
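The effect of missing values on valid percents can be shown with a little arithmetic. The Python sketch below is an illustration under stated assumptions, not SPSS syntax: code 3 is assumed to mean "unknown", and the function name percent_and_valid_percent and the sample data are hypothetical. The percent uses all cases as the base, while the valid percent uses only the cases whose code is not declared missing.

```python
from collections import Counter


def percent_and_valid_percent(codes, missing_codes):
    """Return (code, count, percent, valid_percent) rows.

    valid_percent is None for codes declared as missing.
    """
    counts = Counter(codes)
    total = sum(counts.values())
    valid_total = sum(n for code, n in counts.items() if code not in missing_codes)
    rows = []
    for code, n in sorted(counts.items()):
        percent = 100.0 * n / total
        valid = None if code in missing_codes else 100.0 * n / valid_total
        rows.append((code, n, percent, valid))
    return rows


# Hypothetical data: 3 males (1), 5 females (2), 2 unknown (3).
sex_codes = [1] * 3 + [2] * 5 + [3] * 2

for row in percent_and_valid_percent(sex_codes, missing_codes={3}):
    print(row)
```

With these ten cases, males make up 30 percent of all cases but 37.5 percent of the eight valid cases, which is the distinction the valid percent is meant to capture.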

Menus versus syntax
The examples in this chapter start from the menus but suggest using the Paste button to paste the constructed syntax into the Syntax window. In the Syntax window, you can run the just-pasted syntax. We discuss elements of the syntax, but encourage you to use the Help button to learn more about individual commands.

Data Analysis with IBM SPSS Statistics