- Data Analysis with IBM SPSS Statistics
- Kenneth Stehlik-Barry, Anthony J. Babinec
- 2021-07-02 18:13:51
Using explore to check subgroup patterns
While explore is useful for looking at the distribution of individual fields, it is particularly helpful for investigating patterns across subsets of the data. We'll look at an example of this approach next. Go back to the Explore dialog box; the HIGHEST YEAR OF SCHOOL COMPLETED field should still be in the upper Dependent List box (if not, add it). In the lower Factor List box, add REGION OF INTERVIEW and click on OK.
The descriptives produced by explore now contain a separate set of results for each of the nine regions used to group the states for the purposes of the survey. Values for New England (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont) are shown first (see Figure 12) as this region is coded with the value 1 in the data.
This area of the US is relatively well educated, as can be seen from the mean (14.29) and median (14) values in the table:

By comparison, the West South Central region (Arkansas, Louisiana, Oklahoma, and Texas), which is coded 7 in the data, has a lower mean (12.91) and median (12) years of schooling:

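The per-region descriptives that explore produces can be approximated outside SPSS with a short script. The following is a minimal sketch in Python, not the SPSS procedure itself: the `records` list is hypothetical toy data rather than the survey file, and the function name `describe_by_group` is made up for illustration. It groups the education values by region code and reports the count, mean, and median for each group.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (region code, years of schooling) pairs; NOT the survey data.
records = [
    (1, 16), (1, 14), (1, 12), (1, 16),
    (7, 12), (7, 12), (7, 10), (7, 16),
]


def describe_by_group(rows):
    """Return {group: {"n", "mean", "median"}} for (group, value) rows."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    # Sort by group code so the lowest-coded region is listed first.
    return {
        g: {"n": len(v), "mean": mean(v), "median": median(v)}
        for g, v in sorted(groups.items())
    }


summary = describe_by_group(records)
for region, stats in summary.items():
    print(region, stats)
```

Sorting by the group code mirrors the way the explore output lists the region coded 1 before the others.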
The stem and leaf plot for the New England region (see the figure below) indicates that there are only two extreme values and a large proportion of individuals with 14 and 16 years of education:

The corresponding plot for the West South Central region, shown in the following figure, has 19 extreme values at the lower end (8 or fewer years of schooling) and another 19 at the higher end (18 or more years). It is also evident that in this area of the US, people very often finish their education after 12 years, when they complete high school:

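To see how values end up being labeled as extremes, it helps to work through the usual boxplot rule by hand. The sketch below assumes the common convention of flagging values that lie more than 1.5 box lengths beyond the hinges; the exact hinge calculation and cutoffs used by SPSS may differ, so treat this as an illustration only. The `years` list is hypothetical, and the function names are invented for this example.

```python
from statistics import median


def tukey_hinges(values):
    """Return (lower hinge, upper hinge): medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2  # the middle value joins both halves when n is odd
    return median(s[:half]), median(s[n - half:])


def flag_outside_fences(values, k=1.5):
    """Split out values lying more than k box lengths beyond the hinges."""
    lower_hinge, upper_hinge = tukey_hinges(values)
    box_length = upper_hinge - lower_hinge
    low_fence = lower_hinge - k * box_length
    high_fence = upper_hinge + k * box_length
    low = [v for v in values if v < low_fence]
    high = [v for v in values if v > high_fence]
    return low, high


# Hypothetical years of schooling for one region; NOT the survey data.
years = [6, 8, 12, 12, 12, 12, 12, 13, 14, 14, 16, 20]
low_extremes, high_extremes = flag_outside_fences(years)
print("low:", low_extremes, "high:", high_extremes)
```

When most values cluster tightly around 12 to 14 years, the box is short, so the fences sit close to the median and values at both ends are flagged. This is the same effect described for the West South Central region.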
The boxplot (following figure) included in the explore output provides an excellent visual depiction of the pattern across the groups and highlights potential areas to address in terms of the distribution of education. At a glance, one can see that five of the regions (New England, Middle Atlantic, South Atlantic, Mountain, and Pacific) have a similar pattern in terms of the median (14), size of the box, and small number of extreme values. By contrast, the West North Central and West South Central regions have a lower median value (12), a smaller box indicating a concentration of values just above the median, and several extreme values at both the top and bottom. These patterns are important because many analyses that compare groups assume that the variance is similar in each group, and the results can be unreliable when that assumption does not hold. The boxplot is a convenient means of comparing the variability of the subgroups in the data visually on a single page:

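The visual comparison of box sizes can be complemented with a quick numeric check of the spread in each group. The following sketch computes the sample variance per group and the ratio of the largest to the smallest variance. The group names and values are hypothetical, the helper names are invented for illustration, and the ratio is only a rough screening device, not a formal test of equal variances such as Levene's test.

```python
from statistics import variance

# Hypothetical years of schooling for two groups; NOT the survey data.
groups = {
    "Region A": [10, 12, 14, 16, 18],
    "Region B": [12, 12, 12, 13, 16],
}


def variance_by_group(data):
    """Return the sample variance (n - 1 denominator) for each group."""
    return {name: variance(values) for name, values in data.items()}


def max_min_variance_ratio(variances):
    """Ratio of the largest group variance to the smallest."""
    return max(variances.values()) / min(variances.values())


variances = variance_by_group(groups)
ratio = max_min_variance_ratio(variances)
print(variances)
print("largest / smallest variance:", round(ratio, 2))
```

A ratio close to 1 suggests similar spread across the groups, while a large ratio is a hint to look more closely before relying on methods that assume consistent variance.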