Outliers

An outlier is an observation that lies an unusual distance from other observations. Deciding what counts as unusual involves judgment, and it helps to work with a subject-matter expert when making that decision. Exploratory data analysis involves two linked activities:

  • Examining the overall shape of the graphed data for important features
  • Examining the data for unusual observations that are far from the mass or general trend of the data

Outliers are data points that deserve a closer look. The values could be real data values that were accurately recorded, or they could be misrecorded or otherwise flawed. You need to determine which is the case in your situation and decide what action to take.

In this section, we consider statistical and graphical ways of summarizing the distribution of a variable and detecting unusual or extreme values. IBM SPSS Statistics provides many tools for this, which are found in procedures such as Frequencies, Examine, and Chart Builder. To explore these facilities, we introduce data on used Toyota Corollas and, in particular, look at the distribution of the offer prices, in euros, of sales in the Netherlands in the year 2004.
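To make the idea of an "unusual distance" concrete, here is a minimal Python sketch of one common rule of thumb, the 1.5 × IQR boxplot fences. It is an illustration only, not the procedure this chapter follows in SPSS. The function names and the prices are made up for the example and are not taken from the Toyota Corolla data, and the quartile method used here is an assumption, so the cut-offs may differ slightly from those a statistics package reports.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile definitions give slightly different fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical offer prices in euros, for illustration only.
prices = [8950, 9500, 9900, 10500, 10950, 11250, 11950, 12500, 32500]

suspects = flag_outliers(prices)
print(tukey_fences(prices))
print(suspects)
```

A flagged value is only a candidate for a closer look. Whether it is a genuine price or a recording error still has to be judged against the data source and with subject-matter knowledge.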

The Toyota Corolla data featured in this chapter is described in Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner(R), Third Edition, by Galit Shmueli, Peter C. Bruce, and Nitin R. Patel. Copyright 2016 John Wiley and Sons.