Outlier detection

Outliers must be taken into consideration in any analysis because they can bias the results. There are various ways to detect outliers in R; the most common ones are discussed in this section.

Boxplot

Let us construct a boxplot of the Volume variable in Sampledata by executing the following code:

> boxplot(Sampledata$Volume, main="Volume", boxwex=0.1) 

The graph is as follows:

Figure 2.16: Boxplot for outlier detection

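An outlier is an observation that lies far from the rest of the data. In the preceding boxplot, the outliers are clearly visible as the individual points plotted beyond the whiskers. By default, boxplot() extends each whisker to the most extreme data point that is no more than 1.5 times the interquartile range from the box; anything farther out is drawn as a separate point.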
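To list those points instead of reading them off the plot, one option is boxplot.stats(), the helper that boxplot() uses to compute its statistics. The sketch below is a minimal illustration on hypothetical toy values standing in for Sampledata$Volume (the real data is not reproduced here), and it also recomputes the fences by hand. Note that boxplot.stats() takes its hinges from fivenum(), so they can differ slightly from the quartiles returned by quantile().

```r
# Hypothetical toy values standing in for Sampledata$Volume
volume <- c(120, 135, 128, 140, 132, 125, 138, 130, 500, 20)

# boxplot.stats() applies the rule boxplot() uses to draw the whiskers
# (coef = 1.5 by default); points beyond the whiskers are returned in $out
stats <- boxplot.stats(volume)

# The same fences computed by hand from the lower and upper hinges
hinges <- stats$stats[c(2, 4)]
iqr    <- diff(hinges)
fences <- c(hinges[1] - 1.5 * iqr, hinges[2] + 1.5 * iqr)

print(fences)                          # 105.5 157.5
print(stats$out)                       # 500  20
print(which(volume %in% stats$out))    # positions 9 and 10
```

On the real data, replacing volume with Sampledata$Volume should return the same points that the boxplot draws beyond its whiskers.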

LOF algorithm

The local outlier factor (LOF) algorithm identifies density-based local outliers. LOF compares the local density of a point with the local densities of its k nearest neighbors. A point lying in a substantially sparser region than its neighbors receives an LOF score well above 1 and is treated as an outlier. Let us take columns 2 to 4 of Sampledata and execute the following code:

> library(DMwR)                                # provides lofactor()
> Sampledata1 <- Sampledata[, 2:4]             # use columns 2 to 4
> outlier.scores <- lofactor(Sampledata1, k=4) # one LOF score per row
> plot(density(outlier.scores))                # density of the scores

Here, k is the number of neighbors used in the calculation of the local outlier factors.

The graph is as follows:

Figure 2.17: Plot showing outliers by LOF method

To list the five most outlying observations, execute the following code:

> order(outlier.scores, decreasing=TRUE)[1:5] 

The output gives their row numbers:

[1] 50 34 40 33 22 
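To see what an LOF score measures, the following is a minimal base-R sketch written from the standard LOF definition; it is not DMwR's lofactor() source. It assumes Euclidean distance and uses exactly k neighbors per point (ties at the k-th distance are ignored), so its scores may differ slightly from lofactor() on data with tied distances. The function name lof_scores() and the toy data are hypothetical.

```r
# Minimal LOF sketch (hypothetical helper, not DMwR's implementation):
# Euclidean distance, exactly k neighbors per point, ties ignored.
lof_scores <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k < n)

  d <- as.matrix(dist(x))   # pairwise Euclidean distances
  diag(d) <- Inf            # a point is not its own neighbor

  # Row i holds the indices of the k nearest neighbors of point i
  nn <- matrix(0L, nrow = n, ncol = k)
  for (i in 1:n) nn[i, ] <- order(d[i, ])[1:k]

  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- d[cbind(1:n, nn[, k])]

  # Local reachability density: inverse of the mean reachability distance,
  # where reach-dist(p, o) = max(k-distance(o), d(p, o))
  lrd <- sapply(1:n, function(i) {
    reach <- pmax(kdist[nn[i, ]], d[i, nn[i, ]])
    1 / mean(reach)
  })

  # LOF: mean ratio of the neighbors' densities to the point's own density
  sapply(1:n, function(i) mean(lrd[nn[i, ]]) / lrd[i])
}

# Hypothetical toy data: 20 clustered points plus one isolated point (row 21)
set.seed(42)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
             c(5, 5))

scores <- lof_scores(pts, k = 4)
order(scores, decreasing = TRUE)[1:5]   # row 21 should come first
```

Points inside the cluster get scores close to 1 because their density matches that of their neighbors, while the isolated point's reachability distances are much larger than its neighbors', which pushes its score far above 1 and puts it at the top of the ranking.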