Boxplot

A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. Few other charts pack this much information into such a compact representation. Consider the summary of the age variable for each job type. R's summary function reports the five-number summary plus the mean, and a boxplot shows the same five numbers graphically.

The first and third quartiles (the 25th and 75th percentiles) are shown by the lower and upper hinges, respectively. The upper whisker extends from the upper hinge to the largest value that lies no further than 1.5 * IQR from the hinge, where IQR is the inter-quartile range, that is, the distance between the first and third quartiles. The lower whisker extends in the same way from the lower hinge to the smallest value within 1.5 * IQR of it. All the points beyond the ends of the whiskers are called outliers. The following call computes the summary of age for each job type:

tapply(df_bank_detail$age, df_bank_detail$job, summary)

The output is as follows:

## $admin.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   20.00   32.00   38.00   39.29   46.00   75.00
##
## $`blue-collar`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   20.00   33.00   39.00   40.04   47.00   75.00
##
## $entrepreneur
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   21.00   35.00   41.00   42.19   49.00   84.00
##
## $housemaid
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   22.00   38.00   47.00   46.42   55.00   83.00
##
## $management
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   21.00   33.00   38.00   40.45   48.00   81.00
##
## $retired
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   24.00   56.00   59.00   61.63   67.00   95.00
##
## $`self-employed`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   22.00   33.00   39.00   40.48   48.00   76.00

In the following boxplot, we look at the distribution of age for each job type. Setting varwidth = TRUE in geom_boxplot makes the width of each box proportional to the square root of the number of observations in that job type, so the wider the box, the larger the number of observations:

ggplot(data = df_bank_detail, mapping = aes(x = job, y = age, fill = job)) +
  geom_boxplot(varwidth = TRUE) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.8, hjust = 0.8))

Figure 1.12: Boxplot of age and job
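
To see how varwidth relates box width to group size, it can help to try it on a tiny dataset where the counts are known in advance. The sketch below assumes a made-up data frame named toy with three job types of unequal size; the names toy, counts, and relative_width are illustrative only and do not appear in the bank dataset:

```r
library(ggplot2)

# Hypothetical toy data (not the bank dataset): 40, 20, and 5 observations
toy <- data.frame(
  job = rep(c("admin.", "management", "retired"), times = c(40, 20, 5)),
  age = c(seq(25, 64, length.out = 40),
          seq(30, 60, length.out = 20),
          c(58, 61, 63, 66, 70))
)

# Number of observations per job type
counts <- table(toy$job)

# With varwidth = TRUE, widths scale with sqrt(n); here relative to the largest group
relative_width <- sqrt(counts) / sqrt(max(counts))
print(round(relative_width, 2))

# Same layers as the plot above, applied to the toy data
p <- ggplot(data = toy, mapping = aes(x = job, y = age, fill = job)) +
  geom_boxplot(varwidth = TRUE)
print(p)
```

Because the scaling follows the square root of the count, a group with a quarter of the observations gets a box about half as wide, not a quarter as wide.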
