
Using tibble and dplyr for data manipulation

A tibble is a relatively recent addition to R. It is essentially a more user-friendly version of a data frame. For example, when you print a data.frame in R, it will attempt to print every row until it reaches the max.print limit, at which point you'll get a message like the following:

[ reached getOption("max.print") -- omitted 99000 rows ]

A tibble, on the other hand, shows only the first 10 rows by default and only as many columns as fit the width of your console.

To use tibbles and related functionality, install and load the tidyverse package as follows:

install.packages("tidyverse") 
library("tidyverse") 

The output of library("tidyverse")  is as follows:

Let us create a tibble from the state data that we have used thus far:

tstate <- as_tibble(state.x77) 
tstate$Region <- state.region 
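One detail worth knowing about the conversion above: by default, as_tibble does not carry row names over, so the state names stored as row names of state.x77 are not kept in tstate. The following is a minimal sketch of one way to keep them, using the rownames argument of as_tibble. The object name tstate_named and the column name State are chosen here for illustration only and do not appear in the original text.

```r
library(tibble)

# Convert the matrix to a data frame first, then to a tibble,
# moving the row names (state names) into a regular column called "State".
# "tstate_named" and "State" are illustrative names, not from the original text.
tstate_named <- as_tibble(as.data.frame(state.x77), rownames = "State")
tstate_named$Region <- state.region

# A tibble is still a data frame, with extra classes that control printing
class(tstate_named)

# Print only the first 5 rows; columns that do not fit the console are listed below the table
print(tstate_named, n = 5)
```

The rest of this section continues to use tstate as created earlier; the extra State column is only a convenience if you later need to look up individual states.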

Before getting into the details of dplyr, it helps to become familiar with a commonly used notation in R called the pipe, written as %>%. The operator comes from the magrittr package and is available once the tidyverse is loaded.

Pipes allow the developer to pass the output of one function as the input to the next, chaining several steps in sequence. For instance, suppose we want to find the Region with the highest average income in our state dataset.

One way to find the region with the maximum income would be to aggregate by Region and then find the Region corresponding to the highest value, as follows:

# Drop the Region column (column 9) and average the remaining columns by region
step1 <- aggregate(tstate[, -9], by = list(Region = tstate$Region), FUN = mean, na.rm = TRUE)
step1

The output is as follows:

step2 <- step1[step1$Income==max(step1$Income),] 
step2 

This can, however, be greatly simplified using the %>% pipe operator, as follows:

tstate %>% group_by(Region) %>% summarise(Income = mean(Income)) %>% filter(Income == max(Income)) 
 
# # A tibble: 1 x 2 
# Region   Income 
# <fctr>    <dbl> 
#   1   West 4702.615 
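To make the mechanics of the pipe concrete, the sketch below writes the same computation twice: once with %>% and once as ordinary nested function calls. Each %>% simply passes the value on its left as the first argument of the function on its right. The object names piped and nested are illustrative only.

```r
library(dplyr)

# Rebuild the tibble used in this section so the example is self-contained
tstate <- as_tibble(state.x77)
tstate$Region <- state.region

# Piped form: read top to bottom, one step per line
piped <- tstate %>%
  group_by(Region) %>%
  summarise(Income = mean(Income)) %>%
  filter(Income == max(Income))

# Nested form: the same calls, read from the inside out
nested <- filter(
  summarise(group_by(tstate, Region), Income = mean(Income)),
  Income == max(Income)
)

# Both forms should give the same one-row result
all.equal(piped, nested)
```

The nested form has to be read from the innermost call outwards, which is the main reason the piped form is usually preferred for multi-step transformations.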

It is also possible to summarize all of the columns at once using summarise_all and then find the row corresponding to the maximum income, as in the prior example. The function is passed directly to summarise_all, since the funs() helper has been deprecated since dplyr 0.8.0:

tstate %>% group_by(Region) %>% summarise_all(mean) %>% filter(Income == max(Income))

The output is as follows:
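As a side note for readers on dplyr 1.0.0 or later, where summarise_all is marked as superseded, the same summary can be sketched with across(). This is offered as an equivalent under that version assumption, not as the method used in the original text; the object names with_all and with_across are illustrative only.

```r
library(dplyr)

# Rebuild the tibble used in this section so the example is self-contained
tstate <- as_tibble(state.x77)
tstate$Region <- state.region

# Original style: apply mean to every non-grouping column
with_all <- tstate %>%
  group_by(Region) %>%
  summarise_all(mean) %>%
  filter(Income == max(Income))

# across() style (assumes dplyr >= 1.0.0): everything() selects all
# columns except the grouping column Region
with_across <- tstate %>%
  group_by(Region) %>%
  summarise(across(everything(), mean)) %>%
  filter(Income == max(Income))

# Compare the two results
all.equal(with_all, with_across)
```

Both pipelines return a single row for the region with the highest mean income, with one column per averaged variable.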
