- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 659字
- 2021-07-14 10:51:30
Reshaping data
Reshaping data is similar to creating a contingency table, which enables the user to aggregate data of specific values. The reshape2
package is designed for this specific purpose. Here, we introduce how to use the reshape2
package to transform our dataset from long to wide format with the dcast
function. We also cover how to transform it from wide format back to long format with the melt
function.
Getting ready
Refer to the Merging data recipe and merge employees
and salaries
into employees_salary
.
How to do it…
Perform the following steps to reshape data:
- First, we can use the
dcast
function to transform data from long to wide:> wide_salaries <- dcast(salaries, emp_no ~ year(ymd(from_date)), value.var="salary") > wide_salaries[1:3, 1:7] emp_no 1985 1986 1987 1988 1989 1990 1 10001 NA 60117 62102 66074 66596 66961 2 10002 NA NA NA NA NA NA 3 10003 NA NA NA NA NA NA
- We can also transform the data by keeping
emp_no
and the formatted name string as two of the attributes. Then, we can shape the year of salary payment as the column name and shape salary as its value:> wide_employees_salary <- dcast(employees_salary, emp_no + paste(first_name, last_name) ~ year(ymd(from_date)), value.var="salary", variable.name="condition") > wide_employees_salary[1:3,1:7] emp_no paste(first_name, last_name) 1985 1986 1987 1988 1989 1 10001 Georgi Facello NA 60117 62102 66074 66596 2 10002 Bezalel Simmel NA NA NA NA NA 3 10003 Parto Bamford NA NA NA NA NA
- On the other hand, we can transform the data from wide back to long format using the
melt
function:> long_salaries <- melt(wide_salaries, id.vars=c("emp_no")) > head(long_salaries) emp_no variable value 1 10001 1985 NA 2 10002 1985 NA 3 10003 1985 NA 4 10004 1985 NA 5 10005 1985 NA 6 10006 1985 NA
- To remove data with missing values in
long_salaries
, we can usena.omit
to remove this data:> head(na.omit(long_salaries)) emp_no variable value 9 10009 1985 60929 13 10013 1985 40000 48 10048 1985 40000 64 10064 1985 40000 70 10070 1985 55999 98 10098 1985 40000
How it works…
In this recipe, we demonstrated how to reshape data with the reshape2
package. First, we used the dcast
function to transform data from the long to wide format. Using this function, we specified that we wanted the data to transform as salaries
in the first parameter. Then, we specified the shaping formula by setting emp_no
as row and the year of salary paid as the column. Finally, we set salary
as presented value in wide format layout.
We can also shape the data with multiple columns; all that we need to do is add another column information on the left-hand side of the formula with a +
operator. Thus, in our second case, we set the shaping formula as emp_no + paste(first_name, last_name) ~ year(ymd(from_date))
on the employees_salary
merged data. Then, we saw that the output data was presented with emp_no
and the formatted name on the left-hand side, and the year of salary paid on the upper side and the presented salary as the value.
Besides shaping the data from long format to wide format, we can also reshape the data back to long format using the melt
function. Therefore, we can transform the wide_salaries
data back to long format using emp_no
as the basis. As there are many missing values (presented as NA
), we can use the na.omit
function to remove these records.
There's more…
In addition to the dcast
and melt
functions in the plyr
package, we can use stack
and unstack
to ungroup and group values:
- We can group data by value using the
unstack
function:> un_salaries <- unstack(long_salaries[,c(3,1)]) > head(un_salaries, 3) $`10001` [1] 60117 62102 66074 66596 66961 71046 74333 75286 75994 76884 80013 [12] 81025 81097 84917 85112 85097 88958 $`10002` [1] 65828 65909 67534 69366 71963 72527 $`10003` [1] 40006 43616 43466 43636 43478 43699 43311
- In contrast, we can concatenate multiple data frames or lists using the
stack
function:> stack_salaries <- stack(un_salaries ) > head(stack_salaries) values ind 1 60117 10001 2 62102 10001 3 66074 10001 4 66596 10001 5 66961 10001 6 71046 10001
- Data Visualization with D3 4.x Cookbook(Second Edition)
- Advanced Machine Learning with Python
- Oracle Exadata性能優化
- 軟件界面交互設計基礎
- Magento 2 Theme Design(Second Edition)
- PowerCLI Cookbook
- Python從入門到精通(精粹版)
- 游戲程序設計教程
- 低代碼平臺開發實踐:基于React
- Python Data Science Cookbook
- 鴻蒙OS應用編程實戰
- Mastering HTML5 Forms
- HTML5 Game Development by Example:Beginner's Guide(Second Edition)
- Java基礎案例教程(第2版)
- 機器人ROS開發實踐