- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:30
Sorting data
Sorting lets us view data in a defined arrangement, which makes the data easier to analyze. In a database, we can use an ORDER BY clause to sort data by the specified columns. In R, we can use the order and sort functions to arrange data.
Getting ready
Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to sort the salaries dataset:
- First, we can use the sort function to sort data:
> a <- c(5,1,4,3,2,6,3)
> sort(a)
[1] 1 2 3 3 4 5 6
> sort(a, decreasing=TRUE)
[1] 6 5 4 3 3 2 1
- Next, we can determine how the order function works on the same input vector:
> order(a)
[1] 2 5 4 7 3 1 6
> order(a, decreasing = TRUE)
[1] 6 1 3 4 7 5 2
- To sort a data frame by a specific column, we first obtain the ordered index and then employ the index to retrieve the sorted dataset:
> sorted_salaries <- salaries[order(salaries$salary, decreasing = TRUE),]
> head(sorted_salaries)
    emp_no salary  from_date    to_date
684  10068 113229 2001-08-03 9999-01-01
683  10068 112470 2000-08-03 2001-08-03
682  10068 111623 1999-08-04 2000-08-03
681  10068 108345 1998-08-04 1999-08-04
680  10068 106204 1997-08-04 1998-08-04
679  10068 105533 1996-08-04 1997-08-04
- Besides sorting data by a single column, we can sort data by multiple columns:
> sorted_salaries2 <- salaries[order(salaries$salary, salaries$from_date, decreasing = TRUE),]
> head(sorted_salaries2)
    emp_no salary  from_date    to_date
684  10068 113229 2001-08-03 9999-01-01
683  10068 112470 2000-08-03 2001-08-03
682  10068 111623 1999-08-04 2000-08-03
681  10068 108345 1998-08-04 1999-08-04
680  10068 106204 1997-08-04 1998-08-04
679  10068 105533 1996-08-04 1997-08-04
How it works…
R provides two functions to sort data: sort and order. The sort function returns a sorted vector as output. In our first case, we created a numeric vector, a, with seven elements. We then applied the sort function to a, which yielded a sorted vector as the output. By default, the sorted vector is in ascending order. However, we can reverse the order by setting decreasing to TRUE. The order function, on the other hand, returns a vector of ordering indices: the positions of the elements of the input, listed in sorted order. Here too, we can use decreasing to specify whether the returned index vector describes an ascending or a descending order.
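The relationship between the two functions can be checked directly. The following is a minimal sketch that reuses the a vector from the recipe; the variable names idx, by_index, and by_index_desc are introduced here only for illustration.

```r
# The same seven-element vector used in the recipe
a <- c(5, 1, 4, 3, 2, 6, 3)

# order() returns positions, not values
idx <- order(a)

# Indexing the vector with those positions reproduces the output of sort()
by_index <- a[idx]
by_index_desc <- a[order(a, decreasing = TRUE)]

print(idx)
print(identical(by_index, sort(a)))
print(identical(by_index_desc, sort(a, decreasing = TRUE)))
```

Because order returns positions rather than values, the same index vector can be used to rearrange the rows of a data frame, which sort alone cannot do.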
To arrange the elements of a vector in ascending or descending order, we can simply use the sort function. However, to arrange the records of a data frame by a specific column, we should use the order function. In our example, we first obtained the ordering index of the salary attribute in descending order and then used that index to retrieve the records from salaries. As a result, the records in salaries were arranged by salary. Besides sorting records by a single attribute, we can sort records by multiple attributes. All we need to do is pass the salary and from_date attributes, one after the other, to the order function. Because decreasing = TRUE applies to every sort key, records are arranged by salary in descending order, and records with equal salaries are arranged by from_date, also in descending order.
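The effect of the second sort key is easiest to see when the first key has ties. The sketch below uses a small hypothetical data frame, toy, rather than the recipe's salaries dataset; its values are made up for illustration.

```r
# Hypothetical toy data (not the recipe's salaries dataset);
# employees 2 and 3 share the same salary
toy <- data.frame(
  emp_no    = c(1, 2, 3, 4),
  salary    = c(50000, 60000, 60000, 40000),
  from_date = as.Date(c("2001-01-01", "1999-06-01",
                        "2000-06-01", "1998-03-15"))
)

# decreasing = TRUE applies to both keys: salary descending, then
# from_date descending among rows with equal salary
sorted_toy <- toy[order(toy$salary, toy$from_date, decreasing = TRUE), ]
print(sorted_toy)
```

In this sketch, the two rows with a salary of 60000 come first, and the later from_date (employee 3) precedes the earlier one (employee 2).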
There's more…
You can use the arrange function in plyr to sort the salaries data with salary in ascending order and from_date in descending order:
> library(plyr)
> arranged_salaries <- arrange(salaries, salary, desc(from_date))
> head(arranged_salaries)
  emp_no salary  from_date    to_date
1  10048  39507 1986-02-24 1987-01-27
2  10027  39520 1996-04-01 1997-04-01
3  10064  39551 1986-11-20 1987-11-20
4  10072  39567 1990-05-21 1991-05-21
5  10072  39724 1991-05-21 1992-05-20
6  10049  39735 1993-05-04 1994-05-04
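A similar mixed-direction sort is possible in base R without plyr. The following is a sketch, not the recipe's method: it relies on the radix method of order, which accepts one decreasing value per sort key, and it uses a hypothetical toy data frame in place of salaries.

```r
# Hypothetical toy data (not the recipe's salaries dataset)
toy <- data.frame(
  emp_no    = c(1, 2, 3, 4),
  salary    = c(50000, 60000, 60000, 40000),
  from_date = as.Date(c("2001-01-01", "1999-06-01",
                        "2000-06-01", "1998-03-15"))
)

# With method = "radix", decreasing may be a vector with one entry per key:
# salary ascending (FALSE), from_date descending (TRUE)
idx <- order(toy$salary, toy$from_date,
             decreasing = c(FALSE, TRUE), method = "radix")
mixed_toy <- toy[idx, ]

# Drop the original row names so the rows are numbered 1..n
rownames(mixed_toy) <- NULL
print(mixed_toy)
```

Resetting the row names mirrors the arrange output above, where rows are renumbered from 1, whereas indexing with order keeps the original row names.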