
Sorting data

Sorting lets us arrange data so that we can analyze it more efficiently. In a database, we can use an ORDER BY clause to sort data by specified columns. In R, we can use the order and sort functions to arrange data.

Getting ready

Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.

How to do it…

Perform the following steps to sort the salaries dataset:

  1. First, we can use the sort function to sort data:
    > a <- c(5,1,4,3,2,6,3)
    > sort(a)
    [1] 1 2 3 3 4 5 6
    > sort(a, decreasing=TRUE)
    [1] 6 5 4 3 3 2 1
    
  2. Next, we can see how the order function works on the same input vector; it returns the positions of the elements in sorted order rather than the sorted values:
    > order(a)
    [1] 2 5 4 7 3 1 6
    > order(a, decreasing = TRUE)
    [1] 6 1 3 4 7 5 2
    
  3. To sort a data frame by a specific column, we first obtain the ordered index and then employ the index to retrieve the sorted dataset:
    > sorted_salaries <- salaries[order(salaries$salary, decreasing = TRUE),]
    > head(sorted_salaries)
        emp_no salary  from_date    to_date
    684  10068 113229 2001-08-03 9999-01-01
    683  10068 112470 2000-08-03 2001-08-03
    682  10068 111623 1999-08-04 2000-08-03
    681  10068 108345 1998-08-04 1999-08-04
    680  10068 106204 1997-08-04 1998-08-04
    679  10068 105533 1996-08-04 1997-08-04
    
  4. Besides sorting data by a single column, we can sort data by multiple columns. Here, from_date breaks ties in salary, and decreasing = TRUE applies to both columns:
    > sorted_salaries2 <- salaries[order(salaries$salary, salaries$from_date, decreasing = TRUE),]
    > head(sorted_salaries2)
        emp_no salary  from_date    to_date
    684  10068 113229 2001-08-03 9999-01-01
    683  10068 112470 2000-08-03 2001-08-03
    682  10068 111623 1999-08-04 2000-08-03
    681  10068 108345 1998-08-04 1999-08-04
    680  10068 106204 1997-08-04 1998-08-04
    679  10068 105533 1996-08-04 1997-08-04
    

How it works…

R provides two functions for sorting data: sort and order. The sort function returns the sorted vector itself. In our first example, we created a numeric vector, a, with seven elements and applied sort to it, which yielded a sorted vector as the output. By default, the result is in ascending order; setting decreasing to TRUE reverses it. The order function, on the other hand, returns an index vector: the positions of the elements in sorted order. For example, order(a) starts with 2 because the smallest value, 1, is the second element of a. As with sort, the decreasing argument controls whether the index vector describes ascending or descending order.
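The relationship between the two functions can be checked directly. The following is a minimal sketch using the same vector as the recipe; the variable names idx, sorted_by_index, same_ascending, and same_descending are chosen here for illustration only:

```r
# Same vector as in the recipe
a <- c(5, 1, 4, 3, 2, 6, 3)

# order() returns positions; indexing the vector with them reproduces sort()
idx <- order(a)
sorted_by_index <- a[idx]

# Both approaches give the same ascending result
same_ascending <- identical(sorted_by_index, sort(a))

# The same holds in descending order
same_descending <- identical(a[order(a, decreasing = TRUE)],
                             sort(a, decreasing = TRUE))
```

Because order returns positions rather than values, the same index can be reused to rearrange other vectors or the rows of a data frame, which sort cannot do.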

To arrange the elements of a vector in ascending or descending order, we can simply use the sort function. However, to arrange the records of a data frame by a specific column, we should use the order function. In our example, we first obtained the ordering index in descending order from the salary attribute and then used that index to retrieve the rows of salaries. As a result, the records in salaries are arranged by salary. Besides sorting records by a single attribute, we can sort records by multiple attributes: all we need to do is pass the salary and from_date attributes, in that order, to the order function. Later attributes only break ties in earlier ones, and a single decreasing = TRUE applies to every attribute.
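Since the first rows of the salaries output contain no tied salaries, the effect of the second key is easier to see on a small example. The sketch below uses a hypothetical data frame, toy, that only mimics the shape of the salaries dataset; its values are made up for illustration:

```r
# Hypothetical toy data in the shape of the salaries dataset
toy <- data.frame(
  emp_no    = c(1L, 2L, 3L, 4L),
  salary    = c(50000, 60000, 60000, 40000),
  from_date = as.Date(c("2000-01-01", "1999-01-01", "2001-01-01", "2002-01-01"))
)

# One key: rows are ordered by salary only; employees 2 and 3 tie at 60000
by_salary <- toy[order(toy$salary, decreasing = TRUE), ]

# Two keys: from_date breaks the tie, and decreasing = TRUE applies to both
# keys, so the later from_date (employee 3) comes first
by_salary_date <- toy[order(toy$salary, toy$from_date, decreasing = TRUE), ]
```

With the two-key call, employee 3 precedes employee 2 because both keys are sorted in descending order.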

There's more…

You can use the arrange function from the plyr package to sort the salaries data by salary in ascending order and by from_date in descending order:

> library(plyr)
> arranged_salaries <- arrange(salaries, salary, desc(from_date))
> head(arranged_salaries)
  emp_no salary  from_date    to_date
1  10048  39507 1986-02-24 1987-01-27
2  10027  39520 1996-04-01 1997-04-01
3  10064  39551 1986-11-20 1987-11-20
4  10072  39567 1990-05-21 1991-05-21
5  10072  39724 1991-05-21 1992-05-20
6  10049  39735 1993-05-04 1994-05-04
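The same mixed ordering (one key ascending, the other descending) can also be obtained with base R alone. The following is a sketch under stated assumptions: toy is a hypothetical data frame with made-up values, and the two approaches shown are alternatives to arrange, not a description of how plyr implements it:

```r
# Hypothetical toy data in the shape of the salaries dataset
toy <- data.frame(
  emp_no    = c(1L, 2L, 3L, 4L),
  salary    = c(50000, 60000, 60000, 40000),
  from_date = as.Date(c("2000-01-01", "1999-01-01", "2001-01-01", "2002-01-01"))
)

# Option 1: give one direction per key; this form requires method = "radix"
idx_radix <- order(toy$salary, toy$from_date,
                   decreasing = c(FALSE, TRUE), method = "radix")

# Option 2: negate the numeric sort key of the date to reverse its direction
idx_neg <- order(toy$salary, -xtfrm(toy$from_date))

# Retrieve the rows; reset the row names so they run 1, 2, 3, ...
# like the row numbers in the arrange() output
arranged_base <- toy[idx_radix, ]
rownames(arranged_base) <- NULL
```

Both index vectors place the lowest salary first and, among the tied salaries, the most recent from_date first.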