Title: Hands-On Data Science with Anaconda
Authors: Dr. Yuxing Yan, James Yan
Chapter length: 268 words
Last updated: 2021-06-25 21:08:50

Data sorting

In R, we have several ways to sort data. The easiest is the sort() function, shown here on the simplest case of one-dimensional data:

> set.seed(123) 
> x<-rnorm(100) 
> head(x) 
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499 
> y<-sort(x) 
> head(y) 
[1] -2.309169 -1.966617 -1.686693 -1.548753 -1.265396 -1.265061
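For comparison, the same one-dimensional sort can be sketched in Python using only the standard library. The seed and variable names mirror the R code, but Python's random number generator is not R's, so the values drawn here will not match the R output above.

```python
import random

random.seed(123)                               # fix the seed for reproducibility
x = [random.gauss(0, 1) for _ in range(100)]   # 100 standard normal draws
y = sorted(x)                                  # returns a new list; x is left unchanged

print(x[:6])   # first six values in their original order
print(y[:6])   # the six smallest values
```

Like R's sort(), sorted() returns a sorted copy rather than modifying its input.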

Let's look at another way to sort data. The dataset used, nyseListing, is included in the R package fImport and is loaded as follows:

library(fImport) 
data(nyseListing) 
dim(nyseListing) 
head(nyseListing)

The output shows that, in total, we have 3,387 observations, each with 4 variables. The dataset is sorted by Symbol, that is, by the tickers of individual stocks. Assume that we want to sort it by Name instead, as shown here:

> x<-nyseListing[order(nyseListing$Name),] 
> head(x)

The output of head(x) confirms that the dataset is now sorted by company Name.
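The key idea behind order() is that it does not return sorted values. It returns the positions that would sort them, and those positions are then used to reorder the rows. A minimal Python sketch of that idea follows. The company names and symbols are hypothetical toy values, not rows from nyseListing, and Python positions are 0-based whereas R's are 1-based.

```python
# Hypothetical toy columns (not taken from nyseListing)
names = ["Delta Co", "Alpha Inc", "Charlie Ltd", "Bravo Corp"]
symbols = ["DDD", "AAA", "CCC", "BBB"]

# Analogue of order(names): the positions that would put `names` in sorted order
idx = sorted(range(len(names)), key=lambda i: names[i])

# Analogue of data[order(data$Name), ]: reorder every column by the same positions
sorted_names = [names[i] for i in idx]
sorted_symbols = [symbols[i] for i in idx]

print(idx)             # [1, 3, 2, 0]
print(sorted_names)    # ['Alpha Inc', 'Bravo Corp', 'Charlie Ltd', 'Delta Co']
print(sorted_symbols)  # ['AAA', 'BBB', 'CCC', 'DDD']
```

Because every column is reordered by the same index vector, each row stays intact while the rows change position.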

In the following example, we sort by ID first, then by RET. We start by building a small data frame:

> x<-c(1,3,1, 0.1,0.3,-0.4,100,300,30) 
> y<-data.frame(matrix(x,3,3)) 
> colnames(y)<-c("ID","RET","Data1") 
> y

Our simple dataset is shown here:

  ID  RET Data1
1  1  0.1   100
2  3  0.3   300
3  1 -0.4    30

To sort the data by ID and then by RET, we can use order(y$ID, y$RET), as shown here:

> z<-y[order(y$ID,y$RET),] 
> z

The output shows that the dataset was sorted correctly, by ID first and then by RET within each ID:

  ID  RET Data1
3  1 -0.4    30
1  1  0.1   100
2  3  0.3   300
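The same two-key sort can be sketched in Python with pandas. This assumes pandas is installed and rebuilds the small ID/RET/Data1 table by hand. Row labels are 0-based in pandas, so they differ by one from the R row names.

```python
import pandas as pd

# Same values as the R data frame y, entered column by column
y = pd.DataFrame({
    "ID": [1, 3, 1],
    "RET": [0.1, 0.3, -0.4],
    "Data1": [100, 300, 30],
})

# Sort by ID first, then by RET within each ID (both ascending by default)
z = y.sort_values(["ID", "RET"])
print(z)
```

The sorted frame keeps the original row labels, here 2, 0, 1, which makes it easy to trace each row back to its position in y.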

To sort in decreasing order, we can add decreasing=TRUE, which applies to every sort key (here, both ID and RET):

> z2<-y[order(y$ID,y$RET,decreasing = TRUE),]
> z2 
  ID  RET Data1 
2  3  0.3   300 
1  1  0.1   100 
3  1 -0.4    30
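A single decreasing=TRUE reverses every key, as the output above shows: ID is descending, and so is RET within ID 1. When the keys need different directions, one common trick for numeric keys is to negate the key that should run in descending order. The following standard-library Python sketch illustrates this on the same three rows. The tuple layout (ID, RET, Data1) is an assumption made for illustration.

```python
# Rows of the small dataset as (ID, RET, Data1) tuples
rows = [(1, 0.1, 100), (3, 0.3, 300), (1, -0.4, 30)]

# Both keys descending, matching the decreasing=TRUE result above
both_desc = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)
print(both_desc)   # [(3, 0.3, 300), (1, 0.1, 100), (1, -0.4, 30)]

# ID descending but RET ascending: negate only the ID key
mixed = sorted(rows, key=lambda r: (-r[0], r[1]))
print(mixed)       # [(3, 0.3, 300), (1, -0.4, 30), (1, 0.1, 100)]
```

Negation only works for numeric keys. For string keys, a per-key direction option such as the ascending list in pandas is the simpler route.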

To sort data in Python, we can use the sort_values() method of a pandas DataFrame, as in the following code:

import pandas as pd
a = pd.DataFrame([[8, 3], [8, 2], [1, -1]], columns=['X', 'Y'])
print(a)
# sort by X ascending, then Y descending
b = a.sort_values(['X', 'Y'], ascending=[True, False])
print(b)
# sort by X and Y, both ascending
c = a.sort_values(['X', 'Y'], ascending=[True, True])
print(c)

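The output shows the original DataFrame, followed by the two sorted versions:

   X  Y
0  8  3
1  8  2
2  1 -1
   X  Y
2  1 -1
0  8  3
1  8  2
   X  Y
2  1 -1
1  8  2
0  8  3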
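One detail worth noting is that sort_values() returns a new DataFrame and keeps the original row labels, so the index of the result is no longer in order. If a fresh 0, 1, 2, ... index is wanted, a minimal sketch (assuming pandas is installed) is to chain reset_index(drop=True):

```python
import pandas as pd

a = pd.DataFrame([[8, 3], [8, 2], [1, -1]], columns=["X", "Y"])

# Sorting returns a new DataFrame; the row labels travel with the rows
b = a.sort_values(["X", "Y"], ascending=[True, False])
print(b.index.tolist())        # [2, 0, 1]

# Discard the old labels and renumber the rows from 0
b_reset = b.reset_index(drop=True)
print(b_reset.index.tolist())  # [0, 1, 2]

# The original DataFrame is untouched
print(a.index.tolist())        # [0, 1, 2]
```

Keeping the original labels is useful when the sorted rows need to be matched back to the source data. Resetting them is mostly a matter of presentation.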
