- Hands-On Data Science with Anaconda
- Dr. Yuxing Yan, James Yan
- 2021-06-25 21:08:50
Data sorting
In R, we have several ways to sort data. The simplest is the sort() function, shown here on one-dimensional data:
> set.seed(123)
> x<-rnorm(100)
> head(x)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499
> y<-sort(x)
> head(y)
[1] -2.309169 -1.966617 -1.686693 -1.548753 -1.265396 -1.265061
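As a rough Python counterpart (a sketch, not the authors' code), the built-in sorted() behaves like R's sort() in that it returns a new sorted sequence and leaves the input unchanged. The seed and sample size mirror the R call, but Python's random generator does not reproduce the numbers drawn by rnorm(), so no output is listed here.

```python
import random

random.seed(123)                                   # fix the seed for reproducibility
x = [random.gauss(0.0, 1.0) for _ in range(100)]   # 100 standard-normal draws

y = sorted(x)                      # new list in ascending order; x is untouched
y_desc = sorted(x, reverse=True)   # same values in descending order

print(y[:6])                       # six smallest values, similar to head(y) in R
```

If the original order is no longer needed, x.sort() sorts the list in place instead of creating a copy.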
Let's look at another way to sort data, the order() function. The dataset used is nyseListing, which is included in the R package fImport. The following code loads and inspects it:
library(fImport)
data(nyseListing)
dim(nyseListing)
head(nyseListing)
The output is shown here:

In total, we have 3,387 observations, each with 4 variables. The dataset is sorted by Symbol, that is, the tickers of individual stocks. Assume that we want to sort it by Name, as shown here:
> x<-nyseListing[order(nyseListing$Name),]
> head(x)
The output shows that the dataset is indeed sorted by company Name:

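The idiom x[order(...),] works in two steps: order() returns the row positions that would put the column in sorted order, and the indexing then picks the rows in that sequence. The sketch below imitates both steps in plain Python. The three-row listing is hypothetical and stands in for nyseListing, which is not reproduced here. Note that Python positions start at 0, while R's start at 1.

```python
# Hypothetical toy table standing in for nyseListing (only two columns shown).
listing = [
    {"Symbol": "AAA", "Name": "Zeta Corp"},
    {"Symbol": "BBB", "Name": "Alpha Inc"},
    {"Symbol": "CCC", "Name": "Midway Ltd"},
]

# Step 1, like order(listing$Name): row positions in sorted-by-Name order.
idx = sorted(range(len(listing)), key=lambda i: listing[i]["Name"])

# Step 2, like listing[idx, ]: select the rows in that order.
by_name = [listing[i] for i in idx]

print(idx)                                  # [1, 2, 0]
print([row["Symbol"] for row in by_name])   # ['BBB', 'CCC', 'AAA']
```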
In the following example, we sort by ID first, then by RET:
> x<-c(1,3,1, 0.1,0.3,-0.4,100,300,30)
> y<-data.frame(matrix(x,3,3))
> colnames(y)<-c("ID","RET","Data1")
> y
Our simple output dataset is shown here:

To sort the data according to ID and RET, we could use order(ID,RET), shown here:
> z<-y[order(y$ID,y$RET),]
> z
The following screenshot shows that the output dataset was sorted correctly:

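The same two-key ordering can be sketched in plain Python by sorting on a tuple: tuples compare element by element, so ID decides first and RET only breaks ties. The rows below copy the three observations from the R data frame. Representing them as a list of dictionaries is an illustrative choice, not something the book prescribes.

```python
# The three observations from the R example, one dictionary per row.
rows = [
    {"ID": 1, "RET": 0.1, "Data1": 100},
    {"ID": 3, "RET": 0.3, "Data1": 300},
    {"ID": 1, "RET": -0.4, "Data1": 30},
]

# Sort by ID first, then by RET within equal IDs (both ascending).
z = sorted(rows, key=lambda r: (r["ID"], r["RET"]))

for r in z:
    print(r["ID"], r["RET"], r["Data1"])
# 1 -0.4 30
# 1 0.1 100
# 3 0.3 300
```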
To sort in decreasing order, add decreasing=TRUE to the order() call. It applies to every sort key, so the rows below are ordered by descending ID and then by descending RET:
> z2<-y[order(y$ID,y$RET,decreasing = TRUE),]
> z2
  ID  RET Data1
2  3  0.3   300
1  1  0.1   100
3  1 -0.4    30
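Because a single decreasing=TRUE reverses every key, a mixed ordering such as ascending ID with descending RET needs a different approach. A common trick for numeric keys, in R as well as in Python, is to negate the key that should run in the opposite direction. The plain-Python sketch below shows both cases on the same three rows. It is an illustration under that assumption and not code from the book.

```python
rows = [
    {"ID": 1, "RET": 0.1, "Data1": 100},
    {"ID": 3, "RET": 0.3, "Data1": 300},
    {"ID": 1, "RET": -0.4, "Data1": 30},
]

# Both keys descending, matching z2 in the R example.
both_desc = sorted(rows, key=lambda r: (r["ID"], r["RET"]), reverse=True)

# ID ascending, RET descending: negate the numeric key to flip its direction.
mixed = sorted(rows, key=lambda r: (r["ID"], -r["RET"]))

print([r["Data1"] for r in both_desc])   # [300, 100, 30]
print([r["Data1"] for r in mixed])       # [100, 30, 300]
```

Negation only works for numeric keys. For strings, one option in Python is to sort twice, last key first, relying on the fact that sorted() is stable.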
To sort data in Python, we can use the sort_values() method of a pandas DataFrame:
import pandas as pd
a = pd.DataFrame([[8,3],[8,2],[1,-1]],columns=['X','Y'])
print(a)
# sort by X ascending, then Y descending
b = a.sort_values(['X', 'Y'], ascending=[True, False])
print(b)
# sort by X and Y, both ascending
c = a.sort_values(['X', 'Y'], ascending=[True, True])
print(c)
The output is shown here:
