- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 487字
- 2021-07-14 10:51:25
Working with Excel files
Excel is another popular tool used to store and analyze data. Of course, one can convert Excel files to CSV files or other text formats by using Excel. Alternatively, to simplify the process, you can use install
and load the xlsx
package to read and process Excel data in R.
Getting ready
In this recipe, you need to prepare your environment with R installed and a computer that can access the Internet.
How to do it…
Please perform the following steps to read Excel documents:
- First, install and load the
xlsx
package:> install.packages("xlsx") > library(xlsx)
- Access www.data.worldbank.org/topic/economy-and-growth to find world economy indicator data in Excel:
Figure 6: World economy indicator
- Download world economy indicator data from the following URL using
download.file
:> download.file("http://api.worldbank.org/v2/en/topic/3?downloadformat=excel", "worldbank.xls", mode="wb")
- Examine the downloaded file with Excel (or Open Office):
Figure 7: Using Excel to examine the downloaded file
- You can use
read.xlsx2
to read data from the downloaded Excel file:> options(java.parameters = "-Xmx2000m") > wb <- read.xlsx2("worldbank.xls", sheetIndex = 1, startRow = 4)
- Select the country name, country code, indicator name, indicator code, and data for 2014 out of the read data:
> wb2 <- wb[c("Country.Name","Country.Code","Indicator.Name","Indicator.Code", "X2014")]
- Subsequently, you can examine the dimension of the file using the
dim
function:> dim(wb2)
- Finally, you can write filtered data back to a file named
2014wbdata.xlsx
:> write.xlsx2(wb2, "2014wbdata.xlsx", sheetName = "Sheet1")
How it works…
In this recipe, we covered how to read and write an Excel file containing world development indicators with the xlsx
package. In the first step, we needed to install and load the xlsx
package, which enables the user to read and write Excel files in the R command prompt through the use of a Java POI package. Thus, to utilize the Java POI package, the installation process will also install rJava
and xlsxjars
at the same time. You can find the Java POI .jar
file under <R Installed Path>\library\xlsx]jars\java
. Using the author's computer as an example, which has the Windows 7 operating system installed, the .jar
files are located at the C:\Program Files\R\R-3.2.1\library\xlsxjars\java
path.
Next, we downloaded world economy indicator data from the link (http://data.worldbank.org/topic/economy-and-growth) with the download.file
function. By default, download.file
downloads the file in ASCII mode. To download the file in binary mode, we need to set download mode to wb
.
After the Excel file is downloaded, we can examine it with Excel. The screenshot of the Excel file shows that the economy indicator starts from row 4 in Sheet 1. Therefore, we can use the read.xlsx2
function to read world economy indicators from this location. The xlsx
package provides two functions to read data from Excel: read.xlsx
and read.xlsx2
. As the read.xlsx2
function mainly processes data in Java, read.xlsx2
performs better (in particular, read.xlsx2
processes data considerably faster on sheets with more than 100,000 cells).
After we have read the contents of the worksheet into an R data frame, we can select variables Country.Name
, Country.Code
, Indicator.Name
, Indicator.Code
, and X2014
out of the extracted R data frame. Next, we can use the dim
function to examine the dimensions of the data frame. Finally, we can use write.xlsx2
to write transformed data to an Excel file, 2014wbdata.xlsx
.
- 工程軟件開發技術基礎
- 零基礎玩轉區塊鏈
- Android Application Development Cookbook(Second Edition)
- Python數據分析(第2版)
- Visual Basic程序設計實驗指導(第4版)
- PySide GUI Application Development(Second Edition)
- Go語言開發實戰(慕課版)
- OpenCV 3 Blueprints
- Web App Testing Using Knockout.JS
- JavaScript程序設計:基礎·PHP·XML
- C語言程序設計教程
- 絕密原型檔案:看看專業產品經理的原型是什么樣
- ASP.NET Core 2 High Performance(Second Edition)
- 多接入邊緣計算實戰
- Getting Started with Backbone Marionette