官术网_书友最值得收藏!

Working with Excel files

Excel is another popular tool used to store and analyze data. Of course, one can convert Excel files to CSV files or other text formats by using Excel. Alternatively, to simplify the process, you can use install and load the xlsx package to read and process Excel data in R.

Getting ready

In this recipe, you need to prepare your environment with R installed and a computer that can access the Internet.

How to do it…

Please perform the following steps to read Excel documents:

  1. First, install and load the xlsx package:
    > install.packages("xlsx")
    > library(xlsx)
    
  2. Access www.data.worldbank.org/topic/economy-and-growth to find world economy indicator data in Excel:

    Figure 6: World economy indicator

  3. Download world economy indicator data from the following URL using download.file:
    > download.file("http://api.worldbank.org/v2/en/topic/3?downloadformat=excel", "worldbank.xls", mode="wb")
    
  4. Examine the downloaded file with Excel (or Open Office):

    Figure 7: Using Excel to examine the downloaded file

  5. You can use read.xlsx2 to read data from the downloaded Excel file:
    > options(java.parameters = "-Xmx2000m")
    > wb <- read.xlsx2("worldbank.xls", sheetIndex = 1, startRow = 4)
    
  6. Select the country name, country code, indicator name, indicator code, and data for 2014 out of the read data:
    > wb2 <- wb[c("Country.Name","Country.Code","Indicator.Name","Indicator.Code", "X2014")]
    
  7. Subsequently, you can examine the dimension of the file using the dim function:
    > dim(wb2)
    
  8. Finally, you can write filtered data back to a file named 2014wbdata.xlsx:
    > write.xlsx2(wb2, "2014wbdata.xlsx", sheetName = "Sheet1")
    

How it works…

In this recipe, we covered how to read and write an Excel file containing world development indicators with the xlsx package. In the first step, we needed to install and load the xlsx package, which enables the user to read and write Excel files in the R command prompt through the use of a Java POI package. Thus, to utilize the Java POI package, the installation process will also install rJava and xlsxjars at the same time. You can find the Java POI .jar file under <R Installed Path>\library\xlsx]jars\java. Using the author's computer as an example, which has the Windows 7 operating system installed, the .jar files are located at the C:\Program Files\R\R-3.2.1\library\xlsxjars\java path.

Next, we downloaded world economy indicator data from the link (http://data.worldbank.org/topic/economy-and-growth) with the download.file function. By default, download.file downloads the file in ASCII mode. To download the file in binary mode, we need to set download mode to wb.

After the Excel file is downloaded, we can examine it with Excel. The screenshot of the Excel file shows that the economy indicator starts from row 4 in Sheet 1. Therefore, we can use the read.xlsx2 function to read world economy indicators from this location. The xlsx package provides two functions to read data from Excel: read.xlsx and read.xlsx2. As the read.xlsx2 function mainly processes data in Java, read.xlsx2 performs better (in particular, read.xlsx2 processes data considerably faster on sheets with more than 100,000 cells).

After we have read the contents of the worksheet into an R data frame, we can select variables Country.Name, Country.Code, Indicator.Name, Indicator.Code, and X2014 out of the extracted R data frame. Next, we can use the dim function to examine the dimensions of the data frame. Finally, we can use write.xlsx2 to write transformed data to an Excel file, 2014wbdata.xlsx.

主站蜘蛛池模板: 墨江| 张家川| 贵溪市| 奉贤区| 六安市| 青冈县| 永修县| 奈曼旗| 乌拉特前旗| 聂荣县| 南木林县| 嘉义县| 三台县| 迁西县| 和政县| 竹山县| 台北县| 柘荣县| 墨脱县| 连城县| 顺昌县| 柞水县| 五寨县| 会昌县| 邛崃市| 兰西县| 天台县| 石首市| 泰宁县| 乐昌市| 蒙城县| 响水县| 平昌县| 玉门市| 台中市| 景宁| 建水县| 文水县| 大兴区| 新闻| 且末县|