官术网_书友最值得收藏!

Importing and exporting different data types

In R, we can read the files stored from outside the R environment. We can also write the data into files which can be stored and accessed by the operating system. In R, we can read and write different formats of files, such as CSV, Excel, TXT, and so on. In this section, we are going to discuss how to read and write different formats of files.

The required files should be present in the current directory to read them. Otherwise, the directory should be changed to the required destination.

The first step for reading/writing files is to know the working directory. You can find the path of the working directory by running the following code:

>print (getwd()) 

This will give the paths for the current working directory. If it is not your desired directory, then please set your own desired directory by using the following code:

>setwd("") 

For instance, the following code makes the folder C:/Users the working directory:

  >setwd("C:/Users") 

How to read and write a CSV format file

A CSV format file is a text file in which values are comma separated. Let us consider a CSV file with the following content from stock-market data:

To read the preceding file in R, first save this file in the working directory, and then read it (the name of the file is Sample.csv) using the following code:

>data<-read.csv("Sample.csv") 
>print(data) 

When the preceding code gets executed, it will give the following output:

       Date    Open    High     Low   Close     Volume     Adj.Close 
1  14-10-2016 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2  13-10-2016 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3  12-10-2016 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4  11-10-2016 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5  10-10-2016 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66 

Read.csv by default produces the file in DataFrame format; this can be checked by running the following code:

>print(is.data.frame(data)) 

Now, whatever analysis you want to do, you can perform it by applying various functions on the DataFrame in R, and once you have done the analysis, you can write your desired output file using the following code:

>write.csv(data,"result.csv") 
>output <- read.csv("result.csv") 
>print(output) 

When the preceding code gets executed, it writes the output file in the working directory folder in CSV format.

XLSX

Excel is the most common format of file for storing data, and it ends with extension .xls or .xlsx.

The xlsx package will be used to read or write .xlsx files in the R environment.

Installing the xlsx package has dependency on Java, so Java needs to be installed on the system. The xlsx package can be installed using the following command:

>install.packages("xlsx")

When the previous command gets executed, it will ask for the nearest CRAN mirror, which the user has to select to install the package. We can verify that the package has been installed or not by executing the following command:

>any(grepl("xlsx",installed.packages()))

If it has been installed successfully, it will show the following output:

[1] TRUE
Loading required package: rJava
Loading required package: methods
Loading required package: xlsxjars

We can load the xlsx library by running the following script:

>library("xlsx") 

Now let us save the previous sample file in .xlsx format and read it in the R environment, which can be done by executing the following code:

>data <- read.xlsx("Sample.xlsx", sheetIndex = 1) 
>print(data) 

This gives a DataFrame output with the following content:

       Date    Open    High     Low   Close     Volume    Adj.Close 
1 2016-10-14 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2 2016-10-13 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3 2016-10-12 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4 2016-10-11 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5 2016-10-10 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66 

Similarly, you can write R files in .xlsx format by executing the following code:

>output<-write.xlsx(data,"result.xlsx") 
>output<- read.csv("result.csv") 
>print(output) 

Web data or online sources of data

The Web is one main source of data these days, and we want to directly bring the data from web form to the R environment. R supports this:

URL <- "http://ichart.finance.yahoo.com/table.csv?s=^GSPC" 
snp <- as.data.frame(read.csv(URL)) 
head(snp) 

When the preceding code is executed, it directly brings the data for the S&P500 index into R in DataFrame format. A portion of the data has been displayed by using the head() function here:

        Date    Open    High     Low   Close     Volume   Adj.Close 
1 2016-10-14 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2 2016-10-13 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3 2016-10-12 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4 2016-10-11 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5 2016-10-10 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66 
6 2016-10-07 2164.19 2165.86 2144.85 2153.74 3619890000   2153.74 

Similarly, if we execute the following code, it brings the DJI index data into the R environment: its sample is displayed here:

>URL <- "http://ichart.finance.yahoo.com/table.csv?s=^DJI" 
>dji <- as.data.frame(read.csv(URL)) 
>head(dji) 

This gives the following output:

        Date     Open     High      Low    Close   Volume  Adj.Close 
1 2016-10-14 18177.35 18261.11 18138.38 18138.38 87050000  18138.38 
2 2016-10-13 18088.32 18137.70 17959.95 18098.94 83160000  18098.94 
3 2016-10-12 18132.63 18193.96 18082.09 18144.20 72230000  18144.20 
4 2016-10-11 18308.43 18312.33 18061.96 18128.66 88610000  18128.66 
5 2016-10-10 18282.95 18399.96 18282.95 18329.04 72110000  18329.04 
6 2016-10-07 18295.35 18319.73 18149.35 18240.49 82680000  18240.49 

Please note that we will be mostly using the snp and dji indexes for example illustrations in the rest of the book and these will be referred to as snp and dji.

Databases

A relational database stores data in normalized format, and to perform statistical analysis, we need to write complex and advance queries. But R can connect to various relational databases such as MySQL Oracle, and SQL Server, easily and convert the data tables into DataFrames. Once the data is in DataFrame format, doing statistical analysis is easy to perform using all the available functions and packages.

In this section, we will take the example of MySQL as reference.

R has a built-in package, RMySQL , which provides connectivity with the database; it can be installed using the following command:

>install.packages("RMySQL")

Once the package is installed, we can create a connection object to create a connection with the database. It takes username, password, database name, and localhost name as input. We can give our inputs and use the following command to connect with the required database:

>mysqlconnection = dbConnect(MySQL(), user = '...', password = '...', dbname = '..',host = '.....')

When the database is connected, we can list the table that is present in the database by executing the following command:

>dbListTables(mysqlconnection)

We can query the database using the function dbSendQuery(), and the result is returned to R by using function fetch(). Then the output is stored in DataFrame format:

>result = dbSendQuery(mysqlconnection, "select * from <table name>") 
>data.frame = fetch(result) 
>print(data.fame) 

When the previous code gets executed, it returns the required output.

We can query with a filter clause, update rows in database tables, insert data into a database table, create tables, drop tables, and so on by sending queries through dbSendQuery().

主站蜘蛛池模板: 英吉沙县| 瑞丽市| 泰安市| 棋牌| 城固县| 疏勒县| 图木舒克市| 太谷县| 迁西县| 白玉县| 郓城县| 衡阳县| 响水县| 涿鹿县| 监利县| 玛多县| 龙口市| 城口县| 乐陵市| 赤壁市| 高雄市| 蓝山县| 游戏| 扎鲁特旗| 特克斯县| 巴楚县| 台东市| 南涧| 南平市| 鹤峰县| 东方市| 元阳县| 晋城| 涟水县| 南溪县| 宣威市| 阿坝| 金昌市| 屏山县| 日照市| 额尔古纳市|