
Title: R for Data Science Cookbook
Author: Yu Wei Chiu (David Chiu)
Chapter word count: 505
Last updated: 2021-07-14 10:51:25

Scanning text files

In previous recipes, we introduced how to use read.table and read.csv to load data into an R session. However, read.table and read.csv are best suited to data that has a fixed number of columns and a modest size. For more flexible data processing, we will demonstrate how to use the scan function to read data from a file.

Getting ready

For this recipe, you need to have completed the previous recipes and downloaded snp500.csv to the current working directory.

How to do it…

Please perform the following steps to scan data from the CSV file:

First, you can use the scan function to read data from snp500.csv:

> stock_data3 <- scan('snp500.csv', sep=',', what=list(Date = '', Open = 0, High = 0, Low = 0, Close = 0, Volume = 0, Adj_Close = 0), skip=1, fill=TRUE)
Read 16481 records

You can then examine the loaded data with mode and str:

> mode(stock_data3)
[1] "list"
> str(stock_data3)
List of 7
 $ Date : chr [1:16481] "2015-07-02" "2015-07-01" "2015-06-30" "2015-06-29" ...
 $ Open : num [1:16481] 2078 2067 2061 2099 2103 ...
 $ High : num [1:16481] 2085 2083 2074 2099 2109 ...
 $ Low : num [1:16481] 2071 2067 2056 2057 2095 ...
 $ Close : num [1:16481] 2077 2077 2063 2058 2102 ...
 $ Volume : num [1:16481] 3.00e+09 3.73e+09 4.08e+09 3.68e+09 5.03e+09 ...
 $ Adj_Close: num [1:16481] 2077 2077 2063 2058 2102 ...

How it works…

Compared with read.csv and read.table, the scan function is more flexible and can be more efficient at reading data. Here, we specify the name and type of each field as a list passed to the what parameter. In this case, the first field is of character type and the remaining fields are numeric, so we use an empty string (a pair of single or double quotes) for the Date column and 0 for the others. Because we need to skip the header row and pad any line that has fewer fields than the number of columns with empty fields, we set skip to 1 and fill to TRUE.
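
The effect of what, skip, and fill can be tried without snp500.csv. The following is a minimal sketch that reads from an in-memory string through scan's text argument; the three records and the reduced set of columns are made up for illustration and are not taken from the real file:

```r
# Hypothetical CSV content: a header plus three records.
# The second record is deliberately missing its last field.
csv_text <- paste(c("Date,Open,Close",
                    "2015-07-02,2078,2077",
                    "2015-07-01,2067",
                    "2015-06-30,2061,2063"),
                  collapse = "\n")

# what: one template element per field ("" = character, 0 = numeric).
# skip = 1 drops the header line; fill = TRUE pads the short record.
prices <- scan(text = csv_text, sep = ",",
               what = list(Date = "", Open = 0, Close = 0),
               skip = 1, fill = TRUE, quiet = TRUE)

str(prices)
```

The padded numeric field is stored as NA, so the short record does not shift the values of the records that follow it.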

We can now examine the data with some built-in functions. Here, we use mode to obtain the storage mode of the object and str to display the structure of the data.
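
Because scan returns a list rather than a data frame, a common follow-up is to convert the result. The sketch below is one possible way to do this, not a step from the recipe; stock_list is a small hypothetical stand-in for stock_data3, built from the rounded values shown in the str output above:

```r
# Hypothetical stand-in for stock_data3: a named list of equal-length vectors,
# which is the shape scan returns when what is a list.
stock_list <- list(Date  = c("2015-07-02", "2015-07-01", "2015-06-30"),
                   Open  = c(2078, 2067, 2061),
                   Close = c(2077, 2077, 2063))

mode(stock_list)   # reports the storage mode of the object

# A list of equal-length vectors converts directly to a data frame.
stock_df <- as.data.frame(stock_list, stringsAsFactors = FALSE)

# ISO-formatted strings (YYYY-MM-DD) parse with as.Date's default format.
stock_df$Date <- as.Date(stock_df$Date)

str(stock_df)
```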

There's more…

Sometimes the fields are laid out in fixed-width columns rather than separated by a delimiter. In that case, you can use the read.fwf function and specify the width of each column:

First, you can use download.file to download weather.op from the author's GitHub page:

> download.file("https://github.com/ywchiu/rcookbook/raw/master/chapter2/weather.op", "weather.op")
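
If the recipe is run more than once, it may be convenient to skip the download when the file is already present. The helper below, fetch_if_missing, is a hypothetical name introduced here for illustration and is not part of the recipe; the call that actually downloads is left commented out:

```r
# Hypothetical helper: download only when the destination file is absent.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the bytes unchanged on every platform.
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

weather_url <- "https://github.com/ywchiu/rcookbook/raw/master/chapter2/weather.op"
# fetch_if_missing(weather_url, "weather.op")  # uncomment to download
```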

You can then examine the data with the file editor:

Figure 5: Using the file editor to examine the file

Read the data by specifying the width of each column in widths and the column names in col.names, and skip the first row by setting skip to 1:
```
> weather <- read.fwf("weather.op", widths = c(6,6,10,11,9,8), col.names = c("STN","WBAN","YEARMODA","TEMP","MAX","MIN"), skip=1)
```

Lastly, you can examine the data using the head and names functions:

> head(weather)
   STN  WBAN YEARMODA    TEMP    MAX   MIN
1 8403 99999 20140101 85.8 24 102.7* 69.3*
2 8403 99999 20140102 86.3 24 102.9* 71.1*
3 8403 99999 20140103 85.9 24 101.1* 72.0*
4 8403 99999 20140104 85.6 24 102.7* 70.5*
5 8403 99999 20140105 84.8 23 102.0* 66.6*
6 8403 99999 20140106 86.8 23 102.0* 70.9*

> names(weather)
[1] "STN" "WBAN" "YEARMODA" "TEMP" "MAX" 
[6] "MIN"
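
To see how widths maps character positions to columns without downloading weather.op, the following minimal sketch writes a small fixed-width file to a temporary location and reads it back. The three-column layout and the records are assumptions made for illustration; they are not the layout of the real weather.op file:

```r
# Build hypothetical fixed-width records: 6 + 6 + 10 characters per line,
# each value right-aligned within its column.
fmt <- "%6s%6s%10s"
records <- c(sprintf(fmt, "STN", "WBAN", "YEARMODA"),   # header row
             sprintf(fmt, "8403", "99999", "20140101"),
             sprintf(fmt, "8403", "99999", "20140102"))

fwf_file <- tempfile(fileext = ".txt")
writeLines(records, fwf_file)

# widths gives the number of characters in each column;
# skip = 1 drops the header row, as in the recipe.
sample_fwf <- read.fwf(fwf_file, widths = c(6, 6, 10),
                       col.names = c("STN", "WBAN", "YEARMODA"), skip = 1)

str(sample_fwf)
unlink(fwf_file)   # remove the temporary file
```

If the widths do not match the file's real layout, neighbouring values end up in the same column, so it is worth checking the result with head or str after reading.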
