- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:25
Scanning text files
In previous recipes, we introduced how to use read.table and read.csv to load data into an R session. However, read.table and read.csv are most convenient when every line has the same number of fields and the file is small enough to load in one pass. For more flexible data processing, this recipe demonstrates how to use the scan function to read data from a file.
Getting ready
In this recipe, you need to have completed the previous recipes and have snp500.csv downloaded in the current directory.
How to do it…
Please perform the following steps to scan data from the CSV file:
- First, you can use the scan function to read data from snp500.csv:
    > stock_data3 <- scan('snp500.csv', sep=',',
    +     what=list(Date = '', Open = 0, High = 0, Low = 0,
    +               Close = 0, Volume = 0, Adj_Close = 0),
    +     skip=1, fill=TRUE)
    Read 16481 records
- You can then examine the loaded data with mode and str:
    > mode(stock_data3)
    [1] "list"
    > str(stock_data3)
    List of 7
     $ Date     : chr [1:16481] "2015-07-02" "2015-07-01" "2015-06-30" "2015-06-29" ...
     $ Open     : num [1:16481] 2078 2067 2061 2099 2103 ...
     $ High     : num [1:16481] 2085 2083 2074 2099 2109 ...
     $ Low      : num [1:16481] 2071 2067 2056 2057 2095 ...
     $ Close    : num [1:16481] 2077 2077 2063 2058 2102 ...
     $ Volume   : num [1:16481] 3.00e+09 3.73e+09 4.08e+09 3.68e+09 5.03e+09 ...
     $ Adj_Close: num [1:16481] 2077 2077 2063 2058 2102 ...
How it works…
Compared with read.csv and read.table, the scan function is more flexible and can be more efficient for reading data. Here, we specify the name and type of each field as a list passed to the what parameter. In this case, the first field is of character type, and the remaining fields are numeric. Therefore, we supply an empty string (a pair of single or double quotes) for the Date column and 0 for the rest of the fields. Then, because we need to skip the header row and automatically add empty fields to any line with fewer fields than the number of columns, we set skip to 1 and fill to TRUE.
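The effect of what, skip, and fill can be seen on a much smaller input. The following is a minimal sketch, assuming a made-up three-column file written to a temporary path (the file contents and the name toy are illustrative and are not taken from snp500.csv):

```r
# Write a tiny, made-up CSV to a temporary file (not the real snp500.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Open,Close",
  "2015-07-02,2078,2077",
  "2015-07-01,2067"        # short line: the Close field is missing
), tmp)

# what is a named list: each element gives a field's name and type
# ("" means character, 0 means numeric).
toy <- scan(tmp, sep = ",",
            what = list(Date = "", Open = 0, Close = 0),
            skip = 1,      # skip the header line
            fill = TRUE,   # pad short lines with empty fields
            quiet = TRUE)  # suppress the "Read N records" message

toy$Close                  # the second value is NA because the short line was padded
unlink(tmp)
```

Without fill = TRUE, scan would not treat the short line as a complete record, so padding is what keeps the fields aligned by row here.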
At this point, we can examine the data with some built-in functions. Here, we use mode to obtain the type of the object and str to display the structure of the data.
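Because scan returns a plain list rather than a data frame, it can help to check each field's type and, if needed, convert the result. The sketch below uses a small hand-built list as a stand-in for the scan output; the values, the name toy_df, and the conversion step are assumptions for illustration and are not part of the original recipe:

```r
# A stand-in for the list returned by scan(); values are illustrative only.
toy <- list(Date = c("2015-07-02", "2015-07-01"),
            Open = c(2078, 2067))

mode(toy)            # "list"
sapply(toy, mode)    # type of each field
str(toy)             # compact structure summary

# Assumption: a data frame is wanted downstream.
toy_df <- as.data.frame(toy, stringsAsFactors = FALSE)
toy_df$Date <- as.Date(toy_df$Date)   # parse the ISO-formatted date strings
```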
There's more…
Sometimes the fields are laid out in fixed-width columns rather than separated by a delimiter. To specify the width of each column, you can use the read.fwf function:
- First, you can use download.file to download weather.op from the author's GitHub page:
    > download.file("https://github.com/ywchiu/rcookbook/raw/master/chapter2/weather.op", "weather.op")
- You can then examine the data with the file editor:
Figure 5: Using the file editor to examine the file
- Read the data by specifying the width of each column in widths and the column names in col.names, and skip the first row by setting skip to 1:
    > weather <- read.fwf("weather.op", widths = c(6,6,10,11,9,8),
    +     col.names = c("STN","WBAN","YEARMODA","TEMP","MAX","MIN"), skip=1)
- Lastly, you can examine the data using the head and names functions:
    > head(weather)
       STN  WBAN YEARMODA      TEMP    MAX    MIN
    1 8403 99999 20140101   85.8 24 102.7*  69.3*
    2 8403 99999 20140102   86.3 24 102.9*  71.1*
    3 8403 99999 20140103   85.9 24 101.1*  72.0*
    4 8403 99999 20140104   85.6 24 102.7*  70.5*
    5 8403 99999 20140105   84.8 23 102.0*  66.6*
    6 8403 99999 20140106   86.8 23 102.0*  70.9*
    > names(weather)
    [1] "STN"      "WBAN"     "YEARMODA" "TEMP"     "MAX"
    [6] "MIN"
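The same read.fwf call can be tried without downloading anything. The following is a minimal sketch, assuming a made-up two-column file in which every field is exactly six characters wide (the file contents and the name toy_fwf are illustrative and are not taken from weather.op):

```r
# Write a tiny fixed-width file: two fields of 6 characters each.
tmp <- tempfile()
writeLines(c(
  "STN   TEMP  ",   # header line, skipped below
  "8403    85.8",
  "8404    86.3"
), tmp)

toy_fwf <- read.fwf(tmp,
                    widths = c(6, 6),               # characters per field
                    col.names = c("STN", "TEMP"),   # names for the two fields
                    skip = 1)                       # skip the header line

str(toy_fwf)
unlink(tmp)
```

The widths vector must add up to the layout of each line; if a width is off by even one character, values are cut in the wrong place, so it is worth checking the result with str or head.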