
Reading data from different sources

Importing data into R is straightforward and can be done from multiple sources. The most common format is comma-separated values (CSV), which the read.csv function reads with a single line of code. Depending on the quality of the data, it may or may not need further processing.

data <- read.csv("c:/local-data.csv")
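
As a minimal, self-contained sketch of the same call, the following writes a small CSV to a temporary file and reads it back. The file contents and the names csv_path and scores are made up for illustration and are not part of the original example.

```r
# Write a small CSV to a temporary file so the example runs anywhere.
# The columns and values are hypothetical sample data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,Ann,90.5",
             "2,Bob,78.0"), csv_path)

# read.csv assumes a header row and a comma separator by default.
scores <- read.csv(csv_path)

# Inspect the column types before deciding whether more processing is needed.
str(scores)
```

Checking the result with str is a quick way to see whether each column was parsed as the type you expected.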

A similar function is read.csv2. It also reads CSV files, but it is intended for locales, such as many European countries, where the comma is the decimal mark and the semicolon is the field separator. Data can also be read into R with other functions, such as read.table and read.delim. By default, read.delim reads tab-delimited files, and read.table can read any delimited text file when given suitable arguments:

data <- read.delim("local-data.txt", header=TRUE, sep="\t")
data <- read.table("local-data.txt", header=TRUE, sep="\t")
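
To make the differences concrete, here is a small sketch that builds two temporary files, one in the semicolon and decimal-comma style and one tab-delimited, and reads them with the functions described above. The file contents and variable names are assumptions chosen for illustration.

```r
# Semicolon-separated file with a comma as the decimal mark (hypothetical data).
eu_path <- tempfile(fileext = ".csv")
writeLines(c("item;price", "tea;2,50", "coffee;3,75"), eu_path)

# read.csv2 defaults to sep = ";" and dec = ",", so price is parsed as numeric.
eu <- read.csv2(eu_path)

# Tab-delimited file (hypothetical data).
tab_path <- tempfile(fileext = ".txt")
writeLines(c("item\tqty", "tea\t4", "coffee\t7"), tab_path)

# read.delim already assumes a header and a tab separator;
# read.table needs both stated explicitly to read the same file.
via_delim <- read.delim(tab_path)
via_table <- read.table(tab_path, header = TRUE, sep = "\t")
```

For a simple file like this one, both calls should produce the same data frame; the functions differ mainly in their default arguments.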

All the preceding functions accept multiple parameters that describe the format of the data source. Some of these parameters are as follows:

  • header: This is a logical value indicating whether the first line of the file contains the column names. It defaults to TRUE for read.csv and read.delim. For read.table, the first line is treated as a header only if it has one fewer field than the remaining lines, unless header is set explicitly.
  • sep: This defines the field separator in the file. By default, the separator is a comma for read.csv, a tab for read.delim, and white space for read.table.
  • nrows: This specifies the maximum number of rows to read from the file. By default, the entire file is read.
  • row.names: This specifies the row names. It can be a single number or name identifying the column that holds them (1 represents the first column), or a vector of the row names themselves. When it is set to NULL, the rows are numbered.
  • fill: When set to TRUE, rows with fewer fields than the others are padded with blank fields, so data with unequal row lengths can be read.

These are some of the common parameters used along with the functions to read the data from a file.
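
The parameters listed above can be combined in a single call. The sketch below uses the text argument of read.table to read from an in-memory character vector rather than a file; the sample rows and the names raw and cities are hypothetical.

```r
# Hypothetical sample data; the second data row is missing its last field.
raw <- c("id,city,temp",
         "a1,Oslo,4.5",
         "a2,Rome",
         "a3,Cairo,29.1",
         "a4,Lima,19.0")

# header    : the first line holds the column names
# sep       : fields are separated by commas
# nrows     : read at most three data rows
# row.names : use the first column (id) as row names
# fill      : pad the short row with NA instead of stopping with an error
cities <- read.table(text = raw, header = TRUE, sep = ",",
                     nrows = 3, row.names = 1, fill = TRUE)

print(cities)
```

Because row.names takes the first column, the resulting data frame keeps only city and temp as columns, and the row for a2 carries NA in temp.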

We have so far explored reading data from a delimited file. In addition to this, we can read data in Excel formats as well. This can be achieved using the xlsx or XLConnect packages. We will see how to use one of these packages in order to read a worksheet from a workbook:

install.packages("xlsx")
library(xlsx)
mydata <- read.xlsx("DTH AnalysisV1.xlsx", 1)
head(mydata)

In the preceding code, we first installed the xlsx package, which is required to read Excel files. We loaded the package using the library function, then used the read.xlsx function to read the Excel file, passing an additional argument, 1, that specifies the index of the sheet to read.
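
A sheet can also be selected by name through the sheetName argument of read.xlsx. The following is a sketch under a few assumptions: it runs only if the xlsx package (which depends on Java) is available, and the workbook, the sheet name plans, and the sample data are all made up so that the example does not depend on the file used above.

```r
# Run only when the xlsx package and its Java dependency can be loaded.
if (requireNamespace("xlsx", quietly = TRUE)) {
  # Create a throwaway workbook with one sheet named "plans" (hypothetical data).
  xlsx_path <- tempfile(fileext = ".xlsx")
  sample_plans <- data.frame(plan = c("basic", "premium"), price = c(10, 25))
  xlsx::write.xlsx(sample_plans, xlsx_path, sheetName = "plans",
                   row.names = FALSE)

  # The same sheet, selected by position and then by name.
  by_index <- xlsx::read.xlsx(xlsx_path, sheetIndex = 1)
  by_name  <- xlsx::read.xlsx(xlsx_path, sheetName = "plans")
}
```

Selecting by name tends to be more robust than selecting by index when sheets may be reordered in the workbook.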
