Converting rectangular data into R with the readr R package

Tabular data, or flat rectangular data, comes in many different formats, including CSV and TSV. R's readr package provides an easy and flexible way to import these formats into R. It also fails gracefully if there are issues with the data you are trying to import. You can load the readr package with the following command:

library(readr)

The simplest way to import data with the readr package is to call the read function that matches the type of file you are reading, such as read_csv for CSV files or read_tsv for TSV files. For example, the following screenshot shows a CSV file containing data about automobiles. This data is also bundled as an example dataset with the readr package:

Use the following command to read the CSV file. readr guesses the type of each column and prints the result:

read_csv("mtcars.csv")
#> Parsed with column specification:
#> cols(
#> mpg = col_double(),
#> cyl = col_double(),
#> disp = col_double(),
#> hp = col_double(),
#> drat = col_double(),
#> wt = col_double(),
#> qsec = col_double(),
#> vs = col_double(),
#> am = col_double(),
#> gear = col_double(),
#> carb = col_double()
#> )

Here, we have a CSV data file, so we used the read_csv function and passed the path to the file as its argument.
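The idea of matching the read function to the file type can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the two temporary files and their contents are made up for the example, and only read_csv, read_tsv, and read_delim come from readr.

```r
library(readr)

# Write two small illustrative files to temporary locations.
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("model,mpg", "Mazda RX4,21.0", "Datsun 710,22.8"), csv_path)
writeLines(c("model\tmpg", "Mazda RX4\t21.0", "Datsun 710\t22.8"), tsv_path)

# Pick the reader that matches the file format.
cars_csv <- read_csv(csv_path)                   # comma-separated values
cars_tsv <- read_tsv(tsv_path)                   # tab-separated values
cars_delim <- read_delim(csv_path, delim = ",")  # explicit delimiter
```

All three calls should produce a tibble with the same two columns; read_delim is the general form to reach for when the delimiter is neither a comma nor a tab.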

readr returns a tibble after reading in the data, and it also prints the column specification. A tibble is a modern variant of the data frame that stores values in rows and columns. Here, we use readr_example to get the path to a data file bundled with readr, read the file, and save the resulting tibble in a variable:

cars_data <- read_csv(readr_example("mtcars.csv"))

As it reads the data, readr prints the column specification it guessed. This console output is useful for debugging. If you notice that any column has been given the wrong type, you can copy the specification, edit the affected columns, and pass it to the col_types argument in a new call. The printed specification looks as follows:

#> Parsed with column specification:
#> cols(
#> mpg = col_double(),
#> cyl = col_double(),
#> disp = col_double(),
#> hp = col_double(),
#> drat = col_double(),
#> wt = col_double(),
#> qsec = col_double(),
#> vs = col_double(),
#> am = col_double(),
#> gear = col_double(),
#> carb = col_double()
#> )
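One way to reuse the printed specification in a new call is sketched below. Treating cyl and gear as integers is only an illustrative choice, not something the dataset requires, and cars_spec is a name introduced for the example.

```r
library(readr)

# Start from the printed specification and change only what needs changing.
# .default sets the type for every column that is not listed explicitly.
cars_spec <- cols(
  .default = col_double(),
  cyl = col_integer(),   # illustrative override
  gear = col_integer()   # illustrative override
)

cars_data <- read_csv(readr_example("mtcars.csv"), col_types = cars_spec)

# spec() returns the specification that was used for the tibble.
spec(cars_data)
```

Because col_types is supplied, readr no longer has to guess, so the import behaves the same way every time the file is read.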

The read_csv function uses the first line of the CSV file as the column names. Sometimes, however, the first few lines of a data file contain extra information and the column names start further down. We can use the skip parameter to skip a given number of lines, as follows:

read_csv("data.csv", skip = 2)

For example, in the preceding code, we skipped the first two lines of the file and asked readr to start reading from the third line.
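To see skip in action without needing a data.csv on disk, the following sketch writes a small file with two lines of preamble and then reads it. The file contents are invented for the illustration.

```r
library(readr)

# A hypothetical export with two lines of notes above the header row.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Exported from a hypothetical fleet database",
  "Generated for illustration only",
  "model,mpg",
  "Mazda RX4,21.0",
  "Datsun 710,22.8"
), path)

# Skip the two preamble lines so the third line becomes the header.
fleet <- read_csv(path, skip = 2)
names(fleet)
```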

Sometimes, the data doesn't have column names. In that case, we can pass the col_names = FALSE argument to the read_csv function. This tells readr to treat the first line as data rather than as a header, and the columns are named X1, X2, and so on:

read_csv("data.csv", col_names = FALSE)
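The sketch below shows both ways of handling a file without a header: letting readr generate names, or supplying them as a character vector through col_names. The file contents and the names model and mpg are assumptions made for the example.

```r
library(readr)

# A hypothetical file that starts directly with data.
path <- tempfile(fileext = ".csv")
writeLines(c("Mazda RX4,21.0", "Datsun 710,22.8"), path)

# Let readr generate placeholder column names.
no_header <- read_csv(path, col_names = FALSE)
names(no_header)  # X1, X2

# Or provide the column names explicitly.
named <- read_csv(path, col_names = c("model", "mpg"))
names(named)
```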

readr functions support passing in column specifications to customize the data you are reading. For example, you can specify the type of each column with the col_types argument. It is often a good idea to specify the column types explicitly, because readr then reports values that do not match instead of silently guessing a different type:

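cars_data <- read_csv(readr_example("mtcars.csv"), col_types = "ddddddddddd")

Here, each d in the string sets one column to type double, so the string contains one character for each of the 11 columns in the file. The following are the column types supported by readr, with the single-character abbreviation in brackets: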

  • col_logical() [l]: Logical values, such as T, F, TRUE, or FALSE
  • col_integer() [i]: Integers
  • col_double() [d]: Doubles
  • col_euro_double() [e]: Doubles that use , as the decimal separator; this type is found only in early readr releases, and later versions set the decimal mark through the locale argument instead
  • col_date() [D]: Dates in Y-m-d format
  • col_datetime() [T]: ISO 8601 date times
  • col_character() [c]: Strings, used for everything else
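The abbreviations in the list above can be combined into one compact col_types string. The following sketch assumes a made-up semicolon-separated file that uses a decimal comma, and it relies on locale(decimal_mark = ",") rather than col_euro_double, since that is how recent readr versions handle European-style numbers.

```r
library(readr)

# A hypothetical European-style export: ; as delimiter, , as decimal mark.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "model;automatic;gears;price;registered",
  "Mazda RX4;TRUE;4;21499,50;2019-03-01",
  "Datsun 710;FALSE;4;18250,00;2018-11-15"
), path)

cars_eu <- read_delim(
  path,
  delim = ";",
  col_types = "clidD",                 # character, logical, integer, double, date
  locale = locale(decimal_mark = ",")  # parse 21499,50 as 21499.5
)
```

Each character in the col_types string lines up with one column, in order, so the string must have as many characters as the file has columns.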

There are a lot of other self-explanatory options available when reading data with the read_csv function. For example, a fully loaded read_csv call will look like this:

read_csv(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\"", comment = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = show_progress(), skip_empty_rows = TRUE)

The following are the parameters used in the preceding code:

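  • file: This is the path to the file to read
  • col_names: This controls the column names: TRUE uses the first line of the file, FALSE generates names, and a character vector supplies them directly
  • col_types: This sets the type of each column; NULL means that readr guesses the types from the data
  • locale: This represents the locale to use, which controls defaults such as the decimal mark, encoding, and time zone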

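The remaining parameters are secondary options. They control which strings count as missing values, how quotes and comments are handled, whether whitespace is trimmed, and how many lines are skipped, read, or used for guessing column types.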
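As a brief illustration of two of the secondary parameters, the sketch below uses na to treat an extra placeholder as missing and n_max to limit the number of rows read. The file contents and the use of - as a missing-value marker are assumptions made for the example.

```r
library(readr)

# A hypothetical file in which "-" marks a missing measurement.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "model,mpg",
  "Mazda RX4,21.0",
  "Datsun 710,-",
  "Valiant,18.1"
), path)

first_two <- read_csv(
  path,
  na = c("", "NA", "-"),  # strings to interpret as missing values
  n_max = 2               # read at most two data rows
)
```

Limiting n_max is a convenient way to preview a large file before committing to a full import.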
