Data Wrangling with R

"You can have data without information, but you cannot have information without data."
– Daniel Keys Moran

Data wrangling has long been one of R's core strengths, owing to its relatively fast, on-demand in-memory processing and a wide array of packages that speed up the data curation that wrangling involves.

R is especially valuable when working with datasets of more than about 1 million rows (a Microsoft Excel worksheet is limited to 1,048,576 rows) or with files on the order of gigabytes. Because it offers easy-to-use functions for common day-to-day tasks such as aggregations, joins, and pivots, R is also arguably simpler to use than some of the GUI-based tools available for similar tasks.
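As a rough illustration of the three everyday tasks just mentioned, the following sketch uses only base R functions (`aggregate()`, `merge()`, and `xtabs()`). The data frames `sales` and `managers`, along with all of their column names and values, are hypothetical and exist only for this example.

```r
# Hypothetical toy data; names and values are illustrative only.
sales <- data.frame(
  region  = c("East", "East", "West", "West"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  revenue = c(100, 150, 80, 120)
)
managers <- data.frame(
  region  = c("East", "West"),
  manager = c("Avery", "Blake")
)

# Aggregation: total revenue per region
totals <- aggregate(revenue ~ region, data = sales, FUN = sum)

# Join: attach the manager to each region's total
joined <- merge(totals, managers, by = "region")

# Pivot: one row per region, one column per quarter
wide <- xtabs(revenue ~ region + quarter, data = sales)

print(joined)
print(wide)
```

Packages such as dplyr and data.table offer their own, often more concise, equivalents of these operations; base R is used here only so that the sketch runs without installing anything.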

At a high level, the core categories of data wrangling with R are data extraction, data cleansing, data transformation, and data consolidation. This is a simplified categorization of the basic tenets of data wrangling, and we'll delve deeper into each of these areas in the next few sections. The challenge arises largely because data comes in a range of data types and data formats from a diverse pool of data sources. Here, data type refers to the characteristics of the contents of the files, format refers to the file format in which the data is delivered, and source refers to the systems from which you receive the data. There is no universal convention for these: the data may exist in a CSV file or a binary SAS file, or be present in a database, each of which can have its own nuances and challenges.
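To make the distinction between format and data type concrete, here is a minimal sketch that assumes a small CSV file written to a temporary location. The file contents and the column names `id`, `name`, and `joined` are invented for illustration. It shows that the file format (CSV) says little about the data types R infers on import, so a date column arrives as plain text and has to be converted explicitly.

```r
# Write a small, hypothetical CSV file to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,name,joined",
    "1,Ann,2020-01-15",
    "2,Bob,2021-06-30"),
  csv_path
)

# Extraction: read the file and record the types R inferred.
raw <- read.csv(csv_path, stringsAsFactors = FALSE)
col_types <- sapply(raw, class)
print(col_types)

# Cleansing: read.csv() leaves the date column as text, so convert it.
raw$joined <- as.Date(raw$joined)
str(raw)

# Remove the temporary file.
unlink(csv_path)
```

Other formats and sources, such as a binary SAS file or a database table, need their own readers and typically raise different type-handling questions.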

In this chapter, we will cover the following topics:

  • Introduction to data wrangling with R
  • The foundational tools of data wrangling: dplyr, data.table, and others
  • ETL with R data extraction
  • ETL with R data transformation
  • ETL with R data load
  • Helpful data wrangling tools for everyday use
  • Tutorial