
  • Practical Data Wrangling
  • Allan Visochek
  • 278 words
  • 2021-07-02 15:16:04

Shaping and structuring data

Preparing data for its end use often requires both structuring and organizing it correctly.

To illustrate this, suppose you have a hierarchical dataset of city populations, as shown in Figure 01:

Figure 01: Hierarchical structure of the population of cities

If the goal is to create a histogram of city populations, the hierarchical format in Figure 01 would be hard to work with. Not only are the population values nested within the data structure, but they are nested to varying depths. For the purposes of creating a histogram, it is better to represent the data as a flat list of numbers, as shown in Figure 02:

Figure 02: List of populations for histogram visualization

Making structural changes like this for large datasets requires you to build programs that can extract the data from one format and put it into another format. Shaping data is an important part of data wrangling because it ensures that the data is compatible with its intended use. In Chapter 4, Reading, Exploring, and Modifying Data - Part II, I will walk through exercises to convert between data formats.
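As a rough illustration of the kind of program this involves, the following Python sketch flattens a nested structure into a plain list of populations. The layout of the nested data, the place names, and the numbers are all hypothetical placeholders, since the contents of Figure 01 are not reproduced here; the function name `iter_populations` is likewise made up for this example.

```python
from typing import Any, Iterator


def iter_populations(node: Any) -> Iterator[int]:
    """Yield every integer leaf found anywhere inside nested dicts and lists."""
    if isinstance(node, bool):
        return  # bool is a subclass of int, so skip it explicitly
    if isinstance(node, int):
        yield node
    elif isinstance(node, dict):
        for child in node.values():
            yield from iter_populations(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_populations(child)


# Hypothetical data: placeholder names and numbers, nested to different depths.
nested = {
    "Country A": {
        "Region 1": {"City W": 1200, "City X": 800},
        "City Y": 450,
    },
    "Country B": {"City Z": 3000},
}

populations = list(iter_populations(nested))
print(populations)  # [1200, 800, 450, 3000]
```

Because the function recurses until it reaches a number, it does not need to know in advance how deeply each city is nested. The resulting list is in the shape a histogram function typically expects.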

Changing the form of data does not necessarily involve changing its structure. It can also mean filtering the data entries, reducing the data by category, changing the order of the rows, or changing the way columns are set up.

All of the previously mentioned tasks are features of the dplyr package for R. In Chapter 7, Simplifying Data Manipulation with dplyr, I will show how to use dplyr to easily and intuitively manipulate data.
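To make the four operations concrete before reaching dplyr, here is a minimal sketch in plain Python (not dplyr, and not the approach used in Chapter 7). The records, column names, and threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records: placeholder cities, regions, and populations.
rows = [
    {"city": "City W", "region": "North", "population": 1200},
    {"city": "City X", "region": "South", "population": 800},
    {"city": "City Y", "region": "North", "population": 450},
    {"city": "City Z", "region": "South", "population": 3000},
]

# Filtering: keep only the entries that meet a condition.
large = [r for r in rows if r["population"] >= 500]

# Reducing by category: total population per region.
totals = defaultdict(int)
for r in rows:
    totals[r["region"]] += r["population"]

# Changing the order of the rows: largest population first.
ordered = sorted(rows, key=lambda r: r["population"], reverse=True)

# Changing the columns: drop "region" and add a derived column.
reshaped = [
    {"city": r["city"], "population_thousands": r["population"] / 1000}
    for r in rows
]

print([r["city"] for r in large])
print(dict(totals))
print([r["city"] for r in ordered])
print(reshaped[0])
```

In each case the data stays a flat table of rows; only which rows, which columns, and what order change.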
