- Practical Data Wrangling
- Allan Visochek
- 278 characters
- 2021-07-02 15:16:04
Shaping and structuring data
Preparing data for its end use often requires both structuring the data and organizing it in a way that suits that use.
To illustrate this, suppose you have a hierarchical dataset of city populations, as shown in Figure 01:

If the goal is to create a histogram of city populations, the format shown in Figure 01 would be hard to work with. Not only are the city populations nested within the data structure, but they are nested at varying depths. For the purposes of creating a histogram, it is better to represent the data as a flat list of numbers, as shown in Figure 02:
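The flattening step can be sketched in Python. Since Figure 01 is not reproduced here, the nested dictionary below is a made-up stand-in (the region, country, state, and city names and all numbers are hypothetical), and the sketch assumes that every leaf value is a population count. A recursive generator walks each branch to whatever depth it reaches and collects the leaves into a flat list:

```python
from typing import Any, Iterator

# Hypothetical stand-in for Figure 01: names and numbers are made up.
# Cities sit at different depths (under a state, a country, or directly under a region).
city_data = {
    "Region 1": {
        "Country A": {
            "State X": {"City A": 120_000, "City B": 45_000},
            "State Y": {"City C": 800_000},
        },
        "Country B": {"City D": 300_000},
    },
    "Region 2": {"City E": 60_000},
}


def iter_populations(node: Any) -> Iterator[int]:
    """Yield every leaf value (assumed to be a population) from a nested dict."""
    if isinstance(node, dict):
        # Interior node: recurse into each child, however deep it goes.
        for child in node.values():
            yield from iter_populations(child)
    else:
        # Leaf: assumed to be a population count.
        yield node


populations = list(iter_populations(city_data))
print(populations)  # [120000, 45000, 800000, 300000, 60000]
```

The resulting list could then be passed to a plotting routine such as matplotlib's `pyplot.hist()`. Recursion fits this problem because the depth varies by branch; a fixed number of nested loops would only work if every city sat at the same level.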

Making structural changes like this for large datasets requires you to build programs that can extract the data from one format and put it into another format. Shaping data is an important part of data wrangling because it ensures that the data is compatible with its intended use. In Chapter 4, Reading, Exploring, and Modifying Data - Part II, I will walk through exercises to convert between data formats.
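As a rough sketch of such a conversion program, the Python below turns the same kind of hypothetical nested dictionary into flat rows and writes them out as CSV using only the standard library. The column names (`city`, `parents`, `population`) and the choice to record each city's ancestors as a `/`-separated path are illustrative assumptions, not a format defined in the book:

```python
import csv
import io
from typing import Any, Dict, Iterator, Tuple

# Hypothetical nested input; names and numbers are made up.
nested = {
    "Country A": {
        "State X": {"City A": 120_000},
        "City B": 45_000,
    },
}


def iter_rows(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Dict[str, Any]]:
    """Flatten a nested dict into one row per leaf (leaf = a city's population).

    Assumes the top-level input is a dict, so every leaf has a non-empty path.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_rows(child, path + (key,))
    else:
        yield {
            "city": path[-1],                # innermost key names the city
            "parents": "/".join(path[:-1]),  # every key above it, as a path
            "population": node,
        }


def to_csv(data: Dict[str, Any]) -> str:
    """Write the flattened rows as CSV text: a header plus one line per city."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["city", "parents", "population"])
    writer.writeheader()
    writer.writerows(iter_rows(data))
    return buffer.getvalue()


print(to_csv(nested))
```

Keeping the ancestor path in its own column means no information from the hierarchy is lost, even though the output is a plain table.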
Changing the form of data does not necessarily involve changing its structure. It can also mean filtering the data entries, reducing (aggregating) the data by category, reordering the rows, or changing how the columns are set up.
All of these tasks are features of the dplyr package for R. In Chapter 7, Simplifying Data Manipulation with dplyr, I will show how to use dplyr to manipulate data easily and intuitively.
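The book covers these operations with dplyr in R. Purely to illustrate what each one does, here is a plain-Python sketch on a small made-up table (a list of dictionaries). The comments name the dplyr verbs that play a similar role, but this is not dplyr code, and the city and country names and numbers are hypothetical:

```python
from collections import defaultdict

# Made-up example table: one dict per city.
rows = [
    {"city": "City A", "country": "Country A", "population": 120_000},
    {"city": "City B", "country": "Country A", "population": 45_000},
    {"city": "City C", "country": "Country B", "population": 800_000},
    {"city": "City D", "country": "Country B", "population": 300_000},
]

# Filtering: keep only rows meeting a condition (similar role to dplyr's filter()).
large = [r for r in rows if r["population"] >= 100_000]

# Reducing by category: total population per country
# (similar role to group_by() followed by summarise()).
totals = defaultdict(int)
for r in rows:
    totals[r["country"]] += r["population"]

# Reordering rows: largest city first (similar role to arrange(desc(population))).
by_size = sorted(rows, key=lambda r: r["population"], reverse=True)

# Changing columns: keep two columns and rename them (similar role to select() with renaming).
slim = [{"name": r["city"], "pop": r["population"]} for r in rows]

print([r["city"] for r in large])    # ['City A', 'City C', 'City D']
print(dict(totals))                  # {'Country A': 165000, 'Country B': 1100000}
print([r["city"] for r in by_size])  # ['City C', 'City D', 'City A', 'City B']
print(slim[0])                       # {'name': 'City A', 'pop': 120000}
```

Each step leaves the overall shape (a flat table of records) unchanged, which is the distinction the paragraph draws between changing a dataset's form and changing its structure.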