
Introduction

Some studies estimate that data preparation activities account for 80 percent of the time invested in data science projects.

I know you will not be surprised to read this number. Data preparation is the phase of a data science project in which you take data from the chaotic world around you and fit it into precise structures and standards.

This is by no means a simple task. It involves a large number of techniques that let you change the structure of your data so that you can actually work with it.

This chapter presents recipes that will let you prepare the data you acquired in the previous chapter, no matter how it was structured when you imported it into R.

We will look at the two main activities performed during the data preparation phase:

  • Data cleansing: This involves identifying and treating outliers and missing values
  • Data manipulation: Here, the main aim is to make the data structure conform to specific rules so that it can be used for analysis