
Chapter 2. Preparing Your Data

The French term mise en place is used in professional kitchens to describe the practice of organizing and arranging ingredients so that they are ready to be used. It may be as simple as washing herbs and picking them into individual leaves or chopping vegetables, or as complicated as caramelizing onions or slow cooking meats.

In the same way, before we start cooking the data or building a predictive model, we need to prepare the ingredients: the data. Our preparation covers three different tasks:

  • Loading the data into the analytic tool
  • Exploring the data to understand it and to find quality problems with it
  • Transforming the data to fix the quality problems
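
As a rough illustration of these three tasks outside any particular tool, the following minimal Python sketch loads a tiny dataset, explores it for missing values, and transforms it to fill them in. The sample data, the function names, and the choice of the median as a fill value are all assumptions made for this example; they are not how Rattle performs these steps.

```python
import csv
import io
from statistics import median

# Hypothetical sample data; the empty age in the second row is a quality problem.
RAW_CSV = """name,age
Ana,34
Ben,
Cleo,29
"""

def load(text):
    """Load: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def count_missing(rows, column):
    """Explore: count the rows where the column is empty."""
    return sum(1 for row in rows if row[column].strip() == "")

def fill_missing_with_median(rows, column):
    """Transform: replace empty values with the median of the known values."""
    known = [float(row[column]) for row in rows if row[column].strip() != ""]
    fill = median(known)
    return [
        {**row, column: float(row[column]) if row[column].strip() else fill}
        for row in rows
    ]

rows = load(RAW_CSV)
missing_before = count_missing(rows, "age")
clean_rows = fill_missing_with_median(rows, "age")
```

Here the exploring step finds the problem (one missing age) and the transforming step fixes it, which is why the two tend to alternate in practice.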

We say that the quality of data is high when it's appropriate for a specific use. In this chapter, we'll describe characteristics of data related to its quality.

As we've seen, our mise en place has three steps. After loading the data, we need to explore it and transform it. Exploring and transforming is an iterative process, but in this book, we'll divide it into two different steps for clarity.

In this chapter, we'll discuss the following topics:

  • Datasets and types of variables
  • Data quality
  • Loading data into Rattle
  • Assigning roles to the variables
  • Transforming variables to solve data quality problems and to improve the format of the data for our predictive model

In this chapter, we'll cover how we explore the data to understand it and find quality problems.
