Preparation

During preparation, raw data is made ready for exploration. This is often an interesting process in its own right. Raw data is very frequently fraught with quality issues, and you will likely spend a substantial share of your time dealing with them.

Why? There are a number of reasons:

  • The data is simply incorrect
  • Parts of the dataset are missing
  • Data is not represented using measurements appropriate for your analysis
  • The data is in formats not convenient for your analysis
  • Data is at a level of detail not appropriate for your analysis
  • Not all the fields you need are available from a single source
  • The representation of data differs depending upon the provider

The preparation process focuses on solving these issues. pandas provides many facilities for preparing data, a task often referred to as tidying up data. These facilities include intelligent handling of missing data, converting data types, converting between formats, changing the frequency of measurements, joining data from multiple datasets, mapping or converting symbols into shared representations, and grouping data, among many others. We will cover all of these in depth.
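
To make the list of facilities concrete, here is a minimal sketch that touches several of them in a few lines. The data, column names, and station codes are all hypothetical and invented for illustration; they do not come from any real dataset, and each step is only one of several ways pandas could handle the issue.

```python
import pandas as pd

# Hypothetical raw readings: temperatures stored as strings, one value
# missing, and the same station written two different ways.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 12:00",
        "2024-01-02 00:00", "2024-01-02 12:00",
    ]),
    "station": ["A", "stn-A", "A", "stn-A"],
    "temp_f": ["32.0", None, "50.0", "68.0"],
})

# Convert data types: strings become floats, None becomes NaN.
raw["temp_f"] = pd.to_numeric(raw["temp_f"])

# Handle missing data: fill the gap by linear interpolation.
raw["temp_f"] = raw["temp_f"].interpolate()

# Use measurements appropriate for the analysis: Fahrenheit to Celsius.
raw["temp_c"] = (raw["temp_f"] - 32) * 5 / 9

# Map differing symbols into a shared representation.
raw["station"] = raw["station"].map({"A": "A", "stn-A": "A"})

# Change the frequency of measurements: 12-hourly readings to daily means.
daily = raw.set_index("timestamp")["temp_c"].resample("D").mean()

# Join data from a second (hypothetical) source that holds the region.
stations = pd.DataFrame({"station": ["A"], "region": ["north"]})
merged = raw.merge(stations, on="station", how="left")

# Group the combined data and summarize it.
by_region = merged.groupby("region")["temp_c"].mean()

print(daily)
print(by_region)
```

Interpolation is only one possible policy for missing values; dropping the rows or filling with a constant may suit other analyses better.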