Preparation

During preparation, raw data is made ready for exploration. This preparation is often a very interesting process. Raw data is very frequently fraught with all kinds of quality issues, and you will likely spend a substantial amount of your time dealing with them.

Why? There are a number of reasons:

  • The data is simply incorrect
  • Parts of the dataset are missing
  • Data is not represented using measurements appropriate for your analysis
  • The data is in formats not convenient for your analysis
  • Data is at a level of detail not appropriate for your analysis
  • Not all the fields you need are available from a single source
  • The representation of data differs depending upon the provider

The preparation process focuses on solving these issues. pandas provides many great facilities for preparing data, a task often referred to as tidying up data. These facilities include intelligent means of handling missing data, converting data types, converting between formats, changing the frequency of measurements, joining data from multiple datasets, mapping or converting symbols into shared representations, and grouping data, among many others. We will cover all of these in depth.
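
To make a few of these facilities concrete, here is a minimal sketch of a tidying pass. The data, the column names (`station`, `temp_f`, `city`), and the choice to fill a gap with the station mean are all assumptions made up for illustration; they do not come from a real dataset, and a real cleanup would depend on what the data means.

```python
import pandas as pd

# Hypothetical raw readings: temperatures arrive as text, one value is
# missing, and station codes are inconsistent in case.
raw = pd.DataFrame({
    "station": ["a", "A", "b", "B"],
    "temp_f": ["68.0", None, "50.0", "59.0"],
})

# Convert data types: text to float; unparseable or missing entries become NaN.
raw["temp_f"] = pd.to_numeric(raw["temp_f"], errors="coerce")

# Map symbols into a shared representation: upper-case station codes.
raw["station"] = raw["station"].str.upper()

# Handle missing data: fill each gap with the mean of its own station
# (one possible policy among many, chosen here only for the example).
station_mean = raw.groupby("station")["temp_f"].transform("mean")
raw["temp_f"] = raw["temp_f"].fillna(station_mean)

# Use measurements appropriate for the analysis: Fahrenheit to Celsius.
raw["temp_c"] = (raw["temp_f"] - 32) * 5 / 9

# Join data from a second (hypothetical) source that holds the city names.
stations = pd.DataFrame({
    "station": ["A", "B"],
    "city": ["Springfield", "Shelbyville"],
})
tidy = raw.merge(stations, on="station", how="left")

# Group data: average temperature per city.
mean_by_city = tidy.groupby("city")["temp_c"].mean()
print(mean_by_city)
```

Each step maps to one of the issues listed earlier: wrong types, inconsistent symbols, missing values, inappropriate units, and fields spread across more than one source.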