Book title: Learning pandas (Second Edition)
Author: Michael Heydt
Chapter word count: 227
Updated: 2021-07-02 20:36:57

Data manipulation

Data is distributed all over the planet. It is stored in many different formats and varies widely in quality. Because of this, we need tools and processes that pull data together into a form that can be used for decision making. Preparing data for analysis demands many different tasks and capabilities from a data manipulation tool. The features needed from such a tool include:

Programmability for reuse and sharing
Access to data from external sources
Storing data locally
Indexing data for efficient retrieval
Alignment of data in different sets based upon attributes
Combining data in different sets
Transformation of data into other representations
Cleaning cruft out of data
Effective handling of bad data
Grouping data into common baskets
Aggregation of data of like characteristics
Application of functions to calculate meaning or perform transformations
Querying and slicing to explore pieces of the whole
Restructuring into other forms
Modeling distinct categories of data such as categorical, continuous, discrete, and time series
Resampling data to different frequencies
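The sketch below shows how several of these capabilities map onto pandas operations: label-based indexing, alignment, combining, handling missing values, categorical data, grouping and aggregation, applying functions, slicing, restructuring, and resampling. It assumes a reasonably recent pandas release. The series names `a` and `b`, the `kind` column, and all values are made up for illustration and do not come from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings from two sources, indexed by date
# (a DatetimeIndex gives label-based, efficient retrieval).
dates = pd.date_range("2021-01-01", periods=6, freq="D")
a = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0], index=dates, name="a")
b = pd.Series([10.0, 20.0, 30.0], index=dates[::2], name="b")  # only every other day

# Alignment: arithmetic matches index labels, not positions.
# Dates present in only one series (or holding NaN) produce NaN.
aligned_sum = a + b

# Combining: place both series side by side (outer join on the date index).
df = pd.concat([a, b], axis=1).sort_index()

# Handling bad data: replace missing values with 0.
clean = df.fillna(0)

# Modeling categorical data: a hypothetical "kind" label per row.
clean["kind"] = pd.Categorical(["x", "y", "x", "y", "x", "y"])

# Grouping and aggregation: total of each column per category.
totals = clean.groupby("kind", observed=True)[["a", "b"]].sum()

# Applying a function: scale each column by its own maximum.
scaled = clean[["a", "b"]].apply(lambda col: col / col.max())

# Slicing by label to explore part of the whole (end label is inclusive).
first_three = clean.loc["2021-01-01":"2021-01-03"]

# Restructuring: wide format (one column per source) to long format.
long_form = (
    clean.rename_axis("date")
    .reset_index()
    .melt(id_vars=["date", "kind"], value_vars=["a", "b"])
)

# Resampling: aggregate the daily values into 2-day buckets.
two_day = clean[["a", "b"]].resample("2D").sum()
```

Note that `a + b` aligns on dates rather than on positions, so a date missing from `b` yields NaN instead of silently pairing values that belong to different days.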

Many data manipulation tools exist. They differ in which of the capabilities on this list they support, how they are deployed, and how their users work with them. These tools include relational databases (SQL Server, Oracle), spreadsheets (Excel), event processing systems (such as Spark), and more general-purpose tools such as R and pandas.
