- Statistics for Data Science
- James D. Miller
- 194 words
- 2021-07-02 14:58:50
Processing data
The processing (or transformation) of data is where the data scientist's programming skills come into play (although you will often find a data scientist performing some sort of processing in other steps, such as collecting, visualizing, or learning).
Keep in mind that processing in data science has many aspects. The most common are: formatting (and reformatting), which involves activities such as mechanically setting data types, aggregating values, and reordering or dropping columns; cleansing (addressing the quality of the data), which deals with such things as default or missing values and incomplete or inappropriate values; and profiling, which adds context to the data by building a statistical understanding of it.
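The three aspects above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the records, the column names (`id`, `region`, `sales`, `notes`), and the choice to fill missing values with the median are all assumptions made for the example, not something the text prescribes.

```python
import statistics

# Hypothetical raw records, as they might arrive from a CSV export (all strings).
raw_rows = [
    {"id": "1", "region": "east", "sales": "100.5", "notes": "ok"},
    {"id": "2", "region": "west", "sales": "", "notes": "n/a"},
    {"id": "3", "region": "east", "sales": "80", "notes": ""},
    {"id": "4", "region": "west", "sales": "120", "notes": "ok"},
]


def format_rows(rows):
    """Formatting: set data types and drop the unused 'notes' column."""
    return [
        {
            "id": int(r["id"]),
            "region": r["region"],
            "sales": float(r["sales"]) if r["sales"] else None,
        }
        for r in rows
    ]


def cleanse_rows(rows):
    """Cleansing: replace missing sales with the median of the known values."""
    known = [r["sales"] for r in rows if r["sales"] is not None]
    fill = statistics.median(known)  # assumed fill strategy, one of many
    return [{**r, "sales": fill if r["sales"] is None else r["sales"]} for r in rows]


def profile_rows(rows):
    """Profiling: summary statistics plus an aggregate total per region."""
    sales = [r["sales"] for r in rows]
    by_region = {}
    for r in rows:
        by_region[r["region"]] = by_region.get(r["region"], 0.0) + r["sales"]
    return {
        "count": len(sales),
        "mean": statistics.mean(sales),
        "min": min(sales),
        "max": max(sales),
        "total_by_region": by_region,
    }


formatted = format_rows(raw_rows)
cleansed = cleanse_rows(formatted)
profile = profile_rows(cleansed)
print(profile)
```

In practice each step would likely be driven by what profiling reveals, so the three functions are often run more than once rather than strictly in sequence.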
The processing to be completed on the data can be simple (for example, a manual task requiring repetitive updates to data in an MS Excel worksheet), more complex (as with the use of programming languages such as R or Python), or more sophisticated still (as when processing logic is coded into routines that can then be scheduled and rerun automatically on new populations of data).
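As a rough sketch of the most sophisticated case, processing logic can be wrapped in a routine that is rerun whenever new data arrives. The function names (`process_batch`, `process_all`), the `id` and `sales` columns, and the rule of skipping rows with a missing value are hypothetical choices for illustration; scheduling itself would be left to an external tool.

```python
import csv
from pathlib import Path


def process_batch(in_path, out_path):
    """Read one CSV batch, convert 'sales' to float, skip rows where it is
    missing, and write the cleaned rows. Returns the number of rows written."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["id", "sales"])
        writer.writeheader()
        written = 0
        for row in reader:
            value = (row.get("sales") or "").strip()
            if not value:
                continue  # assumed rule: drop rows with no sales figure
            writer.writerow({"id": row["id"], "sales": float(value)})
            written += 1
    return written


def process_all(in_dir, out_dir):
    """Apply process_batch to every *.csv file in in_dir.

    Rerunning this on the same folder simply reprocesses whatever files are
    there, so it can be called again each time a new population of data lands.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return {
        p.name: process_batch(p, out_dir / p.name)
        for p in sorted(Path(in_dir).glob("*.csv"))
    }
```

Because the logic lives in one function rather than in manual worksheet edits, the same rules are applied identically to every batch.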