- Statistics for Data Science
- James D. Miller
- 2021-07-02 14:58:50
Processing data
The processing (or transformation) of data is where the data scientist's programming skills come into play (although you will often find a data scientist performing some sort of processing in other steps, such as collecting, visualizing, or learning).
Keep in mind that processing within data science has many aspects. The most common are: formatting (and reformatting), which involves activities such as mechanically setting data types, aggregating values, and reordering or dropping columns; cleansing (addressing the quality of the data), which resolves such things as default or missing values and incomplete or inapposite values; and profiling, which adds context to the data by building a statistical understanding of it.
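To make the three aspects concrete, here is a minimal sketch in Python using only the standard library. The sample records, the column names, the use of `-1` as a default marker, and the choice of median imputation are all assumptions made for illustration; they are not taken from the text.

```python
import statistics

# Hypothetical raw records, as they might arrive from a text export:
# every value is a string, one sale is missing, one holds a default of -1.
raw_rows = [
    {"id": "1", "region": "east", "sales": "100.5", "notes": "ok"},
    {"id": "2", "region": "west", "sales": "", "notes": "late"},
    {"id": "3", "region": "east", "sales": "80", "notes": ""},
    {"id": "4", "region": "west", "sales": "-1", "notes": "placeholder"},
]

def format_rows(rows):
    """Formatting: set data types and drop the unneeded 'notes' column."""
    formatted = []
    for row in rows:
        formatted.append({
            "id": int(row["id"]),
            "region": row["region"],
            # Empty strings become None so missing values are explicit.
            "sales": float(row["sales"]) if row["sales"] else None,
        })
    return formatted

def cleanse_rows(rows, default_marker=-1.0):
    """Cleansing: replace missing or default values with the median
    of the known values (one possible strategy among many)."""
    known = [r["sales"] for r in rows
             if r["sales"] is not None and r["sales"] != default_marker]
    fill = statistics.median(known)
    cleaned = []
    for r in rows:
        sales = r["sales"]
        if sales is None or sales == default_marker:
            sales = fill
        cleaned.append({**r, "sales": sales})
    return cleaned

def profile_rows(rows):
    """Profiling: summarize the 'sales' column statistically."""
    sales = [r["sales"] for r in rows]
    return {
        "count": len(sales),
        "min": min(sales),
        "max": max(sales),
        "mean": statistics.mean(sales),
        "stdev": statistics.stdev(sales),
    }

formatted = format_rows(raw_rows)
cleaned = cleanse_rows(formatted)
profile = profile_rows(cleaned)
print(profile)
```

Each step is kept in its own function so that the stages stay separate. In practice the cleansing rule in particular depends on the data and the question being asked, so median imputation here should be read as a placeholder rather than a recommendation.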
The processing to be completed on the data can be simple (for example, a manual task requiring repetitious updates to data in an MS Excel worksheet), more complex (as with the use of programming languages such as R or Python), or more sophisticated still (as when processing logic is coded into routines that can be scheduled and rerun automatically on new populations of data).
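The last case, processing logic coded into a routine that can be rerun on new data, might look something like the following sketch. The function name `process_batch`, the `amount` field, and the batch labels are hypothetical; the point is only that the same logic is applied unchanged to each new population of data.

```python
def process_batch(rows):
    """Hypothetical reusable routine: convert amounts to numbers,
    skip rows with no amount, and return a small summary."""
    total = 0.0
    kept = 0
    for row in rows:
        amount = row.get("amount", "").strip()
        if not amount:
            continue  # skip rows where the amount is missing
        total += float(amount)
        kept += 1
    return {"rows_kept": kept, "total": total}

# Two hypothetical populations of data arriving at different times.
batches = {
    "2021-06": [{"amount": "10.0"}, {"amount": ""}, {"amount": "5.5"}],
    "2021-07": [{"amount": "7"}, {"amount": "3"}],
}

# The same routine is rerun, unchanged, on each batch.
results = {name: process_batch(rows) for name, rows in batches.items()}
print(results)
```

Once the logic lives in a function like this, a scheduler (cron, for example) could call the script whenever a new batch arrives, which is one way to get the automatic reruns described above.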