- Python: Advanced Predictive Analytics
- Ashish Kumar, Joseph Babcock
- 2021-07-02 20:09:23
Summary
The main learning outcomes of this chapter are summarized as follows:
- Various methods and variations in importing a dataset using pandas: `read_csv` and its variations, reading a dataset using the `open` method in Python, reading a file in chunks using the `open` method, reading directly from a URL, specifying the column names from a list, changing the delimiter of a dataset, and so on.
- Basic exploratory analysis of data: observing a thumbnail of the data, its shape, column names, column types, and summary statistics for numerical variables.
- Handling missing values: why missing values occur in a dataset, why it is important to treat them properly, how to treat them by deletion or imputation, and various methods of imputing data.
- Creating dummy variables: creating dummy variables for categorical variables so that they can be used in predictive models.
- Basic plotting: scatter plots, histograms, and boxplots; their meaning and relevance; and how they are plotted.
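As a compact reminder of how several of these steps fit together, here is a minimal sketch. The data, the column names (`name`, `age`, `city`), and the choice of mean imputation are assumptions made for illustration; they are not the dataset or the exact calls used in the chapter. An in-memory string stands in for a file on disk or at a URL.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data standing in for a real CSV file.
RAW = """name;age;city
Ann;34;Paris
Bob;;London
Cid;29;Paris
Dee;41;
"""

# Import: change the delimiter and supply the column names from a list.
columns = ["name", "age", "city"]
df = pd.read_csv(io.StringIO(RAW), sep=";", header=0, names=columns)

# Basic exploration: thumbnail, shape, column types, summary statistics.
first_rows = df.head()
n_rows, n_cols = df.shape
column_types = df.dtypes
summary = df.describe()

# Missing values: count them, then either delete or impute.
missing_per_column = df.isnull().sum()
dropped = df.dropna()  # deletion: keep only complete rows
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())  # mean imputation
imputed["city"] = imputed["city"].fillna("Unknown")  # constant imputation

# Dummy variables: one indicator column per category of `city`.
dummies = pd.get_dummies(imputed["city"], prefix="city")
model_ready = pd.concat([imputed.drop(columns="city"), dummies], axis=1)
```

Deletion is the simpler option but discards half of this toy dataset, which is one reason imputation is often preferred when data is scarce.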
This chapter is a first step in our journey to explore our data and wrangle it until it is ready for modelling. The next chapter goes deeper in this pursuit: we will learn to aggregate values for categorical variables, subset a dataset, merge two datasets, generate random numbers, and sample a dataset.
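For a rough idea of what those operations look like in pandas, here is a small preview sketch. The `sales` and `managers` tables, their columns, and the seed values are hypothetical, and the next chapter may well use different functions or arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables used only for this preview.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "units": [10, 4, 7, 3, 8, 5],
})
managers = pd.DataFrame({
    "region": ["North", "South", "East"],
    "manager": ["Asha", "Ben", "Chen"],
})

# Aggregate values for a categorical variable.
units_by_region = sales.groupby("region")["units"].sum()

# Subset the dataset with a boolean condition.
large_orders = sales[sales["units"] > 5]

# Merge two datasets on a shared key.
merged = sales.merge(managers, on="region", how="left")

# Generate random numbers reproducibly.
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=len(sales))

# Draw a random sample of rows.
sampled = sales.sample(n=3, random_state=0)
```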
Cleaning, as we saw in the last chapter, takes about 80% of the modelling time, so it is of critical importance, and the methods we are learning will come in handy in that effort.