- Python: Advanced Predictive Analytics
- Ashish Kumar, Joseph Babcock
- 2021-07-02 20:09:23
Summary
The main learning outcomes of this chapter are summarized as follows:
- Various methods and variations in importing a dataset using pandas: read_csv and its variations, reading a dataset using the open method in Python, reading a file in chunks using the open method, reading directly from a URL, specifying the column names from a list, changing the delimiter of a dataset, and so on.
- Basic exploratory analysis of data: observing a thumbnail of data, shape, column names, column types, and summary statistics for numerical variables.
- Handling missing values: the reasons missing values arise, why it is important to treat them properly, how to treat them by deletion or imputation, and various methods of imputing data.
- Creating dummy variables: creating dummy variables for categorical variables to be used in the predictive models.
- Basic plotting: scatter plots, histograms, and boxplots; their meaning and relevance; and how to plot them.
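As a quick recap, the importing, exploration, missing-value, and dummy-variable points above can be sketched in a few lines of pandas. This is a minimal illustration, not the chapter's own code: the data, the column names (name, age, city), and the semicolon delimiter are all invented, and an in-memory string stands in for a file on disk. Chunked reading is shown with the chunksize parameter of read_csv rather than with the open method.

```python
import io

import pandas as pd

# Hypothetical data standing in for a file on disk; the values and the
# column names are invented for illustration only.
RAW = "Ann;34;NY\nBob;;LA\nCara;29;NY\nDan;41;\n"
COLUMNS = ["name", "age", "city"]

# read_csv with a non-default delimiter and column names taken from a list.
df = pd.read_csv(io.StringIO(RAW), sep=";", header=None, names=COLUMNS)

# Chunked reading: with chunksize, read_csv yields DataFrames of up to 2 rows.
chunk_rows = [
    len(chunk)
    for chunk in pd.read_csv(
        io.StringIO(RAW), sep=";", header=None, names=COLUMNS, chunksize=2
    )
]

# Basic exploration: thumbnail, shape, column names, types, summary statistics.
print(df.head())
print(df.shape)
print(df.columns.tolist())
print(df.dtypes)
print(df.describe())

# Missing values: count them, then either delete rows or impute.
missing_per_column = df.isnull().sum()
dropped = df.dropna()  # deletion: keep only complete rows
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())  # mean imputation
imputed["city"] = imputed["city"].fillna("unknown")  # placeholder category

# Dummy variables: one indicator column per category of "city".
dummies = pd.get_dummies(imputed["city"], prefix="city")
model_ready = pd.concat([imputed.drop(columns="city"), dummies], axis=1)
print(model_ready)
```

Mean imputation and a placeholder category are only two of several possible choices; which one is appropriate depends on the variable and on why the values are missing.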
This chapter is a head start on our journey to explore our data and wrangle it into a form fit for modelling. The next chapter goes deeper in this pursuit: we will learn to aggregate values for categorical variables, subset the dataset, merge two datasets, generate random numbers, and sample a dataset.
Cleaning, as we have seen in the last chapter, takes about 80% of the modelling time, so it is of critical importance, and the methods we are learning will come in handy in the pursuit of that goal.