
Missing values

Another constraint of scikit-learn is that most of its estimators cannot handle data with missing values. Therefore, before going any further, we must check whether any of the columns in our dataset contain missing values. We can do this by using the following code: 

#Checking every column for missing values

df.isnull().any()

This produces one Boolean value per column, which is True if that column contains at least one missing value. 

Here we note that every column has some missing values. 
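To make the output of `df.isnull().any()` concrete, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical and are not taken from the fraud dataset. It also uses `isnull().sum()`, which counts how many values are missing in each column rather than only flagging whether any are.

```python
import numpy as np
import pandas as pd

# A tiny hypothetical DataFrame with gaps (NaN) in both columns
toy = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0, 75.0],
    "balance": [np.nan, 500.0, np.nan, 1200.0],
})

# True for every column that contains at least one missing value
has_missing = toy.isnull().any()
print(has_missing)

# Number of missing values in each column
missing_counts = toy.isnull().sum()
print(missing_counts)
```

On a real dataset, the counts are usually more informative than the True/False flags, since a column with a handful of gaps calls for different handling than one that is mostly empty.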

Missing values can be handled in a variety of ways, such as the following:

  • Median imputation
  • Mean imputation
  • Filling them with the majority value
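The three strategies listed above can be sketched with pandas `fillna()` as follows. The DataFrame, its column names, and its values are hypothetical and serve only to illustrate how each statistic is computed from the observed (non-missing) values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one categorical column with gaps
toy = pd.DataFrame({
    "amount": [10.0, np.nan, 30.0, 100.0],
    "type": ["CASH_OUT", "TRANSFER", None, "CASH_OUT"],
})

# Median imputation: the median of 10, 30 and 100 is 30, so the outlier 100 has little pull
median_filled = toy["amount"].fillna(toy["amount"].median())

# Mean imputation: the mean of 10, 30 and 100 is about 46.67
mean_filled = toy["amount"].fillna(toy["amount"].mean())

# Majority (most frequent) value: mode() returns a Series, so take its first entry
mode_filled = toy["type"].fillna(toy["type"].mode()[0])

print(median_filled.tolist())
print(mean_filled.tolist())
print(mode_filled.tolist())
```

Median and mean imputation only make sense for numerical columns, while filling with the majority value also works for categorical columns such as `type` here.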

The number of available techniques is quite large, and the right choice depends on the nature of your dataset. Handling features with missing values is one part of a broader process called feature engineering.

Feature engineering applies to both categorical and numerical columns, and explaining the various methodologies it comprises would require an entire book. 

Since this book focuses on applying the various machine learning algorithms that scikit-learn offers, feature engineering will not be covered. 

So, in keeping with the goals of this book, we will simply replace all the missing values with zero.

We can do this by using the following code: 

#Imputing the missing values with a 0

df = df.fillna(0)
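Zero-filling with pandas is one option; scikit-learn also ships its own imputer, `SimpleImputer`, whose `strategy="constant"` mode with `fill_value=0` should have the same effect on numeric columns. The sketch below uses a hypothetical two-column feature matrix and made-up labels. It first shows a `LogisticRegression` (chosen here only as a representative estimator) rejecting the raw data because of the NaN values, and then fitting once the gaps are filled.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# Hypothetical features with missing values, plus made-up binary labels
X = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0, 75.0],
    "balance": [np.nan, 500.0, 0.0, 1200.0],
})
y = [0, 1, 0, 1]

# Most scikit-learn estimators refuse input that contains NaN
try:
    LogisticRegression().fit(X, y)
    raised = False
except ValueError:
    raised = True
print("NaN input rejected:", raised)

# Replace every missing value with the constant 0 (returns a NumPy array)
imputer = SimpleImputer(strategy="constant", fill_value=0)
X_filled = imputer.fit_transform(X)

# The same estimator now fits without complaint
model = LogisticRegression().fit(X_filled, y)
print(model.predict(X_filled))
```

One advantage of the imputer over `fillna()` is that it can sit inside a scikit-learn `Pipeline`, so the same filling rule is applied to training and test data alike.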

We now have a dataset that is ready for machine learning with scikit-learn, and we will use it in all of the chapters that follow. To make it easy to reuse, we will export this dataset as a .csv file and store it in the same directory as the Jupyter Notebook you are working in.

We can do this by using the following code: 

#Exporting the dataset as a .csv file to the current working directory

df.to_csv('fraud_prediction.csv')

This will create a .csv file of this dataset in the directory that you are working in, which you can load back into the notebook using the pandas read_csv() function. 
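By default, `to_csv()` also writes the DataFrame's row index as an unnamed first column, so reading the file back with a plain `read_csv()` produces an extra column called `Unnamed: 0`. The sketch below shows this round trip on a hypothetical two-row DataFrame, writing to a temporary directory so that no real file is overwritten, and uses `index_col=0` to turn that first column back into the index.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned dataset
toy = pd.DataFrame({"amount": [100.0, 0.0], "isFraud": [0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fraud_prediction.csv")
    toy.to_csv(path)  # index=True by default, so the row index is written too

    # A plain read picks up the saved index as an extra column
    reloaded = pd.read_csv(path)
    # index_col=0 restores the first column as the index instead
    reloaded_clean = pd.read_csv(path, index_col=0)

print(list(reloaded.columns))
print(list(reloaded_clean.columns))
```

Alternatively, passing `index=False` to `to_csv()` skips writing the index altogether; either way, it is worth checking the columns after reloading so that a leftover index column does not end up being used as a feature.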
