
Missing values

Another constraint of scikit-learn is that most of its estimators cannot handle data with missing values. We must therefore check whether any column in our dataset contains missing values. We can do this by using the following code:

#Checking every column for missing values

df.isnull().any()

The output contains one Boolean value per column, which is True if that column has at least one missing value.

Here, every column is flagged as True, so every column has some missing values.
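A Boolean flag per column does not show how much data is missing. As a minimal sketch, assuming a small made-up DataFrame (the column names and values below are hypothetical and are not the chapter's dataset), the missing entries per column can be counted like this:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the chapter's DataFrame
toy = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0, np.nan],
    "type": ["PAYMENT", "TRANSFER", None, "PAYMENT"],
    "step": [1, 2, 3, 4],
})

# Number of missing entries in each column
missing_counts = toy.isnull().sum()

# Fraction of missing entries in each column (the mean of a Boolean mask)
missing_share = toy.isnull().mean()

print(missing_counts)
print(missing_share)
```

Running the same two calls on df would show which columns are affected most before choosing how to fill them.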

Missing values can be handled in a variety of ways, such as the following:

  • Median imputation
  • Mean imputation
  • Filling them with the majority value
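The three options above can be sketched with plain pandas. This is only an illustration on a hypothetical toy DataFrame (the column names and values are assumptions), not the approach taken later in this chapter:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the chapter's dataset
toy = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0, 400.0],
    "balance": [10.0, 20.0, np.nan, 60.0],
    "type": ["PAYMENT", None, "PAYMENT", "TRANSFER"],
})

imputed = toy.copy()

# Median imputation for a numerical column
imputed["amount"] = imputed["amount"].fillna(imputed["amount"].median())

# Mean imputation for a numerical column
imputed["balance"] = imputed["balance"].fillna(imputed["balance"].mean())

# Majority-value (mode) imputation for a categorical column
imputed["type"] = imputed["type"].fillna(imputed["type"].mode()[0])

print(imputed)
```

The median is less sensitive to extreme values than the mean, which is one reason it is often preferred for skewed numerical columns.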

The number of available techniques is large, and the right choice depends on the nature of your dataset. Handling features with missing values in this way is one part of feature engineering.

Feature engineering can be done for both categorical and numerical columns and would require an entire book to explain the various methodologies that comprise the topic. 

Since this book focuses on applying the machine learning algorithms that scikit-learn offers, feature engineering will not be covered.

To stay in line with that goal, we will simply impute all the missing values with a zero.

We can do this by using the following code: 

#Imputing the missing values with a 0

df = df.fillna(0)
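It can be worth confirming that no missing values remain after the fill. A minimal sketch on a hypothetical numerical toy DataFrame (the names and values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical numerical toy data with gaps
toy = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0],
    "balance": [np.nan, 20.0, 30.0],
})

# Replace every missing value with 0, as in the chapter
filled = toy.fillna(0)

# True if any missing value is left anywhere in the DataFrame
any_missing_left = bool(filled.isnull().any().any())
print(any_missing_left)
```

The same check on df after the fill should report that nothing is missing.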

We now have a dataset that is ready for machine learning with scikit-learn, and we will use it in all the chapters that follow. To make this easy, we will export the dataset as a .csv file and store it in the same directory as the Jupyter Notebook that you are working in.

We can do this by using the following code: 

df.to_csv('fraud_prediction.csv')

This will create a .csv file of this dataset in the directory that you are working in, which you can load into the notebook again using pandas. 
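One detail to be aware of when reloading: to_csv writes the DataFrame's index as the first column by default, so a plain read_csv brings it back as an extra column named Unnamed: 0. The sketch below shows this round trip on a hypothetical toy DataFrame, written to a temporary directory so that it does not touch fraud_prediction.csv (the file name and columns are assumptions):

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data; not the chapter's dataset
toy = pd.DataFrame({"amount": [100.0, 0.0], "isFraud": [0, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_fraud.csv")

    # Same call as in the chapter: the index is written as the first column
    toy.to_csv(path)

    # A plain read brings the saved index back as an 'Unnamed: 0' column
    with_index = pd.read_csv(path)

    # index_col=0 tells pandas to use that first column as the index again
    restored = pd.read_csv(path, index_col=0)

print(list(with_index.columns))
print(list(restored.columns))
```

Alternatively, passing index=False to to_csv leaves the index out of the file altogether.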
