
Missing values

Another constraint with scikit-learn is that most of its estimators cannot handle data with missing values. Therefore, we must first check whether any column in our dataset contains missing values. We can do this by using the following code:

#Checking every column for missing values

df.isnull().any()

This produces this output: 

Here we note that every column is flagged as True, which means that every column contains at least one missing value.
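Because `any()` only reports whether a column has a missing value, it can help to also count them. The following is a minimal sketch on a small made-up DataFrame (the `toy` frame and its column names are assumptions for illustration, not the dataset used in this chapter):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration
toy = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0, np.nan],
    "balance": [500.0, 400.0, np.nan, 150.0],
    "step": [1, 2, 3, 4],
})

# Which columns contain at least one missing value?
has_missing = toy.isnull().any()

# How many values are missing in each column?
missing_counts = toy.isnull().sum()

# What fraction of each column is missing?
missing_share = toy.isnull().mean()

print(has_missing)
print(missing_counts)
print(missing_share)
```

The same three calls can be applied to `df` to see how much of each column would be affected by imputation.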

Missing values can be handled in a variety of ways, such as the following:

  • Median imputation
  • Mean imputation
  • Filling them with the majority value
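As a rough illustration of these three options, here is a minimal pandas sketch. The `toy` DataFrame, its column names, and its values are hypothetical and are not part of the dataset used in this chapter:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numerical and one categorical column
toy = pd.DataFrame({
    "amount": [100.0, np.nan, 200.0, 600.0],
    "type": ["PAYMENT", "TRANSFER", None, "PAYMENT"],
})

# Median imputation: replace missing numbers with the column median
median_filled = toy["amount"].fillna(toy["amount"].median())

# Mean imputation: replace missing numbers with the column mean
mean_filled = toy["amount"].fillna(toy["amount"].mean())

# Majority value: replace missing categories with the most frequent one
majority_value = toy["type"].mode()[0]
mode_filled = toy["type"].fillna(majority_value)

print(median_filled.tolist())
print(mean_filled.tolist())
print(mode_filled.tolist())
```

Median and mean imputation apply to numerical columns, while the majority value is the usual choice for categorical columns.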

The number of available techniques is quite large, and the right choice depends on the nature of your dataset. Handling features with missing values in this way is one part of feature engineering.

Feature engineering can be done for both categorical and numerical columns and would require an entire book to explain the various methodologies that comprise the topic. 

Since this book focuses on applying the various machine learning algorithms that scikit-learn offers, feature engineering will not be covered in detail.

So, to stay within the goals of this book, we will impute all the missing values with a zero.

We can do this by using the following code: 

#Imputing the missing values with a 0

df = df.fillna(0)
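After imputing, it is worth confirming that nothing was left behind. A minimal sketch of such a check, again on a hypothetical `toy` frame rather than the chapter's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both columns
toy = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0],
    "balance": [np.nan, 400.0, 50.0],
})

# Impute every missing value with 0, as done for df above
toy = toy.fillna(0)

# Total number of missing values left across the whole frame
remaining = int(toy.isnull().sum().sum())
print(remaining)
```

Running the same check on `df` should report zero remaining missing values.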

We now have a dataset that is ready for machine learning with scikit-learn, and we will use it in all of the chapters that follow. To make this easy, we will export the dataset as a .csv file and store it in the same directory as the Jupyter Notebook that you are working in.

We can do this by using the following code: 

df.to_csv('fraud_prediction.csv')

This will create a .csv file of this dataset in the directory that you are working in, which you can load into the notebook again using pandas. 
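One detail to be aware of when reloading: by default, `to_csv` also writes the DataFrame index as an unnamed first column. The sketch below shows this on a hypothetical `toy` frame written to a temporary directory (the file name `toy_example.csv` is made up so that the real `fraud_prediction.csv` is not touched):

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data standing in for the exported dataset
toy = pd.DataFrame({"amount": [100.0, 0.0, 250.0], "step": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_example.csv")

    # Default behaviour: the index is written as an unnamed first column
    toy.to_csv(path)

    # A plain read turns that saved index into an extra column
    reloaded_plain = pd.read_csv(path)

    # index_col=0 tells pandas to use the first column as the index again
    reloaded_indexed = pd.read_csv(path, index_col=0)

print(reloaded_plain.columns.tolist())
print(reloaded_indexed.columns.tolist())
```

With the real file, `pd.read_csv('fraud_prediction.csv')` works the same way, so expect an extra `Unnamed: 0` column unless you pass `index_col=0` or drop that column after loading.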
