
Assigning an average value

This approach is also common because of its simplicity. For a numerical feature, you can replace the missing values with the mean or the median of the observed values. For a categorical variable, you can apply the same idea by assigning the mode (the value that occurs most often) to the missing values.
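As a minimal sketch of the choice between mean and median, the snippet below fills a small, made-up series of fares that contains one extreme value. The data and the variable names (fares, mean_fare, median_fare) are assumptions for illustration only and are not taken from the Titanic dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical fares: one extreme value and two missing entries
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 512.33, np.nan, np.nan])

# mean() and median() skip NaN by default, so both use only the observed values
mean_fare = fares.mean()
median_fare = fares.median()

# fillna() returns a new Series; the original is left unchanged
filled_with_mean = fares.fillna(mean_fare)
filled_with_median = fares.fillna(median_fare)

print(f"mean: {mean_fare:.2f}, median: {median_fare:.2f}")
```

In this toy series, the single large fare pulls the mean far above most of the observed values, while the median stays close to a typical fare. This is one reason the median is often the safer choice for skewed features.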

The following code assigns the median of the non-missing values of the Fare feature to the missing values:

# handle the missing values by replacing them with the median fare
# (median() skips NaN by default; fillna avoids chained-indexing assignment)
df_titanic_data['Fare'] = df_titanic_data['Fare'].fillna(df_titanic_data['Fare'].median())

Alternatively, you can use the following code to find the most frequent value of the Embarked feature and assign it to the missing values:

# replace the missing values with the most common value in the variable
# (mode() ignores NaN by default; [0] picks a single value if several tie)
df_titanic_data['Embarked'] = df_titanic_data['Embarked'].fillna(df_titanic_data['Embarked'].mode()[0])
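The two imputations can also be wrapped in a small helper and checked on a toy DataFrame. This is only a sketch: impute_with_average and the toy data are hypothetical names and values introduced for illustration, and the helper assumes each column has at least one non-missing value (otherwise mode() returns an empty Series and .iloc[0] raises an IndexError).

```python
import numpy as np
import pandas as pd


def impute_with_average(df, numeric_cols, categorical_cols):
    """Return a copy of df with NaNs filled by the median (numeric) or mode (categorical)."""
    result = df.copy()
    for col in numeric_cols:
        # median() ignores NaN, so it is computed from the observed values only
        result[col] = result[col].fillna(result[col].median())
    for col in categorical_cols:
        # mode() returns its values in sorted order; on a tie, take the first one
        result[col] = result[col].fillna(result[col].mode().iloc[0])
    return result


# Toy data standing in for the Fare and Embarked columns
toy = pd.DataFrame({
    "Fare": [7.25, np.nan, 71.28, 8.05],
    "Embarked": ["S", "C", None, "S"],
})

filled = impute_with_average(toy, ["Fare"], ["Embarked"])
print(filled)
```

Working on a copy keeps the original DataFrame intact, which makes it easy to compare the data before and after imputation.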