Assigning an average value

This approach is also common because of its simplicity. For a numerical feature, you can replace the missing values with the mean or median. The same idea works for categorical variables: assign the mode (the value that occurs most often) to the missing values.

The following code assigns the median of the non-missing values of the Fare feature to the missing values:

# handle the missing values by replacing them with the median fare
# (median() skips NaN by default; assigning the column back avoids chained assignment)
df_titanic_data['Fare'] = df_titanic_data['Fare'].fillna(df_titanic_data['Fare'].median())

Or, you can use the following code to find the value that has the highest occurrence in the Embarked feature and assign it to the missing values:

# replace the missing values with the most common value in the variable
# (mode() ignores NaN by default and returns a Series, so take its first entry)
df_titanic_data['Embarked'] = df_titanic_data['Embarked'].fillna(df_titanic_data['Embarked'].mode()[0])
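To try both replacements without loading the Titanic data, here is a minimal sketch on a small made-up DataFrame. The name df_toy and the values in it are hypothetical, chosen only for illustration; only the column names Fare and Embarked come from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df_titanic_data (values are made up)
df_toy = pd.DataFrame({
    "Fare": [7.25, 71.28, np.nan, 8.05, 53.10],
    "Embarked": ["S", "C", "S", None, "S"],
})

# Compute the fill values from the non-missing entries only
fare_median = df_toy["Fare"].median()         # numerical feature: median
embarked_mode = df_toy["Embarked"].mode()[0]  # categorical feature: most frequent value

# Assign the filled columns back to the DataFrame
df_toy["Fare"] = df_toy["Fare"].fillna(fare_median)
df_toy["Embarked"] = df_toy["Embarked"].fillna(embarked_mode)

print(df_toy)
```

Computing the fill value first and then assigning the whole column back keeps the two steps visible and avoids writing into a temporary copy of the column.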