
Dummy features

Dummy variables, also known as binary features, encode a categorical feature as one 0/1 column per distinct value. This approach is a good choice when the feature to be transformed has only a small number of distinct values. In the Titanic data samples, the Embarked feature has only three frequently occurring distinct values (S, C, and Q). So, we can transform the Embarked feature into three dummy variables ('Embarked_S', 'Embarked_C', and 'Embarked_Q') in order to use it with the random forest classifier.

The following code shows how to do this kind of transformation:

# constructing binary features
import pandas as pd

def process_embarked():
    # keep_binary is assumed to be a module-level flag defined elsewhere
    global df_titanic_data

    # replacing the missing values with the most common value in the variable
    most_common = df_titanic_data['Embarked'].dropna().mode().values[0]
    df_titanic_data['Embarked'] = df_titanic_data['Embarked'].fillna(most_common)

    # converting the values into numbers
    df_titanic_data['Embarked'] = pd.factorize(df_titanic_data['Embarked'])[0]

    # binarizing the constructed features; because the values were factorized
    # above, the new columns are named Embarked_0, Embarked_1, and so on
    if keep_binary:
        df_titanic_data = pd.concat([df_titanic_data, pd.get_dummies(df_titanic_data['Embarked']).rename(
            columns=lambda x: 'Embarked_' + str(x))], axis=1)
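
To make the result of this transformation concrete, here is a minimal sketch on a tiny made-up sample. The `toy` DataFrame and its rows are invented for illustration and are not taken from the Titanic data. It also skips the factorization step and uses the `prefix` argument of `pd.get_dummies`, so the dummy columns carry the original labels:

```python
import pandas as pd

# Toy data invented for illustration; not rows from the Titanic dataset.
toy = pd.DataFrame({'Embarked': ['S', 'C', None, 'Q', 'S']})

# Fill the missing entry with the most frequent value ('S' in this sample).
most_common = toy['Embarked'].mode()[0]
toy['Embarked'] = toy['Embarked'].fillna(most_common)

# Build one 0/1 column per distinct value, named Embarked_<value>.
dummies = pd.get_dummies(toy['Embarked'], prefix='Embarked').astype(int)
toy = pd.concat([toy, dummies], axis=1)

print(toy)
```

Each row ends up with exactly one of the three dummy columns set to 1, which is the form a classifier such as a random forest can consume directly.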