
Dummy features

Dummy variables are binary (0/1) indicator features built from a categorical feature, one for each distinct value. This approach is a good choice when the feature to be transformed has only a small number of distinct values. In the Titanic data samples, the Embarked feature has only three distinct values (S, C, and Q) that occur frequently. So, we can transform the Embarked feature into three dummy variables ('Embarked_S', 'Embarked_C', and 'Embarked_Q') to be able to use the random forest classifier.

The following code will show you how to do this kind of transformation:

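import pandas as pd

# constructing binary features
# (df_titanic_data and the keep_binary flag are assumed to be defined at module level)
def process_embarked():
    global df_titanic_data

    # replacing the missing values with the most common value in the variable
    most_common = df_titanic_data['Embarked'].dropna().mode().values[0]
    df_titanic_data['Embarked'] = df_titanic_data['Embarked'].fillna(most_common)

    # converting the values into numbers
    df_titanic_data['Embarked'] = pd.factorize(df_titanic_data['Embarked'])[0]

    # binarizing the constructed features; because the values were factorized above,
    # the new columns are named after the integer codes (Embarked_0, Embarked_1, ...)
    if keep_binary:
        df_titanic_data = pd.concat([df_titanic_data, pd.get_dummies(df_titanic_data['Embarked']).rename(
            columns=lambda x: 'Embarked_' + str(x))], axis=1)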
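
To see what the dummy columns look like, here is a minimal, self-contained sketch on a toy DataFrame. The `toy` frame and its five values are made up for illustration and are not the real Titanic data. Unlike the function above, this sketch applies `pd.get_dummies` directly to the original letters, so the resulting columns carry the names 'Embarked_C', 'Embarked_Q', and 'Embarked_S' rather than integer codes.

```python
import pandas as pd

# Toy stand-in for the Titanic data; the values are invented for illustration.
toy = pd.DataFrame({'Embarked': ['S', 'C', None, 'Q', 'S']})

# Fill the missing entry with the most frequent value ('S' in this toy data).
most_common = toy['Embarked'].mode()[0]
toy['Embarked'] = toy['Embarked'].fillna(most_common)

# Build one 0/1 column per distinct value, named Embarked_<value>.
dummies = pd.get_dummies(toy['Embarked'], prefix='Embarked', dtype=int)
toy = pd.concat([toy, dummies], axis=1)

print(toy.columns.tolist())
# → ['Embarked', 'Embarked_C', 'Embarked_Q', 'Embarked_S']
```

Each row has exactly one of the three dummy columns set to 1, which is what lets a classifier treat the port of embarkation as three independent yes/no features.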