Name

By itself, the Name variable is useless for most datasets, but two properties derived from it are useful. The first is the length of the name, measured here as the number of space-separated words it contains. A longer name may reflect something about a passenger's status and hence their ability to get on a lifeboat:

# counting the space-separated words in each passenger's name
df_titanic_data['Names'] = df_titanic_data['Name'].map(lambda y: len(re.split(' ', y)))
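
To see what this feature looks like in practice, here is a minimal sketch that applies the same word count to a few names written in the dataset's "Surname, Title. Given names" format. The `sample_names` list and the `count_name_parts` helper are assumptions introduced for illustration; they are not part of the original code.

```python
import re

# Illustrative names in the "Surname, Title. Given names" format (assumed sample)
sample_names = [
    'Braund, Mr. Owen Harris',
    'Heikkinen, Miss. Laina',
    'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
]


def count_name_parts(name):
    """Return the number of space-separated parts in a name string."""
    return len(re.split(' ', name))


# Same computation as the lambda above, applied to plain strings
name_part_counts = [count_name_parts(name) for name in sample_names]
print(name_part_counts)
```

Note that the surname and the title each count as one word, and a maiden name given in parentheses adds to the total.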

The second interesting property is the title embedded in the name, which can also indicate status and/or gender:

# Getting titles for each person
df_titanic_data['Title'] = df_titanic_data['Name'].map(lambda y: re.compile(r", (.*?)\.").findall(y)[0])

# handling the low-occurring titles by merging them into more common ones
# (.loc is used instead of chained indexing so that the assignment reliably modifies the DataFrame)
df_titanic_data.loc[df_titanic_data.Title == 'Jonkheer', 'Title'] = 'Master'
df_titanic_data.loc[df_titanic_data.Title.isin(['Ms', 'Mlle']), 'Title'] = 'Miss'
df_titanic_data.loc[df_titanic_data.Title == 'Mme', 'Title'] = 'Mrs'
df_titanic_data.loc[df_titanic_data.Title.isin(['Capt', 'Don', 'Major', 'Col', 'Sir']), 'Title'] = 'Sir'
df_titanic_data.loc[df_titanic_data.Title.isin(['Dona', 'Lady', 'the Countess']), 'Title'] = 'Lady'

# binarizing all the features
if keep_binary:
    df_titanic_data = pd.concat(
        [df_titanic_data, pd.get_dummies(df_titanic_data['Title']).rename(columns=lambda x: 'Title_' + str(x))],
        axis=1)
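
The title extraction and grouping can also be written as a single helper, which makes the rules easier to test on individual names. The following is a sketch under stated assumptions rather than the book's implementation: the `extract_title` function, the `TITLE_GROUPS` lookup table, and the `'Unknown'` fallback for names with no recognizable title are all hypothetical names introduced here. The grouping itself mirrors the assignments above.

```python
import re

# Title is the text between the ", " after the surname and the next period
TITLE_PATTERN = re.compile(r', (.*?)\.')

# Same consolidation of rare titles as above, expressed as a lookup table
TITLE_GROUPS = {
    'Jonkheer': 'Master',
    'Ms': 'Miss', 'Mlle': 'Miss',
    'Mme': 'Mrs',
    'Capt': 'Sir', 'Don': 'Sir', 'Major': 'Sir', 'Col': 'Sir', 'Sir': 'Sir',
    'Dona': 'Lady', 'Lady': 'Lady', 'the Countess': 'Lady',
}


def extract_title(name, default='Unknown'):
    """Return the consolidated title for a name, or `default` if none is found."""
    match = TITLE_PATTERN.search(name)
    if match is None:
        # Assumption: fall back to a placeholder instead of raising an error
        return default
    title = match.group(1)
    # Titles not listed in the table (Mr, Mrs, Miss, ...) are kept unchanged
    return TITLE_GROUPS.get(title, title)


print(extract_title('Braund, Mr. Owen Harris'))
print(extract_title('Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)'))
```

A helper like this could be applied with `df_titanic_data['Name'].map(extract_title)`. Unlike indexing `findall(y)[0]`, it does not fail on a name that lacks a title.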

You can also try to derive other interesting features from the Name variable. For example, you might use the last name to estimate the size of each passenger's family aboard the Titanic.
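
One possible way to sketch the last-name idea is to count how many passengers share each surname. This is only a rough proxy, since unrelated passengers can have the same surname and relatives can have different ones. The `extract_surname` helper, the `sample_names` list, and the `family_size` result are hypothetical names used for illustration.

```python
from collections import Counter

# Illustrative names in the "Surname, Title. Given names" format (assumed sample)
sample_names = [
    'Andersson, Mr. Anders Johan',
    'Andersson, Miss. Erna Alexandra',
    'Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)',
    'Braund, Mr. Owen Harris',
]


def extract_surname(name):
    """Return the text before the first comma, which holds the surname."""
    return name.split(',')[0].strip()


surnames = [extract_surname(name) for name in sample_names]

# Number of passengers sharing each surname
surname_counts = Counter(surnames)

# Per-passenger estimate: how many passengers carry the same surname
family_size = [surname_counts[surname] for surname in surnames]
print(family_size)
```

On the full dataset, the same count could be obtained by mapping `extract_surname` over the Name column and looking each surname up in the column's `value_counts()`.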
