- Deep Learning By Example
- Ahmed Menshawy
- 2021-06-24 18:52:46
Name
On its own, the Name variable is of little use for modeling, but two useful properties can be derived from it. The first is the length of the name, measured here as the number of space-separated words it contains. For example, a longer name may reflect something about a passenger's status, and hence their chances of getting on a lifeboat:
import re
# counting the number of space-separated words in each passenger's name
df_titanic_data['Names'] = df_titanic_data['Name'].map(lambda y: len(re.split(' ', y)))
The second interesting property is the title embedded in the name (Mr, Mrs, Miss, and so on), which can also indicate status and/or gender:
# getting the title of each person: the text between ", " and the first period
df_titanic_data['Title'] = df_titanic_data['Name'].map(lambda y: re.compile(r", (.*?)\.").findall(y)[0])
# merging rarely occurring titles into the common ones
# (.loc avoids chained assignment, which pandas may apply to a copy)
df_titanic_data.loc[df_titanic_data.Title == 'Jonkheer', 'Title'] = 'Master'
df_titanic_data.loc[df_titanic_data.Title.isin(['Ms', 'Mlle']), 'Title'] = 'Miss'
df_titanic_data.loc[df_titanic_data.Title == 'Mme', 'Title'] = 'Mrs'
df_titanic_data.loc[df_titanic_data.Title.isin(['Capt', 'Don', 'Major', 'Col', 'Sir']), 'Title'] = 'Sir'
df_titanic_data.loc[df_titanic_data.Title.isin(['Dona', 'Lady', 'the Countess']), 'Title'] = 'Lady'
# one-hot encoding the Title feature into Title_<value> columns
if keep_binary:
    df_titanic_data = pd.concat(
        [df_titanic_data, pd.get_dummies(df_titanic_data['Title'], prefix='Title')],
        axis=1)
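To make the effect of these steps concrete, here is a minimal sketch that runs them on a toy DataFrame. The four sample names and the `rare_titles` lookup dictionary are assumptions made for illustration; the dictionary is just a compact alternative to the row-by-row assignments above, not the book's code.

```python
import re
import pandas as pd

# Toy stand-in for df_titanic_data (illustrative rows only).
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
        "Reuchlin, Jonkheer. John George",
    ]
})

# Number of space-separated words in each name.
df["Names"] = df["Name"].map(lambda y: len(re.split(" ", y)))

# Title: the text between ", " and the first period.
df["Title"] = df["Name"].str.extract(r", (.*?)\.", expand=False)

# Hypothetical lookup table that collapses rare titles into common ones.
rare_titles = {
    "Jonkheer": "Master",
    "Ms": "Miss", "Mlle": "Miss",
    "Mme": "Mrs",
    "Capt": "Sir", "Don": "Sir", "Major": "Sir", "Col": "Sir",
    "Dona": "Lady", "the Countess": "Lady",
}
df["Title"] = df["Title"].replace(rare_titles)

# One-hot encode the cleaned titles into Title_<value> columns.
title_dummies = pd.get_dummies(df["Title"], prefix="Title")

print(df[["Names", "Title"]])
print(title_dummies.columns.tolist())
```

On this toy data the word counts come out as 4, 7, 8, and 4, and the titles as Mr, Mrs, Lady, and Master.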
You can also try to derive other interesting features from the Name feature. For example, you might use the last name to estimate the size of each family on board the Titanic.
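One possible way to approach the last-name idea is sketched below. The column names `LastName` and `LastNameCount` and the sample rows are hypothetical. The count is only a rough proxy for family size, since unrelated passengers can share a surname.

```python
import pandas as pd

# Toy stand-in for df_titanic_data (illustrative rows only).
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Palsson, Master. Gosta Leonard",
        "Palsson, Miss. Torborg Danira",
        "Palsson, Mrs. Nils (Alma Cornelia Berglund)",
    ]
})

# The surname is everything before the first comma.
df["LastName"] = df["Name"].str.split(",").str[0].str.strip()

# Hypothetical feature: number of passengers sharing each surname.
df["LastNameCount"] = df.groupby("LastName")["LastName"].transform("count")

print(df[["LastName", "LastNameCount"]])
```

Here the lone Braund passenger gets a count of 1 and each Palsson gets a count of 3.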