- Deep Learning By Example
- Ahmed Menshawy
- 2021-06-24 18:52:45
Factorizing
This approach creates a numerical categorical feature from another feature by mapping each distinct value to an integer code. In pandas, the factorize() function does this. It returns a tuple of the integer codes and the unique values, which is why the code below keeps only element [0]. This type of transformation is useful when a feature is an alphanumeric categorical variable. In the Titanic data samples, we can transform the Cabin feature into a categorical feature that represents the letter of the cabin:
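```python
import re

import pandas as pd

def get_cabin_letter(cabin_value):
    # missing cabins (NaN) are not strings, so treat them as unknown ('U')
    if not isinstance(cabin_value, str):
        return 'U'
    # searching for the letters in the cabin alphanumerical value
    letter_match = re.compile("([a-zA-Z]+)").search(cabin_value)
    if letter_match:
        return letter_match.group()
    else:
        return 'U'

# the cabin number is a sequence of alphanumerical characters, so we are going to create some features
# from the alphabetical part of it (the helper must be defined before it is used)
df_titanic_data['CabinLetter'] = df_titanic_data['Cabin'].map(get_cabin_letter)
# factorize() returns (codes, uniques); keep only the integer codes
df_titanic_data['CabinLetter'] = pd.factorize(df_titanic_data['CabinLetter'])[0]
```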
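To make the behaviour of factorize() concrete, here is a minimal self-contained sketch on a small hand-made sample. The DataFrame values and the helper name cabin_letter are hypothetical stand-ins, not taken from the Titanic file. It shows that codes are assigned in order of first appearance, that factorize() returns both the codes and the unique values, and that passing sort=True orders the codes alphabetically instead.

```python
import re

import numpy as np
import pandas as pd


def cabin_letter(cabin):
    """Hypothetical helper: return the alphabetic prefix of a cabin, or 'U' if unknown."""
    if not isinstance(cabin, str):  # NaN for passengers with no recorded cabin
        return 'U'
    match = re.search(r"[a-zA-Z]+", cabin)
    return match.group() if match else 'U'


# Hypothetical toy sample standing in for the Titanic 'Cabin' column.
df = pd.DataFrame({'Cabin': ['C85', np.nan, 'E46', 'C123', 'B28', np.nan]})
df['CabinLetter'] = df['Cabin'].map(cabin_letter)

# Default: codes follow the order in which each letter first appears.
codes, uniques = pd.factorize(df['CabinLetter'])
df['CabinCode'] = codes
print(codes.tolist())   # [0, 1, 2, 0, 3, 1]
print(list(uniques))    # ['C', 'U', 'E', 'B']

# sort=True: codes follow the sorted order of the unique values.
sorted_codes, sorted_uniques = pd.factorize(df['CabinLetter'], sort=True)
print(sorted_codes.tolist())  # [1, 3, 2, 1, 0, 3]
print(list(sorted_uniques))   # ['B', 'C', 'E', 'U']
```

Because 'U' is substituted for missing cabins before factorizing, no row receives the -1 code that factorize() assigns to missing values by default. Note also that the resulting integers are labels only; they carry no meaningful order between cabin letters.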
We can also apply transformations to quantitative features by using one of the following approaches.