- Deep Learning By Example
- Ahmed Menshawy
- 145字
- 2021-06-24 18:52:45
Dummy features
These variables are also known as categorical or binary features. This approach will be a good choice if we have a small number of distinct values for the feature to be transformed. In the Titanic data samples, the Embarked feature has only three distinct values (S, C, and Q) that occur frequently. So, we can transform the Embarked feature into three dummy variables, ('Embarked_S', 'Embarked_C', and 'Embarked_Q') to be able to use the random forest classifier.
The following code will show you how to do this kind of transformation:
# constructing binary features
def process_embarked():
global df_titanic_data
# replacing the missing values with the most common value in the variable
df_titanic_data.Embarked[df.Embarked.isnull()] = df_titanic_data.Embarked.dropna().mode().values
# converting the values into numbers
df_titanic_data['Embarked'] = pd.factorize(df_titanic_data['Embarked'])[0]
# binarizing the constructed features
if keep_binary:
df_titanic_data = pd.concat([df_titanic_data, pd.get_dummies(df_titanic_data['Embarked']).rename(
columns=lambda x: 'Embarked_' + str(x))], axis=1)
推薦閱讀
- 輕松學C語言
- 樂高機器人:WeDo編程與搭建指南
- Mastering Spark for Data Science
- 來吧!帶你玩轉(zhuǎn)Excel VBA
- 自動檢測與轉(zhuǎn)換技術(shù)
- PHP開發(fā)手冊
- Data Wrangling with Python
- 永磁同步電動機變頻調(diào)速系統(tǒng)及其控制(第2版)
- 可編程序控制器應(yīng)用實訓(三菱機型)
- PostgreSQL 10 Administration Cookbook
- Python:Data Analytics and Visualization
- Excel 2007常見技法與行業(yè)應(yīng)用實例精講
- 基于神經(jīng)網(wǎng)絡(luò)的監(jiān)督和半監(jiān)督學習方法與遙感圖像智能解譯
- Mastering GitLab 12
- JRuby語言實戰(zhàn)技術(shù)