- Deep Learning By Example
- Ahmed Menshawy
- 2021-06-24 18:52:44
Titanic example revisited
In this section, we are going to go through the Titanic example again, this time from a different perspective, using feature engineering. In case you skipped Chapter 2, Data Modeling in Action - The Titanic Example, the Titanic example is a Kaggle competition whose goal is to predict whether a specific passenger survived or not.
During this revisit of the Titanic example, we are going to use the scikit-learn and pandas libraries. So first off, let's start by reading the train and test sets and get some statistics about the data:
import pandas as pd
# read the train and test sets using pandas
train_data = pd.read_csv('data/train.csv', header=0)
test_data = pd.read_csv('data/test.csv', header=0)
# concatenate the train and test sets so that feature engineering is applied to both
df_titanic_data = pd.concat([train_data, test_data])
# combining the two sets leaves duplicate index labels, so re-index the data
df_titanic_data.reset_index(inplace=True)
# remove the 'index' column that reset_index() generates to hold the old labels
df_titanic_data.drop('index', axis=1, inplace=True)
# put the columns back in the same order as in the train set
# (reindex_axis was removed in pandas 1.0; reindex(columns=...) replaces it)
df_titanic_data = df_titanic_data.reindex(columns=train_data.columns)
We need to point out a few things about the preceding code snippet:
- As shown, we used the pandas concat function to combine the data frames of the train and test sets. This is useful for the feature engineering task because we need a full view of the distribution of the input variables/features.
- After combining both data frames, we need to make some modifications to the resulting data frame.
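To make the two points above concrete, here is a minimal sketch on two tiny, made-up data frames rather than the real Kaggle files. The names toy_train and toy_test and all the values in them are hypothetical; only the pandas calls mirror the snippet above. It shows why the index has to be reset after concatenation and what happens to the target column for the rows that came from the test set.

```python
import pandas as pd

# Toy stand-ins for the train and test sets (made-up values, not the Kaggle data).
toy_train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Age": [22.0, 38.0, 26.0],
})
# Like the real test set, the toy test set has no 'Survived' column.
toy_test = pd.DataFrame({
    "PassengerId": [4, 5],
    "Age": [34.5, 47.0],
})

# Concatenation keeps each frame's own row labels, so labels 0 and 1 appear twice.
combined = pd.concat([toy_train, toy_test])
has_duplicate_labels = bool(combined.index.duplicated().any())

# Replace the duplicated labels with a fresh 0..n-1 index and discard the old one.
combined = combined.reset_index(drop=True)

# Make the column order match the train set explicitly.
combined = combined.reindex(columns=toy_train.columns)

# Rows that came from the test set have no target value, so 'Survived' is NaN there.
n_missing_target = int(combined["Survived"].isna().sum())

print(has_duplicate_labels)
print(combined)
print(n_missing_target)
```

Passing drop=True to reset_index() is a shorthand for the two-step reset-then-drop used earlier. The missing Survived values are also a convenient way to split the combined frame back into train and test rows once feature engineering is done.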