  • Deep Learning By Example
  • Ahmed Menshawy
  • 243 words
  • 2021-06-24 18:52:44

Titanic example revisited

In this section, we revisit the Titanic example from a different perspective, this time using feature engineering tools. In case you skipped Chapter 2, Data Modeling in Action - The Titanic Example, the Titanic example is a Kaggle competition whose goal is to predict whether a specific passenger survived or not.

During this revisit of the Titanic example, we are going to use the scikit-learn and pandas libraries. First, let's read the train and test sets and combine them so that we can get some statistics about the data:

import pandas as pd

# reading the train and test sets using pandas
train_data = pd.read_csv('data/train.csv', header=0)
test_data = pd.read_csv('data/test.csv', header=0)

# concatenate the train and test sets so that feature engineering sees all the data
df_titanic_data = pd.concat([train_data, test_data])

# remove the duplicate indices created by combining the train and test sets by re-indexing the data
df_titanic_data.reset_index(inplace=True)

# remove the index column that the reset_index() function generates
df_titanic_data.drop('index', axis=1, inplace=True)

# reorder the columns to match the column order of the train set
# (reindex_axis was removed in pandas 1.0; reindex(columns=...) replaces it)
df_titanic_data = df_titanic_data.reindex(columns=train_data.columns)

We need to point out a few things about the preceding code snippet:

  • As shown, we have used the concat function of pandas to combine the data frames of the train and test sets. This is useful for the feature engineering task as we need a full view of the distribution of the input variables/features.
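  • After combining both data frames, we need to make some modifications to the output data frame: re-index it so that the row labels are unique again, drop the extra index column that reset_index() generates, and put the columns in the same order as in the train set.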
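
To see why these modifications are needed, here is a minimal sketch that repeats the same steps on two tiny in-memory data frames. The values are made up for illustration and are not taken from the Kaggle files; the variable names combined and had_duplicate_labels are also just placeholders. It uses reset_index(drop=True), which should be equivalent to calling reset_index() and then dropping the generated index column.

```python
import pandas as pd

# Toy stand-ins for train.csv and test.csv (made-up values, for illustration only)
train_data = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Age": [22.0, 38.0],
})
# The test set has no Survived column, as in the Kaggle competition
test_data = pd.DataFrame({
    "PassengerId": [3, 4],
    "Age": [34.5, 47.0],
})

combined = pd.concat([train_data, test_data])

# Each frame brought its own 0..n-1 index, so the row labels now repeat
had_duplicate_labels = combined.index.duplicated().any()

# drop=True discards the old index instead of keeping it as an 'index' column
combined = combined.reset_index(drop=True)

# Put the columns back in the train set's order
combined = combined.reindex(columns=train_data.columns)

print(combined)
```

Rows that come from the test set have no Survived value, so that column holds NaN for them after the concatenation. This is something to keep in mind when splitting the combined frame back into train and test sets later.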