Feature transformations

In the previous two sections, we read the train and test sets, combined them, and handled some missing values. Now, we will use the random forest classifier of scikit-learn to predict the survival of passengers. Different implementations of the random forest algorithm accept different types of data. The scikit-learn implementation accepts only numeric data, so we need to transform the categorical features into numerical ones.

There are two types of features:

  • Quantitative: Quantitative features are measured on a numerical scale and can be meaningfully sorted. In the Titanic data samples, the Age feature is an example of a quantitative feature.

  • Qualitative: Qualitative features, also called categorical features, are not numerical. They describe data that fits into categories. In the Titanic data samples, the Embarked feature (which indicates the port of embarkation) is an example of a qualitative feature.
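The distinction between the two feature types can be sketched with pandas. The snippet below is a minimal illustration, not part of the original pipeline: the small data frame `df` and its values are hypothetical, and only the column names follow the Titanic dataset. It assumes that a column's dtype is a reasonable first guess at its type.

```python
import pandas as pd

# Hypothetical miniature sample; only the column names come from the Titanic data.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Sex": ["male", "female", "female", "female"],
    "Embarked": ["S", "C", "S", "S"],
})

# Columns with a numeric dtype are treated as quantitative candidates.
quantitative = df.select_dtypes(include="number").columns.tolist()

# Everything else is treated as qualitative (categorical).
qualitative = df.select_dtypes(exclude="number").columns.tolist()

print("Quantitative:", quantitative)
print("Qualitative:", qualitative)
```

The dtype is only a heuristic. A feature stored as an integer, such as a passenger class code, can still be qualitative, so the resulting lists are worth reviewing by hand.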

We can apply different kinds of transformations to different variables. The following are some approaches that one can use to transform qualitative/categorical features.
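As a rough sketch of what such a transformation can look like, the snippet below applies two widely used encodings to the Embarked feature: integer codes and dummy (one-hot) variables. These are chosen here for illustration and are an assumption, not necessarily the approaches the text goes on to describe. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample of the Embarked feature.
df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# Encoding 1: integer codes. Each category is mapped to an integer,
# assigned in the sorted order of the category labels.
codes = df["Embarked"].astype("category").cat.codes

# Encoding 2: dummy variables. Each category becomes its own 0/1 column.
dummies = pd.get_dummies(df["Embarked"], prefix="Embarked", dtype=int)

print(codes.tolist())
print(dummies)
```

Integer codes keep a single column but impose an arbitrary order on the categories, whereas dummy variables avoid that order at the cost of one extra column per category.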
