
Creating dummy variables

Creating dummy variables means creating a separate variable for each category of a categorical variable. Although a categorical variable carries plenty of information and might show a causal relationship with the output variable, it can't be used in predictive models such as linear and logistic regression without some processing.

In our dataset, sex is a categorical variable with two categories, male and female. We can create two dummy variables from it, as follows:

dummy_sex=pd.get_dummies(data['sex'],prefix='sex')

The result of this statement is as follows:

Fig. 2.17: Dummy variable for the sex variable in the Titanic dataset

This process is called dummifying the variable. It creates two new variables that take a value of either 1 or 0, depending on the sex of the passenger. If the sex was female, sex_female is 1 and sex_male is 0. If the sex was male, sex_male is 1 and sex_female is 0. In general, all but one of the dummy variables in a row will have a value of 0; the dummy variable that corresponds to the row's value in the original column will have a value of 1.
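The behavior described above can be checked on a small example. The following is a minimal sketch that uses a made-up four-row data frame (the name toy and its values are assumptions for illustration, not the Titanic data). It passes dtype=int because recent pandas versions return True/False columns by default instead of 1/0.

```python
import pandas as pd

# Toy stand-in for the Titanic data; the values are invented for illustration.
toy = pd.DataFrame({'sex': ['male', 'female', 'female', 'male']})

# dtype=int requests 0/1 columns; newer pandas versions default to booleans.
toy_dummy_sex = pd.get_dummies(toy['sex'], prefix='sex', dtype=int)
print(toy_dummy_sex)

# Each row should have exactly one dummy variable set to 1.
row_sums = toy_dummy_sex.sum(axis=1)
print((row_sums == 1).all())
```

The new columns are named by combining the prefix with each category, so here they are sex_female and sex_male.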

These two new variables can be joined to the source data frame so that they can be used in the models. The method to do that is as follows:

column_name=data.columns.values.tolist()
column_name.remove('sex')
data[column_name].join(dummy_sex)

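The column names are converted to a list, and sex is removed from that list before the two dummy variables are joined to the dataset, because it makes no sense to keep the sex variable alongside the two dummy variables derived from it. Note that join returns a new data frame rather than modifying data in place, so assign the result to a variable if you want to keep it.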
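The join step can be tried end to end on a small example. This is a sketch under stated assumptions: the toy data frame and its name and age columns are hypothetical, and the two shorter variants are alternatives offered for comparison, not the method used in the text.

```python
import pandas as pd

# Hypothetical miniature data frame; column names and values are made up.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'sex': ['male', 'female', 'male'],
    'age': [22, 38, 26],
})
toy_dummy_sex = pd.get_dummies(toy['sex'], prefix='sex', dtype=int)

# Same steps as in the text: drop 'sex' from the column list, then join.
column_name = toy.columns.values.tolist()
column_name.remove('sex')
toy_joined = toy[column_name].join(toy_dummy_sex)

# Alternative 1: drop the column directly instead of building a list.
toy_alt = toy.drop(columns='sex').join(toy_dummy_sex)

# Alternative 2: let get_dummies replace the column in a single call.
toy_one_call = pd.get_dummies(toy, columns=['sex'], dtype=int)

print(toy_joined)
```

All three approaches produce the same columns here; the list-based version from the text makes each step explicit, while the one-call version is more compact when several columns need to be dummified.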
