
Creating dummy variables

Creating dummy variables means creating a separate variable for each category of a categorical variable. Although a categorical variable contains plenty of information and might show a causal relationship with the output variable, it can't be used in predictive models such as linear and logistic regression without some processing.

In our dataset, sex is a categorical variable with two categories: male and female. We can create two dummy variables from it as follows:

dummy_sex=pd.get_dummies(data['sex'],prefix='sex')

The result of this statement is as follows:

Fig. 2.17: Dummy variable for the sex variable in the Titanic dataset

This process is called dummifying the variable. It creates two new variables, sex_female and sex_male, each of which takes a value of 1 or 0 depending on the sex of the passenger. If the sex was female, sex_female would be 1 and sex_male would be 0. If the sex was male, sex_male would be 1 and sex_female would be 0. In general, in any given row, all but one of the dummy variables will be 0; the dummy variable that corresponds to that row's value in the original column will be 1.
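The one-1-per-row property described above can be checked on a small example. The following is a minimal sketch that uses a hypothetical four-row data frame in place of the Titanic data. It also passes dtype=int, which is an addition to the original statement: recent pandas versions return True/False columns by default, and dtype=int requests the 1/0 values discussed here.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic data frame
data = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# dtype=int asks for 1/0 values; without it, recent pandas versions
# return boolean (True/False) dummy columns
dummy_sex = pd.get_dummies(data["sex"], prefix="sex", dtype=int)
print(dummy_sex)

# Exactly one dummy column is set to 1 in every row
row_sums = dummy_sex.sum(axis=1)
print((row_sums == 1).all())
```

The new columns are named by joining the prefix and the category with an underscore, which is where sex_female and sex_male come from.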

These two new variables can be joined to the source data frame so that they can be used in the models. This can be done as follows:

column_name=data.columns.values.tolist()
column_name.remove('sex')
data[column_name].join(dummy_sex)

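The column names are converted to a list, and sex is removed from that list before the two dummy variables are joined to the dataset, because it makes no sense to keep the sex variable alongside its two dummy variables. Note that join returns a new data frame and leaves data unchanged, so assign the result to a variable if you want to keep it.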
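As a rough illustration of the join step, the sketch below repeats it on a hypothetical three-row data frame (the name and age columns are made up for the example) and stores the result. It also shows an equivalent shortcut using drop, which is not part of the original listing.

```python
import pandas as pd

# Hypothetical toy data; only the sex column mirrors the Titanic dataset
data = pd.DataFrame({
    "name": ["A", "B", "C"],
    "sex": ["male", "female", "male"],
    "age": [22, 38, 26],
})
dummy_sex = pd.get_dummies(data["sex"], prefix="sex", dtype=int)

# Same steps as in the text: list the columns, remove sex, then join
column_name = data.columns.values.tolist()
column_name.remove("sex")
joined = data[column_name].join(dummy_sex)

# Equivalent shortcut: drop the original column, then join the dummies
joined_alt = data.drop(columns="sex").join(dummy_sex)

print(joined)
```

Both versions align rows on the index, so they work as long as dummy_sex was built from the same data frame and the index has not been changed in between.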
