- Python:Advanced Predictive Analytics
- Ashish Kumar Joseph Babcock
- 2021-07-02 20:09:22
Creating dummy variables
Creating dummy variables means creating a separate variable for each category of a categorical variable. Although a categorical variable can carry plenty of information and might show a strong relationship with the output variable, it can't be used in predictive models such as linear and logistic regression without some processing.
In our dataset, sex is a categorical variable with two categories, male and female. We can create two dummy variables from it, as follows:
dummy_sex = pd.get_dummies(data['sex'], prefix='sex')
The result of this statement is as follows:

Fig. 2.17: Dummy variable for the sex variable in the Titanic dataset
This process is called dummifying the variable. It creates two new variables that take a value of either 1 or 0, depending on the sex of the passenger. If the sex was female, sex_female would be 1 and sex_male would be 0. If the sex was male, sex_male would be 1 and sex_female would be 0. In general, all but one dummy variable in a row will have a value of 0. The dummy variable that corresponds to the value in the original column (for that row) will have a value of 1.
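As a minimal sketch of the encoding described above, the snippet below runs get_dummies on a small made-up data frame. The four rows are hypothetical stand-ins, not the actual Titanic data. The dtype=int argument is an addition that is not in the original statement: recent pandas versions return True/False by default, and passing dtype=int gives the 1/0 values discussed here.

```python
import pandas as pd

# Hypothetical toy data standing in for the sex column of the Titanic dataset.
data = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# dtype=int requests 1/0 values; recent pandas versions default to True/False.
dummy_sex = pd.get_dummies(data["sex"], prefix="sex", dtype=int)
print(dummy_sex)

# Exactly one dummy variable is set to 1 in every row.
row_sums = dummy_sex.sum(axis=1)
print(row_sums.tolist())
```

The new columns are named by joining the prefix and the category with an underscore, which is where sex_female and sex_male come from.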
These two new variables can be joined to the source data frame so that they can be used in the models. The method to do that is illustrated as follows:
column_name = data.columns.values.tolist()
column_name.remove('sex')
data[column_name].join(dummy_sex)
The column names are converted to a list, and sex is removed from that list before the two dummy variables are joined to the dataset, because the original sex column is redundant once the two dummy variables are present.
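The join step can be tried end to end on a small made-up data frame, as sketched below. The survived and age columns and their values are hypothetical, chosen only to give the frame more than one column. Because join returns a new data frame rather than modifying data in place, the sketch stores the result in a variable; the name data_new is an assumption, not something used in the original.

```python
import pandas as pd

# Hypothetical toy frame; the column names and values are illustrative only.
data = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0],
})
dummy_sex = pd.get_dummies(data["sex"], prefix="sex", dtype=int)

# Keep every column except the original categorical one.
column_name = data.columns.values.tolist()
column_name.remove("sex")

# join aligns on the row index and returns a new data frame.
data_new = data[column_name].join(dummy_sex)
print(data_new.columns.tolist())
```

pandas can also do both steps in one call with pd.get_dummies(data, columns=['sex']), which replaces the listed column with its dummy variables. The two-step version shown here follows the approach described in the text.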