- Python: Advanced Predictive Analytics
- Ashish Kumar, Joseph Babcock
- 2021-07-02 20:09:22
Creating dummy variables
Creating dummy variables means creating a separate variable for each category of a categorical variable. Although a categorical variable contains plenty of information and might show a causal relationship with the output variable, it can't be used in predictive models such as linear and logistic regression without some processing.
In our dataset, sex is a categorical variable with two categories, male and female. We can create two dummy variables from it, as follows:
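# dtype=int keeps the dummies as 1/0; newer pandas versions return True/False by default
dummy_sex = pd.get_dummies(data['sex'], prefix='sex', dtype=int)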
The result of this statement is as follows:

Fig. 2.17: Dummy variable for the sex variable in the Titanic dataset
This process is called dummifying the variable. It creates two new variables that take a value of either 1 or 0, depending on the sex of the passenger. If the sex was female, sex_female would be 1 and sex_male would be 0. If the sex was male, sex_male would be 1 and sex_female would be 0. In general, all but one dummy variable in a row will have a value of 0. The dummy variable that corresponds to the row's value in the original column will have a value of 1.
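To make the 1/0 pattern concrete, here is a minimal sketch on a small made-up column rather than the Titanic data. The names toy, toy_dummies, and row_sums are hypothetical and used only for illustration; dtype=int is passed so the output is 1/0 regardless of the pandas version.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'sex' column of the dataset.
toy = pd.DataFrame({'sex': ['male', 'female', 'female', 'male']})

# One dummy column per category; dtype=int requests 1/0 instead of True/False.
toy_dummies = pd.get_dummies(toy['sex'], prefix='sex', dtype=int)

# Columns are named prefix_category: sex_female and sex_male.
print(toy_dummies)

# Each row has exactly one dummy equal to 1, so every row sums to 1.
row_sums = toy_dummies.sum(axis=1)
print((row_sums == 1).all())
```

The row-sum check is a quick way to confirm that each row fell into exactly one category.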
These two new variables can be joined to the source data frame so that they can be used in the models. The method to do that is illustrated as follows:
column_name = data.columns.values.tolist()
column_name.remove('sex')
data[column_name].join(dummy_sex)
The column names are converted to a list, and sex is removed from that list before the two dummy variables are joined to the dataset. Keeping the original sex variable alongside the two dummy variables would not make sense, because they carry the same information.
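The join step can be tried end to end on a small made-up frame. This is only a sketch under stated assumptions: the frame toy and its values are hypothetical, and the drop-based variant is an alternative offered here for comparison, not the method used in the text.

```python
import pandas as pd

# Hypothetical toy frame; column names mimic a passenger table, values are made up.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'sex': ['male', 'female', 'male'],
    'age': [22, 38, 26],
})
toy_dummies = pd.get_dummies(toy['sex'], prefix='sex', dtype=int)

# Same approach as in the text: keep every column except 'sex', then join on the index.
keep = toy.columns.values.tolist()
keep.remove('sex')
joined = toy[keep].join(toy_dummies)

# Alternative (an assumption, not from the text): drop the column, then join.
joined_alt = toy.drop(columns=['sex']).join(toy_dummies)

print(joined)
```

Note that join returns a new data frame, so the result has to be assigned to a name if it is to be used later; the original frame is left unchanged.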