- R Programming By Example
- Omar Trejo Navarro
- 2021-07-02 21:30:42
Creating a new dataset with what we've learned
So far in this chapter we have learned that age, education, and ethnicity are important factors in understanding how people voted in the Brexit Referendum. Younger people with higher education levels are associated with votes in favor of remaining in the EU, while older white people are associated with votes in favor of leaving. We can now use this knowledge to build a more succinct dataset: first we add the relevant variables, and then we remove the ones that are no longer needed.
Our new relevant variables are two age groups (adults aged 18 to 44, and adults aged 45 and over), two ethnicity groups (whites and non-whites), and two education groups (high and low education levels):
    data$Age_18to44 <- (
        data$Age_18to19 +
        data$Age_20to24 +
        data$Age_25to29 +
        data$Age_30to44
    )
    data$Age_45plus <- (
        data$Age_45to59 +
        data$Age_60to64 +
        data$Age_65to74 +
        data$Age_75to84 +
        data$Age_85to89 +
        data$Age_90plus
    )
    data$NonWhite <- (
        data$Black +
        data$Asian +
        data$Indian +
        data$Pakistani
    )
    data$HighEducationLevel <- data$L4Quals_plus
    data$LowEducationLevel <- data$NoQuals
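To make the aggregation above concrete, here is a minimal sketch on a made-up two-row data frame. The `toy` object and its values are invented for illustration and are not part of the Brexit data, and the sketch assumes each column holds a proportion, so a group is simply the row-wise sum of its parts. `rowSums()` is shown as an equivalent alternative to chaining `+`, which may be easier to maintain when many columns are involved.

```r
# Hypothetical toy data: two wards with made-up age proportions.
toy <- data.frame(
    Age_18to19 = c(0.02, 0.03),
    Age_20to24 = c(0.06, 0.08),
    Age_25to29 = c(0.07, 0.09),
    Age_30to44 = c(0.20, 0.22)
)

# Same approach as in the text: add the component columns.
toy$Age_18to44 <- (
    toy$Age_18to19 +
    toy$Age_20to24 +
    toy$Age_25to29 +
    toy$Age_30to44
)

# Equivalent alternative: sum the component columns row by row.
# unname() drops the row names that rowSums() attaches to its result.
parts <- c("Age_18to19", "Age_20to24", "Age_25to29", "Age_30to44")
toy$Age_18to44_alt <- unname(rowSums(toy[, parts]))

# Both approaches should agree up to floating-point error.
stopifnot(isTRUE(all.equal(toy$Age_18to44, toy$Age_18to44_alt)))
```

Either form produces the same column; the explicit sum keeps every component visible, while the `rowSums()` form separates the list of columns from the arithmetic.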
Now we remove the old variables that were used to create the newly added ones. All the age variables contain the word "Age", so we can avoid listing them manually: we create the age_variables logical vector, which is TRUE for columns whose names contain "Age" and FALSE otherwise, and use it to mark those columns for removal. Because that pattern also matches AdultMeanAge and our newly created Age_18to44 and Age_45plus variables, we explicitly mark those three to be kept. The remaining ethnicity and education variables, along with OwnedOutright and MultiDeprived, are marked for removal manually:
    column_names <- colnames(data)
    new_variables <- !logical(length(column_names))
    new_variables <- setNames(new_variables, column_names)
    age_variables <- sapply(column_names, function(x) grepl("Age", x))
    new_variables[age_variables] <- FALSE
    new_variables[["AdultMeanAge"]] <- TRUE
    new_variables[["Age_18to44"]] <- TRUE
    new_variables[["Age_45plus"]] <- TRUE
    new_variables[["Black"]] <- FALSE
    new_variables[["Asian"]] <- FALSE
    new_variables[["Indian"]] <- FALSE
    new_variables[["Pakistani"]] <- FALSE
    new_variables[["NoQuals"]] <- FALSE
    new_variables[["L4Quals_plus"]] <- FALSE
    new_variables[["OwnedOutright"]] <- FALSE
    new_variables[["MultiDeprived"]] <- FALSE
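The named logical vector technique can be tried in isolation on a handful of column names. The following is a minimal sketch, not the book's code: the column names and the `keep` variable are invented for illustration. It also relies on `grepl()` being vectorized, so the pattern can be tested against all names in one call rather than through `sapply()`.

```r
# Hypothetical column names, invented for illustration.
column_names <- c(
    "ID", "Age_18to19", "Age_20to24",
    "AdultMeanAge", "Age_18to44", "White"
)

# Start with every column marked as kept, named so we can index by column name.
keep <- setNames(rep(TRUE, length(column_names)), column_names)

# grepl() is vectorized, so it flags every name containing "Age" at once.
keep[grepl("Age", column_names)] <- FALSE

# Re-enable the matching columns we still want to keep.
keep[c("AdultMeanAge", "Age_18to44")] <- TRUE

# Show which columns survive the selection.
print(names(keep)[keep])
```

Printing the surviving names before subsetting the real data frame is a cheap way to confirm that the pattern did not drop a column by accident, as it would have with AdultMeanAge here.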
We create the data_adjusted object by selecting the columns marked TRUE in new_variables, recompute the logical vector of numerical variables for the new data structure, and save the result as a CSV file:
    data_adjusted <- data[, new_variables]
    numerical_variables_adjusted <- sapply(data_adjusted, is.numeric)
    write.csv(data_adjusted, file = "data_brexit_referendum_adjusted.csv")
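One detail worth knowing about the save step: by default `write.csv()` also writes the data frame's row names, which show up as an extra first column when the file is read back. The sketch below, which uses an invented `toy_adjusted` data frame and a temporary file rather than the real dataset, shows one way to check a round trip and to suppress the row names with `row.names = FALSE` if they are not wanted. Whether to keep them is a choice that depends on how the file will be used later.

```r
# Hypothetical toy data standing in for data_adjusted.
toy_adjusted <- data.frame(
    ID = c("E1", "E2"),
    Age_18to44 = c(0.35, 0.42),
    stringsAsFactors = FALSE
)

# Flag the numeric columns, as done for the real data in the text.
numerical_toy <- sapply(toy_adjusted, is.numeric)

# Use a temporary file so the example leaves nothing behind.
path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing R's automatic row numbers as a column.
write.csv(toy_adjusted, file = path, row.names = FALSE)

# Read the file back and confirm the columns survived the round trip.
toy_reloaded <- read.csv(path, stringsAsFactors = FALSE)
stopifnot(identical(colnames(toy_reloaded), colnames(toy_adjusted)))
stopifnot(isTRUE(all.equal(toy_reloaded$Age_18to44, toy_adjusted$Age_18to44)))

# Clean up the temporary file.
unlink(path)
```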