
German Credit

Loans are not always repaid in full, and some borrowers default. It is therefore important for a bank to identify potential defaulters from the information available at the time of lending. Here, we adapt the GC dataset from the RSADBE package so that the labels of its factor variables are properly reflected. The transformed dataset is available as GC2.RData in the data folder. The GC dataset itself is mainly an adaptation of the version available at https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). It contains 1,000 observations and 20 covariates (independent variables), such as the status of the existing checking account, the loan duration, and so forth. Whether the loan was repaid in full is recorded in the good_bad column. We will partition the data into training and testing parts, assigning each observation to the training part with probability 0.7 and to the testing part with probability 0.3, and we will also create the model formula:

> library(RSADBE)
> load("../Data/GC2.RData")
> table(GC2$good_bad)
 bad good 
 300  700 
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(GC2),replace = TRUE,prob=c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> GC2_Train <- GC2[Train_Test=="Train",]
> GC2_TestX <- within(GC2[Train_Test=="Test",],rm(good_bad))
> GC2_TestY <- GC2[Train_Test=="Test","good_bad"]
> GC2_Formula <- as.formula("good_bad~.")
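Because each observation is labeled independently, the 70/30 split holds only in expectation, and the realized partition sizes and class proportions will vary with the seed. The following is a minimal sketch of how one might check this. It assumes GC2.RData is not at hand and therefore builds a hypothetical stand-in data frame named toy, with two made-up covariates (duration and amount) and the same 300/700 class counts as above; none of these values come from the real dataset.

```r
# Hypothetical stand-in for GC2: 1,000 rows with 300 "bad" and 700 "good"
# loans and two made-up covariates (the real data has 20 covariates).
set.seed(12345)
toy <- data.frame(
  duration = sample(4:72, 1000, replace = TRUE),
  amount   = round(runif(1000, 250, 18000)),
  good_bad = factor(rep(c("bad", "good"), times = c(300, 700)))
)

# Same labeling scheme as in the text: every row is assigned independently,
# so the 70/30 proportions are only approximate in any one draw.
Train_Test <- sample(c("Train", "Test"), nrow(toy), replace = TRUE,
                     prob = c(0.7, 0.3))

toy_Train <- toy[Train_Test == "Train", ]
toy_TestX <- within(toy[Train_Test == "Test", ], rm(good_bad))
toy_TestY <- toy[Train_Test == "Test", "good_bad"]

# Realized partition sizes
print(table(Train_Test))

# Share of bad/good loans within each partition (rows sum to 1)
print(round(prop.table(table(Train_Test, toy$good_bad), margin = 1), 3))
```

Running the same two table() calls on GC2 and its Train_Test vector gives a quick check that neither partition ends up with an unusually small share of bad loans.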