German Credit

Loans are not always repaid in full, and some borrowers default. It is therefore important for a bank to identify potential defaulters from the available information. Here, we adapt the GC dataset from the RSADBE package so that the labels of its factor variables are properly reflected. The transformed dataset is available as GC2.RData in the data folder. The GC dataset itself is mainly an adaptation of the version available at https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). It has 1,000 observations and 20 covariates (independent variables), such as the status of the existing checking account, the loan duration, and so forth. The good_bad column records whether the loan was repaid in full. We will partition the data into training and testing parts and create the model formula. Because each observation is assigned to a part independently with probabilities 0.7 and 0.3, the resulting split is approximately, not exactly, 70/30:

> library(RSADBE)
> load("../Data/GC2.RData")
> table(GC2$good_bad)
 bad good 
 300  700 
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(GC2),replace = TRUE,prob=c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> GC2_Train <- GC2[Train_Test=="Train",]
> GC2_TestX <- within(GC2[Train_Test=="Test",],rm(good_bad))
> GC2_TestY <- GC2[Train_Test=="Test","good_bad"]
> GC2_Formula <- as.formula("good_bad~.")
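
The partitioning steps above can be tried without the GC2.RData file. The following is a minimal sketch on a made-up data frame: toy, its duration and amount columns, and all the toy_* names are hypothetical stand-ins, not part of RSADBE or the GC2 data. Only the 300/700 class counts are taken from the table above. It checks that the two parts cover every row, and shows how the dot in the formula expands to all columns other than the response:

```r
# Hypothetical stand-in for GC2: 1,000 rows, a two-level good_bad factor,
# and two invented covariates (column names are illustrative only).
set.seed(12345)
toy <- data.frame(
  duration = sample(6:72, 1000, replace = TRUE),
  amount   = round(runif(1000, 250, 18000)),
  good_bad = factor(rep(c("bad", "good"), times = c(300, 700)))
)

# Label each row independently; the 70/30 split holds only in expectation.
split_label <- sample(c("Train", "Test"), nrow(toy), replace = TRUE,
                      prob = c(0.7, 0.3))

# Training part keeps the response; the test part is split into X and Y.
toy_train <- toy[split_label == "Train", ]
toy_testX <- toy[split_label == "Test", names(toy) != "good_bad", drop = FALSE]
toy_testY <- toy[split_label == "Test", "good_bad"]

# Realised partition sizes and class balance within each part.
print(table(split_label))
print(prop.table(table(toy_train$good_bad)))
print(prop.table(table(toy_testY)))

# The dot stands for every column of the data except the response.
toy_formula <- as.formula("good_bad ~ .")
toy_terms <- attr(terms(toy_formula, data = toy_train), "term.labels")
print(toy_terms)
```

Comparing the class proportions in the two parts is a quick way to see whether a purely random split has left the training and testing sets with a similar share of bad loans. If the proportions differ noticeably, a stratified split is one alternative, although the text above does not use one.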