
German Credit

Loans are not always repaid in full, and some borrowers default. It is therefore important for the bank to identify potential defaulters from the available information. Here, we adapt the GC dataset from the RSADBE package to properly reflect the labels of the factor variable. The transformed dataset is available as GC2.RData in the data folder. The GC dataset itself is mainly an adaptation of the version available at https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). It has 1,000 observations and 20 covariates (independent variables), such as the status of the existing checking account, the duration, and so forth. The final status, that is, whether the loan was repaid in full or not, is recorded in the good_bad column. We will partition the data into training and testing parts, with roughly 70% of the rows going to training, and create the model formula too:

> library(RSADBE)
> load("../Data/GC2.RData")
> table(GC2$good_bad)
 bad good 
 300  700 
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(GC2),replace = TRUE,prob=c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> GC2_Train <- GC2[Train_Test=="Train",]
> GC2_TestX <- within(GC2[Train_Test=="Test",],rm(good_bad))
> GC2_TestY <- GC2[Train_Test=="Test","good_bad"]
> GC2_Formula <- as.formula("good_bad~.")
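
Because each row is labelled independently with probabilities 0.7 and 0.3, the partition sizes are only approximately 700 and 300, and the share of bad loans can differ somewhat between the two parts. The following is a minimal sketch of how this might be checked. It assumes a hypothetical stand-in data frame, toy, with a made-up duration covariate and the same 300/700 class counts as GC2; it does not use the actual GC2 data, and the names toy, part_sizes, and class_mix are illustrative only.

```r
# Hypothetical stand-in for GC2: 1,000 rows with the same class counts (300 bad, 700 good).
# The 'duration' values are made up purely for illustration.
set.seed(12345)
toy <- data.frame(
  duration = sample(4:72, 1000, replace = TRUE),
  good_bad = factor(rep(c("bad", "good"), times = c(300, 700)))
)

# Same labelling approach as in the text: each row independently gets a label.
Train_Test <- sample(c("Train", "Test"), nrow(toy), replace = TRUE, prob = c(0.7, 0.3))

# Partition sizes are random, so they are close to, but not exactly, 700/300.
part_sizes <- table(Train_Test)
print(part_sizes)

# Class proportions within each partition (each row sums to 1).
class_mix <- prop.table(table(Train_Test, toy$good_bad), margin = 1)
print(round(class_mix, 3))

# Build the three pieces in the same way as for GC2.
toy_Train <- toy[Train_Test == "Train", ]
toy_TestX <- within(toy[Train_Test == "Test", ], rm(good_bad))
toy_TestY <- toy[Train_Test == "Test", "good_bad"]
```

If the class proportions in the two parts turn out to differ by more than is acceptable, changing the seed or sampling within each class separately are possible remedies; neither is part of the procedure used in this section.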