
Pima Indians Diabetes

Diabetes is a chronic health condition that is mostly incurable, and patients diagnosed with it have to adjust their lifestyles to manage it. The problem here is to classify a person as diabetic or not, based on eight variables: pregnant, glucose, pressure, triceps, insulin, mass, pedigree, and age. The dataset has 768 observations and is drawn from the mlbench package. We load it and randomly assign each observation to the training or test part with probabilities 0.7 and 0.3:

> library(mlbench)
> data("PimaIndiansDiabetes")
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(PimaIndiansDiabetes),replace = TRUE,
+ prob = c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test=="Train",]
> PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test=="Test",],
+                                     rm(diabetes))
> PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test=="Test","diabetes"]
> PID_Formula <- as.formula("diabetes~.")
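
As a quick sanity check on the partition, the following sketch rebuilds the objects above, confirms that the split is only approximately 70:30 (each row is assigned independently, so the shares are not exact), and fits a plain logistic regression purely as an illustrative baseline. The use of `glm`, the 0.5 cutoff, and the names `fit`, `prob`, `pred`, and `acc` are assumptions made for this example and are not part of the original analysis.

```r
library(mlbench)
data("PimaIndiansDiabetes")

# Recreate the random partition exactly as in the transcript above
set.seed(12345)
Train_Test <- sample(c("Train", "Test"), nrow(PimaIndiansDiabetes),
                     replace = TRUE, prob = c(0.7, 0.3))
PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test == "Train", ]
PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test == "Test", ],
                                    rm(diabetes))
PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test == "Test", "diabetes"]
PID_Formula <- as.formula("diabetes~.")

# Realized share of rows in each part (close to, but not exactly, 0.7 / 0.3)
print(prop.table(table(Train_Test)))

# Class balance of the response within each part
print(prop.table(table(PimaIndiansDiabetes$diabetes, Train_Test), margin = 2))

# Illustrative baseline (an assumption, not the original method):
# logistic regression on the training part
fit <- glm(PID_Formula, data = PimaIndiansDiabetes_Train, family = binomial)

# Predicted probability of the second factor level ("pos") on the test covariates
prob <- predict(fit, newdata = PimaIndiansDiabetes_TestX, type = "response")

# Classify with a 0.5 cutoff and compare against the held-out labels
pred <- factor(ifelse(prob > 0.5, "pos", "neg"),
               levels = levels(PimaIndiansDiabetes_TestY))
acc <- mean(pred == PimaIndiansDiabetes_TestY)
print(acc)
```

Keeping the test covariates (`PimaIndiansDiabetes_TestX`) separate from the test labels (`PimaIndiansDiabetes_TestY`) means that a fitted model only ever sees the labels at evaluation time.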

The five datasets described up to this point are classification problems. Next, we look at one example each for regression, time series, survival, clustering, and outlier detection problems.
