
Pima Indians Diabetes

Diabetes is a largely incurable health condition, and patients diagnosed with it must adjust their lifestyles to manage it. The problem here is to classify a person as diabetic or not, based on the variables pregnant, glucose, pressure, triceps, insulin, mass, pedigree, and age. The dataset has 768 observations and is available in the mlbench package. We load it and randomly assign each observation to the training or test partition with probabilities 0.7 and 0.3:

> data("PimaIndiansDiabetes", package = "mlbench")
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(PimaIndiansDiabetes),replace = TRUE,
+ prob = c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test=="Train",]
> PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test=="Test",],
+                                     rm(diabetes))
> PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test=="Test","diabetes"]
> PID_Formula <- as.formula("diabetes~.")
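
As a rough illustration of how the objects created above might be used, the sketch below fits a baseline classifier on the training partition and scores it on the held-out test partition. The logistic regression (`glm`), the 0.5 cutoff, and the names `PID_glm`, `PID_prob`, `PID_pred`, `split_sizes`, `confusion`, and `accuracy` are assumptions made for this example and are not part of the original text. The sketch also assumes that the mlbench package is installed.

```r
# Recreate the split from the text (assumes the mlbench package is installed).
data("PimaIndiansDiabetes", package = "mlbench")
set.seed(12345)
Train_Test <- sample(c("Train", "Test"), nrow(PimaIndiansDiabetes),
                     replace = TRUE, prob = c(0.7, 0.3))
PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test == "Train", ]
PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test == "Test", ],
                                    rm(diabetes))
PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test == "Test", "diabetes"]
PID_Formula <- as.formula("diabetes~.")

# Each row is assigned independently, so the partition sizes are
# only approximately 70/30 rather than exactly so.
split_sizes <- table(Train_Test)

# Hypothetical baseline: logistic regression on the training partition.
PID_glm <- glm(PID_Formula, data = PimaIndiansDiabetes_Train, family = binomial)

# With a factor response, glm models the probability of the second
# level, which is "pos" for this dataset.
PID_prob <- predict(PID_glm, newdata = PimaIndiansDiabetes_TestX,
                    type = "response")

# Assumed cutoff of 0.5 to convert probabilities into class labels.
PID_pred <- factor(ifelse(PID_prob > 0.5, "pos", "neg"),
                   levels = levels(PimaIndiansDiabetes_TestY))

# Compare predictions against the held-out labels.
confusion <- table(Predicted = PID_pred, Actual = PimaIndiansDiabetes_TestY)
accuracy <- mean(PID_pred == PimaIndiansDiabetes_TestY)
print(split_sizes)
print(confusion)
print(accuracy)
```

Keeping the test covariates (`PimaIndiansDiabetes_TestX`) separate from the test labels (`PimaIndiansDiabetes_TestY`) means that a fitted model never sees the outcome at prediction time. Any other classifier that accepts a formula interface could be substituted for `glm` here.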

All five datasets described up to this point are classification problems. Next, we look at one example each of regression, time series, survival, clustering, and outlier detection problems.
