
Pima Indians Diabetes

Diabetes is a serious health condition that is largely incurable, and patients diagnosed with it have to adjust their lifestyles to manage it. The problem here is to classify a person as diabetic or not, based on the variables pregnant, glucose, pressure, triceps, insulin, mass, pedigree, and age. The dataset contains 768 observations and is drawn from the mlbench package:

> library(mlbench)
> data("PimaIndiansDiabetes")
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(PimaIndiansDiabetes),replace = TRUE,
+ prob = c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test=="Train",]
> PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test=="Test",],
+                                     rm(diabetes))
> PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test=="Test","diabetes"]
> PID_Formula <- as.formula("diabetes~.")
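The split above labels each row independently as "Train" or "Test", so the two sets are only approximately 70/30 in size. The formula `diabetes~.` expands to every column other than `diabetes`, which is why the test predictors are stored without the response. The base-R sketch below illustrates both points. It uses a small synthetic data frame, `toy`, which is a hypothetical stand-in with made-up values rather than the real data, so it runs without mlbench. The logistic regression fitted with `glm()` is an illustrative choice of model, not the method used in this text.

```r
# Hypothetical toy stand-in for PimaIndiansDiabetes (made-up values, NOT the real data)
set.seed(12345)
toy <- data.frame(
  glucose  = round(runif(200, 60, 200)),
  mass     = round(runif(200, 18, 50), 1),
  diabetes = factor(sample(c("neg", "pos"), 200, replace = TRUE))
)

# Same labelling idea as in the text: each row is independently "Train" with
# probability 0.7 or "Test" with probability 0.3, so split sizes are approximate
split_label <- sample(c("Train", "Test"), nrow(toy), replace = TRUE,
                      prob = c(0.7, 0.3))

toy_train <- toy[split_label == "Train", ]
# within(..., rm(diabetes)) returns the test rows with the response column dropped
toy_testX <- within(toy[split_label == "Test", ], rm(diabetes))
toy_testY <- toy[split_label == "Test", "diabetes"]

# Realised training fraction (close to, but generally not exactly, 0.7)
train_frac <- nrow(toy_train) / nrow(toy)
cat("Training rows:", nrow(toy_train), " Test rows:", nrow(toy_testX),
    " Training fraction:", round(train_frac, 3), "\n")

# "diabetes ~ ." uses every non-response column as a predictor
toy_formula <- as.formula("diabetes ~ .")
# Illustrative model choice (assumption): logistic regression via glm()
fit <- glm(toy_formula, data = toy_train, family = binomial)
print(attr(terms(fit), "term.labels"))

# Predicted probabilities of the second factor level ("pos") for the test rows;
# predict() does not need the response column in newdata
pred <- predict(fit, newdata = toy_testX, type = "response")
print(head(round(pred, 3)))
```

Because the labels are sampled row by row, rerunning with a different seed changes both the membership and the size of each set. A fixed-size split would instead sample row indices directly, for example with `sample(nrow(toy), round(0.7 * nrow(toy)))`.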

The five datasets described up to this point are all classification problems. Next, we look at one example each for regression, time series, survival, clustering, and outlier detection problems.
