Pima Indians Diabetes

Diabetes is a serious health hazard that is largely incurable, and patients diagnosed with it have to adjust their lifestyles to manage the condition. Using the eight variables pregnant, glucose, pressure, triceps, insulin, mass, pedigree, and age, the task is to classify each person as diabetic or not. The dataset contains 768 observations and is drawn from the mlbench package:

> library(mlbench)
> data("PimaIndiansDiabetes")
> set.seed(12345)
> Train_Test <- sample(c("Train", "Test"), nrow(PimaIndiansDiabetes), replace = TRUE,
+                      prob = c(0.7, 0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test=="Train",]
> PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test=="Test",],
+                                     rm(diabetes))
> PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test=="Test","diabetes"]
> PID_Formula <- as.formula("diabetes~.")
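The same split steps can be wrapped into a reusable function. The sketch below is a hypothetical helper, `split_train_test()`, which is not part of mlbench or of the text above. It assumes any data frame and a named response column. It is shown on R's built-in `iris` data so that it runs without extra packages. Because every row is labelled "Train" or "Test" independently, the 70/30 split is only approximate.

```r
# Hypothetical helper (not from mlbench): mirrors the Train/Test labelling,
# the removal of the response from the test predictors, and the "y ~ ." formula.
split_train_test <- function(data, response, prob = c(0.7, 0.3), seed = 12345) {
  set.seed(seed)
  # Each row is independently labelled, so split sizes only approximate prob
  labels <- sample(c("Train", "Test"), nrow(data), replace = TRUE, prob = prob)
  test <- data[labels == "Test", ]
  list(
    train   = data[labels == "Train", ],
    # Test predictors: every column except the response
    test_x  = test[, setdiff(names(data), response), drop = FALSE],
    # Test response kept separately, as PimaIndiansDiabetes_TestY is above
    test_y  = test[[response]],
    # "response ~ ." means: model the response on all other columns
    formula = as.formula(paste(response, "~ ."))
  )
}

# Usage on iris, standing in for PimaIndiansDiabetes
parts <- split_train_test(iris, "Species")
nrow(parts$train) + length(parts$test_y)  # every row lands in exactly one part
# The "." expands to the four measurement columns of the training data
attr(terms(parts$formula, data = parts$train), "term.labels")
```

Calling `split_train_test(PimaIndiansDiabetes, "diabetes")` would follow the same steps as the transcript. Whether it gives identical labels depends on the R version's sampling defaults.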

The five datasets described up to this point are all classification problems. We now look at one example each of regression, time series, survival, clustering, and outlier detection problems.