
Pima Indians Diabetes

Diabetes is a chronic, largely incurable condition, and patients diagnosed with it must adjust their lifestyles to manage it. The problem here is to classify a person as diabetic or not, based on the variables pregnant, glucose, pressure, triceps, insulin, mass, pedigree, and age. The dataset has 768 observations and is available in the mlbench package:

> library(mlbench)
> data("PimaIndiansDiabetes")
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(PimaIndiansDiabetes),replace = TRUE,
+ prob = c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test=="Train",]
> PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test=="Test",],
+                                     rm(diabetes))
> PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test=="Test","diabetes"]
> PID_Formula <- as.formula("diabetes~.")
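
To show how the objects created above fit together, here is a minimal sketch that fits a model on the training part and scores it on the held-out part. The choice of logistic regression via glm, the 0.5 cutoff, and the names PID_glm, PID_prob, and PID_pred are assumptions made for illustration only; they are not taken from the text. Note that because each row is labeled independently with probabilities 0.7 and 0.3, the split is only approximately 70/30.

```r
# Sketch only: logistic regression is an assumed stand-in model,
# and PID_glm, PID_prob, PID_pred are hypothetical names.
library(mlbench)
data("PimaIndiansDiabetes")

# Recreate the train/test partition exactly as in the transcript above
set.seed(12345)
Train_Test <- sample(c("Train", "Test"), nrow(PimaIndiansDiabetes),
                     replace = TRUE, prob = c(0.7, 0.3))
PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test == "Train", ]
PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test == "Test", ],
                                    rm(diabetes))
PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test == "Test", "diabetes"]
PID_Formula <- as.formula("diabetes~.")

# Realized sizes of the two parts (approximately, not exactly, 70/30)
table(Train_Test)

# Fit on the training part; glm models the probability of the second factor level
PID_glm <- glm(PID_Formula, data = PimaIndiansDiabetes_Train, family = binomial)

# Predicted probabilities for the test covariates
PID_prob <- predict(PID_glm, newdata = PimaIndiansDiabetes_TestX, type = "response")

# Convert probabilities to class labels using an assumed 0.5 cutoff
lv <- levels(PimaIndiansDiabetes_TestY)
PID_pred <- factor(ifelse(PID_prob > 0.5, lv[2], lv[1]), levels = lv)

# Confusion matrix and overall accuracy on the test part
table(Predicted = PID_pred, Actual = PimaIndiansDiabetes_TestY)
PID_accuracy <- mean(PID_pred == PimaIndiansDiabetes_TestY)
PID_accuracy
```

Keeping the response out of PimaIndiansDiabetes_TestX means the prediction step cannot accidentally see the labels; they are only used afterwards, through PimaIndiansDiabetes_TestY, for evaluation.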

The five datasets described up to this point are all classification problems. Next, we look at one example each of regression, time series, survival, clustering, and outlier detection problems.
