
  • Learning Jupyter 5
  • Dan Toomey
  • 2021-08-13 15:42:14

R cluster analysis

In this example, we will use R's cluster analysis functions to determine the clustering in the wheat seeds dataset from the UCI Machine Learning Repository (hosted by UC Irvine, https://uci.edu/).

The R script we want to use in Jupyter is as follows:

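# load the wheat (seeds) data set from uci.edu
# note: the file has no header row, so with read.csv's default header = TRUE
# the first record is used for the column names and dropped from the data
wheat <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt", sep = "\t")

# define useful column names
colnames(wheat) <- c("area", "perimeter", "compactness", "length", "width", "asymmetry", "groove", "undefined")

# exclude incomplete cases (rows containing NA) from the data
wheat <- wheat[complete.cases(wheat), ]

# calculate five clusters
set.seed(117)  # fix the random starting centers so the results are reproducible
fit <- kmeans(wheat, 5)
fit  # print cluster sizes, cluster means, and sums of squares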

Once entered into a Notebook, it will look something like this:

The generated output reports k-means clustering with five clusters of sizes 39, 53, 47, 29, and 30 (note that I set the seed for the random number generator, so your results should not vary):
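The seed matters because kmeans() picks its starting centers at random. The following is a minimal sketch of that reproducibility. It assumes a small synthetic data set so it runs without downloading the wheat data; the matrix `pts` and the helper `run_kmeans()` are hypothetical names used only for illustration. Running with the same seed twice should give identical cluster assignments.

```r
# Hypothetical synthetic data standing in for the wheat data set:
# three groups of 20 two-dimensional points around different means
set.seed(1)
pts <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2),
  matrix(rnorm(40, mean = 10), ncol = 2)
)

# Hypothetical helper: reset the seed, then cluster into three groups
run_kmeans <- function(seed) {
  set.seed(seed)
  kmeans(pts, centers = 3)
}

fit_a <- run_kmeans(117)
fit_b <- run_kmeans(117)

# Same seed, so same random starting centers, same sizes, same assignments
print(fit_a$size)
print(identical(fit_a$cluster, fit_b$cluster))
```

Changing the seed passed to run_kmeans() can change the starting centers, and with them the cluster numbering or even the final clusters.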

So, we generated five clusters (the number passed as the second argument to kmeans() in the fit statement). It is a little bothersome that the within-cluster sums of squares vary so greatly from cluster to cluster.
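The within-cluster sum of squares that kmeans() reports for each cluster is the sum of the squared Euclidean distances from that cluster's members to its center. The following is a minimal sketch that recomputes those values by hand and compares them with fit$withinss. It again assumes synthetic data instead of the wheat set, and `pts` and `manual_withinss` are hypothetical names. It also prints between_SS / total_SS, the ratio shown in the printed kmeans() summary.

```r
# Hypothetical synthetic data: one large tight group and one small spread-out group
set.seed(117)
pts <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 6, sd = 3), ncol = 2)
)
fit <- kmeans(pts, centers = 2)

# For each cluster k: subtract the center from every member row,
# square the differences, and add them all up
manual_withinss <- sapply(seq_len(nrow(fit$centers)), function(k) {
  members <- pts[fit$cluster == k, , drop = FALSE]
  center <- fit$centers[k, ]
  sum(sweep(members, 2, center)^2)
})

print(fit$withinss)
print(manual_withinss)
print(all.equal(fit$withinss, manual_withinss))

# Share of the total sum of squares accounted for by the clustering
print(fit$betweenss / fit$totss)
```

Because each value is a sum over a cluster's members, a large or spread-out cluster will report a much bigger figure than a small, tight one. That is one reason these values can differ so much between clusters.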
