
Clustering

Clustering is a technique for grouping similar instances into clusters according to a distance measure. The main idea is to put instances that are similar (that is, close to each other) into the same cluster, while keeping dissimilar instances (that is, those farther apart from each other) in different clusters. An example of what clusters might look like is shown in the following diagram:

Clustering algorithms follow two fundamentally different approaches. The first is a hierarchical, or agglomerative, approach that initially treats each point as its own cluster and then iteratively merges the most similar clusters. Merging stops when a predefined number of clusters is reached, or when the clusters that would be merged next are spread over too large a region.
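The agglomerative procedure can be illustrated with a minimal Python sketch. It rests on a few assumptions that the text does not specify: similarity between clusters is measured as the Euclidean distance between their centroids, the only stopping rule is a predefined number of clusters, and the names `centroid` and `agglomerative` and the sample points are made up for illustration.

```python
import math

def centroid(cluster):
    """Mean of the points in a cluster, coordinate by coordinate."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def agglomerative(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumption: cluster similarity = distance between centroids
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # j > i, so index i stays valid
    return clusters

# Hypothetical sample data: two visibly separated groups of points
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9), (8.0, 8.0), (8.3, 7.9), (7.9, 8.4)]
clusters = agglomerative(points, target_clusters=2)
```

This naive version recomputes every pairwise distance on each pass, which is fine for a handful of points but too slow for large datasets.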

The other approach is based on point assignment. First, initial cluster centers (that is, centroids) are estimated, for instance, randomly; then each point is assigned to the closest center until all of the points are assigned. The best-known algorithm in this group is k-means clustering.
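A point-assignment algorithm such as k-means might be sketched as follows. This is an illustrative version under stated assumptions, not a reference implementation: it uses Euclidean distance and random initial centers, and it follows the standard k-means loop, which recomputes each centroid after an assignment pass and repeats until the centroids stop changing (a step the description above does not spell out). The function name `k_means`, its parameters, and the sample points are hypothetical.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Alternate between assigning points and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial cluster centers
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no change, so the result is stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample data: two visibly separated groups of points
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9), (8.0, 8.0), (8.3, 7.9), (7.9, 8.4)]
centroids, clusters = k_means(points, k=2)
```

The fixed `seed` only makes the sketch repeatable; on less clearly separated data, different random starts can produce different clusterings.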

k-means clustering either picks initial cluster centers that are as far as possible from one another, or (hierarchically) clusters a sample of the data and, for each of the k resulting clusters, picks the point closest to its center.
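The first initialization strategy, choosing centers that are far apart, is often approximated greedily. The sketch below is one such approximation and an assumption on our part, not necessarily the exact procedure the text has in mind: it starts from a random point and then repeatedly adds the point whose distance to its nearest already-chosen center is largest. The name `farthest_point_init` and the sample points are hypothetical.

```python
import math
import random

def farthest_point_init(points, k, seed=0):
    """Greedily pick k initial centers that are far from one another."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # the first center is chosen at random
    while len(centers) < k:
        # Next center: the point farthest from its nearest chosen center
        next_center = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centers),
        )
        centers.append(next_center)
    return centers

# Hypothetical sample data: two visibly separated groups of points
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9), (8.0, 8.0), (8.3, 7.9), (7.9, 8.4)]
centers = farthest_point_init(points, k=2)
```

With two well-separated groups, the second center always lands in the group that does not contain the first, which gives the assignment step a reasonable starting point.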
