官术网_书友最值得收藏!

K-means

K-means is the simplest implementation of the principle of maximum separation and maximum internal cohesion. Let's suppose we have a dataset X ∈ ?M×N (that is, M N-dimensional samples) that we want to split into K clusters and a set of K centroids corresponding to the means of the samples assigned to each cluster Kj:

The set M and the centroids have an additional index (as a superscript) indicating the iterative step. Starting from an initial guess M(0), K-means tries to minimize an objective function called inertia (that is, the total average intra-cluster distance between samples assigned to a cluster Kj and its centroid μj):

It's easy to understand that S(t) cannot be considered as an absolute measure because its value is highly influenced by the variance of the samples. However, S(t+1) < S(t) implies that the centroids are moving closer to an optimal position where the points assigned to a cluster have the smallest possible distance to the corresponding centroid. Hence, the iterative procedure (also known as Lloyd's algorithm) starts by initializing M(0) with random values. The next step is the assignment of each sample xi ∈ X to the cluster whose centroid has the smallest distance from xi:

Once all assignments have been completed, the new centroids are recomputed as arithmetic means:

The procedure is repeated until the centroids stop changing (this implies also a sequence S(0) > S(1) > ... > S(tend)). The reader should have immediately understood that the computational time is highly influenced by the initial guess. If M(0) is very close to M(tend), a few iterations can find the optimal configuration. Conversely, when M(0) is purely random, the probability of an inefficient initial choice is close to 1 (that is, every initial uniform random choice is almost equivalent in terms of computational complexity).

主站蜘蛛池模板: 大余县| 渭南市| 壤塘县| 大冶市| 和顺县| 大同市| 雷山县| 潮安县| 溧阳市| 东城区| 竹溪县| 景东| 洪江市| 吉水县| 黄龙县| 奉新县| 望城县| 铜山县| 内乡县| 承德市| 云林县| 山东省| 青阳县| 瑞金市| 清丰县| 大田县| 安平县| 克东县| 璧山县| 汝阳县| 札达县| 子洲县| 梁平县| 玉环县| 遂平县| 古浪县| 达州市| 龙里县| 瑞金市| 石景山区| 同心县|