- Hands-On Unsupervised Learning with Python
- Giuseppe Bonaccorso
- 2021-07-02 12:32:04
K-means
K-means is the simplest implementation of the principle of maximum separation and maximum internal cohesion. Let's suppose we have a dataset $X \in \mathbb{R}^{M \times N}$ (that is, M samples, each N-dimensional) that we want to split into K clusters, together with a set of K centroids corresponding to the means of the samples assigned to each cluster $K_j$:

$$M^{(t)} = \{\mu_1^{(t)}, \mu_2^{(t)}, \ldots, \mu_K^{(t)}\}$$

The set M and the centroids carry an additional superscript index indicating the iteration step. Starting from an initial guess $M^{(0)}$, K-means tries to minimize an objective function called inertia (that is, the total sum of squared distances between the samples assigned to each cluster $K_j$ and its centroid $\mu_j$):

$$S^{(t)} = \sum_{j=1}^{K} \sum_{x_i \in K_j^{(t)}} \lVert x_i - \mu_j^{(t)} \rVert^2$$

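The inertia can be computed directly from its definition. The following NumPy sketch is illustrative only: the function name `inertia`, the toy data, and the representation of the clusters as one integer label per sample are assumptions, not part of the original text. It also shows that S depends on the scale of the data: multiplying every sample (and therefore every centroid) by a factor c multiplies the inertia by c².

```python
import numpy as np


def inertia(X, centroids, labels):
    """Sum of squared Euclidean distances between each sample and its assigned centroid."""
    # centroids[labels] picks, for every row of X, the centroid of its cluster
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))


# Hypothetical toy data: two well-separated groups of two samples each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

s = inertia(X, centroids, labels)
print(s)  # 4.0: each sample is 1 unit away from its centroid

# Rescaling the data (and hence the centroids) by c rescales S by c**2
c = 10.0
s_scaled = inertia(c * X, c * centroids, labels)
print(s_scaled)  # 400.0
```
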
It's easy to see that $S^{(t)}$ cannot be considered an absolute measure, because its value is strongly influenced by the variance of the samples. However, $S^{(t+1)} < S^{(t)}$ implies that the centroids are moving closer to a (locally) optimal position, where the points assigned to each cluster have the smallest possible distance to the corresponding centroid. The iterative procedure (also known as Lloyd's algorithm) starts by initializing $M^{(0)}$ with random values. The next step is to assign each sample $x_i \in X$ to the cluster whose centroid has the smallest distance from $x_i$:
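
$$K_j^{(t)} = \left\{ x_i \in X : j = \arg\min_{k=1,\ldots,K} \lVert x_i - \mu_k^{(t)} \rVert \right\}$$
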
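The assignment step can be sketched with NumPy broadcasting. This is a minimal illustration that assumes Euclidean distance; `assign` is a hypothetical helper name, and ties are broken in favour of the lowest centroid index because that is what `np.argmin` returns.

```python
import numpy as np


def assign(X, centroids):
    """Return, for each sample, the index of the nearest centroid (Euclidean distance)."""
    # distances[i, j] = ||x_i - mu_j||, via broadcasting (M, 1, N) - (1, K, N)
    distances = np.linalg.norm(X[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)
    # argmin over the centroid axis; ties go to the lowest index
    return np.argmin(distances, axis=1)


X = np.array([[0.0, 0.0], [1.0, 0.5], [9.0, 1.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
labels = assign(X, centroids)
print(labels)  # [0 0 1 1]
```
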
Once all assignments have been completed, the new centroids are recomputed as arithmetic means:
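
$$\mu_j^{(t+1)} = \frac{1}{\lvert K_j^{(t)} \rvert} \sum_{x_i \in K_j^{(t)}} x_i \qquad j = 1, \ldots, K$$
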
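A minimal sketch of the update step, again assuming NumPy arrays and one integer cluster label per sample (the helper `update_centroids` is hypothetical). It assumes every cluster contains at least one sample, since the text does not say how empty clusters should be handled.

```python
import numpy as np


def update_centroids(X, labels, k):
    """Recompute each centroid as the arithmetic mean of the samples assigned to it."""
    centroids = np.empty((k, X.shape[1]))
    for j in range(k):
        members = X[labels == j]
        # Assumption: cluster j is non-empty; an empty cluster would yield NaN here
        centroids[j] = members.mean(axis=0)
    return centroids


X = np.array([[0.0, 0.0], [2.0, 2.0], [9.0, 1.0], [11.0, 3.0]])
labels = np.array([0, 0, 1, 1])
new_centroids = update_centroids(X, labels, k=2)
print(new_centroids)
```

With these labels, the two centroids move to (1, 1) and (10, 2).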
The procedure is repeated until the centroids stop changing (which also implies a decreasing sequence $S^{(0)} > S^{(1)} > \ldots > S^{(t_{end})}$). It follows that the computational time is strongly influenced by the initial guess. If $M^{(0)}$ is very close to $M^{(t_{end})}$, only a few iterations are needed to reach the final (locally optimal) configuration. Conversely, when $M^{(0)}$ is purely random, the probability of an inefficient initial choice is close to 1 (that is, every uniform random initial choice is almost equivalent in terms of computational complexity).
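Putting the two steps together gives a basic version of Lloyd's algorithm. The sketch below is an illustration, not a production implementation. Several choices are made here rather than taken from the text: the function `lloyd`, the synthetic three-blob dataset, the use of `np.allclose` to decide that the centroids have stopped changing, and the decision to keep the previous centroid when a cluster becomes empty. It records $S^{(t)}$ at every iteration and compares an initial guess close to the final centroids with a poor one (three samples drawn from the same blob).

```python
import numpy as np


def lloyd(X, init_centroids, max_iter=100):
    """Lloyd's iteration from the initial guess M(0) = init_centroids.

    Returns (centroids, labels, n_iterations, inertia_history).
    """
    centroids = np.array(init_centroids, dtype=float)  # copy of M(0)
    history = []
    for t in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every sample
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Inertia S for the current assignment and centroids
        history.append(float(np.sum((X - centroids[labels]) ** 2)))
        # Update step: arithmetic mean of each cluster; an empty cluster
        # keeps its previous centroid (a simple policy chosen for this sketch)
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stop when the centroids no longer change (up to floating-point tolerance)
        if np.allclose(new_centroids, centroids):
            return new_centroids, labels, t, history
        centroids = new_centroids
    return centroids, labels, max_iter, history


rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 10.0]])
# Three synthetic, well-separated Gaussian blobs with 50 samples each
X = np.vstack([m + rng.normal(scale=0.5, size=(50, 2)) for m in true_means])

# Guess close to the final configuration vs. three samples from the same blob
_, _, n_good, hist_good = lloyd(X, true_means + 0.1)
_, _, n_bad, hist_bad = lloyd(X, X[:3])
print(n_good, n_bad)
print(hist_bad)  # non-increasing sequence of inertia values
```

With the guess close to the blob means, the loop should stop after two passes: one to move the centroids onto the blob means and one to confirm that they no longer change. The poor guess needs at least as many iterations and can also converge to a worse local minimum.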