- Hands-On Unsupervised Learning with Python
- Giuseppe Bonaccorso
- 336 words
- 2021-07-02 12:32:02
Cluster analysis
Cluster analysis (usually just called clustering) is a task in which we want to discover the features shared among large sets of samples. We always assume that a data-generating process exists, and we define the dataset X as:

$$X = \{x_1, x_2, \ldots, x_M\} \quad \text{where } x_i \in \mathbb{R}^N$$
A clustering algorithm rests on the implicit assumption that samples can be grouped according to their similarities. In particular, given two vectors, a similarity function is defined as the reciprocal (inverse) of a metric function. For example, if we are working in a Euclidean space, we have:

$$d(a, b) = \lVert a - b \rVert_2 \quad \text{and} \quad s(a, b) = \frac{1}{d(a, b) + \epsilon}$$
The constant ε in the similarity function avoids division by zero when the two vectors coincide. Since s decreases monotonically as d grows, d(a, c) < d(a, b) ⇒ s(a, c) > s(a, b). Therefore, given a representative μ_j of each cluster, we can build the set C_j of vectors assigned to it with the rule:

$$C_j = \{ x_i \in X : d(x_i, \mu_j) < d(x_i, \mu_k) \;\; \forall k \neq j \}$$
In other words, a cluster contains all the elements that are closer to its representative than to any other representative. Equivalently, each sample in a cluster is more similar to that cluster's representative than to any other. Moreover, after the assignment, a sample gains the right to share its features with the other members of the same cluster.
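The similarity function and the nearest-representative rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's own code: the function names, the default value of `eps`, and the toy data are all assumptions made for the example.

```python
import numpy as np


def euclidean_distance(a, b):
    """Euclidean distance d(a, b) between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def similarity(a, b, eps=1e-8):
    """Similarity s(a, b) = 1 / (d(a, b) + eps); eps (assumed value) avoids division by zero."""
    return 1.0 / (euclidean_distance(a, b) + eps)


def assign_to_clusters(X, representatives):
    """Return, for each sample, the index of its closest representative."""
    X = np.asarray(X, dtype=float)
    R = np.asarray(representatives, dtype=float)
    # Broadcasting yields an (M, K) matrix of sample-to-representative distances.
    distances = np.linalg.norm(X[:, np.newaxis, :] - R[np.newaxis, :, :], axis=2)
    # Minimum distance is equivalent to maximum similarity.
    return np.argmin(distances, axis=1)


# Hypothetical toy data: two well-separated groups in the plane.
X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.2]])
representatives = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_to_clusters(X, representatives)
```

Note that `np.argmin` resolves ties by picking the first representative, whereas the strict inequality in the rule leaves equidistant samples unassigned; which behavior is preferable depends on the application.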
In fact, one of the most important applications of cluster analysis is increasing the homogeneity of samples that are recognized as similar. For example, a recommendation engine could be based on clustering user vectors (containing information about the users' interests and purchased products). Once the groups have been defined, all the elements belonging to the same cluster are considered similar, so we are implicitly authorized to transfer their differences from one member to another. If user A has bought product P and rated it positively, we can suggest this item to user B, who hasn't bought it, and vice versa. The process can appear arbitrary, but it turns out to be extremely effective when the number of elements is large and the feature vectors contain many discriminative elements (for example, ratings).
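The recommendation idea can be illustrated with a small sketch. It assumes that cluster labels have already been computed by some clustering algorithm; the function `suggest_items`, the `min_rating` threshold, and the purchase and rating matrices are hypothetical and serve only to make the reasoning concrete.

```python
import numpy as np


def suggest_items(purchases, ratings, labels, user, min_rating=4.0):
    """Items the user has not bought that a cluster-mate bought and rated >= min_rating."""
    purchases = np.asarray(purchases, dtype=bool)
    ratings = np.asarray(ratings, dtype=float)
    labels = np.asarray(labels)

    # Cluster-mates: users with the same label, excluding the user itself.
    mates = labels == labels[user]
    mates[user] = False

    # Items bought and positively rated by at least one cluster-mate.
    liked_by_mates = (purchases[mates] & (ratings[mates] >= min_rating)).any(axis=0)

    # Keep only the items the user has not bought yet.
    return np.flatnonzero(liked_by_mates & ~purchases[user]).tolist()


# Hypothetical data: rows are users A, B, C; columns are three products.
purchases = [[1, 0, 1],
             [0, 1, 1],
             [1, 1, 0]]
ratings = [[5, 0, 2],
           [0, 4, 5],
           [1, 5, 0]]
labels = [0, 0, 1]  # A and B share a cluster; C is alone

for_b = suggest_items(purchases, ratings, labels, user=1)
for_a = suggest_items(purchases, ratings, labels, user=0)
for_c = suggest_items(purchases, ratings, labels, user=2)
```

Here user B receives the product that A bought and rated highly, A receives the one that B liked, and C, having no cluster-mates, receives no suggestions.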