
Classification with k-Nearest Neighbors (kNN)

k-Nearest Neighbors (kNN) is a well-known supervised classification method. It is based on finding similarities between data points, or what we call feature similarity. The algorithm is simple and is widely used to solve many classification problems, such as recommendation systems, anomaly detection, and credit rating. However, it requires a large amount of memory. Because it is a supervised learning model, it must be fed labeled data, so the outputs are known; we only need to learn the function that maps the inputs to the outputs. The kNN algorithm is non-parametric. Data is represented as feature vectors; mathematically, each data point can be written as:

$$x = (x_1, x_2, \dots, x_n)$$
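To make this representation concrete, here is a minimal sketch in Python (the feature values and class labels are invented purely for illustration) showing data points stored as feature vectors alongside their known labels:

```python
import numpy as np

# Each row is one data point represented as a feature vector (x1, x2, x3);
# the numbers and labels below are made up for illustration only.
X = np.array([
    [5.1, 3.5, 1.4],   # sample 1
    [6.2, 2.9, 4.3],   # sample 2
])
y = np.array(['class_a', 'class_b'])  # one known label per sample (supervised setting)
```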

The classification is done like a vote: to determine the class of a selected data point, you must first compute the distance between the selected item and every training item. But how can we calculate these distances?

Generally, we have two major methods of calculating distance. We can use the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
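As a minimal sketch of this formula (assuming two equal-length numeric vectors), the Euclidean distance can be computed with NumPy as follows:

```python
import numpy as np

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate-wise differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # 5.0
```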

Or, we can use the cosine similarity:

$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
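A matching sketch for the cosine similarity, under the same assumption of plain numeric vectors:

```python
import numpy as np

def cosine_similarity(x, y):
    """Dot product of the vectors divided by the product of their norms."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(cosine_similarity([1, 0], [1, 1]))  # ~0.707 (vectors 45 degrees apart)
```

Note that the two measures point in opposite directions: cosine similarity grows as vectors become more alike (1.0 means the same direction), whereas Euclidean distance shrinks, so a kNN implementation must rank neighbors accordingly.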

The second step is choosing the k nearest neighbors (the value of k can be picked arbitrarily). Finally, we conduct a vote based on a confidence level; in other words, the data point is assigned to the class with the largest probability, that is, the class most common among its k nearest neighbors.
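Putting the steps together, here is a minimal from-scratch sketch of the whole procedure (the helper name knn_predict and the toy data are invented for illustration): compute the distances, keep the k nearest neighbors, and take a majority vote.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    X_train, y_train = np.asarray(X_train, dtype=float), np.asarray(y_train)
    # Step 1: Euclidean distance from the query to every training point
    distances = np.sqrt(((X_train - np.asarray(query, dtype=float)) ** 2).sum(axis=1))
    # Step 2: indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy, made-up data: two well-separated 2-D classes
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ['class_a'] * 3 + ['class_b'] * 3
print(knn_predict(X_train, y_train, query=[2, 2], k=3))  # class_a
```

In practice, scikit-learn's sklearn.neighbors.KNeighborsClassifier implements this same procedure with faster neighbor-search structures such as k-d trees, so the sketch above is mainly useful for understanding the steps.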
