
Classification with k-Nearest Neighbors

k-Nearest Neighbors (kNN) is a well-known classification method. It is based on finding similarities between data points, or what we call feature similarity. The algorithm is simple and is widely used to solve many classification problems, such as recommender systems, anomaly detection, and credit rating. However, it requires a high amount of memory, because it keeps the entire training set. Since it is a supervised learning model, it must be fed labeled data, so the outputs are known; we only need to learn the function that maps inputs to outputs. The kNN algorithm is non-parametric. Data is represented as feature vectors, so each data point has the mathematical representation:

$x = (x_1, x_2, \dots, x_n)$

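As a small illustration of this representation (assuming NumPy is available; the variable names and toy values are our own), a dataset can be stored as a matrix whose rows are feature vectors, with the known labels in a parallel array:

import numpy as np

# Each row is one data point described by two features;
# y holds the known class label of each row.
X = np.array([[1.0, 2.0],
              [1.5, 1.8],
              [8.0, 8.0]])
y = np.array([0, 0, 1])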
The classification is done like a vote: to determine the class of a selected data point, you must first compute the distance between that point and every training item. But how can we calculate these distances?

Generally, we have two major methods for calculating the distance. We can use the Euclidean distance:

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

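As a minimal sketch of this formula in plain Python (the function name is our own, not from the text):

import math

def euclidean_distance(x, y):
    # Square root of the sum of squared per-feature differences.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0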
Or, we can use the cosine similarity:

$\cos(\theta) = \frac{x \cdot y}{\|x\| \, \|y\|}$

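A matching sketch for the cosine similarity, again in plain Python with an illustrative function name:

import math

def cosine_similarity(x, y):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi ** 2 for xi in x))
    norm_y = math.sqrt(sum(yi ** 2 for yi in y))
    return dot / (norm_x * norm_y)

print(cosine_similarity([1.0, 2.0], [4.0, 6.0]))  # about 0.9923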
The second step is choosing the k nearest distances (the value of k can be picked arbitrarily, although it is usually tuned by validation). Finally, we conduct a majority vote based on a confidence level; in other words, the data point is assigned to the class with the largest probability, that is, the most common class among its k nearest neighbors.

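Putting the three steps together, the following is a minimal, illustrative kNN classifier in plain Python; the function names, the toy data, and the choice of k = 3 are our own assumptions, not taken from the text:

import math
from collections import Counter

def euclidean_distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(X_train, y_train, query, k=3):
    # Step 1: compute the distance from the query to every training item.
    distances = [(euclidean_distance(query, x), label)
                 for x, label in zip(X_train, y_train)]
    # Step 2: keep the k nearest neighbors.
    neighbors = sorted(distances)[:k]
    # Step 3: majority vote among the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

X_train = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.5, 7.5]]
y_train = [0, 0, 1, 1]
print(knn_predict(X_train, y_train, [1.2, 1.9]))  # 0

In practice, scikit-learn implements this same procedure as sklearn.neighbors.KNeighborsClassifier, with the usual fit() and predict() methods, so you rarely need to hand-roll the vote.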