
Classification with k-Nearest Neighbors

k-Nearest Neighbors (kNN) is a well-known classification method. It is based on finding similarities between data points, or what we call feature similarity. The algorithm is simple and widely used to solve many classification problems, such as recommendation systems, anomaly detection, and credit rating. However, it requires a large amount of memory. Because it is a supervised learning model, it must be fed labeled data, where the outputs are known; we only need to learn the function that maps inputs to outputs. The kNN algorithm is non-parametric. Each data point is represented as a feature vector, which can be written mathematically as:

$\mathbf{x} = (x_1, x_2, \dots, x_n)$

Classification is done like a vote; to determine the class of a selected data point, you must first compute the distance between that point and every item in the training set. But how can we calculate these distances?

Generally, we have two major methods for calculating them. We can use the Euclidean distance:

$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Or, we can use the cosine similarity:

$\text{similarity}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert}$
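To make the two measures concrete, the following is a minimal sketch in plain Python; the function names euclidean_distance and cosine_similarity are illustrative, not taken from any particular library:

```python
import math

def euclidean_distance(x, y):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine_similarity(x, y):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)

print(euclidean_distance((1.0, 2.0), (4.0, 6.0)))  # 5.0
print(cosine_similarity((1.0, 0.0), (1.0, 1.0)))   # ~0.707
```

Note that Euclidean distance measures how far apart two points are (smaller means more similar), while cosine similarity measures the angle between the two vectors (larger means more similar).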

The second step is choosing the k nearest neighbors (the value of k can be picked arbitrarily). Finally, we conduct a majority vote among those neighbors; in other words, the data point is assigned to the class with the largest probability of occurring among its k nearest neighbors.
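Putting the steps together, here is a minimal sketch of the whole procedure, assuming numeric feature vectors and Euclidean distance; the helper knn_classify and the sample data are hypothetical, written only for illustration:

```python
import math
from collections import Counter

def knn_classify(query, training_data, k=3):
    # training_data is a list of (feature_vector, label) pairs.
    # Step 1: compute the distance from the query to every training item.
    distances = [
        (math.sqrt(sum((qi - xi) ** 2 for qi, xi in zip(query, x))), label)
        for x, label in training_data
    ]
    # Step 2: keep the k nearest neighbors.
    distances.sort(key=lambda pair: pair[0])
    k_nearest = distances[:k]
    # Step 3: majority vote, assigning the most common label among the neighbors.
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Usage: classify a 2-D point against a tiny labeled set.
training = [((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"),
            ((6.0, 6.5), "malicious"), ((5.8, 6.1), "malicious")]
print(knn_classify((1.1, 0.9), training, k=3))  # -> benign
```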
