
Choosing a good k

It is important to pick a proper value for the hyperparameter k: a well-chosen value can improve a model's performance, while a poorly chosen one can degrade it. One popular rule of thumb is to take the square root of the number of training samples, and many popular software packages use this heuristic as the default value of k. Unfortunately, it doesn't always work well, because datasets and distance metrics differ.
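For illustration, here is a minimal sketch of that heuristic in Python (the language and the placeholder data are assumptions made for this example, not something the text prescribes):

```python
import math

# Stand-in training set; in practice this would be your real data.
X_train = [[0.0, 1.0]] * 150

# Rule-of-thumb default: k is the square root of the training set size.
k = round(math.sqrt(len(X_train)))  # 150 samples -> k = 12

# A common refinement: make k odd so binary-classification votes can't tie.
if k % 2 == 0:
    k += 1
print(k)  # 13
```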

There is no mathematically grounded way to come up with the optimal number of neighbors from the very beginning. The only option is to scan through a range of k values and choose the best one according to some performance metric. You can use any of the performance metrics we described in the previous chapter: accuracy, F1, and so on. Cross-validation is especially useful when data is scarce.
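A minimal sketch of such a scan, assuming scikit-learn and its Iris toy dataset (the library and dataset are choices made for this example, not prescribed by the text):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Scan a range of k values and keep the best cross-validated score.
best_k, best_score = None, 0.0
for k in range(1, 31):
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy; swap in scoring='f1_macro', etc.
    score = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```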

In fact, there is a variation of KNN that doesn't require k at all. The idea is to give the algorithm the radius of a ball and have it search for neighbors within that ball. The effective k then differs from point to point, depending on the local density of the data. This variation is known as radius-based neighbor learning. It suffers from the n-ball volume problem (see the next section): the more features you have, the larger the radius must be to capture at least one neighbor.
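As a sketch of this variant, scikit-learn (again, an assumed library choice) exposes it as RadiusNeighborsClassifier:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import RadiusNeighborsClassifier

X, y = load_iris(return_X_y=True)

# radius sets the ball size; points with no neighbors inside the ball
# get outlier_label instead of raising an error at prediction time.
model = RadiusNeighborsClassifier(radius=1.0, outlier_label=-1)
model.fit(X, y)
print(model.predict(X[:5]))
```

Note how the radius replaces k entirely; with more features, the same radius captures fewer points, which is exactly the n-ball volume problem discussed in the next section.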
