Machine Learning with Swift, by Alexander Sosnovshchenko
Understanding the KNN algorithm
To recognize different types of motion activity, we will train a KNN classifier. The idea of the method is to find the k training samples closest to the sample with an unknown label, and to predict that label as the most frequent class among those k neighbors. That's it.

In fact, the algorithm is so simple that it's tempting to formulate it in more complicated terms. Let's do it. The secret sauce of KNN is its distance metric: a function that defines how close two samples are to each other. We have discussed several of them already: Euclidean, Manhattan, Minkowski, edit distance, and DTW. Following this terminology, samples are points in some n-dimensional space, where n equals the number of features in each sample. This space is called the feature space, and samples are distributed in it as clouds of points. Classification of an unknown data point happens in three steps (see the Swift sketch after this list):
- Calculate the distances from the point to all points in the training set
- Choose the k closest neighbors of the unknown point
- Perform a majority vote among them
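Here is a minimal sketch of these three steps in Swift, assuming the Euclidean distance as the metric. The type and function names (LabeledSample, knnClassify) are illustrative, not the book's implementation:

```swift
// A labeled point in feature space.
struct LabeledSample {
    let features: [Double]
    let label: String
}

// Euclidean distance between two points of equal dimensionality.
func euclideanDistance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Samples must have the same number of features")
    var sum = 0.0
    for i in 0..<a.count {
        let d = a[i] - b[i]
        sum += d * d
    }
    return sum.squareRoot()
}

// The three steps: compute all distances, take the k closest
// neighbors, and perform a majority vote among their labels.
func knnClassify(point: [Double], trainingSet: [LabeledSample], k: Int) -> String? {
    let neighbors = trainingSet
        .map { (label: $0.label, distance: euclideanDistance(point, $0.features)) }
        .sorted { $0.distance < $1.distance }
        .prefix(k)
    var votes: [String: Int] = [:]
    for neighbor in neighbors {
        votes[neighbor.label, default: 0] += 1
    }
    return votes.max { $0.value < $1.value }?.key
}
```

Note that swapping in Manhattan, Minkowski, or DTW only requires replacing euclideanDistance; the voting logic stays the same.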
The surface that separates one class of points from another is known as a decision boundary. The KNN algorithm creates piecewise linear decision boundaries, which can approximate a decision boundary of any complexity as more and more training samples are added.

You may think that this algorithm is too simple to be used for anything but toy tasks; however, over the years KNN has been successfully employed for a wide range of problems, such as handwriting recognition and satellite photo classification. It's also worth noting that it's easy to turn this classification algorithm into a regression: you just need to replace the categorical labels with real numbers and add an interpolation function, as shown in the sketch below.
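As a sketch of that idea, and reusing the euclideanDistance helper from the previous example, a hypothetical knnRegress function could average the target values of the k nearest neighbors instead of voting:

```swift
// Regression variant: labels are real numbers, and the majority vote
// becomes an interpolation (here, the mean of the k nearest values).
func knnRegress(point: [Double], trainingSet: [(features: [Double], value: Double)], k: Int) -> Double? {
    guard k > 0, !trainingSet.isEmpty else { return nil }
    let neighbors = trainingSet
        .map { (value: $0.value, distance: euclideanDistance(point, $0.features)) }
        .sorted { $0.distance < $1.distance }
        .prefix(k)
    return neighbors.map { $0.value }.reduce(0, +) / Double(neighbors.count)
}
```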
Many of the restrictions of linear regression come from the fact that it assumes the data is normally distributed. The class of statistical models that make explicit assumptions about the statistical distribution underlying the data is called parametric models.
Unlike linear regression, KNN makes no assumptions about the distribution from which the samples are generated; that's why we call it non-parametric. It is the right tool to choose in situations where the data has an unusual distribution and the decision boundary is irregular.