
Implementing KNN in Swift

Fast implementations of KNN and DTW can be found in many machine learning and DSP libraries, for example, the lbimproved and matchbox C++ libraries.

The KNN classifier works with virtually any type of data, since you define the distance metric for your data points yourself. That's why we define it as a generic structure parameterized with types for features and labels. Labels should conform to the Hashable protocol, as we're going to use them as dictionary keys:

struct kNN<X, Y> where Y: Hashable { ... } 

KNN has two hyperparameters: k, the number of neighbors (var k: Int), and the distance metric. We'll define the metric elsewhere and pass it in during initialization. The metric is a function that returns a Double distance for any two samples, x1 and x2:

var distanceMetric: (_ x1: X, _ x2: X) -> Double 
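For example, if your features are vectors of Double values, a Euclidean distance metric could look like this (a sketch of one possible metric; the euclideanDistance name and the [Double] representation are our assumptions, not part of the book's code):

func euclideanDistance(_ x1: [Double], _ x2: [Double]) -> Double {
    precondition(x1.count == x2.count, "Vectors must be of the same length.")
    // Sum of squared coordinate-wise differences, then square root.
    return zip(x1, x2)
        .map { ($0 - $1) * ($0 - $1) }
        .reduce(0, +)
        .squareRoot()
}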

During initialization, we simply store the hyperparameters in our structure. The definition of init looks like this:

init(k: Int, distanceMetric: @escaping (_ x1: X, _ x2: X) -> Double) {
    self.k = k 
    self.distanceMetric = distanceMetric 
}  

KNN stores all its training data points. We use an array of (features, label) pairs for this purpose:

private var data: [(X, Y)] = []  

As usual with supervised learning models, we'll stick to an interface with two methods, train and predict, which reflect the two phases of a supervised algorithm's life. The train method, in the case of KNN, just saves the data points for later use in the predict method:

mutating func train(X: [X], y: [Y]) { 
    data.append(contentsOf: zip(X, y)) 
} 

The predict method takes a data point and predicts its label:

func predict(x: X) -> Y? {
    assert(!data.isEmpty, "Please call train() first to provide training data.")
    assert(k > 0, "Error: k must be greater than 0.")

For this, we iterate through all samples in the training dataset and compare them with the input sample x. We use (distance, label) tuples to keep track of the distance to each of the training samples. After this, we sort all the samples in ascending order of distance and take the first k elements using prefix:

    let tuples = data
        .map { (distanceMetric(x, $0.0), $0.1) }
        .sorted { $0.0 < $1.0 }
        .prefix(k)

This implementation is not optimal: it can be improved by keeping track of only the best k samples at each step. However, the goal here is to demonstrate the simplest machine learning algorithm without diving into complex data structures, and to show that even such a naïve version can perform well on complex tasks.
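A sketch of that improvement could look like the following (one possible approach, our assumption rather than the book's code); instead of sorting the whole dataset, it maintains a small sorted buffer of the k nearest samples seen so far:

    // Hypothetical alternative for the body of predict: O(n·k) instead of O(n log n).
    var best: [(distance: Double, label: Y)] = []
    for (features, label) in data {
        let d = distanceMetric(x, features)
        if best.count < k {
            // The buffer isn't full yet: just insert and keep it sorted.
            best.append((d, label))
            best.sort { $0.distance < $1.distance }
        } else if d < best[best.count - 1].distance {
            // Closer than the current worst of the k best: replace it.
            best[best.count - 1] = (d, label)
            best.sort { $0.distance < $1.distance }
        }
    }

In what follows, though, we stick with the simple sorted-prefix version.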

Now we arrange majority voting among the top k samples. We count the frequency of each label using NSCountedSet (which comes from Foundation, so the file needs an import Foundation), and sort the labels in descending order of frequency:

    let countedSet = NSCountedSet(array: tuples.map { $0.1 })
    let result = countedSet.allObjects.sorted {
        countedSet.count(for: $0) > countedSet.count(for: $1)
    }.first
    return result as? Y
} 

The result variable holds the predicted class label.
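To see all the pieces working together, here is a minimal usage sketch (the toy 2D dataset and the euclideanDistance function from earlier are illustrative assumptions):

// Three nearest neighbors, Euclidean distance over [Double] features.
var classifier = kNN<[Double], String>(k: 3, distanceMetric: euclideanDistance)
// A tiny, hypothetical 2D dataset with two classes.
classifier.train(X: [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]],
                 y: ["a", "a", "a", "b", "b", "b"])
let label = classifier.predict(x: [0.15, 0.15])  // "a" is expected here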
