
Cold start problem

A very common situation is when a machine learning system starts functioning in a new environment where no data is available to pre-train it. This situation is known as a cold start. Such a system requires a certain amount of time to collect enough training data and start producing meaningful predictions. The problem often arises in the context of personalization and recommender systems.

One solution to this problem is so-called active learning, where the system actively seeks new data that could improve its performance. Usually, this means that the system queries a user to label some data. For instance, the user can be asked to provide some labeled examples before the system starts, or the system can prompt the user when it stumbles upon especially hard cases, asking them to label those manually. Active learning is a special case of semi-supervised learning.

The second component of active learning is estimating which samples are the most useful by assigning weights to them. In the case of KNN, these can be the samples the model is least confident about: for example, samples whose neighbors' classes are divided almost equally, or samples that are far from all the others (outliers).
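To make this concrete, here is a minimal sketch of such an uncertainty score for KNN. The LabeledSample type and the voteMargin function are hypothetical names introduced for illustration, not part of any library; the idea is simply that a small margin between the two most frequent classes among the k nearest neighbors marks a sample worth querying:

import Foundation

// Hypothetical container for an already-labeled time series.
struct LabeledSample {
    let series: [Double]
    let label: Int
}

// Returns a margin in [0, 1]: values near 0 mean the k nearest neighbors'
// classes are divided almost equally, so the sample is worth querying.
func voteMargin(for sample: [Double],
                neighbors: [LabeledSample],
                k: Int,
                distance: ([Double], [Double]) -> Double) -> Double {
    let nearest = neighbors
        .sorted { distance($0.series, sample) < distance($1.series, sample) }
        .prefix(k)
    var votes: [Int: Int] = [:]
    for neighbor in nearest { votes[neighbor.label, default: 0] += 1 }
    let counts = votes.values.sorted(by: >)
    let top = counts.first ?? 0
    let second = counts.count > 1 ? counts[1] : 0
    return Double(top - second) / Double(k)
}

The samples with the smallest margin would then be the ones surfaced to the user for manual labeling.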

However, some researchers point out that active learning is built on flawed assumptions: that the user is always available and willing to answer questions, and that their answers are always correct. This is worth keeping in mind when building an active learning solution.

I guess when the Twitter app pings you at 4 AM with a push notification like "Take a look at this and 13 other Highlights", it just wants to update its small personalized binary classifier of interesting/not interesting content using active learning.

Figure 3.8: App interface

In the classification phase, we feed unlabeled chunks of the same size into the classifier and get predictions, which we display to the user. We use DTW as a distance measure with a locality constraint of 3. In my experiments, k = 1 gave the best results, but you can experiment with other numbers of neighbors. I will show here only the machine learning part, without the data collection part and the user interface.
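For intuition, here is a minimal sketch of DTW with a Sakoe-Chiba locality constraint of width w. It illustrates the idea behind the locality constraint; it is not the actual implementation of DTW.distance used below:

import Foundation

// DTW between two 1-D series with a locality (Sakoe-Chiba band) constraint:
// cell (i, j) may only be visited when |i - j| <= window.
func dtwDistance(_ a: [Double], _ b: [Double], w: Int) -> Double {
    let n = a.count, m = b.count
    // Widen the band if the series lengths differ, so a warping path exists.
    let window = max(w, abs(n - m))
    var cost = [[Double]](repeating: [Double](repeating: .infinity, count: m + 1),
                          count: n + 1)
    cost[0][0] = 0
    for i in 1...n {
        for j in max(1, i - window)...min(m, i + window) {
            let d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],     // insertion
                                 cost[i][j - 1],     // deletion
                                 cost[i - 1][j - 1]) // match
        }
    }
    return cost[n][m]
}

The constraint both speeds up the computation and prevents pathological alignments between distant parts of the two series.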

Creating the classifier:

classifier = kNN(k: 1, distanceMetric: DTW.distance(w: 3)) 

Training the classifier:

self.classifier.train(X: [magnitude(series3D: series)], y: [motionType]) 

The magnitude() function converts a three-dimensional series into a one-dimensional one by calculating the vector magnitude √(x² + y² + z²) for every sample, to simplify the computations.
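A minimal sketch of such a conversion could look like this (the exact signature of the book's magnitude() function may differ):

import Foundation

// Collapse an (x, y, z) motion series into a 1-D series of vector magnitudes.
func magnitude(series3D: [(x: Double, y: Double, z: Double)]) -> [Double] {
    return series3D.map { sqrt($0.x * $0.x + $0.y * $0.y + $0.z * $0.z) }
}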

Making the predictions:

let motionType = self.classifier.predict(x: magnitude(series3D: series)) 