官术网_书友最值得收藏!

Using classification models to predict class labels

With these tools in hand, we can now take on our first real classification example.

Consider the small town of Randomville, where people are crazy about their two sports teams, the Randomville Reds and the Randomville Blues. The Reds had been around for a long time, and people loved them. But then some out-of-town millionaire came along and bought the Reds' top scorer and started a new team, the Blues. To the discontent of most Reds fans, the top scorer would go on to win the championship title with the Blues. Years later he would return to the Reds, despite some backlash from fans who could never forgive him for his earlier career choices. But anyway, you can see why fans of the Reds don't necessarily get along with fans of the Blues. In fact, these two fan bases are so divided that they never even live next to each other. I've even heard stories where the Red fans deliberately moved away once Blues fans moved in next door. True story!

Anyway, we are new in town and are trying to sell some Blues merchandise to people by going from door to door. However, every now and then we come across a bleeding-heart Reds fan who yells at us for selling Blues stuff and chases us off their lawn. Not nice! It would be much less stressful, and a better use of our time, to avoid these houses altogether and just visit the Blues fans instead.

Confident that we can learn to predict where the Reds fans live, we start keeping track of our encounters. If we come by a Reds fan's house, we draw a red triangle on we handy town map; otherwise, we draw a blue square. After a while, we get a pretty good idea of where everyone lives:

Randomville's town map

However, now we approach the house that is marked as a green circle in the preceding map. Should we knock on their door? We try to find some clue as to what team they prefer (perhaps a team flag hanging from the back porch), but we can't see any. How can we know if it is safe to knock on their door?

What this silly example illustrates is exactly the kind of problem a supervised learning algorithm can solve. We have a bunch of observations (houses, their locations, and their color) that make up our training data. We can use this data to learn from experience, so that when we face the task of predicting the color of a new house, we can make a well-informed estimate.

As we mentioned earlier, fans of the Reds are really passionate about their team, so they would never move next to a Blues fan. Couldn't we use this information and look at all the neighboring houses, in order to find out what kind of fan lives in the new house?

This is exactly what the k-NN algorithm would do.

主站蜘蛛池模板: 望江县| 余姚市| 英吉沙县| 容城县| 老河口市| 桐城市| 红原县| 滦南县| 温州市| 额济纳旗| 宁德市| 常州市| 横峰县| 益阳市| 通山县| 沂南县| 定安县| 昌乐县| 靖江市| 克山县| 通河县| 富川| 临泽县| 吴旗县| 通河县| 石家庄市| 凉城县| 静乐县| 米脂县| 邵阳县| 依安县| 江川县| 苏尼特右旗| 九龙城区| 南川市| 延寿县| 壤塘县| 赤峰市| 平舆县| 新宁县| 平昌县|