Using classification models to predict class labels

With these tools in hand, we can now take on our first real classification example.

Consider the small town of Randomville, where people are crazy about their two sports teams, the Randomville Reds and the Randomville Blues. The Reds had been around for a long time, and people loved them. But then some out-of-town millionaire came along, bought the Reds' top scorer, and started a new team, the Blues. To the discontent of most Reds fans, the top scorer would go on to win the championship title with the Blues. Years later he would return to the Reds, despite some backlash from fans who could never forgive him for his earlier career choices. But anyway, you can see why fans of the Reds don't necessarily get along with fans of the Blues. In fact, these two fan bases are so divided that they never even live next to each other. I've even heard stories where Reds fans deliberately moved away once Blues fans moved in next door. True story!

Anyway, we are new in town and are trying to sell some Blues merchandise to people by going from door to door. However, every now and then we come across a bleeding-heart Reds fan who yells at us for selling Blues stuff and chases us off their lawn. Not nice! It would be much less stressful, and a better use of our time, to avoid these houses altogether and just visit the Blues fans instead.

Confident that we can learn to predict where the Reds fans live, we start keeping track of our encounters. If we come by a Reds fan's house, we draw a red triangle on our handy town map; otherwise, we draw a blue square. After a while, we get a pretty good idea of where everyone lives:

Randomville's town map

However, now we approach the house that is marked as a green circle in the preceding map. Should we knock on their door? We try to find some clue as to what team they prefer (perhaps a team flag hanging from the back porch), but we can't see any. How can we know if it is safe to knock on their door?

What this silly example illustrates is exactly the kind of problem a supervised learning algorithm can solve. We have a bunch of observations (houses, their locations, and their color) that make up our training data. We can use this data to learn from experience, so that when we face the task of predicting the color of a new house, we can make a well-informed estimate.

As we mentioned earlier, fans of the Reds are really passionate about their team, so they would never move next to a Blues fan. Couldn't we use this information and look at all the neighboring houses, in order to find out what kind of fan lives in the new house?

This is exactly what the k-NN algorithm would do.
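
To make the idea concrete, here is a minimal sketch of a nearest-neighbor vote in plain Python. It is an illustration under stated assumptions, not the implementation of any particular library: the coordinates, the labels, the helper name `knn_predict`, and the choice of k = 3 are all made up for this example. Each house is a point on the map, and the new house gets the label that is most common among its k closest neighbors.

```python
import math
from collections import Counter

# Hypothetical training data: (x, y) map coordinates of houses we have
# already visited, plus the team color we observed at each one.
# All values are invented for illustration.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),   # a Blues neighborhood
                (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]   # a Reds neighborhood
train_labels = ["blue", "blue", "blue", "red", "red", "red"]


def knn_predict(points, labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every known house, paired with its label
    distances = [(math.dist(p, query), label)
                 for p, label in zip(points, labels)]
    # Keep only the k closest houses
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the labels among those neighbors and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# The house we are unsure about (the "green circle"), at an assumed location
new_house = (2.0, 2.5)
prediction = knn_predict(train_points, train_labels, new_house, k=3)
print(prediction)
```

With two teams, an odd k avoids tied votes. Whether Euclidean distance on map coordinates is the right notion of "neighbor" is itself an assumption; it fits the story here because fans of the same team cluster together.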
