
Generalization/true error

This is the second and more important type of error in data science. The whole purpose of building learning systems is to get a small generalization error on the test set; in other words, to get the model to work well on a set of observations/samples that haven't been used in the training phase. If you still consider the class scenario from the previous section, you can think of generalization error as a measure of how well you solve exam problems that weren't necessarily similar to the problems you solved in the classroom to learn and get familiar with the subject. So, generalization performance is the model's ability to use the skills (parameters) that it learned in the training phase to correctly predict the outcome/output of unseen data.
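One common way to write this down is given below as a sketch. The symbols are assumptions introduced here for illustration and do not come from the text: f is the trained model, ℓ is a loss function, D is the unknown distribution the data is drawn from, and (x_i, y_i) for i = 1, ..., m are the test samples. The true error R(f) is an expectation over all possible data, so it can't be computed directly; the average loss on the held-out test set is the usual estimate of it.

```latex
R(f) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\big[\ell(f(x), y)\big]
\qquad
\hat{R}_{\text{test}}(f) = \frac{1}{m} \sum_{i=1}^{m} \ell\big(f(x_i), y_i\big)
```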

In Figure 13, the light blue line represents the generalization error. You can see that as you increase the model complexity, the generalization error is reduced up to some point; beyond that point, the model starts to lose its generalization power and the generalization error increases. This part of the curve, where the generalization error rises again as complexity grows, is called overfitting.
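The relationship between model complexity and the two kinds of error can be tried out with a small experiment. The following is a minimal sketch, not the setup behind Figure 13. It assumes a synthetic dataset (noisy samples of a sine curve) and uses a k-nearest-neighbors regressor as a stand-in model, where a smaller k means a more flexible (more complex) model. The helper names make_data, knn_predict, and mse are hypothetical and exist only for this illustration.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """Draw n noisy samples of y = sin(2*pi*x) with x in [0, 1) (assumed toy data)."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    squared = [(knn_predict(x, train_x, train_y, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / len(squared)


rng = random.Random(0)
train_x, train_y = make_data(40, rng)   # samples used for "training"
test_x, test_y = make_data(200, rng)    # unseen samples

# Listed from least complex (k = 40, predicts the mean of all training targets)
# to most complex (k = 1, memorizes every training point).
ks = [40, 20, 10, 5, 3, 1]
train_errors = [mse(train_x, train_y, train_x, train_y, k) for k in ks]
test_errors = [mse(test_x, test_y, train_x, train_y, k) for k in ks]

for k, train_err, test_err in zip(ks, train_errors, test_errors):
    print(f"k={k:2d}  training error={train_err:.3f}  held-out error={test_err:.3f}")
```

With k = 1 every training point is its own nearest neighbor, so the training error is exactly zero, while the error on the unseen samples stays above zero. In runs like this one, you would typically expect the held-out error to be lowest at an intermediate k and to rise again as k approaches 1, which is the overfitting region described above. The exact numbers depend on the random seed and the noise level.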

The takeaway message from this section is to minimize the generalization error as much as you can.
