
Apparent (training set) error

This is the first type of error, and it is not the one you should focus on minimizing. Getting a small value for this type of error doesn't mean that your model will work well on unseen data (that is, generalize). To better understand this type of error, consider a simple classroom example. The purpose of solving problems in the classroom is not to be able to solve the same problems again in the exam, but to be able to solve other problems that won't necessarily be similar to the ones you practiced. The exam problems could be from the same family as the classroom problems, but they are not necessarily identical.

Apparent error measures how well the trained model performs on the training set, for which we already know the true outcome/output. If you manage to get 0 error over the training set, it is a good indicator that your model (mostly) won't work well on unseen data (won't generalize), because it has probably memorized the training samples. Data science, on the other hand, is about using a training set as base knowledge so that the learning algorithm works well on future unseen data.

In Figure 13, the red curve represents the apparent error. Whenever you increase the model's ability to memorize things (for example, by increasing the model complexity through a larger number of explanatory features), you will find that the apparent error approaches zero. It can be shown that if you have as many features as observations/samples, then the apparent error will be zero:

Figure 13: Apparent error (red curve) and generalization/true error (light blue)
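
The behavior of the red curve can be reproduced with a small experiment. The following is a minimal sketch, not the book's own code: it assumes a plain linear least-squares model fitted with NumPy on synthetic Gaussian data, and all names (`X_train`, `true_w`, `apparent_errors`, and so on) are hypothetical. It fits the model on the first k features for k = 1 up to the number of samples, and records the error on the training set and on a separate, unseen set.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 20    # number of training observations
n_features = 20   # as many candidate features as observations

# Synthetic data (an assumption for illustration): only the first
# three features carry signal, the rest are pure noise.
X_train = rng.normal(size=(n_samples, n_features))
X_test = rng.normal(size=(200, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [2.0, -1.0, 0.5]
y_train = X_train @ true_w + rng.normal(size=n_samples)
y_test = X_test @ true_w + rng.normal(size=200)


def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))


apparent_errors = []  # error on the training set
test_errors = []      # error on unseen data

for k in range(1, n_features + 1):
    # Least-squares fit using only the first k features.
    w, *_ = np.linalg.lstsq(X_train[:, :k], y_train, rcond=None)
    apparent_errors.append(mse(X_train[:, :k], y_train, w))
    test_errors.append(mse(X_test[:, :k], y_test, w))

for k in (1, 3, 10, 20):
    print(f"features={k:2d}  "
          f"apparent={apparent_errors[k - 1]:.4f}  "
          f"unseen={test_errors[k - 1]:.4f}")
```

In this setup the apparent error never increases as features are added, and it reaches zero (up to floating-point precision) once the number of features equals the number of samples. The error on the unseen set does not follow it down, which is why a zero apparent error says little about generalization.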