
Apparent (training set) error

This is the first type of error, and it is not the one you should focus on minimizing. Getting a small value for this type of error doesn't mean that your model will work well on unseen data (that is, generalize). To better understand this type of error, consider a simple classroom analogy. The purpose of solving problems in the classroom is not to be able to solve the same problems again in the exam, but to be able to solve other problems that won't necessarily resemble the ones you practiced in the classroom. The exam problems may come from the same family as the classroom problems, but they are not necessarily identical.

Apparent error measures how well the trained model performs on the training set, for which we already know the true outcomes/outputs. If you manage to get zero error on the training set, that is usually a sign that your model has memorized the training data and (most likely) won't work well on unseen data (won't generalize). Remember that data science is about using a training set as base knowledge so that the learning algorithm works well on future, unseen data.
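A 1-nearest-neighbour classifier is an extreme example of a model that memorizes its training set: every training point is its own nearest neighbour, so its apparent error is zero by construction. The sketch below is illustrative only; it assumes a made-up one-dimensional task (true label is 1 when x > 0.5) with 20% label noise, and the helpers `make_data`, `predict_1nn`, and `error_rate` are hypothetical names, not part of the original text.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical 1-D task: the true label is 1 when x > 0.5; each label is
    flipped with probability `noise` to simulate noisy measurements."""
    xs = [rng.random() for _ in range(n)]
    ys = []
    for x in xs:
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise
        ys.append(y)
    return xs, ys

def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour rule: copy the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def error_rate(train_x, train_y, xs, ys):
    """Fraction of the points (xs, ys) that the 1-NN model misclassifies."""
    wrong = sum(predict_1nn(train_x, train_y, x) != y for x, y in zip(xs, ys))
    return wrong / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(50, noise=0.2, rng=rng)
test_x, test_y = make_data(1000, noise=0.2, rng=rng)

# Apparent error: evaluate the model on the very data it was built from.
apparent_error = error_rate(train_x, train_y, train_x, train_y)
# Error on unseen data drawn from the same (hypothetical) distribution.
test_error = error_rate(train_x, train_y, test_x, test_y)
print(f"apparent (training) error: {apparent_error:.3f}")
print(f"error on unseen data:      {test_error:.3f}")
```

The apparent error comes out as exactly 0, while the error on the unseen points is clearly larger, because the model has memorized the noisy training labels rather than the underlying rule.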

In Figure 13, the red curve represents the apparent error. Whenever you increase the model's ability to memorize (for example, by increasing the model complexity through adding more explanatory features), you will find that the apparent error approaches zero. It can be shown, for instance, that a linear model with as many (linearly independent) features as observations/samples fits the training data exactly, so its apparent error is zero.
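To make the zero-apparent-error claim concrete, the sketch below fits ordinary least squares (without an intercept) on the first p of 10 features, for p = 1 to 10, using only 10 training observations. Everything here is an assumption for illustration: the data generator (the target depends only on the first two features, the rest are pure noise), the sizes, and the helper names. The normal equations are used for brevity; when p equals the number of observations, the design matrix is square and (almost surely) invertible, so the fit passes exactly through every training target.

```python
import random

N_OBS = 10        # hypothetical number of training observations
N_FEATURES = 10   # as many candidate features as observations

def make_data(n, rng):
    """Hypothetical data: y depends only on the first two features plus noise;
    the remaining features are irrelevant noise."""
    X = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(n)]
    y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y

def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def fit_least_squares(X, y, p):
    """Ordinary least squares on the first p features via the normal equations."""
    Xp = [row[:p] for row in X]
    XtX = [[sum(r[i] * r[j] for r in Xp) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xp, y)) for i in range(p)]
    return solve(XtX, Xty)

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    p = len(w)
    preds = [sum(row[j] * w[j] for j in range(p)) for row in X]
    return sum((pi - yi) ** 2 for pi, yi in zip(preds, y)) / len(y)

rng = random.Random(1)
X_train, y_train = make_data(N_OBS, rng)
X_test, y_test = make_data(1000, rng)

train_errors, test_errors = [], []
for p in range(1, N_FEATURES + 1):
    w = fit_least_squares(X_train, y_train, p)
    train_errors.append(mse(X_train, y_train, w))  # apparent error
    test_errors.append(mse(X_test, y_test, w))     # error on unseen data
    print(f"p={p:2d}  apparent MSE={train_errors[-1]:.4f}  test MSE={test_errors[-1]:.4f}")
```

Because each model contains the previous one, the apparent MSE can only shrink as p grows, and it reaches (numerically) zero at p = 10. The test MSE, by contrast, typically starts to rise once the pure-noise features are added, mirroring the two curves in the figure.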

Figure 13: Apparent error (red curve) and generalization/true error (light blue)