
Overfitting

Overfitting occurs when a model fits the training data so closely that it captures noise rather than the underlying pattern, and therefore performs poorly on new data.

Say you have a single predictor of an outcome, and the data follows a quadratic pattern:

  1. You fit a linear regression on that data, and the predictions are weak. Your model is underfitting the data: the error is high on both the training and the validation datasets.
  2. You add the square of the predictor to the model and find that it makes good predictions. The errors on the training and validation datasets are similar, and both are lower than for the simpler model.
  3. If you keep increasing the number and power of the polynomial features, so that the model becomes a high-degree polynomial in the predictor, you end up fitting the training data too closely. The model has a very low prediction error on the training dataset but predicts poorly on new data: the prediction error on the validation dataset remains high.

This is a case of overfitting.
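The three cases above can be reproduced with a small experiment. The sketch below assumes NumPy and a synthetic quadratic dataset (the coefficients, noise level, sample sizes, seed, and the helper names `make_data` and `fit_and_score` are hypothetical, chosen for illustration). It uses `numpy.polynomial.Polynomial.fit` for least-squares polynomial regression, fits polynomials of degree 1, 2, and 16, and reports the training and validation mean squared error (MSE) for each.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical data: one predictor x, and an outcome y that follows a noisy quadratic.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-3, 3, size=n)
    y = 2 * x**2 - x + 1 + rng.normal(scale=1.0, size=n)
    return x, y

x_train, y_train = make_data(20)   # small training set
x_val, y_val = make_data(200)      # held-out validation set

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def fit_and_score(degree):
    """Fit a polynomial of the given degree by least squares; return (train MSE, validation MSE)."""
    model = Polynomial.fit(x_train, y_train, deg=degree)
    return mse(y_train, model(x_train)), mse(y_val, model(x_val))

# Degree 1 underfits, degree 2 matches the data, degree 16 overfits.
errors = {d: fit_and_score(d) for d in (1, 2, 16)}
for degree, (train_err, val_err) in errors.items():
    print(f"degree {degree:2d}: train MSE {train_err:8.3f}  validation MSE {val_err:10.3f}")
```

You would expect degree 1 to show high error on both sets, degree 2 to show low and similar errors, and degree 16 to show the lowest training error but a much larger validation error; the exact numbers depend on the random draw.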

The following graph shows an example of an overfitting model on the same quadratic dataset, obtained by setting a high order for the polynomial regression (n = 16). This polynomial regression fits the training data so closely that it would predict poorly on new data, whereas the quadratic model (n = 2) would be more robust:

A practical way to detect overfitting is therefore to compare the prediction errors on the training and validation sets: a large gap, with the training error much lower than the validation error, suggests overfitting. One way to prevent overfitting is to add constraints on the model; in machine learning, this is done with regularization.
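As a hedged sketch of both ideas in the paragraph above, the code below reuses the same kind of synthetic quadratic data with degree-16 polynomial features and adds an L2 (ridge) penalty, which is one common form of regularization; the text does not say which penalty it has in mind. The data, the penalty grid, and the helper names (`make_data`, `features`, `fit_ridge`) are hypothetical. For each penalty strength it reports training and validation MSE, then keeps the strength with the lowest validation error.

```python
import numpy as np

# Hypothetical setup mirroring the overfitting example: noisy quadratic data and
# degree-16 polynomial features. The L2 (ridge) penalty is an assumed choice.
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-3, 3, size=n)
    y = 2 * x**2 - x + 1 + rng.normal(scale=1.0, size=n)
    return x, y

def features(x, degree=16):
    # Rescale x from [-3, 3] to [-1, 1] before raising it to powers,
    # so the columns of the design matrix stay on comparable scales.
    return np.vander(x / 3.0, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w[1:]||^2 (the intercept is not penalized)."""
    n_features = X.shape[1]
    penalty = np.sqrt(lam) * np.eye(n_features)
    penalty[0, 0] = 0.0  # do not shrink the intercept column
    # Least squares on the augmented system [X; penalty] w = [y; 0] is equivalent
    # to the ridge objective above, and it also works for lam = 0.
    X_aug = np.vstack([X, penalty])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = make_data(20)
x_val, y_val = make_data(200)
X_train, X_val = features(x_train), features(x_val)

# Compare training and validation error for several penalty strengths.
results = {}
for lam in (0.0, 1e-4, 1e-2, 1.0):
    w = fit_ridge(X_train, y_train, lam)
    results[lam] = (mse(y_train, X_train @ w), mse(y_val, X_val @ w))
    print(f"lambda={lam:g}: train MSE {results[lam][0]:.3f}  validation MSE {results[lam][1]:.3f}")

# Keep the penalty strength with the lowest validation error.
best_lam = min(results, key=lambda k: results[k][1])
print("selected lambda:", best_lam)
```

Leaving the intercept unpenalized is a common design choice: shrinking it would pull predictions toward zero without making the curve any smoother. Solving the augmented system with `numpy.linalg.lstsq` avoids forming X^T X explicitly, which helps with poorly conditioned polynomial features.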
