Overfitting

Overfitting occurs when a model fits the training data so closely that it also captures its noise and, as a result, generalizes poorly to new data.

Say you have a single predictor of an outcome and the data follows a quadratic pattern:

  1. You fit a linear regression on that data, and the predictions are weak. Your model is underfitting the data. The error is high on both the training and the validation datasets.
  2. You add the square of the predictor to the model and find that your model makes good predictions. The errors on the training and the validation datasets are comparable, and both are lower than for the simpler model.
  3. If you increase the number and power of polynomial features so that the model is now a high-degree polynomial of the predictor, you end up fitting the training data too closely. The model has a very low prediction error on the training dataset but predicts poorly on new data. The prediction error on the validation dataset remains high.

This is a case of overfitting.
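
The three steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the setup behind the text: the quadratic coefficients, the noise level, the sample sizes, the random seed, and the helper names (`make_quadratic_data`, `rmse`, `fit_and_score`) are all made up for the example.

```python
import numpy as np

def make_quadratic_data(n, rng):
    # Hypothetical data: quadratic signal plus Gaussian noise (assumed values).
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.3, size=n)
    return x, y

def rmse(y_true, y_pred):
    # Root mean squared prediction error.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_and_score(degree, x_train, y_train, x_val, y_val):
    # Least-squares polynomial fit of the given degree on the training set,
    # scored on both the training and the validation sets.
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    return rmse(y_train, model(x_train)), rmse(y_val, model(x_val))

rng = np.random.default_rng(0)
x_train, y_train = make_quadratic_data(30, rng)   # small training set
x_val, y_val = make_quadratic_data(200, rng)      # held-out validation set

# Degree 1: linear model, degree 2: quadratic model, degree 16: high-order model.
results = {d: fit_and_score(d, x_train, y_train, x_val, y_val) for d in (1, 2, 16)}
for d, (train_err, val_err) in results.items():
    print(f"degree={d:2d}  train RMSE={train_err:.3f}  validation RMSE={val_err:.3f}")
```

With a setup like this, one would expect the linear model to show a high error on both sets, the quadratic model to show comparable errors on both, and the degree-16 model to show the lowest training error together with a noticeably larger validation error. The exact figures depend on the seed and the assumed noise level.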

The following graph shows an overfitted model on the previous quadratic dataset, obtained by setting a high order for the polynomial regression (n = 16). The polynomial regression fits the training data so closely that it generalizes poorly to new data, whereas the quadratic model (n = 2) is more robust:

The best way to detect overfitting is, therefore, to compare the prediction errors on the training and validation sets. A significant gap between the two errors, with the training error much lower than the validation error, indicates overfitting. One way to prevent overfitting is to add constraints on the model. In machine learning, this is called regularization.
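
As a rough sketch of both ideas, the snippet below fits a degree-16 polynomial with and without an L2 (ridge) penalty and prints the training and validation errors for each. Ridge is only one common form of regularization, chosen here for illustration; the text does not say which form is meant. The data-generating values, the penalty strength `alpha = 0.1`, the seed, and the helper names are all assumptions.

```python
import numpy as np

def make_quadratic_data(n, rng):
    # Hypothetical data: quadratic signal plus Gaussian noise (assumed values).
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.3, size=n)
    return x, y

def poly_features(x, degree):
    # Columns x, x^2, ..., x^degree; the intercept is handled separately.
    return np.vander(x, degree + 1, increasing=True)[:, 1:]

def fit_ridge(x, y, degree, alpha):
    # Ridge regression with an unpenalized intercept, solved as an augmented
    # least-squares problem: appending sqrt(alpha) * I rows to the centered
    # design matrix adds alpha * ||w||^2 to the squared error being minimized.
    # alpha = 0 reduces to ordinary least squares.
    X = poly_features(x, degree)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    A = np.vstack([X - x_mean, np.sqrt(alpha) * np.eye(degree)])
    t = np.concatenate([y - y_mean, np.zeros(degree)])
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w, y_mean - x_mean @ w

def rmse(x, y, w, b):
    # Root mean squared prediction error of the fitted polynomial.
    pred = poly_features(x, len(w)) @ w + b
    return float(np.sqrt(np.mean((y - pred) ** 2)))

rng = np.random.default_rng(0)
x_train, y_train = make_quadratic_data(30, rng)
x_val, y_val = make_quadratic_data(200, rng)

scores = {}
for alpha in (0.0, 0.1):  # 0.0 = unconstrained fit, 0.1 = regularized fit
    w, b = fit_ridge(x_train, y_train, degree=16, alpha=alpha)
    scores[alpha] = (rmse(x_train, y_train, w, b), rmse(x_val, y_val, w, b))
    train_err, val_err = scores[alpha]
    print(f"alpha={alpha:<4}  train RMSE={train_err:.3f}  "
          f"validation RMSE={val_err:.3f}  gap={val_err - train_err:.3f}")
```

The penalty trades a slightly higher training error for a smaller gap between the two errors. In practice the penalty strength would be tuned on the validation set rather than fixed in advance.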
