- Effective Amazon Machine Learning
- Alexis Perrier
- 2021-07-03 00:17:52
Overfitting
Overfitting occurs when a model is trained so closely on the training data that it fits that data almost perfectly but generalizes poorly to new data.
Say you have a single predictor of an outcome and the data follows a quadratic pattern:
- You fit a linear regression on that data, and the predictions are weak. Your model is underfitting the data. The error is high on both the training and validation datasets.
- You add the square of the predictor to the model and find that it makes good predictions. The errors on the training and validation datasets are comparable, and both are lower than for the simpler model.
- If you increase the number and power of the polynomial features so that the model is now a high-order polynomial, you end up fitting the training data too closely. The model has a very low prediction error on the training dataset but predicts poorly on new data. The prediction error on the validation dataset remains high.
This is a case of overfitting.
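The three cases above can be reproduced with a small experiment. The following is a minimal sketch, not taken from the book and not using the Amazon Machine Learning service: it assumes a synthetic quadratic dataset (the coefficients, noise level, sample sizes, and seed are all illustrative choices) and fits polynomials of degree 1, 2, and 16 with NumPy, reporting the root mean squared error (RMSE) on the training and validation sets.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def make_data(n):
    # Hypothetical ground truth: a quadratic signal plus Gaussian noise.
    x = rng.uniform(-3, 3, n)
    y = 1.0 + 2.0 * x + x**2 + rng.normal(0.0, 0.5, n)
    return x, y


x_train, y_train = make_data(20)   # small training set
x_val, y_val = make_data(200)      # held-out validation set


def rmse(model, x, y):
    # Root mean squared prediction error of a fitted polynomial.
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))


errors = {}
for degree in (1, 2, 16):
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (rmse(model, x_train, y_train), rmse(model, x_val, y_val))
    print(f"degree {degree:2d}: train RMSE = {errors[degree][0]:.3f}, "
          f"validation RMSE = {errors[degree][1]:.3f}")
```

Under these assumptions, the degree 1 model should show a high error on both sets, the degree 2 model a low and similar error on both, and the degree 16 model a very low training error with a much higher validation error.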
The following graph shows an overfitted model on the previous quadratic dataset, obtained by setting a high order for the polynomial regression (n = 16). The polynomial regression fits the training data so closely that it would predict poorly on new data, whereas the quadratic model (n = 2) would be more robust:

The best way to detect overfitting is, therefore, to compare the prediction errors on the training and validation sets. A validation error that is significantly higher than the training error indicates overfitting. One way to prevent overfitting is to add constraints to the model. In machine learning, this is called regularization.
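As an illustration of both ideas, the sketch below measures the gap between training and validation error for a degree 16 polynomial, with and without a constraint on the coefficients. It assumes ridge (L2) regularization, which is one common choice and is not necessarily the method the book goes on to use. The dataset, the penalty strength `lam = 1e-2`, and the helper names `features` and `fit_ridge` are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def make_data(n):
    # Hypothetical quadratic ground truth plus Gaussian noise.
    x = rng.uniform(-3, 3, n)
    y = 1.0 + 2.0 * x + x**2 + rng.normal(0.0, 0.5, n)
    return x, y


def features(x, degree=16):
    # Polynomial features 1, u, u^2, ..., with u = x / 3 rescaled to [-1, 1]
    # so that high powers stay numerically manageable.
    return np.vander(x / 3.0, degree + 1, increasing=True)


def fit_ridge(X, y, lam):
    # Minimize ||Xw - y||^2 + lam * ||w[1:]||^2 by solving an augmented
    # least-squares problem. The intercept w[0] is left unpenalized.
    penalty = np.sqrt(lam) * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    X_aug = np.vstack([X, penalty])
    y_aug = np.concatenate([y, np.zeros(X.shape[1])])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w


def rmse(w, X, y):
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))


x_train, y_train = make_data(20)
x_val, y_val = make_data(200)
X_train, X_val = features(x_train), features(x_val)

results = {}
for name, lam in (("unregularized", 0.0), ("ridge", 1e-2)):
    w = fit_ridge(X_train, y_train, lam)
    train_err, val_err = rmse(w, X_train, y_train), rmse(w, X_val, y_val)
    results[name] = (train_err, val_err)
    print(f"{name:>13}: train RMSE = {train_err:.3f}, "
          f"validation RMSE = {val_err:.3f}, gap = {val_err - train_err:.3f}")
```

The penalty raises the training error slightly, since the unregularized fit is by construction the best possible fit to the training data, but under these assumptions it should narrow the gap to the validation error considerably. In practice the penalty strength is itself tuned against the validation set.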