Overfitting and underfitting

Overfitting is a phenomenon that happens when an algorithm learns its training data too well, to the point where it cannot accurately predict on new data. Models that overfit learn the small, intricate details of their training set and don't generalize well. As an analogy, think about learning a new language. Instead of learning the general form of the language, say Spanish, you've perfected a local version of it from a remote part of South America, including all of the local slang. If you went to Spain and tried to speak that version of Spanish, you would probably get some puzzled looks from the locals! Underfitting is the exact opposite: you didn't study enough Spanish, so you don't have enough knowledge to communicate effectively. From a modeling standpoint, an underfit model is not complex enough to capture the patterns in its data, so it performs poorly even on its training set, let alone on new data.
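To make these ideas concrete, here is a minimal sketch in Python using NumPy. It assumes a hypothetical toy dataset (noisy samples of sin(2πx)) and uses the degree of a least-squares polynomial fitted with `np.polyfit` as the knob for model complexity; neither choice comes from this chapter. A straight line (degree 1) should underfit, a cubic should fit reasonably well, and a degree-11 polynomial, which can pass through all 12 training points, should drive its training error toward zero while its error on unseen data grows.

```python
import numpy as np

# Hypothetical toy problem: noisy samples of sin(2*pi*x) on [0, 1].
def make_data(n, rng, noise_std=0.2):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, n)
    return x, y

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Fit a least-squares polynomial of the given degree; return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    return (mse(y_train, np.polyval(coefs, x_train)),
            mse(y_test, np.polyval(coefs, x_test)))

rng = np.random.default_rng(0)
x_train, y_train = make_data(12, rng)   # small training set
x_test, y_test = make_data(1000, rng)   # unseen data from the same distribution

results = {}
# Too simple, about right, and flexible enough to pass through every training point.
for degree in (1, 3, 11):
    results[degree] = fit_and_score(degree, x_train, y_train, x_test, y_test)
    train_err, test_err = results[degree]
    print(f"degree={degree:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

The exact numbers depend on the random seed. The pattern to look for is a tiny training error paired with a large test error for the overfit model, and large errors on both sets for the underfit one.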

Overfitting and underfitting are tied to a machine learning phenomenon known as the bias/variance tradeoff:

  • Bias is the systematic error a model makes because it approximates reality with a simplified function. Since every model is a simplified version of reality, bias is the error that comes from that simplification.
  • Variance is the degree to which your model's predictions change when it is trained on different samples of data. It measures your model's sensitivity to the intricacies of its training data.
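Both quantities can be estimated numerically. The sketch below, again assuming the hypothetical sin(2πx) toy problem with polynomial degree standing in for model complexity, refits each model on many training sets that share the same inputs but have freshly drawn noise. Squared bias is measured as the gap between the average prediction and the true function; variance as the average spread of individual predictions around that average prediction. The fixed-input design, the 20 training points, and the 500 repeats are illustrative assumptions.

```python
import numpy as np

def true_f(x):
    # Hypothetical "reality" that the models try to approximate.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_repeats=500, noise_std=0.2, seed=0):
    """Monte Carlo estimate of (squared bias, variance) for a polynomial fit.

    The training inputs are fixed; only the label noise is redrawn, so each
    repeat is a different training set drawn from the same process.
    """
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, 20)
    x_eval = np.linspace(0.0, 1.0, 101)
    preds = np.empty((n_repeats, x_eval.size))
    for r in range(n_repeats):
        y_train = true_f(x_train) + rng.normal(0.0, noise_std, x_train.size)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[r] = np.polyval(coefs, x_eval)
    mean_pred = preds.mean(axis=0)
    # Squared bias: how far the average model sits from reality.
    bias_sq = float(np.mean((mean_pred - true_f(x_eval)) ** 2))
    # Variance: how much individual models scatter around the average model.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

scores = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in scores.items():
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}")
```

Under these assumptions, moving from degree 1 to degree 9 should show squared bias falling while variance rises. Because every degree uses the same seed, all three models see identical noise draws, so the comparison reflects the models rather than luck.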

The way to mitigate bias is to increase the complexity of a model, although this increases variance and can lead to overfitting. To mitigate variance, on the other hand, we can help our model generalize by reducing its complexity, although this leads to higher bias. As you can see, we generally cannot have both low bias and low variance at the same time! A good model strikes a balance between its bias and its variance. Two common ways to combat overfitting are cross-validation and regularization. We will touch upon cross-validation methods now, and come back to regularization in Chapter 4, Your First Artificial Neural Network, when we begin to build our first ANNs.
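As a small illustration of how cross-validation guards against overfitting, the following is a minimal k-fold sketch in plain NumPy, not any particular library's API. It reuses the hypothetical sin(2πx) toy problem: each candidate polynomial degree is trained on k - 1 folds and scored on the held-out fold, and the degree with the lowest average validation error is kept. The choice of k = 5, the degree range, and the helper names `k_fold_indices` and `cv_mse` are assumptions made for this example.

```python
import numpy as np

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def cv_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(x), k, rng)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)
        resid = y[val_idx] - np.polyval(coefs, x[val_idx])
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Hypothetical toy data: noisy samples of sin(2*pi*x).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 40)

cv_scores = {d: cv_mse(x, y, d) for d in range(1, 11)}
best_degree = min(cv_scores, key=cv_scores.get)
print(f"best degree by 5-fold CV: {best_degree} (MSE={cv_scores[best_degree]:.4f})")
```

Using the same seed for every degree gives each candidate the same folds, so differences in score come from the model rather than from the split. The model is judged only on data it never trained on, which is what penalizes both the underfit and the overfit candidates.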
