Comparing underfitting and overfitting

In the preceding list, step 4 implies an iterative process in which we try models, parameters, and features until we get the best result we can. Let's now think about a classification problem where we want to separate squares from circles, as shown in the following diagram. At the beginning of the process, we will probably be in a situation similar to the first chart (on the left-hand side). The model fails to separate the two shapes, and both sides of the boundary contain a mixture of squares and circles. This is called underfitting: the model fails to capture the characteristics of the dataset.

As we continue tuning parameters and adjusting the model to the training dataset, we might end up in a situation similar to the third chart (on the right-hand side). The model splits the training dataset perfectly, leaving only one shape on each side of the boundary line. Although this seems correct, the model completely lacks generalization: it adjusts so closely to the training data that it will perform poorly when we test it against a different dataset. This problem is called overfitting.

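To solve the problem of overfitting, we need the model to generalize better, that is, to adapt to data it has not seen during training. The balance is delicate: a model that is too flexible bends to fit the noise in the training data and predicts poorly on new data, while a model that is too rigid falls back into underfitting. The usual way to control this is to use regularization techniques, which penalize overly complex models. Many such techniques are described in the specialized literature, but they are beyond the scope of this book.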
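
As one illustration of regularization, the following Python sketch uses L2 (ridge) regularization. This is a common technique chosen here as an assumption, since the text does not name one. It fits a deliberately over-flexible degree-9 polynomial to 15 noisy points and adds a penalty `lam * ||w||^2` to the squared error. The data, the degree, the `lam` values, and the helper names (`features`, `solve`, `fit_ridge`) are all hypothetical.

```python
import random

def features(x, degree):
    # Hypothetical helper: polynomial features [1, x, x^2, ..., x^degree].
    return [x ** d for d in range(degree + 1)]

def solve(a, b):
    # Solve the square system a @ w = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def fit_ridge(xs, ys, degree, lam):
    # Minimize sum((y - w . phi(x))^2) + lam * ||w||^2 by solving the
    # normal equations (X^T X + lam * I) w = X^T y.
    X = [features(x, degree) for x in xs]
    p = degree + 1
    a = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(a, b)

def predict(w, x):
    return sum(wi * f for wi, f in zip(w, features(x, len(w) - 1)))

def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
# Hypothetical data: noisy samples of y = x^2 on [-1, 1] for training,
# clean samples for testing.
train_x = [rng.uniform(-1, 1) for _ in range(15)]
train_y = [x * x + rng.gauss(0, 0.1) for x in train_x]
test_x = [rng.uniform(-1, 1) for _ in range(200)]
test_y = [x * x for x in test_x]

DEGREE = 9  # far more flexible than the quadratic that generated the data
results = []
for lam in (1e-6, 1e-2, 1.0):
    w = fit_ridge(train_x, train_y, DEGREE, lam)
    norm = sum(wi * wi for wi in w) ** 0.5
    results.append((lam, norm, mse(w, train_x, train_y), mse(w, test_x, test_y)))
    print(f"lam={lam:g}  ||w||={norm:10.3f}  "
          f"train_mse={results[-1][2]:.4f}  test_mse={results[-1][3]:.4f}")
```

As `lam` grows, the weight norm shrinks and the training error rises: the penalty stops the polynomial from bending to pass through every noisy point. Choosing `lam` is itself part of the balancing act described in this section.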

The center chart shows a balanced model: it represents the dataset well, but is general enough to deal with new, previously unseen data. Finding this balance is often time-consuming and difficult, but it is essential to building a good machine learning model.
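
The three charts can be mimicked numerically with a small experiment. The following Python sketch is illustrative only: the dataset, the `x + y = 1` boundary, the 20% label noise, and the use of k-nearest neighbors as the model are all assumptions, not the book's setup. A majority-class predictor ignores the input (underfitting), 1-nearest-neighbor memorizes every training point (overfitting), and 15-nearest-neighbors averages over a neighborhood (a more balanced fit). Test points are labeled by the clean rule, so test accuracy measures how well each model recovered the true boundary.

```python
import random
from collections import Counter

def true_label(x, y):
    # Hypothetical ground truth: points above the line x + y = 1 are squares.
    return "square" if x + y > 1 else "circle"

def make_dataset(n, noise, rng):
    # Sample n points in the unit square and flip each label with probability `noise`.
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = true_label(x, y)
        if rng.random() < noise:
            label = "circle" if label == "square" else "square"
        data.append(((x, y), label))
    return data

def knn_predict(train, point, k):
    # Vote among the k training points closest to `point` (squared Euclidean distance).
    def dist(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    nearest = sorted(train, key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(predict, data):
    return sum(predict(p) == label for p, label in data) / len(data)

rng = random.Random(42)
train = make_dataset(300, noise=0.2, rng=rng)
test = make_dataset(1000, noise=0.0, rng=rng)  # clean labels: scores recovery of the true boundary
majority = Counter(label for _, label in train).most_common(1)[0][0]

models = {
    "underfit (majority class)": lambda p: majority,           # ignores the input entirely
    "overfit (1-NN)": lambda p: knn_predict(train, p, 1),      # memorizes the training set
    "balanced (15-NN)": lambda p: knn_predict(train, p, 15),   # smooths over neighbors
}
results = {name: (accuracy(m, train), accuracy(m, test)) for name, m in models.items()}
for name, (train_acc, test_acc) in results.items():
    print(f"{name:26s} train={train_acc:.2f} test={test_acc:.2f}")
```

Typically the 1-NN model scores a perfect training accuracy but a noticeably lower test accuracy, while the 15-NN model gives up some training accuracy in exchange for better test accuracy, which is the trade-off the center chart represents.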

主站蜘蛛池模板: 鲁甸县| 北票市| 榆林市| 霍城县| 惠州市| 阿拉善右旗| 巴彦淖尔市| 洛阳市| 长武县| 汤阴县| 大姚县| 凌源市| 遂昌县| 邵阳市| 石渠县| 铜梁县| 南澳县| 城口县| 民权县| 泽库县| 平罗县| 乌鲁木齐市| 平顺县| 郧西县| 肇东市| 葫芦岛市| 明溪县| 中西区| 邹城市| 富蕴县| 安庆市| 宁夏| 台江县| 岐山县| 龙岩市| 昌吉市| 潞城市| 桐乡市| 宜兰县| 河源市| 祁门县|