官术网_书友最值得收藏!

Train and test sets

To estimate the generalization error, we split our data into two parts: training data and testing data. A general rule of thumb is to split them by the training: testing ratio, that is, 70:30. We first train the predictor on the training data, then predict the values for the test data, and finally, compute the error, that is, the difference between the predicted and the true values. This gives us an estimate of the true generalization error.

The estimation is based on the two following assumptions: first, we assume that the test set is an unbiased sample from our dataset; and second, we assume that the actual new data will reassemble the distribution as our training and testing examples. The first assumption can be mitigated by cross-validation and stratification. Also, if it is scarce, one can't afford to leave out a considerable amount of data for a separate test set, as learning algorithms do not perform well if they don't receive enough data. In such cases, cross-validation is used instead.

主站蜘蛛池模板: 彩票| 陵川县| 安乡县| 隆尧县| 兰西县| 泊头市| 婺源县| 牡丹江市| 龙陵县| 慈利县| 虞城县| 静安区| 巴马| 阿瓦提县| 赤水市| 双城市| 凭祥市| 田林县| 铜山县| 贵港市| 偃师市| 达尔| 中西区| 辽阳市| 灵山县| 河北省| 巴塘县| 松溪县| 瓮安县| 香河县| 剑阁县| 郎溪县| 马尔康县| 马公市| 江安县| 昆明市| 甘孜县| 玉溪市| 夹江县| 大渡口区| 拉萨市|