官术网_书友最值得收藏!

Size of the training, development, and test set

Typically, machine learning practitioners choose the size of the three sets in the ratio of 60:20:20 or 70:15:15. However, there is no hard and fast rule that states that the development and test sets should be of equal size. The following diagram shows the different sizes of the training, development, and test sets:

Another example of the three different sets is as follows:

But what about the scenarios where we have big data to deal with? For example, if we have 10,000,000 records or observations, how would we partition the data? In such a scenario, ML practitioners take most of the data for the training set—as much as 98-99%—and the rest gets divided up for the development and test sets. This is done so that the practitioner can take different kinds of scenarios into account. So, even if we have 1% of data for development and the same for the test test, we will end up with 100,000 records each, and that is a good number.

主站蜘蛛池模板: 宁远县| 通化市| 鄂伦春自治旗| 广德县| 定襄县| 菏泽市| 邢台市| 清水县| 龙陵县| 辛集市| 阿拉善左旗| 监利县| 双流县| 遂溪县| 荔浦县| 天祝| 永新县| 静乐县| 喀什市| 揭东县| 秦皇岛市| 名山县| 福海县| 大冶市| 阜宁县| 密云县| 达尔| 溧水县| 越西县| 明星| 新绛县| 丹东市| 上林县| 隆尧县| 芦溪县| 高州市| 新晃| 开封县| 临清市| 区。| 大余县|