- Machine Learning Quick Reference
- Rahul Kumar
- 2021-08-20 10:05:06
Size of the training, development, and test sets
Typically, machine learning practitioners split the data into the three sets in a ratio of 60:20:20 or 70:15:15. However, there is no hard and fast rule that the development and test sets must be of equal size. The following diagram shows the different sizes of the training, development, and test sets:

Another example of the three different sets is as follows:

But what about scenarios where we have big data to deal with? For example, if we have 10,000,000 records or observations, how would we partition the data? In such a scenario, ML practitioners take most of the data for the training set, as much as 98-99%, and divide the rest between the development and test sets. This is done so that the practitioner can take different kinds of scenarios into account. So, even if we keep 1% of the data for the development set and the same for the test set, we will end up with 100,000 records each, and that is a good number.
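The arithmetic behind these ratios can be sketched in a few lines of Python. This is a minimal illustration, not a method prescribed by the text: the helper names `split_sizes` and `split_indices` are hypothetical, the percentages are passed as whole numbers to avoid floating-point rounding, and the test set is assumed to take whatever remains after the training and development sets.

```python
import random


def split_sizes(n_records, train_pct, dev_pct):
    """Return (train, dev, test) sizes for whole-number percentages.

    Hypothetical helper: the test set receives the remainder, so the
    three sizes always add up to n_records.
    """
    if train_pct < 0 or dev_pct < 0 or train_pct + dev_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    n_train = n_records * train_pct // 100
    n_dev = n_records * dev_pct // 100
    n_test = n_records - n_train - n_dev
    return n_train, n_dev, n_test


def split_indices(n_records, train_pct, dev_pct, seed=0):
    """Shuffle record indices and cut them into three disjoint lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_train, n_dev, _ = split_sizes(n_records, train_pct, dev_pct)
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]
    return train, dev, test


# Conventional split on a small dataset
print(split_sizes(1_000, 60, 20))        # (600, 200, 200)

# Big-data split: 98% training, 1% development, 1% test
print(split_sizes(10_000_000, 98, 1))    # (9800000, 100000, 100000)
```

Calling `split_indices(1_000, 70, 15)` returns three index lists of lengths 700, 150, and 150 that together cover every record exactly once. Shuffling before cutting is a common choice when the records have no time ordering; for ordered or grouped data a different partitioning scheme may be more appropriate.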