- R Machine Learning Projects
- Dr. Sunil Kumar Chinnamgari
- 2021-07-02 14:23:08
Holdout sample
When working with a training dataset, a small portion of the data is set aside to test the performance of the models. Because this portion is unseen data (not used in training), the measurements obtained on it can be relied upon. These measurements can be used to tune the parameters of the model, or simply to report the model's performance and so set expectations about the level of performance the model can deliver.
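As a rough illustration of setting a holdout sample aside, here is a minimal sketch in plain Python using only the standard library. The function name `holdout_split`, its parameters, and the toy data are assumptions made for this example, not taken from the book.

```python
import random


def holdout_split(rows, holdout_fraction=0.3, seed=42):
    """Shuffle the rows and set aside a fraction of them as the holdout sample.

    Hypothetical helper for illustration; returns (train, holdout).
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")

    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)

    n_holdout = round(len(shuffled) * holdout_fraction)
    holdout = shuffled[:n_holdout]  # never shown to the model during training
    train = shuffled[n_holdout:]
    return train, holdout


# Toy data: 100 row identifiers standing in for real observations.
rows = list(range(100))
train, holdout = holdout_split(rows, holdout_fraction=0.3)
print(len(train), len(holdout))  # 70 30
```

The model would be fitted on `train` only, and its performance measured on `holdout`.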
Note that a performance measurement based on a holdout sample is not as robust an estimate as one obtained from k-fold cross-validation. This is because unknown biases could have crept in during the random split of the holdout set from the original dataset. There is also no guarantee that the holdout dataset represents all the classes present in the training dataset. If all classes must be represented in the holdout dataset, a technique called stratified holdout sampling needs to be applied: the split is made within each class, so every class appears in the holdout dataset. A performance measurement obtained from a stratified holdout sample is generally a better estimate than one obtained from a nonstratified holdout sample.
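One way a stratified holdout split could work is sketched below in plain Python with only the standard library. The function name `stratified_holdout_split`, its parameters, and the imbalanced toy labels are assumptions made for this example. The rule of keeping at least one row per class in the holdout set is also a choice of this sketch, not something the text prescribes.

```python
import random
from collections import Counter, defaultdict


def stratified_holdout_split(labels, holdout_fraction=0.2, seed=42):
    """Split row indices into (train, holdout), sampling within each class.

    Hypothetical helper for illustration. Each class contributes roughly
    holdout_fraction of its rows to the holdout set, and at least one row.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, holdout_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption of this sketch: keep at least one row per class in the
        # holdout set. A class with a single row would then be absent from
        # the training set.
        n_holdout = max(1, round(len(indices) * holdout_fraction))
        holdout_idx.extend(indices[:n_holdout])
        train_idx.extend(indices[n_holdout:])
    return train_idx, holdout_idx


# Imbalanced toy labels: 90 rows of class "a" and 10 rows of class "b".
labels = ["a"] * 90 + ["b"] * 10
train_idx, holdout_idx = stratified_holdout_split(labels, holdout_fraction=0.2)

holdout_counts = Counter(labels[i] for i in holdout_idx)
print(dict(holdout_counts))
```

With these toy labels, the holdout set receives 18 rows of class "a" and 2 rows of class "b", so the 90/10 class ratio of the full dataset is preserved. A purely random split offers no such guarantee for the rare class.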
Training-to-holdout splits of 70%-30%, 80%-20%, and 90%-10% are the ones generally seen in ML projects.