- Statistics for Data Science
- James D. Miller
Cross-validation
Cross-validation is a method for assessing the performance of a data science process. It is used mainly with predictive modeling to estimate how accurately a model might perform in practice; in other words, to check how well the model will generalize, that is, how well it can apply what it infers from samples to an entire population (or recordset).
With cross-validation, you identify a (known) dataset on which the model is trained (the training dataset), along with a dataset of unknown (previously unseen) data against which the model will be tested (known as your testing dataset). The objective is to ensure that problems such as overfitting (allowing noise or peculiarities of the training data to unduly influence the results) are controlled, and also to provide insight into how the model will generalize to a real problem or a real data file.
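The split described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation; the function name `train_test_split` and the 80/20 fraction are assumptions chosen for the example.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records and hold out a fraction as unseen testing data.

    Returns (training set, testing set). The seed makes the split
    reproducible for this illustration.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

The model only ever sees `train` during fitting; `test` stands in for the "first seen" data used to judge generalization.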
The cross-validation process consists of separating the data into subsets of similar size, performing the analysis on one subset (called the training set) and validating the analysis on another subset (called the validation set or testing set). To reduce variability, multiple iterations (also called folds or rounds) of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. Typically, a data scientist will use the model's stability to determine the actual number of rounds of cross-validation that should be performed.
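The multi-round procedure above is commonly implemented as k-fold cross-validation. Below is a small stdlib-only sketch, assuming a toy "model" that simply predicts the training mean and is scored by mean squared error; the function name and that toy model are illustrative assumptions, not from the source.

```python
import random

def k_fold_cross_validate(records, k=5, seed=0):
    """Partition records into k roughly equal folds; each fold serves once
    as the validation set while the remaining folds form the training set.

    Returns the per-fold scores and their average over the rounds.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint partitions
    scores = []
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        # Toy "model" (an assumption for this sketch): predict the
        # training mean, then score the fold by mean squared error.
        prediction = sum(training) / len(training)
        mse = sum((x - prediction) ** 2 for x in validation) / len(validation)
        scores.append(mse)
    return scores, sum(scores) / k

scores, avg_score = k_fold_cross_validate(list(range(1, 101)), k=5)
print(len(scores))  # 5
```

Averaging over the k rounds is what reduces the variability a single train/test split would leave: every record is used for validation exactly once and for training k-1 times.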