- Statistics for Data Science
- James D. Miller
Cross-validation
Cross-validation is a method for assessing the performance of a data science process. It is used mainly with predictive modeling to estimate how accurately a model is likely to perform in practice; in other words, cross-validation checks how well a model generalizes, that is, how well it can apply what it infers from samples to an entire population (or recordset).
With cross-validation, you set aside a known dataset on which training is run (the training dataset) and a dataset of unseen data against which the model is then tested (the testing dataset). The objective is to keep problems such as overfitting (allowing the model to fit the quirks of the training sample rather than the underlying pattern) under control, and to gain insight into how the model will generalize to a real problem or a real data file.
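As a minimal sketch of this hold-out idea (not from the book), the split can be expressed with scikit-learn, assuming a feature matrix X and a label vector y have already been loaded:

```python
# Hold-out split sketch, assuming scikit-learn is installed and that
# X (features) and y (labels) have already been loaded.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Reserve 25% of the records as the unseen testing dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)             # train only on the known data
print(model.score(X_test, y_test))      # evaluate on data the model has never seen
```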
The cross-validation process consists of separating the data into similar subsets, performing the analysis on one subset (called the training set) and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple iterations (also called folds or rounds) of cross-validation are performed using different partitions, and the validation results are averaged over the rounds, as sketched below. Typically, a data scientist will use the model's stability to decide how many rounds of cross-validation should actually be performed.
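A short sketch of the k-fold procedure just described, again assuming scikit-learn and an already-loaded X and y (the specific library calls and the choice of five folds are illustrative, not from the book):

```python
# K-fold cross-validation sketch: split the data into k similar subsets,
# train on k-1 of them, validate on the remaining fold, and average the
# per-fold scores over the rounds. Assumes X and y are already loaded.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)

print("per-fold accuracy:", scores)
print("averaged over the rounds:", np.mean(scores))
```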