Cross-validation

Cross-validation is a method for assessing the performance of a data science process. It is mainly used in predictive modeling to estimate how accurately a model might perform in practice. In particular, cross-validation is used to check how well a model is likely to generalize, in other words, how well the model can apply what it infers from samples to an entire population (or recordset).

With cross-validation, you identify a (known) dataset as your training dataset, on which training is run, along with a dataset of unknown data (or first-seen data) against which the model will be tested (this is known as your testing dataset). The objective is to ensure that problems such as overfitting (fitting noise or quirks of the training data that do not carry over to new data) are controlled, and to provide insight into how the model will generalize to a real problem or a real data file.
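A minimal sketch of setting aside a testing dataset, using only the Python standard library. The function name `holdout_split`, the 25% test fraction, and the integer records are illustrative assumptions, not something prescribed by the text.

```python
import random


def holdout_split(records, test_fraction=0.25, seed=0):
    """Shuffle records and split them into (training, testing) lists.

    Hypothetical helper for illustration; the fraction and seed are assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled records are held back; the rest are used to train.
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(20))                  # stand-in for real observations
train, test = holdout_split(records, test_fraction=0.25, seed=42)
print(len(train), "training records,", len(test), "testing records")
```

The model is fitted only on `train`; `test` stays unseen until evaluation, so the score measured on it reflects how the model handles new data rather than how well it memorized the training data.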

The cross-validation process consists of partitioning the data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple iterations (also called folds or rounds) of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. Typically, a data scientist will use a model's stability to determine the actual number of rounds of cross-validation that should be performed.
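The rounds described above can be sketched as a k-fold procedure. This is an illustrative example under stated assumptions: the "model" is a trivial predictor that always returns the mean of the training targets, the score is mean squared error, and the names `k_fold_indices` and `cross_validate` are hypothetical. A real project would substitute its own model and metric.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Partition the indices 0..n_samples-1 into k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index keeps fold sizes within one of each other.
    return [indices[i::k] for i in range(k)]


def cross_validate(targets, k=5, seed=0):
    """Return (per-fold MSE list, average MSE) for a mean-only predictor."""
    folds = k_fold_indices(len(targets), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Train on every sample that is NOT in the held-out fold.
        train = [targets[i] for i in range(len(targets)) if i not in held]
        prediction = mean(train)  # "fit" the toy model (an assumption)
        # Validate on the held-out fold only.
        scores.append(mean((targets[i] - prediction) ** 2 for i in held_out))
    return scores, mean(scores)


targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
fold_scores, average_score = cross_validate(targets, k=5, seed=1)
print("per-fold MSE:", fold_scores)
print("average MSE:", average_score)
```

Each sample is used for validation exactly once and for training in the remaining rounds. If the per-fold scores vary widely, that is a sign the estimate is unstable and that more rounds (or repeated runs with different seeds) may be worthwhile.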