Cross-validation

Cross-validation is a method for assessing the performance of a data science process. It is used mainly in predictive modeling to estimate how accurately a model is likely to perform in practice. In other words, it checks how well a model generalizes: how well it can apply what it infers from samples to an entire population (or recordset).

With cross-validation, you designate a dataset of known data as your training dataset, on which training is run, along with a dataset of unknown (or first-seen) data against which the model is tested (this is your testing dataset, also called the validation dataset). The objective is to keep problems such as overfitting (fitting noise or quirks of the training data that do not hold in the wider population) under control, and to provide insight into how the model will generalize to a real problem or a real data file.
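As a rough illustration of why the held-out dataset matters, the sketch below fits a deliberately overfit model and compares its error on the data it was trained on with its error on data it has never seen. Everything in it is an assumption made for the example: the toy data (y = 3x plus noise), the 80/20 split, and the 1-nearest-neighbor "model" that simply memorizes the training points.

```python
import random

rng = random.Random(0)

# Hypothetical toy data: y = 3x plus Gaussian noise.
data = [(i / 2, 3.0 * (i / 2) + rng.gauss(0, 2.0)) for i in range(60)]
rng.shuffle(data)

# Hold out 20% of the records as unseen testing data.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]


def predict_1nn(x, memory):
    """Return the y value of the memorized point whose x is closest."""
    return min(memory, key=lambda point: abs(point[0] - x))[1]


def mse(points, memory):
    """Mean squared error of the memorizing model on the given points."""
    return sum((y - predict_1nn(x, memory)) ** 2 for x, y in points) / len(points)


train_mse = mse(train, train)  # error on data the model has already seen
test_mse = mse(test, train)    # error on first-seen data

print(f"training MSE: {train_mse:.3f}")
print(f"testing MSE:  {test_mse:.3f}")
```

Because the model memorizes every training point, its training error is exactly zero, which says nothing about how it behaves on new records. Only the testing error gives a usable estimate of generalization.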

The cross-validation process consists of partitioning the data into subsets of similar size, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple iterations (also called folds or rounds) of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. Typically, a data scientist will use a model's stability to determine the number of rounds of cross-validation to perform.
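The partition, train, validate, and average steps described above can be sketched in a few lines. This is a minimal k-fold example using only the Python standard library, not a definitive implementation. The helper names (k_fold_indices, fit_line, cross_validate), the straight-line model, the choice of five folds, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def cross_validate(xs, ys, k=5, seed=0):
    """Return the per-round validation MSE and its average over k rounds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Train on every fold except the held-out one.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Validate on the held-out fold only.
        scores.append(mean((ys[i] - (a * xs[i] + b)) ** 2 for i in held_out))
    return scores, mean(scores)


# Hypothetical toy data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

fold_scores, avg_mse = cross_validate(xs, ys, k=5)
print("per-round MSE:", [round(s, 3) for s in fold_scores])
print("average MSE:", round(avg_mse, 3))
```

A wide spread among the per-round scores suggests the model is sensitive to which records it was trained on, which is one practical way to judge stability when deciding how many rounds to run.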
