
K-fold cross-validation

You've already seen a form of cross-validation before; holding out a portion of our data is the simplest form of cross-validation that we can have. While this is generally a good practice, it can sometimes leave important features out of the training set, which can lead to poor performance when it comes time to test. To remedy this, we can take standard cross-validation a step further with a technique called k-fold cross-validation.
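As a quick refresher, a hold-out split looks like the following. This is a minimal sketch using scikit-learn's train_test_split; the synthetic dataset from make_classification is a stand-in for your own data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=1000, random_state=42)

# Reserve 20% of the rows as a test set; the rest is used for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)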

In k-fold cross-validation, our dataset is evenly divided into k equal parts, with k chosen by the user. As a rule of thumb, you should generally stick to k = 5 or k = 10 for best performance. The model is then trained and tested k times over. During each training episode, one of the k segments of the data is held out as a testing set and the remaining segments are used for training. You can think of this like shuffling a deck of cards - each time, we take one card out for testing and leave the rest for training. The total accuracy of the model and its error is then the combination of all of the train/test episodes that were conducted.
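Here is a minimal sketch of that procedure using scikit-learn's KFold; the Logistic Regression model and synthetic dataset are placeholders for whatever model and data you are working with:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=1000, random_state=42)

# Split the data into k = 5 equal parts
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_idx, test_idx in kf.split(X):
    # One segment is held out for testing; the other four train the model
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

# Combine the k test accuracies into a single estimate
print(sum(scores) / len(scores))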

There are some models, such as Logistic Regression and Support Vector Machines, that benefit from k-fold cross-validation. Neural network models, such as the ones that we will be discussing in the coming chapter, also benefit from k-fold cross-validation methods. Random Forest models like the ones we described previously, on the other hand, do not require k-fold cross-validation: k-fold is used as a tuning and optimization method for balancing feature importance, and Random Forests already contain a built-in measure of feature importance.
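In practice, scikit-learn's cross_val_score wraps the entire train/test loop from the previous sketch into a single call; again, the model and data here are placeholders:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=42)

# cross_val_score runs all k train/test episodes for us; cv=10 gives k = 10
scores = cross_val_score(LogisticRegression(), X, y, cv=10)
print(scores.mean(), scores.std())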
