
K-fold cross-validation

You've already seen a form of cross-validation before; holding out a portion of our data is the simplest form of cross-validation that we can have. While this is generally good practice, it can sometimes leave important features out of the training set, which can lead to poor performance when it comes time to test. To remedy this, we can take standard cross-validation a step further with a technique called k-fold cross-validation.

In k-fold cross-validation, our dataset is evenly divided into k equal parts, where k is chosen by the user. As a rule of thumb, you should generally stick to k = 5 or k = 10 for the best performance. The model is then trained and tested k times over. During each training episode, one of the k segments is held out as a testing set and the other segments are used for training. You can think of this like shuffling a deck of cards: each time, we take one card out for testing and leave the rest for training. The total accuracy of the model and its error are then the combination of all of the train/test episodes that were conducted.
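To make this concrete, here is a minimal sketch of the procedure in Python using scikit-learn's KFold splitter; the Iris dataset and the logistic regression model are placeholder assumptions for illustration, not choices made in the text:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# k = 5 folds; shuffle so each fold is a random sample of the data
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for train_idx, test_idx in kf.split(X):
    # One fold is held out for testing; the remaining folds form the training set
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

# The overall estimate combines all k train/test episodes
print("Mean accuracy: %.3f (+/- %.3f)" % (np.mean(scores), np.std(scores)))

Each pass through the loop corresponds to one train/test episode described above, and averaging the fold scores gives the combined accuracy estimate.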

There are some models, such as logistic regression and support vector machines, that benefit from k-fold cross-validation. Neural network models, such as the ones that we will be discussing in the coming chapter, also benefit from k-fold cross-validation methods. Random Forest models, like the ones we described previously, on the other hand, do not require k-fold cross-validation. K-fold is used as a tuning and optimization method for balancing feature importances, and Random Forests already contain a built-in measure of feature importance.
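In practice, the entire loop shown earlier can be collapsed into a single call with scikit-learn's cross_val_score helper; the support vector classifier and the choice of k = 10 here are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# cross_val_score runs the full k-fold loop internally (k = 10 here)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())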
