
K-fold cross-validation

You've already seen a form of cross-validation before; holding out a portion of our data is the simplest form of cross-validation that we can have. While this is generally a good practice, it can sometimes leave important features out of the training set, which can lead to poor performance when it comes time to test. To remedy this, we can take standard cross-validation a step further with a technique called k-fold cross-validation.

In k-fold cross-validation, our dataset is evenly divided into k parts, where k is chosen by the user. As a rule of thumb, you should generally stick to k = 5 or k = 10 for best performance. The model is then trained and tested k times over. During each training episode, one of the k segments of the data is held out as a testing set and the other segments are used for training. You can think of this like shuffling a deck of cards - each time, we take one card out for testing and leave the rest for training. The total accuracy of the model and its error are then the combination of all of the train/test episodes that were conducted.
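The following is a minimal sketch of this procedure using scikit-learn's cross_val_score helper. The synthetic dataset and the choice of logistic regression here are illustrative assumptions, not part of the original text:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# A toy dataset standing in for real data (an assumption for demonstration)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

model = LogisticRegression(max_iter=1000)

# Split the data into k = 5 even folds; each fold is held out once
# as the test set while the remaining four folds are used for training
scores = cross_val_score(model, X, y, cv=5)

print("Per-fold accuracy:", scores)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

The final score reported is the mean of the k per-fold accuracies, which is the combined estimate described above.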

There are some models, such as Logistic Regression and Support Vector Machines, that benefit from k-fold cross-validation. Neural network models, such as the ones that we will be discussing in the coming chapter, also benefit from k-fold cross-validation methods. Random Forest models, like the ones we described previously, on the other hand, do not require k-fold cross-validation. K-fold is used as a tuning and optimization method for balancing feature importances, and Random Forests already contain a measure of feature importance.
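To illustrate that last point, here is a minimal sketch showing the feature-importance measure a fitted Random Forest exposes in scikit-learn, with no cross-validation involved. The synthetic dataset is again an assumption for demonstration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# The same kind of toy dataset as above (an assumption for demonstration)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

# feature_importances_ reports each feature's mean impurity-based
# importance across all of the trees in the forest
print(forest.feature_importances_)
```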
