Random forest

To greatly improve our model's predictive ability, we can produce numerous trees and combine the results. The random forest technique does this by applying two different tricks in model development. The first is bootstrap aggregation, better known as bagging.

In bagging, an individual tree is built on a bootstrap sample of the dataset, that is, a random sample drawn with replacement that contains roughly two-thirds of the distinct observations (the remaining one-third is referred to as out-of-bag (oob)). This is repeated dozens or hundreds of times and the results are averaged (or, for classification, combined by majority vote). Each of these trees is grown fully and is not pruned based on any error measure, which means that the variance of each individual tree is high. However, by averaging the results, you can reduce the variance without increasing the bias.
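To make the two-thirds figure concrete, here is a minimal sketch in Python (standard library only) that draws one bootstrap sample and measures the out-of-bag share. The function name `bootstrap_sample`, the seed, and the row count are illustrative assumptions, not part of any package discussed in this chapter.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement.

    Returns the in-bag indices (duplicates allowed) and the sorted
    out-of-bag indices that were never drawn.
    """
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(42)   # hypothetical seed, only for repeatability
n_rows = 10_000           # hypothetical dataset size
in_bag, out_of_bag = bootstrap_sample(n_rows, rng)

oob_fraction = len(out_of_bag) / n_rows
# In expectation the out-of-bag share approaches 1/e (a little over one-third).
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

Because the out-of-bag rows were never seen by the tree grown on that sample, they can serve as a built-in holdout set for estimating that tree's error.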

The second thing that random forest brings to the table is that, alongside the random sample of the data (that is, bagging), it also takes a random sample of the input features at each split. In the randomForest package, we'll use the default number of predictors sampled at each split, which is the square root of the total number of predictors for classification problems and the total number of predictors divided by three for regression. The number of predictors the algorithm randomly chooses at each split (the mtry argument) can be changed via the model tuning process.
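The default rule described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the randomForest implementation: the helper names `default_mtry` and `sample_split_candidates` are hypothetical, and rounding down with a floor of one predictor is an assumption about how fractional values are handled.

```python
import math
import random


def default_mtry(n_predictors, task):
    """Number of predictors to try at each split under the default rule."""
    if task == "classification":
        # Square root of the predictor count, rounded down (assumed).
        return max(math.floor(math.sqrt(n_predictors)), 1)
    if task == "regression":
        # One-third of the predictor count, rounded down (assumed).
        return max(n_predictors // 3, 1)
    raise ValueError(f"unknown task: {task!r}")


def sample_split_candidates(predictors, mtry, rng):
    """Pick mtry distinct predictors to consider at a single split."""
    return rng.sample(predictors, mtry)


predictors = [f"x{i}" for i in range(1, 10)]   # nine hypothetical predictors
rng = random.Random(1)

mtry = default_mtry(len(predictors), "classification")
candidates = sample_split_candidates(predictors, mtry, rng)
print(mtry, candidates)
```

A fresh set of candidates is drawn at every split, so a predictor left out at one node can still be chosen further down the same tree.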

By randomly sampling the features at each split, you mitigate the effect of a single dominant predictor becoming the main driver in all of your bootstrapped trees. Left unchecked, that would make the trees highly correlated with one another and prevent you from achieving the variance reduction you hoped for with bagging. Averaging trees that are less correlated with each other yields a model that is more generalizable and more robust to outliers than bagging alone.
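Why correlation between trees matters can be seen from a standard result: the average of B identically distributed predictions, each with variance sigma^2 and pairwise correlation rho, has variance rho * sigma^2 + (1 - rho) * sigma^2 / B. The sketch below simply evaluates that formula. The function name and the example values of rho are illustrative assumptions, not measurements from any model.

```python
def variance_of_average(sigma2, rho, n_trees):
    """Variance of the mean of n_trees predictions.

    Assumes every prediction has variance sigma2 and every pair of
    predictions has the same correlation rho.
    """
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


sigma2 = 1.0     # variance of a single tree's prediction (hypothetical)
n_trees = 500

# Hypothetical correlations: higher for plain bagging, lower once the
# features are also sampled at each split.
for label, rho in [("bagging only", 0.9), ("with feature sampling", 0.3)]:
    var = variance_of_average(sigma2, rho, n_trees)
    print(f"{label}: rho={rho}, variance of the average={var:.4f}")
```

The first term, rho * sigma^2, does not shrink however many trees are added, which is why lowering the correlation between trees is what lets averaging keep paying off.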
