
Random forest

To substantially improve our model's predictive ability, we can grow numerous trees and combine their results. The random forest technique does this by applying two different tricks in model development. The first is bootstrap aggregation, commonly called bagging.

In bagging, each individual tree is built on a bootstrap sample of the dataset, drawn with replacement, which contains roughly two-thirds of the distinct observations (the remaining one-third is referred to as out-of-bag (oob)). This is repeated dozens or hundreds of times and the results are averaged. Each of these trees is grown without pruning based on any error measure, which means that the variance of each individual tree is high. However, by averaging the results, you can reduce the variance without increasing the bias.
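The two-thirds figure can be checked with a short simulation. What follows is a minimal Python sketch, not the randomForest package's implementation: the helper names `bootstrap_split` and `bagged_prediction`, the seed, and the sample size are all illustrative assumptions. For a large dataset, the expected share of distinct observations in a bootstrap sample is 1 - 1/e, or about 0.632, which is where "roughly two-thirds" comes from.

```python
import random


def bootstrap_split(n_obs, rng):
    """Draw one bootstrap sample of row indices; return (in_bag, out_of_bag)."""
    # Sample n_obs indices WITH replacement, so duplicates are expected.
    in_bag = [rng.randrange(n_obs) for _ in range(n_obs)]
    # Rows that were never drawn form the out-of-bag set for this tree.
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_obs) if i not in drawn]
    return in_bag, out_of_bag


def bagged_prediction(tree_predictions):
    """Average the predictions of the individual trees (regression case)."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(42)   # arbitrary seed, chosen only for repeatability
n_obs = 10_000            # hypothetical dataset size
in_bag, out_of_bag = bootstrap_split(n_obs, rng)
oob_fraction = len(out_of_bag) / n_obs

print(f"distinct in-bag fraction: {1 - oob_fraction:.3f}")
print(f"out-of-bag fraction:      {oob_fraction:.3f}")
```

In a full bagging run, one tree would be fitted per bootstrap sample and `bagged_prediction` would combine their outputs; the out-of-bag rows of each tree can then serve as a built-in validation set for that tree.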

The next thing that random forest brings to the table is that, concurrently with the random sample of the data (that is, bagging), it also takes a random sample of the input features at each split. In the randomForest package, we'll use the default number of predictors sampled, which is the square root of the total number of predictors for classification problems and the total number of predictors divided by three for regression. The number of predictors the algorithm randomly chooses at each split can be changed via the model tuning process.
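The default rule described above can be sketched in a few lines of Python. This is an illustration and not the randomForest package's code: the function names `default_mtry` and `candidate_features` are hypothetical, and rounding down (with a floor of one predictor for regression) is an assumption based on the package's documented default for its `mtry` argument.

```python
import math
import random


def default_mtry(n_predictors, task):
    """Number of predictors tried at each split under the default rule."""
    if task == "classification":
        # Square root of the number of predictors, rounded down.
        return math.isqrt(n_predictors)
    if task == "regression":
        # One third of the predictors, rounded down, but never fewer than one.
        return max(n_predictors // 3, 1)
    raise ValueError(f"unknown task: {task!r}")


def candidate_features(feature_names, mtry, rng):
    """Pick the random subset of predictors considered at a single split."""
    # Sampling is without replacement within one split; a fresh subset
    # is drawn again for every split of every tree.
    return rng.sample(feature_names, mtry)


features = [f"x{i}" for i in range(1, 11)]   # ten hypothetical predictors
rng = random.Random(7)                       # arbitrary seed

m_class = default_mtry(len(features), "classification")
m_reg = default_mtry(len(features), "regression")
print("classification mtry:", m_class)
print("regression mtry:    ", m_reg)
print("one split considers:", candidate_features(features, m_class, rng))
```

With ten predictors, both rules happen to give three candidates per split; the two defaults diverge as the number of predictors grows, with the regression rule considering more of them.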

By taking this random sample of the features at each split, you mitigate the effect of a predictor that is highly correlated with the response becoming the main driver in all of your bootstrapped trees. Such a predictor would otherwise keep the trees correlated with one another and prevent you from achieving the variance reduction you hoped for with bagging. The subsequent average of trees that are less correlated with each other is more generalizable and more robust to outliers than the result of bagging alone.
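The benefit of decorrelating the trees can be made concrete with a standard result: if B trees each have variance sigma^2 and every pair has correlation rho, the variance of their average is rho * sigma^2 + (1 - rho) * sigma^2 / B. The Python sketch below simply evaluates that formula; the function name `variance_of_average` and the example values are illustrative assumptions, and real trees will only approximately satisfy the equal-variance, equal-correlation setup.

```python
def variance_of_average(sigma2, rho, n_trees):
    """Variance of the mean of n_trees identically distributed predictions
    with common variance sigma2 and pairwise correlation rho."""
    # The first term does not shrink as trees are added; only the second does.
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


sigma2 = 1.0     # hypothetical variance of a single unpruned tree
n_trees = 500    # hypothetical forest size

for rho in (0.9, 0.5, 0.1, 0.0):
    var = variance_of_average(sigma2, rho, n_trees)
    print(f"rho = {rho:.1f} -> variance of the averaged prediction = {var:.4f}")
```

Adding more trees drives only the second term toward zero, so the pairwise correlation sets a floor on the variance of the ensemble. Sampling features at each split is aimed at lowering that floor.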
