
Summary

In this chapter, you learned about both the power and the limitations of tree-based learning methods for classification problems. Single trees, while easy to build and interpret, may not have the necessary predictive power for many of the problems that we're trying to solve. To improve on the predictive ability, we have the tools of random forest and gradient-boosted trees at our disposal. With random forest, hundreds or even thousands of trees are built and the results aggregated for an overall prediction. Each tree of the forest is built on a bootstrap sample of the data, along with a random sample of the predictor variables. As for gradient boosting, an initial, relatively small, tree is produced. After this initial tree is built, subsequent trees are fit to the residuals/misclassifications of the prior trees. The intended result of such a technique is to build a series of trees, each improving on the weaknesses of its predecessor, resulting in decreased bias and variance. We also saw that, in R, we can utilize random forests as an effective feature selection/reduction method, as sketched below.
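As a recap of the workflow, the following is a minimal sketch of both ensemble methods, assuming the randomForest and xgboost packages are installed and using the built-in iris data purely for illustration:

    library(randomForest)
    library(xgboost)

    set.seed(123)

    # Random forest: 500 trees, each grown on a bootstrap sample of the rows
    # and a random subset of the predictors at every split
    rf_fit <- randomForest(Species ~ ., data = iris, ntree = 500)
    print(rf_fit)          # out-of-bag confusion matrix and error rate
    importance(rf_fit)     # mean decrease in Gini -- a feature selection aid
    varImpPlot(rf_fit)     # visual ranking of the predictors

    # Gradient boosting: small trees fit sequentially, each correcting the
    # errors (residuals/misclassifications) of the ensemble built so far
    X <- model.matrix(Species ~ . - 1, data = iris)
    y <- as.numeric(iris$Species) - 1   # xgboost expects 0-based class labels
    gb_fit <- xgboost(data = X, label = y, nrounds = 100,
                      objective = "multi:softmax", num_class = 3,
                      max_depth = 3, eta = 0.1, verbose = 0)

The importance() output is what makes random forests serve double duty as a feature reduction tool: predictors with a negligible mean decrease in Gini are candidates for removal before fitting other models.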

While these methods are extremely powerful, they aren't a nostrum in the world of machine learning. Different datasets require judgment on the part of the analyst as to which techniques are applicable, and the selection of the tuning parameters is equally important. This fine-tuning can make all the difference between a good predictive model and a great one.
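To give one concrete instance of such tuning, here is a hedged sketch using the randomForest package's tuneRF() helper, again on iris for illustration, to search for the mtry value (the number of predictors sampled at each split) with the lowest out-of-bag error:

    library(randomForest)

    set.seed(123)
    # Try mtry values starting from the default, scaling by 1.5 each step,
    # keeping a step only if it improves OOB error by at least 1 percent
    tuned <- tuneRF(x = iris[, -5], y = iris$Species,
                    ntreeTry = 500, stepFactor = 1.5,
                    improve = 0.01, trace = TRUE)
    print(tuned)   # matrix of mtry values and their OOB error estimates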

In the next chapter, we'll turn our attention to using R to build neural networks and deep learning models.
