- Hands-On Ensemble Learning with R
- Prabhanjan Narayanachar Tattar
- 2021-07-23 19:10:53
Decision tree
Breiman and Quinlan were the main developers of decision trees, which have evolved considerably since the 1980s. If the dependent variable is continuous, the decision tree is a regression tree; if it is categorical, the tree is a classification tree. A survival tree is also possible when the response is a survival time. Decision trees are the main model that benefits from the ensemble technique, as will be seen throughout the book.
Consider the regression tree given in the following diagram. There are three input variables, and the output variable is Y. Strictly speaking, a decision tree will not display all the variables used to build the tree. By convention, a decision tree is displayed upside down, with the root node at the top. This tree has four terminal nodes. If the condition at the root node is satisfied, we move to the right side of the tree and conclude that the average Y value is 40. If the condition is not satisfied, we move to the left and check a second condition. If this second condition is not satisfied, we move to the left side of the tree and conclude that the average Y value is 100. If it is satisfied, we move to the right side, where the average Y value is 250 if the condition on the categorical variable holds, and 10 otherwise. This decision tree can also be captured in the form of an equation.
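The split conditions themselves are not reproduced in the text, so the equation can only be sketched with placeholder labels. Assume that C_1 denotes the root condition, C_2 the second condition, and C_3 the condition on the categorical variable; these labels are introduced here for illustration and are not the book's notation. Writing I(.) for the indicator function and a bar for the negation of a condition, the tree described above would correspond to:

```latex
\hat{Y} = 40\, I(C_1)
        + 100\, I(\bar{C}_1)\, I(\bar{C}_2)
        + 250\, I(\bar{C}_1)\, I(C_2)\, I(C_3)
        + 10\, I(\bar{C}_1)\, I(C_2)\, I(\bar{C}_3)
```

For any observation exactly one of the four indicator products equals 1, so the sum picks out the average Y value of the terminal node into which the observation falls.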



Figure 5: Regression tree
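The traversal of this regression tree can also be written as a small R function. This is only a minimal sketch: the function name predict_toy_tree and the logical arguments c1, c2, and c3 are hypothetical stand-ins for the three split conditions, which are given only in the diagram.

```r
# Hypothetical helper (not from the book): c1, c2, c3 are logical vectors
# saying whether each observation satisfies the root condition, the second
# condition, and the condition on the categorical variable, respectively.
predict_toy_tree <- function(c1, c2, c3) {
  ifelse(c1, 40,               # root condition met: right branch, average Y = 40
    ifelse(!c2, 100,           # second condition not met: left branch, average Y = 100
      ifelse(c3, 250, 10)))    # categorical condition decides between 250 and 10
}

# One observation per terminal node; the result is 40, 100, 250, 10
toy_pred <- predict_toy_tree(c1 = c(TRUE,  FALSE, FALSE, FALSE),
                             c2 = c(FALSE, FALSE, TRUE,  TRUE),
                             c3 = c(FALSE, FALSE, TRUE,  FALSE))
toy_pred
```

Each nested ifelse call mirrors one internal node of the tree, so the four possible return values are the four terminal-node averages.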
The statistician Terry Therneau developed the rpart R package.
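To illustrate the distinction between regression and classification trees, the following sketch fits one of each with rpart. The mtcars and iris datasets and the chosen predictors are assumptions made for this illustration and are not the data used in the book. When no method argument is supplied, rpart chooses the tree type from the class of the response.

```r
library(rpart)  # rpart is distributed with R as a recommended package

# Continuous response (mpg): rpart grows a regression tree
reg_fit <- rpart(mpg ~ wt + hp + disp, data = mtcars)

# Factor response (Species): rpart grows a classification tree
class_fit <- rpart(Species ~ ., data = iris)

# The method that rpart selected is stored in the fitted object
reg_fit$method    # "anova"
class_fit$method  # "class"
```

Passing method = "anova" or method = "class" explicitly makes the intent clearer and avoids surprises when a categorical response happens to be stored as a number.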
Decision tree for hypothyroid classification
Using the rpart function from the rpart package, we build a classification tree with the same formula and the same partitioned data as earlier. The constructed tree can be visualized using the plot function, and the variable names are added to the tree with the text function. The fitted classification tree (see Figure 6, Classification tree for Hypothyroid) can also be written as an equation.

Prediction and the accuracy calculation are carried out in the same way as before:
> CT_fit <- rpart(HT2_Formula,data=HT2_Train)
> plot(CT_fit,uniform=TRUE)
> text(CT_fit)
> CT_predict <- predict(CT_fit,newdata=HT2_TestX,type="class")
> CT_Accuracy <- sum(CT_predict==HT2_TestY)/nte
> CT_Accuracy
[1] 0.9874213836

Figure 6: Classification tree for Hypothyroid
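The fit, predict, and accuracy steps can be tried on a self-contained example. This sketch does not use the hypothyroid data: it assumes the kyphosis dataset that ships with rpart, an arbitrary seed, and a roughly 70/30 train/test split, all chosen only for illustration.

```r
library(rpart)

set.seed(123)                                  # arbitrary seed for a reproducible split
n <- nrow(kyphosis)
test_idx <- sample(n, size = round(0.3 * n))   # hold out roughly 30% for testing
train <- kyphosis[-test_idx, ]
test  <- kyphosis[test_idx, ]

# Fit a classification tree on the training part
fit <- rpart(Kyphosis ~ Age + Number + Start, data = train)

# type = "class" returns predicted class labels instead of class probabilities
pred <- predict(fit, newdata = test, type = "class")

# Accuracy: share of test observations whose predicted class matches the observed class
accuracy <- sum(pred == test$Kyphosis) / nrow(test)

# Confusion matrix: rows are predicted classes, columns are observed classes
conf_mat <- table(Predicted = pred, Observed = test$Kyphosis)
conf_mat
accuracy
```

The confusion matrix is worth inspecting alongside the accuracy, because a high accuracy on imbalanced classes can hide poor performance on the rarer class.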
On the hypothyroid test data, the classification tree gives an accuracy of 98.74%, which is the best of the four models considered thus far. Next, we will consider the final model, support vector machines.