
Decision tree

Decision trees were developed mainly by Breiman and Quinlan, and they have evolved considerably since the 1980s. If the dependent variable is continuous, the decision tree is a regression tree; if it is categorical, the tree is a classification tree. Survival trees also exist. As will be seen throughout the book, decision trees are the main model to benefit from ensemble techniques.
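As a minimal sketch of this distinction, the rpart package (introduced below) infers the type of tree from the response variable: a numeric response gives a regression tree (`method = "anova"`) and a factor response gives a classification tree (`method = "class"`). A `Surv` response from the survival package would give a survival tree (`method = "exp"`). The built-in `mtcars` and `iris` datasets are used here purely for illustration and are not part of this chapter's data.

```r
library(rpart)

# Numeric response (miles per gallon): rpart grows a regression tree
reg_fit <- rpart(mpg ~ ., data = mtcars)

# Factor response (flower species): rpart grows a classification tree
cls_fit <- rpart(Species ~ ., data = iris)

# The fitted object records which splitting method was chosen
reg_fit$method  # "anova"
cls_fit$method  # "class"
```

Passing `method` explicitly to `rpart()` overrides this automatic choice.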

Consider the regression tree given in the following diagram. There are three input variables, and the output variable is Y. Strictly speaking, a decision tree does not necessarily display all the variables used to build it. By convention, a decision tree is drawn upside down, with the root at the top. This tree has four terminal nodes. If the condition at the root node is satisfied, we move to the right side of the tree and conclude that the average Y value is 40. If it is not satisfied, we move to the left and check the second condition. If this condition is not satisfied, we move to the left and conclude that the average Y value is 100. If it is satisfied, we move to the right, where the split is on the categorical input variable: if its condition holds, the average Y value is 250, and otherwise it is 10. This decision tree can also be written as an equation, as follows:

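$$
\hat{Y} =
\begin{cases}
40, & \text{if } C_1,\\
100, & \text{if } \lnot C_1 \wedge \lnot C_2,\\
250, & \text{if } \lnot C_1 \wedge C_2 \wedge C_3,\\
10, & \text{if } \lnot C_1 \wedge C_2 \wedge \lnot C_3,
\end{cases}
$$

where $C_1$ is the condition at the root node, $C_2$ is the second split condition, and $C_3$ is the condition on the categorical variable.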

Figure 5: Regression tree
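The root-to-leaf traversal described above can be written as nested conditionals. This is a hypothetical sketch: because the specific split conditions are not given in the text, they are passed in as logical arguments `c1`, `c2` and `c3` (names chosen for illustration only), standing for the root-node condition, the second split and the categorical split.

```r
# Hypothetical encoding of the four-leaf regression tree in Figure 5.
# c1, c2, c3 are logical vectors: whether each observation satisfies the
# root-node condition, the second split, and the categorical condition.
predict_regression_tree <- function(c1, c2, c3) {
  ifelse(c1, 40,              # root condition met: right leaf, mean Y = 40
    ifelse(!c2, 100,          # second condition not met: left leaf, mean Y = 100
      ifelse(c3, 250, 10)))   # categorical split: 250 if met, 10 otherwise
}

# Four observations, one reaching each terminal node
predict_regression_tree(
  c1 = c(TRUE, FALSE, FALSE, FALSE),
  c2 = c(FALSE, FALSE, TRUE, TRUE),
  c3 = c(FALSE, FALSE, TRUE, FALSE)
)
# [1]  40 100 250  10
```

Using `ifelse()` keeps the function vectorised over observations, and the nesting order mirrors the order in which the tree checks its conditions.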

The statistician Terry Therneau developed the rpart R package.

Decision tree for hypothyroid classification

Using the rpart function from the rpart package, we build a classification tree with the same formula and the same partitioned training data as the earlier models. The fitted tree can be visualized with the plot function, and the text function adds the split variables and conditions to the plot. The fitted classification tree is shown in Figure 6: Classification tree for Hypothyroid.

Prediction and accuracy assessment are carried out in the same way as for the earlier models:

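> library(rpart)
> CT_fit <- rpart(HT2_Formula, data = HT2_Train)
> plot(CT_fit, uniform = TRUE)   # draw the tree with evenly spaced levels
> text(CT_fit)                   # label the splits and terminal nodes
> CT_predict <- predict(CT_fit, newdata = HT2_TestX, type = "class")
> CT_Accuracy <- sum(CT_predict == HT2_TestY)/nte   # proportion correct on the test set
> CT_Accuracy
[1] 0.9874213836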

Figure 6: Classification tree for Hypothyroid
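Since the hypothyroid data and the objects `HT2_Formula`, `HT2_Train`, `HT2_TestX`, `HT2_TestY` and `nte` come from earlier in the chapter, the following is a self-contained sketch of the same fit, predict and score steps on the built-in `iris` data. The 70/30 split, the seed and the object names are illustrative assumptions, and the resulting accuracy says nothing about the hypothyroid results. Taking `mean()` of a logical vector gives the same proportion as dividing the number of matches by the test-set size.

```r
library(rpart)

set.seed(123)  # arbitrary seed so the split is reproducible
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test_x <- iris[-train_idx, setdiff(names(iris), "Species")]
iris_test_y <- iris$Species[-train_idx]

# Fit a classification tree; method = "class" is stated explicitly here
ct_fit <- rpart(Species ~ ., data = iris_train, method = "class")

# Predicted class labels for the held-out observations
ct_predict <- predict(ct_fit, newdata = iris_test_x, type = "class")

# Accuracy: proportion of correct predictions, equivalent to
# sum(ct_predict == iris_test_y) / length(iris_test_y)
ct_accuracy <- mean(ct_predict == iris_test_y)
ct_accuracy

# Confusion matrix of predicted versus actual classes
table(Predicted = ct_predict, Actual = iris_test_y)
```

The confusion matrix shows which classes are confused with one another, which a single accuracy figure hides.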

The classification tree therefore achieves an accuracy of 98.74%, the best of the four models considered so far. Next, we consider the final model, support vector machines.
