- Advanced Machine Learning with R
- Cory Lesmeister, Dr. Sunil Kumar Chinnamgari
- 2021-06-24 14:24:43
Classification trees
Classification trees operate under the same principle as regression trees, except that the splits are determined by an error rate rather than by the RSS. The error rate used isn't the one you might expect, namely the number of misclassified observations divided by the total number of observations. When it comes to tree-splitting, the misclassification rate by itself can lead to a situation where a further split gains information without improving the misclassification rate. Let's look at an example.
Suppose we have a node, let's call it N0, with seven observations labeled No and three observations labeled Yes. If we predict the majority class, No, the misclassification rate is 30%. With this in mind, let's calculate a common alternative error measure called the Gini index. For a single node with two classes, where $p_1$ and $p_2$ are the proportions of observations in each class (here, No and Yes), the Gini index is as follows:
$$Gini = 1 - p_1^2 - p_2^2$$
Then, for N0, the Gini is $1 - (0.7)^2 - (0.3)^2$, which is equal to 0.42, versus the misclassification rate of 30%.
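The single-node calculation can be sketched in a few lines of R. The helper name gini_node() is hypothetical and chosen only for illustration; it isn't part of rpart or any other package, and it assumes the node is described by a vector of class counts:

```r
# Hypothetical helper (not from any package): Gini index of a single node,
# given the number of observations in each class.
gini_node <- function(counts) {
  p <- counts / sum(counts)  # class proportions within the node
  1 - sum(p^2)
}

# Node N0 from the text: seven "No" and three "Yes" observations
n0 <- c(No = 7, Yes = 3)

gini_n0 <- gini_node(n0)              # Gini index of N0
misclass_n0 <- 1 - max(n0) / sum(n0)  # error rate when predicting the majority class

print(gini_n0)
print(misclass_n0)
```

Because the helper sums over all class proportions, the same sketch would also work for a node with more than two classes.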
Taking this example further, we'll now split N0 into node N1, with three observations from Class 1 (No) and none from Class 2 (Yes), and node N2, with four observations from Class 1 and three from Class 2. The overall misclassification rate for this branch of the tree is still 30%, but look at how the overall Gini index has improved:
- Gini(N1) = $1 - (3/3)^2 - (0/3)^2 = 0$
- Gini(N2) = $1 - (4/7)^2 - (3/7)^2 = 0.49$
- New Gini index = (proportion of N1 × Gini(N1)) + (proportion of N2 × Gini(N2)), which is equal to $(0.3 \times 0) + (0.7 \times 0.49)$, or 0.343
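The comparison above can be reproduced with a short R sketch. All three helper names (gini_node(), misclass_node(), and weighted_measure()) are hypothetical and written only for this illustration; each child node is assumed to be a vector of class counts:

```r
# Hypothetical helpers for illustration only (not from any package)
gini_node <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}
misclass_node <- function(counts) {
  1 - max(counts) / sum(counts)  # error when predicting the majority class
}

# Size-weighted average of a node-level measure across child nodes
weighted_measure <- function(nodes, measure) {
  sizes <- vapply(nodes, sum, numeric(1))
  values <- vapply(nodes, measure, numeric(1))
  sum(sizes / sum(sizes) * values)
}

# Child nodes from the text: counts are c(Class 1, Class 2)
children <- list(N1 = c(3, 0), N2 = c(4, 3))

split_gini <- weighted_measure(children, gini_node)
split_misclass <- weighted_measure(children, misclass_node)

print(round(split_gini, 3))
print(split_misclass)
```

Under these assumptions, the weighted Gini index drops below the parent's 0.42 while the weighted misclassification rate stays at 30%, which is the behavior the example describes.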
By splitting on a surrogate error measure, we actually reduced the node impurity from 0.42 to 0.343, whereas the misclassification rate didn't change. This is the methodology that's used by the rpart package, which we'll be using in this chapter.
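As a rough sketch of how this looks in practice, the following fits a classification tree with rpart() and requests the Gini criterion explicitly. The built-in iris dataset is an assumption made here purely for illustration and isn't taken from the text:

```r
library(rpart)  # recommended package, normally installed alongside R

# iris is a built-in dataset, used here only as a stand-in example
fit <- rpart(Species ~ ., data = iris, method = "class",
             parms = list(split = "gini"))  # Gini-based splitting

print(fit)  # text summary of the fitted splits
```

Setting split = "gini" only makes the choice visible; for classification trees, rpart() uses the Gini criterion when no splitting rule is supplied.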