- Machine Learning with Swift
- Alexander Sosnovshchenko
Combinatorial entropy
The information gain criterion is based on the notion of Shannon entropy. Shannon entropy is an important concept in information theory, physics, and many other domains. Mathematically, it is expressed as:
$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$
Here i is a state of the system, N is the total number of possible states, and p_i is the probability of the system being in state i. Entropy describes the amount of uncertainty in the system: the more order the system has, the less entropy there is.
If you want to learn more about entropy, check out the nice interactive blog post Entropy Explained, With Sheep by Aatish Bhatia at https://aatishb.com/entropy/.
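To make the formula concrete, here is a minimal Swift sketch of the same definition (the function shannonEntropy is my own illustration, not a listing from the book):

```swift
import Foundation

// Shannon entropy H = -Σ p_i · log2(p_i) of a discrete probability distribution.
// States with zero probability contribute nothing (0 · log 0 is treated as 0).
func shannonEntropy(_ probabilities: [Double]) -> Double {
    return probabilities
        .filter { $0 > 0 }
        .reduce(0.0) { $0 - $1 * log2($1) }
}

print(shannonEntropy([0.5, 0.5]))  // 1.0 — a fair coin: maximal uncertainty for two states
print(shannonEntropy([1.0]))       // 0.0 — a fully ordered system has zero entropy
```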
Let's look at a simple example of how entropy can be useful for decision tree construction. For this, we'll simplify the task of alien creature classification, assuming that we can measure only one feature: body length. We have 10 individuals (6 platyhogs and 4 rabbosauruses) with the following body lengths:

If we take one random individual from the group, it is a platyhog with probability 0.6 or a rabbosaurus with probability 0.4. So this system has two states, one for each outcome. Let's calculate its entropy:
$$H = -0.6 \log_2 0.6 - 0.4 \log_2 0.4 \approx 0.97$$
So, the amount of uncertainty in this dataset is 0.97. Is that a lot or a little? We don't have anything to compare it with yet, so let's split the set in the middle (body length > 5 meters) and calculate the entropy of both subsets:
$$H_{left} = -1 \cdot \log_2 1 = 0$$

$$H_{right} = -\tfrac{1}{5} \log_2 \tfrac{1}{5} - \tfrac{4}{5} \log_2 \tfrac{4}{5} \approx 0.72$$
Both H_left and H_right are less than the original H. This demonstrates how you can reduce the entropy by splitting the dataset in the right place. This idea lies at the foundation of decision tree learning algorithms.
We can calculate how effectively we reduced the entropy by splitting the set using the Information Gain (IG) criterion:
Information Gain = Entropy(parent) - Weighted Sum of Entropy (Children), or:
$$IG = H_{parent} - \sum_{i=1}^{q} \frac{N_i}{N} H_i$$
Here q is the number of groups after the split, N_i is the count of elements in the i-th group, and N is the total count of elements before the split. In our example, q = 2, N = 10, and N_1 = N_2 = 5:
$$IG = 0.97 - \tfrac{5}{10} \cdot 0 - \tfrac{5}{10} \cdot 0.72 = 0.61$$
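The same computation can be sketched in Swift. This is only an illustration (the helper names are mine, and the subset compositions, 5 platyhogs on the left versus 1 platyhog and 4 rabbosauruses on the right, are inferred from the numbers above):

```swift
import Foundation

// Shannon entropy computed from raw class counts, e.g. [platyhogs, rabbosauruses].
func entropy(ofCounts counts: [Int]) -> Double {
    let total = Double(counts.reduce(0, +))
    return counts
        .map { Double($0) / total }
        .filter { $0 > 0 }
        .reduce(0.0) { $0 - $1 * log2($1) }
}

// Information gain of splitting a parent node into the given child nodes.
func informationGain(parent: [Int], children: [[Int]]) -> Double {
    let total = Double(parent.reduce(0, +))
    let weightedChildEntropy = children.reduce(0.0) { sum, child in
        sum + Double(child.reduce(0, +)) / total * entropy(ofCounts: child)
    }
    return entropy(ofCounts: parent) - weightedChildEntropy
}

// The 5-meter split: a pure group of 5 platyhogs and a mixed group of
// 1 platyhog and 4 rabbosauruses.
print(informationGain(parent: [6, 4], children: [[5, 0], [1, 4]]))  // ≈ 0.61
```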
This means that asking the question "Is the body length greater than 5?" gives us an information gain of 0.61. Is that a lot or a little? Let's compare it to the information gain of a split at length > 7:

Apparently, the choice of the middle point was lucky, because all other splits don't look promising. But you are free to check them if you want.
There is no point in splitting the left part any further, but we can keep splitting the right subset until the entropy of each of its children equals zero (see Figure 2.9).
So, this is our decision tree, built by a recursive algorithm. But now an interesting question arises: how do we know which split yields the maximal information gain? The simplest way is a greedy search: just check all possible splits.
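The sketch below shows what such a greedy search might look like in Swift, assuming the feature values and class labels live in two parallel arrays (the function bestSplit and its signature are my own invention, not the book's API). It tries a threshold halfway between every pair of neighboring feature values and keeps the one with the maximal information gain:

```swift
import Foundation

// Shannon entropy of an array of class labels.
func entropy<Label: Hashable>(of labels: [Label]) -> Double {
    let total = Double(labels.count)
    return Dictionary(grouping: labels, by: { $0 }).values
        .map { Double($0.count) / total }
        .reduce(0.0) { $0 - $1 * log2($1) }
}

// Greedy search for the best binary split of a single numeric feature.
func bestSplit(features: [Double], labels: [String]) -> (threshold: Double, gain: Double)? {
    let parentEntropy = entropy(of: labels)
    let total = Double(labels.count)
    let sortedValues = Set(features).sorted()
    var best: (threshold: Double, gain: Double)?

    // Candidate thresholds lie halfway between neighboring distinct feature values.
    for (low, high) in zip(sortedValues, sortedValues.dropFirst()) {
        let threshold = (low + high) / 2
        let left  = zip(features, labels).filter { $0.0 <= threshold }.map { $0.1 }
        let right = zip(features, labels).filter { $0.0 >  threshold }.map { $0.1 }
        let leftWeight  = Double(left.count) / total
        let rightWeight = Double(right.count) / total
        let gain = parentEntropy - leftWeight * entropy(of: left) - rightWeight * entropy(of: right)
        if best == nil || gain > best!.gain {
            best = (threshold, gain)
        }
    }
    return best
}
```

Run on the ten creatures above, this search should recover a threshold around the 5-meter mark with a gain of about 0.61.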
Information gain is only one possible heuristic; there are others. For instance, in our scikit-learn decision tree learner, we used Gini impurity as the heuristic. For more background, see the Michigan State University lecture notes on decision trees at http://www.cse.msu.edu/~cse802/DecisionTrees.pdf.
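Gini impurity has an even simpler form, G = 1 - Σ p_i². Here is a quick Swift sketch (again, just an illustration with a name of my choosing):

```swift
// Gini impurity G = 1 - Σ p_i²: the probability of mislabeling a random element
// if it were assigned a class at random according to the node's class distribution.
func giniImpurity(ofCounts counts: [Int]) -> Double {
    let total = Double(counts.reduce(0, +))
    return 1.0 - counts.reduce(0.0) { sum, count in
        let p = Double(count) / total
        return sum + p * p
    }
}

print(giniImpurity(ofCounts: [6, 4]))  // 0.48 for the 6-platyhog / 4-rabbosaurus node
```

Like entropy, it is zero for a pure node and maximal for an evenly mixed one, which is why the two criteria usually produce very similar trees.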
Check the documentation on the criterion property of DecisionTreeClassifier for more information about the different heuristics available for tree learning in scikit-learn. In practice, Gini impurity works very similarly to information gain. A historical fact to dilute the theoretical exposition: Corrado Gini was an Italian statistician and the author of The Scientific Basis of Fascism (1927).
