
How do decision trees learn?

It's time to find out how decision trees actually learn in order to configure them. In the internal structure we just printed, the tree decided to use a petal width of 0.8 as its initial splitting decision. This was done because decision trees try to build the smallest possible tree using the following technique.

It went through all the features trying to find a feature (petal width, here) and a value within that feature (0.8, here) so that if we split all our training data into two parts (one for petal width ≤ 0.8, and one for petal width > 0.8), we get the purest split possible. In other words, it tries to find a condition where we can separate our classes as much as possible. Then, for each side, it iteratively tries to split the data further using the same technique.
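
If you want to verify that choice on your own copy of the data, the fitted tree exposes its split decisions through its tree_ attribute. The following is a minimal sketch, assuming the Iris data and a classifier similar to the one trained earlier; the variable names are illustrative, and ties between equally good splits may be broken differently depending on random_state:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=42)
clf.fit(iris.data, iris.target)

# tree_.feature and tree_.threshold hold the feature index and split value
# chosen for each node; the root node is at index 0
root_feature = iris.feature_names[clf.tree_.feature[0]]
root_threshold = clf.tree_.threshold[0]
print(f'Root split: {root_feature} <= {root_threshold:.2f}')
# Depending on tie-breaking, this may report petal width (cm) <= 0.80
# or another, equally pure split on the petal measurements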

Splitting criteria

If we only had two classes, an ideal split would put the members of one class on one side and the members of the other class on the other side. In our case, we succeeded in putting members of class 0 on one side and members of classes 1 and 2 on the other. Obviously, we are not always guaranteed to get such a pure split. As we can see in the other branches further down the tree, we always ended up with a mix of samples from classes 1 and 2 on each side.

Having said that, we need a way to measure purity. We need a criterion that tells us whether one split is purer than another. There are two criteria that scikit-learn uses to measure purity for classifiers, gini and entropy, with the gini criterion as its default option. When it comes to decision tree regression, there are other criteria that we will come across later on.
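
To make the two criteria concrete, here is a small sketch, not taken from the book's code, that computes the Gini impurity and entropy of a node from its class counts using the standard formulas. You can also switch criteria when constructing the classifier, for example DecisionTreeClassifier(criterion='entropy').

import numpy as np

def gini(class_counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions in a node."""
    p = np.asarray(class_counts) / np.sum(class_counts)
    return 1 - np.sum(p ** 2)

def entropy(class_counts):
    """Entropy: -sum(p_i * log2(p_i)), ignoring empty classes."""
    p = np.asarray(class_counts) / np.sum(class_counts)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A pure node (only class 0) versus a maximally mixed node (classes 1 and 2)
print(gini([50, 0, 0]), entropy([50, 0, 0]))    # 0.0, 0.0
print(gini([0, 25, 25]), entropy([0, 25, 25]))  # 0.5, 1.0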

Preventing overfitting

"If you look for perfection, you'll never be content."
– Leo Tolstoy

After the first split, the tree went on trying to separate the remaining classes: the Versicolor and the Virginica irises. However, are we really sure that our training data is detailed enough to explain all the nuances that differentiate the two classes? Isn't it possible that all those branches are driving the algorithm to learn things that happen to exist in the training data, but will not generalize well enough when faced with future data? Allowing a tree to grow so much results in what is called overfitting. The tree tries to perfectly fit the training data, forgetting that the data it may encounter in the future may be different. To prevent overfitting, the following settings may be used to limit the growth of a tree (a short code sketch follows the list):

  • max_depth: This is the maximum depth a tree can reach. A lower number means that the tree will stop branching earlier. Setting it to None means that the tree will continue to grow until all the leaves are pure or until all the leaves contain fewer than min_samples_split samples.
  • min_samples_split: The minimum number of samples a node must contain to allow further splitting there. A higher number means that the tree will stop branching earlier.
  • min_samples_leaf: The minimum number of samples a node must contain to become a leaf node. A leaf node is a node where there are no further splits and where decisions are made. A higher number may have the effect of smoothing the model, especially in regression.
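
Here is a brief, illustrative sketch of how these settings are passed to the classifier; the specific values are placeholders rather than recommendations and would normally be tuned:

from sklearn.tree import DecisionTreeClassifier

# A shallower, more constrained tree; the exact values are illustrative
clf = DecisionTreeClassifier(
    max_depth=3,           # stop after three levels of splits
    min_samples_split=4,   # a node needs at least 4 samples to be split further
    min_samples_leaf=2,    # every leaf must keep at least 2 samples
)
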
One quick way to check for overfitting is to compare the classifier's accuracy on the test set to its accuracy on the training set. Having a much higher score for your training set compared to the test set is a sign of overfitting. A smaller and more pruned tree is recommended in this case.
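
A minimal sketch of this check, assuming the Iris data and a train/test split similar to the one used earlier; the split ratio and random_state here are illustrative:

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
train_acc = accuracy_score(y_train, clf.predict(x_train))
test_acc = accuracy_score(y_test, clf.predict(x_test))
print(f'Training accuracy: {train_acc:.3f}, Test accuracy: {test_acc:.3f}')
# A training score much higher than the test score hints at overfitting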

If max_depth is not set at training time to limit the tree's growth, you can alternatively prune the tree after it has been built. Curious readers can check the cost_complexity_pruning_path() method of the decision tree and find out how to use it to prune an already-grown tree.
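
As a rough illustration of that approach, the following sketch computes the pruning path on the training data and refits one pruned tree per effective alpha; larger alphas give smaller trees. The data split here is illustrative and not necessarily the one used in the chapter:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Compute the effective alphas at which the fully grown tree gets pruned
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(x_train, y_train)

# Refit a pruned tree for each alpha and inspect how many leaves survive
pruned_trees = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(x_train, y_train)
    for alpha in path.ccp_alphas
]
print([tree.get_n_leaves() for tree in pruned_trees])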

Predictions

At the end of the training process, nodes that aren't split any further are called leaf nodes. Within a leaf node, we may have five samples—four of them from class 1, one from class 2, and none from class 0. Then, at prediction time, if a sample ends up in the same leaf node, we can easily decide that the new sample belongs to class 1 since this leaf node had a 4:1 ratio of its training samples from class 1 compared to the other two classes.
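
In scikit-learn, these per-leaf class proportions are exactly what predict_proba returns, while predict picks the majority class. A small sketch, using an illustrative data split that may differ from the chapter's own:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

# predict_proba returns, for each sample, the class proportions of the
# training samples in the leaf it lands in; a 4:1 leaf would give [0.0, 0.8, 0.2]
print(clf.predict_proba(x_test[:5]))
print(clf.predict(x_test[:5]))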

When we make predictions on the test set, we can compare them to the actual labels we have in the test set to evaluate the classifier's accuracy. Nevertheless, the manner in which we split our data may affect the reliability of the scores we get. In the next section, we will see how to get more reliable scores.
