
How do decision trees learn?

To configure decision trees properly, we first need to understand how they actually learn. In the internal structure we just printed, the tree chose a petal width of 0.8 as its initial splitting decision. It did so because decision trees try to build the smallest possible tree, using the following technique.

It went through all the features trying to find a feature (petal width, here) and a value within that feature (0.8, here) so that if we split all our training data into two parts (one for petal width ≤ 0.8, and one for petal width > 0.8), we get the purest split possible. In other words, it tries to find a condition where we can separate our classes as much as possible. Then, for each side, it iteratively tries to split the data further using the same technique.
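As a reminder of what that printed structure can look like, here is a minimal sketch, assuming the iris dataset and a default DecisionTreeClassifier as used earlier in the chapter; export_text() is one way to print the tree's internal structure, though the chapter may have used a different helper:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier()  # default settings; the tree grows until its leaves are pure
clf.fit(iris.data, iris.target)

# The root line shows the first splitting condition. With these settings it is
# typically petal width (cm) <= 0.80, which isolates class 0 (Setosa) on one side,
# although an equally pure split on petal length may be chosen instead
print(export_text(clf, feature_names=iris.feature_names))
```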

Splitting criteria

If we only had two classes, an ideal split would put the members of one class on one side and the members of the other class on the other side. In our case, we succeeded in putting the members of class 0 on one side and the members of classes 1 and 2 on the other. Obviously, we are not always guaranteed to get such a pure split. As we can see in the branches further down the tree, each side always contained a mix of samples from classes 1 and 2.

Having said that, we need a way to measure purity; that is, a criterion that tells us whether one split is purer than another. scikit-learn supports two purity criteria for classifiers, gini and entropy, with gini as the default option. When it comes to decision tree regression, there are other criteria that we will come across later on.
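To make purity concrete, here is a small sketch with helper functions of my own; gini and entropy are computed directly from class proportions rather than taken from scikit-learn:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: minus the sum of p * log2(p) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A pure node scores 0 under both criteria (the lower, the purer)
print(gini([0, 0, 0, 0]), entropy([0, 0, 0, 0]))
# A 50/50 mix of two classes is maximally impure: gini 0.5, entropy 1.0
print(gini([1, 1, 2, 2]), entropy([1, 1, 2, 2]))
```

In scikit-learn, you switch between the two by passing criterion='gini' (the default) or criterion='entropy' to DecisionTreeClassifier.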

Preventing overfitting

"If you look for perfection, you'll never be content."
– Leo Tolstoy

After the first split, the tree went on trying to separate the remaining classes: the Versicolor and the Virginica irises. However, are we really sure that our training data is detailed enough to explain all the nuances that differentiate the two classes? Isn't it possible that all those branches are driving the algorithm to learn things that happen to exist in the training data, but will not generalize well when faced with future data? Allowing a tree to grow this much results in what is called overfitting: the tree tries to fit the training data perfectly, forgetting that the data it encounters in the future may be different. To prevent overfitting, the following settings can be used to limit the growth of a tree:

  • max_depth: This is the maximum depth a tree can reach. A lower number means that the tree will stop branching earlier. Setting it to None means that the tree will continue to grow until all the leaves are pure or until all the leaves contain fewer than min_samples_split samples.
  • min_samples_split: The minimum number of samples a node must contain to allow further splitting there. A higher number means that the tree will stop branching earlier.
  • min_samples_leaf: The minimum number of samples a node must contain to become a leaf node. A leaf node is a node where there are no further splits and where decisions are made. A higher number may have the effect of smoothing the model, especially in regression.

One quick way to check for overfitting is to compare the classifier's accuracy on the test set to its accuracy on the training set. A much higher score on the training set than on the test set is a sign of overfitting; a smaller, more heavily pruned tree is recommended in that case, as the sketch below illustrates.
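Here is a minimal sketch of that check, assuming the iris data split into training and test sets with train_test_split; the split and the parameter values are illustrative choices of mine, not the chapter's. It fits one unconstrained tree and one constrained with max_depth and min_samples_leaf, then compares training and test accuracies:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# An unconstrained tree versus one whose growth is limited
unconstrained = DecisionTreeClassifier(random_state=42)
constrained = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=42
)

for name, clf in [('unconstrained', unconstrained), ('constrained', constrained)]:
    clf.fit(X_train, y_train)
    # A training score far above the test score hints at overfitting
    print(
        name,
        'train:', clf.score(X_train, y_train),
        'test:', clf.score(X_test, y_test),
    )
```

If the unconstrained tree scores noticeably higher on the training set than on the test set, while the constrained tree's two scores sit closer together, that gap is the overfitting signal described above.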

If max_depth is not set at training time to limit the tree's growth, you can alternatively prune the tree after it has been built. Curious readers can check the cost_complexity_pruning_path() method of the decision tree and find out how to use it to prune an already-grown tree.
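For the curious, here is a hedged sketch of how such post-pruning might look, again on an illustrative iris split of my own. cost_complexity_pruning_path() returns the effective alpha values at which the tree would be pruned, and the ccp_alpha parameter controls how aggressively a refitted tree is pruned:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Compute the effective alphas at which the fully grown tree would be pruned
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)

# Refit one tree per alpha; larger alphas give smaller, more heavily pruned trees
for ccp_alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    print(f'alpha={ccp_alpha:.4f}, leaves={clf.get_n_leaves()}, '
          f'test accuracy={clf.score(X_test, y_test):.3f}')
```

You would typically keep the alpha that gives the best test (or validation) score, which balances tree size against accuracy.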

Predictions

At the end of the training process, the nodes that aren't split any further are called leaf nodes. Within a leaf node, we may have five samples: four from class 1, one from class 2, and none from class 0. Then, at prediction time, if a sample ends up in that same leaf node, we can easily decide that it belongs to class 1, since this leaf node's training samples came from class 1 at a 4:1 ratio compared to the other two classes.
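This leaf-counting logic is also what predict_proba() exposes: for a decision tree classifier, the returned probabilities are the class proportions of the training samples in the leaf each sample lands in, and predict() picks the class with the largest proportion. A minimal sketch on an illustrative iris split of my own:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Each row of predict_proba is the class breakdown of the leaf the sample falls into;
# predict returns the majority class of that leaf
print(clf.predict_proba(X_test[:3]))
print(clf.predict(X_test[:3]))
```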

When we make predictions on the test set, we can compare them against the actual labels we have for it to evaluate the classifier's accuracy. Nevertheless, the manner in which we split our data may affect the reliability of the scores we get.
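To see that variability, here is a hedged sketch of my own that re-splits the iris data with a few different random seeds and reports the resulting test accuracies; the seeds and split size are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# The reported accuracy can shift depending on how the data happens to be split
for random_state in [0, 1, 2, 3, 4]:
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.3, random_state=random_state
    )
    clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
    print(f'split {random_state}: test accuracy = {clf.score(X_test, y_test):.3f}')
```

In the next section, we will see how to get more reliable scores.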
