
Parameters in decision trees

One of the most important parameters for a decision tree is the stopping criterion. As tree building nears completion, the final few decisions can be somewhat arbitrary, because each one relies on only a small number of samples. Nodes this specific can produce trees that significantly overfit the training data. A stopping criterion prevents the decision tree from reaching this level of exactness.

Instead of using a stopping criterion, the tree could be created in full and then trimmed. This trimming process removes nodes that do not provide much information to the overall process. This is known as pruning and results in a model that generally does better on new datasets because it hasn't overfitted the training data.
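As a rough illustration of pruning, the sketch below grows a tree in full and then trims it. It assumes a scikit-learn version that supports the `ccp_alpha` parameter (minimal cost-complexity pruning, available from version 0.22), which is only one possible pruning method. The Iris dataset, the `ccp_alpha` value of 0.02, and the random seed are arbitrary choices made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris is used only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

# No stopping criterion and no pruning: the tree is grown in full.
unpruned = DecisionTreeClassifier(random_state=14).fit(X_train, y_train)

# ccp_alpha > 0: the tree is grown in full, then its weakest subtrees are removed.
# The value 0.02 is an illustrative guess, not a recommended setting.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=14).fit(X_train, y_train)

print("nodes (unpruned):", unpruned.tree_.node_count)
print("nodes (pruned):  ", pruned.tree_.node_count)
print("test accuracy (unpruned):", unpruned.score(X_test, y_test))
print("test accuracy (pruned):  ", pruned.score(X_test, y_test))
```

The pruned tree never has more nodes than the unpruned one. Whether it scores better on the held-out data depends on the dataset and on the chosen `ccp_alpha`.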

The decision tree implementation in scikit-learn lets you stop the building of a tree early using the following options:

  • min_samples_split: This specifies the minimum number of samples a node must contain before it can be split into new nodes in the decision tree
  • min_samples_leaf: This specifies the minimum number of samples that must end up in each node resulting from a split for that split to stay

The first dictates whether a decision node will be created, while the second dictates whether a decision node will be kept.
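The two options can be tried out with a minimal sketch like the one below. The Iris dataset, the thresholds of 20 and 5, and the random seed are assumptions chosen for illustration rather than recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Default settings: the tree keeps splitting until it cannot split any further.
full_tree = DecisionTreeClassifier(random_state=14).fit(X, y)

# Stopping criteria: a node needs at least 20 samples to be split,
# and every leaf must receive at least 5 samples.
limited_tree = DecisionTreeClassifier(
    min_samples_split=20, min_samples_leaf=5, random_state=14
).fit(X, y)

print("leaves without stopping criteria:", full_tree.get_n_leaves())
print("leaves with stopping criteria:   ", limited_tree.get_n_leaves())

# tree_.n_node_samples holds the training-sample count of every node;
# leaves are the nodes whose children_left entry is -1.
is_leaf = limited_tree.tree_.children_left == -1
print("smallest leaf:", limited_tree.tree_.n_node_samples[is_leaf].min())
```

Printing the smallest leaf size is a quick way to confirm that `min_samples_leaf` was respected.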

Another parameter for decision trees is the criterion for creating a decision. Gini impurity and information gain are two popular options for this parameter:

  • Gini impurity: This is a measure of how often a decision node would incorrectly predict a sample's class if it assigned classes at random according to the class proportions in that node
  • Information gain: This uses information-theory-based entropy to indicate how much extra information is gained by the decision node

These two options do approximately the same thing: they decide which rule and value to use to split a node into subnodes. The parameter simply selects which metric determines that split; however, this choice can have a significant impact on the final model.
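To make the two metrics concrete, the sketch below computes them by hand for a node's class counts and then fits one tree per criterion. The helper functions `gini`, `entropy`, and `information_gain` are hypothetical names written for this example and are not part of scikit-learn. In scikit-learn itself the options are selected with `criterion="gini"` and `criterion="entropy"`; choosing the split with the lowest entropy in the child nodes is the same as choosing the one with the highest information gain.

```python
from math import log2

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy (in bits) of a node's class distribution."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the sample-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# An evenly mixed two-class node is as impure as it can get.
print(gini([5, 5]), entropy([5, 5]))  # 0.5 1.0
# A split that separates the two classes perfectly gains one full bit.
print(information_gain([5, 5], [[5, 0], [0, 5]]))  # 1.0

# Fit one tree per criterion (Iris and the seed are illustrative assumptions).
X, y = load_iris(return_X_y=True)
trees = {}
for criterion in ("gini", "entropy"):
    trees[criterion] = DecisionTreeClassifier(
        criterion=criterion, random_state=14
    ).fit(X, y)
    print(criterion, trees[criterion].get_n_leaves(), trees[criterion].get_depth())
```

On a small, clean dataset the two trees may come out nearly identical; differences tend to show up on larger or noisier data.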
