Decision tree learning pros and cons

Advantages:

  • Easy to understand and interpret, and well suited to visual representation. This is an example of a white-box model, which closely mimics the human decision-making process.
  • Can work with numerical and categorical features.
  • Requires little data preprocessing: no need for feature scaling, one-hot encoding, dummy variables, and so on.
  • Non-parametric model: makes no assumptions about the underlying data distribution.
  • Fast for inference.
  • Feature selection happens automatically: unimportant features will not influence the result. The presence of features that depend on each other (multicollinearity) also doesn't degrade prediction quality.
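
A minimal sketch (using scikit-learn, with a synthetic dataset made up for illustration) of two of the advantages above: the tree needs no feature scaling, and an irrelevant noise feature receives near-zero importance automatically:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 1000
informative = rng.uniform(0, 1000, n)   # deliberately left unscaled
noise = rng.normal(size=n)              # irrelevant feature on a different scale
X = np.column_stack([informative, noise])
y = (informative > 500).astype(int)     # the label depends only on column 0

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Column 0 dominates; the noise feature contributes essentially nothing.
print(tree.feature_importances_)
```

No standardization or encoding step was needed before `fit`, which is exactly what "little data preprocessing" means in practice.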

Disadvantages:

  • It tends to overfit. This can usually be mitigated in one of three ways:
    • Limiting tree depth
    • Setting the minimal number of objects in leaves
    • Pruning the tree by removing unimportant splits, working from the leaves toward the root
  • It is unstable—small changes in data can dramatically affect the structure of the tree and the final prediction.
  • The problem of finding the globally optimal decision tree is NP-complete, so in practice we rely on heuristics and greedy search. Unfortunately, this approach doesn't guarantee learning the globally best tree, only a locally optimal one.
  • Inflexible, in the sense that new data cannot be incorporated into the model easily. If you obtain new labeled data, you must retrain the tree from scratch on the whole dataset. This makes decision trees a poor choice for any application that requires dynamic model adjustment.
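
The three overfitting remedies listed above can be sketched with scikit-learn's `DecisionTreeClassifier`, whose `max_depth`, `min_samples_leaf`, and `ccp_alpha` parameters map onto them directly (the dataset and parameter values here are illustrative assumptions, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise to provoke overfitting.
X, y = make_classification(n_samples=500, n_features=10,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree grows until its leaves are pure,
# memorizing the noisy labels.
unconstrained = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

regularized = DecisionTreeClassifier(
    max_depth=4,          # 1) limit tree depth
    min_samples_leaf=10,  # 2) minimal number of objects in leaves
    ccp_alpha=0.005,      # 3) cost-complexity pruning: delete weak
                          #    splits, moving from leaves to root
    random_state=0,
).fit(X_tr, y_tr)

for name, model in [("unconstrained", unconstrained),
                    ("regularized", regularized)]:
    print(name, "depth:", model.get_depth(),
          "test accuracy:", round(model.score(X_te, y_te), 3))
```

Comparing the two depths and test scores shows the regularized tree is far shallower; on noisy data like this it typically generalizes better as well.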