
Tuning hyperparameters

The most straightforward way to simplify the decision tree is to limit its depth. How deep is it now? In Figure 2.5 you can count 20 splits, or 21 layers. At the same time, we have only three features; six, actually, if we take into account the one-hot encoded categorical color. Let's aggressively limit the maximum depth of the tree so that it is comparable to the number of features. The tree_model object has a max_depth attribute, so we set it to a value smaller than the number of features:

In []: 
tree_model.max_depth = 4 
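
Setting the attribute directly works because scikit-learn estimators expose their hyperparameters as public attributes. If you were constructing the model from scratch, the same constraint is more commonly passed to the constructor; a minimal sketch, assuming the usual scikit-learn import:

In []: 
from sklearn.tree import DecisionTreeClassifier 
# Equivalent to setting the attribute after construction. 
tree_model = DecisionTreeClassifier(max_depth=4) 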

After these manipulations, we can retrain our model and reevaluate its accuracy:

In []: 
tree_model = tree_model.fit(X_train, y_train) 
tree_model.score(X_train, y_train) 
Out[]: 
0.90571428571428569 

Note that the accuracy on the training set is now about 6% lower. What about the test set?

In []: 
tree_model.score(X_test, y_test) 
Out[]: 
0.92000000000000004 

Accuracy on previously unseen data is now higher, by about 4%. This doesn't look like a great achievement until you realize that it means an additional 40 correctly classified creatures out of our initial set of 1,000. In modern machine learning contests, the final difference between 1st and 100th place can easily be about 1%.
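
To pick a depth less arbitrarily, you can sweep over several candidate values and watch where training and test accuracy diverge. The following loop is not part of the original walkthrough; it is a sketch that reuses the X_train/X_test splits from above, with random_state added only for reproducibility:

In []: 
from sklearn.tree import DecisionTreeClassifier 

# Hypothetical depth sweep: compare training vs. test accuracy per depth. 
for depth in range(1, 11): 
    model = DecisionTreeClassifier(max_depth=depth, random_state=42) 
    model.fit(X_train, y_train) 
    print(depth, 
          model.score(X_train, y_train),  # training accuracy 
          model.score(X_test, y_test))    # test accuracy 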

Let's draw the tree structure after pruning. The code for this visualization is the same as before:
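
If you don't have the earlier cell at hand, here is a minimal sketch of such a visualization, assuming scikit-learn's export_graphviz together with the pydotplus package; the exact earlier cell may differ, and feature/class labels are omitted here:

In []: 
from sklearn.tree import export_graphviz 
import pydotplus 
from IPython.display import Image 

# Export the fitted tree to DOT format and render it inline. 
dot_data = export_graphviz(tree_model, out_file=None, filled=True, rounded=True) 
graph = pydotplus.graph_from_dot_data(dot_data) 
Image(graph.create_png())  # requires the Graphviz binaries to be installed 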

Figure 2.7: Tree structure after limiting its depth