
Tree visualization

Let's take a look at the code to visualize the tree. First, extract the list of class labels:

In []: 
labels = df.label.astype('category').cat.categories 
labels = list(labels) 
labels 
Out[]: 
[u'platyhog', u'rabbosaurus']  

Define a variable to store all the names for the features:

In []: 
feature_names = map(lambda x: x.encode('utf-8'), features.columns) 
feature_names 
Out[]: 
['length', 
 'fluffy', 
 'color_light black', 
 'color_pink gold', 
 'color_purple polka-dot', 
 'color_space gray'] 

Then, create the graph object using the export_graphviz function:

In []: 
import pydotplus  
dot_data = tree.export_graphviz(tree_model, out_file=None, 
                                feature_names=feature_names, 
                                class_names=labels, 
                                filled=True, rounded=True, 
                                special_characters=True) 
dot_data 
Out[]: 
u'digraph Tree {\nnode [shape=box, style="filled, rounded", color="black", fontname=helvetica] ;\nedge [fontname=helvetica] ;\n0 [label=<length &le; 26.6917<br/>entropy = 0.9971<br/>samples = 700<br/>value = [372, ... 
In []: 
graph = pydotplus.graph_from_dot_data(dot_data.encode('utf-8')) 
graph.write_png('tree1.png') 
Out[]: 
True 

Add a Markdown cell with the following image link to display the newly created file:

![](tree1.png) 
Figure 2.5: Decision tree structure and a close-up of its fragment

The preceding diagram shows what our decision tree looks like. During training, it grows upside down: data (features) travels through it from the root (top) to the leaves (bottom). To predict the label of a sample from our dataset using this classifier, we start at the root and move down until we reach a leaf. At each node, one feature is compared to a threshold; for example, at the root node, the tree checks whether the length is ≤ 26.6917. If the condition is met, we follow the left branch; if not, the right one.
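This root-to-leaf traversal can be sketched directly against a fitted scikit-learn tree's `tree_` attribute. The dataset and thresholds below are synthetic placeholders, not the book's platyhog/rabbosaurus data:

```python
# A minimal sketch of walking one sample from root to leaf, the way
# the prediction procedure described above works. Data is made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 2) * 40            # two numeric features
y = (X[:, 0] > 26.0).astype(int)     # label driven by the first feature

clf = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)
t = clf.tree_

sample = X[0]
node = 0                              # start at the root
while t.children_left[node] != -1:    # -1 marks a leaf node
    if sample[t.feature[node]] <= t.threshold[node]:
        node = t.children_left[node]      # condition met: left branch
    else:
        node = t.children_right[node]     # otherwise: right branch

leaf_class = int(np.argmax(t.value[node]))
```

The leaf reached this way yields the same class that `clf.predict` returns, since `predict` performs exactly this traversal internally.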

Let's take a closer look at part of the tree. In addition to the condition, each node shows some useful information:

  • The entropy value
  • The number of training samples that reach this node
  • How many of those samples support each outcome
  • The most likely outcome at this stage
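All four of these statistics can be read programmatically from a fitted tree's `tree_` attribute. The sketch below uses synthetic data for illustration; note that depending on the scikit-learn version, `tree_.value` stores per-class counts or per-class fractions:

```python
# Reading the per-node statistics listed above from a fitted tree.
# Synthetic data; names and thresholds are illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(1)
X = rng.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = DecisionTreeClassifier(criterion='entropy', max_depth=2,
                             random_state=1).fit(X, y)
t = clf.tree_

root = 0
entropy = t.impurity[root]            # entropy value shown in the node
n_samples = t.n_node_samples[root]    # training samples reaching the node
class_support = t.value[root][0]      # support per outcome (counts or fractions)
likely = int(np.argmax(class_support))  # most likely outcome at this stage
```

For the root node, `n_samples` equals the size of the whole training set, which matches the `samples = 700` line in the book's rendered tree.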