
Tree visualization

Let's take a look at the code to visualize the tree. First, extract the list of class labels:

In []: 
labels = df.label.astype('category').cat.categories 
labels = list(labels) 
labels 
Out[]: 
[u'platyhog', u'rabbosaurus']  

Define a variable to store all the names for the features:

In []: 
feature_names = map(lambda x: x.encode('utf-8'), features.columns.get_values()) 
feature_names 
Out[]: 
['length', 
 'fluffy', 
 'color_light black', 
 'color_pink gold', 
 'color_purple polka-dot', 
 'color_space gray'] 
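
This cell assumes Python 2 (note the unicode strings and the list returned by map). Under Python 3, map returns a lazy iterator, export_graphviz accepts plain str names, and newer pandas versions drop columns.get_values(), so a minimal equivalent sketch would be:

In []: 
# Python 3 / newer pandas sketch: plain str column names work with export_graphviz 
feature_names = list(features.columns) 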

Then, generate a DOT description of the tree with the export_graphviz function and turn it into a graph object using pydotplus:

In []: 
import pydotplus  
dot_data = tree.export_graphviz(tree_model, out_file=None, 
                                feature_names=feature_names, 
                                class_names=labels, 
                                filled=True, rounded=True, 
                                special_characters=True) 
dot_data 
Out[]: 
u'digraph Tree {\nnode [shape=box, style="filled, rounded", color="black", fontname=helvetica] ;\nedge [fontname=helvetica] ;\n0 [label=<length &le; 26.6917<br/>entropy = 0.9971<br/>samples = 700<br/>value = [372, ... 
In []: 
graph = pydotplus.graph_from_dot_data(dot_data.encode('utf-8')) 
graph.write_png('tree1.png') 
Out[]: 
True 
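
If you prefer not to write a file at all, pydotplus can also produce the PNG bytes in memory, and IPython can display them inline; a minimal sketch (assuming Graphviz is installed):

In []: 
from IPython.display import Image 
# Render the same graph directly in the notebook output, without tree1.png 
Image(graph.create_png()) 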

Add a Markdown image reference in the next cell to display the newly created file:

![](tree1.png) 
Figure 2.5: Decision tree structure and a close-up of its fragment

The preceding diagram shows what our decision tree looks like. Unlike a real tree, it grows upside down: data (features) travels from the root (top) to the leaves (bottom). To predict the label of a sample with this classifier, we start at the root and move down until we reach a leaf. At each node, one feature is compared against a threshold; for example, the root node checks whether length &le; 26.6917. If the condition is met, we follow the left branch; if not, the right one.
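
To make the traversal concrete, here is a minimal sketch that walks the fitted tree's internal arrays by hand (assuming tree_model is the classifier trained earlier and sample is a 1-D array ordered like feature_names); its result should agree with tree_model.predict:

In []: 
import numpy as np 

def walk_tree(sample): 
    t = tree_model.tree_ 
    node = 0                                     # start at the root 
    while t.children_left[node] != -1:           # -1 marks a leaf 
        if sample[t.feature[node]] <= t.threshold[node]: 
            node = t.children_left[node]         # condition met: left branch 
        else: 
            node = t.children_right[node]        # otherwise: right branch 
    return tree_model.classes_[np.argmax(t.value[node])]   # majority class in the leaf 

# e.g. walk_tree(features.values[0]) should match tree_model.predict(features.values[:1])[0] 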

Let's look closer at one part of the tree. In addition to the splitting condition, each node carries some useful information (see the sketch after this list for how to read it programmatically):

  • The entropy value
  • The number of training samples that reach this node
  • How many of those samples belong to each class
  • The most likely outcome at this stage
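
All of these statistics are also available programmatically through the tree_ attribute of the fitted model; here is a rough sketch (using tree_model and labels defined earlier) that prints them for every node:

In []: 
import numpy as np 
t = tree_model.tree_ 
for node in range(t.node_count): 
    per_class = t.value[node][0]          # per-class totals (newer sklearn versions may store fractions here) 
    print('node %d: entropy=%.4f, samples=%d, value=%s, class=%s' % ( 
        node, 
        t.impurity[node],                 # the split criterion, entropy in our case 
        t.n_node_samples[node], 
        list(per_class), 
        labels[int(np.argmax(per_class))]))   # most likely outcome at this node 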