- Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
- Tarek Amr
Visualizing the tree's decision boundaries
To be able to pick the right algorithm for a problem, it is important to have a conceptual understanding of how that algorithm makes its decisions. As we already know by now, decision trees pick one feature at a time and try to split the data accordingly. Nevertheless, it is also important to be able to visualize those decisions. Let me first plot our classes versus our features, and then I will explain further:
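The exact plotting code is not essential here, but if you want to reproduce this kind of figure yourself, the following is a minimal sketch. It assumes the df DataFrame with the Iris features and the target column that we have been using throughout this chapter, and it simply scatters each of the two petal features against the class label:

# A minimal sketch of this kind of plot, assuming the Iris data is already
# loaded into a DataFrame called df with a 'target' column, as earlier in the
# chapter. Each panel puts the class on the horizontal axis and one of the two
# petal features on the vertical axis.
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2, figsize=(12, 4), sharex=True)
for ax, feature in zip(axs, ['petal length (cm)', 'petal width (cm)']):
    ax.scatter(df['target'], df[feature], c=df['target'], alpha=0.5)
    ax.set_xlabel('target')
    ax.set_ylabel(feature)
fig.show()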

When the tree made a decision to split the data around a petal width of 0.8, you can think of it as drawing a horizontal line in the right-hand side graph at the value of 0.8. Then, with every later split, the tree splits the space further using combinations of horizontal and vertical lines. By knowing this, you should not expect the algorithm to use curves or 45-degree lines to separate the classes.
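If you would like to see the actual thresholds a trained tree has picked, scikit-learn can print them as plain-text rules via export_text. The following sketch assumes a DecisionTreeClassifier called clf has already been fitted on the Iris features, and that the iris object loaded earlier is still available; notice that every printed rule compares a single feature against a threshold, which is exactly why the boundaries are axis-aligned:

# Assuming clf is a DecisionTreeClassifier already fitted on the Iris features.
# Every printed rule compares one feature to one threshold, which is why the
# resulting decision boundaries are horizontal and vertical lines.
from sklearn.tree import export_text

print(export_text(clf, feature_names=iris.feature_names))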
One trick to plot the decision boundaries that a tree has learned after it has been trained is to use contour plots. For simplicity, let's assume we only have two features: petal length and petal width. We then generate a dense grid covering nearly all the possible values of those two features and predict the class labels for this new hypothetical data. Finally, we create a contour plot using those predictions to see the boundaries between the classes. The following function, created by Richard Johansson of the University of Gothenburg, does exactly that:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_decision_boundary(clf, x, y):
    feature_names = x.columns
    x, y = x.values, y.values
    # The grid limits are taken from the ranges of the two features
    x_min, x_max = x[:, 0].min(), x[:, 0].max()
    y_min, y_max = x[:, 1].min(), x[:, 1].max()
    step = 0.02
    xx, yy = np.meshgrid(
        np.arange(x_min, x_max, step),
        np.arange(y_min, y_max, step)
    )
    # Predict a label for every point of the grid, then reshape the
    # predictions back into the grid's shape for the contour plot
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(12, 8))
    plt.contourf(xx, yy, Z, cmap='Paired_r', alpha=0.25)
    plt.contour(xx, yy, Z, colors='k', linewidths=0.7)
    plt.scatter(x[:, 0], x[:, 1], c=y, edgecolors='k')
    plt.title("Tree's Decision Boundaries")
    plt.xlabel(feature_names[0])
    plt.ylabel(feature_names[1])
This time, we will train our classifier using two features only, and then call the preceding function using the newly trained model:
x = df[['petal width (cm)', 'petal length (cm)']]
y = df['target']
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(x, y)
plot_decision_boundary(clf, x, y)
Richard Johansson's function overlays the contour graph on top of our samples to give us the following graph:

By seeing the decision boundaries as well as the data samples, you can make better decisions on whether one algorithm is good for the problem at hand.
Feature engineering
On seeing the class distribution versus the petal lengths and widths, you may wonder: what if the decision trees could also draw boundaries that are at 40 degrees? Wouldn't 40-degree boundaries be more apt than those horizontal and vertical jigsaws? Unfortunately, decision trees cannot do that, but let's put the algorithm aside for a moment and think about the data instead. How about creating a new axis where the class boundaries change their orientation?
Let's create two new columns—petal length x width (cm) and sepal length x width (cm)—and see how the class distribution will look:
df['petal length x width (cm)'] = df['petal length (cm)'] * df['petal width (cm)']
df['sepal length x width (cm)'] = df['sepal length (cm)'] * df['sepal width (cm)']
The following code will plot the classes versus the newly derived features:
fig, ax = plt.subplots(1, 1, figsize=(12, 6))

h_label = 'petal length x width (cm)'
v_label = 'sepal length x width (cm)'

for c in df['target'].value_counts().index.tolist():
    df[df['target'] == c].plot(
        title='Class distribution vs the newly derived features',
        kind='scatter',
        x=h_label,
        y=v_label,
        color=['r', 'g', 'b'][c],  # Each class different color
        marker=f'${c}$',  # Use class id as marker
        s=64,
        alpha=0.5,
        ax=ax,
    )

fig.show()
Running this code will produce the following graph:

This new projection looks better; it makes the data more vertically separable. Nevertheless, the proof of the pudding is still in the eating. So, let's train two classifiers, one on the original features and one on the newly derived features, and see how their accuracies compare. The following code goes through 500 iterations, each time splitting the data randomly, then training both models, each with its own set of features, and storing the accuracy we get in each iteration:
features_orig = iris.feature_names
features_new = ['petal length x width (cm)', 'sepal length x width (cm)']

accuracy_scores_orig = []
accuracy_scores_new = []

for _ in range(500):
    df_train, df_test = train_test_split(df, test_size=0.3)
    x_train_orig = df_train[features_orig]
    x_test_orig = df_test[features_orig]
    x_train_new = df_train[features_new]
    x_test_new = df_test[features_new]
    y_train = df_train['target']
    y_test = df_test['target']

    clf_orig = DecisionTreeClassifier(max_depth=2)
    clf_new = DecisionTreeClassifier(max_depth=2)
    clf_orig.fit(x_train_orig, y_train)
    clf_new.fit(x_train_new, y_train)

    y_pred_orig = clf_orig.predict(x_test_orig)
    y_pred_new = clf_new.predict(x_test_new)

    accuracy_scores_orig.append(round(accuracy_score(y_test, y_pred_orig), 3))
    accuracy_scores_new.append(round(accuracy_score(y_test, y_pred_new), 3))

accuracy_scores_orig = pd.Series(accuracy_scores_orig)
accuracy_scores_new = pd.Series(accuracy_scores_new)
Then, we can use box plots to compare the accuracies of the two classifiers:
fig, axs = plt.subplots(1, 2, figsize=(16, 6), sharey=True)

accuracy_scores_orig.plot(
    title='Distribution of classifier accuracy [Original Features]',
    kind='box',
    grid=True,
    ax=axs[0]
)
accuracy_scores_new.plot(
    title='Distribution of classifier accuracy [New Features]',
    kind='box',
    grid=True,
    ax=axs[1]
)

fig.show()
Here, we put the two plots side by side so that we can compare them to each other:

Clearly, the derived features helped a bit. The classifier trained on them is more accurate on average (0.96 versus 0.93), and its lower bound is also higher.
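If you prefer exact numbers over reading them off the box plots, you can summarize the two accuracy series directly. The following sketch applies pandas' describe method to the accuracy_scores_orig and accuracy_scores_new series built above; since every iteration draws a random train/test split, your exact figures will differ slightly from run to run:

# Compare the two accuracy distributions numerically; the exact values vary
# between runs because each iteration uses a random train/test split.
summary = pd.concat(
    [accuracy_scores_orig.describe(), accuracy_scores_new.describe()],
    axis=1,
    keys=['Original Features', 'New Features'],
)
print(summary)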