Moving towards a standard workflow

Estimators in scikit-learn have two main functions: fit() and predict(). We train the algorithm using the fit() method on our training set. We evaluate it using the predict() method on our testing set.

  1. First, we need to create these training and testing sets. As before, import and run the train_test_split function:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)
  2. Then, we import the nearest neighbor class and create an instance of it. We leave the parameters as defaults for now and will test other values later in this chapter. By default, the algorithm will choose the five nearest neighbors to predict the class of a testing sample:

from sklearn.neighbors import KNeighborsClassifier
estimator = KNeighborsClassifier()
  3. After creating our estimator, we must then fit it on our training dataset. For the nearest neighbor class, this training step simply records our dataset, allowing us to find the nearest neighbors of a new data point by comparing that point to the training dataset:
estimator.fit(X_train, y_train)
  4. With the algorithm trained on our training set, we then predict the classes of our testing set and evaluate those predictions against the true classes:
import numpy as np
# Predict a class for each testing sample, then compare with the true classes
y_predicted = estimator.predict(X_test)
accuracy = np.mean(y_test == y_predicted) * 100
print("The accuracy is {0:.1f}%".format(accuracy))
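
To make the fit and predict steps less opaque, here is a minimal pure-Python sketch of what a nearest neighbor estimator does conceptually. The class name TinyKNN and the toy data are hypothetical, invented for this illustration. This is not how scikit-learn implements KNeighborsClassifier, which uses far more efficient data structures. The sketch assumes Euclidean distance and a simple majority vote.

```python
from collections import Counter
import math


class TinyKNN:
    """Hypothetical, illustrative nearest neighbor classifier (not scikit-learn's code)."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only records the dataset for later comparison.
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for point in X:
            # Sort the stored samples by Euclidean distance to the new point.
            by_distance = sorted(
                zip(self.X_, self.y_),
                key=lambda pair: math.dist(pair[0], point),
            )
            # Majority vote among the labels of the k closest samples.
            nearest_labels = [label for _, label in by_distance[: self.n_neighbors]]
            predictions.append(Counter(nearest_labels).most_common(1)[0][0])
        return predictions


# Toy data, invented for this example: two well-separated clusters.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = ["a", "a", "a", "b", "b", "b"]
X_test = [[0.05, 0.05], [1.05, 1.0]]
y_test = ["a", "b"]

estimator = TinyKNN(n_neighbors=3).fit(X_train, y_train)
y_predicted = estimator.predict(X_test)
accuracy = 100 * sum(p == t for p, t in zip(y_predicted, y_test)) / len(y_test)
print("The accuracy is {0:.1f}%".format(accuracy))
```

On this toy data both testing points fall inside their own cluster, so the script prints an accuracy of 100.0%. Real datasets are rarely this clean.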

This model scores 86.4 percent accuracy, which is impressive for a default algorithm and just a few lines of code! Most scikit-learn default parameters are chosen deliberately to work well with a range of datasets. However, you should always aim to choose parameters based on knowledge of the application and on experimentation. We will use strategies for doing this parameter search in later chapters.
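
As a quick check on the defaults mentioned above, the following sketch inspects the default number of neighbors and compares the manual accuracy calculation with the estimator's built-in score() method. It assumes a recent scikit-learn release. The data comes from make_classification, a synthetic stand-in rather than this chapter's dataset, so the accuracy it reports will differ from the figure quoted above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the chapter's dataset).
X, y = make_classification(n_samples=200, n_features=10, random_state=14)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

estimator = KNeighborsClassifier()
# get_params() exposes the constructor parameters, including the defaults.
default_k = estimator.get_params()["n_neighbors"]

estimator.fit(X_train, y_train)
# Fraction of testing samples whose predicted class matches the true class.
manual_accuracy = np.mean(y_test == estimator.predict(X_test))
# For classifiers, score() returns the mean accuracy on the given data.
builtin_accuracy = estimator.score(X_test, y_test)

print("Default n_neighbors:", default_k)
print("Manual accuracy: {0:.1f}%".format(manual_accuracy * 100))
print("Built-in accuracy: {0:.1f}%".format(builtin_accuracy * 100))
```

The two accuracy values should agree, since both are the fraction of correctly classified testing samples.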
