
How to do it...

In the following steps, you will load the standard wine dataset and use Bayesian optimization to tune the hyperparameters of an XGBoost model:

  1. Load the wine dataset from scikit-learn:
from sklearn import datasets

wine_dataset = datasets.load_wine()
X = wine_dataset.data
y = wine_dataset.target
  2. Import XGBoost and stratified K-fold:
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold
  3. Import BayesSearchCV from scikit-optimize and specify the number of parameter settings to test:
from skopt import BayesSearchCV

n_iterations = 50
  4. Specify your estimator. In this case, we select XGBoost and configure it to perform multiclass classification:
estimator = xgb.XGBClassifier(
    n_jobs=-1,
    objective="multi:softmax",
    eval_metric="merror",
    verbosity=0,
    num_class=len(set(y)),
)

  5. Specify a parameter search space (an explicit equivalent of this shorthand is sketched after the dictionary):
search_space = {
    "learning_rate": (0.01, 1.0, "log-uniform"),
    "min_child_weight": (0, 5),
    "max_depth": (1, 50),
    "max_delta_step": (0, 10),
    "subsample": (0.01, 1.0, "uniform"),
    "colsample_bytree": (0.01, 1.0, "log-uniform"),
    "colsample_bylevel": (0.01, 1.0, "log-uniform"),
    "reg_lambda": (1e-9, 1000, "log-uniform"),
    "reg_alpha": (1e-9, 1.0, "log-uniform"),
    "gamma": (1e-9, 0.5, "log-uniform"),
    "n_estimators": (5, 5000),
    "scale_pos_weight": (1e-6, 500, "log-uniform"),
}
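Each tuple here is shorthand for a scikit-optimize search dimension: two numbers define the range (integer bounds give an integer dimension), and an optional third element sets the sampling prior. As a sketch, the first two entries could equivalently be written with explicit skopt.space objects (the explicit_space name is just for illustration):

from skopt.space import Real, Integer

# Equivalent explicit form of two of the shorthand entries above.
explicit_space = {
    "learning_rate": Real(0.01, 1.0, prior="log-uniform"),
    "min_child_weight": Integer(0, 5),
}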
  6. Specify the type of cross-validation to perform:
cv = StratifiedKFold(n_splits=3, shuffle=True)
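StratifiedKFold preserves the class proportions of y in each of the three folds. With shuffle=True and no random_state, the folds (and hence the search results) will differ between runs; if you want reproducible folds, you can pin the seed, as in this optional sketch (the seed value is an assumption, not part of the original recipe):

# Optional: fix the random seed so the folds are reproducible across runs.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)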
  7. Define BayesSearchCV using the settings you have defined:
bayes_cv_tuner = BayesSearchCV(
    estimator=estimator,
    search_spaces=search_space,
    scoring="accuracy",
    cv=cv,
    n_jobs=-1,
    n_iter=n_iterations,
    verbose=0,
    refit=True,
)
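Note the cost implied by these settings: each of the 50 optimization iterations trains and evaluates one model per cross-validation fold, so the search fits 50 * 3 = 150 models in total, plus one final refit on the full dataset because refit=True. A quick sanity check before launching a long search (a sketch; n_splits mirrors the value passed to StratifiedKFold):

# Each iteration fits one model per cross-validation fold.
n_splits = 3
print(f"The search will train {n_iterations * n_splits} models in total.")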

  8. Define a callback function to print out the progress of the parameter search:
import pandas as pd
import numpy as np

def print_status(optimal_result):
    """Print the best parameters and the best accuracy attained so far."""
    models_tested = pd.DataFrame(bayes_cv_tuner.cv_results_)
    best_parameters_so_far = pd.Series(bayes_cv_tuner.best_params_)
    print(
        "Model #{}\nBest accuracy so far: {}\nBest parameters so far: {}\n".format(
            len(models_tested),
            np.round(bayes_cv_tuner.best_score_, 3),
            best_parameters_so_far.to_dict(),
        )
    )

    # Write a running summary of every parameter setting tried so far to CSV.
    clf_type = bayes_cv_tuner.estimator.__class__.__name__
    models_tested.to_csv(clf_type + "_cv_results_summary.csv")

  9. Perform the parameter search:
result = bayes_cv_tuner.fit(X, y, callback=print_status)

The search produces output like the following:

Model #1
Best accuracy so far: 0.972
Best parameters so far: {'colsample_bylevel': 0.019767840658391753, 'colsample_bytree': 0.5812505808116454, 'gamma': 1.7784704701058755e-05, 'learning_rate': 0.9050859661329937, 'max_delta_step': 3, 'max_depth': 42, 'min_child_weight': 1, 'n_estimators': 2334, 'reg_alpha': 0.02886003776717955, 'reg_lambda': 0.0008507166793122457, 'scale_pos_weight': 4.801764874750116e-05, 'subsample': 0.7188797743009225}

Model #2
Best accuracy so far: 0.972
Best parameters so far: {'colsample_bylevel': 0.019767840658391753, 'colsample_bytree': 0.5812505808116454, 'gamma': 1.7784704701058755e-05, 'learning_rate': 0.9050859661329937, 'max_delta_step': 3, 'max_depth': 42, 'min_child_weight': 1, 'n_estimators': 2334, 'reg_alpha': 0.02886003776717955, 'reg_lambda': 0.0008507166793122457, 'scale_pos_weight': 4.801764874750116e-05, 'subsample': 0.7188797743009225}

<snip>

Model #50
Best accuracy so far: 0.989
Best parameters so far: {'colsample_bylevel': 0.013417868502558758, 'colsample_bytree': 0.463490250419848, 'gamma': 2.2823050161337873e-06, 'learning_rate': 0.34006478878384533, 'max_delta_step': 9, 'max_depth': 41, 'min_child_weight': 0, 'n_estimators': 1951, 'reg_alpha': 1.8321791726476395e-08, 'reg_lambda': 13.098734837402576, 'scale_pos_weight': 0.6188077759379964, 'subsample': 0.7970035272497132}
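Since refit=True, the fitted BayesSearchCV object also keeps a copy of the best model retrained on the full dataset. As a minimal sketch of how to reuse the result (best_model and predictions are illustrative names, not from the original recipe):

# Inspect the winning configuration and its cross-validated accuracy.
print(result.best_score_)
print(result.best_params_)

# refit=True means best_estimator_ is an XGBClassifier refit on all of X and y.
best_model = result.best_estimator_
predictions = best_model.predict(X)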