- Statistics for Machine Learning
- Pratap Dangeti
- 2021-07-02 19:05:57
Grid search
Grid search is a popular way to tune a model's hyperparameters: it evaluates every combination of values in a predefined grid and keeps the combination that gives the best fit.
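The idea can be sketched in a few lines of plain Python. This is only an illustration of the search loop, not scikit-learn's implementation: `grid_search`, `toy_score`, and `toy_grid` are hypothetical names, and `toy_score` stands in for "fit a model with these settings and return its validation score".

```python
from itertools import product


def grid_search(score_fn, grid):
    """Score every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # The Cartesian product enumerates each combination exactly once.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for a model's validation score;
# by construction it peaks at depth=4, min_leaf=2.
def toy_score(depth, min_leaf):
    return -((depth - 4) ** 2) - (min_leaf - 2) ** 2


toy_grid = {"depth": (2, 4, 6), "min_leaf": (1, 2, 3)}
best_params, best_score = grid_search(toy_score, toy_grid)
print(best_params, best_score)  # {'depth': 4, 'min_leaf': 2} 0
```

The cost grows multiplicatively with each added parameter, which is why grids are usually kept small.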

The following code predicts whether a particular user will click an ad. Grid search is applied to a decision tree classifier, and the tuned parameters are the maximum depth of the tree, the minimum number of observations in a terminal node, and the minimum number of observations required to split a node:
# Grid search
>>> import pandas as pd
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.model_selection import train_test_split, GridSearchCV
>>> from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
>>> from sklearn.pipeline import Pipeline
>>> input_data = pd.read_csv("ad.csv", header=None)
>>> X_columns = set(input_data.columns.values)
>>> y = input_data[len(input_data.columns.values)-1]
>>> X_columns.remove(len(input_data.columns.values)-1)
>>> X = input_data[list(X_columns)]
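The feature/target separation above treats the last column as the label and everything else as features. A shorter way to express the same split is positional indexing with `iloc`, which also keeps the columns in file order (a Python set does not promise any particular iteration order). The small frame below is a hypothetical stand-in for `ad.csv`, and the label strings are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("ad.csv", header=None):
# integer column labels, target in the last column.
demo = pd.DataFrame([
    [1.0, 0.5, "ad"],
    [0.0, 0.2, "nonad"],
    [1.0, 0.9, "ad"],
])

X_demo = demo.iloc[:, :-1]  # every column except the last
y_demo = demo.iloc[:, -1]   # the last column only

print(X_demo.shape, y_demo.tolist())  # (3, 2) ['ad', 'nonad', 'ad']
```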
Split the data into training and test sets:
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=33)
Wrap the classifier in a pipeline so that the grid search can address its parameters by step name:
>>> pipeline = Pipeline([
...     ('clf', DecisionTreeClassifier(criterion='entropy'))
... ])
The values to explore are given as a Python dictionary, keyed by step name and parameter name:
>>> parameters = {
...     'clf__max_depth': (50, 100, 150),
...     'clf__min_samples_split': (2, 3),
...     'clf__min_samples_leaf': (1, 2, 3)}
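Each key in this dictionary has the form `<step name>__<parameter name>`, so `clf__max_depth` refers to the `max_depth` argument of the pipeline step called `clf`. The sketch below uses only the standard library to show how the keys break apart and how many candidates the grid contains. The fold count `n_folds = 5` is an assumption; the actual value depends on the `cv` argument and on the scikit-learn version's default.

```python
from itertools import product

parameters = {
    "clf__max_depth": (50, 100, 150),
    "clf__min_samples_split": (2, 3),
    "clf__min_samples_leaf": (1, 2, 3),
}

# Split each key into [pipeline step name, estimator argument].
targets = [key.split("__", 1) for key in parameters]

# Every candidate setting is one element of the Cartesian product.
combinations = [
    dict(zip(parameters, values)) for values in product(*parameters.values())
]

n_folds = 5  # assumed number of cross-validation folds
print(targets)
print(len(combinations), "combinations,", len(combinations) * n_folds, "fits")
# 18 combinations, 90 fits
```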
The n_jobs parameter sets the number of CPU cores used to run the search in parallel; -1 uses all available cores. The scoring metric here is accuracy; other options, such as precision, recall, and f1, can be chosen instead:
>>> grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy')
>>> grid_search.fit(X_train, y_train)
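To illustrate switching the scoring metric, here is a small self-contained sketch that ranks candidates by f1 instead of accuracy. It does not use the ad data: the synthetic dataset from `make_classification`, the reduced grid, `cv=3`, and all `*_demo` names are assumptions chosen so that the example runs quickly on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption: stands in for the ad data).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=33)

pipeline_demo = Pipeline([
    ("clf", DecisionTreeClassifier(criterion="entropy", random_state=33)),
])

# A deliberately small grid: 2 x 2 = 4 candidates.
param_grid_demo = {
    "clf__max_depth": (2, 4),
    "clf__min_samples_leaf": (1, 3),
}

# scoring="f1" ranks candidates by F1 on the positive class;
# n_jobs=1 runs serially, n_jobs=-1 would use every core.
search_demo = GridSearchCV(pipeline_demo, param_grid_demo, scoring="f1", cv=3, n_jobs=1)
search_demo.fit(X_demo, y_demo)

print(search_demo.best_params_)
print(round(search_demo.best_score_, 3))
```

The per-candidate scores are available afterwards in `search_demo.cv_results_`, which is useful for checking how close the runner-up settings were.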
Predict on the test set with the best estimator found by the grid search:
>>> y_pred = grid_search.predict(X_test)
Print the best cross-validation score, the best parameter set, and the test-set metrics:
>>> print('\n Best score: \n', grid_search.best_score_)
>>> print('\n Best parameters set: \n')
>>> best_parameters = grid_search.best_estimator_.get_params()
>>> for param_name in sorted(parameters.keys()):
...     print('\t%s: %r' % (param_name, best_parameters[param_name]))
>>> print("\n Confusion Matrix on Test data \n", confusion_matrix(y_test, y_pred))
>>> print("\n Test Accuracy \n", accuracy_score(y_test, y_pred))
>>> print("\nPrecision Recall f1 table \n", classification_report(y_test, y_pred))
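The precision, recall, and f1 figures in the report can be derived directly from the confusion matrix. The sketch below assumes scikit-learn's layout (actual classes in rows, predicted classes in columns); `per_class_metrics` is a hypothetical helper and the counts in `demo_matrix` are invented for illustration, not results from the ad data.

```python
def per_class_metrics(matrix):
    """Return {class_index: (precision, recall, f1)} for a square confusion
    matrix with actual classes in rows and predicted classes in columns."""
    metrics = {}
    for k in range(len(matrix)):
        true_pos = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column total
        actual_k = sum(matrix[k])                    # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = (precision, recall, f1)
    return metrics


# Invented counts: 100 actual examples of each class.
demo_matrix = [
    [80, 20],
    [10, 90],
]

demo_metrics = per_class_metrics(demo_matrix)
demo_accuracy = sum(demo_matrix[k][k] for k in range(2)) / sum(map(sum, demo_matrix))

for label, (precision, recall, f1) in demo_metrics.items():
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
print("accuracy", demo_accuracy)
```

Precision divides by a column total (everything predicted as the class) and recall by a row total (everything that truly is the class), which is why the two can differ sharply on imbalanced data even when accuracy looks high.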

The R code for grid searches on decision trees is as follows:
# Grid Search on Decision Trees
library(rpart)
input_data = read.csv("ad.csv",header=FALSE)
input_data$V1559 = as.factor(input_data$V1559)
set.seed(123)
numrow = nrow(input_data)
trnind = sample(1:numrow,size = as.integer(0.7*numrow))
train_data = input_data[trnind,]
test_data = input_data[-trnind,]
minspset = c(2,3)
minobset = c(1,2,3)
initacc = 0
for (minsp in minspset){
  for (minob in minobset){
    tr_fit = rpart(V1559 ~.,data = train_data,method = "class",minsplit = minsp, minbucket = minob)
    tr_predt = predict(tr_fit,newdata = train_data,type = "class")
    tble = table(tr_predt,train_data$V1559)
    acc = (tble[1,1]+tble[2,2])/sum(tble)
    acc
    if (acc > initacc){
      tr_predtst = predict(tr_fit,newdata = test_data,type = "class")
      tblet = table(test_data$V1559,tr_predtst)
      acct = (tblet[1,1]+tblet[2,2])/sum(tblet)
      acct
      print(paste("Best Score"))
      print( paste("Train Accuracy ",round(acc,3),"Test Accuracy",round(acct,3)))
      print( paste(" Min split ",minsp," Min obs per node ",minob))
      print(paste("Confusion matrix on test data"))
      print(tblet)
      precsn_0 = (tblet[1,1])/(tblet[1,1]+tblet[2,1])
      precsn_1 = (tblet[2,2])/(tblet[1,2]+tblet[2,2])
      print(paste("Precision_0: ",round(precsn_0,3),"Precision_1: ",round(precsn_1,3)))
      rcall_0 = (tblet[1,1])/(tblet[1,1]+tblet[1,2])
      rcall_1 = (tblet[2,2])/(tblet[2,1]+tblet[2,2])
      print(paste("Recall_0: ",round(rcall_0,3),"Recall_1: ",round(rcall_1,3)))
      initacc = acc
    }
  }
}