
Example of ridge regression as a machine learning model

Ridge regression is a machine learning model in which we do not perform any statistical diagnostics on the independent variables; we simply fit the model on the training data and check the accuracy of the fit on the test data. Here, we have used the scikit-learn package:

>>> import pandas as pd 
>>> from sklearn.linear_model import Ridge 
>>> from sklearn.model_selection import train_test_split 
 
>>> # read the semicolon-separated red-wine data and replace spaces in column names with underscores 
>>> wine_quality = pd.read_csv("winequality-red.csv", sep=';') 
>>> wine_quality.rename(columns=lambda x: x.replace(" ", "_"), inplace=True) 
 
>>> all_colnms = ['fixed_acidity', 'volatile_acidity', 'citric_acid', 'residual_sugar', 'chlorides', 'free_sulfur_dioxide', 'total_sulfur_dioxide', 'density', 'pH', 'sulphates', 'alcohol'] 
 
>>> pdx = wine_quality[all_colnms] 
>>> pdy = wine_quality["quality"] 
 
>>> # 70% of the rows for training, 30% for testing 
>>> x_train, x_test, y_train, y_test = train_test_split(pdx, pdy, train_size=0.7, random_state=42)
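Ridge regression fits the same linear model as multiple linear regression but adds a penalty, alpha times the squared L2 norm of the coefficients, to the residual sum of squares. As an illustrative sketch (not part of the original example), the snippet below solves that objective in closed form, (XᵀX + αI)⁻¹Xᵀy on centred data with an unpenalized intercept, and checks that scikit-learn's Ridge returns the same coefficients. The synthetic X and y and the helper name ridge_closed_form are hypothetical; the wine data is not loaded here.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_closed_form(X, y, alpha):
    """Minimise ||y - Xw - b||^2 + alpha * ||w||^2 with an unpenalised intercept b (hypothetical helper)."""
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - X_mean  # centring the features and target removes the intercept from the penalised problem
    yc = y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - X_mean @ w  # recover the intercept on the original scale
    return w, b


# Synthetic stand-in data (assumption): 100 rows, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.7, 3.0]) + rng.normal(scale=0.5, size=100)

w, b = ridge_closed_form(X, y, alpha=1.0)
model = Ridge(alpha=1.0).fit(X, y)
print(np.allclose(w, model.coef_), np.isclose(b, model.intercept_))
```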

A simple grid search written from scratch follows: each candidate value of alpha (the regularization strength, called lambda in many texts) is fitted in turn, and the fit of the resulting models is compared:

>>> alphas = [1e-4,1e-3,1e-2,0.1,0.5,1.0,5.0,10.0]
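To illustrate what this grid varies, the following sketch fits Ridge for each alpha in the same list on synthetic data (an assumption; the wine data is not used) and prints the L2 norm of the fitted coefficients, which shrinks as alpha grows. Because the penalty acts on the raw coefficients, its effect also depends on the scale of each feature, and the wine features are not standardized in the code above.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-in data (assumption): 200 rows, 4 features
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=200)

alphas = [1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 5.0, 10.0]
coef_norms = []
for alph in alphas:
    coef = Ridge(alpha=alph).fit(X, y).coef_
    coef_norms.append(np.linalg.norm(coef))  # L2 norm of the fitted weights
    print("alpha:", alph, "||w||_2:", round(coef_norms[-1], 6))
```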

The best R-squared value found so far is initialized to 0; whenever a new test R-squared value exceeds it, the stored value is updated and the current alpha is printed:

>>> initrsq = 0 
 
>>> print("\nRidge Regression: Best Parameters\n") 
>>> for alph in alphas: 
...     ridge_reg = Ridge(alpha=alph) 
...     ridge_reg.fit(x_train, y_train) 
...     tr_rsqrd = ridge_reg.score(x_train, y_train)   # training R-squared 
...     ts_rsqrd = ridge_reg.score(x_test, y_test)     # test R-squared 

The rest of the loop body compares each test R-squared value with the best value so far and prints the result whenever the new value is greater:

...     if ts_rsqrd > initrsq: 
...         print("Lambda:", alph, "Train R-squared value:", round(tr_rsqrd, 5), "Test R-squared value:", round(ts_rsqrd, 5)) 
...         initrsq = ts_rsqrd 
... 

This is shown in the following screenshot:

Note that the test R-squared value obtained from ridge regression is similar to the value obtained from multiple linear regression (0.3519), even though no diagnostics were performed on the variables. Machine learning models of this kind therefore need less manual work: they can be retrained automatically, without manual intervention, which is one of the biggest advantages of using ML models for deployment.
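The retrain-and-compare idea can be packaged as a function that refits Ridge over the alpha grid and keeps the best model, alongside an ordinary least squares baseline. This is a minimal sketch, not the book's code: the helper name retrain_ridge is hypothetical, and make_regression data with 11 features stands in for the wine data, so the 0.3519 figure is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split


def retrain_ridge(x_train, y_train, x_test, y_test, alphas):
    """Refit Ridge for every alpha; return (best_alpha, best_test_r2, best_model). Hypothetical helper."""
    best_alpha, best_r2, best_model = None, float("-inf"), None  # -inf so a negative R-squared still counts
    for alph in alphas:
        model = Ridge(alpha=alph).fit(x_train, y_train)
        ts_rsqrd = model.score(x_test, y_test)  # test R-squared
        if ts_rsqrd > best_r2:
            best_alpha, best_r2, best_model = alph, ts_rsqrd, model
    return best_alpha, best_r2, best_model


# Synthetic stand-in for the wine data (assumption): 11 features, like all_colnms
X, y = make_regression(n_samples=300, n_features=11, noise=20.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)

alphas = [1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 5.0, 10.0]
best_alpha, best_r2, best_model = retrain_ridge(x_train, y_train, x_test, y_test, alphas)

# Baseline: ordinary least squares (multiple linear regression) on the same split
ols_r2 = LinearRegression().fit(x_train, y_train).score(x_test, y_test)
print("OLS test R-squared:", round(ols_r2, 5))
print("Best ridge alpha:", best_alpha, "test R-squared:", round(best_r2, 5))
```

Initializing the best score to negative infinity rather than 0 guarantees that some alpha is always reported. Because alpha is chosen on the test set here (as in the loop above), the reported test R-squared is optimistic; a separate validation split or cross-validation avoids that bias.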

The R code for ridge regression on the wine quality data is as follows:

# Ridge regression 
library(glmnet) 
 
wine_quality = read.csv("winequality-red.csv",header=TRUE,sep = ";",check.names = FALSE) 
names(wine_quality) <- gsub(" ", "_", names(wine_quality)) 
 
set.seed(123) 
numrow = nrow(wine_quality) 
trnind = sample(1:numrow,size = as.integer(0.7*numrow)) 
train_data = wine_quality[trnind,]; test_data = wine_quality[-trnind,] 
 
xvars = c("fixed_acidity","volatile_acidity","citric_acid","residual_sugar","chlorides","free_sulfur_dioxide",
          "total_sulfur_dioxide","density","pH","sulphates","alcohol") 
yvar = "quality" 
 
x_train = as.matrix(train_data[,xvars]); y_train = as.double(as.matrix(train_data[,yvar])) 
x_test = as.matrix(test_data[,xvars]) 
 
print(paste("Ridge Regression")) 
lambdas = c(1e-4,1e-3,1e-2,0.1,0.5,1.0,5.0,10.0) 
initrsq = 0 
for (lmbd in lambdas){ 
  # alpha = 0 selects the ridge (L2) penalty in glmnet; alpha = 1 would give the lasso 
  ridge_fit = glmnet(x_train,y_train,alpha = 0,lambda = lmbd) 
  pred_y = predict(ridge_fit,x_test) 
  # test R-squared: 1 - SSE/SST 
  R2 <- 1 - (sum((test_data[,yvar]-pred_y )^2)/sum((test_data[,yvar]-mean(test_data[,yvar]))^2)) 
   
  if (R2 > initrsq){ 
    print(paste("Lambda:",lmbd,"Test R-squared:",round(R2,4))) 
    initrsq = R2 
  } 
}
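The R2 line in the R loop computes the ordinary coefficient of determination on the test set, 1 - SSE/SST, with no adjustment for the number of predictors; this is also what scikit-learn's score method returns for regressors. A minimal Python check of that formula against sklearn.metrics.r2_score, using made-up values (the helper name r_squared is hypothetical):

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """1 - SSE/SST, the same formula as the R2 line in the R loop."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # flatten an n x 1 prediction matrix, as glmnet's predict returns
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst


y_true = [5, 6, 5, 7, 6]
y_pred = [5.2, 5.8, 5.1, 6.5, 6.1]
print(round(r_squared(y_true, y_pred), 4), round(r2_score(y_true, y_pred), 4))  # prints 0.875 0.875
```

Note that glmnet's lambda and scikit-learn's alpha are on different scales: glmnet divides the residual sum of squares by 2n and standardizes the predictors by default, so the same numeric grid does not produce identical fits in the two languages.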