- Statistics for Machine Learning
- Pratap Dangeti
- 2021-07-02 19:06:00
Example of ridge regression machine learning
Ridge regression is treated here as a machine learning model: we perform no statistical diagnostics on the independent variables, and simply fit the model on the training data and check the accuracy of the fit on the test data. Here, we have used the scikit-learn package:
>>> import pandas as pd
>>> from sklearn.linear_model import Ridge
>>> from sklearn.model_selection import train_test_split
>>> # Load the red wine data and replace spaces in column names with underscores
>>> wine_quality = pd.read_csv("winequality-red.csv", sep=';')
>>> wine_quality.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)
>>> all_colnms = ['fixed_acidity', 'volatile_acidity', 'citric_acid', 'residual_sugar',
...               'chlorides', 'free_sulfur_dioxide', 'total_sulfur_dioxide', 'density',
...               'pH', 'sulphates', 'alcohol']
>>> pdx = wine_quality[all_colnms]
>>> pdy = wine_quality["quality"]
>>> # Hold out 30% of the rows as test data
>>> x_train, x_test, y_train, y_test = train_test_split(pdx, pdy, train_size=0.7, random_state=42)
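For reference, the objective that scikit-learn's `Ridge` minimizes (with its default unpenalized intercept) can be written as below. The symbols are notation chosen for this sketch rather than taken from the text: $w$ is the coefficient vector, $b$ the intercept, and $\alpha$ the `alpha` argument tested in the grid search.

```latex
\min_{w,\,b} \; \sum_{i=1}^{n} \left( y_i - b - x_i^{\top} w \right)^2 \;+\; \alpha \, \lVert w \rVert_2^2
```

With $\alpha = 0$ this reduces to ordinary least squares; larger values of $\alpha$ shrink the coefficients toward zero.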
A simple grid search written from scratch follows: each of the alpha values below is tried in turn to see which gives the best fit:
>>> alphas = [1e-4,1e-3,1e-2,0.1,0.5,1.0,5.0,10.0]
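To get a feel for what these alpha values control, here is a minimal sketch in plain Python, assuming a single feature so that the ridge solution is a scalar. The function `ridge_slope` and the toy data are hypothetical and not part of the book's code; the formula is the one-feature case of the penalized least-squares objective with an unpenalized intercept.

```python
def ridge_slope(x, y, alpha):
    """Closed-form ridge slope for one feature with an unpenalized intercept."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Centering the data is equivalent to fitting an unpenalized intercept
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    # The penalty alpha is added to the denominator, shrinking the slope
    return sxy / (sxx + alpha)


# Toy data (hypothetical): y is exactly 2 * x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

alphas = [1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 5.0, 10.0]
slopes = [ridge_slope(x, y, a) for a in alphas]
for a, s in zip(alphas, slopes):
    print("alpha:", a, "slope:", round(s, 5))
```

The printed slope moves steadily away from the unpenalized value as alpha grows, which is why the grid spans several orders of magnitude.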
The best R-squared value seen so far is initialized to 0, so that it can be updated and printed whenever a new value exceeds it:
>>> initrsq = 0
>>> print("\nRidge Regression: Best Parameters\n")
>>> for alph in alphas:
...     ridge_reg = Ridge(alpha=alph)
...     ridge_reg.fit(x_train, y_train)
...     tr_rsqrd = ridge_reg.score(x_train, y_train)
...     ts_rsqrd = ridge_reg.score(x_test, y_test)
The following code, which belongs inside the loop, keeps track of the test R-squared value and prints only when the new value is greater than the best value so far:
...     if ts_rsqrd > initrsq:
...         print("Lambda: ", alph, "Train R-Squared value:", round(tr_rsqrd, 5), "Test R-squared value:", round(ts_rsqrd, 5))
...         initrsq = ts_rsqrd
This is shown in the following screenshot:

Note also that the test R-squared value from ridge regression is similar to the value obtained from multiple linear regression (0.3519), yet it was reached without any diagnostics of the variables. Hence, machine learning models are relatively compact and can be retrained automatically without manual intervention; this is one of the biggest advantages of using ML models for deployment purposes.
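The scratch grid search above picks alpha by looking at the test R-squared directly. As a hedged alternative, and not the book's method, the sketch below swaps in scikit-learn's `GridSearchCV`, which chooses alpha by cross-validation on the training data and leaves the test set for a final check. Synthetic data from `make_regression` stands in for the wine quality file, and the sample size, noise level, and number of folds are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the wine data (assumption): 11 features, noisy target
X, y = make_regression(n_samples=300, n_features=11, noise=20.0, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)

alphas = [1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 5.0, 10.0]

# 5-fold cross-validation on the training data only, scored by R-squared
search = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5, scoring="r2")
search.fit(x_train, y_train)

best_alpha = search.best_params_["alpha"]
# The test set is used once, after alpha has been chosen
test_rsqrd = search.score(x_test, y_test)
print("Best alpha:", best_alpha, "Test R-squared value:", round(test_rsqrd, 5))
```

Because the test rows play no part in choosing alpha here, the reported test R-squared is a less optimistic estimate than one selected by the test score itself.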
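The R code for ridge regression on the wine quality data is as follows. In glmnet, alpha = 0 selects the ridge penalty and lambda sets its strength; the two libraries scale the penalty differently, so equal numeric values of lambda here and alpha in scikit-learn do not give identical fits: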
# Ridge regression
library(glmnet)

wine_quality = read.csv("winequality-red.csv", header=TRUE, sep=";", check.names=FALSE)
names(wine_quality) <- gsub(" ", "_", names(wine_quality))

# 70/30 train/test split
set.seed(123)
numrow = nrow(wine_quality)
trnind = sample(1:numrow, size=as.integer(0.7*numrow))
train_data = wine_quality[trnind,]; test_data = wine_quality[-trnind,]

xvars = c("fixed_acidity","volatile_acidity","citric_acid","residual_sugar","chlorides",
          "free_sulfur_dioxide","total_sulfur_dioxide","density","pH","sulphates","alcohol")
yvar = "quality"

x_train = as.matrix(train_data[,xvars]); y_train = as.double(as.matrix(train_data[,yvar]))
x_test = as.matrix(test_data[,xvars])

print(paste("Ridge Regression"))
lambdas = c(1e-4,1e-3,1e-2,0.1,0.5,1.0,5.0,10.0)
initrsq = 0
for (lmbd in lambdas){
  # alpha = 0 selects the ridge (L2) penalty in glmnet
  ridge_fit = glmnet(x_train, y_train, alpha=0, lambda=lmbd)
  pred_y = predict(ridge_fit, x_test)
  # Plain (unadjusted) R-squared on the test data
  R2 <- 1 - (sum((test_data[,yvar]-pred_y)^2)/sum((test_data[,yvar]-mean(test_data[,yvar]))^2))
  if (R2 > initrsq){
    print(paste("Lambda:", lmbd, "Test R-squared :", round(R2,4)))
    initrsq = R2
  }
}
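The R loop computes R-squared by hand rather than through a library call. The same formula can be sketched in plain Python as follows; the function name `r_squared` and the sample values are hypothetical and serve only to illustrate the calculation.

```python
def r_squared(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical quality scores and predictions, for illustration only
y_true = [5, 6, 5, 7, 6]
y_pred = [5.2, 5.8, 5.4, 6.5, 6.1]
print("Test R-squared value:", round(r_squared(y_true, y_pred), 4))
```

On held-out data this quantity can be negative when the predictions are worse than simply predicting the mean, in which case a loop that starts its best value at 0 would print nothing.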