官术网_书友最值得收藏!

Linear regression versus gradient descent

In the following code, a comparison has been made between applying linear regression in a statistical way and gradient descent in a machine learning way on the same dataset:

>>> import numpy as np 
>>> import pandas as pd 

The following code describes reading data using a pandas DataFrame:

>>> train_data = pd.read_csv("mtcars.csv")      

Converting DataFrame variables into NumPy arrays in order to process them in scikit learn packages, as scikit-learn is built on NumPy arrays itself, is shown next:

>>> X = np.array(train_data["hp"])  ; y = np.array(train_data["mpg"])  
>>> X = X.reshape(32,1); y = y.reshape(32,1) 

Importing linear regression from the scikit-learn package; this works on the least squares method:

>>> from sklearn.linear_model import LinearRegression 
>>> model = LinearRegression(fit_intercept = True) 

Fitting a linear regression model on the data and display intercept and coefficient of single variable (hp variable):

>>> model.fit(X,y) 
>>> print ("Linear Regression Results" ) 
>>> print ("Intercept",model.intercept_[0] ,"Coefficient", model.coef_[0]) 

Now we will apply gradient descent from scratch; in future chapters, we can use the scikit-learn built-in modules rather than doing it from first principles. However, here, an illustration has been provided on the internal workings of the optimization method on which the whole machine learning has been built.

Defining the gradient descent function gradient_descent with the following:

  • x: Independent variable.
  • y: Dependent variable.
  • learn_rate: Learning rate with which gradients are updated; too low causes slower convergence and too high causes overshooting of gradients.
  • batch_size: Number of observations considered at each iteration for updating gradients; a high number causes a lower number of iterations and a lower number causes an erratic decrease in errors. Ideally, the batch size should be a minimum value of 30 due to statistical significance. However, various settings need to be tried to check which one is better.
  • max_iter: Maximum number of iteration, beyond which the algorithm will get auto-terminated:
>>> def gradient_descent(x, y,learn_rate, conv_threshold,batch_size, max_iter): 
...    converged = False 
...    iter = 0 
...    m = batch_size   
...    t0 = np.random.random(x.shape[1]) 
...    t1 = np.random.random(x.shape[1]) 
Mean square error calculation
Squaring of error has been performed to create the convex function, which has nice convergence properties:
...    MSE = (sum([(t0 + t1*x[i] - y[i])**2 for i in range(m)])/ m)

The following code states, run the algorithm until it does not meet the convergence criteria:

...      while not converged:         
...          grad0 = 1.0/m * sum([(t0 + t1*x[i] - y[i]) for i in range(m)])   
...          grad1 = 1.0/m * sum([(t0 + t1*x[i] - y[i])*x[i] for i in range(m)])  
...          temp0 = t0 - learn_rate * grad0 
...          temp1 = t1 - learn_rate * grad1     
...          t0 = temp0 
...          t1 = temp1 

Calculate a new error with updated parameters, in order to check whether the new error changed more than the predefined convergence threshold value; otherwise, stop the iterations and return parameters:

...          MSE_New = (sum( [ (t0 + t1*x[i] - y[i])**2 for i in range(m)] ) / m) 
...          if abs(MSE - MSE_New ) <= conv_threshold: 
...              print 'Converged, iterations: ', iter 
...              converged = True     
...          MSE = MSE_New    
...          iter += 1      
...          if iter == max_iter: 
...              print 'Max interactions reached' 
...              converged = True 
...          return t0,t1 

The following code describes running the gradient descent function with defined values. Learn rate = 0.0003, convergence threshold = 1e-8, batch size = 32, maximum number of iteration = 1500000:

>>> if __name__ == '__main__': 
...      Inter, Coeff = gradient_descent(x = X,y = y,learn_rate=0.00003 , conv_threshold = 1e-8, batch_size=32,max_iter=1500000) 
...      print ('Gradient Descent Results')  
...      print (('Intercept = %s Coefficient = %s') %(Inter, Coeff))  

The R code for linear regression versus gradient descent is as follows:

# Linear Regression 
train_data = read.csv("mtcars.csv",header=TRUE) 
model <- lm(mpg ~ hp, data = train_data) 
print (coef(model)) 
 
# Gradient descent 
gradDesc <- function(x, y, learn_rate, conv_threshold, batch_size, max_iter) { 
  m <- runif(1, 0, 1) 
  c <- runif(1, 0, 1) 
  ypred <- m * x + c 
  MSE <- sum((y - ypred) ^ 2) / batch_size 
   
  converged = F 
  iterations = 0 
   
  while(converged == F) { 
    m_new <- m - learn_rate * ((1 / batch_size) * (sum((ypred - y) * x))) 
    c_new <- c - learn_rate * ((1 / batch_size) * (sum(ypred - y))) 
     
    m <- m_new 
    c <- c_new 
ypred <- m * x + c
MSE_new <- sum((y - ypred) ^ 2) / batch_size

if(MSE - MSE_new <= conv_threshold) {
converged = T
return(paste("Iterations:",iterations,"Optimal intercept:", c, "Optimal slope:", m))
}
iterations = iterations + 1

if(iterations > max_iter) {
converged = T
return(paste("Iterations:",iterations,"Optimal intercept:", c, "Optimal slope:", m))
}
MSE = MSE_new
}
}
gradDesc(x = train_data$hp,y = train_data$mpg, learn_rate = 0.00003, conv_threshold = 1e-8, batch_size = 32, max_iter = 1500000)
主站蜘蛛池模板: 郁南县| 开原市| 云南省| 宁晋县| 宣化县| 灵丘县| 丰镇市| 惠州市| 涡阳县| 洛宁县| 迭部县| 民和| 周口市| 泸州市| 宝清县| 达州市| 崇州市| 岚皋县| 伊川县| 安平县| 岗巴县| 沈丘县| 皋兰县| 泌阳县| 上饶市| 萝北县| 巴彦淖尔市| 镇雄县| 宁南县| 鄂伦春自治旗| 德昌县| 昌黎县| 全椒县| 乌鲁木齐县| 盐亭县| 福安市| 墨脱县| 盐城市| 新营市| 柞水县| 甘洛县|