Regularization on linear models

The Stochastic Gradient Descent (SGD) algorithm finds the optimal weights {w_i} of the model by minimizing the error between the true and the predicted values on the N training samples:

E(w) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2

where \hat{y}_i are the predicted values and y_i the true values to be predicted; we have N samples, and each sample has n dimensions.
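As a minimal sketch of the error above, the following computes the mean squared error for a few hypothetical predictions (the values are illustrative, not taken from the text):

```python
import numpy as np

# Hypothetical true targets and model predictions for N = 4 samples
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Mean squared error over the N training samples:
# E(w) = (1/N) * sum_i (y_i - y_hat_i)^2
N = len(y_true)
mse = np.sum((y_true - y_pred) ** 2) / N
```

SGD minimizes this quantity by updating the weights after each sample (or mini-batch) rather than after a full pass over the data.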

Regularization consists of adding a term to the previous equation and minimizing the regularized error:

E_{reg}(w) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \alpha R(w)

The \alpha parameter quantifies the amount of regularization, while R(w) is the regularization term, a function of the regression coefficients.

There are two types of weight constraints usually considered:

  • L2 regularization, the sum of the squares of the coefficients: R(w) = \sum_{i=1}^{n} w_i^2
  • L1 regularization, the sum of the absolute values of the coefficients: R(w) = \sum_{i=1}^{n} |w_i|
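The two penalty terms and their effect on the regularized error can be sketched in a few lines of NumPy; the coefficient values, the data error, and the \alpha value below are hypothetical, chosen only to make the arithmetic easy to follow:

```python
import numpy as np

# Toy regression coefficients (hypothetical values, for illustration only)
w = np.array([1.0, -2.0, 0.5])

# L2 regularization term: sum of squared coefficients
r_l2 = np.sum(w ** 2)        # 1.0 + 4.0 + 0.25 = 5.25

# L1 regularization term: sum of absolute values of the coefficients
r_l1 = np.sum(np.abs(w))     # 1.0 + 2.0 + 0.5 = 3.5

# Regularized error = data error + alpha * R(w)
mse = 0.8                    # assume this is the mean squared error on the data
alpha = 0.1                  # regularization strength (hyperparameter)
e_l2 = mse + alpha * r_l2
e_l1 = mse + alpha * r_l1
```

Note that the L2 term penalizes large coefficients much more heavily (quadratically), while the L1 term grows only linearly, which is what drives L1 toward sparse solutions.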

The constraint on the coefficients introduced by the regularization term R(w) prevents the model from overfitting the training data. The coefficients become tied together by the regularization and can no longer grow freely to track individual predictors. Each type of regularization has its own characteristics and gives rise to a different variation of the SGD algorithm, which we now introduce:
