
Regularization on linear models

The Stochastic Gradient Descent (SGD) algorithm finds the optimal weights $\{w_j\}$ of the model by minimizing the error between the true and the predicted values on the $N$ training samples:

$$E(\mathbf{w}) = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$

Where $\hat{y}_i = \sum_{j=1}^{n} w_j x_{ij}$ are the predicted values and $y_i$ the real values to be predicted; we have $N$ samples, and each sample has $n$ dimensions.
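As a minimal sketch of the error above, assume a hypothetical toy dataset with $N = 4$ samples and $n = 2$ features and a candidate weight vector (all values here are made up for illustration):

```python
import numpy as np

# Hypothetical toy data: N = 4 samples, each with n = 2 features
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])   # real values y_i
w = np.array([1.0, 1.5])               # candidate weights w_j

y_hat = X @ w                          # predicted values y_hat_i
E = np.sum((y - y_hat) ** 2)           # sum of squared errors over the N samples
```

SGD would iteratively update `w` to drive this sum of squared errors down.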

Regularization consists of adding a term to the previous equation and minimizing the regularized error:

$$E_{reg}(\mathbf{w}) = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 + \alpha R(\mathbf{w})$$

The $\alpha$ parameter quantifies the amount of regularization, while $R(\mathbf{w})$ is the regularization term, which depends on the regression coefficients.

There are two types of weight constraints usually considered:

  • L2 regularization, the sum of the squares of the coefficients: $R(\mathbf{w}) = \sum_{j=1}^{n} w_j^2$
  • L1 regularization, the sum of the absolute values of the coefficients: $R(\mathbf{w}) = \sum_{j=1}^{n} |w_j|$
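To make the two penalty terms concrete, here is a small numerical sketch (the weight vector is hypothetical):

```python
import numpy as np

# Hypothetical weight vector w = (w_1, ..., w_n)
w = np.array([1.0, -2.0, 0.5])

r_l2 = np.sum(w ** 2)      # L2 term: 1.0 + 4.0 + 0.25 = 5.25
r_l1 = np.sum(np.abs(w))   # L1 term: 1.0 + 2.0 + 0.5 = 3.5
```

Note how the L2 term grows quadratically with large coefficients (the $-2.0$ contributes $4.0$), so it penalizes large weights much more heavily than the L1 term does.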

The constraint on the coefficients introduced by the regularization term R(w) prevents the model from overfitting the training data: the coefficients are shrunk toward zero and can no longer grow arbitrarily large to chase noise in the predictors. Each type of regularization has its own characteristics and gives rise to a different variation of the SGD algorithm, which we now introduce:
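As a sketch of how these penalties are selected in practice, scikit-learn's `SGDRegressor` exposes both through its `penalty` parameter, with `alpha` playing the role of the regularization strength (the data below is randomly generated for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic data: 100 samples, 3 features, a known linear relationship plus noise
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.randn(100)

# L2-regularized SGD (ridge-style penalty on the sum of squared coefficients)
model_l2 = SGDRegressor(penalty="l2", alpha=0.001, random_state=0)
model_l2.fit(X, y)

# L1-regularized SGD (lasso-style penalty, tends to drive coefficients to zero)
model_l1 = SGDRegressor(penalty="l1", alpha=0.001, random_state=0)
model_l1.fit(X, y)
```

Increasing `alpha` strengthens the constraint on the coefficients; with the L1 penalty, sufficiently large values zero out the weakest predictors entirely.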
