The back propagation function

Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as $\hat{y}$ while the actual value of the target variable is defined as $y$.

Once both $y$ and $\hat{y}$ are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function.
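As a concrete illustration, here is a minimal NumPy sketch of this computation. The squared-error loss and the specific values are assumptions chosen only for the example; the averaging applies to any loss function:

```python
import numpy as np

# Hypothetical predictions (y_hat) and actual targets (y) for four data points.
y_hat = np.array([0.9, 0.2, 0.8, 0.4])
y = np.array([1.0, 0.0, 1.0, 0.0])

# Per-data-point loss, assuming squared error as the loss function.
loss = (y_hat - y) ** 2

# The cost is the average of the loss over all data points.
cost = loss.mean()
print(cost)  # 0.0625
```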

In order for learning to occur within the network, the network's error signal must be propagated backwards through the network layers, from the last layer to the first. Our goal in back propagation is to propagate this error signal backwards through the network while using it to update the network weights as the signal travels. Mathematically, we do this by minimizing the cost function, nudging the weights towards the values that make the cost smallest. This process is called gradient descent.

The gradient is the partial derivative of the cost function with respect to each weight within the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layer above it (that is, the layer nearer the output).
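To spell out that chain rule for a single weight $w^{(l)}$ in layer $l$, one standard decomposition is the following, where $z^{(l)}$ denotes the layer's pre-activation input and $a^{(l)}$ its activation output (this notation is introduced here for illustration, not taken from the text):

$$\frac{\partial C}{\partial w^{(l)}} = \frac{\partial C}{\partial a^{(l)}} \cdot \frac{\partial a^{(l)}}{\partial z^{(l)}} \cdot \frac{\partial z^{(l)}}{\partial w^{(l)}}$$

The first factor, $\partial C / \partial a^{(l)}$, is exactly the error signal propagated back from layer $l+1$, which is why the gradients must be computed from the last layer towards the first.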

Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.
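To make the layer-by-layer computation concrete, here is a minimal NumPy sketch for a hypothetical two-layer network. The sigmoid activations, the mean of $\frac{1}{2}(\hat{y} - y)^2$ as the cost, and all shapes are illustrative assumptions, not the only possible choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))              # 4 data points, 3 features
y = np.array([[1.0], [0.0], [1.0], [0.0]])

W1 = rng.normal(size=(3, 5))             # weights: input -> hidden
W2 = rng.normal(size=(5, 1))             # weights: hidden -> output

# Forward propagation.
a1 = sigmoid(X @ W1)                     # hidden-layer activations
y_hat = sigmoid(a1 @ W2)                 # network prediction

# Backward propagation: the error signal at the output layer is the
# cost's derivative w.r.t. y_hat times the sigmoid's derivative.
d_out = (y_hat - y) * y_hat * (1 - y_hat)
grad_W2 = a1.T @ d_out / len(X)          # gradient for the last layer

# Chain rule: push the error signal back through W2 to the hidden layer.
d_hidden = (d_out @ W2.T) * a1 * (1 - a1)
grad_W1 = X.T @ d_hidden / len(X)        # gradient for the first layer
```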

Gradient descent repeats the following update, for every weight $w$ in the network, until the network's error is minimized and the process has converged:

$$w := w - \alpha \frac{\partial C}{\partial w}$$

The gradient descent algorithm multiplies the gradient by a learning rate, $\alpha$, and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.
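Continuing the NumPy sketch above (reusing W1, W2, grad_W1, and grad_W2 from it), one gradient descent step is a single line per weight matrix; the value of alpha here is an illustrative choice, not a recommendation:

```python
alpha = 0.1  # learning rate, a hyperparameter

# Move each weight against its gradient, shrinking the cost a little.
W2 -= alpha * grad_W2
W1 -= alpha * grad_W1
```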