
The back propagation function

Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as ŷ, while the actual value of the target variable is defined as y.

Once both y and ŷ are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function across the data points.
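As a concrete illustration, the following minimal sketch computes a cost as the average of a per-example loss, here using squared error; the values in y and y_hat are hypothetical and chosen purely for the example:

```python
import numpy as np

def mse_cost(y, y_hat):
    """Cost: the average of the per-example squared-error loss."""
    return np.mean((y - y_hat) ** 2)

# Hypothetical targets and predictions, purely for illustration.
y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.7])

print(mse_cost(y, y_hat))  # 0.0466...
```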

In order for learning to occur, the network's error signal must be propagated backwards through the network's layers, from the last layer to the first. Our goal in back propagation is to propagate this error signal backwards through the network, using it to update the network weights as it travels. Mathematically, this means minimizing the cost function by nudging the weights towards values that make the cost as small as possible. This process is called gradient descent.

The gradient is the partial derivative of the cost function with respect to each weight within the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layers above.
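To make the layer-by-layer chain rule concrete, here is a minimal sketch of the backward pass for a hypothetical two-layer sigmoid network with a squared-error cost; all shapes, names, and values are assumptions made for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A hypothetical two-layer sigmoid network and a single training example.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))   # input
y = np.array([[1.0]])         # actual value
W1 = rng.normal(size=(4, 3))  # first-layer weights
W2 = rng.normal(size=(1, 4))  # second-layer weights

# Forward pass, keeping the intermediates the backward pass will need.
z1 = W1 @ x
a1 = sigmoid(z1)
z2 = W2 @ a1
y_hat = sigmoid(z2)

# Backward pass: each layer's gradient reuses the gradient of the layer above.
d_z2 = (y_hat - y) * y_hat * (1 - y_hat)  # dC/dz2 for cost C = 0.5 * (y_hat - y)**2
grad_W2 = d_z2 @ a1.T                     # dC/dW2
d_z1 = (W2.T @ d_z2) * a1 * (1 - a1)      # chain rule carries the error back to layer 1
grad_W1 = d_z1 @ x.T                      # dC/dW1
```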

Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.

Gradient descent repeats the following update, for each weight w in the network, until the network's error is minimized and the process has converged:

w := w − α · ∂C/∂w

The gradient descent algorithm multiplies the gradient by a learning rate, called alpha, and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.
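A minimal sketch of one such update step, assuming the weights and gradients are stored as NumPy arrays; the shapes, values, and the learning rate of 0.1 are arbitrary choices for illustration:

```python
import numpy as np

def gradient_descent_step(weights, gradients, alpha):
    """Update each weight: w := w - alpha * dC/dw."""
    return [w - alpha * g for w, g in zip(weights, gradients)]

# Hypothetical weights and gradients, purely for illustration.
weights = [np.ones((2, 2)), np.ones((1, 2))]
gradients = [np.full((2, 2), 0.5), np.full((1, 2), -0.25)]

# In practice this step repeats until the cost converges; one step shown.
weights = gradient_descent_step(weights, gradients, alpha=0.1)
```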
