- Deep Learning Quick Reference
- Mike Bernico
The back propagation function
Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as ŷ, while the actual value of the target variable is defined as y.
Once both y and ŷ are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function over all data points.
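As a concrete illustration (a minimal sketch, not the book's code), the cost can be computed as the mean of a per-sample loss. The squared-error loss and the variable names here are assumptions:

```python
import numpy as np

# Hypothetical predictions (y_hat) and true targets (y) for four data points
y_hat = np.array([0.2, 0.8, 0.6, 0.1])
y = np.array([0.0, 1.0, 1.0, 0.0])

# Per-sample loss: squared error (assumed loss function)
loss = (y_hat - y) ** 2

# Cost: the average of the loss over all data points
cost = loss.mean()
print(cost)
```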
In order for learning to occur, the network's error signal must be propagated backwards through the network layers, from the last layer to the first. Our goal in back propagation is to propagate this error signal backwards through the network while using it to update the network weights as the signal travels. Mathematically, to do so we need to minimize the cost function by nudging the weights towards values that make the cost function as small as possible. This process is called gradient descent.
The gradient is the partial derivative of the error function with respect to each weight within the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layers above.
Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.
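The following is a minimal sketch (mine, not the author's) of how the chain rule carries the error signal backwards through a one-hidden-layer network to produce the gradient of each weight matrix. The sigmoid activation, mean squared error cost, network sizes, and variable names are all assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny example network: 2 inputs -> 3 hidden units -> 1 output (assumed shapes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))        # 4 data points
y = rng.normal(size=(4, 1))        # targets
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))

# Forward propagation
z1 = X @ W1
a1 = sigmoid(z1)
y_hat = a1 @ W2                    # linear output layer

# Cost: mean squared error (assumed)
cost = np.mean((y_hat - y) ** 2)

# Back propagation: apply the chain rule layer by layer,
# reusing the gradient computed for the layer above.
d_y_hat = 2 * (y_hat - y) / len(y)   # dC/d_y_hat
dW2 = a1.T @ d_y_hat                 # dC/dW2 via the chain rule
d_a1 = d_y_hat @ W2.T                # error signal flowing into the hidden layer
d_z1 = d_a1 * a1 * (1 - a1)          # multiply by the sigmoid derivative
dW1 = X.T @ d_z1                     # dC/dW1
```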
Gradient descent repeats this update until the network's error is minimized and the process has converged:

w := w − α · ∂C/∂w

The gradient descent algorithm multiplies the gradient by a learning rate called alpha and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.
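A sketch of that update step, continuing from the variables in the back propagation example above (the alpha value is chosen arbitrarily here and is not from the book):

```python
alpha = 0.01  # learning rate, a hyperparameter; value chosen only for illustration

# One gradient descent step: subtract the scaled gradient from each weight.
# W1, W2, dW1, dW2 are the weights and gradients from the sketch above.
W1 -= alpha * dW1
W2 -= alpha * dW2
```

In practice this step is repeated, recomputing the gradients each time, until the cost stops decreasing.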