Getting ready
We change each weight within the neural network by a small amount, one at a time. A change in a weight value has an impact on the final loss value (it either increases or decreases the loss). We'll update each weight in the direction that decreases the loss.
Additionally, in some scenarios a small change in a weight changes the error considerably, while in other cases it changes the error only slightly.
By updating each weight by a small amount and measuring the change in error that the update leads to, we are able to do the following (a small sketch follows the list):
- Determine the direction of the weight update
- Determine the magnitude of the weight update
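As a rough illustration of this idea, here is a minimal NumPy sketch (not the book's code), assuming a one-weight model and a mean squared error loss on the y = 2x data used later in this recipe:

```python
import numpy as np

def loss(w, x, y):
    # mean squared error of a one-weight model whose prediction is w * x
    return np.mean((y - w * x) ** 2)

x = np.array([1., 2., 3., 4.])
y = 2 * x                                    # the known function y = 2x
w, delta = 1.0, 0.01                         # current weight and a small perturbation

change = loss(w + delta, x, y) - loss(w, x, y)
approx_gradient = change / delta             # roughly -14.9 here
direction = -np.sign(change)                 # loss fell as w increased, so increase w
magnitude = abs(approx_gradient)             # larger change in loss -> larger update
print(direction, magnitude)
```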
Before implementing back-propagation, let's understand one additional detail of neural networks: the learning rate.
Intuitively, the learning rate controls how much we trust each individual weight update. For example, when deciding on the magnitude of the weight update, we would not change the weight by a huge amount in one go, but take a more careful approach and update the weights more slowly.
This makes the model more stable during training; we will look at how the learning rate helps with stability in the next chapter.
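Continuing the sketch above (again an illustration, not the book's code), the approximate gradient at w = 1.0 is about -14.9, and the learning rate scales the step so that w moves gently towards its ideal value of 2:

```python
lr = 0.01                  # learning rate: keeps each update small and stable
w = 1.0                    # current weight value
gradient = -14.9           # approximate gradient of the loss at w = 1.0 (from above)
w = w - lr * gradient      # step against the gradient, scaled down by the learning rate
print(w)                   # 1.149: a careful step towards the ideal value of 2
```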
The whole process by which we update the weights to reduce the error is called gradient descent.
Stochastic gradient descent is the technique used to minimize the error in the preceding scenario. More intuitively, the gradient tells us how much (and in which direction) the loss changes for a small change in a weight, descent means moving in the direction that reduces the loss, and stochastic refers to computing each update on a randomly selected subset of samples rather than on the full dataset.
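As a minimal illustration of the "stochastic" part (an assumed toy dataset, not the book's code), each update can be computed on a small random subset of the samples:

```python
import numpy as np

x = np.array([1., 2., 3., 4.])           # assumed toy inputs
y = 2 * x
batch_size = 2                           # illustrative mini-batch size

idx = np.random.choice(len(x), batch_size, replace=False)   # pick random samples
x_batch, y_batch = x[idx], y[idx]        # the gradient step is computed on this subset only
```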
Apart from stochastic gradient descent, there are many other optimization techniques that help minimize the loss value; the different optimization techniques will be discussed in the next chapter.
Back-propagation works as follows:
- Calculates the overall cost function from the feedforward process.
- Varies all the weights (one at a time) by a small amount.
- Calculates the impact of the variation of weight on the cost function.
- Depending on whether the change increases or decreases the cost (loss) value, it updates the weight in the direction that decreases the loss, and then repeats this step across all the weights we have.
If the preceding steps are performed n times, it essentially results in training for n epochs; a minimal sketch of this loop follows.
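For example, the following is a rough sketch of these steps in plain NumPy (not the book's Keras code), assuming a mean squared error loss on the y = 2x toy data introduced in the next paragraph; each pass over all the weights counts as one epoch:

```python
import numpy as np

x = np.array([1., 2., 3., 4.])               # assumed sample inputs for y = 2x
y = 2 * x

def loss(weights):
    a, b = weights
    return np.mean((y - (a * x + b)) ** 2)   # overall cost from the feedforward pass

weights = [0.0, 0.0]                         # arbitrary starting values for a and b
lr, delta, n_epochs = 0.01, 0.001, 2000

for epoch in range(n_epochs):                # n passes over all weights = n epochs
    for i in range(len(weights)):            # vary one weight at a time
        base = loss(weights)
        weights[i] += delta                  # small change to this weight
        gradient = (loss(weights) - base) / delta   # impact of the change on the cost
        weights[i] -= delta                  # undo the perturbation
        weights[i] -= lr * gradient          # update in the direction of lower loss

print(weights)                               # ends up close to [2, 0]
```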
In order to further cement our understanding of back-propagation in neural networks, let's start with a known function and see how the weights could be derived:
For now, we will take the known function to be y = 2x, and try to come up with the weight and bias values, which are 2 and 0 in this specific case:

If we formulate the preceding dataset as a linear regression, y = a*x + b, we are trying to calculate the values of a and b (which we already know are 2 and 0, but want to see how those values are obtained using gradient descent). Let's randomly initialize the a and b parameters to 1.477 and 0.
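Assuming a handful of sample points drawn from y = 2x (the original table of values is not reproduced here), the starting point can be sketched as follows; this initial loss is what gradient descent will drive towards zero:

```python
import numpy as np

x = np.array([1., 2., 3., 4.])          # assumed sample inputs for the y = 2x dataset
y = 2 * x
a, b = 1.477, 0.0                       # the randomly initialized parameter values

initial_loss = np.mean((y - (a * x + b)) ** 2)
print(initial_loss)                     # about 2.05: (2 - 1.477)^2 * mean(x^2)
```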