
Getting ready

We change each weight within the neural network by a small amount, one at a time. A change in the weight value will have an impact on the final loss value (either increasing or decreasing it). We'll update the weight in the direction that decreases the loss.

Additionally, in some scenarios a small change in a weight changes the error considerably, while in other cases it changes the error only by a small amount.

By updating the weights by a small amount and measuring the change in error that the update in weights leads to, we are able to do the following:

  • Determine the direction of the weight update
  • Determine the magnitude of the weight update
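For example, here is a minimal sketch of that measurement for a single weight, assuming a toy dataset and a squared-error loss (the names and values are hypothetical, purely for illustration):

```python
import numpy as np

def loss(w, x, y):
    # Squared-error loss for a single-weight model: prediction = w * x
    return np.mean((w * x - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = 1.5        # current weight value
delta = 0.01   # small change applied to the weight

# Measure how the error changes when the weight is nudged by delta
change_in_loss = loss(w + delta, x, y) - loss(w, x, y)
slope = change_in_loss / delta

# The sign of the slope gives the direction of the update (a negative slope
# means increasing w reduces the loss); its size gives the magnitude
print(slope)
```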

Before implementing back-propagation, let's understand one additional detail of neural networks: the learning rate.

Intuitively, the learning rate controls how cautiously the algorithm updates the weights. When deciding on the magnitude of the weight update, we would not change a weight by a huge amount in one go, but take a more careful approach and update the weights slowly.

This results in a more stable model; we will look at how the learning rate helps with stability in the next chapter.
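As a rough illustration of this careful stepping (hypothetical numbers, continuing the single-weight idea above):

```python
w = 1.5              # current weight
slope = -4.6         # estimated change in loss per unit change in the weight
learning_rate = 0.01

# Scale the step by the learning rate so the weight changes slowly
w = w - learning_rate * slope
print(w)             # 1.546 -- a small, careful step rather than a huge jump
```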

The whole process by which we update weights to reduce error is called a gradient-descent technique.

Stochastic gradient descent is the means by which error is minimized in the preceding scenario. More intuitively, gradient stands for difference (the difference between the actual and predicted values) and descent means reducing it. Stochastic refers to the random selection of samples on which each weight update is based.
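A minimal sketch of what that random-sample selection might look like for a single-weight model (a toy dataset and hypothetical parameter values, not the recipe's code):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x                          # targets generated from y = 2x
w, learning_rate, batch_size = 0.0, 0.01, 2

np.random.seed(0)
for step in range(200):
    # "Stochastic": pick a random subset of samples for this update
    idx = np.random.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    # "Gradient": slope of the squared error between actual and predicted
    grad = np.mean(2 * (w * xb - yb) * xb)
    # "Descent": move the weight so that the error reduces
    w -= learning_rate * grad

print(w)   # approaches 2.0
```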

Apart from stochastic gradient descent, there are many other optimization techniques that help to minimize the loss value; the different optimization techniques will be discussed in the next chapter.

Back-propagation works as follows:

  • Calculates the overall cost function from the feedforward process.
  • Varies all the weights (one at a time) by a small amount.
  • Calculates the impact of the variation of weight on the cost function.
  • Depending on whether the change increases or decreases the cost (loss) value, updates the weight in the direction of decreasing loss, and then repeats this step across all the weights we have.

If the preceding steps are performed n times, the result is essentially n epochs of training.
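A minimal sketch of one such pass, assuming the weights are stored in a plain list and the cost function is whatever the feedforward process computes (the names are hypothetical, and this uses the simple perturbation scheme described above rather than analytic back-propagation). Calling this function n times corresponds to n epochs:

```python
def update_weights(weights, cost, learning_rate=0.01, delta=1e-4):
    """One pass over all weights: vary each weight by a small amount,
    measure the impact on the cost, and step toward lower cost."""
    for i in range(len(weights)):
        base = cost(weights)
        weights[i] += delta                     # vary one weight slightly
        slope = (cost(weights) - base) / delta  # impact on the cost function
        weights[i] -= delta                     # restore the original weight
        weights[i] -= learning_rate * slope     # update toward decreasing loss
    return weights
```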

In order to further cement our understanding of back-propagation in neural networks, let's start with a known function and see how the weights could be derived:

For now, we will take the known function to be y = 2x, where we try to come up with the weight value and bias value, which are 2 and 0 in this specific case:

The dataset consists of input values x and the corresponding output values y, where each y is twice the x value.

If we formulate the preceding dataset as a linear regression, y = a*x + b, we are trying to calculate the values of a and b (which we already know are 2 and 0, but we want to check how those values are obtained using gradient descent). Let's randomly initialize the a and b parameters to values of 1.477 and 0.
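A minimal sketch of that fit, using plain gradient descent on the mean squared error (the training points below are assumed to follow y = 2x and may differ from the original dataset, and the gradients are written analytically rather than by perturbing the weights):

```python
import numpy as np

# Hypothetical data points consistent with y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x

a, b = 1.477, 0.0     # random initialization from the text
learning_rate = 0.01

for epoch in range(2000):
    pred = a * x + b
    error = pred - y
    # Gradients of the mean squared error with respect to a and b
    grad_a = np.mean(2 * error * x)
    grad_b = np.mean(2 * error)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(round(a, 3), round(b, 3))   # converges close to 2.0 and 0.0
```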
