
Speeding up the training process using batch normalization

In the previous section, on scaling the input dataset, we learned that optimization is slow when the input data is not scaled (that is, when it is not between zero and one).

The hidden layer value could be high in the following scenarios:

  • Input data values are high
  • Weight values are high
  • The product of the weights and inputs is high

Any of these scenarios can result in a large output value on the hidden layer.

Note that the hidden layer's output serves as the input to the output layer. Hence, the phenomenon of high input values resulting in slow optimization also holds true when hidden layer values are large.

Batch normalization comes to the rescue in this scenario. We have already learned that, when input values are high, we perform scaling to reduce them. We have also learned that scaling can be performed in a different way: by subtracting the mean of the input and dividing by its standard deviation. Batch normalization performs this form of scaling on the hidden layer values.
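The following is a minimal NumPy sketch of the scaling just described, applied to a batch of hidden-layer values. The array name, shapes, and values are illustrative assumptions, not taken from the original text:

```python
import numpy as np

# A batch of 32 samples, each with 64 hidden-layer values (illustrative shapes)
hidden_values = np.random.randn(32, 64) * 10 + 5

# Subtract the batch mean and divide by the batch standard deviation
batch_mean = hidden_values.mean(axis=0)
batch_std = hidden_values.std(axis=0)
scaled = (hidden_values - batch_mean) / (batch_std + 1e-7)  # small epsilon avoids division by zero
```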

Typically, all values are scaled using the following formulas. For a mini-batch $B = \{x_1, \dots, x_m\}$:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

$$y_i = \gamma \hat{x}_i + \beta$$

Notice that γ and β are learned during training, along with the original parameters of the network.
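As a quick sketch of how this is typically used in practice (assuming Keras, which the surrounding chapters use; the layer sizes and input dimension here are illustrative), a BatchNormalization layer can be placed after a hidden layer, and its γ (scale) and β (shift) parameters are learned along with the rest of the network:

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

model = Sequential()
model.add(Dense(64, input_dim=784, activation='relu'))  # hidden layer (illustrative sizes)
model.add(BatchNormalization())  # normalizes hidden values; learns gamma (scale) and beta (shift)
model.add(Dense(10, activation='softmax'))
```

Because γ and β are trainable, the network can undo the normalization where that helps, while still keeping the hidden values in a range that speeds up optimization.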
