
Stochastic and minibatch gradient descents

The algorithm described in the previous section performs a forward pass and a corresponding backward pass over the entire dataset before updating the weights, and as such it is called batch gradient descent.
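To make this concrete, here is a minimal sketch of batch gradient descent on a toy linear model; the data, learning rate, and iteration count are illustrative assumptions, not values from the text.

```python
import numpy as np

# Batch gradient descent: one weight update per pass over ALL the data.
# Toy linear-regression setup (assumed for illustration): y ~ X @ w.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.1
for epoch in range(100):
    error = X @ w - y                        # "forward pass" over the entire dataset
    grad = X.T @ error / len(X)              # gradient of the mean squared error
    w -= lr * grad                           # single update per epoch
```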

Another possible way to do gradient descent is to use a single data point at a time, updating the network weights after each one; this is known as stochastic gradient descent. The noise in these per-sample updates can help the network escape saddle points, where convergence might otherwise stall. Of course, the error estimated from a single point may be a poor approximation of the error over the entire dataset.
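The following sketch adapts the same toy linear-regression setup to per-sample updates; again, the data and learning rate are assumptions made purely for illustration.

```python
import numpy as np

# Stochastic gradient descent: one weight update per randomly chosen sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.01
for epoch in range(20):
    for i in rng.permutation(len(X)):        # visit samples in random order
        error = X[i] @ w - y[i]              # error of a single data point
        grad = error * X[i]                  # gradient estimated from that point
        w -= lr * grad                       # update after every sample
```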

A good compromise between the two is mini-batch gradient descent, in which we take a small random subset of the data, called a mini-batch, to compute the error and update the network weights. This is almost always the best option in practice. It has the additional benefit of naturally splitting a very large dataset into chunks that fit more easily in the memory of a single machine, or even across machines.
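A minimal sketch of the mini-batch variant follows, using the same assumed toy data; the batch size of 16 is an illustrative choice, not a recommendation from the text.

```python
import numpy as np

# Mini-batch gradient descent: shuffle, split into small batches,
# and update the weights after each batch.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr, batch_size = 0.05, 16
for epoch in range(30):
    idx = rng.permutation(len(X))            # new random order each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        error = X[batch] @ w - y[batch]
        grad = X[batch].T @ error / len(batch)   # gradient over the mini-batch
        w -= lr * grad                           # update once per mini-batch
```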

This is an extremely high-level description of one of the most important parts of training a neural network, which we believe fits the practical nature of this book. In practice, most modern frameworks handle these steps for us; nevertheless, they are well worth understanding, at least conceptually. We encourage the reader to go deeper into forward and backward propagation as time permits.
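As an example of a framework handling the batching for us, the following Keras sketch (an assumption for illustration, not code from the text) selects mini-batch gradient descent simply through the batch_size argument to fit(); batch_size=1 would approximate stochastic updates, and a batch size equal to the dataset size would correspond to batch gradient descent.

```python
import numpy as np
from tensorflow import keras

# Assumed synthetic data purely for demonstration.
X = np.random.normal(size=(1000, 20))
y = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")
model.fit(X, y, epochs=5, batch_size=32)   # mini-batches of 32 samples
```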