
Stochastic and minibatch gradient descents

The algorithm described in the previous section performs a forward and corresponding backward pass over the entire dataset, and as such it is called batch gradient descent.

Another way to perform gradient descent is to use a single data point at a time, updating the network weights after each example; this is known as stochastic gradient descent. The noise in these updates can help the network escape saddle points, where batch gradient descent might otherwise stall. Of course, the error estimated from a single point may not be a good approximation of the error over the entire dataset.
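To make this concrete, the following is a minimal sketch of such a stochastic update loop for a simple linear model with a squared-error loss; the data, learning rate, and model here are illustrative assumptions rather than code from this book:

import numpy as np

# Toy regression data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)                          # model weights
lr = 0.01                                # learning rate

for epoch in range(20):
    for i in rng.permutation(len(X)):    # visit one data point at a time
        pred = X[i] @ w
        grad = 2 * (pred - y[i]) * X[i]  # gradient of the squared error for this sample
        w -= lr * grad                   # update immediately after each sample

print(w)                                 # approaches true_w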

A compromise between these two extremes is mini-batch gradient descent, in which we take a small random subset of the data, called a mini-batch, to compute the error and update the network weights. This is almost always the best option in practice. It has the additional benefit of naturally splitting a very large dataset into chunks that fit in the memory of a single machine, or can even be spread across machines.
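For comparison, here is the same toy problem trained with mini-batches; the batch size of 16 is an arbitrary illustrative choice. Setting it to 1 recovers the stochastic variant sketched above, while setting it to the full dataset size recovers batch gradient descent:

import numpy as np

# Same toy regression data as in the stochastic sketch above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.05
batch_size = 16                                           # illustrative choice

for epoch in range(50):
    order = rng.permutation(len(X))                       # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]             # one random mini-batch
        pred = X[idx] @ w
        grad = 2 * X[idx].T @ (pred - y[idx]) / len(idx)  # gradient averaged over the batch
        w -= lr * grad

print(w)                                                  # approaches true_w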

This has been an extremely high-level description of one of the most important parts of training a neural network, in keeping with the practical nature of this book. In practice, most modern frameworks handle these steps for us; nevertheless, they are well worth understanding at least at a conceptual level. We encourage the reader to go deeper into forward and backward propagation as time permits.