Deep Learning with Theano
Christopher Bourez
Backpropagation and stochastic gradient descent
Backpropagation, or the backward propagation of errors, is the most commonly used supervised learning algorithm for adapting the connection weights.
Considering the error, or cost, as a function of the weights W and b, a local minimum of the cost function can be approached with gradient descent, which consists of changing the weights along the negative gradient of the error:

W ← W − λ · ∂cost/∂W
b ← b − λ · ∂cost/∂b

Here, λ is the learning rate, a positive constant defining the speed of the descent.
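The compiled function shown next assumes that the model, its cost, and its error have already been defined as symbolic expressions earlier in the chapter. As a minimal sketch only, and assuming the single softmax (logistic regression) layer on 28x28 MNIST images with 10 classes used so far, those definitions could look like this, with the same names (x, y, W, b, cost, error) as in the snippets below:

import numpy as np
import theano
import theano.tensor as T

# Symbolic inputs: a batch of flattened images and their integer labels.
x = T.matrix('x')
y = T.ivector('y')

# Shared parameters of a single softmax (logistic regression) layer,
# sized here for 28x28 inputs and 10 classes (an assumption).
W = theano.shared(np.zeros((28 * 28, 10), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros((10,), dtype=theano.config.floatX), name='b')

# Class probabilities, negative log-likelihood cost, and misclassification error.
model = T.nnet.softmax(T.dot(x, W) + b)
cost = -T.mean(T.log(model)[T.arange(y.shape[0]), y])
error = T.mean(T.neq(T.argmax(model, axis=1), y))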
The following compiled function updates the variables after each feedforward run:
g_W = T.grad(cost=cost, wrt=W)
g_b = T.grad(cost=cost, wrt=b)
learning_rate = 0.13
index = T.lscalar()

train_model = theano.function(
    inputs=[index],
    outputs=[cost, error],
    updates=[(W, W - learning_rate * g_W),
             (b, b - learning_rate * g_b)],
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
The input variable is the index of the mini-batch, since the whole dataset has been transferred to the GPU in one pass as shared variables.
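The compiled function above also relies on train_set_x, train_set_y, and batch_size being defined beforehand. As a sketch of an assumed setup, reusing the numpy and theano imports above and treating train_set as an (images, labels) pair of NumPy arrays, the data might be placed in shared variables like this:

batch_size = 600  # assumed value; any divisor of the training set size works

# Push the whole training set to the device once, so that each call to
# train_model only passes a batch index instead of copying the batch itself.
train_set_x = theano.shared(
    np.asarray(train_set[0], dtype=theano.config.floatX), borrow=True)

# Labels are stored as floats in the shared variable, then cast back to
# integers so they match the ivector y.
train_set_y = T.cast(
    theano.shared(np.asarray(train_set[1], dtype=theano.config.floatX),
                  borrow=True),
    'int32')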
Training consists of presenting each sample to the model iteratively (iterations) and repeating the operation many times (epochs):
n_epochs = 1000
print_every = 1000
n_train_batches = train_set[0].shape[0] // batch_size
n_iters = n_epochs * n_train_batches
train_loss = np.zeros(n_iters)
train_error = np.zeros(n_iters)

for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        iteration = minibatch_index + n_train_batches * epoch
        train_loss[iteration], train_error[iteration] = train_model(minibatch_index)
        if (epoch * train_set[0].shape[0] + minibatch_index) % print_every == 0:
            print('epoch {}, minibatch {}/{}, training error {:02.2f} %, training loss {}'.format(
                epoch,
                minibatch_index + 1,
                n_train_batches,
                train_error[iteration] * 100,
                train_loss[iteration]
            ))
This only reports the loss and error on one mini-batch, though. It would be good to also report the average over the whole dataset.
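As a hedged sketch of one way to do that with the arrays already being filled, the per-epoch averages could be printed at the end of the outer loop:

# Average the per-iteration statistics over the epoch that just finished;
# epoch and n_train_batches come from the training loop above.
epoch_slice = slice(epoch * n_train_batches, (epoch + 1) * n_train_batches)
print('epoch {}: mean training error {:02.2f} %, mean training loss {:.4f}'.format(
    epoch,
    train_error[epoch_slice].mean() * 100,
    train_loss[epoch_slice].mean()))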
The error rate drops very quickly during the first iterations, then slows down.
Execution time on a laptop GeForce GTX 980M GPU is 67.3 seconds, while on an Intel i7 CPU it is 3 minutes and 7 seconds.
After a long while, the model converges to a 5.3 - 5.5% error rate. A few more iterations could push it further down, but could also lead to overfitting. Overfitting occurs when the model fits the training data well but does not achieve the same error rate on unseen data.
In this case, the model is too simple to overfit on this data.
A model that is too simple cannot learn very well. The principle of deep learning is to add more layers, that is, increase the depth and build deeper networks to gain better accuracy.
We'll see in the following section how to compute a better estimate of the model's accuracy and when to stop training.