
Backpropagation and stochastic gradient descent

Backpropagation, or the backward propagation of errors, is the most commonly used supervised learning algorithm for adapting the connection weights.

Considering the error or the cost as a function of the weights W and b, a local minimum of the cost function can be approached with gradient descent, which consists of changing the weights along the negative gradient of the error:

$$W \leftarrow W - \lambda \frac{\partial \, cost}{\partial W} \qquad b \leftarrow b - \lambda \frac{\partial \, cost}{\partial b}$$

Here, $\lambda$ is the learning rate, a positive constant defining the speed of the descent.
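
Before relying on Theano's automatic differentiation, it can help to see this update written out directly. The following is a minimal NumPy sketch (not part of the original listing; the linear model, the squared-error cost, and the array shapes are illustrative assumptions) performing one gradient-descent step on W and b:

import numpy as np

# Illustrative data: 5 samples, 3 input features, 2 outputs (assumed shapes)
rng = np.random.RandomState(0)
x = rng.randn(5, 3)
y = rng.randn(5, 2)
W = np.zeros((3, 2))
b = np.zeros(2)
learning_rate = 0.13

# Forward pass and squared-error cost
pred = x.dot(W) + b
cost = ((pred - y) ** 2).mean()

# Backward pass: gradients of the cost with respect to W and b (chain rule)
g_pred = 2.0 * (pred - y) / y.size
g_W = x.T.dot(g_pred)
g_b = g_pred.sum(axis=0)

# Gradient-descent update along the negative gradient
W -= learning_rate * g_W
b -= learning_rate * g_b

With Theano, the gradients g_W and g_b do not need to be derived by hand: T.grad computes them symbolically, as shown next.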

The following compiled function updates the variables after each feedforward run:

# Symbolic gradients of the cost with respect to the parameters
g_W = T.grad(cost=cost, wrt=W)
g_b = T.grad(cost=cost, wrt=b)

learning_rate = 0.13
index = T.lscalar()  # index of the mini-batch within the dataset

train_model = theano.function(
    inputs=[index],
    outputs=[cost, error],
    updates=[(W, W - learning_rate * g_W), (b, b - learning_rate * g_b)],
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

The input variable is the index of the mini-batch, since the whole dataset has been transferred to the GPU in one pass through shared variables.
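
For reference, a common way to build such shared variables is sketched below (this helper is an assumption, not shown in this excerpt; it follows the usual Theano pattern of storing labels as floats on the GPU and casting them back to integers):

import numpy as np
import theano
import theano.tensor as T

def shared_dataset(data_xy, borrow=True):
    # Copy an (inputs, labels) pair into shared variables so the whole
    # dataset lives in GPU memory (hypothetical helper)
    data_x, data_y = data_xy
    shared_x = theano.shared(np.asarray(data_x, dtype=theano.config.floatX), borrow=borrow)
    shared_y = theano.shared(np.asarray(data_y, dtype=theano.config.floatX), borrow=borrow)
    # Shared variables hold float data; cast the labels back to int32 for indexing
    return shared_x, T.cast(shared_y, 'int32')

train_set_x, train_set_y = shared_dataset(train_set)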

Training consists of presenting each mini-batch to the model in turn (iterations) and repeating the full pass over the dataset many times (epochs):

n_epochs = 1000
print_every = 1000

n_train_batches = train_set[0].shape[0] // batch_size
n_iters = n_epochs * n_train_batches
train_loss = np.zeros(n_iters)
train_error = np.zeros(n_iters)

for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        iteration = minibatch_index + n_train_batches * epoch
        train_loss[iteration], train_error[iteration] = train_model(minibatch_index)
        if (epoch * train_set[0].shape[0] + minibatch_index) % print_every == 0:
            print('epoch {}, minibatch {}/{}, training error {:02.2f} %, training loss {}'.format(
                epoch,
                minibatch_index + 1,
                n_train_batches,
                train_error[iteration] * 100,
                train_loss[iteration]
            ))

This only reports the loss and error on a single mini-batch, though. It would be better to also report the average over the whole dataset.
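
As a sketch of how such an average could be obtained (not part of the original code), the outputs of the compiled function can be accumulated over every mini-batch at the end of an epoch; note that calling train_model also applies the weight updates, so a separate function compiled without the updates argument would be preferable for pure evaluation:

# Average loss and error over all training mini-batches (illustrative sketch)
losses, errors = zip(*[train_model(i) for i in range(n_train_batches)])
print('mean training loss {}, mean training error {:02.2f} %'.format(
    np.mean(losses), np.mean(errors) * 100))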

The error rate drops very quickly during the first iterations, then slows down.
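
To visualize this behaviour, the recorded train_loss and train_error arrays can be plotted, for example with matplotlib (a sketch, not part of the original listing):

import matplotlib.pyplot as plt

# Plot the per-iteration loss and error recorded in the training loop above
plt.plot(train_loss, label='training loss')
plt.plot(train_error, label='training error')
plt.xlabel('iteration')
plt.legend()
plt.show()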

Execution time on a laptop GeForce GTX 980M GPU is 67.3 seconds, while on an Intel i7 CPU it is 3 minutes and 7 seconds.

After a long while, the model converges to a 5.3 - 5.5% error rate. With a few more iterations it could go further down, but that could also lead to overfitting. Overfitting occurs when the model fits the training data well but does not achieve the same error rate on unseen data.

In this case, the model is too simple to overfit on this data.

A model that is too simple cannot learn very well. The principle of deep learning is to add more layers, that is, increase the depth and build deeper networks to gain better accuracy.

We'll see in the following section how to compute a better estimate of the model's accuracy and when to stop training.
