
Training the model

All the magic in this model lies behind the RNN cells. In our simple example, each cell applies the same equations, just to a different set of values. A detailed version of a single cell looks like this:

First, let's explain the new terms that appear in the preceding diagram:

  • Weights ($U$, $W$, $V$): A weight is a matrix (or a number) that represents the strength of the value it is applied to. For example, $U$ determines how much of the input $x_t$ should be considered in the following equations.
    If $U$ consists of high values, then $x_t$ has a significant influence on the end result. The weight values are often initialized randomly or from a distribution (such as the normal/Gaussian distribution). It is important to note that $U$, $W$, and $V$ are the same at each time step. Using the backpropagation algorithm, they are modified with the aim of producing accurate predictions
  • Biases ($b$, $c$): An offset vector (different for each layer), which shifts the value of the output
  • Activation function (tanh): This determines the final value of the current memory state $s_t$ and the output $o_t$. Basically, activation functions map the resultant values of equations similar to the following ones into a desired range: $(-1, 1)$ if we are using the tanh function, $(0, 1)$ if we are using the sigmoid function, and $[0, +\infty)$ if we are using ReLU (https://ai.stackexchange.com/questions/5493/what-is-the-purpose-of-an-activation-function-in-neural-networks); the short sketch after this list demonstrates these ranges
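To make the preceding ranges concrete, here is a minimal NumPy sketch; the input values are arbitrary and chosen purely for illustration:

import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # arbitrary input values

tanh_out = np.tanh(z)                        # squashed into (-1, 1)
sigmoid_out = 1.0 / (1.0 + np.exp(-z))       # squashed into (0, 1)
relu_out = np.maximum(0.0, z)                # clipped into [0, +infinity)

print(tanh_out)     # ≈ [-0.96, -0.46, 0.00, 0.46, 0.96]
print(sigmoid_out)  # ≈ [0.12, 0.38, 0.50, 0.62, 0.88]
print(relu_out)     # [0.0, 0.0, 0.0, 0.5, 2.0]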

Now, let's go over the process of computing the variables. To calculate $s_t$ and $o_t$, we can do the following:

$$s_t = \tanh(U x_t + W s_{t-1} + b)$$
$$o_t = \mathrm{softmax}(V s_t + c)$$

As you can see, the memory state $s_t$ is a result of the previous value $s_{t-1}$ and the input $x_t$. Using this formula helps in retaining information about all the previous states.
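The following is a minimal NumPy sketch of a single cell computed with these equations; the toy sizes (a 5-word vocabulary, a 3-unit memory state) and the random initialization are assumptions made purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 3                        # toy sizes for illustration

# U, W, and V are shared across all time steps, as noted above
U = rng.normal(0.0, 0.1, (hidden_size, vocab_size))   # input weights
W = rng.normal(0.0, 0.1, (hidden_size, hidden_size))  # recurrent weights
V = rng.normal(0.0, 0.1, (vocab_size, hidden_size))   # output weights
b = np.zeros(hidden_size)                             # memory-state bias
c = np.zeros(vocab_size)                              # output bias

def softmax(z):
    e = np.exp(z - np.max(z))     # shifting by the max avoids overflow
    return e / e.sum()

def rnn_step(x_t, s_prev):
    s_t = np.tanh(U @ x_t + W @ s_prev + b)   # new memory state
    o_t = softmax(V @ s_t + c)                # output distribution
    return s_t, o_t

x_t = np.zeros(vocab_size)
x_t[2] = 1.0                                  # one-hot input for the current word
s_t, o_t = rnn_step(x_t, np.zeros(hidden_size))
print(o_t.sum())                              # 1.0 -- a probability distribution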

The input $x_t$ is a one-hot representation of the word volunteer. Recall from before that one-hot encoding is a type of word embedding. If the text corpus consists of 20,000 unique words and volunteer is the 19th word, then $x_t$ is a 20,000-dimensional vector in which all elements are 0 except the one at the 19th position, which has a value of 1. This means that we are only taking this particular word into account.
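Building such a vector takes only a couple of lines; note that NumPy indexing is zero-based, so the 19th word lives at index 18:

import numpy as np

vocab_size = 20000
position = 19                        # volunteer is the 19th word in the corpus

x_t = np.zeros(vocab_size)
x_t[position - 1] = 1.0              # zero-based index of the 19th position

print(x_t.sum())                     # 1.0 -- exactly one element is set
print(np.argmax(x_t))                # 18 -- the slot that encodes volunteer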

The sum of $U x_t$, $W s_{t-1}$, and $b$ is passed to the tanh activation function, which squashes the result between -1 and 1 using the following formula:

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$

In this, $e \approx 2.71828$ (Euler's number) and $z$ is any real number.
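As a quick worked example, plugging in $z = 1$ gives $\tanh(1) = \frac{e^{1} - e^{-1}}{e^{1} + e^{-1}} \approx \frac{2.71828 - 0.36788}{2.71828 + 0.36788} \approx 0.762$, which indeed falls inside $(-1, 1)$.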

The output $o_t$ at time step $t$ is calculated using $s_t$ and the softmax function. This function can be categorized as an activation function, with the exception that its primary usage is at the output layer, when a probability distribution is needed. For example, predicting the correct outcome in a classification problem can be achieved by picking the most probable value from a vector whose elements sum up to 1. Softmax produces this vector, as follows:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K$$

In this, $e \approx 2.71828$ (Euler's number) and $z$ is a $K$-dimensional vector. The formula calculates the probability of the value at the $i$th position in the vector $z$.
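Translating the formula directly into NumPy gives a sketch like the following; the three-element vector is an arbitrary example, and subtracting the maximum before exponentiating is a common stability trick that cancels out in the ratio:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # stability shift; does not change the result
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # arbitrary scores for K = 3
probs = softmax(z)

print(probs)        # ≈ [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0 -- a valid probability distribution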

After applying the softmax function, $o_t$ becomes a vector of the same dimension as $x_t$ (the corpus size of 20,000), with all of its elements summing to 1. With that in mind, finding the predicted word from the text corpus becomes straightforward.
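For instance, taking the index of the highest-probability element and looking it up in the vocabulary recovers the predicted word; the tiny distribution and the index_to_word mapping below are hypothetical stand-ins for the real 20,000-word corpus:

import numpy as np

o_t = np.array([0.1, 0.7, 0.2])                          # toy output distribution
index_to_word = {0: "the", 1: "volunteer", 2: "smiled"}  # hypothetical vocabulary

predicted_index = int(np.argmax(o_t))                    # most probable position
print(index_to_word[predicted_index])                    # volunteer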
