
Training the model

All the magic in this model lies behind the RNN cells. In our simple example, each cell applies the same equations, just with a different set of variables. A detailed version of a single cell looks like this:

First, let's explain the new terms that appear in the preceding diagram:

  • Weights (U, V, W): A weight is a matrix (or a number) that represents the strength of the value it is applied to. For example, U determines how much of the input x_t should be considered in the following equations. If U consists of high values, then x_t has a significant influence on the end result. The weight values are often initialized randomly or drawn from a distribution (such as a normal/Gaussian distribution); a minimal initialization sketch follows this list. It is important to note that U, V, and W are the same for each step. Using the backpropagation algorithm, they are modified with the aim of producing accurate predictions
  • Biases (b_s, b_o): An offset vector (different for each layer), which adds a change to the value of the output
  • Activation function (tanh): This determines the final value of the current memory state s_t and the output o_t. Basically, the activation functions map the results of equations similar to the following ones into a desired range: (-1, 1) if we are using the tanh function, (0, 1) if we are using the sigmoid function, and [0, +infinity) if we are using ReLU (https://ai.stackexchange.com/questions/5493/what-is-the-purpose-of-an-activation-function-in-neural-networks)
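
To make these terms concrete, here is a minimal NumPy sketch of how such a cell's parameters might be initialized. The symbols U, V, W, b_s, and b_o mirror the notation above; the hidden size of 100 is an assumption made purely for illustration:

```python
import numpy as np

vocab_size = 20000    # unique words in the text corpus
hidden_size = 100     # size of the memory state s_t (assumed for illustration)

# Weights, drawn from a normal (Gaussian) distribution.
# The same U, V, and W are reused at every time step.
U = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> memory state
W = np.random.randn(hidden_size, hidden_size) * 0.01  # previous state -> memory state
V = np.random.randn(vocab_size, hidden_size) * 0.01   # memory state -> output

# Biases, one per layer, commonly initialized to zero.
b_s = np.zeros(hidden_size)   # offset for the memory state
b_o = np.zeros(vocab_size)    # offset for the output
```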

Now, let's go over the process of computing the variables. To calculate s_t and o_t, we can do the following:

s_t = tanh(U x_t + W s_{t-1} + b_s)

o_t = softmax(V s_t + b_o)

As you can see, the memory state s_t is a result of the previous state s_{t-1} and the current input x_t. Using this formula helps in retaining information about all the previous states.
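
Putting the two equations together, a single forward step of the cell can be sketched in NumPy as follows. This is only an illustration of the preceding formulas, reusing the parameter shapes from the earlier snippet, not the training code used later:

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V, b_s, b_o):
    """One forward step of a vanilla RNN cell.

    x_t    : one-hot input vector, shape (vocab_size,)
    s_prev : previous memory state s_{t-1}, shape (hidden_size,)
    """
    # s_t = tanh(U x_t + W s_{t-1} + b_s): combine the current input
    # with the previous memory state and squash the result into (-1, 1).
    s_t = np.tanh(U @ x_t + W @ s_prev + b_s)

    # o_t = softmax(V s_t + b_o): map the new state to a probability
    # distribution over the whole vocabulary.
    z = V @ s_t + b_o
    e = np.exp(z - z.max())   # shift by max(z) for numerical stability
    o_t = e / e.sum()
    return s_t, o_t
```

Called in a loop over the words of a sentence, the s_t returned by one step becomes the s_prev of the next, which is exactly how information about all the previous states is retained.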

The input x_t is a one-hot representation of the word volunteer. Recall from before that one-hot encoding is a type of word embedding. If the text corpus consists of 20,000 unique words and volunteer is the 19th word, then x_t is a 20,000-dimensional vector in which all elements are 0 except the one at the 19th position, which has a value of 1. This suggests that we are taking only this particular word into account.
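
As a quick illustration, building such a one-hot vector takes only a couple of lines. Note that the 19th position corresponds to index 18, because NumPy arrays are zero-indexed:

```python
import numpy as np

vocab_size = 20000
word_index = 18              # "volunteer" is the 19th word (zero-indexed)

x_t = np.zeros(vocab_size)   # a 20,000-dimensional vector of zeros
x_t[word_index] = 1.0        # only the position of "volunteer" is set to 1
```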

The sum of U x_t, W s_{t-1}, and b_s is passed to the tanh activation function, which squashes the result between -1 and 1 using the following formula:

tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

In this, e = 2.71828 (Euler's number) and z is any real number.
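
The formula translates directly into code. The following sketch implements it by hand and checks the result against NumPy's built-in np.tanh:

```python
import numpy as np

def tanh(z):
    # tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

z = np.array([-2.0, 0.0, 3.5])
print(tanh(z))                            # every value lies in (-1, 1)
print(np.allclose(tanh(z), np.tanh(z)))   # True
```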

The output o_t at time step t is calculated using s_t and the softmax function. Softmax can be categorized as an activation function, with the exception that it is primarily used at the output layer, when a probability distribution is needed. For example, predicting the correct outcome in a classification problem can be achieved by picking the most probable value from a vector whose elements all sum up to 1. Softmax produces this vector, as follows:

softmax(z)_i = e^(z_i) / (Σ_{j=1}^{K} e^(z_j))

In this, e = 2.71828 (Euler's number) and z is a K-dimensional vector. The formula calculates the probability of the value at the ith position in the vector z.
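
A direct NumPy translation of this formula looks as follows. Subtracting max(z) before exponentiating is a common safeguard against overflow and does not change the result:

```python
import numpy as np

def softmax(z):
    # softmax(z)_i = e^(z_i) / sum_j e^(z_j)
    e = np.exp(z - z.max())   # shift by max(z) for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)           # roughly [0.659, 0.242, 0.099]
print(p.sum())     # 1.0
```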

After applying the softmax function, o_t becomes a vector of the same dimension as x_t (the corpus size of 20,000), with all of its elements summing to 1. With that in mind, finding the predicted word from the text corpus becomes straightforward: we simply pick the word at the position with the highest probability.
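
For instance, with a hypothetical index_to_word lookup table (not defined in this section) that maps vocabulary positions back to words, the prediction is just the index of the largest element of o_t:

```python
import numpy as np

# A tiny stand-in vocabulary and output vector, purely for illustration.
index_to_word = ["the", "cat", "volunteer"]   # hypothetical lookup table
o_t = np.array([0.2, 0.1, 0.7])               # softmax output; sums to 1

predicted_index = np.argmax(o_t)              # position with the highest probability
print(index_to_word[predicted_index])         # volunteer
```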
