- Reinforcement Learning with TensorFlow
- Sayon Dutta
The Q-learning approach to reinforcement learning
Q-learning is an attempt to learn the value Q(s,a) of taking a specific action, a, in a particular state, s. Consider a table where the rows represent the states and the columns represent the actions. This is called a Q-table. Thus, we have to learn the values in this table to find which action is best for the agent in a given state.
Steps involved in Q-learning:
Initialize the table of Q(s,a) with uniform values (say, all zeros).
Observe the current state, s
Choose an action, a, using epsilon-greedy or any other action selection policy, and take the action
As a result, a reward, r, is received and a new state, s', is perceived
Update the Q value of the (s,a) pair in the table by using the following Bellman equation:
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$
Then, set the new state as the current state and repeat the process until the episode is complete, that is, until the terminal state is reached
Run multiple episodes to train the agent
To simplify, we can say that the Q-value for a given state, s, and action, a, is updated toward the sum of the current reward, r, and the discounted (by γ) maximum Q-value of the new state over all its actions. The discount factor makes future rewards worth less than present rewards. For example, a reward of 100 today is worth more than a reward of 100 received in the future; conversely, a reward of 100 in the future is worth less than 100 today. Therefore, we discount future rewards. Repeating this update process continuously makes the Q-table values converge to accurate estimates of the expected future reward for a given action in a given state.
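The preceding steps can be written compactly in code. The following is a minimal sketch of tabular Q-learning, assuming a gym-style environment `env` with discrete state and action spaces and the classic `reset()`/`step()` API that returns a four-tuple; the hyperparameters `alpha`, `gamma`, and `epsilon` are illustrative values, not ones prescribed by the book.

```python
import numpy as np

def q_learning(env, num_episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q-table: one row per state, one column per action, initialized to zeros
    q_table = np.zeros((env.observation_space.n, env.action_space.n))

    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()      # explore
            else:
                action = np.argmax(q_table[state])      # exploit

            next_state, reward, done, _ = env.step(action)

            # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            td_target = reward + gamma * np.max(q_table[next_state])
            q_table[state, action] += alpha * (td_target - q_table[state, action])

            state = next_state                          # s <- s'
    return q_table
```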
As the state and action spaces grow, maintaining a Q-table becomes difficult; in the real world, state spaces can be practically infinite. Thus, we need another approach that can produce Q(s,a) without a Q-table. One solution is to replace the Q-table with a function that takes the state as input, in the form of a vector, and outputs a vector of Q-values for all the actions in that state. This function approximator can be represented by a neural network that predicts the Q-values. We can then add more layers and fit a deep neural network for better prediction of Q-values when the state and action spaces become large, which would be impossible with a Q-table. This gives rise to the Q-network, and if a deeper neural network, such as a convolutional neural network, is used, the result is a deep Q-network (DQN).
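To make the function-approximation idea concrete, here is a minimal sketch of such a Q-network in TensorFlow/Keras; `state_dim`, `num_actions`, and the layer sizes are illustrative placeholders rather than values from the book.

```python
import tensorflow as tf

# Assumed problem sizes for illustration only
state_dim, num_actions = 4, 2

# The network maps a state vector to one Q-value per action
q_network = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(state_dim,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(num_actions)   # linear output: Q(s, a) for every action
])

# Given a batch of states, the greedy action is the argmax over the Q-values
states = tf.random.uniform((1, state_dim))
q_values = q_network(states)              # shape: (1, num_actions)
greedy_action = tf.argmax(q_values, axis=-1)
```

Outputting all action values in a single forward pass is what makes the greedy (argmax) action cheap to compute, which is why Q-networks are usually structured this way rather than taking the action as an input.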
More details on Q-learning and deep Q-networks will be covered in Chapter 5, Q-Learning and Deep Q-Networks.