Reinforcement Learning with TensorFlow
Sayon Dutta
Q-Learning
Now, let's try to program a reinforcement learning agent using Q-learning. Q-learning maintains a Q-table that contains a Q-value for each state-action pair. The number of rows in the table equals the number of states in the environment, and the number of columns equals the number of actions. Since this environment has 16 states and 4 actions, its Q-table consists of 16 rows and 4 columns. The code to check this is given here:
print("Number of actions : ",env.action_space.n)
print("Number of states : ",env.observation_space.n)
----------------------
Number of actions : 4
Number of states : 16
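Since the table is just a 16 × 4 array, it can be represented as a NumPy array of zeros; a minimal sketch, assuming NumPy is installed and reusing the env object from above:

import numpy as np

# One row per state, one column per action: a 16 x 4 table of zeros
Q = np.zeros((env.observation_space.n, env.action_space.n))
print(Q.shape)
----------------------
(16, 4)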
The steps involved in Q-learning are as follows:
- Initialize the Q-table with zeros (the values will then be updated during learning with the reward received for each action taken).
- The update of the Q-value for a state-action pair, Q(s, a), is given by:

Q(s, a) ← Q(s, a) + α [ r + γ · max_a' Q(s', a') − Q(s, a) ]

In this formula:
- s = current state
- a = action taken (chosen using the epsilon-greedy approach)
- s' = resulting new state
- a' = action for the new state
- r = reward received for the action a
- α = learning rate, that is, the rate at which the agent's learning converges towards the minimized error
- γ = discount factor, which discounts the future reward to give an idea of how important that future reward is relative to the current reward
- By updating the Q-values as per the formula in step 2, the table converges to accurate Q-values for each action in a given state, as shown in the sketch below.
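Putting these steps together, the following is a minimal sketch of tabular Q-learning for this environment. It assumes the classic gym API (env.reset() returns the state and env.step() returns a 4-tuple), and the hyperparameter values are illustrative choices rather than values from the text:

import gym
import numpy as np

env = gym.make('FrozenLake-v0')  # assumed: the 16-state, 4-action environment used above

alpha = 0.1       # learning rate (illustrative value)
gamma = 0.99      # discount factor (illustrative value)
epsilon = 0.1     # exploration rate for epsilon-greedy (illustrative value)
episodes = 10000  # number of training episodes (illustrative value)

# Step 1: initialize the Q-table with zeros
Q = np.zeros((env.observation_space.n, env.action_space.n))

for _ in range(episodes):
    s = env.reset()
    done = False
    while not done:
        # Choose an action with the epsilon-greedy approach
        if np.random.rand() < epsilon:
            a = env.action_space.sample()  # explore
        else:
            a = np.argmax(Q[s])            # exploit current Q-values
        s_next, r, done, _ = env.step(a)
        # Step 2: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

After training, np.argmax(Q[s]) gives the greedy action for each state s.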