
Understanding Q-learning

Q-learning is an off-policy algorithm that was first proposed by Christopher Watkins in 1989, and is a widely used RL algorithm. Q-learning, like SARSA, maintains an estimate of the state-action value function for each state-action pair, and recursively updates it using the Bellman equation of dynamic programming as new experiences are collected. Unlike SARSA, it is an off-policy algorithm, because the update uses the state-action value function evaluated at the action that maximizes the value at the next state, rather than the action actually taken by the behavior policy. Q-learning is used for problems where the actions are discrete – for example, if we have the actions move north, move south, move east, and move west, and we must decide the optimum action in a given state, then Q-learning is applicable in such settings.

In the classical Q-learning approach, the update is given as follows, where the max is performed over actions; that is, we choose the action a corresponding to the maximum value of Q at state s_{t+1}:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

Here, α is the learning rate, a hyperparameter that the user can specify, and γ is the discount factor.
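
As a minimal sketch of this update rule in Python (the names q_table, alpha, and gamma are illustrative and not taken from the book's later code), a tabular Q-learning step can be written as:

import numpy as np

def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning update for the transition (state, action, reward, next_state)."""
    # Off-policy target: bootstrap with the maximum Q-value over actions at the next state
    td_target = reward + gamma * np.max(q_table[next_state])
    # Move the current estimate toward the target by the learning rate alpha
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table

# Example: a 5-state, 4-action table (for instance, move north/south/east/west)
q_table = np.zeros((5, 4))
q_table = q_learning_update(q_table, state=0, action=2, reward=1.0, next_state=1)

The key design point is the np.max over the next state's Q-values: the update bootstraps from the greedy action regardless of which action the behavior policy actually takes next, which is what makes Q-learning off-policy.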

Before we code the algorithms in Python, let's find out what kind of problems will be considered.
