Demystifying MDPs

Technically, Q-learning is a method for finding solutions to a type of optimization problem called a Markov decision process (MDP).

When we talk about states and the actions that we can take from them, we are using concepts developed in the context of MDPs (and of the Markov chains and other state-space models from which MDPs are derived).
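To make the vocabulary of states and actions concrete, here is a minimal sketch of a toy MDP together with a tabular Q-learning loop that tries to solve it. Everything in it is an assumption made for illustration and does not come from the text: the three-state chain, the `left`/`right` actions, the reward of 1.0 for reaching the last state, the discount factor of 0.9, and the helper names `step`, `q_learning`, and `greedy_policy`.

```python
import random

# Hypothetical toy MDP: a chain of three states; state 2 is terminal.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]
TERMINAL = 2
GAMMA = 0.9  # discount factor (assumed value)


def step(state, action):
    """Deterministic transition function: return (next_state, reward)."""
    if action == "right":
        next_state = min(state + 1, TERMINAL)
    else:
        next_state = max(state - 1, 0)
    # Reward is earned only on entering the terminal state.
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def q_learning(num_steps=5000, alpha=0.5, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = 0
    for _ in range(num_steps):
        action = rng.choice(ACTIONS)
        next_state, reward = step(state, action)
        # The value of a terminal state is zero by definition.
        if next_state == TERMINAL:
            best_next = 0.0
        else:
            best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Move the estimate toward the one-step bootstrapped target.
        target = reward + GAMMA * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
        # Start a new episode once the terminal state is reached.
        state = 0 if next_state == TERMINAL else next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return {
        s: max(ACTIONS, key=lambda a: q[(s, a)])
        for s in STATES
        if s != TERMINAL
    }


q_table = q_learning()
policy = greedy_policy(q_table)
print(policy)
```

In this sketch, the "solution" to the MDP is the policy read off the learned Q-table: for each state, the action with the highest estimated value. Exploration here is purely random, which is enough for a problem this small; a practical agent would more likely use something like an epsilon-greedy strategy.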
