Demystifying MDPs

The technical purpose of Q-learning is to find solutions to a type of optimization problem called a Markov decision process (MDP).
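To make "finding a solution to an MDP" concrete, here is a minimal sketch built on a hypothetical four-state "chain" environment invented for illustration. `ChainMDP`, `q_learning`, and `greedy_policy` are illustrative names, not part of any library, and the values of `alpha`, `epsilon`, and `gamma` are arbitrary assumptions. The agent starts in state 0, can move left or right, and earns a reward of 1 when it reaches the terminal state 3. Tabular Q-learning estimates a value Q(s, a) for every state-action pair from sampled transitions, and the "solution" is the greedy policy read off those values.

```python
import random
from dataclasses import dataclass


@dataclass
class ChainMDP:
    """Hypothetical toy MDP: states 0..n_states-1 on a line; the last state is terminal."""
    n_states: int = 4
    actions: tuple = ("left", "right")
    gamma: float = 0.9  # discount factor (assumed value)

    def step(self, state, action):
        """Return (next_state, reward, done).

        The outcome depends only on the current state and action (the Markov property).
        """
        if action == "right":
            next_state = min(state + 1, self.n_states - 1)
        else:
            next_state = max(state - 1, 0)
        done = next_state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return next_state, reward, done


def q_learning(mdp, episodes=500, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(mdp.n_states) for a in mdp.actions}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(mdp.actions)  # explore
            else:
                # Exploit, breaking ties between equally valued actions at random.
                best = max(q[(state, a)] for a in mdp.actions)
                action = rng.choice([a for a in mdp.actions if q[(state, a)] == best])
            next_state, reward, done = mdp.step(state, action)
            # A terminal state has no future value to bootstrap from.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in mdp.actions)
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            q[(state, action)] += alpha * (reward + mdp.gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(mdp, q):
    """Pick the highest-valued action in every non-terminal state."""
    return {s: max(mdp.actions, key=lambda a: q[(s, a)]) for s in range(mdp.n_states - 1)}


mdp = ChainMDP()
q = q_learning(mdp)
print(greedy_policy(mdp, q))  # {0: 'right', 1: 'right', 2: 'right'}
```

In this environment the learned policy is "always move right," which is the optimal policy because the discount factor makes shorter paths to the reward more valuable.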

When we talk about states and the actions that we can take from states, we are discussing concepts developed in the context of MDPs (and the Markov chains and other state space models that they are derived from). 
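In standard textbook notation, which is an assumption here and may not match the notation used later in this text, an MDP is a tuple of states, actions, transition probabilities, rewards, and a discount factor. Its defining assumption is the Markov property: the next state depends only on the current state and action, not on the earlier history. A Markov chain is the special case with no actions, where transitions depend on the current state alone.

```latex
\text{MDP} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)

\Pr(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = \Pr(s_{t+1} \mid s_t, a_t)
\quad \text{(Markov property)}

\Pr(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = \Pr(s_{t+1} \mid s_t)
\quad \text{(Markov chain: no actions)}
```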

主站蜘蛛池模板: 昆山市| 阜平县| 安阳县| 晋中市| 云南省| 峨眉山市| 抚松县| 湘潭市| 拉孜县| 方山县| 利津县| 蓝田县| 东乌| 青田县| 綦江县| 宝坻区| 磐石市| 东方市| 罗江县| 桂林市| 承德县| 定远县| 玉山县| 百色市| 临澧县| 湖南省| 济宁市| 沁阳市| 扶余县| 高清| 米林县| 嘉祥县| 酉阳| 奇台县| 新宁县| 金川县| 东山县| 兴安盟| 莫力| 阳谷县| 新晃|