
  • Python Reinforcement Learning
  • Sudharsan Ravichandiran, Sean Saito, Rajalingappaa Shanmugamani, Yang Wenzhuo
  • 2021-06-24 15:17:32

The policy function

We learned about the policy function, which maps states to actions, in Chapter 1, Introduction to Reinforcement Learning. It is denoted by π.

The policy function can be represented as π : S → A, a mapping from the set of states S to the set of actions A. In other words, a policy function says which action to perform in each state. Our ultimate goal is to find the optimal policy, that is, the policy that specifies the correct action to perform in each state so that the reward is maximized.
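As a minimal sketch of the idea, a deterministic policy over a small, finite set of states can be written as a lookup table. The state names, action names, and the particular action chosen for each state below are hypothetical and serve only to illustrate the mapping from states to actions; they do not come from the book.

```python
# Hypothetical toy problem: states and actions are plain strings.
STATES = ["s0", "s1", "s2"]
ACTIONS = ["left", "right"]

# A deterministic policy is a table with exactly one action per state.
# The choices here are arbitrary and are not claimed to be optimal.
policy = {"s0": "right", "s1": "right", "s2": "left"}


def pi(state):
    """Return the action the policy prescribes for the given state."""
    return policy[state]


for s in STATES:
    print(s, "->", pi(s))
```

Finding the optimal policy then amounts to choosing the entries of this table so that the reward collected by following it is as large as possible.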
