
RL algorithm

The steps involved in a typical RL algorithm are as follows:

  1. First, the agent interacts with the environment by performing an action
  2. By performing the action, the agent moves from one state to another
  3. The agent then receives a reward based on the action it performed
  4. Based on the reward, the agent understands whether the action was good or bad
  5. If the action was good, that is, if the agent received a positive reward, the agent will prefer performing that action; otherwise, the agent will try performing another action that results in a positive reward. So RL is basically a trial-and-error learning process
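The loop described in these steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the `LineWorld` environment, its reward scheme (+1 for moving right, -1 for moving left), the `train` function, and the `epsilon` parameter are all hypothetical. To decide which action is "good", the sketch swaps in two common techniques the text does not name: a running average of the rewards each action has earned, and epsilon-greedy exploration (occasionally trying a random action).

```python
import random


# Hypothetical toy environment (an assumption, not from the text):
# positions 0..4 on a line; the agent starts at 0 and the episode ends at 4.
class LineWorld:
    GOAL = 4

    def __init__(self):
        self.reset()

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Step 2: the action moves the agent from one state to another
        # (clamped so the agent cannot leave the line).
        move = 1 if action == "right" else -1
        self.state = max(0, min(self.GOAL, self.state + move))
        # Step 3: a reward based on the action (assumed: +1 right, -1 left).
        reward = 1.0 if action == "right" else -1.0
        done = self.state == self.GOAL
        return self.state, reward, done


def train(episodes=50, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = ["left", "right"]
    # Running average reward per action: how "good" each action has been so far.
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    env = LineWorld()
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            # Steps 1 and 5: usually pick the best-looking action,
            # but with probability epsilon try a random one (trial and error).
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: value[a])
            _, reward, done = env.step(action)
            # Step 4: update the estimate of whether this action is good or bad.
            count[action] += 1
            value[action] += (reward - value[action]) / count[action]
    return value


if __name__ == "__main__":
    values = train()
    print(values)
    print("preferred action:", max(values, key=values.get))
```

On the first step both estimates are 0, so the agent tries `"left"`, receives a negative reward, and from then on prefers `"right"`, which consistently earns a positive reward. This is the trial-and-error behavior described in step 5.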