RL algorithm

The steps involved in a typical RL algorithm are as follows:

  1. First, the agent interacts with the environment by performing an action
  2. As a result of that action, the agent moves from one state to another
  3. The agent then receives a reward based on the action it performed
  4. Based on the reward, the agent learns whether the action was good or bad
  5. If the action was good, that is, if the agent received a positive reward, the agent will prefer performing that action; otherwise, it will try another action that may result in a positive reward. So it is basically a trial-and-error learning process
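
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a specific published algorithm: the `LineWorld` environment, its reward of +1 or -1, the `preference` table, and the `epsilon` exploration rate are all hypothetical names and values chosen for this example.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4, with the goal at 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to [0, 4]
        next_state = min(4, max(0, self.state + action))
        # Assumed reward scheme: +1 for moving closer to the goal, -1 otherwise
        reward = 1 if next_state > self.state else -1
        self.state = next_state
        done = next_state == 4
        return next_state, reward, done


def run_trial_and_error(episodes=50, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = [-1, +1]
    # Running score per action: higher means the agent prefers it more
    preference = {a: 0.0 for a in actions}
    for _ in range(episodes):
        env = LineWorld()
        done = False
        while not done:
            # Step 1: perform an action (mostly the preferred one, sometimes a random one)
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: preference[a])
            # Steps 2 and 3: the environment moves to a new state and returns a reward
            state, reward, done = env.step(action)
            # Steps 4 and 5: a positive reward raises the preference for this action,
            # a negative reward lowers it, so the agent tries the other action instead
            preference[action] += reward
    return preference


if __name__ == "__main__":
    print(run_trial_and_error())
```

After a few episodes the score for moving right dominates, because only that action ever earns a positive reward in this toy setup. Real RL algorithms replace the single per-action score with value estimates that also depend on the state.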