
Basic terminologies and conventions

The following are the basic terms used in reinforcement learning:

  • Agent: The program we create so that it can sense the environment, perform actions, receive feedback, and try to maximize rewards.
  • Environment: The world where the agent resides. It can be real or simulated.
  • State: The agent's perception of the environment, or the configuration of the environment that the agent senses. The state space can be finite or infinite.
  • Rewards: Feedback the agent receives after each action it takes. The goal of the agent is to maximize the overall reward, that is, the immediate reward plus the future rewards. Rewards are defined in advance, so they must be designed carefully for the agent to achieve the goal efficiently.
  • Actions: Anything that the agent is capable of doing in the given environment. The action space can be finite or infinite.
  • SAR triple: (state, action, reward) is referred to as the SAR triple, represented as (s, a, r).
  • Episode: One complete run of the whole task.
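As a rough illustration of the terms above, the following sketch represents a SAR triple as a small Python type and an episode as a list of such triples. The class name SAR, the integer states, the string actions, and the reward values are all assumptions made up for this example; they do not come from any particular library.

```python
from typing import List, NamedTuple


class SAR(NamedTuple):
    """One (state, action, reward) triple, written (s, a, r) in the text."""
    state: int     # assumption: states are encoded as integers
    action: str    # assumption: actions are encoded as strings
    reward: float  # feedback received after taking the action


# A hypothetical three-step episode: one complete run of a toy task.
episode: List[SAR] = [
    SAR(state=0, action="right", reward=0.0),
    SAR(state=1, action="right", reward=0.0),
    SAR(state=2, action="right", reward=1.0),
]


def total_reward(steps: List[SAR]) -> float:
    """Overall reward of an episode: immediate plus future rewards, summed."""
    return sum(step.reward for step in steps)


episode_return = total_reward(episode)
```

Here the overall reward is a plain, undiscounted sum, which is the simplest reading of "immediate and future reward"; other ways of weighting future rewards are possible.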

Let's walk through the convention shown in the following diagram:

Every task is a sequence of SAR triples. We start from state S(t), perform action A(t), receive a reward R(t+1), and land in a new state S(t+1). In other words, the state and action at step t determine the reward received at step t+1. Since S(t) and A(t) result in S(t+1), we also have a triple of (current state, action, new state), that is, [S(t), A(t), S(t+1)] or (s, a, s').
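The time indexing can be sketched as a simple interaction loop. The environment below is a hypothetical corridor with positions 0 to 3 and a reward of 1.0 for reaching position 3; the names step, run_episode, and always_right are assumptions for illustration only.

```python
GOAL = 3  # assumption: the episode ends when the agent reaches position 3


def step(state, action):
    """Hypothetical environment: S(t), A(t) -> R(t+1), S(t+1)."""
    next_state = max(0, min(GOAL, state + action))  # stay inside the corridor
    reward = 1.0 if next_state == GOAL else 0.0
    return reward, next_state


def run_episode(policy, start_state=0, max_steps=100):
    """Run one episode and record (s, a, r, s') for every time step."""
    transitions = []
    state = start_state
    for _ in range(max_steps):
        action = policy(state)                     # A(t), chosen in S(t)
        reward, next_state = step(state, action)   # R(t+1) and S(t+1)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if state == GOAL:
            break
    return transitions


def always_right(state):
    """A trivial policy that moves one position to the right."""
    return 1


transitions = run_episode(always_right)
sar_triples = [(s, a, r) for s, a, r, _ in transitions]     # (s, a, r)
sas_triples = [(s, a, s2) for s, a, _, s2 in transitions]   # (s, a, s')
```

Each recorded step holds both views of the same transition: dropping the new state gives the SAR triple, and dropping the reward gives the (s, a, s') triple.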
