- Reinforcement Learning with TensorFlow
- Sayon Dutta
Basic terminologies and conventions
The following are the basic terminologies associated with reinforcement learning:
- Agent: The program we create that is able to sense the environment, perform actions, receive feedback, and try to maximize rewards.
- Environment: The world where the agent resides. It can be real or simulated.
- State: The perception or configuration of the environment that the agent senses. State spaces can be finite or infinite.
- Rewards: Feedback the agent receives after any action it has taken. The goal of the agent is to maximize the overall reward, that is, the immediate and the future rewards. Rewards are defined in advance, so they must be designed carefully for the agent to achieve its goal efficiently.
- Actions: Anything that the agent is capable of doing in the given environment. Action space can be finite or infinite.
- SAR triple: (state, action, reward) is referred to as the SAR triple, represented as (s, a, r).
- Episode: Represents one complete run of the whole task.
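The terminology above can be made concrete with a minimal sketch. The toy corridor environment below is a hypothetical example (not from the book): it has a finite state space, a finite action space, a predefined reward signal, and an episode that ends when the goal state is reached.

```python
import random

# A toy 1-D corridor (hypothetical example): the agent starts in state 0
# and tries to reach the rightmost state. One complete run is an episode.
class CorridorEnv:
    def __init__(self, n_states=5):
        self.n_states = n_states      # finite state space: 0 .. n_states-1
        self.actions = [-1, +1]       # finite action space: step left/right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # The environment returns the new state, a reward, and a done flag.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.1  # reward signal defined in advance
        return self.state, reward, done

# A random agent: it senses the state, acts, and receives feedback.
env = CorridorEnv()
state = env.reset()
total_reward, done = 0.0, False
while not done:                        # loop until the episode terminates
    action = random.choice(env.actions)
    state, reward, done = env.step(action)
    total_reward += reward
```

A learning agent would replace `random.choice` with a policy that is updated from the received rewards so as to maximize the overall reward.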
Let's deduce the convention shown in the following diagram:

Every task is a sequence of SAR triples. We start from state S(t), perform action A(t), and thereby receive a reward R(t+1) and land in a new state S(t+1). The current state-action pair yields the reward for the next step. Since S(t) and A(t) result in S(t+1), we have a new triple of (current state, action, new state), that is, [S(t), A(t), S(t+1)] or (s, a, s').