- Reinforcement Learning with TensorFlow
- Sayon Dutta
Markov decision processes
As already mentioned, an MDP is a reinforcement learning framework, illustrated here in a gridworld environment, that consists of sets of states, actions, and rewards and satisfies the Markov property, from which an optimal policy can be obtained. An MDP is defined as the collection of the following:
- States: S
- Actions: A(s), A
- Transition model: T(s,a,s') ~ P(s'|s,a)
- Rewards: R(s), R(s,a), R(s,a,s')
- Policy: π(s) → a, where π* is the optimal policy
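The components listed above can be sketched in code. The following is a minimal, hypothetical example (not from the book's own code) of a tiny one-dimensional gridworld MDP with four states, a deterministic transition model, and a reward for reaching the goal state; value iteration is then used to recover the optimal policy π*:

```python
# States: 0, 1, 2, 3 (state 3 is the terminal goal state).
# Actions A(s): "left" or "right" in every non-terminal state.
ACTIONS = ["left", "right"]

def transition(s, a):
    """Transition model T(s, a, s'): deterministic moves on the line."""
    if s == 3:                       # terminal state is absorbing
        return s
    return max(s - 1, 0) if a == "left" else min(s + 1, 3)

def reward(s, a, s_next):
    """Reward R(s, a, s'): +1 for entering the goal, 0 otherwise."""
    return 1.0 if s_next == 3 and s != 3 else 0.0

def value_iteration(gamma=0.9, tol=1e-6):
    """Compute the optimal values V* and derive the optimal policy pi*."""
    V = [0.0] * 4
    while True:
        delta = 0.0
        for s in range(3):           # skip the terminal state
            q_values = [
                reward(s, a, transition(s, a)) + gamma * V[transition(s, a)]
                for a in ACTIONS
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    pi = {
        s: max(ACTIONS, key=lambda a: reward(s, a, transition(s, a))
               + gamma * V[transition(s, a)])
        for s in range(3)
    }
    return V, pi

V, pi = value_iteration()
print(pi)  # the optimal policy moves right toward the goal in every state
```

With a discount factor of 0.9, the converged values are V ≈ [0.81, 0.9, 1.0] for the non-terminal states, and the greedy policy chooses "right" everywhere, which is the optimal policy for this toy problem.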
In the case of an MDP, the environment is fully observable; that is, whatever observation the agent makes at any point in time is enough to make an optimal decision. In a partially observable environment, the agent needs memory to store past observations in order to make the best possible decisions.
Let's break this down into separate Lego blocks to understand what the overall process means.