
Markov decision processes

As already mentioned, an MDP is the framework used in reinforcement learning to model sequential decision making, for example, an agent navigating a gridworld environment. It consists of sets of states, actions, and rewards that satisfy the Markov property, and solving it yields an optimal policy. An MDP is defined as the collection of the following:

  • States: S
  • Actions: A(s), A
  • Transition model: T(s,a,s') ~ P(s'|s,a)
  • Rewards: R(s), R(s,a), R(s,a,s')
  • Policy: π, where π* is the optimal policy

In the case of an MDP, the environment is fully observable; that is, whatever observation the agent makes at any point in time is enough to make an optimal decision. In the case of a partially observable environment, the agent needs memory to store past observations in order to make the best possible decisions.
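To make the components listed above concrete, here is a minimal sketch of an MDP as plain Python data structures. The two-state example (state and action names, reward values) is hypothetical and chosen only for illustration:

```python
# States S and actions A(s)
states = ["s0", "s1"]
actions = {"s0": ["stay", "go"], "s1": ["stay"]}

# Transition model T(s, a, s') ~ P(s' | s, a):
# each entry maps a next state s' to its probability.
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
}

# Reward R(s, a, s') for each possible transition.
R = {
    ("s0", "stay", "s0"): 0.0,
    ("s0", "go", "s0"):  -1.0,
    ("s0", "go", "s1"):  10.0,
    ("s1", "stay", "s1"): 0.0,
}

# A policy maps each state to an action; solving the MDP means
# finding the optimal such mapping.
policy = {"s0": "go", "s1": "stay"}

def expected_reward(s, a):
    """Expected one-step reward of taking action a in state s."""
    return sum(p * R[(s, a, s2)] for s2, p in T[(s, a)].items())

print(expected_reward("s0", "go"))  # 0.2 * (-1) + 0.8 * 10 ≈ 7.8
```

Note that the Markov property is built into the structure of `T`: the distribution over next states depends only on the current state and action, not on any earlier history.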

Let's try to break this down into its individual building blocks to understand what the overall process means.
