書名： Reinforcement Learning with TensorFlow
作者名： Sayon Dutta
本章字數： 81字
更新時間： 2021-08-27 18:51:57

The policy model for optimality

Policy is defined as the model that guides the agent with action selection in different states. Policy is denoted as . is basically the probability of a certain action given a particular state:

Thus, a policy map will provide the set of probabilities of different actions given a particular state. The policy along with the value function create a solution that helps in agent navigation as per the policy and the calculated value of the state.

官术网_书友最值得收藏!

Reinforcement Learning with TensorFlow

The policy model for optimality