Policy

A policy is an algorithm or a set of rules that describes how an agent makes its decisions. For example, an investor's trading strategy is a policy: the investor buys a stock when its price goes down and sells the stock when the price goes up.
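The investor example can be written as a small rule-based function. This is only a minimal sketch: the name `trading_policy`, the choice of the last two prices as the state, and the `"hold"` action for an unchanged price are assumptions made for illustration and are not part of the original example.

```python
def trading_policy(previous_price: float, current_price: float) -> str:
    """Hypothetical rule-based policy; the state is the last two observed prices."""
    if current_price < previous_price:
        return "buy"   # the price went down
    if current_price > previous_price:
        return "sell"  # the price went up
    return "hold"      # assumption: do nothing when the price is unchanged


if __name__ == "__main__":
    print(trading_policy(10.0, 9.5))   # price dropped
    print(trading_policy(10.0, 10.5))  # price rose
```

The same inputs always produce the same action, so this rule is fully determined by the state it receives.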

More formally, a policy is a function, usually denoted as $\pi$, that maps a state, $s$, to an action, $a$:

$$a = \pi(s)$$

This means that an agent decides its action given its current state. The function can take any form, as long as it receives a state as input and outputs an action, whether it is a table, a graph, or a machine learning classifier.
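To illustrate that the representation does not matter, the sketch below implements the same state-to-action mapping twice: once as a lookup table and once as a rule. The 2x2 grid, the goal cell, the action names, and all function names are hypothetical and chosen only to keep the example small.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]  # (row, column) of a grid cell
Action = str

# Hypothetical 2x2 grid with the goal at (1, 1).
GOAL: State = (1, 1)
MOVES: Dict[Action, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}

# Representation 1: an explicit table from state to action.
POLICY_TABLE: Dict[State, Action] = {
    (0, 0): "right",
    (0, 1): "down",
    (1, 0): "right",
}


def table_policy(state: State) -> Action:
    return POLICY_TABLE[state]


# Representation 2: a rule that computes the action from the state.
def rule_policy(state: State) -> Action:
    _, col = state
    return "right" if col < GOAL[1] else "down"


def rollout(policy: Callable[[State], Action], start: State,
            max_steps: int = 10) -> List[State]:
    """Follow the policy from `start` until the goal or the step limit."""
    path = [start]
    state = start
    while state != GOAL and len(path) <= max_steps:
        d_row, d_col = MOVES[policy(state)]
        state = (state[0] + d_row, state[1] + d_col)
        path.append(state)
    return path


if __name__ == "__main__":
    print(rollout(table_policy, (0, 0)))
    print(rollout(rule_policy, (0, 0)))
```

Both functions satisfy the same interface, a state in and an action out, so the code that runs the agent does not need to know which representation is behind it.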

For example, suppose we have an agent that is supposed to navigate a maze. We shall further assume that the agent knows what the maze looks like; the following is how the agent's policy can be represented:

Figure 1: A maze where each arrow indicates where an agent would go next

Each white square in this maze represents a state the agent can be in. Each blue arrow shows the action the agent would take in the corresponding square. Together, the arrows represent the agent's policy for this maze. This is a deterministic policy, because each state maps to exactly one action. In contrast, a stochastic policy outputs a probability distribution over the possible actions given some state:

$$\pi(a \mid s) = P(A = a \mid S = s)$$

Here, $\pi(\cdot \mid s)$ is a normalized probability vector over all the possible actions, as shown in the following example:

Figure 2: A policy mapping the game state (the screen) to actions (probabilities)

The agent playing the game of Breakout has a policy that takes the screen of the game as input and returns a probability for each possible action.
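A stochastic policy of this kind can be sketched in a few lines. The example below does not process a game screen; it assumes that some model has already turned the screen into one score per action, and it converts those scores into probabilities with a softmax, which is one common choice rather than the method described in the text. The action names and all function names are assumptions for illustration.

```python
import math
import random
from typing import Dict, Sequence

# Assumed action set for a Breakout-like game.
ACTIONS = ["noop", "fire", "left", "right"]


def softmax(scores: Sequence[float]) -> list:
    """Turn arbitrary scores into a normalized probability vector."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_policy(scores: Sequence[float]) -> Dict[str, float]:
    """Map per-action scores (a stand-in for the state) to action probabilities."""
    return dict(zip(ACTIONS, softmax(scores)))


def sample_action(action_probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one action according to the policy's probabilities."""
    actions = list(action_probs)
    weights = list(action_probs.values())
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    probs = stochastic_policy([0.0, 0.0, 1.0, 2.0])  # hypothetical scores
    print(probs)
    print(sample_action(probs, random.Random(0)))
```

Unlike the deterministic case, calling `sample_action` repeatedly on the same state can return different actions, with frequencies that follow the probabilities the policy outputs.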
