
What is RL? 

An RL agent is an optimization process that learns from experience, using data it has collected from its environment through its own observations. It starts out with no explicit knowledge of the task, learns by trial and error what happens when it makes decisions, keeps track of which decisions were successful, and repeats those decisions under the same circumstances in the future.
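
To make this loop concrete, here is a minimal sketch of trial-and-error learning in Python; the one-state environment and the go/wait actions are hypothetical, invented purely for illustration:

```python
import random

# A minimal sketch of the trial-and-error loop described above.
# The environment, states, and actions are hypothetical, purely for illustration.
def step(state, action):
    """Hypothetical one-state environment: returns the next state and a reward."""
    reward = 1.0 if action == "go" else 0.0
    return "start", reward

experience = {}                               # remembers what each (state, action) pair earned
state = "start"
for _ in range(20):
    action = random.choice(["go", "wait"])    # trial and error: try actions at random
    next_state, reward = step(state, action)
    experience[(state, action)] = reward      # keep track of what happened
    state = next_state

# After exploring, repeat the action that earned the most reward in this state.
best_action = max(["go", "wait"], key=lambda a: experience.get(("start", a), 0.0))
print(best_action)                            # -> "go"
```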

In fields other than AI, such as operations research, RL is also referred to as approximate dynamic programming. It takes much of its basic operating structure from behavioral psychology, and many of its mathematical constructs, such as utility functions, are borrowed from fields such as economics and game theory.

Let's get familiar with some key concepts in RL:

  • Agent: This is the decision-making entity.
  • Environment: This is the world in which the agent operates, such as a game to win or task to accomplish. 
  • State: This is where the agent is in its environment. When you define the states that an agent can be in, think about what it needs to know about its environment. For example, a self-driving car will need to know whether the next traffic light is red or green and whether there are pedestrians in the crosswalk; these are defined as state variables.
  • Action: This is the next move that the agent chooses to take.
  • Reward: This is the feedback that the agent gets from the environment for taking that action.
  • Policy: This is a function that maps the agent's states to its actions. For your first RL agent, this will be as simple as a lookup table, called the Q-table, which will operate as your agent's brain (a minimal sketch follows this list).
  • Value: This is the long-term reward an agent can expect from taking an action, accounting for the future actions it could take afterwards. It is separate from the immediate reward the agent gets for taking that action (the value is also commonly called the utility).

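As a preview of the Q-table mentioned under Policy above, here is a minimal sketch of a policy implemented as a lookup table; the traffic-light states and actions are hypothetical placeholders:

```python
# A minimal Q-table sketch: a lookup from (state, action) pairs to estimated values.
# The states and actions are hypothetical placeholders, not part of a real task.
states = ["light_red", "light_green"]
actions = ["stop", "go"]

# Start every state-action value at zero; learning will gradually update these numbers.
q_table = {(s, a): 0.0 for s in states for a in actions}

def greedy_policy(state):
    """Map a state to the action with the highest estimated value."""
    return max(actions, key=lambda a: q_table[(state, a)])

q_table[("light_red", "stop")] = 1.0   # pretend this value has already been learned
print(greedy_policy("light_red"))      # -> "stop"
```
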
The first type of RL agent that you will create is a model-free agent. A model-free RL agent does not know anything about a state that it has not seen, and so will not be able to estimate the value of the reward that it will receive from an unknown state. In other words, it cannot generalize about its environment. We will explore the differences between model-free learning and model-based learning in greater depth later in the book.

The two major model-free RL algorithms are called Q-learning and state-action-reward-state-action (SARSA). The algorithm that we will use throughout the book is Q-learning. 

As we will see in the SARSA versus Q-learning – on-policy or off? section comparing the two algorithms, Q-learning can be treated as a variant of SARSA. We choose Q-learning as our introductory RL algorithm because it is relatively simple and straightforward to learn. As we build our RL skills, we can branch out into algorithms that are more complicated to learn but give better results.
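
As a preview of that comparison, here is a minimal sketch of the two update rules, assuming the Q-table structure sketched earlier; the learning rate and discount factor values are placeholders:

```python
ALPHA = 0.1   # learning rate (placeholder value)
GAMMA = 0.9   # discount factor (placeholder value)

def q_learning_update(q_table, actions, state, action, reward, next_state):
    """Off-policy: bootstrap from the best action available in the next state."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])

def sarsa_update(q_table, state, action, reward, next_state, next_action):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = reward + GAMMA * q_table[(next_state, next_action)]
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
```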
