
What is RL? 

An RL agent is an optimization process that learns from experience, using data it collects from its environment through its own observations. It starts out with no explicit knowledge of the task, learns by trial and error what happens when it makes decisions, keeps track of the decisions that succeed, and repeats those decisions under the same circumstances in the future.

In fields other than AI, RL is also referred to as dynamic programming. It takes much of its basic operating structure from behavioral psychology, and many of its mathematical constructs, such as utility functions, come from economics and game theory.

Let's get familiar with some key concepts in RL:

  • Agent: This is the decision-making entity.
  • Environment: This is the world in which the agent operates, such as a game to win or task to accomplish. 
  • State: This is where the agent is in its environment. When you define the states that an agent can be in, think about what it needs to know about its environment. For example, a self-driving car will need to know whether the next traffic light is red or green and whether there are pedestrians in the crosswalk; these are defined as state variables.
  • Action: This is the next move that the agent chooses to take.
  • Reward: This is the feedback that the agent gets from the environment for taking that action.
  • Policy: This is a function that maps the agent's states to its actions. For your first RL agent, this will be as simple as a lookup table, called the Q-table, which will operate as your agent's brain (see the sketch just after this list).
  • Value: This is the expected future reward of taking an action, accounting for the future actions the agent could take afterward. It is distinct from the immediate reward the agent gets for taking that action (the value is also commonly called the utility).
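
To make these terms concrete, here is a minimal sketch in Python of a Q-table and a greedy policy. The state and action names (light_red, stop, and so on) are hypothetical, chosen only to echo the traffic-light example above; the point is the mapping from states to actions through the table:

```python
# A minimal sketch of a Q-table policy (illustrative, hypothetical names).
# The Q-table maps (state, action) pairs to estimated values; the policy
# simply picks the action with the highest value in the current state.

q_table = {
    ("light_red", "stop"): 1.0,
    ("light_red", "go"): -1.0,
    ("light_green", "stop"): -0.1,
    ("light_green", "go"): 1.0,
}

actions = ["stop", "go"]

def policy(state):
    """Greedy policy: choose the action with the highest Q-value in this state."""
    return max(actions, key=lambda action: q_table.get((state, action), 0.0))

print(policy("light_red"))    # stop
print(policy("light_green"))  # go
```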

The first type of RL agent that you will create is a model-free agent. A model-free RL agent does not know anything about a state that it has not seen, and so will not be able to estimate the value of the reward that it will receive from an unknown state. In other words, it cannot generalize about its environment. We will explore the differences between model-free learning and model-based learning in greater depth later in the book.

The two major model-free RL algorithms are called Q-learning and state-action-reward-state-action (SARSA). The algorithm that we will use throughout the book is Q-learning. 

As we will see in the SARSA versus Q-learning – on-policy or off? section comparing the two algorithms, Q-learning can be treated as a variant of SARSA. We choose Q-learning as our introductory RL algorithm because it is relatively simple and straightforward to learn. As we build on our RL skills, we can branch out into other algorithms that may be more complicated to learn but will give us better results.
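
As a rough sketch of that comparison (assuming a standard tabular setting with a learning rate alpha and a discount factor gamma, both of which are covered later in the book), the two update rules differ only in which next action's value they bootstrap from:

```python
# Sketch of the two tabular update rules (alpha = learning rate,
# gamma = discount factor; both parameters are introduced later in the book).

def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the best action available in next_state,
    # regardless of which action the agent actually takes next.
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def sarsa_update(q_table, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action the agent actually takes next.
    next_q = q_table.get((next_state, next_action), 0.0)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * next_q - old)
```

Q-learning is off-policy because it updates towards the best available next action, whereas SARSA is on-policy because it updates towards the next action the agent actually takes.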
