
What is RL? 

An RL agent is an optimization process that learns from experience, using data it collects from its environment through its own observations. It starts out knowing nothing explicit about a task, learns by trial and error what happens when it makes decisions, keeps track of the decisions that succeed, and repeats those decisions under the same circumstances in the future.

In fields other than AI, such as operations research, RL is also referred to as approximate dynamic programming. It takes much of its basic operating structure from behavioral psychology, and many of its mathematical constructs, such as utility functions, come from fields such as economics and game theory.

Let's get familiar with some key concepts in RL:

  • Agent: This is the decision-making entity.
  • Environment: This is the world in which the agent operates, such as a game to win or task to accomplish. 
  • State: This is where the agent is in its environment. When you define the states that an agent can be in, think about what it needs to know about its environment. For example, a self-driving car will need to know whether the next traffic light is red or green and whether there are pedestrians in the crosswalk; these are defined as state variables.
  • Action: This is the next move that the agent chooses to take.
  • Reward: This is the feedback that the agent gets from the environment for taking that action.
  • Policy: This is a function that maps the agent's states to its actions. For your first RL agent, this will be as simple as a lookup table, called the Q-table, which will act as your agent's brain (see the sketch after this list).
  • Value: This is the expected long-term reward that the agent would receive from taking an action, accounting for the future actions it could take afterward. It is distinct from the immediate reward the agent gets for taking that action (value is also commonly called utility).
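
To make the policy idea concrete, here is a minimal sketch of a Q-table written as a nested Python dictionary. The state and action names (red_light, stop, and so on) are hypothetical, chosen only to echo the self-driving car example above; a greedy policy simply looks up the current state and picks the action with the highest estimated value.

```python
# A minimal sketch of a tabular policy (Q-table) for a tiny, hypothetical
# environment. Each entry q_table[state][action] holds the agent's current
# estimate of the value of taking that action in that state.
q_table = {
    "red_light":   {"stop": 1.0,  "go": -5.0},
    "green_light": {"stop": -0.5, "go": 2.0},
}

def greedy_policy(state):
    """Map a state to the action with the highest estimated value."""
    actions = q_table[state]
    return max(actions, key=actions.get)

print(greedy_policy("red_light"))    # -> "stop"
print(greedy_policy("green_light"))  # -> "go"
```

In practice, the agent fills in these values itself by interacting with the environment, rather than having them supplied up front as in this sketch.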

The first type of RL agent that you will create is a model-free agent. A model-free RL agent knows nothing about states it has not seen, and so cannot estimate the reward it will receive from an unknown state. In other words, it cannot generalize about its environment. We will explore the differences between model-free and model-based learning in greater depth later in the book.

The two major model-free RL algorithms are called Q-learning and state-action-reward-state-action (SARSA). The algorithm that we will use throughout the book is Q-learning. 

As we will see in the SARSA versus Q-learning – on-policy or off? section, which compares the two algorithms, Q-learning can be treated as a variant of SARSA. We use Q-learning as our introductory RL algorithm because it is relatively simple and straightforward to learn. As we build our RL skills, we can branch out into other algorithms that are more complicated to learn but give better results.
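
As a small preview of that comparison, here is a hedged sketch of the two update rules applied to a Q-table like the one shown earlier. The names alpha (learning rate) and gamma (discount factor) are standard, but the values below are arbitrary placeholders, not settings from this book's examples.

```python
# Sketch of the Q-learning and SARSA update rules on a nested-dict Q-table.
# alpha is the learning rate and gamma is the discount factor; the values
# here are placeholders.
alpha, gamma = 0.1, 0.99

def q_learning_update(q, state, action, reward, next_state):
    # Off-policy: bootstrap from the best action available in the next state,
    # regardless of which action the agent actually takes next.
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def sarsa_update(q, state, action, reward, next_state, next_action):
    # On-policy: bootstrap from the action the agent actually takes next.
    q[state][action] += alpha * (
        reward + gamma * q[next_state][next_action] - q[state][action]
    )
```

The only difference between the two functions is which next-state value they bootstrap from, which is exactly the on-policy versus off-policy distinction discussed later.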
