官术网_书友最值得收藏!

Getting started with reinforcement learning

Reinforcement learning aims to create algorithms that can learn and adapt to environmental changes. This programming technique is based on the concept of receiving external stimuli that depend on the actions chosen by the agent. A correct choice will involve a reward, while an incorrect choice will lead to a penalty. The goal of the system is to achieve the best possible result, of course.

These mechanisms derive from the basic concepts of machine learning (learning from experience), in an attempt to simulate human behavior. In fact, in our mind, we activate brain mechanisms that lead us to chase and repeat what, in us, produces feelings of gratification and wellbeing. Whenever we experience moments of pleasure (food, sex, love, and so on), some substances are produced in our brains that work by reinforcing that same stimulus, emphasizing it.

Along with this mechanism of neurochemical reinforcement, an important role is represented by memory. In fact, the memory collects the experience of the subject in order to be able to repeat it in the future. Evolution has endowed us with this mechanism to push us to repeat gratifying experiences in the direction of the best solutions.

This is why we so powerfully remember the most important experiences of our life: experiences, especially those that are powerfully rewarding, are impressed in memories and condition our future explorations. Previously, we have seen that learning from experience can be simulated by a numerical algorithm in various ways, depending on the nature of the signal used for learning and the type of feedback adopted by the system.

The following diagram shows a flowchart that displays an agent's interaction with the environment in a reinforcement learning setting:

Scientific literature has taken an uncertain stance on the classification of learning by reinforcement as a paradigm. In fact, in the initial phase of the literature, it was considered a special case of supervised learning, after which it was fully promoted as the third paradigm of machine learning algorithms. It is applied in different contexts in which supervised learning is inefficient: the problems of interaction with the environment are a clear example.

The following list shows the steps to follow to correctly apply a reinforcement learning algorithm:

  1. Preparation of the agent
  2. Observation of the environment
  3. Selection of the optimal strategy
  4. Execution of actions
  5. Calculation of the corresponding reward (or penalty)
  6. Development of updating strategies (if necessary)
  7. Repetition of steps 2 through 5 iteratively until the agent learns the optimal strategies

Reinforcement learning is based on a theory from psychology, elaborated following a series of experiments performed on animals. In particular, Edward Thorndike (American psychologist) noted that if a cat is given a reward immediately after the execution of a behavior considered correct, then this increases the probability that this behavior will repeat itself. On the other hand, in the face of unwanted behavior, the application of a punishment decreases the probability of a repetition of the error.

On the basis of this theory, after defining a goal to be achieved, reinforcement learning tries to maximize the rewards received for the execution of the action or set of actions that allows to reach the designated goal.

主站蜘蛛池模板: 深水埗区| 曲阳县| 城固县| 达拉特旗| 乌什县| 安图县| 台中县| 白山市| 高唐县| 衡水市| 安徽省| 徐闻县| 马尔康县| 吉隆县| 云南省| 沅江市| 藁城市| 区。| 柞水县| 清涧县| 远安县| 柳河县| 澳门| 金湖县| 剑河县| 九龙坡区| 茶陵县| 兰溪市| 宜章县| 神池县| 阳东县| 多伦县| 邵阳市| 镇坪县| 锦州市| 仲巴县| 鄂温| 民县| 祁连县| 公主岭市| 永丰县|