
Reinforcement learning algorithms

A reinforcement learning scenario can be considered as a supervised one where a hidden teacher provides only approximate feedback after every decision of the model. More formally, reinforcement learning is characterized by a continuous interaction between an agent and an environment. The former is responsible for making decisions (actions), aimed at increasing its return, while the latter provides feedback for every action. Feedback generally takes the form of a reward, whose value can be either positive (the action has been successful) or negative (the action shouldn't be repeated). As the agent analyzes different configurations of the environment (states), every reward must be considered as bound to the tuple (action, state). Hence, the final goal is to find a policy (a strategy that suggests the optimal action in every state) that maximizes the expected total reward.
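The agent-environment loop described above can be sketched with a minimal tabular Q-learning example. The chain-shaped toy environment, the reward values, and all hyperparameters below are illustrative assumptions, not part of the original text; the point is only to show how rewards bound to (action, state) pairs are turned into a policy that maximizes the expected total reward:

```python
import random

# A toy deterministic environment: states 0..4 laid out in a chain.
# Moving right (+1) from state 3 reaches the goal (state 4, reward +1);
# every other step yields a small negative reward.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or right

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else -0.01
    done = next_state == N_STATES - 1
    return next_state, reward, done

# Tabular Q-learning: Q[state][action] estimates the expected total
# reward of taking 'action' in 'state' and acting greedily afterwards.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # assumed hyperparameters

random.seed(0)
for episode in range(500):
    state = 0
    done = False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise exploit
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(Q[state], key=Q[state].get)
        next_state, reward, done = step(state, action)
        # Update toward the reward plus the discounted best future value
        best_next = max(Q[next_state].values())
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

# The learned greedy policy: the best action in every non-terminal state
policy = {s: max(Q[s], key=Q[s].get) for s in range(N_STATES - 1)}
print(policy)
```

After training, the greedy policy moves right in every state, heading toward the goal: exactly the policy that maximizes the expected total reward in this toy setting.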

A very classical example of reinforcement learning is an agent that learns how to play a game. During an episode, the agent tests the actions in all encountered states and collects the rewards. An algorithm then corrects the policy to reduce the likelihood of non-positive actions (that is, those whose reward is not positive) and to increase the expected total reward obtainable at the end of the episode.

Reinforcement learning has many interesting applications, which are not limited to games. For example, a recommendation system can correct its suggestions according to binary feedback (for example, a thumbs up or down) provided by the user. The main difference between reinforcement and supervised learning is the information provided by the environment. In fact, in a supervised scenario, the correction is normally proportional to the prediction error, while in a reinforcement learning one, an action must be evaluated considering a sequence of subsequent actions and future rewards. Therefore, the corrections are normally based on an estimation of the expected reward, and their effect is influenced by the value of the subsequent actions. For example, a supervised model has no memory, hence its corrections are immediate, while a reinforcement learning agent must consider the partial rollout of an episode in order to decide whether an action is actually negative or not.
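The dependence of a correction on future rewards can be made concrete by computing the discounted return of each action in an episode. The reward sequence and discount factor below are hypothetical; the sketch only shows that an early action with a small negative reward can still receive a positive credit once the subsequent rewards are taken into account:

```python
# Discounted return: the value credited to the action at time t depends
# on the whole sequence of future rewards, weighted by a factor gamma.
def discounted_returns(rewards, gamma=0.9):
    returns = []
    g = 0.0
    # Walk the episode backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A hypothetical episode: small penalties followed by a final success
episode_rewards = [-0.1, -0.1, -0.1, 1.0]
print(discounted_returns(episode_rewards))
```

Even though the first three actions each received a negative immediate reward, their discounted returns are positive, so the agent should not reduce their likelihood: a conclusion that only a look at the partial rollout of the episode makes possible.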

Reinforcement learning is a fascinating branch of machine learning. Unfortunately, this topic is beyond the scope of this work, therefore we are not discussing it in detail (you can find further details in Hands-On Reinforcement Learning with Python, Ravichandiran S., Packt Publishing, 2018 and Mastering Machine Learning Algorithms, Bonaccorso G., Packt Publishing, 2018).

We can now briefly explain why Python has been chosen as the main language for this exploration of the world of unsupervised learning.
