
Reinforcement learning algorithms

A reinforcement learning scenario can be considered as a supervised one where a hidden teacher provides only approximate feedback after every decision of the model. More formally, reinforcement learning is characterized by a continuous interaction between an agent and an environment. The former is responsible for making decisions (actions), aimed at increasing its return, while the latter provides feedback for every action. Feedback generally takes the form of a reward, whose value can be either positive (the action has been successful) or negative (the action shouldn't be repeated). As the agent analyzes different configurations of the environment (states), every reward must be considered as bound to the tuple (action, state). Hence, the final goal is to find a policy (a strategy that suggests the optimal action in every state) that maximizes the expected total reward.
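The agent-environment loop described above can be sketched with a minimal tabular Q-learning example. The chain-shaped toy environment, the reward values, and all hyperparameters below are illustrative assumptions, not part of the original text; the point is only to show how rewards bound to (action, state) pairs are turned into a policy that maximizes the expected total reward:

```python
import random

# A toy deterministic environment: states 0..4 laid out in a chain.
# Moving right (+1) from state 3 reaches the goal (state 4, reward +1);
# every other step yields a small negative reward.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or right

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else -0.01
    done = next_state == N_STATES - 1
    return next_state, reward, done

# Tabular Q-learning: Q[state][action] estimates the expected total
# reward of taking 'action' in 'state' and acting greedily afterwards.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # assumed hyperparameters

random.seed(0)
for episode in range(500):
    state = 0
    done = False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise exploit
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(Q[state], key=Q[state].get)
        next_state, reward, done = step(state, action)
        # Update toward the reward plus the discounted best future value
        best_next = max(Q[next_state].values())
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

# The learned greedy policy: the best action in every non-terminal state
policy = {s: max(Q[s], key=Q[s].get) for s in range(N_STATES - 1)}
print(policy)
```

After training, the greedy policy moves right in every state, heading toward the goal: exactly the policy that maximizes the expected total reward in this toy setting.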

A very classical example of reinforcement learning is an agent that learns how to play a game. During an episode, the agent tests the actions in all encountered states and collects the rewards. An algorithm then corrects the policy to reduce the likelihood of non-positive actions (that is, those whose reward is not positive) and to increase the expected total reward obtainable at the end of the episode.

Reinforcement learning has many interesting applications, which are not limited to games. For example, a recommendation system can correct its suggestions according to binary feedback (for example, a thumbs up or down) provided by the user. The main difference between reinforcement and supervised learning is the information provided by the environment. In fact, in a supervised scenario, the correction is normally proportional to the prediction error, while in a reinforcement learning one, an action must be evaluated considering a sequence of subsequent actions and future rewards. Therefore, the corrections are normally based on an estimation of the expected reward, and their effect is influenced by the value of the subsequent actions. For example, a supervised model has no memory, hence its corrections are immediate, while a reinforcement learning agent must consider the partial rollout of an episode in order to decide whether an action is actually negative or not.
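The dependence of a correction on future rewards can be made concrete by computing the discounted return of each action in an episode. The reward sequence and discount factor below are hypothetical; the sketch only shows that an early action with a small negative reward can still receive a positive credit once the subsequent rewards are taken into account:

```python
# Discounted return: the value credited to the action at time t depends
# on the whole sequence of future rewards, weighted by a factor gamma.
def discounted_returns(rewards, gamma=0.9):
    returns = []
    g = 0.0
    # Walk the episode backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A hypothetical episode: small penalties followed by a final success
episode_rewards = [-0.1, -0.1, -0.1, 1.0]
print(discounted_returns(episode_rewards))
```

Even though the first three actions each received a negative immediate reward, their discounted returns are positive, so the agent should not reduce their likelihood: a conclusion that only a look at the partial rollout of the episode makes possible.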

Reinforcement learning is a fascinating branch of machine learning. Unfortunately, this topic is beyond the scope of this work, therefore we are not discussing it in detail (you can find further details in Hands-On Reinforcement Learning with Python, Ravichandiran S., Packt Publishing, 2018 and Mastering Machine Learning Algorithms, Bonaccorso G., Packt Publishing, 2018).

We can now briefly explain why Python has been chosen as the main language for this exploration of the world of unsupervised learning.
