- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:09
RL, supervised learning, and unsupervised learning
What is the difference between reinforcement learning (RL), supervised learning, and unsupervised learning? All three involve developing rules about an unknown environment from data, whether that data is labeled, unlabeled, or collected through interaction. The following is a simple diagram charting the different terms:

Take a look at the following definitions of each term:
- Supervised learning feeds labeled training data into an algorithm, trains the algorithm on that data, generates predictions for unlabeled testing data, and then compares the predictions of the model to the actual labels. The goal of supervised learning is to generate class labels for unseen data or to predict unseen numerical values using regression.
- Unsupervised learning looks for similarities between different observations of unlabeled data. An unsupervised learning algorithm groups observations that fit together along axes of similarity, without being told in advance what the groups are. The goal of unsupervised learning is to group together similar observations based on relevant criteria.
- RL seeks to optimize a variable, typically a cumulative reward, under a set of constraints. The learner in an RL problem, called an agent, seeks an optimal path to a goal by acting and observing the results. The goal of RL is therefore to find a mapping from states to actions (a policy) that leads us to the best possible outcome in a situation that we have limited information about.
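To make the three definitions concrete, the following is a minimal sketch in plain Python, not code taken from the book. The toy spend figures, the function names (`predict_1nn`, `two_means`, `q_learning_corridor`), the five-state corridor, and every hyperparameter are assumptions chosen for illustration. The supervised part uses the labels, the unsupervised part never sees them, and the RL part is given no dataset at all: it learns from the rewards produced by its own actions, here with a tabular Q-learning update.

```python
import random

# Hypothetical toy data: amount spent per customer, with a label for each.
spend = [5.0, 7.0, 6.0, 40.0, 42.0, 45.0]
labels = ["low", "low", "low", "high", "high", "high"]


def predict_1nn(x, train_x, train_y):
    """Supervised: return the label of the closest labeled training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two groups (k-means, k=2)."""
    centers = [min(points), max(points)]  # deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            closer = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[closer].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


def q_learning_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                        epsilon=0.2, seed=0):
    """RL: learn action values for a corridor whose last state pays reward 1.

    Actions are 0 (step left) and 1 (step right). Returns the Q-table.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties at random
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


print(predict_1nn(8.0, spend, labels))   # uses the labels
print(two_means(spend))                  # never sees the labels
q_table = q_learning_corridor()
# Greedy action per non-terminal state: the learned state-to-action mapping.
print(["right" if right > left else "left" for left, right in q_table[:-1]])
```

The last line reads a policy off the Q-table, which is the "set of actions, mapped to a set of states" that the RL definition describes. Nearest-neighbor classification and k-means were picked only because they fit in a few lines; any classifier or clustering method would illustrate the same contrast.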
The primary difference between these three learning methods is in the type of question being asked:
- Supervised learning works well for classification and regression problems (for example, whether a customer will buy a product or how much they might spend)
- Unsupervised learning works well for problems dealing with association (for example, what products customers might buy together) and anomaly detection
- RL works best when there is a specific value to be optimized and a function that can be discovered within a problem to optimize it (for example, how we can maximize the number of times a user clicks on links or downloads apps based on the advertisements that we show them)
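The advertising example can be sketched as a multi-armed bandit, one of the simplest RL settings. The code below is an illustrative sketch under stated assumptions rather than the book's method: the function name `run_ad_bandit`, the epsilon-greedy rule, and the click rates are all hypothetical. The value being optimized is the total number of clicks, and the agent has to discover which ad performs best purely from the clicks it observes.

```python
import random


def run_ad_bandit(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Choose ads with an epsilon-greedy rule; return (shows, clicks) per ad.

    click_rates holds the true click probabilities. They are hidden from the
    agent and are used only to simulate how users respond.
    """
    rng = random.Random(seed)
    n_ads = len(click_rates)
    shows = [0] * n_ads
    clicks = [0] * n_ads

    for _ in range(rounds):
        unseen = [ad for ad in range(n_ads) if shows[ad] == 0]
        if unseen:
            ad = unseen[0]                  # try every ad at least once
        elif rng.random() < epsilon:
            ad = rng.randrange(n_ads)       # explore: show a random ad
        else:
            # Exploit: show the ad with the best observed click rate so far.
            ad = max(range(n_ads), key=lambda a: clicks[a] / shows[a])
        shows[ad] += 1
        if rng.random() < click_rates[ad]:  # simulated user response
            clicks[ad] += 1
    return shows, clicks


# Hypothetical click rates for three ads.
shows, clicks = run_ad_bandit([0.02, 0.05, 0.20])
for ad, (s, c) in enumerate(zip(shows, clicks)):
    print(f"ad {ad}: shown {s} times, clicked {c} times")
```

No labeled examples of "the right ad" are ever supplied. The only feedback is whether each shown ad was clicked, which is what separates this problem from the supervised and unsupervised examples in the list.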
Note that this list of uses for each method is not exhaustive; we are only presenting well-known examples of the type of problem each method tends to work well for.
There are many other examples of questions that we might ask and other machine learning algorithms that we might use to solve them, but understanding the broad similarities and differences between these three major types will be useful for us going forward.