We have previously established that the goal of supervised learning is always to predict the labels (or target values) of some data. However, depending on the nature of these labels, supervised learning can come in two distinct forms:
Classification: Supervised learning is called classification whenever we use the data to predict categories. A good example of this is when we try to predict whether an image contains a cat or a dog. Here, the labels of the data are categorical, either one or the other, but never a mixture of categories. For example, a picture contains either a cat or a dog, never 50 percent cat and 50 percent dog (before you ask, no, here we do not consider pictures of the cartoon character CatDog), and our job is simply to tell which one it is. When there are only two choices, it is called two-class or binary classification. When there are more than two categories, as when predicting what the weather will be like the next day, it is known as multi-class classification.
Regression: Supervised learning is called regression whenever we use the data to predict real values. A good example of this is when we try to predict stock prices. Rather than predicting stock categories, the goal of regression is to predict a target value as accurately as possible; for example, to predict the stock prices with as little error as possible.
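To make the contrast concrete, here is a minimal sketch using only NumPy. The toy data and the helper names `predict_nearest_label` and `predict_linear_value` are assumptions made for illustration; they are not part of any library. The same features are paired once with categorical labels (classification) and once with real-valued targets (regression).

```python
import numpy as np

# Toy features: one number per sample (hypothetical data, for illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0])

# Classification: the targets are discrete categories.
y_class = np.array(["cat", "cat", "dog", "dog"])

# Regression: the targets are real values.
y_reg = np.array([2.0, 4.0, 6.0, 8.0])


def predict_nearest_label(x_new):
    """Classify by copying the label of the closest training sample."""
    idx = np.argmin(np.abs(X - x_new))
    return str(y_class[idx])


def predict_linear_value(x_new):
    """Regress by fitting a least-squares line and evaluating it at x_new."""
    slope, intercept = np.polyfit(X, y_reg, deg=1)
    return float(slope * x_new + intercept)


label = predict_nearest_label(3.6)  # always one of the known categories
value = predict_linear_value(3.6)   # any real number
print(label, value)
```

The classifier can only ever return one of the categories it has seen, whereas the regressor can return any number on the real line. That difference in the output, not in the input features, is what separates the two kinds of problem.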
Perhaps the easiest way to figure out whether we are dealing with a classification or regression problem is to ask ourselves the following question: What are we actually trying to predict? The answer is given in the following figure:
Differentiating between classification and regression problems
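The question "What are we actually trying to predict?" can also be asked of the target array itself. The following is a rough sketch, not a definitive rule: `describe_task` is a hypothetical helper, and it assumes that floating-point targets mean regression while everything else (strings, integers, booleans) is a set of categories. Real data sets can violate that assumption, for example when class labels are stored as floats.

```python
import numpy as np


def describe_task(targets):
    """Guess the kind of supervised learning problem from the target values.

    Assumption: floating-point targets are real values (regression);
    any other dtype is treated as a set of categories (classification).
    """
    targets = np.asarray(targets)
    if np.issubdtype(targets.dtype, np.floating):
        return "regression"

    n_classes = len(np.unique(targets))
    if n_classes < 2:
        raise ValueError("need at least two distinct categories")
    if n_classes == 2:
        return "binary classification"
    return "multi-class classification"


print(describe_task(["cat", "dog", "cat"]))
print(describe_task(["sunny", "rainy", "cloudy"]))
print(describe_task([101.5, 99.2, 100.7]))
```

The three calls correspond to the examples in the text: cat versus dog, tomorrow's weather, and stock prices.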