
Supervised learning in practice with Python

As we said earlier, supervised learning algorithms learn from examples of paired inputs and outputs to approximate the function that maps one to the other. The resulting model can then predict outputs for unseen inputs.

It's conventional to denote inputs as x and outputs as y; both can be numerical or categorical.

Depending on the type of the output, we can distinguish two kinds of supervised learning:

  • Classification
  • Regression

Classification is a task where the output variable can take a finite number of values, called categories (or classes). An example of classification would be predicting the type of a flower (output) given its sepal length (input). Classification can be further divided into subtypes:

  • Binary classification: The task of predicting which of two classes an instance belongs to
  • Multiclass classification: The task (also known as multinomial classification) of predicting the most probable label (class), out of more than two, for each instance
  • Multilabel classification: The task of predicting a set of labels for each instance, where several labels can apply to the same input at once
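
The difference between the three subtypes shows up in the shape of the target values. The following is a minimal sketch using only the Python standard library; the toy targets and the `describe_target` helper are hypothetical, introduced here purely for illustration.

```python
# Hypothetical toy targets, one entry per instance.
binary_y = [0, 1, 1, 0]                                    # two possible classes
multiclass_y = ["setosa", "versicolor", "virginica", "setosa"]  # one of several classes
multilabel_y = [{"news", "sports"}, {"politics"}, set(), {"news"}]  # a set of labels each


def describe_target(y):
    """Guess the classification setting from the shape of the targets."""
    # A collection of labels per instance indicates the multilabel setting.
    if all(isinstance(t, (set, frozenset, list, tuple)) for t in y):
        return "multilabel"
    # Otherwise each instance has exactly one label: count the distinct classes.
    n_classes = len(set(y))
    return "binary" if n_classes <= 2 else "multiclass"


for name, y in [("binary_y", binary_y),
                ("multiclass_y", multiclass_y),
                ("multilabel_y", multilabel_y)]:
    print(name, "->", describe_target(y))
```

Note that an instance in the multilabel setting may carry no label at all (the empty set above), whereas in the binary and multiclass settings every instance has exactly one class.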

Regression is a task where the output variable is continuous, such as a price or a temperature. Two algorithms are commonly introduced under this name:

  • Linear regression: This finds linear relationships between inputs and outputs
  • Logistic regression: Despite its name, this is normally used for classification; it estimates the probability of a binary output
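
To make the contrast concrete, here is a minimal sketch using only the Python standard library. It fits a straight line to one input by ordinary least squares and shows the logistic (sigmoid) function that turns a linear score into a probability. The function names, the toy data, and the weights `w` and `b` are assumptions made for illustration and are not taken from any particular library.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print("linear prediction for x=5:", slope * 5.0 + intercept)

# Logistic regression models P(y = 1 | x) as sigmoid(w * x + b).
# The weights below are made up; a real model would learn them from data.
w, b = 1.5, -3.0
print("probability of the positive class for x=2:", sigmoid(w * 2.0 + b))
```

The linear model returns an unbounded continuous value, while the logistic model returns a number between 0 and 1 that is usually thresholded to obtain a class.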

In general, a supervised learning problem is tackled by performing the following standard steps:

  1. Performing data cleaning to make sure the data we are using is as accurate and descriptive as possible.
  2. Executing the feature engineering process, which involves creating new features from the existing ones to improve the algorithm's performance.
  3. Transforming input data into something that our algorithm can understand, which is known as data transformation. Some algorithms, such as neural networks, don't work well with unscaled data, because they tend to give more importance to inputs with a larger magnitude.
  4. Choosing an appropriate model (or a few of them) for the problem.
  5. Choosing an appropriate metric to measure the effectiveness of our algorithm.
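  6. Training the model using a subset of the available data, called the training set. The data transformations are also calibrated (fitted) on this training set only, so that no information from the remaining data leaks into the model.
  7. Testing the model on data that was not used for training, using the chosen metric.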
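
The steps above can be sketched end to end on a toy problem. The following example uses only the Python standard library; the data set, the choice of a nearest-centroid classifier, and all function names are assumptions made for illustration rather than a recommended implementation. Cleaning and feature engineering (steps 1 and 2) are skipped because the toy data is already tidy.

```python
import random
import statistics

# Hypothetical toy data set: ((feature_1, feature_2), label). The second
# feature deliberately has a much larger magnitude than the first.
DATA = [
    ((1.0, 200.0), 0), ((1.2, 220.0), 0), ((0.9, 210.0), 0), ((1.1, 190.0), 0),
    ((3.0, 800.0), 1), ((3.2, 780.0), 1), ((2.9, 820.0), 1), ((3.1, 790.0), 1),
]


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and return (training set, test set)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_scaler(rows):
    """Step 3: compute per-feature mean and standard deviation."""
    columns = list(zip(*(x for x, _ in rows)))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero
    return means, stds


def scale(x, scaler):
    """Standardize one input with a previously fitted scaler."""
    means, stds = scaler
    return tuple((v - m) / s for v, m, s in zip(x, means, stds))


def fit_centroids(rows, scaler):
    """Step 4: a nearest-centroid model, one mean point per class."""
    by_label = {}
    for x, label in rows:
        by_label.setdefault(label, []).append(scale(x, scaler))
    return {label: tuple(statistics.mean(c) for c in zip(*points))
            for label, points in by_label.items()}


def predict(x, centroids, scaler):
    """Return the label whose centroid is closest to the scaled input."""
    z = scale(x, scaler)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(z, centroids[label])))


def accuracy(rows, centroids, scaler):
    """Step 5: the metric, here the fraction of correct predictions."""
    hits = sum(predict(x, centroids, scaler) == label for x, label in rows)
    return hits / len(rows)


train_rows, test_rows = split(DATA)
scaler = fit_scaler(train_rows)                  # step 6: fitted on training data only
centroids = fit_centroids(train_rows, scaler)    # step 6: train the model
test_accuracy = accuracy(test_rows, centroids, scaler)  # step 7: test on held-out data
print(f"test accuracy: {test_accuracy:.2f}")
```

The scaler is fitted on `train_rows` and merely applied to `test_rows`, which mirrors the point made in step 6: the test data plays no part in calibrating the transformations.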