- Machine Learning with Scala Quick Start Guide
- Md. Rezaul Karim
- 2021-06-24 14:32:00
Supervised learning
Supervised learning is the simplest and best-known machine learning task. The model learns from a set of labeled examples, in which the category that each input belongs to is already known, as shown in the following diagram:

The preceding diagram shows a typical supervised learning workflow. An actor (for example, a data scientist or data engineer) performs extract, transform, load (ETL) and the necessary feature engineering (including feature extraction, selection, and so on) to obtain data with features and labels that can be fed into the model. The actor then splits the data into training, validation (also called development), and test sets. The training set is used to train an ML model, the validation set is used to check the trained model for overfitting and to tune regularization, and the test set (that is, unseen data) is used to evaluate the model's final performance.
If the performance is not satisfactory, the actor can tune the model further through hyperparameter optimization to find the best model. Finally, the best model is deployed in a production-ready environment. The following diagram summarizes these steps in a nutshell:

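The train, validate, and tune loop described above can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not the book's implementation: `TuningSketch`, `Example`, `fit`, `mse`, and `tune` are hypothetical names, and the model is a one-parameter ridge-style fit through the origin chosen only because its regularization strength `lambda` is an easy hyperparameter to tune.

```scala
object TuningSketch {
  // Hypothetical labeled example: one feature x and one numeric label y.
  final case class Example(x: Double, y: Double)

  // Fit y ≈ w * x with an L2 penalty lambda * w^2.
  // Closed form: w = sum(x * y) / (sum(x * x) + lambda).
  def fit(train: Seq[Example], lambda: Double): Double = {
    val sxy = train.map(e => e.x * e.y).sum
    val sxx = train.map(e => e.x * e.x).sum
    sxy / (sxx + lambda)
  }

  // Mean squared error of the fitted weight w on a dataset.
  def mse(w: Double, data: Seq[Example]): Double =
    data.map(e => math.pow(e.y - w * e.x, 2)).sum / data.size

  // Train one model per candidate lambda on the training set and keep the
  // one with the lowest validation error.
  // Returns (best lambda, fitted weight, validation error).
  def tune(train: Seq[Example],
           valid: Seq[Example],
           lambdas: Seq[Double]): (Double, Double, Double) =
    lambdas.map { lambda =>
      val w = fit(train, lambda)
      (lambda, w, mse(w, valid))
    }.minBy(_._3)
}
```

After `tune` picks a hyperparameter, the selected weight would be scored once with `mse` on the held-out test set, so that the test error is not influenced by the tuning itself.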
Over the whole life cycle, many actors might be involved (for example, a data engineer, a data scientist, or an ML engineer), performing each step independently or collaboratively. Supervised learning covers classification and regression tasks. Classification predicts which class a data point belongs to, that is, a discrete value: the label of the class attribute. Regression, on the other hand, predicts a continuous value, that is, a numeric prediction of the target attribute.
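The difference between the two tasks shows up in the type of the prediction. The sketch below uses k-nearest neighbors, an algorithm picked here purely for illustration and not taken from the text: the same neighbor lookup yields a discrete label by majority vote (classification) or a continuous value by averaging (regression). `KnnSketch` and all of its members are hypothetical names.

```scala
object KnnSketch {
  type Features = Vector[Double]

  // Euclidean distance between two feature vectors of equal length.
  private def distance(a: Features, b: Features): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Labels of the k training examples closest to the query point.
  private def nearest[L](train: Seq[(Features, L)], query: Features, k: Int): Seq[L] =
    train.sortBy { case (f, _) => distance(f, query) }.take(k).map(_._2)

  // Classification: the prediction is a discrete class label (majority vote).
  def classify(train: Seq[(Features, String)], query: Features, k: Int): String =
    nearest(train, query, k).groupBy(identity).maxBy(_._2.size)._1

  // Regression: the prediction is a continuous value (mean of neighbor targets).
  def regress(train: Seq[(Features, Double)], query: Features, k: Int): Double = {
    val targets = nearest(train, query, k)
    targets.sum / targets.size
  }
}
```

Note that `classify` returns a `String` while `regress` returns a `Double`; everything else in the two methods is shared.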
In supervised learning, the input dataset used for the learning process is split randomly into three sets, for example, 60% for the training set, 10% for the validation set, and the remaining 30% for the test set.
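A random 60/10/30 split of this kind might look as follows in plain Scala. This is a sketch under assumptions: `SplitSketch` and `randomSplit` are hypothetical names, the data is assumed to fit in memory, and the fixed seed is there only to make the split reproducible.

```scala
import scala.util.Random

object SplitSketch {
  // Shuffle the data with a seeded generator, then cut it into
  // training, validation, and test sets. Whatever remains after the
  // training and validation fractions becomes the test set.
  def randomSplit[A](data: Seq[A],
                     trainFrac: Double,
                     validFrac: Double,
                     seed: Long): (Seq[A], Seq[A], Seq[A]) = {
    require(trainFrac >= 0 && validFrac >= 0 && trainFrac + validFrac <= 1.0,
      "fractions must be non-negative and sum to at most 1.0")
    val shuffled = new Random(seed).shuffle(data)
    val nTrain = math.round(data.size * trainFrac).toInt
    val nValid = math.round(data.size * validFrac).toInt
    val (train, rest) = shuffled.splitAt(nTrain)
    val (valid, test) = rest.splitAt(nValid)
    (train, valid, test)
  }
}
```

For example, `SplitSketch.randomSplit(1 to 100, 0.6, 0.1, seed = 42L)` yields sets of 60, 10, and 30 elements. With Apache Spark, `Dataset.randomSplit(Array(0.6, 0.1, 0.3), seed)` plays a similar role, although there the weights are treated as sampling probabilities, so the resulting sizes are only approximate.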