How it works...

Model evaluation is a key step in any machine learning process, and it differs between supervised and unsupervised models. In supervised models, the quality of the predictions plays the major role, whereas in unsupervised models, homogeneity within clusters and heterogeneity across clusters play the major role.

Some widely used model evaluation metrics for regression models (including cross-validation) are as follows:

  • Coefficient of determination
  • Root mean squared error
  • Mean absolute error
  • Akaike or Bayesian information criterion
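The regression metrics above can be computed directly from the true and predicted values. The sketch below uses only the Python standard library; the function name `regression_metrics` and its `n_params` argument are illustrative, not from any particular package. The AIC/BIC lines assume a least-squares fit with Gaussian errors, and conventions differ on whether the noise variance is counted in `n_params`. Under cross-validation, the same function would simply be applied to each held-out fold and the results averaged.

```python
import math

def regression_metrics(y_true, y_pred, n_params):
    """Return R^2, RMSE, MAE, AIC and BIC for a set of predictions.

    Assumption: AIC/BIC use the Gaussian log-likelihood of the residuals,
    evaluated at the maximum-likelihood noise variance sse / n.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)              # sum of squared errors
    mean_y = sum(y_true) / n
    sst = sum((t - mean_y) ** 2 for t in y_true)     # total sum of squares
    r2 = 1.0 - sse / sst                             # coefficient of determination
    rmse = math.sqrt(sse / n)                        # root mean squared error
    mae = sum(abs(r) for r in residuals) / n         # mean absolute error
    log_lik = -0.5 * n * (math.log(2 * math.pi * sse / n) + 1)
    aic = 2 * n_params - 2 * log_lik                 # Akaike information criterion
    bic = n_params * math.log(n) - 2 * log_lik       # Bayesian information criterion
    return {"r2": r2, "rmse": rmse, "mae": mae, "aic": aic, "bic": bic}

metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0], n_params=2)
print(metrics)  # r2 ~ 0.949, rmse ~ 0.612, mae = 0.5
```

Lower RMSE, MAE, AIC, and BIC are better, while a higher R² is better; AIC and BIC are only meaningful when comparing models fitted to the same data.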

Some widely used model evaluation metrics for classification models (including cross-validation) are as follows:

  • Confusion matrix (accuracy, precision, recall, and F1-score)
  • Gain or lift charts
  • Area under the ROC (receiver operating characteristic) curve
  • Concordant and discordant ratio
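Several of the classification metrics above come from simple counts. The following standard-library sketch derives accuracy, precision, recall, and F1-score from the confusion matrix, and counts concordant and discordant (positive, negative) pairs of scored observations. It relies on the known identity that ROC AUC equals the share of concordant pairs with ties counted as half. The function names, the 0/1 label coding, and the 0.5 threshold are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels coded 1 (positive) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def concordance(y_true, scores):
    """Compare the score of every positive case with every negative case.

    Concordant: the positive case scores higher; discordant: lower; else tied.
    AUC = (concordant + 0.5 * tied) / total pairs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    total = len(pos) * len(neg)
    conc = sum(1 for p in pos for n in neg if p > n)
    disc = sum(1 for p in pos for n in neg if p < n)
    ties = total - conc - disc
    auc = (conc + 0.5 * ties) / total
    return conc, disc, ties, auc

y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [int(s >= 0.5) for s in scores]   # hypothetical 0.5 cut-off
print(binary_metrics(y_true, y_pred))      # all four metrics ~ 0.667 here
print(concordance(y_true, scores))         # (7, 2, 0, 0.777...)
```

The confusion-matrix metrics depend on the chosen threshold, whereas the concordance counts and AUC use only the ranking of the scores.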

Some widely used evaluation metrics for unsupervised (clustering) models are as follows:

  • Contingency tables
  • Sum of squared errors between clustered objects and their cluster centers (centroids)
  • Silhouette value
  • Rand index
  • Matching index
  • Pairwise and adjusted pairwise precision and recall (primarily used in NLP)
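Three of the clustering measures above can be sketched in a few lines of standard-library Python: the within-cluster sum of squared errors, the mean silhouette value, and the Rand index for comparing two clusterings. The function names and the toy points are illustrative; Euclidean distance is assumed throughout, and `mean_silhouette` assumes at least two clusters.

```python
import math
from itertools import combinations

def within_cluster_sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    sse = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        sse += sum(sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members)
    return sse

def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    labels = list(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of p's own cluster
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def rand_index(labels_a, labels_b):
    """Share of point pairs on which two clusterings agree (together in both, or apart in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = 0
    for i, j in pairs:
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]):
            agree += 1
    return agree / len(pairs)

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]
print(within_cluster_sse(points, labels))      # 1.0
print(mean_silhouette(points, labels))         # ~ 0.86, i.e. compact, well-separated clusters
print(rand_index(labels, [0, 0, 1, 2]))        # 5 of 6 pairs agree, ~ 0.833
```

The Rand index ignores how clusters are numbered, so `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0; the adjusted Rand index additionally corrects for chance agreement.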

Bias and variance are two key error components of any supervised model, and the trade-off between them plays a vital role in model tuning and selection. Bias arises from incorrect or overly simple assumptions that a predictive model makes about the relationship it is learning, whereas variance arises from the model's sensitivity to the particular training dataset it was fitted on. In other words, high bias leads to underfitting and high variance leads to overfitting.
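For squared-error loss, the trade-off can be stated precisely with the standard decomposition below. It assumes the target is generated as y = f(x) + ε, where the noise ε has zero mean and variance σ² and is independent of the training set, and the expectations are taken over training sets and noise; \hat{f} is the model fitted to a given training set.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Making a model more flexible usually shrinks the first term and inflates the second; the σ² term cannot be reduced by any choice of model.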

Bias stems from assumptions about the functional form of the target. Hence, it dominates in parametric models such as linear regression, logistic regression, and linear discriminant analysis, whose outcomes are a fixed functional form of the input variables.

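Variance, on the other hand, measures how sensitive a model's predictions are to changes in the training dataset. It is governed largely by how flexible the learned functional form is: the less that form is constrained in advance, the more it can bend to fit the noise in a particular sample. Hence, variance dominates in non-parametric models such as decision trees, support vector machines, and K-nearest neighbors, whose outcomes are not a fixed functional form of the input variables. In other words, poorly chosen hyperparameters of non-parametric models, such as a very deep decision tree or a very small k in K-nearest neighbors, can lead to overfitting.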
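The effect of a non-parametric hyperparameter on bias and variance can be checked empirically by refitting a model on many independently drawn training sets and inspecting its prediction at a single point. The sketch below does this for a one-dimensional K-nearest neighbors regressor. The true function (a sine wave), the noise level, the sample sizes, and all function names are hypothetical choices made for illustration.

```python
import math
import random

def true_f(x):
    """Hypothetical ground-truth function used to simulate data."""
    return math.sin(2 * math.pi * x)

def knn_predict(train, x0, k):
    """1-D k-NN regression: average the targets of the k training points nearest x0."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def bias_variance_at(x0, k, n_train=50, n_sets=500, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of the k-NN prediction at x0
    by refitting on n_sets independently simulated training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs = [rng.random() for _ in range(n_train)]
        train = [(x, true_f(x) + rng.gauss(0, noise_sd)) for x in xs]
        preds.append(knn_predict(train, x0, k))
    mean_pred = sum(preds) / n_sets
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_sets
    return bias_sq, variance

for k in (1, 5, 25, 50):
    b2, var = bias_variance_at(0.25, k)
    print(f"k={k:2d}  bias^2={b2:.4f}  variance={var:.4f}")
```

With k = 1 the prediction tracks a single noisy neighbor, so variance is high and bias is low; with k = 50 every training point is averaged, so the prediction barely changes between training sets but sits far from the peak of the true function, which is the underfitting end of the trade-off.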