
Evaluating the performance of your model

Evaluating the predictive performance of a model requires defining a measure of the quality of its predictions. Several metrics are available for both regression and classification; the ones used in the context of Amazon ML are the following:

  • RMSE for regression: The root mean squared error is the square root of the mean squared difference between the true outcome values and their predictions.
  • F1 score and ROC-AUC for classification: Amazon ML uses logistic regression for binary classification problems. For each prediction, logistic regression returns a value between 0 and 1, which is interpreted as the probability that the sample belongs to one of the two classes. A probability lower than 0.5 indicates that the sample belongs to the first class, while a probability higher than 0.5 indicates that it belongs to the second class. The decision therefore depends heavily on the threshold, a value which we can modify.
  • Denoting one class as positive and the other as negative, we have four possibilities, depicted in the following table:

                     Actual Yes              Actual No
    Predicted Yes    True Positive (TP)      False Positive (FP)
    Predicted No     False Negative (FN)     True Negative (TN)

  • This matrix is called a confusion matrix (https://en.wikipedia.org/wiki/Confusion_matrix). It defines four indicators of the performance of a classification model:
    • TP: How many Yes were correctly predicted Yes
    • FP: How many No were wrongly predicted Yes
    • FN: How many Yes were wrongly predicted No
    • TN: How many No were correctly predicted No
  • From these four indicators, we can define the following metrics; a short code sketch after this list puts them into practice:
    • Recall: This is the fraction of actual positives that are correctly predicted as positive. Recall is also called True Positive Rate (TPR) or sensitivity. It is the probability of detection:

Recall = TP / (TP + FN)

    • Precision: This is the fraction of predicted positives that are actually positive:

Precision = TP / (TP + FP)

    • False Positive Rate (FPR): This is the fraction of actual negatives that are wrongly predicted as positive. It is the probability of a false alarm:

FPR = FP / (FP + TN)

    • Finally, the F1 score is defined as the harmonic mean of precision and recall, which reduces to the following:

F1 score = 2 TP / (2 TP + FP + FN)

    • An F1 score is always between 0 and 1, with 1 being the best possible value and 0 the worst.
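
To make the formulas concrete, here is a minimal Python sketch (the counts and values are illustrative toy numbers, not Amazon ML output) that computes recall, precision, the false positive rate, and the F1 score from the four confusion-matrix counts, together with RMSE for the regression case:

```python
import math

# Toy confusion-matrix counts (illustrative values only)
TP, FP, FN, TN = 80, 10, 20, 90

recall = TP / (TP + FN)            # true positive rate / sensitivity
precision = TP / (TP + FP)         # fraction of predicted positives that are correct
fpr = FP / (FP + TN)               # probability of a false alarm
f1 = 2 * TP / (2 * TP + FP + FN)   # harmonic mean of precision and recall

print(f"Recall:    {recall:.3f}")
print(f"Precision: {precision:.3f}")
print(f"FPR:       {fpr:.3f}")
print(f"F1 score:  {f1:.3f}")

# RMSE for regression: the square root of the mean squared difference
# between the true values and the predictions (toy values again)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.4, 2.9, 6.1]
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
print(f"RMSE:      {rmse:.3f}")
```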

As noted previously, these scores all depend on the threshold used to interpret the result of the logistic regression and decide which class a prediction belongs to. We can choose to vary that threshold. This is where the ROC-AUC comes in.
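
As a quick illustration of how the threshold drives the decision, the following sketch (hypothetical probabilities and labels, not Amazon ML output) converts predicted probabilities into class labels at two different thresholds and shows how the counts of true and false positives change:

```python
# Hypothetical predicted probabilities and true labels (1 = positive class)
probs  = [0.10, 0.35, 0.48, 0.52, 0.70, 0.90]
labels = [0,    0,    1,    0,    1,    1]

def classify(probabilities, threshold):
    """Turn probabilities into 0/1 predictions at the given threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

for threshold in (0.5, 0.3):
    preds = classify(probs, threshold)
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    print(f"threshold={threshold}: TP={tp}, FP={fp}")
```

Lowering the threshold catches more positives (higher recall) but also raises more false alarms (higher FPR); this trade-off is exactly what the ROC curve visualizes.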

If you plot the True Positive Rate (Recall) against the False Positive Rate for different values of the decision threshold, you obtain a graph like the following, called the Receiver Operating Characteristic or ROC curve:

  • The diagonal line indicates an equal probability of belonging to one class or the other. The closer the curve is to the upper-left corner, the better your model performs.
  • The ROC curve has been widely used since WWII, when it was first invented to detect enemy planes in radar signals.
  • Once you have the ROC curve, you can calculate the Area Under the Curve or AUC.
  • The AUC gives you a single score for your model, taking into account all possible values of the probability threshold from 0 to 1. The higher the AUC, the better; the sketch below illustrates how it is computed.
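
Amazon ML computes and plots these values for you in its evaluation console, but as a minimal sketch of the underlying mechanics (using scikit-learn and matplotlib, which are assumptions on my part, together with hypothetical scores), you can sweep the threshold, build the ROC curve, and compute the AUC like this:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Hypothetical true labels and predicted probabilities
y_true  = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.65, 0.7, 0.8, 0.9])

# One (FPR, TPR) point per decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# A single score summarizing the model over all thresholds
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")

# Plot the ROC curve against the diagonal "random guess" line
plt.plot(fpr, tpr, label=f"model (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()
```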