- Effective Amazon Machine Learning
- Alexis Perrier
Evaluating the performance of your model
Evaluating the predictive performance of a model requires defining a measure of the quality of its predictions. Several metrics are available for both regression and classification. Amazon ML uses the following:
- RMSE for regression: The root mean squared error is defined as the square root of the average squared difference between the true outcome values and their predictions:
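Written out, with $y_i$ the true values, $\hat{y}_i$ the model's predictions, and $n$ the number of records, the standard formula is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$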

- F1 score and ROC-AUC for classification: Amazon ML uses logistic regression for binary classification problems. For each prediction, logistic regression returns a value between 0 and 1, interpreted as the probability that the sample belongs to one of the two classes. A probability lower than the threshold (0.5 by default) assigns the sample to the first class, while a higher probability assigns it to the second class. The decision therefore depends strongly on the threshold value, which we can modify, as sketched below.
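As a minimal sketch of that decision rule (plain Python with made-up probabilities, not the Amazon ML API):

```python
# Hypothetical probabilities returned by a binary classifier for four samples.
probabilities = [0.12, 0.48, 0.51, 0.97]

threshold = 0.5  # the default cut-off; raising or lowering it changes the decisions

# A probability at or above the threshold is mapped to the second (positive) class,
# anything below it to the first (negative) class.
predictions = [1 if p >= threshold else 0 for p in probabilities]
print(predictions)  # [0, 0, 1, 1]
```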
- Denoting one class positive and the other negative, we have four possibilities, depicted in the following table:

|                 | Predicted positive  | Predicted negative  |
|-----------------|---------------------|---------------------|
| Actual positive | True Positive (TP)  | False Negative (FN) |
| Actual negative | False Positive (FP) | True Negative (TN)  |
- This matrix is called a confusion matrix (https://en.wikipedia.org/wiki/Confusion_matrix). It defines four indicators of the performance of a classification model (tallied in the short sketch after the list):
- TP (True Positives): how many actual Yes samples were correctly predicted as Yes
- FP (False Positives): how many actual No samples were wrongly predicted as Yes
- FN (False Negatives): how many actual Yes samples were wrongly predicted as No
- TN (True Negatives): how many actual No samples were correctly predicted as No
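As a small illustrative sketch (plain Python with made-up labels, not how Amazon ML reports its evaluation), the four counts can be tallied directly from true and predicted labels:

```python
# Made-up true and predicted labels (1 = Yes/positive, 0 = No/negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # Yes correctly predicted Yes
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # No wrongly predicted Yes
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Yes wrongly predicted No
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # No correctly predicted No

print(tp, fp, fn, tn)  # 3 1 1 3
```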
- From these four indicators, we can define the following metrics (a short code example follows the list):
- Recall: the fraction of actual positives that are correctly predicted as positive. Recall is also called the True Positive Rate (TPR) or sensitivity; it is the probability of detection:
Recall = TP / (TP + FN)
- Precision: the fraction of predicted positives that are actually positive:
Precision = TP / (TP + FP)
- False Positive Rate: the fraction of actual negatives that are wrongly predicted as positive; it is the probability of false alarm:
FPR = FP / (FP + TN)
- Finally, the F1-score is the harmonic mean of recall and precision, which can be written as follows:
F1-score = 2 TP / (2 TP + FP + FN)
- An F1-score is always between 0 and 1, with 1 the best value and 0 the worst.
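As a quick sketch (plain Python with made-up confusion-matrix counts, not the Amazon ML evaluation output), the four definitions translate directly into code:

```python
# Made-up confusion-matrix counts for illustration.
tp, fp, fn, tn = 80, 10, 20, 90

recall = tp / (tp + fn)            # true positive rate / sensitivity
precision = tp / (tp + fp)         # fraction of predicted positives that are correct
fpr = fp / (fp + tn)               # false positive rate / probability of false alarm
f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall

print(f"recall={recall:.2f} precision={precision:.2f} fpr={fpr:.2f} f1={f1:.2f}")
# recall=0.80 precision=0.89 fpr=0.10 f1=0.84
```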
As noted previously, these scores all depend on the threshold used to interpret the output of the logistic regression and decide which class a prediction belongs to. We can choose to vary that threshold, and this is where the ROC-AUC comes in.
If you plot the True Positive Rate (Recall) against the False Positive Rate for different values of the decision threshold, you obtain a graph like the following, called the Receiver Operating Characteristic or ROC curve:

- The diagonal line corresponds to a classifier that assigns classes at random and offers no discrimination between them. The closer the curve is to the upper-left corner, the better your model performs.
- The ROC curve has been widely used since WWII, when it was first invented to detect enemy planes in radar signals.
- Once you have the ROC curve, you can calculate the Area Under the Curve or AUC.
- The AUC gives you a single score for your model that takes into account all possible values of the probability threshold from 0 to 1. The higher the AUC, the better.
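To make the threshold sweep concrete, here is a rough sketch in plain Python (made-up labels and probabilities, not the Amazon ML evaluation itself) that traces the ROC points over all thresholds and approximates the AUC with the trapezoidal rule:

```python
# Made-up true labels and predicted probabilities for ten samples.
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9, 0.65, 0.3]

points = []
for threshold in sorted(set(y_prob)) + [1.1]:  # include one threshold above every score
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR) for this threshold

points.sort()  # order by increasing FPR before integrating
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.2f}")  # 0.92 for this toy data
```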