- Machine Learning Quick Reference
- Rahul Kumar
- 2021-08-20 10:05:08
H-measure
Binary classification maps independent variables to labels. For example, variables such as gender, income, number of existing loans, and on-time payment history get mapped to a score that helps us classify customers into good customers (higher propensity to pay) and bad customers.
Typically, practitioners rely on the misclassification rate (or a form derived from it), or treat the area under the curve (AUC) as the best evaluator of a classification model. You get the misclassification rate by dividing the number of misclassified examples by the total number of examples. But does this give us a fair assessment? Let's see. The misclassification rate keeps something important under wraps: most classifiers come with a tuning parameter, such as a decision threshold, and moving it trades false positives against false negatives. One classifier can therefore look identical to another on the misclassification rate while producing a very different mix of errors. Picking the AUC as the sole model evaluator acts as a double whammy: the AUC implicitly assumes different misclassification costs for different classifiers, which is not desirable. Using it is equivalent to using a different metric to evaluate each classification rule.
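To see how the misclassification rate hides the trade-off between error types, consider a minimal sketch (the labels and the `error_profile` helper below are illustrative, not from any library): two classifiers with the same overall error rate but opposite error profiles.

```python
# Two classifiers with the same misclassification rate can have
# very different error profiles.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

pred_a = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # leans toward false positives
pred_b = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # leans toward false negatives

def error_profile(y, p):
    """Return (misclassification rate, false positives, false negatives)."""
    fp = sum(1 for t, q in zip(y, p) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y, p) if t == 1 and q == 0)
    return (fp + fn) / len(y), fp, fn

print(error_profile(y_true, pred_a))  # (0.2, 2, 0)
print(error_profile(y_true, pred_b))  # (0.2, 0, 2)
```

Both classifiers report a 20% misclassification rate, yet one makes only false positives and the other only false negatives; if the two error types carry different costs, the single rate cannot distinguish them.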
As we have already discussed, the real test of any classifier takes place on unseen data, where performance typically drops by a few points. If, on top of that, we face scenarios like the preceding one, the decision support system will not perform well and will start producing misleading results.
The H-measure overcomes the problem of different misclassification costs being applied to different classifiers. It takes a severity ratio as input, which expresses how much more severe misclassifying a class 0 instance is than misclassifying a class 1 instance:
Severity Ratio = cost_0/cost_1
Here, cost_0 > 0 is the cost of misclassifying a class 0 datapoint as class 1, and cost_1 > 0 is the cost of misclassifying a class 1 datapoint as class 0.
It is sometimes more convenient to consider the normalized cost c = cost_0/(cost_0 + cost_1) instead. For example, severity.ratio = 2 implies that a false positive costs twice as much as a false negative.
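The definitions above can be sketched in a few lines. This is a hedged illustration of the cost bookkeeping only, not an implementation of the full H-measure; the helper names `normalized_cost` and `weighted_loss` are hypothetical.

```python
def normalized_cost(cost_0, cost_1):
    """Normalized cost c = cost_0 / (cost_0 + cost_1)."""
    return cost_0 / (cost_0 + cost_1)

def weighted_loss(fp, fn, n, cost_0, cost_1):
    """Cost-weighted misclassification loss per example:
    each class-0-as-1 error (false positive) costs cost_0,
    each class-1-as-0 error (false negative) costs cost_1."""
    return (fp * cost_0 + fn * cost_1) / n

# severity ratio = 2: a false positive costs twice a false negative
print(normalized_cost(2, 1))          # c ≈ 0.667
print(weighted_loss(2, 0, 10, 2, 1))  # 0.4
print(weighted_loss(0, 2, 10, 2, 1))  # 0.2
```

With this weighting, the two classifiers from the earlier example, indistinguishable by raw misclassification rate, now receive different losses (0.4 versus 0.2), which is exactly the asymmetry the severity ratio is meant to capture.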