H-measure

Binary classification maps a set of independent variables to one of two labels. For example, variables such as gender, income, number of existing loans, and whether payments were made on time can be combined into a score that helps us classify customers as good customers (more propensity to pay) or bad customers.

Typically, everyone seems to be caught up with the misclassification rate or a metric derived from it, while the area under the curve (AUC) is widely regarded as the best evaluator of a classification model. You get the misclassification rate by dividing the total number of misclassified examples by the total number of examples. But does this give us a fair assessment? Let's see. The misclassification rate keeps something important under wraps: it treats both kinds of error as equally bad. More often than not, classifiers come with a tuning parameter, the side effect of which tends to be favoring false positives over false negatives, or vice versa. Picking the AUC as the sole model evaluator can also act as a double whammy for us. The AUC implicitly applies different misclassification costs to different classifiers, which is not desirable. This means that using it is equivalent to using different metrics to evaluate different classification rules.
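To make the point about the misclassification rate concrete, here is a minimal sketch in plain Python. The labels and predictions are made-up toy data, and the helper names `confusion_counts` and `misclassification_rate` are illustrative assumptions, not part of any library. It shows two classifiers with the same misclassification rate whose errors are of opposite kinds.

```python
def confusion_counts(y_true, y_pred):
    """Return (false_positives, false_negatives) for binary 0/1 labels.

    Class 1 is treated as the positive class (an assumption of this sketch).
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def misclassification_rate(y_true, y_pred):
    """Misclassified examples divided by the total number of examples."""
    fp, fn = confusion_counts(y_true, y_pred)
    return (fp + fn) / len(y_true)


# Toy data: five class 0 examples followed by five class 1 examples.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
pred_a = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]  # classifier A: only false positives
pred_b = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # classifier B: only false negatives

rate_a = misclassification_rate(y_true, pred_a)
rate_b = misclassification_rate(y_true, pred_b)

print(rate_a, confusion_counts(y_true, pred_a))
print(rate_b, confusion_counts(y_true, pred_b))
```

Both classifiers get the same rate, yet one errs only by raising false alarms and the other only by missing positives. If the two error types carry different costs, the rate alone cannot tell these classifiers apart.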

As we have already discussed, the real test of any classifier takes place on unseen data, where its performance typically drops by a few decimal points. On top of that, if we are in a scenario like the preceding one, the decision support system will not be able to perform well. It will start producing misleading results.

The H-measure overcomes the problem of incurring different misclassification costs for different classifiers. It needs a severity ratio as input, which expresses how much more severe misclassifying a class 0 instance is than misclassifying a class 1 instance:

Severity Ratio = cost_0/cost_1

Here, cost_0 > 0 is the cost of misclassifying a class 0 datapoint as class 1, and cost_1 > 0 is the cost of misclassifying a class 1 datapoint as class 0.

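It is sometimes more convenient to consider the normalized cost c = cost_0/(cost_0 + cost_1) instead, which always lies between 0 and 1. For example, severity.ratio = 2 implies that a false positive (a class 0 instance labeled as class 1) costs twice as much as a false negative, which corresponds to c = 2/3.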
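The relationship between the severity ratio and the normalized cost can be sketched in a few lines of Python. Dividing the numerator and denominator of c by cost_1 gives c = SR/(1 + SR), where SR is the severity ratio. The function names `normalized_cost` and `weighted_cost` and the error counts used below are hypothetical, chosen only for illustration. This is not the H-measure computation itself, only the cost conversion it takes as input.

```python
def normalized_cost(severity_ratio):
    """Convert severity ratio cost_0/cost_1 into c = cost_0/(cost_0 + cost_1)."""
    if severity_ratio <= 0:
        raise ValueError("severity ratio must be positive")
    return severity_ratio / (1.0 + severity_ratio)


def weighted_cost(false_positives, false_negatives, severity_ratio):
    """Total cost when each false positive costs c and each false negative 1 - c.

    Class 1 is treated as the positive class, so a false positive is a
    class 0 datapoint labeled as class 1 (an assumption of this sketch).
    """
    c = normalized_cost(severity_ratio)
    return c * false_positives + (1.0 - c) * false_negatives


c = normalized_cost(2)  # severity ratio of 2, so c = 2/3

# Hypothetical error counts: same number of mistakes, opposite kinds.
cost_fp_heavy = weighted_cost(false_positives=2, false_negatives=0, severity_ratio=2)
cost_fn_heavy = weighted_cost(false_positives=0, false_negatives=2, severity_ratio=2)

print(c, cost_fp_heavy, cost_fn_heavy)
```

With a severity ratio of 2, the classifier that makes two false positives is charged twice as much as the one that makes two false negatives, even though both make the same number of mistakes.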
