
A trade-off between homogeneity and completeness using the V-measure

Readers familiar with supervised learning should know the concept of the F-score (or F-measure), which is the harmonic mean of precision and recall. The same kind of trade-off can also be employed when evaluating clustering results given the ground truth.

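In fact, in many cases, it's helpful to have a single measure that takes into account both homogeneity and completeness. Such a result can be easily achieved using the V-measure (or V-score), which is defined as the harmonic mean of the two values:

$$V = \frac{2 \cdot h \cdot c}{h + c}$$

Here, h and c denote the homogeneity and completeness scores, respectively.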
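To make the definition concrete, the following minimal sketch computes homogeneity (h), completeness (c), and their harmonic mean 2hc / (h + c) from scratch, using only the standard library. It assumes the usual entropy-based definitions, h = 1 - H(true|pred) / H(true) and c = 1 - H(pred|true) / H(pred). The helper names (entropy, conditional_entropy, v_measure) and the toy labels are hypothetical and serve only as an illustration:

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def conditional_entropy(a, b):
    """H(a | b), computed as H(a, b) - H(b)."""
    return entropy(list(zip(a, b))) - entropy(b)


def v_measure(y_true, y_pred):
    """Return (homogeneity, completeness, V) for two label sequences."""
    h_true = entropy(y_true)
    h_pred = entropy(y_pred)
    # By convention, a score is 1 when the reference entropy is zero
    homogeneity = 1.0 if h_true == 0 else 1.0 - conditional_entropy(y_true, y_pred) / h_true
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(y_pred, y_true) / h_pred
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2.0 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Toy labels (an assumption for illustration, not the Breast Cancer data)
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 2, 2]

h, c, v = v_measure(y_true, y_pred)
print('Homogeneity: {:.4f}'.format(h))
print('Completeness: {:.4f}'.format(c))
print('V-Score: {:.4f}'.format(v))
```

Because a harmonic mean never exceeds the arithmetic mean of its two inputs and is pulled toward the smaller one, a low value of either homogeneity or completeness is enough to drag the V-measure down.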

For the Breast Cancer Wisconsin dataset, the V-measure is as follows:

from sklearn.metrics import v_measure_score

print('V-Score: {}'.format(v_measure_score(kmdff['diagnosis'], kmdff['prediction'])))

The output of the previous snippet is as follows:

V-Score: 0.46479332792160793

As expected, the V-Score is an average measure that, in this case, is negatively influenced by a lower homogeneity. Of course, this index doesn't provide any new information; it's helpful only to summarize completeness and homogeneity in a single value.

However, with a few simple but tedious mathematical manipulations, it's possible to prove that the V-measure is also symmetric (that is, V(Ypred|Ytrue) = V(Ytrue|Ypred)); therefore, given two independent assignments Y1 and Y2, V(Y1|Y2) is a measure of agreement between them. This use is not very common, because other measures are better suited to the task. However, such a score could be employed, for example, to check whether two algorithms (possibly based on different strategies) tend to produce the same assignments or whether they are discordant. In the latter case, even if the ground truth is unknown, the data scientist can conclude that one strategy is not as effective as the other and start an exploration process in order to find the optimal clustering algorithm.
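The agreement check between two algorithms can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic blobs generated with make_blobs rather than the Breast Cancer Wisconsin dataset, and the choice of K-Means and agglomerative clustering, the number of clusters, and the variable names are all arbitrary:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import v_measure_score

# Synthetic dataset (an assumption made only for this example)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=1000)

# Two clustering strategies applied to the same samples
labels_km = KMeans(n_clusters=3, n_init=10, random_state=1000).fit_predict(X)
labels_ag = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# No ground truth is involved: each assignment is scored against the other
v_12 = v_measure_score(labels_km, labels_ag)
v_21 = v_measure_score(labels_ag, labels_km)

print('V(KM|AG): {}'.format(v_12))
print('V(AG|KM): {}'.format(v_21))
```

The two printed values should coincide up to floating-point error, which reflects the symmetry of the measure. A value close to 1 suggests that the two algorithms produce nearly the same partition, while a low value indicates that they are discordant and that further exploration is needed.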
