Table 4 Clustering comparison functions implemented in CDLIB

From: CDLIB: a python library to extract, compare and evaluate communities from complex networks

Name

Description

Formula

Reference

ami

Adjusted Mutual Information is an adjustment of the Mutual Information score to account for chance.

\(\frac {MI(X, Y) - E(MI(X, Y))} {\max(H(X), H(Y)) - E(MI(X, Y))}\)

(Vinh et al. 2010)

ari

The Adjusted Rand Index is the Rand Index (RI) corrected for chance. The Rand Index measures the similarity between two clusterings by considering all pairs of samples and counting the pairs that are assigned to the same or to different clusters in both the predicted and the true clusterings.

\(\frac {RI - Expected_{RI}}{\max(RI) - Expected_{RI}}\)

(Hubert and Arabie 1985)

closeness

Closeness of community size distributions.

\(\frac {1}{2} \sum _{i=1}^{r} \sum _{j=1}^{s} \min \left \{\frac {x_{a}\left (n^{i}_{a}\right)}{N_{a}}, \frac {x_{b}\left (n^{j}_{b}\right)}{N_{b}}\right \} \delta _{1}\left (n^{i}_{a}, n^{j}_{b}\right)\)

(Dao et al. 2018)

f1

Average F1 score (harmonic mean of Precision and Recall) of the optimal matches among the input partitions.

\(2 \times \frac {precision \times recall} {precision + recall}\)

(Rossetti et al. 2016)

nf1

Normalized version of F1 that corrects the resemblance score by taking into account the degree of node overlap and the clustering coverage.

\( \frac {F1\times Coverage}{Redundancy}\)

(Rossetti 2017), (Rossetti et al. 2016)

nmi

Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score that scales the results between 0 (no mutual information) and 1 (perfect correlation).

\(\frac {H(X) + H(Y) - H(X, Y)}{(H(X) + H(Y))/2}\)

(Lancichinetti et al. 2009)

onmi-LFK

Original extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions.

\( 1 - \frac {1}{2}\bigg (\frac {H(X|Y)}{H(X)}+\frac {H(Y|X)}{H(Y)}\bigg)\)

(Lancichinetti et al. 2009)

onmi-MGH

Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions, based on max normalization.

\( \frac {I(X:Y)}{\max(H(X),H(Y))}\)

(McDaid et al. 2011)

omega

Resemblance index defined for overlapping, complete-coverage clusterings.

\( \frac {Obs(s1, s2) - Exp(s1, s2)} {1 - Exp(s1, s2)}\)

(Murray et al. 2012)

vi

Variation of Information between two node partitions.

\(H(X) + H(Y) - 2MI(X, Y)\)

(Meilă 2007)
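
The nmi and vi rows above are built from the same two ingredients, the entropies H(X) and H(Y) and the mutual information MI(X, Y). The following is a minimal sketch of those two formulas in plain Python, assuming non-overlapping partitions given as one label per node. It is not CDLIB's implementation, and the function names (`entropy`, `mutual_information`, `nmi`, `vi`) are hypothetical helpers introduced here for illustration only.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(X) of a label sequence, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y), with H(X, Y) the joint entropy."""
    joint = list(zip(x, y))  # one (label_in_x, label_in_y) pair per node
    return entropy(x) + entropy(y) - entropy(joint)


def nmi(x, y):
    """MI divided by the arithmetic mean of H(X) and H(Y)."""
    denom = (entropy(x) + entropy(y)) / 2
    # Assumed convention: two single-cluster partitions count as identical.
    return 1.0 if denom == 0 else mutual_information(x, y) / denom


def vi(x, y):
    """Variation of Information: H(X) + H(Y) - 2 MI(X, Y)."""
    return entropy(x) + entropy(y) - 2 * mutual_information(x, y)


# Node i carries label a[i] in one partition and b[i] (or c[i]) in another.
a = [0, 0, 0, 1, 1, 1]
b = [1, 1, 1, 0, 0, 0]  # same grouping as a, cluster labels swapped
c = [0, 1, 0, 1, 0, 1]  # a different grouping of the same six nodes

print("a vs b:", nmi(a, b), vi(a, b))
print("a vs c:", nmi(a, c), vi(a, c))
```

Because both scores depend only on how nodes are grouped, relabeling the clusters (as in `b`) leaves them at their best values, NMI equal to 1 and VI equal to 0, while a genuinely different grouping (as in `c`) lowers NMI and raises VI.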