Table 3 Fitness functions implemented in CDLIB

From: CDLIB: a python library to extract, compare and evaluate communities from complex networks

Name

Description

Formula

Reference

average_internal_degree

The average internal degree of the community set.

\(\frac {2e_{C}}{n_{C}}\)

(Radicchi et al. 2004)

avg_distance

The average path length across all possible pairs of nodes composing the community set.

 

(Orman et al. 2012)

avg_embeddedness

The average embeddedness of nodes within the community.

\( \frac {1}{n_{C}} \sum _{i \in C} \frac {k^{int}_{iC}}{\Gamma (i)}\)

(Orman et al. 2012)

avg_odf

Average fraction of edges of a node of a community that point outside the community itself.

\(\frac {1}{n_{C}} \sum _{u \in C} \frac {|\{(u,v)\in E: v \not \in C\}|}{\Gamma (u)}\)

(Flake et al. 2000)

avg_transitivity

The average clustering coefficient of its nodes w.r.t. their connection within the community itself.

 

(Orman et al. 2012)

conductance

Fraction of total edge volume that points outside the community.

\(\frac {n_{C}}{2 e_{C}+n_{C}}\)

(Shi and Malik 2000)

cut_ratio

Fraction of existing edges (out of all possible edges) leaving the community.

\(\frac {n_{C}}{b_{C} (n - b_{C})}\)

(Fortunato 2010)

edges_inside

Number of edges internal to the community.

\(e_{C}\)

(Radicchi et al. 2004)

expansion

Number of edges per community node that point outside the cluster.

\(\frac {n_{C}}{b_{C}}\)

(Radicchi et al. 2004)

flake_odf

Fraction of nodes of the clustering that have fewer edges pointing inside than to the outside of their communities.

\(\frac {|\{ u: u \in C, |\{(u,v) \in E: v \in C \}| < \Gamma (u)/2 \}|}{n_{C}} \)

(Flake et al. 2000)

fraction_over_median_degree

Fraction of community nodes having internal degree higher than the median degree value.

\(\frac {|\{u: u \in C, |\{(u,v): v \in C\}| > d_{m}\}|}{n_{C}}\)

(Yang and Leskovec 2015)

hub_dominance

The ratio of the degree of its most connected node w.r.t. the theoretically maximal degree within the community.

\(\frac {max_{i \in C}(k^{int}_{iC})}{n_{C}-1}\)

(Orman et al. 2012)

internal_edge_density

The internal density of the community set.

\(\frac {e_{C}}{n_{C}(n_{C}-1)/2}\)

(Radicchi et al. 2004)

max_odf

Maximum fraction of edges of a node of a community that point outside the community itself.

\(max_{u \in C} \frac {|\{(u,v)\in E: v \not \in C\}|}{\Gamma (u)}\)

(Flake et al. 2000)

normalized_cut

Normalized variant of the Cut-Ratio.

\(\frac {n_{C}}{2e_{C}+n_{C}} + \frac {n_{C}}{2(|E|-e_{C})+n_{C}}\)

(Shi and Malik 2000)

scaled_density

The ratio of the community density w.r.t. the complete graph density.

\(\frac {2|E|}{n-1}\)

(Orman et al. 2012)

significance

Estimate the likelihood that the identified partition appears in a random graph.

\(\sum _{c}\binom {n_{C}}{2}D(p_{C} || p)\)

(Traag et al. 2015)

size

Number of community nodes.

  

surprise

Statistical approach that assumes that edges emerge randomly according to a hypergeometric distribution: the higher the surprise, the less likely it is that the clustering resulted from a random realization.

\( - log \sum _{i=m_{C}}^{min(|E|,M_{int})}\binom {|E|}{i} \left \langle q \right \rangle ^{i} (1-\left \langle q \right \rangle)^{|E|-i}\)

(Traag et al. 2015)

triangle_participation_ratio

Fraction of community nodes that belong to a triad.

\(\frac { | \{ u: u \in C, \{(v,w): v, w \in C, (u,v) \in E, (u,w) \in E, (v,w) \in E\} \not = \emptyset \} |}{n_{C}}\)

(Yang and Leskovec 2015)

erdos_renyi_modularity

Variation of the Newman-Girvan modularity that assumes that nodes in a network are connected randomly with a constant probability p.

\(\frac {1}{|E|}\sum _{C \in \{C_{1},\dots C_{k}\}} (e_{C} - \frac {|E|n_{C}(n_{C} -1)}{n(n-1)})\)

(Erdös and Rényi 1959)

link_modularity

Variation of the Girvan-Newman modularity for directed graphs with overlapping communities.

\(\frac {1}{|E|}\sum _{i,j \in C}[A_{ij}\delta (c_{i}, c_{j})-\frac {k^{out}_{iC}k^{in}_{jC}}{m}\delta (c_{i}, c_{j})]\)

(Nicosia et al. 2009)

modularity_density

Variation of the Erdos-Renyi modularity that includes information about community sizes in the expected density coefficient, so as to avoid neglecting small and dense communities.

\(\sum _{C \in \{C_{1},\dots C_{k}\}} \frac {1}{n_{C}} (\sum _{i \in C} k^{int}_{iC} - \sum _{i \in C} k^{out}_{iC}) \)

(Li et al. 2016)

newman_girvan_modularity

Difference between the fraction of intra-community edges of a clustering and the expected number of such edges if distributed according to a null model.

\(\frac {1}{|E|}\sum _{C \in \{C_{1},\dots C_{k}\}}(e_{C} - \frac {(2 e_{C} + l_{C})^{2}}{4|E|})\)

(Newman and Girvan 2004)

z_modularity

Variant of the standard modularity proposed to avoid the resolution limit.

\(\frac {\sum _{C \in \{C_{1},\dots C_{k}\}}\frac {m_{C}}{|E|}-\sum _{C \in \{C_{1},\dots C_{k}\}}\left (\frac {D_{C}}{2|E|}\right)^{2}}{\sqrt {\sum _{C \in \{C_{1},\dots C_{k}\}}\left (\frac {D_{C}}{2|E|}\right)^{2} \left (1- \sum _{C \in \{C_{1},\dots C_{k}\}} \left (\frac {D_{C}}{2|E|}\right)^{2}\right)^{2}}}\)

(Miyauchi and Kawase 2016)

  1. The upper part of the table groups the community-wise fitness scores, for which the library can compute the overall distribution as well as standard statistical indexes. The lower part of the table groups the implemented modularity-based quality scores.
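
Several of the community-wise formulas in the upper part of the table can be checked by hand on a small graph. The following is a minimal pure-Python sketch, not CDLIB's own implementation. It assumes an undirected graph without self-loops, that \(\Gamma (u)\) denotes the total degree of node u, that the community has at least two nodes, and that every community node has at least one edge. The toy edge list `EDGES` and the helper names `build_adjacency` and `community_scores` are hypothetical and introduced only for illustration.

```python
from itertools import combinations

# Toy undirected graph (hypothetical data): two triangles joined by one edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_scores(edges, community):
    """Score one community C; assumes |C| >= 2 and no isolated nodes in C."""
    adj = build_adjacency(edges)
    c = set(community)
    n_c = len(c)                                  # n_C: nodes in C
    k_int = {u: len(adj[u] & c) for u in c}       # internal degree of each node
    degree = {u: len(adj[u]) for u in c}          # assumed meaning of Gamma(u)
    e_c = sum(k_int.values()) // 2                # e_C: each internal edge counted twice
    # Per-node fraction of edges that leave the community.
    odf = [(degree[u] - k_int[u]) / degree[u] for u in c]
    # Nodes of C with two neighbours inside C that are themselves adjacent.
    in_triangle = [
        u for u in c
        if any(w in adj[v] for v, w in combinations(adj[u] & c, 2))
    ]
    return {
        "edges_inside": e_c,
        "average_internal_degree": 2 * e_c / n_c,
        "internal_edge_density": e_c / (n_c * (n_c - 1) / 2),
        "hub_dominance": max(k_int.values()) / (n_c - 1),
        "avg_odf": sum(odf) / n_c,
        "max_odf": max(odf),
        "flake_odf": sum(1 for u in c if k_int[u] < degree[u] / 2) / n_c,
        "triangle_participation_ratio": len(in_triangle) / n_c,
    }


scores = community_scores(EDGES, {0, 1, 2})
for name, value in scores.items():
    print(name, value)
```

For the community {0, 1, 2} in this toy graph, all three possible internal edges are present, so the internal edge density and hub dominance are both 1, and only node 2 has an edge leaving the community, which gives it an out-degree fraction of 1/3.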