Ensemble Clustering for Graphs: Comparisons and Applications

We recently proposed a new ensemble clustering algorithm for graphs (ECG) based on the concept of consensus clustering. We validated our approach by replicating a study comparing graph clustering algorithms over benchmark graphs, showing that ECG outperforms the leading algorithms. In this paper, we extend our comparison by considering a wider range of parameters for the benchmark, generating graphs with different properties. We provide new experimental results showing that the ECG algorithm alleviates the well-known resolution limit issue, and that it leads to better stability of the partitions. We also illustrate how the ensemble obtained with ECG can be used to quantify the presence of community structure in the graph, and to zoom in on the sub-graph most closely associated with seed vertices. Finally, we illustrate further applications of ECG by comparing it to previous results for community detection on weighted graphs, and community-aware anomaly detection.

are presented in Section 5. First, we consider a real graph, and show how ECG weights can be used to zoom-in on significant sub-graphs given some seed vertices. We then use ECG for two recently published applications, respectively for clustering weighted graphs [16], and using graph clustering to find anomalous nodes [17]. We wrap-up in the last section.

Previous Results
Let G = (V, E) be a graph where V = {1, 2, . . . , n} is the set of vertices, and E ⊆ {(u, v) | u, v ∈ V, u < v} is the set of edges. We consider undirected graphs. Edges can have weights w(e) > 0 for each e ∈ E. For un-weighted graphs, we let w(e) = 1 for all e ∈ E. The 2-core of a graph G is its maximal subgraph whose vertices have degree at least 2. Let P i = {C 1 i , . . . , C li i } be a partition of V of size l i . We refer to the C j i as clusters of vertices. We use to denote the indicator function for v ∈ C j i .

The ECG Algorithm
The ECG algorithm is a consensus clustering algorithm for graphs. Its generation step consists of independently obtaining k randomized level-1 partitions from the multilevel-Louvain (ML) algorithm [10]: P = {P 1 , . . . , P k }. Its integration step is performed by running ML on a re-weighted version of the initial graph G = (V, E). The ECG weights are obtained through co-association. The weight of an edge e = (u, v) ∈ E is defined as: where 0 < w * < 1 is some minimum weight and α Pi ( indicates if the vertices u and v co-occur in a cluster of P i or not. When running the ECG algorithm, the size k of the ensemble and the minimum edge weight w * are the only parameters that need to be supplied. Guidelines for the parameters are given in [14], where we also show that the results are not too sensitive with respect to those parameters.

Comparison Study
In [14], we re-visited a recently published study of graph clustering algorithms, comparing the best performing algorithms from that study with the ECG algorithm. In general, we found ECG to yield better clusters with respect to all of the measures considered. Moreover, ECG generally found a number of communities much closer to the true value.
The algorithms are compared on graphs generated with the LFR benchmarks for undirected and unweighted graphs and with non-overlapping communities. A key parameter when generating an LFR graph G is the mixing parameter µ, which sets the expected proportion of edges in G for which the two endpoints are in different communities. We considered .03 ≤ µ ≤ .75.
It was recently shown [18] that graph-agnostic measures such as the adjusted RAND index (ARI) yield high scores for refinements of the true partition, while a graph-aware version (AGRI) gives high scores for coarsenings of the true partition when measuring graph partition similarities. It is thus recommended to use both measures to compare algorithms, as we do throughout this paper. We compared the true communities with those found by the ECG algorithm as well as three other state-of-the-art algorithms: InfoMap (IM) [9], WalkTrap (WT) [5] and multilevel-Louvain (ML) [10]. The quality of the results from ECG are clear from the first two plots of Figure 1, and the number of communities found with ECG remains much closer to the true number as the proportion of noise increases, as shown in the third plot. Those conclusions are illustrative of the results we reported in [14].

Expanding the Comparison
In the LFR benchmark [19], three important parameters are: the mixing parameter (µ), the (negative) degree distribution power law exponent (γ 1 ), and the (negative) community size distribution power law exponent (γ 2 ). It is generally recommended to use 2 ≤ γ 1 ≤ 3 and 1 ≤ γ 2 ≤ 2 to model realistic networks [19], [20]. In the previous section, we compared ECG with other state-of-the-art algorithms with the same parameter choices as in [11]. While we considered a wide range for parameter µ, the power law exponents were fixed at γ 1 = 2 and γ 2 = 1. In this section, we summarize the impact of those parameters on the types of networks that are generated, and we re-visit the comparison results, exploring a wider set of graphs. Figure 1: In the first two plots, we compare the accuracy of ECG with state-of-the-art algorithms: InfoMap (IM), WalkTrap (WT) and Louvain (ML). Results from each algorithm are compared with the true communities for LFR graphs with n = 22, 186 vertices, and for various values of µ, the proportion of noisy edges. For each value of µ, we average over 10 LFR graphs; the shaded area shows the standard deviation. We see that ECG outperforms all other algorithms. In the third plot, we look at the ratio of the number of computed vs true communities. We see that ECG remains very close to the desired valueĈ/C = 1, as opposed to the other algorithms.
In Figure 2, we show some topological graph differences over 5 choices of parameters (γ 1 , γ 2 ) in the recommended range. We see that for larger values of those parameters, the communities generated are small and of similar size while smaller values for (γ 1 , γ 2 ) yield graphs with more heterogeneous community sizes, which are more realistic. Figure 2: We selected 5 choices for the power law parameters (γ 1 , γ 2 ) which are representative of various types of networks obtained with the LFR benchmark, and we look at the distribution of the sizes of communities. We see that with the largest recommended values (γ 1 , γ 2 ) = (3, 2), we get small communities of homogeneous size. As the exponents decrease, the sizes of the communities get more heterogeneous. All results were obtained by averaging over 10 graphs with 22,186 nodes for every choice of parameters (µ, γ 1 , γ 2 ).
In Figure 3, we again compare ECG with IM, WT and ML. For the larger values of (γ 1 , γ 2 ), we see that the ML algorithm does not do very well, with ECG doing much better and IM yielding the best results. As before, we use both the ARI measure and its graph-aware counterpart AGRI. As the exponents decrease, indicative of more heterogeneous community size distribution, we see that the ML algorithm does better, and ECG gives the best results overall. Therefore, by expanding the comparison over a wider range of LFR parameters, we see that ECG generally gives better results, with the exception of graphs with small communities of homogeneous size, where IM is slightly better.

Resolution Limit and Stability
At the heart of ECG is the fact that we use multiple runs of the single-level Louvain algorithm to build an ensemble of weak (or local) partitionings of the vertices. In this section, we illustrate the two main reasons for this choice. Recall that those graphs have many small communities of similar sizes. As we move toward the right, the Louvain algorithm does better and ECG yields the best results. Those graphs have more heterogeneous community sizes.

Resolution Issue: Ring of Cliques Illustration
The resolution limit issue is well illustrated by the infamous ring of cliques example, where the n vertices form l cliques (full sub-graphs) of size m, wired together as a ring. For some choices of l and m, grouping pairs of adjacent cliques yields a higher modularity value than the natural choice of each clique forming its own cluster [21]. The latter yields higher modularity if and only if m(m − 1) > l − 2. In [14], we show that choosing a small value for w * in (1) can alleviate this issue. In particular, choosing w * < 1/n avoids the issue altogether.
In Figure 4, we look at rings of l cliques of size m = 5, with 1 to 5 edges between contiguous cliques. For the ML algorithm, we see the resolution limit issue when l > 20 (with 1 edge between contiguous cliques), which agrees with the known results. The IM algorithm is stable when only a few edges link the cliques, but quickly becomes unstable as more edges are added, while the ECG algorithm remains very stable keeping the default choice of w * = .05. Figure 4: In each plot, we consider l cliques of size m = 5 where contiguous cliques are linked by 1 to 5 edges, respectively. We compare the number of communities found by the InfoMap (IM), Louvain (ML) and ECG algorithms. The resolution limit phenomenon is clearly seen with the ML algorithm. The IM algorithm fails to find the right number of communities when we increase the number of edges between the cliques, while ECG remains more stable.
We further illustrate this stability in Figure 5, where we add up to 15 edges between the cliques of size 5 in a ring with 4 cliques. We see that even when the number of edges linking the cliques is comparable to the number of edges within each clique, the signal obtained with the ECG weights still favours the cliques. This behaviour allow to better identify communities in noisy graphs. In the right plot of Figure 5, we show the case where 15 edges are added between contiguous cliques. Thicker edges are the ones where the ECG weights are above 0.8. We see that most of the clique structure is still captured when looking only at those high weight edges. Figure 5: We add 1 to 15 edges between contiguous cliques in a ring of 4 cliques of size 5, and we look at the effect on the ECG edge weights for edges internal to the cliques, or external edges linking the contiguous cliques. In the right plot, we look at the case with 15 edges between cliques; thick edges are the ones where the ECG weight is 0.8 or above.

Weight distribution and community structure
We compare the ECG weight distribution over LFR graphs where we vary the mixing parameter. We also compare with a random graph having the same degree distribution as one of the LFR graphs. Bi-modal distribution of the ECG weights near the boundaries (0 and 1) is indicative of strong community structure. We propose a simple community strength indicator (CSI) based on the point-mass Wasserstein distance. For all edges (u, v) ∈ E, with W P (u, v) from (1), we define: such that 0 ≤ CSI ≤ 1, where a value close to 1 is indicative of strong community structure, random weights W P (u, v) yield a value close to 0.5, and CSI = 0 when all W P (u, v) = 0.5. In Figure 6, we see the bi-modal distribution of the weights for low and mid-range choices of µ, along with high CSI values. For larger values of µ, the distribution is not as clear, and there are less and less edges with weight close to 1, which indicate a weak community structure, as confirmed by the CSI values. The random graphs have low weights only, which is indicative of the absence of community structure. This example illustrates how the distribution of edge weights obtained with ECG, along with the proposed CSI, can be used to assess the strength of community structure in a graph. Figure 6: Violin plots of the ECG weight distribution for a family of LFR graphs with n = 22, 186 nodes, parameters γ 1 = 2, γ 2 = 1 and .21 ≤ µ ≤ .75. We also compare with a random graph of the same size and degree distribution as the graph with µ = 0.21. We see the bi-modal distribution over LFR graphs up to a very high noise level. For large µ, the signal gets weaker. It is even weaker for the random graph. The Community Strength Indicator (CSI) is also reported.

Stability of ECG
So far, we saw that the ECG weights are useful to alleviate the resolution issues of modularity, and can also be used to assess the presence of community structure in a graph. We illustrate another advantage of ECG which is to significantly reduce the instability in the ML algorithm. To test for stability, we run the same algorithm twice on each graph considered, and we compare the two partitions obtained with the ARI (or AGRI) measure.
In Figure 7, we did this for the ML and ECG algorithms over LFR graphs with the same parameters as in the previous section. We see that in all cases, ECG greatly improves the stability of the Louvain algorithm.

Other Applications of ECG
In this Section, we look at a few applications with ECG.

ECG weights and a dimmer process
Assume that we are interested in some seed vertices in a graph. In large graphs, it is not clear how to properly "zoom in" on the sub-graph showing the main interactions around the seed vertices. Taking the seed's ego-nets (immediate neighbours) may not show all the strong interactions, and taking the entire clusters from a partition which contain the seed vertices may be too large. The weights provided by ECG can be used to define a dimmer-like process around the seed vertices, thus highlighting the sub-graphs that are the most tightly connected to the seeds.
Consider a graph G, a seed vertex v and G v ⊂ G the sub-graph of G formed by keeping only the ECG cluster containing vertex v. Given some threshold θ, we delete all edges in G v with ECG weights below θ, and we keep the connected component sub-graph containing vertex v. Increasing θ from 0 to 1 provides a hierarchy of sub-graphs of decreasing size which all contain vertex v.
As an illustration of this process, we consider the Amazon co-purchasing graph available from the SNAP repository [22]. This graph has 334,863 nodes and 925,872 edges. There are over 75,000 communities, 5000 of which are identified as the top ones. We picked a vertex v that belongs to one of those top communities 1 . We ran ECG, and isolated the sub-graph G v induced by the vertices in the ECG cluster that contains v. In Figure 8, we gradually increase the threshold θ, keeping only edges in G v with ECG weight above that threshold, and showing the connected component containing v. In the first plot, we set θ = 0, thus showing G v (v is shown with larger size). Vertices in red belong to the same ground truth community as v. While we see a lot of spurious vertices in the first plot, discarding edges with low ECG weights (setting θ = 0.1) yields the second sub-graph, where all ground truth vertices are retained. The last plot shows a more aggressive filtering, where we retain only edges with high ECG weights (setting θ = 0.72). This reveals a tightly connected subset of vertices around the seed vertex v.

Application to weighted graphs
So far, all of our comparisons for ECG were done over un-weighted graphs. In [16], among other things, the authors study various edge re-weighting schemes and graph clustering algorithms over weighted LFR graphs. They found that Figure 8: We consider a seed vertex (display with larger size) from the Amazon co-purchasing graph. Vertices from the same ground truth communities are displayed in red, and other vertices are displayed in black. From left to right, we display respectively (i) the entire sub-graph obtained from the ECG part that contains the seed vertex, (ii) a connected sub-graph with ECG edge weights above 0.1 containing the seed vertex, and (iii) a connected sub-graph with ECG edge weights above 0.72 containing the seed vertex. While the first plot has many spurious vertices, as we zoom in, most nodes we retain are in the same true community as the seed node.
using the GloVe re-weighting function along with the Label Propagation (LP) algorithm [7] gave the best results for identifying communities.
We re-created this experiment, using the same graphs available at [23], and the same re-weighting function with the best choice of parameters as reported in Table 1 of [16]. We compared LP, ML and ECG algorithms. While we generally obtained good results with LP, we had to discard some runs as this algorithm sometimes failed to converge. We show our results in Figure 9, where we summarize the ARI and AGRI scores we obtained over all graphs for which the LP algorithm converged. We see that the results are better in general with ECG, and with much improved stability. Figure 9: Results over weighted LFR graphs for 3 graph clustering algorithms: Multilevel-Louvain (ML), Label Propagation (LP) and ECG. Results are obtained over all graphs in [23] for which the LP algorithm converged. GloVe re-weighting as reported in [16] is applied to all graphs.

Community-aware anomaly detection
In [17], the authors propose CADA, a community-aware method for detecting anomalous vertices. In a nutshell, for each vertex v ∈ V , let N (v) represent the number of neighbors of v, and N c (v) the number of neighbors of v that belong to the most represented community obtained with the IM or ML algorithm. They define: where x ∈ {IM, M L} indicates the clustering algorithm used. They compare their algorithm to other methods by generating LFR graphs with degree exponent γ 1 = 3 and community size exponent γ 2 = 2. As we saw earlier, this choice corresponds to small communities of homogeneous size, where the IM algorithm performs best. We re-visited this approach with ECG, considering different values for the power law exponents, as in section "Expanding the Comparison". We generated LFR graphs with n = 22, 186 nodes and various values for the mixing parameters. For each graph, we introduced 200 random anomalous nodes with the same degree distribution, as in Figure 1 of [17].
In Figure 10, we compare CADA ECG with CADA IM and CADA M L using the areas under the ROC curves (AUC). We see that for large choices of the power law exponents, the IM version does best. This is the only choice of parameters used in [17]. As we decrease the values of the exponents, we see that using ECG becomes a better choice, in particular for large values of µ. This is due to the increased stability and the ability to distinguish the signal from the noise provided by the ECG weights, which we illustrated earlier in section "Resolution Limit and Stability".

Conclusion
In [14], we proposed ECG, a new graph clustering algorithm based on the concept of consensus clustering, and we compared it to other algorithms by re-creating the study in [11]. In this paper, we compared ECG with state-of-the-art algorihms over a wider range of graphs, showing ECG to be the best performing algorithm in most cases. We provided empirical evidence for the two main advantages of ECG: its ability to greatly reduce the resolution limit issue of modularity, and its high stability. We also illustrated how the edge weights generated in ECG can be used to assess the presence of community structure in graphs. Finally, we favourably applied ECG to three tasks: we showed how to extract relevant sub-graphs around seed vertices, we used ECG to find communities in weighted graphs, and we applied ECG for the task of detecting anomalous vertices in graphs.