We denote a bipartite network by G=(P, S, E), where P = {p1, p2, …, pk} and S = {s1, s2, …, sl}, E⊆P×S is the set of edges in G, n=k+l is the order of G and m=|E| is the size. Every bipartite network we deal with is assumed to be unweighted and undirected, so its adjacency matrix A is symmetric and can be written in block form. It is not necessary that the network be connected, since each component can be clustered separately, but for simplicity here we assume that it is (i.e. there exists a path between every pair of nodes). We further assume k≥2 and l≥2, since otherwise G is a star and necessarily forms a single community. For brevity, we may represent a biclique by its node set alone.
The MaxBic algorithm is based on three concepts: node similarity; transformation of a biclique into a clique on the same node set, to simplify tracking of overlap and merging; and maximal bicliques. We work with node similarity, rather than edge similarity as in Ahn et al. (2010), because in social networks nodes in the same community have similar patterns and a community of nodes has homogeneous structure (Barrat et al. 2008).
Node similarity
In social science, the idea of similarity between nodes is not new, with studies going back to 1971 (Lorrain and White 1971). Similarity there is defined in terms of structural equivalence (Lorrain and White 1971; Van Steen 2010), where two nodes i and j are structurally equivalent if they have the same pattern of relationships with all other nodes. This implies that nodes i and j share the same neighbors for the same purpose (Leicht et al. 2006). The more similar two nodes are, the more common neighbors they have.
In a bipartite network, it is usual to infer that the more neighbors two nodes i and j in the same set (say P) have in common in the other set (S), the higher the likelihood that they interact, and so the structural similarity of nodes i and j can be measured by their number of common neighbors. This count has been shown to be an effective measure of structural similarity that gives accurate results on large-scale networks (Zhou et al. 2009; Liben-Nowell and Kleinberg 2007). It underpins the definition of the unipartite projections GP and GS of G. Moreover, it uses the fundamental topology of the network (Leicht et al. 2006). Thus we define the similarity of nodes i and j to be their common neighbors index CNIij, i.e. the size of their set of common neighbors CNSij≡Γ(i)∩Γ(j), where Γ(i)≡{x|{i,x}∈E} is the exclusive neighborhood of node i. The similarity can be calculated either from the CNS or from the network’s adjacency matrix A=[aij]:
$$ {CNI}_{ij} \equiv |\Gamma(i) \cap \Gamma(j) | = \sum_{x}a_{ix} a_{jx}\,. $$
(1)
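As an illustration (not part of the original algorithm description), the following Python sketch computes the common neighbors set and index of a pair of nodes directly from an edge list; the toy network and the helper names cns and cni are our own.

```python
# Minimal sketch (Python) of Eq. (1): common neighbors set and index.
# The toy edge list and helper names are illustrative, not from the paper.

from collections import defaultdict

edges = [("p1", "s1"), ("p1", "s2"), ("p2", "s1"), ("p2", "s2"), ("p3", "s2")]

# Exclusive neighborhood Gamma(i) of every node.
gamma = defaultdict(set)
for p, s in edges:
    gamma[p].add(s)
    gamma[s].add(p)

def cns(i, j):
    """Common neighbors set CNS_ij = Gamma(i) & Gamma(j)."""
    return gamma[i] & gamma[j]

def cni(i, j):
    """Common neighbors index CNI_ij = |CNS_ij|."""
    return len(cns(i, j))

print(sorted(cns("p1", "p2")))   # ['s1', 's2']
print(cni("p1", "p2"))           # 2
```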
Community strength
Clusters of nodes can be regarded as strong or weak. Probably the simplest and most natural definition of a strong cluster is a set of nodes which form a clique, that is, the subgraph they induce is complete (Palla et al. 2005). The definition of modularity in Newman and Girvan (2004) is given for general networks and compares the number of edges within a cluster to the expected number in an equivalent network with edges placed at random, so a clique will maximise modularity for its set of nodes. Bimodularity is correspondingly defined for bipartite networks (Barber 2007), so that a biclique will maximise it. However, there are less absolute notions of community in common use, which are more appropriate for measuring the relative strength of maximal bicliques.
The definitions of strong and weak community given in Radicchi et al. (2004) compare the number of edges going out from the cluster to the rest of the network; they are stricter conditions than those given in Hu et al. (2008), which compare the number of edges going out from the cluster to each other cluster individually, rather than to all the rest of the network.
Here we order community strength into five categories, by comparing the definitions in Radicchi et al. (2004) and Hu et al. (2008).
Definition 1
For a particular cluster c to which node i belongs, separate the degree ki of i into two parts: the number of edges \(k_{i}^{in} = \sum _{j \in c} a_{ji}\) connecting node i to other nodes in c, and the number of edges \(k_{i}^{out} = \sum _{j \not \in c} a_{ij}\) connecting node i to the nodes in the rest of the network. Then c is
1. strong (= strong in Radicchi et al. (2004)) if
$$ k_{i}^{in} > k_{i}^{out}, \qquad \forall i \in c \,; $$
(2)
2. almost strong (= strong in Hu et al. (2008)) if
$$ k_{i}^{in} \geq \max_{c^{\prime} \not = c} \left\{ \sum_{j \in c^{\prime}} a_{ij} \right\}, \qquad \forall i \in c \,; $$
(3)
3. almost weak (= weak in Radicchi et al. (2004)) if
$$ \sum_{i \in c}k_{i}^{in} > \sum_{i \in c}k_{i}^{out} \,; $$
(4)
4. weak (= weak in Hu et al. (2008)) if
$$ \sum_{i \in c} k_{i}^{in} \geq \max_{c^{\prime} \not = c} \left\{ \sum_{i \in c} \sum_{j \in c^{\prime}} a_{ij} \right\}, ~~\text{and} $$
(5)
5. very weak if it does not satisfy any of the above four conditions.
Clearly the definition of strong community in (2) implies the definition of almost strong in (3), so (2) is more restrictive than (3). Similarly, the definition of almost weak community in (4) implies the definition of weak in (5), so (5) is less restrictive than (4). Furthermore, strong implies almost weak, but almost strong does not imply almost weak. Figure 1 illustrates these implications.
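To make the four explicit tests concrete, the following Python sketch classifies a cluster of a given partition according to Definition 1. The toy graph, the given clustering and the first-match order of the tests (strongest category first, consistent with the ordering above) are our own illustrative assumptions, not taken from the MaxBic pseudocode.

```python
# Minimal sketch (Python) of the categories in Definition 1.
# Toy graph, clustering and the first-match order of tests are assumptions.

from collections import defaultdict

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
clusters = {"a": {1, 2, 3}, "b": {4, 5, 6}}

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def strength_category(c_name):
    c = clusters[c_name]
    others = [nodes for name, nodes in clusters.items() if name != c_name]
    k_in  = {i: len(adj[i] & c) for i in c}          # edges of i inside c
    k_out = {i: len(adj[i] - c) for i in c}          # edges of i leaving c
    # k_i^max-out: most edges from i into any single other community.
    k_max = {i: max((len(adj[i] & d) for d in others), default=0) for i in c}
    if all(k_in[i] > k_out[i] for i in c):
        return "strong"            # Eq. (2)
    if all(k_in[i] >= k_max[i] for i in c):
        return "almost strong"     # Eq. (3)
    if sum(k_in.values()) > sum(k_out.values()):
        return "almost weak"       # Eq. (4)
    if sum(k_in.values()) >= max(
            (sum(len(adj[i] & d) for i in c) for d in others), default=0):
        return "weak"              # Eq. (5)
    return "very weak"

print(strength_category("a"))      # 'strong' for this toy partition
```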
Based on these categories, we introduce the following measure St of community strength.
Definition 2
For node i in community c let \( k_{i}^{max-out} = \max _{c^{\prime } \not = c} \sum _{j \in c^{\prime }} a_{ij}\) be the maximum number of outgoing edges from i toward another community in the network. The strength St(c) of c is defined as
$$\begin{array}{@{}rcl@{}} St(c) & = & \left\{ \begin{array}{ll} \sum_{i \in c} k_{i}^{in} - \sum_{i \in c} k_{i}^{out} & ~~~~~~ \text{if}~~ c ~~ \text{is strong}\\ \sum_{i \in c} k_{i}^{{{in}}} - \sum_{i \in c} k_{i}^{max-out} & ~~~~~~ \text{if}~~ c ~~ \text{is almost strong}\\ \sum_{i \in c} k_{i}^{{{out}}} - \sum_{i \in c} k_{i}^{in} & ~~~~~~ \text{if}~~ c~~ \text{is almost weak}\\ \sum_{i \in c} k_{i}^{{{max-out}}} - \sum_{i \in c} k_{i}^{in} & ~~~~~~ \text{if}~~ c~~ \text{is weak}\\ \end{array} \right. \end{array} $$
(6)
For strong and almost strong communities, a higher value of St indicates a stronger community; for almost weak and weak communities, a higher value indicates a weaker community.
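The following Python sketch evaluates Eq. (6). It assumes that the per-node quantities (as dictionaries over the nodes of c) and the category label have already been computed, for example by a classifier like the previous sketch; the interface is our own illustration.

```python
# Minimal sketch (Python) of the strength measure St(c) in Eq. (6).
# k_in, k_out, k_max and the category label are assumed precomputed.

def St(k_in, k_out, k_max, category):
    s_in, s_out, s_max = sum(k_in.values()), sum(k_out.values()), sum(k_max.values())
    if category == "strong":
        return s_in - s_out
    if category == "almost strong":
        return s_in - s_max
    if category == "almost weak":
        return s_out - s_in
    if category == "weak":
        return s_max - s_in
    return None    # Eq. (6) assigns no value to very weak communities

# Cluster {1, 2, 3} of the previous toy graph:
print(St({1: 2, 2: 2, 3: 2}, {1: 0, 2: 0, 3: 0}, {1: 0, 2: 0, 3: 1}, "strong"))  # 6
```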
MaxBic: a new maximal biclique finding algorithm
The MaxBic algorithm can be divided into seven phases, described below. Pseudocode is given in Appendix A. A worked example on a toy bipartite network is given in Fig. 2.
MaxBic finds, from each node, at most one maximal biclique containing it (others will usually be found from different nodes). First, using the idea of node similarity, it finds the set of common neighbors for every pair of nodes in P (S can equally well be chosen; see Remark 1 below). Each pair in P together with its common neighbors is the node set of a biclique we term a basic biclique. Second, the node set of each basic biclique is formally treated as the node set of a clique in a (unipartite) supergraph G∗ of G, in order to merge node sets based on (bi)cliques rather than merging individual nodes. Conceptually this is based on the results of Evans (2010), which show on benchmark unipartite networks that clique graphs find overlapping communities accurately while node partition methods fail. Third, the inclusive neighborhoods of each node in G∗ are cross-checked for complete overlap to permit merging and enlarging, based on the idea that two nodes in P∨S are more likely to belong to the same bipartite community in G when their node similarity in G∗ is maximal. The resulting bicliques in G are checked for cover and maximality. Any node from S not yet accounted for is included in an adjacent community at this point. These communities are then categorised and ordered by their strength, as described in the “Community strength” subsection.
Phase (i) Determine the common neighbors set of each pair of nodes p,p′∈P. Find the exclusive neighborhood Γ(p) for each p∈P; then, for every pair of nodes p,p′ in P, find \({CNS}_{pp^{\prime}}\). The subgraph of G induced by the node set {p,p′,Γ(p)∩Γ(p′)} is a basic biclique \(K_{2,\, {CNI}_{pp^{\prime}}}\).
Phase (ii) Replace each basic biclique by a clique. Connect each node in K(p,p′)={p,p′,Γ(p)∩Γ(p′)} to every other node in it, to form the basic clique K∗(p,p′). Each basic clique is underpinned by the structure of the projections GP and GS: the added edge {p,p′} in G∗ would form an edge in GP, and the added edges on \({CNS}_{pp^{\prime}}\) in G∗ would form a clique in GS, and so represent fundamental community structure in S. Formally we have a new network G∗=(P∨S,E∗), where E∗ is the edge list of the basic cliques. Note that G∗ is no longer bipartite, though it has the same node set P∨S as G, and E⊂E∗.
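A Python sketch of Phases (i) and (ii) on a toy bipartite network follows; the data and variable names are our own, only the node sets and the edge set E∗ are constructed, and no graph library is assumed.

```python
# Minimal sketch (Python) of Phases (i)-(ii): basic bicliques K(p, p') and the
# clique edge set E* of the supergraph G*. Toy data and names are illustrative.

from itertools import combinations
from collections import defaultdict

P = ["p1", "p2", "p3"]
edges = [("p1", "s1"), ("p1", "s2"), ("p2", "s1"), ("p2", "s2"),
         ("p3", "s2"), ("p3", "s3")]

gamma = defaultdict(set)
for p, s in edges:
    gamma[p].add(s)
    gamma[s].add(p)

# Phase (i): node set of each basic biclique, K(p, p') = {p, p'} plus CNS_pp'.
K = {}
for p, q in combinations(P, 2):
    cns = gamma[p] & gamma[q]
    if cns:                                  # skip pairs with no common neighbor
        K[(p, q)] = {p, q} | cns

# Phase (ii): connect every pair of nodes inside each basic biclique; the union
# of these cliques' edges is E*, which contains E.
E_star = set()
for nodes in K.values():
    E_star |= {frozenset(e) for e in combinations(nodes, 2)}

print(K[("p1", "p2")])    # {'p1', 'p2', 's1', 's2'}
print(len(E_star))        # 9 edges in this toy G*
```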
Phase (iii) Find the inclusive neighborhood of each node in G∗ by merging basic cliques. For each node i∈G∗ we find the set containing i and its neighbors in G∗:
if i∈P then,
$$ \Gamma_{+}^{*}(i) = \{i\} \cup \left(\bigcup_{p \in P, ~{CNS}_{ip} \neq \emptyset} ~K(i, p)\right) $$
(7)
and if i∈S, then
$$ \Gamma_{+}^{*}(i) = \{i\} \cup \left(\bigcup_{p, p^{\prime} \in P, ~ i \in K(p, p^{\prime})} ~K(p, p^{\prime})\right)\,. $$
(8)
Form a container Z of clusters, initially one for each node i∈P∨S, which will be updated. Each cluster ci in Z will consist of an element (node i) and a list, with the form ci={i:listi}={i:j1,j2… }. Initially ci={i:∅}, so we start with as many clusters as we have nodes in the network G.
Phase (iv) Test for total overlap of neighborhoods and form larger clusters accordingly. Fix a main node i∈P∨S and run through each other node j, in order to merge clusters on nodes with high similarity. Merge j to listi if
$$ \Gamma^{*}_{+}(i) \subseteq \Gamma^{*}_{+}(j). $$
(9)
Node j will merge only if in G∗ it is in a common clique with i, so in Z the nodes of ci={i:j1,j2… } after this phase will be from a clique in G∗ and thus from a biclique in G (not necessarily maximal).
Phase (v) Repeat Phase (iv) for each i∈P∨S. The output of this phase is the set of clusters stored in Z. Each cluster will be either the node set of a biclique in G or of the form {s:∅} for some s∈S. (No cluster of the form {p:∅} for some p∈P will be in Z: since G is connected, a nonempty K(p,p′) will exist.)
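The following Python sketch illustrates Phases (iii)-(v) on the same toy network used above; it rebuilds the basic bicliques so that it runs on its own, and all variable names are our own assumptions.

```python
# Minimal sketch (Python) of Phases (iii)-(v): inclusive neighborhoods in G*,
# the container Z, and merging by total overlap of neighborhoods (Eq. (9)).

from itertools import combinations
from collections import defaultdict

P = ["p1", "p2", "p3"]
edges = [("p1", "s1"), ("p1", "s2"), ("p2", "s1"), ("p2", "s2"),
         ("p3", "s2"), ("p3", "s3")]

gamma = defaultdict(set)
for p, s in edges:
    gamma[p].add(s)
    gamma[s].add(p)

# Node sets of the basic bicliques K(p, p') from Phase (i), rebuilt here so the
# sketch is self-contained.
K = {}
for p, q in combinations(P, 2):
    cns = gamma[p] & gamma[q]
    if cns:
        K[(p, q)] = {p, q} | cns

all_nodes = set(gamma)

# Phase (iii): inclusive neighborhood of i in G* = {i} plus every basic clique
# containing i (Eqs. (7) and (8) both reduce to this rule).
gamma_star = {i: {i}.union(*[k for k in K.values() if i in k]) for i in all_nodes}

# Container Z: one cluster {i: list_i} per node, with list_i initially empty.
Z = {i: set() for i in all_nodes}

# Phases (iv)-(v): for each main node i, add j to list_i whenever the inclusive
# neighborhood of i is totally contained in that of j.
for i in all_nodes:
    for j in all_nodes - {i}:
        if gamma_star[i] <= gamma_star[j]:
            Z[i].add(j)

print(Z)   # e.g. Z['s3'] stays empty: s3 has degree 1 (handled in Phase (vi))
```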
Phase (vi) Reduce the redundancy of clusters. First, remove from Z any cluster ci whose node set (i∪listi) satisfies (i∪listi)⊆(j∪listj) for some other cluster cj. This ensures that every i∈P∨S is in at least one cluster and that the biclique underlying each remaining cluster is maximal.
Second, if |listi|≥2 and listi⊂listj, merge element i to cj. A cluster surviving to this point with |list|<2, which has one node from P and the other from S, will stand alone, as it forms the smallest biclique, a single edge.
Finally, there might be some nodes s in S which are not merged, because they have degree 1 and aren’t in any \(\Gamma ^{*}_{+}(i), ~i \in P\), so cs={s:∅} in Z after Phase (v). They are included in any community to which their adjacent node (in P) belongs.
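A Python sketch of Phase (vi) follows. The container Z is written out literally as the output of the previous sketch; the sets S and gamma and all helper names are our own assumptions, and the two reduction steps described above are simplified here to keeping only the maximal node sets.

```python
# Minimal sketch (Python) of Phase (vi): redundancy reduction and attachment of
# leftover degree-1 nodes of S. Z is the output of the previous sketch, written
# out literally so this sketch runs on its own.

Z = {"p1": {"p2", "s2"}, "p2": {"p1", "s2"}, "p3": {"p1", "p2", "s2"},
     "s1": {"p1", "p2", "s2"}, "s2": {"p1", "p2"}, "s3": set()}
S = {"s1", "s2", "s3"}
gamma = {"s1": {"p1", "p2"}, "s2": {"p1", "p2", "p3"}, "s3": {"p3"}}

# Node set underlying each non-empty cluster {i: list_i}.
candidate_sets = []
for i, members in Z.items():
    nodes_i = {i} | members
    if members and nodes_i not in candidate_sets:
        candidate_sets.append(nodes_i)

# Keep only maximal node sets: drop any set properly contained in another.
communities = [c for c in candidate_sets if not any(c < d for d in candidate_sets)]

# Leftover degree-1 nodes of S (clusters {s: empty}) join a community that
# contains their unique neighbour in P.
for s in S:
    if not Z[s]:
        neighbour = next(iter(gamma[s]))
        for c in communities:
            if neighbour in c:
                c.add(s)
                break

print(communities)   # e.g. [{'p1','p2','p3','s2','s3'}, {'p1','p2','s1','s2'}]
```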
Phase (vii) Categorise and order the communities in G based on their strength. Using the definitions in the “Community strength” subsection, we have five categories: strong, almost strong, almost weak, weak and very weak. Communities in the first four categories are ordered in descending order of strength St.
Remark 1
If we use set S instead of P, the only difference in the edge list E∗ will come from the smallest basic bicliques, those with 3 nodes. These can only make differences to the communities of size 1, 2 or 3 found by MaxBic, according to the following argument.
Suppose \(K(p_{1}, p_{2}) = \{p_{1}, p_{2}, s_{1}, s_{2}, \dots, s_{l^{\prime}}\}\) is a basic biclique when starting from P. Then E∗∖E contains the edges (p1,p2) and (s1,s2), and if l′=2 then starting from S the same edges arise by symmetry. If l′=3 then (s1,s3) and (s2,s3) are also in E∗∖E. Starting from S, we would obtain basic bicliques K(s1,s2)={s1,s2,p1,p2,… }, K(s1,s3)={s1,s3,p1,p2,… } and K(s2,s3)={s2,s3,p1,p2,… }, so the four specified edges are in E∗∖E. If l′>3 this argument generalises. In the smallest case l′=1, the edge (p1,p2) is in E∗∖E when starting from P, but this edge will not arise when starting from S.
Computational complexity
Enumeration of all maximal bicliques is at least exponential in n (Viard et al. 2016). However MaxBic does not find all maximal bicliques, but at most one maximal biclique for each node in G, and its time complexity is at most O(n²k) where k=|P|, according to the following argument.
Phase (i) Finding basic bicliques (Algorithm lines 5–12) requires computation of CNS for every pair of nodes in P, so has time complexity O(k²).
Phase (ii-iii) Finding \(\Gamma _{+}^{*}(i)\) requires a check of at worst each K(p,p′) (see Eqs. (7) and (8), Algorithm line 23), and there are n nodes to check, so this step has time complexity O(nk²).
Phase (iv-v) Searching for similarity between nodes using \(\Gamma _{+}^{*}(i)\) (Algorithm lines 25–31) means comparing every pair of these, which can cost O(n²).
Phase (vi) The last operation compares each community with every other in order to reduce the redundancy (Algorithm lines 35–38), which takes O(|Z|²), where |Z| is the number of communities and is at most n. So this step also takes up to O(n²).
Putting these together and noting k<n gives at worst O(n²k).
MaxBic uses set P to find up to n maximal bicliques. If l≪k it could be advantageous to use S instead of P, so overall we have complexity O(n² min{k,l}).
Reducing redundancy and revealing hierarchy–MaxBicR(J)
At present there is no commonly accepted standard for evaluating the efficiency and accuracy of overlapping community detection algorithms for bipartite networks. We are proposing MaxBic as an algorithm for determining ground truth or metadata communities of overlapping maximal bicliques at the base level of a hierarchy. These communities are necessarily as tightly connected as is possible. To reduce redundancy at the base level and determine higher levels of a hierarchy of overlapping communities any suitable merging algorithm (such as described in “Overlapping communities algorithms” subsection) could be applied.
We have chosen to develop our own merging algorithm in order to avoid bias when comparing performance with other algorithms. We tried six different methods of merging base-level communities, all based on the Jaccard similarity coefficient of pairs of communities which, for two node sets c and c′ is:
$$ J(c,c^{\prime}) = \frac{|c \cap c^{\prime}|}{| c \cup c^{\prime}|} $$
(10)
When J(c,c′)=1 the node sets are identical and when J(c,c′)=0 they have no overlap.
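As a brief illustration (our own, using Python set operations), the coefficient and its two boundary values can be computed as follows.

```python
# Jaccard similarity coefficient of two node sets, Eq. (10).
def jaccard(c, c_prime):
    return len(c & c_prime) / len(c | c_prime)

print(jaccard({"p1", "p2", "s1"}, {"p1", "p2", "s2"}))   # 0.5
print(jaccard({"p1"}, {"p1"}), jaccard({"p1"}, {"s1"}))  # 1.0 (identical), 0.0 (disjoint)
```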
We tested all six methods on the Southern Women and Noordin Top networks. For different threshold values of J, we plotted the number of second-level communities found against the extended normalised mutual information NMI (Lancichinetti et al. 2009) between those second-level communities and the base-level communities found by MaxBic. The larger the NMI value, the better the match between the two structures. For simplicity here, we selected one of the six methods: the one that uses all nodes in a community, so as not to bias towards P or S, and that gives relatively consistent performance. The full results are reported in Alzahrani (2016).
There are two stages in our selected merging algorithm. In the first stage, select a threshold J∈[0,1] for the Jaccard similarity coefficient and merge (without discarding) any two node clusters output by MaxBic which have similarity coefficient at least J. In the second stage, treat the resulting clusters as super nodes and weight the edges between two super nodes by their Jaccard coefficient, again thresholding on J; this gives the incidence matrix of the cluster graph. We use a modified breadth-first search (BFS) to traverse the super nodes and identify the connected components of the cluster graph. The node set of each connected component is output as a community at the second level of the hierarchy. We term the combined algorithm (MaxBic followed by this merging algorithm) MaxBicR(J). After empirical testing for J in steps of 0.1, we fix J=0.6 in what follows.
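The following Python sketch gives our reading of this second-level merging; it collapses the two stages into one pass (threshold the pairwise Jaccard coefficients, build the cluster graph on the base-level clusters, and output the union of node sets over each connected component found by BFS), and the bookkeeping and all names are our own, not the exact pseudocode.

```python
# Minimal sketch (Python) of the second-level merging used by MaxBicR(J).
# Simplified to a single pass; function and variable names are assumptions.

from collections import deque

def jaccard(c, d):
    """J(c, d) of two node sets, Eq. (10)."""
    return len(c & d) / len(c | d)

def maxbicr_merge(base_clusters, J=0.6):
    n = len(base_clusters)
    # Cluster graph: an edge between super nodes i and j whenever J >= threshold.
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(base_clusters[i], base_clusters[j]) >= J:
                adjacency[i].add(j)
                adjacency[j].add(i)
    # Connected components by breadth-first search; each component's union of
    # node sets is one second-level community.
    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            i = queue.popleft()
            component |= base_clusters[i]
            for j in adjacency[i] - seen:
                seen.add(j)
                queue.append(j)
        communities.append(component)
    return communities

# Three overlapping base-level clusters merged with the paper's choice J = 0.6:
base = [{"p1", "p2", "s1", "s2"}, {"p1", "p2", "s2", "s3"}, {"p4", "s4"}]
print(maxbicr_merge(base, J=0.6))   # two second-level communities
```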