A fast and robust kernel optimization method for core–periphery detection in directed and weighted graphs

Abstract

Many graph mining tasks can be viewed as classification problems on high dimensional data. Within this class we consider the issue of discovering core-periphery structure, which has wide applications in the economic and social sciences. In contrast to many current approaches, we allow for weighted and directed edges and we do not assume that the overall network is connected. Our approach extends recent work on a relevant relaxed nonlinear optimization problem. In the directed, weighted setting, we derive and analyze a globally convergent iterative algorithm. We also relate the algorithm to a maximum likelihood reordering problem on an appropriate core-periphery random graph model. We illustrate the effectiveness of the new algorithm on a large scale directed email network.

Introduction

Graph theory gives a common framework for formulating and tackling a range of problems arising in data science. Many such tasks can be viewed in terms of categorizing nodes or discovering hidden substructures that relate them. Clustering, or community detection, is perhaps the most widely studied problem, and it forms the basis of many classification algorithms (Bertozzi et al. 2018). In this work we study the different, but closely related, issue of identifying core–periphery structure; we seek a set of nodes that are highly connected internally and with the rest of the network, forming the core, and a set of peripheral nodes that are strongly connected to the core but have only sparse internal connections.

This kind of structure is important for a number of reasons. For example, identifying core–periphery structures can help in identifying and categorizing hubs, i.e., well-connected nodes. As noted in (Rombach et al. 2014), such nodes often occur in real–world networks. This is an issue for some community detection methods, as hubs tend to be connected to many different communities and, thus, can be awkward to classify. Moreover, the set of core nodes can be used to identify internally cohesive subgraphs of highly central nodes. In fact, even though all core nodes typically have a high centrality score, not all nodes with high centrality belong to the core, and it is possible to find sparsely connected subgraphs of central nodes not belonging to the core (Borgatti and Everett 2000).

The concept of the network core–periphery is closely related to the idea of rich-clubs, nested networks and onion network structures (Zhou and Mondragón 2004; Bascompte et al. 2003; Schneider et al. 2011). In particular, a number of core–defining algorithms have been proposed in recent years, e.g., (Cucuringu et al. 2016; Rombach et al. 2017; Tudisco and Higham 2019), following the seminal work by Borgatti and Everett (2000). Core–periphery structure has been detected and interpreted in many complex systems, including protein–protein interaction networks (Kim et al. 2007), metabolic and gene regulatory networks (Sandhu et al. 2012), social networks (Borgatti and Everett 2000; Baños et al. 2013), engineered networks (such as the Internet, power-grids or transportation networks) (Tudisco and Higham 2019), and economic networks (Tomasello et al. 2017). See also the review (Csermely et al. 2013).

From a computational perspective, several recent works provide algorithms that apply to undirected networks. In particular, we introduced in (Tudisco and Higham 2019) a scalable nonlinear optimization method with global quality guarantees for core–periphery detection in binary, undirected and connected graphs. This method exploits an intriguing connection between optimization and nonlinear eigenproblems and allows for a fast and easily implementable iteration that is guaranteed to compute the global maximum of a highly nonconvex core–score quality function.

In this work we consider the core-periphery concept in the more general setting of directed, weighted and possibly disconnected networks and we extend the results of (Tudisco and Higham 2019), both in terms of the algorithms and of the theoretical analysis, to this more challenging case.

In our directed case, we use the concept that a set of nodes forms a core if there are many core-to-core, core-to-periphery and periphery-to-core edges, with few periphery-to-periphery edges.

Although the ideal core–periphery subdivision defines two well distinguished sets of nodes, in practice one often looks for a core–score vector u≥0 such that a smaller value ui indicates that node i is more peripheral. Such an assignment may be viewed as a type of node centrality measure (Newman 2011). Indeed the classic cases of degree centrality and eigenvector centrality have been proposed and tested in this context (Borgatti and Everett 2000; Rombach et al. 2017; Tudisco and Higham 2019; Mondragón 2016), and, as we explain in the “Connection with degree and eigenvector centralities” section, the approach we propose here may be viewed as a nonlinear generalization of both these cases.

The manuscript is organized as follows. The “Notation” section introduces some relevant notation. In the “Core–periphery via functional kernel optimization” section we express the core-periphery detection problem in terms of kernel-based optimization and in the “Connection with degree and eigenvector centralities” section we connect this idea with classical node centrality measures. The “Logistic core-periphery random model for directed graphs” section shows that another viewpoint is also relevant; the approach may be viewed as maximum likelihood reordering under a new random graph model that generates directed core-periphery structure. In the “Core-periphery nonlinear operator” section we study the nonlinear optimization problem and show that it may be solved via an inexpensive and globally convergent iteration. The “Enron dataset” section illustrates the performance of the algorithm on a large scale email dataset and the “Conclusion” section gives some conclusions.

Notation

We consider directed and possibly weighted graphs G=(V,E) with node set V={1,…,n} and adjacency matrix A=(Aij).

If node i does not point to node j then the entry Aij is zero. Otherwise, Aij takes a positive value, accounting for the strength of the directional tie from i to j.

We let \(\mathbb {1}\) denote the column vector in \(\mathbb {R}^{n}\) with all values equal to one, and define the out and in degree vectors as \( \boldsymbol {d}^{\text {out}} = A \mathbb {1}\) and \(\boldsymbol {d}^{\text {in}} = A^{T}\mathbb {1} \), respectively, so that \(d^{\text {out}}_{i} = \sum _{j} A_{ij}\) collects the weights of the edges leaving node i and \(d^{\text {in}}_{i} = \sum _{j} A_{ji}\) those of the edges entering it. Operations on and between vectors are to be interpreted in a componentwise sense, so that, for example, \(\boldsymbol {x}^{p-1}\) has ith component given by \(x_{i}^{p-1}\) and \(\boldsymbol {x}^{p-1}\boldsymbol {y}\) has ith component given by \(x_{i}^{p-1} y_{i}\). Inequalities involving vectors and matrices are also to be interpreted componentwise, so that, for example, A≥0 means Aij≥0 for all i and j.

Core–periphery via functional kernel optimization

To search for the presence of a core and periphery we define a core–score vector, that is, a nonnegative vector u quantifying the coreness of the nodes, where ui>uj indicates that node i is closer to the core than node j. We define our core–score vector as the solution to the following nonconvex and constrained core–periphery quality function maximization problem

$$ \begin{array}{ll} \max & f_{\alpha}(\boldsymbol{x})\\ \text{s. t.} & \boldsymbol{x} \geq 0 \text{ and }\|\boldsymbol{x} \|=1 \end{array} $$
(1)

where, for some fixed real number \(\alpha \in \mathbb {R}\) to be chosen, fα is the core–quality function

$$ f_{\alpha}(\boldsymbol{x})= \sum_{i,j=1}^{n} A_{ij}\, \kappa_{\alpha}(x_{i},x_{j}), \qquad \kappa_{\alpha}(x,y) = \Big(\frac{|x|^{\alpha} + |y|^{\alpha}}{2}\Big)^{1/\alpha} \,. $$
(2)

Note that, since only relative values are important, a constraint of the form \(\|\boldsymbol {x}\|=1\) is very natural. However, there is no reason at this stage to prefer a particular norm over another. Therefore, we assume for now that \(\|\cdot \|\) is any vector norm and consider the problem (1) in this general setting.

For \(x,y\in \mathbb {R}\), the kernel κα(x,y) is the generalized (or Binomial) mean of the two nonnegative numbers |x| and |y|. The limiting case α→∞ is particularly well-suited for core-periphery purposes. If x is a nonnegative vector, we have

$$f_{\infty}(\boldsymbol{x}) := {\lim}_{\alpha\to\infty} f_{\alpha}(\boldsymbol{x}) = \sum_{ij=1}^{n} A_{ij} \max\{x_{i}, x_{j}\}, $$

and thus any nonnegative vector x for which \(f_{\infty }(\boldsymbol {x})\) is large necessarily assumes large values on the entries corresponding to nodes in the core and smaller values within the periphery. In fact, when \(\|\cdot \|\) denotes a p-norm, any vector x≥0 with \(\|\boldsymbol {x}\|=1\) such that \(f_{\infty }(\boldsymbol {x})\) is large assigns to each node a value xi between zero and one so that each connection between two nodes i,j in the graph or, equivalently, each nonzero in the weight matrix A, involves at least one node such that xi is large. We note that the relevance of \(f_{\infty }(\boldsymbol {x})\) as a core–periphery quality function is highlighted for example in (Rombach et al. 2017), and the relaxed version (1) involving α was considered in (Tudisco and Higham 2019) for undirected graphs.
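The behaviour of the kernel as α grows can be checked numerically. The following is a minimal sketch in plain Python, assuming a dense list-of-rows weight matrix; the function names `kappa`, `f_alpha` and `f_inf`, the 4-node graph and the test vector are illustrative choices and do not come from the original code.

```python
def kappa(x, y, alpha):
    """Generalized mean of |x| and |y| with exponent alpha > 0."""
    x, y = abs(x), abs(y)
    m = max(x, y)
    if m == 0.0:
        return 0.0
    # Factor out the larger entry so that a large alpha cannot overflow.
    return m * (((x / m) ** alpha + (y / m) ** alpha) / 2.0) ** (1.0 / alpha)


def f_alpha(A, x, alpha):
    """Core quality function (2) for a dense weight matrix A (list of rows)."""
    n = len(x)
    return sum(A[i][j] * kappa(x[i], x[j], alpha)
               for i in range(n) for j in range(n) if A[i][j])


def f_inf(A, x):
    """Limit alpha -> infinity: each edge contributes A_ij * max(x_i, x_j)."""
    n = len(x)
    return sum(A[i][j] * max(abs(x[i]), abs(x[j]))
               for i in range(n) for j in range(n) if A[i][j])


# Hypothetical 4-node directed, weighted graph; A[i][j] is the weight of i -> j.
A = [[0, 2, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 3],
     [1, 0, 0, 0]]
x = [0.9, 0.3, 0.2, 0.1]

alphas = (1.0, 10.0, 200.0)
values = [f_alpha(A, x, a) for a in alphas]
limit = f_inf(A, x)
for a, v in zip(alphas, values):
    print(f"alpha = {a:6.1f}   f_alpha(x) = {v:.4f}")
print(f"alpha -> inf     f_inf(x)   = {limit:.4f}")
```

Because the generalized mean is nondecreasing in α, the printed values increase towards the limit from below.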

Connection with degree and eigenvector centralities

In the undirected case, A=AT, it has been argued that both the degree vector and the eigenvector (or Bonacich) centrality vector carry interesting core–periphery information and are good candidates for core score vectors (Borgatti and Everett 2000; Rombach et al. 2017; Tudisco and Higham 2019; Mondragón 2016). In this section we show that when α=1 or α=0 the problem (1) admits an explicit solution that, even when the graph is directed, boils down to the degree and the eigenvector centrality, respectively.

When α=1 the function fα(x) is linear, taking the form of the scalar product

$$f_{1}(\boldsymbol{x}) = \sum_{ij=1}^{n} A_{ij}\, \kappa_{1}(x_{i},x_{j}) = \frac 1 2 \mathbb{1}^{T} \left(A+A^{T}\right)\boldsymbol{x} \,. $$

As both the matrix A+AT and the vector x have nonnegative entries, using the Cauchy–Schwarz inequality, we have

$$f_{1}(\boldsymbol{x}) = \frac 1 2 |\mathbb{1}^{T} \left(A+A^{T}\right)\boldsymbol{x}| \leq \frac 1 2 \|\mathbb{1}^{T} \left(A+A^{T}\right)\|_{2} \|\boldsymbol{x}\|_{2} $$

and the inequality is always strict unless x is a multiple of \(\left (A+A^{T}\right)\mathbb {1} = \boldsymbol {d}^{\text {in}} + \boldsymbol {d}^{\text {out}}\). Therefore, if we choose the norm constraint in (1) to be \(\|\boldsymbol {x}\|_{2}=1\), we have that

$$\max_{\boldsymbol{x}\geq 0 : \|\boldsymbol{x}\|_{2}=1} f_{1}(\boldsymbol{x}) = \frac 1 2 \|\boldsymbol{d}^{\text{in}} + \boldsymbol{d}^{\text{out}}\|_{2} $$

with maximizer u given by \(\boldsymbol {u} = \left (A+A^{T}\right)\mathbb {1} = \boldsymbol {d}^{\text {in}} + \boldsymbol {d}^{\text {out}}\) properly normalized. This shows that the solution of (1) reduces to the degree vector when the graph is undirected and coincides with the sum of the incoming and outgoing degree vectors in the general case.
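The α=1 case is easy to verify numerically. The sketch below, in plain Python, builds the total degree vector of a small hypothetical graph, normalizes it in the 2-norm and compares the resulting value of f1 with the closed-form maximum and with random feasible vectors; the graph, the helper names and the number of random trials are assumptions made for illustration.

```python
import math
import random


def f1(A, x):
    """f_1(x) = sum_ij A_ij * (x_i + x_j) / 2, the alpha = 1 quality function."""
    n = len(x)
    return sum(A[i][j] * (x[i] + x[j]) / 2.0
               for i in range(n) for j in range(n))


# Hypothetical 4-node directed, weighted graph; A[i][j] is the weight of i -> j.
A = [[0, 2, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 3],
     [1, 0, 0, 0]]
n = len(A)

row_sums = [sum(A[i]) for i in range(n)]                        # A 1
col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]   # A^T 1
d = [row_sums[i] + col_sums[i] for i in range(n)]               # total degree

norm_d = math.sqrt(sum(v * v for v in d))
u = [v / norm_d for v in d]          # candidate maximizer with ||u||_2 = 1
best = f1(A, u)

rng = random.Random(0)


def random_unit_vector():
    """Random nonnegative vector normalized in the 2-norm."""
    x = [rng.random() for _ in range(n)]
    s = math.sqrt(sum(v * v for v in x))
    return [v / s for v in x]


trials = [f1(A, random_unit_vector()) for _ in range(1000)]
print(best, 0.5 * norm_d, max(trials))
```

The first two printed numbers agree up to rounding, and no random feasible vector exceeds them, as the Cauchy–Schwarz argument predicts.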

It is well-known that when α→0, κα(x,y) converges to the geometric mean of |x| and |y|. Thus, when α=0 and x≥0, we have

$$f_{0}(\boldsymbol{x}) := {\lim}_{\alpha\to 0}f_{\alpha}(\boldsymbol{x}) = \sum_{ij=1}^{n} A_{ij} \, \sqrt{x_{i}x_{j}}\,. $$

As \(\boldsymbol {x}\mapsto \sqrt {\boldsymbol {x}}\) is bijective on the set of vectors with nonnegative entries, we can change variable \(\boldsymbol {y} = \sqrt {\boldsymbol {x}}\) in (1) and recast the problem (1) as

$$\max_{\boldsymbol{y}\geq 0} \sum_{ij}A_{ij}\, y_{i}y_{j} = \boldsymbol{y}^{T} A \boldsymbol{y}, \qquad \text{ subject to} \|\boldsymbol{y}^{2}\|=1. $$

If we now consider the 1-norm, we can write the constraint \(\|\boldsymbol {y}^{2}\|_{1}=1\) as \(\|\boldsymbol {y}^{2}\|_{1}=\boldsymbol {y}^{T}\boldsymbol {y}=1\) and problem (1) becomes

$$\max_{\boldsymbol{x}\geq 0}f_{0}(\boldsymbol{x}) = \max_{\boldsymbol{y}\geq 0} f_{0}\left(\boldsymbol{y}^{2}\right)=\max_{\boldsymbol{y}\geq 0} \frac{\boldsymbol{y}^{T} A\boldsymbol{y}}{\boldsymbol{y}^{T} \boldsymbol{y}}\,. $$

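By the Perron–Frobenius theorem (Horn and Johnson 1990), the Rayleigh quotient \(\boldsymbol {y}^{T}A\boldsymbol {y}/\boldsymbol {y}^{T}\boldsymbol {y}\) has a unique nonnegative maximizer c. Since \(\boldsymbol {y}^{T}A\boldsymbol {y} = \frac 1 2 \boldsymbol {y}^{T}\left (A+A^{T}\right)\boldsymbol {y}\), this maximizer coincides with the Perron eigenvector of the nonnegative symmetric matrix A+AT. In other words, the core score that maximizes f0 is the vector \(\boldsymbol {u}=\boldsymbol {c}^{2}\), where c is the eigenvector centrality of the symmetrized network with weight matrix A+AT.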

In the “Core-periphery nonlinear operator” section we show that these two cases are special cases of a more general setting. We prove that for any α≥0 the solution of (1) is the Perron eigenvector of a nonlinear core–periphery operator. In particular, this implies that the solution u to (1) is unique for any α≥0, with an appropriate normalization. Moreover, using nonlinear Perron–Frobenius theory, this further allows us to introduce an iterative algorithm that computes u, with global convergence guarantees.

Logistic core-periphery random model for directed graphs

We introduced in (Tudisco and Higham 2019) a random graph model for undirected and unweighted graphs that can be used to artificially generate networks with a planted core–periphery structure. This model, unlike more classical block-based versions, is based on the logistic sigmoid function \(1/(1+e^{-x})\) rather than a Heaviside step function and allows a smooth transition between the set of core nodes and the set of peripheral ones. We refer to it as the logistic core–periphery random model. We notice that similar logistic function based random models have been considered in (O’Connor et al. 2015; Hoff et al. 2002; Jia and Benson 2018).

Here we extend the model to the case of directed graphs and we prove that the method of maximum likelihood applied to this random model coincides with the core–periphery quality function maximization problem (1), which provides the core–periphery analogue of a known phenomenon for stochastic block models in the community detection case (Newman 2016).

Consider a core–ranking assignment, that is, a permutation vector π that assigns a distinct integer πi between 1 and n to each vertex i. The closer πi is to 1, the higher the rank of i as a member of the core. For convenience, we shift-and-scale the core–ranking vectors via the affine transform \(\boldsymbol {\pi }\mapsto \mathbb {1}-\boldsymbol {\pi }/n\). Hence, we consider the set

$$\text{CR}(n) =\Big\{\boldsymbol{u}\in\mathbb{R}^{n} : u_{i} = 1-\pi_{i}/n,\, \, \boldsymbol{\pi} \text{ is a permutation of }\{1,\dots,n\}\Big\}\,, $$

so that, similarly to a core–score assignment, \(\boldsymbol {u}\in \text {CR}(n)\) has values in [0,1] and larger values of u correspond to higher positions in the core ranking.

Now, given \(\boldsymbol {u}\in \text {CR}(n)\), the logistic core–periphery random model generates an edge from node i to node j with independent probability given by

$$ \Pr(i \to j) = \frac{1}{1+e^{-\kappa_{\alpha}(u_{i},u_{j})}} = p_{ij}(\boldsymbol{u})\,. $$
(3)

Note that for \(\alpha _{2}\geq \alpha _{1}\geq 0\) we have

$$\sqrt{|xy|} = \kappa_{0}(x,y) \leq \kappa_{\alpha_{1}}(x,y) \leq \kappa_{\alpha_{2}}(x,y) \leq \kappa_{\infty}(x,y)=\max\{|x|,|y|\}\,. $$

Thus, for any α≥0, the probability pij(u) tends to be large if at least one of the nodes i and j has a high core rank and this effect increases as α grows, as shown by Fig. 1.

Fig. 1

Probability pij(u) for different values of the core scores uj, ui and of the parameter α. Upper panel: pij(u) as a function of ui, for different values of uj and α. Lower panel: contour plots of pij(u) as a function of ui and uj, for different fixed values of α
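The model (3) can be sampled directly. The sketch below, in plain Python, draws each ordered pair independently with probability pij(u); the function names, the seed handling and the decision to include the pairs with i=j are assumptions made for illustration, not details taken from the original code.

```python
import math
import random


def kappa(x, y, alpha):
    """Generalized mean of two nonnegative numbers, exponent alpha > 0."""
    return ((x ** alpha + y ** alpha) / 2.0) ** (1.0 / alpha)


def edge_probability(ui, uj, alpha):
    """Probability (3) of an edge i -> j given the scores of its endpoints."""
    return 1.0 / (1.0 + math.exp(-kappa(ui, uj, alpha)))


def sample_graph(u, alpha, seed=0):
    """Draw one 0/1 adjacency matrix (list of rows) from the logistic model."""
    rng = random.Random(seed)
    n = len(u)
    return [[1 if rng.random() < edge_probability(u[i], u[j], alpha) else 0
             for j in range(n)] for i in range(n)]


n = 6
# Ranking vector in CR(n): node 0 has rank 1 (most core), node n-1 has rank n.
u = [1.0 - rank / n for rank in range(1, n + 1)]
A = sample_graph(u, alpha=10.0)

# For a fixed pair of scores, the edge probability grows with alpha.
p_low = edge_probability(0.9, 0.1, 1.0)
p_high = edge_probability(0.9, 0.1, 10.0)
print(p_low, p_high)
```

Shuffling the rows and columns of `A` with a common random permutation then gives a test instance whose planted ordering a detection method should recover.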

Suppose we are given a network with the nodes in arbitrary order and wish to find the best core ranking assignment based on the logistic random model (3). From a maximum likelihood perspective, this corresponds to maximizing the log-likelihood

$$ L_{\alpha}(\boldsymbol{u})= \sum_{ij\in E}\log p_{ij}(\boldsymbol{u}) + \sum_{ij\notin E}\log(1-p_{ij}(\boldsymbol{u})) $$
(4)

among all possible \(\boldsymbol {u}\in \text {CR}(n)\). In other words, assuming that the given network is a sample from the logistic core–periphery random model (3) with the node labels shuffled arbitrarily, this is most likely to be the correct reordering.

A key observation here is that, for ranking vectors \(\boldsymbol {u}\in \text {CR}(n)\), this maximum likelihood approach is equivalent to assigning a core–score to the nodes which maximizes a logistic core quality function fα. In fact, the following extension of Theorem 3.1 in (Tudisco and Higham 2019) holds.

Theorem 1

Let G be a directed unweighted graph. For any α≥0, a vector \(\boldsymbol {u}\in \text {CR}(n)\) is a solution of

$$\max_{\boldsymbol{u}\in \text{CR}(n)} \, L_{\alpha}(\boldsymbol{u}) $$

if and only if it is solution of

$$\max_{\boldsymbol{u} \in \text{CR}(n)} \, f_{\alpha}(\boldsymbol{u}). $$

This equivalence provides further justification for the kernel optimization approach. It also suggests that the logistic core–periphery random model (3) is a useful resource for testing core–periphery detection algorithms in this directed setting.
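For very small graphs the statement of Theorem 1 can be checked by brute force. The sketch below, in plain Python, enumerates all rankings of a hypothetical 5-node unweighted digraph, evaluates both the log-likelihood (4) and the quality function fα on CR(n), and compares the two sets of maximizers. The graph, the value α=2, the tolerance and the convention that the sums run over all ordered pairs (including i=j) are illustrative assumptions.

```python
import itertools
import math


def kappa(x, y, alpha):
    """Generalized mean of two nonnegative numbers, exponent alpha > 0."""
    return ((x ** alpha + y ** alpha) / 2.0) ** (1.0 / alpha)


def likelihood_and_quality(A, pi, alpha):
    """Return (L_alpha(u), f_alpha(u)) for the ranking pi, with u = 1 - pi/n."""
    n = len(A)
    u = [1.0 - p / n for p in pi]
    L = f = 0.0
    for i in range(n):
        for j in range(n):
            k = kappa(u[i], u[j], alpha)
            p_ij = 1.0 / (1.0 + math.exp(-k))
            if A[i][j]:
                L += math.log(p_ij)
                f += k
            else:
                L += math.log(1.0 - p_ij)
    return L, f


def maximizers(values, tol=1e-9):
    """Keys whose value is within tol of the maximum."""
    top = max(values.values())
    return {key for key, val in values.items() if val >= top - tol}


# Hypothetical unweighted digraph in which every edge touches node 0.
A = [[0, 1, 1, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0]]
alpha = 2.0

L_vals, f_vals = {}, {}
for pi in itertools.permutations(range(1, len(A) + 1)):
    L_vals[pi], f_vals[pi] = likelihood_and_quality(A, pi, alpha)

best_L = maximizers(L_vals)
best_f = maximizers(f_vals)
print(best_L == best_f, sorted(best_f)[0])
```

The difference between the two objectives is the same for every ranking, which is the content of the proof in the Appendix; the enumeration is only feasible for tiny n and is meant as a sanity check, not as a detection method.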

We also note that a closely related generative random graph model for core–periphery networks was proposed in (Jia and Benson 2018). That work focused on the undirected case and aimed to incorporate additionally available spatial information.

Core-periphery nonlinear operator

A study of the Hessian of fα reveals that fα is neither convex nor concave in general. This makes the solution of (1) particularly challenging. However, here we show that this optimization problem can be re-cast in terms of the Perron eigenvector of a nonlinear operator. We then show how its solution is always achievable via a generalization of the classical power method from numerical linear algebra.

Given α≥1, consider the nonlinear core-periphery operator \({\varPhi }_{\alpha }:\mathbb {R}^{n}\to \mathbb {R}^{n}\), entrywise defined as follows

$${\boldsymbol{x} \mapsto {\varPhi}_{\alpha}(\boldsymbol{x})_{i} = |x_{i}|^{\alpha-2}x_{i} \sum_{j=1}^{n} \frac{A_{ij}+A_{ji}}{\kappa_{\alpha}(x_{i},x_{j})^{\alpha-1}} \quad i=1,\dots, n}. $$

Given p>1, we consider the following nonlinear eigenvalue problem for Φα

$$ {\varPhi}_{\alpha}(\boldsymbol{x}) = \lambda\, \boldsymbol{x}^{p-1}\,. $$
(5)

One easily realizes that Φα is linear if and only if α=1 in which case that operator degenerates into the map such that Φ1(x)=din+dout, for any nonnegative vector x≥0. In this setting it is easily seen that the only nonnegative solution of (5) is x=din+dout with λ=1 and p=2. Combined with the discussion of the “Connection with degree and eigenvector centralities” section, this shows that for the case α=1 the unique nonnegative solution of the eigenvalue problem (5) coincides with the maximizer of (1). We can retrieve the same analogy for α→0 and p=1. In that case we have \({\lim }_{\alpha \to 0}{\varPhi }_{\alpha }(\boldsymbol {x})_{i} ={\varPhi }_{0}(\boldsymbol {x})_{i} = \frac 1 {\sqrt {x_{i}}}\sum _{j=1}^{n} (A_{ij}+A_{ji})\sqrt {x_{j}}\) whereas (5) becomes \({\varPhi }_{0}(\boldsymbol {x}) = \lambda \mathbb {1}\). Arguing as in the “Connection with degree and eigenvector centralities” section, again, we deduce that for α=0 and p=1 a nonnegative solution of the eigenvalue problem (5) coincides with a maximizer of (1).

When α≠0,1 the question of existence and uniqueness of a solution to (5) is less trivial. The following theorem gives a full answer and shows that the same one-to-one correspondence between (5) and (1) holds.

Theorem 2

Let α≥0 and p> max{1,α}. Then the eigenvalue problem (5) has a unique nonnegative solution u≥0 such that \(\|\boldsymbol {u}\|_{p}:=\left (|u_{1}|^{p}+\dots +|u_{n}|^{p}\right)^{1/p}=1\), which is also the unique solution of (1), provided that \(\|\cdot \|=\|\cdot \|_{p}\). Moreover u is positive if and only if the network has no isolated nodes, i.e., all nodes have at least one outgoing or one incoming edge.

Note that, as no assumption on the connectedness of the graph is made, the eigenvector centrality, i.e., the nonnegative solution of (5) for α=0 and p=1, is not uniquely defined. Instead, Theorem 2 shows that the core–score assignment is always unique when p> max{1,α} and α≥0. The relevance of Theorem 2 is not only theoretical. In fact, it comes together with the following corollary which shows the global convergence to u of a simple iterative scheme.

Corollary 1

Given an initial guess u0>0 and parameters α≥0, p> max{1,α} and q=p/(p−1), consider the following iterative method

$$\left\{\begin{array}{l} \boldsymbol{v}_{k+1} = {\varPhi}_{\alpha}(\boldsymbol{u}_{k})\\ \boldsymbol{u}_{k+1} = \|\boldsymbol{v}_{k+1}\|_{q}^{1-q}|\boldsymbol{v}_{k+1}|^{q-2} \boldsymbol{v}_{k+1} \end{array}\right., \qquad k=0,1,2,3,\dots $$

Then uk≥0 for all k≥0 and \(\|\boldsymbol {u}_{k} - \boldsymbol {u}\|=\mathcal O\left (\left (\frac {|\alpha -1|}{p-1}\right)^{k}\right)\), i.e., uk converges to the unique solution u≥0 of (1) and (5).

Note that on sparse networks the method scales linearly with the number of nodes. In fact, each iteration requires \(\mathcal O(|E|)\) floating point operations, where |E| is the number of edges in the graph. Moreover, the free parameter p allows us to tune the overall number of iterations \(k^{\star }=\mathcal O\left (\ln \varepsilon /\ln \frac {|\alpha -1|}{p-1}\right)\) required to achieve the precision \(\|\boldsymbol {u}_{k^{\star }}-\boldsymbol {u}\|=\mathcal O(\varepsilon)\). We will refer to the iteration in Corollary 1 as the Nonlinear Spectral Method (NSM).
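The iteration of Corollary 1 can be sketched in a few lines. The version below is a minimal plain-Python illustration, not the authors' Matlab or Julia code: the edge-list input format, the function name `nsm`, the uniform starting vector and the stopping rule (largest change between consecutive iterates below `tol`) are all assumptions. It requires α>0 and at least one edge.

```python
def kappa(x, y, alpha):
    """Generalized mean of two nonnegative numbers, exponent alpha > 0."""
    m = max(x, y)
    if m == 0.0:
        return 0.0
    return m * (((x / m) ** alpha + (y / m) ** alpha) / 2.0) ** (1.0 / alpha)


def nsm(edges, n, alpha=10.0, p=20.0, tol=1e-9, max_iter=1000):
    """Sketch of the nonlinear spectral method for weighted edges (i, j, w)."""
    assert alpha > 0 and p > max(1.0, alpha)
    q = p / (p - 1.0)
    # Neighbour lists holding the symmetrized weights A_ij + A_ji.
    nbrs = [dict() for _ in range(n)]
    for i, j, w in edges:
        nbrs[i][j] = nbrs[i].get(j, 0.0) + w
        nbrs[j][i] = nbrs[j].get(i, 0.0) + w
    u = [n ** (-1.0 / p)] * n            # positive start with ||u||_p = 1
    for k in range(max_iter):
        # v = Phi_alpha(u), written as sum_j W_ij * (u_i / kappa_ij)^(alpha-1)
        v = [sum(w * (u[i] / kappa(u[i], u[j], alpha)) ** (alpha - 1.0)
                 for j, w in nbrs[i].items()) for i in range(n)]
        norm_q = sum(vi ** q for vi in v) ** (1.0 / q)
        new_u = [(vi / norm_q) ** (q - 1.0) for vi in v]
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:
            break
    return u, k + 1


# Hypothetical graph: node 0 is a hub, node 5 is isolated.
edges = [(0, 1, 1.0), (1, 0, 1.0), (0, 2, 1.0),
         (3, 0, 1.0), (0, 4, 1.0), (1, 2, 1.0)]
scores, iterations = nsm(edges, n=6)
ranking = sorted(range(6), key=lambda i: -scores[i])
print(iterations, ranking)
```

Each sweep touches every stored neighbour once, so the cost per iteration is proportional to the number of edges. The isolated node receives score zero after the first sweep, in line with Theorem 2.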

From Corollary 1 the choice p≫α appears to be attractive, since it leads to an extremely rapid (linear) convergence rate. However, in choosing values for p and α, we must take account of two further issues. 1. As we argued in the “Core–periphery via functional kernel optimization” section, a larger value of α gives a kernel that more closely matches the ideal of max{xi,xj}. 2. A larger value of p produces a relaxed problem that is less likely to distinguish between the nodes. (Note that in the extreme case of p=∞, the constraint \(\|\boldsymbol {x}\|_{\infty }=1\) allows for the obvious solution \(\boldsymbol {x} = \mathbb {1}\), which assigns the same score to all nodes.)

Combining points 1 and 2 with Corollary 1, we must compromise between a large parameter α in the kernel and a not-too-large value p for the vector norm, while keeping p>α to maintain convergence. In practice, we found that changing the value of p did not significantly affect the core–periphery structure output of the algorithm, which was instead governed by the value of α. In our experiments we chose p=2α and α=10, as this produced good results with guaranteed fast convergence. Moreover, we observed that larger values of α did not produce a noticeable change in the core–periphery structure identified.

Enron dataset

The Enron email network consists of 1,148,072 emails sent between 87,273 employees of Enron between 1999 and 2003. Nodes in the network are individual employees and weighted directed edges, with weights ranging from 1 to 3,904, count the number of emails sent from one employee to another. It is possible to send an email to oneself, and thus this network contains self–loops. Note that this network is not strongly (or even weakly) connected. The data has been collected from (Klimt and Yang 2004).

Three plots in Fig. 2 display the network by means of colored adjacency sparsity plots. Here, each nonzero entry in the adjacency matrix is shown with an intensity that corresponds to the edge weight (the darker the dot the larger the weight on the corresponding edge). These plots correspond to three different node labelings: the first one (top–left corner) is the original node labeling; the second plot (top–right corner) is the labeling, somewhat corresponding to a rich–club paradigm, obtained by re-ordering the nodes according to decreasing values of the overall degree din+dout; the third one (bottom–left corner) is the labeling corresponding to decreasing values of the core–score computed with the NSM using parameters α=10 and p=20. This latter figure clearly shows that the Enron email dataset contains a strong core–periphery structure, which was less prevalent initially. This is further confirmed by the core–periphery profile in the bottom–right plot, which shows the behavior of

$$\gamma(S_{k}) = \frac{\sum_{i,j \in S_{k}} A_{ij}}{\sum_{i\in S_{k}} d^{\text{in}}_{i} + d^{\text{out}}_{i}} $$

against k, where Sk is the set of k most peripheral nodes corresponding to a core–score assignment. As k varies from 1 to n=87,273, γ(Sk) varies from 0 to 1 and measures the ratio of periphery–periphery links to periphery–all links, if Sk were to be chosen to be the periphery set. Thus a network has a strong core–periphery structure revealed by a core–score vector if the corresponding profile γ(Sk) takes small values as k increases from zero and then grows dramatically as k crosses some threshold value.
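The profile can be computed in one pass over the nodes sorted by increasing score. The sketch below, in plain Python, follows the displayed formula literally, reading the denominator as the sum over Sk of din+dout (an assumption about the intended parenthesization). The dense matrix format, the function name `cp_profile` and the example graph are likewise illustrative.

```python
def cp_profile(A, score):
    """Return [gamma(S_1), ..., gamma(S_n)] for a dense weight matrix A.

    S_k holds the k nodes with the smallest score; ties keep index order.
    """
    n = len(A)
    # Total degree of each node: row sum plus column sum.
    d = [sum(A[i]) + sum(A[j][i] for j in range(n)) for i in range(n)]
    order = sorted(range(n), key=lambda i: score[i])   # most peripheral first
    in_set = [False] * n
    inside = 0.0      # weight of the edges with both endpoints in S_k
    degree = 0.0      # sum over S_k of d_in + d_out
    profile = []
    for i in order:
        # Adding node i brings in its self-loop and its edges to and from S_{k-1}.
        inside += A[i][i] + sum(A[i][j] + A[j][i]
                                for j in range(n) if in_set[j])
        degree += d[i]
        in_set[i] = True
        profile.append(inside / degree if degree > 0 else 0.0)
    return profile


# Hypothetical star-like digraph: node 0 is the core, nodes 1..4 the periphery.
A = [[0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0]]
total_degree = [sum(A[i]) + sum(A[j][i] for j in range(5)) for i in range(5)]
profile = cp_profile(A, total_degree)
print(profile)
```

In this example the profile stays at zero while only peripheral nodes are included and jumps once the hub is added, which is the signature of core–periphery structure described above. This dense version costs O(n²) operations; a sparse neighbour-list variant would be needed for a network of the size of the Enron dataset.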

Fig. 2

Adjacency sparsity plots and core–periphery profile of the Enron email dataset. The two panels in the top and the panel in the bottom-left corner show the nonzero entries of the adjacency matrix of the network, with different color intensities for different edge weights, when the nodes are re-labeled in three different ways: the top-left panel corresponds to the original node labeling; the top-right panel is the labeling obtained by re–ordering the nodes according to decreasing values of the overall degree din+dout; the bottom-left is the labeling corresponding to decreasing values of the core–score computed with the proposed NSM. Finally, the bottom-right panel shows the persistence probability γ(Sk) as a function of k, when Sk is the set of the k most peripheral nodes according to the degree vector (orange line) or the NSM (blue line)

For undirected networks, the profile γ(Sk) was proposed in (Della Rossa et al. 2013) as a means to visualize core-periphery structure. In this case, γ(Sk) coincides with the persistence probability of the set Sk, i.e., the probability that a random walker who is currently in any of the nodes of Sk remains in Sk at the next time step. For directed strongly connected networks, the persistence probability of Sk would instead be given by \(\sum _{ij\in S_{k}}y_{i}P_{ij}/\sum _{j\in S_{k}}y_{j}\), where y is the stationary distribution of the random walk with transition matrix \(P_{ij}=A_{ij}/d_{i}^{\text {out}}\). However, as the Enron dataset we are considering is not connected, y is not well defined, and we compute γ(Sk) in its place.

Finally, in order to show how the parameter α affects the core–periphery assignment obtained with NSM on this dataset, we show in Fig. 3 the core–periphery structure and core-periphery profile using three different values of α, namely α∈{1.5,3,10}. While changing the value of p does not affect the core–periphery structure output of the algorithm, small values of α show a weaker core–periphery structure, which is consistent with the fact that our model ideally works best when α→∞. However, in practice α=10 performs well and we have observed that larger values of α do not result in any significant change.

Fig. 3

Adjacency matrix sparsity plots and core–periphery profiles corresponding to the relabeling obtained with the NSM and different values of α

Since the network is not strongly connected, we do not show plots corresponding to the eigenvector centrality—this is not uniquely defined and, in our tests, different runs of Julia’s Arpack.eigs gave rise to very different reorderings.

For this network the NSM with parameters α=10 and p=20 computed the solution to 9 digits of precision in less than 5 s on a standard i7 single core laptop, using Julia 1.0. Our code in both Matlab and Julia is available online at the address https://github.com/ftudisco/nonlinear-core-periphery .

Conclusion

Our main aim in this work was to show that the attractive properties of the nonlinear spectral method proposed in (Tudisco and Higham 2019) can almost completely be transferred to the directed, weighted and unconnected setting. In particular we show that for the core–periphery kernel quality function (1), proposed for example in (Rombach et al. 2017; Tudisco and Higham 2019), there is always a unique solution for α≥0 and p> max{1,α}, and this solution can be computed via a nonlinear spectral method whenever it is feasible to form matrix-vector products based on the network weight matrix. The proposed method, which exploits an intriguing connection between optimization and eigenproblems, generalizes the classical power method in order to compute the global maximum of a highly nonconvex function; thus it may also be of interest in other machine learning contexts.

Appendix: Theorem proofs

Proof

(Proof of Theorem 1.) First note that, by adding and removing \(\sum _{ij\in E}\log (1-p_{ij}(\boldsymbol {u}))\), the log likelihood (4) can be equivalently written as

$$L_{\alpha}(\boldsymbol{u}) = \sum_{ij\in E}\log\left(\frac{p_{ij}(\boldsymbol{u})}{1-p_{ij}(\boldsymbol{u}) }\right) + \sum_{i,j=1}^{n} \log(1-p_{ij}(\boldsymbol{u})) =: S_{1}(\boldsymbol{u})+S_{2}(\boldsymbol{u})\,. $$

Let us analyze the two terms S1 and S2 individually. If \(a = \frac {1}{1+e^{-b}}\), we have

$$\frac a {1-a} = \left(\frac{1}{1+e^{-b}}\right)\left(\frac{e^{-b}}{1+e^{-b}}\right)^{-1} = e^{b} $$

which, from (3), implies that

$$S_{1}(\boldsymbol{u}) = \sum_{ij\in E}\log\left(\frac{p_{ij}(\boldsymbol{u})}{1-p_{ij}(\boldsymbol{u}) }\right) = \sum_{ij\in E}\kappa_{\alpha}(u_{i},u_{j}) = \sum_{i,j=1}^{n} A_{ij}\kappa_{\alpha}(u_{i},u_{j}) \,, $$

i.e., S1(u)=fα(u). Now note that, if \(\boldsymbol {u},\boldsymbol {v}\in \text {CR}(n)\) then there exists a permutation σ of {1,…,n} such that ui=vσ(i) for all i. Therefore, since \(1-p_{ij}(\boldsymbol {u}) = 1/\left (1+e^{\kappa _{\alpha }(u_{i},u_{j})}\right)\),

$$S_{2}(\boldsymbol{u}) = \sum_{i,j=1}^{n} \log\left(\frac{1}{1+e^{\kappa_{\alpha}(u_{i},u_{j})}} \right) = \sum_{i,j=1}^{n} \log\left(\frac{1}{1+e^{\kappa_{\alpha}(v_{\sigma(i)},v_{\sigma(j)})} } \right) = S_{2}(\boldsymbol{v}), $$

which implies that S2 is constant on CR(n). Thus u maximizes Lα(u) if and only if it maximizes S1(u) and the proof is complete. □

Proof

(Proof of Theorem 2 and Corollary 1.) This proof is based on the proof of Theorem 4.5 in (Tudisco and Higham 2019) and the lemmas proved therein. For convenience, let us denote by \(\mathbb {R}^{n}_{+}\) the cone of vectors with nonnegative entries. Since we are interested in a nonnegative maximizer of fα(x) constrained on the sphere \(\|\boldsymbol {x}\|_{p}=1\), we can equivalently look for a maximizer of \(f_{\alpha }(\boldsymbol {x}/\|\boldsymbol {x}\|_{p})\) on the whole cone of nonnegative vectors \(\mathbb {R}^{n}_{+}\). Now, notice that fα is positively 1-homogeneous, that is fα(ax)=afα(x) holds for any real number a≥0. Therefore we can further change our problem into the global maximum on \(\mathbb {R}^{n}_{+}\) of \(g(\boldsymbol {x})=f_{\alpha }(\boldsymbol {x})/\|\boldsymbol {x}\|_{p}\), without losing any generality. The critical point condition for g implies the equivalence with the eigenvalue problem (5), i.e., x is a stationary point for g if and only if it is such that \({\varPhi }_{\alpha }(\boldsymbol {x})=\lambda \boldsymbol {x}^{p-1}\). As p>1, we can equivalently write \({\tilde {\varPhi }}(\boldsymbol {x}) = \mu \boldsymbol {x}\), with \(\mu = \lambda ^{\frac 1 {p-1} }\) and \(\tilde {{\varPhi }}(\boldsymbol {x}) = {\varPhi }_{\alpha }(\boldsymbol {x})^{\frac 1 {p-1}}\). We now show that there can only be one nonnegative x such that \(\|\boldsymbol {x}\|_{p}=1\) and \(\tilde {\varPhi }(\boldsymbol {x}) = \mu \boldsymbol {x}\).

To this end, note that \(\tilde {\varPhi }(\boldsymbol {x})\geq 0\) for any x≥0. Thus if x≥0 and \(\|\boldsymbol {x}\|_{p}=1\), then μ>0 and we have

$$\mu = \|\mu \boldsymbol{x}\|_{p} = \|\tilde{\varPhi}(\boldsymbol{x})\|_{p} = \|{\varPhi}_{\alpha}(\boldsymbol{x})\|_{q}^{q-1}, $$

where q is such that 1/p+1/q=1. Therefore any x≥0 with \(\|\boldsymbol {x}\|_{p}=1\) that solves (5) is a fixed point of the map

$$\mathit \Psi: \mathbb{R}^{n}_{+} \to \mathbb{R}^{n}_{+},\qquad \mathit \Psi(\boldsymbol{x}) = \frac{\tilde{\varPhi}(\boldsymbol{x})}{\|\tilde{\varPhi}(\boldsymbol{x})\|_{p}} = \frac{{\varPhi}_{\alpha}(\boldsymbol{x})^{q-1}}{\|{\varPhi}_{\alpha}(\boldsymbol{x})\|_{q}^{q-1}} \,. $$
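For completeness, the two computations used above can be written out. The following is a sketch for x>0, where fα is differentiable, and it relies only on the definitions of κα and Φα, the symmetry of the kernel, and the relations 1/p+1/q=1, q−1=1/(p−1) and (q−1)p=q.

$$\begin{aligned} \frac{\partial \kappa_{\alpha}}{\partial x}(x,y) &= \tfrac 1 2\, x^{\alpha-1}\,\kappa_{\alpha}(x,y)^{1-\alpha}, \\ \frac{\partial f_{\alpha}}{\partial x_{i}}(\boldsymbol{x}) &= \sum_{j=1}^{n} (A_{ij}+A_{ji})\,\frac{\partial \kappa_{\alpha}}{\partial x}(x_{i},x_{j}) = \tfrac 1 2\,{\varPhi}_{\alpha}(\boldsymbol{x})_{i}, \\ \frac{\partial}{\partial x_{i}}\|\boldsymbol{x}\|_{p} &= \|\boldsymbol{x}\|_{p}^{1-p}\,x_{i}^{p-1}, \\ \nabla g(\boldsymbol{x}) = 0 \; &\Longleftrightarrow \; {\varPhi}_{\alpha}(\boldsymbol{x}) = \lambda\,\boldsymbol{x}^{p-1} \quad \text{with } \lambda = \frac{2 f_{\alpha}(\boldsymbol{x})}{\|\boldsymbol{x}\|_{p}^{p}}, \\ \|\tilde{\varPhi}(\boldsymbol{x})\|_{p} &= \Big(\sum_{i=1}^{n} {\varPhi}_{\alpha}(\boldsymbol{x})_{i}^{(q-1)p}\Big)^{1/p} = \Big(\sum_{i=1}^{n} {\varPhi}_{\alpha}(\boldsymbol{x})_{i}^{q}\Big)^{1/p} = \|{\varPhi}_{\alpha}(\boldsymbol{x})\|_{q}^{q/p} = \|{\varPhi}_{\alpha}(\boldsymbol{x})\|_{q}^{q-1}. \end{aligned} $$

The fourth line follows from the quotient rule applied to g(x)=fα(x)/‖x‖p, and the last line uses q/p=q−1.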

Let \(\tau (\boldsymbol {x},\boldsymbol {y})= \|\ln \boldsymbol {x}- \ln \boldsymbol {y}\|_{\infty }\). Lemma 4.4 of (Tudisco and Higham 2019) implies that

$$\frac{\tau\big(\mathit \Psi(\boldsymbol{x}),\mathit \Psi(\boldsymbol{y}) \big)}{\tau(\boldsymbol{x}, \boldsymbol{y})} \leq \frac{|\alpha-1|}{p-1}, $$

for any \(\boldsymbol {x}, \boldsymbol {y}\in \mathbb {R}^{n}_{+}\). As τ is a complete metric on the cone \(\mathbb {R}^{n}_{+}\) (see, for example, (Lemmens and Nussbaum 2012)), this shows that Ψ is a contraction and thus it has a unique fixed point.

We conclude that, when α>0 and p> max{1,α}, the eigenvalue problem (5) has a unique nonnegative solution \(\boldsymbol{u}\geq 0\) such that \(\|\boldsymbol{u}\|_{p}=1\), which is also the unique solution of (1) when we choose \(\|\cdot\|=\|\cdot\|_{p}\). Note moreover that if we start with a positive \(\boldsymbol{u}_{0}>0\) and apply Ψ iteratively to give \(\boldsymbol{u}_{k+1}=\mathit \Psi(\boldsymbol{u}_{k})\), we obtain \(\tau (\boldsymbol {u}_{k+1}, \boldsymbol {u}_{k})\leq \left (\frac {|\alpha -1|}{p-1}\right)^{k}\tau (\boldsymbol {u}_{1},\boldsymbol {u}_{0})\). This, together with the inequality \(\tau(\boldsymbol{u}_{1},\boldsymbol{u}_{0})\leq \gamma\), proved for example in Corollary 4.6 of (Tudisco and Higham 2019), completes the proof of Corollary 1.

Next we prove that \(\boldsymbol{u}\) has a zero component i if and only if i is an isolated node, i.e., it has neither incoming nor outgoing links. To this end, let \(\Omega _{A}^{+} \subseteq \mathbb {R}^{n}_{+}\) be the set of vectors

$$\Omega_{A}^{+} = \{ \boldsymbol{x} \geq 0 : x_{i} = 0 \text{ if and only if}~i \text{ is isolated}\} \,. $$

Note that, equivalently, \(x_{i}=0\) for \(\boldsymbol {x}\in \Omega _{A}^{+}\) if and only if \(A_{ij}+A_{ji}=0\) for all j=1,…,n. Now note that if \(\boldsymbol {x} \in \Omega _{A}^{+}\) then \({\varPhi }_{\alpha }(\boldsymbol {x}) \in \Omega _{A}^{+}\). In fact, from its definition

$${\varPhi}_{\alpha}(\boldsymbol{x})_{i} = x_{i}^{\alpha-1} \sum_{j=1}^{n} \frac{A_{ij}+A_{ji}}{\kappa_{\alpha}(x_{i},x_{j})^{\alpha-1}} $$

we see that, if i is isolated, then \(x_{i}=0\) and thus \({\varPhi}_{\alpha}(\boldsymbol{x})_{i}=0\), whereas if \(x_{i}>0\), then \({\varPhi}_{\alpha}(\boldsymbol{x})_{i}>0\), as \(\kappa_{\alpha}(x_{i},x_{j})\geq x_{i}>0\) and there exists at least one \(j^{*}\) such that \(A_{i,j^{*}}+A_{j^{*},i}>0\), which implies \({\varPhi }_{\alpha }(\boldsymbol {x})_{i}\geq x_{i}^{\alpha -1}(A_{i,j^{*}}+A_{j^{*},i})\kappa _{\alpha }(x_{i},x_{j^{*}})^{1-\alpha }>0\). Note that the same conclusion holds for any initial positive vector; that is, \(\boldsymbol{x}>0\) implies \({\varPhi }_{\alpha }(\boldsymbol {x}) \in \Omega _{A}^{+}\). Therefore the iterative method of Corollary 1 converges to a vector in \(\Omega _{A}^{+}\) for any starting point, or, equivalently, any nonnegative solution of (5) must be in \(\Omega _{A}^{+}\). Since there exists only one such solution, the proof is complete. □
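The fixed-point iteration \(\boldsymbol{u}_{k+1}=\mathit \Psi(\boldsymbol{u}_{k})\) analysed in this proof can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code from the repository: it assumes the smoothed maximum \(\kappa_{\alpha}(x,y)=(x^{\alpha}+y^{\alpha})^{1/\alpha}\), which is consistent with the expression for \({\varPhi}_{\alpha}\) above, and the function names (`kappa`, `phi`, `psi`, `core_scores`), the default parameter values and the stopping rule are all hypothetical choices made for this example.

```python
import math


def kappa(x, y, alpha):
    # Assumed smoothed maximum: kappa_alpha(x, y) = (x^alpha + y^alpha)^(1/alpha).
    return (x ** alpha + y ** alpha) ** (1.0 / alpha)


def phi(A, x, alpha):
    # Phi_alpha(x)_i = x_i^(alpha-1) * sum_j (A_ij + A_ji) / kappa(x_i, x_j)^(alpha-1)
    n = len(x)
    out = [0.0] * n
    for i in range(n):
        if x[i] == 0.0:
            continue  # with a positive start, zeros only arise at isolated nodes
        total = 0.0
        for j in range(n):
            w = A[i][j] + A[j][i]
            if w > 0:
                total += w / kappa(x[i], x[j], alpha) ** (alpha - 1.0)
        out[i] = x[i] ** (alpha - 1.0) * total
    return out


def psi(A, x, alpha, p):
    # Psi(x) = Phi_alpha(x)^(q-1) / ||Phi_alpha(x)||_q^(q-1), with 1/p + 1/q = 1.
    # Assumes the graph has at least one edge, so that the norm is nonzero.
    q = p / (p - 1.0)
    y = phi(A, x, alpha)
    norm_q = math.fsum(v ** q for v in y) ** (1.0 / q)
    return [(v / norm_q) ** (q - 1.0) for v in y]


def core_scores(A, alpha=2.0, p=4.0, tol=1e-10, max_iter=1000):
    # Iterate u_{k+1} = Psi(u_k) from a positive vector with ||u_0||_p = 1.
    n = len(A)
    u = [n ** (-1.0 / p)] * n
    for _ in range(max_iter):
        v = psi(A, u, alpha, p)
        if max(abs(a - b) for a, b in zip(u, v)) < tol:
            return v
        u = v
    return u
```

For instance, on a directed star in which node 0 points to nodes 1, 2 and 3 and a fifth node has no edges, calling `core_scores(A, alpha=2.0, p=4.0)` returns a vector with unit p-norm whose largest entry is at the hub, whose leaf entries coincide, and whose entry for the isolated node is exactly zero, in line with the statement proved above.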

Availability of data and materials

All data and code related to this work are publicly available in the GitHub repository https://github.com/ftudisco/nonlinear-core-periphery. The Enron dataset used here is also available at http://konect.uni-koblenz.de/networks/enron.

References

  1. Baños, R, Borge-Holthoefer J, Wang N, Moreno Y, González-Bailón S (2013) Diffusion dynamics with changing network composition. Entropy 15(11):4553–4568.

  2. Bascompte, J, Jordano P, Melián CJ, Olesen JM (2003) The nested assembly of plant–animal mutualistic networks. Proc Natl Acad Sci 100(16):9383–9387.

  3. Bertozzi, AL, Luo X, Stuart AM, Zygalakis KC (2018) Uncertainty quantification in the classification of high dimensional data. SIAM-ASA J Uncertain Quantif 6:568–595. https://doi.org/10.1137/17M1134214.

  4. Borgatti, SP, Everett MG (2000) Models of core/periphery structures. Soc Netw 21:375–395.

  5. Csermely, P, London A, Wu L-Y, Uzzi B (2013) Structure and dynamics of core/periphery networks. J Complex Netw 1(2):93–123.

  6. Cucuringu, M, Rombach P, Lee SH, Porter MA (2016) Detection of core–periphery structure in networks using spectral methods and geodesic paths. Eur J Appl Math 27(6):846–887.

  7. Della Rossa, F, Dercole F, Piccardi C (2013) Profiling core-periphery network structure by random walkers. Sci Rep 3:1467.

  8. Hoff, PD, Raftery AE, Handcock MS (2002) Latent space approaches to social network analysis. J Am Stat Assoc 97(460):1090–1098.

  9. Horn, RA, Johnson CR (1990) Matrix Analysis. Cambridge University Press, Cambridge.

  10. Jia, J, Benson AR (2018) Random spatial network models with core-periphery structure. In: Proc. ACM International Conf. on Web Search and Data Mining (WSDM). ACM, New York.

  11. Kim, PM, Korbel JO, Gerstein MB (2007) Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci 104(51):20274–20279.

  12. Klimt, B, Yang Y (2004) The Enron corpus: A new dataset for email classification research. In: Proc. European Conf. on Machine Learning, 217–226. Springer, Berlin.

  13. Lemmens, B, Nussbaum RD (2012) Nonlinear Perron-Frobenius Theory. Cambridge University Press, Cambridge.

  14. Mondragón, RJ (2016) Network partition via a bound of the spectral radius. J Complex Netw 5(4):513–526.

  15. Newman, ME (2016) Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys Rev E 94(5):052315.

  16. Newman, MEJ (2011) Networks: an Introduction. Oxford University Press, Oxford.

  17. O’Connor, L, Médard M, Feizi S (2015) Maximum likelihood latent space embedding of logistic random dot product graphs. arXiv preprint. arXiv:1510.00850.

  18. Rombach, MP, Porter MA, Fowler JH, Mucha PJ (2014) Core-periphery structure in networks. SIAM J Appl Math 74:167–190.

  19. Rombach, MP, Porter MA, Fowler JH, Mucha PJ (2017) Core-periphery structure in networks (revisited). SIAM Rev 59:619–646.

  20. Sandhu, KS, Li G, Poh HM, Quek YLK, Sia YY, Peh SQ, Mulawadi FH, Lim J, Sikic M, Menghi F, et al. (2012) Large-scale functional organization of long-range chromatin interaction networks. Cell Rep 2(5):1207–1219.

  21. Schneider, CM, Moreira AA, Andrade JS, Havlin S, Herrmann HJ (2011) Mitigation of malicious attacks on networks. Proc Natl Acad Sci 108(10):3838–3841.

  22. Tomasello, MV, Napoletano M, Garas A, Schweitzer F (2017) The rise and fall of R&D networks. Ind Corp Chang 26(4):617–646.

  23. Tudisco, F, Higham DJ (2019) A nonlinear spectral method for core-periphery detection in networks. SIAM J Math Data Sci 1(2):269–292. https://epubs.siam.org/toc/sjmdaq/1/2.

  24. Zhou, S, Mondragón RJ (2004) The rich-club phenomenon in the internet topology. IEEE Commun Lett 8(3):180–182.

Acknowledgments

Not applicable.

Funding

The work of F.T. was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie individual fellowship “MAGNET” No 744014. The work of D.J.H. is supported by the EPSRC/RCUK Established Career Fellowship EP/M00158X/1 and by the EPSRC Programme Grant EP/P020720/1.

Author information

Authors’ contributions

Both authors contributed to the problem formulation, algorithm design and analysis, and they co-wrote the manuscript. FT wrote the code for the computational experiments. Both authors read and approved the final manuscript.

Authors’ information

FT is a Research Assistant in the School of Mathematics at the University of Edinburgh (UK). DJH is Professor of Numerical Analysis in the School of Mathematics at the University of Edinburgh (UK).

Correspondence to Francesco Tudisco.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Tudisco, F., Higham, D.J. A fast and robust kernel optimization method for core–periphery detection in directed and weighted graphs. Appl Netw Sci 4, 75 (2019) doi:10.1007/s41109-019-0173-9

Keywords

  • Network
  • Nonlinear Perron–Frobenius
  • Power method
  • Relaxation