
Statistical properties of the MetaCore network of protein–protein interactions

Abstract

The MetaCore commercial database describes interactions of proteins and other chemical molecules and clusters in the form of a directed network between these elements, viewed as nodes. The number of nodes exceeds 40 thousand, with almost 300 thousand links between them. The links have an essentially bi-functional nature, describing either activation or inhibition actions between proteins. We present here an analysis of the statistical properties of this complex network, applying the methods of the Google matrix, PageRank and CheiRank algorithms broadly used for the World Wide Web, Wikipedia, the world trade and other directed networks. We specifically describe the Ising PageRank approach, which makes it possible to treat the bi-functional type of protein–protein interactions. We also show that the reduced Google matrix algorithm yields an effective network of interactions inside a specific group of selected proteins. In addition to already known direct protein–protein interactions, this method makes it possible to infer nontrivial and unknown interactions between proteins arising from the summation over all the indirect pathways passing via the global bi-functional network. The analysis also establishes whether the average action of each protein is more oriented toward activation or inhibition. We argue that the described Google matrix analysis represents an efficient tool for investigating the influence of specific groups of proteins related to specific diseases.

Introduction

The MetaCore database (MetaCore) provides a large network of Protein–Protein Interactions (PPIs). It has been shown to be useful for the analysis of specific biological problems (see e.g. Ekins et al. 2006; Bessarabova et al. 2012) and finds various medical applications, e.g. the improvement of cancer prognosis (Winter et al. 2012). At present, the network has \(N=40{,}079\) nodes with \(N_\ell = 292{,}904\) links, i.e. an average of \(2N_\ell /N \approx 14.6\) links per node when both the ingoing and the outgoing links of each node are counted. The nodes are mainly proteins, but there are also other molecules and molecular clusters catalyzing the interactions with proteins. This PPI network is directed and non-weighted. Its interesting feature is the bi-functional nature of the links, leading to either the activation or the inhibition of one protein by another one. In some cases, the link action is neutral or unknown.

In the present work, we describe the statistical properties and the Google matrix analysis (GMA) of the MetaCore network. The GMA and the related PageRank algorithm have been at the foundation of the Google search engine, with important applications to the World Wide Web (WWW) analysis (Brin and Page 1998; Langville and Meyer 2006). A variety of GMA applications to directed networks are presented in Ermann et al. (2015) and, more specifically, the PageRank algorithm has been applied to various genomics networks (see e.g. Voevodski et al. 2009; Lee et al. 2011; Winter et al. 2012; Du et al. 2012; Li et al. 2021; Shan et al. 2021; Yang et al. 2021; Moroney et al. 2020; Stassen et al. 2021; Zhang et al. 2021). The first application of the GMA to a PPI network with directed causal links was reported for the SIGNOR PPI network (Perfetto et al. 2016) in Lages et al. (2018). However, the SIGNOR network is smaller than the MetaCore one by a factor of ten, and thus the GMA of the SIGNOR network can be considered only as a test bed for more detailed studies of PPIs.

An important feature of the PPI networks is the bi-functional character of the directed links representing activation or inhibition actions. Usually, the directed networks have been considered without functionality of links (see e.g. Brin and Page 1998; Langville and Meyer 2006; Ermann et al. 2015). The Ising-Google matrix analysis (IGMA) (Frahm and Shepelyansky 2019a) extends the GMA to bi-functional links. A test application to the SIGNOR PPI network (Perfetto et al. 2016) can be found in Frahm and Shepelyansky (2020). The IGMA represents each node by two states \(\uparrow\) and \(\downarrow\), like Ising spins up and down. A link is then represented by a \(2 \times 2\)-matrix describing the actions of activation or inhibition (Frahm and Shepelyansky 2019a, 2020). In contrast with the case of links without functionality, this description leads to a doubling of the number of nodes, \(N_I=2N\). In the present work, we apply the IGMA to the MetaCore network, which provides bi-functional interactions between multiple proteins.

In addition, we use the reduced Google matrix analysis (RGMA), developed in Frahm and Shepelyansky (2016), Frahm et al. (2016), to describe the effective interactions between a subset of \(N_{\mathrm{r}}\ll N\) selected nodes, taking into account all the indirect pathways connecting each pair of these \(N_{\mathrm{r}}\) nodes throughout the global PPI network. The efficiency of the RGMA has been demonstrated for a large variety of directed networks including Wikipedia and the world trade network (see e.g. El Zant et al. 2018; Coquidé et al. 2019). The RGMA adapted to the IGMA for bi-functional links is called hereafter the RIGMA.

The paper is organized as follows: the data sets and the methods are described in the “Data sets and methods” section, the results are presented in the “Results” section, and the discussion and the conclusion are given in the “Discussion” section.

Data sets and methods

Google matrix construction of the MetaCore network

As a first step, we construct the Google matrix G of the MetaCore network neglecting the bi-functional character of the links and considering unweighted links. Considering the adjacency matrix A, the elements \(A_{ij}\) of which are equal to 1 if node j points to node i and equal to 0 otherwise, the stochastic matrix S of the node-to-node Markov transitions is obtained by normalizing to unity each column of the adjacency matrix A. For dangling nodes, i.e., nodes without outgoing links, the corresponding column is filled with elements with value 1/N. The stochastic matrix S describes a Markov chain process on the network: a random surfer hops from node to node in accordance with the network structure and hops anywhere on the network if it reaches a dangling node. The elements of the Google matrix G then take the standard form

$$\begin{aligned} G_{ij} = \alpha S_{ij} + (1-\alpha ) / N \end{aligned}$$
(1)

where \(0.5\le \alpha <1\) is the damping factor. The random surfer obeying the stochastic process encoded in G explores, with probability \(\alpha\), the network in accordance with the stochastic matrix S and hops, with the complementary probability \(\left( 1-\alpha \right)\), to any node of the network. The damping factor allows the random surfer to escape from possible isolated communities. Here, we use the standard value \(\alpha =0.85\) (Langville and Meyer 2006; Ermann et al. 2015). The PageRank vector P is the right eigenvector of the Google matrix G corresponding to the leading eigenvalue, here \(\lambda =1\). The corresponding eigenproblem equation is then \(G P = P\). According to the Perron-Frobenius theorem, the PageRank vector P has positive elements. The PageRank vector element P(j) gives the probability to find the random surfer on the node j once the Markov process has reached the stationary regime. Consequently, all the nodes can be ranked by decreasing PageRank probability. We define the PageRank index K(j) giving the rank of the node j. The node j with the highest (lowest) PageRank probability P(j) corresponds to \(K(j)=1\) (\(K(j)=N\)). A recursive definition can be given: according to the PageRank algorithm, a node is all the more central as it is pointed to by other central nodes. The PageRank algorithm thus measures the influence of a node within the global network.
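The construction of Eq. (1) and the ranking by PageRank probability can be sketched as follows. This is a minimal dense-matrix illustration on a hypothetical 4-node toy network, not the code used for the MetaCore data; the function names `google_matrix`, `pagerank` and `rank_index` are ours, and the power iteration is one possible way of solving \(GP=P\).

```python
import numpy as np

def google_matrix(A, alpha=0.85):
    """G = alpha * S + (1 - alpha) / N, with A[i, j] = 1 if node j points to node i."""
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    out_degree = A.sum(axis=0)
    S = np.full((N, N), 1.0 / N)              # dangling columns keep the value 1/N
    linked = out_degree > 0
    S[:, linked] = A[:, linked] / out_degree[linked]   # unit column sums
    return alpha * S + (1.0 - alpha) / N

def pagerank(G, tol=1e-13, max_iter=10_000):
    """Right eigenvector of G for eigenvalue 1, obtained by power iteration."""
    P = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        P_next = G @ P
        P_next /= P_next.sum()
        if np.abs(P_next - P).sum() < tol:
            break
        P = P_next
    return P_next

def rank_index(P):
    """K[j] = 1 for the most probable node, ..., N for the least probable one."""
    order = np.argsort(-P, kind="stable")
    K = np.empty(len(P), dtype=int)
    K[order] = np.arange(1, len(P) + 1)
    return K

# Hypothetical toy network; node 3 has no outgoing link (dangling node).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0]])
G = google_matrix(A)
P = pagerank(G)
K = rank_index(P)
```

For a network of the MetaCore size (\(N=40{,}079\)), a dense \(N\times N\) array in double precision would take roughly 13 GB, so a sparse representation of S would presumably be preferable in practice.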

It is also useful to consider a network obtained by the inversion of all the directions of the links. For this inverted network, the corresponding Google matrix is denoted \(G^*\) and the corresponding PageRank vector is called the CheiRank vector \(P^*\); it is defined such that \(G^* P^* = P^*\). The importance and the detailed statistical analysis of the CheiRank vector have been reported in Chepelianskii (2010) and Zhirov et al. (2010) (see also Ermann et al. 2015; Coquidé et al. 2019). We also define a CheiRank index \(K^*(j)\) giving the rank of the node j according to its CheiRank probability \(P^*(j)\). According to the CheiRank algorithm, a node is all the more central as it points toward central nodes. The CheiRank algorithm thus measures the diffusivity of a node within the global network.
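Since inverting all link directions amounts to transposing the adjacency matrix, the CheiRank vector can be obtained with the same routine as the PageRank vector. The sketch below illustrates this on a hypothetical 5-node network in which node 0 only emits links and node 4 only receives them; the helper name `stationary_vector` and the fixed number of iterations are our choices.

```python
import numpy as np

def stationary_vector(A, alpha=0.85, n_iter=500):
    """PageRank of the adjacency matrix A (A[i, j] = 1 if j -> i) by power iteration."""
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    out = A.sum(axis=0)
    dangling = out == 0
    P = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        # probability carried along existing links (dangling nodes carry none)
        flow = A @ np.where(dangling, 0.0, P / np.where(dangling, 1.0, out))
        # dangling nodes redistribute their probability uniformly
        P = alpha * (flow + P[dangling].sum() / N) + (1.0 - alpha) / N
    return P

# Hypothetical toy network: 0 -> {1, 2, 3} and {1, 2, 3} -> 4.
A = np.zeros((5, 5))
A[[1, 2, 3], 0] = 1.0
A[4, [1, 2, 3]] = 1.0

P = stationary_vector(A)         # PageRank: probes ingoing links
P_star = stationary_vector(A.T)  # CheiRank: PageRank of the inverted network
```

In this toy example the PageRank is maximal for the node that is pointed to by the others (node 4), while the CheiRank is maximal for the node that points toward the others (node 0).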

Reduced Google matrix

The concept of the reduced Google matrix analysis was introduced in Frahm and Shepelyansky (2016) and applied in detail to Wikipedia networks in Frahm et al. (2016). The RGMA determines effective interactions between a selected subset of \(N_{\mathrm{r}}\) nodes embedded in a global network of size \(N \gg N_{\mathrm{r}}\). These effective interactions are determined taking into account that there are many indirect links between the \(N_{\mathrm{r}}\) nodes via all the other \(N_{\mathrm{s}}=N-N_{\mathrm{r}}\) nodes of the network. As an example, we may have two nodes A and C which belong to the selected subset of \(N_{\mathrm{r}}\) nodes and which are not coupled by any direct link. However, there may exist a chain of links from A to \(B_1\), then from \(B_1\) to \(B_2\),..., and then from \(B_m\) to C, where \(B_1,\dots ,B_m\) are nodes not belonging to the subset of \(N_{\mathrm{r}}\) nodes. Although A and C are not directly connected, there is a chain of \(m+1\) directed links indirectly connecting A and C. The RGMA makes it possible to infer an effective weighted link between any pair of nodes of the \(N_{\mathrm{r}}\) subset of interest, taking into account both the possible direct link between these two nodes and all the possible chains of links connecting them throughout the remaining global network of size \(N_{\mathrm{s}}=N-N_{\mathrm{r}}\gg N_{\mathrm{r}}\). It is important to stress that network analysis is rather often done taking into account only the direct links between the \(N_{\mathrm{r}}\) nodes and, as a consequence, completely omitting their indirect interactions via the global network. It is known that such a simplified approach produces erroneous results, as happened for the network of historical figures extracted from Wikipedia when only direct links between historical figures were taken into account and all other links were omitted (Aragon et al. 2012) (see the discussion in Eom et al. 2015).

It is convenient to write the Google matrix G associated to the global network as

$$\begin{aligned} G=\left( \begin{array}{cc} {G_{\mathrm{rr}}}&{} {G_{\mathrm{rs}}}\\ {G_{\mathrm{sr}}}&{} {G_{\mathrm{ss}}}\end{array}\right) \end{aligned}$$
(2)

where the label “\({\mathrm{r}}\)” refers to the nodes of the reduced network, i.e., the subset of \(N_{\mathrm{r}}\) nodes, and “\(\mathrm s\)” to the other \(N_{\mathrm{s}}=N-N_{\mathrm{r}}\) nodes which form the complementary network acting as an effective “scattering network”. The reduced Google matrix \({G_{\mathrm{R}}}\) associated to the subset of the \(N_{\mathrm{r}}\) nodes is an \(N_{\mathrm{r}}\times N_{\mathrm{r}}\) matrix defined by

$$\begin{aligned} G_{\mathrm{R}}P_{\mathrm{r}}=P_{\mathrm{r}}\end{aligned}$$
(3)

where \(P_{\mathrm{r}}\) is a vector of size \(N_{\mathrm{r}}\), the components of which are the normalized PageRank probabilities of the \(N_{\mathrm{r}}\) nodes of interest, \(P_{\mathrm{r}}(j)=P(j)/\sum _{i=1}^{N_{\mathrm{r}}}P(i)\). The RGMA consists in finding an effective Google matrix for the subset of \(N_{\mathrm{r}}\) nodes keeping the relative ranking between these nodes. To ensure the relation (3), the reduced Google matrix \({G_{\mathrm{R}}}\) has the form (Frahm and Shepelyansky 2016; Frahm et al. 2016)

$$\begin{aligned} {G_{\mathrm{R}}}={G_{\mathrm{rr}}}+{G_{\mathrm{rs}}}(\mathbf{1} -{G_{\mathrm{ss}}})^{-1} {G_{\mathrm{sr}}}. \end{aligned}$$
(4)

As shown in Frahm and Shepelyansky (2016), Frahm et al. (2016), the reduced Google matrix \({G_{\mathrm{R}}}\) can be represented as the sum of three components

$$\begin{aligned} {G_{\mathrm{R}}}={G_{\mathrm{rr}}}+ {G_{\mathrm{pr}}}+ {G_{\mathrm{qr}}}. \end{aligned}$$
(5)

Here, the first component, \({G_{\mathrm{rr}}}\), corresponds to the direct transitions between the \(N_{\mathrm{r}}\) nodes; the second component, \({G_{\mathrm{pr}}}\), is a matrix of rank 1 with all the columns being approximately equal to the reduced PageRank vector \(P_{\mathrm{r}}\); the third component, \({G_{\mathrm{qr}}}\), describes all the indirect pathways passing through the global network. Thus, the component \({G_{\mathrm{qr}}}\) represents the most nontrivial information related to indirect hidden transitions. We also define the \({G_{\mathrm{qrnd}}}\) matrix, which is the \({G_{\mathrm{qr}}}\) matrix deprived of its diagonal elements. The contribution of each component is characterized by its weight: \({W_{\mathrm{R}}}\), \({W_{\mathrm{pr}}}\), \({W_{\mathrm{rr}}}\), \({W_{\mathrm{qr}}}\) (\({W_{\mathrm{qrnd}}}\)) respectively for \({G_{\mathrm{R}}}\), \({G_{\mathrm{pr}}}\), \({G_{\mathrm{rr}}}\), \({G_{\mathrm{qr}}}\) (\({G_{\mathrm{qrnd}}}\)). The weight of a matrix is given by the sum of all the matrix elements divided by its size, here \(N_{\mathrm{r}}\) (by definition \({W_{\mathrm{R}}}=1\)). Examples of reduced Google matrices associated to various directed networks are given in Lages et al. (2018), Frahm et al. (2016), Coquidé et al. (2019) and Frahm and Shepelyansky (2020).
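As an illustration of Eqs. (2)–(4), the following sketch evaluates \({G_{\mathrm{R}}}\) for a small random network and checks numerically that it keeps unit column sums and reproduces the normalized PageRank of the selected nodes. It is a dense toy computation under our own assumptions (random 40-node network, arbitrary subset, helper names `google_matrix`, `reduced_google_matrix`, `weight`); the splitting of the indirect part into \({G_{\mathrm{pr}}}\) and \({G_{\mathrm{qr}}}\), detailed in Frahm and Shepelyansky (2016), is not reproduced here.

```python
import numpy as np

def google_matrix(A, alpha=0.85):
    """G = alpha * S + (1 - alpha) / N with A[i, j] = 1 if j -> i."""
    N = A.shape[0]
    out = A.sum(axis=0)
    S = np.full((N, N), 1.0 / N)
    S[:, out > 0] = A[:, out > 0] / out[out > 0]
    return alpha * S + (1.0 - alpha) / N

def reduced_google_matrix(G, r_idx):
    """Eq. (4): G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr."""
    r = np.asarray(r_idx)
    s = np.setdiff1d(np.arange(G.shape[0]), r)
    G_rr, G_rs = G[np.ix_(r, r)], G[np.ix_(r, s)]
    G_sr, G_ss = G[np.ix_(s, r)], G[np.ix_(s, s)]
    # solve (1 - G_ss) X = G_sr rather than forming the inverse explicitly
    X = np.linalg.solve(np.eye(len(s)) - G_ss, G_sr)
    return G_rr + G_rs @ X, G_rr

def weight(M):
    """Sum of all matrix elements divided by the matrix size N_r."""
    return M.sum() / M.shape[0]

rng = np.random.default_rng(0)
A = (rng.random((40, 40)) < 0.1).astype(float)   # hypothetical random network
np.fill_diagonal(A, 0.0)
G = google_matrix(A)

P = np.full(40, 1.0 / 40)
for _ in range(1000):                            # PageRank of the global network
    P = G @ P
    P /= P.sum()

r_idx = [0, 5, 7]                                # arbitrary subset of N_r = 3 nodes
G_R, G_rr = reduced_google_matrix(G, r_idx)
P_r = P[r_idx] / P[r_idx].sum()
```

Here `G_R @ P_r` agrees with `P_r` to numerical precision, and `weight(G_R)` equals 1 while `weight(G_rr)` gives the share of the direct transitions.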

Bi-functional Ising MetaCore network

To take into account the bi-functional nature (activation and inhibition) of MetaCore links, we use the approach proposed in Frahm and Shepelyansky (2019a, 2020), with the construction of a larger network where each node is split into two new nodes with labels \((+)\) and \((-)\). These two nodes can be viewed as two Ising-spin components associated to the activation and the inhibition of the corresponding protein. To construct the doubled “Ising” network of proteins, each element of the initial adjacency matrix is replaced by one of the following \(2 \times 2\) matrices

$$\begin{aligned} \sigma _+=\left( \begin{array}{cc} 1 &{} 1 \\ 0 &{} 0 \\ \end{array}\right) ,\quad \sigma _-=\left( \begin{array}{cc} 0 &{} 0 \\ 1 &{} 1 \\ \end{array}\right) ,\quad \sigma _0=\frac{1}{2}\,\left( \begin{array}{cc} 1 &{} 1 \\ 1 &{} 1 \\ \end{array}\right) \end{aligned}$$
(6)

where \(\sigma _+\) applies to “activation” links, \(\sigma _-\) to “inhibition” links, and \(\sigma _0\) when the nature of the interaction is “unknown” or “neutral”. For the rare cases of multiple interactions between two proteins, we use the sum of the corresponding \(\sigma\)-matrices which increases the weight of the adjacency matrix elements. Once the “Ising” adjacency matrix is obtained, the corresponding Google matrix is constructed in the usual way (see “Google matrix construction of the MetaCore network” section). The initial simple MetaCore network has \(N=40{,}079\) nodes and \(N_{\ell }=292{,}904\) links; the ratio of the number of activation/inhibition links is \(N_{\ell +}/N_{\ell -} = 65379/49384\simeq 1.3\) and the number of neutral links is \(N_{\ell n}= N_{\ell } -N_{\ell +} - N_{\ell -} = 178141\). The doubled Ising MetaCore network corresponds to \(N_I=80158\) nodes and \(N_{I,\ell }=942090\) links (according to the non-zero entries of the used \(\sigma\)-matrices).
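The quoted number of links of the doubled network can be recovered from the link counts given above: each activation or inhibition link contributes the two non-zero entries of \(\sigma _\pm\), and each neutral link the four non-zero entries of \(\sigma _0\). Assuming that every link is counted separately, this gives

$$\begin{aligned} N_{I,\ell }=2\,(N_{\ell +}+N_{\ell -})+4\,N_{\ell n}=2\times 114763+4\times 178141=229526+712564=942090 . \end{aligned}$$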

Now, the PageRank vector associated to this doubled Ising network has two components \(P_+(j)\) and \(P_-(j)\) for every node j of the simple network. Due to the particular structure of the \(\sigma\)-matrices (6), one can show analytically the exact identity, \(P(j)=P_+(j)+P_-(j)\), where P(j) is the PageRank of the initial single PPI network. We have numerically verified that the identity \(P(j)=P_+(j)+P_-(j)\) holds up to the numerical precision \(\sim 10^{-13}\).

As in Frahm and Shepelyansky (2019a), we characterize each node by its PageRank “magnetization” given by

$$\begin{aligned} M(j)=\frac{P_+(j)-P_-(j)}{P_+(j)+P_-(j)} . \end{aligned}$$
(7)

By definition, we have \(-1 \le M(j) \le 1\). Nodes with positive M are mainly activated nodes and those with negative M are mainly inhibited nodes.
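The doubling procedure, the identity \(P(j)=P_+(j)+P_-(j)\) and the magnetization (7) can be checked on a small example. The sketch below uses a hypothetical 5-node network with a handful of labeled links; the index convention (components \((+)\) and \((-)\) of node j stored at positions 2j and 2j+1) and all function names are our assumptions.

```python
import numpy as np

SIGMA = {
    "+": np.array([[1.0, 1.0], [0.0, 0.0]]),   # activation
    "-": np.array([[0.0, 0.0], [1.0, 1.0]]),   # inhibition
    "0": 0.5 * np.ones((2, 2)),                # neutral or unknown
}

def ising_adjacency(N, links):
    """links: (source j, target i, kind) triples; returns simple and doubled adjacency."""
    A = np.zeros((N, N))
    A_I = np.zeros((2 * N, 2 * N))
    for j, i, kind in links:
        A[i, j] += 1.0
        A_I[2 * i:2 * i + 2, 2 * j:2 * j + 2] += SIGMA[kind]
    return A, A_I

def pagerank(A, alpha=0.85, n_iter=500):
    """PageRank of a (possibly weighted) adjacency matrix by power iteration."""
    N = A.shape[0]
    out = A.sum(axis=0)
    S = np.full((N, N), 1.0 / N)               # dangling columns
    S[:, out > 0] = A[:, out > 0] / out[out > 0]
    G = alpha * S + (1.0 - alpha) / N
    P = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        P = G @ P
        P /= P.sum()
    return P

# Hypothetical toy links (source, target, kind); node 4 is dangling.
links = [(0, 1, "+"), (1, 2, "-"), (2, 0, "0"), (0, 2, "+"), (3, 1, "-"), (2, 4, "+")]
A, A_I = ising_adjacency(5, links)

P = pagerank(A)
P_I = pagerank(A_I)
P_plus, P_minus = P_I[0::2], P_I[1::2]
M = (P_plus - P_minus) / (P_plus + P_minus)    # magnetization, Eq. (7)
```

The identity holds because every column of \(\sigma _+\), \(\sigma _-\) and \(\sigma _0\) sums to one, so that summing the two components of each node in the doubled Markov process gives back the Markov process of the simple network.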

Sensitivity

The reduced Google matrix \({G_{\mathrm{R}}}\) of bi-functional (or Ising) MetaCore network describes effective interactions between \(N_{\mathrm{r}}\) nodes taking into account the activation or inhibition nature of the interactions.

Following El Zant et al. (2018), it is useful to determine the sensitivity of the PageRank probabilities with respect to a small variation of the matrix elements of \({G_{\mathrm{R}}}\). The PageRank sensitivity of the node j with respect to a small variation of the \(b\rightarrow a\) link measures the infinitesimal variation of the PageRank probability \(P_{\mathrm{r}}(j)\) induced by an infinitesimal increase of the \({G_{\mathrm{R}}}_{ab}\) stochastic matrix transition. In other words, the PageRank sensitivity gives the variation trend of the centrality of a given node if the weight of a given link is enhanced. Its definition is

$$\begin{aligned} D_{(b\rightarrow a)}(j)=\frac{1}{P_{\mathrm{r}}(j)} \left. \frac{d{P_{\mathrm{r}}}_\varepsilon (j)}{d\varepsilon }\right| _{\varepsilon =0} =\lim _{\varepsilon \rightarrow 0}\frac{1}{\varepsilon P_{\mathrm{r}}(j)}[{P_{\mathrm{r}}}_\varepsilon (j)-P_{\mathrm{r}}(j)] \end{aligned}$$
(8)

where \({P_{\mathrm{r}}}_\varepsilon (j)\) is the PageRank vector computed from a perturbed matrix \({{G_{\mathrm{R}}}}_\varepsilon\) the elements of which are defined by \({{G_{\mathrm{R}}}}_\varepsilon (a,b)={G_{\mathrm{R}}}(a,b)(1+\varepsilon )/[1+\varepsilon {G_{\mathrm{R}}}(a,b)]\) for the element (ab), \({{G_{\mathrm{R}}}}_\varepsilon (c,b)={G_{\mathrm{R}}}(c,b)/[1+\varepsilon {G_{\mathrm{R}}}(a,b)]\) for the other elements (cb) in the same column b, and \({{G_{\mathrm{R}}}}_\varepsilon (c,d)={G_{\mathrm{R}}}(c,d)\) for the elements (cd) in the other columns. The factor \(1/[1+\varepsilon {G_{\mathrm{R}}}(a,b)]\) ensures the correct sum normalization of the modified column b.

We use here an efficient algorithm described in Frahm and Shepelyansky (2019b) to evaluate the derivative in (8) exactly without usage of finite differences.

As proposed in El Zant et al. (2018), we define the symmetric matrix (see Eq. 15 of El Zant et al. 2018)

$$\begin{aligned} D_{(a\leftrightarrow b)}(j)=D_{(b\rightarrow a)}(j)+D_{(a\rightarrow b)}(j), \end{aligned}$$
(9)

which measures the node-j PageRank sensitivity to a weight increase of any direct link between nodes a and b, and furthermore the two symmetric and anti-symmetric sensitivity matrices

$$\begin{aligned} F_+(a,b)=D_{(a\leftrightarrow b)}(a)+D_{(a\leftrightarrow b)}(b),\quad F_-(a,b)=D_{(a\leftrightarrow b)}(a)-D_{(a\leftrightarrow b)}(b). \end{aligned}$$
(10)

The quantity \(F_+\left( a,b\right)\) measures the total PageRank sensitivity of the nodes a and b when the weights of the direct links between them are enhanced, and the quantity \(F_-\left( a,b\right)\) makes it possible to determine which of the centralities of the two nodes is the most impacted.
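The definitions (8)–(10) can be illustrated with a simple finite-difference estimate. Note that this is a stand-in for, not a reproduction of, the exact algorithm of Frahm and Shepelyansky (2019b): the step `eps`, the random \(5\times 5\) stochastic matrix playing the role of \({G_{\mathrm{R}}}\), and the function names are all assumptions made for the example.

```python
import numpy as np

def stationary(G, n_iter=2000):
    """Normalized eigenvector of the column-stochastic matrix G for eigenvalue 1."""
    P = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(n_iter):
        P = G @ P
        P /= P.sum()
    return P

def perturb(G_R, a, b, eps):
    """Enhance the b -> a transition and renormalize column b, as described in the text."""
    G_eps = G_R.copy()
    g_ab = G_R[a, b]
    G_eps[:, b] = G_R[:, b] / (1.0 + eps * g_ab)
    G_eps[a, b] = g_ab * (1.0 + eps) / (1.0 + eps * g_ab)
    return G_eps

def sensitivity(G_R, a, b, eps=1e-6):
    """Finite-difference estimate of D_(b->a)(j), Eq. (8), for all nodes j."""
    P0 = stationary(G_R)
    P1 = stationary(perturb(G_R, a, b, eps))
    return (P1 - P0) / (eps * P0)

def f_plus_minus(G_R, a, b, eps=1e-6):
    """Eqs. (9) and (10): symmetrized sensitivity, then F_+ and F_-."""
    D_sym = sensitivity(G_R, a, b, eps) + sensitivity(G_R, b, a, eps)
    return D_sym[a] + D_sym[b], D_sym[a] - D_sym[b]

rng = np.random.default_rng(2)
G_R = rng.random((5, 5))
G_R /= G_R.sum(axis=0)                 # hypothetical stand-in for a reduced Google matrix

P_r = stationary(G_R)
D = sensitivity(G_R, a=0, b=1)
F_plus, F_minus = f_plus_minus(G_R, a=0, b=1)
```

Since the perturbed vector stays normalized, the sensitivities satisfy \(\sum _j P_{\mathrm{r}}(j)\,D_{(b\rightarrow a)}(j)=0\), which offers a quick consistency check of any implementation.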

Results

Below, we describe various statistical properties of the MetaCore network obtained by the methods described above. More detailed data are available at [30].

CheiRank and PageRank of the MetaCore network

Let us sort the PageRank probabilities from the highest value, to which we associate the rank \(K=1\), to the smallest value, to which we associate the rank \(K=N\).

Fig. 1

PageRank probability P(K) (\(P_I(K)\)) and CheiRank probability \(P^*(K^*)\) (\(P_I^*(K)\)) are shown as a function of the corresponding rank indexes K and \(K^*\) for the simple (Ising) MetaCore network. For comparison, the dashed gray line corresponds to the power decay \(P \propto K^{-2/3}\)

The dependence P(K) of the PageRank probabilities on the PageRank index K and the dependence \(P^*(K^*)\) of the CheiRank probabilities on the CheiRank index \(K^*\) are shown in Fig. 1 for the simple MetaCore network and the Ising (doubled) MetaCore network. The decay of the probabilities is approximately proportional to an inverse power of the index with exponent \(\beta \approx 2/3\), i.e., \(P(K) \propto 1/K^{2/3}\). This exponent \(\beta\) is approximately the same for the PageRank and the CheiRank probabilities, and for both network types. The situation is different from the networks of WWW, Wikipedia, and Linux, for which one usually has \(\beta \approx 0.9\) for the PageRank probabilities and \(\beta \approx 0.6\) for the CheiRank probabilities (Langville and Meyer 2006; Ermann et al. 2015; Chepelianskii 2010). The similarity between the PageRank and the CheiRank probability distributions suggests that in PPI networks, at least in the MetaCore PPI network, both ingoing and outgoing links are of equal importance while, in the other networks cited above, ingoing links are more robust and stable than outgoing links, which have a more random character.
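An exponent such as \(\beta\) can be estimated, for instance, by a least-squares fit of \(\ln P\) versus \(\ln K\). The text does not state how the value \(\beta \approx 2/3\) was obtained, so the fitting procedure below, the function name `decay_exponent` and the optional fit window are only one possible choice; the check uses synthetic data with an exact power-law decay.

```python
import numpy as np

def decay_exponent(P, k_min=1, k_max=None):
    """Least-squares estimate of beta in P(K) ~ K^(-beta) over k_min <= K <= k_max."""
    P_sorted = np.sort(np.asarray(P, dtype=float))[::-1]   # decreasing probabilities
    K = np.arange(1, len(P_sorted) + 1)
    k_max = len(P_sorted) if k_max is None else k_max
    sel = (K >= k_min) & (K <= k_max) & (P_sorted > 0)
    slope, _ = np.polyfit(np.log(K[sel]), np.log(P_sorted[sel]), 1)
    return -slope

# Synthetic probabilities with an exact K^(-2/3) decay (not MetaCore data).
K = np.arange(1, 10001)
P = K ** (-2.0 / 3.0)
P /= P.sum()
beta = decay_exponent(P)
```

On real rank distributions the fitted value usually depends on the chosen window, since the head and the tail of the distribution deviate from a pure power law.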

Let us focus on the biological elements at the top of the PageRank and CheiRank. In principle, these biological elements should be at the very end of signaling pathway cascades or the main triggers of such cascades, respectively. The top 40 PageRank and CheiRank nodes of the MetaCore network are given in Tables 1 and 2 respectively. The top 3 PageRank positions are occupied by specific molecules actively participating in various reactions with proteins. The top 3 CheiRank positions are occupied by the transcription factor c-Myc, the generic enzyme eIF2C2 (Argonaute-2), and the generic binding protein IGF2BP3. In a certain sense, we can say that top PageRank nodes are like workers in a company, who receive many orders, while top CheiRank nodes are like company administrators who submit many orders to their workers (such a situation was discussed for a company management network (Abel and Shepelyansky 2011)).

Table 1 Top 40 PageRank nodes of the simple MetaCore network
Table 2 Top 40 CheiRank nodes of the simple MetaCore network
Fig. 2

Density of nodes of the MetaCore network on the PageRank-CheiRank \((K,K^*)\)-plane. The numbers of the color bar are a linear function of the logarithm of the density (with maximum values corresponding to 1 (red); minimum non-zero and zero values of the density corresponding to 0 (blue); the distribution is computed for \(100 \times 100\) cells equidistant in logarithmic scale). The white stars indicate the positions of the 12 selected nodes presented in Table 1. The white vertical and horizontal lines represent nodes with \(K\le 40\) and \(K^*\le 40\), respectively

The density distribution of nodes of the MetaCore network on the PageRank-CheiRank (\(K,K^*\))-plane is shown in Fig. 2. Compared to the case of Wikipedia networks (Ermann et al. 2015; Zhirov et al. 2010), the distribution is globally more symmetric with respect to the diagonal \(K=K^*\). This reflects the fact that the decay of the PageRank and the CheiRank probabilities in Fig. 1 is approximately the same. However, the top nodes are rather different for the PageRank and CheiRank rankings, which is also visible from Tables 1 and 2. As an example, the top 40 PageRank and the top 40 CheiRank lists share only 7 nodes (Beta-catenin, p53, ESR1, STAT3, Androgen receptor, c-Myc, RelA), which are transcription factors with the exception of Beta-catenin, which is a generic binding protein. As a consequence, depending on the considered biological process, these biological elements trigger the multiple cascades of interactions or are at the very end of these cascades. In contrast, there are biological elements with low K and high \(K^*\) and vice versa. For example, the phosphate compound PO\(_4^{3-}\), with PageRank-CheiRank indexes (\(K=16\), \(K^*=14888\)), is mainly a residue of biological processes, and the passage of the potassium ion K\(^+\) from the cytosol to the extracellular region, with PageRank-CheiRank indexes (\(K=19\), \(K^*=26346\)), can be considered as the final step of some biological process.
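A density of the kind shown in Fig. 2 can be computed along the following lines, with \(100\times 100\) cells equidistant in logarithmic scale and a color value that is a linear function of the logarithm of the density. The sketch makes several assumptions not stated in the caption: the density is taken as the raw number of nodes per cell, the function name `rank_plane_density` is ours, and the ranks are random permutations rather than MetaCore data.

```python
import numpy as np

def rank_plane_density(K, K_star, N, n_cells=100):
    """Node counts on a log-spaced (K, K*) grid and their rescaled logarithm in [0, 1]."""
    edges = np.logspace(0.0, np.log10(N), n_cells + 1)
    edges[0], edges[-1] = 1.0, float(N)        # guard against rounding at the ends
    counts, _, _ = np.histogram2d(K, K_star, bins=[edges, edges])
    color = np.zeros_like(counts)              # empty cells stay at 0
    filled = counts > 0
    log_c = np.log(counts[filled])
    span = log_c.max() - log_c.min()
    if span > 0:
        # minimum non-zero density -> 0, maximum density -> 1
        color[filled] = (log_c - log_c.min()) / span
    return counts, color

# Hypothetical ranks: two independent random permutations of 1..N.
rng = np.random.default_rng(1)
N = 2000
K = rng.permutation(N) + 1
K_star = rng.permutation(N) + 1
counts, color = rank_plane_density(K, K_star, N)
```

The array `color` can then be passed to any plotting routine; cells near the origin are much narrower than one rank unit and therefore often remain empty.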

Among the top 40 PageRank nodes, we select a subset of 12 nodes which are more directly related to proteins. These 12 nodes are represented by white stars in Fig. 2. The list of these nodes is given in Table 1. Below, we present the RIGMA analysis of these 12 nodes, taking into account the bi-functionality of the links (activation or inhibition).

Magnetization of nodes of the Ising MetaCore network

Fig. 3

PageRank magnetization \(M(j)=(P_+(j)-P_-(j))/(P_+(j)+P_-(j))\) for the Ising MetaCore network. Here, j is the node index and K(j) is the PageRank index of the node j in the simple Metacore network (without node doubling). The biological class is reported for the top 40 PageRank nodes (\(K\le 40\), see Table 1): Inorganic ions (II), Generic binding protein (GBP), Transcription factor (TF), Protein kinase (PK), RNA, Receptor with enzyme activity (RwEA), DNA, Compound (C), RAS superfamily (RAS), Reaction (R), Generic receptor (GR), Protein phosphatase (PP), Generic enzyme (GE), Lipid phosphatase (LP)

From the PageRank probabilities of the Ising MetaCore network, we determine the magnetization M(K) of each node given by (7). The dependence of the magnetization M(K) on the PageRank index K is shown in Fig. 3. For \(K \le 10\), only a few nodes have a significant positive magnetization. In the range \(10 < K \le 10^3\), some nodes have almost the maximal positive or negative values of the magnetization, with M being close to 1 or \(-1\). Such nodes are mainly activated or mainly inhibited, respectively. For the range \(K > 10^3\), we see an envelope restricting the maximal and the minimal values of M. At present, we have no analytical description of this envelope. We suppose that nodes with high K values have a majority of outgoing links, which fluctuate more in this range, thus giving a decrease of the maximal/minimal values of M.

Focusing on the top 40 PageRank nodes in Fig. 3, we mainly observe that the nodes are either non-magnetized, \(M\approx 0\), or positively magnetized, \(M\gtrsim 0\). These two situations correspond to biological elements which are equally activated/inhibited (\(M\approx 0\)) and mainly activated (\(M\gtrsim 0\)), respectively. Among the top 40 PageRank nodes, the non-magnetized elements are mainly inorganic ions, such as H\(^+\), Na\(^+\), K\(^+\), Ca\(^{2+}\), and Cl\(^-\), which are involved in many elementary interactions. As non-magnetized nodes, we also observe very important biological molecules such as DNA and the ADP compound, which should occupy a central place in the protein interaction network. Among positively magnetized nodes, we observe reactions (\(M\gtrsim 0.75\)), RNA (\(M\simeq 0.7\)), protein kinase (\(M\simeq 0.25-0.4\)) and phosphatase (\(M\simeq 0.55\)), which respectively are known to turn on and turn off proteins. Let us remark that, like DNA, RNA occupies a very central role in the protein interaction network (\(K=6\)) but has a relatively high magnetization \(M\simeq 0.7\), which indicates that RNA is mainly activated at the end of major biological processes. The other positively magnetized nodes correspond to some transcription factors, such as PPAR-gamma and STAT3, generic binding proteins, such as PI3K and GRB2, members of the RAS superfamily, such as Rac1, and generic proteins, such as ITGB1. We nevertheless note that among the top 40 PageRank nodes, there are some mainly inhibited proteins (\(M\simeq -0.2\)) such as the generic binding protein Bcl-2, the generic enzyme MDM2, and the lipid phosphatase PTEN.

We return to the magnetization properties of the selected subset of 12 nodes and the top 40 PageRank nodes in the next section.

RIGMA analysis of the Ising MetaCore network

Fig. 4

Reduced Google matrix \({G_{\mathrm{R}}}\) and its three matrix components \({G_{\mathrm{pr}}}\), \({G_{\mathrm{rr}}}\) and \({G_{\mathrm{qr}}}\) associated to the subset of nodes presented in Table 1 and belonging to the Ising MetaCore network. The weights of the matrix components are \({W_{\mathrm{pr}}}=0.952\), \({W_{\mathrm{rr}}}=0.015\), and \({W_{\mathrm{qr}}}=0.033\). Each colored cell corresponds to a \({G_{\mathrm{X}}}_{ij}\) element with \(\mathrm X\) standing for \(\mathrm R\), \(\mathrm{rr}\), \(\mathrm{pr}\), or \(\mathrm{qr}\). A \({G_{\mathrm{X}}}_{ij}\) matrix element is associated to the \(j\rightarrow i\) link, where the j index corresponds to proteins read on the bottom or the top axis of the panels and the i index corresponds to proteins read on the left axis of the panels. The \(+\) and − signs correspond to the activated and inhibited state of the node, respectively. The values of the color bar correspond to the ratio of the matrix element over its maximum value. Note that the elements of \({G_{\mathrm{qr}}}\) may be negative. There are only a few very small negative values (between \(-2.6\times 10^{-5}\) and \(-5.3\times 10^{-5}\)), which are not distinguishable from zero and have the same blue color code. Therefore, the color bar is only shown for positive values

We illustrate the RIGMA analysis of the Ising MetaCore network by applying it to the subset of the 12 nodes given in Table 1. They are selected from the top 40 PageRank list of Table 1 by excluding simple molecules and keeping the best ranked proteins according to PageRank probabilities. Each of the 12 nodes of the subset is doubled into a \((+)\) component and a \((-)\) component. We order these 24 nodes by ascending PageRank index K, alternating the \((+)\) and the \((-)\) components. This ordering is used to represent, in Fig. 4, the reduced Ising Google matrix \({G_{\mathrm{R}}}\) and its three matrix components \({G_{\mathrm{rr}}}\), \({G_{\mathrm{pr}}}\), and \({G_{\mathrm{qr}}}\). The weights of these components are respectively \({W_{\mathrm{rr}}}=0.015\), \({W_{\mathrm{pr}}}=0.952\), and \({W_{\mathrm{qr}}}=0.033\). As in the case of Wikipedia networks (Frahm et al. 2016), the component \({G_{\mathrm{pr}}}\) has the highest weight but, as discussed, it is rather close to a matrix with identical columns, each one similar to the PageRank column vector. Thus, the \({G_{\mathrm{pr}}}\) matrix component does not provide more information than the standard PageRank/GMA analysis. We also see that the weight \({W_{\mathrm{qr}}}\) of the indirect links generated by long indirect pathways passing through the global Ising MetaCore network is approximately twice the weight \({W_{\mathrm{rr}}}\) of the direct links. Consequently, the contribution of indirect links is very important.
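As a consistency check, the three quoted weights add up to the weight of the full reduced Google matrix, and their ratio quantifies the relative importance of indirect and direct links:

$$\begin{aligned} {W_{\mathrm{rr}}}+{W_{\mathrm{pr}}}+{W_{\mathrm{qr}}}=0.015+0.952+0.033=1.000={W_{\mathrm{R}}},\qquad {W_{\mathrm{qr}}}/{W_{\mathrm{rr}}}=0.033/0.015=2.2 . \end{aligned}$$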

In the \({G_{\mathrm{rr}}}\) matrix component, each element i of the jth column corresponds to the direct action of the protein j on the protein i. The action is either an activation (+) or an inhibition (–). As a consequence, the \({G_{\mathrm{rr}}}\) matrix component simply mimics the adjacency matrix of the Ising MetaCore network (the elements of \({G_{\mathrm{rr}}}\) with a value equal to (greater than) \(\left( 1-\alpha \right) /(2N)\approx 0\) correspond to values 0 (1 or 1/2) in the adjacency matrix of the Ising MetaCore network). It is interesting to compare the \({G_{\mathrm{qr}}}\) matrix elements with those of the \({G_{\mathrm{rr}}}\) matrix. Each one of the \({G_{\mathrm{qr}}}\) matrix elements either modifies, and generally enhances, the weight of an existing link, for which a non-zero matrix element exists in the \({G_{\mathrm{rr}}}\) matrix, or, interestingly, quantifies the strength of a hidden effective interaction between two proteins. As an example of the enhancement of an existing direct link, we observe, in Fig. 4, that the known activation of FAK1 by ITGB1 is enhanced by indirect links, i.e., by pathways passing through elements outside the set of the twelve chosen proteins. We also clearly observe an enhancement of the self-activation of FAK1 and the appearance of its indirect self-inhibition.

Fig. 5

Sum of the two matrix components \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}^{(\text{nd-block})}\). The matrix components are the same as in Fig. 4 with the exception of \({G_{\mathrm{qr}}}^{(\text{nd-block})}\) which is obtained from \({G_{\mathrm{qr}}}\) by excluding \(2 \times 2\) diagonal blocks, each one of these blocks corresponding to a protein self-loop. The right panel is the same as the left panel with the exception of the white cells which hide the direct links \(j\rightarrow i\) between the \(12\times 2\) chosen nodes in the Ising MetaCore network. The values of the color bar correspond to the ratio of the matrix element over its maximum value

Let us focus on the possible hidden interactions between the chosen set of twelve proteins. For that purpose, we show, in Fig. 5 (left panel), the matrix sum \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}^{\mathrm{(nd-block)}}\), which summarizes the information concerning both the direct and the hidden interactions between the twelve proteins. Here, \({G_{\mathrm{qr}}}^{\mathrm{(nd-block)}}\) is the \({G_{\mathrm{qr}}}\) matrix from which the \(2\times 2\) diagonal blocks (self-interaction terms) have been removed. In Fig. 5 (right panel), the \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}^{\mathrm{(nd-block)}}\) matrix elements associated with direct links have been masked to highlight only the hidden interactions. For instance, although the ARX protein (aristaless related homeobox) is not directly connected to the other eleven proteins, i.e., there is no direct action of the ARX protein on the other eleven proteins and vice versa, it indirectly strongly inhibits the tumor suppressor protein p53. Secondarily, the ARX protein indirectly acts on several other proteins, as indicated by the blue shades in the ARX column in Fig. 5: the ARX protein indirectly activates the EGFR and ESR1 proteins (epidermal growth factor receptor and estrogen receptor, respectively) and inhibits the c-Myc protein (proto-oncogene protein). Similarly, according to the \({G_{\mathrm{rr}}}\) matrix component (see Fig. 4), the c-Myc protein does not act directly on the chosen twelve proteins. However, the blue shades of the c-Myc column in the right panel of Fig. 5 indicate which proteins it indirectly contributes to activate or deactivate. Among the strong weights of the \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}^{\mathrm{(nd-block)}}\) matrix sum, we also observe that the SHP-2 phosphatase protein indirectly interacts strongly with the ARX protein and the estrogen receptor protein ESR2. In return, the ESR2 protein, which directly inhibits the ESR1 and c-Myc proteins, also indirectly activates the SHP-2 protein.
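The two operations behind Fig. 5, removing the \(2\times 2\) diagonal blocks and masking the direct links, can be sketched as follows. This is only an illustration under stated assumptions: the matrices are toy \(6\times 6\) arrays for three hypothetical proteins, the function names are invented, and a direct link is detected as a \({G_{\mathrm{rr}}}\) element strictly above the floor value \((1-\alpha )/2N\), with toy values of \(\alpha\) and N.

```python
import numpy as np

def remove_diagonal_blocks(M, block=2):
    """Zero the block x block diagonal blocks (self-loops of each protein)."""
    out = M.copy()
    for start in range(0, M.shape[0], block):
        out[start:start + block, start:start + block] = 0.0
    return out

def hidden_links(G_rr, G_qr, direct_mask):
    """G_rr + G_qr^(nd-block), with the cells of direct links masked as NaN."""
    total = G_rr + remove_diagonal_blocks(G_qr)
    return np.where(direct_mask, np.nan, total)

# Toy values (assumptions): 3 proteins, i.e. a 6x6 doubled matrix.
alpha, N = 0.85, 1000
floor = (1 - alpha) / (2 * N)
G_rr = np.full((6, 6), floor)
G_rr[2, 0] = G_rr[2, 1] = 0.4       # protein 0 directly activates protein 1
G_qr = np.full((6, 6), 0.01)        # uniform indirect contribution
direct = G_rr > floor               # True where a direct link exists
masked = hidden_links(G_rr, G_qr, direct)
```

The NaN cells play the role of the white cells in the right panel; every remaining finite entry outside the diagonal blocks is a candidate hidden interaction.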

Fig. 6

Reduced network of the chosen twelve proteins (see Table 1). The construction procedure of this network is given in the main text. Arrow edges (\(\rightarrow\)) represent activation links and dot edges represent inhibition links. The arrow tips and the dots are on the side of the target nodes. The black edges represent direct links. The red edges represent hidden links. The color of the nodes corresponds to the type of proteins: transcription factors (yellow), protein kinase (cyan), generic receptor (red), receptor with enzyme activity (green), general binding protein (orange), and protein phosphatase (violet). The border style of the nodes corresponds to the location of the proteins: nucleus (solid line), cytoplasm (dashed line), and plasma membrane (hairy line)

In contrast to the adjacency matrix and the Google matrix, the matrix sum \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}\) makes it possible to discriminate between the directed links outgoing from a given protein by assigning different weights to them. This discrimination is possible because the RGMA and the RIGMA take into account not only the direct links between the twelve chosen proteins but all the knowledge encoded in the MetaCore complex network. Moreover, possible hidden links between proteins that are not directly connected in the MetaCore network can be inferred from non-negligible weights in \({G_{\mathrm{qr}}}\). We propose to construct a reduced network highlighting the most important direct and hidden interactions between the twelve chosen proteins. To this end, for each source protein of the chosen subset, we retain, in the corresponding column of the \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}\) matrix, the two largest weights, which reveal the most important target proteins of the source protein. Here, we do not consider self-inhibition and self-activation matrix elements in the matrix sum \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}\). The constructed reduced network associated with the twelve chosen proteins is presented in Fig. 6. We observe that it captures the above-mentioned direct and hidden activation/inhibition actions between the considered proteins.
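The selection rule described above can be sketched in a few lines. The text does not specify how the \((+)\) and \((-)\) source columns of a protein are combined, so the sketch makes a labeled assumption: it adds the two source columns, discards the self-interaction block, and keeps the two largest remaining entries, reading the parity of the target row as activation or inhibition. The matrix values, the protein names A, B, C, and the function name are hypothetical.

```python
import numpy as np

def reduced_network(M, names, n_keep=2):
    """Keep, for each source protein, its n_keep strongest non-self entries."""
    edges = []
    for j, source in enumerate(names):
        # Assumption: add the (+) and (-) source columns of protein j.
        column = M[:, 2 * j] + M[:, 2 * j + 1]
        column[2 * j:2 * j + 2] = -np.inf  # ignore self-activation/self-inhibition
        for row in np.argsort(column)[::-1][:n_keep]:
            action = "activation" if row % 2 == 0 else "inhibition"
            edges.append((source, names[row // 2], action, float(column[row])))
    return edges

# Toy 6x6 matrix standing in for G_rr + G_qr (made-up weights).
names = ["A", "B", "C"]
M = np.zeros((6, 6))
M[2, 0] = M[2, 1] = 0.30   # A activates B
M[5, 0] = M[5, 1] = 0.10   # A inhibits C
M[0, 2] = M[0, 3] = 0.20   # B activates A
M[4, 2] = M[4, 3] = 0.05   # B activates C
M[3, 4] = M[3, 5] = 0.25   # C inhibits B
M[1, 4] = M[1, 5] = 0.02   # C inhibits A
edges = reduced_network(M, names)
```

Each returned tuple holds the source, the target, the action type, and the retained weight; whether an edge is drawn as direct or hidden would then follow from the presence of the corresponding link in \({G_{\mathrm{rr}}}\).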

Sensitivity of the chosen subset

Fig. 7

PageRank sensitivity matrices \(F_+(a,b)\) (left panel) and \(F_-(a,b)\) (right panel) associated to the subset of nodes presented in Table 1 and belonging to the Ising MetaCore network. The values of the color bar correspond to \(F_+(a,b)/\max _{a,b}\left( F_+(a,b)\right)\) (left panel) and to \(F_-(a,b)/\max _{a,b}\left| F_-(a,b)\right|\) (right panel)

The PageRank sensitivity of the chosen subset of 12 proteins is obtained from the RIGMA and presented in Fig. 7, following the definitions given by Eqs. (8) and (9). We recall that \(F_{+}(a,b)\) gives the symmetric PageRank sensitivity of the nodes a and b to a variation of the link weight between them (in both directions, from a to b and from b to a). The asymmetric PageRank sensitivity \(F_{-}(a,b)\) determines which node is more sensitive to such a weight variation. Thus, for \(F_{-}(a,b) > 0\) node a is more influenced by node b, and for \(F_{-}(a,b) < 0\) node b is more influenced by node a.
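The basic ingredient of such sensitivities is the response of the PageRank vector to a small change of one link weight. The sketch below estimates this response by a finite difference on a toy column-stochastic matrix. It assumes that the link element is multiplied by \((1+\varepsilon )\), that the column is renormalized, and that the sensitivity is the logarithmic derivative of the PageRank probabilities with respect to \(\varepsilon\). It does not reproduce Eqs. (8) and (9), which combine such derivatives into \(F_{\pm }\); the function names and the random matrix are illustrative.

```python
import numpy as np

def pagerank(G, tol=1e-13, max_iter=10_000):
    """Stationary vector of a column-stochastic matrix by power iteration."""
    p = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        q = G @ p
        q /= q.sum()
        if np.abs(q - p).sum() < tol:
            return q
        p = q
    return p

def log_sensitivity(G, source, target, eps=1e-4):
    """Finite-difference estimate of d ln P(k) / d eps for all nodes k."""
    G_eps = G.copy()
    G_eps[target, source] *= 1.0 + eps          # strengthen the link source -> target
    G_eps[:, source] /= G_eps[:, source].sum()  # keep the column normalized
    return (np.log(pagerank(G_eps)) - np.log(pagerank(G))) / eps

# Toy positive column-stochastic matrix standing in for a reduced Google matrix.
rng = np.random.default_rng(1)
G = rng.random((6, 6)) + 0.05
G /= G.sum(axis=0)
P = pagerank(G)
a, b = 0, 3
D_ab = log_sensitivity(G, a, b)   # response of every node to the link a -> b
D_ba = log_sensitivity(G, b, a)   # response of every node to the link b -> a
```

Comparing the entries `D_ab[b]` and `D_ba[a]` then indicates which of the two nodes reacts more strongly to a variation of the link between them.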

In Fig. 7, the symmetric PageRank sensitivity (left panel) shows that the activation or the inhibition of the p53 protein affects, or is affected by, all the other chosen proteins. Indeed, the p53 protein with \(K=4\) occupies a very central role in the protein interactions network as it contributes to the stability of the genome, preventing damaged biological information from being spread (Prives and Hall 1999; Joerger and Fersht 2016; Toufektchan and Toledo 2018). The reddish horizontal and vertical lines on the symmetric PageRank sensitivity panel (Fig. 7 left) indicate that the activation of the EGFR, STAT3, FAK1, SHP-2 and GRB2 proteins is affected by, or affects, all the other proteins of the chosen set. The right panel of Fig. 7 shows the asymmetric PageRank sensitivity. We clearly observe that it is in fact the p53 protein which influences the activation/inhibition of the other proteins, and more strongly the inhibition of the GRB2, SHP-2, ITGB1, and FAK1 proteins. In general, the inhibition of these four proteins is influenced by most of the other proteins (see greenish horizontal lines) and, in return, their respective activation also influences the other proteins (see greenish vertical lines).

Examples of magnetization of nodes

Fig. 8

PageRank magnetization \(M(K)=(P_+(K)-P_-(K))/(P_+(K)+P_-(K))\) presented in the PageRank-CheiRank \((K,K^*)\)-plane. Left panel: PageRank magnetization M(k) for the chosen twelve proteins presented in the relative indexes \((k,k^*)\)-plane (see k and \(k^*\) indexes in Table 1). Right panel: PageRank magnetization M(K) for nodes with \(K\le 40\). Here, \(P_{\pm }(K)\) is the PageRank probability of the \((\pm )\) component of the Ising MetaCore network node associated with the K PageRank (see text). The values of the color bar correspond to \(M/\max {|M|}\) with \(\max _{k\le 12}\left| M(k)\right| =0.549\) (left panel) and \(\max _{K\le 40}\left| M(K)\right| =0.963\) (right panel). On the right panel, the \(K^*\) index is the relative CheiRank index inside the set of the first \(K\le 40\) nodes

In Fig. 8, left panel, we show, in the PageRank-CheiRank \((k,k^*)\)-plane (see Tables 1 and 2 for the relative PageRank and CheiRank indexes k and \(k^*\)), the PageRank magnetization M of the chosen 12 proteins. These nodes have global PageRank indexes \(K\le 26\) (see Table 1). In agreement with the data presented in Fig. 3, for such K values, the magnetization is indeed mainly positive, so these proteins are primarily activated. More precisely, as they belong to the top PageRank of the proteome (\(K/N\le 0.6 \permil\)), these proteins are activated as the result of the most important cascades of interactions along the causality pathways. The magnetization M is also presented in Fig. 8, right panel, but for all nodes with \(K \le 40\) (see Table 1). Here, for these top PageRank indexes, we have both positive and negative magnetization values, but the majority of the nodes have a magnetization close to zero, as discussed for Fig. 3.
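As a small worked illustration of the definition \(M=(P_+-P_-)/(P_++P_-)\), the sketch below computes the magnetization of each node from an Ising PageRank vector. It assumes that the \((+)\) and \((-)\) components of a node are stored in consecutive entries; the probabilities are made up and the function name is illustrative.

```python
import numpy as np

def magnetization(P_ising):
    """M = (P_+ - P_-) / (P_+ + P_-) per node, for alternating (+, -) components."""
    P_plus, P_minus = P_ising[0::2], P_ising[1::2]
    return (P_plus - P_minus) / (P_plus + P_minus)

# Made-up PageRank probabilities for three nodes: (+, -), (+, -), (+, -).
P_ising = np.array([0.30, 0.10, 0.05, 0.15, 0.20, 0.20])
M = magnetization(P_ising)   # one value in [-1, 1] per node
```

A node whose \((+)\) component carries most of the probability has M close to 1 (mostly activated), a node dominated by its \((-)\) component has M close to \(-1\) (mostly inhibited), and equal components give \(M=0\).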

For the top 40 PageRank (\(K \le 40\)), the top 3 most activated nodes are the K\(^+\) Potassium ion in cytosol (\(K=19\), \(M(K)=0.962815\)), the CO\(_2\)+H\(_2\)O\(\rightarrow\)H\(^+\)+HCO\(_3^-\) reaction (\(K=29\), \(M(K)=0.757892\)), and the intracellular mRNA (\(K=6\), \(M(K)=0.708154\)), and the top 3 most inhibited nodes are the generic binding protein Bcl-2 (\(K=34\), \(M(K)=-0.220751\)), the generic enzyme MDM2 (\(K=36\), \(M(K)=-0.208082\)), and the generic binding protein E-cadherin (\(K=28\), \(M(K)=-0.114311\)).

Discussion

In this work, we have presented a detailed description of the statistical properties of the MetaCore network of protein–protein interactions, obtained with an extensive Google matrix analysis. In this way, we find the proteins and molecules which are at the top PageRank and CheiRank positions and thus play an important role in the influence flow through the whole network structure. With a simple example of a subset of selected proteins (subset of selected nodes), we show that the reduced Google matrix analysis makes it possible to determine the effective interactions between these proteins, taking into account all the indirect pathways between them through the global MetaCore network in addition to the direct interactions between the selected proteins. We stress that the reduced Ising Google matrix algorithm, based on an Ising spin description, takes into account the bi-functional nature of the protein–protein interactions (activation or inhibition) and determines the average action type (or magnetization) of each protein.

Here, we have presented mainly the statistical properties of the MetaCore network without entering into detailed analysis of related biological effects. We plan to address, in further studies, the biological effects obtained from the reduced Google matrix analysis of the MetaCore network.

Availability of data and materials

Data supporting the findings of this study are available at MetaCoreNet. Finer details, e.g. network data for Fig. 4, can be obtained from the authors upon request. However, the global PPI data used for the construction of the MetaCore network are the private property of Clarivate Analytics and are distributed under their rules.

Abbreviations

PPIs: Protein–Protein Interactions
GMA: Google matrix analysis
IGMA: Ising Google matrix analysis
RGMA: Reduced Google matrix analysis
RIGMA: Reduced Ising Google matrix analysis
WWW: World Wide Web


Acknowledgements

We thank Andrei Zinovyev (Institut Curie) for useful discussions.

Funding

This research has been partially supported through the grant NANOX N° ANR-17-EURE-0009 (project MTDINA) in the frame of the Programme Investissements d’Avenir, France. This research has been also supported by the Programme Investissements d’Avenir ANR-15-IDEX-0003, ISITE-BFC (GNETWORKS project) and the council of Bourgogne Franche-Comté region (APEX project and REpTILs project).

Author information

Contributions

The authors contributed equally to this work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to José Lages.

Ethics declarations

Competing interests

EK is a senior research scientist employed by Clarivate, Barcelona. KMF, JL, and DLS declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kotelnikova, E., Frahm, K.M., Lages, J. et al. Statistical properties of the MetaCore network of protein–protein interactions. Appl Netw Sci 7, 7 (2022). https://doi.org/10.1007/s41109-022-00444-4

Keywords

  • Complex networks
  • MetaCore
  • Google matrix
  • PageRank
  • Protein–protein interactions network