Statistical properties of the MetaCore network of protein–protein interactions
Applied Network Science volume 7, Article number: 7 (2022)
Abstract
The MetaCore commercial database describes interactions of proteins and other chemical molecules and clusters in the form of a directed network between these elements, viewed as nodes. The number of nodes exceeds 40 thousand, with almost 300 thousand links between them. The links have an essentially bifunctional nature, describing either activation or inhibition actions between proteins. We present here an analysis of the statistical properties of this complex network applying the methods of the Google matrix, PageRank and CheiRank algorithms broadly used for the World Wide Web, Wikipedia, the world trade and other directed networks. We specifically describe the Ising PageRank approach, which makes it possible to treat the bifunctional type of protein–protein interactions. We also show that the reduced Google matrix algorithm yields an effective network of interactions inside a specific group of selected proteins. In addition to already known direct protein–protein interactions, this method makes it possible to infer nontrivial and unknown interactions between proteins arising from the summation over all the indirect pathways passing via the global bifunctional network. The analysis also establishes whether the average action of each protein is more oriented to activation or to inhibition. We argue that the described Google matrix analysis represents an efficient tool for investigating the influence of specific groups of proteins related to specific diseases.
Introduction
The MetaCore database (MetaCore) provides a large network of Protein–Protein Interactions (PPIs). It has been shown to be useful for the analysis of specific biological problems (see e.g. Ekins et al. 2006; Bessarabova et al. 2012) and finds various medical applications, e.g. the improvement of cancer prognosis (Winter et al. 2012). At present, the network has \(N=40{,}079\) nodes with \(N_\ell = 292{,}904\) links and an average of \(2n_\ell = 2N_\ell /N \approx 14.6\) links per node. The nodes are mainly proteins, but there are also other molecules and molecular clusters catalyzing the interactions with proteins. This PPI network is directed and nonweighted. Its interesting feature is the bifunctional nature of the links, leading to either the activation or the inhibition of one protein by another one. In some cases, the link action is neutral or unknown.
In the present work, we describe the statistical properties and the Google matrix analysis (GMA) of the MetaCore network. The GMA and the related PageRank algorithm have been at the foundation of the Google search engine with important applications to the World Wide Web (WWW) analysis (Brin and Page 1998; Langville and Meyer 2006). A variety of GMA applications to directed networks are presented in Ermann et al. (2015) and, more specifically, the PageRank algorithm has been applied to various genomics networks (see e.g. Voevodski et al. 2009; Lee et al. 2011; Winter et al. 2012; Du et al. 2012; Li et al. 2021; Shan et al. 2021; Yang et al. 2021; Moroney et al. 2020; Stassen et al. 2021; Zhang et al. 2021). The first application of the GMA to a PPI network with directed causal links was reported for the SIGNOR PPI network (Perfetto et al. 2016) in Lages et al. (2018). However, the SIGNOR network is about ten times smaller than the MetaCore one, and thus the GMA of the SIGNOR network can be considered only as a test bed for more detailed studies of PPIs.
An important feature of PPI networks is the bifunctional character of the directed links, which represent activation or inhibition actions. Usually, directed networks have been considered without functionality of links (see e.g. Brin and Page 1998; Langville and Meyer 2006; Ermann et al. 2015). The Ising-Google matrix analysis (IGMA) (Frahm and Shepelyansky 2019a) extends the GMA to bifunctional links. A test application to the SIGNOR PPI network (Perfetto et al. 2016) can be found in Frahm and Shepelyansky (2020). The IGMA represents each node by two states \(\uparrow\) and \(\downarrow\), like Ising spins up and down. A link is then represented by a \(2 \times 2\) matrix describing the actions of activation or inhibition (Frahm and Shepelyansky 2019a, 2020). In contrast with the case of links without functionality, this description leads to a doubling of the number of nodes, \(N_I=2N\). In the present work, we apply the IGMA to the MetaCore network, which provides bifunctional interactions between multiple proteins.
In addition, we also use the reduced Google matrix analysis (RGMA), developed in Frahm and Shepelyansky (2016), Frahm et al. (2016), to describe the effective interactions between a subset of \(N_{\mathrm{r}}\ll N\) selected nodes taking account of all the indirect pathways connecting each pair of these \(N_{\mathrm{r}}\) nodes throughout the global PPI network. The efficiency of the RGMA has been demonstrated for a large variety of directed networks including Wikipedia and the world trade network (see e.g. El Zant et al. 2018; Coquidé et al. 2019). The RGMA adapted to the IGMA for bifunctional links is called hereafter the RIGMA.
The paper is organized as follows: the data sets and the methods are described in the “Data sets and methods” section, the results are presented in the “Results” section, and the discussion and the conclusion are given in the “Discussion” section.
Data sets and methods
Google matrix construction of the MetaCore network
As a first step, we construct the Google matrix G of the MetaCore network neglecting the bifunctional character of the links and considering unweighted links. Let A be the adjacency matrix, whose elements \(A_{ij}\) are equal to 1 if node j points to node i and to 0 otherwise. The stochastic matrix S of the node-to-node Markov transitions is obtained by normalizing to unity each column of the adjacency matrix A. For dangling nodes, i.e., nodes without outgoing links, the corresponding column is filled with elements of value 1/N. The stochastic matrix S describes a Markov chain process on the network: a random surfer hops from node to node in accordance with the network structure and hops anywhere on the network if it reaches a dangling node. The elements of the Google matrix G then take the standard form
\[G_{ij} = \alpha S_{ij} + \frac{1-\alpha }{N}, \tag{1}\]
where \(0.5\le \alpha <1\) is the damping factor. The random surfer obeying the stochastic process encoded in G explores, with probability \(\alpha\), the network in accordance with the stochastic matrix S and hops, with the complementary probability \(\left( 1-\alpha \right)\), to any node of the network. The damping factor allows the random surfer to escape from possible isolated communities. Here, we use the standard value \(\alpha =0.85\) (Langville and Meyer 2006; Ermann et al. 2015). The PageRank vector P is the right eigenvector of the Google matrix G corresponding to the leading eigenvalue, here \(\lambda =1\). The corresponding eigenproblem equation is then \(G P = P\). According to the Perron-Frobenius theorem, the PageRank vector P has positive elements. The PageRank vector element P(j) gives the probability to find the random surfer on the node j once the Markov process has reached the stationary regime. Consequently, all the nodes can be ranked by decreasing PageRank probability. We define the PageRank index K(j) giving the rank of the node j. The node j with the highest (lowest) PageRank probability P(j) corresponds to \(K(j)=1\) (\(K(j)=N\)). A recursive definition can be given: according to the PageRank algorithm, a node is all the more central the more it is pointed to by other central nodes. The PageRank algorithm thus measures the influence of a node within the global network.
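The construction just described can be illustrated on a toy network. The following is a minimal sketch, assuming a dense NumPy representation and a plain power iteration; the function names `google_matrix` and `pagerank` and the 4-node adjacency matrix are hypothetical and serve only as an illustration, not as the implementation used for the MetaCore data.

```python
import numpy as np

def google_matrix(A, alpha=0.85):
    """Build G = alpha*S + (1-alpha)/N from an adjacency matrix A,
    with the convention A[i, j] = 1 if node j points to node i."""
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    col_sums = A.sum(axis=0)
    dangling = col_sums == 0
    S = np.empty_like(A)
    # Normalize each non-dangling column to unity.
    S[:, ~dangling] = A[:, ~dangling] / col_sums[~dangling]
    # Dangling nodes: the column is filled with 1/N.
    S[:, dangling] = 1.0 / N
    return alpha * S + (1.0 - alpha) / N

def pagerank(G, tol=1e-12, max_iter=10_000):
    """Right eigenvector of G for the eigenvalue 1, by power iteration."""
    N = G.shape[0]
    P = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        P_next = G @ P
        P_next /= P_next.sum()
        if np.abs(P_next - P).sum() < tol:
            return P_next
        P = P_next
    return P

# Toy network (hypothetical): 0->1, 0->2, 1->2, 1->3, 2->0; node 3 is dangling.
A = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

G = google_matrix(A)
P = pagerank(G)

# PageRank index K(j): K = 1 for the highest probability, K = N for the lowest.
order = np.argsort(-P)
K = np.empty(len(P), dtype=int)
K[order] = np.arange(1, len(P) + 1)
```

For a network of the MetaCore size, a sparse matrix representation would be the natural choice; the dense version is kept here only for readability.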
It is also useful to consider the network obtained by inverting the directions of all the links. For this inverted network, the corresponding Google matrix is denoted \(G^*\), and the corresponding PageRank vector is called the CheiRank vector \(P^*\), defined such that \(G^* P^* = P^*\). The importance and the detailed statistical analysis of the CheiRank vector have been reported in Chepelianskii (2010) and Zhirov et al. (2010) (see also Ermann et al. 2015; Coquidé et al. 2019). We also define a CheiRank index \(K^*(j)\) giving the rank of the node j according to its CheiRank probability \(P^*(j)\). According to the CheiRank algorithm, a node is all the more central the more it points toward central nodes. The CheiRank algorithm thus measures the diffusivity of a node within the global network.
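As a small illustration of the link inversion, the sketch below computes the PageRank and the CheiRank of a hypothetical 5-node network by applying the same routine to the adjacency matrix and to its transpose. The helper names `stationary_vector` and `rank_index` and the toy links are assumptions made for this example.

```python
import numpy as np

def stationary_vector(A, alpha=0.85, n_iter=500):
    """PageRank of the network with adjacency A (A[i, j] = 1 if j -> i)."""
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    out = A.sum(axis=0)
    # Column-normalize; dangling columns (out == 0) are filled with 1/N.
    S = np.where(out > 0, A / np.where(out > 0, out, 1.0), 1.0 / N)
    G = alpha * S + (1.0 - alpha) / N
    P = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        P = G @ P
        P /= P.sum()
    return P

def rank_index(P):
    """Rank index: 1 for the largest probability, N for the smallest."""
    order = np.argsort(-P, kind="stable")
    K = np.empty(len(P), dtype=int)
    K[order] = np.arange(1, len(P) + 1)
    return K

# Toy network (hypothetical): node 0 points to 1, 2, 3, which all point to 4.
A = np.zeros((5, 5))
for src, dst in [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 4)]:
    A[dst, src] = 1.0

P = stationary_vector(A)          # PageRank
P_star = stationary_vector(A.T)   # CheiRank: same algorithm on inverted links
K, K_star = rank_index(P), rank_index(P_star)
```

In this toy case, node 4 only receives links and takes the top PageRank position, while node 0 only emits links and takes the top CheiRank position.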
Reduced Google matrix
The concept of the reduced Google matrix analysis was introduced in [21] and applied in detail to Wikipedia networks in Frahm et al. (2016). The RGMA determines effective interactions between a selected subset of \(N_{\mathrm{r}}\) nodes embedded in a global network of size \(N \gg N_{\mathrm{r}}\). These effective interactions are determined taking into account that there are many indirect links between the \(N_{\mathrm{r}}\) nodes via all the other \(N_{\mathrm{s}}=N-N_{\mathrm{r}}\) nodes of the network. As an example, we may have two nodes A and C which belong to the selected subset of \(N_{\mathrm{r}}\) nodes and which are not coupled by any direct link. However, there may exist a chain of links from A to \(B_1\), then from \(B_1\) to \(B_2\),..., and then from \(B_m\) to C, where \(B_1,\dots ,B_m\) are nodes not belonging to the subset of \(N_{\mathrm{r}}\) nodes. Although A and C are not directly connected, there is a chain of \(m+1\) directed links indirectly connecting A and C. The RGMA allows one to infer an effective weighted link between any pair of nodes of the \(N_{\mathrm{r}}\) subset of interest, taking account of the possible direct link existing between these two nodes and of all the possible chains of links connecting them throughout the remaining global network of size \(N_{\mathrm{s}}=N-N_{\mathrm{r}}\gg N_{\mathrm{r}}\). It is important to stress that network analysis is rather often done taking into account only the direct links between the \(N_{\mathrm{r}}\) nodes and, as a consequence, completely omitting their indirect interactions via the global network. It is known that such a simplified approach produces erroneous results, as happened for the network of historical figures extracted from Wikipedia when only direct links between historical figures were taken into account and all other links were omitted (Aragon et al. 2012) (see the discussion in Eom et al. 2015).
It is convenient to write the Google matrix G associated to the global network as
\[G = \begin{pmatrix} {G_{\mathrm{rr}}} & {G_{\mathrm{rs}}} \\ {G_{\mathrm{sr}}} & {G_{\mathrm{ss}}} \end{pmatrix}, \tag{2}\]
where the label “\({\mathrm{r}}\)” refers to the nodes of the reduced network, i.e., the subset of \(N_{\mathrm{r}}\) nodes, and “\(\mathrm s\)” to the other \(N_{\mathrm{s}}=N-N_{\mathrm{r}}\) nodes which form the complementary network acting as an effective “scattering network”. The reduced Google matrix \({G_{\mathrm{R}}}\) associated to the subset of the \(N_{\mathrm{r}}\) nodes is a \(N_{\mathrm{r}}\times N_{\mathrm{r}}\) matrix defined as
\[{G_{\mathrm{R}}}\, P_{\mathrm{r}} = P_{\mathrm{r}}, \tag{3}\]
where \(P_{\mathrm{r}}\) is a vector of size \(N_{\mathrm{r}}\) whose components are the normalized PageRank probabilities of the \(N_{\mathrm{r}}\) nodes of interest, \(P_{\mathrm{r}}(j)=P(j)/\sum _{i=1}^{N_{\mathrm{r}}}P(i)\). The RGMA consists in finding an effective Google matrix for the subset of \(N_{\mathrm{r}}\) nodes keeping the relative ranking between these nodes. To ensure the relation (3), the reduced Google matrix \({G_{\mathrm{R}}}\) has the form (Frahm and Shepelyansky 2016; Frahm et al. 2016)
\[{G_{\mathrm{R}}} = {G_{\mathrm{rr}}} + {G_{\mathrm{rs}}} \left( \mathbf{1} - {G_{\mathrm{ss}}}\right) ^{-1} {G_{\mathrm{sr}}}. \tag{4}\]
As shown in Frahm and Shepelyansky (2016), Frahm et al. (2016), the reduced Google matrix \({G_{\mathrm{R}}}\) can be represented as the sum of three components
\[{G_{\mathrm{R}}} = {G_{\mathrm{rr}}} + {G_{\mathrm{pr}}} + {G_{\mathrm{qr}}}. \tag{5}\]
Here, the first component, \({G_{\mathrm{rr}}}\), corresponds to the direct transitions between the \(N_{\mathrm{r}}\) nodes; the second component, \({G_{\mathrm{pr}}}\), is a matrix of rank 1 with all the columns being approximately equal to the reduced PageRank vector \(P_{\mathrm{r}}\); the third component, \({G_{\mathrm{qr}}}\), describes all the indirect pathways passing through the global network. Thus, the component \({G_{\mathrm{qr}}}\) represents the most nontrivial information related to indirect hidden transitions. We also define the \({G_{\mathrm{qrnd}}}\) matrix, which is the \({G_{\mathrm{qr}}}\) matrix deprived of its diagonal elements. The contribution of each component is characterized by its weight: \({W_{\mathrm{R}}}\), \({W_{\mathrm{pr}}}\), \({W_{\mathrm{rr}}}\), \({W_{\mathrm{qr}}}\) (\({W_{\mathrm{qrnd}}}\)) respectively for \({G_{\mathrm{R}}}\), \({G_{\mathrm{pr}}}\), \({G_{\mathrm{rr}}}\), \({G_{\mathrm{qr}}}\) (\({G_{\mathrm{qrnd}}}\)). The weight of a matrix is given by the sum of all the matrix elements divided by its size, here \(N_{\mathrm{r}}\) (by definition \({W_{\mathrm{R}}}=1\)). Examples of reduced Google matrices associated to various directed networks are given in Lages et al. (2018), Frahm et al. (2016), Coquidé et al. (2019) and Frahm and Shepelyansky (2020).
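The reduction can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses a hypothetical random network of 30 nodes, an arbitrary subset of 4 selected nodes, and the expression \({G_{\mathrm{R}}} = {G_{\mathrm{rr}}} + {G_{\mathrm{rs}}}(\mathbf{1}-{G_{\mathrm{ss}}})^{-1}{G_{\mathrm{sr}}}\) of the cited RGMA works, evaluated with a dense linear solve. It does not compute the separate \({G_{\mathrm{pr}}}\) and \({G_{\mathrm{qr}}}\) components.

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 30, 0.85

# Hypothetical random directed network, A[i, j] = 1 if j -> i.
A = (rng.random((N, N)) < 0.1).astype(float)
np.fill_diagonal(A, 0.0)
out = A.sum(axis=0)
S = np.where(out > 0, A / np.where(out > 0, out, 1.0), 1.0 / N)
G = alpha * S + (1.0 - alpha) / N

# Global PageRank by power iteration.
P = np.full(N, 1.0 / N)
for _ in range(1000):
    P = G @ P
    P /= P.sum()

r = np.array([0, 3, 7, 12])           # selected nodes (arbitrary choice)
s = np.setdiff1d(np.arange(N), r)     # complementary "scattering" nodes

G_rr, G_rs = G[np.ix_(r, r)], G[np.ix_(r, s)]
G_sr, G_ss = G[np.ix_(s, r)], G[np.ix_(s, s)]

# G_R = G_rr + G_rs (1 - G_ss)^{-1} G_sr, using a solve instead of an inverse.
G_R = G_rr + G_rs @ np.linalg.solve(np.eye(len(s)) - G_ss, G_sr)

P_r = P[r] / P[r].sum()               # normalized PageRank of the subset
W_R = G_R.sum() / len(r)              # weight of G_R (equal to 1)
W_rr = G_rr.sum() / len(r)            # weight of the direct transitions
```

The reduced matrix is again column-stochastic, and its PageRank vector reproduces the relative ranking of the selected nodes in the global network.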
Bifunctional Ising MetaCore network
To take into account the bifunctional nature (activation and inhibition) of MetaCore links, we use the approach proposed in Frahm and Shepelyansky (2019a), Frahm and Shepelyansky (2020) with the construction of a larger network where each node is split into two new nodes with labels \((+)\) and \((-)\). These two nodes can be viewed as two Ising-spin components associated to the activation and the inhibition of the corresponding protein. To construct the doubled “Ising” network of proteins, each element of the initial adjacency matrix is replaced by one of the following \(2 \times 2\) matrices
\[\sigma _+ = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad \sigma _- = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, \qquad \sigma _0 = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \tag{6}\]
where \(\sigma _+\) applies to “activation” links, \(\sigma _-\) to “inhibition” links, and \(\sigma _0\) when the nature of the interaction is “unknown” or “neutral”. For the rare cases of multiple interactions between two proteins, we use the sum of the corresponding \(\sigma\)-matrices, which increases the weight of the adjacency matrix elements. Once the “Ising” adjacency matrix is obtained, the corresponding Google matrix is constructed in the usual way (see “Google matrix construction of the MetaCore network” section). The initial simple MetaCore network has \(N=40{,}079\) nodes and \(N_{\ell }=292{,}904\) links; the ratio of the number of activation/inhibition links is \(N_{\ell +}/N_{\ell -} = 65379/49384\simeq 1.3\) and the number of neutral links is \(N_{\ell n}= N_{\ell } - N_{\ell +} - N_{\ell -} = 178141\). The doubled Ising MetaCore network corresponds to \(N_I=80158\) nodes and \(N_{I,\ell }=942090\) links (according to the nonzero entries of the used \(\sigma\)-matrices).
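The doubling procedure and the link count can be sketched as follows. This is an illustrative example only: it assumes the \(\sigma\)-matrices of the cited Ising-PageRank works in the convention where the column is the source and the row is the target, a hypothetical helper `ising_adjacency`, and a hypothetical 3-node list of links. The last lines redo the link arithmetic with the MetaCore numbers quoted above.

```python
import numpy as np

# Assumed sigma-matrices (rows: target (+)/(-), columns: source (+)/(-)).
SIGMA = {
    "+": np.array([[1.0, 1.0], [0.0, 0.0]]),   # activation
    "-": np.array([[0.0, 0.0], [1.0, 1.0]]),   # inhibition
    "0": np.array([[0.5, 0.5], [0.5, 0.5]]),   # neutral or unknown
}

def ising_adjacency(N, links):
    """Doubled 2N x 2N adjacency matrix from (source, target, kind) links.
    Node j is mapped to index 2j for (+) and 2j+1 for (-)."""
    A = np.zeros((2 * N, 2 * N))
    for src, dst, kind in links:
        # Multiple links between the same pair add up, as in the text.
        A[2 * dst:2 * dst + 2, 2 * src:2 * src + 2] += SIGMA[kind]
    return A

# Hypothetical toy network with one neutral link.
links = [(0, 1, "+"), (1, 2, "-"), (2, 0, "0"), (0, 2, "+")]
A_I = ising_adjacency(3, links)
n_links = int(np.count_nonzero(A_I))

# Link count for the numbers quoted in the text: activation and inhibition
# links give 2 nonzero entries each, neutral links give 4.
n_act, n_inh, n_neu = 65379, 49384, 178141
n_total = n_act + n_inh + n_neu
n_ising = 2 * n_act + 2 * n_inh + 4 * n_neu
```

With these counts, `n_total` reproduces the 292,904 links of the simple network and `n_ising` the 942,090 links of the doubled network.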
Now, the PageRank vector associated to this doubled Ising network has two components \(P_+(j)\) and \(P_-(j)\) for every node j of the simple network. Due to the particular structure of the \(\sigma\)-matrices (6), one can show analytically the exact identity \(P(j)=P_+(j)+P_-(j)\), where P(j) is the PageRank of the initial simple PPI network. We have numerically verified that the identity \(P(j)=P_+(j)+P_-(j)\) holds up to the numerical precision \(\sim 10^{-13}\).
As in Frahm and Shepelyansky (2019a), we characterize each node by its PageRank “magnetization” given by
\[M(j) = \frac{P_+(j)-P_-(j)}{P_+(j)+P_-(j)}. \tag{7}\]
By definition, we have \(-1 \le M(j) \le 1\). Nodes with positive M are mainly activated nodes and those with negative M are mainly inhibited nodes.
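A minimal numerical check of the identity \(P(j)=P_+(j)+P_-(j)\) and of the magnetization \(M(j)=[P_+(j)-P_-(j)]/[P_+(j)+P_-(j)]\) is sketched below. The 4-node network, the link types, and the helper `pagerank` are hypothetical, and the \(\sigma\)-matrices are assumed to be those of the cited Ising-PageRank works.

```python
import numpy as np

def pagerank(A, alpha=0.85, n_iter=1000):
    """PageRank for adjacency A (A[i, j] > 0 if j -> i), by power iteration."""
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    out = A.sum(axis=0)
    S = np.where(out > 0, A / np.where(out > 0, out, 1.0), 1.0 / N)
    G = alpha * S + (1.0 - alpha) / N
    P = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        P = G @ P
        P /= P.sum()
    return P

# Assumed sigma-matrices: activation, inhibition, neutral.
sigma = {"+": np.array([[1.0, 1.0], [0.0, 0.0]]),
         "-": np.array([[0.0, 0.0], [1.0, 1.0]]),
         "0": np.array([[0.5, 0.5], [0.5, 0.5]])}

# Hypothetical links (source, target, kind); node 2 is only inhibited.
links = [(0, 1, "+"), (0, 2, "-"), (1, 2, "-"),
         (2, 0, "+"), (3, 1, "+"), (1, 3, "0")]
N = 4
A = np.zeros((N, N))            # simple network
A_I = np.zeros((2 * N, 2 * N))  # doubled Ising network
for src, dst, kind in links:
    A[dst, src] += 1.0
    A_I[2 * dst:2 * dst + 2, 2 * src:2 * src + 2] += sigma[kind]

P = pagerank(A)
P_I = pagerank(A_I)
P_plus, P_minus = P_I[0::2], P_I[1::2]

# PageRank magnetization of each node.
M = (P_plus - P_minus) / (P_plus + P_minus)
```

In this toy example, node 2 receives only inhibition links and has a negative magnetization, while node 0 receives only an activation link and has a positive one.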
Sensitivity
The reduced Google matrix \({G_{\mathrm{R}}}\) of the bifunctional (or Ising) MetaCore network describes effective interactions between \(N_{\mathrm{r}}\) nodes taking into account the activation or inhibition nature of the interactions.
Following El Zant et al. (2018), it is useful to determine the sensitivity of the PageRank probabilities with respect to a small variation of the matrix elements of \({G_{\mathrm{R}}}\). The PageRank sensitivity of the node j with respect to a small variation of the \(b\rightarrow a\) link measures the infinitesimal variation of the PageRank probability \(P_{\mathrm{r}}(j)\) induced by an infinitesimal increase of the \({G_{\mathrm{R}}}_{ab}\) stochastic matrix transition. Otherwise stated, the PageRank sensitivity gives the variation trend of the centrality of a given node if the weight of a given link is enhanced. Its definition is
\[D_{(b\rightarrow a)}(j) = \frac{1}{P_{\mathrm{r}}(j)} \left. \frac{d{P_{\mathrm{r}}}_\varepsilon (j)}{d\varepsilon }\right| _{\varepsilon =0}, \tag{8}\]
where \({P_{\mathrm{r}}}_\varepsilon (j)\) is the PageRank vector computed from a perturbed matrix \({{G_{\mathrm{R}}}}_\varepsilon\) the elements of which are defined by \({{G_{\mathrm{R}}}}_\varepsilon (a,b)={G_{\mathrm{R}}}(a,b)(1+\varepsilon )/[1+\varepsilon {G_{\mathrm{R}}}(a,b)]\) for the element (a, b), \({{G_{\mathrm{R}}}}_\varepsilon (c,b)={G_{\mathrm{R}}}(c,b)/[1+\varepsilon {G_{\mathrm{R}}}(a,b)]\) for the other elements (c, b) in the same column b, and \({{G_{\mathrm{R}}}}_\varepsilon (c,d)={G_{\mathrm{R}}}(c,d)\) for the elements (c, d) in the other columns. The factor \(1/[1+\varepsilon {G_{\mathrm{R}}}(a,b)]\) ensures the correct sum normalization of the modified column b.
We use here an efficient algorithm described in Frahm and Shepelyansky (2019b) to evaluate the derivative in (8) exactly without usage of finite differences.
As proposed in El Zant et al. (2018), we define the symmetric matrix (see Eq. 15 of El Zant et al. 2018)
\[D_{(a\leftrightarrow b)}(j) = D_{(a\rightarrow b)}(j) + D_{(b\rightarrow a)}(j), \tag{9}\]
which measures the node-j PageRank sensitivity to a weight increase of any direct link between nodes a and b, and furthermore the two symmetric and antisymmetric sensitivity matrices
\[F_+(a,b) = D_{(a\leftrightarrow b)}(a) + D_{(a\leftrightarrow b)}(b), \qquad F_-(a,b) = D_{(a\leftrightarrow b)}(a) - D_{(a\leftrightarrow b)}(b).\]
The \(F_+\left( a,b\right)\) quantity measures the total PageRank sensitivity of the nodes a and b when the weights of the direct links between them are enhanced, and the quantity \(F_-\left( a,b\right)\) allows one to determine which of the centralities of the two nodes is the most impacted.
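The column perturbation described in this section can be illustrated with a short sketch. Note the assumptions: the exact derivative algorithm of Frahm and Shepelyansky (2019b) is replaced here by a plain central finite difference, the sensitivity is taken as the relative (logarithmic) derivative of \(P_{\mathrm{r}}(j)\), and the \(5\times 5\) column-stochastic matrix is random. The function names are hypothetical.

```python
import numpy as np

def pagerank_of(G, n_iter=2000):
    """Stationary vector of a column-stochastic matrix by power iteration."""
    P = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(n_iter):
        P = G @ P
        P /= P.sum()
    return P

def perturb(G_R, a, b, eps):
    """Enhance the element (a, b) by the factor (1 + eps) and renormalize
    column b by 1 / [1 + eps * G_R(a, b)]; other columns are unchanged."""
    G = G_R.copy()
    norm = 1.0 + eps * G_R[a, b]
    G[:, b] = G_R[:, b] / norm
    G[a, b] = G_R[a, b] * (1.0 + eps) / norm
    return G

def sensitivity(G_R, a, b, eps=1e-6):
    """Finite-difference estimate (an assumption of this sketch) of the
    relative derivative of P_r(j) with respect to eps at eps = 0."""
    P0 = pagerank_of(G_R)
    P_up = pagerank_of(perturb(G_R, a, b, eps))
    P_dn = pagerank_of(perturb(G_R, a, b, -eps))
    return (P_up - P_dn) / (2.0 * eps) / P0

# Hypothetical random column-stochastic matrix standing in for G_R.
rng = np.random.default_rng(1)
G_R = rng.random((5, 5)) + 0.05
G_R /= G_R.sum(axis=0)

P0 = pagerank_of(G_R)
D = sensitivity(G_R, a=1, b=2)   # sensitivity of all nodes j to the link 2 -> 1
```

Enhancing the link \(b\rightarrow a\) increases the PageRank probability of the target node a, so `D[1]` is positive here, while the probability-weighted sum of the sensitivities vanishes because the PageRank vector stays normalized.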
Results
Below, we describe various statistical properties of the MetaCore network obtained by the methods described above. More detailed data are available at [30].
CheiRank and PageRank of the MetaCore network
Let us sort the PageRank probabilities from the highest value, to which we associate the rank \(K=1\), to the smallest value, to which we associate the rank \(K=N\).
The dependence P(K) of the PageRank probabilities on the PageRank index K and the dependence \(P^*(K^*)\) of the CheiRank probabilities on the CheiRank index \(K^*\) are shown in Fig. 1 for the simple MetaCore network and the Ising (doubled) MetaCore network. The decay of the probabilities is approximately proportional to the inverse index to the power \(\beta \approx 2/3\), i.e., \(P(K) \propto 1/K^{2/3}\). This exponent \(\beta\) is approximately the same for the PageRank and the CheiRank probabilities, and for both network types. The situation is different from the networks of WWW, Wikipedia, and Linux, for which one usually has \(\beta \approx 0.9\) for the PageRank probabilities and \(\beta \approx 0.6\) for the CheiRank probabilities (Langville and Meyer 2006; Ermann et al. 2015; Chepelianskii 2010). The similarity between the PageRank and the CheiRank probability distributions suggests that in PPI networks, at least in the MetaCore PPI network, both ingoing and outgoing links are of equal importance while, in the other networks cited above, ingoing links are more robust and stable than outgoing links, which have a more random character.
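A decay exponent of this kind is commonly estimated by a linear fit in log-log scale. The sketch below shows one such estimate on synthetic data that follow \(P(K)\propto K^{-2/3}\) exactly; the function name `decay_exponent` and the fit range arguments are hypothetical, and the fitting procedure is an assumption of this example, not necessarily the one used for Fig. 1.

```python
import numpy as np

def decay_exponent(P, k_min=1, k_max=None):
    """Estimate beta in P(K) ~ 1/K^beta by a least-squares fit of
    log P versus log K over the rank range [k_min, k_max]."""
    P_sorted = np.sort(np.asarray(P, dtype=float))[::-1]   # K = 1 is the largest
    K = np.arange(1, len(P_sorted) + 1)
    if k_max is None:
        k_max = len(P_sorted)
    sel = (K >= k_min) & (K <= k_max)
    slope, _intercept = np.polyfit(np.log(K[sel]), np.log(P_sorted[sel]), 1)
    return -slope

# Synthetic probabilities with an exact power-law decay (not MetaCore data).
N = 40079
K = np.arange(1, N + 1, dtype=float)
P = K ** (-2.0 / 3.0)
P /= P.sum()

beta = decay_exponent(P)
```

On real rank distributions the fitted value depends on the chosen rank range, so the range should be reported together with the exponent.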
Let us focus on the biological elements at the top of the PageRank and CheiRank. In principle, these biological elements should be at the very end of signaling pathway cascades or be the main triggers of such cascades, respectively. The top 40 PageRank and CheiRank nodes of the MetaCore network are given in Tables 1 and 2 respectively. The top 3 PageRank positions are occupied by specific molecules actively participating in various reactions with proteins. The top 3 CheiRank positions are occupied by the transcription factor c-Myc, the generic enzyme eIF2C2 (Argonaute2), and the generic binding protein IGF2BP3. In a certain sense, we can say that top PageRank nodes are like workers in a company, who receive many orders, while top CheiRank nodes are like company administrators who submit many orders to their workers (such a situation was discussed for a company management network (Abel and Shepelyansky 2011)).
The density distribution of nodes of the MetaCore network on the PageRank-CheiRank \((K,K^*)\)-plane is shown in Fig. 2. Compared to the case of Wikipedia networks (Ermann et al. 2015; Zhirov et al. 2010), the distribution is globally more symmetric with respect to the diagonal \(K=K^*\). This reflects the fact that the decay of the PageRank and the CheiRank probabilities in Fig. 1 is approximately the same. However, the top nodes are rather different for the PageRank and CheiRank rankings, which is also visible from Tables 1 and 2. As an example, the top 40 PageRank and the top 40 CheiRank share only 7 nodes in common (Beta-catenin, p53, ESR1, STAT3, Androgen receptor, c-Myc, RelA), which are transcription factors with the exception of Beta-catenin, which is a generic binding protein. As a consequence, depending on the considered biological process, these biological elements trigger multiple cascades of interactions or are at the very end of these cascades. In contrast, there are biological elements with low K and high \(K^*\) and vice versa. For example, the phosphate compound PO\(_4^{3-}\), with PageRank-CheiRank indexes (\(K=16\), \(K^*=14888\)), is mainly a residue of biological processes, and the passage of the potassium ion K\(^+\) from the cytosol to the extracellular region, with PageRank-CheiRank indexes (\(K=19\), \(K^*=26346\)), can be considered as the final step of some biological process.
Among the top 40 PageRank nodes, we select a subset of 12 nodes which are more directly related to proteins. These 12 nodes are represented by white stars in Fig. 2. The list of these nodes is given in Table 1. Below, we present the RIGMA analysis of these 12 nodes taking into account the bifunctionality of the links (activation/inhibition).
Magnetization of nodes of the Ising MetaCore network
From the PageRank probabilities of the Ising MetaCore network, we determine the magnetization M(K) of each node given by (7). The dependence of the magnetization M(K) on the PageRank index K is shown in Fig. 3. For \(K \le 10\), only a few nodes have a significant positive magnetization. In the range \(10 < K \le 10^3\), some nodes have almost the maximal positive or negative values of the magnetization, with M being close to 1 or \(-1\). Such nodes are mainly activated or mainly inhibited, respectively. For the range \(K > 10^3\), we see an envelope restricting the maximal and the minimal values of M. At present, we have no analytical description of this envelope. We suppose that nodes with high K values have a majority of outgoing links which are more fluctuating in this range, thus giving a decrease of the maximal/minimal values of M.
Focusing on the top 40 PageRank in Fig. 3, we mainly observe that the nodes are either non-magnetized, \(M\approx 0\), or positively magnetized, \(M\gtrsim 0\). These two situations correspond to biological elements which are equally activated/inhibited (\(M\approx 0\)) and mainly activated (\(M\gtrsim 0\)), respectively. Among the top 40 PageRank nodes, the non-magnetized elements are mainly inorganic ions, such as H\(^+\), Na\(^+\), K\(^+\), Ca\(^{2+}\), and Cl\(^-\), which are involved in many elementary interactions. As non-magnetized nodes, we also observe very important biological molecules such as DNA and the ADP compound, which should occupy a central place in the protein interaction network. Among positively magnetized nodes, we observe reactions (\(M\gtrsim 0.75\)), RNA (\(M\simeq 0.7\)), protein kinase (\(M\simeq 0.25\)–\(0.4\)) and phosphatase (\(M\simeq 0.55\)), which respectively are known to turn on and turn off proteins. Let us remark that, like DNA, RNA occupies a very central role in the protein interaction network (\(K=6\)) but has a relatively high magnetization \(M\simeq 0.7\), which indicates that RNA is mainly activated at the end of major biological processes. The other positively magnetized nodes correspond to some transcription factors, such as PPAR-gamma and STAT3, generic binding proteins, such as PI3K and GRB2, members of the RAS superfamily, such as Rac1, and generic proteins, such as ITGB1. We nevertheless note that among the top 40 PageRank nodes, there are some mainly inhibited proteins (\(M\simeq -0.2\)) such as the generic binding protein Bcl-2, the generic enzyme MDM2, and the lipid phosphatase PTEN.
We return to the magnetization properties of the selected subset of 12 nodes and the top 40 PageRank nodes in the next section.
RIGMA analysis of the Ising MetaCore network
We illustrate the RIGMA analysis of the Ising MetaCore network by applying it to the subset of the 12 nodes given in Table 1. They are selected from the top 40 PageRank list of Table 1 by excluding simple molecules and keeping the best ranked proteins according to PageRank probabilities. Each of the 12 nodes of the subset is doubled into a \((+)\) component and a \((-)\) component. We order these 24 nodes by ascending PageRank index K, alternating the \((+)\) and the \((-)\) components. This ordering is used to represent, in Fig. 4, the reduced Ising Google matrix \({G_{\mathrm{R}}}\) and its three matrix components \({G_{\mathrm{rr}}}\), \({G_{\mathrm{pr}}}\), and \({G_{\mathrm{qr}}}\). The weights of these components are respectively \({W_{\mathrm{rr}}}=0.015\), \({W_{\mathrm{pr}}}=0.952\), and \({W_{\mathrm{qr}}}=0.033\). As in the case of Wikipedia networks (Frahm et al. 2016), the component \({G_{\mathrm{pr}}}\) has the highest weight but, as discussed, it is rather close to a matrix with identical columns, each one similar to the PageRank column vector. Thus, the \({G_{\mathrm{pr}}}\) matrix component does not provide more information than the standard PageRank/GMA analysis. We also see that the weight \({W_{\mathrm{qr}}}\) of the indirect links generated by long indirect pathways passing through the global Ising MetaCore network is approximately twice the weight \({W_{\mathrm{rr}}}\) of direct links. Consequently, the contribution of indirect links is very important.
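The quoted weights can be checked with a few lines of arithmetic. The sketch below recalls the weight definition (sum of all elements divided by the matrix size) through a hypothetical helper `matrix_weight`, applies it to a small made-up column-stochastic matrix, and verifies that the three quoted component weights sum to unity.

```python
import numpy as np

def matrix_weight(M):
    """Weight of a matrix: sum of all its elements divided by its size
    (the number of columns, N_r for the reduced matrices)."""
    M = np.asarray(M, dtype=float)
    return M.sum() / M.shape[1]

# Component weights quoted in the text for the 24 x 24 reduced Ising matrix.
W_rr, W_pr, W_qr = 0.015, 0.952, 0.033
W_total = W_rr + W_pr + W_qr      # weight of G_R, equal to 1 by definition
ratio = W_qr / W_rr               # indirect versus direct contribution

# Made-up 2 x 2 column-stochastic matrix: its weight is 1.
toy = np.array([[0.2, 0.5],
                [0.8, 0.5]])
W_toy = matrix_weight(toy)
```

The ratio \({W_{\mathrm{qr}}}/{W_{\mathrm{rr}}}\) evaluates to 2.2, in line with the statement that indirect pathways weigh about twice as much as direct links.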
In the \({G_{\mathrm{rr}}}\) matrix component, each element i of the jth column corresponds to the direct action of the protein j on the protein i. The action is either an activation (+) or an inhibition (–). As a consequence, the \({G_{\mathrm{rr}}}\) matrix component simply mimics the adjacency matrix of the Ising MetaCore network (the elements of \({G_{\mathrm{rr}}}\) with a value equal to (greater than) \(\left( 1-\alpha \right) /2N\approx 0\) correspond to values 0 (1 or 1/2) in the adjacency matrix of the Ising MetaCore network). It is interesting to compare the \({G_{\mathrm{qr}}}\) matrix elements with those of the \({G_{\mathrm{rr}}}\) matrix. Each one of the \({G_{\mathrm{qr}}}\) matrix elements either modifies, generally enhances, the weight of an existing link, for which a nonzero matrix element exists in the \({G_{\mathrm{rr}}}\) matrix, or, interestingly, quantifies the strength of a hidden effective interaction between two proteins. As an example of the enhancement of an existing direct link, we observe, in Fig. 4, that the known activation of FAK1 by ITGB1 is enhanced by indirect links, i.e., by pathways passing through elements outside the set of the twelve chosen proteins. We also clearly observe an enhancement of the self-activation of FAK1 and the appearance of its indirect self-inhibition.
Let us focus on the possible hidden interactions between the chosen set of twelve proteins. For that purpose, we show, in Fig. 5 (left panel), the matrix sum \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}^{\mathrm{(ndblock)}}\), which summarizes the information concerning both the direct and the hidden interactions between the set of twelve proteins. Here, we use the \({G_{\mathrm{qr}}}^{\mathrm{(ndblock)}}\) matrix, which is the \({G_{\mathrm{qr}}}\) matrix from which the diagonal elements (self-interaction terms) have been removed. In Fig. 5 (right panel), the \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}^{\mathrm{(ndblock)}}\) matrix elements associated to direct links have been masked to highlight only hidden interactions. Hence, although the ARX protein (aristaless related homeobox) is not directly connected to the other eleven proteins, i.e., there is no direct action of the ARX protein onto the other eleven proteins and vice versa, it indirectly strongly inhibits the tumor suppressor protein p53. Secondarily, the ARX protein indirectly acts on several other proteins, as indicated by blue shades on the ARX column in Fig. 5: the ARX protein indirectly activates the EGFR and ESR1 proteins (epidermal growth factor receptor and estrogen receptor, respectively) and inhibits the c-Myc protein (proto-oncogene protein). Similarly, according to the \({G_{\mathrm{rr}}}\) matrix component (see Fig. 4), the c-Myc protein does not act directly on the chosen twelve proteins. However, the blue shades of the c-Myc column on the right panel of Fig. 5 give us information on which proteins it indirectly contributes to activate or deactivate. Among the strong weights of the \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}^{\mathrm{(ndblock)}}\) matrix sum, we also observe that the SHP2 phosphatase protein indirectly strongly interacts with the ARX protein and the estrogen receptor protein ESR2. In return, the ESR2 protein, which directly inhibits the ESR1 and c-Myc proteins, also indirectly activates the SHP2 protein.
In contrast to the adjacency matrix and the Google matrix, the matrix sum \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}\) allows one to discriminate the directed links outgoing from a given protein by assigning different weights to them. This discrimination is possible as the RGMA and the RIGMA take account of not only the direct linkage of the twelve chosen proteins but all the knowledge encoded in the MetaCore complex network. Moreover, possible hidden links between proteins which are not directly connected in the MetaCore network can be inferred from non-negligible weights in \({G_{\mathrm{qr}}}\). We propose to construct a reduced network highlighting the most important, direct and hidden, interactions between the twelve chosen proteins. Hence, for each source protein of the chosen subset, we retain, in the corresponding column of the \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}\) matrix, the two largest weights, which reveal the most important target proteins of the source protein. Here, we do not consider self-inhibition and self-activation matrix elements in the matrix sum \({G_{\mathrm{rr}}}+{G_{\mathrm{qr}}}\). The constructed reduced network associated to the twelve chosen proteins is presented in Fig. 6. We observe that it captures the above mentioned direct and hidden activation/inhibition actions between the considered proteins.
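The selection rule for the reduced network (keep the two largest weights in each column, ignoring self-interactions) can be sketched as follows. The example is deliberately simplified: it uses a made-up \(4\times 4\) weight matrix with hypothetical node names and ignores the \((+)\)/\((-)\) doubling of the Ising case, so it illustrates only the column-wise selection, not the full construction of Fig. 6.

```python
import numpy as np

def strongest_targets(W, names, n_keep=2):
    """For each source (column j), return the n_keep targets (rows i != j)
    with the largest weights W[i, j], as (source, target, weight) tuples."""
    W = np.array(W, dtype=float)          # copy, the input is left untouched
    np.fill_diagonal(W, -np.inf)          # discard self-interaction terms
    edges = []
    for j, src in enumerate(names):
        for i in np.argsort(-W[:, j])[:n_keep]:
            edges.append((src, names[i], W[i, j]))
    return edges

# Made-up weights standing in for G_rr + G_qr (column = source, row = target).
names = ["A", "B", "C", "D"]
W = [[0.9, 0.1, 0.3, 0.0],
     [0.2, 0.0, 0.4, 0.5],
     [0.5, 0.6, 0.0, 0.1],
     [0.1, 0.2, 0.1, 0.8]]

edges = strongest_targets(W, names)
pairs = [(src, dst) for src, dst, _ in edges]
```

For instance, source A keeps the links to C (weight 0.5) and B (weight 0.2), while its large diagonal element 0.9 is not considered.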
Sensitivity of the chosen subset
The PageRank sensitivity of the chosen subset of 12 proteins is obtained from the RIGMA and presented in Fig. 7 following the definitions given by (8) and (9). We recall that \(F_{+}(a,b)\) gives the symmetric PageRank sensitivity of the nodes a and b to a variation of the link weight between them (in both directions, from a to b and from b to a). The asymmetric PageRank sensitivity \(F_{-}(a,b)\) determines which node is more sensitive to such a weight variation. Thus, for \(F_{-}(a,b) > 0\) node a is more influenced by node b, and for \(F_{-}(a,b) < 0\) node b is more influenced by node a.
In Fig. 7, the symmetric PageRank sensitivity (left panel) shows that the activation or the inhibition of the p53 protein affects or is affected by all the other chosen proteins. Indeed, the p53 protein with \(K=4\) occupies a very central role in the protein interaction network as it contributes to the stability of the genome, preventing damaged biological information from being spread (Prives and Hall 1999; Joerger and Fersht 2016; Toufektchan and Toledo 2018). The reddish horizontal and vertical lines on the symmetric PageRank sensitivity panel (Fig. 7 left) indicate that the activation of the EGFR, STAT3, FAK1, SHP2 and GRB2 proteins is affected by or affects all the other proteins of the chosen set. The right panel of Fig. 7 shows the asymmetric PageRank sensitivity. We clearly observe that in fact it is the p53 protein which influences the activation/inhibition of the other proteins, and in a stronger manner the inhibition of the GRB2, SHP2, ITGB1, and FAK1 proteins. In general, the inhibition of these four proteins is influenced by most of the other proteins (see greenish horizontal lines) and in return their respective activation also influences the other proteins (see greenish vertical lines).
Examples of magnetization of nodes
In Fig. 8, left panel, we show, in the PageRank-CheiRank \((k,k^*)\)-plane (see Tables 1 and 2 for the relative PageRank and CheiRank indexes k and \(k^*\)), the PageRank magnetization M of the chosen 12 proteins. These nodes have global PageRank indexes \(K\le 26\) (see Table 1). In agreement with the data presented in Fig. 3, for such K values, the magnetization is indeed mainly positive. So, these proteins are primarily activated. More precisely, as they belong to the top PageRank of the proteome (\(K/N\le 0.6 \permil\)), these proteins are activated as the result of the most important cascades of interactions along the causality pathways. The magnetization M is also presented in Fig. 8, right panel, but for all nodes with \(K \le 40\) (see Table 1). Here, for these top PageRank indexes, we have both positive and negative magnetization values, but the majority of the nodes have a magnetization close to zero, as discussed in Fig. 3.
For the top 40 PageRank (\(K \le 40\)), the top 3 most activated nodes are the K\(^+\) potassium ion in cytosol (\(K=19\), \(M(K)=0.962815\)), the CO\(_2\)+H\(_2\)O\(\rightarrow\)H\(^+\)+HCO\(_3^-\) reaction (\(K=29\), \(M(K)=0.757892\)), and the intracellular mRNA (\(K=6\), \(M(K)=0.708154\)), and the top 3 most inhibited nodes are the generic binding protein Bcl-2 (\(K=34\), \(M(K)=-0.220751\)), the generic enzyme MDM2 (\(K=36\), \(M(K)=-0.208082\)), and the generic binding protein E-cadherin (\(K=28\), \(M(K)=-0.114311\)).
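The ranking of the most activated and most inhibited nodes among the top PageRank positions can be sketched as follows. This is only an illustration under stated assumptions: it assumes that the Ising-PageRank vector splits into an activation component \(P_+\) and an inhibition component \(P_-\) per node, that the node PageRank is \(P=P_++P_-\), and that the magnetization is \(M=(P_+-P_-)/(P_++P_-)\). The function names and the four-node toy values are hypothetical and are not MetaCore data.

```python
import numpy as np

def magnetization(p_plus, p_minus):
    """Assumed definition: M = (P+ - P-) / (P+ + P-), evaluated per node."""
    p_plus = np.asarray(p_plus, dtype=float)
    p_minus = np.asarray(p_minus, dtype=float)
    return (p_plus - p_minus) / (p_plus + p_minus)

def extremes_in_top(p_plus, p_minus, k_max, n=3):
    """Among nodes with PageRank index K <= k_max, return the n most activated
    and the n most inhibited nodes as (node_index, K, M) tuples."""
    p = np.asarray(p_plus, dtype=float) + np.asarray(p_minus, dtype=float)
    top = np.argsort(-p, kind="stable")[:k_max]      # node indices for K = 1..k_max
    m = magnetization(p_plus, p_minus)[top]
    by_m = np.argsort(m, kind="stable")              # ascending magnetization
    inhibited = [(int(top[i]), int(i) + 1, float(m[i])) for i in by_m[:n]]
    activated = [(int(top[i]), int(i) + 1, float(m[i])) for i in by_m[::-1][:n]]
    return activated, inhibited

# Toy components for four hypothetical nodes (assumed values).
p_plus = [0.30, 0.05, 0.12, 0.06]
p_minus = [0.10, 0.20, 0.08, 0.09]
activated, inhibited = extremes_in_top(p_plus, p_minus, k_max=3, n=1)
print("most activated:", activated)
print("most inhibited:", inhibited)
```

A magnetization close to \(+1\) then corresponds to a node that is almost exclusively activated and a value close to \(-1\) to a node that is almost exclusively inhibited, which is the reading used for the values quoted above.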
Discussion
In this work, we have presented a detailed description of the statistical properties of the MetaCore network of protein–protein interactions, obtained with an extensive Google matrix analysis. In this way, we find the proteins and molecules which occupy the top PageRank and CheiRank positions and thus play an important role in the flow of influence through the whole network structure. With a simple example of a subset of selected proteins (subset of selected nodes), we show that the reduced Google matrix analysis makes it possible to determine the effective interactions between these proteins, taking into account all the indirect pathways between them through the global MetaCore network, in addition to the direct interactions between the selected proteins. We stress that the approach with the reduced Ising Google matrix algorithm, based on an Ising spin description, makes it possible to take into account the bifunctional nature of the protein–protein interactions (activation or inhibition) and to determine the average action type (or magnetization) of each protein.
Here, we have presented mainly the statistical properties of the MetaCore network without entering into a detailed analysis of the related biological effects. We plan to address, in further studies, the biological effects obtained from the reduced Google matrix analysis of the MetaCore network.
Availability of data and materials
Data supporting the findings of this study are available at MetaCoreNet. Finer details, e.g. network data for Fig. 4, can be obtained from the authors upon request. However, the global PPI data used for the construction of the MetaCore network are the private property of Clarivate Analytics and are distributed according to their rules.
Abbreviations
PPIs: Protein–Protein Interactions
GMA: Google matrix analysis
IGMA: Ising Google matrix analysis
RGMA: Reduced Google matrix analysis
RIGMA: Reduced Ising Google matrix analysis
WWW: World Wide Web
References
Abel M, Shepelyansky DL (2011) Google matrix of business process management. Eur Phys J B. https://doi.org/10.1140/epjb/e201010710y
Aragon P, Laniado D, Kaltenbrunner A, Volkovich Y (2012) Biographical social networks on Wikipedia: a cross-cultural study of links that made history. In: Proceedings of the eighth annual international symposium on wikis and open collaboration, WikiSym ’12. Association for Computing Machinery, New York. https://doi.org/10.1145/2462932.2462958
Bessarabova M, Ishkin A, JeBailey L, Nikolskaya T, Nikolsky Y (2012) Knowledge-based analysis of proteomics data. BMC Bioinform 13(Suppl 16):13. https://doi.org/10.1186/1471210513S16S13
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 30(1–7):107–117. https://doi.org/10.1016/S01697552(98)00110X
Chepelianskii AD (2010) Towards physical laws for software architecture. arXiv:1003.5455
Coquidé C, Ermann L, Lages J, Shepelyansky DL (2019) Influence of petroleum and gas trade on EU economies from the reduced Google matrix analysis of UN COMTRADE data. Eur Phys J B. https://doi.org/10.1140/epjb/e20191001326
Du D, Lee CF, Li XQ (2012) Systematic differences in signal emitting and receiving revealed by pagerank analysis of a human protein interactome. PLoS ONE 7(9):1–9. https://doi.org/10.1371/journal.pone.0044872
Ekins S, Bugrim A, Brovold L, Kirillov E, Nikolsky Y, Rakhmatulin E, Sorokina S, Ryabov A, Serebryiskaya T, Melnikov A, Metz J, Nikolskaya T (2006) Algorithms for network analysis in systems-ADME/Tox using the MetaCore and MetaDrug platforms. Xenobiotica 36(10–11):877–901. https://doi.org/10.1080/00498250600861660
El Zant S, Jaffrès-Runser K, Shepelyansky DL (2018) Capturing the influence of geopolitical ties from Wikipedia with reduced Google matrix. PLoS ONE 13(8):1–31. https://doi.org/10.1371/journal.pone.0201397
Eom YH, Aragón P, Laniado D, Kaltenbrunner A, Vigna S, Shepelyansky DL (2015) Interactions of cultures and top people of Wikipedia from ranking of 24 language editions. PLoS ONE 10(3):1–27. https://doi.org/10.1371/journal.pone.0114825
Ermann L, Frahm KM, Shepelyansky DL (2015) Google matrix analysis of directed networks. Rev Mod Phys 87:1261–1310. https://doi.org/10.1103/RevModPhys.87.1261
Frahm KM, Shepelyansky DL (2016) Reduced Google matrix. arXiv:1602.02394
Frahm KM, Jaffrès-Runser K, Shepelyansky DL (2016) Wikipedia mining of hidden links between political leaders. Eur Phys J B. https://doi.org/10.1140/epjb/e2016705263
Frahm KM, Shepelyansky DL (2019a) Ising-PageRank model of opinion formation on social networks. Phys A Stat Mech Appl 526:121069. https://doi.org/10.1016/j.physa.2019.121069
Frahm KM, Shepelyansky DL (2019b) Linear response theory for Google matrix. arXiv:1908.08924
Frahm KM, Shepelyansky DL (2020) Google matrix analysis of bifunctional SIGNOR network of protein–protein interactions. Phys A Stat Mech Appl 559:125019. https://doi.org/10.1016/j.physa.2020.125019
Joerger AC, Fersht AR (2016) The p53 pathway: origins, inactivation in cancer, and emerging therapeutic approaches. Ann Rev Biochem 85(1):375–404. https://doi.org/10.1146/annurevbiochem060815014710
Lages J, Shepelyansky DL, Zinovyev A (2018) Inferring hidden causal relations between pathway members using reduced Google matrix of directed biological networks. PLoS ONE 13(1):1–28. https://doi.org/10.1371/journal.pone.0190812
Langville AN, Meyer CD (2006) Google’s PageRank and beyond—the science of search engine rankings. Princeton University Press, Princeton
Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM (2011) Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res 21(7):1109–1121. https://doi.org/10.1101/gr.118992.110
Li YE, Preissl S, Hou X, Zhang Z, Zhang K, Qiu Y, Poirion OB, Li B, Chiou J, Liu H, PintoDuarte A, Kubo N, Yang X, Fang R, Wang X, Han JY, Lucero J, Yan Y, Miller M, Kuan S, Gorkin D, Gaulton KJ, Shen Y, Nunn M, Mukamel EA, Behrens MM, Ecker JR, Ren B (2021) An atlas of gene regulatory elements in adult mouse cerebrum. Nature 598:129–136. https://doi.org/10.1038/s41586021036041
MetaCore. https://clarivate.com/cortellis/solutions/earlyresearchintelligencesolutions/
MetaCoreNet. https://www.quantware.upstlse.fr/QWLIB/metacorenet/
Moroney JB, Vasudev A, Pertsemlidis A, Zan H, Casali P (2020) Integrative transcriptome and chromatin landscape analysis reveals distinct epigenetic regulations in human memory b cells. Nat Commun. https://doi.org/10.1038/s41467020192426
Perfetto L, Briganti L, Calderone A, Perpetuini AC, Iannuccelli M, Langone F, Licata L, Marinkovic M, Mattioni A, Pavlidou T, Peluso D, Petrilli LL, Pirrò S, Posca D, Santonico E, Silvestri A, Spada F, Castagnoli L, Cesareni G (2016) SIGNOR: a database of causal relationships between biological entities. Nucleic Acids Res 44(Database–Issue):548–554. https://doi.org/10.1093/nar/gkv1048
Prives C, Hall PA (1999) The p53 pathway. J Pathol 187(1):112–126
Shan Q, Li X, Chen X, Zeng Z, Zhu S, Gai K, Peng W, Xue HH (2021) Tcf1 and Lef1 provide constant supervision to mature CD8+ T cell identity and function by organizing genomic architecture. Nat Commun 12:5863. https://doi.org/10.1038/s41467021261591
Stassen SV, Yip GGK, Wong KKY, Ho JWK, Tsia KK (2021) Generalized and scalable trajectory inference in single-cell omics data with VIA. Nat Commun. https://doi.org/10.1038/s41467021257733
Toufektchan E, Toledo F (2018) The guardian of the genome revisited: p53 downregulates genes required for telomere maintenance, DNA repair, and centromere structure. Cancers. https://doi.org/10.3390/cancers10050135
Voevodski K, Teng S, Xia Y (2009) Spectral affinity in protein networks. BMC Syst Biol. https://doi.org/10.1186/175205093112
Winter C, Kristiansen G, Kersting S, Roy J, Aust D, Knösel T, Rümmele P, Jahnke B, Hentrich V, Rückert F, Niedergethmann M, Weichert W, Bahra M, Schlitt HJ, Settmacher U, Friess H, Büchler M, Saeger HD, Schroeder M, Pilarsky C, Grützmann R (2012) Google goes cancer: improving outcome prediction for cancer patients by network-based ranking of marker genes. PLoS Comput Biol 8(5):1–16. https://doi.org/10.1371/journal.pcbi.1002511
Yang L, Chen R, Goodison S, Sun Y (2021) An efficient and effective method to identify significantly perturbed subnetworks in cancer. Nat Comput Sci. https://doi.org/10.1038/s43588020000094
Zhang Z, Zhou J, Tan P, Pang Y, Rivkin AC, Kirchgessner MA, Williams E, Lee CT, Liu H, Franklin AD, Miyazaki PA, Bartlett A, Aldridge AI, Vu M, Boggeman L, Fitzpatrick C, Nery JR, Castanon RG, Rashid M, Jacobs MW, ItoCole T, O’Connor C, PintoDuartec A, Dominguez B, Smith JB, Niu SY, Lee KF, Jin X, Mukamel EA, Behrens MM, Ecker JR, Callaway EM (2021) Epigenomic diversity of cortical projection neurons in the mouse brain. Nature. https://doi.org/10.1038/s4158602103223w
Zhirov AO, Zhirov OV, Shepelyansky DL (2010) Two-dimensional ranking of Wikipedia articles. Eur Phys J B 77:523–531. https://doi.org/10.1140/epjb/e2010105007 arXiv:1006.4270
Acknowledgements
We thank Andrei Zinovyev (Institut Curie) for useful discussions.
Funding
This research has been partially supported through the grant NANOX N° ANR-17-EURE-0009 (project MTDINA) in the frame of the Programme Investissements d’Avenir, France. This research has also been supported by the Programme Investissements d’Avenir ANR-15-IDEX-0003, ISITE-BFC (GNETWORKS project) and the council of Bourgogne Franche-Comté region (APEX project and REpTILs project).
Author information
Contributions
The authors contributed equally to this work. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
EK is a senior research scientist employed by Clarivate, Barcelona. KMF, JL, and DLS declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kotelnikova, E., Frahm, K.M., Lages, J. et al. Statistical properties of the MetaCore network of protein–protein interactions. Appl Netw Sci 7, 7 (2022). https://doi.org/10.1007/s41109022004444
DOI: https://doi.org/10.1007/s41109022004444
Keywords
 Complex networks
 MetaCore
 Google matrix
 PageRank
 Protein–protein interactions network