Connectivity motifs
In this section we show how the Forman (F) and Ollivier (O) Ricci curvatures can be used to identify specific connectivity motifs of directed hyperedges. We discuss the signs and values of F and O for the orange hyperedges of the nine directed hypergraphs presented in Fig. 1, based on the connections of their tails and heads. Moving from left to right, the sign of O changes while the sign of F is fixed. Moving vertically, the sign of F changes while the sign of O is fixed. On the diagonal, directed hyperedges have the same sign for both curvatures. In the first column, each orange hyperedge e=(ei,ej) has O(e)>0. The reason is that the set of masses and the set of holes associated to e are equal, namely \(\mathcal{M} = \mathcal{H}\), so that all mass can be moved to the holes at distance zero. Therefore, m0=1 while m2=m3=0. Thus, from Eq. 7, O(e)=1. In contrast, F decreases as we move down the same column. In particular, F(e) is positive for the uppermost hypergraph: the size of the tail of e exceeds the sum of the in-degrees of the vertices in ei, namely \(|e_{i}| - \sum_{i\in e_{i}}\text{in-deg}(i) = 1\), while the sole vertex in the head of e has exactly one outgoing arrow, so that \(|e_{j}| - \sum_{j\in e_{j}}\text{out-deg}(j) = 0\), and hence F(e)=1. Moving down the same column, \(\sum_{i\in e_{i}}\text{in-deg}(i)\) and \(\sum_{j\in e_{j}}\text{out-deg}(j)\) increase by one in the second and third hypergraphs, respectively (with values F(e)=0 and F(e)=−1).
The same arguments explain the signs of F in the second and third columns. The values of O(e), on the other hand, decrease from left to right along each row, because the distance between each mass and each hole of e increases. In the second column, distances increase by 1 compared to the first column, so all mass is moved at distance 1, which means that m0=m2=m3=0 and O(e)=0. Similarly, for any hypergraph in the third column, the distance from masses to holes becomes 3, and therefore m0=m2=0, m3=1 and O(e)=−2.
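The arithmetic above can be checked with a minimal Python sketch. It assumes that F(e) is the sum of the tail term and the head term quoted in the text, that in-deg(v) counts the hyperedges having v in their head and out-deg(v) those having v in their tail, and that Eq. 7 has the form O(e) = m0 − m2 − 2·m3, which is consistent with the three values quoted above. The function names and the toy hypergraph are hypothetical; the toy example is chosen only to reproduce the quoted terms and is not a reproduction of Fig. 1.

```python
from collections import Counter

def degrees(hyperedges):
    """Return (in_deg, out_deg) under the assumed convention:
    in_deg[v]  = number of hyperedges with v in the head,
    out_deg[v] = number of hyperedges with v in the tail."""
    in_deg, out_deg = Counter(), Counter()
    for tail, head in hyperedges:
        out_deg.update(tail)
        in_deg.update(head)
    return in_deg, out_deg

def forman(e, hyperedges):
    """Forman curvature of e = (tail, head), assumed to be the sum of
    |tail| - sum of in-degrees over the tail and
    |head| - sum of out-degrees over the head."""
    tail, head = e
    in_deg, out_deg = degrees(hyperedges)
    tail_term = len(tail) - sum(in_deg[v] for v in tail)
    head_term = len(head) - sum(out_deg[v] for v in head)
    return tail_term + head_term

def ollivier_from_masses(m0, m2, m3):
    """Assumed form of Eq. 7: fractions of mass moved at distance 0, 2 and 3."""
    return m0 - m2 - 2 * m3

# Hypothetical toy hypergraph (not taken from Fig. 1).
e = (frozenset({1, 2}), frozenset({3}))
a = (frozenset({4}), frozenset({1}))   # one incoming hyperedge at tail vertex 1
b = (frozenset({3}), frozenset({5}))   # one outgoing hyperedge at head vertex 3
c = (frozenset({6}), frozenset({2}))   # extra incoming hyperedge at tail vertex 2
d = (frozenset({3}), frozenset({7}))   # extra outgoing hyperedge at head vertex 3

print(forman(e, [e, a, b]))        # 1: tail term 1, head term 0
print(forman(e, [e, a, b, c]))     # 0: tail in-degree sum grows by one
print(forman(e, [e, a, b, c, d]))  # -1: head out-degree sum grows by one
print(ollivier_from_masses(1, 0, 0), ollivier_from_masses(0, 0, 0),
      ollivier_from_masses(0, 0, 1))  # 1 0 -2
```

Adding one incoming hyperedge at the tail or one outgoing hyperedge at the head lowers F(e) by one, which mirrors the vertical trend described for the first column.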
Hyperloops and their curvature
The definition of a loop in a hypergraph depends on the definition of a path. The strictest definition requires that the head and tail of the hyperedge e=(ei,ej) coincide, i.e. ei=ej. If, moreover, the hyperedge is isolated, then F(e)=0 while O(e)=1 (see the top right hyperedge of Fig. 2). If the hyperedge is not isolated, both curvature notions might change. As an example, if we move down in column one of Fig. 2, we see F(e) and O(e) decrease as a result of the incoming neighbors added to the tail of e and the outgoing neighbors added to its head.
A second, more flexible definition requires that either ei⊂ej or ej⊂ei. In this case, if the hyperedge is isolated, then F(e)>0 and O(e)<1. Again, both might change if the nodes of e have further connections. For instance, if we move down in column two of Fig. 2, F(e) and O(e) decrease due to the addition of hyperedges incident to e.
Finally, the most flexible definition of a hyperloop simply requires the tail and head of e to have a non-empty intersection, namely, \(e_{i} \cap e_{j} \neq \varnothing\). Naturally, any hyperedge of the two types described before is a particular case of this definition. All hyperloops in Fig. 2 are instances of this definition, showing that they can be flat, positively curved or negatively curved, for both F and O.
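The three nested notions of hyperloop can be told apart with a few set comparisons. The following Python sketch is illustrative only; the function name and the returned labels ("strict", "nested", "overlapping", "none") are hypothetical and do not appear in the text.

```python
def hyperloop_type(tail, head):
    """Classify a directed hyperedge (tail, head) by the strictest
    hyperloop definition it satisfies. Tail and head are assumed
    to be non-empty collections of vertices."""
    tail, head = set(tail), set(head)
    if tail == head:
        return "strict"        # e_i = e_j
    if tail < head or head < tail:
        return "nested"        # e_i is a proper subset of e_j, or vice versa
    if tail & head:
        return "overlapping"   # non-empty intersection only
    return "none"              # tail and head are disjoint

print(hyperloop_type({1, 2}, {1, 2}))   # strict
print(hyperloop_type({1}, {1, 2}))      # nested
print(hyperloop_type({1, 2}, {2, 3}))   # overlapping
print(hyperloop_type({1}, {2}))         # none
```

Because the checks run from the strictest to the most flexible definition, every "strict" or "nested" hyperedge would also pass the final intersection test, in line with the inclusion of definitions noted above.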
A random model of directed hypergraphs and its curvature fingerprint
In this section we introduce a random model of directed hypergraphs and explore the distributions of node degrees, of tail and head sizes, and of the Forman and Ollivier Ricci curvatures. The model serves as a baseline for understanding how curvature distributions look when connections (hyperedges) are made randomly following simple rules. It can be regarded as a hypergraph version of the Erdős-Rényi (ER) model. Here the size of a hyperedge e can be any number between 1 and |V|. However, we decide not to choose the tail and head sizes of hyperedges randomly (with the obvious restriction that their sum should equal the size of the hyperedge), but rather let them match the statistical properties of the empirical networks that we want to analyze. The process is explicitly described in Algorithm 1.
ER model description: The input of Algorithm 1 is the number of vertices n and the number of directed hyperedges m. The output is a directed hypergraph D=(V,E), where V={1,…,n} and E is a multiset of directed hyperedges. Each hyperedge e is an ordered pair (ei,ej) of subsets of V. First, we draw a random integer between 2 and n as the size of e, namely |e|. Then, |e| nodes are randomly sampled from V without replacement; roughly |e|/2 of them are assigned to the tail ei and the rest, again roughly |e|/2, to the head ej (see Algorithm 1). Notice that the model produces non-empty tails and heads. The newly formed directed hyperedge e=(ei,ej) is then added to E. The process is carried out independently m times.
Numerical approach to ER properties: We use Algorithm 1 to generate four random directed hypergraphs over a fixed set of 500 vertices. The numbers of hyperedges are 250, 500, 1000, and 2000, respectively. Figure 3 shows the in-degree and out-degree distributions of nodes and suggests approximately normal behaviour. Moreover, since the sizes of tails and heads are integers chosen randomly from a fixed interval, both distributions, tail sizes and head sizes, are expected to be uniform, which explains the shapes in Fig. 4. F(e) is the difference between the hyperedge size and the number of incoming and outgoing hyperedges of nodes in the tail and head, respectively. In this model, F(e) values are uniformly distributed over an interval of the form [x,0], where x decreases with the number of hyperedges m, as shown in Fig. 5 (left). Similarly, O(e) takes positive real values within the interval [0.9,x], where x approaches 1 as m increases, as shown in Fig. 5 (right). This result is driven by the presence of arbitrarily large hyperedges and the randomness of the wiring process of the ER model which, combined, produce hyperedges with sets of masses and holes as big as the entire set of vertices, as Fig. 6 shows.
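The generation procedure described above can be sketched in Python as follows. This is a minimal sketch rather than a transcription of Algorithm 1: it assumes that the hyperedge size is drawn uniformly from {2,…,n} and that, for odd sizes, the extra vertex goes to the head. The function name and the seed parameter are hypothetical.

```python
import random

def random_directed_hypergraph(n, m, seed=None):
    """Return a list (multiset) of m directed hyperedges (tail, head)
    over the vertex set {1, ..., n}, with n >= 2."""
    rng = random.Random(seed)
    vertices = range(1, n + 1)
    hyperedges = []
    for _ in range(m):
        size = rng.randint(2, n)            # assumed uniform on {2, ..., n}
        nodes = rng.sample(vertices, size)  # sampling without replacement
        cut = size // 2                     # tail gets floor(size / 2) nodes
        tail = frozenset(nodes[:cut])
        head = frozenset(nodes[cut:])       # head gets the remaining nodes
        hyperedges.append((tail, head))
    return hyperedges

# Example: a small instance with 10 vertices and 5 hyperedges.
for tail, head in random_directed_hypergraph(10, 5, seed=0):
    print(sorted(tail), "->", sorted(head))
```

Since every size is at least 2, the split always leaves at least one vertex on each side, so tails and heads are non-empty and disjoint by construction.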
Metabolism of E. coli
Metabolic networks are clear examples of directed hypergraphs. They consist of a set of chemical reactions involving a fixed set of chemical species. Each reaction, ei→ej, entails the change of chemical identity of an initial set ei of metabolites, called reactants, to yield a second set ej called products. This directed relationship can be captured naturally by the notion of hyperedge. In this section, we use our geometric tools to investigate the structure of the directed hypergraph that underlies the E. coli metabolic hypernetwork. In particular, we present the curvature distribution fingerprints of F and O, along with the quantities involved in their computation.
The directed hypergraph of E. coli: The metabolism of this bacterium, reported in Reed et al. (2003) (K-12, iJR904 GSM/GPR), is modeled as a directed hypergraph. Vertices represent chemical substances (metabolites) and hyperarcs stand for metabolic reactions. To each reaction ei→ej we associate a directed hyperedge e=(ei,ej). There are |V|=625 metabolites, 245 reversible reactions and 686 non-reversible ones. Each reversible reaction ei⇔ej was split into two, the forward and the reverse reaction, yielding two hyperarcs, ei→ej and ej→ei. Therefore, |E| = 2·245 + 686 = 1,176.
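The construction of hyperarcs from reactions, including the splitting of reversible reactions, can be sketched in Python as follows. The input format (a triple of reactants, products and a reversibility flag), the function name and the metabolite labels are hypothetical and serve only as an illustration.

```python
def reactions_to_hyperarcs(reactions):
    """Turn reactions (reactants, products, reversible) into directed
    hyperarcs (tail, head). A reversible reaction yields two hyperarcs,
    one for the forward and one for the reverse direction."""
    hyperarcs = []
    for reactants, products, reversible in reactions:
        tail, head = frozenset(reactants), frozenset(products)
        hyperarcs.append((tail, head))       # forward reaction
        if reversible:
            hyperarcs.append((head, tail))   # reverse reaction
    return hyperarcs

# Hypothetical toy reactions: A + B <=> C (reversible), C -> D (non-reversible).
toy = [(["A", "B"], ["C"], True), (["C"], ["D"], False)]
print(len(reactions_to_hyperarcs(toy)))  # 3

# Hyperarc count for the figures quoted in the text.
print(2 * 245 + 686)  # 1176
```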
Turning to Figs. 7, 12 and 13, we see that the distributions for E. coli are very different from those of random hypergraphs. Figure 7a shows the number of metabolic reactions with |ei| reactants and |ej| products. 90% of chemical reactions have at most three reactants and three products (as also observed for the whole Chemical Space (Llanos et al. 2019)), which, according to Eq. 5, indicates that the frequent curvature values in Fig. 12 (left) are ruled by the accumulated in- and out-degree. In particular, frequent values of curvature were found to distinguish bottleneck and redundant reactions in the metabolic network (Leal et al. 2018). On the other hand, when considering the number of incoming neighbors of reactants and of outgoing neighbors of products for every reaction, the counts are of the order of hundreds and, for some reactions, cover almost the whole substrate set, as shown in Fig. 7b. The question that arises is how close those masses and holes are in the metabolic network. The Ollivier-Ricci curvature distribution in Fig. 13 (left) shows that most masses and holes are at distance less than 3, since the vast majority of reactions have curvature greater than −0.5. Less than 10% of incoming and outgoing neighbors are at distance 3. Only four reactions have curvature −2, indicating that their masses are at least three reactions away from their holes.
A shuffled directed hypergraph of E. coli: In the shuffling experiment, we start with the metabolic network M and end up with a directed hypergraph S of the same size (tail and head size distributions) and with the same degree sequence. Our goal is to investigate whether our two notions of curvature can capture the wiring changes brought about by the shuffling of the hypernetwork. The distributions of the quantities involved are presented in Figs. 8-13. Figures 8 and 9 show that our shuffling algorithm does preserve the degree sequence, i.e. the degree sequences of M and S are the same. Similarly, Figs. 10 and 11 show that the distributions of tail and head sizes of M and S are the same. Importantly, the F and O distributions of M and S are different, as shown in Figs. 12 and 13. This confirms that our notions of curvature detect differences that the degree cannot.
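The shuffling algorithm itself is not spelled out in this section. One way to obtain a rewiring with the stated invariants is a vertex-swap scheme, sketched below in Python as an assumption and not as the procedure actually used: two hyperedges exchange one tail vertex (or one head vertex), which leaves every tail and head size and every in- and out-degree unchanged. The function name, the number of swaps and the rule of rejecting swaps that would put a vertex in both the tail and the head of a hyperedge are all hypothetical choices.

```python
import random

def shuffle_hypergraph(hyperedges, n_swaps, seed=None):
    """Rewire a directed hypergraph by repeated vertex swaps between
    two tails or two heads. Assumes at least two hyperedges and
    mutually comparable vertex labels (needed for sorting)."""
    rng = random.Random(seed)
    edges = [(set(t), set(h)) for t, h in hyperedges]
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(edges)), 2)
        side = rng.randrange(2)             # 0: swap in tails, 1: swap in heads
        sa, sb = edges[a][side], edges[b][side]
        only_a, only_b = sorted(sa - sb), sorted(sb - sa)
        if not only_a or not only_b:
            continue                        # nothing to exchange
        u, v = rng.choice(only_a), rng.choice(only_b)
        # Reject swaps that would create a tail/head overlap.
        if v in edges[a][1 - side] or u in edges[b][1 - side]:
            continue
        sa.remove(u); sa.add(v)             # a loses u and gains v
        sb.remove(v); sb.add(u)             # b loses v and gains u
    return [(frozenset(t), frozenset(h)) for t, h in edges]
```

Each accepted swap moves u from one tail (or head) to another and v in the opposite direction, so the number of tails and heads containing each vertex is unchanged, which is the degree-sequence invariance reported in Figs. 8 and 9 under the degree convention assumed here.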