Convergence properties of optimal transport-based temporal hypergraphs

We present a method to extract temporal hypergraphs from sequences of 2-dimensional functions obtained as solutions to Optimal Transport problems. We investigate optimality principles exhibited by these solutions from the point of view of hypergraph structures. Discrete properties follow patterns that differ from those characterizing their continuous counterparts. Analyzing these patterns can bring new insights into the studied transportation principles. We also compare these higher-order structures to their network counterparts in terms of standard graph properties. We give evidence that some transportation schemes might benefit from hypernetwork representations. We demonstrate our method on real data by analyzing the properties of hypernetworks extracted from images of real systems.


Introduction
Optimal Transport (OT) is a principled theory to compare probability distributions (Kantorovich 1942;Villani 2009;Santambrogio 2015;Peyré et al. 2019). Although this task is usually framed as an optimization problem, recent studies have mapped it within the framework of dynamic partial differential equations (Evans and Gangbo 1999;Facca et al. 2018Facca et al. , 2020Facca et al. , 2021Tero et al. 2007Tero et al. , 2010. In this context, solutions to a transportation problem are often found as the convergent state of evolving families of functions. In some scenarios, the steady states of these evolving families are supported in network-shaped structures (Xia 2003(Xia , 2014(Xia , 2015. Recently, this fact has called the attention of network scientists and graph theorists leading to the development of methods that convert the solutions of OT problems into actual graph structures (Baptista et al. 2020;Leite and De Bacco 2022). This has broadened the available set of tools to understand and solve these transportation problems. Recent studies have shown that common patterns can be unveiled in both the original mathematical setting and in the converted graph structures (Baptista and De Bacco 2021b).
Representations of these functions as sets of dyadic relations have been proven meaningful in various applications (Baptista and De Bacco 2021a;Facca et al. 2021). Nonetheless, traditional dyadic representations may be limited in representing flows of quantities Page 2 of 16 Baptista and De Bacco Applied Network Science (2023) 8:3 like mass or information as observed in real systems. Various examples of systems where interactions happen between 3 individuals or more are observed in applications as social contagion (de Arruda et al. 2020;Chowdhary et al. 2021), random walks (Carletti et al. 2020;Schaub et al. 2020) or non-linear consensus (Neuhäuser et al. 2020). Understanding the relation between the structure and dynamics taking place on higher-order structures is an active field of research (Taylor et al. 2015;Patania et al. 2017). For instance, key elements controlling dynamics are linked to the heterogeneity of hyperedges' sizes present in their higher-order representations (Patania et al. 2017). These systems are hence best described by hypergraphs, generalizations of networks that encode structured relations among any number of individuals. With this in mind, a natural question to ask is how do OT-based structures perform in terms of higher-order representations? To help bridge this knowledge gap about higher-order properties of structures derived from OT solutions, we elaborate on the results observed in Baptista and De Bacco (2021b). Specifically, we propose a method to convert the families of 2-dimensional functions into temporal hypernetworks. We enrich the existing network structures associated with these functions by encoding the observed interactions into hyperedges. We study classic hypergraph properties and compare them to the predefined cost functional linked to the transportation problems. Finally, we extend this method and the analysis to study systems coming from real data. We build hypergraph representations of P. polycephalum (Westendorf et al. 2016) and analyze their topological features.

The Dynamical Monge-Kantorovich set of equations
We start by reviewing the basic elements of the mechanism chosen to solve the OT problems. As opposed to other standard optimization methods used to solve this (Cuturi 2013), we use an approach that turns the problem into a dynamical set of partial differential equations. In this way, initial conditions are updated until a convergent state is reached. The dynamical system of equations as proposed by Facca et al. (2018Facca et al. ( , 2020Facca et al. ( , 2021, is presented as follows. We assume that the OT problem is set on a continuous 2-dimensional space ∈ R 2 , and at the beginning, no underlying network structure is observed. This gives us the freedom of exploring the whole space to design an optimal network topology, solution of the transportation problem. The main quantities that need to be specified in input are source and target distributions. We refer to them as sources and sinks, where a certain mass (e.g. passengers in a transportation network, water in a water distribution network) is injected and then extracted. We denote these with a "forcing" function f (x) = f + (x) − f − (x) ∈ R , describing the flow-generating sources f + (x) and sinks f − (x) . To ensure mass balance it is imposed � f (x)dx = 0 . We assume that the flow is governed by a transient Fick-Poiseuille flux q = −µ∇u , where µ, u and q are called conductivity (or transport density), transport potential and flux, respectively. Intuitively, mass is injected through the source, moved based on the conductivity across space, and then extracted through the sink. The way mass moves determines a flux that depends on the pressure exerted on the different points in space; this pressure is described by a potential function.
Page 3 of 16 Baptista and De Bacco Applied Network Science (2023) 8:3 The set of Dynamical Monge-Kantorovich (DMK) equations is given by: where ∇ = ∇ x . Equation (1) states the spatial balance of the Fick-Poiseuille flux and is complemented by no-flow Neumann boundary conditions. Equation (2) enforces the dynamics of this system, and it is controlled by the so-called traffic rate β . It determines the transportation scheme, and it shapes the topology of the solution: for β < 1 we have congested transportation where traffic is minimized, whereas β > 1 induces branched transportation where traffic is consolidated into a smaller amount of space. The case β = 1 recovers shortest path-like structures. Finally, Eq.
(3) constitutes the initialization of the system and can be thought of as an initial guess of the solution. Solutions (µ * , u * ) of Eqs.
(1)-(3) minimize the transportation cost function L(µ, u) (Facca et al. 2018(Facca et al. , 2021, defined as: L can be thought of as a combination of M , the total energy dissipated during transport (or network operating cost) and E , the cost to build the network infrastructure (or infrastructural cost). It is known that this functional's convexity changes as a function of β . Non-convex cases arise in the branched schemes, inducing fractal-like structures (Facca et al. 2021;Santambrogio 2007). This is the case that we considered in this work, and it is the only one where meaningful network structures, and thus, hypergraphs, can be extracted (Baptista et al. 2020).

Hypergraph construction
We define a hypergraph (also, hypernetwork) as follows ): a hypergraph is a tuple H = (V , E), where V = {v 1 , ..., v n } is the set of vertices and E = {e 1 , e 2 , ..., e m } is the set of hyperedges in which e i ⊂ V , ∀i = 1, ..., m, and |e i | > 1 . If |e i | = 2, ∀i then H is simply a graph. We call edges those hyperedges e i with |e i | = 2 and triangles, those with |e i | = 3 . We refer to the 1-skeleton of H as the clique expansion of H. This is the graph G = (V , E G ) made of the vertices V of H, and of the pairwise edges built considering all the possible combinations of pairs that can be built from each set of nodes defining each hyperedge in E.
Let µ be the conductivity found as a solution of Eqs.
(1)-(3). As previously mentioned, µ at convergence regulates where the mass should travel for optimal Page 4 of 16 Baptista and De Bacco Applied Network Science (2023) 8:3 transportation. Similar to Baptista and De Bacco (2021b), we turn this 2-dimensional function into a different data structure, namely, a hypergraph. This is done as follows: consider G(µ) = (V G , E G ) the network extracted using the method proposed in Baptista et al. (2020). We define H (µ) as the tuple together with all of its triangles. This choice is motivated by the fact that the graphextraction method proposed in Baptista et al. (2020) uses triangles to discretize the continuous space , which can have a relevant impact on the extracted graph or hypergraph structures. Hence, triangles are the natural sub-structure for hypergraph constructions. The method proposed in this work is valid for higher-order structures beyond triangles. Exploring how these additional structures impact the properties of the resulting hypergraphs is left for future work. Figure 1 shows an example of one of the studied hypergraphs. The red shapes represent the different triangles of H (µ) . Notice that, although we consider here the case where |e| ≤ 3 for each hyperedge e-for the sake of simplicity-higher-order structures are also well represented by the union of these elements, as shown in the right panels of the figure.
Since this hypergraph construction method is valid for any 2-dimensional transport density, we can extract a hypergraph not only from the convergent µ but also at any time step before convergence. This then allows us to represent optimal transport sequences as hypergraphs evolving in time, i.e. temporal hypernetworks.

Hypergraph sequences
Formally, let µ(x, t) be a transport density (or conductivity) function of both time and space obtained as a solution of the DMK model. We denote it as the sequence {µ t } T t=0 , for some index T (usually taken to be that of the convergent state). Each µ t is the t-th update of our initial guess µ 0 , computed by following the rules described in Eqs. (1)-(3). This determines a sequence of hypernetworks {H(µ t )} T t=0 extracted from {µ t } T t=0 with the extraction method proposed in Baptista et al. (2020). Figure 2 shows three hypergraphs built from one of the studied sequences {µ t } using this method at different time steps. The corresponding OT problem is that defined by the (filled and empty) circles: mass is injected in the bottom left circle and must be extracted at the highlighted destinations. On the top row, different updates (namely, t = 12, 18, 26 ) of the solution are shown. They are defined on a discretization of [0, 1] 2 . Darkest colors represent their support. Hypergraphs extracted from these functions are displayed at the bottom row. As can be seen, only edges (in gray) and triangles (in red) are considered as part of H(µ t ) . Notice that the larger the t is, the less dense the hypergraphs are, which is expected for a uniform initial distribution µ 0 and branched OT ( β > 1 ) (Facca et al. 2021).

Graph and hypergraph properties
We compare hypergraph sequences to their correspoding network counterparts (defined as described in the previous paragraph). We analyze the following main network and hypergraph properties for the different elements in the sequences and for different sequences. Denote with G = (V G , E G ) and H = (V H , E H ) one of the studied graphs and hypergraphs belonging to some sequence {G(µ t )} T t=0 and {H (µ t )} T t=0 , respectively. We consider the following network properties: 1. |E G | , total number of edges; 2. Average degree d(G), the mean number of neighbors per node; 3. Average closeness centrality c(G): let v ∈ V G , the closeness centrality of v is defined is the shortest path distance between u and v.
Hypernetwork properties can be easily adapted from the previous definitions with the help of generalized adjacency matrices and line graphs (Aksoy et al. 2020). Let H be a hypergraph with vertex set V = {1, .., n} and edge set E = {e 1 , ..., e m } . We define the generalized node s-adjacency matrix A s of H as the binary matrix of size n × n , s.t.,  We define the s-line graph L s as the graph generated by the adjacency matrix A s . Notice that A 1 corresponds to the adjacency matrix of H's skeleton (which is L 1 ). Figure 3 shows a family of adjacency matrices together with the line graphs generated using them. We can then define hypergraphs properties in the following way: 1. |E H | , total number of hyperedges; 2. |T | = |{e ∈ E H : |e| = 3}|, total number of triangles; 3. S = t∈T a(t), covered area, where a(t) is the area of the triangle t; 4. Average degree d s (H ) , the mean number of incident hyperedges of size greater or equal than s per node; 5. Average closeness centrality c s (H) : let v ∈ V H , the closeness centrality of v is defined as its closeness centrality in L s .
S can be defined in terms of any other property of a hyperedge, e.g. a function of its size |e|. Here we consider the area covered by a hyperedge to keep a geometrical perspective. On the other hand, this area S can be easily generalized to hyperedges with |e i | > 3 by suitably changing the set T in the summation, e.g. by considering structures containing four nodes. As for the centrality measures, we focus our attention to compare the case s > 1 against s = 1 , as the latter traces back to standard graph properties and we are interested instead to investigate what properties are inherent to hypergraps. Figure 4 shows values of the d s (H) and c s (H) for convergent hypergraphs H (obtained from different values of β ) together with the degree and closeness centrality of their correspondent graph versions. The considered hypergraphs are displayed in the top row of the figure. As can be seen in the figure, patterns differ considerably for different values of β . As s controls the minimum number of shared connections for different nodes in the networks, the higher this number, the more restrictive this condition becomes, thus leading to more disconnected line graphs. In the case of the s-degree centrality, we observe decreasing values for increasing s, with nodes with the highest centrality having much higher values than nodes less central. For both s = 2, 3 we observe higher values than nodes in G. This follows from the fact that once hyperedges are added to G, the number of incidences per node can only increase. Centrality distributions strongly depend on β . For small values-more distributed traffic ( β = 1.1)-the number of hyperedges per node remains larger than the number of regular edges connected to it. But if traffic is consolidated on less space ( β = 1.9 ), then very few hyperedges are found. This suggests that the information learned from hypergraphs that is distinct to that contained in the graph skeleton is influenced by the chosen traffic regime. As for the closeness centrality distribution, this resembles that of G for small values of β , regardless s. For higher β it switches towards an almost binary signal. Thus, nodes tend to become more central as β increases, suggesting that adding hyperedges to networks G leads to shorter distances between nodes. The loss of information seen for the highest values of s is due to the fact that the line graphs L s become disconnected with many small connected components. In these cases, the closeness centrality of a node is either 0 if it is isolated, or proportional to the diameter of the small connected component where it lives in.

Convergence criteria
Numerical convergence of the DMK Eqs. (1)-(3) is usually defined by fixing a threshold τ . The updates are considered enough once the cost associated to them does not change more ( ≤ τ ) than that of the previous time step. As it is usually the case when this threshold is too small ( τ = 10 −12 in our experiments), the cost or the network structure may consolidate to a constant value earlier than algorithmic convergence. Similar to Baptista and De Bacco (2021b), to meaningfully establish when is hypergraph optimality reached, we consider as convergence time the first time step when the transport cost, or a given network property, reaches a value that is smaller or equal to a certain fraction p of the value reached by the same quantity at algorithmic convergence (in the experiments here we use p = 1.05 ). We refer to t L and t P for the convergence in times in terms of cost function or a network property, respectively.

Results
To test the properties presented in the previous section and understand their connection to transportation optimality, we synthetically generate a set of optimal transport problems, determined by the configuration of sources and sinks. As done in Baptista and De Bacco (2021b), we fix a source's location and sample several points in the set [0, 1] 2 to be used as sinks' locations. Let S = {s 0 , s 1 , ..., s M } be the set of locations in the space [0, 1] 2 , and fix a positive number 0 < r . We define the distributions f + and f − as is the circle of center s i and radius r. The value of r is chosen based on the used discretization, and as mentioned before, the centers are sampled uniformly at random. The symbol ∝ stands for proportionality and is used to ensure that f + and f − are both probability distributions. The transportation cost is that of Eq. (4).

Synthetic OT problems
The set of transportation problems considered in our experiments consists of 100 source-sink configurations. We place the location of the source s 0 = (0, 0) (i.e. the support of f + at (0, 0)), and sample 15 points s 1 , s 2 , ..., s M uniformly at random from a regular grid. By sampling them from the nodes of the grid, we ensure that two different locations are at a safe distance so they are considered different once the space is discretized. We initialize µ 0 (x) = 1, ∀x to be a uniform distribution on [0, 1] 2 . This can be interpreted as a non-informative initial guess for the solution. Starting from µ 0 , we compute a maximum of 300 updates. Depending on the chosen traffic rate β more or fewer iterations can be needed. We claim that the sequence {µ t } T t=0 converges to a certain function µ * at iteration T if either |µ T − µ T −1 | < τ, for a tolerance τ ∈ (0, 1], or T reaches the mentioned maximum. For the experiments reported in this manuscript, the tolerance τ is set to be 10 −12 . Given the dependence of the solution of traffic constraints, a wide range of values of β is considered. Namely, we study solutions obtained from low traffic cases ( β = 1.1 , and thus, less traffic penalization) to large ones ( β = 1.9 ), all of them generating branched transportation schemes. Our 100 problems are linked to a total of 900 hypergraph sequences, each of them containing between 50 and 80 higherorder structures.

Convergence: transport cost vs hypernetwork properties
As presented in Baptista and De Bacco (2021b), we show a comparison between hypernetwork properties and the cost function minimized by the dynamics, where convergence times are highlighted (Fig. 5). We focus on the property S, the area of the surface covered by the triangles in H. This quantity is influenced by both the amount of triangles Page 9 of 16 Baptista and De Bacco Applied Network Science (2023) 8:3 (hence of hyperedges) and their distribution in space. Hence, it is a good proxy for how hypergraph properties change both in terms of iteration time and as we tune β . We observe that t P > t L in all the cases, i.e. convergence in terms of transportation cost is reached earlier than the convergence of the topological property. Similar behaviors are seen for other values of β ∈ [1.1, 1.9] and other network properties (see "Appendix"). Similar to DMK-based network properties, the covered area's decay is faster for the smallest values of β . This is expected, given the convexity properties of L (Facca et al. 2018(Facca et al. , 2021. However, the transport cost decays even faster, in a way that the value of S is still far away from convergence in the congested transportation case (small β).
Notice that S remains stable after the first few iterations, and then it starts decreasing at different rates (depending on β ) until reaching the converged value. This suggests that the dynamics tend to develop thick branches-covering a large area-at the beginning of the evolution, and then it slowly compresses them until reaching the optimal topologies. These different convergence rates for S and L may prevent construction of converged hypernetwork topologies: if the solver is stopped at t L < t P , the resulting hypergraphs H(µ t ), t = t L would mistakenly cover a surface larger than that covered by the convergent counterpart ( H (µ t ), for t ≥ t P ). This scenario is less impactful for larger values of β , although in these scenarios H is much more similar to a regular graph, because of the small number of higher-order structures. Topological differences between converged hypernetworks can be seen in Fig. 4.
Finally, we observe that both t L (β) and t P (β) are increasing functions on β . This is expected since the larger the traffic rate is, the longer it takes for the sequences to converge. This particular behavior matches what is shown in Baptista and De Bacco (2021b) in the case of t L , but this is not the case for t P (β) : it was observed a non-monotonic behavior in the network case. Convergence behavior of hypernetwork properties Figure 6 shows how the various network properties change depending on the traffic rate. Mean values and standard deviations are computed across times, for a fixed value of β . As shown, the number of hyperedges, number of triangles, covered area, and average 1-degree exhibit decreasing patterns as functions of t. As a consequence, transport optimality can be thought of as reaching minimum states on the mentioned hypernetwork properties. Another clear feature of these functions is related to the actual converged values: the larger the β is, the smaller these metrics become. This is explained by a cost function increasingly encouraging consolidations of paths on fewer edges. Notice also that the gap between these converged values signals a non-linear dependence on the outputs of the dynamics; e.g., a converged hypernetwork obtained for β = 1.1. loses many more hyperedges if the traffic rate is then set to 1.2, whereas this loss would not be that large if β = 1.2 is increased to 1.3. The nature of these gaps is substantially different depending on the property itself. This also shows that certain properties better reveal the distinction between different optimal traffic regimes. The behavior of the closeness centralities is distinctly different than that of the other properties. While its initial values are the same for all values of β (similar to the previous properties), no clear trend can be found as time increases. For s = 1 , on average β = 1.1 generates sequences that tend to recover initial values after increasing and then decreasing behavior. For the other traffic rates, we observe different patterns. Notice that s− closeness centrality on the hypergraph for s = 1 is the same as the classic closeness centrality on the skeleton of it. Thus, these rather noisy patterns are not due to the addition  Baptista and De Bacco Applied Network Science (2023) 8:3 of hyperedges. On the other hand, for s = 2 the average centrality shows increasing curves. This may be due to L s getting increasingly disconnected with small connected components. Therefore, the larger s, the closer the nodes are seen (see Fig. 3). Moreover, in this case small values of β lead to more stable closeness centrality values, showing the impact of β in building higher-order structures. While different values of β lead to different behaviors of the hypergraph properties (e.g. decreasing degrees and amount of hyperedges for increasing β ) we remark that choosing the value of β should depend on the application at hand. The analysis performed here showcases how this choice may impact the resulting topologies. This can help practitioners to visualize possible consequences in terms of downstream analysis on the transportation properties of the underlying infrastructure.

P. polycephalum hypernetworks
We now analyze hypernetworks extracted from images of real data. We are interested in the evolution of the area covered by triangles in the sequences {H (µ t )} T t=0 extracted from real images of the slime mold P. polycephalum. The behavior of this organism is the inspiration of the modeling ideas of the DMK equations described in Methods. It has been shown that these slime molds follow a similar optimization strategy as that captured by the DMK dynamics while foraging for food in 2D surfaces (Nakagaki et al. 2000;Tero et al. 2007Tero et al. , 2010. We extract hypernetworks from images using the idea described in Methods but instead of applying (Baptista et al. 2020) to obtain the networks, we use the method proposed by Baptista and De Bacco (2021a) which takes images as input. This pipeline uses the color intensities of the different image pixels to build a graph, by connecting adjacent meaningful nodes. We dedicate our attention to 4 image sequences from the Slime Mold Graph Repository (Dirnberger et al. 2017). The sequences are then describing the evolution of a P. polycephalum placed in a rectangular Petri dish. Each image, and thus each hypernetwork, is a snapshot of the movement of this organism over periods of 120 seconds.
We study the covered area for every one of the 4 sequences, and plot the results for one of them (namely, image set motion12; see "Appendix") in Fig. 7. We highlight 4 times along the property sequence to display the used images together with the corresponding hypernetworks. The lower leftmost plot shows a subsection of one of the studied snapshots. As can be seen there, this subhypernetwork topology exhibits a significant number of hyperedges of dimension 3, mainly around the thickest parts of the slime mold. On the other side, in the lower rightmost plot, the evolution of S is overall decreasing in time (similar results are obtained for other sequences, as shown in the "Appendix"). This suggests that the thicker body parts tend to get thinner as the P. polycephalum evolves into a consolidated state. This pattern resembles what is shown above for the synthetic data, i.e. the covered area tends to decrease as time evolves similar to the behavior of the DMK-based hypernetwork sequence. This suggests that the DMK model realistically mirrors a consolidation phase towards optimality of real slime molds (Dirnberger et al. 2017).

Additional hypernetwork properties
In this section we extend the comparison between the cost function-minimized by the dynamics-and hypernetwork properties (see Fig. 9). As mentioned in the main manuscript, similar monotonic behaviors can be observed in these cases.

Data information
We explain in this section further details about the analyzed real data. The images are taken from the Slime Mold Graph Repository (Dirnberger et al. 2017) as mentioned in the main manuscript. We study 4 {H i } T i sequences of different lengths. The length (T) varies depending on the number of images included in the sequence. This is because different experiments need more o fewer shots. These experiments, as mentioned in the repository's documentation, consist of placing a slime mold inside a Petri dish with a thin sheet of agar where no food is provided. Slime mold's exploration of the dish, as explained by the creators, is unbiased, due to the lack of food. Given that this organism is initially placed along one of the short edges of the rectangular dish, the experiment is considered to be finished once the plasmodium reaches the other short side. No more pictures are taken after this happens (Fig. 10).

Hypergraph extraction
We used the image sets motion12, motion24, motion40 and motion79, located in the repository, to build the studied hypernetworks. These sets contain a number of images ranging from 60 to 150. Hypernetworks are then extracted using the Img2net algorithm described in Baptista and De Bacco (2021a) as mentioned in the main manuscript, using the same configuration described in Baptista and De Bacco (2021b).