The Dynamical Monge-Kantorovich method
The Dynamical Monge-Kantorovich set of equations
We start by reviewing the basic elements of the mechanism chosen to solve the OT problems. As opposed to other standard optimization methods used for this task (Cuturi 2013), we use an approach that turns the problem into a dynamical set of partial differential equations, so that an initial condition is updated until a convergent state is reached. The dynamical system of equations, as proposed by Facca et al. (2018, 2020, 2021), is presented as follows. We assume that the OT problem is set on a continuous 2-dimensional space \(\Omega \subset {\mathbb {R}}^{2}\) and that, at the beginning, no underlying network structure is observed. This gives us the freedom of exploring the whole space to design an optimal network topology as the solution of the transportation problem. The main quantities that need to be specified as input are the source and target distributions. We refer to them as sources and sinks, where a certain mass (e.g. passengers in a transportation network, water in a water distribution network) is injected and then extracted. We denote these with a “forcing” function \(f(x)=f^+(x)-f^-(x)\in {\mathbb {R}}\), describing the flow-generating sources \(f^+(x)\) and sinks \(f^-(x)\). To ensure mass balance, we impose \(\int _\Omega f(x)\,dx = 0\). We assume that the flow is governed by a transient Fick–Poiseuille flux \(q=- \mu {\nabla }u\), where \(\mu\), \(u\) and \(q\) are called conductivity (or transport density), transport potential and flux, respectively. Intuitively, mass is injected at the sources, moved across space according to the conductivity, and extracted at the sinks. The way mass moves determines a flux that depends on the pressure exerted at different points in space; this pressure is described by the potential function \(u\).
The set of Dynamical Monge–Kantorovich (DMK) equations is given by:
$$\begin{aligned} -\nabla \cdot (\mu (t,x)\nabla u(t,x))&= f^+(x)-f^-(x), \end{aligned}$$
(1)
$$\begin{aligned} \frac{\partial \mu (t,x)}{\partial t}&= \left[ \mu (t,x)|\nabla u(t,x)|\right] ^{\beta } - \mu (t,x), \end{aligned}$$
(2)
$$\begin{aligned} \mu (0,x)&= \mu _0(x) > 0 , \end{aligned}$$
(3)
where \(\nabla =\nabla _{x}\). Equation (1) states the spatial balance of the Fick–Poiseuille flux and is complemented by no-flow Neumann boundary conditions. Equation (2) describes the dynamics of the conductivity and is controlled by the so-called traffic rate \(\beta\), which determines the transportation scheme and shapes the topology of the solution: for \(\beta <1\) we have congested transportation, where traffic is minimized; \(\beta >1\) induces branched transportation, where traffic is consolidated into a smaller amount of space; and \(\beta =1\) recovers shortest path-like structures. Finally, Eq. (3) constitutes the initialization of the system and can be thought of as an initial guess of the solution.
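To make the time discretization concrete, the following is a minimal sketch of a forward-Euler update of Eq. (2), assuming a grid or mesh discretization of \(\Omega\) and a user-supplied routine solve_potential that solves the elliptic Eq. (1) for fixed \(\mu\). All function names, parameters and defaults here are illustrative and do not reproduce the original implementation of Facca et al.

```python
import numpy as np

def dmk_step(mu, f, solve_potential, beta=1.5, dt=0.1):
    """One forward-Euler update of Eq. (2).

    mu              : (n,) array, conductivity on the grid/mesh cells
    f               : discretized forcing term f^+ - f^-
    solve_potential : placeholder for any FEM/FD solver of Eq. (1); it returns
                      the potential u and the cell-wise norm |grad u|
    beta            : traffic rate controlling the transportation regime
    """
    u, grad_u_norm = solve_potential(mu, f)        # solve Eq. (1) for the current mu
    rhs = (mu * grad_u_norm) ** beta - mu          # right-hand side of Eq. (2)
    mu_new = np.maximum(mu + dt * rhs, 1e-12)      # keep mu strictly positive, as in Eq. (3)
    return mu_new, u

def dmk_solve(mu0, f, solve_potential, beta=1.5, dt=0.1, tol=1e-12, max_iter=10_000):
    """Iterate from the initial guess mu0 (Eq. (3)) until mu stops changing."""
    mu, u = np.asarray(mu0, dtype=float), None
    for _ in range(max_iter):
        mu_new, u = dmk_step(mu, f, solve_potential, beta, dt)
        if np.abs(mu_new - mu).max() < tol:
            return mu_new, u
        mu = mu_new
    return mu, u
```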
Solutions \((\mu ^*, u^*)\) of Eqs. (1)–(3) minimize the transportation cost function \({\mathcal {L}}(\mu ,u)\) (Facca et al. 2018, 2020, 2021), defined as:
$$\begin{aligned}&{\mathcal {L}}(\mu ,u) := {\mathcal {E}}(\mu ,u)+ {\mathcal {M}}(\mu ,u) \end{aligned}$$
(4)
$$\begin{aligned}&{\mathcal {E}}(\mu ,u) := \dfrac{1}{2}\int _{\Omega } \mu |{\nabla }u|^2 dx, \ \ {\mathcal {M}}(\mu ,u) := \dfrac{1}{2}\int _{\Omega } \dfrac{\mu ^{\frac{(2-\beta )}{\beta }}}{2-\beta } dx. \end{aligned}$$
(5)
\({\mathcal {L}}\) can be thought of as a combination of \({\mathcal {E}}\), the total energy dissipated during transport (or network operating cost), and \({\mathcal {M}}\), the cost to build the network infrastructure (or infrastructural cost). It is known that the convexity of this functional changes as a function of \(\beta\). Non-convex cases arise in the branched schemes, inducing fractal-like structures (Facca et al. 2021; Santambrogio 2007). This is the case we consider in this work, and it is the only one where meaningful network structures, and thus hypergraphs, can be extracted (Baptista et al. 2020).
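For reference, a discrete counterpart of Eqs. (4)–(5) can be evaluated by simple quadrature once \(\mu\) and \(|\nabla u|\) are available cell-wise. The sketch below assumes a triangulation of \(\Omega\) with known cell areas; variable names are illustrative.

```python
import numpy as np

def transport_cost(mu, grad_u_norm, area, beta=1.5):
    """Cell-wise quadrature of the functional in Eqs. (4)-(5).

    mu, grad_u_norm : (n_cells,) arrays with the conductivity and |grad u| per cell
    area            : (n_cells,) array of cell areas, used as quadrature weights
    """
    energy = 0.5 * np.sum(mu * grad_u_norm ** 2 * area)                     # E(mu, u), operating cost
    mass = 0.5 * np.sum(mu ** ((2.0 - beta) / beta) / (2.0 - beta) * area)  # M(mu, u), infrastructural cost
    return energy + mass, energy, mass
```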
Hypergraph sequences
Hypergraph construction
We define a hypergraph (also, hypernetwork) as follows (Battiston et al. 2020): a hypergraph is a tuple \(H = (V, E),\) where \(V = \{v_1, ... ,v_n\}\) is the set of vertices and \(E = \{ e_1, e_2, ... , e_m\}\) is the set of hyperedges, with \(e_i\subset V, \forall i = 1,...,m,\) and \(|e_i|>1\). If \(|e_i|=2, \forall i\), then H is simply a graph. We call edges those hyperedges \(e_i\) with \(|e_i|=2\) and triangles those with \(|e_i|=3\). We refer to the 1-skeleton of H as the clique expansion of H: the graph \(G=(V,E_{G})\) whose vertex set is the vertex set V of H and whose edge set \(E_{G}\) contains a pairwise edge for every pair of nodes that appear together in some hyperedge of E.
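As an illustration, the clique expansion can be built directly from a list of hyperedges. The following sketch uses networkx and assumes that hyperedges are given as sets of node labels; names are illustrative.

```python
import itertools
import networkx as nx

def one_skeleton(hyperedges):
    """Clique expansion (1-skeleton): connect every pair of nodes sharing a hyperedge."""
    G = nx.Graph()
    for e in hyperedges:
        G.add_nodes_from(e)
        G.add_edges_from(itertools.combinations(e, 2))
    return G

# Example: the hyperedge {1, 2, 3} contributes the edges (1, 2), (1, 3), (2, 3).
G = one_skeleton([{1, 2, 3}, {3, 4}])
```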
Let \(\mu\) be the conductivity found as a solution of Eqs. (1)–(3). As previously mentioned, \(\mu\) at convergence regulates where the mass should travel for optimal transportation. Similar to Baptista and De Bacco (2021b), we turn this 2-dimensional function into a different data structure, namely, a hypergraph. This is done as follows: consider \(G(\mu ) = (V_G,E_G)\), the network extracted using the method proposed in Baptista et al. (2020). We define \(H(\mu )\) as the tuple \((V_H,E_H)\) where \(V_H = V_G\) and \(E_H = E_G \cup T_G\), with \(T_G = \{(u,v,w): (u,v),(v,w),(w,u) \in E_G\}.\) In words, \(H(\mu )\) is the graph \(G(\mu )\) together with all of its triangles. This choice is motivated by the fact that the graph-extraction method proposed in Baptista et al. (2020) uses triangles to discretize the continuous space \(\Omega\), which can have a relevant impact on the extracted graph or hypergraph structures. Hence, triangles are the natural sub-structure for hypergraph construction. The method proposed in this work remains valid for higher-order structures beyond triangles. Exploring how these additional structures impact the properties of the resulting hypergraphs is left for future work.
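A minimal sketch of this construction is given below: starting from a networkx graph G, which stands in for the network extracted with the method of Baptista et al. (2020) (not reproduced here), it returns the hyperedge set \(E_H = E_G \cup T_G\). The brute-force triangle enumeration is only meant to illustrate the definition.

```python
import itertools
import networkx as nx

def hypergraph_from_graph(G):
    """Return (V_H, E_H) with E_H containing all edges of G plus all of its triangles."""
    edges = {frozenset(e) for e in G.edges()}
    # Brute-force enumeration of T_G; adequate for moderately sized extracted networks.
    triangles = {
        frozenset((u, v, w))
        for u, v, w in itertools.combinations(G.nodes(), 3)
        if G.has_edge(u, v) and G.has_edge(v, w) and G.has_edge(w, u)
    }
    return set(G.nodes()), edges | triangles
```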
Figure 1 shows an example of one of the studied hypergraphs. The red shapes represent the different triangles of \(H(\mu )\). Notice that, although for the sake of simplicity we consider here the case where \(|e|\le 3\) for each hyperedge e, higher-order structures are also well represented by the union of these elements, as shown in the right panels of the figure.
Since this hypergraph construction method is valid for any 2-dimensional transport density, we can extract a hypergraph not only from the convergent \(\mu\) but also at any time step before convergence. This then allows us to represent optimal transport sequences as hypergraphs evolving in time, i.e. temporal hypernetworks.
Hypergraph sequences
Formally, let \(\mu (t,x)\) be a transport density (or conductivity) function of both time and space obtained as a solution of the DMK model. We denote it as the sequence \(\{\mu _t\}_{t=0}^T\), for some index T (usually taken to be that of the convergent state). Each \(\mu _{t}\) is the t-th update of our initial guess \(\mu _0\), computed by following the rules described in Eqs. (1)–(3). This determines a sequence of hypernetworks \(\{ H(\mu _t)\}_{t=0}^T\) extracted from \(\{\mu _t\}_{t=0}^T\) with the extraction method proposed in Baptista et al. (2020). Figure 2 shows three hypergraphs built from one of the studied sequences \(\{\mu _t\}\) using this method at different time steps. The corresponding OT problem is that defined by the (filled and empty) circles: mass is injected in the bottom left circle and must be extracted at the highlighted destinations. On the top row, different updates (namely, \(t=12, 18, 26\)) of the solution are shown. They are defined on a discretization of \([0,1]^2.\) The darkest colors represent their support. Hypergraphs extracted from these functions are displayed in the bottom row. As can be seen, only edges (in gray) and triangles (in red) are considered as part of \(H(\mu _t)\). Notice that the larger t is, the sparser the hypergraphs become, which is expected for a uniform initial distribution \(\mu _0\) and branched OT (\(\beta >1\)) (Facca et al. 2021).
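Assembling the sequence then amounts to repeating the construction at every saved time step. In the sketch below, extract_graph is a placeholder for the graph-extraction routine of Baptista et al. (2020), and hypergraph_from_graph is the helper sketched earlier.

```python
def hypergraph_sequence(mu_snapshots, extract_graph):
    """Build {H(mu_t)}_{t=0..T} from the saved conductivity snapshots {mu_t}.

    mu_snapshots  : list of conductivity fields [mu_0, ..., mu_T]
    extract_graph : placeholder mapping mu_t to the extracted networkx graph G(mu_t)
    """
    return [hypergraph_from_graph(extract_graph(mu_t)) for mu_t in mu_snapshots]
```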
Graph and hypergraph properties
We compare hypergraph sequences to their corresponding network counterparts (defined as described in the previous paragraph). We analyze the following main network and hypergraph properties for the different elements within a sequence and across different sequences. Denote with \(G = (V_G,E_G)\) and \(H = (V_H, E_H)\) one of the studied graphs and hypergraphs belonging to some sequence \(\{ G(\mu _t)\}_{t=0}^T\) and \(\{ H(\mu _t)\}_{t=0}^T\), respectively. We consider the following network properties (a computational sketch follows the list):
1. \(|E_G|\), total number of edges;
2. Average degree d(G), the mean number of neighbors per node;
3. Average closeness centrality c(G): for \(v\in V_G\), the closeness centrality of \(v\) is defined as \(\sum _{u\in V_G} 1/d(u,v),\) where \(d(u,v)\) is the shortest path distance between \(u\) and \(v\).
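These three quantities can be computed, for instance, with networkx; the closeness centrality below follows the reciprocal-distance definition given above, with unreachable pairs contributing zero. Names are illustrative.

```python
import networkx as nx

def average_closeness(G):
    """Average over v of sum_u 1/d(u, v), with d the shortest path distance."""
    total = 0.0
    for v in G.nodes():
        dist = nx.shortest_path_length(G, source=v)   # dict {node: distance from v}
        total += sum(1.0 / d for u, d in dist.items() if u != v)
    return total / G.number_of_nodes()

def graph_properties(G):
    """|E_G|, average degree d(G), and average closeness centrality c(G)."""
    avg_degree = sum(d for _, d in G.degree()) / G.number_of_nodes()
    return G.number_of_edges(), avg_degree, average_closeness(G)
```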
Hypernetwork properties can be easily adapted from the previous definitions with the help of generalized adjacency matrices and line graphs (Aksoy et al. 2020). Let H be a hypergraph with vertex set \(V = \{1,..,n\}\) and edge set \(E = \{e_1, ... ,e_m\}\). We define the generalized node s-adjacency matrix \(A_s\) of H as the binary matrix of size \(n\times n\) such that \(A_s[i][j]=1\) if i and j share at least s hyperedges, and \(A_s[i][j]=0\) otherwise. We define the s-line graph \(L_s\) as the graph generated by the adjacency matrix \(A_s\). Notice that \(A_1\) corresponds to the adjacency matrix of H's 1-skeleton (which is \(L_1\)). Figure 3 shows a family of adjacency matrices together with the line graphs generated using them. We can then define hypergraph properties in the following way (a computational sketch is given after the list):
1. \(|E_H|\), total number of hyperedges;
2. \(|T| = |\{e \in E_H: |e|= 3\}|,\) total number of triangles;
3. \(S = \sum _{t\in T} a(t),\) covered area, where a(t) is the area of the triangle t;
4. Average degree \(d_s(H)\), the mean number of incident hyperedges of size greater than or equal to s per node;
5. Average closeness centrality \(c_s(H)\): for \(v\in V_H\), the closeness centrality of \(v\) is defined as its closeness centrality in \(L_s\).
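The sketch below illustrates these definitions on a hyperedge list: it builds the s-line graph \(L_s\) (equivalently, the graph generated by \(A_s\)), computes the average s-degree \(d_s(H)\), and evaluates the covered area S from node coordinates; \(c_s(H)\) is then the average closeness of \(L_s\), e.g. via the average_closeness routine sketched earlier. All names are illustrative.

```python
import itertools
import networkx as nx

def s_line_graph(hyperedges, s=1):
    """Graph L_s generated by A_s: nodes i, j are adjacent iff they share >= s hyperedges."""
    hyperedges = [frozenset(e) for e in hyperedges]
    nodes = set().union(*hyperedges)
    L = nx.Graph()
    L.add_nodes_from(nodes)
    for i, j in itertools.combinations(nodes, 2):
        if sum(1 for e in hyperedges if i in e and j in e) >= s:
            L.add_edge(i, j)
    return L

def average_s_degree(hyperedges, s=1):
    """d_s(H): mean number of incident hyperedges of size >= s per node."""
    hyperedges = [frozenset(e) for e in hyperedges]
    nodes = set().union(*hyperedges)
    return sum(
        sum(1 for e in hyperedges if v in e and len(e) >= s) for v in nodes
    ) / len(nodes)

def covered_area(hyperedges, pos):
    """S: total area of the triangle hyperedges, given node coordinates pos[v] = (x, y)."""
    total = 0.0
    for e in hyperedges:
        if len(e) == 3:
            (x1, y1), (x2, y2), (x3, y3) = (pos[v] for v in e)
            total += 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    return total

# c_s(H) can be obtained as average_closeness(s_line_graph(hyperedges, s)).
```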
S can be defined in terms of any other property of a hyperedge, e.g. a function of its size |e|. Here we consider the area covered by a hyperedge to keep a geometrical perspective. On the other hand, this area S can be easily generalized to hyperedges with \(|e_{i}|>3\) by suitably changing the set T in the summation, e.g. by considering structures containing four nodes. As for the centrality measures, we focus on comparing the case \(s>1\) against \(s=1\), as the latter traces back to standard graph properties, while we are interested in investigating which properties are inherent to hypergraphs. Figure 4 shows the values of \(d_s(H)\) and \(c_s(H)\) for convergent hypergraphs H (obtained from different values of \(\beta\)) together with the degree and closeness centrality of their corresponding graph versions. The considered hypergraphs are displayed in the top row of the figure. As can be seen, patterns differ considerably for different values of \(\beta\). Since s controls the minimum number of shared connections between different nodes in the networks, the higher this number, the more restrictive this condition becomes, thus leading to more disconnected line graphs. In the case of the s-degree centrality, we observe decreasing values for increasing s, with the most central nodes having much higher values than less central ones. For both \(s=2\) and \(s=3\) we observe higher values than for the nodes in G. This follows from the fact that once hyperedges are added to G, the number of incidences per node can only increase. Centrality distributions strongly depend on \(\beta\). For small values, i.e. more distributed traffic (\(\beta =1.1\)), the number of hyperedges per node remains larger than the number of regular edges connected to it. But if traffic is consolidated on less space (\(\beta =1.9\)), then very few hyperedges are found. This suggests that the information learned from hypergraphs that is distinct from that contained in the graph skeleton is influenced by the chosen traffic regime.
As for the closeness centrality distribution, it resembles that of G for small values of \(\beta\), regardless of s. For higher \(\beta\) it switches towards an almost binary signal. Thus, nodes tend to become more central as \(\beta\) increases, suggesting that adding hyperedges to the networks G leads to shorter distances between nodes. The loss of information seen for the highest values of s is due to the fact that the line graphs \(L_s\) become disconnected, with many small connected components. In these cases, the closeness centrality of a node is either 0 if it is isolated, or proportional to the diameter of the small connected component in which it lies.
Convergence criteria
Numerical convergence of the DMK Eqs. (1)–(3) is usually defined by fixing a threshold \(\tau\). The updates are considered sufficient once the associated cost changes by no more than \(\tau\) with respect to the previous time step. As is usually the case when this threshold is very small (\(\tau =10^{-12}\) in our experiments), the cost or the network structure may consolidate to a constant value earlier than algorithmic convergence. Similar to Baptista and De Bacco (2021b), to meaningfully establish when hypergraph optimality is reached, we consider as convergence time the first time step at which the transport cost, or a given network property, reaches a value smaller than or equal to a certain fraction p of the value attained by the same quantity at algorithmic convergence (in the experiments here we use \(p=1.05\)). We refer to \(t_{\mathcal {L}}\) and \(t_P\) for the convergence times in terms of the cost function or a network property, respectively.
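Under these conventions, the convergence time of any monitored quantity reduces to a simple scan of its trajectory. The sketch below assumes the quantity has been recorded at every time step and that its last entry corresponds to algorithmic convergence; names are illustrative.

```python
def convergence_time(values, p=1.05):
    """First time step at which the monitored quantity (transport cost or a
    network property) drops to at most p times its value at algorithmic
    convergence, taken here to be the last recorded entry."""
    target = p * values[-1]
    for t, v in enumerate(values):
        if v <= target:
            return t
    return len(values) - 1
```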