Intersection of random spanning trees in complex networks

In their previous work, the authors considered the concept of random spanning tree intersection of complex networks (London and Pluhár, in: Cherifi, Mantegna, Rocha, Cherifi, Micciche (eds) Complex networks and their applications XI, Springer, Cham, 2023). A simple formula was derived for the size of the minimum expected intersection of two spanning trees chosen uniformly at random. Monte Carlo experiments were run for real networks. In this paper, we provide a broader context and motivations for the concept, discussing its game theoretic origins, examples, its applications to network optimization problems, and its potential use in quantifying the resilience and modular structure of complex networks.


Introduction
Given an undirected connected graph G, a spanning tree T of G is a subgraph that is a tree containing all the vertices of G. In the case of weighted graphs, the concept of minimum spanning tree, that is, a spanning tree with the smallest possible total edge weight, is of extraordinary importance. Spanning trees play a key role in many applications, such as network design, including computer networks, telecommunication networks, transportation networks, water supply networks, and electrical grids, see for example (Wu and Chao 2004); clustering, see for example the single linkage method and applications in finance (Mantegna 1999; Tola et al. 2008); or image registration and segmentation (Ma et al. 2000; Xu and Uberbacher 1997), to mention just a few.
Spanning trees in complex networks have been studied from various perspectives. They have been considered as skeletons of the network (Kim et al. 2004), used for dimension reduction (Mantegna 1999), or used for efficient visualization of evolving networks (Chen and Morris 2003).
In this work, we provide a different perspective on using spanning trees to get a deeper understanding of the structure of small-world networks. We start with a closely related game-theoretic concept.

An edge-tree game
Alon et al. (1995) studied the following two-player zero-sum game on a connected graph G. In a round, the tree player chooses a spanning tree T of G, while the edge player chooses an edge e of G. The payoff to the edge player is defined by the function cost(T, e) as follows. If e is an edge of T, then cost(T, e) = 0, while if e is not in T, then cost(T, e) is the length of the unique cycle (also called the fundamental cycle) formed by adding e to T. They derived bounds for cost(T, e) for various classes of graphs and pointed out that the game arises in the context of the k-server problem, an online optimization problem on road networks.
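The payoff function of the edge-tree game can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `cost`, the representation of a tree as a list of node pairs, and the use of breadth-first search to find the tree path are ours, not taken from Alon et al. (1995).

```python
from collections import deque


def cost(tree_edges, e):
    """Payoff to the edge player: 0 if e is in T, otherwise the length
    of the fundamental cycle that e closes in T."""
    tree = {frozenset(f) for f in tree_edges}
    if frozenset(e) in tree:
        return 0
    # Adjacency lists of the tree T.
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Breadth-first search along tree edges from one endpoint of e to the other.
    src, dst = e
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            break
        for y in adj.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    # The cycle consists of the tree path plus the edge e itself.
    return dist[dst] + 1


path_tree = [(0, 1), (1, 2), (2, 3), (3, 4)]  # a spanning tree of the 5-cycle
print(cost(path_tree, (4, 0)))  # 5
print(cost(path_tree, (1, 2)))  # 0
```

For the 5-cycle with one edge removed, the missing edge closes the whole cycle, so the edge player receives 5, while any tree edge yields 0.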

Remarks
Motivated by the work of Alon et al., Bartal (1996) introduced the idea of the hierarchically well-separated tree, or HST, which proved to be extremely useful in studying randomized algorithms on finite metric spaces. Essentially, a metric space M can be approximated with polynomially many randomly chosen weighted trees with a special property. (The property is that, starting from the root, the edge weights decrease exponentially.) Note that later Fakcharoenphol et al. gave the optimal form of this approximation (Fakcharoenphol et al. 2003).

Tree intersection game
Inspired by the ideas above, we introduce a similar game, replacing the edge player with another tree player. In this scenario, we are given a connected graph G, and two players, Big and Small, each choose a spanning tree of G, $T_B$ and $T_S$ respectively, without knowing the other's choice. The goal of Big is to maximize the number of common edges of $T_B$ and $T_S$, i.e. the intersection of the two trees, while the goal of Small is the opposite. Finding optimal strategies and the value of this game seems far from obvious for general graphs. Optimal strategies can be found among mixed strategies, meaning that players choose each spanning tree with a probability given by a probability distribution over all spanning trees of the graph.
For some classes of graphs, the optimal strategy for both players is to choose a uniform random spanning tree. To compute the payoff (i.e. the intersection of the chosen spanning trees) in this case, we come to the problem of the intersection of random spanning trees. This is the number of common edges of two spanning trees chosen uniformly at random. Although this choice is not an optimal strategy (for either player) in general, the parameter it provides (the size of the intersection) captures a lot about the structure of a graph. For real small-world networks, the size of the mean intersection provides powerful insights into their structure. It seems that mainly the modular structure of the network and the number of weak links (i.e. links between communities; Granovetter 1973) determine the value of the parameter, since links between communities tend to be part of the spanning trees of the network. Roughly speaking, one expects large spanning tree intersections for networks with high Newman modularity, since the edges between communities, the weak links, are often overrepresented in the spanning trees. On the other hand, interestingly, our results suggest that network heterogeneity, i.e. a heterogeneous degree distribution, does not by itself indicate a spanning tree overlap higher than the expected minimum.

Remarks
A somewhat similar game was introduced and analyzed by Gueye et al. (2010), often referred to as the secure broadcast game, see e.g. Kottegoda (2020). In this game, a broadcaster B located in a network node wants to broadcast a message to all other nodes. This is done by choosing a spanning tree. The other player, called eavesdropper E, can observe the transmission along a single link. B wins the game if the spanning tree avoids E's edge, while E wins if the tree includes it. The game has been used for an interpretation of the p-modulus theory of graphs, see (Kottegoda 2020; Albin et al. 2021).
The rest of the paper is organized as follows. First, we briefly introduce spanning tree games on graphs and show simple examples. Then we discuss the main concepts and definitions we are dealing with. We derive a lower bound on the expected intersection of two random spanning trees. This is done by considering a more general problem using hypergraphs. Finally, we present experimental results on both synthetic network models and real-world networks, as well as some possible directions for future work.
Throughout this paper, G = (V, E) will be a finite, connected, undirected and unweighted graph with |V| = n and |E| = m.

Spanning tree games on graphs
The games mentioned in the Introduction (Alon et al. 1995; Kottegoda 2020) and the tree overlap game are so-called matrix games, which are in a sense perfectly described. The available (pure) strategies of Big can be listed as $R_1, \dots, R_k$, while those of Small as $O_1, \dots, O_\ell$. The game is then described by the matrix A whose element $a_{ij}$, in the ith row and jth column, is the payoff of Big playing strategy $R_i$, assuming Small plays strategy $O_j$.
The Minimax theorem, see e.g. Chvátal's book (Chvatal 1983), states that there are probability distributions x and y on the rows and columns, respectively, which are optimal strategies for the players. These strategies can be computed efficiently by linear programming. However, in the case of combinatorial games the number of strategies is usually too large to consider this approach.
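For very small graphs the payoff matrix can be written out explicitly. The following sketch builds it by brute force for the tree intersection game on a triangle and checks that the uniform mixed strategy certifies the value of the game. It does not solve a linear program; it only verifies a candidate pair of strategies, and all function names are our own.

```python
from fractions import Fraction
from itertools import combinations


def spanning_trees(n, edges):
    """Enumerate the spanning trees of a graph on nodes 0..n-1 by brute force."""
    trees = []
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:  # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:  # n - 1 acyclic edges on n nodes form a spanning tree
            trees.append(frozenset(subset))
    return trees


def payoff_matrix(trees):
    """a_ij = number of common edges of Big's tree i and Small's tree j."""
    return [[len(r & c) for c in trees] for r in trees]


def guarantees(A, x, y):
    """Payoff that Big secures with mix x, and the cap Small enforces with mix y."""
    k, l = len(A), len(A[0])
    lower = min(sum(x[i] * A[i][j] for i in range(k)) for j in range(l))
    upper = max(sum(A[i][j] * y[j] for j in range(l)) for i in range(k))
    return lower, upper


triangle = [(0, 1), (1, 2), (0, 2)]
trees = spanning_trees(3, triangle)
A = payoff_matrix(trees)
uniform = [Fraction(1, len(trees))] * len(trees)
lower, upper = guarantees(A, uniform, uniform)
print(lower, upper)  # 4/3 4/3
```

Since the lower and upper guarantees coincide, the uniform strategy is optimal for both players on the triangle and the value of the game is 4/3.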

The case when G is the union of two disjoint spanning trees
This construction plays an important role in the solution of the Shannon switching game by Lehman (1964). On a given simple graph G, two players, Maker and Breaker, alternately claim the edges of G. Maker wins by claiming a connected spanning subgraph, otherwise Breaker wins.
Theorem 1 (Lehman 1964) For a graph G, Maker as a second player wins the Shannon switching game on G if and only if G contains two edge-disjoint spanning trees.

Observation 1 If a graph G on n vertices is the union of two disjoint spanning trees, then the number of different spanning trees in G is at least $2^{n-1}$.
Proof For a simple argument we recall the celebrated Erdős-Selfridge theorem (Erdös and Selfridge 1973). A hypergraph F = (V, H) consists of a ground set V (nodes) and a family $H \subseteq 2^V$ (hyperedges). A Maker-Breaker game on F is as before: the players alternately take the elements of V, and Maker wins by taking all elements of some A ∈ H.
Theorem 2 (Erdös and Selfridge 1973) Breaker, as the first player, can win a Maker-Breaker game on F if $\Phi(F) := \sum_{A \in H} 2^{-|A|} < 1$.
Now let V = E(G), the edge set of G, and let H be the set of spanning trees of G. Every spanning tree has n − 1 edges, that is, $\Phi(F) = 2^{-(n-1)} |H|$. Since Maker wins as the second player by Theorem 1, by Theorem 2 we have $\Phi(F) \ge 1$, which gives $2^{n-1} \le |H|$.

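Observation 2 If a graph G on n vertices is the union of two disjoint spanning trees, then the value of the tree intersection game is (n − 1)/2.
Proof Let G be the union of $T_1$ and $T_2$. Big takes each of $T_1$ and $T_2$ with probability 1/2. Every spanning tree chosen by Small has n − 1 edges, each lying in exactly one of $T_1$ and $T_2$, so the expected payoff is (n − 1)/2. On the other hand, Small can follow the same strategy with expected payoff (n − 1)/2.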

Observation 3 The value of the tree intersection game on the complete graph $K_n$ is 2(n − 1)/n.
Proof If n is even, we can divide the edge set of $K_n$ into n/2 disjoint trees $T_1, \dots, T_{n/2}$. Taking each of those with probability 2/n is an optimal strategy for both players, with value 2(n − 1)/n. If n is odd, we divide the edge set into (n − 1)/2 Hamiltonian circuits $C_1, \dots, C_{(n-1)/2}$, and within each $C_i$ we consider the Hamiltonian paths $P^i_1, \dots, P^i_n$, where $P^i_j \subset C_i$ is obtained by omitting the jth edge of $C_i$. Taking the uniform distribution on $\{P^i_j\}_{i,j}$, one observes that this is optimal for both players and gives expected intersection 2(n − 1)/n.
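For even n, one concrete way to obtain the n/2 disjoint trees is the classical zigzag decomposition of $K_n$ into Hamiltonian paths. The sketch below is an illustration under that assumption; the proof above does not prescribe a particular decomposition, and the function names are ours. It also evaluates the expected overlap of the resulting mixed strategy with a fixed spanning tree.

```python
from fractions import Fraction


def zigzag_paths(n):
    """Split the edges of K_n (n even) into n/2 Hamiltonian paths."""
    assert n % 2 == 0 and n >= 2
    paths = []
    for i in range(n // 2):
        # Visit i, i+1, i-1, i+2, i-2, ... (mod n).
        order = [i]
        for step in range(1, n // 2 + 1):
            order.append((i + step) % n)
            if len(order) < n:
                order.append((i - step) % n)
        paths.append([frozenset(p) for p in zip(order, order[1:])])
    return paths


def expected_overlap(paths, tree):
    """Expected number of common edges of `tree` and a uniformly chosen path."""
    tree = {frozenset(e) for e in tree}
    return Fraction(sum(len(tree & set(p)) for p in paths), len(paths))


paths = zigzag_paths(6)
star = [(0, j) for j in range(1, 6)]  # an arbitrary spanning tree of K_6
print(expected_overlap(paths, star))  # 5/3
```

Because every edge of $K_n$ lies in exactly one path, each edge is covered with probability 2/n, so the overlap with any spanning tree is 2(n − 1)/n; for n = 6 this gives 5/3.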

Observation 4 If graph G has an edge-transitive automorphism group, then the value of the tree overlap game is $(n-1)^2/m$.
Proof Take the uniform distribution over the images of an arbitrary fixed tree T under the automorphism group. We claim that this is optimal for both players. Every edge e ∈ E(G) is covered with probability (n − 1)/m, which gives the expected intersection $(n-1)^2/m$ of the random tree with any fixed tree T′.

Remarks
One can argue that Observation 3 is a simple consequence of Observation 4. It is still worth separating them, since the proof of Observation 3 uses only a linear number of trees, while the proof of the other may need a superexponential number of trees. It is not clear if this number can be reduced in general.

Random spanning trees
Let $\mathcal{T}_G$ be the set of all spanning trees of a graph G. The cardinality of $\mathcal{T}_G$, i.e. the number of spanning trees of G, is explicitly known by Kirchhoff's matrix tree theorem and can be calculated as the product of the positive eigenvalues of the graph Laplacian matrix L, divided by n. Moreover, it can be calculated in polynomial ($\sim n^3$) time, by removing the ith row and ith column (for any i) of L and calculating the determinant of the remaining (n − 1) × (n − 1) matrix. This allows us to precisely define uniform random spanning trees, i.e. the uniform probability distribution over all spanning trees. Note also that for an edge e ∈ E(G) we can compute the probability p(e) that the edge e is in a uniform random spanning tree of G in polynomial time. This will be important in the use of Observation 5.
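The determinant computation described above can be sketched as follows. The code is a minimal illustration using exact rational arithmetic; the function names are ours, and computing p(e) as the fraction of spanning trees that are lost when e is deleted is one simple choice among several possible ones.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Number of spanning trees via a cofactor of the graph Laplacian."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    # Delete the last row and column, then take the determinant
    # by Gaussian elimination over the rationals.
    M = [row[:-1] for row in L[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for c in range(col, size):
                M[r][c] -= factor * M[col][c]
    return int(det)


def edge_probability(n, edges, e):
    """p(e): fraction of spanning trees containing e, i.e. 1 - t(G - e) / t(G)."""
    total = count_spanning_trees(n, edges)
    rest = [f for f in edges if set(f) != set(e)]
    return Fraction(total - count_spanning_trees(n, rest), total)


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_spanning_trees(4, k4))      # 16
print(edge_probability(4, k4, (0, 1)))  # 1/2
```

For $K_4$ the count agrees with Cayley's formula $n^{n-2} = 16$, and every edge has probability (n − 1)/m = 1/2, as expected from edge transitivity.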
Another result of Kirchhoff states that the probability of e ∈ T for any edge e and a uniform random spanning tree T equals the effective resistance of that edge, when considering the graph as an electrical network with unit edge conductance. For more details and computation methods see (Ellens et al. 2011).
Although we are able to determine the probability of picking a given spanning tree uniformly at random, it is not straightforward how to actually generate one. Several algorithms have been proposed to generate random spanning trees of an undirected graph, see, for instance, the algorithms of Broder (1989) and Wilson (1996). In our experiments we utilized Wilson's algorithm, available in the Python package DPPy (Gautier et al. 2019).
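To make the sampling step concrete, here is a minimal pure-Python sketch of Wilson's loop-erased random walk algorithm. It is not the DPPy implementation used in our experiments and does not reproduce its interface; the function name and the adjacency-dictionary input format are assumptions made for this illustration.

```python
import random


def wilson_spanning_tree(adj, rng=random):
    """Sample a uniform spanning tree of a connected graph, given as
    {node: [neighbours]}, using loop-erased random walks."""
    nodes = list(adj)
    root = nodes[0]
    in_tree = {root}
    successor = {}
    for start in nodes:
        # Random walk from `start` until the current tree is hit.
        # Overwriting successor[u] on a revisit erases loops implicitly.
        u = start
        while u not in in_tree:
            successor[u] = rng.choice(adj[u])
            u = successor[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = successor[u]
    # Every non-root node keeps exactly one outgoing tree edge.
    return {frozenset((u, v)) for u, v in successor.items()}


k4_adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
tree = wilson_spanning_tree(k4_adj, random.Random(0))
print(len(tree))  # 3
```

The returned set always has n − 1 edges; passing a seeded `random.Random` instance makes the sample reproducible.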
We should note here that more general random spanning trees can be considered given any probability mass function over $\mathcal{T}_G$. Related problems and results are discussed e.g. in Albin et al. (2021). In this paper, we consider only the uniform distribution case.

Minimum expected intersection of random spanning trees
In order to provide a lower bound on the expected value of the number of common edges of two random spanning trees of G, we consider a more general problem on hypergraphs. We point out that the calculations can easily be done directly for graphs and their random spanning trees. Now, let F = (V, H) be a uniform hypergraph, i.e. every hyperedge A ∈ H contains the same number of nodes. Let $|V| = \mu$, $|H| = \kappa$ and $|A| = \nu$ for each hyperedge A ∈ H.
Let A and B be two hyperedges of F chosen independently and uniformly at random, i.e. each with probability $1/\kappa$. The expected hyperedge intersection of hypergraph F is $E(|A \cap B|)$.
The key theorem, which works for any uniform hypergraph F, is the following.
Theorem 3 For any uniform hypergraph F the expected hyperedge intersection is at least $\nu^2/\mu$.

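Proof
The degree $d_F(x)$ of a node $x \in V$ is the number of hyperedges containing x.
Then $\sum_{x \in V} d_F(x) = \kappa\nu$, while the average degree is $\bar{d}_F = \kappa\nu/\mu$. If we pick two hyperedges A and B independently and uniformly at random, then the probability that they both contain a node x with degree $d_F(x)$ is
$$\Pr(x \in A \cap B) = \left(\frac{d_F(x)}{\kappa}\right)^2,$$
and hence the expected number of nodes in the intersection is
$$E(|A \cap B|) = \sum_{x \in V} \left(\frac{d_F(x)}{\kappa}\right)^2 \ge \mu \left(\frac{\bar{d}_F}{\kappa}\right)^2 = \mu \left(\frac{\nu}{\mu}\right)^2 = \frac{\nu^2}{\mu},$$
by noting that the expectation is minimized when all nodes have the same degree.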
Observation 5 Let F = (V, H) and p(x) = Pr(x ∈ A), where A ∈ H is chosen uniformly at random. The exact expected hyperedge intersection of F is $\sum_{x \in V} p^2(x)$.
Now let us define F = (V, H) as follows. Each node x ∈ V of F corresponds to an edge e ∈ E of a graph G, and a hyperedge A ∈ H corresponds to a spanning tree T of G. That is, V(F) = E(G) and $H = \mathcal{T}_G$. Hence, F is an (n − 1)-uniform hypergraph with |V| = m, and using Theorem 3 we get the following lower bound on the expected spanning tree intersection for any graph.
Corollary Let G be a connected graph with n nodes and m edges, and let $T_1, T_2 \in \mathcal{T}_G$ be two spanning trees of G chosen independently and uniformly at random. The expected intersection of $T_1$ and $T_2$ is at least $(n-1)^2/m$. We call this value the minimum expected spanning tree intersection of G.
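On small graphs, the exact value from Observation 5 and the lower bound of the Corollary can be compared by enumerating all spanning trees. The sketch below does this by brute force; it is feasible only for tiny examples, and the function names and the two test graphs are our own choices.

```python
from fractions import Fraction
from itertools import combinations


def is_spanning_tree(n, subset):
    """A set of n - 1 edges is a spanning tree iff it connects all n nodes."""
    adj = {v: [] for v in range(n)}
    for u, v in subset:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def exact_expected_intersection(n, edges):
    """Sum of p(e)^2 over all edges, with p(e) found by enumeration."""
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    return sum(Fraction(sum(e in t for t in trees), len(trees)) ** 2 for e in edges)


def lower_bound(n, m):
    """Minimum expected intersection (n - 1)^2 / m."""
    return Fraction((n - 1) ** 2, m)


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # all edges are equally likely
paw = [(0, 1), (1, 2), (0, 2), (2, 3)]    # triangle with a pendant edge
print(exact_expected_intersection(4, cycle), lower_bound(4, 4))  # 9/4 9/4
print(exact_expected_intersection(4, paw), lower_bound(4, 4))    # 7/3 9/4
```

The 4-cycle attains the bound because every edge has the same probability 3/4. In the second graph the pendant edge lies in every spanning tree, which pushes the exact value to 7/3, above the bound 9/4.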

Remarks
In the special case where G is the union of two disjoint spanning trees, the minimum expected intersection (n − 1)/2 is easily obtained using Theorem 3, by noting that in this case $\nu = n - 1$ and $\mu = 2(n - 1)$. Observe that it is equal to the value of the tree intersection game on this graph introduced above.
For $G = K_n$, we have $\nu = n - 1$ and $\mu = \binom{n}{2}$, so according to Theorem 3 the minimum expected intersection is 2(n − 1)/n, which is again the value of the tree intersection game on this graph.

Experiments
In this section, through numerical experiments, we investigate how the experimentally calculated intersection values (using Monte Carlo simulations) differ from the previously derived minimum expected intersection. To provide a simple metric that shows how likely a network's random spanning trees are to intersect, compared to the minimum expected intersection, we use the following normalized score:
$$\mathrm{RTI} = \frac{\text{observed mean} - \text{min. expected}}{n - \text{min. expected}}.$$
RTI takes values between 0 and 1. RTI = 0 means that the empirical mean intersection is equal to the minimum expected intersection, while RTI = 1 if and only if the network is a tree. Note that the maximum possible intersection of two spanning trees is n − 1 (when they are identical), but we used n in the formula to avoid division by zero.
We conducted our experiments using the following setup. For each network we randomly sampled 100 pairs of spanning trees with bootstrap sampling (i.e. a sample may be used multiple times) and computed the empirical mean intersection and the corresponding standard deviation. The empirical mean is called the observed mean intersection. The minimum expected intersection was calculated as $(n-1)^2/m$, according to the results of the previous section.
The results for n = 1000 are shown in Fig. 1 and suggest that in the case of random network models the minimum expected intersection value is equal to the observed (empirical) intersection value obtained by the simulations. Similar conclusions can be drawn for n = 100, 200, 500. In almost all cases the standard deviation was close to zero, confirming the robustness of the simulations. An interesting observation is that a heterogeneous degree distribution (e.g. a power law) does not by itself indicate a spanning tree intersection value higher than the minimum expected one. A deeper investigation of the intersection value as a function of the model graph's parameter (such as the parameter p for WS(n, k, p), or the power-law exponent of PA graphs) could be the subject of another study. For example, since $m \propto n^2 p$ for G(n, p), we expect an intersection size of 2/p, which is confirmed by our simulations (Fig. 1 top right).
In this work we are more interested in how the minimum expected value differs from the real one in the case of real complex networks. In the next section we present our experiments considering networks with various global structural characteristics.
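The estimate 2/p for G(n, p) can be obtained by a short calculation. As a sketch, assume that m is replaced by its expected value $p\binom{n}{2}$ in the bound $(n-1)^2/m$:

```latex
\frac{(n-1)^2}{m}
  \approx \frac{(n-1)^2}{p\binom{n}{2}}
  = \frac{2(n-1)^2}{p\,n(n-1)}
  = \frac{2(n-1)}{p\,n}
  \xrightarrow{\; n \to \infty \;} \frac{2}{p}
```

For n = 1000 the factor (n − 1)/n equals 0.999, so the finite-size correction to 2/p is negligible.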

Experiments on real networks
To perform experiments on real-world networks we considered the largest connected component in the case of unconnected networks and we did not consider the direction and/or weight of the edges in the case of directed and/or weighted networks, respectively.
We compared the RTI score with some metrics that are commonly used as indicators of the small-world structure of a network. These are the network density $\rho = m/\binom{n}{2}$, the clustering coefficient $cc = 3 \times \text{number of triangles}/\binom{n}{3}$ and the average shortest path length $\ell = 1/\binom{n}{2} \cdot \sum_{i \ne j} \ell_{ij}$, where $\ell_{ij}$ is the length of the shortest path between nodes i and j.
Table 1 shows the results on 15 real-world complex networks, including social, biological and technological networks (the network data are available on the websites Mark Newman's network data, Network Repository, Kunegis 2013, and Stanford Large Network Dataset Collection). The network size varies from n = 15 (marriage ties between Florentine families) to n = 33,696 (Enron email communication network). The average shortest path length varies between $\ell = 2.235$ (collaboration network of jazz musicians) and $\ell = 6.053$ (collaboration network of the fields general relativity and quantum computing based on Arxiv articles), except for the US power grid and EuroRoad technological networks with $\ell = 19$ and $\ell = 18.4$, respectively. Almost all networks have a relatively high clustering coefficient, indicating a strong small-world property of the studied networks.
The highest RTI scores were obtained for the Erdős co-authorship network (RTI = 0.493) and the Enron email network (RTI = 0.376), for which the difference between the empirically observed mean and the minimum expected intersection value is more than 10,000, i.e. more than 5% of the total number of edges. Figure 2 shows the RTI values in terms of the value (observed mean − min. expected)/m, i.e. the difference between the real tree intersection value and the expected lower bound as a percentage of the total number of edges in the network. The Iceland sexual contact network, the Arxiv collaboration network and the Wiki-Vote network also provided high RTI scores (0.353, 0.284 and 0.259, resp.).

Spanning tree intersection and modularity
Comparing RTI and Newman modularity scores provides another interesting aspect of random tree intersection. Modularity is calculated as
$$Q = \frac{1}{2m} \sum_{i,j} \left(a_{ij} - \frac{d_i d_j}{2m}\right) \delta(C_i, C_j),$$
given a partition C of the nodes, where $C_i \in C$ is the class to which node i belongs, $a_{ij}$ is the element of the network's adjacency matrix indicating whether nodes i and j are connected or not, $d_i$ is the degree (number of connections) of node i, and $\delta(C_i, C_j) = 1$ if i and j belong to the same class and 0 otherwise. Q is computed by considering the community structure (partition) obtained by the "Louvain" fast greedy method (Blondel et al. 2008). Among the investigated networks, the Word adjacencies (Adjnoun), the Jazz musician, the Zachary and the C-elegans networks show the lowest RTI scores (in the range 0.051-0.12) and the lowest modularity scores (in the range 0.293-0.364) as well. On the other hand, the Erdős, Email-Enron, Iceland and Arxiv GR-QC networks, whose RTI scores are among the highest, have high modularity scores between 0.504 and 0.793, see Fig. 2. However, the Powergrid, Euroroad and NetSci networks have the highest modularities (0.838-0.934), but we need to note here that, according to Good et al. (2010), increasing n (or an increasing number of clusters) generally tends to increase max Q. These results suggest that not only modularity, but probably also the density and the clustering coefficient correlate with RTI, the former positively (0.34, on the investigated dataset), while the latter two negatively (−0.5 and −0.33, resp.). A deeper analysis of the correlations and of the relationship of RTI to other metrics remains the topic of a future study.
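For a given partition, Newman modularity can be evaluated class by class: the double sum reduces to the fraction of edges inside each class minus the squared fraction of the total degree carried by that class. The sketch below implements this for an undirected, unweighted edge list. It takes the partition as input and does not implement the community detection method of Blondel et al. (2008); the function name and the toy graph are ours.

```python
def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.
    `community` maps every node to its class label."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    internal = {}  # number of edges with both endpoints in the class
    for u, v in edges:
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    volume = {}  # sum of the degrees of the nodes in the class
    for node, d in degree.items():
        volume[community[node]] = volume.get(community[node], 0) + d
    return sum(
        internal.get(c, 0) / m - (vol / (2 * m)) ** 2
        for c, vol in volume.items()
    )


# Two triangles joined by a single "weak link" (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, partition))
```

For this toy graph each class contains 3 of the 7 edges and half of the total degree, so Q = 2(3/7 − 1/4) = 5/14.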

Conclusions
In this work we investigated how likely randomly chosen spanning trees are to intersect, in order to gain some new insights into the macroscopic structure of complex networks. First, we derived a formula for the lower bound of the expected value of the intersection (i.e. the number of common edges of two randomly chosen spanning trees) as a function of the number of nodes and edges of the network. We compared this value with the empirical (real) intersection value (obtained by simulation experiments) and found that in the case of random model networks, such as the preferential attachment or the Watts-Strogatz small-world network model, there is no significant difference between the two values.
On the other hand, and more interestingly, experiments show that for some real networks, the empirical mean intersection is very different from the expected minimum, suggesting the existence of special links that are likely to appear in most spanning trees of the network. Comparing the networks' Newman modularity with the introduced random tree intersection (RTI) score, we observed a positive correlation on the investigated network dataset, and we hypothesise that network size, density and modularity are the main drivers of the RTI value. This is consistent with the existence of "weak links" (Granovetter 1973), connecting separate clusters (or modules) of the network, as they are often over-represented in the spanning trees of the network. Looking at RTI in this context, it could be seen as a measure of network resilience, as the edges that appear more often in the spanning trees are more crucial for the connectivity of the network. This suggests at least two directions of future work. Firstly, the relationship between the probability of an edge being in a randomly chosen spanning tree and the edge betweenness value seems interesting. Secondly, network connectivity and resilience are worth examining from this perspective. Other possibilities include deriving the exact bounds of spanning tree intersection for random model networks, and examining the relationship between the spanning tree intersection value and widely used global structural metrics of complex networks in more detail.

Fig. 1 Experimental results on synthetic networks: Erdős-Rényi, random regular, Barabási-Albert and Watts-Strogatz (top-down) with 1000 nodes each. Red diamonds (with red lines indicating standard deviation) show the observed results based on bootstrap Monte Carlo experiments, while blue bars show the minimum expected intersection values

Fig. 2 Relationship between RTI and modularity (left) and RTI and clustering (right) on the investigated real-world dataset. Dot size is adjusted according to the number of nodes

Table 1
Network statistics and random spanning tree intersection values of some real-world networks. Data sources: Mark Newman's network data, Network Repository, Kunegis (2013), Stanford Large Network Dataset Collection