DC-RST: a parallel algorithm for random spanning trees in network analytics
Applied Network Science volume 9, Article number: 45 (2024)
Abstract
The Mantel Test, introduced in the 1960s, determines whether two distance metrics on a network are related. More recently, DimeCost, an equivalent test with improved computational complexity, was proposed; it is based on computing a random spanning tree of a complete graph on n vertices using Wilson’s random walk algorithm. In this paper, we describe DC-RST, a parallel, divide-and-conquer random walk algorithm to further speed up this computation. Relative to Wilson’s sequential random-walk algorithm, on a system with 48 cores, DC-RST was up to 4X faster when first creating random partitions and up to 20X faster without this sub-step. DC-RST is shown to be a suitable replacement for the Mantel and DimeCost tests through a combination of theoretical and statistical results.
Introduction
The Mantel Test (Mantel 1967) is a well-established statistical test for characterizing the relation between two distance metrics (for example, to determine the relationship between drive times and driving distances in a road network). The Mantel Test and its variants have been used to quantify such relationships and are available in a package for the statistical language R (Dray and Dufour 2007). Schneider and Borlund (2007) have reviewed its use in anthropology, psychology, and geography; see also Sokal and Rohlf (1962), Corliss et al. (1974) and Cooper-Ellis and Pielou (1994). Ricaut et al. (2010) used it to determine whether there is a correlation between genetic and discrete trait proximity matrices for individuals in the Egyin Gol necropolis in Mongolia. Smouse et al. (1986) describes the importance of the Mantel Test in biology, on distance metrics of genetic markers, morphological traits, ecological divergence, etc. Kouri et al. (2014) used it in cheminformatics, to study the relationship between bond count distances and Tanimoto distances.
The Mantel Test entails computing Pearson’s correlation coefficient on an \(n \times n\) matrix for a number of permutations (denoted by t), making the overall computation \(\Theta (t \cdot n^2)\), where t (typically 100) may be considered to be a constant. This computational complexity is prohibitive for modern Big Data applications (social networks, the internet web graph, etc.), where \(n \sim 10^9\). Bourbour et al. (2020) recently presented DimeCost that instead utilizes uniform random spanning trees to achieve an improved complexity of \(\Theta (t \cdot n)\).
This paper goes one step further: a key bottleneck in DimeCost is the computation of a uniform random spanning tree of a complete graph using Wilson’s algorithm; we aim to speed this up by performing this step in parallel. While the generation of uniform random spanning trees in parallel has been studied before, this was only done theoretically (Anari et al. 2021). Our proposed algorithm, DC-RST (Divide and Conquer-Random Spanning Tree), has been analyzed theoretically and implemented experimentally, and is shown to be a suitable replacement for sequential random spanning tree generation.
Definitions
Consider a set of n objects and two distance metrics \(d_1\) and \(d_2\), which can be used to compute pairwise distances between all pairs of objects.Footnote 1 A distance metric can be interpreted as a symmetric matrix: an \(n \times n\) matrix, say \(D_1\), where \(D_1[i,j]\) is the distance between i and j using the \(d_1\) metric. It may also be viewed as a weighted network: a weighted, undirected, complete graph, \(G_1\), where each vertex corresponds to an object, and an edge from \(v_i\) to \(v_j\) has weight equal to the distance between i and j using the distance metric \(d_1\). Both representations are equivalent: matrix \(D_1\) can be viewed as the weighted adjacency matrix for \(G_1\) as illustrated in Fig. 1.
Mantel test
The Mantel test answers the question: is there a statistically significant relationship between distance metrics \(d_1\) and \(d_2\)? In the road network example, we expect drive times to be closely related to driving distances, and the Mantel test provides a quantitative method for measuring this relationship. Specifically, the Mantel test takes as input the matrices \(D_1\) and \(D_2\) and:
1. Computes Pearson’s correlation coefficient (\(r_{init}\)) between \(D_1\) and \(D_2\).
2. Randomly permutes the rows and columns of \(D_1\) to obtain \(D_1'\).
3. Computes r between \(D_1'\) and \(D_2\).
Steps 2 and 3 are run t times and the number of times x that \(r > r_{init}\) is recorded. If \(x/t \le p\) (e.g., p could be 0.05), the test asserts that metrics \(d_1\) and \(d_2\) are related (Mantel 1967).
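A minimal serial sketch of these steps (our own illustrative C++; the function and variable names are assumptions, not taken from any reference implementation) could look like:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Pearson correlation over the upper-triangle entries of two n x n
// symmetric distance matrices, with rows/columns of the first matrix
// relabelled by the permutation `p`.
double pearson(const Matrix& d1, const Matrix& d2, const std::vector<int>& p) {
    std::vector<double> x, y;
    int n = (int)d1.size();
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            x.push_back(d1[p[i]][p[j]]);  // permuted entry of D1
            y.push_back(d2[i][j]);
        }
    double mx = std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    double my = std::accumulate(y.begin(), y.end(), 0.0) / y.size();
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t k = 0; k < x.size(); ++k) {
        sxy += (x[k] - mx) * (y[k] - my);
        sxx += (x[k] - mx) * (x[k] - mx);
        syy += (y[k] - my) * (y[k] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

// Mantel test: fraction of t random relabellings of D1 whose correlation
// with D2 exceeds the unpermuted correlation r_init.
double mantel_p(const Matrix& d1, const Matrix& d2, int t, std::mt19937& rng) {
    int n = (int)d1.size();
    std::vector<int> p(n);
    std::iota(p.begin(), p.end(), 0);
    double r_init = pearson(d1, d2, p);   // step 1
    int x = 0;
    for (int i = 0; i < t; ++i) {
        std::shuffle(p.begin(), p.end(), rng);  // step 2
        if (pearson(d1, d2, p) > r_init) ++x;   // step 3
    }
    return double(x) / t;  // compare against p, e.g. 0.05
}
```

Each of the t iterations costs \(\Theta(n^2)\), which is exactly the bottleneck DimeCost addresses.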
DimeCost
Bourbour et al. (2020) proposed an algorithm called DimeCost, which uses uniform random spanning trees instead of matrices. Recall that the distance matrix is equivalent to a weighted, undirected, complete graph on n vertices. DimeCost computes a random spanning tree of this graph. \(r_{init}\) is then computed using the edges of the spanning tree, and permutations of the \(d_1\) weights are used to perform a test similar to Mantel’s. Bourbour et al. show this method works better than randomly selecting edges, has a lower complexity than the original Mantel test (\(\Theta (t \cdot n)\) instead of \(\Theta (t \cdot n^2)\)), and results in similar correlation values between distance metrics.
Random walks
A key step in DimeCost uses random walks on the graph to obtain a uniform random spanning tree. Given an undirected graph \(G = (V, E)\), a spanning tree is any undirected, acyclic, connected sub-graph \(T = (V_T, E_T)\) that spans the graph. In general, undirected graphs have many spanning trees; a uniform random spanning tree is one selected at random from the set of all possible spanning trees, with each tree equally likely (i.e., uniformly) to be selected. Aldous (1990) and Broder (1989) gave equivalent algorithms using a random walk to generate a uniform random spanning tree by tracking the edges traversed when discovering a vertex. Their algorithm has complexity equal to the mean cover time of the graph—for cliques, this is \(O(n \log n)\) (Lovász 1993). Later, Wilson (1996) gave a different algorithm: the key difference is that, at any point in the random walk, a partial tree T has been created (initially just the starting vertex). To expand T, a vertex \(v_i\) not in T is selected uniformly at random, and a random walk is started at \(v_i\) until a path (with cycles erased) is found that reaches T. This path is then added to T, forming a larger partial tree. The algorithm terminates when T consists of \(n - 1\) edges, i.e., when T is a spanning tree. The complexity of Wilson’s algorithm is equal to the mean hitting time of a graph—for cliques, this is O(n) (Lovász 1993). DimeCost uses Wilson’s algorithm.
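On a clique, the walk’s next vertex is simply a uniformly random other vertex, so Wilson’s loop erasure can be implemented with a successor array that is overwritten as the walk revisits vertices. A sketch (our own illustrative C++, not Bourbour et al.’s code):

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Wilson's algorithm specialised to a clique on n vertices: repeatedly
// start a random walk at a vertex not yet in the tree and add its
// loop-erased path once it hits the tree. Returns the n - 1 tree edges.
std::vector<std::pair<int, int>> wilson_clique(int n, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<bool> in_tree(n, false);
    std::vector<int> next(n, -1);   // successor along the current walk
    in_tree[pick(rng)] = true;      // root the tree at a random vertex
    std::vector<std::pair<int, int>> edges;
    for (int v = 0; v < n; ++v) {
        if (in_tree[v]) continue;
        // Random walk from v until the partial tree is hit; overwriting
        // next[] on revisits erases cycles automatically.
        int u = v;
        while (!in_tree[u]) {
            int w;
            do { w = pick(rng); } while (w == u);  // uniform neighbour in a clique
            next[u] = w;
            u = w;
        }
        // Add the loop-erased path from v into the tree.
        for (u = v; !in_tree[u]; u = next[u]) {
            in_tree[u] = true;
            edges.emplace_back(u, next[u]);
        }
    }
    return edges;
}
```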
Parallel algorithm and analysis
Algorithm outline
We propose the following approach to create random spanning trees.
1. Randomly partition the original clique into k sub-cliques.
2. Run Wilson’s algorithm on each sub-clique, forming a forest of k spanning sub-trees.
3. “Merge” the sub-trees together to form the final tree as follows:
   (a) Consider each sub-clique to be a “supernode”, mutually connected to form a “supergraph” clique.
   (b) Run Wilson’s algorithm on the “supergraph” to get a “supertree”.
   (c) For each superedge in the supertree, obtain a “real” edge by choosing a vertex uniformly at random from each of the two subtrees and joining them with an edge.

As an example, consider a clique of 24 vertices randomly partitioned in Step 1 into four subgraphs of six vertices each, as shown in Fig. 2.
For each of the four sub-graphs, Step 2 runs four independent instances of Wilson’s algorithm (one per sub-graph) in parallel, resulting in a forest of spanning trees (referred to as “sub-trees”) shown in Fig. 3. Recall that Wilson’s algorithm generates uniform random spanning trees; thus, this forest of spanning trees is just one of many possible forests.
With these sub-trees, we now run the “Merge” step of Step 3. Step 3a creates the “supergraph”, which is a clique of 4 vertices as shown in Fig. 4. Step 3b runs Wilson’s algorithm on the supergraph, to form a “supertree”, also shown in Fig. 4.
Step 3c converts each superedge in the supertree to a “real” edge by choosing a vertex uniformly at random from each tree and joining those two with an edge. The example supertree indicates trees 0 and 2 should be connected with an edge, thus a vertex from each tree is chosen uniformly at random to be connected (vertices 12 and 3). This process, repeated for the remaining superedges, results in the final tree (new edges in red), shown in Fig. 5.
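The worked example generalizes directly. The following is our own condensed, serial C++ sketch of the three steps (Wilson’s algorithm inlined on an arbitrary vertex list so the sketch is self-contained; in a parallel implementation, the per-part loop in Step 2 is where the OpenMP threads would be applied). Function names are illustrative, not taken from the DC-RST source:

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Wilson's algorithm on the clique induced by `verts` (local indices are
// mapped back to the original vertex labels when emitting edges).
static std::vector<Edge> wilson(const std::vector<int>& verts, std::mt19937& rng) {
    int n = (int)verts.size();
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<bool> in_tree(n, false);
    std::vector<int> next(n, -1);
    in_tree[pick(rng)] = true;
    std::vector<Edge> edges;
    for (int v = 0; v < n; ++v) {
        if (in_tree[v]) continue;
        int u = v;
        while (!in_tree[u]) {                      // loop-erased random walk
            int w;
            do { w = pick(rng); } while (w == u);
            next[u] = w;
            u = w;
        }
        for (u = v; !in_tree[u]; u = next[u]) {
            in_tree[u] = true;
            edges.emplace_back(verts[u], verts[next[u]]);
        }
    }
    return edges;
}

// DC-RST, serial sketch: per-part Wilson (Step 2), then Wilson on the
// supergraph (Steps 3a-3b) and one random cross edge per superedge (3c).
std::vector<Edge> dc_rst(const std::vector<std::vector<int>>& parts,
                         std::mt19937& rng) {
    std::vector<Edge> tree;
    for (const auto& p : parts)                    // Step 2 (parallelisable)
        for (Edge e : wilson(p, rng)) tree.push_back(e);
    std::vector<int> super(parts.size());
    for (int i = 0; i < (int)parts.size(); ++i) super[i] = i;
    for (Edge se : wilson(super, rng)) {           // Steps 3a-3b
        const auto& a = parts[se.first];
        const auto& b = parts[se.second];
        std::uniform_int_distribution<int> da(0, (int)a.size() - 1);
        std::uniform_int_distribution<int> db(0, (int)b.size() - 1);
        tree.emplace_back(a[da(rng)], b[db(rng)]); // Step 3c
    }
    return tree;
}
```

Note that the edge count comes out to \(\sum_i (|p_i| - 1) + (k - 1) = n - 1\), as a spanning tree requires.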
Theoretical analysis
We first show that our algorithm meets the following necessary condition for a uniform random spanning tree.
Theorem 1
Let \(G = (V, E)\) be a complete, undirected graph (i.e., a clique), with \(n = |V|\) and \(m = |E|\). Let \((u, v) \in E\) be any edge in G. Let T be a spanning tree, chosen uniformly at random from all possible spanning trees of G (i.e., T is a uniform random spanning tree).
Then, the probability \((u,v) \in T\) is:
$$\begin{aligned} P((u, v) \in T) = \frac{2}{n} \end{aligned}$$(1)
Proof
We prove this by a counting argument:

1. For a clique with n vertices, there are \(n^{n-2}\) spanning trees (Cayley’s formula).
2. Each spanning tree has \(n - 1\) edges.
3. The total number of edges in the graph is:
$$\begin{aligned} m = (n - 1) + (n - 2) + \cdots + 1 = \binom{n}{2} = \frac{n \cdot (n - 1)}{2} \end{aligned}$$(2)

By symmetry, every edge of the clique appears in the same number of spanning trees, so a uniformly chosen tree contains any fixed edge with probability equal to the fraction of edges it uses. Combining, we get:
$$\begin{aligned} P((u, v) \in T) = \frac{n - 1}{m} = \frac{n - 1}{n (n - 1)/2} = \frac{2}{n} \end{aligned}$$(3)
\(\square\)
We now consider the case where the original graph is partitioned into equal-sized disjoint groups, and our proposed algorithm is used to create the final tree T.
Theorem 2
Let \(G = (V, E)\) be a clique, let \((u, v) \in E\) be any edge in G, and let T be a spanning tree our algorithm creates. If the vertices V are partitioned uniformly at random into \(n_1\) groups each of size \(n_2\) (thus \(n = n_1 \cdot n_2\)), then the probability \((u, v) \in T\) is:
$$\begin{aligned} P((u, v) \in T) = \frac{2}{n} \end{aligned}$$(4)
Proof
When partitioning, there are two general “types” of edgesFootnote 2:

1. Edges within a partition (called “within” edges). Formally: given a subset of vertices \(V_i \subseteq V\), an edge (u, v) is a within edge if and only if \(u \in V_i \wedge v \in V_i\).
2. Edges spanning between partitions (called “between” edges). Formally: given two subsets of vertices \(V_i \subseteq V\), \(V_j \subseteq V\), \(i \ne j\), an edge (u, v) is a between edge if and only if \(u \in V_i \wedge v \in V_j\).

We start by considering each case individually.

1. Within Edges. Due to equal partitions, \(|V_i| = n_2 \quad \forall i \in [1, n_1]\); thus,
$$\begin{aligned} P((u, v) \in T \mid u \in V_i \wedge v \in V_i) = \frac{2}{n_2} \end{aligned}$$(7)
This is a corollary of Theorem 1.
2. Between Edges. Between any pair of sub-trees, there are \(n_2^2\) edges (each vertex, of which there are \(n_2\) in each tree, has an edge to every vertex in the other tree). There are \(\binom{n_1}{2}\) unique pairs of sub-trees, thus the total number of “between” edges is:
$$\begin{aligned} \binom{n_1}{2} \cdot n_2^2 \end{aligned}$$(8)
The random spanning tree T has \(n_1 - 1\) “between” edges (since there are only \(n_1\) sub-trees to be connected), thus the probability that \((u, v) \in T\) given that \(u \in V_i \wedge v \in V_j\), \(i \ne j\), is:
$$\begin{aligned} P((u, v) \in T \mid u \in V_i \wedge v \in V_j) = \frac{n_1 - 1}{\binom{n_1}{2} \cdot n_2^2} = \frac{n_1 - 1}{n_1 (n_1 - 1)/2 \cdot n_2^2} = \frac{2}{n_1 \cdot n_2^2} \end{aligned}$$(9)
We now construct the final probability from these two cases. To do this, we first calculate the probability that the two endpoints u and v are placed into the same or different partitions.

- Let \(E_1\) be the event that u and v are placed into the same partition. Formally, \(u \in V_i \wedge v \in V_i\).
- Let \(E_2\) be the event that u and v are placed into different partitions. Formally, \(u \in V_i \wedge v \in V_j\), \(i \ne j\).

Clearly, \(P(E_2) = 1 - P(E_1)\). We first calculate \(P(E_1)\): without loss of generality, suppose u is randomly placed into \(V_i\). Since each partition will be of size \(n_2\), and there are n vertices total, the probability v is also placed into \(V_i\) is:
$$\begin{aligned} P(E_1) = \frac{n_2 - 1}{n - 1} \end{aligned}$$(10)

We now combine the two cases above with \(P(E_1)\) and \(P(E_2)\) to solve for the final probability:
$$\begin{aligned} P((u, v) \in T) = P(E_1) \cdot \frac{2}{n_2} + P(E_2) \cdot \frac{2}{n_1 \cdot n_2^2} = \frac{n_2 - 1}{n - 1} \cdot \frac{2}{n_2} + \frac{n_2 (n_1 - 1)}{n - 1} \cdot \frac{2}{n_1 \cdot n_2^2} = \frac{2 (n_1 n_2 - 1)}{(n - 1) \, n_1 n_2} = \frac{2}{n} \end{aligned}$$(11)
\(\square\)
We now consider the case where the vertices V are randomly split into k groups of possibly unequal size, \(g_1\) through \(g_k\). That is:
$$\begin{aligned} |g_1| + |g_2| + \cdots + |g_k| = n \end{aligned}$$(12)
Theorem 3
Let \(G = (V, E)\) be a clique, let \((u, v) \in E\) be any edge in G, and let T be a spanning tree our algorithm creates. If the vertices V are partitioned uniformly at random into k groups of possibly unequal size, then the probability \((u, v) \in T\) is: \(P((u, v) \in T) = \frac{2}{n}\)
Proof
The proof follows a similar sequence to that for equal partitions.
When partitioning, there are two general “types” of edges:

1. Edges within a partition (called “within” edges). Formally: given a subset of vertices \(V_i \subseteq V\), an edge (u, v) is a within edge if and only if \(u \in V_i \wedge v \in V_i\).
2. Edges spanning between partitions (called “between” edges). Formally: given two subsets of vertices \(V_i \subseteq V\), \(V_j \subseteq V\), \(i \ne j\), an edge (u, v) is a between edge if and only if \(u \in V_i \wedge v \in V_j\).

We start by considering each case individually.

1. Within Edges. For any group \(g_i\), which contains vertices \(V_i\):
$$\begin{aligned} P((u, v) \in T \mid u \in V_i \wedge v \in V_i) = \frac{2}{|g_i|} \end{aligned}$$(13)
This is a corollary of Theorem 1.
2. Between Edges. Between any two partitions \(g_i\) and \(g_j\), there are \(|g_i| \cdot |g_j|\) edges. Since the groups \(g_1, \ldots, g_k\) are of possibly unequal sizes, we must multiply the sizes of each pair of groups and sum to count the total number of “between” edges:
$$\begin{aligned} \sum _{1 \le i < j \le k} |g_i| \cdot |g_j| \end{aligned}$$(14)
By Vieta’s formulas, this sum equals the coefficient \(a_{k-2}\) of \(x^{k-2}\) in the polynomial given in (15):
$$\begin{aligned} f(x) = x^k + a_{k-1} x^{k-1} + a_{k-2} x^{k-2} + \cdots + a_1 x + a_0 = (x - |g_1|) \cdot (x - |g_2|) \cdots (x - |g_k|) \end{aligned}$$(15)
The random spanning tree T has \(k - 1\) “between” edges (since there are only k sub-trees to be connected), thus the probability \((u, v) \in T\) given (u, v) is a “between” edge is:
$$\begin{aligned} P((u, v) \in T \mid (u,v) \text { is a ``between'' edge}) = \frac{k - 1}{a_{k - 2}} \end{aligned}$$(16)
We now construct the final probability from these two cases. To do this, we first calculate the probability that the two endpoints u and v are placed into the same or different partitions.

- Let \(E_1\) be the event that u and v are placed into the same partition \(g_i\). Formally, \(u \in V_i \wedge v \in V_i\).
- Let \(E_2\) be the event that u and v are placed into different partitions \(g_i\) and \(g_j\). Formally, \(u \in V_i \wedge v \in V_j\), \(i \ne j\).

We now calculate \(P(E_1)\) and \(P(E_2)\): without loss of generality, suppose u is randomly placed into \(g_i\). Since \(g_i\) is of size \(|g_i|\), and there are n vertices total, the probability v is also placed into \(g_i\) is \(\dfrac{|g_i| - 1}{n - 1}\). A similar argument gives \(P(E_2)\).

We now combine the two cases above with \(P(E_1)\) and \(P(E_2)\) to solve for the final probability:
$$\begin{aligned} P((u, v) \in T) = \sum _{i=1}^{k} \frac{|g_i| (|g_i| - 1)}{n (n - 1)} \cdot \frac{2}{|g_i|} + \sum _{i \ne j} \frac{|g_i| \cdot |g_j|}{n (n - 1)} \cdot \frac{2}{k \cdot |g_i| \cdot |g_j|} = \frac{2 (n - k)}{n (n - 1)} + \frac{2 (k - 1)}{n (n - 1)} = \frac{2}{n} \end{aligned}$$(17)
While the initial step is complicated, it follows directly from the previous work:

1. We enumerate over all pairs of groups, and need to consider two cases: \(i = j\) and \(i \ne j\).
2. The first case, \(i = j\), is simply \(P(E_1) \cdot P((u, v) \in T \mid E_1)\).
3. The second case, \(i \ne j\), is the same idea: \(P(E_2) \cdot P((u, v) \in T \mid E_2)\). However, when this is placed in the summation, we need to tweak \(P((u, v) \in T \mid E_2)\). This is because the solution found previously using Vieta’s formulas already accounts for the sum, yet we are starting from the sum again.Footnote 3 Therefore, (a) only 1 of the \(|g_i| \cdot |g_j|\) edges between \(g_i\) and \(g_j\) is needed, and (b) \(\dfrac{k - 1}{\binom{k}{2}} = \dfrac{2}{k}\) is the probability an edge will even be needed between \(g_i\) and \(g_j\) (i.e., the probability Wilson’s algorithm chooses the “supernodes” \(g_i\) and \(g_j\) to be connected directly with an edge).
\(\square\)
However, a simple counterexample shows that DC-RST does not generate random spanning trees with uniform probability. Specifically, the “star” tree (Fig. 6) will never be generated by DC-RST unless \(k = 1\) or \(k = n\) (when DC-RST degenerates to Wilson’s algorithm). Thus, DC-RST creates a random (but non-uniform) spanning tree of a clique in which each edge appears with the same probability as in a uniform random spanning tree.
Statistical analysis
This raises the following question: if DC-RST does not generate uniform random spanning trees, then is DC-RST a suitable replacement for Wilson’s algorithm in DimeCost? To answer this, we compared two versions of DimeCost—one using Wilson’s algorithm, the other using DC-RST—on three data sets used by Bourbour et al. In each experiment, we use the Mantel Test to compute the correlation coefficient between a pair of distance matrices (r), and compute 95% confidence intervals (CIs) for DimeCost using both Wilson’s algorithm and DC-RST for spanning tree generation. We then examine whether the CIs contain r.
Data set 1: Comparison of distance norms
For this first data set, we randomly generated 25 (x, y) points. For each pair of points (i.e., \(\frac{25\times 24}{2} = 300\) pairs of points), we applied four distance metrics: the \(L_1\), \(L_2\), \(L_3\), and \(L_\infty\) norms; considering these pairwise gives six data sets.Footnote 4 The results are summarized in Table 1.
The CIs using DC-RST differ slightly from those using Wilson’s algorithm, but still contain r from the Mantel Test.
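For reference, the distance norms used in this data set (defined in Footnote 4) can be computed as in this illustrative sketch (our own code, not part of the experiment’s implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// L_p distance between two points, per the footnoted definition;
// p = infinity is handled as the max of absolute coordinate differences.
double lp_dist(const std::vector<double>& a, const std::vector<double>& b,
               double p) {
    double s = 0, mx = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        double d = std::fabs(a[i] - b[i]);
        s += std::pow(d, p);
        mx = std::max(mx, d);
    }
    return std::isinf(p) ? mx : std::pow(s, 1.0 / p);
}
```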
Data set 2: Line graph
For this example, consider a road network represented by a line graph, modeling a long continuous stretch of an interstate highway where each edge represents a highway segment. In the first graph, the weights of all edges (representing travel distances) are equal to 1. In the second graph (representing travel time), the weights of all edges are again 1 except the last edge, which possesses an unusually high weight, 100. This could represent an accident in that segment of the highway causing times, but not distances, to drastically increase. Both graphs are illustrated for the case of \(n = 6\) in Fig. 7.
We then computed distance matrices for both metrics by computing the shortest path distance between each pair of vertices. Corresponding elements in the two matrices have identical values, except the last row and column, which represent edges to the last vertex. The results are summarized in Table 2.
As before, while the CIs using DC-RST differ slightly from those computed using Wilson’s algorithm, they still contain the r value calculated by the Mantel Test. This test previously showed that simply choosing random edges from the complete graph does not suffice, as the CIs generated in this example with random edges do not contain the r value calculated by the Mantel Test (Bourbour et al. 2020).
Data set 3: Random
Lastly, we consider the case of two random distance metrics; i.e., the value returned by \(d_1(a, b)\) and \(d_1(b, a)\) is simply a random number (likewise for \(d_2\)). We expect these two distance metrics to be uncorrelated. The results are summarized in Table 3.
Again, the confidence intervals both contain the correlation coefficient calculated by the Mantel Test. These results suggest that DC-RST is indeed a suitable replacement for Wilson’s algorithm in DimeCost.
Performance analysis
Implementation details
We implemented DC-RST in C++ using the OpenMP parallel library. The pseudocode is shown below in Fig. 8.
The function random-partition() was implemented by (1) initializing an array A with entries \(1 \ldots n\) and (2) shuffling A uniformly at random. This array now defines the partitions: if parts \(1 \ldots k\) have sizes \(|p_1|, \ldots , |p_k|\), then the first \(|p_1|\) elements of A are assigned to part 1, the next \(|p_2|\) elements of A are assigned to part 2, etc. Step (2) of random-partition() involves shuffling an array of size n—the best-known serial algorithm for this is Fisher–Yates, which runs in O(n) time (Bacher et al. 2018).
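A sketch of this construction (our own illustrative C++; here we cut the shuffled array into k nearly equal consecutive chunks, but any part sizes summing to n work the same way):

```cpp
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// random-partition(): fill A with 0..n-1, Fisher-Yates shuffle in O(n),
// then assign consecutive chunks of A to the k parts.
std::vector<std::vector<int>> random_partition(int n, int k, std::mt19937& rng) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);
    for (int i = n - 1; i > 0; --i) {              // Fisher-Yates: O(n)
        std::uniform_int_distribution<int> d(0, i);
        std::swap(a[i], a[d(rng)]);
    }
    std::vector<std::vector<int>> parts(k);
    for (int i = 0; i < n; ++i)
        parts[(int64_t)i * k / n].push_back(a[i]); // chunk sizes differ by <= 1
    return parts;
}
```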
Performance on a symmetric multiprocessor architecture
We compared DC-RST against the serial implementation of Wilson’s algorithm on various combinations of k and n—in real-world settings, we recommend k be at least as large as the number of CPU cores. The function std::uniform_int_distribution was used to generate the random numbers required in wilsons-algorithm(). This code was executed on Colorado School of Mines’ “Isengard” server, which is a 48-core symmetric multiprocessor architecture with over 350 GB of main memory.
We initially implemented step (2) of random-partition() with a standard sequential shuffle. The speedups are shown in Fig. 9. As can be seen, the speedups are far below linear: even with 200 partitions, we fail to achieve even 3x speedup.
A runtime analysis revealed that the algorithm spent almost 90% of total execution time on generating a random partition. Intuitively, this makes sense, as Wilson’s algorithm is O(n) on cliques and a serial shuffle (a sub-step of making the partitions) is also O(n)—thus our speedup comes entirely from the lower-order terms hidden by the asymptotic notation.
To address this, we instead implemented step (2) of random-partition() with a parallel shuffling algorithm: MergeShuffle (Bacher et al. 2018), chosen because of the availability of an implementation that uses OpenMP. The authors of MergeShuffle suggest it is the currently fastest parallel shuffling algorithm.
The speedups are shown in Fig. 10. While this results in an improvement over Wilson’s algorithm, observe that the speedup of DC-RST remains sub-linear w.r.t. the number of processors, with \(k = 200\) partitions achieving a little less than 4x speedup on the largest clique.
To further assess our algorithm, we simply removed the shuffle step from random-partition(). For timing purposes this yields the equivalent of shuffling in 0 time. We called this variation NoShuffle, and re-ran our benchmarks on NoShuffle to compare its performance to the previous results. This is shown in Fig. 11.
As expected, NoShuffle performs significantly better: on some inputs, NoShuffle is more than 20X faster than serial. This illustrates the potential for DC-RST to improve the performance of generating random spanning trees, should a better solution to Shuffle be found or if random partitions are pre-computed.
The results are summarized in Fig. 12. This figure illustrates the speedup of the different variations of DC-RST (std::shuffle, MergeShuffle, and NoShuffle) over serial across the different clique sizes, all at \(k = 200\) partitions. NoShuffle is by far the fastest, followed by MergeShuffle, with std::shuffle coming up last.
Conclusions and discussion
We have described DC-RST, a parallel divide-and-conquer algorithm for creating random spanning trees on a clique. While DC-RST does not produce uniform random spanning trees, we showed experimentally that the impact on the proposed network science application DimeCost is not significant. On a machine with 48 cores, DC-RST achieves up to 4X speedup when using MergeShuffle, and up to 20X speedup when shuffling is not used. This points to the need for a faster parallel implementation of shuffling.
DimeCost used Wilson’s random-walk algorithm to create a uniform random spanning tree in expected O(n) time. In this paper, we similarly used Wilson’s Algorithm to generate a forest of k spanning sub-trees and to connect the spanning forest. We thank an anonymous reviewer for pointing us to an algorithm proposed by Aldous (Algorithm 2 from Aldous (1990)) to obtain a uniform random spanning tree in worst-case \(\Theta (n)\) time for a complete graph. This algorithm does not incur the higher variance associated with Wilson’s random walk. It instead uses n calls to a random-number generator, followed by a random shuffle. The need for a random shuffle at the end will continue to be a bottleneck in a parallel implementation.
Finally, from the perspective of its use in practice, we note that DC-RST only needs the size of the partitions (in order), and returns the list of edges in the computed spanning tree. This means the entire complete graph does not need to first be generated to run DC-RST: instead, only \(n - 1\) rather than \(\approx n^2/2\) queries are needed to determine spanning tree edge-weights to compute the correlation between distance metrics.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
Note: distance metrics are assumed to be symmetric.
These can be seen in Fig. 2: the “within” edges are drawn, and all “between” edges are not.
Vieta’s formula is used in the simplification of this sum.
The \(L_p\)-norm for \(p \ge 1\) of a vector \(\textbf{x}\) is a commonly used measure of “distance” in machine learning for clustering, and is defined by \(||\textbf{x}||_p = (|x_1|^p + |x_2|^p + \cdots + |x_n|^p)^\frac{1}{p}\). Note that \(L_\infty (\textbf{x}) = \max \{|x_1|, |x_2|, \ldots , |x_n| \}\) is the limit of the \(L_p\) norm as \(p \longrightarrow \infty\).
References
Aldous DJ (1990) The random walk construction of uniform spanning trees and uniform labelled trees. SIAM J Discrete Math. https://doi.org/10.1137/0403039
Anari N, Hu N, Saberi A, Schild A (2021) Sampling arborescences in parallel. In: Innovations in theoretical computer science
Bacher A, Bodini O, Hollender A, Lumbroso J (2018) Mergeshuffle: a very fast, parallel random permutation algorithm, vol 2113
Bourbour S, Mehta DP, Navidi WC (2020) Improved methods to compare distance metrics in networks using uniform random spanning trees (dimecost). Networks. https://doi.org/10.1002/net.21949
Broder A (1989) Generating random spanning trees. https://doi.org/10.1109/sfcs.1989.63516
Cooper-Ellis S, Pielou EC (1994) The interpretation of ecological data: a primer on classification and ordination. The Bryologist. https://doi.org/10.2307/3243925
Corliss JO, Sneath PHA, Sokal RR (1974) Numerical taxonomy: the principles and practice of numerical classification. Trans Am Microsc Soc. https://doi.org/10.2307/3225339
Dray S, Dufour A-B (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22(4):1–20. https://doi.org/10.18637/jss.v022.i04
Kouri TM, Awale M, Slyby JK, Reymond JL, Mehta DP (2014) “Social” network of isomers based on bond count distance: algorithms. J Chem Inf Model. https://doi.org/10.1021/ci4005173
Lovász L (1993) Random walks on graphs: a survey. Combinatorics 2
Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Res 27:209–220
Ricaut FX, Auriol V, Cramon-Taubadel NV, Keyser C, Murail P, Ludes B, Crubézy E (2010) Comparison between morphological and genetic data to estimate biological relationship: the case of the Egyin Gol necropolis (Mongolia). Am J Phys Anthropol. https://doi.org/10.1002/ajpa.21322
Schneider JW, Borlund P (2007) Matrix comparison, part 2: measuring the resemblance between proximity measures or ordination results by use of the Mantel and Procrustes statistics. J Am Soc Inf Sci Technol. https://doi.org/10.1002/asi.20642
Smouse PE, Long JC, Sokal RR (1986) Multiple regression and correlation extensions of the mantel test of matrix correspondence. Syst Zool. https://doi.org/10.2307/2413122
Sokal RR, Rohlf FJ (1962) The comparison of dendrograms by objective methods. Taxon. https://doi.org/10.2307/1217208
Wilson DB (1996) Generating random spanning trees more quickly than the cover time, vol Part F129452. https://doi.org/10.1145/237814.237880
Acknowledgements
We would like to thank the anonymous reviewer for suggesting an alternative algorithm to Wilson’s for future consideration.
Funding
This research was self-funded, no sources of funding were sought nor received.
Author information
Authors and Affiliations
Contributions
LH and DM conceived and developed the theory underlying the presented idea. LH contributed all of the computer code and conducted all of the experiments; LH wrote the manuscript with support from DM. LH and DM read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Henke, L., Mehta, D. DC-RST: a parallel algorithm for random spanning trees in network analytics. Appl Netw Sci 9, 45 (2024). https://doi.org/10.1007/s41109-024-00613-7