 Research
 Open access
DCRST: a parallel algorithm for random spanning trees in network analytics
Applied Network Science volume 9, Article number: 45 (2024)
Abstract
The Mantel Test, introduced in the 1960s, determines whether two distance metrics on a network are related. More recently, DimeCost, an equivalent test with improved computational complexity, was proposed. It was based on computing a random spanning tree of a complete graph on n vertices—the spanning tree was computed using Wilson’s random walk algorithm. In this paper, we describe DCRST, a parallel, divide-and-conquer random walk algorithm to further speed up this computation. Relative to Wilson’s sequential random-walk algorithm, on a system with 48 cores, DCRST was up to 4X faster when first creating random partitions and up to 20X faster without this substep. DCRST is shown to be a suitable replacement for the Mantel and DimeCost tests through a combination of theoretical and statistical results.
Introduction
The Mantel Test (Mantel 1967) is a well-established statistical test for characterizing the relation between two distance metrics (for example, to determine the relationship between drive times and driving distances in a road network). The Mantel Test and its variants have been used to quantify such relationships and are available in a package for the statistical language R (Dray and Dufour 2007). Schneider and Borlund (2007) have reviewed its use in anthropology, psychology, and geography; see also Sokal and Rohlf (1962), Corliss et al. (1974) and Cooper-Ellis and Pielou (1994). Ricaut et al. (2010) used it to determine whether there is a correlation between genetic and discrete trait proximity matrices for individuals in the Egyin Gol necropolis in Mongolia. Smouse et al. (1986) describe the importance of the Mantel Test in biology, on distance metrics of genetic markers, morphological traits, ecological divergence, etc. Kouri et al. (2014) used it in cheminformatics, to study the relationship between bond count distances and Tanimoto distances.
The Mantel Test entails computing Pearson’s correlation coefficient on an \(n \times n\) matrix for a number of permutations (denoted by t), making the overall computation \(\Theta (t \cdot n^2)\), where t (typically 100) may be considered a constant. This computational complexity is prohibitive for modern Big Data applications (social networks, the internet web graph, etc.), where \(n \sim 10^9\). Bourbour et al. (2020) recently presented DimeCost, which instead uses uniform random spanning trees to achieve an improved complexity of \(\Theta (t \cdot n)\).
This paper goes one step further: a key bottleneck in DimeCost is the computation of a uniform random spanning tree of a complete graph using Wilson’s algorithm; we aim to speed this up by performing this step in parallel. While the generation of uniform random spanning trees in parallel has been studied before, this was only done theoretically (Anari et al. 2021). Our proposed algorithm, DCRST (Divide-and-Conquer Random Spanning Tree), has been analyzed theoretically and implemented experimentally, and is shown to be a suitable replacement for sequential random spanning tree generation.
Definitions
Consider a set of n objects and two distance metrics \(d_1\) and \(d_2\), which can be used to compute pairwise distances between all pairs of objects.^{Footnote 1} A distance metric can be interpreted as a symmetric matrix: an \(n \times n\) matrix, say \(D_1\), where \(D_1[i,j]\) is the distance between i and j using the \(d_1\) metric. It may also be viewed as a weighted network: a weighted, undirected, complete graph, \(G_1\), where each vertex corresponds to an object, and an edge from \(v_i\) to \(v_j\) has weight equal to the distance between i and j using the distance metric \(d_1\). Both representations are equivalent: matrix \(D_1\) can be viewed as the weighted adjacency matrix for \(G_1\) as illustrated in Fig. 1.
Mantel test
The Mantel test answers the question: is there a statistically significant relationship between distance metrics \(d_1\) and \(d_2\)? In the road network example, we expect drive times to be closely related to driving distances, and the Mantel test provides a quantitative method for measuring this relationship. Specifically, the Mantel test takes as input the matrices \(D_1\) and \(D_2\) and:

1.
Computes the Pearson’s correlation coefficient (\(r_{init}\)) between \(D_1\) and \(D_2\).

2.
Randomly permutes the rows and columns of \(D_1\) to obtain \(D_1'\).

3.
Computes r between \(D_1'\) and \(D_2\).
Steps 2 and 3 are run t times and the number of times x that \(r > r_{init}\) is recorded. If \(x/t \le p\) (e.g., p could be 0.05), the test asserts that metrics \(d_1\) and \(d_2\) are related (Mantel 1967).
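To make the procedure concrete, here is a minimal C++ sketch of the Mantel test (our illustration, not the authors’ implementation; the names `pearson_upper` and `mantel_test` are our own). It computes Pearson’s r over the upper triangles of the symmetric distance matrices and returns the fraction x/t:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Pearson correlation over the strict upper triangles of two n x n
// symmetric distance matrices.
double pearson_upper(const Matrix& A, const Matrix& B) {
    std::vector<double> x, y;
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = i + 1; j < A.size(); ++j) {
            x.push_back(A[i][j]);
            y.push_back(B[i][j]);
        }
    double n = static_cast<double>(x.size());
    double mx = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double my = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t k = 0; k < x.size(); ++k) {
        sxy += (x[k] - mx) * (y[k] - my);
        sxx += (x[k] - mx) * (x[k] - mx);
        syy += (y[k] - my) * (y[k] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

// Mantel test: returns x/t, the fraction of t permutations whose
// correlation exceeds r_init; a small value (<= p) suggests the
// metrics are related.
double mantel_test(const Matrix& D1, const Matrix& D2, int t, unsigned seed) {
    double r_init = pearson_upper(D1, D2);
    std::mt19937 gen(seed);
    std::vector<size_t> perm(D1.size());
    std::iota(perm.begin(), perm.end(), 0);
    int x = 0;
    for (int trial = 0; trial < t; ++trial) {
        std::shuffle(perm.begin(), perm.end(), gen);
        // Apply the same permutation to both rows and columns of D1.
        Matrix D1p(D1.size(), std::vector<double>(D1.size()));
        for (size_t i = 0; i < D1.size(); ++i)
            for (size_t j = 0; j < D1.size(); ++j)
                D1p[i][j] = D1[perm[i]][perm[j]];
        if (pearson_upper(D1p, D2) > r_init) ++x;
    }
    return static_cast<double>(x) / t;
}
```

The \(\Theta (t \cdot n^2)\) cost is visible directly: each of the t trials rebuilds and rescans an \(n \times n\) matrix.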
DimeCost
Bourbour et al. (2020) proposed an algorithm called DimeCost, which uses uniform random spanning trees instead of matrices. Recall that the distance matrix is equivalent to a weighted, undirected, complete graph on n vertices. DimeCost computes a random spanning tree of this graph. \(r_{init}\) is then computed using the edges of the spanning tree, and permutations of the \(d_1\) weights are used to perform a test similar to Mantel’s. Bourbour et al. show this method works better than randomly selecting edges, has a lower complexity than the original Mantel test (\(\Theta (t \cdot n)\) instead of \(\Theta (t \cdot n^2)\)), and results in similar correlation values between distance metrics.
Random walks
A key step in DimeCost uses random walks on the graph to obtain a uniform random spanning tree. Given an undirected graph \(G = (V, E)\), a spanning tree is any undirected, acyclic, connected subgraph \(T = (V_T, E_T)\) that spans the graph. In general, undirected graphs have many spanning trees; a uniform random spanning tree is a spanning tree selected from the set of all possible spanning trees such that each tree is equally likely to be chosen. Aldous (1990) and Broder (1989) gave equivalent algorithms using a random walk to generate a uniform random spanning tree by tracking the edges traversed when discovering a vertex. Their algorithm has complexity equal to the mean cover time of the graph—for cliques, this is \(O(n \log n)\) (Lovász 1993). Later, Wilson (1996) gave a different algorithm: the key difference is that at any point in the random walk, a partial tree T has been created (initially just the starting vertex). To expand T, a vertex \(v_i\) not in T is selected uniformly at random, and a random walk is started at \(v_i\) until a path (with cycles erased) is found that reaches T. This path is then added to T, forming a larger partial tree. The algorithm terminates when T consists of \(n - 1\) edges, i.e., when T is a spanning tree. The complexity of Wilson’s algorithm is equal to the mean hitting time of a graph—for cliques, this is O(n) (Lovász 1993). DimeCost uses Wilson’s algorithm.
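Wilson’s algorithm specializes nicely to a clique, since each step of the walk may jump to any other vertex chosen uniformly at random. A minimal sequential sketch (our illustration under those assumptions; the name `wilson_clique` is ours):

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Wilson's algorithm specialized to a clique on vertices {0, ..., n-1}:
// repeatedly start a loop-erased random walk from a vertex not yet in
// the partial tree and walk until the tree is hit.
std::vector<std::pair<int, int>> wilson_clique(int n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<bool> in_tree(n, false);
    std::vector<int> next(n, -1);   // next[v]: successor of v on the current walk
    std::vector<std::pair<int, int>> edges;
    in_tree[pick(gen)] = true;      // root of the partial tree
    for (int v = 0; v < n; ++v) {
        if (in_tree[v]) continue;
        // Random walk from v until the partial tree is reached;
        // overwriting next[] on revisits erases cycles implicitly.
        int u = v;
        while (!in_tree[u]) {
            int w;
            do { w = pick(gen); } while (w == u);   // any other vertex, uniformly
            next[u] = w;
            u = w;
        }
        // Add the loop-erased path from v to the tree.
        for (u = v; !in_tree[u]; u = next[u]) {
            in_tree[u] = true;
            edges.emplace_back(u, next[u]);
        }
    }
    return edges;   // n - 1 edges: a spanning tree
}
```

On a clique every step hits the tree with probability proportional to the tree’s size, which is why the expected running time is O(n).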
Parallel algorithm and analysis
Algorithm outline
We propose the following approach to create random spanning trees.

1
Randomly partition the original clique into k subcliques.

2
Run Wilson’s algorithm on each subclique, forming a forest of k spanning subtrees.

3
“Merge” the subtrees together to form the final tree as follows:

(a)
Consider each subclique to be a “supernode”, mutually connected to form a “supergraph” clique.

(b)
Run Wilson’s algorithm on the “supergraph” to get a “supertree”.

(c)
For each superedge in the supertree, obtain a “real” edge by choosing a vertex uniformly at random from each of the two subtrees and joining them with an edge.

Consider a clique of 24 vertices randomly partitioned in Step 1 into four subgraphs of six vertices each, as shown in Fig. 2.
Step 2 then runs four independent instances of Wilson’s algorithm (one per subgraph) in parallel, resulting in a forest of spanning trees (referred to as “subtrees”) shown in Fig. 3. Recall that Wilson’s algorithm generates uniform random spanning trees; thus this forest of spanning trees is just one of many possible forests.
With these subtrees, we now run the “Merge” step of Step 3. Step 3a creates the “supergraph”, which is a clique of 4 vertices as shown in Fig. 4. Step 3b runs Wilson’s algorithm on the supergraph, to form a “supertree”, also shown in Fig. 4.
Step 3c converts each superedge in the supertree to a “real” edge by choosing uniformly at random a vertex from each tree and joining those two with an edge. The example supertree indicates trees 0 and 2 should be connected with an edge, thus a vertex is chosen uniformly at random from each tree to be connected (vertices 12 and 3). This process, repeated for the remaining superedges, results in the final tree (new edges in red), shown in Fig. 5.
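Putting Steps 1–3 together, the following is a compact sequential sketch of DCRST (our reconstruction from the description above, not the paper’s Fig. 8 pseudocode; in the actual implementation the per-subclique loop is the part run in parallel with OpenMP):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Wilson's algorithm on a clique over an arbitrary vertex set `verts`;
// returns the spanning tree's edges in terms of the original labels.
std::vector<std::pair<int, int>> wilson_clique(const std::vector<int>& verts,
                                               std::mt19937& gen) {
    int n = static_cast<int>(verts.size());
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<bool> in_tree(n, false);
    std::vector<int> next(n, -1);
    std::vector<std::pair<int, int>> edges;
    in_tree[pick(gen)] = true;
    for (int v = 0; v < n; ++v) {
        if (in_tree[v]) continue;
        int u = v;
        while (!in_tree[u]) {                       // loop-erased random walk
            int w;
            do { w = pick(gen); } while (w == u);
            next[u] = w;
            u = w;
        }
        for (u = v; !in_tree[u]; u = next[u]) {     // add the erased path
            in_tree[u] = true;
            edges.emplace_back(verts[u], verts[next[u]]);
        }
    }
    return edges;
}

// DCRST sketch: partition, run Wilson per subclique (parallelizable),
// then merge the subtrees via a supertree.
std::vector<std::pair<int, int>> dcrst(int n, int k, unsigned seed) {
    std::mt19937 gen(seed);
    // Step 1: random partition into k near-equal subcliques.
    std::vector<int> A(n);
    std::iota(A.begin(), A.end(), 0);
    std::shuffle(A.begin(), A.end(), gen);
    std::vector<std::vector<int>> parts(k);
    for (int i = 0; i < n; ++i) parts[i % k].push_back(A[i]);
    // Step 2: one spanning subtree per subclique.
    std::vector<std::pair<int, int>> edges;
    for (auto& p : parts) {
        auto sub = wilson_clique(p, gen);
        edges.insert(edges.end(), sub.begin(), sub.end());
    }
    // Steps 3a-b: supertree over the k supernodes.
    std::vector<int> supernodes(k);
    std::iota(supernodes.begin(), supernodes.end(), 0);
    auto supertree = wilson_clique(supernodes, gen);
    // Step 3c: realize each superedge as an edge between uniformly
    // chosen vertices of the two subtrees.
    for (auto [i, j] : supertree) {
        int u = parts[i][std::uniform_int_distribution<int>(
            0, static_cast<int>(parts[i].size()) - 1)(gen)];
        int v = parts[j][std::uniform_int_distribution<int>(
            0, static_cast<int>(parts[j].size()) - 1)(gen)];
        edges.emplace_back(u, v);
    }
    return edges;   // (n - k) subtree edges + (k - 1) merge edges = n - 1
}
```

Note that the edge count works out automatically: the k subtrees contribute \(n - k\) edges and the supertree contributes \(k - 1\) more, for \(n - 1\) in total.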
Theoretical analysis
We first show that our algorithm meets the following necessary condition for a uniform random spanning tree.
Theorem 1
Let \(G = (V, E)\) be a complete, undirected graph (i.e., a clique), with \(n = |V|\) and \(m = |E|\). Let \((u, v) \in E\) be any edge in G. Let T be a spanning tree, chosen uniformly at random from all possible spanning trees of G (i.e., T is a uniform random spanning tree).
Then, the probability \((u,v) \in T\) is:
$$\begin{aligned} P((u, v) \in T) = \frac{2}{n} \end{aligned}$$(1)
Proof
We prove this by composition:

1
For a clique with n vertices, there are \(n^{n-2}\) spanning trees (Cayley’s formula).

2
Each spanning tree has \(n - 1\) edges.

3
The total number of edges in the graph is:
$$\begin{aligned} m = (n - 1) + (n - 2) + \cdots + 1 = \binom{n}{2} = \frac{n \cdot (n - 1)}{2} \end{aligned}$$(2)
Combining—by symmetry, each of the m edges is equally likely to appear among the \(n - 1\) edges of T—we get:
$$\begin{aligned} P((u, v) \in T) = \frac{n - 1}{m} = \frac{n - 1}{n \cdot (n - 1) / 2} = \frac{2}{n} \end{aligned}$$(3)
\(\square\)
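As a quick sanity check (our addition), consider the smallest nontrivial case, \(n = 3\):

```latex
% K_3 has 3^{3-2} = 3 spanning trees; each tree omits exactly one of
% the m = 3 edges, so any fixed edge appears in 2 of the 3 trees:
P((u, v) \in T) = \frac{n - 1}{m} = \frac{2}{3} = \frac{2}{n}.
```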
We now consider the case where the original graph is partitioned into equalsized disjoint groups, and our proposed algorithm is used to create the final tree T.
Theorem 2
Let \(G = (V, E)\) be a clique, let \((u, v) \in E\) be any edge in G, and let T be a spanning tree our algorithm creates. If the vertices V are partitioned uniformly at random into \(n_1\) groups each of size \(n_2\) (thus \(n = n_1 \cdot n_2\)), then the probability \((u, v) \in T\) is: \(P((u, v) \in T) = \frac{2}{n}\).
Proof
When partitioning, there are two general “types” of edges^{Footnote 2}:

1
Edges within a partition (called “within” edges). Formally: given a subset of vertices \(V_i \subseteq V\), an edge (u, v) is a within edge if and only if \(u \in V_i \wedge v \in V_i\).

2
Edges spanning between partitions (called “between” edges). Formally: given two subsets of vertices \(V_i \subseteq V\) and \(V_j \subseteq V\) with \(i \ne j\), an edge (u, v) is a between edge if and only if \(u \in V_i \wedge v \in V_j\).
We start by considering each case individually.

1.
Within Edges. Due to equal partitions, \(|V_i| = n_2 \quad \forall i \in [1, n_1]\); thus,
$$\begin{aligned} P((u, v) \in T \mid u \in V_i \wedge v \in V_i) = \frac{2}{n_2} \end{aligned}$$(7)
This is a corollary of Theorem 1.

2.
Between Edges. Between any pair of subtrees, there are \(n_2^2\) edges (each vertex, of which there are \(n_2\) in each tree, has an edge to every vertex in the other tree). There are \(\binom{n_1}{2}\) unique pairs of subtrees, thus the total number of “between” edges is:
$$\begin{aligned} \binom{n_1}{2} \cdot n_2^2 \end{aligned}$$(8)
The random spanning tree T has \(n_1 - 1\) “between” edges (since there are only \(n_1\) subtrees to be connected), thus the probability that \((u, v) \in T\) given that \(u \in V_i \wedge v \in V_j, i \ne j\) is:
$$\begin{aligned} P((u, v) \in T \mid u \in V_i \wedge v \in V_j) = \frac{n_1 - 1}{\binom{n_1}{2} \cdot n_2^2} = \frac{n_1 - 1}{\frac{n_1 (n_1 - 1)}{2} \cdot n_2^2} = \frac{2}{n_1 \cdot n_2^2} \end{aligned}$$(9)
We now construct the final probability by using these two cases. To do this, we first calculate the probability that two arbitrary vertices u and v are placed into the same or different partitions.

Let \(E_1\) be the event that u and v are placed into the same partition. Formally, \(u \in V_i \wedge v \in V_i\).

Let \(E_2\) be the event that u and v are placed into different partitions. Formally, \(u \in V_i \wedge v \in V_j, i \ne j\).
Clearly, \(P(E_2) = 1 - P(E_1)\). We first calculate \(P(E_1)\):
Without loss of generality, suppose u is randomly placed into \(V_i\). Since each partition will be of size \(n_2\), and there are n vertices total, the probability that v is also placed into \(V_i\) is \(\dfrac{n_2 - 1}{n - 1}\).
We now combine the two cases above with \(P(E_1)\) and \(P(E_2)\) to solve for the final probability:
$$\begin{aligned} P((u, v) \in T) = P(E_1) \cdot \frac{2}{n_2} + P(E_2) \cdot \frac{2}{n_1 \cdot n_2^2} = \frac{n_2 - 1}{n - 1} \cdot \frac{2}{n_2} + \frac{n_2 (n_1 - 1)}{n - 1} \cdot \frac{2}{n_1 \cdot n_2^2} = \frac{2 (n_1 n_2 - 1)}{n (n - 1)} = \frac{2}{n} \end{aligned}$$
\(\square\)
We now consider the case where the vertices V are randomly split into k groups of possibly unequal sizes, \(g_1\) through \(g_k\). That is, \(n = \sum_{i=1}^{k} g_i\).
Theorem 3
Let \(G = (V, E)\) be a clique, let \((u, v) \in E\) be any edge in G, and let T be a spanning tree our algorithm creates. If the vertices V are partitioned uniformly at random into k groups of possibly unequal size, then the probability \((u, v) \in T\) is: \(P((u, v) \in T) = \frac{2}{n}\)
Proof
The proof follows the same sequence as that for equal partitions.
When partitioning, there are two general “types” of edges:

1
Edges within a partition (called “within” edges). Formally: given a subset of vertices \(V_i \subseteq V\), an edge (u, v) is a within edge if and only if \(u \in V_i \wedge v \in V_i\).

2
Edges spanning between partitions (called “between” edges). Formally: given two subsets of vertices \(V_i \subseteq V\) and \(V_j \subseteq V\) with \(i \ne j\), an edge (u, v) is a between edge if and only if \(u \in V_i \wedge v \in V_j\).
We start by considering each case individually.

1
Within Edges. For any group i, whose vertex set \(V_i\) has size \(g_i\):
$$\begin{aligned} P((u, v) \in T \mid u \in V_i \wedge v \in V_i) = \frac{2}{g_i} \end{aligned}$$(13)
This is a corollary of Theorem 1.

2
Between Edges. Between any two groups i and j, there are \(g_i \cdot g_j\) edges. Since the group sizes \(g_1, \ldots , g_k\) are possibly unequal, we must multiply the sizes of each pair of groups and sum to count the total number of “between” edges:
$$\begin{aligned} \sum _{1\le i < j \le k} \left( g_i \cdot g_j\right) \end{aligned}$$(14)
By Vieta’s formulas, this sum equals the coefficient \(a_{k-2}\) of the polynomial given in (15).
$$\begin{aligned} f(x) = x^k + a_{k-1} x^{k-1} + a_{k-2} x^{k-2} + \cdots + a_1 x + a_0 = (x - g_1) \cdot (x - g_2) \cdot \ldots \cdot (x - g_k) \end{aligned}$$(15)
The random spanning tree T has \(k - 1\) “between” edges (since there are only k subtrees to be connected), thus the probability \((u, v) \in T\) given that (u, v) is a “between” edge is:
$$\begin{aligned} P((u, v) \in T \mid (u,v) \text { is a ``between'' edge}) = \frac{k - 1}{a_{k - 2}} \end{aligned}$$(16)
We now construct the final probability by using these two cases. To do this, we will first calculate the probability two arbitrary vertices u and v are placed into the same or different partitions.

Let \(E_1\) be the event that u and v are placed into the same group i. Formally, \(u \in V_i \wedge v \in V_i\).

Let \(E_2\) be the event that u and v are placed into different groups i and j. Formally, \(u \in V_i \wedge v \in V_j, i \ne j\).
We now calculate \(P(E_1)\) and \(P(E_2)\):
Without loss of generality, suppose u is randomly placed into group i. Since group i has size \(g_i\), and there are n vertices total, the probability that v is also placed into group i is \(\dfrac{g_i - 1}{n - 1}\). A similar argument is made for \(P(E_2)\).
We now combine the two cases above with \(P(E_1)\) and \(P(E_2)\) to solve for the final probability.
While the initial step is complicated, it follows directly from the previous work:

1
We enumerate over all pairwise groups, and need to consider two cases \(i = j\) and \(i \ne j\).

2
The first case, \(i=j\), is simply: \(P(E_1) \cdot P((u, v) \in T \mid E_1)\)

3
The second case, \(i \ne j\), is the same idea: \(P(E_2) \cdot P((u, v) \in T \mid E_2)\). However, when this is placed in the summation, we need to tweak \(P((u, v) \in T \mid E_2)\). This is because the solution found previously using Vieta’s formulas already accounts for the sum, yet we are starting from the sum again.^{Footnote 3} Therefore, (a) only 1 of the \(g_i \cdot g_j\) edges between groups i and j is needed, and (b) \(\dfrac{k - 1}{\binom{k}{2}} = \dfrac{2}{k}\) is the probability an edge will even be needed between groups i and j (i.e., the probability that Wilson’s algorithm chooses to connect the “supernodes” for groups i and j directly with an edge).
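Writing the combination out in full (our reconstruction from the two cases and the tweaks just described), the sum telescopes to the claimed probability:

```latex
% First term: i = j ("within" edges); second term: i < j ("between"
% edges, with the tweaked conditional probability described above).
\begin{aligned}
P((u, v) \in T)
  &= \sum_{i=1}^{k} \frac{g_i (g_i - 1)}{n (n - 1)} \cdot \frac{2}{g_i}
   + \sum_{1 \le i < j \le k} \frac{2 \, g_i g_j}{n (n - 1)}
     \cdot \frac{2}{k} \cdot \frac{1}{g_i g_j} \\
  &= \frac{2 (n - k)}{n (n - 1)}
   + \binom{k}{2} \cdot \frac{4}{k \, n (n - 1)}
   = \frac{2 (n - k) + 2 (k - 1)}{n (n - 1)}
   = \frac{2}{n}.
\end{aligned}
```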
\(\square\)
However, a simple counterexample shows that DCRST does not generate random spanning trees with uniform probability. Specifically, the “star” tree (Fig. 6) will never be generated by DCRST unless \(k = 1\) or \(k = n\) (when DCRST degenerates to Wilson’s algorithm). Thus, DCRST creates a random (but non-uniform) spanning tree of a clique such that edges are chosen with the same probability as in a uniform random spanning tree.
Statistical analysis
This raises the following question: if DCRST does not generate uniform random spanning trees, then is DCRST a suitable replacement for Wilson’s algorithm in DimeCost? To answer this, we compared two versions of DimeCost—one using Wilson’s algorithm, the other using DCRST—on three data sets used by Bourbour et al. In each experiment, we use the Mantel Test to compute the correlation coefficient between a pair of distance matrices (r), and compute 95% confidence intervals (CIs) for DimeCost using both Wilson’s algorithm and DCRST for spanning tree generation. We then examine whether the CIs contain r.
Data set 1: Comparison of distance norms
For this first data set, we randomly generated 25 (x, y) points. For each pair of points (i.e., \(\frac{25\times 24}{2} = 300\) pairs of points), we applied four distance metrics: the \(L_1\), \(L_2\), \(L_3\), and \(L_\infty\) norms; considering these pairwise gives six data sets.^{Footnote 4} The results are summarized in Table 1.
The CIs using DCRST differ slightly from those using Wilson’s algorithm, but still contain r from the Mantel Test.
Data set 2: Line graph
For this example, consider a road network represented by a line graph, modeling a long continuous stretch of an interstate highway where each edge represents a highway segment. In the first graph, the weights of all edges (representing travel distances) are equal to 1. In the second graph (representing travel time), the weights of all edges are again 1 except the last edge, which possesses an unusually high weight, 100. This could represent an accident in that segment of the highway causing times, but not distances, to drastically increase. Both graphs are illustrated for the case of \(n = 6\) in Fig. 7.
We then computed distance matrices for both metrics by computing the shortest path distance between each pair of vertices. Corresponding elements in the two matrices have identical values, except the last row and column, which represent edges to the last vertex. The results are summarized in Table 2.
As before, while the CIs using DCRST differ slightly from those computed using Wilson’s algorithm, they still contain the r value calculated by the Mantel Test. This test previously showed that simply choosing random edges from the complete graph does not suffice, as the CIs generated in this example with random edges do not contain the r value calculated by the Mantel Test (Bourbour et al. 2020).
Data set 3: Random
Lastly, we consider the case of two random distance metrics; i.e., the value returned by \(d_1(a, b)\) and \(d_1(b, a)\) is simply a random number (likewise for \(d_2\)). We expect these two distance metrics to be uncorrelated. The results are summarized in Table 3.
Again, the confidence intervals both contain the correlation coefficient calculated by the Mantel Test. These results suggest that DCRST is indeed a suitable replacement for Wilson’s algorithm in DimeCost.
Performance analysis
Implementation details
We implemented DCRST in C++ along with the OpenMP parallel library. The pseudocode is shown below in Fig. 8.
The function random-partition() was implemented by (1) initializing an array A with entries \(1 \ldots n\) and (2) shuffling A uniformly at random. This array now defines the partitions. If parts \(1 \ldots k\) have sizes \(p_1, \ldots , p_k\), then the first \(p_1\) elements of A are assigned to part 1, the next \(p_2\) elements of A are assigned to part 2, etc. Step (2) of random-partition() involves shuffling an array of size n—the best known serial algorithm for this is Fisher-Yates, which runs in O(n) time (Bacher et al. 2018).
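A minimal sketch of this partitioning step (our illustration; the C++ name `random_partition` and its signature are assumptions, and the Fisher-Yates loop here is the serial O(n) shuffle described above):

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Fill A with entries 1..n, Fisher-Yates shuffle it, then cut A into
// consecutive runs of the requested part sizes (which must sum to n).
std::vector<std::vector<int>> random_partition(int n,
                                               const std::vector<int>& sizes,
                                               unsigned seed) {
    std::vector<int> A(n);
    std::iota(A.begin(), A.end(), 1);          // entries 1..n
    std::mt19937 gen(seed);
    for (int i = n - 1; i > 0; --i) {          // Fisher-Yates: O(n) serial shuffle
        std::uniform_int_distribution<int> d(0, i);
        std::swap(A[i], A[d(gen)]);
    }
    std::vector<std::vector<int>> parts;
    int pos = 0;
    for (int s : sizes) {                      // first p1 entries -> part 1, etc.
        parts.emplace_back(A.begin() + pos, A.begin() + pos + s);
        pos += s;
    }
    return parts;
}
```

The serial shuffle is exactly the bottleneck discussed in the next subsection; MergeShuffle replaces this loop with a parallel permutation.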
Performance on a symmetric multiprocessor architecture
We compared DCRST against the serial implementation of Wilson’s algorithm on various combinations of k and n—in real-world settings, we recommend k be at least as large as the number of CPU cores. The function std::uniform_int_distribution was used to generate the random numbers required in wilsons-algorithm(). This code was executed on Colorado School of Mines’ “Isengard” server, which is a 48-core symmetric multiprocessor architecture with over 350 GB of main memory.
We initially implemented step (2) of random-partition() with a standard sequential shuffle. The speedups are shown in Fig. 9. As can be seen, the speedups are far below linear: even with 200 partitions, we fail to achieve even 3x speedup.
After performing a runtime analysis, we found that the algorithm spent almost 90% of total execution time on generating a random partition. Intuitively, this makes sense, as Wilson’s algorithm is O(n) on cliques, and a serial shuffle (which is a substep of making the partitions) is also O(n)—thus our speedup comes entirely from the lower-order terms hidden by the asymptotic notation.
To address this, we implemented step (2) of random-partition() with a parallel shuffling algorithm instead: MergeShuffle (Bacher et al. 2018), which was chosen because of the availability of an implementation that uses OpenMP. The authors of MergeShuffle suggest it is the current fastest parallel shuffling algorithm.
The speedups are shown in Fig. 10. While this results in an improvement over Wilson’s algorithm, observe that the speedup of DCRST remains sublinear w.r.t. the number of processors, with \(k = 200\) partitions achieving a little less than 4x speedup on the largest clique.
To further assess our algorithm, we simply removed the shuffle step from random-partition(). For timing purposes, this yields the equivalent of shuffling in 0 time. We called this variation NoShuffle, and reran our benchmarks on NoShuffle to compare its performance to the previous results. This is shown in Fig. 11.
As expected, NoShuffle performs significantly better: on some inputs, NoShuffle is more than 20X faster than the serial algorithm. This illustrates the potential for DCRST to improve the performance of generating random spanning trees, should a faster shuffle be found or if random partitions are precomputed.
The results are summarized in Fig. 12. This figure illustrates the speedup of the different variations of DCRST (std::shuffle, MergeShuffle, and NoShuffle) over serial across the different clique sizes, all at \(k = 200\) partitions. NoShuffle is by far the fastest, followed by MergeShuffle, with std::shuffle coming up last.
Conclusions and discussion
We have described DCRST, a parallel divide-and-conquer algorithm for creating random spanning trees on a clique. While DCRST does not result in uniform random spanning trees, we showed experimentally that the impact on the proposed network science application DimeCost is not significant. On a machine with 48 cores, DCRST achieves 4X speedup when using MergeShuffle, and 20X speedup when shuffling is not used. This points to the need for a faster parallel implementation of shuffling.
DimeCost used Wilson’s random-walk algorithm to create a uniform random spanning tree in expected O(n) time. In this paper, we similarly used Wilson’s algorithm to generate a forest of k spanning subtrees and to connect the spanning forest. We thank an anonymous reviewer for pointing us to an algorithm proposed by Aldous (Algorithm 2 from Aldous (1990)) that obtains a uniform random spanning tree in worst-case \(\Theta (n)\) time for a complete graph. This algorithm does not incur the higher variance associated with Wilson’s random walk. It instead uses n calls to a random-number generator, followed by a random shuffle. The need for a random shuffle at the end would continue to be a bottleneck in a parallel implementation.
Finally, from the perspective of its use in practice, we note that DCRST only needs the sizes of the partitions (in order), and returns the list of edges in the computed spanning tree. This means the entire complete graph does not need to be generated up front to run DCRST: only \(n - 1\) rather than \(\approx n^2/2\) edge-weight queries are needed to compute the correlation between distance metrics.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
Note: distance metrics are assumed to be symmetric.
These can be seen in Fig. 2: the “within” edges are drawn, and all “between” edges are not.
Vieta’s formula is used in the simplification of this sum.
The \(L_p\)-norm for \(p \ge 1\) of a vector \(\textbf{x}\) is a commonly used measure of “distance” in machine learning for clustering, and is defined by \(\Vert \textbf{x}\Vert _p = (|x_1|^p + |x_2|^p + \cdots + |x_n|^p)^\frac{1}{p}\). Note that \(L_\infty (\textbf{x}) = \max \{|x_1|, |x_2|, \ldots , |x_n| \}\) is the limit of the \(L_p\) norm as \(p \longrightarrow \infty\).
References
Aldous DJ (1990) The random walk construction of uniform spanning trees and uniform labelled trees. SIAM J Discrete Math. https://doi.org/10.1137/0403039
Anari N, Hu N, Saberi A, Schild A (2021) Sampling arborescences in parallel. In: Innovations in theoretical computer science
Bacher A, Bodini O, Hollender A, Lumbroso J (2018) Mergeshuffle: a very fast, parallel random permutation algorithm, vol 2113
Bourbour S, Mehta DP, Navidi WC (2020) Improved methods to compare distance metrics in networks using uniform random spanning trees (dimecost). Networks. https://doi.org/10.1002/net.21949
Broder A (1989) Generating random spanning trees. https://doi.org/10.1109/sfcs.1989.63516
Cooper-Ellis S, Pielou EC (1994) The interpretation of ecological data: a primer on classification and ordination. The Bryologist. https://doi.org/10.2307/3243925
Corliss JO, Sneath PHA, Sokal RR (1974) Numerical taxonomy: the principles and practice of numerical classification. Trans Am Microsc Soc. https://doi.org/10.2307/3225339
Dray S, Dufour AB (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22(4):1–20. https://doi.org/10.18637/jss.v022.i04
Kouri TM, Awale M, Slyby JK, Reymond JL, Mehta DP (2014) “Social” network of isomers based on bond count distance: algorithms. J Chem Inf Model. https://doi.org/10.1021/ci4005173
Lovász L (1993) Random walks on graphs: a survey. Combinatorics 2
Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Res 27:209–220
Ricaut FX, Auriol V, CramonTaubadel NV, Keyser C, Murail P, Ludes B, Crubézy E (2010) Comparison between morphological and genetic data to estimate biological relationship: the case of the Egyin Gol necropolis (Mongolia). Am J Phys Anthropol. https://doi.org/10.1002/ajpa.21322
Schneider JW, Borlund P (2007) Matrix comparison, part 2: measuring the resemblance between proximity measures or ordination results by use of the Mantel and Procrustes statistics. J Am Soc Inf Sci Technol. https://doi.org/10.1002/asi.20642
Smouse PE, Long JC, Sokal RR (1986) Multiple regression and correlation extensions of the mantel test of matrix correspondence. Syst Zool. https://doi.org/10.2307/2413122
Sokal RR, Rohlf FJ (1962) The comparison of dendrograms by objective methods. Taxon. https://doi.org/10.2307/1217208
Wilson DB (1996) Generating random spanning trees more quickly than the cover time. In: Proceedings of the twenty-eighth annual ACM symposium on theory of computing (STOC). https://doi.org/10.1145/237814.237880
Acknowledgements
We would like to thank the anonymous reviewer for suggesting an alternative algorithm to Wilson’s for future consideration.
Funding
This research was selffunded, no sources of funding were sought nor received.
Author information
Authors and Affiliations
Contributions
LH and DM conceived and developed the theory underlying the presented idea. LH contributed all of the computer code and conducted all of the experiments; LH wrote the manuscript with support from DM. LH and DM read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons AttributionNonCommercialNoDerivatives 4.0 International License, which permits any noncommercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/byncnd/4.0/.
About this article
Cite this article
Henke, L., Mehta, D. DCRST: a parallel algorithm for random spanning trees in network analytics. Appl Netw Sci 9, 45 (2024). https://doi.org/10.1007/s41109024006137