Correcting a Nonparametric Two-sample Graph Hypothesis Test for Graphs with Different Numbers of Vertices with Applications to Connectomics

Random graphs are statistical models that have many applications, ranging from neuroscience to social network analysis. Of particular interest in some applications is the problem of testing two random graphs for equality of generating distributions. Tang et al. (2017) propose a test for this setting. This test consists of embedding the graph into a low-dimensional space via the adjacency spectral embedding (ASE) and subsequently using a kernel two-sample test based on the maximum mean discrepancy. However, if the two graphs being compared have an unequal number of vertices, the test of Tang et al. (2017) may not be valid. We demonstrate the intuition behind this invalidity and propose a correction that makes any subsequent kernel- or distance-based test valid. Our method relies on sampling based on the asymptotic distribution for the ASE. We call these altered embeddings the corrected adjacency spectral embeddings (CASE). We also show that CASE remedies the exchangeability problem of the original test and demonstrate the validity and consistency of the test that uses CASE via a simulation study. Lastly, we apply our proposed test to the problem of determining equivalence of generating distributions in human connectomes extracted from diffusion magnetic resonance imaging (dMRI) at different scales.

or organizations, and edges representing the degree of communication between them (Wasserman and Faust 1994).
The first random graph model was proposed in 1959 by E. N. Gilbert. In his short paper, he considered a graph in which the presence of an edge between any two vertices was a Bernoulli random variable with common probability p (Gilbert 1959). Almost concurrently, Erdös and Rényi developed a similar random graph model with a fixed number of edges that are randomly allocated in the graph. They also provided a detailed analysis of the probabilities of the emergence of certain types of subgraphs within the graphs developed both by them and by Gilbert (Erdös and Rényi 1960). Nowadays, the graphs in which edges arise independently and with common probability p are known as Erdös-Rényi (ER) graphs.
Latent position random graph models constitute a diverse class of random graph models that are much more flexible than the ER model. A vertex in a latent position graph is associated with an element in a latent space $\mathcal{X}$, and the probability of an edge between any two vertices is given by a link function $g : \mathcal{X} \times \mathcal{X} \to [0, 1]$ (Hoff et al. 2002). The model draws inspiration from social network analysis, in which the members are thought of as vertices, and the latent positions are differing "interests". Latent position random graphs are a submodel of the independent edge graphs, that is, graphs in which the edges are independent, conditioned on a matrix of probabilities. The theory of latent position graphs is also closely related to that of graphons (Lovász 2012); for discussion on this relationship, see, for example, Lei (2018) or Rubin-Delanchy (2020).
One example of latent position graphs relevant to this discussion is the random dot product graph (RDPG). An RDPG is a latent position graph in which the latent space is an appropriately constrained subset of the Euclidean space $\mathbb{R}^d$, and the link function is the inner product of the d-dimensional latent positions (Athreya et al. 2018). Despite their relative simplicity, suitably high-dimensional RDPGs can provide useful approximations of general latent position and independent edge graphs, as long as their matrix of probabilities is positive semidefinite (Tang et al. 2013).
The well-known stochastic blockmodel (SBM), in which each vertex belongs to one of K blocks, with connection probabilities determined solely by block membership (Holland et al. 1983), can be represented as an RDPG for which all vertices in a given block have the same latent position. Furthermore, common extensions of SBMs, namely degree-corrected SBMs (Karrer and Newman 2011), mixed membership SBMs (Airoldi et al. 2008), and degree-corrected mixed membership SBMs (Jin et al. 2017), can also be framed as RDPGs. There is, however, a caveat, similar to the one for approximating independent edge graphs with RDPGs: only SBM graphs with a positive semidefinite block probability matrix can be formulated in the context of the RDPG. Rubin-Delanchy et al. (2022) present a generalization of RDPGs, called the generalized random dot product graph (GRDPG), that drops the positive semidefiniteness requirement in both cases. Although the generalization of many estimation and inference procedures from RDPGs to GRDPGs is straightforward, their theory, particularly that of latent distribution testing, is not yet as developed as that of RDPGs. Thus, we limit the scope of this work to RDPGs.
The problem of whether two graphs are "similar" in some appropriate sense arises naturally in many fields. For example, two different brain graphs may be tested for the similarity of their connectivity structure (Chung et al. 2022), or user behavior may be compared between different social media platforms. Testing for similarity also has applications in more intricate network analysis techniques, such as hierarchical community detection (Lyzinski et al. 2017; Li et al. 2020). Despite the multitude of applications, network comparison is a relatively nascent field, and comparatively few techniques currently exist (Lyzinski et al. 2017). There have been several tests assuming the random graphs have the same set of nodes, such as Tang et al. (2017); Levin et al. (2017); Ghoshdastidar et al. (2017); Li and Li (2018); Levin and Levin (2019), and Arroyo et al. (2021). Other approaches designed for fixed models and related problems include Rukhin and Priebe (2011); Asta and Shalizi (2015); Lei (2016); Bickel and Sarkar (2016); Maugis et al. (2020); Chen and Lei (2018); Gangrade et al. (2019) and Fan et al. (2022), to name a few. In Ghoshdastidar et al. (2020), the authors formulate the two-sample testing problem for graphs of different orders more generally.
Of particular interest is Tang et al. (2017), in which the authors propose a nonparametric test for the equality of the generating distributions for a pair of random dot product graphs. This test does not require the graphs to have the same set of nodes or be of the same order. It relies on embedding the adjacency matrices of the graphs into Euclidean space, followed by a kernel two-sample test of Gretton et al. (2012) performed on these embeddings. The exact finite-sample distribution of the test statistic is unknown, but it can be estimated using a permutation test, or approximated using the $\chi^2$-distribution. Unfortunately, although a theorem states that in the limit, even for graphs of differing orders, the statistic using the two embeddings converges to the statistic obtained using the true but unknown latent positions, the test is not always valid for finite graphs of differing orders.
The invalidity arises from the fact that the approximate finite-sample variance of the adjacency spectral embedding depends on the number of vertices (Athreya et al. 2016). Hence, the distributions of the estimates of the latent positions for the two graphs might not be the same, even if the true distributions of the latent positions are equivalent. The test of Gretton et al. (2012) is sensitive to the differences induced by this incongruity and as a result may reject more often than the intended significance level. In this work, we present a method for modifying the embeddings before computing the test statistic. Using this correction makes the test for the equivalence of latent distributions valid even when the two graphs have an unequal number of vertices.
The remainder of the paper is structured as follows. In Sect. 2, we review the random dot product graph, and discuss its relationship with Erdös-Rényi, stochastic blockmodel and other random graph models. We also discuss results associated with the adjacency spectral embedding of an RDPG, such as consistency for the true latent positions and asymptotic normality, and we review the original nonparametric two-sample hypothesis test for the equality of the latent distributions. There we also briefly discuss generalizing the ASE procedure to weighted and/or directed graphs. Then, in Sect. 3, we give an intuition as to why this test increases in size as the orders of the two graphs diverge from each other. We also present our approach to correcting the adjacency spectral embeddings in a way that makes them exchangeable under the null hypothesis of the test for the equivalence of the latent distributions. We demonstrate the validity and consistency of the test that uses the corrected adjacency spectral embeddings across a variety of settings in Sect. 4. In Sect. 5, we demonstrate that failing to correct for the difference in distributions can lead to significant inferential consequences in real world applications, such as the setting of brain graphs obtained from diffusion magnetic resonance imaging (dMRI). Furthermore, we show that the test is able to meaningfully differentiate between scans within the same subject and different subjects. We conclude and discuss our findings in Sect. 6.

Notation
We use the terminology "order" for the number of vertices in a graph. We denote scalars by lowercase letters, vectors by bold lowercase letters and matrices by bold capital letters. For example, c is a scalar, x is a vector, and H is a matrix. For any matrix $H$, we let $H_{ij}$ denote its (i, j)-th entry. For ease of notation, we also denote $H_i$ to be the column vector obtained by transposing the i-th row of $H$. Formally, $H_i = (H_{i\cdot})^T$. In the case where we need to consider a sequence of matrices, we will denote such a sequence by $H^{(n)}$, where n is the index of the sequence. Whether a particular scalar, vector or matrix is a constant or a random variable will be stated explicitly or be apparent from context. Unbold capital letters denote sets or probability distributions. For example, F is a probability distribution. The exception to this rule is K, which is always used to denote the number of blocks in a stochastic blockmodel.

Models
We begin by defining random dot product graphs.
Definition 1 (d-dimensional random dot product graph (RDPG)) Let $F$ be a distribution on a set $\mathcal{X} \subset \mathbb{R}^d$ such that $\langle x, x' \rangle \in [0, 1]$ for all $x, x' \in \mathcal{X}$. We say that $(X, A) \sim \mathrm{RDPG}(F, n)$ is an instance of a random dot product graph (RDPG) if $X = [X_1, \ldots, X_n]^T$ with $X_1, \ldots, X_n \overset{\text{iid}}{\sim} F$ and $A \in \{0, 1\}^{n \times n}$ is a symmetric hollow matrix whose entries in the upper triangle are conditionally independent given $X$ and satisfy
$$\mathbb{P}[A_{ij} = 1 \mid X] = \langle X_i, X_j \rangle, \qquad i < j.$$
We refer to $X_1, \ldots, X_n$ as the latent positions of the corresponding vertices.
Remark 2 Nonidentifiability is an intrinsic property of random dot product graphs. For any matrix $X$ and any orthogonal matrix $W$, the inner product between any rows i, j of $X$ is identical to that between the rows i, j of $XW$. Hence, for any probability distribution $F$ on $\mathcal{X}$ and orthogonal operator $W$, the adjacency matrices $A$ and $B$, generated according to $(X, A) \sim \mathrm{RDPG}(F, n)$ and $(Y, B) \sim \mathrm{RDPG}(F \circ W, n)$, respectively, are identically distributed. Here, the notation $F \circ W$ denotes the distribution obtained by applying the orthogonal transformation $W$ to a latent position drawn from $F$.
Constraining all latent positions to a single value leads to an Erdös-Rényi (ER) random graph.
Definition 2 (Erdös-Rényi graphs (ER)) We say that a graph $(X, A) \sim \mathrm{RDPG}(F, n)$ is an Erdös-Rényi (ER) graph with an edge probability $p^2$ if $F$ is a point mass at $p$. In this case, we write $A \sim \mathrm{ER}(n, p^2)$.
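To make Definitions 1 and 2 concrete, the following is a minimal sketch of sampling an adjacency matrix from given latent positions. The helper name `sample_rdpg`, the seed, and the graph order are illustrative assumptions, not part of the original work.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Sample a symmetric, hollow adjacency matrix given latent positions X (n x d)."""
    P = X @ X.T                                    # P[i, j] = <X_i, X_j>
    n = P.shape[0]
    # Draw independent Bernoulli edges in the strict upper triangle only.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)           # symmetrize; diagonal stays zero

rng = np.random.default_rng(0)                     # arbitrary seed
n, p = 200, 0.8
X = np.full((n, 1), p)                             # point mass at p, i.e. ER(n, p^2)
A = sample_rdpg(X, rng)
density = A.sum() / (n * (n - 1))                  # empirical edge probability
```

With all latent positions equal to p = 0.8, the empirical density should be close to the edge probability p^2 = 0.64.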
Another random graph model that can be framed in the context of random dot product graphs is the stochastic blockmodel (SBM) (Holland et al. 1983). In the SBM, the vertex set is thought of as being partitioned into K groups, called blocks, and the probability of an edge between two vertices is determined by their block memberships. The partitioning, or assignment, of the vertices is usually itself random and mutually independent. Formally, we can define SBMs in terms of the RDPG model as follows.
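Definition 3 [(Positive semidefinite) stochastic blockmodel (SBM)] Denote $\delta_z$ as the Dirac delta measure at $z$. We say that a graph $(X, A) \sim \mathrm{RDPG}(F, n)$ is a (positive semidefinite) stochastic blockmodel with K blocks if the distribution $F$ is a mixture of K point masses,
$$F = \sum_{k=1}^{K} \pi_k \delta_{Z_k},$$
where $\pi \in (0, 1)^K$ satisfies $\sum_{k=1}^{K} \pi_k = 1$, and the distinct latent positions are given by $Z = [Z_1, \ldots, Z_K]^T \in \mathbb{R}^{K \times d}$. In this case we also write $A \sim \mathrm{SBM}(n, \pi, P)$, where $P := Z Z^T \in \mathbb{R}^{K \times K}$. The matrix $P$ is often referred to as the block probability matrix.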
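The relationship between a block probability matrix and the latent positions can be illustrated numerically. The sketch below recovers one valid set of latent positions from a positive semidefinite matrix via its eigendecomposition; the specific matrix, block weights, and seed are hypothetical values chosen for illustration, and any orthogonal rotation of the result would serve equally well.

```python
import numpy as np

# Hypothetical positive semidefinite block probability matrix (K = 2).
P = np.array([[0.5, 0.2],
              [0.2, 0.4]])

evals, evecs = np.linalg.eigh(P)
assert evals.min() >= 0                 # PSD, so an RDPG representation exists
Z = evecs * np.sqrt(evals)              # rows of Z are the K distinct latent positions

rng = np.random.default_rng(1)          # arbitrary seed
pi = np.array([0.6, 0.4])               # hypothetical block membership probabilities
blocks = rng.choice(2, size=100, p=pi)  # random block assignment of 100 vertices
X = Z[blocks]                           # latent position of every vertex
```

Every entry of `X @ X.T` then equals the entry of `P` indexed by the two vertices' blocks.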
Remark 3 We note that almost everywhere below we use the terms SBM and positive semidefinite SBM interchangeably, as only positive semidefinite block probability matrices can be represented as a product of a matrix of latent positions with its own transpose, and thus only they can be defined in terms of the RDPG model. We emphasize, however, that the work of Rubin-Delanchy et al. (2022) on the generalized random dot product graph (GRDPG) extends the construction of the RDPG via the indefinite inner product to encompass indefinite SBMs and the generalizations thereof.
There are two common generalizations of the stochastic blockmodel: the degree-corrected stochastic blockmodel (Karrer and Newman 2011) and the mixed-membership stochastic blockmodel (Airoldi et al. 2008). We present these two models below. The presentations are different from the ones many readers may be familiar with because we present them under the RDPG framework. These definitions coincide with the ones in the literature, as covered in Lyzinski et al. (2014); Rubin-Delanchy et al. (2017, 2022).
The degree-corrected stochastic blockmodel allows for vertices within each block to have different expected degrees, which makes it more flexible than the standard SBM and a popular choice to model network data (Karrer and Newman 2011; Lyzinski et al. 2014).
Definition 4 (Degree-corrected SBM) We say that a graph $(X, A) \sim \mathrm{RDPG}(F, n)$ is a degree-corrected SBM (DCSBM) with K blocks if there exist a distribution $F_m$, which is a mixture of K point masses $Z_1, \ldots, Z_K$, as in Definition 3, and a distribution $F_c$ on [0, 1], such that for all $X_i$, there exist $Y_i \sim F_m$ and $c_i \sim F_c$ such that
$$X_i = c_i Y_i.$$
That is, any latent position of a vertex in a DCSBM graph can be decomposed into a point $Y_i$, chosen among one of the K shared points $Z_1, \ldots, Z_K$, and a scalar $c_i$. Note that there is no requirement on $Y_i$ and $c_i$ to be independent from each other. In other words, the distributions of the degree corrections can depend on the block assignments. In essence, the DCSBM generalizes the SBM from an RDPG with a distribution of latent positions over a finite number of points to an RDPG with a distribution of latent positions over a finite number of rays from the origin. Of course, not every point on these rays needs to be in the support of this distribution. Restricting $F_c$ to a point mass at unity recovers the regular SBM. See the left and central panels of Fig. 1 for a visualization comparing the latent distributions of the SBM and the DCSBM.
On the other hand, the mixed membership SBM offers more flexibility in block memberships by allowing each vertex to be in a mixture of blocks (Airoldi et al. 2008).
Definition 5 (Mixed-membership SBM) Denote $\Delta^{d}$ to be the space of the (d + 1)-dimensional column vectors starting at the origin and terminating in the d-dimensional unit simplex. We say that a graph $(X, A) \sim \mathrm{RDPG}(F, n)$ is a mixed-membership SBM (MMSBM) with K blocks if there exist K points $Z_1, \ldots, Z_K$, as in Definition 3, and a distribution $F_m$ on $\Delta^{K-1}$ such that for all $X_i$ there exists $w_i \sim F_m$ with
$$X_i = \sum_{k=1}^{K} w_{ik} Z_k.$$
That is, any latent position of a vertex in an MMSBM is a convex combination of K shared points, $Z_1, \ldots, Z_K$. The MMSBM generalizes the SBM from an RDPG with latent positions coming from a finite-dimensional mixture of point masses to an RDPG with latent positions having a distribution over a convex hull formed by a finite number of points. See the left and right panels of Fig. 1 for a visualization thereof. Once again, the whole convex hull need not be in the support of this distribution. If one constrains $F_m$ to only have support on a finite set of vectors with 1 in a single entry and 0 in all others, $F_m$ collapses to a distribution of point masses and the model agrees exactly with the SBM.
Remark 4 For graphs with one-dimensional latent positions, any RDPG model is both a DCSBM with a single block and an MMSBM with two blocks. To see this, note that the latent positions all take values in [0, 1] (or equivalently [−1, 0]). This region can be thought of as either a single line segment starting from the origin or as a one-dimensional convex hull between 0 and 1.
Remark 5 Jin et al. (2017) introduced a model that has both the degree heterogeneity of the DCSBM and the flexible memberships of the MMSBM. This model can also be formulated in terms of the RDPG. See, for example, Definition 6 of Agterberg et al. (2020).
We reiterate that the SBM with K blocks is a submodel of both the K-block DCSBM and the K-block MMSBM. Furthermore, both the K-block DCSBM and the K-block MMSBM are submodels of an RDPG with latent positions in at most K dimensions. Hence, any test for the equality of the latent distributions that is consistent in the RDPG setting will be able to meaningfully distinguish between two graphs generated from two different model subspaces, or between graphs from the same model subspace but with different parameters; for example, between an MMSBM and an SBM, or between two SBMs with different block probability matrices.
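The geometric picture of the two generalizations (rays from the origin for the DCSBM, a convex hull for the MMSBM) can be sketched by sampling latent positions directly. The shared points `Z`, the uniform degree corrections, and the Dirichlet membership weights below are illustrative assumptions; the models themselves do not prescribe these particular distributions.

```python
import numpy as np

rng = np.random.default_rng(2)                  # arbitrary seed
Z = np.array([[0.8, 0.1],
              [0.1, 0.7]])                      # hypothetical K = 2 shared points
n = 500

# DCSBM: a shared point scaled by a degree correction in [0, 1] (points on K rays).
blocks = rng.choice(2, size=n)
c = rng.uniform(0.5, 1.0, size=n)               # assumed correction distribution
X_dcsbm = c[:, None] * Z[blocks]

# MMSBM: a convex combination of the shared points (points in their convex hull).
w = rng.dirichlet([1.0, 1.0], size=n)           # each row lies on the unit simplex
X_mmsbm = w @ Z
```

Setting every `c` to 1, or restricting `w` to the vertices of the simplex, reduces both constructions to the plain SBM latent positions `Z[blocks]`.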

Adjacency spectral embedding
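Inference on random dot product graphs relies on having good estimates of the latent positions of the vertices. One way to estimate the latent positions is to use the adjacency spectral embedding of the graph, defined as follows.
Definition 6 (Adjacency spectral embedding (ASE)) Let $A$ be the adjacency matrix of a graph on n vertices. Let $S_A$ be the $d \times d$ diagonal matrix of the d largest eigenvalues of $|A| = (A^T A)^{1/2}$, and let $U_A$ be the $n \times d$ matrix of the corresponding orthonormal eigenvectors. The adjacency spectral embedding of $A$ into $\mathbb{R}^d$ is $\hat{X} = U_A S_A^{1/2}$.
It has been proven in Sussman et al. (2012, 2014) and Lyzinski et al. (2014) that the adjacency spectral embedding provides a consistent estimate of the true latent positions in random dot product graphs. The key to this result is tight concentrations, in both Frobenius and $2 \to \infty$ norms, of the ASE about the true latent positions. Athreya et al. (2016) show that for a d-dimensional RDPG with i.i.d. latent positions, the ASE is not only consistent, but also asymptotically normal, in the sense that there exists a sequence of $d \times d$ real orthogonal matrices $W^{(n)}$ such that for any row index i, $\sqrt{n}\,(\hat{X}^{(n)} W^{(n)} - X^{(n)})_i$ converges in distribution to a (possibly infinite) mixture of multivariate normals.
Theorem 2.1 Let $(X^{(n)}, A^{(n)}) \sim \mathrm{RDPG}(F, n)$ be a sequence of latent positions and associated adjacency matrices of a d-dimensional RDPG according to a distribution $F$ in an appropriately constrained region of $\mathbb{R}^d$. Also let $\hat{X}^{(n)}$ be the adjacency spectral embedding of $A^{(n)}$ into $\mathbb{R}^d$. Let $\Phi(z, \Sigma)$ denote the cumulative distribution function for the multivariate normal, with mean zero and covariance matrix $\Sigma$, evaluated at $z$. Then there exists a sequence of orthogonal $d \times d$ matrices $(W^{(n)})_{n=1}^{\infty}$ such that for each component i and any $z \in \mathbb{R}^d$,
$$\lim_{n \to \infty} \mathbb{P}\left[\sqrt{n}\,(\hat{X}^{(n)} W^{(n)} - X^{(n)})_i \le z\right] = \int \Phi(z, \Sigma(x))\, dF(x),$$
where
$$\Sigma(x) = \Delta^{-1}\, \mathbb{E}\left[\left(x^T X_1 - (x^T X_1)^2\right) X_1 X_1^T\right] \Delta^{-1}$$
and $\Delta = \mathbb{E}[X_1 X_1^T]$ is the second moment matrix.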
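The adjacency spectral embedding, as it is commonly defined (the top-d eigenvectors scaled by the square roots of the corresponding eigenvalue magnitudes), can be sketched in a few lines. The function name `ase` and the simulation parameters are assumptions made for illustration; this is not the implementation used for the experiments in this paper.

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding of a symmetric matrix A into R^d."""
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(evals))[::-1][:d]      # d largest-magnitude eigenvalues
    return evecs[:, top] * np.sqrt(np.abs(evals[top]))

# Illustrative check on an Erdos-Renyi graph with latent position p = 0.8.
rng = np.random.default_rng(3)                     # arbitrary seed
n, p = 300, 0.8
upper = np.triu(rng.random((n, n)) < p**2, k=1)
A = (upper | upper.T).astype(float)

X_hat = ase(A, d=1)
X_hat *= np.sign(np.median(X_hat))                 # resolve the sign ambiguity
```

In this one-dimensional case the rows of `X_hat` should scatter around p = 0.8, with a spread that shrinks as n grows.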
An intuitive way to restate this result is by identifying that each row $\hat{X}_i$ of the ASE $\hat{X}$ is approximately normal around the true but unknown realization of the latent position of the vertex:
$$W^T \hat{X}_i \mid X_i \overset{\text{approx.}}{\sim} \mathcal{N}\left(X_i, \frac{\Sigma(X_i)}{n}\right),$$
where $W$ is an orthogonal matrix present due to the inherent orthogonal nonidentifiability of the RDPG.
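In our work, we will need to estimate the covariance matrix $\Sigma(X_i)$. The plug-in principle (Bickel and Doksum 2006) states that one acceptable method of estimating $\Sigma(X_i)$ is to use the analogous empirical moments:
$$\hat{\Sigma}(\hat{X}_i) = \hat{\Delta}^{-1} \left[\frac{1}{n} \sum_{j=1}^{n} \left(\hat{X}_i^T \hat{X}_j - (\hat{X}_i^T \hat{X}_j)^2\right) \hat{X}_j \hat{X}_j^T\right] \hat{\Delta}^{-1}, \qquad (2)$$
where
$$\hat{\Delta} = \frac{1}{n} \sum_{j=1}^{n} \hat{X}_j \hat{X}_j^T.$$
When we are presented with two or more RDPGs that have the same distribution for their latent positions, either by assumption or by prior knowledge, we can leverage this fact and calculate the moments over all graphs at the same time. Conceptually, this is similar to using pooled variance in classical one-dimensional two-sample inference.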
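A vectorized sketch of the plug-in covariance estimate, replacing the expectations in the limiting covariance with empirical averages over the embedded points, is given below. The function name `plugin_covariances` is a hypothetical helper, and the non-pooled (single graph) version is shown.

```python
import numpy as np

def plugin_covariances(X_hat):
    """Plug-in estimate of the limiting covariance at every row of X_hat.

    Returns an array of shape (n, d, d); entry i is the estimate at X_hat[i].
    """
    n, d = X_hat.shape
    delta_inv = np.linalg.inv(X_hat.T @ X_hat / n)   # inverse empirical second moment
    G = X_hat @ X_hat.T                              # G[i, j] = <X_hat_i, X_hat_j>
    weights = G - G**2                               # x^T X_j - (x^T X_j)^2 at x = X_hat_i
    # middle[i] = (1/n) * sum_j weights[i, j] * X_hat_j X_hat_j^T
    middle = np.einsum("ij,jk,jl->ikl", weights, X_hat, X_hat) / n
    return delta_inv @ middle @ delta_inv            # broadcasts over the first axis

# Sanity check: a point mass at p gives 1 - p^2 at every row.
sigma_er = plugin_covariances(np.full((50, 1), 0.8))
```

For a point mass at p = 0.8 every estimate equals 1 − p² = 0.36, matching the Erdös-Rényi special case of the central limit theorem.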
A corollary of the previous result arises when $(X, A) \sim \mathrm{RDPG}(F, n)$ is a K-block stochastic blockmodel. Then, we can condition on the event that $X_i$ is assigned to a block $k \in \{1, 2, \ldots, K\}$ to show that the conditional distribution of $\sqrt{n}\,(\hat{X}^{(n)} W^{(n)} - X^{(n)})_i$ converges to a multivariate normal.
Corollary 2.2 Assume the setting and notation of Theorem 2.1. Further, assume that $(X, A) \sim \mathrm{RDPG}(F, n)$ is a positive semidefinite stochastic blockmodel, that is, $F$ is a mixture of K point masses $Z_1, \ldots, Z_K$, as per Definition 3. Then there exists a sequence of orthogonal matrices $W^{(n)}$ such that for all $z \in \mathbb{R}^d$ and for any fixed index i,
$$\mathbb{P}\left[\sqrt{n}\,(\hat{X}^{(n)} W^{(n)} - X^{(n)})_i \le z \,\middle|\, X_i = Z_k\right] \to \Phi(z, \Sigma(Z_k)) \quad \text{as } n \to \infty.$$
Consequently, the unconditional limiting distribution in this setting is a mixture of K multivariate normals (Athreya et al. 2016).
Remark 6 As a special case of Corollary 2.2, we note that if $A \sim \mathrm{ER}(n, p^2)$, then the adjacency spectral embedding of $A$, $\hat{X}$, satisfies
$$\sqrt{n}\,(\hat{X}_i - p) \xrightarrow{d} \mathcal{N}(0, 1 - p^2).$$

The directed, the weighted, and the unknown dimension ASE
Although the theory of the ASE and the nonparametric test is predominantly developed for the setting of undirected unweighted graphs with an assumed known latent dimension, real world datasets often require us to relax those assumptions. This will be the case in Sect. 5, which presents an illustrative example using the dMRI dataset. Graphs in this dataset are weighted and directed, and have an unknown latent dimension, which requires a modification to the procedure and interpretation described previously. These modifications to the statistical procedures involving the ASE of RDPGs that are weighted and/or directed are described in more detail in Sect. 6.3 of Athreya et al. (2018), where the authors apply a clustering algorithm to a dataset of the larval Drosophila mushroom body connectome, which is a directed graph on four neuron types.
The presence of weights changes the interpretation of the embeddings, as the inner product no longer represents a probability of an edge, but does not require modifications to any of the algorithmic procedures. We do, however, need to define a special adjacency spectral embedding of a directed graph, as the adjacency matrix is no longer symmetric and thus is not guaranteed to have an orthogonal eigendecomposition.
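Definition 7 (Adjacency spectral embedding of a directed graph) Let $A$ be the adjacency matrix of a directed graph on n vertices, and let $U S V^T$ be its singular value decomposition truncated to the d largest singular values. The adjacency spectral embedding of $A$ is the $n \times 2d$ matrix $[\,U S^{1/2} \mid V S^{1/2}\,]$.
The scaled left-singular vectors $U S^{1/2}$ can be thought of as the "out-vector" representation of the directed graph, and similarly, $V S^{1/2}$ can be interpreted as the "in-vectors" (Athreya et al. 2018). The subsequent inference generally does not differ in any way after obtaining the ASE of the directed graph.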
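The out-vector and in-vector representation described above can be sketched with a truncated singular value decomposition. The function name `directed_ase` and the rank-one toy matrix are illustrative assumptions.

```python
import numpy as np

def directed_ase(A, d):
    """Embed a (possibly asymmetric) matrix A into R^{2d} via the truncated SVD.

    The first d columns are the scaled left singular vectors ("out-vectors"),
    the last d columns are the scaled right singular vectors ("in-vectors").
    """
    U, s, Vt = np.linalg.svd(A)
    out_vectors = U[:, :d] * np.sqrt(s[:d])
    in_vectors = Vt[:d].T * np.sqrt(s[:d])
    return np.hstack([out_vectors, in_vectors])

# Toy example: a rank-one asymmetric matrix is reproduced exactly by d = 1.
A = np.outer([1.0, 2.0, 3.0], [0.5, 0.25, 0.1])
emb = directed_ase(A, d=1)
reconstruction = emb[:, :1] @ emb[:, 1:].T         # out-vectors times in-vectors
```

The inner product of the out-vector of vertex i with the in-vector of vertex j approximates the (i, j) entry of the matrix, which is why the two halves are kept separate.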
The "optimal" dimension d (or 2d in the directed case) to embed into is often unknown and must be estimated. In general, identifying the "best" method is impossible, as the bias-variance tradeoff demonstrates that, for small n, subsequent inference may be optimized by choosing a dimension smaller than the true signal dimension. See Jain et al. (2000) for a clear and concise illustration of this phenomenon. For a brief discussion of methods applicable to this problem in the graph embedding setting, see Sect. 6.3 of Athreya et al. (2018). In our work, we elect to use the automated profile likelihood-based singular value thresholding method of Zhu and Ghodsi (2006) when the true dimension is unknown (i.e. Sect. 5). In the cases when the optimal dimensions of the two graphs being compared are not equal, we pick the larger of the two. For our simulation study in Sect. 4 we assume that the true dimension is known a priori.
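The idea behind profile likelihood-based thresholding can be sketched as follows: for every candidate cut-off q, the sorted singular values are split into two groups modeled as Gaussians with separate means and a common variance, and the q with the largest likelihood is selected. The sketch below is a simplified version under stated assumptions (it uses the maximum likelihood form of the pooled variance, and the function name `profile_likelihood_elbow` is hypothetical); it is not the exact procedure used in this paper.

```python
import numpy as np

def profile_likelihood_elbow(values):
    """Return the number of leading values kept by a two-group Gaussian profile likelihood."""
    vals = np.sort(np.asarray(values, dtype=float))[::-1]   # descending order
    p = len(vals)
    best_q, best_ll = 1, -np.inf
    for q in range(1, p):
        g1, g2 = vals[:q], vals[q:]
        resid = np.concatenate([g1 - g1.mean(), g2 - g2.mean()])
        rss = resid @ resid
        var = max(rss / p, 1e-12)                  # pooled variance (MLE form, a simplification)
        ll = -0.5 * p * np.log(2 * np.pi * var) - rss / (2 * var)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# Three large "signal" values followed by five small "noise" values.
elbow = profile_likelihood_elbow([10.0, 9.5, 9.8, 1.0, 0.9, 1.1, 1.2, 0.8])
```

In this toy input the gap after the third value is selected, so the embedding dimension would be chosen as 3.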

Nonparametric latent distribution test
→ 0, and $|T_{n,m}(X, YW)| \to 0$ as $n, m \to \infty$, where $W$ is any orthogonal matrix such that $F = G \circ W$. In addition, under the alternative hypothesis $F \neq G \circ W$ for any orthogonal matrix $W \in \mathbb{R}^{d \times d}$ that is dependent on $F$ and $G$ but independent of m and n, we have $|T_{n,m}(\hat{X}, \hat{Y}) - T_{n,m}(X, YW)| \to 0$ as $n, m \to \infty$.
Simply said, the authors propose using a test statistic that is a kernel-based function of the latent position estimates obtained from the ASE and show that it converges to the test statistic obtained using the true but unknown latent positions under both null and alternative hypotheses.
Together with the work of Gretton et al. (2012) on the use of maximum mean discrepancy for testing the equivalence of distributions, this result offers an asymptotically valid and consistent test. Formally, this means that for two arbitrary but fixed distributions $F$ and $G$, $T_{n,m}(\hat{X}, \hat{Y}) \to 0$ as $n, m \to \infty$ if and only if $F = G$ (up to $W$). Such a result requires appropriate conditions on the kernel function $\kappa$, which are satisfied when $\kappa$ is a Gaussian kernel, $\kappa_g$, defined as
$$\kappa_g(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right),$$
with any fixed bandwidth $\sigma^2$ (Lyzinski et al. 2017).
The intuition behind the maximum mean discrepancy two-sample test is the following. Under some conditions, the population difference between the average values of the kernel within and between two distributions is zero if and only if the two distributions are the same. Hence, using a sample test statistic that is consistent for this difference and rejecting for large values thereof leads to a consistent test.
No closed form of the finite-sample distribution of this test statistic is known, for graphs or in the general setting, so it is not immediately clear how to calculate the critical value given a significance level α. The authors of Tang et al. (2017) propose using permutation resampling in order to approximate the distribution of the test statistic under the null. The permutation version of the test is computationally expensive, but practically feasible. Alternatives to the permutation test include using a $\chi^2$ asymptotic approximation (Gretton et al. 2012).
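The kernel two-sample test with permutation resampling can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the biased (V-statistic) estimate of the squared maximum mean discrepancy rather than the unbiased statistic, a Gaussian kernel of the form exp(−‖x − y‖² / (2σ²)), and hypothetical function names; it operates on generic point clouds rather than on graph embeddings.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=0.5):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd_statistic(X, Y, sigma=0.5):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy."""
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2 * gaussian_kernel(X, Y, sigma).mean())

def permutation_pvalue(X, Y, n_perm=200, sigma=0.5, seed=0):
    """Approximate the null distribution by reshuffling the pooled sample."""
    rng = np.random.default_rng(seed)
    observed = mmd_statistic(X, Y, sigma)
    pooled = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        count += mmd_statistic(pooled[idx[:n]], pooled[idx[n:]], sigma) >= observed
    return (count + 1) / (n_perm + 1)

# Two clearly different one-dimensional samples (illustrative data).
rng = np.random.default_rng(4)
X = rng.normal(0.0, 0.1, size=(60, 1))
Y = rng.normal(1.0, 0.1, size=(40, 1))
p_value = permutation_pvalue(X, Y)
```

The permutation step is only justified when the pooled points are exchangeable under the null hypothesis, which is exactly the property that fails for raw embeddings of graphs with different orders.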

Source of the nonvalidity
The limiting result in the previous section should, however, be taken with caution for graphs of finite order. Even though the ASE estimates converge to the true latent positions, and the test statistic using the estimates converges to the one using the true values, for any finite n and m there is still variability associated with these estimates as described by Theorem 2.1.
When the graphs are of the same order, the variability introduced by using the estimates instead of the true latent positions is the same for the two graphs. Hence, the two embeddings have equal distributions under the null hypothesis, up to orthogonal nonidentifiability. This leads to a valid and consistent test, as demonstrated experimentally in both Tang et al. (2017) and our Sect. 4. However, recall that the approximate finite-sample distribution of the ASEs has variance that depends on the number of vertices. Suppose that we have a graph of order n, with adjacency matrix $A$ generated according to $(X, A) \sim \mathrm{RDPG}(F, n)$, and a graph of order m, with adjacency matrix $B$ generated according to $(Y, B) \sim \mathrm{RDPG}(G, m)$. From the central limit result stated above, the distributions of the ASEs of the two graphs, conditioned on the true latent positions, are
$$W_X^T \hat{X}_i \mid X_i \overset{\text{approx.}}{\sim} \mathcal{N}\left(X_i, \frac{\Sigma(X_i)}{n}\right), \qquad W_Y^T \hat{Y}_j \mid Y_j \overset{\text{approx.}}{\sim} \mathcal{N}\left(Y_j, \frac{\Sigma(Y_j)}{m}\right), \qquad (3)$$
where $W_X$ and $W_Y$ are orthogonal matrices present due to the model-based orthogonal nonidentifiability. The unconditioned distributions of the ASEs are not equal whenever $m \neq n$, even if $X_i$ and $Y_j$ have the same distribution, i.e. even if $F = G$. Thus, as long as the graphs are not of the exact same order, the collection $\hat{X}_1, \ldots, \hat{X}_n, \hat{Y}_1, \ldots, \hat{Y}_m$ is not exchangeable under the null hypothesis, even up to orthogonal nonidentifiability. This places the distributions of the ASEs of two graphs of different order in the alternative of the kernel-based test of Gretton et al. (2012), despite the fact that the distributions of the true latent positions would fall under the null. In many cases, the subsequent kernel-based test is sensitive enough to pick up these differences in distributions, which makes the size of the test grow as the sample sizes diverge from each other.
Consider the following simple example. Suppose that the graphs have distributions $A \sim \mathrm{ER}(n, p^2)$ and $B \sim \mathrm{ER}(m, p^2)$. Then, the distributions of the ASEs become
$$\hat{X}_i \overset{\text{approx.}}{\sim} \mathcal{N}\left(p, \frac{1 - p^2}{n}\right), \qquad \hat{Y}_j \overset{\text{approx.}}{\sim} \mathcal{N}\left(p, \frac{1 - p^2}{m}\right),$$
up to an orthogonal nonidentifiability, which in a single dimension is just a sign flip.
A visualization of this specific case with parameters n = 500, m = 50, and p = 0.8 is provided in Fig. 2. The ASEs have substantially different distributions from each other, despite the identical distributions of the true latent positions. As will be demonstrated in Sect. 4, in this case the nonparametric test developed by Gretton et al. (2012) and employed by Tang et al. (2017) rejects more often than the significance level α, as it should.
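As a worked illustration of the size of this effect, the approximate standard deviations implied by the Erdös-Rényi central limit result (limiting variance 1 − p²) for n = 500, m = 50, and p = 0.8 are:

```latex
\Sigma(p) = 1 - p^2 = 1 - 0.64 = 0.36, \\
\sqrt{\Sigma(p)/n} = \sqrt{0.36/500} \approx 0.027, \qquad
\sqrt{\Sigma(p)/m} = \sqrt{0.36/50} \approx 0.085, \\
\frac{\sqrt{\Sigma(p)/m}}{\sqrt{\Sigma(p)/n}} = \sqrt{n/m} = \sqrt{10} \approx 3.16 .
```

That is, the embedding of the smaller graph is roughly three times more dispersed around p than that of the larger graph, even though both graphs share the same latent position.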
Indeed, the test of Gretton et al. (2012) cannot be used directly on the adjacency spectral embeddings of two graphs of different order to test for the equivalence of the distributions of the latent positions, as it is not valid.

Corrected adjacency spectral embeddings
We propose modifying the adjacency spectral embedding of one of the graphs by injecting appropriately scaled Gaussian noise. The noise inflates the variances of the ASE of the larger graph to approximately the same value as those of the smaller graph and makes the estimated latent positions exchangeable under the null hypothesis.
Definition 8 (Corrected Adjacency Spectral Embedding) Consider two d-dimensional random dot product graphs $(X, A) \sim \mathrm{RDPG}(F, n)$ and $(Y, B) \sim \mathrm{RDPG}(G, m)$. Without loss of generality, assume that $n > m$. For every row in the adjacency spectral embedding of the larger graph, $\hat{X}_i$, consider estimating its variance using the plug-in estimator from Eq. 2, and then sampling a point $\epsilon_{\hat{X}_i} \sim \mathcal{N}\left(0, \left(\frac{1}{m} - \frac{1}{n}\right) \hat{\Sigma}(\hat{X}_i)\right)$. For every row in the adjacency spectral embedding of the smaller graph, $\hat{Y}_j$, define $\epsilon_{\hat{Y}_j} := 0$. Let $\tilde{X}_i = \hat{X}_i + \epsilon_{\hat{X}_i}$ for all i and $\tilde{Y}_j = \hat{Y}_j + \epsilon_{\hat{Y}_j}$ for all j. We denote the matrices whose rows consist of these new vectors $\tilde{X}$ and $\tilde{Y}$, respectively, and we call them the corrected adjacency spectral embeddings (CASE). The corrected adjacency spectral embeddings of two graphs of the same order are equal to the standard adjacency spectral embeddings.
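The correction can be sketched as follows: estimate the limiting covariance at each embedded point of the larger graph with the plug-in estimator, then add Gaussian noise with covariance scaled by (1/m − 1/n). The helper names `plugin_covariances` and `case` are hypothetical, and the demonstration replaces actual graph embeddings with draws from their approximate normal distributions for an Erdös-Rényi pair; this is a minimal sketch, not the graspologic implementation.

```python
import numpy as np

def plugin_covariances(X_hat):
    """Plug-in estimate of the limiting covariance at every row; shape (n, d, d)."""
    n = len(X_hat)
    delta_inv = np.linalg.inv(X_hat.T @ X_hat / n)
    G = X_hat @ X_hat.T
    middle = np.einsum("ij,jk,jl->ikl", G - G**2, X_hat, X_hat) / n
    return delta_inv @ middle @ delta_inv

def case(X_hat, Y_hat, rng):
    """Return corrected embeddings; noise is added to the larger graph only."""
    n, m = len(X_hat), len(Y_hat)
    if n == m:                                   # equal orders: nothing to correct
        return X_hat.copy(), Y_hat.copy()
    if n < m:                                    # make the first argument the larger graph
        Y_tilde, X_tilde = case(Y_hat, X_hat, rng)
        return X_tilde, Y_tilde
    covs = plugin_covariances(X_hat) * (1 / m - 1 / n)
    d = X_hat.shape[1]
    noise = np.stack([rng.multivariate_normal(np.zeros(d), c) for c in covs])
    return X_hat + noise, Y_hat.copy()

# Stand-ins for the ASEs of ER graphs with n = 500, m = 50, p = 0.8 (assumed normal draws).
rng = np.random.default_rng(5)
n, m, p = 500, 50, 0.8
X_hat = rng.normal(p, np.sqrt((1 - p**2) / n), size=(n, 1))
Y_hat = rng.normal(p, np.sqrt((1 - p**2) / m), size=(m, 1))
X_tilde, Y_tilde = case(X_hat, Y_hat, rng)
```

After the correction, the spread of `X_tilde` should be close to sqrt(0.36 / 50) ≈ 0.085, the approximate spread of the smaller graph's embedding, while `Y_tilde` is left unchanged.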
The motivation for the preceding definition is as follows. Recall that we have assumed without loss of generality that $n > m$. Conditioned on the true latent positions, the rows of the corrected adjacency spectral embeddings have distributions that are given by
$$W_X^T \tilde{X}_i \mid X_i \overset{\text{approx.}}{\sim} \mathcal{N}\left(X_i, \frac{\Sigma(X_i)}{m}\right), \qquad W_Y^T \tilde{Y}_j \mid Y_j \overset{\text{approx.}}{\sim} \mathcal{N}\left(Y_j, \frac{\Sigma(Y_j)}{m}\right),$$
since the variance of the added noise, $\left(\frac{1}{m} - \frac{1}{n}\right)\hat{\Sigma}$, combines with the original variance $\frac{1}{n}\Sigma$ to give approximately $\frac{1}{m}\Sigma$. Unlike Eq. 3, these distributions are approximately the same, up to orthogonal transformations $W_X$ and $W_Y$. This is true regardless of the ratio of graph orders, as long as the true latent positions $X_i$, $Y_j$ have the same distribution and $\hat{\Sigma}$ is a good estimator of $\Sigma$. As an illustrative example, we revisit the ER illustration from the previous section. A visualization of the theoretical and simulated CASEs of two ER graphs with vastly different orders is presented in Fig. 3. Both the theoretical and the simulated corrected embeddings have the same distribution. Hence, the corrected adjacency spectral embeddings can be used as inputs to the latent distribution test of Tang et al. (2017). We note that due to the exact equivalence of the maximum mean discrepancy test of Gretton et al. (2012), the energy distance two-sample test (Székely and Rizzo 2013), the Hilbert-Schmidt independence criterion (Gretton et al. 2007), and the distance correlation (Székely et al. 2007; Székely and Rizzo 2014) test for independence, any of these four can be used as a subsequent test interchangeably (Shen and Vogelstein 2021; Panda et al. 2021). In the case of the latter two of the four, one first has to concatenate the two embeddings, define an auxiliary label vector, and then perform the independence test. For more on this procedure, sometimes called the k-sample transform, see Shen and Vogelstein (2021).
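The concatenation step that turns a two-sample problem into an independence problem can be sketched in a few lines. The function name `k_sample_transform` and the toy inputs are assumptions for illustration; the independence test that would be applied to the output is not shown.

```python
import numpy as np

def k_sample_transform(X_tilde, Y_tilde):
    """Stack two samples and build the auxiliary label vector for an independence test."""
    data = np.vstack([X_tilde, Y_tilde])
    labels = np.concatenate([np.zeros(len(X_tilde)), np.ones(len(Y_tilde))])
    return data, labels

# Toy embeddings of two graphs with 5 and 3 vertices in two dimensions.
data, labels = k_sample_transform(np.zeros((5, 2)), np.ones((3, 2)))
```

Testing whether `data` is independent of `labels` is then a way of asking whether the two samples share a distribution.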
It may also be possible to use other independence tests framed as two-sample tests to test for the equivalence of the latent distributions after the embeddings have been obtained and corrected. Such tests include, but are not limited to, RV (Escoufier 1973; Robert and Escoufier 1976), which is the multivariate generalization of the Pearson correlation (Pearson 1895), canonical correlation analysis (Hardoon et al. 2004), and multiscale graph correlation (Lee et al. 2019; Shen et al. 2020). The power of the multiscale graph correlation against some alternatives has been studied in the graph setting in Chung et al. (2022). However, to our knowledge, no theoretical guarantees have been established in the graph setting for any of these tests.

Simulation study
We conduct a simulation study comparing the latent distribution tests that use regular and corrected ASEs.We use graphs generated from the ER, SBM and RDPG models in our experiments.However, we always estimate the variances of the ASE using the generic plug-in estimator for the RDPG model, provided in Eq. 2. That is, we do not use the knowledge that the latent distribution is truly a point-mass, or a mixture thereof, anywhere in our experiments.
The implementation of the latent distribution test used in this simulation study is incorporated into the graspologic (Chung et al. 2019) Python package, both for ASE and CASE. This implementation exploits the exact equivalence with independence tests described above. Code that is compatible with the latest version of graspologic and can be used to reproduce all of the simulations is available at https://github.com/alyakin314/correcting-nonpar.
We set the number of permutations used to generate the null distribution to 200. For a task like this, it is quite common to use a Gaussian kernel with a bandwidth selected using the median heuristic (Garreau et al. 2017), which in practice might be more sensitive than most arbitrarily chosen constant bandwidths. However, since the theoretical result holds only for a fixed kernel, we chose to use a Gaussian kernel with a fixed bandwidth σ = 0.5 throughout our experiments.
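A permutation test of this kind can be sketched in a few lines. The snippet below is an illustrative stand-in, not the graspologic implementation: it assumes the kernel parametrization exp(−‖x − y‖^2 / (2σ^2)) and, for brevity, uses the biased (V-statistic) estimate of the squared maximum mean discrepancy. All function names are hypothetical, and the two samples are simulated.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.5):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_biased(x, y, sigma=0.5):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

def permutation_pvalue(x, y, n_perm=200, sigma=0.5, seed=0):
    """Permutation p value: reshuffle the pooled rows and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, sigma)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], sigma)
        count += int(stat >= observed)
    # Add-one correction keeps the p value strictly positive.
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(60, 1))
y = rng.normal(2.0, 1.0, size=(60, 1))  # clearly shifted sample
p_value = permutation_pvalue(x, y)
print(p_value)
```

With 200 permutations the smallest attainable p value is 1/201, which is why p values from such a test lie on a discrete grid.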

Erdös-Rényi graphs: validity and consistency
We generate pairs of graphs from the null hypothesis of the test: A ∼ ER(n, p^2) and B ∼ ER(m, p^2) with m = cn. We consider different ratios of the graph orders c ∈ {1, 2, 5, 7, 10}, and different smaller graph orders n ∈ {50, 100, 200, 300, 400, 500}. We use the latent position p = 0.8, which corresponds to Erdös-Rényi graphs with an edge probability of 0.64. We always embed the graphs into one dimension, and we overcome orthogonal nonidentifiability by flipping the signs of the ASE of a graph if its median is negative. We use 1000 Monte Carlo replications for each combination of c and n tested.
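One replication of this setup, up to the embedding step, can be sketched as follows. This is a minimal NumPy illustration with hypothetical helper names rather than the code used in the study; it samples loopless undirected ER graphs, embeds them into one dimension using the eigenvalue of largest magnitude, and resolves the sign ambiguity with the median rule described above.

```python
import numpy as np

def sample_er(n, edge_prob, rng):
    """Sample a symmetric, loopless ER adjacency matrix."""
    upper = np.triu(rng.random((n, n)) < edge_prob, k=1)
    return (upper | upper.T).astype(float)

def ase_1d(adj):
    """One-dimensional ASE with the sign fixed so that the median is nonnegative."""
    vals, vecs = np.linalg.eigh(adj)
    top = np.argmax(np.abs(vals))            # eigenvalue of largest magnitude
    emb = vecs[:, top] * np.sqrt(np.abs(vals[top]))
    if np.median(emb) < 0:
        emb = -emb
    return emb

rng = np.random.default_rng(0)
p, n, c = 0.8, 200, 2
a = sample_er(n, p**2, rng)
b = sample_er(c * n, p**2, rng)

x_hat, y_hat = ase_1d(a), ase_1d(b)
print(x_hat.mean(), y_hat.mean())  # both should be close to p = 0.8
print(x_hat.std(), y_hat.std())    # the larger graph has the tighter embedding
```

The two embeddings are centered at the same value but have different spreads, which is the source of the invalidity that the correction addresses.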
We set α to 0.05 and report the sizes of the tests in Fig. 4. The size of the test that uses the standard ASE grows as a function of c, rendering it invalid for graphs of different sizes. The size of the test that uses the CASE remains below 0.05 across all choices of c and n considered. In general, the size of a permutation test should be exactly α. However, due to the intricate dependence behavior of the graph spectral embeddings (Athreya et al. 2016; Tang et al. 2022), the test ends up being conservative. The extent to which the test is conservative depends on the model from which the graphs were generated, and thus cannot be easily corrected. The scope of this work is limited to correcting the invalidity phenomenon and not the conservatism of this test.
We also study the behavior of the test under the alternative hypothesis in order to assess its power. We use the alternative hypothesis A ∼ ER(n, p^2) and B ∼ ER(m, q^2), with p = 0.8, q = 0.79, and m = cn for various ratios c. We again consider the graph order ratios c ∈ {1, 2, 5, 7, 10} and smaller graph orders n ∈ {50, 100, 200, 300, 400, 500}. For c = 1, CASE coincides exactly with the standard ASE, so the testing procedure is the same as the original test of Tang et al. (2017). For all other choices of c, the original test is not valid, and is thus omitted from this study.
The results of this study are presented in Fig. 5. The power of the test goes to one as the sample size increases for all choices of c used, which suggests that the test that uses CASE is still consistent. We note that for any given n, the power of the test grows as c grows; this behavior is expected, since the number of vertices in one graph is held constant and the number of vertices in the other increases, so the total number of observations grows.

Stochastic block model graphs: higher dimensions
We repeat the validity and consistency experiments, but use 3-block SBMs instead of ER graphs. In all simulations we use the vector of prior probabilities π = [0.4, 0.3, 0.3]^T. To estimate size, we use graphs A ∼ SBM(n, π, P) and B ∼ SBM(m, π, P), where the block-probability matrix P = ZZ^T is obtained from the matrix of latent positions Z. The matrix Z is parametrized by spherical coordinates with r = 0.9, θ = [0, 0.2, 0.4, 0.5]^T, and ω = [0.00, 0.10, 0.05, 0.05]^T (the fourth coordinate will become relevant for the evaluation of power). Exactly as in the one-dimensional case, we constrain m = cn and consider the graph order ratios c ∈ {1, 2, 5, 7, 10} and smaller graph orders n ∈ {50, 100, 200, 300, 400, 500}. We always embed into the true dimension d = 3. We overcome orthogonal nonidentifiability by aligning the medians of the embeddings to be in the same quadrant by flipping all of the signs on one of them if they do not match. The size of the tests at α = 0.05 is presented in Fig. 6. Similarly to the one-dimensional setting, the size of the test that uses the standard ASE grows as a function of c, whereas the size of the test that uses CASE is unaffected.
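The median-based alignment described above can be sketched as follows. The snippet reads the rule as a per-coordinate sign flip, which is an assumption about the procedure; the helper name is hypothetical and the arrays are simulated stand-ins for two three-dimensional embeddings.

```python
import numpy as np

def align_by_median_signs(x_hat, y_hat):
    """Flip each column of y_hat whose median has the opposite sign to that of x_hat."""
    sx = np.sign(np.median(x_hat, axis=0))
    sy = np.sign(np.median(y_hat, axis=0))
    flips = np.where(sx * sy < 0, -1.0, 1.0)
    return y_hat * flips

rng = np.random.default_rng(0)
x_hat = rng.normal(loc=[0.5, 0.3, -0.4], scale=0.05, size=(100, 3))
# Same configuration, but with the last two coordinates reflected.
y_hat = rng.normal(loc=[0.5, -0.3, 0.4], scale=0.05, size=(150, 3))

y_aligned = align_by_median_signs(x_hat, y_hat)
print(np.median(x_hat, axis=0))
print(np.median(y_aligned, axis=0))
```

This heuristic only resolves reflections along the coordinate axes. It does not address general orthogonal transformations, for which a different alignment step would be needed.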
To estimate power and demonstrate consistency in higher dimensions we use a pair of graphs A and B, generated from SBM(n, π, P) and SBM(m, π, P′), respectively, where P is as defined above, and
Also, observe that F_z is nothing more than the latent distribution of a two-block SBM in a single dimension with a block-probability matrix whereas F_x and F_y can be thought of as latent distributions of either DCSBMs or MMSBMs, as per Remark 4. Thinking of them as MMSBMs with Z = [0.6, 0.3]^T, the parameter a can be viewed as a mixing coefficient: F_x has a lot of mixing, F_y has some mixing, and F_z has two components completely separated.
First, we consider graphs A and B generated from (X, A) ∼ RDPG(F, n) and (Y, B) ∼ RDPG(F, m), with m = cn. This setting is in the null hypothesis of the latent distribution test. Unlike the previous experiment settings, we set c to a single value of 10, and instead vary the distributions of the latent positions. We consider F ∈ {F_x, F_y, F_z}. The number of vertices of the smaller graph, n, is once again varied over {50, 100, 200, 300, 400, 500}. We generate 1000 pairs of graphs for each of the possible settings, and use both a test that uses ASE and a test that uses CASE. As before, we overcome orthogonal nonidentifiability by aligning the medians of the embeddings via flipping signs.
Results of this simulation are presented in Fig. 9. Observe that the test that uses ASE is not valid for any of the three distributions of the latent positions, which is especially clear and increases when the smaller graph has less mixing. Our conjecture is that the increase in power with the correction can happen if the distribution of the latent positions of the smaller graph has smaller variance, as happens in the cases where the smaller graph has more mixing. In other words, the difference in variance due to the inherent differences in latent distributions is partially compensated by the difference in variance due to estimation, which leads to a less powerful test if the correction is not used. Thus, using the uncorrected version of the test can lead both to incorrect inference under the null hypothesis and to less sensitive inference under some alternatives.

Real world application
We demonstrate an application of this testing procedure to a real world dataset of human connectomes. A connectome, also known as a brain graph (Chung et al. 2021), represents the brain as a network with neurons (or collections thereof) as vertices, and synapses (or structural connections) as edges. For this demonstration, the raw data is collected by diffusion magnetic resonance imaging (dMRI), which can represent the structural connectivity within the brain (Yang et al. 2019). This example is included predominantly as an illustration of the applicability of the test to this setting and of the consistency of its results with natural intuition. It should not be treated as an imaging study to draw conclusions about the dataset. The macro-scale connectomes are estimated by NeuroData's MRI to graphs (NDMG) pipeline (Kiar et al. 2018), which is designed to produce robust and biologically plausible connectomes across studies, individuals, and scans. The vertices of the graph represent regions of interest identified by spatial proximity, and the edges of the graph represent the connection between regions via tensor-based fiber streamlines. Specifically, there is an edge for a pair of regions if and only if there is a streamline passing between them. For more information on the procedure that generates the brain graphs, we refer the readers to Kiar et al. (2018). The data used in this study is the same as that used by Yang et al. (2019).
Graphs in this dataset are weighted, directed, and have an unknown dimension of the latent distribution. Thus, modifications to the procedure described in Sect. 2.3 are required to obtain the ASE of the graphs. The correction of ASE to CASE and the subsequent test are performed without further modifications. In addition, we employ the median heuristic (Garreau et al. 2017) to determine the bandwidth of the kernel. All of those modifications are implemented in the graspologic (Chung et al. 2019) Python package and are, in fact, used by default whenever one uses a latent distribution test on graphs that are directed, weighted, and/or have unspecified latent dimension.
There are 57 subjects in this dataset, each of which has 2 different dMRI scans. Furthermore, each of the scans was converted to a 'large' and a 'small' graph using the aforementioned pipeline. The number of vertices in the large graphs varies between 730 and 1194, whereas the number of vertices in the small graphs varies between 493 and 814. We will refer to these large and small graphs as different scales. In this work, all comparisons, whether within or between subjects, take two graphs of different scales, one small and one large.
We first use both the test that uses ASE and the test that uses CASE to compare the graphs within the same subject, within the same scan, but between the two scales. There are 114 total possible comparisons. Paired differences in p values obtained by the test that uses CASE and the test that uses ASE are presented in Fig. 11. Using the one-sided Wilcoxon signed-rank test (Wilcoxon 1945) on those pairs of p values, we obtain a p value < 10^−7, indicating that the corrected test produces significantly larger p values, and thus rejects less readily, than the uncorrected test.
We can furthermore consider decision-theoretic consequences by setting the significance level α to two commonly used values, 0.01 and 0.05. In the case of α = 0.01, both tests reject the null in 24 cases, neither rejects in 69 cases, only ASE does in 16 cases, and only CASE does in just 5 cases (see the color coding of Fig. 11, but note that some data points overlap). Using the two-sided Fisher's exact test, we obtain a statistic (odds ratio) of 20.7 with a p value < 10^−8. Alternatively, setting α = 0.05, we obtain a contingency table of both: 89, neither: 21, ASE only: 4, CASE only: 0, leading to a two-sided Fisher's exact test p value < 10^−18.
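The two contingency tables above can be checked with a short, dependency-free sketch. The function below is a hypothetical helper, not part of any package: it computes the sample odds ratio and a two-sided Fisher's exact p value by summing the hypergeometric probabilities of all tables that are no more likely than the observed one. The arrangement of the counts into rows (ASE rejects or not) and columns (CASE rejects or not) is an assumption made here for illustration.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Return (odds ratio, two-sided p value) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at most as likely as the observed one.
    p_value = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, p_value

# Rows: ASE rejects / ASE does not; columns: CASE rejects / CASE does not.
table_001 = [[24, 16], [5, 69]]  # alpha = 0.01
table_005 = [[89, 4], [0, 21]]   # alpha = 0.05

odds_001, p_001 = fisher_exact_two_sided(table_001)
odds_005, p_005 = fisher_exact_two_sided(table_005)
print(odds_001, p_001)
print(odds_005, p_005)
```

For the α = 0.01 table the odds ratio is 24 × 69 / (16 × 5) = 20.7, matching the reported statistic. For the α = 0.05 table one off-diagonal cell is zero, so the sample odds ratio is infinite and only the p value is informative.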
Thus, the test that uses CASE picks up the differences statistically less often, both when using raw p values and when using binary decisions obtained by comparing p values with significance levels α ∈ {0.01, 0.05}. In Sect. 4.3 we demonstrated that using the uncorrected test can lead both to an invalid test under the null, and to a less sensitive test under some alternatives. Thus, using a correction does not always imply having larger p values. In this case, however, it does, which aligns with our natural intuition that a correct test should reject less often, as graphs obtained from the same scan but at different scales should be somewhat similar to each other.
Fig. 11 Comparison of the difference in p values obtained using CASE and ASE in the setting of brain graphs of the same subject and scan but different scales. Color coding according to decision at α = 0.01. Note that some data points might overlap exactly due to the permutation test providing a p value from a discrete set
Next, we use the corrected test to compare the graphs between the scans and between subjects. There is a total of 114 possible comparisons in the setting of different scans of the same subject, as there are two comparisons per subject (larger scale scan 1 to smaller scale scan 2 and larger scale scan 2 to smaller scale scan 1) and 57 subjects in total. For the case of different subjects, there are (57 × 2) × (56 × 2) = 12,768 total comparisons (each of the 57 subjects has 2 scans at the larger scale, each of which is compared to each of the 2 scans at the smaller scale of every other subject).
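The comparison counts above can be verified by direct enumeration. The sketch below uses hypothetical (subject, scan) keys in place of the actual graphs and pairs every large-scale graph with every small-scale graph.

```python
from itertools import product

n_subjects, n_scans = 57, 2

# Each graph is identified by a (subject, scan) key at a given scale.
large = [(s, k) for s in range(n_subjects) for k in range(n_scans)]
small = list(large)  # the small-scale graphs share the same keys

pairs = list(product(large, small))  # (large graph, small graph)
same_scan = [(a, b) for a, b in pairs if a == b]
same_subject_diff_scan = [(a, b) for a, b in pairs if a[0] == b[0] and a[1] != b[1]]
different_subjects = [(a, b) for a, b in pairs if a[0] != b[0]]

print(len(same_scan), len(same_subject_diff_scan), len(different_subjects))
# 114 114 12768
```

The three settings partition all 114 × 114 = 12,996 possible large-to-small pairs.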
We plot the histograms and the kernel density estimates of the distribution of p values, stratified by setting, in Fig. 12. Using the one-sided (>) Mann-Whitney U test (Mann and Whitney 1947), we obtain a p value of 0.020 when comparing the distribution of p values of the same subject within the scan to the distribution of p values for the same subject between the scans, a p value of 0.003 when comparing the distribution of p values of the same subject between the scans to the distribution of p values between different subjects, and, lastly, a p value < 10^−7 when comparing the distribution of p values for the same subject within the scan to the distribution of p values in the setting of different subjects.
To summarize, the p values within the same subject and same scan are larger than those within the same subject but different scans, which are themselves larger than those between different subjects. This aligns with the natural intuition that the test should reject more often for different scans than for the same scan, and more often for different subjects than for the same subject.

Discussion
In this work we demonstrated that the latent distribution test proposed by Tang et al. (2017) degrades in validity as the numbers of vertices in two graphs diverge from each other. This phenomenon does not contradict the results of the original paper, as it occurs when the test is used on two graphs of finite size, whereas the scope of the original paper is limited to the asymptotic case. We presented an intuitive example that demonstrates that the invalidity occurs because a pair of adjacency spectral embeddings for graphs with different numbers of vertices falls under the alternative hypothesis of the subsequent test. We also proposed a procedure to modify the embeddings in a way that makes them exchangeable under the null hypothesis. This leads to a testing procedure that is both valid and consistent, as has been demonstrated experimentally. The code for the testing procedure that uses CASE is incorporated into the graspologic (formerly GraSPy; Chung et al. 2019) Python package, alongside the original unmodified test. We strongly recommend CASE, as opposed to ASE, for nonparametric two-sample graph hypothesis testing when the graphs have differing numbers of vertices. However, we note that this procedure is nondeterministic, as it requires sampling additive noise.
Our work can be extended by developing limit theory for the corrected adjacency spectral embeddings and the test statistics that use them. It is also likely that the approach of modifying the embeddings can be extended to tests that use the Laplacian spectral embedding (see Athreya et al. (2018) for associated RDPG theory) or to models that are more general than RDPGs, such as Generalized Random Dot Product Graphs (Rubin-Delanchy et al. 2022) or other latent position models.
In general, two-sample latent distribution hypothesis testing is also closely related to the problem of testing goodness-of-fit of the model (Tang et al. 2017). To the best of our knowledge, no such test exists for random dot product graphs. We hope that the work presented in this paper may facilitate this investigation.

Fig. 1 Visualization of the valid latent positions of an arbitrary 2-dimensional SBM with K = 4 (left), valid latent positions of a DCSBM with the same Z (center) and valid latent positions of an MMSBM with the same Z (right). All three are examples of RDPGs

Definition 6 (Adjacency spectral embedding (ASE)) Let A have eigendecomposition $A = U \Lambda U^\top + U_\perp \Lambda_\perp U_\perp^\top$, where U and Λ consist of the top d eigenvectors and eigenvalues (arranged by decreasing magnitude), respectively, and U_⊥ and Λ_⊥ consist of the bottom n − d eigenvectors and eigenvalues, respectively. The adjacency spectral embedding of A into R^d is the n × d matrix $\hat{X} = U |\Lambda|^{1/2}$, where the operator |·| takes the entrywise absolute value.

Definition 7 (Adjacency spectral embedding (ASE) of a directed graph) Let d ≥ 1 and let A be an adjacency matrix of a directed graph with n vertices. Let A have singular value decomposition $A = U S V^\top + U_\perp S_\perp V_\perp^\top$, where S is a d × d diagonal matrix consisting of the d largest singular values and U and V are the associated matrices of left and right singular vectors. The adjacency spectral embedding of a directed graph A into R^{2d} is the n × 2d matrix $\hat{X} = [U S^{1/2} \mid V S^{1/2}]$.
Tang et al. (2017) present the convergence result of the test statistic in the test for the equivalence of the latent distributions of two RDPGs. One of their main theorems is presented below.
Theorem 2.3 Let (X, A) ∼ RDPG(F, n) and (Y, B) ∼ RDPG(G, m) be d-dimensional random dot product graphs. Assume that the distributions of latent positions F and G are such that the second moment matrices E[X_1 X_1^T] and E[Y_1 Y_1^T] each have d distinct nonzero eigenvalues. Consider the hypothesis test of the null hypothesis F = G ∘ W, for some orthogonal transformation W, against the alternative that no such W exists. Denote by X̂ = {X̂_1, ..., X̂_n} and Ŷ = {Ŷ_1, ..., Ŷ_m} the adjacency spectral embeddings of A and B, respectively. Recall that a radial basis kernel κ(·, ·) is any kernel such that κ(Wx, Wy) = κ(x, y) for all x, y and orthogonal transformations W. Define the test statistic where κ is some radial basis kernel. Suppose that m, n → ∞ and m/(m + n) → ρ ∈ (0, 1). Then under the null hypothesis of F = G ∘ W,

Fig. 2 A visualization of the ASEs for the Erdös-Rényi graphs with the same edge probability, but vastly different orders. Top: theoretical densities of the ASEs; bottom: the histogram of the ASEs of two generated graphs, with kernel density estimates

Fig. 3 A visualization of the CASEs for the Erdös-Rényi graphs with the same edge probability, but vastly different orders. Top: theoretical densities of the corrected ASEs; bottom: the histogram of the corrected ASEs of two generated graphs, with kernel density estimates

Fig. 4 Size of the nonparametric latent distribution permutation tests that use the standard ASE (left) and the CASE (right). Graphs are A ∼ ER(n, 0.8^2) and B ∼ ER(cn, 0.8^2). Error bars represent 95% confidence intervals

Fig. 5 Power of the nonparametric latent distribution permutation test that uses CASE against the alternative with graphs generated from A ∼ ER(n, 0.8^2) and B ∼ ER(cn, 0.79^2). Error bars represent 95% confidence intervals

Fig. 6 Size of the nonparametric latent distribution permutation tests that use the regular ASE (left) and the CASE (right). Graphs are A ∼ SBM(n, π, P) and B ∼ SBM(cn, π, P). Error bars represent 95% confidence intervals

Fig. 12 Comparison of the difference in p values obtained using CASE for different settings of the brain graph data. Left panel: histograms of p values with a bin size of 0.01, normalized to add to 1. Right panel: kernel density estimates of the distribution of p values, normalized to integrate to 1