Using a Bayesian approach to reconstruct graph statistics after edge sampling

Often, due to prohibitively large size or to limits to data collecting APIs, it is not possible to work with a complete network dataset and sampling is required. A type of sampling which is consistent with Twitter API restrictions is uniform edge sampling. In this paper, we propose a methodology for the recovery of two fundamental network properties from an edge-sampled network: the degree distribution and the triangle count (we estimate the totals for the network and the counts associated with each edge). We use a Bayesian approach and show a range of methods for constructing a prior which does not require assumptions about the original network. Our approach is tested on two synthetic and three real datasets with diverse sizes, degree distributions, degree-degree correlations and triangle count distributions.


Introduction
Analysis of complex networks remains a growing area and network data sets are more and more commonly available. However, some data sets are only a sample of the entire network. For very large networks, it may not be possible to work with complete data because of its size. Additionally, APIs can rate-limit the number of queries, meaning that not all nodes and edges are present [19]. A common example is the Twitter stream API which returns a 1% random sample of all tweets in real-time [26]. In the usual Twitter graph formulation where edges constitute 1:1 replies or retweets, this corresponds to uniform edge sampling of the full Twitter reply/retweet graph. Inferring even simple characteristics such as the true number of nodes or edges from a sample can be nontrivial [15,7].
In this work we present a methodology for recovering the degree sequence and the triangle sequence (per edge) under a uniform edge-sampling scenario where, for an undirected graph G, a sample is constructed by uniformly sampling each edge of G with probability p. First, we build on methods by Ganguly et al [14], who recover the degree distribution from node-sampled networks using a Bayesian approach, and we extend this to edge-sampled networks. We address the problem of finding an appropriate prior degree distribution by proposing two different ways to construct a prior. We further extend this Bayesian approach to estimating the edge triangle count (the number of triangles associated with each edge) and the total triangle count.
We find that our Bayesian method outperforms the standard scale-up method at estimating the degree sequence, particularly in small p scenarios where as few as 10% of the edges remain. Moreover, the priors we use do not make any assumptions about the original degree distribution. For estimating the triangle per link count, in 3 out of the 5 network datasets we use, a Poisson prior achieves similar performance to a correct prior.
This paper is structured as follows. First, in section 3 we describe the edge sampling procedure and derive properties of graphs that have been sampled in this way. Then in section 4 we introduce the various estimators used for these properties, with section 5 showing how to construct a prior for the Bayes estimators. Finally in section 6 we present our results on recovering these properties on synthetic and real datasets. We discuss the implications in section 7.
Note that this paper is an extended version of previous work [4]. In section 3 we perform a deeper analysis of how the expectation and variance of quantities in the sampled graph relate to the structure of the original graph and how this may affect estimate quality. In section 6 we investigate the empirical bias of our estimators with respect to the different datasets, and we include results for a significantly larger dataset.

Related Work
Sampling of complex networks in general is a well studied problem. One point of interest is how well sampling preserves different properties, such as node rankings in Twitter networks [19], temporal features [1] and scaling properties [17]. These works have also aimed at designing sampling schemes specifically to preserve a given quantity. Other works have used sampling to estimate quantities on graphs that are prohibitively large to work with in their entirety, with a focus on triangle counting [25,2,23] or other motifs [16,6].
Two recent works studied the problem of recovering a network's degree distribution working from a small sample, first posed by Frank [13] in his PhD thesis in 1971. The first, by Zhang et al [29], frames it as an inverse problem involving the vector of observed degree counts and a linear operator representing the sampling scheme. The second, by Ganguly et al [14], uses a range of estimators for individual vertex degrees in node-sampled networks: simple scale-up estimators, risk minimisation estimators and Bayes posterior estimates. Antunes et al [2], whose work was on sampling methods for estimating the triangle distribution, studied the restricted access scenario with sample size n = 1 as a case study. Other than this, little attention has been given specifically to these restricted access problems, as noted in [29].
The problem of reconstructing triangles from sampled networks has also received attention. Tsourakakis et al [25] were motivated by the speed-up that can be obtained by working on sampled rather than complete data. Their method counts the number of triangles in the graph G′ sampled from a full graph G by retaining edges independently with probability p (the setting for this paper). They proved that simply multiplying the number of observed triangles in the sample G′ by 1/p^3 gives an unbiased estimator for the original number of triangles in G. Lim et al [18] developed a streaming estimator that can work in a single pass over data using a similar algorithm. They extend [25] to graphs that allow multiple edges between the same node pair.
A related problem is reconstructing network structure from unreliable or noisy data, such as social networks constructed from reported friendships, which are well known for having missing or spurious edges due to the different interpretations of "friendship" [28]. Young et al [28] address this using a Bayesian approach for finding posterior probabilities for an edge's existence given the measurements obtained. Newman [21] uses a Bayesian approach involving the empirical false and true positive rates of observing an edge from the data.
While it does not deal directly with reconstruction, [8] gives an estimate of the number of triangles in a network in terms of P(k), the number of nodes of degree k, and the mean degree. This would allow an estimate of the number of triangles to be derived if the degree distribution is known (or can be estimated). However, the estimate is only reasonable if nodes are connected independently, that is, if the probability that node i connects to node j is proportional to k_i k_j, the product of their degrees. In [8] it is shown that this is a reasonable approximation for the Internet Autonomous Systems network but it is not a useful approximation for the majority of networks.

Properties of edge sampled graphs
Let $G = (V, E)$ be an undirected simple graph with vertex set $V = \{v_1, \ldots, v_N\}$ and edge set $E = \{e_1, \ldots, e_M\}$. Consider a sampling regime where each edge $e_l \in E$ is included in the sampled graph independently with probability $p \in [0, 1]$, and each vertex $v_i \in V$ is included if any edge incident to it is included. Denote this sampled graph $G' = (V', E')$, where $V' \subseteq V$ and $E' \subseteq E$. Let the sizes of $V'$ and $E'$ be $N'$ and $M'$ respectively. This is known as incident subgraph sampling [17].
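The sampling regime described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the experiments; the function name `edge_sample`, the edge-list representation and the toy graph are assumptions made for the example.

```python
import random

def edge_sample(edges, p, seed=None):
    """Keep each undirected edge independently with probability p.

    Returns the sampled edge list E' and the vertex set V', i.e. the
    vertices with at least one surviving incident edge.
    """
    rng = random.Random(seed)
    sampled_edges = [e for e in edges if rng.random() < p]
    sampled_nodes = {v for e in sampled_edges for v in e}
    return sampled_edges, sampled_nodes

# Toy graph (hypothetical): a triangle on 0, 1, 2 with a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
sampled_edges, sampled_nodes = edge_sample(edges, p=0.5, seed=1)
```

Vertices that lose all of their edges never appear in the returned vertex set, which mirrors the fact that such nodes are invisible to an observer of the sampled graph.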

Degree
Let $k_i$ and $k'_i$ denote the degree of node $v_i$ in $G$ and $G'$ respectively. Then $k'_i$ follows a binomial distribution $k'_i \sim B(k_i, p)$, with conditional probability given by
$$P(k'_i = k' \mid k_i = k) = \binom{k}{k'} p^{k'} (1-p)^{k-k'}, \qquad (1)$$
with expectation $E(k'_i \mid k_i) = p k_i$.
Nodes in $G$ that become isolated as part of the sampling process are invisible to observers of $G'$ and should be considered removed. In this way, let $\delta_i$ be the indicator random variable representing the removal of node $v_i$ from $G$, which occurs with probability $(1-p)^{k_i}$. Then the expected value of the number $N'_0$ of nodes removed from $G$ is given by
$$E(N'_0) = \sum_{i=1}^{N} (1-p)^{k_i} = \sum_{k \geq 1} N_k (1-p)^k, \qquad (2)$$
where $N_k$ is the number of vertices in $G$ of degree $k$. This is dependent on the degree distribution; the lowest-order term in $(1-p)$ has coefficient $N_1$, the number of degree 1 nodes in $G$. This brings to mind the friendship paradox [12], where a node incident to a randomly chosen edge will on average have a higher degree than a randomly chosen node. From figure 1 we see that Barabási-Albert [5] networks experience more node removal than Erdős-Rényi [11] networks.
Fig. 1: Proportion of nodes removed in Erdős-Rényi and Barabási-Albert graphs, of size 1000 nodes and 10000 (ER), 9900 (BA) edges (see table 1), using the edge-sampling procedure. Shown is the mean value $N'_0/N$ for 15000 sampled graphs for each value of $p$ with standard deviation error bars (one above and one below). The y-axis is slightly truncated since the ratio is a positive number.
Equation (2) is a special case of the expression Dubois et al [10] derived for the number of nodes of degree $k'$ in an edge-sampled graph,
$$E(N'_{k'}) = \sum_{k \geq k'} N_k \binom{k}{k'} p^{k'} (1-p)^{k-k'}.$$
They note that the number of nodes of degree $k'$ is the sum of variables that are not independent of each other and that the resulting degree of a node after edge sampling is intrinsically linked with the resulting degree of its neighbours. Indeed, we find that the variance in the number of nodes removed $N'_0$ is given by
$$\mathrm{Var}(N'_0) = \sum_{k \geq 1} N_k (1-p)^k \left[1 - (1-p)^k\right] + 2p \sum_{k \leq k'} N_{k,k'} (1-p)^{k+k'-1},$$
where $N_{k,k'}$ is the number of edges connecting vertices of degree $k$ and $k'$. Treating $p$ as a constant, the dominating terms (those of lowest order in $1-p$) are the ones in $N_1$ and $N_{1,1}$, meaning that the variance is dependent on both the number of low degree nodes and degree-degree correlations among low degree nodes. In figure 2, we study two networks with the same power-law degree distribution but rewired in two different ways using [30] to obtain a maximally assortative and a maximally disassortative version of that network. Plotted is the difference between the number of nodes removed and the expected number of nodes removed using equation (2), finding a slight empirical bias for the disassortative network. With all these things in mind, estimating the number of nodes removed by an edge sampling process can be expected to be nontrivial and we do not address it in this work.
Fig. 2: Difference between the expected (using equation 2) and true number of nodes removed in the edge sampling process for two networks of the same power-law degree distribution but different degree assortativities. The mean is shown over 1000 sampled graphs for each value of $p$ with standard deviation error bars (one deviation above and one below).
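As an illustration of the expected number of removed nodes, the sketch below evaluates the sum of $N_k (1-p)^k$ over the degree counts and checks it against a brute-force enumeration of every edge subset of a tiny graph. The function names and the toy graph are hypothetical, and the enumeration is only feasible for a handful of edges.

```python
from collections import Counter
from itertools import product

def expected_removed(degrees, p):
    """Sum of N_k * (1 - p)^k, where N_k counts original nodes of degree k."""
    degree_counts = Counter(degrees)
    return sum(n_k * (1 - p) ** k for k, n_k in degree_counts.items())

def exact_expected_removed(nodes, edges, p):
    """Brute-force expectation over all 2^M edge subsets (tiny graphs only)."""
    total = 0.0
    for keep in product([False, True], repeat=len(edges)):
        prob = 1.0
        covered = set()
        for kept, (u, v) in zip(keep, edges):
            prob *= p if kept else (1 - p)
            if kept:
                covered.update((u, v))
        # Nodes with no surviving incident edge count as removed.
        total += prob * (len(nodes) - len(covered))
    return total

# Toy graph (hypothetical): triangle 0-1-2 plus pendant vertex 3.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
degrees = [2, 2, 3, 1]
p = 0.3
formula_value = expected_removed(degrees, p)
exact_value = exact_expected_removed(nodes, edges, p)
```

The two values agree because the expectation is linear in the per-node indicators, even though those indicators are not independent; the dependence only shows up in the variance.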

Triangles
Let $T_l$ be the number of triangles in $G$ which include edge $e_l \in E$. Then the number of triangles in $G$, denoted by $T$, is given by
$$T = \frac{1}{3} \sum_{e_l \in E} T_l, \qquad (3)$$
where the factor of $\frac{1}{3}$ is present because each triangle in the sum is counted three times, once for each link.
Let $T'_l$ be the number of triangles which include edge $e_l$ in the sampled graph $G'$, with $T'_l = 0$ if $e_l \notin E'$. In the case that edge $e_l$ remains in the sampled network, each triangle that includes $e_l$ will remain in the sampled network if and only if the other two edges remain; this occurs with probability $p^2$. There are $T_l$ such triangles, so the number of these which remain in the sampled network is binomially distributed with $T_l$ trials and probability $p^2$. That is,
$$P(T'_l = t' \mid T_l = t,\, e_l \in E') = \binom{t}{t'} p^{2t'} (1-p^2)^{t-t'}. \qquad (4)$$
In the case that $e_l$ does not remain in the sampled network, the following holds:
$$P(T'_l = t' \mid T_l = t,\, e_l \notin E') = \delta_{0,t'}, \qquad (5)$$
where $\delta_{0,t'}$ is the Kronecker delta function, taking the value of 1 if $t' = 0$ and 0 otherwise (since we defined $T'_l = 0$ if $e_l \notin E'$). We can use the law of total probability to remove the conditioning on $e_l$ from eqs. (4) and (5) and find the probability mass function for the number of triangles that link $e_l$ participates in within the sampled graph:
$$P(T'_l = t' \mid T_l = t) = p \binom{t}{t'} p^{2t'} (1-p^2)^{t-t'} + (1-p)\, \delta_{0,t'}. \qquad (6)$$
Therefore, the conditional probability mass function for $T'_l$ given $T_l$ is given by eq. (6).
The expected value of $T'_l$ given $T_l$ is given by
$$E(T'_l \mid T_l = t) = p \sum_{t'=0}^{t} t' \binom{t}{t'} p^{2t'} (1-p^2)^{t-t'} = p^3 t, \qquad (7)$$
where eq. (7) comes from noting that the sum precisely evaluates the expected value of a binomial random variable with $t$ trials and probability $p^2$.
Let $T'$ be a random variable representing the triangle count of $G'$. Then
$$T' = \frac{1}{3} \sum_{e_l \in E} T'_l, \qquad (8)$$
so that, by eqs. (3) and (7), $E(T') = p^3 T$.
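The mixture structure of the sampled edge triangle count (a binomial with success probability $p^2$ when the edge survives, and a point mass at zero when it does not) can be checked numerically. The sketch below is illustrative only; the function name `sampled_triangle_pmf` and the example values of `t` and `p` are assumptions.

```python
from math import comb

def sampled_triangle_pmf(t_prime, t, p):
    """P(T'_l = t' | T_l = t) under uniform edge sampling with probability p."""
    if t_prime < 0 or t_prime > t:
        return 0.0
    # Edge kept (probability p): each triangle survives with probability p^2.
    edge_kept = p * comb(t, t_prime) * (p ** 2) ** t_prime * (1 - p ** 2) ** (t - t_prime)
    # Edge lost (probability 1 - p): the sampled count is defined to be zero.
    edge_lost = (1 - p) if t_prime == 0 else 0.0
    return edge_kept + edge_lost

t, p = 5, 0.4
pmf = [sampled_triangle_pmf(tp, t, p) for tp in range(t + 1)]
mean = sum(tp * prob for tp, prob in enumerate(pmf))
```

With these values the probabilities sum to one and the mean matches $p^3 t$, as the expectation derived above requires.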
Similar to the discussion in the previous section, the sum in eq. (8) is not the sum of independent random variables. The quantities $T'_l$ and $T'_m$ for $l \neq m$ are only independent if edges $e_l, e_m \in E$ are not part of triangles which overlap in any way; dependence can arise not only if they are adjacent edges but also if they have triangles dependent on a shared edge. We can therefore expect total triangle count estimates to be more variable in networks with numerous instances of multiple triangles sharing a link; in these cases, the removal of a single link may remove a large number of triangles. This kind of clustered triangle distribution would be likely in a network that was highly transitive. As an example, figure 3 shows the average number of triangles $T'$ in a sampled graph minus the expected number of triangles $p^3 T$. The network is a power law network with positive transitivity (global transitivity $C = 0.14$ and assortativity $\rho = 0.35$). The total number of triangles in the original network is 13,305 and the maximum observed effect is an underestimate of around 8 triangles. This is consistent with low or zero bias in the estimator. Note that the large error bars for large $p$, while they may seem counterintuitive, are due to larger variance in $T'$ for these values of $p$ and hence larger standard errors.
Fig. 3: Difference between the observed number of triangles and the expected number of triangles using $p^3 T$, on a power law network with $N = 1,000$ and $M = 5,000$ and degree assortativity $\rho = 0.35$. Shown is the mean over 500 sampled graphs for each value of $p$ with standard deviation error bars (one deviation above and one below).
The variance of $T'_l$ given $T_l$ is given by
$$\mathrm{Var}(T'_l \mid T_l = t) = p \left[ t p^2 (1-p^2) \right] + p^5 (1-p)\, t^2, \qquad (9)$$
where the bracketed term in eq. (9) is the variance of a binomial variable with $t$ trials and probability $p^2$. An argument involving computation of the covariances $\mathrm{Cov}(T'_j, T'_l)$ shows that the variance of the total triangle count of $G'$ given the individual triangle counts $T_1, \ldots, T_M$ is given by
$$\mathrm{Var}(T' \mid T_1, \ldots, T_M) = (p^3 - p^6)\, T + 2k\, (p^5 - p^6), \qquad (10)$$
where $k = \sum_{e_l \in E} \binom{T_l}{2}$ is the number of pairs of triangles which share a link. The full derivation of eq. (10) can be found in the first author's thesis [3], which also contains derivations for wedge counts and clustering coefficient. An expression for this variance conditioned on the total triangle count $T$ rather than the edge triangle counts is given in [25].
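The number of triangles per node $T_i$ can be obtained from the edge triangle counts via $2T_i = \sum_{k : (i,k) \in E} T_{(i,k)}$, since each triangle containing node $v_i$ includes exactly two of its incident edges. This means the estimators for the edge-sampled network can be extended to evaluate vertex statistics, e.g. the local transitivity of the nodes.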
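To make the relation between edge and node triangle counts concrete, the following sketch computes triangles per edge from an edge list and then halves the sum over incident edges to obtain triangles per node. The helper names and the toy graph are hypothetical, and the edge list is assumed to describe a simple undirected graph.

```python
def edge_triangle_counts(edges):
    """Triangles per edge: the number of common neighbours of its endpoints."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}

def node_triangle_counts(edge_counts):
    """Triangles per node: half the sum of triangle counts of incident edges."""
    doubled = {}
    for (u, v), t in edge_counts.items():
        doubled[u] = doubled.get(u, 0) + t
        doubled[v] = doubled.get(v, 0) + t
    return {v: s // 2 for v, s in doubled.items()}

# Toy graph (hypothetical): triangle 0-1-2 plus pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
per_edge = edge_triangle_counts(edges)
per_node = node_triangle_counts(per_edge)
total_triangles = sum(per_edge.values()) // 3
```

The same two helper functions could be applied to estimated edge counts instead of true ones, in which case the integer division would need to be replaced by ordinary division.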

Estimators for the degree sequence and triangle count
The previous section showed how the distribution of a quantity $X'$ in a sampled graph $G'$ could be calculated as a conditional probability $P(X' = x' \mid X = x)$ given the unsampled measurement $X = x$. This section aims to estimate the true network quantity $X$ given its sampled counterpart $X'$.

Method of moments estimators
Let $X$ be a random variable associated with a statistic of $G$ and let $X'$ be that statistic on $G'$ with expected value $E(X') = f(X, p)$. A naive 'scale-up' estimator for $X$ given observed value $x'$ for $X'$ is the solution $x$ to the equation $x' = f(x, p)$, provided a solution exists. Borrowing the terminology from [14], we will refer to these estimators as method of moments estimators (MME).
Degree
For a node of degree $k'_i$ in $G'$, the MME for $k_i$ is given by $k'_i/p$. This is an unbiased estimator, with mean $E(k'_i/p \mid k_i) = k_i$. Nodes with the lowest possible degree (one) in the sampled graph are estimated as having degree $1/p$ in the unsampled graph, so as $p$ decreases the estimation of low-degree nodes becomes poorer.

Triangle count
The expected triangle count $E(T'_l)$ of edge $e_l$ is $p^3 T_l$. If, in addition, $e_l$ remains in $G'$, its expected triangle count is given by $p^2 T_l$. Therefore the MME for $T_l$ is $p^{-3} T'_l$ or $p^{-2} T'_l$, without and with the conditioning respectively. Similar to the MME for degree, it provides poor estimates for edges that have a low triangle count, as it disallows any estimates of $T_l$ in the range $(0, 1/p^2)$.
Similarly, an MME estimate proposed by Tsourakakis et al [25] for the total triangle count of a network is $p^{-3} T'$, which has expected value $E(p^{-3} T') = T$. They found that this estimator has variance $\frac{1}{p^6}\left[(p^3 - p^6)\, T + 2k\, (p^5 - p^6)\right]$, where $k$ is the number of pairs of triangles which share a link.
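The scale-up estimators in this subsection amount to dividing by the appropriate power of $p$. A minimal sketch follows, with hypothetical function names; the variance function simply evaluates the expression quoted from [25] above for given $T$, $k$ and $p$.

```python
def mme_degree(k_sampled, p):
    """Scale-up estimate of a node's original degree."""
    return k_sampled / p

def mme_edge_triangles(t_sampled, p):
    """Scale-up estimate of an edge's triangle count, for an edge that remains in G'."""
    return t_sampled / p ** 2

def mme_total_triangles(total_sampled, p):
    """Scale-up estimate of the total triangle count."""
    return total_sampled / p ** 3

def mme_total_variance(total_true, shared_pairs, p):
    """Variance of the total-count estimator; shared_pairs is the number of
    pairs of triangles sharing a link."""
    return ((p ** 3 - p ** 6) * total_true
            + 2 * shared_pairs * (p ** 5 - p ** 6)) / p ** 6

# With p = 0.5 a sampled degree of 1 maps to 2, and a single sampled
# triangle on a surviving edge maps to 4.
degree_estimate = mme_degree(1, 0.5)
edge_estimate = mme_edge_triangles(1, 0.5)
```

The coarseness discussed in the text is visible here: for fixed $p$ the estimates can only take values on a grid with spacing $1/p$ or $1/p^2$.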

Bayes estimator
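This estimator relies on Bayes' theorem, giving
$$P(X = x \mid X' = x') = \frac{P(X' = x' \mid X = x)\, P(X = x)}{\sum_{y} P(X' = x' \mid X = y)\, P(X = y)},$$
where $P(X' = x' \mid X = x)$ is the likelihood, which is determined by the edge sampling procedure and is known, and $P(X = x)$ is the prior, which will be denoted by $\pi(x)$; this is in general not known.
A posterior estimate for $X$ given $X'$ can then be given as the expected value
$$\hat{X} = E(X \mid X' = x') = \frac{\sum_{x} x\, P(X' = x' \mid X = x)\, \pi(x)}{\sum_{x} P(X' = x' \mid X = x)\, \pi(x)}.$$
The immediate question arises of how to deal with the prior $\pi(x)$, as this may involve making assumptions about the structure of $G$. This will be discussed case by case for the degree and triangle count.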
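Degree
Using the likelihood function for the degree from eq. (1), a posterior estimate for the degree of node $v_i$ given it has degree $k'_i = k'$ in $G'$ is
$$\hat{k}_i = \frac{\sum_{k \geq k'} k \binom{k}{k'} p^{k'} (1-p)^{k-k'}\, \pi(k)}{\sum_{k \geq k'} \binom{k}{k'} p^{k'} (1-p)^{k-k'}\, \pi(k)}, \qquad (11)$$
where $\pi(k)$ is a prior for the degree distribution $P(k)$ of $G$.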
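A posterior mean of this kind is straightforward to evaluate once a prior is fixed. The sketch below assumes the prior is supplied as a dictionary from degree to probability mass; the function names and the example prior are hypothetical and not taken from the experiments.

```python
from math import comb

def degree_likelihood(k_prime, k, p):
    """Binomial probability of observing degree k' given original degree k."""
    if k_prime > k:
        return 0.0
    return comb(k, k_prime) * p ** k_prime * (1 - p) ** (k - k_prime)

def bayes_degree(k_prime, p, prior):
    """Posterior mean of the original degree given sampled degree k'.

    prior maps each candidate degree k to its prior mass pi(k).
    """
    numerator = 0.0
    denominator = 0.0
    for k, mass in prior.items():
        weight = degree_likelihood(k_prime, k, p) * mass
        numerator += k * weight
        denominator += weight
    if denominator == 0.0:
        raise ValueError("prior places no mass on any degree >= k'")
    return numerator / denominator

# Hypothetical prior: half the nodes have degree 1, half have degree 3.
estimate = bayes_degree(1, 0.5, {1: 0.5, 3: 0.5})
```

For this toy prior the likelihoods are 0.5 and 0.375, so the estimate is 13/7, which lies between the two candidate degrees rather than at the scale-up value of 2.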
Triangle count
Using the likelihood function from eq. (4), a posterior estimate for the triangle count of edge $e_l$ in $G$ given it remains in $G'$ with $T'_l = t'$ is
$$\hat{T}_l = \frac{\sum_{t \geq t'} t \binom{t}{t'} p^{2t'} (1-p^2)^{t-t'}\, \pi(t)}{\sum_{t \geq t'} \binom{t}{t'} p^{2t'} (1-p^2)^{t-t'}\, \pi(t)}, \qquad (12)$$
where $\pi(t)$ is a prior for the proportion of edges with triangle count $t$. To establish the total triangle count, summing the value of this estimator over the remaining edges in $G'$ and dividing by 3, as in eq. (3), will provide an underestimate for the total triangle count of $G$, since there are potentially many missing edges in $G'$. To mitigate this, we scale this sum up to the estimated number of edges in $G$, $M'/p$. That is, our estimate of the total triangle count becomes
$$\hat{T} = \frac{1}{3p} \sum_{e_l \in E'} \hat{T}_l.$$
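The per-edge posterior estimate and the scaled-up total can be sketched as follows, again with the prior passed in as a dictionary from triangle count to probability mass. The function names and example inputs are hypothetical.

```python
from math import comb

def bayes_edge_triangles(t_prime, p, prior):
    """Posterior mean of an edge's original triangle count, given that the
    edge remains in the sample with t' triangles."""
    numerator = 0.0
    denominator = 0.0
    for t, mass in prior.items():
        if t < t_prime:
            continue  # cannot observe more triangles than originally existed
        weight = comb(t, t_prime) * (p ** 2) ** t_prime * (1 - p ** 2) ** (t - t_prime) * mass
        numerator += t * weight
        denominator += weight
    if denominator == 0.0:
        raise ValueError("prior places no mass on any count >= t'")
    return numerator / denominator

def bayes_total_triangles(sampled_counts, p, prior):
    """Sum the per-edge estimates over surviving edges, divide by 3 and
    scale by 1/p to account for the missing edges."""
    per_edge = [bayes_edge_triangles(tp, p, prior) for tp in sampled_counts]
    return sum(per_edge) / (3 * p)

# Hypothetical input: three surviving edges, point-mass prior at t = 2.
total_estimate = bayes_total_triangles([0, 1, 2], 0.5, {2: 1.0})
```

With a point-mass prior every surviving edge is assigned two triangles, so the total is 6 / (3 x 0.5) = 4; a less degenerate prior would let the observed counts move the per-edge estimates.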

Remarks on our estimators
It is worth highlighting that up to now, our degree estimators $\hat{k}_i$, $\hat{k}_j$ for nodes $i$ and $j$ will take the same value if $k'_i$ and $k'_j$ are the same. That is, only the node's degree and the degree sequence of the sampled network are taken into account and no other aspect of the network's structure. It also means that estimates will be very coarse; if the sampled network has maximum degree $k$ then the estimated degree sequence will take up to $k$ different values. The same is true for the triangle per link estimators.

Constructing a prior
The Bayes estimators for the degree and triangle per link sequences require the choice of a prior $\pi(k)$ and $\pi(t)$ for the degree distribution and triangles per link distribution respectively. In this section, we propose methods for constructing priors.

Degree distribution
A prior could be obtained from a chosen family of distributions such as the Zipf distribution or a power law distribution, but this baked-in assumption may not be desirable. Furthermore, it has been shown that the distribution of a sampled network may not even follow the distribution of the true network [24]. Therefore, we propose two different methods of constructing a prior which do not make assumptions about the degree distribution of the true network.
Monte Carlo minimisation method
First, it is possible to estimate the prior using a Monte Carlo method to minimise the $\ell_2$ norm of the error, which in this work we will refer to as the minimisation method. In this approach, we find a degree sequence $\{\kappa_i\}$ which minimises
$$\sum_{v_i \in V'} (k'_i - p \kappa_i)^2.$$
This is done with the restrictions that each degree $\kappa_i$ is an integer with $\kappa_i \geq k'_i$ and
$$\sum_{v_i \in V'} \kappa_i = \lfloor 2M'/p \rfloor. \qquad (13)$$
To do this, we start with $\kappa_i = \lfloor k'_i/p \rfloor$. If the sum $\sum_{v_i \in V'} \kappa_i$ of the estimated degrees is not equal to the estimated number of links $\lfloor 2M'/p \rfloor$ then we increment or decrement the degree of nodes chosen uniformly at random until equality holds. Then we construct a graph with degree sequence given by $\{\kappa_i\}$ and rewire its links at random by a single edge swap (incrementing the degree of one node chosen at random and decrementing the degree of another node chosen at random) for a large number of iterations (15000 in our case), accepting each proposed rewire if it decreases the $\ell_2$ error. Note that if the initial estimate $\kappa_i$ satisfies equation (13), then this is a global minimum for the $\ell_2$ norm. In particular, this happens when $k'_i/p$ is an integer for all $i$.
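The search described above can be approximated by working on the degree sequence alone. The sketch below is a simplification made for illustration: it does not construct or rewire a graph, it assumes the error being minimised is the squared difference between each observed degree and $p$ times the candidate degree, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import floor

def l2_error(kappa, k_sampled, p):
    """Squared distance between observed degrees and their expected values p * kappa_i."""
    return sum((ks - p * k) ** 2 for k, ks in zip(kappa, k_sampled))

def minimisation_sequence(k_sampled, p, iterations=15000, seed=None):
    """Greedy Monte Carlo search over integer degree sequences (needs >= 2 nodes)."""
    rng = random.Random(seed)
    n = len(k_sampled)
    kappa = [floor(ks / p) for ks in k_sampled]
    target = floor(sum(k_sampled) / p)  # the sampled degrees sum to 2M'
    # Match the estimated total degree by random unit adjustments.
    while sum(kappa) < target:
        kappa[rng.randrange(n)] += 1
    while sum(kappa) > target:
        i = rng.randrange(n)
        if kappa[i] > k_sampled[i]:
            kappa[i] -= 1
    # Propose moving one unit of degree from node j to node i.
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        if kappa[j] - 1 < k_sampled[j]:
            continue  # would violate kappa_j >= k'_j
        before = (k_sampled[i] - p * kappa[i]) ** 2 + (k_sampled[j] - p * kappa[j]) ** 2
        after = (k_sampled[i] - p * (kappa[i] + 1)) ** 2 + (k_sampled[j] - p * (kappa[j] - 1)) ** 2
        if after < before:  # accept only moves that reduce the error
            kappa[i] += 1
            kappa[j] -= 1
    return kappa

def degree_prior(kappa):
    """Empirical distribution of the estimated degree sequence."""
    counts = Counter(kappa)
    return {k: c / len(kappa) for k, c in counts.items()}

k_sampled = [1, 1, 2, 3, 1]
kappa = minimisation_sequence(k_sampled, p=0.5, iterations=2000, seed=0)
prior = degree_prior(kappa)
```

In this example every sampled degree divided by $p$ is an integer, so the initial sequence already has zero error and no proposed move is accepted, which matches the remark about the global minimum.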
Link cascade method
The MME (and hence sometimes the minimisation method) cannot estimate the degree $k_i \approx k'_i/p$ when $k'_i = 0$; that is, the lowest possible estimated degree is $1/p$. If a good estimate for the original number of nodes is known, then another prior for capturing these low degree nodes can be constructed by "cascading" links from high degree to low degree nodes. This is a heuristic method based on the observation that when $k'_i = 0$ the estimation is poor, so the idea is to create a prior that has the original number of nodes and no zero degree nodes. As with the minimisation method, we start with an estimated degree sequence $\kappa_i = \lfloor k'_i/p \rfloor$, redistributing links as before if the total estimated degree does not match twice the estimated number of links. Then, we place the nodes in descending order based on their estimated degree, with the knowledge of the original number of nodes in the network being used to append placeholder nodes which would have been removed by the sampling process. Finally, we pick the first occurring node in this list with degree zero, and increment its degree while simultaneously decrementing the degree of the node directly before it; this conserves the total number of links. This step is performed iteratively until there are no degree zero nodes. This heuristic method is fast as it does not require the construction of a graph and the shuffling of its links. As the method manipulates only the degree sequence, it is attractive when trying to construct a prior for large networks.
Finally, as a comparison point representing the best possible result achievable with the Bayes method, we use a true prior which places the probability mass $\pi(k)$ equal to the proportion of nodes of degree $k$ in the original network.
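The cascading step can be illustrated directly on a list of estimated degrees. The sketch below is a simplified reading of the procedure: it omits the redistribution step that matches the total degree to twice the estimated number of links, and the function name and example values are hypothetical.

```python
from math import floor

def link_cascade(k_sampled, p, n_original):
    """Cascade degree from high-degree nodes to zero-degree placeholders.

    k_sampled holds the degrees of the nodes seen in the sample, and
    n_original is the (assumed known) number of nodes in the original graph.
    """
    kappa = sorted((floor(ks / p) for ks in k_sampled), reverse=True)
    # Placeholder nodes stand in for nodes removed by the sampling.
    kappa += [0] * (n_original - len(kappa))
    if sum(kappa) < len(kappa):
        raise ValueError("not enough total degree to give every node degree >= 1")
    while 0 in kappa:
        z = kappa.index(0)  # first zero-degree node in the ordered list
        kappa[z - 1] -= 1   # take one unit from the node directly before it
        kappa[z] += 1       # total degree is conserved
    return kappa

cascaded = link_cascade([2, 1, 1], p=0.5, n_original=5)
```

Here the starting list is [4, 2, 2, 0, 0] and the cascade ends at [4, 1, 1, 1, 1]; the guard on the total degree ensures the loop cannot reach a state where the first entry is zero.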

Triangle per link distribution
The two methods for prior construction of the degree distribution do not immediately translate to an analogue for triangles, and little is known about the triangle per edge distribution as a starting place for selecting a prior. As an initial approach therefore, we use a Poisson distribution Po(λ) with λ = 3T/M′, the average number of triangles per link in the MME estimator. As with the degree distribution, we include a result with a true prior π(t) equal to the proportion of links with triangle count t in the original network as a comparison point.
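A Poisson prior over triangle counts can be tabulated as a dictionary so that it plugs into a posterior-mean calculation. In the sketch below the rate `lam` is supplied by the caller, and the truncation point `t_max` is an assumption introduced to keep the table finite; neither the function name nor the truncation comes from the original method.

```python
from math import exp, lgamma, log

def poisson_prior(lam, t_max):
    """Poisson(lam) mass on t = 0..t_max, renormalised after truncation.

    Requires lam > 0. Logs are used so that large t does not overflow.
    """
    weights = [exp(t * log(lam) - lam - lgamma(t + 1)) for t in range(t_max + 1)]
    total = sum(weights)
    return {t: w / total for t, w in enumerate(weights)}

prior = poisson_prior(lam=2.0, t_max=60)
prior_mean = sum(t * mass for t, mass in prior.items())
```

Because the Poisson tail decays faster than exponentially, such a prior assigns negligible mass to edges with triangle counts far above the rate, which is worth keeping in mind for networks with heterogeneous triangle distributions.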

Results
To test the capability of our estimators of degree sequence and triangle count, we consider five different starting networks: an Erdős-Rényi $G(N, M)$ network [11] with $N = 1000$ and $M = 10000$, a Barabási-Albert network [5] of approximately the same size and density, a real collaboration network from authors who submitted to the arXiv high-energy theoretical physics category [20] (henceforth Hep-Th for brevity), an Internet autonomous systems topology dataset (henceforth AS) [9] and an interaction network on the MathOverflow sub-forum on StackExchange [22]. A quick reference of some summary statistics can be found in table 1. These datasets were chosen to represent a heterogeneous selection of network types. The ER network has a Poisson degree and edge triangle count distribution and an overall low number of triangles for the network's density. The BA network has a theoretical power law degree distribution and a low triangle count for its density. The Hep-Th and AS networks have a heavy-tailed degree distribution but have very different degree correlations, and the Hep-Th network has a higher clustering than the AS network. The MathOverflow dataset is provided as an example of a larger network. For each of these datasets modelled as a graph $G$, we take an edge-sampled network $G'$ with edge sampling probability $p$, for $p = 0.1, 0.2, \ldots, 0.9$ and from this, reconstruct the degree sequences, edge triangle counts and total triangle counts using our estimators. In the larger datasets (Hep-Th, AS topology, StackExchange) we examine also the cases $p = 0.01$ and $p = 0.05$ to consider the most extreme scenarios.
Table 1: Original statistics of network datasets used prior to sampling. Shown is the number of nodes $N$, number of edges $M$, maximum degree $k_{max}$, degree assortativity $\rho$, number of triangles $T$, average node clustering coefficient $C$ [27], and average number of triangles per link $\bar{T}_l$.
In the degree distribution experiment, we reconstruct the degree $k$ of nodes in $V'$ using our chosen estimators $\hat{k}$, and compute the root mean squared error of the degree sequences as
$$\mathrm{RMSE}(\hat{k}) = \sqrt{\frac{1}{N'} \sum_{v_i \in V'} (k_i - \hat{k}_i)^2}.$$
These results are shown in figure 4, showing the mean and standard deviation of the error over 10 experiments. In all but the AS topology network and the MathOverflow network, the Bayes estimator with true prior has the lowest error, though this is included only to show the best possible result that could be obtained with the Bayes method since the true prior is unknowable. The link cascade method performs well for many networks in the realistic cases with low sample rates but poorly for the ER network. Bayes with link cascade prior is a good all round performer, never being a bad performer and second or third best for most networks. This method assumes knowledge of the number of nodes in $G$ (i.e. the number of nodes pruned by the edge-sampling) so performs better at estimating low degree nodes. This is particularly evident in figure 4(c), where it performs better than the Bayes approach with true prior. The Monte Carlo minimisation method used as an estimator overlays the MME due to the restriction that the degree sequence is an integer (cf. section 5).
Figure 5 studies the performance of the Bayes posterior estimate (the one with prior obtained using the minimisation method) against the baseline method of moments estimator in small $p$ and large $p$ scenarios, with the assumption that the number of nodes in the original graph is known. For small values of $p$, our estimates for low degree nodes are very poor. This is due to the information loss; for $p = 0.1$ as an example, it is more likely than not for a node of degree 6 or fewer to lose all its edges in the sampling process, and hence the performance is worse in networks with a low maximum degree such as the Erdős-Rényi and Hep-Th collaboration graphs. The MME is further restricted to taking values that are multiples of 10 for this value of $p = 0.1$, so gives very coarse estimates. However, for the $p \leq 0.1$ cases, the Bayes posterior method brings the estimates closer to the correct answer than the MME in all but the AS topology dataset, where the two methods appear to perform similarly.
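The threshold quoted for $p = 0.1$ follows from the probability $(1-p)^k$ that a node of degree $k$ loses all of its edges. A short check, with a hypothetical helper name:

```python
def prob_isolated(k, p):
    """Probability that a node of original degree k loses every edge."""
    return (1 - p) ** k

p = 0.1
# Largest degree for which losing all edges is more likely than not.
largest_degree = max(k for k in range(1, 100) if prob_isolated(k, p) > 0.5)
```

At $p = 0.1$ the probability is about 0.53 for degree 6 and about 0.48 for degree 7, so degree 6 is the largest degree for which complete loss is the more likely outcome.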
In the triangle count experiment, we estimate the triangle per edge count $\hat{T}_l$ for edges $e_l \in E'$ and compute the root mean squared error as
$$\mathrm{RMSE}(\hat{T}) = \sqrt{\frac{1}{M'} \sum_{e_l \in E'} (T_l - \hat{T}_l)^2}.$$
In addition, we estimate the total number of triangles as described in section 4 and calculate the mean squared error over the 10 experiments performed. These are shown in figure 6. In all experiments, the Bayes estimator with Poisson prior overlays the MME for the total number of triangles because the λ used in the Poisson distribution is the MME estimate of the average number of triangles per link. However, in all but the AS topology, the Poisson prior improves the estimate of triangles per link, especially in the small $p$ scenario. In the AS topology dataset, the Poisson is an inappropriate prior, performing poorly even with large sample sizes.
As with the degree sequence, we compare the number of triangles per link and estimated triangles per link in small and large $p$ scenarios in our datasets, using the MME and Bayes posterior estimate with Poisson prior in figure 7. Considering first the $p = 0.1$ scenario, we see that the Bayes posterior estimate helps to ease the problem of extreme estimates (the MME $\hat{T}_l = T'_l/p^2$ takes smallest values 0 then $1/p^2$) since the contribution of high $t$ terms in eq. (12) is suppressed by the Poisson prior. However, for large $p$, we see that the Poisson prior begins to dominate inappropriately, incorrectly assuming that the distribution of triangles across the links is homogeneous. This manifests in the BA and Hep-Th examples as a slight overestimate at the lower triangle count end and a large underestimate at the upper end. We see this even more clearly in the AS topology example, where a triangle count of over 250 is effectively forbidden by the Poisson prior, while the MME remains unbiased in all datasets. The poor performance of both estimators at $p = 0.1$ indicates how difficult a problem estimating the edge triangle count is after so much information has been lost; in all datasets apart from the AS topology, edges in the sampled graph had only 0 or 1 triangle remaining. A better estimate for these may involve more assumptions about the network structure. For example, the results in figure 7 show that the distribution of triangles in the real networks and, to an extent, the BA network is heterogeneous, so perhaps a heavier tailed distribution than a Poisson should be considered.

Conclusion
This paper provided methods for recovering the degree sequence, number of triangles and triangle per link sequence from networks sampled via uniform edge sampling, such as graphs limited to a sample by the Twitter API. Our derivations of the expectation and variance of these quantities demonstrate the difficulty of the problem, finding their dependency on the degree sequence, degree-degree correlations and triangle distributions of the original networks. Our results show that our Bayesian estimators perform much better than standard approaches on the degree sequence even when the priors were constructed without knowledge of distributions for the original network. For the triangle count per edge, we showed that while the Bayes estimates do not always improve upon the MME for total triangle counts, they provide a markedly better estimate of triangles per link in the small p scenario where most information has been lost. However, an inappropriate choice of prior can lead to a bias even when the sample size is large, as we found when using a Poisson prior for a network with heterogeneous triangle distribution. Future work will investigate generalising the methods we used for constructing a degree distribution prior to the construction of a prior for triangle counts per link. However, it should be noted that reconstructing triangles in particular is extremely hard at low sampling rates and it is doubtful any technique would produce large improvements. The work could instead be extended by considering sampling regimes and network properties for which a likelihood can be calculated.

Declarations
Ethical approval
No ethical approval is applicable for this work.

Competing interests
The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Authors' contributions
All authors designed the research and contributed equally to the manuscript.

Fig. 4: Error in estimation of the true degree sequence. Each value is averaged over 10 experiments with the shaded error bars representing standard deviation and plotted with log-scaled y-axis. The MME overlays the minimisation method (blue, green respectively) in all plots; the same is true of the Bayes methods with minimisation and link cascade priors (red and purple respectively). In (e) all apart from the link cascade method are overlaid. The error bars in some places are very small, in part because we are considering root mean squared error and additionally the y-axis is log-scaled.

Fig. 6: Error in estimation of the triangles per link sequence (left hand column) and total triangles (right hand column) using our different approaches. Each value is averaged over 10 experiments with the shaded error bars representing standard deviation. The MME overlays the Bayes estimator with Poisson prior in the right hand column.