Fig. 5 | Applied Network Science

Fig. 5

From: Sampling on networks: estimating spectral centrality measures and their impact in evaluating other relevant network measures

Covariate distribution and sampling. We consider the Pokec social network dataset (\(\approx 1.3 \cdot 10^6\) nodes and \(\approx 2.9 \cdot 10^7\) edges) with a sample fraction of 10%. The bars in bold colors mark the region from which the sampling was started. The numbers in the legend are the mean and standard deviation over 10 runs of \(KL(p_{G}||p_{G_{m}})\), where \(p_{G}\) is the empirical frequency of the ten regions in the original graph and \(p_{G_{m}}\) is the corresponding frequency in the sample. H denotes the mean and standard deviation, over the same runs, of the entropy ratio \(H_{G_m}(s) / H_G (s)\). The plot shows an example of the relative frequency of nodes in each of the ten regions into which the nodes are divided. The distribution on the whole network is shown by the black vertical line. Vertical lines on top of the bars represent standard deviations across the 10 sampling runs. Note three different behaviors: RW yields an in-sample attribute distribution similar to that of the whole graph; TCEC has a higher KL divergence, followed by node2vec. On average, RW and TCEC are not biased by the starting region, whereas node2vec is. This is also reflected quantitatively in a higher \(H_{G_m}(s) / H_G (s)\) ratio.
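The two quantities reported in the legend can be illustrated with a minimal sketch. It assumes that \(H\) denotes the Shannon entropy of the region distribution and that both quantities use natural logarithms; the caption does not state either. The function names and the toy region labels are hypothetical and are not taken from the article or from the Pokec data.

```python
import math
from collections import Counter


def empirical_distribution(labels, categories):
    """Relative frequency of each category among the given node labels."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[c] / n for c in categories]


def kl_divergence(p, q):
    """KL(p || q) in nats. Terms with p_i = 0 contribute nothing;
    the result is infinite if q_i = 0 while p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def entropy(p):
    """Shannon entropy in nats (assumed meaning of H in the caption)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Toy example with two hypothetical regions instead of ten.
regions = ["a", "b"]
graph_labels = ["a", "b", "a", "b"]   # region of every node in the full graph
sample_labels = ["a", "a", "a", "b"]  # region of every node in the sample

p_graph = empirical_distribution(graph_labels, regions)
p_sample = empirical_distribution(sample_labels, regions)

kl = kl_divergence(p_graph, p_sample)
entropy_ratio = entropy(p_sample) / entropy(p_graph)
print(f"KL = {kl:.4f}, entropy ratio = {entropy_ratio:.4f}")
```

In this toy case the sample over-represents region "a", so the KL divergence is positive and the entropy ratio falls below 1. A sample that reproduces the graph's region frequencies exactly would give a KL divergence of 0 and a ratio of 1.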