Fig. 4 | Applied Network Science

Fig. 4

From: Explaining classification performance and bias via network structure and sampling technique


RQ2: Parameter estimation and overall performance on small samples. Each panel plots the squared estimation error of the class priors (y-axis) against that of the conditional probabilities (x-axis), colored by overall performance, for small samples (\(p_{seeds}\le 30\%\)) drawn with different sampling techniques: (a) random nodes, (b) random edges, (c) degree sampling, and (d) partial crawls. Columns correspond to class balance B and rows to homophily H. Mean ROCAUC scores per network type (B, H) are shown as \(\overline{\text{ROCAUC}}\). Partial crawls yield the most accurate conditional probabilities (\(SE_{maj|maj}+SE_{min|min}<0.18\)), followed by random edges, degree sampling, and random nodes. However, random nodes yield the most accurate class priors (\(SE_{min}<0.02\)), followed by random edges, partial crawls, and degree sampling. In terms of performance, degree sampling achieves the highest ROCAUC on average, followed by random edges, partial crawls, and random nodes. Surprisingly, perfect estimates (\(\sum SE = SE_{min}+SE_{maj|maj}+SE_{min|min}\approx 0\)) do not guarantee perfect performance (\(\text{ROCAUC}\approx 1.0\)), and vice versa.
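The caption does not spell out how the sample estimates are computed, so the following is only a minimal sketch of how the three squared errors could be obtained, not the authors' code. It assumes binary labels (0 = majority, 1 = minority), an undirected edge list, class priors estimated from the seed nodes' labels, and conditional probabilities \(P(c|c)\) estimated from the edges induced among the seeds. The function names `class_prior_min`, `cond_probs`, and `squared_errors` are hypothetical.

```python
from collections import Counter


def class_prior_min(labels, nodes):
    """Fraction of minority nodes (label 1) among `nodes`."""
    nodes = list(nodes)
    return sum(labels[n] for n in nodes) / len(nodes)


def cond_probs(labels, edges):
    """Estimate P(maj|maj) and P(min|min) from undirected edges.

    Each edge (u, v) is counted in both directions, so for class c the
    estimate is (#endpoints of class c whose neighbour is also class c)
    divided by (#endpoints of class c).
    """
    same, total = Counter(), Counter()
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            total[labels[a]] += 1
            if labels[a] == labels[b]:
                same[labels[a]] += 1
    p_maj_maj = same[0] / total[0] if total[0] else 0.0
    p_min_min = same[1] / total[1] if total[1] else 0.0
    return p_maj_maj, p_min_min


def squared_errors(labels, edges, seeds):
    """Squared errors of sample estimates vs. full-network values (assumed estimators)."""
    seeds = set(seeds)
    # Ground truth from the full network.
    true_min = class_prior_min(labels, labels.keys())
    true_mm, true_nn = cond_probs(labels, edges)
    # Estimates from the sample: priors from seed labels,
    # conditionals from edges whose endpoints are both seeds.
    est_min = class_prior_min(labels, seeds)
    sub_edges = [(u, v) for u, v in edges if u in seeds and v in seeds]
    est_mm, est_nn = cond_probs(labels, sub_edges)
    se_min = (est_min - true_min) ** 2
    se_mm = (est_mm - true_mm) ** 2
    se_nn = (est_nn - true_nn) ** 2
    return {
        "SE_min": se_min,
        "SE_maj|maj": se_mm,
        "SE_min|min": se_nn,
        "sum": se_min + se_mm + se_nn,
    }


# Toy network: nodes 0-2 are majority, nodes 3-4 are minority.
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
print(squared_errors(labels, edges, seeds=[0, 2, 3]))
```

In this toy case the seed sample gets the minority prior roughly right (\(SE_{min}=1/225\)) but, because no minority-minority edge survives in the induced subgraph, badly misses \(P(min|min)\) (\(SE_{min|min}=4/9\)). Sampling every node makes all three errors zero.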
