In this section, we describe the main steps towards a hypothesis-driven Bayesian approach for understanding edge formation in unweighted attributed multigraphs. To that end, we propose intuitive models for edge formation (Section “Generative edge formation models”), a flexible toolbox for formally specifying beliefs in the model parameters (Section “Constructing belief matrices”), a way of computing proper (Dirichlet) priors from these beliefs (Section “Eliciting a Dirichlet prior”), the computation of the marginal likelihood in this scenario (Section “Computation of the marginal likelihood”), and guidelines on how to interpret the results (Section “Application of the method and interpretation of results”). We subsequently discuss these steps one by one.
Generative edge formation models
We propose two variations of our approach, which employ two different types of generative edge formation models in multigraphs.
Global model First, we utilize a simple global model, in which a fixed number of graph edges are randomly and independently drawn from the set of all potential edges in the graph G by sampling with replacement. Each edge \((v_{i}, v_{j})\) is sampled from a categorical distribution with parameters \(\theta_{ij}, 1 \leq i \leq n, 1 \leq j \leq n, \sum_{ij} \theta_{ij} = 1\): \((v_{i}, v_{j}) \sim \text{Categorical}(\theta_{ij})\). This means that each edge is associated with one probability \(\theta_{ij}\) of being drawn next. Figure 2a shows the maximum likelihood global model for the network shown in Fig. 1. Since this is an undirected graph, inverse edges can be ignored, resulting in n(n+1)/2 potential edges/parameters.
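As an illustration, the following minimal Python sketch samples edges under the global model for an undirected graph; the node count, edge count, and parameter vector below are made up for demonstration and are not part of the original method:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 6    # number of nodes (made up for illustration)
m = 20   # number of edges to draw

# Enumerate the n(n+1)/2 potential undirected edges (i <= j).
pairs = [(i, j) for i in range(n) for j in range(i, n)]

# Any parameter vector theta summing to one defines the model; here we
# simply draw one from a flat Dirichlet for demonstration purposes.
theta = rng.dirichlet(np.ones(len(pairs)))

# Draw m edges independently and with replacement from Categorical(theta).
idx = rng.choice(len(pairs), size=m, p=theta)
edges = [pairs[k] for k in idx]
print(edges)
```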
Local models As an alternative, we can also focus on a local level. Here, we model to which other node a specific node v will connect, given that a new edge starting from v is formed. We implement this by using a set of n separate models for the outgoing edges of the ego-networks (i.e., the 1-hop neighborhoods) of each of the n nodes. The ego-network model for node \(v_{i}\) is built by randomly and independently drawing a number of nodes \(v_{j}\) by sampling with replacement and adding an edge from \(v_{i}\) to each sampled node. Each node \(v_{j}\) is sampled from a categorical distribution with parameters \(\theta_{ij}, 1 \leq i \leq n, 1 \leq j \leq n, \forall i: \sum_{j} \theta_{ij} = 1\): \(v_{j} \sim \text{Categorical}(\theta_{ij})\). The parameters \(\theta_{ij}\) can be written as a matrix; the value in cell (i,j) specifies the probability that a newly formed edge with source node \(v_{i}\) will have the destination node \(v_{j}\). Thus, all values within one row always sum to one. Local models can be applied to undirected and directed graphs (cf. Section “Discussion”). In the directed case, we model only the outgoing edges of the ego-network. Figure 2b depicts the maximum likelihood local models for our introductory example.
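For concreteness, here is a small sketch of how the maximum likelihood parameters shown in Fig. 2 can be computed from a hypothetical matrix of edge counts M; it assumes every node has at least one edge so that all row sums are positive:

```python
import numpy as np

# Hypothetical multigraph edge counts: M[i, j] = number of edges
# between v_i and v_j (symmetric, since the example graph is undirected).
M = np.array([[0, 2, 1, 0],
              [2, 0, 3, 1],
              [1, 3, 0, 2],
              [0, 1, 2, 0]], dtype=float)

# Maximum likelihood local models (cf. Fig. 2b): one categorical
# distribution per row, theta_ij = m_ij / sum_j m_ij.
theta_local = M / M.sum(axis=1, keepdims=True)  # each row sums to one

# Maximum likelihood global model (cf. Fig. 2a): a single categorical
# distribution over all cells, theta_ij = m_ij / sum_ij m_ij.
theta_global = M / M.sum()
```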
Hypothesis elicitation
The main idea of our approach is to encode our beliefs in edge formation as Bayesian priors over the model parameters. As a common choice, we employ Dirichlet distributions as the conjugate priors of the categorical distribution. Thus, we assume that the model parameters θ are drawn from a Dirichlet distribution with hyperparameters α: θ∼Dir(α). Similar to the model parameters themselves, the Dirichlet prior (or multiple priors for the local models) can be specified in a matrix. We will choose the parameters α in such a way that they reflect a specific belief about edge formation. For that purpose, we first specify matrices that formalize these beliefs, then we compute the Dirichlet parameters α from these beliefs.
Constructing belief matrices
We specify hypotheses about edge formation as belief matrices \(B = (b_{ij})\). These are n×n matrices, in which each cell \(b_{ij} \in \mathbb{R}\) represents a belief of having an edge from node \(v_{i}\) to node \(v_{j}\). To express a belief that an edge occurs more often (compared to other edges), we set \(b_{ij}\) to a higher value.
Node-attributed multigraphs In general, users have considerable freedom in constructing belief matrices. However, typical construction principles are to assume that nodes with specific attributes are more popular, so that edges involving such nodes receive higher multiplicity, or to assume that nodes that are similar with respect to one or more attributes are more likely to form an edge, cf. (Papadopoulos et al. 2012). Ideally, the elicitation of belief matrices is based on existing theories.
For example, based on the information shown in Fig. 1, one could “believe” that two authors collaborate more frequently if: (1) they are both from the same country, (2) they share the same gender, (3) they have high positions, or (4) they are popular in terms of the number of articles and citations. We capture each of these beliefs in one matrix. One possible implementation of the matrices for our example beliefs is the following (a code sketch follows the list):
- \(B_{1}\) (same country): \(b_{ij} := 0.9\) if \(f_{i}[country] = f_{j}[country]\), and 0.1 otherwise
- \(B_{2}\) (same gender): \(b_{ij} := 0.9\) if \(f_{i}[gender] = f_{j}[gender]\), and 0.1 otherwise
- \(B_{3}\) (hierarchy): \(b_{ij} := f_{i}[position] \cdot f_{j}[position]\)
- \(B_{4}\) (popularity): \(b_{ij} := f_{i}[articles] + f_{j}[articles] + f_{i}[citations] + f_{j}[citations]\)
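A minimal NumPy sketch of how these four belief matrices could be constructed; the attribute vectors below are hypothetical stand-ins for the data of Fig. 1:

```python
import numpy as np

# Hypothetical attribute vectors standing in for the data of Fig. 1.
country   = np.array(["US", "US", "DE", "DE"])
gender    = np.array(["f", "m", "f", "m"])
position  = np.array([3, 1, 2, 1])      # higher value = higher position
articles  = np.array([10, 4, 7, 2])
citations = np.array([120, 30, 80, 10])

def same(f):
    """Pairwise equality matrix for a node attribute vector."""
    return f[:, None] == f[None, :]

B1 = np.where(same(country), 0.9, 0.1)            # same country
B2 = np.where(same(gender), 0.9, 0.1)             # same gender
B3 = np.outer(position, position)                 # hierarchy
B4 = (articles[:, None] + articles[None, :]       # popularity
      + citations[:, None] + citations[None, :])
```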
Figure 3a shows the matrix representation of belief \(B_{1}\), and Fig. 3b its respective row-wise normalization for the local model case. While belief matrices are identically structured for local and global models, the ratio between parameters in different rows is crucial for the global model, but irrelevant for local ones.
Dyad-attributed networks For the particular case of dyad-attributed networks, beliefs are described using the underlying mechanisms of secondary multigraphs. For instance, a co-authorship network (where every node represents an author with no additional information or attributes) could be explained by a citation network under the hypothesis that if two authors frequently cite each other, they are more likely to also co-author together. Thus, the adjacency (feature) matrices \(\hat{F}\) of secondary multigraphs can be directly used as belief matrices \(B = (b_{ij})\). However, we can express additional beliefs by transforming these matrices. As an example, we can formalize the belief that the presence of a feature tends to inhibit the formation of edges in the data by setting \(b_{ij} := -\text{sigm}(f_{ij})\), where sigm is a sigmoid function such as the logistic function.
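A short sketch of both options, assuming a hypothetical citation-count matrix F_hat as the secondary multigraph:

```python
import numpy as np

def logistic(x):
    """Logistic sigmoid, one possible choice for sigm."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical secondary multigraph: F_hat[i, j] = number of citations
# from author i to author j.
F_hat = np.array([[0, 5, 0],
                  [2, 0, 1],
                  [0, 4, 0]], dtype=float)

B_promote = F_hat               # citations promote co-authorship
B_inhibit = -logistic(F_hat)    # b_ij := -sigm(f_ij): citations inhibit it
```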
Eliciting a Dirichlet prior
In order to obtain the hyperparameters α of a prior Dirichlet distribution, we utilize the pseudo-count interpretation of the Dirichlet parameters \(\alpha_{ij}\), i.e., for \(\alpha_{ij} \geq 1\), a value of \(\alpha_{ij}\) can be interpreted as \(\alpha_{ij} - 1\) previous observations of the respective event. We distribute pseudo-counts proportionally to a belief matrix. Consequently, the hyperparameters can be expressed as \(\alpha_{ij} = \frac{b_{ij}}{Z} \times \kappa + 1\), where κ is the concentration parameter of the prior. The normalization constant Z is computed as the sum of all entries of the belief matrix in the global model, and as the respective row sum in the local case. We suggest setting \(\kappa = n \times k\) for the local models, \(\kappa = n^{2} \times k\) for the directed global case, and \(\kappa = \frac{n(n+1)}{2} \times k\) for the undirected global case, with k ∈ {0,1,...,10}. A high value of κ expresses a strong belief in the prior parameters. A similar alternative method for obtaining Dirichlet priors is the trial roulette method (Singer et al. 2015). In the global model variation, all α values are parameters of a single Dirichlet distribution, whereas in the local model variation, each row parametrizes a separate Dirichlet distribution. Figure 3c shows the prior elicitation of belief \(B_{1}\) for κ=4 using the local model.
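The elicitation step translates directly into code; the following sketch (function name and signature are our own) implements \(\alpha_{ij} = \frac{b_{ij}}{Z} \times \kappa + 1\) for the local and the directed global case:

```python
import numpy as np

def elicit_prior(B, k, local=True):
    """Dirichlet hyperparameters from an n x n belief matrix B.

    Distributes kappa pseudo-counts proportionally to the beliefs:
    alpha_ij = b_ij / Z * kappa + 1.  This sketch covers the local and
    the directed global case; for the undirected global case, only the
    upper triangle (incl. diagonal) would be used with
    kappa = n * (n + 1) / 2 * k.
    """
    n = B.shape[0]
    if local:
        kappa = n * k                       # suggested kappa, local models
        Z = B.sum(axis=1, keepdims=True)    # row sums: one prior per row
    else:
        kappa = n ** 2 * k                  # suggested kappa, directed global
        Z = B.sum()                         # sum over all entries
    return B / Z * kappa + 1.0
```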
Computation of the marginal likelihood
For comparing the relative plausibility of hypotheses, we use the marginal likelihood, i.e., the likelihood aggregated over all possible values of the parameters θ, weighted by the Dirichlet prior. For our set of local models, we can calculate it as:
$$ P(D|H) = \prod_{i=1}^{n} \frac{\Gamma\left({\sum\nolimits}_{j=1}^{n}\alpha_{ij}\right)}{\Gamma\left({\sum\nolimits}_{j=1}^{n}\alpha_{ij}+m_{ij}\right)} \prod_{j=1}^{n} \frac{\Gamma(\alpha_{ij}+m_{ij})}{\Gamma(\alpha_{ij})} $$
(2)
Recall that \(\alpha_{ij}\) encodes our prior belief in edges connecting nodes \(v_{i}\) and \(v_{j}\) in G, and \(m_{ij}\) denotes the actual edge counts. Since we evaluate only a single model in the global case, the product over the rows i of the adjacency matrix can be removed, and we obtain:
$$ P(D|H)=\frac{\Gamma\left({\sum\nolimits}_{i=1}^{n}{\sum\nolimits}_{j=1}^{n}\alpha_{ij}\right)}{\Gamma\left({\sum\nolimits}_{i=1}^{n}{\sum\nolimits}_{j=1}^{n}\alpha_{ij}+m_{ij}\right)} \prod_{i=1}^{n}\prod_{j=1}^{n} \frac{\Gamma\left(\alpha_{ij}+m_{ij}\right)}{\Gamma(\alpha_{ij})} $$
(3)
Equation (3) holds for directed networks. In the undirected case, the index j runs from i to n, accounting for only half of the matrix (including the diagonal) to avoid inconsistencies. For a detailed derivation of the marginal likelihood of a Dirichlet-categorical model, see (Tu 2014; Singer et al. 2014). For both models, we work with the log-marginal likelihoods in practice to avoid numerical underflows.
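In log space, Eqs. (2) and (3) map directly onto sums of log-gamma terms; a possible implementation for the directed case (in the undirected global case, alpha and M would first be restricted to the upper triangle including the diagonal):

```python
import numpy as np
from scipy.special import gammaln  # log-gamma avoids numerical underflows

def log_evidence_local(alpha, M):
    """Log marginal likelihood of the local models, Eq. (2)."""
    am = alpha + M
    return (np.sum(gammaln(alpha.sum(axis=1)) - gammaln(am.sum(axis=1)))
            + np.sum(gammaln(am) - gammaln(alpha)))

def log_evidence_global(alpha, M):
    """Log marginal likelihood of the global model, Eq. (3)."""
    am = alpha + M
    return (gammaln(alpha.sum()) - gammaln(am.sum())
            + np.sum(gammaln(am) - gammaln(alpha)))
```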
Bayes factor Formally, we compare the relative plausibility of two hypotheses \(H_{1}\) and \(H_{2}\) by using so-called Bayes factors (Kass and Raftery 1995), which are simply the ratios of their marginal likelihoods. If the logarithm of the Bayes factor is positive, the first hypothesis is judged as more plausible. The strength of the Bayes factor can be checked against the interpretation table provided by Kass and Raftery (1995).
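Given the evidence functions sketched above, the log Bayes factor is just a difference of log-marginal likelihoods; this usage example reuses the hypothetical helpers and matrices from the earlier sketches:

```python
# Priors for two hypotheses at the same kappa (here k = 4).
alpha_h1 = elicit_prior(B1, k=4)                 # "same country" belief
alpha_h2 = elicit_prior(np.ones_like(M), k=4)    # uniform baseline

# Positive values favour H1; magnitudes can be checked against the
# interpretation table of Kass and Raftery (1995).
log_bf = log_evidence_local(alpha_h1, M) - log_evidence_local(alpha_h2, M)
```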
Application of the method and interpretation of results
We now showcase an example application of our approach featuring the network shown in Fig. 1, and demonstrate how results can be interpreted.
Hypotheses We compare four hypotheses (represented as belief matrices) \(B_{1}\), \(B_{2}\), \(B_{3}\), and \(B_{4}\) elaborated in Section “Hypothesis elicitation”. Additionally, we use the uniform hypothesis as a baseline. It assumes that all edges are equally likely, i.e., \(b_{ij} = 1\) for all i,j. Hypotheses that are not more plausible than the uniform cannot be assumed to capture relevant underlying mechanisms of edge formation. We also use the data hypothesis as an upper bound for comparison, which employs the observed adjacency matrix as belief: \(b_{ij} = m_{ij}\).
Calculation and visualization For each hypothesis H and every κ, we can elicit the Dirichlet priors (cf. Section “Hypothesis elicitation”), determine the aggregated marginal likelihood (cf. Section “Computation of the marginal likelihood”), and compare the plausibility of each hypothesis to the uniform hypothesis at the same κ by calculating the logarithm of the Bayes factor, \(\log(P(D|H)) - \log(P(D|H_{uniform}))\). We suggest two ways of visualizing the results: plotting the marginal likelihood values, or showing the Bayes factors on the y-axis, as shown in Fig. 4a and 4b, respectively, for the local model. In both cases, the x-axis refers to the concentration parameter κ. While the visualization showing the marginal likelihoods directly carries more information, visualizing Bayes factors makes it easier to spot smaller differences between the hypotheses.
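Putting the pieces together, a possible end-to-end sketch in the style of Fig. 4b, reusing the hypothetical helpers and matrices (M, B1 through B4, elicit_prior, log_evidence_local) from the sketches above; it assumes every belief matrix has positive row sums:

```python
import numpy as np
import matplotlib.pyplot as plt

hypotheses = {
    "same country (B1)": B1,
    "same gender (B2)": B2,
    "hierarchy (B3)": B3,
    "popularity (B4)": B4,
    "data": M.copy(),              # upper-bound hypothesis, b_ij = m_ij
}
uniform = np.ones_like(M)

ks = np.arange(0, 11)              # k = 0, 1, ..., 10
for name, B in hypotheses.items():
    log_bf = [log_evidence_local(elicit_prior(B, k), M)
              - log_evidence_local(elicit_prior(uniform, k), M)
              for k in ks]
    plt.plot(ks, log_bf, marker="o", label=name)

plt.axhline(0.0, color="gray", lw=0.5)   # the uniform baseline
plt.xlabel("k (kappa = n * k, local models)")
plt.ylabel("log Bayes factor vs. uniform")
plt.legend()
plt.show()
```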
Interpretation Every line in Fig. 4a to 4d represents a hypothesis, using the local (top) and global (bottom) models. In Fig. 4a and 4c, higher evidence values mean higher plausibility. Similarly, in Fig. 4b and 4d, positive Bayes factors mean that, for a given κ, the hypothesis is judged to be more plausible than the uniform baseline hypothesis; here, the relative Bayes factors also provide a ranking. If evidences or Bayes factors increase with κ, we can interpret this as further evidence for the plausibility of the expressed hypothesis, as this means that the more we believe in it, the higher the Bayesian approach judges its plausibility. As a result, for our example we see that the hypothesis that two authors are more likely to collaborate if they are from the same country is the most plausible one (after the data hypothesis). In this example, all hypotheses appear to be more plausible than the baseline in both the local and global models, but this is not necessarily the case in all applications.