Annotated hypergraphs: models and applications

Hypergraphs offer a natural modeling language for studying polyadic interactions between sets of entities. Many polyadic interactions are asymmetric, with nodes playing distinctive roles. In an academic collaboration network, for example, the order of authors on a paper often reflects the nature of their contributions to the completed work. To model these networks, we introduce annotated hypergraphs as natural polyadic generalizations of directed graphs. Annotated hypergraphs form a highly general framework for incorporating metadata into polyadic graph models. To facilitate data analysis with annotated hypergraphs, we construct a role-aware configuration null model for these structures and prove an efficient Markov Chain Monte Carlo scheme for sampling from it. We proceed to formulate several metrics and algorithms for the analysis of annotated hypergraphs. Several of these, such as assortativity and modularity, naturally generalize dyadic counterparts. Other metrics, such as local role densities, are unique to the setting of annotated hypergraphs. We illustrate our techniques on six digital social networks, and present a detailed case-study of the Enron email data set.


Introduction
Many data sets of contemporary interest log interactions between sets of entities of varying size. In collaborations between scholars, legislators, or actors, a single project may involve an arbitrary number of agents. A single email links at least one sender to one or more receivers. A given chemical reaction may require a large set of reagents. Networks such as these cannot be represented via the classical paradigm of dyadic graphs without loss of higher-order information. For this reason, recent scholarly attention has emphasized the role of such polyadic interactions in governing the structure and function of complex systems. Additionally, we conduct an extended case-study of the popular Enron email data set. We conclude in Section 5 with a discussion of our results and suggestions for future work in the modeling of rich, polyadic data.
Definition 1 (Annotated Hypergraph). An annotated hypergraph H consists of:
(1) A node set V.
(2) A labeled edge set E, a multiset of subsets of V. In particular, multi-edges are permitted, but edges in which the same node appears twice are not.
(3) A finite label set X.
(4) A role labeling function ℓ ∶ {(v, e) ∈ V × E ∶ v ∈ e} → X, where X is a label alphabet.
The statement ℓ(v, e) = x is to be read as "node v has role x in edge e." We emphasize that the labeling function ℓ is contextual. Roles are assigned neither to nodes nor to edges, but rather to node-edge pairs. There are two representations of annotated hypergraphs that will be useful in our subsequent development. Let n = |V|, m = |E|, and p = |X|. Let H refer to the set of all annotated hypergraphs with n nodes, m hyperedges, and label alphabet X.
Definition 2 (Labeled Incidence Array). The labeled incidence array T = T(H) ∈ {0, 1}^{n×m×p} of an annotated hypergraph is defined entrywise by T_{vex} = 1 if v ∈ e and ℓ(v, e) = x, and T_{vex} = 0 otherwise. Note that the labeled incidence array is distinct from the tensorial representation of directed hypergraphs in [38]. An alternative representation is especially useful in the design of algorithms.
Definition 3 (Annotated Bipartite Graph). The annotated bipartite graph B(H) = (V′, E′) consists of:
• A node set V′ = V ∪ E.
• An edge set E′, where an edge between v and e exists in E′ iff v ∈ e in E. Each edge is labeled by ℓ(v, e), and may therefore be written (v, e; x).
Let B be the set of annotated bipartite graphs.
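As a concrete illustration, the objects in Definitions 1 and 3 can be sketched in a few lines of Python. The toy hypergraph below is invented for illustration and is not drawn from the paper; the representation of edges as lists of (node, role) pairs is an implementation choice.

```python
# Toy annotated hypergraph: each edge is a list of (node, role) pairs, so
# roles attach to node-edge pairs, matching the labeling function ℓ.
edges = [
    [("a", "from"), ("b", "to"), ("c", "cc")],  # edge 0
    [("b", "from"), ("a", "to")],               # edge 1
]

# Annotated bipartite graph B(H) (Definition 3): one labeled bipartite edge
# (v, e; x) per node-edge incidence.
bipartite_edges = [(v, e, x) for e, members in enumerate(edges) for (v, x) in members]
```

Each incidence appears exactly once, so the number of bipartite edges equals the total number of node-edge stubs in the hypergraph.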
We generalize the notion of self-loops in graphs via the concept of degeneracy.
Definition 4 (Degeneracy). An edge e ∈ E is role-degenerate if the same node v appears twice in e, with the same role. Edge e is simply degenerate if the same node appears twice in e, possibly in different roles. We call an annotated hypergraph H role-degenerate (resp. degenerate) if it contains any role-degenerate (resp. degenerate) edges.
It is difficult to find a modeling justification for hypergraphs with role-degenerate edges, but degenerate edges may have modeling applications. For example, in email networks, it may be useful to log instances in which a sender also copies themselves into the recipient list. In our experiments below, however, we work only on data with neither form of degeneracy. Throughout the remainder of the paper, we restrict H to the set of nondegenerate annotated hypergraphs.

2.1. A Configuration Model for Annotated Hypergraphs. We now define a configuration null model on nondegenerate, annotated hypergraphs. As has recently been emphasized by [13], there are several distinct models that are often called "configuration models," and care is required in order to define and sample from them.
To do so, we define two sets of vectors that summarize the incidence array T. For each x ∈ X, define the vector d^x ∈ Z^n_+ entrywise by d^x_v = Σ_{e∈E} T_{vex}. Similarly, define the vector k^x ∈ Z^m_+ entrywise by k^x_e = Σ_{v∈V} T_{vex}. The vector d^x counts the number of times that each node plays role x in H, while the vector k^x counts the number of nodes with role x in each edge. Let D be the matrix whose xth column is d^x, and K the matrix whose xth column is k^x. In a slight abuse of notation, we also regard D and K as functions of H.
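These summaries are simple axis sums over the incidence array. A minimal sketch, assuming T is stored as a dense NumPy array of shape (n, m, p); the toy values are illustrative only.

```python
import numpy as np

# T[v, e, x] = 1 iff node v has role x in edge e (Definition 2).
# Toy instance: 3 nodes, 2 edges, 2 roles.
T = np.zeros((3, 2, 2), dtype=int)
T[0, 0, 0] = 1  # node 0 plays role 0 in edge 0
T[1, 0, 1] = 1  # node 1 plays role 1 in edge 0
T[1, 1, 0] = 1  # node 1 plays role 0 in edge 1
T[2, 1, 1] = 1  # node 2 plays role 1 in edge 1

D = T.sum(axis=1)  # D[v, x] = d^x_v: number of times node v plays role x
K = T.sum(axis=0)  # K[e, x] = k^x_e: number of nodes with role x in edge e
```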
Definition 5 (Configuration Null Space). The configuration null space C_{D,K} ⊂ H induced by degree-role matrix D and dimension-role matrix K is C_{D,K} = {H ∈ H ∶ D(H) = D, K(H) = K}. If C_{D,K} ≠ ∅, we say that D and K are configurable.
Throughout the remainder of this paper, we assume that D and K are configurable.Note that this is always the case when D and K are extracted from an empirical data set.
In C_{D,K}, each node "remembers" how many times it played each role x, and each edge remembers how many nodes playing role x it contained. Summing over x, we see that nodes remember their degrees and edges their dimensions. The null space C_{D,K} thus generalizes the hypergraph configuration null space of [10].
A natural approach to defining a null model is to take the uniform measure on C_{D,K}. Such a definition is attractive from a theoretical standpoint, since this measure is also the entropy-maximizing measure on H subject to the degree and dimension constraints. However, methods for sampling from uniform models suffer from computational issues related to counting edge multiplicities and rejection probabilities, often resulting in slow sampling [13,11]. We therefore instead follow the traditional path of [8,32], and others in formulating a configuration model as the output of a stub-matching algorithm.
A stub is an indexed pair (v, x, i) of a node and a role; the index i serves simply to distinguish stubs. For each such pair, d^x_v counts the number of times that node v has role x in H_0. We collect all stubs in a single multiset Σ = {(v, x, i) ∶ v ∈ V, x ∈ X, 1 ≤ i ≤ d^x_v}.
Stub-matching proceeds via the following algorithm, which we describe informally. For each edge e: (1) For each role x, uniformly sample k^x_e stubs with role x from Σ, without replacement, and combine them via multiset union. Add the result to the edge set E.
The algorithm terminates when it is impossible to form the next edge. When D and K are configurable, stub-matching terminates when Σ = ∅ and produces a partition generating a stub-labeled hypergraph with the specified degree and dimension sequences.
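The stub-matching procedure above can be sketched as follows. The representation of D and K as nested lists of per-role counts is an implementation choice, and no rejection step for degeneracy is included here.

```python
import random

def stub_matching(D, K, rng):
    """Informal stub-matching: D[v][x] = d^x_v, K[e][x] = k^x_e.

    Returns a list of edges, each a list of (node, role) pairs. The output
    may be degenerate; sampling from mu would require rejecting such draws.
    """
    p = len(D[0])
    # Pool the stubs (v, x, i) by role; shuffling and then popping implements
    # uniform sampling without replacement.
    stubs = {x: [v for v, row in enumerate(D) for _ in range(row[x])] for x in range(p)}
    for x in stubs:
        rng.shuffle(stubs[x])
    edges = []
    for k_row in K:
        edge = []
        for x in range(p):
            for _ in range(k_row[x]):
                edge.append((stubs[x].pop(), x))
        edges.append(edge)
    return edges
```

When D and K are configurable, every stub is consumed exactly once, so the realized degree and dimension sequences match the inputs.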
By construction, the output of stub-matching is distributed according to the uniform measure µ_0 on the set Σ_{D,K} of partitions of indexed stubs subject to the first moment constraints. Let g ∶ Σ_{D,K} → C_{D,K} be the map that sends each such partition to its associated hypergraph.
The configuration model µ weights elements of C_{D,K} according to their likelihood of being realized via stub-matching, conditional on nondegeneracy. In this, it differs from the uniform distribution on C_{D,K}, since the same annotated hypergraph can be realized through multiple configurations. Indeed, any permutation of stubs of the form (v, x, 1), (v, x, 2) does not alter the image under g. Because of this, the configuration model tends to give greater probabilistic weight to annotated hypergraphs in which there are many parallel edges.
By definition, we can in principle sample from µ by repeatedly performing stub-matching until a nondegenerate configuration is obtained, and then applying the function g. This approach is usually impractical, as the probability of realizing a nondegenerate configuration is typically very low. The generalization of limit laws, such as those provided by [2] for a dyadic configuration model, governing the probability of nondegeneracy would be a welcome development beyond our present scope.

2.2. Edge-Swap Markov Chains. Edge-swap Markov chain Monte Carlo provides an alternative approach to sampling from µ_{D,K}. The benefit of this method of sampling is that, as long as it is initialized with a nondegenerate hypergraph, all samples produced are guaranteed to be nondegenerate.
It is convenient to define edge swaps on the annotated bipartite graph B.
Definition 7. A role-preserving edge-swap on B is a map of pairs of edges:
(v_1, e_1; x), (v_2, e_2; x) ↦ (v_2, e_1; x), (v_1, e_2; x).
We can also regard a role-preserving edge-swap of bipartite edges f_1 and f_2 as a map π ∶ B → B that generates a new bipartite graph B′. In this case we write B′ = π(B; f_1, f_2). Note that it is possible that B = π(B; f_1, f_2); this occurs when f_1 = (v, e_1; x) and f_2 = (v, e_2; x) for some v, or f_1 = (v_1, e; x) and f_2 = (v_2, e; x) for some e. Nondegeneracy rules out the case that f_1 = f_2 = (v, e; x) for distinct f_1 and f_2.
Let B_x denote the edges of B with role label x. Then, we can construct a Markov chain B_t ∈ B by repeated role-preserving double edge-swaps. Some care is needed to ensure that each state of this chain is nondegenerate. The full algorithm is formalized in Algorithm 1. The Markov chain B_t of bipartite graphs induces a chain H_t ∈ H of annotated hypergraphs.
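A single step of the role-preserving edge-swap chain might look like the following sketch. The incidence-list representation and the rejection rule for degenerate proposals are assumptions about how Algorithm 1 could be implemented, not a reproduction of it.

```python
import random

def swap_step(incidences, rng):
    """One role-preserving double edge-swap on an annotated bipartite graph.

    `incidences` maps each role x to a list of (node, edge) pairs, i.e. the
    edge set B_x. A proposal is rejected (returning False) if it would make
    an edge degenerate by repeating a node.
    """
    roles = [x for x, lst in incidences.items() if len(lst) >= 2]
    x = rng.choice(roles)
    i, j = rng.sample(range(len(incidences[x])), 2)
    (v1, e1), (v2, e2) = incidences[x][i], incidences[x][j]

    def members(e):  # all nodes currently incident to edge e, in any role
        return {v for lst in incidences.values() for (v, f) in lst if f == e}

    if v2 in members(e1) or v1 in members(e2):
        return False  # would create a degenerate edge; chain holds in place
    incidences[x][i], incidences[x][j] = (v2, e1), (v1, e2)
    return True
```

Because swaps only permute (node, edge) pairs within a single role class, the degree-role matrix D and dimension-role matrix K are invariant along the chain.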
Theorem 1. The Markov chain H_t ∈ H is irreducible and reversible with respect to µ.
Proof. It is convenient to first view stub-matching as an algorithm for generating stub-labeled bipartite graphs. To do so, we observe that the output partition of stub-matching defines a bipartite graph similar to B, except that to each edge (v, e; x) is associated an integer between 1 and d^x_v. Let B̃ denote this stub-labeled bipartite graph, and B̃ the set of such graphs. We recover a standard bipartite graph B from B̃ by erasing the stub-labels.
We now observe that a bipartite edge-swap B̃_{t+1} = π(B̃_t; f_1, f_2) always produces a new element of B̃, since each bipartite edge has a distinct stub-label. For the same reason, if B̃_{t+1} = π(B̃_t; f_1, f_2), then f_1 and f_2 are the only bipartite edges for which this relation holds.
In particular, the transition kernel of the edge-swap Markov chain may be written P(B̃_{t+1} = π(B̃_t; f_1, f_2) | B̃_t) = r(f_1, f_2 | B̃_t), where r(f_1, f_2 | B̃) is the probability that bipartite edges f_1 and f_2 are sampled and that x_1 = x_2. It follows that P will be reversible with respect to the uniform measure on B̃ provided that r(f_1, f_2 | B̃_t) = r(f_1, f_2 | B̃_{t+1}). To see why this is the case, note that r(f_1, f_2 | B̃) depends on B̃ only through the sizes of edges and the role distributions within each edge. These quantities are preserved under double edge-swaps. We thus conclude that B̃_t is reversible with respect to µ_0, and therefore H_t is reversible with respect to µ.
It remains to show irreducibility. We will construct a supported path in state space from B̃ to B̃′, where these are stub-labeled bipartite graphs with fixed marginals. Let E and E′ denote the respective edge sets of these bipartite graphs. Choose x such that B̃_x ∖ B̃′_x is nonempty, and let (v_1, e_1; x, i_1) ∈ B̃_x ∖ B̃′_x. Then, there must exist edges of the form (v_1, e_2; x, i_1) and (v_2, e_1; x, i_2) in B̃′_x ∖ B̃_x, since node v_1 must be connected to some edge with role x, and similarly edge e_1 must be connected to some node with role x. In particular, the swap (v_1, e_2; x, i_1), (v_2, e_1; x, i_2) ↦ (v_1, e_1; x, i_1), (v_2, e_2; x, i_2) reduces the size of the set B̃_x ∖ B̃′_x by at least one. Repeating this procedure allows us to reduce the size of B̃_x ∖ B̃′_x until it is empty, and therefore constitutes a supported path between the two graphs.
Corollary 1. As δt → ∞, the output of Algorithm 1 is asymptotically independent and identically distributed according to µ.
For the feature of guaranteed nondegeneracy, we pay a computational cost in mixing time. Unfortunately, no mixing-time results are extant for this class of edge-swap Markov chains, and the best available upper bound for the mixing time of a related class of chains [19,18] scales poorly with the node degrees and total number of edges. Despite this, edge-swap Markov chains can be deployed in a variety of practical settings; see [13] for a review.

Analysis of Annotated Hypergraphs
In this section, we introduce a series of tools for measuring the structural properties of annotated hypergraphs while flexibly incorporating information about roles. We split these into two categories: those that can be natively defined on the annotated hypergraph structure, and those that can be measured using a projection of the annotated hypergraph to a weighted directed network.
3.1.1. Role Densities. The simplest nontrivial statistic is the individual role density associated to a node. The individual role density is a probability distribution summarizing the proportion of interactions in which node v plays each role. It may be computed as p_v = d_v / ⟨e, d_v⟩, where e is the vector of ones and d_v is the vector of degrees of node v in each role, defined in Equation (2). Aggregating role counts over the neighborhood of v and normalizing yields the local role density p′_v.
3.1.2. Local Role Densities. The local role density measures not the behavior of node v, but rather the typical behavior of the nodes with which v interacts.
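The individual role density is a row-normalization of the degree-role matrix D. A minimal sketch, with rows indexed by node and columns by role; the aggregation step for the local density is analogous and is omitted here.

```python
import numpy as np

def individual_role_density(D):
    """p_v = d_v / <e, d_v>: the fraction of v's incidences in each role."""
    D = np.asarray(D, dtype=float)
    return D / D.sum(axis=1, keepdims=True)
```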
3.1.3. Assortativity. Classically, degree-assortativity measures the tendency of nodes of similar degrees to connect to each other. In particular, it is often observed that nodes of high degree tend to connect to other nodes of high degree. In an annotated hypergraph, each node possesses a distinct degree corresponding to each role. We therefore develop a role-dependent assortativity measure, similar to assortativity for directed graphs. The measure we choose generalizes that of [10], which was formulated for hypergraphs without annotations. We first provide the mathematical formulation, and then discuss the nature of the correlation it measures.
Fix roles x, y ∈ X. For any nodes u, v ∈ V, let s^{xy}_{uv} count the number of edges in which u has role x and v has role y. We compute an assortativity score as a correlation coefficient between the random variables d^x_U − s^{xy}_{UV} and d^y_V − s^{xy}_{UV}, where the random nodes U and V are sampled according to a two-stage scheme. We first select a uniformly random edge e from E. From e, we then select a uniformly random pair of distinct nodes U and V, conditioning the entire process on the roles x and y. The resulting probability law for U and V is given in Equation (5). To compute a Spearman assortativity coefficient, let q^x_{uv} denote the rank of d^x_u − s^{xy}_{uv} among all pairs u and v. Then, the Spearman assortativity coefficient ρ_{xy} is given by the correlation of the ranks q^x_{UV} and q^y_{VU}, with expectations computed with respect to Equation (5). By construction, ρ_{xy} is symmetric in the roles x and y. Note also that the definition of q^x_{uv} excludes from the calculation instances in which u and v themselves interact in these roles. The assortativity coefficient supports quantitative investigations of questions like: is it statistically the case that the sender and receiver of a given email tend to send and receive many emails, respectively? Do scholars with many first-authorships tend to collaborate with scholars with many last-authorships?
Since it can in practice be difficult to compute the expectations appearing in Equation (6) exactly, it is often convenient to estimate them via repeated sampling from Equation (5). This is the method used in our experiments below.
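Given samples of the two rank variables, the Spearman coefficient is an ordinary Pearson correlation of ranks. A sketch, assuming tie-free samples (no tie correction is applied); in practice the inputs would hold sampled values of d^x_U − s^{xy}_{UV} and d^y_V − s^{xy}_{UV} drawn via the two-stage edge-then-pair scheme.

```python
import numpy as np

def spearman(a, b):
    """Spearman correlation of two equal-length samples (no tie correction)."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v)))
    ra, rb = rank(a).astype(float), rank(b).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))
```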
3.2. The Weighted Projection. We often wish to apply tools from dyadic graph theory and network science to polyadic data. The usual way to do this is to project the latter by replacing each k-dimensional hyperedge with a k-clique. When role information is available, we can perform more flexible projections. Let R = [r_{xy}] ∈ R^{p×p}. We refer to R as a role-interaction kernel, which describes directed interaction strengths between pairs of roles. We do not place any restriction on the values in R, although we can without loss of generality rescale to ensure that max_{x,y} r_{xy} = 1. In all our examples, the entries of R will be nonnegative, but in principle negative interaction weights can also be used.
We compute the weighted projection matrix W = W(H, R) entrywise via the formula w_{uv} = Σ_{e∈E} Σ_{x,y∈X} T_{uex} r_{xy} T_{vey}. Each entry w_{uv} thus counts the number of edges connecting any pair of nodes, weighted by the entries of R. To illustrate, let us first consider the directed hypergraphs of [14].
Directed hypergraphs possess two possible roles, "source" and "target." To project a directed hypergraph to a weighted dyadic graph that respects the roles, we can compute Equation (7) using the interaction kernel with r_{source,target} = 1 and all other entries zero. The result is a weighted directed graph in which w_{uv} counts the number of hyperedges in which u appears as a source and v as a target. This example is somewhat trivial, but the benefit of our general formalism is that flexible modeling choices are possible. For example, in our case study of the Enron email data set below we use an interaction kernel in which the from-to entry is largest, the from-cc entry is smaller, and all other entries are zero. This kernel reflects an assumption that information travels efficiently from senders to direct receivers, but more weakly to cc'd receivers.
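Equation (7) is a bilinear contraction of the incidence array T with the kernel R, which is conveniently written with einsum. The two-role kernel below is the source/target kernel of the directed-hypergraph example; the toy incidence array is illustrative only.

```python
import numpy as np

def weighted_projection(T, R):
    """w_uv = sum_e sum_{x,y} T[u,e,x] * r_xy * T[v,e,y] (Equation (7))."""
    return np.einsum("uex,xy,vey->uv", T, R, T)

# Directed hypergraph kernel: role 0 = source, role 1 = target, and only
# source -> target interactions carry weight.
R_directed = np.array([[0.0, 1.0],
                       [0.0, 0.0]])
```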
3.2.1. Centrality. Standard, dyadic centrality analysis may be performed on the weighted projected graph of an annotated hypergraph. For our examples in Section 4 we consider the results of computing PageRank [35] and eigenvector centrality on weighted projected networks [34]. We note that there are alternative approaches to defining centrality in hypergraphs, such as [41] and [4]. The first of these uses a random-walk formulation that can be represented via normalized weighted projections. The latter is applicable only to k-uniform hypergraphs. Our approach, while not a direct generalization of either, is considerably more flexible in applied data analysis.

Modularity and Community Detection.
We extend the notion of modularity to annotated hypergraphs. Many such extensions are possible. Unlike a recent proposal [23], we define a dyadic notion of modularity via null expectations for the weighted projected graph computed via Equation (7). While our approach loses some higher-order information in the modularity calculation, it has the benefit of allowing roles to be flexibly incorporated into the null expectation.
Recall that w_{uv} gives the observed weighted edge count from u to v, with the weights specified via R. In order to derive a working notion of dyadic modularity, we need only estimate the expectation of w_{uv} under a suitably-chosen null. We will approximate this expectation under the annotated hypergraph configuration model.
Let us estimate E_µ[M^{xy}_{uv}], the expected number of edges that contain node u in role x and node v in role y. We begin by forming an edge e via stub-matching. There are k^x_e x-stubs that must be selected to form e. Each of these has probability approximately d^x_u / ⟨e, d^x⟩ of belonging to node u. Supposing the probability of degeneracy in an individual edge to be small, we can thus approximate the probability that e contains u in role x as k^x_e d^x_u / ⟨e, d^x⟩. Summing over edges, we can write this expression more compactly as E_µ[M^{xy}] ≈ (⟨k^x, k^y⟩ / (⟨e, d^x⟩⟨e, d^y⟩)) d^x ⊗ d^y, where ⊗ denotes the vector outer product.
To compute E_µ[W], we weight by R and sum over role pairs: E_µ[W] = Σ_{x,y∈X} r_{xy} E_µ[M^{xy}]. We then define a dyadic modularity score of a partition g proportional to tr(G^T (W − E_µ[W]) G), where G is the one-hot encoding matrix of the partition g. As usual, the pre-factor ensures that the score is at most unity. It is important to clarify the nature of the null model used in the modularity calculation. A procedure that is commonly followed for studying polyadic data is to construct a projected graph and perform modularity maximization with respect to an implicit null defined over dyadic graphs. In contrast, we have defined a null over the space of annotated hypergraphs, and then computed expectations in the projected graph with respect to this higher-order null. This approach has the benefit of preserving some information about polyadic interactions in the modularity score, even though this score is natively dyadic. These two approaches will generally lead to different null matrices and therefore different partitions.
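The null expectation can be assembled directly from D and K. This sketch follows the outer-product approximation above, leaves the modularity pre-factor to the caller, and uses toy inputs for the test.

```python
import numpy as np

def expected_projection(D, K, R):
    """E_mu[W]: sum over role pairs of
    r_xy * (<k^x, k^y> / (<e, d^x><e, d^y>)) * (d^x outer d^y)."""
    D, K, R = (np.asarray(a, dtype=float) for a in (D, K, R))
    n, p = D.shape
    EW = np.zeros((n, n))
    for x in range(p):
        for y in range(p):
            scale = (K[:, x] @ K[:, y]) / (D[:, x].sum() * D[:, y].sum())
            EW += R[x, y] * scale * np.outer(D[:, x], D[:, y])
    return EW
```

Subtracting this matrix from the observed projection W gives the modularity matrix whose block sums, under a partition, yield the dyadic modularity score.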
In order to approximately maximize Equation (8), we adopt the multiway spectral algorithm of [40], though many alternatives are possible. The matrix W is not symmetric, and therefore it is necessary to make a small adjustment to this algorithm [28]. Rather than computing the leading eigenvectors of the matrix W − E_µ[W], we instead compute the eigenvectors of the symmetrized form (W − E_µ[W]) + (W − E_µ[W])^T. The multiway spectral algorithm is then applied to perform this task.

Results
We illustrate how annotated hypergraphs and their null models can be used to enrich analysis of previously studied data sets.

Enron Case Study.
We first focus our analysis on the Enron email data set [26]. This data set contains emails from employees of the company prior to its forced bankruptcy and shutdown due to corporate fraud and corruption. While this data is temporal in nature, we consider only the time-aggregated graph, neglecting the timestamps of edges. We also consider only the official email accounts of Enron employees, referred to as the 'core' group.
4.1.1. Role Distributions and Assortativity. The data contains three roles: "from," "to," and "cc." These describe the sender, recipients, and carbon-copy recipients of emails, respectively. Depending on the occupation of an employee, the number of emails they send and receive (and therefore their role participation) will vary. Figure 1(A) shows the distributions of individual and local role densities for the data. Here we see a diverse range of behaviour, with greater diversity in individual node distributions than in local distributions. For example, there are nodes that are exclusively message senders and nodes that are exclusively receivers, but no node neighbourhoods consist of exclusively one role. The extent of role hybridization highlights the importance of modeling roles as properties of node-edge pairs, since no node can be assigned a single role.
Figure 1(B) shows the individual and local role densities for the four CEOs of the company, now each connected by a line. Here we see that, while the CEOs correspond with other individuals who perform similar roles, they themselves exist across a spectrum of behaviour, namely in whether they are senders or receivers of emails. For example, Kenneth Lay was the sender of emails roughly 65% of the time, in comparison with David Delainey, who played that role in less than 10% of interactions. This passive role of David Delainey as a receiver of information rather than an originator of it may be reflected in the lesser sentencing he received in comparison to Lay.
In Figure 1(C) we show the effect of randomization under hypergraph configuration models. The configuration model introduced in Section 2 preserves the roles of all nodes, while in the non-role-preserving variant we simply erase the role labels. The samples from both null models show a reduction in the heterogeneity of the role distribution: randomization has the effect of homogenizing the population. This effect is especially apparent under non-role-preserving randomization, where the local role distribution converges towards the average over all nodes.
The role assortativity, defined in Equation (6), captures the tendency of nodes that frequently play one role to be associated with nodes that frequently play another. Figure 2 compares the observed and null-distributed assortativities ρ_{xy} for each combination of roles in the Enron data set. Five combinations are shown: there are six pairs of role labels, and the 'from/from' combination is vacuous since all emails have exactly one sender. In four of the five combinations, the assortativity coefficient is much higher than would be expected under the null, and would generally be judged statistically significant. The positive assortativities in the 'from/to' and 'from/cc' combinations quantify the tendency of prolific senders to share information with prolific receivers. The latter combination highlights the importance of using null models to contextualize network measurements: despite the fact that the observed assortativity is negative, the null distributions are even more so. The data should therefore be judged more assortative than expected by chance. The 'to/to' and 'cc/cc' combinations quantify the tendency of prolific receivers to receive along the same communication threads. The 'to/cc' combination illustrates the utility of role-preserving randomization: while this measurement would be statistically significant under randomization without role information, it falls within the bulk of the null when roles are preserved. We recall that the Spearman coefficient defined above subtracts out the edges along which u and v interact. It is natural to conjecture that the result for the 'to/cc' combination indicates a tendency for nodes to cluster in recurring 'to/cc' motifs, which are then removed in the course of the calculation. An example of a recurring pattern might be frequent emails to an executive with their personal assistant cc'd.
4.1.2. Centralities. In this and subsequent sections, we study the weighted projected graph of the Enron annotated hypergraph. We will primarily use a role-interaction kernel in which the from-to entry is largest, the from-cc entry is smaller, and all other entries are zero. This kernel emphasizes flow along edges: information flows strongly to receivers listed in the 'to' field and less strongly to those listed in the 'cc' field.
Figure 3 illustrates the flexibility of interaction kernels in studying graph properties. We compute eigenvector and PageRank centralities on the weighted projected graph using the kernels R and R^T. High-centrality nodes under R will tend to be those to whom information flows, while under R^T they will tend to be those from whom information originates. The kernel R thus emphasizes information sinks, and R^T information sources. In bulk, the source and sink centralities are only weakly correlated, indicating that they capture distinct structural properties of the network. The flexibility of the formalism of annotated hypergraphs with user-specified role interaction kernels supports the discovery of these features.
4.1.3. Modularity Maximization. Figure 4 shows the best of 100 partitions obtained using this kernel for k = 4 communities. Inspection of the clustered adjacency matrix (left) suggests that three of the communities are relatively coherent, while one is highly disperse and essentially serves as a "none of the above" class. On the right, we show the communities themselves, along with nodes colored according to job title (CEO, President, and Vice President).
It is useful to draw a contrast against the uniform role-interaction kernel, with r_{xy} = 1 for all roles x and y, which ignores roles entirely. The use of this kernel produces substantially different communities when compared to the information-flow kernel. The best of 100 partitions obtained using this kernel chose only three communities, when up to k = 4 were available. A simple measure of similarity between the two partitions is the normalized mutual information, where X and Y are random variables giving the community assignment of a uniformly random node under each scheme, H(X) is the entropy of random variable X, and I(X, Y) the mutual information of the two random variables X and Y. The NMI has a maximum value of unity when X and Y are deterministically related, and a minimum value of zero when they are statistically independent. In this case, the normalized mutual information between the weighted and unweighted community assignments is 0.55. This score indicates that the partitions are correlated, as we might expect; however, there are nevertheless substantial differences between them. These simple experiments emphasize the importance of appropriately-specified interaction kernels when performing community detection on annotated hypergraphs.
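The NMI between two partitions can be computed from label co-occurrence counts. A sketch; the square-root normalization I(X, Y)/√(H(X)H(Y)) used here is one common convention and is an assumption about the exact form used.

```python
import numpy as np
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two community assignments of
    the same node set, normalized by sqrt(H(X) * H(Y))."""
    n = len(labels_a)

    def entropy(counts):
        p = np.array(list(counts.values()), dtype=float) / n
        return float(-(p * np.log(p)).sum())

    ha = entropy(Counter(labels_a))
    hb = entropy(Counter(labels_b))
    hab = entropy(Counter(zip(labels_a, labels_b)))  # joint entropy H(X, Y)
    return (ha + hb - hab) / np.sqrt(ha * hb)       # I(X,Y) = H(X)+H(Y)-H(X,Y)
```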

Ensemble Study.
We now turn to studying annotated graph features systematically across a range of data sets. We consider six data sets, outlined in Table 1, with full descriptions given in Appendix A. These capture a wide variety of possible hypergraphs: those where the number of nodes exceeds the number of edges (|V| ≫ |E|), and those where the number of nodes is much less than the number of edges (|V| ≪ |E|). The number of node-edge stubs can take a maximum value of |V||E|, which corresponds to every node being included in every edge. All the data are social in nature, but diverse in their interpretation.
With the exception of the Enron data, all are sparse, with low mean node degree ⟨d⟩ and edge dimension ⟨k⟩.
For each data set we calculate seven summary statistics. Two of these are the average entropies of the individual and local role densities (Equations (3) and (4) respectively), which capture the diversity of roles observed on individual nodes and their neighbourhoods. An entropy of zero indicates no observed role diversity, while maximal entropy indicates that all roles are equally likely to be observed. In addition, we calculate the mutual information between a node's individual roles and local roles. This quantity is computed as the mean KL-divergence between the individual and local role densities. We also calculate the weighted degree, PageRank, and eigenvector centralities for each node. To create suitable summary statistics we again use entropy, now across the distribution of centrality over nodes. This captures how concentrated the centrality is across all nodes in the hypergraph. Finally, we report the number of weakly connected components in the weighted projection.
Figure 5. The average local role mutual information across data sets, with comparisons to the role-preserving (orange) and non-role-preserving (green) ensembles. In all cases, the observed value falls outside the interquartile range for the role-preserving ensemble (orange), and in all but the math-overflow data set for the non-role-preserving ensemble (green). Note that in both the movielens and twitter data sets we may not have completely reached the mixed state during the burn-in period; however, in both cases the ensemble distributions differ significantly from the empirical observation.
We assess the significance of each feature by comparison with ensembles generated from both role-preserving and non-role-preserving randomised swaps. We take 500 samples from each null model, each time performing ⌊0.1|E′|⌋ shuffles, where |E′| is the number of edge stubs. We use a burn-in period of 10|E′| shuffles to ensure that all chains are sufficiently well-mixed.
In Figure 5 we show the local role mutual information across each data set. We see in all cases except for math-overflow that the local role mutual information is significant when compared to the non-role-preserving null model. This makes intuitive sense, given that nodes may switch roles and so any correlation between node states is lost. In all data sets the local role density is significant when compared to the role-preserving model, although again math-overflow shows the least difference (likely due to the small data size). Despite being significant in all but one data set, the local role mutual information can be both larger than expected (e.g. enron, scopus-multilayer) and smaller than expected (e.g. stack-overflow, twitter) when compared to the null. This can be explained by certain nodes having little diversity of roles in their local neighbourhood. For example, in the enron hypergraph, certain nodes may only ever send messages (and so their neighbourhood consists of recipients), as we saw in Figure 1(B). However, upon shuffling, these nodes may be swapped with other nodes with more diverse neighbourhoods, effectively increasing the average local diversity. Data sets with lower values than expected suggest that the hyperedges themselves are relatively uniform in participating roles. In the twitter data this could be due to the limit on edge cardinality (due to the character limit on posts). The significance of the remaining features for two data sets is given in Table 2.
Table 2. The significance of multiple observables across the enron and stack-overflow data sets. Significance scores coloured red are two standard deviations larger in the original data than in the null model. In contrast, those coloured blue are two standard deviations smaller. The results for all data sets are presented in Figure 6.
Naturally the node role entropy is never significant under the role-preserving null model, but in all cases it becomes significant when roles are not preserved upon shuffling. In the enron data set all features (barring the number of connected components) differ significantly from the non-role-preserving null. Interestingly, the effect of shuffling is not consistent across the different centrality measures: the eigenvector centrality is more localised than expected (lower entropy), whereas the PageRank is less localised than expected (higher entropy). Echoing Figure 1(C), the neighbourhood role distributions are more diverse in both null models. For the stack-overflow data, the significance of the number of components can be explained by the presence of topic cliques in the original data. For example, users answering questions on Python may be unlikely to answer questions on Javascript. Since the null model is agnostic to these topics, the components are merged under shuffling. The role entropies are all significantly lower than expected. This can be explained by nodes being consistent in the roles they take across multiple questions; question answerers, for example, tend to answer further questions.
A full summary of the analysis across all features and data sets is given in Appendix B.

Discussion
Many of the seminal results in network science were obtained using dyadic, unweighted networks without metadata, perhaps the minimal model of a complex system. As the quantity and richness of networked data have grown, so too has the need to incorporate higher-order interactions and heterogeneous node and edge properties into our models. Toward this goal, we have introduced annotated hypergraphs. Annotated hypergraphs may be viewed as a natural extension of directed graphs for modeling heterogeneous, polyadic interactions. Because the setting of annotated hypergraphs is highly general, it allows the data scientist to flexibly incorporate assumptions about how role information should feature in downstream analysis.
We have also made contributions to the inferential and exploratory analysis of annotated hypergraphs. First, we formulated a role-aware configuration null model. This model can be used to assess whether an observed structural feature of an annotated hypergraph can be explained purely through information about role-dependent edge and node incidence. Features that cannot may reasonably be attributed to higher-order mechanisms, which may then be investigated. Second, we provided a small suite of analytical tools that generalize many common methods for studying dyadic networks, including centrality, assortativity, and modularity scores. We showed how each of these may be used in role-aware ways to highlight diverse features of the data.
Annotated hypergraphs admit multiple avenues of future work. There are opportunities to define and study diffusion, spreading, and opinion dynamics on annotated hypergraphs, using models that account for heterogeneous, polyadic interactions. For example, our weighted projection scheme does not incorporate any explicit accounting of edge dimensions. It may be of interest to define a role-dependent simple random walk along the hypergraph. Such walks implicitly normalize edge dimensions, implying that a node who received an email along with ten others is in some sense less important than one who received a private communication. This walk could then be used to define alternative measures of centrality and modularity.
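Since this walk is left as future work, the following is only a sketch of one way it could be formalized: each edge contributes transitions weighted by the role-interaction kernel and normalised by the edge dimension, so that large hyperedges do not dominate. The `node -> role` dictionary representation of edges and the division by |e| − 1 are our assumptions.

```python
import numpy as np

def role_walk_transition(edges, R, n_nodes):
    """One possible role-dependent random-walk transition matrix.

    edges: list of dicts mapping node index -> role index.
    R: role-interaction kernel, R[x, y] = weight for role x acting on role y.
    Each edge's contribution is divided by its dimension minus one, so a
    message sent to ten recipients weighs each of them less than a
    private message would.
    """
    W = np.zeros((n_nodes, n_nodes))
    for e in edges:
        dim = len(e)
        for u, x in e.items():
            for v, y in e.items():
                if u != v:
                    W[u, v] += R[x, y] / (dim - 1)
    # row-normalise to obtain transition probabilities
    rowsum = W.sum(axis=1, keepdims=True)
    rowsum[rowsum == 0] = 1.0          # leave dangling rows as all-zero
    return W / rowsum
```

The stationary distribution of this chain would then give one candidate role-aware centrality, in the spirit of the PageRank comparison above.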
Another direction of future development concerns the structure of hyperedges. By assigning roles within each hyperedge, we impose a certain model of how the interaction marked by the edge takes place. Further structural assumptions are possible. In some cases it may be useful to assume, for example, that the hyperedge itself contains a small network between the nodes. The role of the hyperedge in this case is to serve as a single entity housing a network motif [1]. A null model over such structures would allow for the sampling of random graphs with control over the participation of nodes in various microscale graph structures. While such a model would be substantially more complex, the emerging importance of network motifs [6] and network-of-networks modeling (e.g. [24]) may provide sufficient impetus to pursue it.
Finally, an important feature of many polyadic data sets is that interactions are temporally localized. The incorporation of temporal information into models of hypergraphs and annotated hypergraphs is of substantial importance for modeling realistic dynamics on network substrates. One route may be to generalize temporal event graphs [31] for rich, polyadic data. Such a generalization, along with the development of associated metrics, would be of substantial theoretical and practical interest.
This work is a contribution to the project of integrating progressively more complex information into network data science.We foresee that this program will become increasingly important as rich, relational data sets become more readily available.

Declarations
Availability of data and materials. The datasets generated and analysed during the current study are available from the authors' webpage. A Python package for the analysis of annotated hypergraphs is available on GitHub. This includes the analysis needed to replicate the figures from this study.
For this hypergraph we choose the role-interaction matrix to be

This reflects that information can only be transmitted from the sender. Furthermore, it reflects the assumption that information is less likely to be transmitted (or not fully transmitted) to those who are "cc'd."

A.1.2. Scopus Multilayer Literature.
This data consists of academic literature surrounding 'multilayer networks,' collected using Scopus. Specifically, we choose all references from the following three reviews and books: Authors are assigned roles for each article depending on their order in the list of authors.
Although practices vary across disciplines and institutions, here we distinguish between first, middle, and last authors. When there are fewer than three authors, the role is assigned as first for a single author, and first and last for a pair of authors (regardless of any note of equal contribution).
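The assignment rule above can be written as a short helper. This is an illustrative sketch (the function name and dictionary output are ours, not the package's API):

```python
def author_roles(authors):
    """Assign 'first'/'middle'/'last' roles by position in the author list.

    Follows the convention described above: a single author is 'first',
    a pair is 'first' and 'last', and longer lists get 'middle' roles
    in between (equal-contribution notes are ignored).
    """
    if len(authors) == 1:
        return {authors[0]: "first"}
    if len(authors) == 2:
        return {authors[0]: "first", authors[1]: "last"}
    roles = {a: "middle" for a in authors[1:-1]}
    roles[authors[0]] = "first"
    roles[authors[-1]] = "last"
    return roles
```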
For this hypergraph we choose the role-interaction matrix to be

We make the modelling assumption that the first author is the most knowledgeable and therefore able to spread information to the other authors. The last author is often an advisor who can spread information to the first author, while the middle authors can weakly disseminate information between everyone.
A.1.3. MovieLens Actor Credits.
This data contains a list of credits from the MovieLens data collection. We consider the top five billed actors from a collection of [] movies. Actor roles are distinguished by being the top-billed actor or in the remaining cast.
For this hypergraph we choose the role-interaction matrix to be R =

The top-billed actor is assumed to be the most diffusive, potentially spreading fame and influence. The lower-billed cast have a smaller diffusive rate.
A.1.4. Stack Overflow Threads.
This data contains a list of Stack Overflow question threads which achieved a score greater than 25 between 1st January 2017 and 1st January 2019. The score reflects the quality of the question, both in terms of its pertinence and its presentation. Here, edges represent question threads in which users can take one of three roles: the question setter, the question answerers, and the best answerer (chosen by the question setter as the accepted answer).
For this hypergraph we choose the role-interaction matrix to be

Here, the question setter may disseminate some information (either about the question or about the topic). The question answerers may share information uniformly across all roles. The node whose answer is accepted transfers the most information to the question setter, since the setter has chosen this response to adopt. This accepted answer may also benefit the other answerers, whose answers were less useful.
A.1.5. Math Overflow Threads.
This data is in an identical format to the Stack Overflow threads, but the question threads now concern mathematical research rather than general programming.
A.1.6. Twitter Keyword Sample.
This data contains a list of messages posted on the social media platform Twitter over a 24-hour period. All of these messages contained a particular keyword relating to the aviation industry. Each edge corresponds to a message (or tweet). Nodes participating in a message can have four possible roles: a sender, a receiver, a retweeter, and the retweeted.
For this hypergraph we choose the role-interaction matrix to be

We assume that the sender transmits information to the receivers in a directed fashion (information travelling only one way). A retweeted node can transmit information to the node that retweets it, and so to any receivers who are also included in the message.

Appendix B. Ensemble Study
In Figure 6 we present all of the results from the ensemble study.

3.1.2. Local Role Density.
It is of interest to compare the individual role density $p_v$ of node $v$ to the proportion of roles in a neighborhood of $v$. Let $c_v^y$ give the number of times in which a node incident to $v$, other than $v$ itself, plays role $y$. We then have

$$c_v^y = \sum_{x \in \mathcal{X}} \sum_{e \in E} \mathbb{I}(\ell(v,e) = x) \sum_{u \in e,\, u \neq v} \mathbb{I}(\ell(u,e) = y) = \sum_{e : v \in e} k_e^y - d_v^y\,.$$
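The count $c_v^y$ above can be computed directly from an edge list; the double sum simply tallies, over every edge containing $v$, the roles played by the other members of that edge. A minimal sketch (the `node -> role` dictionary representation of edges is our assumption):

```python
from collections import Counter

def local_role_counts(edges, v):
    """c_v^y: number of times a node sharing an edge with v plays role y.

    edges: list of dicts mapping node -> role label.
    Equivalent to sum over edges e containing v of (k_e^y) minus v's own
    contributions d_v^y, here tallied by skipping v directly.
    """
    c = Counter()
    for e in edges:
        if v in e:
            for u, y in e.items():
                if u != v:
                    c[y] += 1
    return dict(c)
```

Normalising the counts by their total recovers the local role density that Figure 1 compares against the individual role density.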
The probability that $e$ contains $u$ in role $x$ is $\frac{k_e^x d_u^x}{\langle e, d^x \rangle}$. Similarly, the probability that $e$ contains $v$ in role $y$ is $\frac{k_e^y d_v^y}{\langle e, d^y \rangle}$. Summing over edges gives our approximation for $\mathbb{E}_\mu[M^{xy}_{uv}]$:

$$\mathbb{E}_\mu[M^{xy}_{uv}] \approx \sum_{e \in E} \frac{k_e^x d_u^x\, k_e^y d_v^y}{\langle e, d^x \rangle \langle e, d^y \rangle}\,.$$

Figure 1. Role distributions in the Enron data set. (A) The individual role densities (orange) and local role densities (blue). (B) The individual role densities (orange) and local role densities (blue) of the four Enron CEOs, now with lines connecting the statistics for each node. (C) The local role densities (black) compared with a single sample from the role-preserving ensemble (blue) and non-role-preserving ensemble (orange).

Figure 2. Hypothesis testing for significant role-dependent assortativity as measured by Equation (6). The empirical values (black bars) are shown alongside labeled (orange) and unlabeled (green) configuration null distributions for each combination of roles.

Figure 3. Sensitivity of eigenvector and PageRank centrality measures to role-interaction kernels. The horizontal axis gives centrality scores under the projection $R^T$, while the vertical axis gives those under $R$. The PageRank teleportation parameter is α = 0.15.

Figure 4. Example partition of the Enron data set via approximate spectral maximization of Equation (8). (Left): The adjacency matrix, arranged in order of the partition. Colors are shown on a log scale. (Right): Visualization of the network. Directed edges have been made undirected. The darkness of each edge corresponds to its weight. The color of each node corresponds to its listed job title. In this experiment, k = 4, and the kernel used is the information-flow kernel R of Equation (9). The modularity of the shown partition is 0.506.

Table 1. Descriptive statistics for each data set. The fourth and fifth columns give the mean node degree and mean edge dimension, both ignoring role labels.

Q = 0.524. The modularity scores of the two kernel weighting methods should not be compared directly, since they are computed on different modularity matrices.