Our proposed Co-MLHAN is a self-supervised graph representation learning approach conceived for multilayer heterogeneous attributed networks. As previously discussed, a key novelty of Co-MLHAN is its higher expressiveness w.r.t. existing methods, since heterogeneity is assumed to hold at both node and edge levels, possibly for each layer of the network. This capability of handling graphs that are multilayer, heterogeneous, and attributed simultaneously, enables Co-MLHAN to better model complex real-world scenarios, thus incorporating most information when generating node embeddings.
In the following, we first provide a formal definition of multilayer heterogeneous attributed graph and representation learning in such networks, then we move to a detailed description of Co-MLHAN. The notations used in this work are summarized in Table 9, Appendix 1.
Preliminary definitions
A multilayer graph is a set of interrelated graphs, each corresponding to one layer, with a node mapping function between any (selected) pair of layers to indicate which nodes in one graph correspond to nodes in the other one. We assume that each layer can be heterogeneous, i.e., is characterized by nodes of different types and/or edges of different types, such that any node can be linked to nodes of the same type as well as to nodes of different types, through the same or different relations, and is attributed, i.e., has nodes associated with external information, available as set of attributes. Therefore, each layer graph has its internal set of edges, dubbed intra-layer or within-layer edges, as well as a set of edges connecting its nodes to nodes of another layer, dubbed inter-layer or across-layer edges. Layers can be seen as different interaction contexts, semantic aspects, or time steps, while the participation of an entity to a layer can be seen as a particular entity instance. Instances of the same entity are connected via pillar-edges. We hereinafter refer to entity instances as nodes in the multilayer network. Figure 1 illustrates an example of a multilayer heterogeneous attributed graph.
Multilayer heterogeneous attributed graph.
We define a multilayer heterogeneous attributed graph as \(G_{{\mathcal{L}}}=\langle {\mathcal{L}}, {\mathcal{V}}, V_{{\mathcal{L}}}, E_{{\mathcal{L}}}, A, R, \phi , \varphi , {\varvec{\mathcal{X}}}_{{\mathcal{L}}}\rangle\), where \({\mathcal{L}} = \{G_{1}, \cdots , G_{\ell } \}\) is the set of layer graphs, indexed in \(L = \{1,\dots , \ell \}\), with \(|{\mathcal{L}}| = \ell \ge 2\), \({\mathcal{V}}\) is the set of entities, \(V_{\mathcal{L}} \subseteq {\mathcal{V}} \times {\mathcal{L}}\) is the set of nodes, \(E_{{\mathcal{L}}}\) is the set of edges, including both intra- and inter-layer edges, A is the set of entity, resp. node, types, R is the set of relation types, \(\phi :{\mathcal{V}}\rightarrow A\) is the entity-type mapping function, \(\varphi :E_{{\mathcal{L}}} \rightarrow R\) is the edge-type mapping function, and \({\varvec{\mathcal{X}}}_{{\mathcal{L}}}\) is a set of matrices storing attributes, or initial features, with \({\varvec{\mathcal{X}}}_{{\mathcal{L}}} = \bigcup _{l=1\ldots \ell } {\varvec{\mathcal{X}}}_{l}\). More specifically, entities, resp. nodes, of each type are assumed to be associated with features stored in layer-specific matrices \({\varvec{\mathcal{X}}}_{l} = \{ {\textbf{X}}^{(a)}_{l} \}\), where each \({\textbf{X}}^{(a)}_{l}\) is the feature matrix associated with entities, resp. nodes, of type \({a} \in A\) in the l-th layer. Throughout this work we use symbol \({\textbf{x}}^{(a)}_{{\langle i,l \rangle }}\) to denote the feature vector of entity \(v_{i}\) of type a in layer \(G_{l}\). We also admit that features can be layer-independent, in which case we indicate with \({\textbf{x}}^{(a)}_{i}\) the feature vector associated with entity \(v_{i}\) of type a in each layer, i.e., \({\textbf{x}}^{(a)}_{\langle i,l \rangle } = {\textbf{x}}^{(a)}_{i}\) for each \(G_{l} \in {\mathcal{L}}\).
We specify that each entity has instances (i.e., nodes) in one or more layers, and appears at least in one layer, i.e., \({\mathcal{V}} = \bigcup _{l=1\ldots \ell } {\mathcal{V}}_{l}\), with \({\mathcal{V}}_{l}\) set of entities appearing in the l-th layer. Likewise, \(A=\bigcup _{l=1\ldots \ell } A_{l}\), with \(A_{l}\) denoting the set of node types of the l-th layer, \(R=\bigcup _{l=1\ldots \ell } R_{l}\), with \(R_{l}\) denoting the set of edge types of the l-th layer, and \(E_{{\mathcal{L}}} = \bigcup _{r \in R}E_{r}\) \(\subseteq V_{{\mathcal{L}}} \times V_{{\mathcal{L}}}\), with \(E_{r}\) indicating all the edges of type r.
Moreover, \(E_{{\mathcal{L}}}\) can be partitioned into two sets denoting the intra-layer edges and inter-layer edges. Note that inter-layer edges represent coupling structure of layers; in our setting, we assume that different coupling constraints between layers might hold, e.g., layers could be coupled with each other, only adjacent layers could be coupled, layers could follow a temporal relation order, etc. We define the set of layer pairing indices as \(L_{cross}\), where each \(\pi =(l,l') \in L_{cross}\) is a pair of coupled layers denoting an interaction between layer \(G_{l}\) and \(G_{l'}\).
We stress that in contrast to other approaches, such as (Yang et al. 2021), in our formulation each layer \(G_{l}\) (\(l=1\ldots \ell\)) is a heterogeneous graph at both node and edge levels, i.e., \(|A_{l}|>1\) and \(|R_{l}| > 1\). Moreover, \(A_{l} \subseteq A\), for all \(G_{l} \in {\mathcal{L}}\), and \(R_{l} \subset R\), since inter-layer connections are regarded as different types of edges.
Multilayer heterogeneous attributed graph embedding.
Given a multilayer heterogeneous attributed network \(G_{{\mathcal{L}}}\), our goal is to learn an embedding function at entity level \(g : {\mathcal{V}} \rightarrow {\mathbb{R}}^{d}\), where d is the dimension of the latent space, and \(d \ll |{\mathcal{V}}|\). Function g can be derived from an analogous function \(g' : V_{{\mathcal{L}}}\rightarrow {\mathbb{R}}^{d}\), where d is the dimension of the latent space, and \(d \ll |V_{{\mathcal{L}}}|\), being the embedding function at node level. The mapping g, resp. \(g'\), defines the latent representation of each entity \(v_{i} \in {\mathcal{V}}\), resp. node \(\langle i,l \rangle \in V_{{\mathcal{L}}}\), and we use symbol \({\textbf{z}}_{i}\), resp. \({\textbf{z}}_{\langle i,l \rangle }\), to denote its learned embedding. The learned embeddings are eventually used to support multiple downstream graph mining tasks, e.g., entity/node classification, link prediction, node regression, etc.
Co-MLHAN: contrastive learning framework for multilayer heterogeneous attributed networks
We aim to learn node embeddings in an unsupervised manner, with function g employing graph neural networks and attention mechanisms in order to encode both structural and semantic, heterogeneous and multilayer information in the context of a multi-view contrastive mechanism.
Our proposed approach is based on the infomax principle of maximizing mutual information (Linsker 1988), both in terms of graph structure encoding—complying with the distinction between local and high-order information—and across-layer information—complying with the distinction between inter-layer edges connecting direct neighbors and pillar-edges connecting different instances of the same entity. According to this principle, we define two different structural views on the original graph: the one is designed to encode the local structure of nodes and handle heterogeneity, capturing useful information from one-hop neighbors of different types (possibly from different layers), and the other one is designed to encode the global structure of nodes and model information from distant nodes in the network, thus capturing useful information from multi-hop neighbors of the same type. Note that we include pillar edges in the global view, since they are particular connections matching two instances of the same entity, thus enabling across-layer transitions, but they do not represent edges between two direct neighbors.
It should be emphasized that Co-MLHAN is conceived to be general and flexible, so as to exploit all available information but also being effective even when such information is lacking. For instance, across-layer relations could be limited to few replicas, nodes may show high variability in the number of neighbors, or one or more types of neighbors could be missing for some nodes.
Figure 2 shows a conceptual overview of our proposed framework. Accordingly, the final embedding for each target entity is learned through three main stages:
-
1.
Content encoding. Since the initial feature vectors of nodes/entities (\({\textbf{x}}\)) might be of different sizes, the first stage requires to transform such initial features into a shared low-dimensional latent space (\({\textbf{h}}\)). Moreover, this stage is also concerned with the content encoding “from scratch”, i.e., generating initial embeddings from raw data associated with nodes/entities, which might be from possibly multiple and heterogeneous contents, such as categorical or numerical attributes, unstructured text and multimedia content
-
2.
Graph structure encoding. According to the multi-view learning paradigm, the second stage requires to generate two distinct embeddings for each entity, reflecting the graph structure and maximizing the mutual information: (1) embeddings for the local structure (\({\textbf{z}}\)ns), including information from all direct neighbors of the nodes being instances of the target entity, and (2) embeddings for the high-order structure (\({\textbf{z}}\)mp), including information from pillar-edges and from target nodes that can be reached through composite relations (i.e., meta-paths).
-
3.
Final embedding based on contrastive learning. The third stage requires a joint optimization between the embeddings learned under the two views to generate the final entity embedding (\({\textbf{z}}\)). The contrastive learning mechanism is enforced by choosing suitable positive and negative samples from the original graph.
In the following sections, we elaborate on the graph structure encoding (stage 2) and the generation of the final embedding based on contrastive learning (stage 3). We examine their computational complexity aspects in Appendix 4. For the sake of readability, note also that, since the first stage of content encoding is actually beyond the objectives of this work, we discuss it in Appendix 3.
Graph structure encoding
The second stage models two graph views, named network schema view and meta-path view, able to encode the local and global structure surrounding nodes, respectively, while exploiting multilayer information.
The network schema of a heterogeneous graph is an abstraction of the original graph showing the different node types and their direct connections. It is often referred to as meta template, since it captures node and edge type mapping functions. Formally, a network schema is a directed graph defined over node types A, with edges as relation types from R. In a multilayer heterogeneous network \(G_{{\mathcal{L}}}\), the network schema includes all types A for individual layers and relations R, including both intra- and inter-layer edges. More specifically, we consider all relations involving any node \(\langle i,l \rangle\) of target type, denoted as \(R_{\langle i,l \rangle } \subseteq R\), and all node types a connected to the target node through a relation \(r \in R_{\langle i,l \rangle }\). Hereinafter, we refer to this graph as network schema graph. Figure 3 shows an example of network schema graph for the multilayer heterogeneous attributed network of Fig. 1.
A meta-path is a sequence of connected nodes making two distant nodes in the network reachable, i.e., the terminal or endpoint nodes of a meta-path instance. Formally, a meta-path \(M_{m}\) is a path defined on the network schema graph, in the form \({a_{1}} \xrightarrow {{r_{1}}} {a_{2}} \xrightarrow {{r_{2}}} \cdots \xrightarrow {{r_{k}}} {a_{k+1}}\), describing a composite relation \({r_{1}} \circ {r_{2}} \circ \dots \circ {r_{k}}\) between node types \({a_{1}}\) and \({a_{k+1}}\). A meta-path instance of \(M_{m}\) is a sampling under the guidance of \(M_{m}\) providing a sequence of connected nodes with edges matching the composite relation in \(M_{m}\). Examples of within layer meta-path instances are depicted in Fig. 4a and b. Given a multilayer heterogeneous graph \(G_{{\mathcal{L}}}\) and a meta-path \(M_{m}\), let \(N_{m}(i,l)\) denote the meta-path based neighbors of node \(\langle i,l \rangle\) of a certain type a, defined as the set of nodes of type \(a'\) that are connected with node \(\langle i,l \rangle\) through at least one meta-path instance of \(M_{m}\) having a as starting node-type and \(a'\) as ending node-type. Note that, similarly to Wang et al. (2019), the intermediate nodes along meta-paths are discarded. A meta-path based graph is a graph comprised of all the meta-path based neighbors. For meta-paths with terminal nodes of the same type, the resulting graph is homogeneous at node level. Figure 4c shows an example of single-layer meta-path based graph according to a specific meta-path type.
Following Wang et al. (2021), given a target entity, the network schema view is used to capture the local structure, by modeling information from all the direct neighbors of the corresponding target nodes, whereas the meta-path view is used to capture the global structure, by modeling information from all the nodes connected to the corresponding target nodes through a meta-path and from the pillar-edges derived by the corresponding meta-path based graph.
View embedding generation. The two views exploit features associated with different entity types; specifically, the network schema view takes advantage of features of neighbors of any type, while the meta-path view takes advantage of features of nodes of target type involved in high order relations.
We remind that Co-MLHAN produces for each target entity a distinct embedding under each view. Nonetheless, both views share two fundamental steps in the embedding generation: (1) aggregating information of different instances of the same type—i.e., instances of the same relation and instances of the same meta-path, respectively—and (2) combining information of different types—i.e., different types of relations and of meta-paths, respectively, as well as different layers.
Network schema view embedding
In the network schema view, the embedding of each target node is computed from its direct neighbors, both within and across layers. As mentioned before, the network schema is a multilayer heterogeneous graph, having nodes of different types and relations corresponding to intra- and inter-layer edges involving nodes of target type.
To generate the embeddings under the network schema view, we follow a hierarchical attention approach, consisting of two main steps, which are summarized as follows and depicted in Fig. 5:
-
(NSVE-1)
First, we aggregate information of the same type (i.e., different instances of the same relation type) via node-level attention, learning the importance of each neighbor and obtaining, for each node, an embedding w.r.t. each relation type that involves a node of target type t.
-
(NSVE-2)
Second, we combine information of different types (i.e., different relations in different layers) via type-level attention, learning the most relevant relations and obtaining an embedding for each node under the network schema view. Moreover, we combine information from different layers via across-layer attention, learning the importance of each layer and obtaining, for each entity, a single embedding under the network schema view.
Note that we refer to relation type and not to node type to be consistent in the event that target nodes are connected to a certain node type through multiple relationships. We point out that, in accordance with the infomax principle, the network schema view does not model pillar-edges, since they are processed in the other view. We also specify that intra-layer edges in different layers are seen as different types of relations, reflecting the separation into layers according to a certain aspect. In practice, layers are an additional way for distinguishing the context of relations.
Aggregating information of different instances of the same type (NSVE-1). Aggregating information of the same type (i.e., different instances of the same relation type) takes place via node-level attention. This step exploits features of nodes connected to target nodes through a direct link, whether they are of the same type as the target or not.
Given the graph \(G_{{\mathcal{L}}}\), we define a function, denoted as \(N^{(r)}(\cdot )\), that for any pair entity-layer yields its neighborhood under relation type r, regardless of the within layer or across-layer location of the neighbors. Formally, given a target node \(\langle i,l \rangle\), we define the set of its neighbors under relation \(r \in R_{\langle i,l \rangle }\) as:
$$\begin{aligned} N^{(r)}(i,l) = \{\langle j,l' \rangle \in V_{{\mathcal{L}}} {|} (\langle j,l' \rangle ,\langle i,l \rangle ) \in E_{r} \}. \end{aligned}$$
(1)
Above, note that \(N^{(r)}(i,l)\) returns within-layer or across-layer neighbors of \(\langle i,l\rangle\) under relation r, when \(l=l'\) or \(l\ne l'\), respectively. (Recall that pillar edges are excluded from the definition of neighbor sets). Moreover, to ensure the aggregation of the same amount of information, we sample a fixed size of neighbors to be processed at each epoch by setting a threshold value for each type of neighbor (cf. “Experimental settings” section). In our setting, neighbor sampling can be done with and without replacement. Note that this neighbor sampling approach allows for saving computational resources in case of huge networks.
We thus define the embedding of entity \(v_{i}\) in layer \(G_{l}\) based on neighbors under relation r as:
$$\begin{aligned} {\textbf{z}}^{N^{(r)}}_{\langle i,l \rangle } = \sigma \left( \sum _{\langle j,l' \rangle \in N^{(r)}( i,l )} \alpha ^{(r)}_{\langle i,l \rangle ,\langle j,l' \rangle } {\textbf{W}}_{\textbf{2}}^{(r)} {\textbf{h}}_{\langle j,l' \rangle }\right) , \end{aligned}$$
(2)
where \({\textbf{z}}^{N^{(r)}}_{\langle i,l \rangle }\) is the embedding of node \(\langle i,l \rangle\) obtained from neighborhood under relation r, \(\sigma (\cdot )\) is the activation function (default is ELU), \({\textbf{W}}_{{\textbf{2}}}^{(r)}\) is the weight matrix of shape (d, d) associated with one-hop neighbors \(\langle j,l'\rangle\), \({\textbf{h}}_{\langle j,l' \rangle }\) is the feature embedding of node \(\langle j,l' \rangle\) and \(\alpha ^{(r)}_{\langle i,l \rangle ,\langle j,l' \rangle }\) is the normalized attention coefficient for the relation r connecting \(\langle i,l \rangle\) and \(\langle j,l' \rangle\) and indicating the importance for \(\langle i,l \rangle\) of information coming from \(\langle j,l'\rangle\), as defined in Eq. 3:
$$\begin{aligned} \begin{aligned} \alpha ^{(r)}_{\langle i,l \rangle ,\langle j,l' \rangle }&= \frac{ \exp \left( e^{(r)}_{\langle i,l\rangle ,\langle j,l'\rangle }\right) }{\sum _{\langle u,l'\rangle \in N^{(r)}(i,l) } \exp \left( e^{(r)}_{\langle i,l\rangle ,\langle u,l'\rangle }\right) }, \\ \\ {\text{with}} \quad e^{(r)}_{\langle i,l\rangle ,\langle j,l'\rangle }&= {\textbf{a}}^{{(r)}^{\mathrm{T}}}\left( LeakyReLU\left( {\textbf{W}}^{(r)}\left[ {\textbf{h}}_{\langle i,l \rangle } \| {\textbf{h}}_{\langle j,l' \rangle }\right] \right) \right) , \end{aligned} \end{aligned}$$
(3)
where \({\textbf{a}}^{(r)} \in {\mathbb{R}}^{d}\) is the learnable weight vector under relation r, \([{\textbf{h}}_{\langle i,l \rangle } \| {\textbf{h}}_{\langle j,l' \rangle }] \in {\mathbb{R}}^{2d}\) is the row-wise concatenation of the column vectors associated with the two node embeddings, \({\textbf{W}}^{(r)} = [{\textbf{W}}_{{\textbf{1}}}^{(r)}\|{\textbf{W}}_{{\textbf{2}}}^{(r)}] \in {\mathbb{R}}^{d \times 2d}\) is the column-wise concatenation of \({\textbf{W}}_{{\textbf{1}}}^{(r)}\) and \({\textbf{W}}_{{\textbf{2}}}^{(r)}\), both of shape (d, d) and containing the left and right half of the columns of \({\textbf{W}}^{(r)}\), associated with destination and source nodes (one-hop neighbors), respectively.Footnote 1
In Eq. 3, we adopt the same approach as in GATv2 (Brody et al. 2021), which aims to fix the static attention problem of standard Graph Attention Network (GAT) (Velickovic et al. 2018) that limits its expressive power, since the ranking of attended nodes is unconditioned on the query node; on the contrary, GATv2 is a dynamic graph attention variant where the order of internal operations of the scoring function is modified to apply an MLP for computing the score of each attended node.
The self-attention mechanism can be extended similarly to Vaswani et al. (2017) by employing multi-head attention, in order to stabilize the learning process. In this case, operations are independently replicated Q times, with different parameters, and outputs are feature-wise aggregated through an operator denoted with symbol \(\bigoplus\), which usually corresponds to average (default) or concatenation:
$$\begin{aligned} {\textbf{z}}^{N^{(r)}}_{\langle i,l \rangle } = \sigma \left( \underset{q=1\dots Q}{\bigoplus } \left( \sum _{\langle j,l' \rangle \in N^{(r)}(i,l) } \alpha ^{(r,q)}_{\langle i,l \rangle ,\langle j,l' \rangle } {\textbf{W}}^{(r,q)} {\textbf{h}}_{\langle j,l' \rangle }\right) \right) , \end{aligned}$$
(4)
where \({\textbf{W}}^{(r,q)}\) and \(\alpha ^{(r,q)}_{\langle i,l \rangle ,\langle j,l' \rangle }\) denote the weight matrix and the attention coefficient for the q-th attention head under relation r, respectively.
Let \({\textbf{z}}_{\langle i,l \rangle }^{N^{(r)}}\) be the embedding of a target node \(\langle i,l \rangle\) obtained from its neighbors in each layer under relation r. Downstream of node-level attention, we thus obtain \(\bigcup \nolimits _{r \in R_{\langle i,l \rangle }} \{ {\textbf{z}}_{\langle i,l \rangle }^{N^{(r)}} \}\) embeddings.
Combining information of different types and layers (NSVE-2). In order to combine information of different node types according to the different relations with target nodes, we employ type-level attention for each layer separately. For each target node \(\langle i,l \rangle\), we obtain the embedding under the network schema view \({\textbf{z}}^{{{{\mathrm{NS}}}}}_{\langle i,l \rangle }\), as defined in Eq. 5:
$$\begin{aligned} {\textbf{z}}^{{{{\mathrm{NS}}}}}_{\langle i,l \rangle } = \sum _{r \in R_{\langle i, l \rangle }} \beta ^{(r)} {\textbf{z}}^{N^{(r)}}_{\langle i,l \rangle }, \end{aligned}$$
(5)
where \(\beta ^{(r)}\) is the attention coefficient for neighborhood under relation r, which is defined as follows:
$$\begin{aligned} \beta ^{(r)} = \frac{\exp {(w^{(r)})}}{\sum _{r' \in R_{\langle i, l \rangle }} \exp {(w^{(r')})}} \quad {\text{with}} \ \ w^{(r)}= \frac{1}{|{\mathcal{V}}_{l}^{(t)}|} \sum _{\langle i,l \rangle \in {\mathcal{V}}_{l}^{(t)}} {\textbf{a}}^{{{{{\mathrm{NS}}}}}^{\mathrm {T}}} {\text{tanh}}\left( {\textbf{W}}^{{{{{\mathrm{NS}}}}}} {\textbf{z}}^{{N}^{(r)}}_{\langle i,l \rangle } + {\textbf{b}}^{{{{{\mathrm{NS}}}}}}\right) , \end{aligned}$$
(6)
where \({\mathcal{V}}_{l}^{(t)}\) is the set of entities of target type t in layer l; \({\textbf{a}}^{{{{{\mathrm{NS}}}}}} \in {\mathbb{R}}^{d}\) is the type-level attention vector; \(\textbf{W}^{{{{\mathrm{NS}}}}}\) and \({\textbf{b}}^{{{{\mathrm{NS}}}}}\) are the learnable weight matrix and the bias term, respectively, under the network schema view, shared by all relation types. We hence obtain the set of embeddings \(\bigcup \nolimits _{l \in L} \{ {\textbf{z}}^{{{{\mathrm{ NS}}}}}_{\langle i,l\rangle }\}\) under the network schema view for each target node.
In order to map the learned node embeddings into the same space of the contrastive loss function, we apply an additional level of attention, i.e., across-layer attention. This is designed to evaluate the importance of each layer of \(G_{{\mathcal{L}}}\) and combine layer-wise the features of nodes. We thus obtain an embedding under the network schema view for each target entity \(v_{i}\), as defined in Eq. 7:
$$\begin{aligned} {\textbf{z}}^{{{{\mathrm{NS}}}}}_{i} = \sum _{l \in L} \beta ^{(l)} {\textbf{z}}^{{{{\mathrm{ NS}}}}}_{\langle i,l \rangle }, \end{aligned}$$
(7)
where \(\beta ^{(l)}\) is the learned attention coefficient for layer \(G_{l}\), computed via the same attention model like in Eq. 6, where in this case the learnable weights are shared by all layers.
Meta-path view embedding
In the meta-path view, the embedding of each target node is computed from its meta-path based neighbors and from the pillar-edges derived by the corresponding meta-path based graph. We remind that each layer of a meta-path based graph is a homogeneous network with nodes corresponding to a subset of target nodes and edges as connections of meta-path based neighbors, including across-layer information matching pillar-edges.
We consider meta-paths of any length, starting and ending with nodes of target type; indeed, information of intermediate nodes can be discarded as it is included in the network schema view. Note that considering multiple meta-paths allow us to deal with multiple semantic spaces (Lin et al. 2021), and our framework is designed to handle an arbitrary number of meta-paths. Also, in case a layer does not contain any node of target type, the layer is discarded from the resulting multilayer graph. Yet, our framework admits the worst case of \(\ell -1\) layers missing for a meta-path type.
Analogously to the network schema view, the meta-path view embedding generation consists of two main steps (Fig. 6):
-
(MPVE-1)
First, we aggregate information of the same type, this time intended as several instances of the same meta-path and encoded via meta-path-specific Graph Convolutional Network (GCN) (Kipf and Welling 2017), obtaining, for each target node, an embedding w.r.t. each meta-path type.
-
(MPVE-2)
Second, we combine information of different types (i.e., different meta-paths in different layers) and layers (i.e., different meta-paths across layers) via semantic attention, learning the importance of each meta-path and obtaining an embedding for each target node and entity under the mata-path view.
In the following, we first describe the process of meta-path view embedding generation according to the basic Co-MLHAN approach. Next, in “Alternative meta-path view embedding: Co-MLHAN-SA” section, we shall describe an alternative strategy, called Co-MLHAN-SA, which differs from Co-MLHAN in the way across-layer information relating to pillar-edges is modeled.
Aggregating information of different instances of the same type (MPVE-1). The first step of embedding generation under the meta-path view is to aggregate information of the same type, which corresponds to several instances of a given meta-path. More specifically, we consider all p meta-paths \(\mathcal{M} = \{M_{1}, \dots , M_{p}\}\) involving nodes of target type, where each meta-path \(M_{m}\) matches a multilayer graph with at most \(\ell\) layers.
In the meta-path view, across-layer dependencies are modeled as particular types of meta-paths, i.e., across-layer meta-paths. They refer to the same composite relation, with the additional constraint that the terminal nodes belong to different layers, and that the intermediate node matches a pillar-edge, i.e., it corresponds to an entity (of type different from the target one) with both instances involved in the composite relation. An example is illustrated in Fig. 7. We define the set of across-layer meta-paths, \({\mathcal{M}}^{\Updownarrow }\), as the the union of all meta-paths of any type and defined over all layer-pairs.
To identify the meta-path based neighbors of each node, we define two functions, denoted as \(N^{\Leftrightarrow }(\cdot )\) and \(N^{\Updownarrow }(\cdot )\), which for each node return the intra-layer and inter-layer neighborhood, respectively. Formally, we define the set of within-layer neighbors of the node \(\langle i, l \rangle\), according to m-th (within-layer) meta-path type, as:
$$\begin{aligned} N_{m}^{\Leftrightarrow }(i,l)= \{\langle j,l \rangle \in V_{{\mathcal{L}}} \ | \ \langle j,l \rangle \in N_{m}(i,l)\}. \end{aligned}$$
(8)
Similarly, we define the set of across-layer neighbors of node \(\langle i, l \rangle\), according to the m-th (across-layer) meta-path type, as follows:
$$\begin{aligned} N_{m}^{\Updownarrow }(i,l) = \{\langle j,l' \rangle \in V_{{\mathcal{L}}} \ | \ \langle j,l' \rangle \in N_{m}(i,l) , l' \ne l\}. \end{aligned}$$
(9)
Note that Eqs. 8 and 9 identify the meta-path based neighborhood of type \(M_{m}\) for node \(\langle i,l \rangle\), with m referring to a within or across-layer meta-path, respectively; in particular, \(N_{m}^{\Leftrightarrow }(i,l) \equiv N_{m}(i,l)\).
Given any target node \(\langle i,l \rangle\), we apply a meta-path specific graph neural network \(f_{m}\) (with K hidden layers) in order to compute its embedding according to the m-th meta-path; formally, at each k-th layer:
$$\begin{aligned} {\textbf{z}}^{(k+1)}_{\langle i,l \rangle } = {\left\{ \begin{array}{ll} f_{m}^{ (k+1)}\left( {\textbf{z}}^{(k)}_{\langle i,l \rangle },\ \ \bigoplus \left\{ {\textbf{z}}_{\langle j,l \rangle }^{(k)} \ | \ \langle j,l \rangle \in N_{m}^{\Leftrightarrow }(i,l)\right\} \right) &{} {\text{if }} m {\text{ is a within layer meta-path}}\\ \\ f_{m}^{ (k+1)}\left( {\textbf{z}}^{(k)}_{\langle i,l \rangle },\ \ \bigoplus \left\{ {\textbf{z}}_{\langle j,l \rangle }^{(k)} \ | \ \langle j,l \rangle \in N_{m}^{\Updownarrow }(i,l)\right\} \right) &{} {\text{if }}m {\text{ is an across-layer meta-path}}\\ \end{array}\right. } \end{aligned}$$
(10)
where \({\textbf{z}}^{(0)}_{\langle i,l \rangle } = {\textbf{h}}_{\langle i,l \rangle }\) is the feature embedding computed in the first stage, and \(\bigoplus\) denotes an arbitrary differentiable function, aggregating feature information from the local neighborhood of nodes [e.g., summation, a pooling operator, or even a neural network (Wang et al. 2020)]. Similarly to Wang et al. (2021), we use a GCN architecture as \(f_{m}\), for all \(M_{m}\) \((m=1\ldots p)\) in Eq. 10, assuming no different contribution from different instances of the same meta-path.
More specifically, given the m-th within-layer meta-path and \({\mathcal{A}} = \{ {\textbf{A}}_{1}, \ldots , {\textbf{A}}_{\ell } \}\) as the set of adjacency matrices associated with the corresponding meta-path based graph, being \({\textbf{A}}_{l} \in {\mathbb{R}}^{n_{l} \times n_{l}}\) (\(l=1\ldots \ell\)) the adjacency matrix associated with layer l, the GCN for layer \(G_{l}\) is defined as follows:
$$\begin{aligned} {\textbf{z}}_{\langle i,l \rangle }^{(k+1)}= \sigma \left( \sum _{\langle j,l \rangle \in N^{\Leftrightarrow }(i,l)}\frac{1}{\sqrt{{\widetilde{\textbf{D}}}^{l}_{ii}{\widetilde{\textbf{D}}}^{l}_{jj}}} {{\textbf{W}}^{(k,l)}}^{\mathrm{T}} {\textbf{z}}_{\langle j,l \rangle }^{(k)} \right) \end{aligned}$$
(11)
where \(\sigma (\cdot )\) is a non-linear activation function (default is \(ReLU(\cdot ) = max(0,\cdot )\)), \({\textbf{W}}^{(k,l)}\) is the trainable weight matrix for the m-th meta-path in the k-th convolutional layer of shape (d, d), and \({\widetilde{\textbf{D}}}^{l}_{ii}=\sum _{j}{\widetilde{{\textbf{A}}}}^{l}_{ij}\) is the degree matrix derived from \({\widetilde{{\textbf{A}}}_{l}} = {\textbf{A}}_{l} + {\textbf{I}}_{n}\), with \({\textbf{I}}^{l}_{n}\) as the identity matrix of size \(n_{l}\), and \(n_{l}\) number of nodes of layer \(G_{l}\). The GCN model for across-layer meta-paths is built similarly, considering \(N^{\Updownarrow }(\cdot )\) instead of \(N^{\Leftrightarrow }(\cdot )\) and \(\pi\) instead of l.
Let \({\textbf{z}}^{(m)}_{\langle i, l \rangle }\) and \({\textbf{z}}^{(m)}_{\langle i,\pi \rangle }\) be the node embedding associated with the m-th within (resp. across)-layer meta-path of node \(\langle i,l \rangle\) (resp. layers-pair \(\pi\)). Downstream of meta-path specific GNNs, we obtain \(\{{\textbf{z}}^{(m)}_{\langle i,l \rangle } \ | \ l \in L, \ m=1\dots p\} \bigcup \{{\textbf{z}}^{(m)}_{\langle i,\pi \rangle } \ | \langle m,\pi \rangle \in \mathcal{M}^{\Updownarrow }\}\) node embeddings.
Combining information of different types and layers (MPVE-2). Once obtained the meta-path specific embeddings for each target node, we employ semantic-level attention for combining different meta-path types, including both intra- and inter-layer information. Given a node \(\langle i,l \rangle\), the embedding under the meta-path view is computed as follows:
$$\begin{aligned} {\textbf{z}}^{{{{\mathrm{ MP}}}}}_{\langle i, l \rangle } = \sum _{m = 1}^{p} \beta ^{(m,l)} {\textbf{z}}_{\langle i,l \rangle }^{(m)} + \lambda ^{\Updownarrow } \left( \sum _{m = 1}^{p} \sum _{\pi | l \in \pi } \beta ^{(m,\pi )} {\textbf{z}}_{\langle i,\pi \rangle }^{(m)}\right) , \end{aligned}$$
(12)
where \(\beta\) is the attention coefficient denoting the importance of each type of within layer and across-layers meta-path (cf. Eq. 6) and \(\lambda ^{\Updownarrow } \in [0\ldots 1]\) is a balancing coefficient denoting the importance of inter-layer connections.
In order to project the node embedding into the same space of the loss function—analogously to the network schema view—we aggregate the embeddings obtained from each layer with a sum operator, which is defined as follows:
$$\begin{aligned} {\textbf{z}}_{i}^{{{{\mathrm{MP}}}}} =\sum _{l \in L} {\textbf{z}}^{{{{\rm MP}}}}_{\langle i, l \rangle }. \end{aligned}$$
(13)
Note that Eq. 13 does not require an additional level of attention, since the layer dependency has already been taken into account by the attention mechanism in Eq. 12. Therefore, Eqs. 12 and 13 can be combined as follows:
$$\begin{aligned} {\textbf{z}}_{i}^{{{{\rm MP}}}} = \underbrace{\sum _{m = 1}^{p} \sum _{l \in L} \beta ^{(m,l)} {\textbf{z}}_{\langle i,l \rangle }^{(m)}}_{\text{within-layer}} + \underbrace{\lambda ^{\Updownarrow } \left( \sum _{m = 1}^{p} \sum _{\pi \in L_{cross}} \beta ^{(m,\pi )} {\textbf{z}}_{\langle i,\pi \rangle }^{(m)}\right) }_{\text{across-layers}}. \end{aligned}$$
(14)
Equation 14 hence enables the direct computation of the final embedding under the meta-path view for each entity \(v_{i}\).
Alternative meta-path view embedding: Co-MLHAN-SA
Our alternative approach for embedding generation under the meta-path view is named Co-MLHAN-SA, where the suffix ‘SA’ refers to the supra-adjacency matrix modeling each meta-path based graph. The supra-adjacency matrix, denoted as \({\textbf{A}}^{\mathrm{sup}}\), has diagonal blocks each representing a layer-specific adjacency matrix (i.e., \({\textbf{A}}_{l} \in {\mathbb{R}}^{n_{l} \times n_{l}}\), with \(l=1\ldots \ell\)), and off-diagonal blocks each corresponding to the inter-layer adjacency matrix \({\textbf{A}}_{\pi }\) for layer-pair \(\pi =(l,l')\), with values equal to 1 if an edge between \(\langle i,l \rangle\) and \(\langle j,l' \rangle\) exists, with \(l \ne l'\), and 0 otherwise.
To give an intuition, we model across-layer information downstream of semantic attention, by accounting for another level of attention, i.e., across-layer attention (by analogy with the network schema view).
We thus learn the importance of different (within layers) meta-paths via semantic attention, obtaining an embedding under the meta-path view for each node and we subsequently learn the importance of each layer via across-layer attention, obtaining an embedding under the meta-path view for each entity.
Like in the basic Co-MLHAN approach, the meta-path view embedding generation in Co-MLHAN-SA consists of two main steps (Fig. 8):
-
(MPVE-SA-1)
First, we aggregate information of the same type, intended as several instances of the same meta-path and encoded via meta-path-specific GCNs, obtaining, for each node, an embedding w.r.t. each meta-path. Unlike MPVE-1, the first step of the Co-MLHAN-SA approach hence handles the inter-layer dependencies derived from pillar-edges.
-
(MPVE-SA-2)
Second, we combine information of different types (i.e., different meta-paths in different layers) via semantic attention, learning the importance of each meta-path and obtaining an embedding for each target node under the meta-path view. Moreover, we combine information from different layers via across-layer attention, learning the importance of each layer and obtaining, for each target entity, a single embedding under the meta-path view.
By avoiding across-layer meta-paths \(\mathcal{M}^{\Updownarrow }\) definition, Co-MLHAN-SA requires a limited number of learnable parameters, as it utilizes a meta-path specific GCN shared by all layers \(G_{l}\).
Aggregating information of different instances of the same type (MPVE-SA-1). We still use the notation \(N^{\Leftrightarrow }(\cdot )\) and \(N^{\Updownarrow }(\cdot )\) to indicate the set of within-layer and across-layer neighbors, respectively. While the definition of \(N^{\Leftrightarrow }(\cdot )\) does not change w.r.t. Eq. 8, the definition of \(N^{\Updownarrow }(\cdot )\) of the Co-MLHAN-SA approach is modified in the modeling of pillar-edges, by directly considering all the instances of the same target entities in other layers, as shown in Eq. 15:
$$\begin{aligned} N_{m}^{\Updownarrow }(i,l) = \{\langle i,l' \rangle \in V_{{\mathcal{L}}} \ | \ l' \ne l\}. \end{aligned}$$
(15)
Similarly to MPVE-1, we apply a meta-path specific GNN for aggregating different meta-path instances of the same type:
$$\begin{aligned} {\textbf{z}}^{(k+1)}_{\langle i,l \rangle } = f_{m}^{ (k+1)}\left( {\textbf{z}}^{(k)}_{\langle i,l \rangle }, \bigoplus ^{k} \left( \left\{ {\textbf{h}}_{\langle j,l \rangle }^{(k)} \ | \ \langle j,l \rangle \in N^{\Leftrightarrow }(i,l) \cup N^{\Updownarrow }(i,l) \right\} \right) \right) . \end{aligned}$$
(16)
Unlike MPVE-1, the inter-layer dependencies are taken into account by the GNN, employing a modified version of the propagation rule that can handle the supra-adjacency matrix as input. We thus build for each meta-path its corresponding meta-path based supra-graph, i.e., a graph where pillar edges exist between every node and its counterpart in other coupled layers. In our setting, we instantiate \(f_{m}\) with a multi-layer GCN model (Zangari et al. 2021), as shown in Eq. 17:
$$\begin{aligned} {\textbf{z}}_{\langle i,l \rangle }^{(k+1)} = \sigma \left( \sum _{\langle j,l' \rangle \in N^{\Leftrightarrow }(i,l) \cup N^{\Updownarrow }(i,l) }\frac{1}{\sqrt{{\widetilde{\textbf{D}}}_{ii}{\widetilde{\textbf{D}}}_{jj}}} {{\textbf{W}}^{(k,m)}}^{\mathrm {T}} \delta (l,l') \ {\textbf{z}}_{(j,l')}^{(k)} \right) , \end{aligned}$$
(17)
where the degree matrix \({\widetilde{\textbf{D}}}\) is built considering both inter-layer and intra-layer links of nodes using the supra-adjacency matrix of the graph, \({\widetilde{\textbf{D}}}_{ii}=\sum _{j=1}{\widetilde{{\textbf{A}}}}^{\mathrm{sup}}_{ij}\), where \({\widetilde{{\textbf{A}}}}^{\mathrm{sup}}\) is the supra-adjacency matrix with self-loops added, \(\delta (l,l')\) is a scoring function denoting the weight coefficient for inter-layer links, ranging between 0 and 1, with values equal to \(\lambda ^{\Updownarrow }\) if \(\ l \ne l'\), and 1 otherwise.
Let \({\textbf{z}}^{m}_{\langle i,l\rangle }\) be the embedding of node \(\langle i,l \rangle\) associated with the m-th metapath. We thus obtain \(\bigcup \nolimits _{\begin{array}{c} m=1 \dots p \\ l \in L \end{array}} \{{\textbf{z}}^{(m)}_{\langle i,l\rangle }\}\) meta-path specific embeddings.
Combining information of different types and layers (MPVE-SA-2). Once obtained the meta-path specific embeddings for each target node, we employ semantic-level attention for combining different meta-path types, obtaining for each node \(\langle i, l \rangle\) an embedding under the meta-path view, which is defined as follows:
$$\begin{aligned} {\textbf{z}}^{{{{\rm MP}}}}_{\langle i,l \rangle } = \sum _{m =1 \dots p} \beta ^{(m,l)} {\textbf{z}}_{\langle i,l \rangle }^{(m)}, \end{aligned}$$
(18)
where \(\beta ^{(m,l)}\) is an attention coefficient computed as in Eq. 6.
In order to project the node embedding into the same space of the loss function, we apply an additional level of attention, named across-layer attention, similarly to network schema view, thus obtaining for each entity \(v_{i}\) an embedding under the meta-path view:
$$\begin{aligned} {\textbf{z}}_{i}^{{{{\rm MP}}}} = \sum _{l \in L} \beta ^{(l)}{\textbf{z}}^{{{{\rm MP}}}}_{\langle i,l \rangle }, \end{aligned}$$
(19)
where \(\beta ^{(l)}\) is the attention coefficient denoting the importance of the l-th layer, computed similarly to Eq. 6.
Final embedding based on Contrastive Learning
The third stage of the proposed framework is concerned with the exploitation of a contrastive learning mechanism to produce the final entity embeddings, pulling together similar entities and pushing apart dissimilar ones in the embedding space. We combine the contrastive losses computed according to each view, with individual nodes of both positive and negative pairs selected from distinct views.
Given the embeddings \({\textbf{z}}_{i}^{{{{\rm NS}}}}\) (Eq. 7) and \({\textbf{z}}_{i}^{{{{\rm MP}}}}\) (either Eq. 14 or Eq. 19) for each target entity \(v_{i}\), we transform them into the same space in which a contrastive loss function is computed, by employing a simple MLP architecture with one hidden layer, as defined in Eq. 20:
$$\begin{aligned} \begin{aligned} \hat{{\textbf{z}}}_{i}^{{{{\rm NS}}}}&= {\textbf{W}}^{(2)} \sigma ({\textbf{W}}^{(1)}{\textbf{z}}_{i}^{{{{\rm NS}}}} + {\textbf{b}}^{(1)}) + {\textbf{b}}^{(2)}, \\ \hat{{\textbf{z}}}_{i}^{{{{\rm MP}}}}&= {\textbf{W}}^{(2)} \sigma ({\textbf{W}}^{(1)}{\textbf{z}}_{i}^{{{{\rm MP}}}} + {\textbf{b}}^{(1)}) + {\textbf{b}}^{(2)}, \end{aligned} \end{aligned}$$
(20)
where \({\textbf{W}}^{(2)}\), \({\textbf{W}}^{(1)}\), \({\textbf{b}}^{(2)}\) and \({\textbf{b}}^{(1)}\) are learnable weights shared by both views and \(\sigma (\cdot )\) is the activation function (default is ELU).
The contrastive loss according to a certain view is computed on pairs of positive and negative samples. While earlier contrastive learning approaches were based on one or more negatives and a single positive for each instance, we follow the more recent trend of using both multiple positive and negative pairs (Khosla et al. 2020; Wang et al. 2021). Each target entity \(v_{i}\) can hence rely on more than one positive (at least itself, under the other view). For positive sampling, the idea is to select the best nodes connected by multiple meta-path instances, since meta-path based neighbors have higher probability of being similar to each other. For negative sampling, we simply choose considering everything that is not positive.
We first proceed to the selection of positive samples. For this purpose, we count the meta-paths instances connecting each pair of target entities, considering all meta-path types on individual layers, as shown in Eq. 21:
$$\begin{aligned} C_{i,j} = \sum _{l \in L} \sum _{m = 1 \dots p} |\{j \ | \ \langle j,l \rangle \in N_{m}^{\Leftrightarrow }(i,l)\} |. \end{aligned}$$
(21)
For each target entity \(v_{i}\), we obtain a set \({\mathcal{S}}_{i}= \{v_{j} \in {\mathcal{V}} \ | \ C_{i,j}>0 \}\) which is sorted by decreasing values of \(C_{i,j}\). Given a threshold \(T_{pos}\), we select for each entity itself and the best \(T_{pos}-1\) entities as positives, obtaining a subset \(\overline{{\mathcal{S}}}_{i} \subseteq {\mathcal{S}}_{i}\) with \(|\overline{{\mathcal{S}}}_{i}| \le T_{pos}-1\); all the remaining \(|{\mathcal{V}}|-T_{pos}\) entities are regarded as negatives for \(v_{i}\). Therefore, for each entity \(v_{i}\), we define the set of positive samples \({\mathcal{P}}_{i}\) as \({\mathcal{P}}_{i}=v_{i} \cup \{ v_{j} | v_{j} \in \overline{{\mathcal{S}}}_{i} \}\) and the set of negative samples \({\mathcal{N}}_{i}\) as \({\mathcal{N}}_{i}= {\mathcal{V}} {\setminus } {\mathcal{P}}_{i}\).
We stress that for the selection of positives we only exploit structural information, without using any information derived from the encoding of external content (i.e., initial features) of entities. Nonetheless, additional conditions on meta-paths in the selection of entity pairs can be defined, e.g., by diversifying the minimum number of instances required to enable the enumeration of a specific meta-path. Co-MLHAN is flexible in both the meta-path counting method and the overall positive and negative selection strategy.
For the computation of contrastive losses according to a given view, the embedding of each target entity \(v_{i}\) is selected from the given view, while the positive and negative samples are selected from the other view, as defined in Eqs. 22 and 23, and illustrated in Fig. 9:
$$\begin{aligned} L^{{{{\rm NS}}}}= & {} -\log \frac{\sum _{j \in {\mathcal{P}}_{i}} \exp \left( sim \left( \hat{{\textbf{z}}}^{{{{\rm NS}}}}_{i},\hat{{\textbf{z}}}^{{{{\rm MP}}}}_{j}\right) /\tau \right) }{\sum _{u \in {{\mathcal{P}}_{i} \cup {\mathcal{N}}_{i}}} \exp \left( sim \left( \hat{{\textbf{z}}}^{{{{\rm NS}}}}_{i},\hat{{\textbf{z}}}^{{{{\rm MP}}}}_{u}\right) /\tau \right) }, \end{aligned}$$
(22)
$$\begin{aligned} L^{{{{\rm MP}}}}= & {} -\log \frac{\sum _{j \in {\mathcal{P}}_{i}} \exp \left( sim \left( \hat{{\textbf{z}}}^{{{{\rm MP}}}}_{i},\hat{{\textbf{z}}}^{{{{\rm NS}}}}_{j}\right) /\tau \right) }{\sum _{u \in {{\mathcal{P}}_{i} \cup {\mathcal{N}}_{i}}} \exp \left( sim \left( \hat{{\textbf{z}}}^{{{{\rm MP}}}}_{i},\hat{{\textbf{z}}}^{{{{\rm NS}}}}_{u}\right) /\tau \right) }, \end{aligned}$$
(23)
where \(sim(\textbf{v}_{1},\textbf{v}_{2})\) denotes the cosine similarity between two vectors \(\textbf{v}_{1}\) and \(\textbf{v}_{2}\), and \(\tau\) is the temperature parameter, which indicates how concentrated the embeddings are in the representation space, so that a lower temperature leads the loss to be dominated by smaller distances and widely separated representations contribute less. Note that Eqs. 22–23 are independent from the specific strategy of positive and negative selection; we leave the investigation of alternative sampling methods as future work (“Conclusions” section).
The final contrastive loss is computed as a convex combination of the two contrastive losses to balance the effects of the two views:
$$\begin{aligned} L_{co} = \lambda L^{{{{\rm NS}}}} + (1-\lambda ) L^{{{{\rm MP}}}} \end{aligned}$$
(24)
with \(0< \lambda < 1\). The loss function is completely specified depending on whether an unsupervised or semi-supervised paradigm is adopted. The extension to the (semi-)supervised case can be done by adding a new term to the final loss, as shown in Eq. 25:
$$\begin{aligned} L_{tot} = \eta L_{co} + L_{sup} \end{aligned}$$
(25)
where \(L_{sup}\) is the (semi-)supervised term, e.g., cross-entropy for classification tasks, jointly optimized with the contrastive term in a end-to-end fashion, and the coefficient \(\eta\), \(0 \le \eta \le 1\), is given to the contrastive term, since in a (semi-)supervised setting the (semi-)supervised term is expected to be more relevant.
Similarly to Chen et al. (2020), once the training procedure is completed, the optimized \({\textbf{z}}_{i}^{{{{\rm MP}}}}\) or \({\textbf{z}}_{i}^{{{{\rm NS}}}}\) will eventually be used for downstream tasks. Particularly, our default choice is to select the embeddings under the meta-path view, since meta-paths represent high-order relations between target nodes and pillar edges capture the information of instances of the same entity, exploiting multilayer dependencies. It should however be noted that the similarity between the two learned embeddings, for any entity, is expected to be high, since, according to our positive selection strategy, each entity \(v_{i}\) includes itself under the other view in its set of positive samples \({\mathcal{P}}_{i}\). Nonetheless, in “Experimental settings” section, we shall provide empirical evidence of such embedding similarities. The final learned embeddings optimized via such cross-view contrastive loss can be used for a wide range of analysis tasks—at node, entity, or edge level—such as node/entity classification, graph clustering, link prediction.