GraphKKE: Graph Kernel Koopman Embedding for Human Microbiome Analysis

More and more diseases have been found to be strongly correlated with disturbances in the microbiome constitution, e.g., obesity, diabetes, or some cancer types. Thanks to modern high-throughput omics technologies, it becomes possible to directly analyze human microbiome and its influence on the health status. Microbial communities are monitored over long periods of time and the associations between their members are explored. These relationships can be described by a time-evolving graph. In order to understand responses of the microbial community members to a distinct range of perturbations such as antibiotics exposure or diseases and general dynamical properties, the time-evolving graph of the human microbial communities has to be analyzed. This becomes especially challenging due to dozens of complex interactions among microbes and metastable dynamics. The key to solving this problem is the representation of the time-evolving graphs as fixed-length feature vectors preserving the original dynamics. We propose a method for learning the embedding of the time-evolving graph that is based on the spectral analysis of transfer operators and graph kernels. We demonstrate that our method can capture temporary changes in the time-evolving graph on both created synthetic data and real-world data. Our experiments demonstrate the efficacy of the method. Furthermore, we show that our method can be applied to human microbiome data to study dynamic processes.


Introduction
Only about 1 out of 10 cells in our body is actually a human cell. We are colonized by a diverse community of bacteria, archaea, and viruses, jointly referred to as the microbiome. About 1.5 kg of microbes live almost everywhere on and in the human body as symbionts, e.g., on the skin, in the mouth, or in the gut. They have a strong influence on both their hosts and environments. For example, more and more diseases have been found to be strongly correlated with the disturbances in the microbiome constitution, e.g., obesity, diabetes, or some cancer types. Furthermore, recent studies have revealed that gut microbiome also has a huge impact on brain functions and is related to disorders such as Alzheimer's disease (Xu and Wang [2016]). Most studies aiming at understanding the differences in the microbiome profiles of healthy and ill individuals, however, are focused on statistical constitution analysis, omitting the large variety of complex microbe-microbe and host-microbe interactions, which can be modeled as time-evolving graphs. Red color of vertices means that the concentrations of microbes decreased after a person became ill at time t+τ +1. This reduction results in the change of the topology of the time-evolving graph characterized by removing the edges between vertices in red and some vertices in yellow.
It has also been found that although the constitution of the microbiome is constantly changing throughout our lives (in response to environmental factors), a healthy human microbiome can be considered as a metastable state lying in a minimum of some ecological stability landscape (Shaw et al. [2019]). Broadly speaking, metastability can be observed when for short timescales, the system appears to be equilibrated, but at larger time scales, under-goes some transitions from one metastable state to other metastable states (Bovier and den Hollander [2006]). This phenomenon occurs in dynamical systems of various structures, including systems with vector-valued states, but also systems represented as time-evolving graphs.
As an illustration of a time-evolving graph that lies in an energy landscape with two metastable states, consider the time-evolving microbiome interaction graph shown in Figure 1, where vertices represent the concentrations of bacteria species and edges pairwise associations between them. In this example, a disease can be thought of as a perturbation that displaces the microbiome composition from its equilibrium (healthy) state. This displacement is characterized by removing the edges between vertices in red due to the reduction of the concentration of one of vertices. Given an evolution of the graphs (in this example, the evolution of the microbe interactions), we aim at analyzing dynamics occurring in the graph over time, namely, extracting the number of metastable states and their locations, substructures of a graph, which characterize the state space (e.g., the difference in the microbe interactions between the states healthy and ill ). Moreover, the detection of the metastable states in the time-evolving graph can serve additional purposes such as graph clustering.
Figure 2: An illustration of the proposed method and challenges which we aim to overcome. (a) Learning transfer operators using graph kernels, where k(·, ·) is a graph kernel and K k is the Koopman operator. (b) In the learned embedding space it is possible to detect metastable states and to determine distinct substructures.
Related work. Two potential ways to detect metastable states in a time-evolving graph (e.g., the states healthy and ill in our example) are the following: 1. A typical solution would be to analyze the timeevolving graph directly in the space of graphs without taking into account potential temporal correlations. Practically, this can take the form of a simple kernelbased graph clustering algorithm. Classic graph kernels decompose graphs into substructures (e.g., walks (Kang et al. [2012]), subgraphs (Shervashidze et al. [2009]), paths (Borgwardt and Kriegel [2005]), and subtrees (Shervashidze et al. [2011])) and count the number of common substructures between graphs in order to obtain the feature vectors. Afterwards, these feature vectors can be used by various machine learning approaches to cluster snapshots of the time-evolving graphs. The problem with such methods is that they are incapable of capturing the time-information, which is crucial for time-evolving graphs with metastability.
2. Another possible way is graph representation learning, which aims at finding a mapping that embeds the system into some low-dimensional space. That is, we represent a single snapshot of the time-evolving graph at each time point by a single vector retaining the original properties of the dynamics. After finding the optimal embedding space, the low-dimensional representation can be used as a feature input for diverse machine learning approaches for analyzing time-series data.
The recently proposed methods for graph representation learning focus mostly on static graphs. These methods can be broadly divided into two categories. The first category comprises methods for embedding graph substructures (e.g., vertices or subgraphs) (Perozzi et al. [2014], A. andJ. [2016], , Ou et al. [2016]). For instance, DeepWalk (Perozzi et al. [2014]) and node2vec (A. and J. [2016]) are approaches that use random walks to produce embeddings. The only difference between them is that node2vec utilizes two hyperparameters, where one of them controls the likelihood of a random walk to return to the previously visited vertex and another parameter controls the likelihood to explore undiscovered parts of a graph. DeepWalk first traverses the graph with random walks in order to extract local structures and then it uses the Skip-Gram algorithm to learn embeddings. The second category pertains to representation learning of the entire graph, which is used for the classification/clustering of the set of graphs. The graph2vec approach (Narayanan et al. [2017]) learns the embedding of the set of graphs using the idea of the Skip-Gram from doc2vec (Le and Mikolov [2014]). It comprises two main components: (1) The generation of rooted subgraphs around every vertex using the Weisfeiler-Lehman relabeling process from Shervashidze et al. [2011]; (2) Learning the embedding of the given graphs following the Skip-Gram with negative sampling procedure. Although this approach is capable of projecting the entire set of graphs into low-dimensional space, it does not capture the time-evolution of the graph.
Recently, some work has also been done on learning the embedding vectors of vertices in the time-evolving graph. Dyngraph2vec (Goyal et al. [2019]) is a deeplearning based approach which learns both the topological patterns in a graph and the temporal transitions using multiple nonlinear layers and recurrent layers. Moreover, it uses the lookback hyperparameter in the recurrent layers to control the length of temporal patterns. The idea of DynamicTriad (Zhou et al. [2018]) is to use a group of three vertices, a so-called triad, to model the dynamic changes of graph structures. This approach only considers patterns within two time steps, which means that it cannot capture patterns that exist for a longer period of time. The main disadvantage of the substructure representation learning approaches, both for static and for time-evolving graphs, is that they are not able to project the entire set of snapshots of the time-evolving graph into low-dimensional space.
Contribution. To this end, we present an approach named graphKKE (the overall structure is shown in Figure 2), which is, to our knowledge, the first approach for representation learning of an entire time-evolving graph. Inspired by the proposed kernel transfer operator approach for molecular conformation analysis (Klus et al. [2020], Klus et al. [2018]), we use the same approach for learning the embeddings of time-evolving graphs. The method is based on the spectral analysis of transfer operators, such as the Perron-Frobenius or Koopman operator in a reproducing kernel Hilbert space.
Overall, we highlight the following contributions: We propose graphKKE, a novel unsupervised representation learning technique to analyze a timeevolving graph, i.e., class labels of the graphs are not required for learning their embedding. Moreover, we demonstrate the applicability of the graph kernels to time-evolving graphs. Our method is not only capable of preserving the information about the underlying dynamical graph patterns but also of taking into account the topological structure of the graph.
We present a new simulation method for constructing artificial benchmark datasets of time-evolving graphs with metastability and with graph structures of different complexity. We demonstrate that graphKKE significantly outperforms other methods for graph representation learning on several benchmark problems.
We illustrate that graphKKE can extract the important associations among microbes and capture the temporal changes occurring in the time-evolving microbiome interaction graph.
The remainder of this paper is organized as follows: In Section 2, the problem of learning the embeddings of time-evolving graphs with metastable behavior is defined. In Section 3, we introduce transfer operators, graph kernels, and the method for the approximation of transfer operators using graph kernels. A model for the simulation of time-evolving benchmark graphs with metastability and the experiments with these benchmark datasets are presented in Sections 4 and 5. Eventually, Section 6 illustrates that it is possible to obtain a meaningful lowdimensional representation for microbiome data.

Problem statement
In order to state the problem formally, let us first introduce the necessary notations and definitions.
A graph G is a pair (V, E) with a non-empty set of vertices V (G) and a set of edges The set V (G) often represents the objects in the data and E(G) relations between objects. We define the adjacency matrix of the graph G as the n × n matrix A with A ij = 1 if the edge (v i , v j ) ∈ E(G), and 0 otherwise. Furthermore, we say thatḠ = (V ,Ē) is a subgraph of a graph G = (V, E) if and only ifV ⊆ V and Given a time-evolving graph G as a sequence of T graphs G = (G 0 , . . . , G T −1 ) at the consecutive time points {0, . . . , T − 1} for some T ∈ N. We call G t a timesnapshot of G at time t. We say that the time-evolving graph G exhibits metastable behavior if G can be partitioned into s subsets G = G 0 ∪ · · · ∪ G s−1 for some s T such that for each time point t ∈ {0, . . . , T − 1} We call G 0 , . . . , G s−1 metastable states of the timeevolving graph G and each G t , t = 0, . . . , T −1, belongs to exactly one of the states G i . In most cases, each state G i is characterized by a certain pattern of graph attributes (i.e., edges, vertex labels). We define our problem as follows: Given a timeevolving graph G = (G 0 , . . . , G T −1 ) with assumed metastable behavior, we aim to represent each timesnapshot G t as a vector in a low-dimensional space R m , where m is a number of embedding dimensions, retaining the metastable behavior of G.
Commonly, the number of embedding dimensions m is a hyperparameter that has to be tuned in order to obtain a good performance, in our approach we will show that the number of embedding dimensions m can be chosen to be the number of states s, which eliminates the need to optimize this hyperparameter.

GraphKKE: Graph Kernel Koopman Embedding
In what follows, we first introduce transfer operators, kernel functions, and graph kernels. Afterwards, we present our approach -graphKKE -that is capable of learning embeddings of time-evolving graphs preserving temporal changes in a low-dimensional space.

Transfer operators
In order to capture the temporal changes in the timeevolving graph, transfer operator theory will be used in our method. Therefore, we will briefly discuss transfer operators and their applicability in the analysis of dynamical systems (for details, see Klus et al. [2016]). Information about the evolution of the system is contained in the spectral properties (such as eigenvalues and eigenfunctions) of linear operators. The most commonly used examples of such operators are the Koopman operator and the Perron-Frobenius operator.
Let {X t } t≥0 be a stochastic process defined on a highdimensional state space X ⊂ R d . The pointwise evolution of X t can be formally described by the transition density function p τ (y | x), which gives the probability to find the process at a point y after some lag time τ , given that it started in x at time 0. More formally, the transition density function is With the aid of the transition density function, the Koopman operator expresses the evolution of a function of the state, also called observable, whereas the Perron-Frobenius operator evolves probability densities. Let f t ∈ L ∞ (X) be an observable of the system. Then the Koopman operator K τ : (1) The evolution of probability densities can be described in a similar way. Assume the initial density of the system is given by g t ∈ L 1 (X). Then the Perron-Frobenius operator P τ : L 1 (X) → L 1 (X) is defined by A density π is called invariant density or equilibrium density if it is invariant under the action of P τ , that is, P τ π = π. Let u t (x) = π(x) −1 g t (x) be a probability density with respect to the equilibrium density π. Then, the Perron-Frobenius operator with respect to the equilibrium density is defined as Both the Koopman operator K τ and the Peron-Frobenius operator P τ are linear, infinite-dimensional operators, which are adjoint to each other and, therefore, it should not matter which one we choose to study the behavior of the system. Moreover, although they are typically defined on the function spaces L 1 and L ∞ , we assume that the operators are well-defined on L 2 (for details, see Klus et al. [2016]).
The information about the long-term behavior of the dynamical system is encoded in the spectral properties of these operators such as eigenvalues and eigenfunctions (Klus et al. [2020]). More precisely, eigenfunctions with eigenvalues close to 1 of both Koopman and Perron-Frobenius operators contain information about the locations of metastable states in the state space X.
Since transfer operators are infinite-dimensional, the goal is to obtain a finite-dimensional approximation of these operators. Below, we will show how to obtain a finite-dimensional approximation of transfer operators utilizing the evaluation of graph kernels on training data.

Graph kernels
In this section, we describe kernel functions and a neighborhood aggregation graph kernel, the 1dimensional Weisfeiler-Lehman kernel, since all our experiments make use of this graph kernel. However, one can potentially use other graph kernels, which can be tailored to specific applications.
Kernel function. Kernel-based methods are machine learning algorithms that learn by comparing any pair of data points using similarity measures called kernel functions. We will say that k : for x, x ∈ X and where ·, · is the inner product on H.
A feature map ϕ exists if and only if k is a positivesemidefinite function. However, the kernel is normally not defined by an explicit representation of ϕ, but instead, each kernel implicitly defines a potentially infinitedimensional mapping ϕ. For a given set of data points x 0 , . . . , x m ∈ X, the ma- Now let G be a sequence of graphs, then a kernel k : G × G → H is called a graph kernel.
Gaussian kernel. The most popular kernel function used in numerous kernel-based methods is the Gaussian kernel, which for two graphs G and G can be defined as where A and A are the respective adjacency matrices, σ > 0 is the bandwidth parameter, and · the Frobe-nius norm. The Hilbert space associated with the Gaussian kernel is an infinite-dimensional space H that is dense in L 2 and therefore, the space H is a rich approximation space for L 2 (Klus et al. [2018]).
Weisfeiler-Lehman kernel. In this work, we will use a neighborhood aggregation kernel -the Weisfeiler-Lehman (WL) kernel (Shervashidze et al. [2011]) -for graphs with discrete vertex labels. However, one could choose any other class of graph kernels such as graphlet kernels from Shervashidze et al. [2009] or random walk kernels from Kang et al. [2012]. We will briefly give an overview of the Weisfeiler-Lehman kernel. Let G and G be graphs and l (0) is a set of unique original vertex labels of G and G. The key idea of this kernel is to augment each vertex label by the sorted set of neighboring vertex labels, and then to compress the augmented label into some new label using a hash function f . That is, at each iteration h = 1, . . . , the 1-dimensional Weisfeiler-Lehman kernel computes a new set of vertex labels l (h) such that ∀v ∈ V (G)∪V ( G) and where the symbol "+" denotes the concatenation of strings, N (v) the set of neighbors of a vertex v, and sorted(N (v)) means that vertex labels need to be sorted before concatenation. The hash function f is chosen in such a way that f . The next step is to compute a feature vector for each graph G and G at each iteration h: Finally, the Weisfeiler-Lehman kernel for two graphs G and G is defined as: In the next subsection, we will introduce an approach for learning the embedding of a time-evolving graph using transfer operators and graph kernels.

Method overview -graphKKE
Now, we introduce a graph kernel-based approximation method for time-evolving graphs inspired by the method proposed in Klus et al. [2018].
Since we cannot compute eigendecompositions of infinite-dimensional operators numerically, typically suitable finite-dimensional subspaces are considered. It was shown that the initial eigenvalue problem on L 2 can be approximated by an eigenvalue problem defined on the reproducing kernel Hilbert space H utilizing only kernel evaluations.
Assume we have measurement data, given by a timeevolving graph G = (G 0 , ..., G T −1 ), where each G t is a single snapshot of G at time point t and G is a set of graphs mapped forward for a time lag τ , that is, It was shown in Klus et al. [2020] that in order to find eigenfunctions of transfer operators, we need to solve auxiliary matrix eigenvalue problems, given by and where Gram matrices, k(·, ·) is a graph kernel, and K G G = K GG . The equations (3) and (4)  This eigenvalue problem is closely related to kernel canonical correlation analysis (kernel CCA) (Klus et al. [2019]). Kernel CCA computes eigenfunctions of the forward-backward dynamics to identify so-called coherent sets. Coherent sets are a generalization of metastable sets and are regions of the state space that are not distorted over a certain time interval.
We assume that K GG is non-singular or otherwise we replace the inverse by its regularized version (K GG + ηI) −1 , where η ≥ 0 is a ridge parameter. This regularization is known as Tikhonov regularization.
Furthermore, if k(·, ·) is a graph kernel, then we apply the following normalization: for all i, j = 0, . . . , T − 1. The same normalization is applied to graphs in both G and G.
The number of states s in the time-evolving graph G is determined by the number of dominant eigenvalues close to 1. That is, if we have s dominant eigenvalues close to 1, then the time-evolving graph can be divided into s subsets G = G 0 ∪ · · · ∪ G s−1 . Moreover, all information about long-term behavior of the time-evolving graph G is contained within the eigenfunctions associated with s dominant eigenvalues close to 1. All things considered, the dominant eigenvalues can be used to determine the number of states s in the data and the dimension of a new low-dimensional space. The eigenfunctions associated with the dominant eigenvalues close to 1 are considered as a low-dimensional representation of the timeevolving graph G.

Generating benchmark data with metastability
Most of the benchmark data sets such as those from chemo-and bio-informatics domains, see Kersting et al. [2016], can be represented by static graphs. Thus, these datasets are not appropriate for our purposes, since they do not have time information and metastable behavior. Hence, in this section we present a model for generating time-evolving graphs with a comprehensible structure to estimate the performance of the proposed method. Figure 3: An example of a trajectory of a particle in the 5-well potential described in Section 4. Points indicate the position of the particle at time t and blue lines show the movement of the particle from time point t to t + τ .
In order to obtain a time-evolving graph G with metastability, we use a stochastic differential equation to generate a trajectory based on which a set of timesnapshots of the graph G is then constructed. Let us consider a particle in a 2-dimensional s-well potential given by the stochastic differential equation (SDE) (Klus et al. [2019]): with the potential F (x) = cos(s arctan(x 1 , x 2 )) + 10 Here, s denotes the number of wells, since we assume that the number of wells defines the number of states in the time-evolving graph G, the parameter β is the inverse temperature and W t is a standard Wiener process. The particle stays in one of the wells for a relatively long time and then jumps to one of the neighboring wells. We consider one realization (trajectory) S ∈ R 2 of the stochastic process X = {X t } L−1 t=0 , where L is the length of the trajectory. An example of such a trajectory S is shown in Figure 3, where the number of wells is s = 5 and β = 0.05. Before generating a time-evolving graph G, we cluster all points of S using k -means in order to obtain the ground truth labels for time-snapshots of G. Every synthetic benchmark data is based on this trajectory and constructed as follows. Figure 4: An illustration of our benchmark data at times (a) t = 0, (b) t = 256. In both (a) and (b), the left image shows a time-snapshot G 0 and G 256 and the right image are a points of the particle of the trajectory shown in Figure 3, which are clustered into 5 sets with k -means. Edges in red color are removed from the graph. Vertices in the circles are considered as patterns characterizing corresponding states.
The construction of the time-evolving graph G = {G 0 , ..., G T −1 } can be described by a three-step process. In the first step, the trajectory S = {(x i=0 using SDE (5) is generated. We consider the case where the number of time points T in G is equal to the length L of S, we will then denote them both by T . In the second step, we choose the number of vertices n and assign positions (a j , b j ), j = 0, . . . , n − 1 to each vertex v ∈ V (G t ) in a Cartesian coordinate system. The number of vertices n and their positions will be the same for each G t ∈ G, t = 0, . . . , T − 1. We use the uniform distribution to generate random points (a j , b j ) such that (a j , b j ) ∼ U [−2,2]×[−2,2] . Finally, in the third step of the construction process, we generate temporary patterns in the structure of the time-evolving graph such that it exhibits metastable behavior in the following way. At each time point t ∈ {0, . . . , T − 1}, we draw a circle around the point (x 2 ) ∈ S with radius r. We choose the radius r as the average of the radii of each cluster in S and r is the same for each t. Each time-snapshot G t is first set to be a complete graph. We define temporal patterns, which characterize each state of G, by removing all edges between vertices that are inside the current circle. In order to add noise to the data we also remove edges outside the circle with the out-state probability. An example of the benchmark data is shown in Figure 4.
The code to generate the data and code of graphKKE are available on GitHub at https://github.com/k-melnyk/graphKKE.

Experiments
We illustrate the efficacy of graphKKE proposed in Section 3.3 on the benchmark dataset and a real-world dataset with an artificial signal. We will show that our method is capable of learning the embedding of the timeevolving graph maintaining all dynamic properties in such way that it is possible to detect the metastable states in the low-dimensional space. Besides the experiments with benchmark and real-world datasets, we compare our method with several state-of-the-art approaches for graph clustering.

Experiments with synthetic data
Experimental setup. In order to test the performance of the method proposed in Section 3 and compare the result to other baselines models, we generate the synthetic data described in Section 4 with different configurations of interest such as the number of vertices n, the number of time steps T , and the number of states s. The datasets are summarized in Table 2. For each dataset we set the out-state probability to 0.1. We apply graphKKE with the Weisfeiler-Lehman graph kernel with number of iterations h = 1 and regularization parameter η = 0.1. In order to have ground truth labels/states of G, we apply k -means clustering to the SDE trajectory S. For the Weisfeiler-Lehman kernel, the initial set of vertex labels l 0 is defined to be {0, 1, 2, . . . , n}. Results & Analysis. We visualize the result only for the 5DynG-100 dataset. The resulting eigenvalues are shown in Figure 5. A spectral gap after the fifth eigenvalue indicates that the time-evolving graph G can be divided into s = 5 metastable states G = G 0 ∪· · ·∪G 4 . This implies that the number of embedding dimensions m is equal to the number of dominant eigenvalues or the number of states s, since all information about the long-term behavior of the time-evolving graph is contained within the eigenfunctions associated with s dominant eigenvalues close to 1. Thus, each time-snapshot of G is embedded into a new vector space R s as m = s, where the lowdimensional representations of time-snapshots are eigenfunctions associated with s dominant eigenvalues.
Applying k -means to the eigenfunctions associated with the five dominant eigenvalues results in the five clusters. Since each state of the time-evolving graph is characterized by some common pattern in the topological structure, we average adjacency matrices of each state. Thus, if we have a time-evolving graph with s states G = G 0 ∪ · · · ∪ G s−1 and {A 0 , . . . , A s−1 } is a set of corresponding subsets of adjacency matrices, then Each average adjacency matrix A avg i is associated with the average graph G avg i . Figure 6 illustrates the graphs of each state, where vertices are colored according to their degrees of the average graph G avg i , i = 0, . . . , s − 1. Our approach is capable of capturing common temporal patterns in the topological structure of the timeevolving graph with metastability. Consequently, it can learn a meaningful embedding of the time-evolving graph and preserve states in a low-dimensional space.

Experiment on the MovingPic microbiome dataset
In this experiment, we apply graphKKE to analyze a microbiome dataset -MovingPic coming from Caporaso et al. [2011], where one male and one female were sampled daily at three body sites (gut, skin, and mouth) for 15 months and for 6 month, respectively. As a feature matrix, the OTU table D ∈ N T ×p is used, where T is the number of time points and p is the number of OTUs. The operational taxonomic units (OTUs) are defined as groups of closely related microbes or bacteria species.
We use the microbiome profile only from the skin and since the data does not have any perturbations such as antibiotics exposure or diseases, we add an artificial noisy signal to the data in the following way. A practical justification for adding noise to the signal is that the human microbiome might react not only to major perturbations such as diseases or antibiotics exposure but also to some short-term daily fluctuations such as changing of lifestyle or stress. Moreover, the noise will be added to test the robustness of graphKKE.
] be the T -dimensional column vector of OTU counts of the ith species. OTUs with less than 30% of total reads are removed from the matrix D. We randomly choose 100 OTUs that are used to add the noisy signal. The vector of length T is constructed using a sine wave function: 2πt ω and then for each i, i = 0, . . . , 100, we compute new OTU counts d i , where w ∼ Normal(0, 1) and is the level of Gaussian noise. We set to one of {0, 0.05, 0.3}. Figure 7 shows the OTU table without the artificial signal and with the artificial signal.
The next step is the construction of a time-evolving graph. Let d t = [d t 1 , d t 2 , ..., d t p ] be the p-dimensional row vector of OTU counts at time point t, t = 0, ..., T −1. The raw OTU counts are typically normalized by the total cumulative count c t = p i=1 d t i in order to account for the different sequencing depth (Lo and Marculescu [2019]). Thus, the normalization of d t by the total cumulative count results in the relative abundance vector: .., p in order to define an initial co-occurrence graph. We choose a threshold of 0.5 such that edges with the Pearson coefficient greater than 0.5 or less than −0.5 are considered to be strongly correlated and remain in G 0 . Edges with the Pearson correlation coefficient in the range [−0.5; 0.5] are removed from the initial graph. Furthermore, to construct time-snapshots for each t = 0, . . . , T − 1, we use the OTU counts. If the OTU count for the current vertex is zero, we remove edges connecting this vertex and its neighboring vertices. The statistics of the pre-processed data can be seen in Table 2. Moreover, we define G t = G t+τ . That is, for the chosen lag time τ = 1, G = (G 0 , . . . , G T −2 ) and G = (G 1 , . . . , G T −1 ). From the two time-evolving graphs G and G, we compute the Gram matrices K GG and K G G using the Weisfeiler-Lehman kernel, where the number of iterations is set to h = 1, and the regularization parameter to η = 0.9. Results & Analysis. The eigenvalues detected by graphKKE for different percentages of Gaussian noise are shown in Figure 8. The gap after the second eigenvalue and the values of these eigenvalues close to 1 imply the presence of two states in the time-evolving graph G. The spectral gap after the forth eigenvalue indicates the presence of four states but we are not aware of the biological interpretations of the second two states since the original study does not mention any potential perturbations. The experiment also shows that graphKKE is robust to the noise in the data. In order to find the location of the states, we cluster time-snapshots into two states using k -means applied to the two normalized eigenfunctions associated with two dominant eigenvalues with the number of clusters set to 2.
The following experiment will demonstrate whether the detected states in the benchmark and the real-world datasets correspond to the ground truth labels. Moreover, we will show that graphKKE outperforms other methods for learning the embeddings of time-evolving graphs.

Comparative analysis
Experimental setup. The goal of this experiment is to compare graphKKE to several state-of-the-art representation learning and graph clustering approaches using benchmark and real-world datasets. The proposed approach with two different graph kernels -Gaussian and Weisfeiler-Lehman kernels -is compared with graph2vec (Narayanan et al. [2017]) and the original WL kernel (Shervashidze et al. [2011]). The main idea of graph2vec is explained in Section 1 and the WL kernel is discussed in Section 3.2. Since the analysis is done for the graph clustering task, we apply k -means to the resulting embedding vectors of every approach. The embedding dimensions of {5, 64, 128, 1024} were chosen for graph2vec. The hyperparameters of graphKKE were chosen empirically 1 and can be seen in Table 1. The choice of σ for the Gaussian kernel is critical for the performance of graphKKE. The optimal choice of σ is beyond the scope of this paper (for details see ?). For the MovingPic dataset, the level of Gaussian noise is set to 0.05 in this experiment. Figure 9: The large spectral gap after the second eigenvalue of the Koopman operator approximated by graphKKE indicates that the cholera infection dataset can be divided into two metastable states.
Evaluation Metric. In order to assess the results of the clustering of the embedding vectors for all approaches, the Adjusted Rand Index (ARI) is used. Higher ARI corresponds to greater accuracy in correctly identifying the ground truth labels/states.
Results & Analysis. The graph clustering results for all datasets using graphKKE and other state-of-the-art methods are presented in Table 3 (experimental datasets). For graph2vec the embedding dimension of 5 was used as a dimension with the best ARI to compare its result with the results of other approaches. We observe that both graph2vec and WL kernel perform poorly on the benchmark and real-world datasets. One reason of the poor embedding is that these two methods do not take into account the time information which is crucial in timeevolving graphs with metastability.
Additionally, this experiment shows that the detected metastable states using the embedding of graphKKE correspond exactly to the ground truth labels. In the benchmark data, the ground truth labels are the labels of the k -means clustering of the trajectory S. In the case of the MovingPic dataset, the ground truth labels correspond to the time period when the sine wave function of the artificial signal is zero (label 0) or greater than zero (label 1).

Application to microbiome data
Having studied the performance of graphKKE on benchmark datasets and the real-world dataset with the artificial signal, we now describe the application of our graphKKE approach to the microbiome data. Such data is more challenging than the benchmark data because the real-world data generating process is more complex and also contains noise. Background. The microbiome data, which we will analyze in this section comes from a study about recovery from Vibrio cholerae infection (Hsiao et al. [2014]). Fecal microbiota was collected during acute diarrhea and recovery periods of cholera in a cohort of seven Bangladeshi adults. In our experiments, we chose one patient, since there is variation in the constituents of the gut microbiota among individuals (Durack and Lynch [2019]) and thus, it can bias the result of detecting the metastable states such as diarrhea and recovery periods. The pre-processed OTU table were obtained from Zackular et al. [2015]. The aim is to determine if there are metastable states in this data and if possible, the number of metastable states and their locations.
The time-evolving graph from the given OTU table is constructed in the same way as for the MovingPic dataset using the relative abundance vector and Pearson correlation coefficients. In the real-world microbiome dataset,  Therefore, the question how to properly construct timeevolving graphs such that both metastable behavior and associations between microbes are taken into consideration need to be considered in future work. We apply graphKKE using the Weisfeiler-Lehman graph kernel. We set the number of iteration to 5 and the regularization parameter to 0.1.

Results & Analysis
The resulting eigenvalues are shown in Figure 9. Two dominant eigenvalues close to 1 implies that time-snapshots G t ∈ G constructed from the cholera infection dataset can be divided into two states. Moreover, the eigenfunctions associated with these two dominant eigenvalues contain all information about the long-term behavior of the time-evolving graph G. Thus, applying some clustering method to two eigenfunctions, we can find the location of metastable states in G. We again use k -means with k = 2 for the first two eigenfunctions. This results in two subsets G = G 0 ∪ G 1 . After clustering eigenfunctions into two sets, we can compare the topological structures of time-snapshots of the two states. We compute the average adjacency matrices in each set as discussed in Section 5.1. The result is shown in Figure 10. We see that depending on the state, different clusters of vertices have different degrees. This is due to the fact that the cholera infection causes marked shifts in microbiome composition. The biological meaning of these clusters and how they are related to the healthy/ill state are open questions and need to be analyzed in future work.
Moreover, we compare the metastable states detected by graphKKE with the initial time periods of diarrhea and recovery. The ARI is shown in Table 3 (real-world dataset).
In addition, using the resulting embedding (eigenfunctions), we can further analyze the time-evolving graph G, e.g., we can predict the state of G at the next time points or we can find the probability of G returning to the diarrhea state if the person continues living in this area.

Discussion & Conclusion
The large variety of species and complex interactions in the microbiome makes it challenging for researches to analyze the responses of the microbiome to different perturbations such as diseases or antibiotic exposures and its influence on the human health. However, most studies aiming at understanding these dynamics are primarily focused on statistical constitution analysis omitting more complex interactions that can be described as a time-evolving graph. One solution is to represent each time-snapshot of the time-evolving graph as a fixed-length feature vector. Many existing approaches learn the embedding either of the static graphs or of the substructures such as nodes, edges, or subgraphs, whereas for some system it is of great importance to embed the entire time-snapshots of the time-evolving graph into a low-dimensional space preserving the global temporal mechanisms such as metastability.
In this paper, we introduced an unsupervised approach (i.e., class labels of single time-snapshots are not required to learn the embedding) for learning a mapping that embeds time-snapshots of a time-evolving graph exhibiting metastable behavior as points in a lowdimensional vector space. Our experiments on synthetic benchmark and real-world data show that our approach is capable of learning a low-dimensional representation of the time-evolving graph that preserves the metastable behavior. This embedding can then be clustered in order to split individual time-snapshots of the time-evolving graph into states. Moreover, one can also analyze the dynamics occurring in the time-evolving graph (e.g., the probability of jumping from one state to another or the probability that the graph will return to one of the states) and apply different machine learning techniques. Since we are dealing with graph-structured data, which usually represents the interactions between objects, we can extract structural information pertaining to particular states. The latter is beneficial in the case of biological interactions such as microbiome data, where it is crucial to understand the differences between states (e.g., healthy/ill). To this end, experimental results have shown that our approach can outperform several state-of-the-art methods for representation learning of graphs. For instance, the comparative analysis has shown that applying only Weisfeiler-Lehman kernel to the time-evolving graph is not sufficient to capture the underlying dynamical graph patterns and consequently, to detect the metastable sets.
We have shown that graph kernels are not only a powerful tool for analyzing static graphs but also for analyzing time-evolving graphs. The transfer operator approach in combination with graph kernels yields a method capable not only of extracting structural information in each time-snapshot of the time-evolving graph but also of identifying the evolution patterns, which may exist in timeevolving graphs with metastability over long periods of time.