Persistence Homology of Networks: Methods and Applications

Information networks are becoming increasingly popular to capture complex relationships across various disciplines, such as social networks, citation networks, and biological networks. The primary challenge in this domain is measuring similarity or distance between networks based on topology. However, classical graph-theoretic measures are usually local and mainly based on differences between either node or edge measurements or correlations without considering the topology of networks such as the connected components or holes. In recent years, mathematical tools and deep learning based methods have become popular to extract the topological features of networks. Persistent homology (PH) is a mathematical tool in computational topology that measures the topological features of data that persist across multiple scales with applications ranging from biological networks to social networks. In this paper, we provide a conceptual review of key advancements in this area of using PH on complex network science. We give a brief mathematical background on PH, review different methods (i.e. filtrations) to define PH on networks and highlight different algorithms and applications where PH is used in solving network mining problems. In doing so, we develop a unified framework to describe these recent approaches and emphasize major conceptual distinctions. We conclude with directions for future work. We focus our review on recent approaches that get significant attention in the mathematics and data mining communities working on network data. We believe our summary of the analysis of PH on networks will provide important insights to researchers in applied network science.


arXiv:1907.08708v1 [math.AT] 19 Jul 2019

Introduction
Information networks are important tools for modeling relationships in complex data. They arise in many disciplines, for example as social networks, biological networks, and the World Wide Web. Analysis of such networks includes many applications such as node classification [1,2], community detection [3,4], and link prediction [5,6].
The primary challenge in applied network science is measuring similarity or distance between networks without knowing node correspondences. Since comparing graphs via graph isomorphism is computationally expensive [7], many statistically oriented graph similarity measures have been proposed in the literature [8,9]. While some of these methods embed the graphs into a feature space and then define distances on that space, other methods define kernel functions on graphs to build similarity measures [10]. Moreover, in graph-theoretic approaches, similarity measures are defined based on the difference in graph-theoretic features such as assortativity, betweenness centrality, small-worldness, and network homogeneity. However, such classical graph-theoretic measures are usually local and mainly based on differences between either node or edge measurements, or on correlations, without considering the network topology. Therefore, they may lose information about topological structures, such as the connected components or holes in networks. On the other hand, structural holes in networks can give important information about network topology [11]. For instance, node importance can be measured based on structural holes, since the distinctive characteristics of nodes located at structural holes help to separate them from other nodes. Moreover, the existence and distribution of structural holes in networks can be used as important topological features for network comparison and classification [12].
In recent years, mathematical tools and deep learning based methods have become popular for extracting the topological features of networks. Persistent homology (PH) is a mathematical tool in computational topology that measures the topological features of data that persist across multiple scales. Its applications range from biological systems [13] to computer vision [14]. The basic idea in PH is to replace the data points with a parametrized family of simplicial complexes, which can roughly be considered as unions of points, edges, triangles, tetrahedra and higher-dimensional polytopes, and to encode the change of the topological features (such as the number of connected components, holes, voids) of the simplicial complexes across different parameters for data analysis [15]. For an extensive and rigorous introduction to the computation of persistent homology, we refer readers to the survey papers [16,17].
Nowadays PH is widely applied to complex networks as a feature extractor, since persistent homology gives a multi-scale summary of the graph, unlike traditional metrics that each describe the graph from one specific angle. In this paper, we provide a conceptual review of key advancements in the use of PH in complex network science.
The paper is structured as follows: In Section 2, we define and give the background on networks, simplicial complex, simplicial homology, and persistent homology. In Section 3, we list and compare the filtrations defined for networks. In Section 4, we highlight different algorithms and applications where PH is used in solving network mining problems. Lastly, we conclude the paper with directions for future work in Section 5.

Preliminaries
While a network can be represented as a graph, it can also be represented as other topological objects. Topology is a branch of mathematics that studies the properties of shapes that are invariant under continuous deformations such as stretching, twisting, and bending, but not tearing or gluing. For example, a donut and a coffee mug are topologically equivalent since one can be continuously transformed into the other. Topological invariants, which are properties of shapes that do not change under continuous transformation, are useful for detecting whether two given shapes are topologically equivalent.
The number of connected components and the existence of holes or voids are examples of topological invariants. Algebraic topology is the area of topology that extracts these invariants of an object by counting them or by associating algebraic structures, such as vector spaces, to them. For example, for a given topological object $X$, homology associates a vector space $H_i(X)$ to $X$ for each dimension $i$, and the dimension of $H_i(X)$ counts the $i$-dimensional holes of $X$. For a finite set of points, e.g. point cloud data, homology does not give interesting information: the dimension of $H_0(X)$ gives the number of points, and the higher-dimensional homology is zero. The situation is similar for a network. The dimension of $H_0(X)$ gives the number of connected components, the dimension of $H_1(X)$ gives the number of independent loops, and the higher-dimensional homology is zero since a graph has no simplices of dimension 2 or higher. Hence, instead of just looking at the homology of the finite set of points itself, using (1) a distance function, e.g. a correlation or a measure of dissimilarity between points, and (2) a parameter value, one can add simplices and check how homology changes across different scales. Persistent homology then tracks the change in homology as the parameter value increases and detects which topological features "persist" across different scales.
In general, it is very difficult to compute the homology of an arbitrary topological object. Hence, we instead approximate the topological object with a simplicial complex and then compute its homology, which is called simplicial homology.
In this section, we explain how to define persistent homology on a finite set of points and on networks. We first give a formal definition of graphs and explain their characteristics. We then define the simplicial complex and explain how to compute simplicial homology. Finally, we briefly explain persistent homology and two metrics which are very useful when persistent homology is used in data analysis applications.

Graphs
As a formal definition, a graph G is a pair of sets G = (V, E ) where V is the set of vertices and E is the set of edges of the graph. Networks can be represented via graphs where vertices represent the objects and edges represent the relations between objects. There are different types of graphs to represent different relations between vertices. While in an undirected graph, edges link two vertices symmetrically, in a directed graph (also called digraph in literature), edges link two vertices asymmetrically. If there is a score for the relationship between vertices which could represent the strength of interactions, we can represent this type of relations or interactions by a weighted network. In a weighted graph, a weight function W : E → R is defined to assign a weight on each edge. Weights could come from the Euclidean space or other spaces.
A graph $G$ with $n$ vertices can be represented by an $n \times n$ adjacency matrix. If there is an edge from vertex $i$ to vertex $j$, the corresponding entry is $G_{ij} = 1$ for an unweighted graph and $G_{ij} = w_{ij}$ for a weighted graph. If there is no edge between vertex $i$ and vertex $j$, then $G_{ij} = 0$. Furthermore, there are two special graph types we study in this paper. Firstly, a graph is called a metric graph if each edge is assigned a positive length and the graph is equipped with the natural metric in which the distance between any two points of the graph (not necessarily vertices) is the minimum length over all paths from one to the other. Secondly, a graph is called a dynamic (time-varying) graph if it varies over time, i.e. it can have vertex and edge deletions and additions.

Simplicial complex
Informally, a simplicial complex is a topological object which is built as a union of points, edges, triangles, tetrahedra, and higher-dimensional polytopes. The building blocks of a simplicial complex are called simplices (plural of simplex). Simplices are the higher-dimensional analogs of points, line segments, and triangles; a tetrahedron is one example. We start this section with a formal definition of a simplex.
Definition 1 An $i$-simplex $\sigma$ is the convex hull of $i + 1$ affinely independent points $v_0, \dots, v_i$, i.e. the set of all convex combinations $\lambda_0 v_0 + \lambda_1 v_1 + \dots + \lambda_i v_i$ with $\lambda_0 + \lambda_1 + \dots + \lambda_i = 1$ and $0 \le \lambda_j \le 1$ for all $j$.
A 0-simplex is just a point, a 1-simplex is two points connected by a line segment, and a 2-simplex is a filled triangle (see Figure 1). We call a 0-simplex a vertex, a 1-simplex an edge, a 2-simplex a triangle, and a 3-simplex a tetrahedron. We can now define a simplicial complex roughly as a union of simplices, but these simplices need to be glued in a certain way. Here is the formal definition.

Definition 2 A simplicial complex is a finite collection of simplices K such that
1 Every face of a simplex in K also belongs to K .
2 For any two simplices $\sigma_1$ and $\sigma_2$ in $K$, if $\sigma_1 \cap \sigma_2 \neq \emptyset$, then $\sigma_1 \cap \sigma_2$ is a common face of both $\sigma_1$ and $\sigma_2$.
[Figure 2: (a) is a simplicial complex; (b) and (c) are not, since there is a missing edge in (b) and two triangles meet along an edge which is not an edge of either triangle in (c).]
The first condition says that if a simplex, e.g. a triangle, is in $K$, then its faces, such as its edges and vertices, also need to be in $K$. The second condition says that we can only glue simplices along their common faces. For example, we can glue two triangles along a common vertex or a common edge, but we cannot glue a vertex of one triangle onto the interior of an edge of the other triangle. Figure 2-a is an example of a simplicial complex whereas Figure 2-b and Figure 2-c are not.
Before we explain how to compute the homology of a simplicial complex, we define the clique complex of a graph $G$, which will be a crucial concept for defining most of the filtrations in Section 3.

Definition 3
The clique complex C l (G) of an undirected graph G = (V, E ) is a simplicial complex where vertices of G are its vertices and each k-clique, i.e. the complete subgraphs with k vertices, in G corresponds to a (k − 1)-simplex in C l (G).
For example, in Figure 3-a, there is a graph with a 4-clique on the left, 2-clique in the middle and 3-clique on the right. Hence, its clique complex, Figure 3-b, has a 3-simplex (tetrahedron), a 1-simplex (edge) and a 2-simplex (triangle).
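Definition 3 can be turned into a few lines of code. The following is a minimal sketch, not an optimized implementation: it enumerates all vertex subsets, so it is only suitable for small graphs. The function name `clique_complex` and the example graph are our own; the graph is a hypothetical one chosen to mirror the description of Figure 3-a (a 4-clique, a single edge, and a 3-clique).

```python
from itertools import combinations

def clique_complex(vertices, edges, max_dim=3):
    """Return the simplices of Cl(G) up to max_dim, grouped by dimension.

    A (k+1)-subset of vertices is a k-simplex exactly when all of its
    pairs are edges of G, i.e. when it is a (k+1)-clique.
    """
    adjacent = {frozenset(e) for e in edges}
    simplices = {0: [(v,) for v in sorted(vertices)]}
    for dim in range(1, max_dim + 1):
        simplices[dim] = [
            c for c in combinations(sorted(vertices), dim + 1)
            if all(frozenset(pair) in adjacent for pair in combinations(c, 2))
        ]
    return simplices

# Hypothetical graph: 4-clique {0,1,2,3}, edge {3,4}, 3-clique {4,5,6}.
V = range(7)
E = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
     (3, 4),
     (4, 5), (4, 6), (5, 6)]
cx = clique_complex(V, E)
print({dim: len(s) for dim, s in cx.items()})
```

In this example the only 3-simplex is the tetrahedron on the 4-clique, while the triangle on {4, 5, 6} contributes one of the five 2-simplices.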

Simplicial homology
In a simplicial complex, we can consider the holes as voids bounded by simplices of different dimensions. In dimension 0, they are connected components, in dimension 1, they are loops bounded by edges (1-simplices), in dimension 2, they are holes bounded by triangles (2-simplices) and in general, in dimension i , they are the holes bounded by i -simplices. The simplicial homology is the way to find the holes in a simplicial complex. To understand what simplicial homology is, we need to define the chains, and two special types of chains, namely cycles and boundaries.

Definition 4 Fix a dimension $i$. An $i$-chain is a formal sum of $i$-simplices of a simplicial complex $K$ with integer coefficients, where the sum is taken over all $i$-simplices of $K$. The set of all $i$-chains of $K$ is denoted by $C_i(K)$.
With integer coefficients, $C_i(K)$ is a free abelian group; more generally, taking coefficients in a field such as the real numbers makes $C_i(K)$ a vector space over that field. For simplicity, we assume from now on that the coefficient field is the binary field $\mathbb{Z}/2\mathbb{Z} = \{0, 1\}$. To map an $i$-simplex to $(i-1)$-simplices, we define the boundary of an $i$-simplex as the sum of its $(i-1)$-dimensional faces. Formally speaking, for an $i$-simplex $\sigma = [v_0, \dots, v_i]$, its boundary is
$$\partial_i \sigma = \sum_{j=0}^{i} [v_0, \dots, \hat{v}_j, \dots, v_i],$$
where the hat indicates that $v_j$ is omitted. The boundary map extends linearly to $i$-chains, giving $\partial_i : C_i(K) \to C_{i-1}(K)$. For example, in Figure 4, $\partial_1 A = e + a$ and $\partial_2 T = E + D + F$.
We should also note here that the boundary of a boundary is empty, i.e. $\partial_i \partial_{i+1} = 0$. For example, in Figure 4, $\partial_1 \partial_2 T = \partial_1(E + D + F) = 0$, since every vertex of $T$ lies on exactly two of its edges and coefficients are taken modulo 2. We can now distinguish two special types of chains using the boundary map that will be useful to define homology. The first one is an $i$-cycle, which is defined as an $i$-chain with empty boundary. In other words, an $i$-chain $c$ is an $i$-cycle if and only if $\partial_i(c) = 0$, i.e. $c \in \ker(\partial_i)$. For example, the 1-chain $A + B + C + F$ in Figure 4 is a 1-cycle since $\partial_1(A + B + C + F) = 0$. The set of all such $i$-cycles forms a subspace of $C_i(K)$, which we denote by $Z_i(K)$.
The second special type of $i$-chain is an $i$-boundary: an $i$-chain $c$ is an $i$-boundary if there exists an $(i+1)$-chain $d$ such that $c = \partial_{i+1}(d)$, i.e. $c \in \mathrm{im}(\partial_{i+1})$. For example, the 1-chain $E + D + F$ is a 1-boundary since $E + D + F = \partial_2(T)$. The set of all such $i$-boundaries forms a subspace of $C_i(K)$, which we denote by $B_i(K)$.
Having defined these two special subspaces of $C_i(K)$, the $i$-cycles $Z_i(K)$ and the $i$-boundaries $B_i(K)$, we note that $B_i(K) \subseteq Z_i(K)$ because $\partial_i \partial_{i+1} = 0$, and we take the quotient space of $Z_i(K)$ by $B_i(K)$. In this quotient space, only the $i$-cycles that do not bound an $(i+1)$-chain remain nonzero. These are the $i$-dimensional voids of $K$. We call this quotient space the $i$-th homology of the simplicial complex $K$,
$$H_i(K) = Z_i(K) / B_i(K) = \ker(\partial_i) / \mathrm{im}(\partial_{i+1}).$$
The dimension of the $i$-th homology is called the $i$-th Betti number of $K$,
$$\beta_i(K) = \dim H_i(K) = \dim Z_i(K) - \dim B_i(K).$$
Basically, the $i$-th Betti number is the number of $i$-dimensional voids in the simplicial complex. For example, $\beta_0$ gives the number of connected components and $\beta_1$ gives the number of loops. In Figure 4, $\beta_0 = 1$, $\beta_1 = 1$, $\beta_2 = 0$.
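The Betti numbers can be computed directly from the ranks of the boundary maps, since $\dim Z_i = \dim C_i - \mathrm{rank}\,\partial_i$ and $\dim B_i = \mathrm{rank}\,\partial_{i+1}$. The sketch below works over $\mathbb{Z}/2\mathbb{Z}$ and stores each matrix row as an integer bitmask. The function names and the example complex are our own assumptions: the complex is a hypothetical one (a hollow square glued to a filled triangle along an edge), chosen so that it has the same Betti numbers as those stated for Figure 4; it is not claimed to be the complex of Figure 4 itself.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z/2Z of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    rank = 0
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                rank += 1
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return rank

def betti_numbers(simplices_by_dim):
    """simplices_by_dim[i] lists the i-simplices as sorted vertex tuples."""
    top = max(simplices_by_dim)
    ranks = {}
    for i in range(1, top + 1):
        index = {s: k for k, s in enumerate(simplices_by_dim[i - 1])}
        rows = []
        for s in simplices_by_dim[i]:
            mask = 0
            for face in combinations(s, i):  # the (i-1)-dimensional faces
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[i] = rank_gf2(rows)  # rank of the boundary map in dimension i
    # beta_i = (dim C_i - rank of boundary_i) - rank of boundary_{i+1}
    return [
        len(simplices_by_dim[i]) - ranks.get(i, 0) - ranks.get(i + 1, 0)
        for i in range(top + 1)
    ]

# Hypothetical complex: hollow square 0-1-2-3 plus filled triangle (2,3,4).
K = {
    0: [(0,), (1,), (2,), (3,), (4,)],
    1: [(0, 1), (0, 3), (1, 2), (2, 3), (2, 4), (3, 4)],
    2: [(2, 3, 4)],
}
print(betti_numbers(K))  # [1, 1, 0]
```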

Persistent homology
For a finite set of points $X$, e.g. point cloud data, homology does not give interesting information. $\beta_0$ gives the number of connected components, which is just the number of points, and all other Betti numbers are zero since the set has no higher-dimensional holes. Hence, instead of working with the set of points itself, one can build from $X$ a family of simplicial complexes $K_X^\delta$ for a range of values $\delta \in \mathbb{R}$ such that the complex at step $m$ is embedded in the complex at step $n$ whenever $m \le n$, i.e. $K_X^m \subseteq K_X^n$. This nested family of simplicial complexes is called a filtration (see Figure 5 for an example). During this construction, some holes may appear and then disappear, and the persistence of these homological features can be used as features of the dataset. In a filtration, one can record the birth of a hole, the time it appears, and its death, the time it disappears. The essence of persistent homology is to track the birth and death of these homological features in $K_X^\delta$ for different values of $\delta$. The lifespan of each homological feature can be represented as an interval, where the start and end points of the interval correspond to the birth and the death of the feature, respectively. For a given dataset and a filtration, one can record all these intervals in a persistence barcode (PB), a multiset of intervals bounded below [18]. Equivalently, a persistence barcode can be represented as a persistence diagram (PD), in which each feature is plotted as a point (birth, death) in the extended real plane $\bar{\mathbb{R}}^2$ [19]. The longer bars in PBs, and correspondingly the points far from the diagonal in PDs, are considered the real features of the dataset.
[Figure: persistence diagram with the points (2,7), (3,6) and (4,5), which correspond to the three bars in the 1-dimensional PB.]
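To make the birth and death bookkeeping concrete, the following sketch computes persistence intervals with the standard column reduction of the boundary matrix over $\mathbb{Z}/2\mathbb{Z}$. It is a plain, unoptimized illustration; the function name, the input format (a list of `(value, simplex)` pairs in which every face has a value no larger than its cofaces) and the toy filtration are our own assumptions. Zero-length intervals are dropped.

```python
from itertools import combinations

def persistence_intervals(filtration):
    """Return sorted (dimension, birth, death) triples of a filtration."""
    # Sorting by (value, size) puts every face before its cofaces.
    order = sorted(filtration, key=lambda item: (item[0], len(item[1])))
    index = {simplex: j for j, (_, simplex) in enumerate(order)}
    reduced = {}    # lowest-one row index -> reduced column (set of rows)
    paired = set()  # indices of simplices that belong to a birth-death pair
    intervals = []
    for j, (value, simplex) in enumerate(order):
        dim = len(simplex) - 1
        column = set()
        if dim > 0:
            column = {index[face] for face in combinations(simplex, dim)}
        # Add earlier reduced columns (mod 2) until the lowest one is new.
        while column and max(column) in reduced:
            column ^= reduced[max(column)]
        if column:
            pivot = max(column)  # simplex `pivot` created the class j kills
            reduced[pivot] = column
            paired.update((pivot, j))
            birth = order[pivot][0]
            if value > birth:
                intervals.append((dim - 1, birth, value))
    for j, (value, simplex) in enumerate(order):
        if j not in paired:  # a class that is born and never dies
            intervals.append((len(simplex) - 1, value, float("inf")))
    return sorted(intervals)

# Toy filtration: four points, a square of edges at 1, filled in at 2.
toy = [
    (0, (0,)), (0, (1,)), (0, (2,)), (0, (3,)),
    (1, (0, 1)), (1, (1, 2)), (1, (2, 3)), (1, (0, 3)),
    (2, (0, 2)), (2, (0, 1, 2)), (2, (0, 2, 3)),
]
bars = persistence_intervals(toy)
print(bars)
```

In the toy filtration, three of the four components die at scale 1, one component lives forever, and the loop formed by the square is born at 1 and dies at 2 when the two triangles fill it.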

Two metrics for persistence diagrams
One may want to employ persistence diagrams to compare the corresponding datasets. For example, in the network matching problem, we can create a persistence diagram for each network and compare the persistence diagrams to obtain the network similarity. For such a comparison, we need to measure the distance between persistence diagrams using stable metrics. A metric is stable if a small perturbation of the dataset creates only a small change in the persistence diagram with respect to that metric. Two metrics, which can be stable depending on how the simplices are defined, have been commonly used to measure the distance between diagrams: the bottleneck distance and the Wasserstein distance. We first define the bottleneck distance.

Definition 5 Let $P$ and $Q$ be two persistence diagrams. The bottleneck distance between $P$ and $Q$ is defined as
$$d_B(P, Q) = \inf_{\gamma} \sup_{p \in P} \|p - \gamma(p)\|_\infty,$$
where $\gamma$ ranges over all matchings from $P$ to $Q$ and $\|p - q\|_\infty = \max\{|p_1 - q_1|, |p_2 - q_2|\}$.
In other words, the bottleneck distance measures the distance between two persistence diagrams $P$ and $Q$ by the maximum distance between two matched points, minimized over all matchings from $P$ to $Q$. Hence, the bottleneck distance only reports the distance for the greatest outlier, rather than the distances between all pairs of matched points. To address this concern, the Wasserstein distance can be used.

Definition 6 Let $P$ and $Q$ be two persistence diagrams. The $p$-th Wasserstein distance between $P$ and $Q$ is defined as
$$W_p(P, Q) = \inf_{\gamma} \Big( \sum_{x \in P} \|x - \gamma(x)\|_p^p \Big)^{1/p},$$
where $\gamma$ ranges over all matchings from $P$ to $Q$ and $\|x - y\|_p = (|x_1 - y_1|^p + |x_2 - y_2|^p)^{1/p}$.
In other words, the Wasserstein distance considers the total distance between the matched pairs of points, and hence provides an overall quantification of the similarity between persistence diagrams.
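Both distances can be illustrated by brute force on very small diagrams. The sketch below is not how practical libraries compute these distances; it simply tries every matching. It relies on the usual convention, assumed here, that a point may also be matched to the diagonal: each diagram is augmented with the diagonal projections of the other diagram's points, and two diagonal points are matched at zero cost. The function names are our own, the bottleneck distance uses the $L_\infty$ norm, and the Wasserstein distance uses the $p$-norm as ground distance.

```python
from itertools import permutations

INF = float("inf")

def _augment(diagram, other):
    """Points of `diagram` plus diagonal projections of points of `other`."""
    real = [((b, d), False) for b, d in diagram]
    diag = [(((b + d) / 2, (b + d) / 2), True) for b, d in other]
    return real + diag

def _dist(x, y, q):
    (a, a_on_diag), (b, b_on_diag) = x, y
    if a_on_diag and b_on_diag:
        return 0.0  # two diagonal points are matched for free
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) if q == INF else (dx ** q + dy ** q) ** (1 / q)

def bottleneck_distance(P, Q):
    """Smallest possible largest matched distance (L-infinity norm)."""
    A, B = _augment(P, Q), _augment(Q, P)
    return min(
        max((_dist(A[i], B[j], INF) for i, j in enumerate(perm)), default=0.0)
        for perm in permutations(range(len(B)))
    )

def wasserstein_distance(P, Q, p=2):
    """Smallest possible p-th root of the summed p-th powers of distances."""
    A, B = _augment(P, Q), _augment(Q, P)
    return min(
        sum(_dist(A[i], B[j], p) ** p for i, j in enumerate(perm)) ** (1 / p)
        for perm in permutations(range(len(B)))
    )

P = [(2, 7), (3, 6), (4, 5)]
Q = [(2, 7), (3, 6)]
print(bottleneck_distance(P, Q))      # 0.5: (4, 5) is sent to the diagonal
print(wasserstein_distance(P, Q, 2))  # about 0.707
```

The number of matchings grows factorially, so this is only meant for diagrams with a handful of points.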

Filtrations
In this section, we review the filtrations defined for networks in the literature. We compare the filtrations according to their properties, such as sensitivity to different network types (e.g. directed/undirected, weighted/unweighted). We also provide a comparison of the filtrations. Throughout this section, we use the notation defined for graphs in Section 2.1.

Vietoris-Rips filtration (VR)
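Let $G = (V, E)$ be a weighted graph with weight function $W : E \to \mathbb{R}$. For any $\delta \in \mathbb{R}$, the 1-skeleton $G_\delta = (V, E_\delta) \subseteq G$ is the subgraph whose edge set $E_\delta \subseteq E$ only includes the edges whose weight is less than or equal to $\delta$. Then, for any $\delta \in \mathbb{R}$, we define the Vietoris-Rips complex as the clique complex of the 1-skeleton $G_\delta$, $Cl(G_\delta)$, and the Vietoris-Rips filtration is then defined as
$$\{Cl(G_\delta) \hookrightarrow Cl(G_{\delta'})\}_{\delta \le \delta'}.$$
In other words, in this filtration, we first start with the vertex set. We then rank the edge weights from the minimum weight, $w_{min}$, to the maximum weight, $w_{max}$, and let the parameter $\delta$ increase from $w_{min}$ to $w_{max}$. At each step, we add the corresponding edges and take the clique complex of the thresholded subgraph $G_\delta$. This construction yields the Vietoris-Rips filtration on networks.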
For application purposes, we may prefer to add edges with larger weights before the ones with smaller weights to stress the importance of the weights. In other words, after adding the vertex set, we rank the edge weights from $w_{max}$ to $w_{min}$ and, for any $\delta \in \mathbb{R}$, we add the edges whose weight is greater than or equal to $\delta$. This yields a similar but distinct filtration, which is called the weight rank clique filtration in [20]. However, to be more concise, we prefer to call it the inverse Vietoris-Rips filtration.
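The construction can be sketched by assigning to every clique the threshold at which it enters the filtration, namely the largest weight among its edges. In the sketch below, the function name, the input format (a dictionary from `frozenset({u, v})` to the edge weight) and the example network are our own assumptions. The vertices are placed at the smallest threshold, and the inverse variant is obtained by negating the weights so that larger weights enter first; the reported values are then the negated weights.

```python
from itertools import combinations

def vietoris_rips_filtration(vertices, weights, max_dim=2, inverse=False):
    """Return sorted (value, simplex) pairs for the weighted graph."""
    sign = -1 if inverse else 1  # negate weights for the inverse filtration
    w = {edge: sign * x for edge, x in weights.items()}
    start = min(w.values())      # vertices are present from the first step
    filtration = [(start, (v,)) for v in sorted(vertices)]
    for dim in range(1, max_dim + 1):
        for clique in combinations(sorted(vertices), dim + 1):
            pairs = [frozenset(p) for p in combinations(clique, 2)]
            if all(p in w for p in pairs):
                # A clique appears as soon as its last edge has appeared.
                filtration.append((max(w[p] for p in pairs), clique))
    return sorted(filtration, key=lambda item: (item[0], len(item[1])))

# Hypothetical weighted network: a triangle with a pendant edge.
V = [0, 1, 2, 3]
W = {frozenset({0, 1}): 1, frozenset({1, 2}): 2,
     frozenset({0, 2}): 3, frozenset({2, 3}): 2}
vr = vietoris_rips_filtration(V, W)
inv = vietoris_rips_filtration(V, W, inverse=True)
print(vr[-1])   # (3, (0, 1, 2)): the triangle enters with its heaviest edge
print(inv[-1])  # (-1, (0, 1, 2)): here it enters with its lightest edge
```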

Dowker sink and source filtration (DSS)
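Using the idea of the Vietoris-Rips filtration, the authors in [21,22] define the Dowker $\delta$-sink and $\delta$-source simplicial complexes on directed weighted networks, which are sensitive to edge directions. Given a directed weighted network $G = (V, E, W)$ and $\delta \in \mathbb{R}$, the Dowker $\delta$-sink simplicial complex is
$$D^{si}_\delta = \{\sigma = [x_0, \dots, x_n] : \text{there exists } x \in V \text{ such that } W(x_i, x) \le \delta \text{ for each } x_i \in \sigma\}.$$
In other words, there is a sink vertex $x \in V$ such that there are edges from each $x_i \in \sigma$ to $x$ with weights less than or equal to the threshold $\delta$. Using this simplicial complex, they define the Dowker sink filtration as
$$\{D^{si}_\delta \hookrightarrow D^{si}_{\delta'}\}_{\delta \le \delta'}.$$
They similarly define a dual construction, namely the Dowker $\delta$-source simplicial complex associated to a directed weighted network $G$, as
$$D^{so}_\delta = \{\sigma = [x_0, \dots, x_n] : \text{there exists } x \in V \text{ such that } W(x, x_i) \le \delta \text{ for each } x_i \in \sigma\}.$$
The only difference here is the edge directions: there is a source vertex $x \in V$ such that there are edges from $x$ to each $x_i \in \sigma$ with weights less than or equal to the threshold $\delta$. Similarly, they define the Dowker source filtration as
$$\{D^{so}_\delta \hookrightarrow D^{so}_{\delta'}\}_{\delta \le \delta'}.$$
Dowker sink and source (DSS) filtrations are formed with respect to a central authority $x \in V$. Hence, they could be preferred on networks, such as small-world networks, where it is desirable for simplices to be formed with respect to particular hub nodes. The authors also prove that both filtrations generate the same persistence diagrams.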
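The sink and source conditions can be checked directly for a fixed threshold. The sketch below is a brute-force illustration with our own function name and input format (a dictionary from ordered pairs `(u, v)` to the weight of the directed edge from `u` to `v`). It additionally assumes that every vertex has self-weight 0, so that a vertex can serve as its own sink or source, and that missing edges have infinite weight; both conventions are assumptions of this sketch.

```python
from itertools import combinations

def dowker_complex(vertices, weight, delta, kind="sink", max_dim=2):
    """Simplices of the Dowker delta-sink or delta-source complex."""
    def w(u, v):
        if u == v:
            return 0.0  # assumption: self-weights are zero
        return weight.get((u, v), float("inf"))  # missing edge: never added

    simplices = []
    for size in range(1, max_dim + 2):
        for sigma in combinations(sorted(vertices), size):
            for x in vertices:
                if kind == "sink":    # edges x_i -> x within the threshold
                    ok = all(w(x_i, x) <= delta for x_i in sigma)
                else:                 # edges x -> x_i within the threshold
                    ok = all(w(x, x_i) <= delta for x_i in sigma)
                if ok:
                    simplices.append(sigma)
                    break
    return simplices

# Hypothetical network: 0 -> 2 and 1 -> 2 are cheap, 2 -> 0 is expensive.
V = [0, 1, 2]
W = {(0, 2): 1, (1, 2): 1, (2, 0): 3}
sink = dowker_complex(V, W, delta=1, kind="sink")
source = dowker_complex(V, W, delta=1, kind="source")
print(sink)    # contains the triangle (0, 1, 2), with vertex 2 as the sink
print(source)  # contains only the edges (0, 2) and (1, 2)
```

The two complexes differ as sets of simplices, which shows that the construction reacts to edge directions, although in this example both are connected and have no loops.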

Clique complex filtration (CCL)
For a graph $G$ with $n$ vertices and its clique complex $Cl(G)$, the clique complex filtration is defined as
$$K_0 \subseteq K_1 \subseteq \dots \subseteq K_{n-1} = Cl(G),$$
where $K_j$ is the set of simplices of $Cl(G)$ of dimension less than or equal to $j$ [23]. In other words, in this filtration, we add the vertices at $\delta = 0$, add the edges at $\delta = 1$, add the triangles at $\delta = 2$, and so on.

Vertex-based clique filtration (VBCL)
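This filtration was originally defined in [24] only in dimension 0; however, it can be extended to higher dimensions as well. Let $G = (V, E)$ be a graph with a vertex weight function $\omega : V \to \mathbb{R}$. For this filtration, we use vertex weights, instead of edge weights, as threshold values. For any $\delta \in \mathbb{R}$, the 1-skeleton $G_\delta = (V_\delta, E_\delta)$ is the subgraph with $V_\delta = \{v \in V : \omega(v) \le \delta\}$ and $E_\delta = \{(u, v) \in E : \max(\omega(u), \omega(v)) \le \delta\}$. Then, for any $\delta \in \mathbb{R}$, using the clique complex of the 1-skeleton $G_\delta$, the filtration is defined as
$$\{Cl(G_\delta) \hookrightarrow Cl(G_{\delta'})\}_{\delta \le \delta'}.$$
Furthermore, we can also define the inverse vertex-based clique filtration by filtering from $\omega_{max}$ to $\omega_{min}$, as we do for the Vietoris-Rips filtration.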

k-clique filtration (kCL)
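This filtration is used in [24] to detect the evolution of $k$-clique communities for a fixed $k$. In this filtration, we assume the graph $G = (V, E)$ has a vertex weight function $\omega : V \to \mathbb{R}$. First, using the vertex weights, we assign a weight $\omega(\cdot)$ to an arbitrary clique $\sigma$ inductively as
$$\omega(\sigma) = \max_{\tau \subset \sigma} \omega(\tau),$$
i.e., the maximum weight of its vertices. Second, for a fixed $k$, we detect all $k$-cliques in $G$ and create the $k$-clique connectivity graph $G^k = (V_k, E_k)$, where there is a vertex for every $k$-clique of $G$ and the edges are defined by
$$E_k = \{(\sigma, \sigma') : |\sigma \cap \sigma'| = k - 1\},$$
i.e., $\sigma$ and $\sigma'$ intersect in a $(k-1)$-clique; in other words, they have $k - 1$ vertices in common. We then extend the weight function $\omega(\cdot)$ to the edges of $G^k$ by setting
$$\omega(\sigma, \sigma') = \max\{\omega(\sigma), \omega(\sigma')\}.$$
Next, in a similar way, for any $\delta \in \mathbb{R}$, the 1-skeleton $G^k_\delta = (V^k_\delta, E^k_\delta)$ consists of the vertices and edges of $G^k$ whose weight is less than or equal to $\delta$. Then, for any $\delta \in \mathbb{R}$, using the clique complex of the 1-skeleton $G^k_\delta$, the filtration is defined as
$$\{Cl(G^k_\delta) \hookrightarrow Cl(G^k_{\delta'})\}_{\delta \le \delta'}.$$
This filtration is distinctive in that it focuses only on the evolution of the $k$-clique communities of the original graph.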
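The $k$-clique connectivity graph can be built by brute force for small graphs. In the sketch below, the function name and the example are our own. The weight of a clique is the maximum weight of its vertices, as described above; the weight of an edge of the connectivity graph is taken to be the maximum of the weights of the two cliques it joins, which is an assumption of this sketch that guarantees an edge never appears before its endpoints.

```python
from itertools import combinations

def k_clique_graph(vertices, edges, omega, k):
    """Return (node_weight, edge_weight) of the k-clique connectivity graph.

    Nodes are the k-cliques of G; two nodes are joined when the cliques
    share exactly k - 1 vertices.
    """
    adjacent = {frozenset(e) for e in edges}
    cliques = [
        c for c in combinations(sorted(vertices), k)
        if all(frozenset(p) in adjacent for p in combinations(c, 2))
    ]
    # Weight of a clique: the maximum weight of its vertices.
    node_weight = {c: max(omega[v] for v in c) for c in cliques}
    edge_weight = {}
    for s, t in combinations(cliques, 2):
        if len(set(s) & set(t)) == k - 1:
            # Assumed rule: an edge enters once both of its cliques have.
            edge_weight[(s, t)] = max(node_weight[s], node_weight[t])
    return node_weight, edge_weight

# Hypothetical graph: two triangles sharing an edge, plus a separate one.
V = range(7)
E = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)]
omega = {v: v for v in V}  # illustrative vertex weights
nodes, links = k_clique_graph(V, E, omega, k=3)
print(nodes)
print(links)
```

Thresholding `nodes` and `links` at increasing values of $\delta$ then gives the 1-skeletons $G^k_\delta$; the two adjacent triangles merge into one 3-clique community at $\delta = 3$, while the third triangle stays separate.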

Weighted simplex filtration (WS)
In the previous filtration, we assign weights to arbitrary cliques, i.e. simplices, using the vertex weights. Alternatively, one may assign weights to simplices in another way and use these weights to create a filtration. For example, Huang et al. [25] assign weights to each simplex in a simplicial complex $K$ based on relationship functions in a given dissimilarity network. For any $\delta \in \mathbb{R}$, they define $K_\delta \subseteq K$ to be the collection of simplices appearing before or at $\delta$. This construction yields the filtration
$$\{K_\delta \hookrightarrow K_{\delta'}\}_{\delta \le \delta'}.$$
For this to be a well-defined filtration, all faces of each simplex in $K_\delta$, and all intersections of simplices in $K_\delta$, must also appear before or at $\delta$. They prove that the filtration obtained from a given dissimilarity network is well-defined, i.e. it satisfies both conditions.

Vertex function based filtration (VFB)
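Let $G = (V, E)$ be an undirected graph and $f : V \to \mathbb{R}$ be a function defined on its vertices. We construct the sublevel graphs $G_\delta = (V_\delta, E_\delta)$, where $V_\delta = \{v \in V : f(v) \le \delta\}$ and $E_\delta$ consists of the edges of $E$ with both endpoints in $V_\delta$. Letting $\delta$ increase from $-\infty$ to $\infty$ provides a sequence of increasing subgraphs. The sublevel vertex function based (VFB) filtration is given by taking the clique complex of each sublevel graph,
$$\{Cl(G_\delta) \hookrightarrow Cl(G_{\delta'})\}_{\delta \le \delta'}.$$
Similarly, we can construct the superlevel graphs $G^\delta = (V^\delta, E^\delta)$, where $V^\delta = \{v \in V : f(v) \ge \delta\}$ and $E^\delta$ consists of the edges of $E$ with both endpoints in $V^\delta$. Letting $\delta$ decrease from $\infty$ to $-\infty$ provides a nested sequence of increasing subgraphs, which yields the superlevel vertex function based (VFB) filtration
$$\{Cl(G^\delta) \hookrightarrow Cl(G^{\delta'})\}_{\delta \ge \delta'}.$$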

Intrinsic Čech filtration (IC)
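This filtration is defined only for metric graphs in [26]. Let $G$ be a metric graph with geodesic distance $d_G$. For $\delta > 0$, let $U_\delta = \{B(x, \delta) : x \in G\}$ be the open cover of $G$ by the intrinsic open balls $B(x, \delta) = \{y \in G : d_G(x, y) < \delta\}$, and let the intrinsic Čech complex $C_\delta(G)$ be the nerve of $U_\delta$. The associated intrinsic Čech filtration is defined as the set of inclusion maps
$$\{C_\delta(G) \hookrightarrow C_{\delta'}(G)\}_{0 < \delta \le \delta'}.$$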

Functional metric graph filtration (FMG)
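This filtration is also defined only for metric graphs [27]. Let $G$ be a metric graph and fix a base point $s \in G$. The authors consider the function $f : G \to \mathbb{R}$ given by the geodesic distance to the base point, $f(x) = d_G(s, x)$, and, for each $\delta$, the super-level set $G^\delta := \{x \in G \mid f(x) \ge \delta\}$. Then the filtration is given by
$$\{G^\delta \hookrightarrow G^{\delta'}\}_{\delta \ge \delta'}.$$
Similarly, instead of using the super-level sets of $G$, one can use the sub-level set of $G$ for each $\delta$, $G_\delta := \{x \in G \mid f(x) \le \delta\}$. This yields another filtration
$$\{G_\delta \hookrightarrow G_{\delta'}\}_{\delta \le \delta'}.$$
Depending on the problem, one may choose either of the filtrations.

Power filtration (POW)
For an unweighted graph $G = (V, E)$ and an integer $k \ge 1$, the $k$-th power $G^k$ is the graph on $V$ in which two distinct vertices are adjacent whenever their shortest-path distance in $G$ is at most $k$. Since $G^k \subseteq G^{k+1}$, the power filtration is given by
$$Cl(G^1) \subseteq Cl(G^2) \subseteq Cl(G^3) \subseteq \dots,$$
where $Cl$ denotes the clique complex.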
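As an illustration, the edges of the graph powers can be obtained from breadth-first search. The sketch assumes the usual notion of graph power, in which two distinct vertices are adjacent in the $k$-th power $G^k$ when their hop distance in $G$ is at most $k$; the function names and the example path graph are our own.

```python
from collections import deque

def hop_distances(vertices, edges):
    """All-pairs shortest-path (hop) distances by breadth-first search."""
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    dist = {}
    for s in vertices:
        dist[s] = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist[s]:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

def graph_power_edges(vertices, edges, k):
    """Edges of the k-th power: pairs at hop distance at most k."""
    dist = hop_distances(vertices, edges)
    vs = sorted(vertices)
    return [
        (u, v)
        for i, u in enumerate(vs)
        for v in vs[i + 1:]
        if dist[u].get(v, float("inf")) <= k  # unreachable pairs never join
    ]

# Path graph 0 - 1 - 2 - 3: its third power is the complete graph.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]
for k in (1, 2, 3):
    print(k, graph_power_edges(V, E, k))
```

Taking the clique complex of each power then gives the nested sequence of complexes; for the path graph, the clique complex of the third power is a full tetrahedron.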

Temporal filtration (TMP)
This filtration is defined in [28] for dynamic (time-varying) networks. If the network grows over the times $t_0 < \dots < t_n$, this yields a sequence of networks $\{G_t : t = t_0, \dots, t_n\}$, where $G_t$ represents the network as it has formed up to time $t$. This network sequence results in the temporal filtration given by the clique complex of each network,
$$Cl(G_{t_0}) \subseteq Cl(G_{t_1}) \subseteq \dots \subseteq Cl(G_{t_n}).$$

Zigzag simplicial filtration (ZSF)
This filtration is also defined for dynamic networks. While TMP only considers the ver- [...] (2) for any scale parameter value $\delta \in \mathbb{R}$, it holds that $G_{\delta - \epsilon} \subseteq G_\delta \supseteq G_{\delta + \epsilon}$ for all sufficiently small $\epsilon > 0$. Then we use zigzag persistent homology to obtain the persistence barcodes/diagrams [29]. The same basic idea applies in zigzag persistent homology. For example, for the 0-dimensional zigzag persistence barcodes, we just track the number of connected components in the filtration.

Digraph filtration using Persistent Path Homology (PPH)
In [30], the authors define a new way to construct homology on networks: persistent path homology (PPH) which is sensitive to the edge directions. We summarize the construction in 4 steps.
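Step 1: Let $G = (V, E)$ be a directed weighted graph and let $\mathbb{K}$ be a field. Given any integer $p \in \mathbb{Z}_+$, an elementary $p$-path over $V$ is a sequence $[v_0, \dots, v_p]$ of $p + 1$ vertices of $V$. For each $p \in \mathbb{Z}_+$, the free vector space consisting of all formal linear combinations of elementary $p$-paths over $V$ with coefficients in $\mathbb{K}$ is denoted $\Lambda_p = \Lambda_p(V) = \Lambda_p(V, \mathbb{K})$. One also defines $\Lambda_{-1} := \mathbb{K}$ and $\Lambda_{-2} := \{0\}$. Next, for any $p \in \mathbb{Z}_+$, one defines the non-regular boundary map $\partial^{nr}_p : \Lambda_p \to \Lambda_{p-1}$ as the linear map given on elementary paths by
$$\partial^{nr}_p([v_0, \dots, v_p]) = \sum_{i=0}^{p} (-1)^i [v_0, \dots, \hat{v}_i, \dots, v_p].$$
Step 2: An elementary $p$-path $[v_0, \dots, v_p]$ is called regular if $v_i \neq v_{i+1}$ for each $0 \le i \le p - 1$, and irregular otherwise. One defines $R_p := \mathbb{K}[\{[v_0, \dots, v_p] : [v_0, \dots, v_p] \text{ is regular}\}]$, and $I_p$ is the corresponding space spanned by the irregular paths. We have $\partial^{nr}_p(I_p) \subseteq I_{p-1}$, so $\partial^{nr}_p$ is well defined on $\Lambda_p / I_p$. Since $R_p \cong \Lambda_p / I_p$ via a natural isomorphism, one can define $\partial_p : R_p \to R_{p-1}$ as the pullback of $\partial^{nr}_p$ via this isomorphism. $\partial_p$ is called the regular boundary map, and we now have a chain complex $(R_p, \partial_p)_{p \in \mathbb{Z}_+}$.
Step 3: For each $p \in \mathbb{Z}_+$, one defines an elementary $p$-path $[v_0, \dots, v_p]$ on $V$ to be allowed if $(v_i, v_{i+1}) \in E$ for each $0 \le i \le p - 1$. For each $p \in \mathbb{Z}_+$, the free vector space on the allowed $p$-paths on $(V, E)$ is denoted $A_p = A_p(G) = A_p(V, E, \mathbb{K})$ and is called the space of allowed $p$-paths. Furthermore, $A_{-1} := \mathbb{K}$ and $A_{-2} := \{0\}$.
Step 4: The allowed paths do not form a chain complex, since the image of an allowed path under $\partial$ need not be allowed. To handle this problem, they define the space of $\partial$-invariant $p$-paths on $G$ as the following subspace of $A_p(G)$:
$$\Omega_p = \Omega_p(G) = \{c \in A_p : \partial_p(c) \in A_{p-1}\}.$$
The boundary maps restrict to these subspaces, giving a chain complex $(\Omega_p, \partial_p)$ whose homology is the path homology of $G$. After getting the filtration, instead of persistent homology, they apply persistent path homology (PPH). They show that in an undirected graph, PPH and Dowker filtrations agree in dimension 1 if a certain local condition is satisfied (the network needs to be square-free). The authors also prove a stability result for PPH.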

Generalizations of Vietoris-Rips filtration (GVR)
The author in [31] uses ordered-tuple complexes instead of simplicial complexes to gain flexibility regarding order. An ordered-tuple complex (OT-complex) is a collection $K$ of ordered tuples such that if $(v_0, v_1, v_2, \dots, v_n) \in K$ then $(v_0, \dots, \hat{v}_i, \dots, v_n) \in K$ for all $i$ (where $(v_0, \dots, \hat{v}_i, \dots, v_n)$ is the ordered tuple with $v_i$ removed). Unlike simplices, ordered tuples depend on the order of their entries; for example, the tuples $(v_1, v_2, v_3)$ and $(v_3, v_1, v_2)$ are distinct. One can define a chain complex, homology and $k$-th dimensional ordered-tuple persistent homology of OT-complexes in the same way as for simplicial complexes. Then, the author defines four generalizations of the Vietoris-Rips filtration and proves a stability theorem for each of them, so we do not restate the theorem for each case separately. For the following filtrations, let $(V, f)$ be a vertex set together with a function $f : V \times V \to \mathbb{R}$ ($f$ can be considered as an edge weight function).

Vietoris-Rips filtration under sym a
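For any $a \in [0, 1]$ we can define a symmetric function $\mathrm{sym}_a(f) : V \times V \to \mathbb{R}$ by
$$\mathrm{sym}_a(f)(u, v) = a \min\{f(u, v), f(v, u)\} + (1 - a) \max\{f(u, v), f(v, u)\}.$$
For each $t$, one takes the ordered tuples $(v_0, \dots, v_p)$ such that $\mathrm{sym}_a(f)(v_i, v_j) \le t$ for all $i, j$. This filtration is called the Vietoris-Rips filtration under $\mathrm{sym}_a$ of $(V, f)$.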

Directed Vietoris-Rips filtration
This filtration, $\{R^{dir}(X)_t\}$, is called the directed Vietoris-Rips filtration of $(V, f)$.

Associated filtration of directed graphs
There is a natural filtration of directed graphs {D(V ) t : t ∈ [0, ∞)} associated to V by set-

Algorithms and Applications
In recent years, persistent homology has found applications in data analysis, including neuroscience [32], time series data [33], text mining [34] and shape analysis [35]. In the complex network setting, some studies analyze the evolution of a single graph, while others analyze multiple graphs for graph matching and classification by characterizing the temporal changes in the topological features of a network. In addition, some studies use Betti numbers, while others use persistence diagrams to extract statistical features of the network. In this section, we categorize the applications of persistent homology into single graph analysis and multiple graph analysis. We explain the algorithms and applications of each study in the corresponding sections. We also provide comparison tables, Table 2 and Table 3, for the algorithms and datasets after each section.

Analysis on Single Graph
In some applications, persistent homology is used to detect global structural features of a single network, such as its complexity and the distribution of strongly connected regions. While some applications analyze the evolution of a single graph according to edge weights, others analyze the evolution of the graph over time.
In many studies, Betti numbers are used as the complexity measure for different networks. Benzekry et al. [36] propose that cancer therapy can be guided by changes in the complexity of protein-protein interaction (PPI) networks. They analyze 11 cancer interaction networks and find that there is a correlation between 1-dimensional Betti [...]
Similarly, Rucco et al. [37] use Betti numbers and persistent entropy to measure graph complexity. They study the behavior of the idiotypic network of the mammalian immune system. Their main goal is to detect the reaction of the immune system to an external stimulus in terms of phase transitions. In addition to persistent entropy, they use two other graph complexity measures, the connectivity entropy and the approximate von Neumann entropy [38]. While connectivity entropy is used to analyze the structural properties and to identify the set of key players of the idiotypic network, approximate von Neumann entropy is used to distinguish graphs corresponding to the same system under different conditions. For persistent entropy, they use persistence barcodes constructed with the Vietoris-Rips filtration (VR). In their experiment, they simulate the idiotypic network and obtain a weighted idiotypic network using the coexistence function as a weight function between antibodies. After computing the three entropy measures on this network, they find that the peak in entropy corresponds to the activation of the immune response. While the connectivity entropy does not distinguish between the activation and the immune memory states, both the approximate von Neumann entropy and the persistent entropy are able to recognize the activation of the immune system. The analysis of the Betti numbers reveals that there is a subset of antibodies arranged in a 1-dimensional hole that is present both in the activation state and in the memory state.
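For reference, persistent entropy is the Shannon entropy of the normalized bar lengths of a persistence barcode: if the bars have lengths $l_1, \dots, l_n$ and $L = l_1 + \dots + l_n$, then $E = -\sum_i (l_i / L) \log(l_i / L)$. The sketch below is a minimal illustration with our own function name. It drops infinite and zero-length bars, which is an assumption of this sketch; other treatments of infinite bars, such as truncating them at the largest filtration value, are possible.

```python
import math

def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `barcode` is a list of (birth, death) pairs. Infinite and zero-length
    bars are ignored here, which is a simplifying assumption.
    """
    lengths = [
        death - birth
        for birth, death in barcode
        if math.isfinite(death) and death > birth
    ]
    total = sum(lengths)
    return -sum((l / total) * math.log(l / total) for l in lengths)

# Bars of equal length give the maximal entropy log(n) ...
print(persistent_entropy([(0, 1), (0, 1)]))
# ... while bars of very different lengths give a lower value.
print(persistent_entropy([(2, 7), (3, 6), (4, 5)]))
```

A barcode dominated by one long bar therefore has low entropy, while a barcode with many bars of similar length has high entropy.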
Cliques and cycles are important structural features of complex networks that describe their cohesive structure. Rieck et al. [24] use persistent homology to detect clique communities and their evolution in weighted networks. Persistence diagrams are created using the vertex-based clique filtration (VBCL) and the k-clique filtration (kCL). They analyze the connectivity relations for all clique degrees and all weight thresholds. Various networks are studied, including co-occurrence, brain, and collaboration networks. They also create an interactive visualization tool that is capable of detecting and tracking the evolution of the networks' clique communities across thresholds and clique degrees.
Persistent homology is also used to analyze brain networks by computing the distributions of cliques (brain regions) and cycles (strongly connected regions) in them. In [39], the authors review the mathematical background of using simplicial complexes for neural data, specifically brain networks. They list different types of simplicial complexes for encoding neural data, such as networks, the clique complex, the independence complex, and the concurrence complex. They also elaborate on using persistent homology to measure the global structure of simplicial complexes and the strength of neural connections, using the weighted simplex filtration (WS) to generate persistence diagrams.
In [40], the authors test the hypothesis that the spatial distributions of cliques and cycles differ in their anatomical locations. They construct 1- and 2-dimensional persistence diagrams of brain networks using the Vietoris-Rips (VR) filtration. The structural brain networks of eight volunteers are extracted using diffusion spectrum imaging. Each undirected, weighted network consists of 83 nodes representing different brain regions and edges that refer to the density of white matter between the nodes. Weak and strong connections between cliques are assessed by observing the difference between the birth and death times of k-cliques in the persistence diagrams.
Additionally, in [41], persistent homology is also used to analyze the brain networks ...
Moreover, in [42], the authors demonstrate that persistent homology is useful in analyzing functional brain connectivity. The application involves electroencephalography (EEG) data from eight cortical regions of corticosterone (CORT)-induced depression mouse models and control models. After the EEG measurements are obtained, the distance metric given by the square root of (1 − correlation) is used to create a binary network. Next, the Vietoris-Rips (VR) filtration is applied, and the topological changes are visualized by 0-dimensional barcodes, which are then used to construct single-linkage dendrograms (SLDs). Finally, the single-linkage distance is computed from the generated SLDs. The results show that the CORT model is characterized by increased local connectivity and decreased global connectivity.
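The pipeline described for [42] can be illustrated with a short sketch. It assumes a small symmetric correlation matrix is given, and it uses the standard fact that the finite 0-dimensional bars of a Vietoris-Rips filtration die exactly at the merge heights of single-linkage clustering. The function names and the 3 × 3 toy matrix are hypothetical and are not the authors' code or data.

```python
import math

def correlation_distance(corr):
    """Turn a correlation matrix into the distance sqrt(1 - correlation)."""
    n = len(corr)
    return [[math.sqrt(max(0.0, 1.0 - corr[i][j])) for j in range(n)]
            for i in range(n)]

def single_linkage(dist):
    """Return (deaths, sld) for a symmetric distance matrix.

    deaths: scales at which two components merge, i.e. the death times of
            the finite 0-dimensional bars (all bars are born at scale 0).
    sld:    single-linkage distance matrix, i.e. the scale at which nodes
            i and j first fall into the same component.
    """
    n = len(dist)
    parent = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    sld = [[0.0] * n for _ in range(n)]
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # this edge closes a cycle; no component dies
        for a in members[ri]:
            for b in members[rj]:
                sld[a][b] = sld[b][a] = w
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
        deaths.append(w)
    return deaths, sld

# Hypothetical correlations between three channels.
corr = [[1.00, 0.84, 0.00],
        [0.84, 1.00, 0.19],
        [0.00, 0.19, 1.00]]
deaths, sld = single_linkage(correlation_distance(corr))
print(deaths, sld[0][2])
```

In this toy example, channels 0 and 2 are far apart directly, but their single-linkage distance is smaller because they connect through channel 1.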
Besides its utility on brain networks, persistent homology is also used to analyze word co-occurrence, remittance, and migration networks. In [43], the authors study word co-occurrence networks to explore the conceptual landscape of mathematical research. They first create the network using 54177 articles posted on arXiv from 01/1994 to 03/2007. Then they parse a concept list from Wikipedia that includes 1612 equations, theorems, and lemmas. Next, they combine these two datasets by checking the appearance of the 1612 concepts in the articles; 1067 of the concepts appear in at least one article, and 35018 articles contain at least one of the concepts. They take the 1067 concepts as nodes and include an (n − 1)-simplex for each article containing n concepts. Furthermore, whenever the concept sets of two articles intersect in n concepts, their corresponding simplices share a face of dimension (n − 1). In total, this construction results in 32707 unweighted edges. They apply the temporal filtration (TMP) based on article dates and create the 1- and 2-dimensional persistence diagrams, i.e. they look only at the 2-dimensional holes bounded by edges and the 3-dimensional holes bounded by triangles, respectively. They interpret these holes to explain the intrinsic characteristics of how research evolves in mathematics. They also explore the authors' conceptual profile using the holes and their attributes to the holes.
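The construction described for [43] can be sketched as follows, assuming each article is given as a (date, concepts) pair and that a simplex enters the filtration at the date of the earliest article containing all of its concepts. The function name, the `max_dim` cutoff, and the toy articles are illustrative assumptions and not the authors' implementation.

```python
from itertools import combinations

def temporal_filtration(articles, max_dim=2):
    """Map each simplex (frozenset of concepts) to the earliest date it appears.

    An article with n concepts contributes an (n - 1)-simplex together with
    all of its faces; only simplices up to dimension `max_dim` are stored.
    Dates must be comparable (e.g. ISO-formatted strings).
    """
    birth = {}
    for date, concepts in articles:
        concepts = sorted(set(concepts))
        largest = min(len(concepts), max_dim + 1)  # number of vertices
        for k in range(1, largest + 1):
            for face in combinations(concepts, k):
                key = frozenset(face)
                if key not in birth or date < birth[key]:
                    birth[key] = date
    return birth

# Hypothetical articles with their concept sets.
articles = [
    ("1995-03", ["manifold", "homology", "sheaf"]),
    ("1994-07", ["manifold", "homology"]),
    ("1996-01", ["sheaf", "scheme"]),
]
birth = temporal_filtration(articles)
for simplex, date in sorted(birth.items(), key=lambda kv: (kv[1], len(kv[0]))):
    print(date, sorted(simplex))
```

Because every article that contains a simplex also contains all of its faces, no face is born later than a simplex it belongs to, so the result is a valid filtration.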

Multiple Graphs Analysis
Graph comparison is an important task for many graph applications such as classification and matching. However, it is a computationally complex problem, since it requires computing the similarity between two networks [46]. It has been studied for many years and is defined either through exact matches (e.g. graph isomorphism [47]) or through measures of structural similarity (e.g. graph edit distance [48]). Graph kernels are also used to capture graph similarity [49]. In recent years, persistent homology has been used to extract topological features of networks in order to compare them.
Most existing metrics for network structure rely on local features of vertices, such as node degrees and correlations between neighboring nodes, and do not capture the precise mesoscopic structure of complex networks. Sizemore et al. [50] extract mesoscale homological features as 0- to 3-dimensional Betti numbers. They use the Vietoris-Rips (VR) filtration to compute the homology and record the maximal clique distribution and the Betti sequence. The extracted features are used to group 14 commonly studied weighted network models into four classes with agglomerative hierarchical clustering. Betti values and parameters from the maximal clique distribution are used to determine the structural similarities between networks. After grouping the networks, they analyze the structural patterns in each group.
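To make the notion of Betti numbers of a network concrete, the following sketch computes them for the clique complex of a small unweighted graph, which corresponds to a single threshold of a filtration such as the one above. It assumes coefficients in the field with two elements and enumerates cliques by brute force, so it is only suitable for very small graphs. All names are illustrative and are not taken from [50].

```python
from itertools import combinations

def clique_complex(n_nodes, edges, max_dim):
    """Simplices of the clique complex, grouped by dimension (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    simplices = {0: [(v,) for v in range(n_nodes)]}
    for k in range(1, max_dim + 1):
        simplices[k] = [c for c in combinations(range(n_nodes), k + 1)
                        if all(frozenset(p) in edge_set for p in combinations(c, 2))]
    return simplices

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are ints used as bit vectors."""
    basis = {}  # highest set bit -> reduced row
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]
    return len(basis)

def betti_numbers(n_nodes, edges, max_dim=1):
    """Betti numbers 0..max_dim of the clique complex, with GF(2) coefficients."""
    simplices = clique_complex(n_nodes, edges, max_dim + 1)
    index = {k: {s: i for i, s in enumerate(simplices[k])} for k in simplices}

    def boundary_rank(k):
        if k == 0 or not simplices[k]:
            return 0
        rows = []
        for s in simplices[k]:
            row = 0
            for face in combinations(s, k):  # the codimension-1 faces of s
                row |= 1 << index[k - 1][face]
            rows.append(row)
        return rank_gf2(rows)

    ranks = {k: boundary_rank(k) for k in range(max_dim + 2)}
    return [len(simplices[k]) - ranks[k] - ranks[k + 1] for k in range(max_dim + 1)]

# A 4-cycle has one component and one loop; a triangle is filled in as a clique.
print(betti_numbers(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
print(betti_numbers(3, [(0, 1), (1, 2), (0, 2)]))
```

Repeating this computation over a sequence of edge-weight thresholds would give a Betti sequence; dedicated persistent homology software does this far more efficiently than the brute-force enumeration used here.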
In [20,51,52], persistent homology is used to detect particular non-local structural ...
In [55], the authors propose to use persistence diagrams (PDs) for the graph classification problem on undirected weighted graphs. They first define a graph kernel function, namely heat kernel signatures [56], on networks and use the sublevel and superlevel VFB filtration on each network to generate PDs. Then they employ a two-layer neural network architecture to process the PDs and classify the graphs. They evaluate their classification model on social, medical, and biological networks. They also compare their results with four state-of-the-art graph classification methods and show that their method achieves comparable results despite being much simpler than the other methods.
Moreover, persistent homology is used to analyze the structure of weighted networks. In [57], the authors treat collaboration networks as weighted networks. They use the Vietoris-Rips (VR) filtration to generate the persistence barcodes of the networks. They employ the Betti numbers in dimensions 0, 1, and 2 to distinguish collaboration networks from random networks. They conclude that the first and second Betti numbers give richer information about weighted networks.
Siddharth et al. [28] study a growing collaboration network with a temporal parametrization and characterize the temporal changes in its topological features. In a collaboration network, each person in a paper or a movie is represented as a vertex, and each collab- ...
In [58], the authors treat the national input-output networks of domestic products as weighted networks and use persistent homology to identify dissimilarities between them. The nodes are the available sectors in an economy, and the edge weights are the monetary flows measuring the magnitude of the economic relationship between two sectors. They generate persistence diagrams for dimensions 0, 1, and 2 with the Vietoris-Rips (VR) filtration. Using the 0-dimensional diagrams, they distinguish economies with high GDP, large population, and small import/export percentages of GDP from those with lower GDP, small population, and larger import/export percentages. They also discuss the potential for applying higher-dimensional persistent homology to study these networks.
Similarly, financial networks are treated as weighted networks, and persistent homology is used to detect early signs of critical transitions in financial crises in [59]. The ...
Furthermore, in [60], undirected attributed networks are treated as weighted networks. The authors first assign weights to the edges using the vertex attributes. Then, they extract the ego network of each vertex and define a graph kernel function, namely the diffusion Fréchet function [61], on each ego network; this function takes both the network topology and the edge weights into consideration. Next, they generate the persistence diagrams of each ego network using the sublevel and superlevel VFB filtration and obtain a distance matrix between the vertices by computing the Wasserstein distance between their persistence diagrams. Finally, they cluster the network using the k-means clustering algorithm.
Besides the weighted networks above, brain networks are treated as sparse weighted networks and analyzed with persistent homology in [62]. The authors obtain the topological structure of a graph induced by sparse correlation. They first transform MRI and DTI data into weighted networks, employing the sparse Pearson correlation to obtain the edge weights. They generate the 0-dimensional Betti plots for the brain networks using the Vietoris-Rips (VR) filtration. They also generate Betti plots using sparse covariance. They show that the sparse correlation method yields a large, visually apparent group separation between normal and stress-exposed children. This method is also less computationally expensive than the sparse covariance method.
In [63], the authors study dynamical connectome state analysis on brain networks using three different methods: k-means clustering, modularity-based clustering, and topological-feature-based clustering. They treat brain networks as weighted networks. In topological-feature-based clustering, they use the Vietoris-Rips (VR) filtration. They first split the correlation matrix into two matrices, one with the positive and one with the negative correlations. Then, they create VR filtrations for both matrices. In their clustering, different types of connections describe different processes in the brain, so they compute persistent homology with an annotated collection of intervals. After obtaining the intervals, they compute different statistics for each homology group and for each type of interaction. Finally, they perform hierarchical clustering based on these topological features. They show that topological-feature-based clustering is more informative than the other two clustering methods.
In addition, in [21,22], the authors classify brain (hippocampal) networks using persistence diagrams. They consider five different environments with 4 holes, 3 holes, 2 holes, 1 hole, and no hole, and for each environment 20 simulated brain networks are created. Persistence diagrams of these 100 networks are computed with the Dowker filtration (DSS). They use the bottleneck distance between the 1-dimensional diagrams of the networks to compare them. Finally, they classify the networks using the single-linkage dendrogram algorithm and show that the Dowker filtration successfully captures the differences between the five classes of networks. The authors also work on the same problem and dataset using the zigzag simplicial filtration (ZSF) in [64]. They create 1-dimensional zigzag persistence diagrams to perform persistent homology computations on the dynamic simplicial complexes resulting from these brain networks.
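The bottleneck distance used above to compare diagrams can be illustrated with a brute-force sketch. It assumes finite (birth, death) points, the L-infinity cost between matched points, and a cost of half the persistence for matching a point to the diagonal. It tries every matching and is therefore only usable for a handful of points; efficient implementations exist in dedicated libraries. The function name and the toy diagrams are hypothetical.

```python
from itertools import permutations

def bottleneck_distance(diagram_a, diagram_b):
    """Brute-force bottleneck distance between two small persistence diagrams.

    Each diagram is a list of finite (birth, death) points. Every point is
    matched either to a point of the other diagram or to the diagonal, and
    the distance is the smallest achievable value of the largest matching cost.
    """
    n, m = len(diagram_a), len(diagram_b)
    size = n + m  # each side is padded with diagonal slots for the other side
    if size == 0:
        return 0.0

    def cost(i, j):
        a_real, b_real = i < n, j < m
        if a_real and b_real:
            (b1, d1), (b2, d2) = diagram_a[i], diagram_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if a_real:  # point of A sent to the diagonal
            b1, d1 = diagram_a[i]
            return (d1 - b1) / 2.0
        if b_real:  # point of B sent to the diagonal
            b2, d2 = diagram_b[j]
            return (d2 - b2) / 2.0
        return 0.0  # diagonal matched to diagonal

    return min(max(cost(i, p[i]) for i in range(size))
               for p in permutations(range(size)))

# Hypothetical 1-dimensional diagrams of two networks.
diagram_a = [(0.0, 4.0), (1.0, 2.0)]
diagram_b = [(0.0, 5.0)]
print(bottleneck_distance(diagram_a, diagram_b))
```

In this example the best matching pairs the two long-lived points and sends the short-lived point of the first diagram to the diagonal, which is why low-persistence features have little influence on the distance.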
In [65], the authors show that persistent homology, or more precisely the persistence vineyard, is a robust approach to estimating functional connectivity in the resting and gaming stages of brain networks. They conduct an experiment with 26 male college students aged 19-29 from two universities located in Seoul, Republic of Korea. All 26 healthy subjects undergo resting and gaming experiments, and each stage is recorded separately for five minutes. They segment the data using a 30 s window length and a 2 s step size. For each window, they compute the persistence diagram from the Pearson correlation between brain channels, employing the weighted simplex filtration (WS). Then, they compute the 0-dimensional persistence vineyard to analyze the dynamic brain connectivity. In brief, a persistence vineyard is a p-dimensional persistence diagram with a time dimension added, tracking the births and deaths in the p-dimensional diagram over a time-varying topological space [66]. Their results show that the persistence vineyard captures the temporally dynamic properties of the brain in a robust and threshold-free way. They also show that the persistence vineyard is more effective than principal component analysis (PCA) and standard graph-theoretical methods.
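The windowing step described for [65] can be sketched as follows, assuming the recording is stored as one list of samples per channel. The 30 s window and 2 s step come from the text, while the sampling rate, the function names, and the synthetic signals are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def sliding_windows(n_samples, rate_hz, window_s=30, step_s=2):
    """Return the (start, stop) sample indices of every full window."""
    win, step = int(window_s * rate_hz), int(step_s * rate_hz)
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]

def windowed_correlations(channels, rate_hz, window_s=30, step_s=2):
    """One channel-by-channel Pearson correlation matrix per window."""
    n_samples = len(channels[0])
    matrices = []
    for start, stop in sliding_windows(n_samples, rate_hz, window_s, step_s):
        segments = [ch[start:stop] for ch in channels]
        matrices.append([[pearson(a, b) for b in segments] for a in segments])
    return matrices

# A five-minute synthetic recording at a hypothetical rate of 1 Hz, two channels.
samples = range(300)
channels = [[math.sin(0.3 * k) for k in samples],
            [2 * math.sin(0.3 * k) + 1 for k in samples]]
matrices = windowed_correlations(channels, rate_hz=1)
print(len(matrices))
```

Each matrix in the resulting sequence would then be turned into a weighted network and a persistence diagram, and following the diagram points from window to window gives the vineyard.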
In [67], the authors compare resting-state functional brain activity in 15 healthy volunteers after intravenous infusion of placebo and psilocybin, using persistent homology and other statistical methods (density function). First, the raw fMRI (functional magnetic resonance imaging) data are transformed into a functional network. They create the 1-dimensional persistence diagram using the Vietoris-Rips (VR) filtration. Later, they define two different homological scaffolds, depending on how frequently edges are part of the generators of the persistent homology groups and on how persistent the generators to which they belong are. The results show that the homological structure of the brain's functional patterns undergoes a dramatic change post-psilocybin, characterized by the appearance of many transient structures of low stability and of a small number of persistent ones that are not observed in the case of placebo.
In [23], the authors first study random graphs using the clique (CCL) filtration. Using different probabilities, they generate random networks and compute their barcodes. They show that the results on these barcodes are in agreement with the theoretical studies on these complexes. As another application, they study an email network. They create barcodes and show that higher-dimensional barcodes, which do not exist for random networks, correspond to denser communication among certain groups. They also apply their methods to scale-free networks with a modular structure. They use three different parameters to generate three types of scale-free networks: clustered modular networks, clustered non-modular networks, and non-clustered modular networks. They show that both clustered modular and clustered non-modular networks have more bars in their 3-dimensional and 4-dimensional barcodes than the non-clustered modular network.
Moreover, persistent homology is used for metric graph comparison. In [27], the authors first introduce the functional metric graph filtration (FMG) on metric graphs. Then, they define the persistence distortion distance between two finite metric graphs using the persistence diagrams from the FMG filtration. In their experiment, they demonstrate the stability of the proposed distance on the Athens road network, treated as a metric graph, by generating a noisy sample of it at a given noise level. The results show that the persistence distortion distance between the original graph and its noisy sample grows roughly proportionally to the noise level. They also use the proposed persistence distortion distance to compare surface meshes of different geometric models. Models from the same group have much smaller persistence distortion distances among them than models from dissimilar groups, which shows that the proposed distance is able to differentiate surface models.
In [68], persistent homology is employed to quantify structural changes in time- ...
High order networks are weighted complete hypergraphs that collect relationships between elements of tuples. Computing the distance between high order networks is difficult when the number of nodes is large. In [25], the authors use persistent homology to derive distance approximations of networks. They compute the bottleneck distance between the persistence diagrams of networks to evaluate the differences between the networks. They first define a relationship function on a set of nodes to represent a measure of similarity or dissimilarity for members of the group. They use this function to assign a weight to each simplex. Using these weights and the weighted simplex filtration (WS), they generate 0-, 1- and 2-dimensional persistence diagrams. They show that the distance between two high order networks, which is in general computationally expensive, can be lower bounded by a computationally less expensive distance between their persistence diagrams. They apply their method to coauthorship networks, which they create using 5 journals from the mathematics community and 6 journals from the engineering community. They use the lower bounds to classify the networks, to distinguish the collaboration patterns of the engineering and mathematics communities, and to discriminate between engineering communities with different research interests.
To answer the question of whether existing anonymization mechanisms for preserving privacy truly retain graph utility, Gao et al. [69] employ persistent homology to analyze and evaluate four anonymization mechanisms. They study online social networks (OSNs). They define the distance between two nodes as the number of hops on the shortest path between these nodes and create 0-, 1- and 2-dimensional persistence barcodes using this distance in the power filtration (POW). They analyze the original and anonymized OSNs using the barcodes. The results show that the original OSN graphs have stable structures. Furthermore, the 0-dimensional barcodes they obtain show that most anonymized OSNs are more closely connected than the original graph. The anonymized graphs are all less stable than the original graph, because they have more 2-dimensional holes or larger holes. They also compare their results with traditional graph metrics.

Ref.        Filtration  Dimension  Summary                Networks
            VR          0          Betti plots            Brain networks
[63]        VR          0-1        PD                     Brain networks
[67]        VR          1          PD                     Brain networks
[65]        WS          0          persistence vineyards  Brain networks
[25]        WS          0-2        PD                     Collaboration networks
[57]        VR          0-2        Betti numbers          Collaboration networks
[28]        TMP         1          Betti numbers          Collaboration networks
[50]        VR          0-3        Betti numbers          PPI, brain and simulated weighted networks
[58]        VR          0-2        PD                     Economy networks
[59]        VR          0-1        PD                     Finance networks
[23]        CL          0-11       PD                     Random, email and scale-free networks
[27]        FMG         0          PD                     Road networks
[70]        VSF         0          zigzag PD              Dynamic biological networks
[30]        PPH         1          PD                     Cycle networks
[68]        POW         0-1        PD                     Dynamic communication networks
[60]        VFB         0-1        PD                     Attributed social networks
[69]        POW         0-1        PD                     Online social networks
[55]        VFB         0-1        PD                     Social, medical and biological networks
[20,51,52]  VR          1          PB                     Social, infrastructural and biological networks
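The hop-count distance that Gao et al. [69] use as input to the power filtration can be sketched with a breadth-first search, assuming an unweighted, undirected graph stored as an adjacency dictionary. The sketch further assumes that a pair of nodes enters the filtration at the scale equal to its hop distance. The function names and the toy graph are hypothetical.

```python
from collections import deque

def hop_distances(adjacency):
    """All-pairs hop counts for a graph given as {node: set_of_neighbors}.

    Returns {source: {target: hops}}; unreachable targets are omitted.
    """
    dist = {}
    for source in adjacency:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist

def pair_births(adjacency):
    """Scale at which each connected pair of nodes is joined by an edge."""
    dist = hop_distances(adjacency)
    return {frozenset((u, v)): d
            for u, row in dist.items() for v, d in row.items() if u != v}

# Hypothetical path graph a - b - c - d.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
dist = hop_distances(adjacency)
births = pair_births(adjacency)
print(dist["a"]["d"], births[frozenset(("a", "c"))])
```

The resulting pair births can be passed as a distance matrix to any Vietoris-Rips style construction, which then adds higher-dimensional simplices as cliques form.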
In [70], the authors study flocking/swarming behaviors in animals. They first create dynamic graphs and simplicial complexes using the Vietoris-Rips complex for a fixed scale parameter. Then, they construct the zigzag simplicial filtration (ZSF) and obtain 0-dimensional zigzag persistence diagrams to classify four different types of flocking behavior in animals. Finally, using the bottleneck distance, single-linkage hierarchical clustering, and multidimensional scaling (MDS), they clearly distinguish the four behaviors.
In [30], the authors characterize directed cycle networks by a digraph filtration using persistent path homology (PPH). They prove that the persistence diagram of a cycle network with n nodes, for n ≥ 3, depends solely on n.

Conclusion
In this paper, we provide a conceptual review of key advancements in applying PH to network science. We look into research studies that use PH on networks and highlight different algorithms that are used to extract topological features of networks. We review the applications where PH is used in solving network mining problems. We believe our summary of the analysis of PH on networks will provide important insights to researchers in applied network science.
At the moment, the implicit goal of most studies is to extract the topological features of networks that persist across multiple scales. However, these studies have some limitations. First, scalability may be a concern for future progress. The networks used in these studies are mostly small (fewer than 1000 vertices), and significant work is still needed to scale PH approaches to larger networks. Second, the stability of some filtrations has not yet been proven. Furthermore, although many filtration methods have been proposed, they are mainly designed for static networks, whereas many real-world networks evolve over time. For example, in the Facebook network, friendships between users change dynamically over time: new edges are continuously added to the social network while some edges may be deleted. Most existing methods cannot be directly applied to large-scale evolving networks. New filtration algorithms that can handle the dynamic nature of evolving networks are highly desirable in persistent homology.
As another future research direction, PH can also be used for network and subnetwork embedding problems.