Chained structure of directed graphs with applications to social and transportation networks
Applied Network Science volume 7, Article number: 64 (2022)
Abstract
The need to determine the structure of a graph arises in many applications. This paper studies directed graphs and defines the notions of \(\ell\)-chained and \(\{\ell ,k\}\)-chained directed graphs. These notions reveal structural properties of directed graphs that shed light on how the nodes of the graph are connected. Applications include city planning, information transmission, and disease propagation. We also discuss the notion of in-center and out-center vertices of a directed graph, which are vertices at the center of the graph. Computed examples provide illustrations, among which is the investigation of a bus network for a city.
Introduction
A complex system that is composed of separate items that are interconnected in some way can be modeled by a network. Networks are represented by graphs, which are made up of nodes and edges. The latter connect the nodes. Networks arise in many areas of science and engineering, such as biology, communication, transportation, and social media; see e.g., Estrada (2012), Newman (2010) for discussions of these and many other applications.
The edges in a network may have weights, which are real values and generally positive, and may measure the strength of the interaction between linked nodes. The connections may have a direction. A graph is referred to as undirected if all edges are undirected, i.e., they are “two-way streets”; a graph with at least one directed edge (which can be thought of as a “one-way street”) is said to be directed. We are concerned with directed unweighted graphs without self-loops. Thus, all edges have the same weight (which we set to one), and there are no edges from a node back to itself.
A considerable number of mathematical and computational methods for studying networks have been developed. Among the aims of network analysis is the identification of the most important nodes or edges of a graph by using the notion of centrality, which first arose in the context of social science, or to determine the structure of the underlying graph; see, e.g., De la Cruz Cabrera et al. (2020), De la Cruz Cabrera et al. (2021), Estrada (2012), Estrada and Higham (2010), Newman (2010) for many examples.
A fundamental topological property of a graph, which will be briefly recalled in the section “Notation and some properties of graphs and networks”, is multipartivity. The nodes in an \(\ell\)-partite graph can be split into \(\ell\) disjoint subsets \({{\mathcal {V}}}_i\), \(i=1,2,\ldots ,\ell\), called partite sets, with connections occurring only between the subsets, but not within the subsets. When \(\ell =2\), the graph is said to be bipartite. A refinement of bipartivity for undirected graphs, referred to as the chained structure of the graph, was introduced in Concas et al. (2021). The chained structure characterizes undirected multipartite graphs; an \(\ell\)-chained graph has only edges between nodes that belong to “subsequent” partite sets \({{\mathcal {V}}}_i\) and \({{\mathcal {V}}}_{i+1}\), \(i = 1, 2, \ldots , \ell -1\) (and vice versa). This paper extends the notion of chained graphs from undirected graphs to directed graphs. The chained structure reveals the “depth” of a graph, i.e., how many steps it may take to go from a specified node to any other node, by following edges along their direction.
In Concas et al. (2021), we used chained graphs to identify central nodes by introducing the position centrality measure for nodes of an undirected graph. This notion is a generalization of closeness centrality. Central nodes are identified by their location in the chained structure. For an overview of other centrality measures, see Borgatti (2005), Estrada and Higham (2010), Estrada and Rodriguez-Velazquez (2005). This paper generalizes position centrality to directed graphs. Specifically, for directed graphs that have directed spanning trees, we define in-position and out-position centralities of a node by examining two different types of directed spanning trees associated with the graph; see Gabow and Myers (1978) for a discussion on directed spanning trees. These centrality concepts shed light on the ease of communication within a network. In the two sections devoted to numerical examples, we compare them to other existing centrality measures. In general, it is impossible to state which centrality measure is the best, as the concept of centrality takes different meanings in different applications. What we show is that position centrality, by varying the value of the parameter on which it depends, is able to spot specific aspects of a network that are not detected by traditional measures, and that depend upon the underlying chained structure.
The identification of the chained structure of a directed graph also can be useful for detecting the presence of anti-communities, i.e., node subsets that are loosely connected internally, but have many external connections with the rest of the graph. Several methods have been developed to identify anti-communities in undirected graphs; see Concas et al. (2020), Estrada and Knight (2015), Fasino and Tudisco (2017). The relation between clustering and community detection in directed graphs has been discussed in Laenen and Sun (2020). In Concas et al. (2021), we illustrated how the chained structure may be used for introducing a density measure for computing an “anti-community score” for undirected graphs. We extend this measure to directed graphs in the present paper. To the best of our knowledge, while the identification of anti-communities has been studied in the literature (Estrada and Knight 2015) (see also Concas et al. 2020; Fasino and Tudisco 2017), the identification of near-anti-communities (which are associated with a small anti-community score) has not been discussed yet.
This paper is organized as follows. The section “Notation and some properties of graphs and networks” introduces notation and discusses general properties of graphs that will be used later. Directed chained graphs are defined in the section “Directed \(\ell\)-chained graphs and their adjacency matrices”. They can be studied with the aid of directed spanning trees. This is discussed in the section “Directed chained graphs and directed spanning trees”. The chained structure naturally leads to the concept of position centrality, defined in the section “Position centrality and some applications”. Nodes with the largest position centrality are referred to as central nodes. Some data sets deriving from real-world applications, including a social network, are analyzed in the section “Some examples”. The section “A case study about position centrality” sheds light on the properties of center nodes by considering a case study concerning a bus transportation network. Finally, the section “Conclusion” contains concluding remarks.
Notation and some properties of graphs and networks
A network can be represented by a graph \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\), where \({{\mathcal {V}}}=\lbrace v_i\rbrace _{i=1}^n\) is a set of nodes, or vertices, and \({{\mathcal {E}}}=\lbrace e_i\rbrace _{i=1}^m\) a set of edges, which connect the nodes. Two nodes \(v_i\) and \(v_j\), for \(i\ne j\), are said to be adjacent if there is an edge from node \(v_i\) to node \(v_j\). In this context, an undirected edge between the nodes \(v_i\) and \(v_j\) points both from \(v_i\) to \(v_j\) and from \(v_j\) to \(v_i\). The node \(v_i\) is said to be connected to the node \(v_j\) if there is a path from \(v_i\) to \(v_j\), that is, if there is a sequence of edges \(\{e_{r_s}\}_{s=1}^k\) such that \(e_{r_1}\) originates from \(v_i\), \(e_{r_k}\) points to \(v_j\), and if \(e_{r_s}\) points to \(v_\ell\), then \(e_{r_{s+1}}\) starts from the same node for \(s=1,2,\ldots ,k-1\). A cycle is a path that starts and ends at the same node \(v_i\).
An undirected graph is connected if each pair of distinct nodes is connected by a path. A directed graph is said to be strongly connected if for each vertex pair \((v_i,v_j)\) the node \(v_i\) is connected to the node \(v_j\), and the node \(v_j\) is connected to \(v_i\). A directed graph is said to be semiconnected if for each vertex pair \((v_i,v_j)\) either the vertex \(v_i\) is connected to the vertex \(v_j\), or \(v_j\) is connected to \(v_i\). A directed graph is weakly connected if there is a path between each vertex pair \((v_i,v_j)\) in the underlying undirected graph, that is, in the undirected graph obtained by replacing all directed edges by undirected ones. We refer to Estrada (2012) and Newman (2010) for discussions on graphs and their properties.
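The three notions of connectivity for directed graphs can be checked with plain breadth-first searches. The following is a minimal sketch, assuming nodes are labeled 0, ..., n-1 and edges are given as (source, target) pairs; the function names `reachable` and `connectivity` are illustrative and do not come from the paper.

```python
from collections import deque


def reachable(adj, s):
    """Return the set of nodes reachable from s by following edges in adj."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def connectivity(n, edges):
    """Classify a directed graph with n >= 1 nodes (hypothetical helper)."""
    adj = [[] for _ in range(n)]   # directed adjacency lists
    und = [[] for _ in range(n)]   # underlying undirected graph
    for i, j in edges:
        adj[i].append(j)
        und[i].append(j)
        und[j].append(i)
    reach = [reachable(adj, v) for v in range(n)]
    # Strongly connected: every node reaches every other node.
    if all(len(r) == n for r in reach):
        return "strongly connected"
    # Semiconnected: for each pair, a path exists in at least one direction.
    if all(j in reach[i] or i in reach[j] for i in range(n) for j in range(n)):
        return "semiconnected"
    # Weakly connected: the underlying undirected graph is connected.
    if len(reachable(und, 0)) == n:
        return "weakly connected"
    return "disconnected"
```

The sketch favors readability over efficiency; it runs one search per node, which is adequate for small graphs such as those in the examples.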
An unweighted graph \({{\mathcal {G}}}\) with n vertices can be represented by an adjacency matrix \(A=[a_{ij}]_{i,j=1}^n\) with \(a_{ij}=1\) if there is an edge from vertex \(v_i\) to vertex \(v_j\); otherwise \(a_{ij}=0\). Since an undirected edge can be thought of as being made up of two directed edges (in opposite directions), the adjacency matrix of an undirected graph is symmetric; the adjacency matrix of a directed graph is nonsymmetric.
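As a small illustration of this representation, the sketch below builds the adjacency matrix from an edge list and tests it for symmetry. It assumes 0-based node labels; the helper names and the two 3-node graphs are hypothetical and serve only as an example.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 adjacency matrix for directed edges (i, j)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("self-loops are not allowed")
        A[i][j] = 1  # a_ij = 1 when there is an edge from v_i to v_j
    return A


def is_symmetric(A):
    """True when a_ij == a_ji for all i, j, i.e. every edge is two-way."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


# Hypothetical examples: a directed path and the same path with two-way edges.
A_directed = adjacency_matrix(3, [(0, 1), (1, 2)])
A_undirected = adjacency_matrix(3, [(0, 1), (1, 0), (1, 2), (2, 1)])
```

An undirected edge is entered here as two directed edges in opposite directions, which is why the second matrix is symmetric.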
Multipartivity and, in particular, bipartivity are fundamental topological characteristics of graphs that model interactions between different types of objects. Bipartite graphs contain vertices that can be partitioned into two disjoint vertex subsets \({{\mathcal {V}}}_1\) and \({{\mathcal {V}}}_2\), such that there are no connections between vertices in the same subset. Assume that the n vertices of a bipartite graph \({{\mathcal {G}}}\) are separated so that the first \(n_1\) vertices make up the vertex set \({{\mathcal {V}}}_1\) and the remaining \(n_2=n-n_1\) vertices make up the vertex set \({{\mathcal {V}}}_2\). Then the adjacency matrix A of \({{\mathcal {G}}}\) is of the form
\[A=\begin{bmatrix} O & C_1\\ C_2 & O \end{bmatrix}, \tag{2.1}\]
where \(C_1\in {\mathbb {R}}^{n_1\times n_2}\), \(C_2\in {\mathbb {R}}^{n_2\times n_1}\), and O denotes a zero matrix of suitable order. If the graph \({{\mathcal {G}}}\) is undirected, then \(C_2=C_1^T\), where the superscript \(^T\) denotes transposition.
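In undirected \(\ell\)-chained graphs, the nodes are divided into \(\ell\) disjoint subsets
\[{{\mathcal {V}}}={{\mathcal {V}}}_1\cup {{\mathcal {V}}}_2\cup \cdots \cup {{\mathcal {V}}}_\ell , \qquad {{\mathcal {V}}}_i\cap {{\mathcal {V}}}_j=\emptyset \quad \text{for } i\ne j, \tag{2.2}\]
so that there are edges only between nodes belonging to “adjacent” node sets, that is, all edges from a node in \({{\mathcal {V}}}_i\) point to a node in \({{\mathcal {V}}}_{i+1}\) or in \({{\mathcal {V}}}_{i-1}\) for some i. This kind of partitioning is discussed in Concas et al. (2021).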
Directed \(\ell\)-chained graphs and their adjacency matrices
The directed chained graphs introduced in this section generalize the notion of undirected chained graphs defined in Concas et al. (2021).
Definition 1
A directed graph \({{\mathcal {G}}}= \{{{\mathcal {V}}},{{\mathcal {E}}}\}\) is said to be directed \(\ell\)-chained, with initial vertex \(v_i\), if the set of vertices can be subdivided into \(\ell\) disjoint nonempty subsets \({{\mathcal {V}}}_1,{{\mathcal {V}}}_2,\ldots ,{{\mathcal {V}}}_\ell\), see (2.2), such that \(v_i\in {{\mathcal {V}}}_1\) and all edges from vertices in the set \({{\mathcal {V}}}_j\) point to vertices in the set \({{\mathcal {V}}}_{j+1}\) for \(j=1,2,\ldots ,\ell -1\), where the chain length \(\ell\) is the largest number of vertex subsets \({{\mathcal {V}}}_j\) with this property. The vertex subset \({{\mathcal {V}}}_{j+1}\) is said to be adjacent to the vertex set \({{\mathcal {V}}}_j\).
The chain length \(\ell\) of a directed \(\ell\)-chained graph may depend on the choice of the initial vertex \(v_i\). After a suitable permutation of the nodes, the adjacency matrix A of a directed \(\ell\)-chained graph \({{\mathcal {G}}}= \{{{\mathcal {V}}},{{\mathcal {E}}}\}\) becomes upper block bidiagonal with zero diagonal blocks,
\[A=\begin{bmatrix} O & A_1 & & & \\ & O & A_2 & & \\ & & \ddots & \ddots & \\ & & & O & A_{\ell -1}\\ & & & & O \end{bmatrix}, \tag{3.1}\]
where the submatrix \(A_i\in {{\mathbb {R}}}^{n_i\times n_{i+1}}\) describes the connections from vertices in \({{\mathcal {V}}}_i\) to vertices in \({{\mathcal {V}}}_{i+1}\), for \(i=1,2,\ldots ,\ell -1\).
Example 3.1
Consider the graph of Fig. 1. This is a 3-chained graph with the chained node sets \({{\mathcal {V}}}_1=\{v_1,v_2\}\), \({{\mathcal {V}}}_2=\{v_3\}\), and \({{\mathcal {V}}}_3=\{v_4\}\). The initial node can be chosen to be either \(v_1\) or \(v_2\). The adjacency matrix is
where we can choose the submatrices
Assume that a graph is known to be directed \(\ell\)-chained for some \(\ell \ge 1\), but that the value of \(\ell\) is not known. Moreover, let a permuted version of the matrix (3.1) be known (for some unknown value of \(\ell\)). Thus, the available adjacency matrix is of the form
\[{\widetilde{A}}=PAP^T,\]
where P is a permutation matrix that modifies the vertex ordering. Given the adjacency matrix \({\widetilde{A}}\), we are interested in determining the vertex subsets \({{\mathcal {V}}}_1,{{\mathcal {V}}}_2,\ldots ,{{\mathcal {V}}}_\ell\) in Definition 1, as well as the number of sets \(\ell \ge 1\). A method for determining if a directed graph is \(\ell\)-chained and partitioning the nodes into subsets is described by Algorithm 1. Given an adjacency matrix A of a directed graph, the first node subset \({{\mathcal {V}}}_1\) is obtained by considering the column indices j such that \(A_{ij}=0\) for each row index i, i.e., the nodes without incoming edges; see line 1 of the algorithm. Then the other vertex subsets are determined by identifying the blocks in A that describe connections with nodes in the preceding node subset (line 6). If it is not possible to determine the first vertex set, or if during the process some node turns out to be connected to a vertex in a preceding subset, then the graph is not \(\ell\)-chained. This process gives a constructive proof of the following result.
Proposition 1
Let \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\) be a directed graph. Then it is possible to detect if it possesses an \(\ell\)-chained structure and determine the number of subsets, \(\ell\), as well as the vertex set partitioning \({{\mathcal {V}}}={{\mathcal {V}}}_1\cup {{\mathcal {V}}}_2\cup \cdots \cup {{\mathcal {V}}}_\ell\).
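The partitioning procedure described before Proposition 1 can be sketched in a few lines. This is not a transcription of Algorithm 1, only a sketch of the steps stated in the text, under the assumption that \({{\mathcal {V}}}_1\) consists of all nodes with a zero column and that each further subset collects the targets of the edges leaving the previous one. The function name `chained_partition`, the 0-based node labels, and the test matrix are illustrative.

```python
def chained_partition(A):
    """Return [V_1, ..., V_l] as lists of 0-based node indices, or None
    when this sketch finds no directed l-chained structure."""
    n = len(A)
    # V_1: nodes whose column is zero, i.e. nodes without incoming edges.
    current = [j for j in range(n) if all(A[i][j] == 0 for i in range(n))]
    if not current:
        return None  # the first vertex set cannot be determined
    level = {}   # node -> index of the subset it was placed in
    parts = []
    while current:
        for v in current:
            level[v] = len(parts)
        parts.append(current)
        # Candidates for the next subset: targets of edges leaving this one.
        targets = sorted({j for i in current for j in range(n) if A[i][j]})
        # An edge into an already placed node (same or preceding subset)
        # is not allowed in a directed l-chained graph.
        if any(j in level for j in targets):
            return None
        current = targets
    # Every node must have been placed in some subset.
    return parts if len(level) == n else None


# Hypothetical 4-node graph with edges 0->2, 1->2, 2->3.
A = [[0, 0, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
parts = chained_partition(A)
```

Because the sketch fixes \({{\mathcal {V}}}_1\) as the set of all nodes without incoming edges, a `None` result only means that this particular choice of first subset does not lead to a chained partition.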
The definition of directed \(\ell\)-chained graphs is quite restrictive. To be able to discuss properties of a larger set of directed graphs, we relax the requirements of Definition 1 to allow edges from vertices in the vertex subset \({{\mathcal {V}}}_i\) to vertices in the vertex subset \({{\mathcal {V}}}_j\) for some \(j\le i\) with j not much smaller than i.
Definition 2
The directed graph \({{\mathcal {G}}}= \{{{\mathcal {V}}},{{\mathcal {E}}}\}\) is said to be directed \(\{\ell ,k_i\}\)-chained with initial vertex \(v_i\) if it has the chained structure described in Definition 1 with the extension that edges from vertices in the set \({{\mathcal {V}}}_j\) are allowed to point to vertices in the sets \({{\mathcal {V}}}_{\max \{j-k_i,1\}},\ldots ,{{\mathcal {V}}}_j,{{\mathcal {V}}}_{j+1}\) for \(j=1,2,\ldots ,\ell -1\) and some \(k_i\ge 0\). The integer \(k_i\), which we refer to as the lower bandwidth, is the largest integer with this property.
We note that Definition 1 corresponds to the situation when \(k_i=-1\) for all i in Definition 2, since then edges from \({{\mathcal {V}}}_j\) may only point to \({{\mathcal {V}}}_{j+1}\).
Definition 3
The minimal lower bandwidth, k, of a directed chained graph is defined as
\[k=\min _{v_i\in {\overline{{{\mathcal {V}}}}}} k_i, \tag{3.2}\]
where the minimum is over all initial vertices \(v_i\) in the vertex set \({\overline{{{\mathcal {V}}}}}\subset {{\mathcal {V}}}\) that gives maximal chain length \(\ell\). When k is the minimal lower bandwidth, the graph is said to be directed \(\{\ell ,k\}\)-chained.
The \(\{\ell ,k\}\)-chained structure is quite general. We conjecture that any weakly connected graph with n nodes is \(\{\ell ,k\}\)-chained for some \(n\ge \ell >k\ge 1\). A small value of k indicates that information in the graph flows in a preferred direction, with small back propagation. This structure can be investigated by means of spanning trees as described in the section “Directed chained graphs and directed spanning trees”.
Example 3.2
Consider the directed graph \({{\mathcal {G}}}\) shown in Fig. 2. It is a directed \(\{5,2\}\)-chained graph with initial vertex \(v_1\). If one removes the edge from vertex \(v_4\) to \(v_2\), the graph becomes a directed \(\{5,1\}\)-chained graph with initial vertex \(v_1\). If one continues by removing the edge from \(v_3\) to \(v_2\), then a directed 5-chained graph with the same initial vertex is obtained.
The adjacency matrix analogous to Eq. (3.1) for a directed \(\{\ell ,k\}\)-chained graph \({{\mathcal {G}}}= \{{{\mathcal {V}}},{{\mathcal {E}}}\}\) can be represented by a lower block Hessenberg matrix
\[A=\begin{bmatrix} A_{1,1} & A_{1,2} & & & \\ A_{2,1} & A_{2,2} & A_{2,3} & & \\ \vdots & \vdots & \ddots & \ddots & \\ A_{\ell -1,1} & A_{\ell -1,2} & \cdots & A_{\ell -1,\ell -1} & A_{\ell -1,\ell }\\ A_{\ell ,1} & A_{\ell ,2} & \cdots & A_{\ell ,\ell -1} & A_{\ell ,\ell } \end{bmatrix}\]
when the nodes are suitably ordered. Here the block \(A_{i,j}\) represents edges that point from the vertex subset \({{\mathcal {V}}}_i\) to the vertex subset \({{\mathcal {V}}}_j\); by Definition 2, the blocks \(A_{i,j}\) with \(i-j>k\) vanish. All superdiagonal blocks \(A_{i,i+1}\) are nonvanishing, because if all entries of the block \(A_{i,i+1}\) were zero, then there would be no edges from the vertex subset \({{\mathcal {V}}}_i\) to vertices in the subset \({{\mathcal {V}}}_{i+1}\). But this would contradict the fact that the graph \({{\mathcal {G}}}\) is directed \(\{\ell ,k\}\)-chained.
If the minimal lower bandwidth, defined by Eq. (3.2), is \(k=0\), then there is at least one edge from a node to another node in the same vertex subset. The adjacency matrix corresponding to such a graph is upper block bidiagonal when the nodes are suitably ordered. Similarly, a lower bandwidth \(k=1\) indicates that when the nodes are suitably enumerated, the adjacency matrix can be represented by a block tridiagonal matrix. More generally, a small lower bandwidth (3.2) indicates that there only are edges between vertex subsets \({{\mathcal {V}}}_j\) with close indices.
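For a given vertex partition, the lower bandwidth can be read off from the largest number of subsets that any edge steps back. The sketch below assumes the partition is already known and uses the convention that the value -1 is returned when every edge points to the next subset, so that 0 signals an edge inside a subset and 1 an edge to the preceding subset. The function name `lower_bandwidth` and the 0-based labels are illustrative.

```python
def lower_bandwidth(parts, edges):
    """Largest back-step i - j over all edges from V_i to V_j.

    parts: list of node lists [V_1, ..., V_l]; edges: (source, target) pairs.
    Returns -1 if every edge goes from V_i to V_{i+1} (assumed convention).
    """
    level = {v: idx for idx, part in enumerate(parts) for v in part}
    k = -1
    for u, v in edges:
        step = level[u] - level[v]
        if step < -1:
            # A forward edge that skips a subset does not fit the
            # block Hessenberg structure for this partition.
            raise ValueError("edge skips a subset")
        k = max(k, step)
    return k
```

For instance, with the hypothetical partition `[[0], [1], [2]]`, the edges `[(0, 1), (1, 2)]` give -1, and adding the back edge `(2, 0)` gives 2.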
The following result shows that for strongly connected directed \(\{\ell ,k\}\)-chained graphs, directed cycles will be observed if \(k\ge 1\). For semiconnected or weakly connected directed graphs, cycles are not guaranteed to exist.
Proposition 2
Let \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\) be a strongly connected directed \(\{\ell ,k\}\)-chained graph with vertex partition \({{\mathcal {V}}}={{\mathcal {V}}}_1 \cup \cdots \cup {{\mathcal {V}}}_{\ell }\). Assume there are no edges between vertices belonging to the same vertex set and that \(k\ge 1\). Let \(e_{j,i}\in {{\mathcal {E}}}\) represent a directed edge from vertex \(v_j\) to \(v_i\), where \(v_i\in {{\mathcal {V}}}_i\) and \(v_j\in {{\mathcal {V}}}_{i+s}\) for \(1\le s\le k\). Then there exists at least one directed cycle that starts at \(v_i\), contains the edge \(e_{j,i}\), and ends at \(v_i\). The minimum possible length of the directed cycle is \(s+1\).
Proof
Since the graph \({{\mathcal {G}}}\) is strongly connected and there are no edges between any nodes in the same vertex subset, the shortest possible directed path from vertex \(v_i\) to \(v_j\) has length s as shown below
\[v_i\rightarrow v_{i_1}\rightarrow v_{i_2}\rightarrow \cdots \rightarrow v_{i_{s-1}}\rightarrow v_j,\]
where \(v_{i_t}\in {{\mathcal {V}}}_{i+t}\) for \(t=1,2,\ldots ,s-1\). Combining this path with the edge \(e_{j,i}\) determines a directed cycle of length \(s+1\). \(\square\)
Identification of the \(\{\ell ,k\}\)-chained structure of a directed graph (if present) sheds considerable light on properties of the graph, including the presence of anti-communities. Anti-communities are vertex subsets \({{\mathcal {W}}}_i\), \(i=1,2,\ldots ,q\), of \({{\mathcal {V}}}\) such that there are many fewer edges from nodes in \({{\mathcal {W}}}_i\) to nodes in \({{\mathcal {W}}}_i\), than from nodes in \({{\mathcal {W}}}_i\) to nodes in \({{\mathcal {W}}}_j\) for \(j\ne i\). For instance, the node subsets \({{\mathcal {V}}}_j\) of an \(\ell\)-chained graph are anti-communities. Recent discussions on anti-community detection for undirected graphs can be found in Concas et al. (2020), Estrada and Knight (2015), Fasino and Tudisco (2017). There are several methods and measures that allow one to identify communities or clusters, such as the intra-cluster density which, for undirected graphs, is defined as the ratio of the number of internal edges and the number of all possible internal edges; see Fortunato (2010). An analogous density measure for computing the anti-community score for undirected graphs was introduced in Concas et al. (2021). Here, we extend this measure to directed \(\{\ell ,k\}\)-chained graphs.
Definition 4
The anti-community score \(\rho \in [0,1]\) for a node subset \({{\mathcal {V}}}_i\) of the node set \({{\mathcal {V}}}\) of a directed \(\{\ell ,k\}\)-chained graph is the ratio of the number of directed edges between the vertices in \({{\mathcal {V}}}_i\) and the total possible number of directed edges between them. An anti-community with score \(\rho\) is said to be a \(\rho\)-anti-community.
We remark that the anti-community score aims at identifying an approximate anti-community as a node set for which \(\rho\) takes a small value. A large value of \(\rho\) does not necessarily identify a community, because it does not consider the connections between the nodes in \({{\mathcal {V}}}_i\) and those not contained in \({{\mathcal {V}}}_i\).
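The score of Definition 4 is straightforward to evaluate. The sketch below assumes that, for a subset with m nodes and no self-loops, the total possible number of directed internal edges is m(m-1), and it returns 0 for subsets with fewer than two nodes; both choices, as well as the function name `anti_community_score`, are assumptions made for illustration.

```python
def anti_community_score(subset, edges):
    """Ratio of internal directed edges to the m*(m-1) possible ones."""
    nodes = set(subset)
    m = len(nodes)
    possible = m * (m - 1)  # ordered pairs of distinct nodes in the subset
    if possible == 0:
        return 0.0  # assumed value for subsets with fewer than two nodes
    # Count each distinct directed edge with both endpoints in the subset.
    internal = sum(1 for u, v in set(edges)
                   if u in nodes and v in nodes and u != v)
    return internal / possible
```

A subset without internal edges obtains the score 0, while a subset in which every ordered pair of nodes is joined by an edge obtains the score 1.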
Example 3.3
For directed \(\ell\)-chained graphs with node subset partitioning (2.2), the subsets \({{\mathcal {V}}}_i\), for \(i=1,2,\ldots ,\ell\), are 0-anti-communities, because there are no internal edges. For a directed \(\{\ell ,k\}\)-chained graph described in Definition 2, the subset \({{\mathcal {V}}}_i\) has a positive anti-community score \(\rho _i\) when it has internal edges. If \(\rho _i\) is small, then the subset \({{\mathcal {V}}}_i\) may be considered as an approximate anti-community.
Directed chained graphs and directed spanning trees
The chained structure of a spanning tree T for an undirected graph \({{\mathcal {G}}}\) is used in Concas et al. (2021) to determine a chained structure for a graph \({{\mathcal {G}}}\), if such a structure exists, and to approximate a graph without a chained structure by a graph with such a structure. In this section, we consider directed graphs that have directed spanning trees. We remark that not all directed graphs have a directed spanning tree. The directed spanning trees are employed to partition the node set \({{\mathcal {V}}}\) into subsets \({{\mathcal {V}}}_i\) that determine directed \(\ell\)-chained graphs; cf. (2.2). This approach to partition the node set \({{\mathcal {V}}}\) is applied to partitioning node sets of directed graphs that have a directed spanning tree, but do not possess a chained structure, and provides an approach to approximate a directed graph \({{\mathcal {G}}}\) without chained structure by a directed graph with chained structure.
We first briefly review results for undirected graphs. Let \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\) be an undirected graph. A spanning tree for \({{\mathcal {G}}}\) is a subgraph \({{\mathcal {T}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}'\}\) that is a tree and contains all the vertices of \({{\mathcal {G}}}\); see, e.g., Concas et al. (2021), Deo (1974), Newman (2010). A spanning tree \({{\mathcal {T}}}\) is not uniquely determined by \({{\mathcal {G}}}\) and, in particular, depends on the chosen initial vertex of the tree, the so-called root.
When the graph \({{\mathcal {G}}}\) is directed, two different types of spanning directed trees, the out-tree (or arborescence) and the in-tree, can be defined; see Deo (1974). We will employ both these directed trees.
Definition 5
An out-tree rooted at node \(v_i\) for a directed graph \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\) is a subgraph \({{\mathcal {T}}}_\text {out}^i=\{{{\mathcal {V}}},{{\mathcal {E}}}'\}\) of \({{\mathcal {G}}}\) that is a tree with the same vertices as \({{\mathcal {G}}}\), and such that for every vertex \(v_j\), for \(j\ne i\), there is only one directed path starting at \(v_i\) and ending at \(v_j\) in the tree.
Definition 6
An in-tree rooted at \(v_i\) for a directed graph \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\) is a subgraph \({{\mathcal {T}}}_\text {in}^i=\{{{\mathcal {V}}},{{\mathcal {E}}}'\}\) of \({{\mathcal {G}}}\) that is a tree with the same vertices as \({{\mathcal {G}}}\), and such that for every vertex \(v_j\), for \(j\ne i\), there is only one directed path from \(v_j\) to \(v_i\) in the tree.
In an out-tree, information may flow from the root to each vertex in the graph, while in an in-tree information may flow from any vertex to the root. In the first case, the root is a good source of information for the nodes of the graph; in the second case, the root is a good receiver.
Out-trees and in-trees exist for every vertex of a directed graph only if the graph is strongly connected. Any vertex in a semiconnected graph belongs to an out-tree or an in-tree. This follows from Proposition 3 below. We remark that this property is not guaranteed to hold for a weakly connected graph.
Proposition 3
Let \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\) be a semiconnected directed graph. Then the graph \({{\mathcal {G}}}\) has at least one out-tree and one in-tree.
Proof
Let \(v_i,v_j\in {{\mathcal {V}}}\) be arbitrary distinct vertices. Then either \(v_i\) is connected to \(v_j\), or \(v_j\) is connected to \(v_i\). Assume there is a directed path P from \(v_i\) to \(v_j\). If all the vertices of \({{\mathcal {G}}}\) except for \(v_i\) and \(v_j\) are on the path P, then P is an out-tree rooted at \(v_i\) and an in-tree rooted at \(v_j\).
Let u be a vertex of \({{\mathcal {G}}}\) that is not on the path P. Assume that there is neither a directed path from u to \(v_i\) nor a directed path from \(v_j\) to u; otherwise, we extend P by including u as a root. Then \({{\mathcal {E}}}\) contains directed paths from \(v_i\) to u and from u to \(v_j\). Therefore an out-tree rooted at \(v_i\) and an in-tree rooted at \(v_j\) are obtained. \(\square\)
Proposition 4
Let \({{\mathcal {G}}}\) be a directed graph. If the vertex \(v_i\) of \({{\mathcal {G}}}\) is the root of both an out-tree \({{\mathcal {T}}}_\text {out}^i\) and an in-tree \({{\mathcal {T}}}_\text {in}^i\) of \({{\mathcal {G}}}\), then the graph \({{\mathcal {G}}}\) is strongly connected.
Proof
Let \(v_i\) satisfy the assumption of the proposition. Then for any vertex \(v_j\), \(j\ne i\), there is a directed path from \(v_i\) to \(v_j\) and vice versa. Hence, for every pair of vertices \((v_k, v_j)\), \(k,j\ne i\), there is a directed path from \(v_k\) to \(v_j\) passing through \(v_i\) and vice versa. It follows that the directed graph \({{\mathcal {G}}}\) is strongly connected. \(\square\)
Each directed spanning tree has a directed \(\ell\)-chained structure (2.2). For out-trees, the root of the tree is the only vertex in the first set \({{\mathcal {V}}}_1\) of the chained structure, and the partition of the vertex set \({{\mathcal {V}}}\) is determined by the relation between the vertices of the tree. Thus, the vertex set \({{\mathcal {V}}}_2\) contains the children of the root and, in general, the vertex set \({{\mathcal {V}}}_i\) contains the children of the vertices in \({{\mathcal {V}}}_{i-1}\), \(i=2,3,\ldots ,\ell\). For an in-tree, the root belongs to the last vertex set \({{\mathcal {V}}}_\ell\), and the partition of the vertex set is determined by following the direction of the edges backwards until one reaches the set \({{\mathcal {V}}}_1\), which contains the leaves farthest away from the root.
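The construction of the vertex sets from a spanning tree can be sketched with a breadth-first search. The sketch below makes one specific choice that the text does not prescribe: it uses the breadth-first tree rooted at the given node, so each vertex is placed at its shortest directed distance from (or to) the root. Other spanning trees may give other partitions. The function name `tree_levels`, the `direction` parameter, and the 0-based labels are illustrative.

```python
from collections import deque


def tree_levels(n, edges, root, direction="out"):
    """Vertex sets [V_1, ..., V_l] of a breadth-first out-tree or in-tree.

    Returns None when no spanning tree of the requested type is rooted
    at `root`. Any direction other than "out" is treated as "in".
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        if direction == "out":
            adj[u].append(v)   # follow edges along their direction
        else:
            adj[v].append(u)   # follow edges backwards for an in-tree
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in depth:
                depth[w] = depth[u] + 1   # w becomes a child of u
                queue.append(w)
    if len(depth) < n:
        return None  # some vertex is not spanned by the tree
    levels = [[] for _ in range(max(depth.values()) + 1)]
    for v in sorted(depth):
        levels[depth[v]].append(v)
    # For an in-tree the root belongs to the last set V_l.
    return levels if direction == "out" else levels[::-1]


# Hypothetical 4-node graph with edges 0->1, 1->2, 1->3, 3->2.
edges = [(0, 1), (1, 2), (1, 3), (3, 2)]
out_sets = tree_levels(4, edges, root=0, direction="out")
in_sets = tree_levels(4, edges, root=2, direction="in")
```

In this small example node 1 cannot be the root of an out-tree, because node 0 is not reachable from it, and the function returns `None` for that root.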
Remark 4.1
Let \({{\mathcal {G}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}\}\), with \({{\mathcal {V}}}={{\mathcal {V}}}_1\cup {{\mathcal {V}}}_2\cup \cdots \cup {{\mathcal {V}}}_\ell\), be an \(\ell\)-chained graph. A directed out-tree for \({{\mathcal {G}}}\) does not exist, for example, if the first set \({{\mathcal {V}}}_1\) contains more than one node. Similarly, a directed in-tree does not exist when the last set \({{\mathcal {V}}}_\ell\) contains more than one node.
The process of generating directed spanning trees for a directed graph is illustrated in the following example.
Example 4.1
Consider the directed graph \({{\mathcal {G}}}\) shown in Fig. 3. It is semiconnected. The out-tree \({{\mathcal {T}}}_\text {out}^1\) and the in-tree \({{\mathcal {T}}}_\text {in}^3\) rooted at \(v_1\) and \(v_3\), respectively, are displayed in Fig. 4. These are the only out-trees and in-trees for the graph \({{\mathcal {G}}}\). Their directed chained structure is illustrated in Fig. 5.
The chained structure of a directed spanning tree \({{\mathcal {T}}}\) of \({{\mathcal {G}}}\) can be used to detect, or approximate, the directed chained structure of \({{\mathcal {G}}}\). The chained structure of \({{\mathcal {G}}}\) might not be unique, as it depends on the starting vertex and the directed spanning tree \({{\mathcal {T}}}\).
Definition 7
Let \({{\mathcal {T}}}=\{{{\mathcal {V}}},{{\mathcal {E}}}'\}\) be an out-tree (or in-tree) for the graph \({{\mathcal {G}}}\). A directed \(\ell\)-chained vertex set decomposition for \({{\mathcal {T}}}\) is said to be a directed \(\ell\)-chained vertex set decomposition for \({{\mathcal {G}}}\). We will refer to leaves of \({{\mathcal {T}}}\) as leaves of \({{\mathcal {G}}}\).
Example 4.2
Consider the partition of the directed tree \({{\mathcal {T}}}_\text {out}^1\) in Example 4.1. Starting with the vertex \(v_1\), we obtain the vertex sets \({{\mathcal {V}}}_1=\lbrace v_1\rbrace\), \({{\mathcal {V}}}_2=\lbrace v_2\rbrace\), \({{\mathcal {V}}}_3=\lbrace v_3, v_5 \rbrace\), and \({{\mathcal {V}}}_4=\lbrace v_4,v_6\rbrace\). Thus, \(\ell =4\); see Fig. 5. The graph may have other directed \(\ell\)-chained partitions that are not determined by using spanning trees. For example, starting with vertices \(v_1, v_4\), we obtain the partition \({{\mathcal {V}}}_1=\lbrace v_1,v_4\rbrace\), \({{\mathcal {V}}}_2=\lbrace v_2,v_5\rbrace\), and \({{\mathcal {V}}}_3=\lbrace v_3, v_6 \rbrace\), and \(\ell =3\).
Now consider the in-tree \({{\mathcal {T}}}_\text {in}^3\) in Fig. 4. In this case, \({{\mathcal {V}}}_1=\lbrace v_1, v_4, v_6\rbrace\), \({{\mathcal {V}}}_2=\lbrace v_2, v_5 \rbrace\), \({{\mathcal {V}}}_3=\lbrace v_3\rbrace\), and \(\ell =3\).
We have already mentioned that some semiconnected graphs may not allow an \(\ell\)-chained partitioning for an arbitrarily chosen initial vertex. For example, vertex \(v_2\) in Example 4.1 can be the root of neither an out-tree nor an in-tree that spans the graph.
Let \({{\mathcal {D}}}={{\mathcal {E}}}\setminus {{\mathcal {E}}}'\) be the set of the edges in \({{\mathcal {G}}}\) that are not in \({{\mathcal {T}}}\), and let \(C({{\mathcal {T}}})\) denote the graph obtained by adding the edges in \({{\mathcal {D}}}\) to the spanning tree \({{\mathcal {T}}}\). The graph \(C({{\mathcal {T}}})\) coincides with \({{\mathcal {G}}}\) and inherits the chained structure of \({{\mathcal {T}}}\).
Definition 8
A directed graph \({{\mathcal {G}}}\) is said to be compatible with a spanning tree \({{\mathcal {T}}}\) if all the edges in \({{\mathcal {D}}}\) are compatible with the chained structure of \({{\mathcal {T}}}\), that is, if for each edge \(e_j\in {{\mathcal {D}}}\) there is an index \(2\le i\le \ell -1\) such that \(e_j\) connects a vertex in \({{\mathcal {V}}}_i\) to a vertex in \({{\mathcal {V}}}_{i+1}\).
If \({{\mathcal {G}}}\) is compatible with \({{\mathcal {T}}}\), then the graph \({{\mathcal {G}}}=C({{\mathcal {T}}})\) is directed \(\ell\)-chained. If instead there is at least one edge connecting a vertex in \({{\mathcal {V}}}_i\) to a vertex in \({{\mathcal {V}}}_{i-k}\), for \(i=k+1,k+2,\ldots ,\ell\), and \(k\ge 0\) is the maximal number with this property, then the graph \({{\mathcal {G}}}=C({{\mathcal {T}}})\) is directed \(\{\ell ,k\}\)-chained.
The graphs \(C({{\mathcal {T}}}_\text {out}^1)\) and \(C({{\mathcal {T}}}_\text {in}^3)\), obtained by adding the missing edges to the spanning trees of Fig. 5, are displayed in Fig. 6. The former graph is \(\{4,1\}\)-chained and the latter one is \(\{3,1\}\)-chained.
Position centrality and some applications
The notion of position centrality for vertices of an undirected network was introduced in Concas et al. (2021). It is a generalization of closeness centrality. This section generalizes position centrality to directed graphs by defining the in-position and out-position centralities of a node. The in-closeness centrality of a node measures how close this node is to those it is receiving information from, while the out-closeness centrality of a node shows how close the node is to the nodes it is sending information to.
Let \((\#{{\mathcal {V}}}_i)\) denote the number of vertices in the set \({{\mathcal {V}}}_i\).
Definition 9
Let us assume that an outtree \({{\mathcal {T}}}_\text {out}=\{{{\mathcal {V}}},{{\mathcal {E}}}'\}\) rooted at the node v for the directed graph \({{\mathcal {G}}}\) exists. Moreover, let \({{\mathcal {V}}}_1,{{\mathcal {V}}}_2,\ldots ,{{\mathcal {V}}}_\ell\) be the directed \(\ell\)-chained structure, starting at vertex v, determined by the tree. For a fixed \(p\in {{\mathbb {R}}}\), the outposition centrality of v is defined as
\[P^{\text {out}}_p(v)=\sum _{i=1}^{\ell }(i-1)\,(\#{{\mathcal {V}}}_i)^p.\]
We refer to a vertex \(v_c\) with the smallest outposition centrality as a p-outcenter vertex.
Definition 10
Let us assume that an intree \({{\mathcal {T}}}_\text {in}=\{{{\mathcal {V}}},{{\mathcal {E}}}'\}\) rooted at the node v for the directed graph \({{\mathcal {G}}}\) exists. Moreover, let \({{\mathcal {V}}}_1,{{\mathcal {V}}}_2,\ldots ,{{\mathcal {V}}}_\ell\) be the directed \(\ell\)-chained structure, ending at vertex v, determined by the tree. For a fixed \(p\in {{\mathbb {R}}}\), the inposition centrality of v is defined as
\[P^{\text {in}}_p(v)=\sum _{i=1}^{\ell }(\ell -i)\,(\#{{\mathcal {V}}}_i)^p.\]
We refer to a vertex \(v_c\) with the smallest inposition centrality as a p-incenter vertex.
The in/outposition centralities depend on the spanning tree chosen. They can be defined for every node only if the directed graph is strongly connected. The outcenter vertex can be described as an “information transfer station”, such that it can “easily” send information to all the other vertices in the graph. A similar interpretation holds for the incenter vertex, which acts as an information sink. The following example illustrates how the in/outposition centralities of a vertex can be computed by using the chained structures starting from the vertex.
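A minimal Python sketch of this computation follows. It rests on two assumptions that are not taken from the text: the level sets are obtained by breadth-first search (so that the set at distance d from the root plays the role of \({{\mathcal {V}}}_{d+1}\)), and the centrality is the sum over the level sets of the distance times the p-th power of the set cardinality, a form consistent with the values reported in Example 5.1. The function names and the four-vertex graph are hypothetical. The inposition centrality is obtained by applying the same routine to the graph with all edges reversed.

```python
from collections import deque


def bfs_dist(adj, root):
    """Shortest-path distances (in edges) from root to every reachable vertex."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def reverse(adj):
    """Graph with every edge reversed."""
    rev = {u: [] for u in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)
    return rev


def position_centrality(adj, v, p):
    """Assumed form: sum over level sets of distance * (cardinality ** p).

    Returns None when v cannot reach every vertex (no spanning tree at v).
    """
    dist = bfs_dist(adj, v)
    if len(dist) < len(adj):
        return None
    sizes = {}
    for d in dist.values():
        sizes[d] = sizes.get(d, 0) + 1
    return sum(d * n ** p for d, n in sizes.items())


# Hypothetical strongly connected graph (not the graph of Fig. 7).
adj = {"a": ["b"], "b": ["c"], "c": ["d", "a"], "d": ["a"]}
out_c = {v: position_centrality(adj, v, 1) for v in adj}
in_c = {v: position_centrality(reverse(adj), v, 1) for v in adj}
print(out_c)                      # {'a': 6, 'b': 5, 'c': 4, 'd': 6}
print(min(out_c, key=out_c.get))  # c
print(in_c["a"])                  # 4
```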
Example 5.1
Consider the strongly connected directed graph \({{\mathcal {G}}}\) in Fig. 7. To compute the outposition centrality of vertex \(v_3\), we identify an outtree rooted at \(v_3\) letting \({{\mathcal {V}}}_1=\{v_3\}\), \({{\mathcal {V}}}_2=\{v_4,v_5\}\), and \({{\mathcal {V}}}_3=\{v_1,v_2\}\). The 1-outposition centrality of vertex \(v_3\) is
\[P_{1}^{out}(v_3)=1\cdot 2+2\cdot 2=6,\]
while \(P_{1/2}^{out}(v_3)=4.24\) and \(P_{5}^{out}(v_3)=96\).
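As a check on the two values just quoted, and assuming that the outposition centrality weights the p-th power of the cardinality of each set \({{\mathcal {V}}}_i\) by its distance \(i-1\) from the root (a form inferred here from the reported numbers), the arithmetic for the sets \({{\mathcal {V}}}_2\) and \({{\mathcal {V}}}_3\), each with two vertices, reads
\[\begin{aligned} P_{1/2}^{out}(v_3)&=1\cdot 2^{1/2}+2\cdot 2^{1/2}=3\sqrt{2}\approx 4.24,\\ P_{5}^{out}(v_3)&=1\cdot 2^{5}+2\cdot 2^{5}=96. \end{aligned}\]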
We turn to the inposition centrality of vertex \(v_2\). Consider the intree rooted at \(v_2\) with vertex set partitioning \({{\mathcal {V}}}_1=\{v_3,v_4\}\), \({{\mathcal {V}}}_2=\{v_1,v_5\}\), and \({{\mathcal {V}}}_3=\{v_2\}\). We have
\[P_{1}^{in}(v_2)=2\cdot 2+1\cdot 2=6.\]
Since the graph \({{\mathcal {G}}}\) is strongly connected, we can compute the in/outposition centralities for all the other vertices similarly. When \(p=\frac{1}{2}\) and \(p=1\), the vertex \(v_3\) has the smallest outposition centrality. This indicates that \(v_3\) is the outcenter vertex. The incenter vertices are \(v_2\) and \(v_5\) for \(p=\frac{1}{2}\) and \(p=1\). When \(p=5\), the outcenter vertices are \(v_1\) and \(v_5\), while the incenter vertex is \(v_1\).
In a semiconnected directed graph, the vertices can be divided into three subsets: \({{\mathcal {O}}}\), which contains vertices connected to every other vertex in the network, \({{\mathcal {I}}}\), whose elements are vertices to which every vertex can send information, and \({{\mathcal {M}}}\), which contains intermediate vertices. There may be a nonempty intersection between the sets \({{\mathcal {O}}}\) and \({{\mathcal {I}}}\). Outposition centrality is defined only for vertices in \({{\mathcal {O}}}\), while inposition centrality can be computed for vertices in \({{\mathcal {I}}}\). Since every vertex belongs to at least one spanning tree, semiconnected graphs are directed \(\ell\)-chained or directed \(\{\ell ,k\}\)-chained.
Vertices for weakly connected directed graphs also can be divided into the above three subsets \({{\mathcal {O}}}\), \({{\mathcal {I}}}\), and \({{\mathcal {M}}}\). However, the sets \({{\mathcal {O}}}\) and \({{\mathcal {I}}}\) may both be empty, since out/intrees are not guaranteed to exist. Hence, a weakly connected directed graph may not possess a chained structure.
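The partition can be illustrated by a short reachability computation. The sketch below interprets \({{\mathcal {O}}}\) as the set of vertices from which every other vertex can be reached and \({{\mathcal {I}}}\) as the set of vertices that can be reached from every other vertex; this reading, the function names, and the two toy graphs are assumptions made for illustration.

```python
def reachable(adj, root):
    """Set of vertices reachable from root by directed paths (root included)."""
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def split_vertices(adj):
    """Return the sets (O, I, M) described in the text."""
    rev = {u: [] for u in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)
    n = len(adj)
    O = {v for v in adj if len(reachable(adj, v)) == n}   # roots of outtrees
    I = {v for v in adj if len(reachable(rev, v)) == n}   # roots of intrees
    M = set(adj) - O - I
    return O, I, M


# A directed path a -> b -> c is semiconnected.
path = {"a": ["b"], "b": ["c"], "c": []}
print(split_vertices(path))   # ({'a'}, {'c'}, {'b'})

# A weakly connected graph a -> b <- c -> d without any spanning out/intree.
zigzag = {"a": ["b"], "b": [], "c": ["b", "d"], "d": []}
O, I, M = split_vertices(zigzag)
print(O, I)                   # set() set()
```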
Some examples
This section describes a few examples concerned with directed graphs. For each graph, we analyze the presence of anticommunities by identifying its directed chained structure. The out/incenter vertices are identified by computing the smallest out/inposition centralities. Knowledge of the chained structure is beneficial in the following contexts:

Information dissemination in a social network: we are interested in determining directed \(\ell\)-chained or directed \(\{\ell ,k\}\)-chained structures with initial vertex (center vertex) such that information from this node can reach all other individuals in the least amount of time, where we assume that the time is proportional to the path length. Similarly, we may be interested in determining which individual(s) can collect information from all other vertices in the least amount of time. Moreover, in a directed \(\{\ell ,k\}\)-chained graph, the presence of an edge \(e_{j,i}\) from vertex \(v_j\in {{\mathcal {V}}}_j\) to vertex \(v_i\in {{\mathcal {V}}}_i\), for \(i<j\), indicates the possibility of feedback of the information from \(v_j\) to \(v_i\). The minimal lower bandwidth k shows the minimal length of the path from \(v_j\) to \(v_i\).

Prevention of the spread of an infectious disease: let the edges of a directed chained graph represent the spread of an infectious disease among subjects that are represented by nodes. An edge \(e_{j,i}\) from vertex \(v_j\in {{\mathcal {V}}}_j\) to vertex \(v_i\in {{\mathcal {V}}}_i\), for \(i<j\), represents a secondary infection of \(v_i\) from \(v_j\). It is reasonable to prevent the spread of disease by detecting and possibly eliminating the outcenter vertex. In the context of COVID-19, it is important that outcenter vertices be vaccinated. Similarly, it can be important to protect an incenter vertex from infection from other nodes. Vaccination may be one way to achieve this.
To graphically illustrate the chained structures revealed by the model discussed above, we first consider the following two small directed graphs:

ibm32 (32 vertices, 126 edges): collected from the IBM 1971 conference advertisement. After removing self-loops, the graph has 94 edges. It is available at https://sparse.tamu.edu/HB/ibm32/.

n2c6b10, short for JGD_Homology/n2c6b10 (306 vertices, 330 edges): simplicial complexes from homology by Volkmar Welker. There are 329 edges after removing the self-loop. The graph is available at https://www.cise.ufl.edu/research/sparse/matrices/list_by_dimension.html.
These networks are not social networks, but nevertheless will be seen to have structure that can be studied with the concepts introduced in the present paper. The network n2c6b10 is represented by a weighted graph. We consider the corresponding unweighted graph obtained by setting all weights to 1.
Figure 8 displays the outtree \({{\mathcal {T}}}_\text {out}^{20}\) (left) and the intree \({{\mathcal {T}}}_\text {in}^{20}\) (right) for the ibm32 network. Both trees are rooted at vertex \(v_{20}\) of the graph and have maximal chained structure length \(\ell =7\). The graph \(C({{\mathcal {T}}}_\text {out}^{20})\), which also contains the edges that are not in \({{\mathcal {T}}}_\text {out}^{20}\), is shown on the left-hand side of Fig. 9. It has an \(\{\ell ,k\}\)-chained structure with minimal lower bandwidth \(k=4\), that is, edges in \(C({{\mathcal {T}}}_\text {out}^{20})\) from vertices in the subset \({{\mathcal {V}}}_i\) are allowed to point to vertices in the subsets \({{\mathcal {V}}}_{i-4},\ldots ,{{\mathcal {V}}}_i,{{\mathcal {V}}}_{i+1}\) for \(i=5,\ldots ,\ell\). The right-hand side of Fig. 9 displays the graph \(C({{\mathcal {T}}}_\text {in}^{20})\) with minimal lower bandwidth \(k=4\). We conclude that the graph ibm32 is directed \(\{7,4\}\)-chained with initial vertex \(v_{20}\). Consider the chained structure determined by the intree \({{\mathcal {T}}}_\text {in}^{20}\). There are four 0-anticommunities (the first subset \({{\mathcal {V}}}_1\) contains only one vertex) and three anticommunities with scores \(\rho _3=0.10\), \(\rho _4=0.11\), and \(\rho _5=0.06\). Moreover, since both the outtree and intree are rooted at vertex \(v_{20}\), it follows from Theorem 4 that the graph is strongly connected.
Starting from each vertex, an outtree and an intree are constructed and their associated chained structures are determined. Figure 10 displays the chain length \(\ell\) and the lower bandwidth k of the \(\{\ell ,k\}\)-chained structure associated with the outtree (left) and the intree (right) rooted at each vertex of the graph ibm32. The property of “exact length” in the legend represents the maximal chain length and the property “not exact length” indicates that the chain length is not maximal. The symbol \(\circ\) in the figure displays the chain length of each structure as a function of the initial vertex \(v_j\), \(j=1,2,\ldots ,32\). When the chain length is maximal, we use the symbol \(\square\).
The symbols \(*\) and \(\times\) display the lower bandwidth \(k_j\) of the chained structure with initial node \(v_j\) for \(j=1,2,\ldots ,32\); see Definition 2. The lower bandwidths of the chained structure with the maximal chain length are displayed by \(*\) symbols; for the other chained structures, the symbol is \(\times\). Among the lower bandwidths associated with the maximal chain length, the smallest \(k_j\) is the minimal lower bandwidth. Hence, Fig. 10 shows the maximal chain length to be \(\ell =7\) and the minimal lower bandwidth \(k=4\). It can be seen that the maximal chain length and the minimal lower bandwidth are not achieved for each starting or ending vertex.
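The scan over all initial vertices described above can be sketched as follows. The code assumes that the outtree at each root is a breadth-first search tree, which is an assumption of this sketch and not a statement about how the trees in the figures were built; the function names and the four-vertex graph are hypothetical. A lower bandwidth of \(-1\) is used here to mark a root for which no edge points backward or within a level set.

```python
from collections import deque


def levels(adj, root):
    """Index of the level set of each vertex, or None if root spans no outtree."""
    level = {root: 1}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in level:
                level[w] = level[u] + 1
                queue.append(w)
    return level if len(level) == len(adj) else None


def chain_profile(adj):
    """Map each admissible root to (chain length, lower bandwidth)."""
    profile = {}
    for root in adj:
        level = levels(adj, root)
        if level is None:
            continue
        ell = max(level.values())
        k = max((level[u] - level[w] for u in adj for w in adj[u]), default=-1)
        profile[root] = (ell, k)
    return profile


def best_structure(profile):
    """Maximal chain length, minimal bandwidth among those, and the roots."""
    ell_max = max(ell for ell, _ in profile.values())
    k_min = min(k for ell, k in profile.values() if ell == ell_max)
    roots = [v for v, lk in profile.items() if lk == (ell_max, k_min)]
    return ell_max, k_min, roots


# Hypothetical strongly connected graph on four vertices.
adj = {1: [2], 2: [3], 3: [4, 1], 4: [2]}
profile = chain_profile(adj)
print(profile)                  # {1: (4, 2), 2: (3, 2), 3: (3, 2), 4: (4, 2)}
print(best_structure(profile))  # (4, 2, [1, 4])
```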
The left-hand side of Fig. 11 displays the 1-outposition centrality, i.e., the outposition centrality for \(p=1\), of each vertex of graph ibm32. The 1-inposition centrality for each vertex is shown on the right-hand side of Fig. 11. The outcenter vertices of the graph are \(v_2\) and \(v_3\), and the incenter vertex is \(v_{10}\).
Figure 12 displays the outtree \({{\mathcal {T}}}_\text {out}^{28}\) with maximal chain length and its corresponding graph \(C({{\mathcal {T}}}_\text {out}^{28})\) for the graph n2c6b10. We note that this graph is directed \(\{4,0\}\)-chained with initial vertex \(v_{28}\). Since the outtree \({{\mathcal {T}}}_\text {out}^{28}\) is the only spanning tree of n2c6b10, the initial vertex is the outcenter vertex and the graph is semiconnected. The graph n2c6b10 has three 0-anticommunities and the node subset \({{\mathcal {V}}}_2\) is an anticommunity with \(\rho _2=0.04\).
We now determine the directed chained-like structure and center vertices for the medium-sized directed graph gre_1107 with 1107 vertices and 5664 edges. This graph arises from simulation studies in computer systems and is available at https://math.nist.gov/MatrixMarket/data/HarwellBoeing/grenoble/gre_1107.html. We refer to the graph as gre. After removing self-loops, this graph has 4557 edges. The outtree \({{\mathcal {T}}}_\text {out}^{808}\) and intree \({{\mathcal {T}}}_\text {in}^{644}\) with maximal chained structure length are displayed on the left and the right of Fig. 13, respectively. The graph \(C({{\mathcal {T}}}_\text {out}^{808})\) is \(\{53,4\}\)-chained and the graph \(C({{\mathcal {T}}}_\text {in}^{644})\) has a \(\{53,5\}\)-chained structure. The chain length and lower bandwidth of \(\{\ell ,k\}\)-chained structures starting and ending at each vertex of the graph gre are shown in Fig. 14. Only one directed outtree, \({{\mathcal {T}}}_\text {out}^{808}\), and one directed intree, \({{\mathcal {T}}}_\text {in}^{644}\), are found to have maximal chain length. Their corresponding \(\{\ell ,k\}\)-chained structures have minimal lower bandwidth \(k=4\) and \(k=5\), respectively; this is not visible in the figure because of the density of the symbols. Hence, the graph gre is a directed \(\{53,4\}\)-chained graph with initial vertex \(v_{808}\). It has five 0-anticommunities and 48 anticommunities with minimal score \(\rho =0.01\) and maximal score \(\rho =0.5\). The graph gre is strongly connected since each vertex has both outtrees and intrees.
The 1-outposition centrality and 1-inposition centrality of each vertex of the graph gre are shown in Fig. 15. The outcenter vertex is \(v_{400}\) with 1-outposition centrality \(P^\text {out}_1(v_{400})=10129\), and the incenter vertex is \(v_7\) with 1-inposition centrality \(P^\text {in}_1(v_7)=9030\).
For the previous test networks, of small to medium dimension, it was possible to determine both a spanning outtree and an intree. Now, we investigate the presence of such spanning trees in larger networks, some of which are extracted from the Stanford Large Network Dataset Collection (SNAP) (Leskovec and Krevl 2014). The networks are the following:

twitter (3656 vertices, 188,712 edges) available from http://wiki.gephi.org/index.php/Datasets, reproduces the connections of some part of the Twitter social network;

wikivote (8297 vertices, 103,690 edges). The nodes in the network represent Wikipedia users and a directed edge from node i to node j represents that user i voted on user j in an administrator election (Leskovec and Krevl 2014);

gnutella (10,879 vertices, 39,994 edges) is the p2p-Gnutella04 network from Leskovec and Krevl (2014);

foldoc (13,380 vertices, 120,700 edges) is derived from the online searchable dictionary FOLDOC (http://foldoc.org/): an edge from term i to term j exists in the network if, in the FOLDOC dictionary, the term j is used to describe the meaning of term i. The network is available at http://vlado.fmf.uni-lj.si/pub/networks/data/.

math (13,840 vertices, 195,330 edges) is available from Leskovec and Krevl (2014) and represents the interactions on the Stack Exchange web site MathOverflow (https://mathoverflow.net/). In particular, a directed edge is present from node i to node j if user i commented on user j’s answer.
Table 1 displays, for each network, whether a spanning out/intree exists, and the values of \(\ell\) and k in the corresponding \(\{\ell ,k\}\)-chained structure.
Three of the above networks admit either a spanning outtree or an intree. Two of them do not, so we eliminated the out/in dangling nodes, that is, vertices that do not have incoming edges or outgoing edges, respectively. This preprocessing is reflected in a different number of nodes in columns 3 and 5 than in column 2. The absence of dangling nodes is a necessary condition for the existence of an out/in spanning tree, but not a sufficient one, as the results in Table 1 confirm. The results show that there are real-world networks arising in important applications that have a directed spanning tree and that inherit from it a chained structure.
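The preprocessing step and the remark that it does not guarantee a spanning tree can be illustrated with a small sketch. It assumes a single removal pass for vertices without incoming edges (whether the removal is repeated is not stated in the text); the function names and the toy graph are hypothetical.

```python
def drop_source_vertices(adj):
    """Remove, in one pass, the vertices that have no incoming edge."""
    has_in = {w for u in adj for w in adj[u]}
    return {u: [w for w in adj[u] if w in has_in] for u in adj if u in has_in}


def has_spanning_out_tree(adj):
    """True if some vertex reaches every other vertex by directed paths."""
    for root in adj:
        seen, stack = {root}, [root]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) == len(adj):
            return True
    return False


# Hypothetical graph: s has no incoming edge; the cycles {a, b} and {c, d}
# both feed the cycle {e, f} but cannot reach each other.
adj = {
    "s": ["a"], "a": ["b"], "b": ["a", "e"],
    "c": ["d"], "d": ["c", "e"], "e": ["f"], "f": ["e"],
}
reduced = drop_source_vertices(adj)
print(sorted(reduced))                 # ['a', 'b', 'c', 'd', 'e', 'f']
print(has_spanning_out_tree(reduced))  # False
```

In the reduced graph every vertex has an incoming edge, yet no vertex reaches all the others, so no spanning outtree exists.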
We now analyze in more detail the network twitter. To illustrate the different center nodes determined by varying the value of p in Definitions 9 and 10, we analyzed this graph for the p values reported in the first column of Table 2. It turns out that no vertex admits an intree, while most of the nodes (3485) have an outtree. The table shows that different outcenter vertices are identified when p varies, even if there is some stability for p between 0 and 1. Both the depth \(\ell\) and the minimal lower bandwidth k of the corresponding \(\{\ell ,k\}\)-chained structure can be seen to grow with p.
In the same table, we also report the outcenter nodes identified by other well-known centrality measures. The outdegree, betweenness centrality (Newman 2010), PageRank (Page et al. 1999), and hubs score from HITS (Kleinberg 1999), have been computed by the centrality function of Matlab. The hubcentrality (Benzi et al. 2013) has been computed by the hubauth package, developed in Baglama et al. (2014) and available at https://bugs.unica.it/cana/software/.
Table 2 confirms the well-known fact that centrality measures often disagree, making it hard to judge which result is the best. The table also shows that trees rooted at the center vertices determined by position centrality tend to identify chained structures with a smaller depth \(\ell\) and bandwidth k than trees rooted at nodes considered important with respect to other measures. Table 2 illustrates that, when using position centrality, we are able to identify the chained structures with the smallest \(\ell\) and k values.
Figure 16 displays the spanning trees rooted at the first two outcenter nodes of Table 2. It is evident that the tree corresponding to the larger value of p, shown on the right, produces a chained structure in which the sets \({{\mathcal {V}}}_j\) with small j have larger cardinality than in the tree on the left, where the cardinalities of the sets are more balanced. However, it is difficult to understand the effect of this parameter on the choice of the center nodes without knowledge of the identity and history of the people defining the vertices. The transportation network analyzed in the next section aims to clarify the meaning of p-outcenter nodes, as well as to compare position centrality to other centrality measures.
To gain some insight into how the centrality measures considered in this section are related, we computed the Kendall rank correlation coefficient between the in/out position centrality with \(p=1\) (IPC/OPC) and degree (IDC/ODC), closeness (ICC/OCC), betweenness (BC), hub/authority (HC/AC), and PageRank (PRC) centralities. The comparison has been performed on the ibm32, n2c6b10, gre, twitter, and busca networks. The last network will be discussed in the next section.
The results are displayed in Fig. 17. The networks n2c6b10 and twitter do not appear in the graph on the left, because they have no incenter nodes; there is only one outcenter node in n2c6b10, so for this data set the comparison is meaningless, as is clear from the graph on the right. As expected, the Kendall coefficients of position centrality and closeness centrality for both incoming and outgoing connections are 1, meaning that the two rankings agree perfectly. Position centrality is seen to be strongly correlated with degree and PageRank for some of the data sets, but the two graphs demonstrate that the considered centrality indices represent different features of the networks.
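The perfect agreement with closeness can be checked on a toy example. The sketch below implements the Kendall tau-b coefficient directly and assumes that closeness is the reciprocal of the sum of distances, which for \(p=1\) is the reciprocal of the position centrality in the sum-of-distances form; both this definition and the sample values are assumptions made for illustration. Since a small position centrality marks a central vertex while a large closeness does, the position scores are negated before the two rankings are compared.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b, with tie correction) of two score lists."""
    concordant = discordant = ties_x = ties_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical 1-outposition centralities of four vertices.
position = [6, 5, 4, 7]
closeness = [1 / value for value in position]
print(kendall_tau_b([-value for value in position], closeness))  # 1.0
```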
A case study about position centrality
To investigate the effect of the parameter p on the choice of the center vertices determined by the outposition centrality \(P^\text {out}_p(v)\) and the inposition centrality \(P^\text {in}_p(v)\) of a vertex v, we studied a transportation network for which it is possible to judge by common sense the results of the analysis. Like all transportation networks, it is closely related to the social behaviour of the individuals living in the area of interest.
We considered the bus network that serves the metropolitan area around the town of Cagliari in Sardinia, Italy. The area is about 65 km\(^2\), hosts \(4.2\cdot 10^5\) people, and includes the town of Cagliari as well as four smaller municipalities very close to Cagliari, contiguous in some parts: Monserrato, Selargius, Quartucciu, and Quartu Sant’Elena; see Fig. 18.
The bus network was constructed using data available on the web. We refer to this network as busca. There are 970 bus stops. They define nodes. The distance between the bus stops is not available. We therefore measure distance as the number of bus stops between the starting and ending nodes on the shortest path. The bus routes define edges. The resulting network is unweighted. Some bus routes depend on the direction of travel, e.g., because some streets are one-way. The bus network therefore is directed.
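One way to assemble such a network from route data is sketched below. The construction is an assumption of this sketch: each route is taken to be an ordered list of stops, and a directed edge joins every stop to the next stop on the route. The stop names and routes are invented and are not taken from the busca data.

```python
def build_bus_graph(routes):
    """Directed, unweighted graph whose edges join consecutive stops of a route."""
    adj = {}
    for route in routes:
        for stop in route:
            adj.setdefault(stop, set())
        for a, b in zip(route, route[1:]):
            if a != b:            # no self-loops
                adj[a].add(b)     # duplicate edges collapse in the set
    return {stop: sorted(nbrs) for stop, nbrs in adj.items()}


# Two hypothetical routes; the second does not retrace the first in reverse,
# as happens when some streets are one-way.
routes = [["A", "B", "C"], ["C", "B", "D", "A"]]
graph = build_bus_graph(routes)
print(graph)  # {'A': ['B'], 'B': ['C', 'D'], 'C': ['B'], 'D': ['A']}
```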
A bus network, like most geographical networks, is strongly influenced by the landscape and urbanization. The Cagliari commercial center is located on the southwest border of the network, in front of the harbor. A large pond, the Molentargius Saline Regional Nature Park, is located in the center of the urban area. It separates the four municipalities from Cagliari, and prevents straight travel between them.
We computed the p-outcenter and the p-incenter vertices of the bus network for different values of p. The results are reported in Table 3, where the bus stops are identified by their name. The table also reports the center nodes according to the degree, betweenness (Newman 2010), PageRank (Page et al. 1999), HITS (Kleinberg 1999), and hub/authority centrality (Baglama et al. 2014; Benzi et al. 2013). Since betweenness and PageRank do not distinguish outcenters from incenters, only one center node is reported for them. For each in/outcenter vertex, the depth \(\ell\) (i.e., the distance between the center vertex, which is the root, and a most distant leaf of the spanning tree) is reported. Here the distance is measured in terms of the number of edges on the shortest path between the root and the leaf. We also report the minimal lower bandwidth k of the \(\{\ell ,k\}\)-chained graph structure.
The center nodes corresponding to \(p=0.5\) and \(p=1\) are in the commercial center of Cagliari, the part of town where the largest number of shops and restaurants are located, and where a large number of bus routes converge. The outtree rooted at the outcenter corresponding to the “San Benedetto” bus stop is displayed in the right pane of Fig. 19. The density of nodes in the upper part of the tree shows that many of the first chained sets \({{\mathcal {V}}}_i\), i.e., sets with small index i, contain a large number of nodes. This indicates that it is possible to reach a large number of destinations within a small number of bus stops, i.e., in a short time.
When p is significantly smaller than one, the outposition and inposition centralities of Definitions 9 and 10 give a smaller weight to sets \({{\mathcal {V}}}_i\) with large cardinality than when p is larger than one. The effect is that the spanning tree rooted at the corresponding outcenter node is more balanced; see the picture on the left of Fig. 19. This corresponds to a less rapid decay in the number of elements of the sets \({{\mathcal {V}}}_i\) and also to a smaller depth of the tree. The center node “Legnano” is located in the Pirri district of Cagliari, while “Zuddas” and “Riu Mortu” are in Monserrato, a neighboring municipality. Both these zones join Cagliari with the small towns in the west part of the area, so they are barycentric for the network. It is possible to reach the farthest parts of the network from them after a relatively small number of bus stops.
For a value of p somewhat larger than 1, say \(p=5\), the center nodes are found in densely populated parts of Cagliari. When the value of p becomes very large, the center nodes are suburban bus stops that are served by “strategic” routes that connect them rather easily to the rest of the network.
The depth of the spanning trees, that is, the maximal length of the routes starting from the tree root, increases monotonically with p. It is remarkable that the minimal lower bandwidth is rather large. This is because there are some bus routes going back towards the center node with only a single bus stop before reaching the center.
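The influence of p on the choice of the center can be seen on two invented level-set profiles. The sketch assumes the position centrality has the form of a sum, over the level sets, of the distance from the root times the p-th power of the set cardinality (a form consistent with the values of Example 5.1); the vertex labels and cardinalities are hypothetical. Vertex A roots a shallow tree with one large level set, vertex B a deeper tree with small, balanced sets.

```python
def position_centrality(sizes, p):
    """sizes[d] is the number of vertices at distance d from the root."""
    return sum(d * n ** p for d, n in enumerate(sizes))


# Both profiles describe graphs with ten vertices.
profiles = {
    "A": [1, 6, 3],      # depth 2, one large level set
    "B": [1, 3, 3, 3],   # depth 3, small balanced level sets
}

for p in (0.1, 1, 5):
    scores = {v: position_centrality(s, p) for v, s in profiles.items()}
    center = min(scores, key=scores.get)
    print(p, center, len(profiles[center]) - 1)  # p, center, depth of its tree
```

For \(p=0.1\) and \(p=1\) the shallow root A has the smaller centrality, while for \(p=5\) the large set of A is penalized and the deeper root B becomes the center, in line with the growth of the depth with p observed above.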
The other centrality measures, reported in the lower part of Table 3, with the exception of the betweenness centrality, produce center vertices located in the Cagliari harbor area, the touristic center. In any case, these measures produce results complying with the traditional idea of centrality, while varying the parameter p in the position centrality gives the possibility to consider different aspects of this transportation network. The betweenness central vertex is difficult to interpret, as it is located in the central part of the Quartu Sant’Elena town, which does not appear to identify the real center of the network.
Conclusion
It is important to be able to identify interesting structural properties of directed graphs, because they shed light on how the vertices are connected. This paper introduces the notion of directed chained graphs and illustrates how it helps us to understand the structure of directed graphs. Also, the related notions of incentral and outcentral nodes are defined and illustrated. The latter notions are quite intuitive and examples illustrate that they are helpful for identifying important nodes that differ from nodes that are identified by several popular available centrality measures.
Availability of data and materials
The datasets generated and analyzed during the current study are available from the corresponding author upon request.
References
Baglama J, Fenu C, Reichel L, Rodriguez G (2014) Analysis of directed networks via partial singular value decomposition and Gauss quadrature. Linear Algebra Appl 456:93–121
Benzi M, Estrada E, Klymko C (2013) Ranking hubs and authorities using matrix functions. Linear Algebra Appl 438:2447–2474
Borgatti SP (2005) Centrality and network flow. Soc Netw 27:55–71
Concas A, Noschese S, Reichel L, Rodriguez G (2020) A spectral method for bipartizing a network and detecting a large anticommunity. J Comput Appl Math 373:112306
Concas A, Reichel L, Rodriguez G, Zhang Y (2021) Chained graphs and some applications. Appl Netw Sci 6:39
De la Cruz Cabrera O, Matar M, Reichel L (2020) Edge importance in a network via line graphs and the matrix exponential. Numer Algorithms 83:807–832
De la Cruz Cabrera O, Matar M, Reichel L (2021) Centrality measures for node-weighted networks via line graphs and the matrix exponential. Numer Algorithms 88:583–614
Deo N (1974) Graph theory with applications to engineering and computer science. Prentice-Hall, Englewood Cliffs
Estrada E (2012) The structure of complex networks: theory and applications. Oxford University Press, Oxford
Estrada E, Higham DJ (2010) Network properties revealed through matrix functions. SIAM Rev 52:696–714
Estrada E, Knight P (2015) A first course in network theory. Oxford University Press, Oxford
Estrada E, Rodriguez-Velazquez JA (2005) Subgraph centrality in complex networks. Phys Rev E 71:056103
Fasino D, Tudisco F (2017) A modularity based spectral method for simultaneous community and anticommunity detection. Linear Algebra Appl 542:605–623
Fortunato S (2010) Community detection in graphs. Phys Rep 486:75–174
Free Online Dictionary Of Computing. http://foldoc.org/
Gabow HN, Myers EW (1978) Finding all spanning trees of directed and undirected graphs. SIAM J Comput 7:280–287
Gephi Sample Data Sets. http://wiki.gephi.org/index.php/Datasets
Kleinberg J (1999) Authoritative sources in a hyperlinked environment. J ACM 46:604–632
Laenen S, Sun H (2020) Higher-order spectral clustering of directed graphs. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in neural information processing systems. Curran Associates, Inc., pp 941–951. https://proceedings.neurips.cc/paper/2020/file/0a5052334511e344f15ae0bfafd47a67-Paper.pdf
Leskovec J, Krevl A (2014) SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data
MathOverflow website. https://mathoverflow.net/
Matrix Market Collection. https://math.nist.gov/MatrixMarket/data/HarwellBoeing/grenoble/gre_1107.html
Newman MEJ (2010) Networks: an introduction. Oxford University Press, Oxford
Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Stanford InfoLab, Stanford
Pajek dataset. http://vlado.fmf.uni-lj.si/pub/networks/data/
SuiteSparse Matrix Collection. https://sparse.tamu.edu/HB/ibm32/
UF Sparse Matrix Collection. https://www.cise.ufl.edu/research/sparse/matrices/list_by_dimension.html
Acknowledgements
Not applicable.
Funding
Work by LR was supported in part by NSF Grant DMS-1720259. AC, CF, and GR were supported by the Regione Autonoma della Sardegna research project “Algorithms and Models for Imaging Science (AMIS)” [RASSR57257] and the INdAM-GNCS research project “Tecniche numeriche per l’analisi delle reti complesse e lo studio dei problemi inversi”. CF gratefully acknowledges Regione Autonoma della Sardegna for the financial support provided under the Operational Programme P.O.R. Sardegna F.S.E. (European Social Fund 2014–2020, Axis III Education and Formation, Objective 10.5, Line of Activity 10.5.12).
Author information
Contributions
AC, CF, LR, GR and YZ contributed equally to this work. In particular, AC, CF, LR, GR and YZ collaborated in designing the research and the methodology, performing the analyses, and implementing the algorithms. All authors edited the paper and read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Concas, A., Fenu, C., Reichel, L. et al. Chained structure of directed graphs with applications to social and transportation networks. Appl Netw Sci 7, 64 (2022). https://doi.org/10.1007/s41109-022-00502-x