Local clustering coefficient-based iterative peeling strategy to extract the core and peripheral layers of a network
Applied Network Science volume 9, Article number: 49 (2024)
Abstract
We propose a novel local clustering coefficient (LCC)-based strategy to extract the peripheral layers of a complex network and to identify the nodes that are part of one of these layers as well as those that form the core of the network. The LCC of a node is the probability that any two neighbors of the node are connected. Our hypothesis is that, for any given network topology, nodes with an LCC of 1.0 are ideally suited to the role of peripheral nodes. We propose an iterative peeling strategy wherein we remove the nodes with an LCC of 1.0 (the LCC values of the residual nodes are recalculated after each iteration) to extract the peripheral nodes at a particular layer (starting from the outermost peripheral layer and moving to the innermost peripheral layer) and penetrate towards the inner core. We stop the iterative peeling when all the residual nodes (the nodes left in the network after the removal of the nodes at the peripheral layers) have a degree of 0 or the LCC values of all the non-zero-degree residual nodes are less than 1.0. We show that the proposed LCC-based iterative peeling strategy can reveal an innermost single-layer core and a hierarchical peripheral structure (one or more peripheral layers) for scale-free networks, and a single layer of peer nodes (all nodes have an LCC of less than 1.0) for random networks.
Introduction
Core–periphery analysis is a mesoscopic-level structural analysis (Yanchenko and Sengupta 2023) of a complex network that identifies the nodes forming the densely connected central core of the network and the loosely connected peripheral nodes located away from the center of the network. Several approaches have been proposed in the literature to extract the core–periphery structure of a network, including those based on spectral decomposition (Cucuringu et al. 2016), random walks (Della Rossa et al. 1467), motif counting (Ma et al. 2018) and statistical inference (Kojaku and Masuda 2017); these approaches differ in how they consider the core nodes and peripheral nodes to be connected to each other. Nevertheless, the underlying connectivity models behind several such works on core–periphery analysis could be grouped into two broad categories: hierarchical versus two-block models (proposed originally by Borgatti and Everett (2000)). In Gallagher et al. (2021), the core–periphery partitions identified by the two categories of models were observed to differ for networks across different domains.
The k-shell decomposition (Newman 2010; Bollobás 1998) is a classical strategy that could be used to explain the connectivity of the nodes per the hierarchical model: the k-shell comprises a subset of nodes that have at least k links to other nodes in the subset; the shell index of a node is k if the node can be part of the k-shell of the network, but not part of any (k + 1)-shell of the network. The core nodes are those with the largest shell index and the rest of the nodes (whose shell index is less than the maximum) are the peripheral nodes. The two-block model is a hub-and-spoke model wherein the core nodes (that form one block) are considered to be adjacent (or densely connected through a few intermediate core nodes) to other core nodes as well as to one or more peripheral nodes (that form the other block), but the peripheral nodes are expected not to be connected amongst themselves. The two-block model is very rigid as it seeks to identify nodes strictly as either core or peripheral, irrespective of the underlying variation in the connectivity of the nodes. For example, it is not possible to impose the two-block model on random networks or similar categories of networks wherein nodes are more of peers to each other, i.e., any node could be connected to any other node (see graph-(a) of Fig. 1, a random network-style graph generated per the well-known Erdos–Renyi (ER) model (Erdos and Renyi 1959), with the model parameter \(p_{link}\), the probability of a link between any two nodes, set to 0.52). Due to the rigidity observed with the hub-and-spoke modeling approach, we subscribe to the hierarchical approach of characterizing the core–periphery structure of a network in this paper.
The hierarchical approaches for core–periphery structural analysis are either directly or indirectly driven by the degree of the nodes (as the basis for layer extraction). They tend to favor high-degree nodes for placement in the inner core (irrespective of their contribution to the connectivity amongst the rest of the nodes in the network as well as their own connectivity to the rest of the nodes in the network) and low-degree nodes for placement in the peripheral layers (even if they are topologically well-suited to serve the role of bridge nodes). The example graph-(b) of Fig. 1 illustrates this claim: if we were to run the k-shell decomposition method on this graph, it would categorize nodes {1, 3, 5, 6} as core nodes with a shell index of 3 and categorize the rest of the nodes (including the bridge node 2) as peripheral nodes with a shell index of 2. However, note that node 2 in graph-(b) of Fig. 1 is critical for connectivity among the rest of the nodes in the network (especially to connect the clusters {1, 3, 5, 6, 9} and {2, 4, 7, 8, 10}) and is also closer to several nodes, whereas nodes 5 and 6 (that are part of the 3-shell core) add no value to the connectivity among the rest of the nodes in the network and are relatively far from the nodes in the {2, 4, 7, 8, 10} cluster.
We propose that nodes closer to the rest of the nodes in the network and/or lying on the shortest paths to the rest of the nodes in the network are more appropriate to be categorized as core nodes; likewise, nodes that are far away from the rest of the nodes in the network and do not lie on the shortest paths for the rest of the nodes are more appropriate to be categorized as peripheral nodes. In this context, the shortest path-based centrality metrics (Freeman 1977, 1979) such as closeness centrality (CLC) and betweenness centrality (BWC) would be more appropriate than node degree for such core–periphery categorization. However, the computation of both CLC and BWC requires global knowledge of the network at every node, and BWC is a synchronously computed metric as well (i.e., the BWC-computation algorithm (Brandes 2001) needs to be run on every node and the BWC values would be computed for every node even if one is interested only in the BWC value of a particular node). While scouting for a computationally light metric (whose computation would require only local knowledge and be asynchronous, i.e., it needs to be computed only for the node of interest) that could effectively capture the above-stated topological characteristics pertinent to core–periphery structural analysis, we claim that the local clustering coefficient (LCC) metric (Newman 2010; Meghanathan 2017d) fits the bill.
The LCC of a node is a fraction (see “LCC-based Iterative Peeling Strategy” Section for its computation) ranging from 0.0 to 1.0. The lower this fraction, the larger the probability that the neighbors of the node would go through the node for shortest path communication. If the LCC of a node is 1.0, none of its neighbor nodes (and hence none of the rest of the nodes in the network) need to go through the node for shortest path communication. We propose an LCC-based iterative peeling strategy for core–periphery structural analysis, according to which we peel away, layer by layer, all the peripheral nodes (identified by an LCC of 1.0). We stop peeling when all the residual nodes have a degree of 0 or none of the residual nodes of non-zero degree have an LCC of 1.0, and conclude that the leftover nodes are the core nodes. These core nodes are critical for shortest path communication and form the center of the network. If the majority of the nodes in the network are identified as core nodes per this strategy, the network is referred to as "core-heavy". If the majority of the nodes in the network are identified as peripheral nodes (all these nodes have been peeled off from the network at some peripheral level), the network is referred to as "peripheral-heavy". If none of the nodes in a given network have an LCC of 1.0 to begin with, we do not execute the iterative peeling strategy and simply conclude that the network is a random network wherein all the nodes are more appropriately considered as peers of each other.
The rest of the paper is organized as follows: “LCC-based Iterative Peeling Strategy” Section proposes an LCC-based iterative peeling strategy for core–periphery analysis and illustrates its execution using two toy graphs as running examples. “Application of the LCC-based Iterative Peeling Strategy on Real-World Networks” Section presents the results of LCC-based core–periphery analysis for a suite of 80 real-world networks, several of which are used as benchmark networks in earlier works on complex network analysis. The same Section presents the categorization of the real-world networks as random, core-heavy or peripheral-heavy based on the proposed strategy, as well as a logistic regression-based binary classification model that could be used to predict whether a real-world network (for which there exists at least one node with an LCC of 1.0) is core-heavy or peripheral-heavy. “Application of the k-Shell Decomposition Method on Real-World Networks” Section presents and discusses the results of applying the k-shell decomposition method to extract the core–periphery structures of the real-world networks and compares the results with those obtained from the proposed LCC-based iterative peeling strategy. “Related Work” Section reviews related work and highlights the contributions of our work. “Conclusions and Future Work” Section concludes the paper and outlines plans for future work. Throughout the paper, the following groups of terms are used interchangeably: ‘node’ and ‘vertex’; ‘edge’ and ‘link’; ‘network’ and ‘graph’; ‘strategy’, ‘method’ and ‘approach’; ‘peripheral layers’ and ‘peripheral levels’.
LCC-based iterative peeling strategy
The LCC (Newman 2010) of a node is the probability that any two neighbors of the node are connected. The LCC of a node can be computed as the ratio of the actual number of links among the neighbors of the node to the maximum possible number of links among the neighbors of the node. The LCC of a node with 0 or 1 neighbor is 1.0. For a node with two or more neighbors, an LCC of 1.0 implies that any two neighbors of the node are directly connected to each other and would not need this node for shortest path communication. If the neighbors of a node are not going to use the node for shortest path communication, then the rest of the nodes in the network are not going to use the node for shortest path communication either; thus, the BWC of such nodes (with an LCC of 1.0) would be zero as well (Meghanathan 2017b), and such nodes are more likely to exist in the peripheral layers of a network rather than the inner core.
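As a compact restatement of the definition above, the LCC can be written as follows. The symbols \(k_v\) (the number of neighbors of node \(v\)) and \(e_v\) (the number of links among those neighbors) are notation assumed here for illustration and are not taken from the paper.

\[
\mathrm{LCC}(v) =
\begin{cases}
\dfrac{2\, e_v}{k_v\,(k_v - 1)} & \text{if } k_v \ge 2, \\[6pt]
1 & \text{if } k_v \le 1.
\end{cases}
\]

The denominator \(k_v (k_v - 1)/2\) is the number of distinct neighbor pairs, so \(\mathrm{LCC}(v) = 1\) exactly when every pair of neighbors is linked.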
The above observation motivated us to propose the use of the LCC metric for core–periphery analysis. Our hypothesis is that, for any given network topology, nodes with an LCC of 1.0 are ideally suited to the role of peripheral nodes. We propose an iterative peeling strategy wherein we remove the nodes with an LCC of 1.0 (along with their associated links) to extract the peripheral nodes at a particular layer (starting from the outermost peripheral layer and moving to the innermost peripheral layer) and penetrate towards the inner core. We identify the peripheral layers as 1-peripheral (1-p), 2-peripheral (2-p), etc. (in this order) as we progress from the outer periphery to the inner core. We stop the iterative peeling when the degree of all the residual nodes (the nodes left in the network after the removal of the nodes at the peripheral layers) is 0 or the LCC values of all the residual nodes with a non-zero degree are less than 1.0; these residual nodes form the core nodes (there is only one core layer in our model). Note that during the subsequent iterations (after the first iteration), the LCC values of the residual nodes in the network are calculated only on the basis of the links connecting the residual nodes.
We applied the proposed LCC-based iterative peeling strategy on a suite of 80 real-world networks (see “Application of the LCC-based Iterative Peeling Strategy on Real-World Networks” Section for details) with diverse degree distributions, quantified in this paper in the form of the spectral radius ratio for node degree (Meghanathan 2014) (denoted λsp(k), whose values are ≥ 1.0). In the case of random networks (whose λsp(k) values are in the vicinity of 1.0), none of the nodes were observed to have an LCC of 1.0 to begin with; hence, we cannot peel off layers of nodes in random networks. On the other hand, for scale-free networks (whose λsp(k) values are appreciably greater than 1.0), we could extract one or more peripheral layers of nodes (as described above) before reaching the inner core. We measure the fraction of nodes that belong to each of the peripheral layers or the core layer and accordingly calculate the fraction of peripheral nodes (considering all the peripheral layers together) as well as the fraction of core nodes. We propose to categorize scale-free networks as either core-heavy (if the fraction of core nodes exceeds the fraction of peripheral nodes) or peripheral-heavy (if the fraction of peripheral nodes exceeds the fraction of core nodes).
Execution details of the proposed strategy
We hypothesize that nodes with an LCC of 1.0 are suited to the role of peripheral nodes rather than core nodes. To begin with, we compute the LCC value for each node in the given network. If none of the nodes have an LCC of 1.0, we stop and conclude that the network under analysis is a random network. The iterative peeling strategy is initiated if one or more nodes in the given network incur an LCC of 1.0. We designate these nodes (with an LCC of 1.0) as belonging to the 1-peripheral layer and remove them as well as their associated links. We compute the LCC values of the nodes in the 1-residual network (the network comprising the remaining nodes and their associated links after the removal of the nodes in the 1-peripheral layer) and repeat the above process, leading to the identification of the nodes in the 2-peripheral layer (such nodes have an LCC of 1.0 in the 1-residual network) and their removal, leading to the 2-residual network. We continue the above process until we reach a p-residual network (by then we would have identified p peripheral layers, with one or more peripheral nodes in each of these layers) wherein the degree of all the residual nodes is 0 or the LCC values of all the residual nodes with a non-zero degree are less than 1.0. All the nodes in the p-residual network are labeled as core nodes. We determine the fraction of core nodes (computed as the ratio of the number of nodes in the p-residual network to the total number of nodes) and the fraction of peripheral nodes (computed as the ratio of the total number of peripheral nodes identified across all the 1-, 2-, …, p-peripheral layers to the total number of nodes). If the fraction of core nodes is greater than or equal to 0.5, we designate the network as core-heavy; otherwise (the fraction of peripheral nodes is greater than 0.5), we designate the network as peripheral-heavy.
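The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lcc`, `peel` and `classify` are hypothetical, the graph is given as an undirected edge list, and the sketch assumes that a residual node that has become isolated (degree 0) is never peeled, since the text treats such nodes as core nodes at termination.

```python
from itertools import combinations


def lcc(adj, v):
    """LCC of v; by the convention in the text, 1.0 for 0 or 1 neighbor."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 1.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def peel(edges):
    """Return (layers, core); layers[i] is the (i+1)-peripheral layer."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    layers = []
    while True:
        # Assumption: only nodes that still have at least one link are peeled.
        layer = {v for v in adj if adj[v] and lcc(adj, v) == 1.0}
        if not layer:
            break  # all residual nodes have degree 0 or an LCC below 1.0
        layers.append(layer)
        for v in layer:
            for u in adj.pop(v):      # remove v and its associated links
                if u in adj:
                    adj[u].discard(v)
    return layers, set(adj)


def classify(layers, core, n):
    """Label a network from its peeling result, using the 0.5 cut-off."""
    if not layers:
        return "random"
    return "core-heavy" if len(core) / n >= 0.5 else "peripheral-heavy"


# Hypothetical 5-node path graph 0-1-2-3-4 (not one of the paper's figures).
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
layers, core = peel(path)
label = classify(layers, core, 5)
```

On this hypothetical path graph, the two end nodes form the 1-peripheral layer, nodes 1 and 3 form the 2-peripheral layer, and the isolated middle node is left as the core, so the graph is labeled peripheral-heavy. A 4-node cycle has no node with an LCC of 1.0 and is therefore labeled random.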
Execution of the LCC-based iterative peeling strategy on toy example graphs
We now illustrate the application of the proposed LCC-based iterative peeling strategy on two example graphs (shown in Figs. 2 and 3). The LCC values of the nodes in the initial network and each of the residual networks are shown alongside the nodes. The LCC value of a node is computed as the ratio of the actual number of links among the neighbors of the node to the maximum possible number of links among the neighbors of the node. For example, consider node 7 in the initial network of Fig. 2: the node has four neighbors {2, 4, 8, 10}, and a maximum of 4 × (4 − 1)/2 = 6 links are required to connect every pair of these neighbor nodes; however, there are actually only three links (2–4, 4–8 and 8–10) among these four neighbor nodes in the graph. Hence, the LCC of node 7 in the initial network of Fig. 2 is 3/6 = 0.50. If we were to compute the LCC values of all the nodes in graph-(a) of Fig. 1, we would observe none of the LCC values to be 1.0. This leads us to conclude that the graph is representative of a random network, which it indeed is, as the graph has been generated from the well-known Erdos–Renyi model (Erdos and Renyi 1959) for random networks.
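The node 7 calculation can be reproduced with a short Python sketch. The helper name `lcc_from_neighborhood` is hypothetical; it takes only the neighbor set and the links among those neighbors as stated in the text, and it assumes each link is listed once.

```python
def lcc_from_neighborhood(neighbors, links):
    """Ratio of actual to maximum possible links among a node's neighbors."""
    k = len(neighbors)
    if k < 2:
        return 1.0  # convention from the text for 0 or 1 neighbor
    actual = sum(1 for a, b in links if a in neighbors and b in neighbors)
    possible = k * (k - 1) // 2
    return actual / possible


# Node 7 in the initial network of Fig. 2, as described in the text.
node7_neighbors = {2, 4, 8, 10}
links_among_neighbors = [(2, 4), (4, 8), (8, 10)]
node7_lcc = lcc_from_neighborhood(node7_neighbors, links_among_neighbors)
print(node7_lcc)  # 0.5
```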
Figure 2 presents the application of the LCC-based iterative peeling strategy on graph-(b) of Fig. 1. Four nodes {3, 6, 9, 10} incur an LCC value of 1.0 in the initial network; these nodes are labeled as 1-peripheral nodes and removed from the network, leading to the 1-residual network comprising the remaining nodes {1, 2, 4, 5, 7, 8} and the links connecting them (retained from the original network). We again compute the LCC values of these residual nodes, identify nodes {5, 8} as part of the 2-peripheral layer and remove them, leading to the 2-residual network comprising nodes {1, 2, 4, 7}. The LCC values of these four nodes in the 2-residual network indicate that {1, 4, 7} form the 3-peripheral layer and can be removed, leading to the 3-residual network comprising only node 2. At this stage, one of the two terminating conditions kicks in (the degree of all the residual nodes is 0): the only remaining node, node 2, has a degree of 0. We categorize node 2 as a core node and include it in the core layer. We thus observe a total of three peripheral layers (comprising a total of nine nodes across these layers) and a core layer with one node. The fractions of peripheral nodes and core nodes in this network are 9/10 and 1/10 respectively, and it is clearly a “peripheral-heavy” network.
Figure 3 presents the application of the LCC-based iterative peeling strategy on a second example graph. Three nodes {4, 9, 10} incur LCC values of 1.0 each and are removed as 1-peripheral nodes, leading to the 1-residual network comprising the other seven nodes. The LCC values of all these seven nodes in the 1-residual network are less than 1.0 and there are no nodes with a degree of 0. Hence, the terminating condition is met and we categorize all the seven nodes {1, 2, 3, 5, 6, 7, 8} as core nodes. The network is a core-heavy network, with the fraction of core nodes being 7/10. Figure 4 presents a hierarchical visualization of the peripheral-heavy graph of Fig. 2 and the core-heavy graph of Fig. 3. We color the peripheral nodes at a particular layer with the same color, and the color transitions from red to blue as we move from the outer peripheral layers to the inner peripheral layers and further to the core; we also indicate the peripheral layer number (like 1-p, 2-p, 3-p) adjacent to each node. In the peripheral-heavy network, the node colors thus transition from red to blue as we penetrate through the network from the 1-peripheral layer to the 3-peripheral layer and further to the core.
Execution of k-shell decomposition on toy example graphs
The k-shell decomposition method is executed as follows: We start with a k-value of 1 and recursively remove all nodes with degree less than or equal to 1. We continue this process for k-values of 2, 3, etc. until there are no more nodes in the network. The shell index of a node is the k-value for which the node gets removed from the network per the k-shell decomposition method. Nodes whose shell index is the largest are considered the “core” nodes and the rest of the nodes (whose shell index is not the maximum) are considered the “peripheral” nodes. Per this convention, we hypothesize that the number of core nodes is more likely to be less than the number of peripheral nodes. Hence, we surmise that a majority of the real-world networks are likely to be classified as peripheral-heavy per the k-shell decomposition method, which is confirmed in “Application of the k-Shell Decomposition Method on Real-World Networks” Section. On the other hand, as we observe in the results of “Application of the LCC-based Iterative Peeling Strategy on Real-World Networks” Section: per our proposed LCC-based iterative peeling strategy, only 28 of the 80 real-world networks are peripheral-heavy and 44 of the 80 real-world networks are core-heavy.
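The removal procedure described above can be sketched as follows. This is an illustrative sketch under assumptions, not a reference implementation: the function name `k_shell` and the 6-node test graph (a 4-clique with a two-node tail) are hypothetical, and nodes without any link are not represented because the graph is built from an edge list.

```python
def k_shell(edges):
    """Return {node: shell index} using recursive degree-based removal."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    shell = {}
    k = 1
    while adj:
        # Keep removing nodes whose current degree is at most k.
        low = [v for v in adj if len(adj[v]) <= k]
        while low:
            for v in low:
                shell[v] = k
                for u in adj.pop(v):
                    if u in adj:
                        adj[u].discard(v)
            low = [v for v in adj if len(adj[v]) <= k]
        k += 1
    return shell


# Hypothetical graph: clique {1, 2, 3, 4} plus the tail 4-5-6.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
shell = k_shell(edges)
max_shell = max(shell.values())
core_nodes = {v for v, s in shell.items() if s == max_shell}
```

In this hypothetical graph, node 6 is removed first at k = 1, which drops node 5 to degree 1 and triggers its removal in the same round; the four clique nodes survive until k = 3 and form the core.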
Figure 5 presents the step-by-step execution of the k-shell decomposition method on the toy example graph of Fig. 3. Nodes 9 and 10 have a degree of 1, and their shell index is thus 1. After the removal of these two nodes, the rest of the nodes have a degree of at least 2. Hence, we now execute k-shell decomposition with a k-value of 2. Nodes 7 and 8 first get removed due to their degree being 2. This recursively triggers the removal of node 5 (whose degree becomes 2 due to the removal of nodes 9, 10 followed by nodes 7 and 8) and further triggers the removal of node 6. The shell index of nodes 5, 6, 7 and 8 is 2. We are now left with a clique of nodes 1, 2, 3 and 4, each with a degree of 3. These four nodes would thus be part of a 3-shell, which is also the largest value of the shell index observed for the nodes in this graph. Thus, nodes 1, …, 4 would be the core nodes and the other six nodes (5, …, 10) would be the peripheral nodes. With 6 of the 10 nodes being classified peripheral, the graph gets categorized as peripheral-heavy per the k-shell decomposition method. However, per the LCC-based iterative peeling strategy (see Fig. 3), the same graph is categorized as core-heavy (with only 3 of the 10 nodes: nodes 4, 9 and 10 classified as peripheral nodes, and the rest of the nodes are classified as core nodes).
Figure 6 presents a similar step-by-step execution of the k-shell decomposition method on the toy example graph of Fig. 2. Though the graph gets categorized as peripheral-heavy (as is the case with the LCC-based iterative peeling strategy as well), the classification of nodes as core or peripheral differs between the two methods. Whereas k-shell decomposition classifies node 6 as a core node (with the largest shell index of 3), the LCC-based iterative peeling strategy considers node 6 to be at the outermost periphery due to its LCC of 1.0 (no other node would go through node 6 for shortest path communication); the node is also farther away from the rest of the nodes in the network. On the other hand, node 2 (the only core node per the LCC-based iterative peeling approach) is classified as a peripheral node per the k-shell decomposition approach. When we run the k-shell decomposition method on the random graph of Fig. 1a, it categorizes the network as core-heavy, with nodes 1, …, 6 being part of a 3-shell core and node 0 being part of a 2-shell.
Comparison of the LCC-based iterative peeling strategy and k-shell decomposition method for core–periphery analysis
The k-shell decomposition method is solely degree-driven and is agnostic to the contribution of a node to shortest path communication as well as the proximity of the node to the rest of the nodes in the network. Nodes (like node 6 in the graph of Figs. 2 and 6 and node 4 in the graph of Figs. 3 and 5) that are part of cliques are more likely to be classified as core nodes, even if their betweenness centrality is 0 (i.e., the nodes are not part of the shortest paths between any two nodes in the network) and closeness centrality is larger (i.e., the nodes are farther away from the rest of the nodes in the network). Also, unlike our LCC-based iterative peeling approach, the k-shell decomposition method would not be able to identify a random network.
Figure 7 presents a high-level visualization of the core–periphery classification of the two toy example graphs (analyzed in the earlier subsections) obtained with the k-shell decomposition vis-a-vis the LCC-based iterative peeling strategy. We quantify the extent of similarity in their classification by counting the number of nodes whose class (Core or Periphery) is the same with respect to both the methods. The core/periphery class similarity index (CP Similarity Index: CPSI) of the two methods for a particular graph is the fraction of the nodes that are assigned the same class by both methods. For the toy example graphs of Figs. 2/6 and 3/5, we observe the CP similarity index between the k-shell decomposition and LCC-based iterative peeling strategy to be 4/9 = 0.44 and 5/10 = 0.50 respectively. On a scale of 0 to 1, the two numbers fall in the moderate range, quantitatively capturing the similarities and differences between the two methods with respect to the class assignment (core or periphery) for the individual nodes in a particular network.
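The CPSI computation can be illustrated with a small Python sketch. The function name `cp_similarity_index` is hypothetical; the core sets below are those reported in the text for the graph of Figs. 3/5 (nodes {1, 2, 3, 5, 6, 7, 8} per the LCC-based strategy and nodes {1, 2, 3, 4} per k-shell decomposition).

```python
def cp_similarity_index(nodes, core_a, core_b):
    """Fraction of nodes given the same class (core/periphery) by both methods."""
    nodes = list(nodes)
    same = sum(1 for v in nodes if (v in core_a) == (v in core_b))
    return same / len(nodes)


nodes = range(1, 11)                     # the ten nodes of the Fig. 3 graph
lcc_core = {1, 2, 3, 5, 6, 7, 8}         # core per LCC-based peeling
kshell_core = {1, 2, 3, 4}               # core per k-shell decomposition
cpsi = cp_similarity_index(nodes, lcc_core, kshell_core)
print(cpsi)  # 0.5
```

Nodes 1, 2 and 3 are core under both methods and nodes 9 and 10 are peripheral under both, giving five agreeing nodes out of ten.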
Application of the LCC-based iterative peeling strategy on real-world networks
We now present the results of the application of the LCC-based iterative peeling strategy on a suite of 80 real-world networks (modeled as undirected graphs) spread across several network domains (such as biological, social, friendship, literature, co-appearance, employment and transportation networks) and ranging theoretically from random networks to scale-free networks, primarily differentiated on the basis of the extent of variation in node degree. The minimum, maximum and median values for the number of nodes and links in the 80 networks are {22, 1538, 133} and {38, 16,715, 619} respectively.
We use the spectral radius ratio for node degree (Meghanathan 2014) (denoted λsp(k)) to capture the extent of variation in node degree. The λsp(k) value for a network is computed as the ratio of the principal eigenvalue of the adjacency matrix of the network to the average node degree. Random networks are expected to exhibit a relatively lower variation in their node degree compared to the scale-free networks (Meghanathan 2014). One of the significant contributions of this paper is the categorization of the scale-free networks as either core-heavy or peripheral-heavy, which has hitherto not been conducted in the literature. We also analyze the impact of the λsp(k) values on the categorization of the real-world networks of different domains as either random, core-heavy or peripheral-heavy scale-free networks.
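To make the λsp(k) definition concrete, the following is a minimal pure-Python sketch. It estimates the principal eigenvalue by power iteration on the shifted matrix A + I, which is a technique chosen here for illustration (the shift avoids oscillation on bipartite graphs) and not necessarily how the paper computes it; the function name `spectral_radius_ratio` and the two test graphs are hypothetical.

```python
import math


def spectral_radius_ratio(edges, iters=1000, tol=1e-12):
    """Principal eigenvalue of the adjacency matrix / average node degree."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    avg_degree = sum(len(nb) for nb in adj.values()) / len(adj)
    x = {v: 1.0 for v in adj}
    shifted = 0.0  # running estimate of the top eigenvalue of A + I
    for _ in range(iters):
        # One power-iteration step: y = (A + I) x, then normalize.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient of (A + I) at the unit vector y.
        new = sum(y[v] * (y[v] + sum(y[u] for u in adj[v])) for v in adj)
        converged = abs(new - shifted) < tol
        x, shifted = y, new
        if converged:
            break
    return (shifted - 1.0) / avg_degree  # undo the +I shift


# Complete graph on 4 nodes: every degree is 3, so the ratio is 1.0.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# Star with 4 leaves: eigenvalue 2, average degree 1.6, ratio 1.25.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
ratio_k4 = spectral_radius_ratio(k4)
ratio_star = spectral_radius_ratio(star)
```

The two examples mirror the intuition in the text: a graph with no variation in node degree gives a ratio of 1.0, while a hub-dominated graph gives a ratio above 1.0.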
Figures 8 and 9 present a comprehensive listing of the 80 real-world networks, their domain and their classification as either random (R), core-heavy (C) or peripheral-heavy (P), the number of peripheral layers (#p-levels), their fraction of core nodes (the fraction of peripheral nodes is 1 − fraction of core nodes), the fraction of core-core links, the fraction of core–periphery links and the fraction of periphery-periphery links. The #p-levels in the random networks is 0 (as all nodes are peers, with none of the nodes incurring an LCC of 1.0). Of the 80 real-world networks, we observe 8 networks to be random (R), 44 networks to be core-heavy (C) and 28 networks to be peripheral-heavy (P). That is, at a high level, only 10% of the real-world networks analyzed in this paper are random and 90% are scale-free (either core-heavy or peripheral-heavy), indicating that there exists some level of hierarchy in the majority of the real-world networks. Per our results, a real-world network is likely to be random with a probability of only 0.1 and scale-free (with some level of hierarchy among the nodes) with a probability of 0.9. As expected, the fraction of core nodes is greater than 0.5 for core-heavy networks and less than 0.5 for peripheral-heavy networks.
Fraction of links within and between the core and peripheral sets of nodes
Per the stochastic block model for core–periphery analysis, the fraction of core-core links (\(f_{links}^{CC}\)) is expected to be greater than the fraction of core–periphery links (\(f_{links}^{CP}\)), which is further expected to be greater than the fraction of periphery–periphery links (\(f_{links}^{PP}\)). The results in Figs. 8 and 9 show that such a requirement can be expected to be satisfied only for core-heavy networks and need not be satisfied for peripheral-heavy networks. Except for two networks (#55: Primary School Network and #73: US Airports 97 Network; see Fig. 9), the remaining 42 core-heavy networks satisfy the requirement: \(f_{links}^{CC}\) > \(f_{links}^{CP}\) > \(f_{links}^{PP}\). For the two core-heavy networks #55 and #73, \(f_{links}^{CC}\) > \(f_{links}^{PP}\) > \(f_{links}^{CP}\). However, such a clear ordering is not widely observed for the peripheral-heavy networks. For 8, 8 and 12 of the 28 peripheral-heavy networks, we observe \(f_{links}^{CC}\), \(f_{links}^{CP}\) and \(f_{links}^{PP}\) respectively to be the largest of the three fractions, thus supporting our earlier statement that the peripheral-heavy networks need not satisfy the requirement: \(f_{links}^{CC}\) > \(f_{links}^{CP}\) > \(f_{links}^{PP}\). The \(f_{links}^{CC}\) value could be the largest for peripheral-heavy networks whose fraction of core nodes is around 0.40 (i.e., closer to the cut-off of 0.50); likewise, the \(f_{links}^{PP}\) value could be the largest for peripheral-heavy networks whose fraction of core nodes is around 0.10 (i.e., far away from the cut-off of 0.50, indicating that a significant fraction of the nodes are peripheral nodes).
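The three link fractions can be computed from an edge list and a set of core nodes, as in the following sketch. The function name `link_fractions` and the 5-node path example (with the middle node as the only core node) are hypothetical and are not drawn from the paper's data.

```python
def link_fractions(edges, core):
    """Return (f_CC, f_CP, f_PP): the fractions of core-core,
    core-periphery and periphery-periphery links."""
    cc = cp = pp = 0
    for a, b in edges:
        ends_in_core = (a in core) + (b in core)
        if ends_in_core == 2:
            cc += 1
        elif ends_in_core == 1:
            cp += 1
        else:
            pp += 1
    m = len(edges)
    return cc / m, cp / m, pp / m


# Hypothetical path 0-1-2-3-4 whose only core node is node 2.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
f_cc, f_cp, f_pp = link_fractions(path, core={2})
print(f_cc, f_cp, f_pp)  # 0.0 0.5 0.5
```

This small peripheral-heavy example does not satisfy the ordering \(f_{links}^{CC}\) > \(f_{links}^{CP}\) > \(f_{links}^{PP}\), since it has no core-core links at all.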
#p-levels versus core-heavy or peripheral-heavy networks
Figure 10 presents an analysis of the #p-levels versus the number of networks observed to be either R (random), C (core-heavy) or P (peripheral-heavy). Networks that incurred 0 for the #p-levels were classified as random. Networks that incurred more than 0 for the #p-levels were classified as either core-heavy or peripheral-heavy, depending on the fraction of nodes across the different peripheral levels and the fraction of nodes at the core level. The larger the value of the #p-levels, the deeper the hierarchy.
Figure 10 also presents the percentage of networks that are core-heavy, peripheral-heavy or random for each of the values of the #p-levels metric observed for the 80 real-world networks. Even though we observe the #p-levels to range from 1 to 15 for real-world networks that are not random, we observe 57 of the 72 real-world networks (i.e., about 80% of the scale-free real-world networks) to incur values of 1, 2 or 3 for the #p-levels, indicating that the majority of the real-world networks are not very deep in their hierarchy with respect to the distribution of the peripheral nodes across the different peripheral layers (p-levels). Of these 57 real-world scale-free networks with #p-levels of 1, 2 or 3, 40 networks are core-heavy (out of a total of 44 core-heavy networks) and 17 networks are peripheral-heavy (out of a total of 28 peripheral-heavy networks). Hence, there is a 44/72 ~ 0.60 probability of observing a real-world scale-free network to be core-heavy; if a real-world network is core-heavy, there is a very high probability (40/44 ~ 0.90) that its #p-levels value is 1, 2 or 3, and a 33/44 ~ 0.75 probability that its #p-levels value is just 1 or 2. On the other hand, 11 of the 15 real-world networks whose #p-levels is more than 3 were observed to be peripheral-heavy. Hence, if the #p-levels for a real-world network is greater than 3, there is a good chance (11/15 ~ 0.75 probability) that the real-world network is peripheral-heavy. However, unlike the core-heavy networks whose #p-levels values were mainly 1 or 2, the #p-levels values for the peripheral-heavy networks are widely distributed across the entire range {1, …, 15}.
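The proportions quoted above follow directly from the reported counts; the short sketch below recomputes them so that the rounding is explicit. Only the counts stated in the text are used, and the variable names are chosen here for illustration.

```python
# Counts reported in the text above.
core_heavy, peripheral_heavy = 44, 28
shallow_core, shallow_peripheral = 40, 17    # #p-levels of 1, 2 or 3

scale_free = core_heavy + peripheral_heavy               # 72 non-random networks
shallow = shallow_core + shallow_peripheral              # 57 networks
deep = scale_free - shallow                              # 15 networks, #p-levels > 3
deep_peripheral = peripheral_heavy - shallow_peripheral  # 11 networks

p_shallow = shallow / scale_free                   # share with #p-levels <= 3
p_core = core_heavy / scale_free                   # P(core-heavy | scale-free)
p_shallow_given_core = shallow_core / core_heavy   # P(#p-levels <= 3 | core-heavy)
p_peripheral_given_deep = deep_peripheral / deep   # P(peripheral-heavy | #p-levels > 3)

print(f"{p_shallow:.2f} {p_core:.2f} "
      f"{p_shallow_given_core:.2f} {p_peripheral_given_deep:.2f}")
# 0.79 0.61 0.91 0.73
```

The unrounded values (0.79, 0.61, 0.91 and 0.73) are consistent with the approximations of about 80%, 0.60, 0.90 and 0.75 used in the text.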
Impact of the spectral radius ratio for node degree
Figure 11 presents a distribution of the spectral radius ratio for node degree (λsp(k)) values for the random (R), core-heavy (C) and peripheral-heavy (P) real-world networks. The median λsp(k) values for the R, C and P categories are 1.11, 1.54 and 2.99 respectively, indicating that real-world networks exhibit a gradual transformation from random networks to core-heavy scale-free networks and further to peripheral-heavy scale-free networks as the variation in node degree increases. We observe that 34/44 ~ 75% of the core-heavy real-world networks incur a λsp(k) value less than 2.0, whereas the same percentage (21/28 ~ 75%) of the peripheral-heavy real-world networks incur a λsp(k) value greater than 2.0. Also, 50 of the 80 real-world networks analyzed in this paper reported a λsp(k) value of less than 2.0 and only 5 of these 50 networks are peripheral-heavy and 37 of these 50 (about 75%) real-world networks are core-heavy. Thus, we could use the λsp(k) value of 2.0 as a cut-off to prescribe a 0.75 probability for a real-world network to be core-heavy if its λsp(k) value is less than 2.0. Out of the remaining 30 real-world networks that incurred a λsp(k) value greater than or equal to 2.0, we observed 21 networks to be peripheral-heavy, indicating that there is a 21/30 ~ 0.70 probability that a real-world network could be peripheral-heavy if its λsp(k) value is greater than or equal to 2.0.
Also, the λsp(k) values for 4 of the 8 random real-world networks were less than 1.10, and the lowest λsp(k) value for a core-heavy network was observed to be 1.11. We could thus use the λsp(k) value of 1.10 as a cut-off to claim that if the λsp(k) value for a real-world network is less than 1.10, there is a 4/8 ~ 0.50 probability that the network is random. Seven of the eight random networks identified in this paper incurred λsp(k) values less than or equal to 1.25 and there are a total of 17 of the 80 real-world networks observed with λsp(k) values less than or equal to 1.25, implying there is a 7/17 ~ 0.40 (moderate) probability that a real-world network is random (rather than core-heavy or peripheral-heavy) if its λsp(k) value is less than or equal to 1.25. These are important observations and valuable insights on the λsp(k) metric vis-a-vis the categorization of real-world networks as random versus core-heavy or peripheral-heavy that have been hitherto not reported in the literature on complex network analysis.
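The λsp(k) metric itself can be computed from an edge list alone. The following is a minimal sketch, assuming the usual definition of the spectral radius ratio for node degree as the largest eigenvalue of the adjacency matrix divided by the average node degree (the definition is not restated in this section, so it is treated here as an assumption). The function name `spectral_radius_ratio` and the use of shifted power iteration are illustrative choices, not the procedure used to generate Fig. 11.

```python
import math


def spectral_radius_ratio(edges, iterations=500):
    """Estimate (largest adjacency eigenvalue) / (average degree).

    Assumes an undirected simple graph given as a non-empty list of (u, v) pairs.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Power iteration on (A + I): the identity shift keeps the iteration
    # from oscillating on bipartite graphs, whose spectrum is symmetric.
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}

    # x has unit length, so the Rayleigh quotient x^T A x estimates
    # the spectral radius of the (unshifted) adjacency matrix A.
    spectral_radius = sum(x[v] * sum(x[w] for w in adj[v]) for v in adj)
    average_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    return spectral_radius / average_degree


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # regular graph: no degree variation
    star = [(0, 1), (0, 2), (0, 3)]            # one hub: some degree variation
    print(spectral_radius_ratio(cycle), spectral_radius_ratio(star))
```

A regular graph yields a ratio of about 1.0, and the ratio grows as the variation in node degree increases, which is the property the cut-offs of 1.10, 1.25 and 2.0 discussed above rely on.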
Network domain versus core-heavy or peripheral-heavy categorization of real-world networks
We observe the social networks (with a 12/18 ~ 2/3 probability) to be predominantly core-heavy, also justified by their mean λsp(k) value of 1.46. The literature networks (8/14 ~ 0.57) and friendship networks (6/11 ~ 0.54) also exhibit a more than 50% chance to be core-heavy. The co-appearance networks (mainly involving networks that capture the co-appearance of characters in novels) are more likely to be peripheral-heavy (4/6 ~ 0.67). On the other hand, we observe an almost equal (50%) chance for the biological networks (7/15), employment networks (4/8) and transportation networks (2/4) to be either core-heavy or peripheral-heavy. Figure 12 presents a tabular summarization of the number of core-heavy, peripheral-heavy and random real-world networks observed across the different domains. Coinciding with our earlier observation that 47/80 ~ 60% of the real-world networks reported #p-levels values of either 1 or 2, the number of real-world networks with #p-levels ≤ 2 for each of the network domains was observed to be at least half the number of real-world networks under these domains. However, an interesting observation is that 16 of the 18 social networks were observed to incur #p-levels values ≤ 2. In other words, if a real-world network is a social network, there is about a 0.90 probability (16/18) that the #p-levels in this network will be ≤ 2, also vindicating our earlier observation that social networks are predominantly core-heavy.
Computational-time complexity for the LCC-based iterative peeling strategy
The LCC (local clustering coefficient) metric used in our research as the basis for identifying layers of peripheral nodes and the inner single layer of core nodes is a computationally-light metric that can even be computed asynchronously (as well as in parallel for all the nodes) using a node’s two-hop neighborhood information. If \(\langle k \rangle = 2m/n\) is the average degree of n nodes in a network of m links, the LCC value for a node can be computed in \(O(\langle k \rangle^{2}) = O(m^{2}/n^{2})\) time by enumerating all the possible pairs of neighbors of a node and checking the adjacency matrix for the presence of an edge between the nodes forming the pairs. If executed sequentially for all the nodes, we can determine the LCC values for all the nodes in the network in \(O((m^{2}/n^{2})*n) = O(m^{2}/n)\) time. It would take O(m + n) time to adjust the adjacency lists of the vertices to account for the removal of the peripheral nodes and their incident links at a particular p-level. The number of iterations of the iterative peeling strategy depends on the #p-levels for a network: for random networks, we just need one iteration of the algorithm; for core-heavy and peripheral-heavy networks, we need 1 + #p-levels iterations, leading to an overall time complexity of \(O(\#p\text{-}levels * (m^{2}/n + m + n))\) of the algorithm for any network. For 57 of the 80 real-world networks analyzed in this section, we observed the #p-levels value to be at most 3 and overall at most 15 for any of the 80 real-world networks. The range of values {1,…, 15} for the #p-levels is almost a constant compared to the number of nodes and links in the networks. Hence, the overall time complexity of the LCC-based iterative peeling strategy can be approximated to simply \(O(m^{2}/n + m + n)\).
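The per-iteration cost discussed above can be seen in a minimal sketch of the peeling loop: each pass checks every pair of neighbors of each residual node, removes the nodes whose LCC is 1.0, and updates the adjacency lists. The names `has_lcc_one` and `lcc_peel` are hypothetical, and the treatment of degree-1 nodes as having an LCC of 1.0 is an assumption of this sketch (the convention is not restated in this section); degree-0 residual nodes are never peeled, per the stopping rule of the strategy.

```python
from itertools import combinations


def has_lcc_one(adj, v):
    """True if every pair of neighbors of v is directly connected."""
    nbrs = adj[v]
    if len(nbrs) == 0:
        return False          # degree-0 residual nodes are left in the core
    if len(nbrs) == 1:
        return True           # ASSUMPTION: a degree-1 node counts as LCC = 1.0
    # O(k^2) pair enumeration, as in the complexity analysis above.
    return all(b in adj[a] for a, b in combinations(nbrs, 2))


def lcc_peel(edges):
    """Return (list of peripheral layers, set of core nodes) for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    p_levels = []
    while True:
        # LCC is re-evaluated on the residual graph in every iteration.
        layer = {v for v in adj if has_lcc_one(adj, v)}
        if not layer:
            break             # all residual nodes have degree 0 or LCC < 1.0
        p_levels.append(layer)
        for v in layer:
            for w in adj.pop(v):
                if w in adj:  # w may already have been removed in this layer
                    adj[w].discard(v)
    return p_levels, set(adj)


if __name__ == "__main__":
    # Triangle {0, 1, 2} with a pendant node 3 attached to node 0.
    layers, core = lcc_peel([(0, 1), (1, 2), (0, 2), (0, 3)])
    print(len(layers), layers, core)
```

A graph for which the function returns an empty list of layers corresponds to the #p-levels = 0 case, which is the case classified as random in this paper.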
Core–periphery visualization for some benchmark networks
Several real-world networks (among the 80 networks) used in our analysis are benchmark networks analyzed in the literature on complex network analysis. Figure 13 presents the Yifan-Hu proportional layout (https://gephi.org/tutorials/gephi-tutorial-layouts.pdf)-based visualization (the blue-colored nodes are the core nodes and the red-colored nodes are the peripheral nodes) of the LCC-based core–periphery structure determined for eight of these benchmark networks: the Band Jazz network (Geiser and Danon 2003) (#62 in Fig. 9), the Karate network (Zachary 1977) (#63 in Fig. 9), the US Airports ‘97 network (Batagelj and Mrvar 2006) (#72 in Fig. 9) and the Celegans metabolic network (Domenico et al. 2015) (#2 in Fig. 8) for core-heavy networks, and the Copperfield network (Newman 2006) (#19 in Fig. 8), the Anna Karenina network (Knuth 1993) (#21 in Fig. 8), the CS Department Aarhus network (Magnani et al. 1303) (#26 in Fig. 8) and the London transportation network (Domenico et al. 2014) (#75 in Fig. 9) for peripheral-heavy networks.
Like nodes 3 and 6 in the toy example graph of Fig. 2 (that are part of a clique, but found to incur a LCC of 1.0, thus classified as peripheral nodes at the outermost 1-peripheral layer, p-level = 1), we notice several nodes in the four peripheral-heavy networks shown in Fig. 13 that are part of a clique, but classified as peripheral nodes. The presence of such peripheral nodes with a larger degree (as they are part of a clique) contributes to the larger fraction of links between core-peripheral nodes and between peripheral-peripheral nodes compared to core-core nodes in peripheral-heavy networks. Though there are a significantly larger number of nodes at p-level = 1 in the peripheral-heavy networks, we can also notice the presence of several red-colored peripheral nodes in the midst of the blue-colored core nodes, especially in the Anna Karenina network. These are the peripheral nodes at p-levels of 2 and 3 that are typically of larger degree and more likely to be connected to several core nodes compared to the peripheral nodes at p-level of 1. We also notice that in networks with multiple p-levels (like the US Airports '97 network whose #p-levels is 3), some peripheral nodes could be connected only to another peripheral node (and not to any core node).
Modeling and predicting the core-heaviness or peripheral-heaviness of a scale-free network
After analyzing the results and the visualizations presented in Fig. 13, an intuitive reasoning that we get is that networks with a relatively larger number of nodes with a LCC of 1.0 to begin with (i.e., at p-level = 1) and with a relatively lower density (fraction of links = actual number of links / maximum possible number of links; the maximum possible number of links for a network of n nodes = n(n − 1)/2) are more likely to be peripheral-heavy; on the other hand, networks with a relatively lower fraction of nodes with an LCC of 1.0 to begin with (i.e., at p-level = 1) and those with a relatively larger density are more likely to be core-heavy.
Logistic regression model
We now seek to build a logistic regression (Hosmer et al. 2013)-based binary classification model that would use the initial fraction of nodes with an LCC of 1.0 (denoted as \(f_{nodes}^{LCC = 1.0}\)) and the fraction of links (denoted as \(f_{links}\)) as the two independent variables to predict whether a scale-free network is core-heavy (referred to as class-1) or peripheral-heavy (referred to as class-0). The hypothesis function for the logistic regression model is \(h_{\Theta } (X) = \frac{1}{{1 + e^{ - \Theta X} }}\), where \(\Theta X = \Theta_{0} + \Theta_{1} *f_{nodes}^{LCC = 1.0} + \Theta_{2} *f_{links}\). We use the entire dataset of 72 records (X corresponds to any record with values for the features \(f_{nodes}^{LCC = 1.0}\) and \(f_{links}\)) relevant to the scale-free real-world networks (44 of which are core-heavy and 28 are peripheral-heavy) to train our model and then use the same as well as the two toy example graphs of “LCC-based Iterative Peeling Strategy” Section for testing the prediction accuracy of the model. Note that per the logistic regression model, the value of the hypothesis function \(h_{\Theta } (X)\) for a test record X is the probability with which the real-world network corresponding to record X belongs to class-1 (i.e., core-heavy networks).
We use the Gradient-Descent algorithm (Hosmer et al. 2013) to train the model per the cost function for the logistic regression classifier. The values of the parameters \(\Theta_{0}\), \(\Theta_{1}\) and \(\Theta_{2}\) determined are respectively 7.8271, −18.5466 and −4.3851. Figure 14 presents the distribution of the \(f_{nodes}^{LCC = 1.0}\) versus \(f_{links}\) values and the decision boundary given by the equation \(\Theta X = 7.8271 - 18.5466*f_{nodes}^{LCC = 1.0} - 4.3851*f_{links} = 0\). The decision boundary line cuts the \(f_{nodes}^{LCC = 1.0}\) axis at the coordinate (0.4220, 0) and the \(f_{links}\) axis at the coordinate (0, 1.7849). We get these two coordinates by setting \(f_{links}\) = 0 in the equation \(\Theta X = 7.8271 - 18.5466*f_{nodes}^{LCC = 1.0} - 4.3851*f_{links} = 0\) and solving for \(f_{nodes}^{LCC = 1.0}\) (i.e., 7.8271/18.5466) as well as by setting \(f_{nodes}^{LCC = 1.0}\) = 0 in the equation and solving for \(f_{links}\) (i.e., 7.8271/4.3851). Since we associate the core-heavy networks to class-1 (and the peripheral-heavy networks to class-0), for any test record X, if ΘX ≥ 0 => \(h_{\Theta } (X)\) ≥ 0.50, the corresponding real-world network is predicted to be a core-heavy network and if ΘX < 0 => \(h_{\Theta } (X)\) < 0.50, the corresponding real-world network is predicted to be a peripheral-heavy network. Note that we break the tie (when ΘX = 0 => \(h_{\Theta } (X)\) = 0.50) in favor of the core-heavy networks, as they are in majority in the training dataset (44 core-heavy and 28 peripheral-heavy). Now to decide the region that represents the class-1 core-heavy networks (and accordingly the class-0 peripheral-heavy networks) in the \(f_{nodes}^{LCC = 1.0}\) versus \(f_{links}\) plot vis-a-vis the decision boundary, we set \(\Theta X = 7.8271 - 18.5466*f_{nodes}^{LCC = 1.0} - 4.3851*f_{links} \ge 0\) (corresponding to class-1) and observe this is equivalent to \(18.5466*f_{nodes}^{LCC = 1.0} + 4.3851*f_{links} \le 7.8271\).
Thus, the region below the decision boundary represents class-1: core-heavy networks and the region above the decision boundary represents class-0: peripheral-heavy networks.
To evaluate the prediction accuracy of the above logistic regression model, we substituted the actual values for \(f_{nodes}^{LCC = 1.0}\) and \(f_{links}\) incurred for each of the 72 scale-free real-world networks in the equation \(\Theta X = 7.8271 - 18.5466*f_{nodes}^{LCC = 1.0} - 4.3851*f_{links}\), and thereby computed the value of the hypothesis function \(h_{\Theta } (X)\). If the \(h_{\Theta } (X)\) value for a real-world network came out to be ≥ 0.50, the network is predicted to be a core-heavy network and if the \(h_{\Theta } (X)\) value is less than 0.50, the network is predicted to be a peripheral-heavy network. We observed the predictions (see Fig. 15) to be accurate for 67 of the 72 real-world networks: the five real-world networks (highlighted in Fig. 15) for which the predictions were not correct are the net #s 4, 6 (both classified as core-heavy in Fig. 5, but predicted to be peripheral-heavy in Fig. 15) and net #s 36, 37 and 49 (all the three are classified as peripheral-heavy in Figs. 5 and 6, but predicted to be core-heavy in Fig. 15). The prediction accuracy of the (\(f_{nodes}^{LCC = 1.0}\), \(f_{links}\))-based logistic regression model to predict whether a scale-free real-world network is core-heavy or peripheral-heavy is 67/72 ~ 0.93. We thus show that the fraction of the number of nodes with an LCC of 1.0 and the fraction of links (network density) could be used to build a binary classifier that would predict whether the network is core-heavy or peripheral-heavy with a very high accuracy. This is another significant contribution of this paper that has not been hitherto reported in the literature. 
Since we have used a diverse suite of 72 real-world networks to build the logistic regression model \(18.5466*f_{nodes}^{LCC = 1.0} + 4.3851*f_{links} \le 7.8271\) for core-heavy networks (otherwise, for peripheral-heavy networks), anyone with the (\(f_{nodes}^{LCC = 1.0}\), \(f_{links}\)) values for a real-world network could substitute these values in the above model and predict whether the real-world network is core-heavy or peripheral-heavy with a significant accuracy.
Testing with the toy example graphs
We now test the logistic regression model on the two toy example graphs of Figs. 2 and 3. For the example graph in Fig. 2: \(f_{nodes}^{LCC = 1.0}\) = 4/10 = 0.4 and \(f_{links}\) = 16/(10*9/2) = 0.3556. Substituting these two values in the logistic regression model decision boundary equation \(\Theta X = 7.8271 - 18.5466*f_{nodes}^{LCC = 1.0} - 4.3851*f_{links}\), we get ΘX = −1.1507 => \(h_{\Theta } (X)\) = 0.2403 < 0.50, leading to the prediction that the graph of Fig. 2 belongs to class-0 (i.e., peripheral-heavy), which is indeed correct per our analysis shown in “LCC-based Iterative Peeling Strategy” Section. Likewise, for the example graph in Fig. 3: \(f_{nodes}^{LCC = 1.0}\) = 3/10 = 0.3 and \(f_{links}\) = 15/(10*9/2) = 0.3333. Substituting these two values in the decision boundary for the logistic regression model, we get ΘX = 0.8016 => \(h_{\Theta } (X)\) = 0.6903 > 0.50, leading to the prediction that the graph of Fig. 3 belongs to class-1 (i.e., core-heavy), which is indeed correct as well per our analysis shown in “LCC-based Iterative Peeling Strategy” Section.
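The substitution carried out for the two toy graphs can be sketched in a few lines. The parameter values are the ones reported in this paper; the function names `theta_x` and `predict` are hypothetical, and unrounded fractions are used for \(f_{links}\), so the outputs should agree with the values reported above only up to rounding in the third or fourth decimal place.

```python
import math

# Parameters reported for the trained model (Theta_0, Theta_1, Theta_2).
THETA_0, THETA_1, THETA_2 = 7.8271, -18.5466, -4.3851


def theta_x(f_nodes_lcc1, f_links):
    """Linear term of the model for one network record."""
    return THETA_0 + THETA_1 * f_nodes_lcc1 + THETA_2 * f_links


def predict(f_nodes_lcc1, f_links):
    """Return (Theta X, h, label); ties at h = 0.50 go to core-heavy (class-1)."""
    z = theta_x(f_nodes_lcc1, f_links)
    h = 1.0 / (1.0 + math.exp(-z))
    label = "core-heavy" if h >= 0.5 else "peripheral-heavy"
    return z, h, label


if __name__ == "__main__":
    # Toy graph of Fig. 2: 4 of 10 nodes with LCC = 1.0, 16 of 45 possible links.
    print(predict(4 / 10, 16 / 45))
    # Toy graph of Fig. 3: 3 of 10 nodes with LCC = 1.0, 15 of 45 possible links.
    print(predict(3 / 10, 15 / 45))
```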
Application of the k-shell decomposition method on real-world networks
We now present the results of the application of the k-shell decomposition method on the suite of 80 real-world networks and compare the results with those obtained for the LCC-based iterative peeling strategy (see “Application of the LCC-based Iterative Peeling Strategy on Real-World Networks” Section). Figure 16 presents the # shells observed for each of the 80 real-world networks and the classification of the networks as core-heavy or peripheral-heavy per the fraction of nodes (core nodes) in the shell corresponding to the largest value of k vis-a-vis the fraction of nodes (peripheral nodes) in the rest of the shells obtained during the decomposition. Figure 16 also presents the classification of the real-world networks per the LCC-based iterative peeling strategy as well as the core/periphery class similarity index (CP Similarity Index: CPSI) between the two methods. The net #s listed in Fig. 16 correspond to the net #s listed in Figs. 5 and 6. In Fig. 16, we have also provided a heat map visualization coloring for each of the three columns: the fraction of core nodes and the number of shells identified by the k-shell decomposition method as well as the CP similarity index values between the k-shell decomposition and LCC-based iterative peeling methods.
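For reference, the k-shell side of this comparison can be sketched as follows. This is the standard degree-based peeling procedure, not the code used to produce Fig. 16; the names `k_shell_indices` and `classify_by_k_shell` are hypothetical, and the 0.50 threshold on the fraction of nodes in the highest shell is assumed to mirror the core-heavy criterion used for the LCC-based strategy.

```python
def k_shell_indices(edges):
    """Return {node: shell index} for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    shell = {}
    k = 0
    while adj:
        # The current shell index never decreases as nodes are peeled.
        k = max(k, min(len(nbrs) for nbrs in adj.values()))
        queue = [v for v in adj if len(adj[v]) <= k]
        while queue:
            v = queue.pop()
            if v not in adj:
                continue                  # already removed via a duplicate entry
            shell[v] = k
            for w in adj.pop(v):
                adj[w].discard(v)
                if len(adj[w]) <= k:
                    queue.append(w)       # w now falls into the same shell
    return shell


def classify_by_k_shell(edges):
    """Label a network by the fraction of nodes in the highest shell (assumed cut-off 0.50)."""
    shell = k_shell_indices(edges)
    top = max(shell.values())
    core_fraction = sum(1 for s in shell.values() if s == top) / len(shell)
    return "core-heavy" if core_fraction >= 0.5 else "peripheral-heavy"


if __name__ == "__main__":
    # A 4-clique {0, 1, 2, 3} with six pendant nodes attached to node 0.
    clique = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    pendants = [(0, p) for p in range(4, 10)]
    print(k_shell_indices(clique + pendants), classify_by_k_shell(clique + pendants))
```

Because the shell index depends only on degree, every member of a small dense clique lands in the highest shell regardless of where the clique sits in the network, which is the behavior contrasted with the LCC-based strategy in the discussion that follows.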
Of the 80 real-world networks, we observe the k-shell decomposition method to categorize only 22 networks as core-heavy and the remaining 58 networks are categorized as peripheral-heavy. The k-shell decomposition method does not have the capability to identify random networks; all the eight networks that were classified random per the LCC-based iterative peeling approach are categorized as core-heavy by the k-shell decomposition method. Of the remaining 22 − 8 = 14 core-heavy networks categorized by the k-shell decomposition method, ten networks are also classified as core-heavy by the LCC-based iterative peeling strategy, with a median CP similarity index of 0.74. Thus, only four of the 22 core-heavy networks categorized by the k-shell decomposition method were categorized as peripheral-heavy by the LCC-based iterative peeling strategy (the CP similarity index values for these four networks are 0, 0.03, 0.16 and 0.68). Of the 58 peripheral-heavy networks categorized by the k-shell decomposition method, 34 networks were categorized as core-heavy by the LCC-based iterative peeling strategy, and the median of the CP similarity index values for these networks is 0.41. The remaining 58 − 34 = 24 real-world networks were categorized as peripheral-heavy by both the methods, and the median of the CP similarity index values of these 24 networks is 0.79. Overall, of the 80 real-world networks, 10 (core-heavy) + 24 (peripheral-heavy) = 34 real-world networks were similarly categorized by both the k-shell decomposition and the LCC-based iterative peeling strategies, and the median of the CP similarity index values of these 34 real-world networks is 0.79, indicating a higher similarity in the classes for the individual nodes as well.
For the remaining 80 (total # networks) − 8 (random networks) − 34 (similarly categorized networks) = 38 real-world networks that were not similarly categorized by the two methods, the median of the CP similarity index values is 0.41, indicating only a low-moderate similarity in the classes for the nodes.
The median for the # shells identified by the k-shell decomposition method in the 22 core-heavy networks is 3.5 and in the 58 peripheral-heavy networks is 7.0. This is consistent with the observation of a relatively larger number of p-levels by the LCC-based iterative peeling strategy for its 28 peripheral-heavy networks vis-a-vis the 44 core-heavy networks. Thus, if the hierarchy of a real-world network is deeper, the real-world network is more likely to be categorized as peripheral-heavy by both the k-shell decomposition and LCC-based iterative peeling approaches; likewise, networks with a shallow hierarchy are more likely to be categorized as core-heavy by both the approaches. On the other hand, the median λsp(k) values for the core-heavy and peripheral-heavy real-world networks categorized by the k-shell decomposition method are respectively 1.24 and 1.99, appreciably lower than those (1.54 and 2.99) observed for the LCC-based iterative peeling strategy. This implies that scale-free networks with moderate λsp(k) values (say, less than 2.0) are more likely to be categorized as core-heavy by the LCC-based iterative peeling strategy and as peripheral-heavy by the k-shell decomposition method.
Figure 17 presents a visualization (Yifan-Hu proportional layout) of the k-shell decomposition-based core–periphery structures of the same eight real-world networks that were earlier visualized in Fig. 13 (per the core–periphery structures obtained with the LCC-based iterative peeling strategy). In Fig. 17, the CP similarity index (shown as CPSI) values for the real-world networks between the k-shell decomposition and the LCC-based iterative peeling methods are listed alongside as well. In Fig. 17: depicting the core/peripheral nodes as identified by the k-shell decomposition method, we can observe low-moderate CPSI values of 0.24, 0.40, 0.57 and 0.59 for the real-world networks categorized as core-heavy by the LCC-based strategy in Fig. 13; on the other hand, we observe relatively larger CPSI values of 0.63, 0.76, 0.80 and 0.97 for networks categorized as peripheral-heavy by the LCC-based strategy.
A visual comparison of the core–periphery classification of the nodes (core nodes are in blue color and peripheral nodes are in red color) in Figs. 13 and 17 reveals that the core nodes identified by the k-shell decomposition approach are mostly a part of the dense block of core nodes identified by the LCC-based iterative peeling strategy, and that the peripheral nodes identified by the LCC-based iterative peeling strategy are typically a subset of the peripheral nodes identified by the k-shell decomposition method. A visual analysis of the eight benchmark real-world networks as well as the moderate-high median values for the CP similarity index between the two methods (presented in Fig. 16) clearly indicate that the k-shell decomposition method basically underestimates the number of core nodes and overestimates the number of peripheral nodes, especially for networks categorized as core-heavy by the LCC-based iterative peeling strategy. This could be attributed to the over-reliance of the k-shell decomposition method on the degree centrality metric for iteratively peeling the network rather than taking into consideration the betweenness and closeness of the nodes to the rest of the nodes in the network (which formed the motivation for the proposal of the LCC-based iterative peeling strategy).
Net # 62 Band Jazz network is an ideal case for showcasing a potential weakness associated with the k-shell decomposition method. A relatively smaller clique of high-degree nodes that is not in between a majority of node pairs as well as not closer to a majority of the nodes gets classified as the core nodes of the network per the k-shell decomposition method that also clubs the rest of the nodes (the peripheral nodes) across 20 shells, leading to a total of 21 shells and the network being categorized as peripheral-heavy. On the other hand, we observe the LCC-based iterative peeling strategy (see the results in Fig. 6) to categorize the Band Jazz network (also see the visualization in Fig. 13) as a core-heavy network, with just 2 p-levels. The Yifan Hu proportional layout of the Band Jazz network (featuring a larger/dense connected component of nodes) presented in Figs. 13 and 17 vindicate the articulation and accuracy of the LCC-based iterative peeling strategy in extracting core–periphery structures.
Related work
Core–periphery structure analysis is done at the intermediate-scale (neither at the network-level nor at the node-level), referred to as mesoscale-level (Yanchenko and Sengupta 2023) (i.e., groupings of nodes to categories, in this case: core and peripheral). A related mesoscale structure in network analysis is the notion of “communities” (also referred to as clusters) wherein nodes within a community (Newman 2010) are expected to be more densely connected and nodes across two communities are sparsely connected. Community detection algorithms (e.g., (Girvan and Newman 2002)) typically focus on determining modular communities (that possess a larger intra-cluster density and lower inter-cluster density). However, the objective to determine modular communities leads to the detection of non-overlapping communities (i.e., a node can be part of only one community). On the contrary, it has been observed in Yang and Leskovec (2014) that an objective to maximize the participation of nodes in different communities (per the community-affiliation graph model (Yang and Leskovec 2012)) leads to the detection of overlapping communities wherein a larger density was observed among the nodes in the overlapping segments compared to the density among nodes that are within just a single community. The authors of Yang and Leskovec (2014) advocated that such overlapping communities could be the precursor to detect core–periphery structures (nodes in the overlapping segments of the communities could form the core and those that are in just a single community could form the peripheral nodes). Community detection algorithms require global knowledge of the network for their execution and even the best known computationally-efficient community detection algorithm (the Louvain algorithm (Blondel et al. 2008), implemented in Gephi (2022)) is of O(nlogn) time complexity, where n is the number of nodes in the network.
On the other hand, the LCC metric used in our research as the basis for identifying layers of peripheral nodes and the inner single layer of core nodes is a computationally-light metric that can be computed asynchronously (as well as in parallel for all the nodes) just using a node's two-hop neighborhood information.
In Gallagher et al. (2021), the authors proposed a stochastic block model for core–periphery structure analysis in undirected networks (a version for directed networks is proposed in Elliott et al. (2020)), wherein they associate probabilities pCC, pCP and pPP respectively to links between two core nodes, between a core node and a peripheral node and between two peripheral nodes such that it is expected that the determined core–periphery structure by any algorithm satisfies the relation: pCC > pCP > pPP. In Figs. 5 and 6, we observe the above relation holds mainly for core-heavy networks and need not be necessarily true for peripheral-heavy networks. For a majority of the peripheral-heavy networks, we observe pCP (the probability for a link between a core node and a peripheral node) to be the largest of all the three probabilities; for 5 of the 28 peripheral-heavy real-world networks, we even observed pPP (the probability for a link between two peripheral nodes) to be the largest. In Yanchenko and Sengupta (2023), the authors claim that a network could only have either a community structure or a core–periphery structure, not both. They claim that a network with a community structure would rather have: pCC > pPP > pCP in order to maximize the modularity scores of the communities. We see this claim as an opportunity to further investigate the difference in the effectiveness of the community detection algorithms when run on core-heavy networks vis-a-vis peripheral-heavy networks (in other words, would peripheral-heaviness of a network make a community detection algorithm vulnerable in its performance vis-a-vis a core-heavy network?).
A dominating set (Cormen et al. 2022) of a network is a subset of the nodes such that any node in the network is either in the dominating set or is directly connected to at least one node in the dominating set. The problem of determining a minimum dominating set is an NP-hard problem. In Papachristou (2021), the authors explored the connection between dominating sets and core–periphery structures and proposed that the core nodes of a network could be identified based on the dominating set of a network and the peripheral nodes could be nodes that are not in the dominating set. A potential problem that exists with building dominating set-based core–periphery structures is that the core nodes (nodes constituting the dominating set) need not form a connected backbone and there could exist multiple cores (especially, in larger networks). It would indeed be better to determine a minimum connected dominating set (Cormen et al. 2022; Meghanathan 2020) (MCDS; another NP-hard problem) of a network and associate the MCDS nodes to be part of the core. We plan to explore the connection between MCDS and core–periphery structure as part of future work.
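The connectivity concern raised above can be illustrated with a small sketch. It uses a simple greedy heuristic (repeatedly pick the node that covers the most uncovered nodes), which is not guaranteed to return a minimum dominating set and is not the method of Papachristou (2021); the function names and the tie-breaking rule (smallest node id) are assumptions made for illustration.

```python
def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_dominating_set(adj, candidate):
    """Every node is in the candidate set or adjacent to a node in it."""
    return all(v in candidate or adj[v] & candidate for v in adj)


def greedy_dominating_set(adj):
    """Greedy heuristic: not necessarily minimum, ties broken by smallest node id."""
    uncovered = set(adj)
    chosen = set()
    while uncovered:
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & uncovered))
        chosen.add(best)
        uncovered -= {best} | adj[best]
    return chosen


if __name__ == "__main__":
    # Path 0-1-2-3-4-5: the greedy choice dominates every node,
    # yet the chosen "core" nodes are not adjacent to each other.
    adj = build_adjacency([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
    core = greedy_dominating_set(adj)
    print(core, is_dominating_set(adj, core))
```

On this path, the two selected nodes are not neighbors, so a core built from them would not form a connected backbone, which is the motivation for considering a connected dominating set instead.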
The degree centrality metric is the centrality metric (Newman 2010) used to perform the k-shell decomposition (Newman 2010; Bollobás 1998), a well-known method in the literature to identify layers of core nodes in a network. The k-shell decomposition assigns the shell index of a node to be the largest value of k for which the node would be part of a k-connected sub graph (in which each node will have at least a degree of k) and the node cannot be part of any (k + 1)-connected sub graphs. However (for example, see the toy example graph-b in Fig. 1), the k-shell decomposition method could assign a high shell index to the peripheral nodes (nodes 3 and 6) with a larger degree and a lower shell index to bridge nodes like node 2 (that are in the center of the network) with a relatively lower degree. The shell index for a node assigned through the k-shell decomposition method is agnostic to the contribution of the node in shortest path communication among the rest of the nodes in the network as well as unaware of the closeness of the node to the rest of the nodes in the network. The LCC (local clustering coefficient) metric captures the contribution of a node in shortest path communication: the LCC value of a node (especially, if it is of larger degree) could be low if the neighbors of the node are not connected to each other and need to go through the node for shortest path communication and likewise, the LCC value for a node would be high if the neighbors of the node are directly connected to each other and do not require the node for shortest path communication. We opine that it is imperative to consider the contribution of nodes for shortest path communication (rather than just node degree) while categorizing nodes as core or peripheral. Though Cucuringu et al. (2016) also advocate the need to consider core nodes on the basis of their participation in shortest paths, they suggest the computation of betweenness centrality (BWC)-style scores for the nodes to quantify the extent to which a node could serve as the core node. However, BWC-style metrics require global knowledge for computation as well as need to be computed synchronously: i.e., for all the nodes, even if one is interested in the score only for a particular node.
In a recent work (Polanco and Newman 2023), the authors advocated for a multi-core, multi-peripheral structure for a network that could be either part of a hierarchy or just as peers to each other. The authors also identify a kind of "inside out" structure in certain real-world networks wherein the core nodes are strongly connected amongst themselves as well as strongly connected to the nodes in the periphery. However, irrespective of their location in the network, the core nodes in Polanco and Newman (2023) are still nodes with a larger degree and the peripheral nodes are nodes with a lower degree. In this context, they even consider the US Football network (Girvan and Newman 2002) (net # 80 in Fig. 6, categorized as a random network in our LCC-based approach) to comprise multiple cores (each core is a clique or a highly connected sub graph) that are primarily located in the periphery of the network and the relatively low degree nodes in the center to be the peripheral nodes that connect the multiple cores. We opine that random networks with a narrow Poisson-style degree distribution (i.e., with a very low variation in node degree) should not be considered for core–periphery structural analysis.
A notion of neighborhood-based bridge node centrality (NBNC) tuple was recently proposed in Meghanathan (2021) to quantify and rank the extent to which nodes could play the role of bridge nodes. A node is a bridge node (Meghanathan 2021) if, upon its removal, its neighborhood graph (a sub graph featuring the neighbors of the node and the edges connecting these neighbors) either gets disconnected or remains only sparsely connected. Nodes whose neighborhood graph gets disconnected into multiple components upon their removal are highly rated as bridge nodes. The bridge nodes could indeed be the nodes connecting the densely connected communities of core nodes to the peripheral nodes. For example, consider node 5 in Fig. 3: it is the top-ranked bridge node per the NBNC tuple, as its removal would disconnect its 5-node neighborhood graph comprising nodes {1, 6, 7, 9, 10} into three components. Node 5 thus connects the component (the core) {1, 2, 3, 5, 6, 7, 8} to the peripheral nodes {9} and {10}, which would be in their own components upon its removal.
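The component count that the above description relies on can be sketched in a few lines of Python. This is only an illustration of one ingredient (the number of components of a node's neighborhood graph) and not the full NBNC tuple of Meghanathan (2021); the function name `neighborhood_components` and the toy graph are hypothetical, and the graph is not the one shown in Fig. 3.

```python
def neighborhood_components(adj, v):
    """Connected components of the sub graph induced by the neighbors of v (v itself excluded)."""
    nbrs = set(adj[v])
    unseen = set(nbrs)
    components = []
    while unseen:
        start = unseen.pop()
        comp, stack = {start}, [start]
        while stack:                      # depth-first search restricted to the neighbors of v
            u = stack.pop()
            for w in adj[u] & nbrs:
                if w in unseen:
                    unseen.remove(w)
                    comp.add(w)
                    stack.append(w)
        components.append(comp)
    return components

# Hypothetical toy graph: node 0 has neighbors {1, 2, 3, 4, 5};
# among them only the links 1-2 and 2-3 exist, and nodes 4 and 5 are leaves.
adj = {
    0: {1, 2, 3, 4, 5},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
    4: {0},
    5: {0},
}

comps = neighborhood_components(adj, 0)
print(len(comps), sorted(sorted(c) for c in comps))
```

For node 0 of this toy graph, the neighborhood graph splits into three components ({1, 2, 3}, {4} and {5}), which mirrors the situation described for a highly rated bridge node; for node 2, the neighborhood graph stays in one piece.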
Conclusions and future work
The high-level contribution of this research is the proposal to use the local clustering coefficient (LCC) metric as the basis to iteratively extract peripheral layers of nodes in a network and thereby the inner core as well. We show that the LCC-based approach would overcome the weaknesses (such as classifying a high-degree node or clique far away from the rest of the nodes as a core and classifying a low-degree bridge node at the center of the network as a peripheral node) associated with classical approaches (such as k-shell decomposition) for core–periphery analysis. We applied the proposed LCC-based iterative peeling strategy on a suite of 80 real-world networks of diverse domains and observed that scale-free real-world networks could actually be core-heavy (fraction of core nodes ≥ 0.50) or peripheral-heavy (fraction of peripheral nodes, considered together across all the peripheral layers > 0.50), and the general expectation that the fraction of core-core links be the largest need not be satisfied by the peripheral-heavy networks (for which the fraction of core–periphery links was observed to be the largest in several cases). We observed the core-heavy networks to be not as deep as the peripheral-heavy networks with respect to the number of peripheral-levels. Nevertheless, the number of peripheral-levels for any of the 80 real-world networks analyzed is significantly lower than the number of nodes in these networks and could be assumed a constant with respect to the time complexity for the iterative peeling strategy (each iteration corresponds to a peripheral level). We observe the core-heavy networks to exhibit a relatively lower variation in node degree compared to peripheral-heavy networks. 
We also show that the terminating condition of the LCC-based iterative peeling approach (we stop when all the residual nodes have a degree of 0 or none of the residual nodes of non-zero degree have a LCC of 1.0) could be effective in identifying random networks (for which we stop with just one iteration, as none of the nodes in random networks were observed to have a LCC of 1.0) in which all nodes are peers with significantly less variation in node degree. Using the results obtained for the 72 real-world networks, we also built a logistic regression-based binary classification model that could take the tuple (fraction of nodes with LCC of 1.0 in a network, fraction of links representing network density) for a real-world network as input and predict whether the network is core-heavy or peripheral-heavy; the prediction accuracy was observed to be 93%.
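The iterative peeling and its terminating condition, as summarized above, can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation used for the reported results: the names `has_lcc_one` and `peel` are hypothetical, the toy graph is invented for illustration, and a node of degree 1 is treated as having an LCC of 1.0 (it has no pair of unconnected neighbors), which is an assumption of this sketch.

```python
from itertools import combinations

def has_lcc_one(adj, v):
    """True if v has non-zero degree and every pair of its neighbors is connected."""
    nbrs = adj[v]
    # Assumption of this sketch: a degree-1 node vacuously satisfies the test.
    return bool(nbrs) and all(b in adj[a] for a, b in combinations(nbrs, 2))

def peel(adj):
    """Return (peripheral layers from outermost to innermost, residual nodes)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    layers = []
    while True:
        # The LCC test is re-evaluated on the residual graph in every iteration.
        layer = {v for v in adj if has_lcc_one(adj, v)}
        if not layer:
            break          # all residual nodes have degree 0 or an LCC below 1.0
        layers.append(layer)
        for v in layer:
            for u in adj[v]:
                adj[u].discard(v)
        for v in layer:
            del adj[v]
    return layers, set(adj)

# Hypothetical toy graph: a 4-cycle 0-1-2-3, node 4 attached to 0 and 1, leaf 5 attached to 4.
adj = {
    0: {1, 3, 4},
    1: {0, 2, 4},
    2: {1, 3},
    3: {0, 2},
    4: {0, 1, 5},
    5: {4},
}

layers, residual = peel(adj)
print("peripheral layers:", layers)
print("residual nodes:", residual)
```

On this toy graph, the sketch extracts node 5 as the outermost peripheral layer and node 4 as the next layer, and then stops with the residual nodes {0, 1, 2, 3}, none of which has an LCC of 1.0. On a graph in which no node passes the test in the first iteration, the list of layers is empty and all nodes remain as peers.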
As part of future work, we plan to investigate the effectiveness of a MCDS (minimum connected dominating set) for core–periphery structure analysis as well as investigate the effectiveness of community-detection algorithms to determine modular communities in peripheral-heavy networks. We also plan to examine the assortativity (Newman 2002) and algebraic connectivity (Meghanathan 2017a) of core-heavy networks versus peripheral-heavy networks. Also, if one is interested in further decomposing the core nodes identified using the LCC-based iterative peeling strategy, we propose the use of k-shell decomposition on the sub graph of core nodes (featuring the core nodes and the core-core links connecting these nodes). Since the LCC-based iterative peeling approach is used to identify such a sub graph of the core nodes, it is not likely to be a remote clique that would be subjected to k-shell decomposition (a vulnerability possible with the use of k-shell decomposition on the original network to identify the core nodes). As part of future work, we plan to explore the above idea as well as any other possible mechanisms to further decompose the core nodes, especially in the case of networks categorized as "core-heavy" by the LCC-based iterative peeling approach.
Availability of data and materials
No datasets were generated or analysed during the current study.
References
Batagelj V, Mrvar A (2006) Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 10:P10008
Bollobás B (1998) Modern graph theory. Springer
Borgatti SP, Everett MG (2000) Models of core/periphery structures. Soc Netw 21(4):375–395
Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177
Cormen TH, Leiserson CE, Rivest RL, Stein C (2022) Introduction to algorithms, 3rd edn. MIT Press, Cambridge
Cucuringu M, Rombach P, Lee SH, Porter MA (2016) Detection of core–periphery structure in networks using spectral methods and geodesic paths. Eur J Appl Math 27:846–887
De Domenico M, Sole-Ribalta A, Gomez S, Arenas A (2014) Navigability of interconnected networks under random failures. Proc Natl Acad Sci 111:8351–8356
De Domenico M, Nicosia V, Arenas A, Latora V (2015) Structural reducibility of multilayer networks. Nat Commun 6(1):6864
Della Rossa F, Dercole F, Piccardi C (2013) Profiling core–periphery network structure by random walkers. Sci Rep 3(1467):1–8
Elliott A, Chiu A, Bazzi M, Reinert G, Cucuringu M (2020) Core–periphery structure in directed networks. Proc R Soc A: Math Phys Eng Sci 476(2241):1–22
Erdos P, Renyi A (1959) On random graphs I. Publ Math 6(3–4):290–297
Freeman L (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41
Freeman L (1979) Centrality in social networks: conceptual classification. Soc Netw 1(3):215–239
Gallagher RJ, Young J-G, Welles BF (2021) A clarified typology of core–periphery structure in networks. Sci Adv 7(12):eabc9800
Gleiser P, Danon L (2003) Community structure in Jazz. Adv Complex Syst 6(4):563–573
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99:7821–7826
Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, 3rd edn. Wiley
Knuth DE (1993) The Stanford GraphBase: a platform for combinatorial computing, 1st edn. Addison-Wesley, Reading
Kojaku S, Masuda N (2017) Finding multiple core–periphery pairs in networks. Phys Rev E 96:052313
Ma C, Xiang B-B, Chen H-S, Small M, Zhang H-F (2018) Detection of core–periphery structure in networks based on 3-tuple motifs. Chaos 28:053121
Magnani M, Micenkova B, Rossi L (2013) Combinatorial analysis of multiple networks. http://arxiv.org/abs/1303.4986 [cs.SI]
Meghanathan N (2014) Spectral radius as a measure of variation in node degree for complex network graphs. In: Proceedings of the 3rd international conference on digital contents and applications, Hainan, pp 30–33
Meghanathan N (2017a) Link selection strategies based on network analysis to determine stable and energy-efficient data gathering trees for mobile sensor networks. Ad Hoc Netw 62:50–75
Meghanathan N (2017b) A computationally-lightweight and localized centrality metric in lieu of betweenness centrality for complex network analysis. Vietnam J Comput Sci 4(1):23–38
Meghanathan N (2017c) Randomness index for complex network analysis. Soc Netw Anal Min 7(25):1–15
Meghanathan N (2017d) Local clustering coefficient-based assortativity analysis of real-world network graphs. Int J Netw Sci 1(3):187–208
Meghanathan N (2020) A binary search algorithm to determine the minimum transmission range for minimum connected dominating set of a threshold size in Ad hoc networks. Int J Wirel Netw Broadband Technol 9(2):1–16
Meghanathan N (2021) Neighborhood-based bridge node centrality tuple for complex network analysis. Appl Netw Sci 6(47):1–36
Newman MEJ (2002) Assortative mixing in networks. Phys Rev Lett 89:208701
Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
Newman MEJ (2010) Networks: an introduction, 1st edn. Oxford University Press, Oxford
Papachristou M (2021) Sublinear domination and core–periphery networks. Sci Rep 11:15528
Polanco A, Newman MEJ (2023) Hierarchical core–periphery structure in networks. Phys Rev E 108(2):024311
Yanchenko E, Sengupta S (2023) Core–periphery structure in networks: a statistical exposition. Stat Surv 17:42–74
Yang J, Leskovec J (2012) Community-affiliation graph model for overlapping community detection. In: Proceedings of the 12th IEEE international conference on data mining, Brussels, Belgium, pp 1170–1175
Yang J, Leskovec J (2014) Structure and overlaps of ground-truth communities in networks. ACM Trans Intell Syst Technol 5(2):1–35
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Acknowledgements
This work is partly supported by the U.S. National Science Foundation (NSF) grant OAC-1835439, the Army Research Office (ARO) grant W911NF-21-1-0264 and a sub contract received from the University of Virginia titled Global Pervasive Computational Epidemiology, with the National Science Foundation as the primary funding agency. The views and conclusions contained in this paper are those of the author and do not represent the official policies, either expressed or implied, of the funding agencies.
Author information
Contributions
Natarajan Meghanathan is the sole author of this paper and did all the work and writing for this manuscript.
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Meghanathan, N. Local clustering coefficient-based iterative peeling strategy to extract the core and peripheral layers of a network. Appl Netw Sci 9, 49 (2024). https://doi.org/10.1007/s41109-024-00667-7
DOI: https://doi.org/10.1007/s41109-024-00667-7