
Distributed Identification of Central Nodes with Less Communication

Abstract

This paper is concerned with distributed detection of central nodes in complex networks using closeness centrality. Closeness centrality plays an essential role in network analysis. Distributed tasks such as leader election can make effective use of centrality information for highly central nodes, but complete network information is not locally available. Evaluating closeness centrality exactly requires complete knowledge of the network; for large networks, this may be inefficient, so closeness centrality must be approximated. We consider decentralised network view construction in which a node initially has no knowledge about other nodes in the network and there is no central node to coordinate the evaluation of node closeness centrality. Unlike centralised methods for detecting central nodes, decentralised methods require that an approximate view of the network be available at each node; each node can then evaluate its own closeness centrality before sharing it with others when applicable. To the best of our knowledge, little work has been done in this setting. The leading approach runs breadth-first search (Skiena 1998) from each node with a limited number of iterations (less than the diameter of the graph under consideration), as done by You et al. (2017) and Wehmuth and Ziviani (2012), before each node evaluates its centrality. Running breadth-first search from each node in a decentralised fashion incurs a high communication cost. Our contribution is a way of constructing a network view in a decentralised manner with lower communication cost. This paper refines a distributed centrality computation algorithm by You et al. (2017) by pruning nodes which are almost certainly not the most central. For example, in a large network, leaf nodes cannot play a central role. This leads to a reduction in the number of messages exchanged to determine the centrality of the remaining nodes. Our results show that our approach reduces the number of messages for networks which contain many prunable nodes. Our results also show that reducing the number of messages may have a positive impact on running time and memory usage.

Introduction

Centrality metrics play an essential role in network analysis (Lam and Reiser 1979). For some centrality metrics, such as betweenness centrality, evaluating network centrality exactly requires complete knowledge of the network; for large networks, this may be too costly computationally, so approximate methods (i.e. methods for building a view of a network from which to compute centrality) have been proposed. Centrality information for highly central nodes can be used effectively for distributed tasks such as leader election, but distributed nodes do not have complete network information. The relative centrality of nodes is important in problems such as the selection of informative nodes, for example in active sensing (Nelson and MacIver 2006); likewise, the propagation time required to synchronize the nodes of a network can be minimized if the most central node is known (Ramírez and Santoro 1979; Kim and Wu 2013). Here we consider closeness centrality: we wish to detect the most central nodes in the network.

Not all nodes ultimately need to build a view of the network formed by their interaction, since some nodes may realize quickly that they are not suitably central, i.e. they have small closeness centralities. The question is how to identify such insignificant nodes in a decentralised algorithm. This paper tackles this problem by introducing a pruning strategy which reduces the number of messages exchanged between nodes compared to the algorithm of You et al. (2017). When a leader is chosen based on closeness centrality, we observe that in the algorithm of You et al. (2017) even nodes which cannot play a central role, for example leaves, add to the communication load by receiving messages (see Note 1).

This work proposes modifications to You et al.'s decentralised method for constructing a view of a communication graph for distributed computation of node closeness centrality. Since we consider an approximate and decentralised method to construct a view of a communication graph, nodes will inevitably construct different topologies describing their interaction network. We will use the term view as shorthand for view of a communication network.

Our proposed method can be applied to arbitrary distributed networks, and is most likely to be valuable when nodes form very large networks: reducing the number of messages is a more pressing concern in large-scale networks. Such applications include instrumented cars, monitoring systems, mobile sensor networks and general mobile ad-hoc networks. There are many reasons for reducing the number of messages in distributed systems; here, we are motivated by the fact that careful treatment of network communication between agents under weak signal conditions is crucial (Tomic et al. 2012).

Contributions.

We consider decentralised network view construction in which a node initially has no knowledge about other nodes in the network and there is no central node to coordinate the evaluation of node closeness centrality. Unlike centralised methods for detecting central nodes, decentralised methods require that an approximate view of the network be available at each node; each node can then evaluate its own closeness centrality before sharing it with others when applicable. To the best of our knowledge, little work has been done in this setting. The leading approach runs breadth-first search (Skiena 1998) from each node with a limited number of iterations (less than the diameter of the graph under consideration), as done by You et al. (2017) and Wehmuth and Ziviani (2012), before each node evaluates its centrality. Running breadth-first search from each node in a decentralised fashion incurs a high communication cost. Our contribution is a way of constructing a network view in a decentralised manner with lower communication cost. At each iteration of view construction, each node may prune some nodes in its neighbourhood; once a node is pruned, it is excluded, and each node thereafter interacts only with its unpruned neighbours to construct its view. We refer to this approach as pruning. The more prunable nodes a network contains, the better our algorithm performs relative to the algorithm in You et al. (2017) in terms of the number of messages. We empirically evaluate our approach on a number of benchmark networks from Leskovec et al. (2005) and Leskovec et al. (2007), as well as some randomly generated networks, and observe that our method outperforms the benchmark method (You et al. 2017) in terms of the number of messages exchanged during interaction. We also observe a positive impact of pruning on running time and memory usage.

We make the following assumptions as in You et al. (2017):

  • nodes are uniquely identifiable;

  • a node knows identifiers of its neighbours;

  • communication is bidirectional, FIFO and asynchronous;

  • each agent is equipped with its own round counter.

The rest of the paper is organised as follows: Section 2 discusses the state of the art for decentralised computation of closeness centrality distribution of networks. The new algorithm will be discussed in Section 3. Section 4 discusses the results. In Section 5, we conclude and propose further work.

Background and related work

In a distributed system, the process of building a view of the network can be centralised or decentralised. It is centralised when the construction of the view of a network is performed by a single node, known as an initiator. Once the initiator has the view of a network, it may send it to the other nodes (Naz 2017). If the initiator is not known in advance, the first stage of constructing a view of a network will involve a selection of the initiator.

Decentralised methods to construct views can be exact or approximate. Recent literature on methods to construct views can be found in Naz (2017). Exhaustive methods build the complete topology of the communication graph—they involve an all-pairs shortest path computation. As a consequence, exact approaches suffer from problems of scalability (Naz 2017). This can be a particular issue for large networks of nodes with constrained computational power and restricted memory resources.

To overcome this scalability problem of exhaustive methods, approximate methods have been proposed. Unlike exhaustive methods, approximate methods do not yield complete knowledge of the communication graph.

Many distributed methods for view construction are centralised, i.e. they require a single initiator for the process of view construction. The disadvantage of such initiator-based approximate methods is that the structure of the view depends on the choice of the initiator. An interesting distributed approach for view construction can be found in Kim and Wu (2013), where an initiator constructs a tree as the view.

To the best of our knowledge and according to Naz (2017), decentralised approximate methods for view construction are very scarce because many methods for view construction assume some prior information about the network, so centralised methods are more appropriate. The method proposed by You et al. (2017) seems to be the state of the art for decentralised approximate methods for view construction, i.e. views are constructed only from local interactions. This method simply runs breadth-first search (Skiena 1998) on each node.

We next show the decentralised construction of a view using the algorithm in You et al. (2017).

Decentralised view construction

We consider the method proposed in You et al. (2017)—the “YTQ method”—as the state of the art for decentralised construction of a view of a network. The YTQ method was proposed for decentralised approximate computation of centrality measures (closeness, degree and betweenness centralities) of nodes in arbitrary networks. As treated in You et al. (2017), these computations require a limited view. At the end of the interaction between nodes, each node can estimate its centrality based on its own view. In the following, we show how nodes construct views using the YTQ method. Here we consider connected and unweighted graphs, and we are interested in computation of closeness centrality.

Let \(\delta _{ij}\) denote the path distance between the nodes \(v_i\) and \(v_j\) in an (unweighted) graph G with vertex set \({\mathcal {V}}\) and edge set \({\mathcal {E}}\). The path distance between two nodes is the length of the shortest path between these nodes.

Definition 0.1

The closeness centrality (Bavelas 1950) of a node is the reciprocal of the average path distance from the node to all other nodes. Mathematically, the closeness centrality \(c_i\) is given by

$$\begin{aligned} c_i = \frac{|{\mathcal {V}}|-1}{\sum _{j} \delta _{ij}}\,. \end{aligned}$$
(1)

Nodes with high closeness centrality score have short average path distances to all other nodes.
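As a concrete illustration, the following minimal sketch (ours, not code from the paper) evaluates Eq. (1) on a small unweighted graph using NetworkX; for a connected graph it agrees with NetworkX's built-in closeness_centrality.

```python
# Minimal sketch of Eq. (1): closeness centrality of node i in an unweighted,
# connected graph, computed from BFS distances. Illustrative code, not the authors'.
import networkx as nx

def closeness(G, i):
    dist = nx.single_source_shortest_path_length(G, i)  # shortest-path distances from i
    total = sum(d for v, d in dist.items() if v != i)   # sum over j of delta_ij
    return (G.number_of_nodes() - 1) / total if total else 0.0

G = nx.path_graph(5)                                    # toy path graph 0-1-2-3-4
print({i: round(closeness(G, i), 3) for i in G})        # the middle node scores highest
```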

Each node’s view is gradually constructed based on message passing. Each node sends its neighbour information to all of its immediate neighbours which relay it onward through the network. Communication between nodes is asynchronous, i.e. there is no common clock signal between the sender and receiver.

Algorithm 1 Decentralised view construction using the YTQ method (You et al. 2017)

The YTQ method (You et al. 2017) that each node \(v_i\) uses to construct its view is given in Algorithm 1.

Let \({\mathcal {N}}_i\) be the set of neighbours of \(v_i\) and \({\mathcal {N}}_i^{(t)}\) the set of nodes at distance \(t+1\) from a node \(v_i\), so \({\mathcal {N}}^{(0)}_i ={\mathcal {N}}_i\). The initial set of neighbours, \({\mathcal {N}}_i\), is assumed to be known.

During each iteration, each node sends its neighbourhood information to all its immediate neighbours. We are restricted to peer-to-peer communication because nodes have limited communication capacity. Each node waits for communication from all of its direct neighbours after which it updates an internal round counter. A node \(v_i\) stores messages received in a queue, represented by \({\mathcal {M}}_i\). After round \(t\ge 1\), the topology of \(v_i\)’s view is updated as follows

$$\begin{aligned} {\mathcal {N}}^{(t)}_i\leftarrow \bigcup _{v_j \in {\mathcal {N}}_i}{\mathcal {N}}^{(t-1)}_j \setminus {\mathcal {N}}_{i, t}\,, \end{aligned}$$
(2)

where

$$\begin{aligned} {\mathcal {N}}_{i, t}\leftarrow \bigcup _{k=0}^{t-1}{\mathcal {N}}^{(k)}_i\,. \end{aligned}$$
(3)

The algorithm terminates after at most D iterations, where D is an input to the algorithm (see Note 2). In this paper, we consider a pre-set value of D. However, some nodes may reach their equilibrium state before iteration D (e.g. nodes which are more central than others). Such nodes terminate as soon as equilibrium is reached (see Line 57 in Algorithm 1). A node \(v_i\) reaches equilibrium at iteration t when

$$\begin{aligned} {\mathcal {N}}^{(t)}_i = \emptyset . \end{aligned}$$

At the end, every node has a view of the network, and so the required centralities can be calculated locally. This view construction method is approximate when the total number of iterations D is less than the diameter of the graph; otherwise the method is exhaustive, i.e. all views correspond to the exact information, assuming a failure-free scenario. With a decentralised approximate method, nodes may end up with different views. Each node then evaluates its closeness centrality based on its own view of the network.
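To make the update concrete, the sketch below implements our reading of Eqs. (2) and (3) for a single node and round; the function and variable names are illustrative and do not come from the paper's code.

```python
# One round of the local view update at node v_i (our reading of Eqs. (2)-(3)).

def update_round(me, rings, received):
    # rings: [N_i^(0), ..., N_i^(t-1)]; received: {j: N_j^(t-1)} from each direct neighbour
    known = {me}.union(*rings)                        # Eq. (3), plus the node itself
    return set().union(*received.values()) - known    # Eq. (2): nodes first seen this round

# toy run at node 'a' of the path a - b - c - d
rings_a = [{'b'}]                                     # N_a^(0)
ring1 = update_round('a', rings_a, {'b': {'a', 'c'}})
print(ring1)                                          # {'c'}: nodes at distance 2
rings_a.append(ring1)
ring2 = update_round('a', rings_a, {'b': {'d'}})      # b forwards its own latest ring
print(ring2)                                          # {'d'}; equilibrium once a ring is empty
```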

Decentralised view construction with pruning

The idea behind the pruning technique is that, during view construction, some nodes can be pruned (i.e. they stop relaying neighbour information). Pruned nodes are not involved in subsequent steps of the algorithm and their closeness centralities are treated as zero.

Our approach thus applies pruning after each iteration t of communication of the YTQ method. During the pruning stage, each node checks whether it or any node in its one-hop neighbourhood should be pruned. Nodes can identify which nodes in their neighbourhood have been pruned, so that they do not need to wait for or send messages to them in subsequent iterations. This reduces the number of messages exchanged between nodes.

Given two direct neighbours, their sets of pruned nodes are not necessarily the same, and nodes do not need to exchange such information between themselves.

Pruning

Before describing our proposed pruning method, we argue that pruning preserves information about the most central nodes of a graph with respect to closeness centrality.

Theoretical justification

While our aim is to estimate the closeness centrality distribution of a graph using pruning, the concept of pruning is directly related to eccentricity centrality (Hage and Harary 1995). The eccentricity of a node is the maximum distance between the node and any other node. Eccentricity and eccentricity centrality are reciprocals of each other. We rely on the following points to achieve our goal.

  • Pruned nodes have relatively high eccentricities (as will be discussed later in Lemma 0.3).

  • Previous studies show that eccentricity and closeness centralities are strongly positively correlated for various types of graphs (Batool and Niazi 2014; Meghanathan 2015). This is partly due to the fact that they both operate on the concepts of paths.

It follows that our proposed pruning method is recommended for approximating closeness centralities in categories of graphs where eccentricity and closeness centralities are highly correlated.

Description

We first show how a node identifies prunable nodes after each iteration. There are two types of nodes (leaves, and nodes causing triangles) that a node can prune after the first iteration. When a node is found to be of one of these types, it is pruned. Since nodes have learnt about their 2-hop neighbours by the end of the first iteration, a node \(v_i\) knows the neighbours \({\mathcal {N}}_j\) of each of its direct neighbours \(v_j\). Our pruning method is decentralised, so each node is responsible for identifying prunable nodes in its neighbourhood, including itself. Let \(d_i\) denote the degree of node \(v_i\).

Definition 0.2

(A node causing a triangle) A non-leaf node \(v_j\) causes a triangle if \(d_j=2\) and its two immediate neighbours are immediate neighbours to each other.

Let \({\mathcal {F}}_i^{(t)}\) denote the set of pruned nodes known by a node \(v_i\) at the end of iteration t (with \(t\ge 1\)).

Fig. 1 Illustration of graphs with prunable nodes

Let \({\mathcal {N}}^{\textrm{up}}_{i, t}={\mathcal {N}}_{i}{\setminus } {\mathcal {F}}^{(t-1)}_{i}\) denote the set of neighbours of \(v_i\) which have not yet been pruned at the beginning of iteration t. Functions that a node \(v_i\) applies to detect prunable elements in its neighbourhood after the first iteration are given in Algorithm 2 (see functions leavesDetection and triangleDetection).
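The sketch below is our hedged reconstruction of these two first-iteration checks (the published Algorithm 2 may differ in detail). It runs locally at a node that, after the first iteration, knows the neighbour set of each node in its two-hop view.

```python
# Hedged reconstruction of the first-iteration pruning checks (not the published code).
# local_view maps each node in v_i's 2-hop knowledge to its known neighbour set.

def leaves_detection(local_view):
    return {v for v, nbrs in local_view.items() if len(nbrs) == 1}      # leaves: degree 1

def triangle_detection(local_view):
    pruned = set()
    for v, nbrs in local_view.items():
        if len(nbrs) == 2:                             # Definition 0.2: degree-2 node ...
            a, b = tuple(nbrs)
            if b in local_view.get(a, set()) or a in local_view.get(b, set()):
                pruned.add(v)                          # ... whose two neighbours are adjacent
    return pruned

# toy 2-hop view held by some node: v5 is a leaf and v1 causes a triangle (edge v2-v3 exists)
view = {'v2': {'v1', 'v3', 'v5'}, 'v1': {'v2', 'v3'}, 'v3': {'v1', 'v2', 'v4'}, 'v5': {'v2'}}
print(leaves_detection(view), triangle_detection(view))                 # {'v5'} {'v1'}
```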

Note that for complete graphs, pruning is not invoked because, after the first iteration, each node will realise that its current view of the graph is complete.

So far, we have described how elements of \({\mathcal {F}}_i^{(1)}\) are identified by node \(v_i\). We now consider further pruning, which is straightforward: additional nodes should be pruned when they have no new information to share with their remaining active neighbours. Thus a node stops relaying neighbour information to neighbours from which it receives no new information.

At the end of iteration t, a node \(v_i\) considers itself an element of \({\mathcal {F}}_i^{(t)}\) if it receives all of its new information from only one of its neighbours at that iteration. Also, node \(v_i\) prunes \(v_j\) if the neighbourhood information \({\mathcal {N}}_j^{(t)}\) sent by \(v_j\) to \(v_i\) contains no new information, i.e.

$$\begin{aligned} {\mathcal {N}}_j^{(t)}\subseteq \bigcup _{l<t}{\mathcal {N}}_i^{(l)}\,. \end{aligned}$$
(4)

Our hope is that the most central node is among the nodes which remain unpruned on termination of the algorithm. Equation 4 indicates that for a node \(v_j\) to be pruned, there must be another node \(v_i\) which is unpruned, because a comparison needs to be made. Such unpruned nodes always exist after the first iteration, except for a complete graph of at most three nodes, in which case pruning is not invoked. This means that at the end of our pruning method, some unpruned nodes always remain.

Nodes in \({\mathcal {F}}^{(t)}_i\) for \(t \ge 2\) can be viewed as leaves or nodes causing triangles in the subgraph obtained after the removal of all previously pruned nodes. Recall that an unpruned node only interacts with its unpruned neighbours, and the number of unpruned neighbours of a node may shrink over the iterations. So at some iteration an unpruned node can be viewed as a leaf if only one of its neighbours remains unpruned.

Let

$$\begin{aligned} {\mathcal {F}}_{i, t} = \bigcup _{l=1}^t {\mathcal {F}}^{(l)}_i. \end{aligned}$$

The procedure for how a node \(v_i\) detects elements of \({\mathcal {F}}^{(t)}_i\) at the end of each iteration \(t\ge 2\) is given in Algorithm 2 (see function furtherPruningDetection).
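A sketch of this test, as we read Eq. (4) and the surrounding text (the published furtherPruningDetection may differ), run at node \(v_i\) at the end of an iteration \(t\ge 2\):

```python
# Further pruning at node v_i (our reading of Eq. (4); illustrative names only).

def further_pruning(me, rings, received):
    # rings: all rings N_i^(0..t-1) known so far; received: {j: N_j^(t)} from active neighbours
    known = {me}.union(*rings)
    pruned = {j for j, ring in received.items() if ring <= known}    # Eq. (4): nothing new from j
    informative = [j for j, ring in received.items() if ring - known]
    if len(informative) == 1:           # all new information came from a single neighbour
        pruned.add(me)                  # so v_i prunes itself
    return pruned
```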

After describing pruning, we now connect it to eccentricity (see Lemma 0.3) as mentioned above. Let \(\textrm{ecc}_i\) denote the eccentricity of a node \(v_i\), i.e.

$$\begin{aligned} \textrm{ecc}_i = \max _{v_j\in {\mathcal {V}}} \delta _{ij}\,. \end{aligned}$$
(5)

Lemma 0.3

If \(v_j\in {\mathcal {N}}_i\) such that

$$\begin{aligned} v_j\in \bigcup _{t\ge 1}{\mathcal {F}}_i^{(t)}, \end{aligned}$$

then

$$\begin{aligned} \textrm{ecc}_j \ge \textrm{ecc}_i. \end{aligned}$$

Proof

A node \(v_j\) is pruned if it has a direct neighbour \(v_i\) from which it can receive new information while at the same time it can not provide new information to that neighbour. Given two direct neighbours \(v_j\) and \(v_i\) where \(v_j\) has been pruned at iteration t by \(v_i\), we have Eq. 4. From the definition of \({\mathcal {N}}_i^{(t)}\) and Eq. 4, it is straightforward that \(\textrm{ecc}_j\ge \textrm{ecc}_i\). \(\square\)

From Lemma 0.3 it can be seen that prunable nodes have relatively high eccentricities, so pruning cannot introduce errors when searching for a node of maximum eccentricity centrality.
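A quick empirical illustration of this point on a toy graph, using NetworkX's eccentricity routine:

```python
# On a path graph the leaves, which are pruned first, have the largest eccentricity,
# consistent with Lemma 0.3.
import networkx as nx
print(nx.eccentricity(nx.path_graph(5)))   # {0: 4, 1: 3, 2: 2, 3: 3, 4: 4}
```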

Algorithm 2 Decentralised view construction with pruning (functions leavesDetection, triangleDetection and furtherPruningDetection)

Example 1

Consider execution of Algorithm 2 on the communication graph in Fig. 1, with \(D=4\). The results of our pruning method on this graph are presented in Table 1 and discussed below.

After the first iteration, each node needs to identify prunable nodes in its neighbourhood, including itself. Using Algorithm 2, \({\mathcal {F}}_1^{(1)}=\{v_1\}\), i.e. \(v_1\) will prune itself. Also, \({\mathcal {F}}_2^{(1)}={\mathcal {F}}_3^{(1)}=\{v_1\}\), so \(v_2\) and \(v_3\) will also prune \(v_1\). At the first iteration, all the leaves (i.e. \(v_5, v_6, v_9\) and \(v_{10}\)) are pruned; they all have \(\textrm{ecc}_i=6\). However, a non-leaf node, \(v_1\) (with \(\textrm{ecc}_i=4\)), is also pruned in the first iteration.

At the beginning of iteration \(t=2\), \(v_1, v_5, v_6, v_9\) and \(v_{10}\) are no longer involved since they have been identified as prunable nodes at the end of the previous iteration; \({\mathcal {F}}_2^{(2)}={\mathcal {F}}_4^{(2)}=\{v_4\}\) and \({\mathcal {F}}_7^{(2)}={\mathcal {F}}_8^{(2)}=\{v_8\}\); and the node \(v_3\) does not identify any prunable node after this iteration. At this iteration, the remaining nodes with high \(\textrm{ecc}_i\) (i.e. nodes \(v_4\) and \(v_8\)) are pruned. The results for the remaining iterations are shown in Table 1.

Table 1 Pruned nodes identified at each node after each iteration using the graph in Fig. 1

In our proposed distributed system, at the end of each iteration each node is aware of whether each of its direct neighbours has been pruned. A pruned node neither sends messages nor waits for them. So an unpruned node knows which immediate neighbours to send messages to and which to wait for messages from. This prevents the nodes from suffering from starvation or deadlock in failure-free scenarios (Coulouris et al. 2005).
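In code, the per-round bookkeeping at an unpruned node might look as follows; this is a sketch only, and send and recv are hypothetical messaging primitives rather than functions from the paper's implementation.

```python
# Illustrative per-round exchange at an unpruned node: messages are sent to, and
# awaited from, active neighbours only (those neither pruned nor at equilibrium).
def exchange_round(my_ring, active_neighbours, send, recv):
    for j in active_neighbours:
        send(j, my_ring)                                  # relay own latest ring N_i^(t-1)
    return {j: recv(j) for j in active_neighbours}        # block only on active neighbours
```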

Communication analysis

In this section, we evaluate the impact of pruning on the communication requirements for view construction. Let \(u_i^{(t)}\) denote the number of neighbours of \(v_i\) which have been pruned at the end of iteration t. Let \(Y^{(D)}_i\) and \(P^{(D)}_i\) be the number of messages that the node \(v_i\) receives according to Algorithm 1 and the number of messages \(v_i\) receives through the use of pruning in Algorithm 2 for D rounds respectively. We expect the number of messages any node \(v_i\) saves due to pruning to satisfy

$$\begin{aligned} \varDelta ^{(D)}_{i}\equiv Y^{(D)}_i-P^{(D)}_i\ge 0\,. \end{aligned}$$
(6)

A node \(v_i\) receives \(d_i\) messages at the end of each iteration using the YTQ method. Recall that our proposed pruning and the YTQ methods can also terminate when an equilibrium is reached. Let \(H_i\) denote the iteration after which a node \(v_i\) applying the YTQ and our pruning methods reaches an equilibrium. This value is the same for both algorithms because at iteration t, a node \(v_i\) (which should be an unpruned node using our proposed method) has the same view using both algorithms. Note that for our pruning method, a pruned node does not reach an equilibrium and it is not possible to prune all nodes before equilibrium. Let \(h_i^{(t)}\) be the number of neighbours of \(v_i\) which have reached equilibrium at the end of iteration t. If at least one neighbour of an unpruned node \(v_i\) has reached equilibrium by iteration t, then \(v_i\) will reach equilibrium by iteration \(t+1\). For the YTQ method,

$$\begin{aligned} Y^{(D)}_i = \sum _{t=1}^{\min (D, H_i)} \left( d_i-\sum _{l=0}^{t-1} h_i^{(l)}\right) \,. \end{aligned}$$
(7)

Let \(L_i\) denote the round at which a node \(v_i\) is pruned (\(L_i=+\infty\) for an unpruned node \(v_i\)).

Lemma 0.4

The number of messages received by a node \(v_i\in {\mathcal {V}}\) in Algorithm 2 is

$$\begin{aligned} P^{(D)}_i = \sum _{t=1}^{\min (D, H_i, L_i)} \left( d_i-\sum _{l=0}^{t-1} \left( h_i^{(l)}+u^{(l)}_i\right) \right) . \end{aligned}$$

Proof

At the end of each iteration t, the node \(v_i\) receives \(\left( d_i-\sum _{l=0}^{t-1} \left( h_i^{(l)}+u^{(l)}_i\right) \right)\) messages. Note that \(u_i^{(0)}=h_i^{(0)}=0\). Also a node can stop interacting with other nodes after it is pruned. If the node \(v_i\) is pruned at the end of iteration \(\min (D, H_i, L_i)\), then it stops receiving messages. \(\square\)

Theorem 0.5

The number of messages saved by a node \(v_i\in {\mathcal {V}}\) in a failure-free scenario is

$$\begin{aligned} \varDelta ^{(D)}_{i}&= \sum _{t=1}^{\min (D, H_i, L_i)} \sum _{l=0}^{t-1} u^{(l)}_i+ \sum _{t=\min (D, H_i, L_i)+1}^{\min (D, H_i)} \left( d_i-\sum _{l=0}^{t-1} h_i^{(l)}\right) \,. \end{aligned}$$

Proof

This is straightforward by Eqs. 6 and 7, and Lemma 0.4. \(\square\)
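The expressions above can be checked numerically. The sketch below plugs made-up per-round counts (all values are illustrative) into Eq. (7), Lemma 0.4 and Theorem 0.5 for a single node and confirms that they agree.

```python
# Numerical sanity check of Eq. (7), Lemma 0.4 and Theorem 0.5 for one node.
# d, D, H, L and the h/u sequences are made-up illustrative values.

def Y(d, h, D, H):                                        # Eq. (7)
    return sum(d - sum(h[:t]) for t in range(1, min(D, H) + 1))

def P(d, h, u, D, H, L):                                  # Lemma 0.4
    return sum(d - sum(h[:t]) - sum(u[:t]) for t in range(1, min(D, H, L) + 1))

d, D, H, L = 4, 5, 4, 3               # degree, max rounds, equilibrium round, pruning round
h = [0, 0, 0, 1, 0]                   # h_i^(l): neighbours reaching equilibrium at end of round l
u = [0, 1, 1, 0, 0]                   # u_i^(l): neighbours pruned at end of round l

delta = Y(d, h, D, H) - P(d, h, u, D, H, L)               # Eq. (6)
m = min(D, H, L)
theorem = sum(sum(u[:t]) for t in range(1, m + 1)) \
    + sum(d - sum(h[:t]) for t in range(m + 1, min(D, H) + 1))   # Theorem 0.5
print(delta, theorem, delta == theorem)                   # 6 6 True
```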

It is clear that pruned nodes build very limited views, as they stop interacting with others once they are pruned. These nodes would have built broader views using the YTQ method (You et al. 2017). Thus, for applications where all nodes are required to build broad views, our pruning method is not recommended.

Communication failure

We also extend our pruning method to take communication failures during view construction into account. For failure management, we simply incorporate the neighbour coordination approach proposed by Sheth et al. (2005) into our pruning method. We found this coordination approach most suitable for our pruning method because each node can monitor the behaviour of its immediate neighbours and report detected failures and recoveries, which suits our decentralised approach well. Details of the extended version of pruning with communication failure can be found in Masakuna (2020). (We exclude the details of failure management here so that we can focus on the main contribution of this work.)

Experimental investigation

For the comparison of our proposed method with the YTQ method (You et al. 2017) in terms of the number of messages, we consider the total and the maximum number of messages received per node. We wish to see the impact of our pruning method on message complexity in the entire network. We wish to reduce the maximum number of messages per node because if the communication time per message is the bottleneck, reducing only the total number of messages may not be helpful.

Our experiments considered the following cases, in an attempt to comprehensively test the proposed approach:

  1. Comparison of the number of messages received by nodes for each method on various networks. We also used a Wilcoxon signed-rank test (Wilcoxon 1992) and the effect size (Cohen 1962) to verify whether the mean numbers of messages under our pruning method and under the YTQ method differ significantly.

  2. Comparison of the approximated most central nodes obtained with our method and with the YTQ method. A good approximation should choose a most central node that is a small distance from the exact most central node. We also used a Wilcoxon signed-rank test and the effect size to verify whether the mean shortest path distances between the exact most central node and the approximate central nodes obtained using the YTQ method and our pruning method on some random graphs differ significantly.

     In Section 3, we showed that pruning is related to node eccentricity, which supports the use of pruning to approximate closeness centrality because eccentricity and closeness centralities are positively and strongly correlated for various types of graphs (Batool and Niazi 2014; Meghanathan 2015). We ran simulation experiments to determine the Spearman's \(\rho\) (Spearman 1961) and Kendall's \(\tau\) (Kendall 1938) coefficients between eccentricity and closeness centralities (a minimal computation sketch is given below). We consider the Spearman's \(\rho\) and Kendall's \(\tau\) coefficients because they are appropriate correlation coefficients for measuring the correspondence between two rankings.

For the hypothesis tests, the significance level we use is 0.01.
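A minimal sketch of this statistical pipeline using SciPy is shown below. The data arrays are placeholders, and the paired Cohen's d shown is one common effect-size choice; it is not necessarily the exact formula used here.

```python
# Sketch of the statistical comparison; all data below are placeholders.
import numpy as np
from scipy.stats import wilcoxon, spearmanr, kendalltau

rng = np.random.default_rng(0)
ytq = rng.integers(800, 1200, size=50).astype(float)      # per-graph message counts (YTQ)
prune = ytq - rng.integers(100, 400, size=50)             # per-graph message counts (pruning)

stat, p = wilcoxon(ytq, prune)                            # paired signed-rank test
diff = ytq - prune
effect = diff.mean() / diff.std(ddof=1)                   # paired Cohen's d (one common choice)
print(p < 0.01, round(effect, 2))

a = rng.random(100)                                       # placeholder eccentricity-centrality scores
b = a + 0.05 * rng.standard_normal(100)                   # placeholder closeness scores, correlated
rho, _ = spearmanr(a, b)
tau, _ = kendalltau(a, b)
print(round(rho, 3), round(tau, 3))
```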

Experimental setup

We implemented our pruning method using Python and NetworkX (Hagberg et al. 2013). Our simulation was run on two HPC (High Performance Computing) clusters hosted by Stellenbosch University. Our code can be found at https://bitbucket.org/jmf-mas/codes/src/master/network.

We ran several simulations with random graphs (generated as discussed below), as well as some real-world networks.

Randomly generated networks

We used a 200×200 grid with integer coordinates and generated 50 random connected undirected graphs as follows: we generated N uniformly distributed grid locations (sampling without replacement) as nodes. The number of nodes, N, was sampled uniformly from [50, 500]. Two nodes were connected by an edge if the Euclidean distance between them was less than a specified communication range \(d=8\). The numbers of edges and the diameters of these graphs were in the intervals [50, 2000] and [20, 60] respectively; see Fig. 2.
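A sketch of this generation procedure, as we read it, is given below; it is not the authors' script, and the demo call uses a denser toy setting (with a fallback to the largest connected component) so that it produces a connected graph quickly.

```python
# Random geometric-style graphs on an integer grid: N nodes at distinct grid points,
# an edge whenever the Euclidean distance is below the communication range d.
import random
import networkx as nx

def random_grid_graph(rng, n, grid=200, d=8):
    pts = rng.sample([(x, y) for x in range(grid) for y in range(grid)], n)   # without replacement
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = pts[i], pts[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 < d ** 2:
                G.add_edge(i, j)
    return G

rng = random.Random(0)
G = random_grid_graph(rng, n=150, grid=60)     # toy setting (the paper uses grid=200, N in [50, 500])
if not nx.is_connected(G):                     # keep a connected graph for the demo
    G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
print(G.number_of_nodes(), G.number_of_edges(), nx.diameter(G))
```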

Fig. 2 Diameters of the random graphs (bin width of 2)

Real-world networks

The 34 real-world graphs we consider are a phenomenology collaboration network (Leskovec et al. 2007), a snapshot of the Gnutella peer-to-peer network (Leskovec et al. 2007), and 32 autonomous graphs (Leskovec et al. 2005). The phenomenology collaboration network represents research collaborations between authors of scientific articles submitted to the Journal of High Energy Physics. In the Gnutella peer-to-peer network, nodes represent hosts in the Gnutella network and edges represent connections between the Gnutella hosts. Autonomous graphs are graphs composed of links between Internet routers. These graphs represent communication networks based on Border Gateway Protocol logs. Some characteristics of some of these networks are given in Table 2.

Results and discussion

Average and maximum number of messages

Fig. 3 Differences in the number of messages between the YTQ method and our pruning method on the 32 autonomous graphs. Positive values indicate that our pruning method outperforms the YTQ method. (3a): Reduction in the average number of messages. (3b): Reduction in the maximum number of messages

Fig. 4 Numbers of messages per node using each method on a random graph with \(D=10\). On the horizontal axis, Y and P denote the YTQ and pruning methods respectively

Our experiments illustrate the improved communication performance over the YTQ method of You et al. (2017) resulting from pruning. Figure 3 shows the differences in the average and maximum number of messages per node between the YTQ method and our pruning method on the 32 autonomous graphs. We see that the approach reduced communication by \(30-50\%\) on average for all networks, with over \(75\%\) of the networks reducing their maximum number of messages by \(30\%\) or more.

Table 2 Graph properties and the number of messages with each approach for five real-world networks (the phenomenology collaboration network, the Gnutella peer-to-peer network, and three autonomous networks)

Table 2 shows the average and the maximum number of messages per node for five real-world networks. The numbers of messages per node for each technique on one random network are contrasted in Fig. 4. The results confirm that the pruning method outperforms the YTQ method (Fig. 3).

Furthermore, it can be seen that for the same type of network (e.g. the three autonomous networks in Table 2), pruning is more involved on the second network than on the other two. Loosely connected nodes, such as leaves, have relatively high eccentricities. The number of prunable nodes found by the pruning method depends on the structure of the graph considered, not directly on the number of nodes or edges or on any particular relationship between the numbers of nodes and edges. In other words, the more leaves or triangle-causing nodes a graph contains, the more pruning is involved. For instance, a ring and a path graph on n nodes have almost the same number of edges; however, the path graph contains a larger number of prunable nodes (\(n-1\) nodes in total) than the ring. Thus, for particular structures, including networks with multiple paths as subgraphs as well as many cut vertices, pruning is more pronounced.

Hypothesis test

We observed a p-value of \(7.96\times 10^{-90}\) and an effect size of 21.1140 when comparing the numbers of messages obtained with our pruning method and the YTQ method on 50 random graphs containing 500 nodes each. The p-value is less than the threshold 0.01, so the mean numbers of messages under the YTQ method and under pruning are significantly different. In terms of effect size, according to the classification in Gail and Richard (2012), the effect size between our pruning method and the YTQ method is large (\(e\ge 0.8\)), so the mean numbers of messages differ markedly.

We also observed a reduction in total running time and memory usage with our approach, suggesting no adverse effect on power usage.

Quality of selected most central node

First, for the various graphs considered here, the averages and standard deviations of the Spearman's \(\rho\) and Kendall's \(\tau\) coefficients between eccentricity and closeness centralities are \(0.9237\pm 0.0508\) and \(0.7839\pm 0.0728\) respectively, and the correlation coefficients were all positive. This shows that there is predominantly a fairly strong correlation between eccentricity and closeness centralities for the graphs considered here, confirming the results of Batool and Niazi (2014) and Meghanathan (2015). Note that we chose real-world graphs and random graphs modelled on what might be realistic for a sensor network, so no attempt was made to choose graphs from models with high correlation.

Table 3 shows the shortest path distances between the exact most central node and the approximated most central nodes obtained using our pruning method and the YTQ method (You et al. 2017) for two random graphs. Figure 5 shows the differences in shortest path distances between the exact most central node and the approximated central nodes obtained with the YTQ method and our pruning method on the two random graphs. For each of the graphs, we randomly vary D over a range of values smaller than the diameter of the graph.

Table 3 Shortest path distances between the exact most central node and approximated most central node using pruning and the YTQ method
Fig. 5 Differences of shortest path distances between approximated central nodes obtained using the YTQ and our pruning methods with respect to the exact most central node on two random graphs: (5a): the first (diameter of 26, 125 nodes and 180 edges) and (5b): the second (diameter of 36, 289 nodes and 597 edges) random graphs. Positive values indicate that our method is better than the YTQ method
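The quality score reported in Table 3 and Fig. 5 can be computed as in the sketch below (selection_quality is our own illustrative name, not a function from the paper's code).

```python
# Quality of a selected central node: shortest-path distance to the node with the
# highest exact closeness centrality.
import networkx as nx

def selection_quality(G, selected):
    cc = nx.closeness_centrality(G)
    exact = max(cc, key=cc.get)                     # exact most central node
    return nx.shortest_path_length(G, exact, selected)

G = nx.path_graph(7)                                # toy graph: the exact centre is node 3
print(selection_quality(G, selected=1))             # 2 hops from the true centre
```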

The evaluation of node centrality based on a limited view of the communication graph has an impact on the choice of the most central node. When our pruning method and the YTQ method are used to choose a leader based on closeness centrality, they can yield different results under the same conditions. We found (Table 3 and Fig. 5) that our pruning method generally gave better approximations of the most central node than the YTQ method when D is considerably smaller than the diameter, with the results of the YTQ method improving as D increases. This supports our claim that our pruning method effectively identifies nodes which should not be chosen as leaders, as they are highly unlikely to have the highest closeness centrality. Note that, even though the two methods sometimes identify the exact most central node for some values of D (for example for \(D=26\) in Table 3), such exact identification is not guaranteed.

The YTQ method yields poor results when D is smaller than the diameter of the graph for the following reason. When nodes have different views of the communication graph and each evaluates its closeness centrality based only on its own view, a node with a small but unknown exact closeness centrality may have a high estimated closeness centrality, which can lead to poor conclusions. In the YTQ method, the central node is selected from among all nodes. The advantage of the pruning method is that only unpruned nodes compute their approximate closeness centralities, i.e. the many nodes that are pruned are no longer candidates for the central node. This reduces the chance of poor performance, as the central node is selected from a shorter list of candidates, namely the unpruned nodes.

Hypothesis test

We observed a p-value of 0.1197 when comparing the results obtained with our pruning method and the YTQ method on 50 random graphs of 500 nodes each. The p-value is greater than the threshold 0.01, so there is no significant difference between the means for the two approaches. This means that the quality of the selected most central node is almost the same for both methods. This is beneficial for pruning: while both methods select most central nodes of almost the same quality, pruning significantly reduces the number of messages compared to the YTQ method (You et al. 2017).

Conclusion

We proposed an enhancement to a benchmark method (You et al. 2017) for view construction. The main motivation of this enhancement was to reduce the amount of communication: we aim to reduce the number of messages exchanged between nodes during interaction. Given a network, some nodes can be identified early as being unlikely to be central nodes. Our main contribution was noting that we can identify such nodes and reduce communication by pruning them.

Our proposed method improves the benchmark method in terms of number of messages. We found that reduction of the number of messages has a positive impact on running time and memory usage (Masakuna 2020).

Future work. Our message counting model ignores the fact that in large networks, messages comprise multiple packets. Analysis of the savings of our approach in terms of the actual amount of data communicated could be investigated in future. In addition, it may be possible to identify further types of prunable nodes and to consider richer classes of graphs for assessment.

Notes

  1. Except in special cases, a leaf node should not play a central role.

  2. It can also be set or determined in a distributed manner (i.e. the value of D can be determined by nodes during interaction) as in Garin et al. (2012).

References

  • Batool K, Niazi MA (2014) Towards a methodology for validation of centrality measures in complex networks. PLoS ONE 9(4):e90283


  • Bavelas A (1950) Communication Patterns in Task-Oriented Groups. The Journal of the Acoustical Society of America 22(6):725–730


  • Cohen J (1962) The statistical power of abnormal-social psychological research: a review. Psychol Sci Public Interest 65(3):145


  • Coulouris GF, Dollimore J, Kindberg T (2005) Distributed systems: concepts and design. Pearson Education

  • Gail MS, Richard F (2012) Using effect size-or why the P value is not enough. J Grad Med Educ 4(3):279–282


  • Garin F, Varagnolo D, Johansson KH (2012) Distributed estimation of diameter, radius and eccentricities in anonymous networks. IFAC Proc 45(26):13–18


Hagberg A, Schult D, Swart P, Conway D, Séguin-Charbonneau L, Ellison C, Edwards B, Torrents J (2013) NetworkX: high productivity software for complex networks. Web page: https://networkx.lanl.gov/wiki

  • Hage P, Harary F (1995) Eccentricity and centrality in networks. Soc Networks 17(1):57–63


  • Masakuna JF (2020) Active strategies for coordination of solitary robots. Ph.D. thesis, Stellenbosch University

  • Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1/2):81–93


  • Kim C, Wu M (2013) Leader election on tree-based centrality in ad hoc networks. Telecommun Syst 52(2):661–670


  • Lam S, Reiser M (1979) Congestion control of store-and-forward networks by input buffer limits-an analysis. IEEE Trans Commun 27(1):127–134


  • Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, pp 177–187. ACM

  • Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):2


  • Meghanathan N (2015) Correlation coefficient analysis of centrality metrics for complex network graphs. In: Computer science on-line conference, pp 11–20. Springer

  • Naz A (2017) Distributed algorithms for large-scale robotic ensembles: centrality, synchronization and self-reconfiguration. Ph.D. thesis, Université Bourgogne Franche-Comté

  • Nelson ME, MacIver MA (2006) Sensory acquisition in active sensing systems. J Comp Physiol A 192(6):573–586


  • Ramírez RJ, Santoro N (1979) Distributed control of updates in multiple-copy databases: a time optimal algorithm. In: Proceedings of the 4th berkeley conference on distributed data management and computer networks (Berkeley, CA), pp 191–207

  • Sheth A, Hartung C, Han R (2005) A decentralized fault diagnosis system for wireless sensor networks. In: IEEE international conference on mobile adhoc and sensor systems, pp 1–3. IEEE

  • Skiena SS (1998) The algorithm design manual, vol 1. Springer, New York


  • Spearman C (1961) General intelligence objectively determined and measured. Am J Psychol 15:663–671


  • Tomic T, Schmid K, Lutz P, Domel A, Kassecker M, Mair E, Grixa IL, Ruess F, Suppa M, Burschka D (2012) Toward a fully autonomous UAV: research platform for indoor and outdoor urban search and rescue. IEEE Robot Autom Magaz 19(3):46–56


  • Wehmuth K, Ziviani A (2012) Distributed assessment of the closeness centrality ranking in complex networks. In: The fourth annual workshop on simplifying complex networks for practitioners, pp 43–48. ACM

  • Wilcoxon F (1992) Individual comparisons by ranking methods. In: Breakthroughs in statistics, pp 196–202. Springer

  • You K, Tempo R, Qiu L (2017) Distributed algorithms for computation of centrality measures in complex networks. IEEE Trans Autom Control 62(5):2080–2094



Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

In this manuscript, the authors propose a new distributed method for determining central nodes of a network using centrality metrics. The proposed method outperforms other techniques in terms of the number of messages exchanged during interaction. The method also has a positive impact on running time and memory usage. Jordan F. Masakuna came up with the idea, implemented the approach and ran the experiments. Pierre K. Kafunda read, reviewed and refined the proposal. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jordan F. Masakuna.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supported by Optimall Research Lab

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Masakuna, J.F., Kafunda, P.K. Distributed Identification of Central Nodes with Less Communication. Appl Netw Sci 8, 13 (2023). https://doi.org/10.1007/s41109-023-00539-6


Keywords