To estimate node connectedness under stochastic link disconnection, we propose a node ranking measure, called connectedness centrality, together with an efficient sampling algorithm. We present three versions of the measure. The first, *cnc*_{1}, serves as a general theoretical framework. The second, *cnc*_{2}, is a computable measure derived by discretizing the prior probability distribution of *cnc*_{1}. The third, *cnc*_{3}, is a practical measure equipped with an efficient estimation algorithm. It can be naturally explained as a special case of *cnc*_{2} under the assumption that every link has the same connection probability, although this assumption can be easily relaxed, as shown in the “Extension: case of non-uniform connection probabilities” section. Furthermore, to select multiple nodes, we propose group connectedness centrality, which extends the target of connectedness centrality from individual nodes to node groups.

### Connectedness centrality: *cnc*_{1}

Let \(G = ({\mathcal {V}}, {\mathcal {E}})\) be the graph structure of a given spatial network. For each link \(e \in {\mathcal {E}}\), we consider a link connection probability *p*(*e*;*s*) that is determined according to some model based on geographical properties, such as a road blockage model. Here *s* is a parameter that controls the probability *p*(*e*;*s*), analogous to the inverse of an earthquake's magnitude. For convenience, we restrict *s* to the range 0≤*s*≤1. Figure 1 depicts an uncertain graph obtained by introducing connection probabilities to a given spatial network. For each link \(e \in {\mathcal {E}}\), let *x*(*e*) be a random variable expressing the link connectivity, i.e., *x*(*e*)=1 if link *e* is connected and *x*(*e*)=0 otherwise, where *p*(*x*(*e*)=1;*s*)=*p*(*e*;*s*). Then, by suitably arranging these random variables and setting \(\Omega = \{0, 1\}^{|{\mathcal {E}}|}\), we can construct an indicator vector **x**=(⋯,*x*(*e*),⋯)∈*Ω*, whose total number of possible instantiations (possible worlds) amounts to \(|\Omega | = |\{0, 1\}|^{|{\mathcal {E}}|} = 2^{|{\mathcal {E}}|}\). For each instance of the indicator vector **x**, we obtain the corresponding graph \(G_{\textbf {x}} = ({\mathcal {V}}, {\mathcal {E}}_{\textbf {x}})\), where \({\mathcal {E}}_{\textbf {x}}= \{e~|~e \in {\mathcal {E}}, x(e)=1\}\). In this paper, we assume a basic model based on independent Bernoulli trials for all links. Under this model, the occurrence probability of each graph *G*_{x} obtained from **x** is computed as follows:

$$ q(\mathbf{x}; s) = \prod_{e \in {\mathcal{E}}} p(e; s)^{x(e)} (1-p(e; s))^{1-x(e)} = \prod_{e \in {\mathcal{E}}_{\mathbf{x}}} p(e; s) \prod_{e \in {\mathcal{E}} \setminus {\mathcal{E}}_{\mathbf{x}}} (1-p(e; s)), $$

(1)

where ·∖· stands for a set difference operator. Here, we should emphasize that, unlike most studies on uncertain graphs, where each link connection probability is designated as a value, our approach specifies each as a stochastic model of link connection *p*(*e*;*s*) controlled by parameter *s*.
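As a minimal sketch of Eq. (1), the following Python code draws one possible world and evaluates its occurrence probability. The callable `p(e, s)` and the function names are illustrative assumptions, not part of the original method description; any model that returns a probability in [0, 1] for a link could be plugged in.

```python
import random


def sample_world(edges, p, s, rng):
    """Draw x(e) ~ Bernoulli(p(e; s)) independently for every link e."""
    return [1 if rng.random() < p(e, s) else 0 for e in edges]


def world_probability(edges, x, p, s):
    """Occurrence probability q(x; s) of Eq. (1) for an indicator vector x."""
    q = 1.0
    for e, xe in zip(edges, x):
        pe = p(e, s)
        # A connected link contributes p(e; s), a disconnected one 1 - p(e; s).
        q *= pe if xe == 1 else 1.0 - pe
    return q


# Hypothetical toy model: every link is connected with probability s.
edges = [(0, 1), (1, 2)]
uniform_model = lambda e, s: s
x = sample_world(edges, uniform_model, 0.5, random.Random(0))
q = world_probability(edges, x, uniform_model, 0.5)
```

Summing `world_probability` over all indicator vectors of a small graph should give 1, which is a convenient sanity check for a custom link model.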

After decomposing *G*_{x} into connected components, we compute the size of each connected component as the number of nodes belonging to it. Let *c*(*v*;*G*_{x}) be the set of nodes belonging to the connected component that contains node \(v \in {\mathcal {V}}\), so that *c*(*u*;*G*_{x})=*c*(*v*;*G*_{x}) if nodes *u* and *v* belong to the same connected component. In this study, under a given stochastic model of link connection, we define the connectedness centrality of node \(v \in {\mathcal {V}}\) as the expected size of the connected component that contains *v*. More specifically, for each node \(v \in {\mathcal {V}}\), we quantify our first version of connectedness centrality by the following expectation:

$$ \phi_{1}(v) = cnc_{1}(v) = \int_{0}^{1} \sum_{\mathbf{x} \in \Omega} |c(v ;G_{\mathbf{x}})| q(\mathbf{x}; s) r_{1}(s) ds, $$

(2)

where *r*_{1}(*s*) stands for a prior probability distribution with respect to parameter *s*. For instance, it can be used to express the fact that small earthquakes occur frequently, but huge ones are quite rare.

### Connectedness centrality: *cnc*_{2}

Next, we approximate the integral over *s* by a sum over *H*+1 equally spaced points. For the *h*-th point (0≤*h*≤*H*), the link connection probability is set to *p*(*e*;*h*/*H*). Under this quantization, for each node \(v \in {\mathcal {V}}\), we quantify our second version of connectedness centrality by the following expectation:

$$ \phi_{2}(v) = \sum_{h=0}^{H} \sum_{\mathbf{x} \in \Omega} |c(v ;G_{\mathbf{x}})| q(\mathbf{x}; h/H) r_{2}(h), $$

(3)

where \(r_{2}(h) = r_{1}(h/H)/\sum _{h'=0}^{H} r_{1}(h'/H)\).
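The normalization of the discretized prior can be sketched in a few lines of Python. The function name `discretize_prior` and the example priors are assumptions made for illustration; the sketch also assumes that *r*_{1} is positive on at least one grid point so that the normalizing sum is nonzero.

```python
def discretize_prior(r1, H):
    """Return [r2(0), ..., r2(H)] with r2(h) = r1(h/H) / sum_h' r1(h'/H)."""
    weights = [r1(h / H) for h in range(H + 1)]
    total = sum(weights)  # assumed to be nonzero
    return [w / total for w in weights]


# A uniform prior gives equal weight 1/(H+1) to every grid point.
r2_uniform = discretize_prior(lambda s: 1.0, 4)

# Hypothetical prior that favors large s (mild disruptions are more frequent).
r2_skewed = discretize_prior(lambda s: s, 4)
```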

Below, we propose approximating the summation over the \(2^{|{\mathcal {E}}|}\) possible worlds by *J* Monte Carlo simulations. Let \(G_{(h, j)} = ({\mathcal {V}}, {\mathcal {E}}_{(h, j)})\) be the graph obtained by the *j*-th simulation (1≤*j*≤*J*) at the *h*-th point (see Fig. 2); then, we can estimate the connectedness centrality *ϕ*_{2}(*v*) defined in Eq. (3) by the following:

$$ cnc_{2}(v) = \frac{1}{J} \sum_{h=0}^{H} \sum_{j=1}^{J} |c(v ;G_{(h, j)})| r_{2}(h). $$

(4)

Now, by considering the following expectation value of |*c*(*v*;*G*_{x})| denoted by 〈|*c*(*v*;*G*_{x})|〉_{Ω}, with respect to our simulation based on *q*(**x**;*h*/*H*),

$$ \langle |c(v ;G_{\mathbf{x}})| \rangle_{\Omega} = \sum_{\mathbf{x} \in \Omega} |c(v ;G_{\mathbf{x}})| q(\mathbf{x}; h/H), $$

(5)

we can see that *cnc*_{2}(*v*) is an unbiased estimator of *ϕ*_{2}(*v*), *i.e.*,

$$ \langle cnc_{2}(v) \rangle = \frac{1}{J} \sum_{h=0}^{H} \sum_{j=1}^{J} \langle |c(v ;G_{\mathbf{x}})| \rangle_{\Omega} r_{2}(h) = \phi_{2}(v). $$

(6)

Thus, by setting both *H* and *J* to sufficiently large values, we can naturally expect *cnc*_{2}(*v*) defined in Eq. (4) to be a reasonably accurate estimate of *cnc*_{1}(*v*) defined in Eq. (2). However, straightforwardly computing *cnc*_{2}(*v*) for every \(v \in {\mathcal {V}}\) with large *H* and *J* imposes a heavy computational load, because the computational complexity is *O*(*HJ*(*N*+*L*)), where \(N = |{\mathcal {V}}|\) and \(L = |{\mathcal {E}}|\) respectively stand for the numbers of nodes and links of the given network. Note that decomposing a graph into its connected components takes *O*(*N*+*L*) time and that the component sizes can be computed simultaneously during this process.
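The straightforward estimator of Eq. (4) can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the authors' implementation: nodes are numbered 0 to *N*−1, `p(e, s)` is a hypothetical link model, and `r2` is a list of length *H*+1 with *H*≥1. Each of the (*H*+1)*J* sampled graphs is decomposed by breadth-first search, which reflects the *O*(*HJ*(*N*+*L*)) cost.

```python
import random
from collections import deque


def component_sizes(n, edges):
    """Return size[v] = |c(v; G)| for every node v via BFS in O(N + L)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    size = [0] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        comp = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    comp.append(w)
                    queue.append(w)
        for u in comp:
            size[u] = len(comp)
    return size


def cnc2(n, edges, p, r2, J, rng):
    """Monte Carlo estimate of Eq. (4) for all nodes."""
    H = len(r2) - 1
    score = [0.0] * n
    for h in range(H + 1):
        s = h / H
        for _ in range(J):
            # Sample G_(h, j): keep each link with probability p(e; h/H).
            kept = [e for e in edges if rng.random() < p(e, s)]
            size = component_sizes(n, kept)
            for v in range(n):
                score[v] += size[v] * r2[h] / J
    return score


# Toy usage on a path 0-1-2 with a hypothetical uniform link model.
scores = cnc2(3, [(0, 1), (1, 2)], lambda e, s: s, [0.25, 0.5, 0.25], 100,
              random.Random(0))
```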

### Connectedness centrality: *cnc*_{3}

Below, we propose another reasonably accurate estimate, referred to as *cnc*_{3}(*v*), as an alternative to *cnc*_{2}(*v*), together with an efficient algorithm whose computational complexity is *O*(*J*(*L*+*N* log *N*)) rather than *O*(*HJ*(*N*+*L*)). We assume that each link connection probability is the same, i.e., *p*(*e*;*h*/*H*)=*p*(*h*/*H*)=*h*/*H*, and define the set of graphs whose number of links is *h*, expressed as \(\Omega (h) = \{ \mathbf {x}~|~\sum _{e \in {\mathcal {E}}} x(e) = h \}\). This definition corresponds to the setting *H*=*L*. Under this uniform probability setting, for each node \(v \in {\mathcal {V}}\), we quantify our third version of connectedness centrality by the following expectation:

$$ \phi_{3}(v) = \sum_{h=0}^{H} \frac{1}{|\Omega(h)|} \sum_{\mathbf{x} \in \Omega(h)} |c(v ;G_{\mathbf{x}})| r_{2}(h). $$

(7)

Below, we estimate *ϕ*_{3}(*v*) by *J* Monte Carlo simulations.

In our proposed algorithm, from the initial state that all links are disconnected and thus all nodes are isolated in the setting *p*(0)=0, we repeatedly add a randomly selected link one by one until the final state where all original links are connected in the setting *p*(1)=1 (See Fig. 3). During this process, we attempt to efficiently compute the expected size of the connected component for each node \(v \in {\mathcal {V}}\) by focusing on the difference between the graphs caused by adding only one link. More specifically, for the *j*-th simulation, we assign a random order to each link \(e \in {\mathcal {E}}\), denoted by *e*^{(h,j)}, where we also use *h*∈{1,⋯,*H*} to express the order that the link becomes connected. By considering a graph defined by \(G^{(h, j)} = ({\mathcal {V}}, {\mathcal {E}}^{(h, j)})\), where \({\mathcal {E}}^{(h, j)} = \left \{ e^{(h', j)} \in {\mathcal {E}}~\left |~h' \leq h\right. \right \}\), we can estimate our connectedness centrality *ϕ*_{3}(*v*) defined in Eq. (7) by the following:

$$ cnc_{3}(v) = \frac{1}{J} \sum_{j=1}^{J} \sum_{h=1}^{H} \left|c\left(v ;G^{(h, j)}\right)\right| r_{2}(h). $$

(8)

By considering the following expectation value of |*c*(*v*;*G*_{x})|, denoted by 〈|*c*(*v*;*G*_{x})|〉_{Ω(h)}, with respect to our simulation based on 1/|*Ω*(*h*)|,

$$ \langle |c(v ;G_{\mathbf{x}})| \rangle_{\Omega(h)} = \frac{1}{|\Omega(h)|} \sum_{\mathbf{x} \in \Omega(h)} |c(v ;G_{\mathbf{x}})|, $$

(9)

we can see that *cnc*_{3}(*v*) is an unbiased estimator of *ϕ*_{3}(*v*), i.e.,

$$ \langle cnc_{3}(v) \rangle = \frac{1}{J} \sum_{h=1}^{H} \sum_{j=1}^{J} \langle |c(v ;G_{\mathbf{x}})| \rangle_{\Omega(h)} r_{2}(h) = \phi_{3}(v). $$

(10)

Thus, for uniform probability settings, by setting both *H* and *J* to sufficiently large values, we can naturally expect that *cnc*_{3}(*v*) defined in Eq. (8) can be a reasonably accurate estimate of *cnc*_{1}(*v*) defined in Eq. (2).

### Solution algorithm of *cnc*_{3}

Below, we provide details of our proposed algorithm together with its computational complexity. In the initial state with no links, we place every node in its own component by assigning a unique component number *n*(*v*)∈{1,⋯,*N*} to each node \(v \in {\mathcal {V}}\). When a new link (represented by a red link in Fig. 3) denoted by *e*^{(h,j)}=(*x*,*y*)^{(h,j)} is added, we can proceed to the next link if nodes *x* and *y* already belong to the same connected component; otherwise, we need to change the component numbers of the nodes belonging to one of the two components.

More specifically, assuming |*c*(*x*;*G*^{(h−1,j)})|≥|*c*(*y*;*G*^{(h−1,j)})| without loss of generality, where the sizes refer to the components just before the link is added, we renumber the nodes of the smaller component with the component number of the larger one by setting *n*(*z*)←*n*(*x*) for each *z*∈*c*(*y*;*G*^{(h−1,j)}). Evidently, for each link addition, the number of nodes whose component number is changed never exceeds *N*/2. Moreover, whenever a node is renumbered, the size of its component at least doubles, so each node is renumbered at most log_{2}*N* times. Thus, over all link additions, the computational complexity of these renumbering processes is *O*(*N* log *N*).
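The smaller-into-larger renumbering can be illustrated with a short Python sketch that also counts how many renumberings occur. The function name and the use of per-component member lists are assumptions made for this illustration; nodes are numbered from 0, and a component number is simply the id of one of its nodes.

```python
def merge_smaller_into_larger(n, order):
    """Add links in the given order, renumbering the smaller component each time.

    Returns the final component number of every node and the total number of
    renumbering operations performed.
    """
    label = list(range(n))                 # n(v): component number of node v
    members = {v: [v] for v in range(n)}   # nodes of each live component
    relabels = 0
    for x, y in order:
        a, b = label[x], label[y]
        if a == b:
            continue                       # already connected: nothing to do
        if len(members[a]) < len(members[b]):
            a, b = b, a                    # make a the larger component
        for z in members[b]:
            label[z] = a                   # n(z) <- n(x)
        relabels += len(members[b])
        members[a].extend(members.pop(b))
    return label, relabels


# Balanced merges on 8 nodes: 1 + 1 + 1 + 1 + 2 + 2 + 4 renumberings.
order = [(0, 1), (2, 3), (4, 5), (6, 7), (1, 2), (5, 6), (3, 4)]
label, relabels = merge_smaller_into_larger(8, order)
```

In this balanced example the count reaches (*N*/2) log_{2}*N*, which is the kind of worst case the *O*(*N* log *N*) bound accounts for.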

Let \(cnc_{3}^{(h, j)}(v)\) be the partial summation of \(\left |c\left (v ;G^{(h', j)}\right)\right |\) until *h*^{′}=*h* for the *j*-th simulation defined by

$$ cnc_{3}^{(h, j)}(v) = \sum_{h'=1}^{h} \left|c\left(v ;G^{(h', j)}\right)\right| r_{2}(h'). $$

(11)

Now, suppose that when a new link *e*^{(h,j)}=(*x*,*y*)^{(h,j)} is added at the *h*-th step, nodes *x* and *y* come to belong to the same connected component for the first time. For arbitrary *h*^{′}≥*h*, since \(c\left (x ;G^{(h', j)}\right) = c\left (y ;G^{(h', j)}\right)\), we obtain the following relation:

$$ cnc_{3}^{(h', j)}(x) - cnc_{3}^{(h', j)}(y) = cnc_{3}^{(h-1, j)}(x) - cnc_{3}^{(h-1, j)}(y). $$

(12)

Thus, by maintaining the partial summation \(cnc_{3}^{(h', j)}(x)\) for a head node *x* of each connected component and keeping the difference values such as \(cnc_{3}^{(h-1, j)}(x) - cnc_{3}^{(h-1, j)}(y)\) for the other nodes in the component, we can obtain the final summation values, such as \(cnc_{3}^{(H, j)}(y)\), by using Eq. (12). Note that the computational complexity of obtaining \(cnc_{3}^{(H, j)}(v)\) for every \(v \in {\mathcal {V}}\) from these values is *O*(*N*) and that of updating the difference values is *O*(*N* log *N*), because these updates can be done together with the node renumbering processes described above. Therefore, since we need to shuffle and examine all of the links in the *j*-th simulation, the total computational complexity of our proposed algorithm is *O*(*J*(*L*+*N* log *N*)). Algorithm 1 and Fig. 4 show the details of the algorithm of connectedness centrality. In Algorithm 1, *delta* has two meanings. For the head node *s* of a connected component at step *h*, *s.delta* indicates the partial sum of reachable nodes \(cnc_{3}^{(h-1,j)}(s)\). For any other node *x* in the component, *x.delta* indicates the difference between the partial sums of reachable nodes of node *x* and its head node *s*, i.e., \(cnc_{3}^{(h,j)}(x)-cnc_{3}^{(h,j)}(s)\).
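The idea of a head value plus per-node differences can be sketched in Python as follows. This is an illustrative sketch in the spirit of the description above, not a transcription of Algorithm 1. In particular, accumulating the head's partial sum lazily through prefix sums of *r*_{2} is an assumption of this sketch, as are all names; `r2` is a list of length *H*+1 with *H*=*L*, and nodes are numbered from 0.

```python
import random


def cnc3_one_run(n, order, r2):
    """Return sum_{h=1..H} |c(v; G^(h))| * r2[h] for every v, for one link order."""
    H = len(order)
    prefix = [0.0] * (H + 1)       # prefix[h] = r2[1] + ... + r2[h]
    for h in range(1, H + 1):
        prefix[h] = prefix[h - 1] + r2[h]
    head = list(range(n))          # component number, stored as the head's id
    members = [[v] for v in range(n)]
    base = [0.0] * n               # head's partial sum over steps 1..since-1
    since = [1] * n                # first step at which the component has its current size
    delta = [0.0] * n              # partial sum of v minus partial sum of its head

    def settle(a, h):
        """Partial sum of head a over steps 1..h (valid while a's size is unchanged)."""
        return base[a] + len(members[a]) * (prefix[h] - prefix[since[a] - 1])

    for h, (x, y) in enumerate(order, start=1):
        a, b = head[x], head[y]
        if a == b:
            continue
        if len(members[a]) < len(members[b]):
            a, b = b, a            # a is the larger component
        pa, pb = settle(a, h - 1), settle(b, h - 1)
        for z in members[b]:
            # Eq. (12): the difference to the new head is frozen from step h-1 on.
            delta[z] += pb - pa
            head[z] = a
        members[a].extend(members[b])
        members[b] = []
        base[a], since[a] = pa, h
    return [settle(head[v], H) + delta[v] for v in range(n)]


def cnc3(n, edges, r2, J, seed=0):
    """Average of cnc3_one_run over J random link orders, as in Eq. (8)."""
    rng = random.Random(seed)
    total = [0.0] * n
    for _ in range(J):
        order = list(edges)
        rng.shuffle(order)
        run = cnc3_one_run(n, order, r2)
        for v in range(n):
            total[v] += run[v] / J
    return total
```

For the path 0-1-2 with links added in the order (0,1), (1,2) and weights `r2 = [0.0, 0.5, 0.5]`, node 2 is isolated at step 1 and joins the full component at step 2, so its value is 0.5·1 + 0.5·3 = 2.0, while nodes 0 and 1 obtain 0.5·2 + 0.5·3 = 2.5.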

### Group connectedness centrality

Although we can extract high-connectedness nodes using our connectedness centrality, these nodes tend to cluster in certain parts of the network, because the measure focuses on whether or not a node belongs to a large connected component. In fact, as shown in the “Results of connectedness centrality: *cnc*_{3}(*v*)” section, the top 1000 nodes of the connectedness centrality ranking are located near each other. This tendency is undesirable for the purpose of estimating evacuation facility locations. To overcome this shortcoming, we extend the notion of connectedness centrality to what we call group connectedness centrality.

In group connectedness centrality, connectedness of the node set \({\mathcal {R}}\) is defined as:

$$ cnc_{1}({\mathcal{R}}) = \int_{0}^{1} \sum_{\mathbf{x} \in \{0, 1\}^{|{\mathcal{E}}|}} |c({\mathcal{R}} ;G_{\mathbf{x}})| q(\mathbf{x}; s) r_{1}(s) ds, $$

(13)

where \(c({\mathcal {R}} ;G_{\mathbf {x}}) = \bigcup _{r \in {\mathcal {R}}}c(r ;G_{\mathbf {x}})\) stands for the set of nodes reachable from at least one \(r \in {\mathcal {R}}\).

Similarly to connectedness centrality, we approximate the integral over *s* by a sum over *H*+1 equally spaced points, set *r*_{1}(*s*) to be a uniform distribution, and estimate the result by *J* Monte Carlo simulations:

$$ cnc_{3}({\mathcal{R}}) = \frac{1}{J} \sum_{j=1}^{J} \sum_{h=1}^{H} \left|c\left({\mathcal{R}} ;G^{(h, j)}\right)\right| r_{2}(h). $$

(14)

To select a set \({\mathcal {R}}\) of *K* nodes that maximizes the objective function defined in Eq. (14), we utilize a greedy algorithm. Hereafter, we refer to each selected node as a representative node. When selecting the *k*-th representative node \({\hat r}_{k}\), the greedy algorithm fixes the *k*−1 already selected nodes \({\mathcal {R}}_{k-1}\) and selects the node with the highest marginal gain, *MG*, defined by

$$\begin{array}{@{}rcl@{}} MG(v; {\mathcal{R}}_{k-1}) &=& cnc_{3}({\mathcal{R}}_{k-1} \cup \{v\}) - cnc_{3}({\mathcal{R}}_{k-1}) \\ &=& \frac{1}{J} \sum_{j=1}^{J} \sum_{h=1}^{H} mg(v;{\mathcal{R}}_{k-1})^{(h, j)} r_{2}(h), \end{array} $$

(15)

where \(mg(v;{\mathcal {R}}_{k-1})^{(h, j)} = \left |c\left ({\mathcal {R}}_{k-1} \cup \{v\} ;G^{(h, j)}\right) \setminus c\left ({\mathcal {R}}_{k-1} ;G^{(h, j)}\right)\right |\) stands for the increment in the number of reachable nodes when node *v*, which is a candidate for the *k*-th representative node, is added to \({\mathcal {R}}_{k-1}\). The total computational complexity of group connectedness centrality is *O*(*KJ*(*L*+*N* log *N*)). Let \({\mathcal {Q}}\) be a subset of \({\mathcal {R}}\), i.e., \({\mathcal {Q}} \subset {\mathcal {R}}\). Then we obtain \(mg(v;{\mathcal {Q}})^{(h,j)} \geq mg(v;{\mathcal {R}})^{(h,j)}\), from which \(MG(v;{\mathcal {Q}}) \geq MG(v;{\mathcal {R}})\) follows directly by the definition of \(MG(v;{\mathcal {R}})\) shown in Eq. (15). Therefore, \(cnc_{3}({\mathcal {R}})\) is a submodular function, and thus its greedy solution is guaranteed to be of reasonably high quality even in the worst case.
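The greedy selection can be sketched in Python as follows. This is a deliberately naive illustration: it re-evaluates Eq. (14) from scratch for every candidate over pre-sampled link orders, so it does not attain the *O*(*KJ*(*L*+*N* log *N*)) complexity stated above. The function names, the union-find helper, and the assumption *K*≤*N* are choices made for this sketch.

```python
def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def group_cnc3(n, orders, r2, group):
    """Eq. (14), evaluated naively over the given link orders."""
    total = 0.0
    for order in orders:
        parent = list(range(n))
        size = [1] * n
        for h, (x, y) in enumerate(order, start=1):
            a, b = find(parent, x), find(parent, y)
            if a != b:
                if size[a] < size[b]:
                    a, b = b, a
                parent[b] = a
                size[a] += size[b]
            # |c(R; G^(h, j))|: total size of the distinct components hit by R.
            roots = {find(parent, r) for r in group}
            total += sum(size[root] for root in roots) * r2[h]
    return total / len(orders)


def greedy_select(n, orders, r2, K):
    """Pick K representative nodes by highest marginal gain (assumes K <= n)."""
    chosen = []
    current = 0.0
    for _ in range(K):
        best_v, best_gain = None, -1.0
        for v in range(n):
            if v in chosen:
                continue
            gain = group_cnc3(n, orders, r2, chosen + [v]) - current
            if gain > best_gain:
                best_v, best_gain = v, gain
        chosen.append(best_v)
        current += best_gain
    return chosen
```

On a network made of a path 0-1-2 and a separate link 3-4, the first pick is the middle node 1, and the second pick comes from the other component, since a node already reachable from node 1 adds little.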

After selecting *K* representative nodes, each of the remaining nodes is assigned to the community of the representative node with which it has the highest connectedness. Suppose that when a new link is added at the *h*-th step of the *j*-th simulation, node *v* first joins the same connected component as representative node *r*. The degree of connectedness of nodes *v* and *r* is then defined as *f*(*v*,*r*)^{(j)}=1−*h*/*H*. Accordingly, the degree of connectedness over all *J* simulations is \(F(v, r) = J^{-1}\sum _{j = 1}^{J} f(v, r)^{(j)}\). Each of the remaining nodes is assigned to the community of the representative node with the highest connectedness as follows:

$${\mathcal{V}}^{(k)} = \{v \in {\mathcal{V}}; r_{k} = \text{arg~max}_{r \in {\mathcal{R}}} F(v,r) \}. $$

In the final stage of a simulation, when most representative nodes would belong to the same connected component and the degree of connectedness between a remaining node *v* and each representative node is equal, node *v* is assigned to the community of the closest representative node in terms of graph distance. Hereafter, we refer to this method as CNC and show the summary of CNC in Algorithm 2.
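The community assignment can be sketched in Python as follows. This is an illustrative sketch, not Algorithm 2: the function names are assumptions, representative nodes are simply assigned to their own community, a pair that never becomes connected contributes 0, and the graph-distance tie-break described above is omitted (ties go to the representative listed first).

```python
def connectedness_to_reps(n, orders, reps):
    """F[v][k]: mean over link orders of f(v, reps[k]) = 1 - h/H."""
    F = [[0.0] * len(reps) for _ in range(n)]
    for order in orders:
        H = len(order)
        label = list(range(n))
        members = [[v] for v in range(n)]
        for h, (x, y) in enumerate(order, start=1):
            a, b = label[x], label[y]
            if a == b:
                continue
            if len(members[a]) < len(members[b]):
                a, b = b, a
            f = 1.0 - h / H
            # Nodes on one side meet the representatives on the other side now.
            for side, other in ((a, b), (b, a)):
                ks = [k for k, r in enumerate(reps) if label[r] == side]
                for v in members[other]:
                    for k in ks:
                        F[v][k] += f / len(orders)
            for z in members[b]:
                label[z] = a
            members[a].extend(members[b])
            members[b] = []
    return F


def assign_communities(F, reps):
    """Index k of the representative maximizing F(v, r_k) for every node v."""
    owner = [max(range(len(reps)), key=lambda k: row[k]) for row in F]
    for k, r in enumerate(reps):
        owner[r] = k  # a representative belongs to its own community
    return owner


# Path 0-1-2-3-4 with representatives at both ends and two link orders.
orders = [[(0, 1), (1, 2), (2, 3), (3, 4)], [(3, 4), (2, 3), (1, 2), (0, 1)]]
F = connectedness_to_reps(5, orders, [0, 4])
owner = assign_communities(F, [0, 4])
```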

In the context of the evacuation facility location problem, the representative node corresponds to a candidate site for an evacuation facility.