In this section, we propose new methods for the multiple-turns graph protection problem in dynamic networks, namely ReProtect and ReProtect-p. To restrain epidemic spreading in dynamic networks, we divide the protection budget wisely into several turns. The protected nodes are selected in each turn according to the currently observed temporal snapshot of the dynamic network. Using multiple-turns protection, we aim to address the changing network structure and incoming rumor or virus attacks during temporal transitions in dynamic networks.
Figure 2 illustrates our proposed method in each turn, which takes a temporal snapshot of the dynamic network at time t as input and determines the set of protected nodes. In each turn, we determine the most critical set of nodes of the input network. A node is considered critical if protecting it is assumed to contribute to blocking large-scale epidemic spreading (Chen et al. 2016; Wang et al. 2016, 2017).
The main idea of our method can be described in the following key points:
1. Minimum vertex cover (MVC)
At first, we aim to find the set of the most critical nodes in the input network. Many previous studies suggest that a given critical-node criterion is best suited for a certain type of network structure. For instance, degree centrality is most suitable for dense and highly centralized networks (Lawyer 2015; Chen et al. 2016), while betweenness centrality and connectivity fit well for clustered networks containing graph bridges (Italiano et al. 2012; Khan et al. 2015; Lawyer 2015).
We propose to consider a minimum vertex cover (MVC) as the criterion for determining the set of critical nodes of a network. Given a graph G=(V,E), a vertex cover is a subset of nodes V_{c}⊆V such that every edge of G is incident to at least one node in V_{c}. Hence, the set V_{c} covers every edge in G. A minimum vertex cover is a vertex cover with the smallest possible number of nodes. Every graph trivially has a vertex cover, namely V_{c}=V. Figure 3a shows a vertex cover, and Fig. 3b shows the minimum vertex cover of the same graph. The decision version of the vertex cover problem is NP-Complete, and the minimum vertex cover problem is NP-Hard.
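As an illustration of the definition, the following is a minimal Python sketch (the toy graph, function names, and brute-force search are ours for illustration only; brute force is feasible only on tiny graphs since the problem is NP-Hard):

```python
from itertools import combinations

def is_vertex_cover(edges, cover):
    """A node set is a vertex cover iff every edge has at least one endpoint in it."""
    return all(u in cover or v in cover for u, v in edges)

def minimum_vertex_cover(nodes, edges):
    """Brute-force MVC: try all node subsets in increasing size order."""
    for size in range(len(nodes) + 1):
        for candidate in combinations(nodes, size):
            if is_vertex_cover(edges, set(candidate)):
                return set(candidate)
    return set(nodes)

# Toy graph: a star centered at node 0, plus one extra edge (2, 3).
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
mvc = minimum_vertex_cover(nodes, edges)   # e.g. {0, 2}
```

Here the trivial cover V_{c}=V has four nodes, while the MVC needs only two: node 0 covers the star edges and one endpoint of (2, 3) covers the rest.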
As shown in Fig. 2, our input is a static network G_{t}, the observed snapshot of the dynamic network at time t. We aim to completely cover all the connections in G_{t}, represented by edges, with the smallest possible set of nodes. The size objective of MVC is intuitively aligned with the limited protection budget in the graph protection problem. Following the definition of the graph protection problem, we can show the role of MVC as the protection threshold of a network.
Theorem 1
(Protection Threshold) The protection threshold is the minimum required size of S to disconnect graph G such that no propagation may occur among nodes. Given an undirected connected graph G=(V,E), a minimum vertex cover of G is also a protection threshold of G.
Proof
A vertex cover V_{c} of G is a subset of the nodes V_{c}⊆V such that (u,v)∈E⇒u∈V_{c}∨v∈V_{c}. A minimum vertex cover \(V_{c}^{*}\) is a V_{c} of the smallest size:
$$ V_{c}^{*} = \underset{V_{c}}{\text{arg\ min}} \left| V_{c} \right| $$
(4)
Since all edges in graph G are covered by \(V_{c}^{*}\):
$$ (u,v) \in E \Rightarrow u \in V_{c}^{*} \vee v \in V_{c}^{*}, $$
(5)
then by removing all edges of G incident to \(V_{c}^{*}\) we get \(G^{(V_{c}^{*})} = \left (V, E^{(V_{c}^{*})}\right)\). Thus, \(G^{(V_{c}^{*})}\) has no edge, i.e., \(E^{(V_{c}^{*})} = \{\}\), \(\left| E^{(V_{c}^{*})}\right| = 0\).
According to Definitions 1 and 2, protecting the set S of nodes in G removes all edges of G incident to S. This is a minimax objective: minimize the size of S while maximizing the number of edges of G covered, as follows:
$$ S^{*} = \underset{S}{\text{arg\ min}} \left\{ |S| : \left| E \setminus E^{(S)} \right| \text{ is maximal} \right\} $$
(6)
Consequently, by protecting the minimum vertex cover \(V_{c}^{*}\), i.e., \(S = V_{c}^{*}\), G^{(S)} has no edge. Hence, a minimum vertex cover \(V_{c}^{*}\) of G is also a protection threshold of G. □
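The argument above can be checked on a toy graph: removing every edge incident to an MVC leaves no edges. A minimal sketch (the `protect` helper is our illustration of the edge-removal semantics of Definitions 1 and 2, not the paper's code):

```python
def protect(edges, S):
    """Protecting node set S removes every edge incident to a node of S."""
    return [(u, v) for u, v in edges if u not in S and v not in S]

# Toy graph from before; {0, 2} is a minimum vertex cover of it.
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
remaining = protect(edges, {0, 2})   # protecting the MVC leaves no edges
partial = protect(edges, {1})        # protecting a non-cover leaves edges behind
```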
2. Top-k highest-degree MVC
Let us recall that MVC is a set of nodes without any ordering requirement. Intuitively, given a budget k, selecting any k nodes from \(V_{c}^{*}\) may result in different sets of nodes. Additionally, not all nodes in the MVC should have the same priority to be protected within a limited budget. We consider that the more connected a node v is to its neighbors in G, the more critical v is to protect. Hence, after obtaining the MVC nodes of the input network, we reorder them by their degree within the input network.
Suppose that at time t we are given an input temporal snapshot graph G_{t} and a protection budget k_{t}. Under this limited-budget constraint, we select the top-k_{t} MVC nodes based on their degree within graph G_{t}.
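This degree-based selection can be sketched as follows (a hypothetical helper of our own; it assumes the MVC nodes of the snapshot have already been computed):

```python
from collections import Counter

def top_k_mvc_by_degree(edges, mvc_nodes, k):
    """Order the MVC nodes by degree in the snapshot and keep the top-k under budget k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(mvc_nodes, key=lambda v: degree[v], reverse=True)[:k]

# Toy snapshot; {0, 2} is its MVC, node 0 has degree 3 and node 2 has degree 2.
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
protected = top_k_mvc_by_degree(edges, {0, 2}, k=1)   # budget of one node
```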
3. Reinforcement learning as solution approximation
Despite the protection threshold guarantee of MVC, finding the MVC nodes of a graph is NP-Hard (Hartman and Weigt 2006). We consider a reinforcement learning (RL) approach to approximate the solution. An RL approach aims to obtain an optimal solution by maximizing cumulative rewards, without any predefined deterministic policy (Mnih et al. 2015; Khalil et al. 2017). This advantage enables us to exploit the known best policy while also exploring unknown policies to obtain an optimal solution.
More specifically, we leverage n-step fitted Q-Learning (Khalil et al. 2017) to obtain an MVC approximation with an efficient training process and a scalable implementation. Hence, our proposed methods take advantage of n-step Q-Learning (Sutton and Barto 1998) and fitted Q-iteration (Riedmiller 2005).
We let the n-step fitted Q-Learning iteratively learn to construct a vertex cover (V_{c}) solution of the input network. We define the RL environment as follows:

- State (\(\mathbb {S}\)): the set of currently selected V_{c} nodes from the input graph

- Action (\(\mathbb {A}\)): add a new node v to the vertex cover set \(\mathbb {S}\)

- Reward (\(\mathbb {R}\)): −1; since our goal is a vertex cover of minimum size, we set a penalty for adding a new node to the V_{c} set

- Termination criterion: all edges are covered
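The environment above can be sketched as a minimal class (class and method names are ours; this is an illustration of the state/action/reward/termination definitions, not the paper's implementation):

```python
class VertexCoverEnv:
    """Minimal sketch of the vertex cover RL environment described above."""

    def __init__(self, nodes, edges):
        self.nodes, self.edges = set(nodes), list(edges)
        self.state = set()   # State S: currently selected V_c nodes

    def covered(self):
        """Termination criterion: every edge has an endpoint in the partial cover."""
        return all(u in self.state or v in self.state for u, v in self.edges)

    def step(self, v):
        """Action: add node v to the partial cover. Reward is -1 per added node."""
        self.state.add(v)
        return self.state, -1, self.covered()   # (state, reward, done)

env = VertexCoverEnv([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (2, 3)])
_, r1, done1 = env.step(0)   # edge (2, 3) is still uncovered
_, r2, done2 = env.step(2)   # now every edge is covered, episode terminates
```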
To quantify how good taking an action \(a \in \mathbb {A}\) in a state \(s \in \mathbb {S}\) is, Q-Learning provides the Q-Function (Watkins 1989). The Q-Function evaluates a state-action pair and maps it to a single value, called the Q-Value, using the following Bellman optimality equation:
$$ Q(s,a) = r + \lambda \max_{a^{\prime}} Q(s^{\prime},a^{\prime}) $$
(7)
where \(s \in \mathbb {S}\) is the given state, \(a \in \mathbb {A}\) is the current action, r is the current reward, λ is the discount factor of future rewards, \(s' \in \mathbb {S}\) is the next state, and \(a' \in \mathbb {A}\) is the next action. The Q-Function is computed and updated iteratively for each possible state-action pair. All resulting Q-Values are stored in a table, called the Q-Table. The best action for a given state is the one with the highest Q-Value.
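A single tabular update toward the Bellman target of Eq. (7) can be sketched as follows (the learning rate `alpha` and the toy state labels are our illustrative additions; the paper itself replaces the Q-Table with a function approximator, as discussed below):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions_next, lam=0.9, alpha=0.5):
    """Move Q(s, a) toward the Bellman target r + lambda * max_a' Q(s', a')."""
    target = r + lam * max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)   # Q-Table, all entries initialized to 0.0
# One step with reward -1 (the penalty for adding a node to V_c):
v = q_update(Q, s="s0", a="a0", r=-1.0, s_next="s1", actions_next=["a0", "a1"])
```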
To obtain the maximum expected cumulative reward achievable from a given state-action pair, we can compute the optimal Q-Function, denoted Q^{∗}, using the following equation:
$$ Q^{*}(s,a) = \max \mathbb{E} \left[\sum_{t \geq 0} \lambda^{t} r_{t} \,\middle|\, s_{0} = s, a_{0} = a\right] $$
(8)
where s_{0} and a_{0} are the initial state and action, respectively, and t indicates a step consisting of: observing a state, performing an action, receiving a reward, and observing the next state.
As the number of all possible state-action pairs can be very large, computing every Q-Value in a Q-Table is inefficient. In particular, for a large input network, using a Q-Table is computationally infeasible and resource-consuming. A nonlinear function approximator can be used to estimate the optimal Q-Function in Eq. (8) such that:
$$ Q(s,a,\Psi) \approx Q^{*}(s,a) $$
(9)
where Ψ denotes the function parameters (weights) of our nonlinear function approximator Q(s,a,Ψ). A neural network or a kernel function can be used as the nonlinear function approximator of the Q-Function (Sutton and Barto 1998).
Recent studies show that neural networks and convolutional neural networks achieve state-of-the-art results as function approximators (Mnih et al. 2015; Sutton and Barto 1998). A neural network architecture also speeds up learning in finite problems, because it can generalize from earlier experiences to previously unseen states (Mnih et al. 2015). In this paper, we propose a convolutional neural network as the function approximator of the optimal Q-Function. Recall that the Q-Function takes a state and an action as input and produces a Q-Value as output. The state is the input graph together with the currently selected V_{c} nodes. The actions are the candidate nodes to be included in the current V_{c}. In the convolutional neural network architecture, the input must represent both the state and the action. Hence, we need the same fixed-length feature representation for the graph and for each of its nodes. Therefore, in our construction of the minimum vertex cover, the function approximator in Eq. (9) is denoted as:
$$ \hat{Q}(h(\mathbb{S}),v,\Psi) $$
(10)
where \(h(\mathbb {S})\) and v represent the fixed-length feature representations of the state \(\mathbb {S}\) and of the action of adding node v, respectively, and Ψ is the neural network's set of weights.
4. Graph embeddings as feature-based representations
We leverage an efficient and scalable graph embedding technique, called Structure2Vec (Dai et al. 2016; Khalil et al. 2017), to embed the input graph and each of its nodes. This graph embedding technique computes a d-dimensional feature embedding μ_{v} for each node v∈V, given the current partial solution \(\mathbb {S}\).
Given a temporal snapshot graph G_{t}, we embed each node v by constructing a d-dimensional embedding μ_{v}. All entries of \(\mu _{v}^{(0)}\) are initialized to zero, and for every v∈V we update the embedding iteratively over T iterations as follows:
$$ \mu_{v}^{(t+1)}= \text{ReLU }(\psi_{1} x_{v} + \psi_{2} \sum_{u \in N(v)} \mu_{u}^{(t)} + \psi_{3} \sum_{u \in N(v)} \text{ReLU} (\psi_{4} w(u,v))), $$
(11)
where x_{v} is node v's own tag, indicating whether it has already been selected: a selected node is given tag 1, otherwise 0. N(v) is the set of neighbors of node v in graph G_{t}, and \(\sum _{u \in N(v)} \mu _{u}^{(t)} \) aggregates the features of node v's neighbors. w(u,v) is the edge weight, so that weighted connections in weighted graphs are taken into account. ψ_{1},ψ_{2},ψ_{3}, and ψ_{4} are the function parameters (weights), specified as \(\psi _{1} \in \mathbb {R}^{d}\), \(\psi _{2} \in \mathbb {R}^{d \times d}\), \(\psi _{3} \in \mathbb {R}^{d \times d}\), and \(\psi _{4} \in \mathbb {R}^{d}\). ReLU is the rectified linear unit activation function applied element-wise to its input, where ReLU(x)=x if x>0 and 0 otherwise.
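One update of Eq. (11) can be re-implemented directly in NumPy as a sketch (the tiny hand-set parameters, unit edge weights, and two-node toy graph below are illustrative only; in the paper, Ψ is learned end-to-end):

```python
import numpy as np

def s2v_iteration(mu, x, edges, w, psi1, psi2, psi3, psi4):
    """One Structure2Vec update: mu_v <- ReLU(psi1*x_v + psi2*sum_N mu_u
    + psi3*sum_N ReLU(psi4*w(u,v))), applied to every node simultaneously."""
    relu = lambda z: np.maximum(z, 0.0)
    n, d = mu.shape
    nbrs = {v: [] for v in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    new_mu = np.zeros_like(mu)
    for v in range(n):
        nbr_sum = sum((mu[u] for u in nbrs[v]), np.zeros(d))
        edge_sum = sum((relu(psi4 * w.get((min(u, v), max(u, v)), 1.0))
                        for u in nbrs[v]), np.zeros(d))
        new_mu[v] = relu(psi1 * x[v] + psi2 @ nbr_sum + psi3 @ edge_sum)
    return new_mu

d = 2
psi1, psi4 = np.ones(d), np.ones(d)
psi2, psi3 = np.eye(d), np.zeros((d, d))
mu0 = np.zeros((2, d))                       # mu^(0) initialized to zero
mu1 = s2v_iteration(mu0, x=[1.0, 0.0], edges=[(0, 1)], w={},
                    psi1=psi1, psi2=psi2, psi3=psi3, psi4=psi4)
```

With these parameters, only the tagged (selected) node 0 receives a nonzero embedding after the first iteration; node 1 would pick it up on the next iteration through the neighbor sum.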
We now explain how to obtain the function \(\hat {Q}(h(\mathbb {S}_{t}),v; \Psi)\) in Eq. (10). Once the embedding μ_{v} of each node v∈V has been computed using Eq. (11) for T iterations, we obtain \(\mu _{v}^{(T)}\). The pooled embedding of the entire graph G_{t} is then given by
$$ \sum_{u \in V} \mu_{u}^{(T)} $$
(12)
Then we can use it to estimate the optimal QFunction in Eq. (10) as follows:
$$ \hat{Q}(h(\mathbb{S}),v;\Psi) = \psi_{5}^{\top} \text{ReLU }\left(\text{concat} \left(\psi_{6} \sum_{u \in V} \mu_{u}^{(T)}, \psi_{7} \mu_{v}^{(T)}\right)\right), $$
(13)
where \(\sum _{u \in V} \mu _{u}^{(T)}\) is the pooled embedding of the entire graph. ψ_{5},ψ_{6}, and ψ_{7} are the neural network parameters (weights), specified as \(\psi _{5} \in \mathbb {R}^{2d}\), \(\psi _{6} \in \mathbb {R}^{d \times d}\), and \(\psi _{7} \in \mathbb {R}^{d \times d}\).
To this end, the pooled embedding of the entire graph serves as a surrogate representation of the state, while the embedding of each node serves as a surrogate representation of an action. The function \(\hat {Q}(h(\mathbb {S}),v)\) depends on the collection of seven parameters \(\Psi = \{\psi _{i}\}_{i=1}^{7}\), which are learned during the training phase and evaluated during the evaluation phase. Figure 4 illustrates the neural network architecture used in this paper.
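The readout of Eq. (13) can be sketched directly (the parameter values below are illustrative identity/ones matrices, not learned weights):

```python
import numpy as np

def q_hat(mu, v, psi5, psi6, psi7):
    """Eq. (13): Q-hat = psi5^T ReLU(concat(psi6 * pooled graph, psi7 * mu_v))."""
    pooled = mu.sum(axis=0)                            # state surrogate: pooled embedding
    z = np.concatenate([psi6 @ pooled, psi7 @ mu[v]])  # action surrogate: node embedding
    return float(psi5 @ np.maximum(z, 0.0))            # element-wise ReLU, then dot product

d = 2
mu = np.array([[1.0, 1.0], [0.0, 0.0]])    # toy node embeddings mu^(T)
q = q_hat(mu, v=0, psi5=np.ones(2 * d), psi6=np.eye(d), psi7=np.eye(d))
```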
a. Training Phase
Algorithm 1 illustrates our proposed training phase. In each training iteration, our method returns the neural network's set of parameters Ψ that successfully constructs a V_{c} of graph G. In line 5, we specify how a new node is selected by balancing exploration and exploitation. Here, exploration means selecting a random node with probability ε, whereas exploitation means maximizing the expected cumulative reward, i.e., selecting the node that maximizes the function \(\hat {Q}(h(\mathbb {S}_{t}),v; \Psi)\), where \(h(\mathbb {S}_{t})\) is the embedding of state \(\mathbb {S}\) at step t. The exploration probability ε is decreased from 1.0 to 0.05 linearly with the iteration step. To train the neural network efficiently, we perform batch processing as described in line 9.
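The linear ε-decay and ε-greedy node selection of line 5 can be sketched as follows (the function names and the exact handling of the schedule endpoints are our assumptions, not taken from Algorithm 1):

```python
import random

def epsilon(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Exploration probability decayed linearly from eps_start to eps_end."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_node(q_values, candidates, step, total_steps, rng=random):
    """Epsilon-greedy: a random node with probability eps, else the Q-hat argmax."""
    if rng.random() < epsilon(step, total_steps):
        return rng.choice(candidates)        # exploration
    return max(candidates, key=lambda v: q_values[v])   # exploitation
```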
The loss function to be minimized is as follows:
$$ \left(y - \hat{Q}(h(\mathbb{S}_{t}), v_{t}; \Psi)\right)^{2} $$
(14)
where \(y = \sum _{i=0}^{n-1} r(\mathbb{S}_{t+i},v_{t+i}) + \lambda \max _{v'} \hat {Q}\left (h(\mathbb {S}_{t+n}), v'; \Psi \right)\) and n is the number of step updates.
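The n-step target y and the squared loss of Eq. (14) can be sketched as follows (hypothetical helper names; the reward and Q̂ values in the usage are illustrative):

```python
def n_step_target(rewards, max_q_next, lam=0.9):
    """y = sum of the next n rewards + lambda * max_v' Q-hat(h(S_{t+n}), v'; Psi)."""
    return sum(rewards) + lam * max_q_next

def td_loss(y, q_pred):
    """Squared TD error of Eq. (14): (y - Q-hat(h(S_t), v_t; Psi))^2."""
    return (y - q_pred) ** 2

# Two -1 rewards (one per added node) and a bootstrapped value of -2.0:
y = n_step_target([-1, -1], max_q_next=-2.0, lam=0.9)
loss = td_loss(y, q_pred=-3.0)
```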
b. Evaluation Phase
Algorithm 2 illustrates the evaluation phase of our proposed method. To obtain the best-trained neural network's set of parameters (weights) Ψ^{∗}, we evaluate the training results against a given set of available snapshot graphs G^{D}. This set of parameters is then used in the testing simulation of graph protection.
c. Testing Phase
Algorithm 3 shows the testing phase of the multiple-turns graph protection strategy on dynamic networks. We are given an input snapshot graph G_{t} and a budget k_{t}. Each node in G_{t} is embedded into a d-dimensional feature vector, where d equals the embedding size used during training in Algorithm 1. The minimum vertex cover of G_{t} is then constructed using the best-trained neural network's set of parameters Ψ^{∗} obtained from Algorithm 2. Finally, we get the set S of top-k_{t} degree-ordered MVC nodes to be protected in the current temporal snapshot graph G_{t}.
We also propose the ReProtect-p method, a variant of ReProtect that is trained on a perturbed version of each available snapshot of the dynamic network. The perturbation is performed by removing edges probabilistically from the snapshot graph. Specifically, for each edge we generate a random number; if the edge weight is smaller than the generated number, the edge is removed. We introduce this variant to add more variety to the training data and to avoid possible overfitting.
Computational complexity analysis
Based on Algorithm 3, we present the computational complexity analysis of our proposed ReProtect method. The cost of step 1, initializing the empty set S, is constant. Steps 2 and 3 construct an approximated MVC set of graph G_{t}, which has complexity O(p·M) based on the analysis by Dai et al. (2016) and Khalil et al. (2017), where p is the constant number of node testing steps, equal to the number of nodes divided by the number of step updates in Q-Learning, and M is the number of edges. In n-step Q-Learning, we update the value of each action based on the rewards of taking a sequence of n consecutive actions; n is called the number of step updates. For example, if graph G_{t} has 500 nodes and the number of step updates is 5, then p is a constant equal to 100. Note that p ranges from 1 to the number of nodes in G_{t}.
Obtaining the ordered MVC nodes in step 4 takes O(N·logN) on average using QuickSort, where N is the number of nodes in G_{t}. Therefore, the total computational complexity of our ReProtect method is O(p·M+N·logN). ReProtect and ReProtect-p differ only in the training process; hence, the computational complexity of ReProtect-p is also O(p·M+N·logN).