In this section, we propose new methods for the multiple-turns graph protection problem in dynamic networks, namely ReProtect and ReProtect-p. To restrain epidemic spreading in dynamic networks, we divide the protection budget wisely into several turns. The protected nodes are selected in each turn according to the currently observed temporal snapshot of the dynamic network. Using multiple-turns protection, we aim to address the changing network structure and incoming rumor or virus attacks during temporal transitions in dynamic networks.
Figure 2 illustrates our proposed method in each turn, which takes a temporal snapshot of the dynamic network at time t as input and determines the set of protected nodes. In each turn, we determine the most critical set of nodes of the input network. A node is considered critical if protecting it is assumed to contribute to blocking large-scale epidemic spreading (Chen et al. 2016; Wang et al. 2016, 2017).
The main idea of our method can be described in the following key points:
1. Minimum vertex cover (MVC)
At first, we aim to find the set of the most critical nodes in the input network. Many previous studies suggest that a given critical-node criterion is best suited for a certain type of network structure. For instance, degree centrality is most suitable for dense and highly centralized networks (Lawyer 2015; Chen et al. 2016), while betweenness centrality and connectivity fit well for clustered networks containing graph bridges (Italiano et al. 2012; Khan et al. 2015; Lawyer 2015).
We propose to consider a minimum vertex cover (MVC) as the criterion for determining the set of critical nodes of a network. Given a graph G=(V,E), a vertex cover is a subset of nodes V_{c}⊆V such that every edge of G is incident to at least one node in V_{c}. Hence, the set V_{c} covers every edge in G. A minimum vertex cover is a vertex cover with the smallest possible number of nodes. Every graph trivially has a vertex cover, namely V_{c}=V. Figure 3a shows a vertex cover, and Fig. 3b shows the minimum vertex cover of the same graph. The decision version of the vertex cover problem is NP-Complete, and the minimum vertex cover problem is NP-Hard.
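As an illustration of the definition, the following is a minimal Python sketch (the toy graph, function names, and brute-force search are ours for illustration only; brute force is feasible only on tiny graphs since the problem is NP-Hard):

```python
from itertools import combinations

def is_vertex_cover(edges, cover):
    """A node set is a vertex cover iff every edge has at least one endpoint in it."""
    return all(u in cover or v in cover for u, v in edges)

def minimum_vertex_cover(nodes, edges):
    """Brute-force MVC: try all node subsets in increasing size order."""
    for size in range(len(nodes) + 1):
        for candidate in combinations(nodes, size):
            if is_vertex_cover(edges, set(candidate)):
                return set(candidate)
    return set(nodes)

# Toy graph: a star centered at node 0, plus one extra edge (2, 3).
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
mvc = minimum_vertex_cover(nodes, edges)   # e.g. {0, 2}
```

Here the trivial cover V_{c}=V has four nodes, while the MVC needs only two: node 0 covers the star edges and one endpoint of (2, 3) covers the rest.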
As shown in Fig. 2, our input is a static network G_{t}, the observed snapshot of the dynamic network at time t. We aim to completely cover all the connections in G_{t}, represented by edges, with the smallest possible set of nodes. The size objective of MVC is intuitively aligned with the limited protection budget in the graph protection problem. Following the definition of the graph protection problem, we can show the role of MVC as the protection threshold of a network.
Theorem 1
(Protection Threshold) The protection threshold is the minimum required size of S to disconnect graph G such that no propagation may occur among nodes. Given an undirected connected graph G=(V,E), a minimum vertex cover of G is also a protection threshold of G.
Proof
A vertex cover V_{c} of G is a subset of the nodes V_{c}⊆V such that (u,v)∈E⇒u∈V_{c}∨v∈V_{c}. A minimum vertex cover \(V_{c}^{*}\) is a V_{c} of the smallest size:
$$ V_{c}^{*} = \underset{V_{c}}{\text{arg\ min}} \left| V_{c} \right| $$
(4)
Since all edges in graph G are covered by \(V_{c}^{*}\):
$$ (u,v) \in E \Rightarrow u \in V_{c}^{*} \vee v \in V_{c}^{*}, $$
(5)
then by removing all edges of G incident to \(V_{c}^{*}\) we get \(G^{(V_{c}^{*})} = \left (V, E^{(V_{c}^{*})}\right)\). Thus, \(G^{(V_{c}^{*})}\) has no edge, i.e., \(E^{(V_{c}^{*})} = \{\}\), \(\left| E^{(V_{c}^{*})}\right| = 0\).
According to Definitions 1 and 2, protecting the set S of nodes in G removes all edges of G incident to S. This is a minimax objective: minimize the size of S while maximizing the number of edges of G covered, as follows:
$$ S^{*} = \underset{S}{\text{arg\ min}} \left\{ |S| : \left| E \setminus E^{(S)} \right| \text{ is maximal} \right\} $$
(6)
Consequently, by protecting the minimum vertex cover \(V_{c}^{*}\), i.e., \(S = V_{c}^{*}\), G^{(S)} has no edge. Hence, a minimum vertex cover \(V_{c}^{*}\) of G is also a protection threshold of G. □
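The argument above can be checked on a toy graph: removing every edge incident to an MVC leaves no edges. A minimal sketch (the `protect` helper is our illustration of the edge-removal semantics of Definitions 1 and 2, not the paper's code):

```python
def protect(edges, S):
    """Protecting node set S removes every edge incident to a node of S."""
    return [(u, v) for u, v in edges if u not in S and v not in S]

# Toy graph from before; {0, 2} is a minimum vertex cover of it.
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
remaining = protect(edges, {0, 2})   # protecting the MVC leaves no edges
partial = protect(edges, {1})        # protecting a non-cover leaves edges behind
```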
2. Top-k highest-degree MVC
Let us recall that MVC is a set of nodes without any ordering requirement. Intuitively, given a budget k, selecting any k nodes from \(V_{c}^{*}\) may result in different sets of nodes. Additionally, not all nodes in the MVC should have the same priority to be protected within a limited budget. We consider that the more connected a node v is to its neighbors in G, the more critical v is to protect. Hence, after obtaining the MVC nodes of the input network, we reorder them by their degree within the input network.
Suppose that at time t we are given an input temporal snapshot graph G_{t} and a protection budget k_{t}. Under this limited-budget constraint, we select the top-k_{t} MVC nodes based on their degree within graph G_{t}.
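This degree-based selection can be sketched as follows (a hypothetical helper of our own; it assumes the MVC nodes of the snapshot have already been computed):

```python
from collections import Counter

def top_k_mvc_by_degree(edges, mvc_nodes, k):
    """Order the MVC nodes by degree in the snapshot and keep the top-k under budget k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(mvc_nodes, key=lambda v: degree[v], reverse=True)[:k]

# Toy snapshot; {0, 2} is its MVC, node 0 has degree 3 and node 2 has degree 2.
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
protected = top_k_mvc_by_degree(edges, {0, 2}, k=1)   # budget of one node
```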
3. Reinforcement learning as solution approximation
Despite the protection threshold guarantee of MVC, finding the MVC nodes of a graph is NP-Hard (Hartman and Weigt 2006). We consider a reinforcement learning (RL) approach to approximate the solution. An RL approach aims to obtain an optimal solution by maximizing cumulative rewards, without any predefined deterministic policy (Mnih et al. 2015; Khalil et al. 2017). This advantage enables us to exploit the known best policy while also exploring unknown policies to obtain an optimal solution.
More specifically, we leverage n-step fitted Q-Learning (Khalil et al. 2017) to obtain an MVC approximation with an efficient training process and a scalable implementation. Hence, our proposed methods take advantage of n-step Q-Learning (Sutton and Barto 1998) and fitted Q-iteration (Riedmiller 2005).
We let the n-step fitted Q-Learning iteratively learn to construct a vertex cover (V_{c}) solution of the input network. We define the RL environment as follows:

- State (\(\mathbb {S}\)): the set of currently selected V_{c} nodes from the input graph

- Action (\(\mathbb {A}\)): add a new node v to the vertex cover set \(\mathbb {S}\)

- Reward (\(\mathbb {R}\)): −1; since our goal is a vertex cover of minimum size, we set a penalty for adding a new node to the V_{c} set

- Termination criterion: all edges are covered
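The environment above can be sketched as a minimal class (class and method names are ours; this is an illustration of the state/action/reward/termination definitions, not the paper's implementation):

```python
class VertexCoverEnv:
    """Minimal sketch of the vertex cover RL environment described above."""

    def __init__(self, nodes, edges):
        self.nodes, self.edges = set(nodes), list(edges)
        self.state = set()   # State S: currently selected V_c nodes

    def covered(self):
        """Termination criterion: every edge has an endpoint in the partial cover."""
        return all(u in self.state or v in self.state for u, v in self.edges)

    def step(self, v):
        """Action: add node v to the partial cover. Reward is -1 per added node."""
        self.state.add(v)
        return self.state, -1, self.covered()   # (state, reward, done)

env = VertexCoverEnv([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (2, 3)])
_, r1, done1 = env.step(0)   # edge (2, 3) is still uncovered
_, r2, done2 = env.step(2)   # now every edge is covered, episode terminates
```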
To quantify how good taking an action \(a \in \mathbb {A}\) in a state \(s \in \mathbb {S}\) is, Q-Learning provides the Q-Function (Watkins 1989). The Q-Function evaluates a state-action pair and maps it to a single value, called the Q-Value, using the following Bellman optimality equation:
$$ Q(s,a) = r + \lambda \max_{a^{\prime}} Q(s^{\prime},a^{\prime}) $$
(7)
where \(s \in \mathbb {S}\) is the given state, \(a \in \mathbb {A}\) is the current action, r is the current reward, λ is the discount factor of future rewards, \(s' \in \mathbb {S}\) is the next state, and \(a' \in \mathbb {A}\) is the next action. The Q-Function is computed and updated iteratively for each possible state-action pair. All resulting Q-Values are stored in a table, called the Q-Table. The best action for a given state is the one with the highest Q-Value.
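A single tabular update toward the Bellman target of Eq. (7) can be sketched as follows (the learning rate `alpha` and the toy state labels are our illustrative additions; the paper itself replaces the Q-Table with a function approximator, as discussed below):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions_next, lam=0.9, alpha=0.5):
    """Move Q(s, a) toward the Bellman target r + lambda * max_a' Q(s', a')."""
    target = r + lam * max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)   # Q-Table, all entries initialized to 0.0
# One step with reward -1 (the penalty for adding a node to V_c):
v = q_update(Q, s="s0", a="a0", r=-1.0, s_next="s1", actions_next=["a0", "a1"])
```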
To obtain the maximum expected cumulative reward achievable from a given state-action pair, we can compute the optimal Q-Function, denoted Q^{∗}, using the following equation:
$$ Q^{*}(s,a) = \max \mathbb{E} \left[\sum_{t \geq 0} \lambda^{t} r_{t} \,\middle|\, s_{0} = s, a_{0} = a\right] $$
(8)
where s_{0} and a_{0} are the initial state and action, respectively, and t indicates a step consisting of: observing a state, performing an action, receiving a reward, and observing the next state.
As the number of all possible state-action pairs can be very large, computing every Q-Value in a Q-Table is inefficient. In particular, for a large input network, using a Q-Table is computationally infeasible and resource-consuming. A nonlinear function approximator can be used to estimate the optimal Q-Function in Eq. (8) such that:
$$ Q(s,a,\Psi) \approx Q^{*}(s,a) $$
(9)
where Ψ denotes the function parameters (weights) of our nonlinear function approximator Q(s,a,Ψ). A neural network or a kernel function can be used as the nonlinear function approximator of the Q-Function (Sutton and Barto 1998).
Recent studies show that neural networks and convolutional neural networks achieve state-of-the-art results as function approximators (Mnih et al. 2015; Sutton and Barto 1998). A neural network architecture also speeds up learning in finite problems, because it can generalize from earlier experiences to previously unseen states (Mnih et al. 2015). In this paper, we propose a convolutional neural network as the function approximator of the optimal Q-Function. Recall that the Q-Function takes a state and an action as input and produces a Q-Value as output. The state is the input graph together with the currently selected V_{c} nodes. The actions are the candidate nodes to be included in the current V_{c}. In the convolutional neural network architecture, the input must represent both the state and the action. Hence, we need the same fixed-length feature representation for the graph and for each of its nodes. Therefore, in our construction of the minimum vertex cover, the function approximator in Eq. (9) is denoted as:
$$ \hat{Q}(h(\mathbb{S}),v,\Psi) $$
(10)
where \(h(\mathbb {S})\) and v represent the fixed-length feature representations of the state \(\mathbb {S}\) and of the action of adding node v, respectively, and Ψ is the neural network's set of weights.
4. Graph embeddings as feature-based representations
We leverage an efficient and scalable graph embedding technique, called Structure2Vec (Dai et al. 2016; Khalil et al. 2017), to embed the input graph and each of its nodes. This graph embedding technique computes a d-dimensional feature embedding μ_{v} for each node v∈V, given the current partial solution \(\mathbb {S}\).
Given a temporal snapshot graph G_{t}, we embed each node v by constructing a d-dimensional embedding μ_{v}. All entries of \(\mu _{v}^{(0)}\) are initialized to zero, and for every v∈V we update the embedding iteratively over T iterations as follows:
$$ \mu_{v}^{(t+1)}= \text{ReLU }(\psi_{1} x_{v} + \psi_{2} \sum_{u \in N(v)} \mu_{u}^{(t)} + \psi_{3} \sum_{u \in N(v)} \text{ReLU} (\psi_{4} w(u,v))), $$
(11)
where x_{v} is node v's own tag, indicating whether it has already been selected: a selected node is given tag 1, otherwise 0. N(v) is the set of neighbors of node v in graph G_{t}, and \(\sum _{u \in N(v)} \mu _{u}^{(t)} \) aggregates the features of node v's neighbors. w(u,v) is the edge weight, so that weighted connections in weighted graphs are taken into account. ψ_{1},ψ_{2},ψ_{3}, and ψ_{4} are the function parameters (weights), specified as \(\psi _{1} \in \mathbb {R}^{d}\), \(\psi _{2} \in \mathbb {R}^{d \times d}\), \(\psi _{3} \in \mathbb {R}^{d \times d}\), and \(\psi _{4} \in \mathbb {R}^{d}\). ReLU is the rectified linear unit activation function applied element-wise to its input, where ReLU(x)=x if x>0 and 0 otherwise.
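One update of Eq. (11) can be re-implemented directly in NumPy as a sketch (the tiny hand-set parameters, unit edge weights, and two-node toy graph below are illustrative only; in the paper, Ψ is learned end-to-end):

```python
import numpy as np

def s2v_iteration(mu, x, edges, w, psi1, psi2, psi3, psi4):
    """One Structure2Vec update: mu_v <- ReLU(psi1*x_v + psi2*sum_N mu_u
    + psi3*sum_N ReLU(psi4*w(u,v))), applied to every node simultaneously."""
    relu = lambda z: np.maximum(z, 0.0)
    n, d = mu.shape
    nbrs = {v: [] for v in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    new_mu = np.zeros_like(mu)
    for v in range(n):
        nbr_sum = sum((mu[u] for u in nbrs[v]), np.zeros(d))
        edge_sum = sum((relu(psi4 * w.get((min(u, v), max(u, v)), 1.0))
                        for u in nbrs[v]), np.zeros(d))
        new_mu[v] = relu(psi1 * x[v] + psi2 @ nbr_sum + psi3 @ edge_sum)
    return new_mu

d = 2
psi1, psi4 = np.ones(d), np.ones(d)
psi2, psi3 = np.eye(d), np.zeros((d, d))
mu0 = np.zeros((2, d))                       # mu^(0) initialized to zero
mu1 = s2v_iteration(mu0, x=[1.0, 0.0], edges=[(0, 1)], w={},
                    psi1=psi1, psi2=psi2, psi3=psi3, psi4=psi4)
```

With these parameters, only the tagged (selected) node 0 receives a nonzero embedding after the first iteration; node 1 would pick it up on the next iteration through the neighbor sum.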
We now explain how to obtain the function \(\hat {Q}(h(\mathbb {S}_{t}),v; \Psi)\) in Eq. (10). Once the embedding μ_{v} of each node v∈V has been computed using Eq. (11) for T iterations, we obtain \(\mu _{v}^{(T)}\). The pooled embedding of the entire graph G_{t} is then given by
$$ \sum_{u \in V} \mu_{u}^{(T)} $$
(12)
Then we can use it to estimate the optimal QFunction in Eq. (10) as follows:
$$ \hat{Q}(h(\mathbb{S}),v;\Psi) = \psi_{5}^{\top} \text{ReLU }\left(\text{concat} \left(\psi_{6} \sum_{u \in V} \mu_{u}^{(T)}, \psi_{7} \mu_{v}^{(T)}\right)\right), $$
(13)
where \(\sum _{u \in V} \mu _{u}^{(T)}\) is the pooled embedding of the entire graph. ψ_{5},ψ_{6}, and ψ_{7} are the neural network parameters (weights), specified as \(\psi _{5} \in \mathbb {R}^{2d}\), \(\psi _{6} \in \mathbb {R}^{d \times d}\), and \(\psi _{7} \in \mathbb {R}^{d \times d}\).
To this end, the pooled embedding of the entire graph serves as a surrogate representation of the state, while the embedding of each node serves as a surrogate representation of an action. The function \(\hat {Q}(h(\mathbb {S}),v)\) depends on the collection of seven parameters \(\Psi = \{\psi _{i}\}_{i=1}^{7}\), which are learned during the training phase and evaluated during the evaluation phase. Figure 4 illustrates the neural network architecture used in this paper.
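The readout of Eq. (13) can be sketched directly (the parameter values below are illustrative identity/ones matrices, not learned weights):

```python
import numpy as np

def q_hat(mu, v, psi5, psi6, psi7):
    """Eq. (13): Q-hat = psi5^T ReLU(concat(psi6 * pooled graph, psi7 * mu_v))."""
    pooled = mu.sum(axis=0)                            # state surrogate: pooled embedding
    z = np.concatenate([psi6 @ pooled, psi7 @ mu[v]])  # action surrogate: node embedding
    return float(psi5 @ np.maximum(z, 0.0))            # element-wise ReLU, then dot product

d = 2
mu = np.array([[1.0, 1.0], [0.0, 0.0]])    # toy node embeddings mu^(T)
q = q_hat(mu, v=0, psi5=np.ones(2 * d), psi6=np.eye(d), psi7=np.eye(d))
```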
a. Training Phase
Algorithm 1 illustrates our proposed training phase. In each training iteration, our method returns the neural network's set of parameters Ψ that successfully constructs a V_{c} of graph G. In line 5, we specify how a new node is selected by balancing exploration and exploitation. Here, exploration means selecting a random node with probability ε, whereas exploitation means maximizing the expected cumulative reward, i.e., selecting the node that maximizes the function \(\hat {Q}(h(\mathbb {S}_{t}),v; \Psi)\), where \(h(\mathbb {S}_{t})\) is the embedding of state \(\mathbb {S}\) at step t. The exploration probability ε is decreased from 1.0 to 0.05 linearly with the iteration step. To train the neural network efficiently, we perform batch processing as described in line 9.
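The linear ε-decay and ε-greedy node selection of line 5 can be sketched as follows (the function names and the exact handling of the schedule endpoints are our assumptions, not taken from Algorithm 1):

```python
import random

def epsilon(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Exploration probability decayed linearly from eps_start to eps_end."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_node(q_values, candidates, step, total_steps, rng=random):
    """Epsilon-greedy: a random node with probability eps, else the Q-hat argmax."""
    if rng.random() < epsilon(step, total_steps):
        return rng.choice(candidates)        # exploration
    return max(candidates, key=lambda v: q_values[v])   # exploitation
```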
The loss function to be minimized is as follows:
$$ \left(y - \hat{Q}(h(\mathbb{S}_{t}), v_{t}; \Psi)\right)^{2} $$
(14)
where \(y = \sum _{i=0}^{n-1} r(\mathbb{S}_{t+i},v_{t+i}) + \lambda \max _{v'} \hat {Q}\left (h(\mathbb {S}_{t+n}), v'; \Psi \right)\) and n is the number of step updates.
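The n-step target y and the squared loss of Eq. (14) can be sketched as follows (hypothetical helper names; the reward and Q̂ values in the usage are illustrative):

```python
def n_step_target(rewards, max_q_next, lam=0.9):
    """y = sum of the next n rewards + lambda * max_v' Q-hat(h(S_{t+n}), v'; Psi)."""
    return sum(rewards) + lam * max_q_next

def td_loss(y, q_pred):
    """Squared TD error of Eq. (14): (y - Q-hat(h(S_t), v_t; Psi))^2."""
    return (y - q_pred) ** 2

# Two -1 rewards (one per added node) and a bootstrapped value of -2.0:
y = n_step_target([-1, -1], max_q_next=-2.0, lam=0.9)
loss = td_loss(y, q_pred=-3.0)
```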
b. Evaluation Phase
Algorithm 2 illustrates the evaluation phase of our proposed method. To obtain the best-trained neural network's set of parameters (weights) Ψ^{∗}, we evaluate the training results against a given set of available snapshot graphs G^{D}. This set of parameters is then used in the testing simulation of graph protection.
c. Testing Phase
Algorithm 3 shows the testing phase of the multiple-turns graph protection strategy on dynamic networks. We are given an input snapshot graph G_{t} and a budget k_{t}. Each node in G_{t} is embedded into a d-dimensional feature vector, where d equals the embedding size used during training in Algorithm 1. The minimum vertex cover of G_{t} is then constructed using the best-trained neural network's set of parameters Ψ^{∗} obtained from Algorithm 2. Finally, we get the set S of top-k_{t} degree-ordered MVC nodes to be protected in the current temporal snapshot graph G_{t}.
We also propose the ReProtect-p method, a variant of ReProtect that is trained on a perturbed version of each available snapshot of the dynamic network. The perturbation is performed by removing edges probabilistically from the snapshot graph. Specifically, for each edge we generate a random number; if the edge weight is smaller than the generated number, the edge is removed. We introduce this variant to add more variety to the training data and to avoid possible overfitting.
Computational complexity analysis
Based on Algorithm 3, we present the computational complexity analysis of our proposed ReProtect method. The cost of step 1, initializing the empty set S, is constant. Steps 2 and 3 construct an approximated MVC set of graph G_{t}, which has complexity O(p·M) based on the analysis by Dai et al. (2016) and Khalil et al. (2017), where p is the constant number of node testing steps, equal to the number of nodes divided by the number of step updates in Q-Learning, and M is the number of edges. In n-step Q-Learning, we update the value of each action based on the rewards of taking a sequence of n consecutive actions; n is called the number of step updates. For example, if graph G_{t} has 500 nodes and the number of step updates is 5, then p is a constant equal to 100. Note that p ranges from 1 to the number of nodes in G_{t}.
Obtaining the ordered MVC nodes in step 4 takes O(N·logN) on average using QuickSort, where N is the number of nodes in G_{t}. Therefore, the total computational complexity of our ReProtect method is O(p·M+N·logN). ReProtect and ReProtect-p differ only in the training process; hence, the computational complexity of ReProtect-p is also O(p·M+N·logN).