Link prediction for ex ante influence maximization on temporal networks

Influence maximization (IM) is the task of finding the most important nodes in order to maximize the spread of influence or information on a network. This task is typically studied on static or temporal networks where the complete topology of the graph is known. In practice, however, the seed nodes must be selected before observing the future evolution of the network. In this work, we consider this realistic ex ante setting where $p$ time steps of the network have been observed before selecting the seed nodes. Then the influence is calculated after the network continues to evolve for a total of $T>p$ time steps. We address this problem by using statistical, non-negative matrix factorization and graph neural network link prediction algorithms to predict the future evolution of the network, and then apply existing influence maximization algorithms on the predicted networks. Additionally, the output of the link prediction methods can be used to construct novel IM algorithms. We apply the proposed methods to eight real-world and synthetic networks to compare their performance using the susceptible-infected (SI) diffusion model. We demonstrate that it is possible to construct quality seed sets in the ex ante setting as we achieve influence spread within 87% of the optimal spread on seven of eight networks. In many settings, choosing seed nodes based only on historical edges provides results comparable to the results treating the future graph snapshots as known. The proposed heuristics based on the link prediction model are also some of the best-performing methods.
These findings indicate that, for these eight networks under the SI model, the latent process which determines the most influential nodes may not have large temporal variation. Thus, knowing the future status of the network is not necessary to obtain good results for ex ante IM.


Introduction
Influence maximization (IM) is a canonical problem in the computational analysis of social networks where the goal is to find seed nodes that maximize the reach of information diffusion (Kempe et al. 2003; Chen et al. 2009; Li et al. 2018). Ever since Kempe et al.'s seminal paper (Kempe et al. 2003), this topic has attracted great attention from researchers across various domains. The IM problem is typically studied under one of two settings. The first considers a static network where the information spreads through the network over time, but the topology of the network remains fixed. This assumption is violated in many real-world settings, so recently there has also been interest in developing methods where the network structure is also allowed to vary with time (Holme and Saramäki 2012). But there are also unrealistic assumptions that typically accompany this case. For a given temporal network $\mathcal{G} = (G_1, \dots, G_T)$, it is typically assumed that the network topology is known for all $t \in \{1, \dots, T\}$, i.e., the ex post assumption. Then at time $t = 1$, a researcher chooses a seed set to maximize the influence on the evolving network. In other words, the researcher can "peer into the future" to pick the most influential nodes at time $t = 1$. Assuming complete foresight of the network evolution, however, is unrealistic. In practice, one must choose the seed nodes based on the past evolution of the network without knowing exactly what the network will look like in the future. In the temporal network literature, this is known as the ex ante setting where the solution is based on forecasts and not actual results.
In this paper, we consider the more realistic and difficult setting highlighted in Fig. 1. Here, the researcher observes some network $G_1, \dots, G_p$ from $t = 1, \dots, p$. Then, after observing these first $p$ snapshots of the network, the researcher chooses the seed nodes which will maximize the influence over the next $T - p$ snapshots, $G_{p+1}, \dots, G_T$, before observing them. Then the number of influenced nodes after the spreading process has finished on the final snapshot is calculated. The formal problem statement is as follows: given some network $\mathcal{G} = (G_1, \dots, G_p, \dots, G_T)$ evolving over time, what seed nodes should be chosen at time $t = p$ based only on $G_1, \dots, G_p$ in order to maximize the influence on the network at time $t = T + 1$?

Influence maximization problem
We begin by defining the influence maximization (IM) problem. Let $\mathcal{G} = (G_1, \dots, G_T)$ be a temporal network where the graph at time $t$, $G_t = (V, E_t)$, is a collection of nodes $V$ and edges $E_t$. Notice that the nodes are fixed but the edge set is allowed to vary over time. Each snapshot $G_t$ is defined by an $n \times n$ adjacency matrix $A^{(t)}$ such that $A^{(t)}_{ij} = 1$ if nodes $i$ and $j$ have an edge at time $t$, and 0 otherwise. Given some process by which information is diffused on the network and integer $k < n$, the IM problem tries to find a set of nodes $S = \{v_1, \dots, v_k\}$ such that, if the nodes in $S$ are initially "influenced" or "infected" with the information, then the spread of influence on the network is maximized at time $t = T + 1$.
Crucial to this problem is the information diffusion mechanism. There are many popular choices in the literature including the Independent Cascade (IC) model (Wang et al. 2012; Wen et al. 2017), Linear Threshold (LT) model (Chen et al. 2010b; Goyal et al. 2011b), triggering (TR) model (Kempe et al. 2003; Tang et al. 2014), and more. For this work, we adopt the susceptible-infected (SI) model often used in epidemiological studies (Allen 1994; Murata and Koga 2018). At each step in the process, a node is either in a susceptible state (S) or infected state (I). If a node is in the S state, then the information has not yet reached it, and a node in the I state has already received the information. At the beginning of the process ($t = 1$), nodes in the seed set are set to the I state and all other nodes are in the S state. Then at time $t$, if node $u$ is in I and node $v$ is in S and there is an edge between the two nodes, i.e., $A^{(t)}_{uv} = 1$, then node $v$ will change to state I at time $t + 1$ with probability $\lambda$. The susceptibility or infection parameter $\lambda$ controls the rate at which information propagates throughout the network. The diffusion process ends once $t = T + 1$. Thus, if $\sigma(S)$ is the expected number of nodes infected at time $T + 1$ when nodes in $S$ are initialized to the I state under the SI model, the IM problem seeks the set of seed nodes of size $k$ that maximizes $\sigma(S)$ on $\mathcal{G}$, i.e.,
$$S^* = \arg\max_{S \subseteq V,\ |S| = k} \sigma(S).$$
The IM problem is NP-hard under the IC, LT, TR and SI models (see, e.g., Kempe et al. 2003; Li et al. 2018). Thus, the optimal solution is infeasible in many cases so heuristics must be exploited to find a suitable seed set. Indeed, even evaluating $\sigma(S)$ is #P-hard (e.g., Chen et al. 2010a, b). In practice, we estimate $\sigma(S)$ via Monte Carlo (MC) simulations by simulating the spreading process a large number of times and taking the average number of nodes infected at time $T + 1$.
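The SI dynamics and the Monte Carlo estimate of $\sigma(S)$ described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `simulate_si` and `estimate_spread`, the representation of each snapshot as a list of undirected edges, and the default of 1000 simulations are assumptions made for the example.

```python
import random


def simulate_si(snapshots, seeds, lam, rng):
    """Run one SI cascade over the snapshots and return the final number of infected nodes.

    snapshots: list of edge lists, one per time step; each edge is a (u, v) tuple
               and is treated as undirected (an assumption of this sketch).
    seeds:     iterable of nodes that start in the I state.
    lam:       infection probability per infected-susceptible contact.
    """
    infected = set(seeds)
    for edges in snapshots:
        newly_infected = set()
        for u, v in edges:
            # Transmission only happens from an infected endpoint to a susceptible one.
            if u in infected and v not in infected:
                if rng.random() < lam:
                    newly_infected.add(v)
            elif v in infected and u not in infected:
                if rng.random() < lam:
                    newly_infected.add(u)
        # Nodes infected during snapshot t become infectious only from t + 1 onward.
        infected |= newly_infected
    return len(infected)


def estimate_spread(snapshots, seeds, lam, n_sims=1000, seed=0):
    """Monte Carlo estimate of sigma(S): average final infected count over n_sims runs."""
    rng = random.Random(seed)
    total = sum(simulate_si(snapshots, seeds, lam, rng) for _ in range(n_sims))
    return total / n_sims


if __name__ == "__main__":
    # Toy temporal network with three snapshots on four nodes.
    toy = [[(0, 1)], [(1, 2), (0, 3)], [(2, 3)]]
    print(estimate_spread(toy, seeds=[0], lam=0.5))
```

Because newly infected nodes are merged into the infected set only after all edges of a snapshot are processed, information cannot travel more than one hop per time step, which matches the rule that a node infected at time $t$ changes state at time $t + 1$.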
Notice that in this definition, the future evolution of the network is taken as given. In other words, the seed nodes are selected at time $t = 1$ under the assumption that the topology of the graph at time $t = 2, \dots, T$ is known. In practice, however, this is an unreasonable assumption; a practitioner needs to select seed nodes without having access to the future dynamics of the network. While we are interested in this more realistic and difficult setting, we begin by highlighting some of the existing IM algorithms.

Static IM
We first highlight some of the existing approaches for IM in the static case, i.e., $T = 1$. Please see Li et al. (2018) for a comprehensive survey. Kempe et al. (2003) were the first to postulate this as a combinatorial optimization problem and used a greedy algorithm to construct the seed set. At each step in the greedy algorithm, the node which maximizes the marginal gain in $\sigma(S)$ is added, and the process continues until $|S| = k$. Mathematically, node $v$ is added to $S$ where
$$v = \arg\max_{u \in V \setminus S} \{\sigma(S \cup \{u\}) - \sigma(S)\}. \tag{1}$$
The authors prove that the solution yielded by the greedy algorithm is within a factor of $(1 - 1/e)$ of the optimal solution. Thus, in studies it is often considered the "gold standard" of IM. The approach, however, is extremely computationally expensive and, therefore, infeasible for large networks. Efforts to make this algorithm more efficient include estimating the upper bound on the marginal influence (Leskovec et al. 2007; Goyal et al. 2011a) and simplifying the calculation of $\sigma(S)$ (Wang et al. 2010). Another class of IM algorithms ranks nodes according to some metric and then selects the $k$ nodes with the largest metric value as the seed nodes. For example, in Chen et al. (2009), nodes are ranked according to their degree and once a node $u$ is selected as a seed, all nodes which share an edge with $u$ have their degree "discounted" by a specified factor to take into account the nodes' overlap in influence. Liu et al. (2014) take a similar approach for PageRank. These approaches avoid calculating $\sigma(S)$, which leads to computational speed-ups, but lack performance guarantees.

Temporal IM
Assuming that information diffuses on a static network is often unreasonable, so recently there have been many works that study the IM problem for temporal or dynamic networks. First, the greedy algorithm of Kempe et al. (2003) can easily be extended to the temporal case and many extensions of this method have been studied (e.g., Liqing et al. 2019; Erkol et al. 2020). Michalski et al. (2014) show that temporal IM methods greatly outperform their static counterparts under the LT model. While that paper assumes that the future snapshots of the network are unknown, there is no attempt to forecast its evolution. Osawa and Murata (2015) adopt the SI model and develop a heuristic method by approximating the probability that a node is infected in the next time step. This method performs similarly to a greedy algorithm but is significantly faster. Several static network IM heuristics are extended in Murata and Koga (2018) including a Dynamic Degree Discount algorithm, based on dynamic degrees from Yu et al. (2010). This method is faster than greedy and Osawa and Murata (2015) while yielding comparable influence spread. Lastly, Erkol et al. (2020) leverage the SIR model and use an individual node mean field approximation to compute the expected influence with a greedy algorithm. One interesting finding from this work is that simply using the first temporal layer of the network to find the seed nodes can often still yield good performance. That paper also briefly mentions the problem we are interested in, where the future evolution of the network is unknown, but does not thoroughly explore it.

Main contributions
In this paper, we study the temporal IM problem under the realistic ex ante setting where the future evolution of the network is unknown. Given the observed network snapshots, we first predict the future topology of the network using statistical, non-negative matrix factorization and graph neural network link prediction techniques. We also propose a novel heuristic for IM based on the link prediction model output. Then, we use greedy and dynamic degree IM algorithms to find the optimal seed nodes on the estimated future networks. We conduct extensive experiments on synthetic and real-world networks and show that, in almost all cases, finding the optimal seed nodes on an aggregated graph using only the historical snapshots yields an influence spread within 80% of the influence spread when the seed nodes are found using the actual future evolution. Additionally, the proposed IM heuristics yield influence spreads as good as or better than actually predicting the future network evolution. These results together indicate the potential existence of an influential node latent process that does not vary temporally. More importantly, the IM problem can still be solved with good performance in this realistic and difficult setting. The remainder of this paper is structured as follows. In "Methodology" section, we propose our methodology and discuss various link prediction and IM algorithms. We also propose a novel ex ante IM algorithm based on the link prediction models. We conduct experiments on eight synthetic and real-world networks in "Experiments" section and share concluding thoughts in "Conclusions" section.
Finally, the most similar works to ours consider a temporally evolving network and select a new seed set at each snapshot. For example, Singh and Kailasam (2021) adopt the IC model for spreading dynamics and a conditionally temporal restricted Boltzmann machine (ctRBM) for link prediction (Li et al. 2014). The authors propose to choose new seed nodes for each graph snapshot and update the set using an Interchange Heuristic (Nemhauser et al. 1978). Our paper differs from that of Singh and Kailasam (2021) in several key areas including, but not limited to: only selecting a seed set once, predicting multiple time steps in the future, allowing edges to form and disappear, comparing several link prediction algorithms, and using the SI model. Our problem is also different from that of Zhuang et al. (2013) and Han et al. (2017). These papers assume that the future networks are unobserved but that they can be partially known through probing different nodes. Conversely, our work assumes the future networks are completely unknown.

Methodology
The goal of this paper is to develop a method for ex ante influence maximization where seed nodes are selected before observing the future evolution of the network. We propose the following approach:
1. Observe networks $G_1, \dots, G_p$.
2. Predict the future evolution of the network $\hat{G}_{p+1}, \dots, \hat{G}_T$ based on the observed networks.
3. Apply an IM algorithm to the predicted networks $\hat{G}_{p+1}, \dots, \hat{G}_T$, treating $t = p + 1$ as the starting time, to obtain optimal seed nodes $S = \{v_1, \dots, v_k\}$.
4. Allow the network to continue to evolve as $G_{p+1}, \dots, G_T$ and compute the influence of $S$ on the true networks.
There are two key components in the proposed method: step (2) predicting the future networks and step (3) the IM algorithm.

Link prediction
The goal of link prediction methods is determining the most likely missing and/or future links in a network. There is a rich body of literature on this problem with numerous approaches such as statistical, non-negative matrix factorization (NMF) and graph neural networks (GNN). We refer the interested reader to the following surveys for a thorough review of link prediction methods (Lü and Zhou 2011; Kumar et al. 2020; Divakaran and Mohan 2020; Zhou 2021). In general, static link prediction methods look for missing links in the network, thereby not allowing for the possibility of removing links. Temporal methods, on the other hand, must account for the possibility of new edges arising and existing edges disappearing. In our scenario, we are interested in the dynamic evolution of the network, so we focus on temporal link prediction methods.
For this work, we consider one link prediction method from each of the following popular paradigms: statistical, NMF and GNN.
Statistical: Zou et al. (2021) use a linear regression model with LASSO penalty to predict future links in the network. We slightly modify their approach to perform logistic regression with LASSO. Let $x_i(t) = 1$ if node pair $i$ had an edge at time $t$ and 0 otherwise, for $i = 1, \dots, M$, where $M$ is the number of node pairs with at least one link in $G_1, \dots, G_p$. Then for $i = 1, \dots, M$,
$$x_i(t+1) \sim \mathrm{Bernoulli}\{p_i(t+1)\},$$
where
$$\mathrm{logit}\{p_i(t+1)\} = \beta_{i0} + \sum_{j=1}^{M} \beta_{ij} x_j(t).$$
Thus for each node pair $i$, we fit a logistic regression model to find the best fitting $\beta_i = (\beta_{i0}, \beta_{i1}, \dots, \beta_{iM})^T$. Since many edge pairs are likely uninformative in this model, the authors add a LASSO penalty ($L_1$ regularization) to shrink the absolute value of each $\beta_{ij}$ to 0, i.e., $\alpha \sum_{j=1}^{M} |\beta_{ij}|$, where $\alpha$ determines the strength of the penalty. A small value of $\alpha$ corresponds to little regularization and a large value to strong regularization. The optimal $\alpha$ is chosen from a grid of values to maximize the validation-set area under the receiver operating characteristic curve (AUC). To predict the probability of an edge several time steps in the future, $\hat{p}_i(p + t)$, we sequentially use the fitted $\hat{\beta}_i$ and the estimated probabilities of an edge, i.e.,
$$\mathrm{logit}\{\hat{p}_i(p+t)\} = \hat{\beta}_{i0} + \sum_{j=1}^{M} \hat{\beta}_{ij}\, \hat{p}_j(p+t-1),$$
where $\hat{p}_j(p) = x_j(p)$. The main advantage of this method is that it yields a valid probability of a link for each edge pair and not simply a similarity score like the following methods. We can also predict edges multiple time steps in the future without having to re-fit the model. A notable limitation of this method is that it can only predict links for edge pairs with at least one historical edge. Additionally, the simplicity of the linear model may not be able to capture the complex mechanism behind link formation, and fitting separate models for every edge pair means that the method may not scale well for networks with a large number of edges.
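The sequential multi-step prediction described for the logistic regression model can be sketched as below. This is only an illustration under stated assumptions: the coefficients `beta0` and `beta` are taken as already fitted (the LASSO fitting itself is not shown), the function name `predict_ahead` is hypothetical, and the recursion is started from the observed indicators at time $p$.

```python
import math


def sigmoid(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_ahead(beta0, beta, last_x, steps):
    """Roll fitted per-pair logistic models forward `steps` time steps.

    beta0:  list of M intercepts, one per node pair (assumed already fitted).
    beta:   M x M list of lists; beta[i][j] is the coefficient of pair j in the model for pair i.
    last_x: list of M observed 0/1 edge indicators at the last observed time p.
    Returns a list of length `steps`; entry t-1 holds the M predicted probabilities for time p + t.
    """
    current = list(last_x)
    predictions = []
    for _ in range(steps):
        # The predicted probabilities from the previous step are fed back in as covariates.
        current = [
            sigmoid(beta0[i] + sum(b * x for b, x in zip(beta[i], current)))
            for i in range(len(beta0))
        ]
        predictions.append(current)
    return predictions


if __name__ == "__main__":
    # Two node pairs with made-up coefficients, predicted three steps ahead.
    print(predict_ahead([-1.0, 0.5], [[2.0, 0.0], [0.0, -1.0]], [1, 0], steps=3))
```

Note that no re-fitting happens inside the loop, which reflects the point that multiple future time steps can be predicted from a single fitted model.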
Non-negative matrix factorization: For a given $n \times n$ matrix $A$, non-negative matrix factorization (NMF) seeks to find an $n \times q$ matrix $U$ and a $q \times n$ matrix $V$ such that $A \approx UV$, $q < n$ and all entries of $U$ and $V$ are non-negative. In other words, $U$ and $V$ are low-dimensional representations of $A$. Ahmed et al. (2018) apply this approach for link prediction in temporal networks. Let $A_1, \dots, A_p$ be the adjacency matrices corresponding to graphs $G_1, \dots, G_p$. Then the authors seek to find a sequence of matrices $U_t, V_t$ such that $A_t \approx U_t V_t$ and all $U_t$ and $V_t$ are close to some consensus matrices $U^*$ and $V^*$, respectively. Mathematically, this means minimizing a loss function (5a) that measures these discrepancies in the Frobenius norm $\|\cdot\|_F$, subject to $U_t, V_t$ being non-negative. In this loss, $\varphi$ is an attenuation coefficient that gives a larger weight to more recent graphs. The authors derive an iterative algorithm to minimize (5a). Once the algorithm has converged, the rows of $V^*$ represent a low-dimensional embedding for each node, $(V^*)_i$. Thus, $S_{ij} = \mathrm{sim}\{(V^*)_i, (V^*)_j\}$ is a score for the likelihood of a link between nodes $i$ and $j$ at time $p + 1$, where $(V^*)_k$ is the $k$th row of $V^*$ and $\mathrm{sim}(\cdot, \cdot)$ is a measure of similarity. In this work, we use the cosine similarity. To extend this method to predict multiple time steps in the future, we first use $S_{ij}$ to predict $A_{p+1}$ using a threshold cutoff (see below for further discussion). Then the appropriate terms are added to the loss function to find the non-negative matrix factorization of this new adjacency matrix. Once the algorithm converges, we obtain a new estimate of $V^*$ and this process continues.
This method captures the temporal patterns of the network in the low-dimensional representations $U^*$ and $V^*$. Additionally, this method can predict a link for any edge pair, even if there has yet to be one. Yet, there are several challenges to using this method in our context. First, it is well-known that NMF is a non-convex optimization problem, so we may end up with a local optimum. Additionally, we must choose $q$, the latent dimension of the non-negative matrices, and $\varphi$, the attenuation factor.
Graph neural network: We also consider a state-of-the-art deep learning graph neural network method for link prediction called EvolveGCN (Pareja et al. 2020). The basic idea of this method is that, for a given time $t$, a graph convolution step is performed on $G_t$ and then the corresponding weights are updated in the temporal direction using a recurrent neural network (RNN). Specifically, let $A_t$, $H_t^{(\ell)}$ and $W_t^{(\ell)}$ be the adjacency, node embedding and weight matrices, respectively, at time $t \in \{1, \dots, p\}$ for layer $\ell \in \{1, \dots, L\}$. Then the node embeddings for time $t$ are updated using a graph convolution step, i.e.,
$$H_t^{(\ell+1)} = \mathrm{GCONV}(A_t, H_t^{(\ell)}, W_t^{(\ell)}).$$
The initial embedding matrix $H_t^{(0)}$ is the node features at time $t$ and the GCONV function simply normalizes the adjacency matrix before multiplying the normalized adjacency matrix with the other two inputs. Next, the authors propose two ways to temporally update the weight matrices for each graph convolution layer. The -H version treats $W_t^{(\ell)}$ as the hidden state of a dynamical system and updates the weights using the current node embeddings $H_t^{(\ell)}$ via a gated recurrent unit. The -O approach ignores the node embeddings and instead updates the weights using a long short-term memory (LSTM) cell. These two steps together make up the EvolveGCN framework.
This approach can easily be leveraged for link prediction. In particular, if $h_p^i$ and $h_p^j$ are, respectively, the $i$th and $j$th rows of the final embedding matrix $H_p$, then their dot product yields a similarity score $S_{ij}$ for the likelihood of a link between the nodes $i$ and $j$, i.e., $S_{ij} = (h_p^i)^T h_p^j$. The larger $S_{ij}$, the more we expect an edge to exist between nodes $i$ and $j$ at time $p + 1$, similar to the NMF approach. As such, this method can predict a link for any edge pair, regardless of whether there has been a historical link. Note that negative sampling and a cross-entropy loss function are used to optimize the weights. In order to extend this method for predicting multiple time steps ahead, we first predict the state of the network one time step ahead, $\hat{G}_{p+1}$. Then this network can be fed into the fitted model to predict the ensuing time step $\hat{G}_{p+2}$, and so on.
There are several practical considerations for using this method for the IM problem. First, this method requires node attributes. If these are unavailable, we compute the node2vec (Grover and Leskovec 2016) embedding for each node at each snapshot and then use the output as the node feature. Since we believe that the graph structure plays a bigger role in the evolution of the network than these node features, we choose the -O version, per the authors' suggestion. This method scales well for large networks and can capture complicated dynamics driving link formation. This method, however, relies on negative sampling, which is an important open problem in GNNs (e.g., Robinson et al. 2020). It also requires node features and it is not clear if node2vec is the optimal choice in the absence of meaningful domain features.
From output to predicted edges: Each link prediction method yields either a probability or similarity score for each edge pair. But this still leaves the important step of converting these continuous outcomes to a binary prediction of "link" or "no link." To the knowledge of the authors, using link prediction methods for downstream tasks has received relatively little attention in the literature, so this is a non-trivial step. The approach that we adopt preserves the "average" edge density in the network. Specifically, let $\rho_t$ be the average probability of an edge for snapshot $t$ for $t = 1, \dots, p$, i.e.,
$$\rho_t = \frac{1}{n^2} \sum_{i,j} A^{(t)}_{ij}. \tag{7}$$
Instead of taking the simple average of the $\rho_t$, we give a larger weight to the more recent values. This yields the weighted average edge density $\rho^*$, whose weights are governed by a parameter $0 \le \xi \le 1$. Thus, for each method, we select the top $n^2 \rho^*$ edge pairs and predict that they will have a link at the next time step. This also ensures that each method predicts the same number of edges for each future time step.
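The density-preserving cutoff can be illustrated with the sketch below. Two points are assumptions of this example and are not taken from the text: the weights are taken to be geometric, with snapshot $t$ weighted by $\xi^{p-t}$, and the target count of $n^2 \rho^*$ matrix entries is converted to $n^2 \rho^* / 2$ undirected edges because a symmetric adjacency matrix stores each undirected edge twice. The function names are hypothetical.

```python
def edge_density(adj):
    """Fraction of nonzero entries in an n x n 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    return sum(sum(row) for row in adj) / (n * n)


def weighted_density(densities, xi):
    """Weighted average of per-snapshot densities, favoring recent snapshots.

    ASSUMPTION: snapshot t (1-indexed, out of p) gets weight xi ** (p - t), so the
    most recent snapshot has weight 1 and xi = 1 recovers the simple average.
    """
    p = len(densities)
    weights = [xi ** (p - t) for t in range(1, p + 1)]
    return sum(w * d for w, d in zip(weights, densities)) / sum(weights)


def predict_edges(scores, rho_star):
    """Keep the highest-scoring node pairs so that the predicted graph has density rho_star.

    scores: symmetric n x n list of lists of probabilities or similarity scores.
    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(scores)
    # n^2 * rho_star symmetric matrix entries correspond to half as many undirected edges.
    n_edges = round(n * n * rho_star / 2)
    ranked = sorted(
        ((scores[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda item: item[0],
        reverse=True,
    )
    return {(i, j) for _, i, j in ranked[:n_edges]}
```

Since the cutoff depends only on $\rho^*$ and not on the scale of the scores, probabilities (LogReg) and similarity scores (NMF, GNN) are handled in the same way and produce the same number of predicted edges.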

Influence maximization
Once we have predicted the future evolution of the network, the second step of the proposed method is the IM algorithm. While there exist many temporal IM algorithms (e.g., Michalski et al. 2014; Osawa and Murata 2015; Erkol et al. 2020), we consider a greedy algorithm and the dynamic degree discount algorithm (Murata and Koga 2018).
Greedy: The first algorithm that we consider is based on a greedy heuristic. At each step, the node which results in the greatest expected marginal gain in influence spread is added to the seed set (Algorithm 1). Kempe et al. (2003) prove that for the IC and LT static propagation models, the greedy algorithm yields a solution within a factor of $(1 - 1/e)$ of the optimal spread, so it is considered the "gold standard" in IM problems. This method also trivially accommodates any diffusion process. Since it requires computing the expected influence spread for $O(n)$ nodes at each of $k$ steps, however, the algorithm is computationally intensive and only feasible on small networks.
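The greedy selection loop can be sketched as follows. This is a generic illustration rather than a transcription of Algorithm 1: `greedy_seeds` is a hypothetical name and `spread` is any user-supplied function that returns the (estimated) influence of a seed list. In the setting of this paper, `spread` would be a Monte Carlo estimate of $\sigma(S)$ on the predicted snapshots; here a deterministic toy coverage function stands in for it.

```python
def greedy_seeds(nodes, k, spread):
    """Greedily build a seed list of size k (assumes k <= number of nodes).

    nodes:  iterable of candidate nodes.
    spread: callable mapping a list of seeds to its (estimated) influence.
    """
    nodes = list(nodes)
    seeds = []
    for _ in range(k):
        best_node, best_value = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            # Maximizing spread(seeds + [v]) is equivalent to maximizing the marginal
            # gain, because spread(seeds) is the same constant for every candidate v.
            value = spread(seeds + [v])
            if value > best_value:
                best_node, best_value = v, value
        seeds.append(best_node)
    return seeds


if __name__ == "__main__":
    # Toy stand-in for sigma(S): number of distinct items "covered" by the seeds.
    cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
    toy_spread = lambda S: len(set().union(*(cover[v] for v in S)))
    print(greedy_seeds(cover, 2, toy_spread))
```

The nested loops make the cost explicit: each of the $k$ rounds evaluates `spread` once per remaining candidate, and every evaluation is itself a batch of simulations when `spread` is a Monte Carlo estimate.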

Dynamic degree discount:
In order to address the computational complexity of the greedy algorithm, Murata and Koga (2018) propose an IM heuristic algorithm based on dynamic degrees. First, the authors define the dynamic degree, $D_T(v)$, of node $v$ for a temporal network with $T$ snapshots as a function of its neighbor sets across the snapshots, where $N_{v,t}$ is the set of neighbors of node $v$ at time $t$, i.e., $N_{v,t} = \{u \in V : A^{(t)}_{uv} = 1\}$. Then they extend the Degree Discount algorithm for static networks by Chen et al. (2009) to the temporal case. This method chooses the $k$ nodes with the largest dynamic degree, where the effect of selected nodes is removed or "discounted" from the remaining nodes. See Algorithm 2 for details. Murata and Koga (2018) show that this method yields comparable results with the greedy algorithm but at a fraction of the run time.

Link prediction output heuristic
Thus far, we have laid out the path to combine link prediction algorithms with IM to find the seed nodes which maximize the influence on the unobserved future evolution of the network. Since we are primarily focused on IM, we are not strictly interested in the future evolution of the network, so much as in determining which nodes are the most "important" in the future. Thus, predicting the exact future evolution is not strictly necessary. Additionally, we have seen that it is non-trivial to choose a cutoff to determine which edges to predict for the future, and any sort of threshold that we use to predict edges will inherently lose information. Therefore, we propose the following IM heuristics based on the fitted link prediction model to determine the most likely influential nodes in the future networks. The idea is that if a node is likely to have edges with many other nodes, then it is also likely to be a good candidate for the IM seed set. Thus, if $\hat{P}$ is an $n \times n$ matrix returned by a link prediction algorithm where $\hat{P}_{ij}$ is the probability of an edge or similarity between nodes $i$ and $j$, the column sums of $\hat{P}$ yield a useful measure of node importance for the IM task. Below, we describe the specific procedure for each method and include a general procedure in Algorithm 3.

LogRegSum:
The output of LogReg is the probability of a link for each edge pair that has previously been observed, i.e., $\hat{p}^{(p+1)}_{ij} := \hat{P}_{ij}$ is the probability of an edge at time $t = p + 1$ returned by the algorithm, where $\hat{P}_{ij} = 0$ if $A^{(t)}_{ij} = 0$ for all $t \in \{1, \dots, p\}$. Thus, for each node $i$, we can sum the probabilities of node $i$ having an edge with every other node as a measure of importance, i.e., $\theta_i = \sum_j \hat{P}^{(p+1)}_{ij}$. Then the $k$ nodes with the largest value of $\theta_i$ are selected as the seed nodes.
NMFSum: The output of NMF is the similarity score $\hat{P}_{ij}$ for each link $(i, j)$, computed as the cosine similarity between $(V^*)_i$ and $(V^*)_j$, i.e., $\hat{P}_{ij} = \cos((V^*)_i, (V^*)_j)$. The $k$ nodes with the largest $\theta_i = \sum_j \hat{P}_{ij}$ are again chosen as seed nodes.
GNNSum: Once the GNN model is fit, the similarity score $\hat{P}_{ij}$ for each link $(i, j)$ is the dot product of $h_p^i$ and $h_p^j$, where $h_p^k$ is the $k$th row of the embedding matrix for time $t = p$, i.e., $\hat{P}_{ij} = (h_p^i)^T h_p^j$, and the seed nodes are chosen as in the other two cases.
There are several desirable features of these heuristics. First, they do not require a cutoff to predict future edges, the choice of which is non-trivial. The lack of a threshold also means that we do not lose any information from the link prediction output; all of the information in the model is incorporated into the $\theta$ value. Similarly, we found in practice that the link prediction output will often only have $n_p < n$ nodes which are active, i.e., have at least one predicted future edge. If $n_p < k$, then it is not clear how to choose the other $k - n_p$ nodes to include in the seed set. These heuristics, however, can always select $k$ seed nodes for any $k$. Another advantage is that we get the IM results for "free" after fitting the link prediction model, so it will be faster than any method that requires a second IM step, i.e., greedy or DynDeg. Lastly, the seed node selection depends neither on how far in the future we want to predict, nor on the infection parameter $\lambda$, which is generally unknown in practice.
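All three Sum heuristics share the same final step, which can be sketched as below. The function name `sum_heuristic` and the list-of-lists representation of $\hat{P}$ are assumptions of this example; how $\hat{P}$ is obtained (LogReg probabilities, NMF cosine similarities or GNN dot products) is left to the link prediction model.

```python
def sum_heuristic(P, k):
    """Return the k nodes with the largest theta_i = sum_j P[i][j].

    P: n x n list of lists of link probabilities or similarity scores.
       For a symmetric P, row sums and column sums coincide.
    Ties are broken in favor of the lower node index (Python's sort is stable).
    """
    theta = [sum(row) for row in P]
    ranked = sorted(range(len(P)), key=lambda i: theta[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    P_hat = [
        [0.0, 0.1, 0.2],
        [0.1, 0.0, 0.9],
        [0.2, 0.9, 0.0],
    ]
    print(sum_heuristic(P_hat, k=2))
```

No threshold, prediction horizon or infection parameter appears anywhere in the function, and it returns exactly $k$ nodes whenever $k \le n$, which mirrors the advantages listed above.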

Experiments
We perform experiments on one synthetic network and seven real-world networks from various domains in order to compare the different link prediction and IM algorithms.

Datasets
We begin by briefly describing each dataset used in these experiments. See Table 1 for some relevant summary statistics. Synthetic is a synthetically generated network. For the first time step, an Erdos-Renyi (Erdös and Renyi 1959) graph was generated with $p = 0.002$. Then for all subsequent time steps, 50% of the previous edges were kept, and an equal amount were newly generated, thus preserving the total number of edges for […] and communication on a social media platform in College. These networks were selected because they have a wide range of nodes and edges, comprise different network-generating mechanisms, and come from a variety of domains. We stress that we intentionally chose different network sizes ($n$, $m$), numbers of aggregation layers ($T$, $p$), seed sizes ($k$) and infection parameters ($\lambda$) in order to compare the methods across a wide range of settings. Note that $\lambda$ varies between networks in order to ensure a reasonable amount of influence spread, and to be able to discriminate between the different methods.

Methods
For each method, we use the given algorithm to find the optimal seed set on the predicted future graphs using different link prediction approaches.Then, using these seed sets, we compute the influence spread on the actual future evolution of the network, taking the average results over 1000 MC samples.We seek to find the methods with largest influence spread.
• Oracle: assumes the future evolution of the network is known when finding the seed nodes at the IM step, i.e., finds the nodes which maximize the spread on $G_{p+1}, \ldots, G_T$.
• Static (last): uses $G_p$ to find the optimal seed nodes and, therefore, implicitly assumes that the network does not continue to evolve.
• Static (mem): constructs $G_{\text{mem}}$, in which two nodes are linked if they share at least one link in $G_1, \ldots, G_p$. Mathematically, if $A^{(t)}$ is the adjacency matrix associated with $G_t$ and $A^{\text{mem}}$ is that of $G_{\text{mem}}$, then $A^{\text{mem}}_{ij} = 1$ if $A^{(t)}_{ij} = 1$ for any $t \in \{1, \ldots, p\}$ and 0 otherwise. IM algorithms are then run on $G_{\text{mem}}$.
• JC: a simple link prediction heuristic based on Jaccard coefficients (JCs) (Liben-Nowell and Kleinberg 2003). Consider $G_p$ and let nodes u and v be such that there is no edge between them, i.e., $A_{uv} = 0$. Then the JC of nodes u and v is
$$\text{JC}(u, v) = \frac{|N_u \cap N_v|}{|N_u \cup N_v|},$$
where $N_i$ is the set of neighbors of node i. We find the JC for all node pairs without a link at time p and predict an edge at time p + 1 for the node pairs corresponding to the largest 5% of JCs. In order to preserve the density of the network, we also randomly remove 5% of edges. Once we have obtained $\hat{G}_{p+1}$, this process can be repeated in order to predict the evolution of the network multiple steps into the future.
• LogReg: uses the logistic regression method with LASSO penalty. The optimal penalty parameter is chosen from a grid search based on the best validation AUC using 75% of the network time steps for training. The cutoff is chosen to preserve $\rho^*$, the average network sparsity.
• LogRegSum: top k nodes chosen as seed nodes based on the LogReg heuristic described in Sect. 2.3.
• NMF: non-negative matrix factorization method. The dimension of U, V is 0.05n, i.e., five percent of the number of nodes. The algorithm is run 25 times with random initialization and the results with the lowest loss function are kept. The cutoff is chosen to preserve $\rho^*$.
• NMFSum: top k nodes chosen as seed nodes based on the NMF heuristic described in Sect. 2.3.
• GNN: EvolveGCN method. Node features are constructed using d = 16 dimension node2vec embeddings. The model is trained over 200 epochs and the cutoff is chosen to preserve $\rho^*$.
• GNNSum: top k nodes chosen as seed nodes based on the GNN heuristic described in Sect. 2.3.

Note that the Dynamic Degree algorithm does not apply to the static networks, Static (last) and Static (mem), so we apply a simple algorithm based on degrees. Please see the appendix for details.
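The JC prediction step in the list above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function name `jaccard_predict` is hypothetical, and it assumes one particular reading of the 5% rule, namely that the number of added edges equals the number of removed edges (5% of the current edge count), so that the density is preserved exactly.

```python
import numpy as np

def jaccard_predict(A, frac=0.05, rng=None):
    """One-step Jaccard link prediction on a symmetric 0/1 adjacency matrix.

    Assumption (not from the paper): the number of edges added equals the
    number removed, round(frac * m), so the edge count stays constant.
    """
    rng = np.random.default_rng(rng)
    A = np.asarray(A, dtype=int)
    n = A.shape[0]
    iu, ju = np.triu_indices(n, k=1)            # every unordered node pair once
    common = (A @ A)[iu, ju]                    # |N_u ∩ N_v|
    deg = A.sum(axis=1)
    union = deg[iu] + deg[ju] - common          # |N_u ∪ N_v| for non-adjacent u, v
    jc = np.divide(common, union, out=np.zeros(len(iu)), where=union > 0)

    is_edge = A[iu, ju] == 1
    edges = np.flatnonzero(is_edge)
    non_edges = np.flatnonzero(~is_edge)
    n_change = min(int(round(frac * len(edges))), len(non_edges))

    # Non-edges with the largest Jaccard coefficients become predicted edges.
    order = np.argsort(-jc[non_edges], kind="stable")
    add = non_edges[order[:n_change]]
    # The same number of existing edges is removed uniformly at random.
    drop = rng.choice(edges, size=n_change, replace=False)

    A_next = A.copy()
    A_next[iu[add], ju[add]] = 1
    A_next[ju[add], iu[add]] = 1
    A_next[iu[drop], ju[drop]] = 0
    A_next[ju[drop], iu[drop]] = 0
    return A_next
```

Applying the function repeatedly to its own output gives a multi-step forecast, mirroring the repeated prediction described for JC.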

Synthetic
First, we generate a synthetic network using the process described in Sect. 3.1. This network has n = 500 nodes and m = 5174 unique edges. There are T = 20 time steps with the first p = 10 used for training and the last T − p = 10 used for prediction. We fix the infection probability at 0.05 and vary the size of the seed set. The results are in Fig. 2a, where we implemented the Dynamic Degree algorithm. LogReg yields the largest influence spread while NMFSum and GNNSum yield the smallest, but all methods perform approximately the same. The total influence spread increases roughly linearly with k for each method. When k = 50, the influence spread from LogReg is about 87% of that of Oracle. Note that the link prediction task for this network is extremely challenging, as the process of adding and removing nodes is random and the time horizon for prediction is long compared with the number of training time steps.

Reality
The first real-world data set is the Reality network from Eagle and Pentland (2006). This network has n = 64 nodes and m = 26,260 edges, where links were recorded every 5 s over an 8.63 h period. We aggregate the network into evenly spaced snapshots $G_1, \ldots, G_T$ where T = 24. We fix the infection probability at 0.10, p = 20 and T − p = 4, and vary the size of the seed set. The results are in Fig. 2b using the Greedy algorithm. Note that Static (last), JC and GNN have a * because each method has fewer than k active nodes for large k. Static (mem) and LogRegSum perform the best for almost all k and come within about 67% of the influence spread of Oracle. Both LogRegSum and GNNSum perform substantially better than LogReg and GNN, respectively, whereas NMFSum fares much worse than NMF. Static (last) and JC do not perform well in this scenario.
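The aggregation of timestamped contacts into evenly spaced snapshots can be sketched as follows. This is an illustrative assumption about the binning, not the authors' code: the function name `aggregate_snapshots` is hypothetical, contacts are given as (u, v, t) triples with integer node labels 0, ..., n − 1, and the sampling window is taken to run from the earliest to the latest timestamp.

```python
import numpy as np

def aggregate_snapshots(events, n, T):
    """Bin (u, v, t) contact events into T evenly spaced 0/1 snapshots.

    Returns a list of T symmetric n x n adjacency matrices. An edge is
    present in a snapshot if at least one contact falls in its time bin.
    """
    events = np.asarray(events, dtype=float)
    u = events[:, 0].astype(int)
    v = events[:, 1].astype(int)
    t = events[:, 2]
    t0, t1 = t.min(), t.max()
    if t1 > t0:
        # Bin index in 0..T-1; the latest event is placed in the final bin.
        idx = np.minimum(((t - t0) / (t1 - t0) * T).astype(int), T - 1)
    else:
        idx = np.zeros(len(t), dtype=int)
    snaps = np.zeros((T, n, n), dtype=int)
    snaps[idx, u, v] = 1
    snaps[idx, v, u] = 1
    return list(snaps)
```

Repeated contacts within a bin collapse to a single unweighted edge, which is one simple choice; a weighted variant would count contacts instead.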

Email
Next, we consider an Email network (email4) from Michalski et al. (2011). This network has n = 167 nodes and m = 82,927 edges collected over 271 days with a granularity of 1 s. We aggregate the network into roughly one-week snapshots $G_1, \ldots, G_T$ where T = 39. Then we consider the first p = 30 for training the link prediction algorithms.

High school
The High School 1 network comes from Mastrandrea et al. (2015). This network has n = 312 nodes and m = 2242 unique edges collected at 20 s intervals over 5 h. We aggregate the network into roughly 15-min snapshots $G_1, \ldots, G_T$ where T = 20. Then we

Office
The Office network comes from Génois et al. (2015) and consists of n = 92 nodes and m = 755 unique edges. There are no links in the temporal middle of this dataset, which is presumably the weekend when no workers interact at the office. Dropping these times, we aggregate the data into T = 7 snapshots, representing one work day and predict the

Hospital
With n = 75 nodes and m = 1139 unique links, the Hospital network (Vanhems et al. 2013) is the next that we consider. We aggregate the network at six-hour intervals, yielding T = 16 snapshots, and we use the first p = 12 for training the models and the final day for prediction. The results are in Fig. 4b using the Greedy algorithm and an infection probability of 0.10.

Copenhagen bluetooth
Sapiezynski et al. (2019) collected the Copenhagen Bluetooth network with n = 703 nodes and m = 21,318 unique edges. Data was collected over several weeks at 5 min intervals. We aggregate the network at T = 100 evenly spaced intervals and use the first p = 90 for training the model. We set the infection probability to 0.05, vary the seed size k and compute the optimal seed nodes using the Dynamic Degree algorithm. The results are in Fig. 5a. Static (mem) yields the largest influence spread and comes within 99% of the spread of Oracle. LogRegSum has almost the same influence as Static (mem), especially for large k. Part of the reason that Static (last) does not perform as well for large k is that it has fewer than 150 active nodes, similar to GNN and JC. NMF and NMFSum perform similarly and are comparable to GNNSum when k ≥ 100.

College
The College network (Panzarasa et al. 2009) has m = 13,838 unique edges among n = 1899 nodes. Links are recorded at 1 s intervals over 193 days. We aggregate the network into T = 50 snapshots over equally spaced intervals and hold out the last T − p = 10 snapshots for prediction. We fix the infection probability at 0.25 and vary k, using the Dynamic Degree algorithm to find the optimal seed nodes. The results are in Fig. 5b. For smaller k, Static (last) and JC yield the greatest influence. For larger k, however, Static (mem), LogReg and LogRegSum do the best and come within 96% of Oracle. Static (last) and JC are again hampered by having fewer than k active nodes for large k. NMF and NMFSum continue to yield the smallest influence spread.

Discussion
There are several interesting trends that emerge from these experiments. First, we observed that for every data set, save Reality, the best-performing method came within 87% of the influence spread of the Oracle method, and in one case (Copenhagen Bluetooth) was as high as 99%. This is a promising finding as it shows that meaningful seed nodes can be found for IM, even when the future evolution of the network is unobserved. Of the proposed methods, LogRegSum proved to be the best as it yielded the largest influence spread in many settings, followed by GNN/GNNSum, while NMF/NMFSum regularly yielded the smallest influence. Perhaps the most interesting finding, however, is that the simple heuristic based solely on historical edges, Static (mem), consistently yielded one of the best performances. These results held across networks with different sizes, temporal characteristics, prediction durations and IM algorithms. Because Static (mem) does not have a link prediction step, it is substantially faster than all of the proposed methods (including LogRegSum), so we suggest this method for use in practice.
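Since Static (mem) involves no model fitting, its construction is short enough to sketch directly from the definition of the memory graph (an edge wherever any observed snapshot has one). The function names below are hypothetical, and the top-k degree ranking is only a stand-in assumption for the degree-based seed selection, whose details are in the appendix and may differ.

```python
import numpy as np

def memory_graph(snapshots):
    """A_mem[i, j] = 1 if A^(t)[i, j] = 1 for any observed snapshot t."""
    return (np.sum(snapshots, axis=0) > 0).astype(int)

def top_degree_seeds(A, k):
    """Hypothetical stand-in: pick the k highest-degree nodes as seeds."""
    deg = np.asarray(A).sum(axis=1)
    return np.argsort(-deg, kind="stable")[:k]

# Two observed snapshots on three nodes: edge (0, 1), then edge (1, 2).
A1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
A2 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
A_mem = memory_graph([A1, A2])
seeds = top_degree_seeds(A_mem, k=1)
```

In this toy example node 1 touches both historical edges, so it is the only node of degree two in the memory graph and is selected as the seed.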
There are also some noteworthy connections between the temporal network statistics from Table 1 and the IM results. For example, in the Email 4 network, all methods may have performed well because this network has less temporal variation: fNT, fLT, FNT and FLT are all higher in this data set than in the other networks. For instance, FNT being large means that the majority of nodes are present at the beginning and end of the network's life cycle (and presumably in between as well). Another observation is that the GNN methods do comparatively worse in networks with a strong community structure (High School 1 and Copenhagen Bluetooth), as measured by degree assortativity. Interestingly, even though the College network has very few nodes and edges in the first and last 5% of the sampling time, which indicates significant temporal variation, several methods still perform well. Static (mem) also performs well on sparser networks (High School 1, Copenhagen and College). Having fewer edges in the network likely makes the link prediction task more difficult, so aggregating across time turns out to be the best way to illuminate a node's influence.
In the Appendix, we also report the true positive rate (TPR) and relative degree mean square error (MSE) for JC, LogReg, NMF and GNN on each network, which quantify the quality of the link predictions. These results highlight the difficulty of link prediction, as a TPR greater than 0.40 is never achieved, nor is an MSE below 0.60.
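As a rough sketch of how these two quality measures could be computed, take TPR to be the proportion of true future edges that are also predicted, averaged over time steps, and the relative degree MSE to be the root mean squared error between true and predicted aggregated degrees, divided by the true average degree. The function names are hypothetical, and two details are assumptions made here: "aggregated degree" is taken as the sum of a node's degrees over the future snapshots, and snapshots with no true edges are skipped in the TPR average.

```python
import numpy as np

def tpr(true_snaps, pred_snaps):
    """Average over time steps of (correctly predicted edges) / (true edges)."""
    rates = []
    for A, A_hat in zip(true_snaps, pred_snaps):
        A, A_hat = np.asarray(A), np.asarray(A_hat)
        n_true = A.sum()
        if n_true > 0:                      # assumption: skip empty snapshots
            rates.append((A * A_hat).sum() / n_true)
    return float(np.mean(rates)) if rates else float("nan")

def relative_degree_rmse(true_snaps, pred_snaps):
    """RMSE between aggregated degree sequences, scaled by the true mean degree."""
    d0 = np.sum(true_snaps, axis=0).sum(axis=1)   # true aggregated degrees
    d1 = np.sum(pred_snaps, axis=0).sum(axis=1)   # predicted aggregated degrees
    return float(np.sqrt(np.mean((d0 - d1) ** 2)) / d0.mean())
```

A perfect prediction gives a TPR of 1 and a relative degree error of 0; a method can have a low TPR yet a small degree error if it places edges on the right nodes but between the wrong pairs.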
The strong performance of LogRegSum and Static (mem) yields some key insights into the ex ante IM task on temporal networks. It first demonstrates that predicting the future evolution of the network is not strictly necessary to determine the most influential seed nodes. Instead, the majority of the information needed to select seed nodes can be extracted from the previous edge history, while the actual evolution of the network does little to change the importance of nodes. This unexpected result, somewhat in disagreement with results from the related, yet distinct, vaccination problem (Lee et al. 2012), indicates that while a network is evolving over time, the underlying importance of nodes, in an IM sense, may not change. Thus, elucidating the influential nodes from the observed network is more important than predicting the future evolution. Indeed, link predictions may yield noisy results and inherently lose information due to cutoffs and thresholds, so methods that only consider historical data, such as LogRegSum and Static (mem), may perform better by "averaging out" some of this noise. Additionally, in light of Static (mem) performing so well, it should be no surprise that LogRegSum also does well, since this method only predicts links for node pairs with at least one historical edge. Thus, even though GNN and NMF can predict links for any node pair, this may not be an advantageous feature. Lastly, we saw that if a link prediction method or heuristic yielded predictions with $n_p < k$ active nodes, then choosing the remaining $k - n_p$ seed nodes becomes a non-trivial task. This problem is avoided with methods such as LogRegSum and Static (mem), which always allow all n nodes to be considered for the seed set. We stress that these conclusions only apply to these datasets and the SI diffusion model.
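The information loss from cutoffs, and the resulting shortage of active nodes, can be illustrated with a small sketch. It thresholds a matrix of pairwise scores so that the predicted graph has a target density, then counts the nodes with at least one predicted edge. The function names are hypothetical, density is assumed here to mean edges divided by the number of node pairs, and the scores are made-up values for illustration only.

```python
import numpy as np

def threshold_to_density(scores, rho):
    """Keep the highest-scoring node pairs so the edge density is about rho.

    Assumption: density = (number of edges) / (number of unordered node pairs).
    """
    S = np.asarray(scores, dtype=float)
    n = S.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    n_edges = int(round(rho * len(iu)))
    keep = np.argsort(-S[iu, ju], kind="stable")[:n_edges]
    A_hat = np.zeros((n, n), dtype=int)
    A_hat[iu[keep], ju[keep]] = 1
    A_hat[ju[keep], iu[keep]] = 1
    return A_hat

def active_nodes(A_hat):
    """Nodes with at least one predicted edge."""
    return np.flatnonzero(np.asarray(A_hat).sum(axis=1) > 0)

# Illustrative scores on four nodes; only the top pair survives the cutoff.
S = np.full((4, 4), 0.1)
S[0, 1] = S[1, 0] = 0.9
S[0, 2] = S[2, 0] = 0.8
A_hat = threshold_to_density(S, rho=1 / 6)
n_p = len(active_nodes(A_hat))
```

With a seed budget of k = 3, the thresholded graph here has only two active nodes, so one seed would have to be chosen without any guidance from the prediction, even though the raw scores still rank node 2 above node 3.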

Conclusions
In this work, we addressed the important problem of ex ante influence maximization on temporally varying networks. We first predict the future evolution of the network and then use a standard temporal IM algorithm on the predicted network to find the optimal seed nodes. We also proposed IM heuristics that use the model fit of the link predictions to find seed nodes, omitting the actual link prediction step. Across many settings, we demonstrated influence spread using the proposed methods on par with the gold standard method of comparison, with LogRegSum performing the best. These results show that it is possible to construct satisfactory seed sets for the IM task, even when the future topology of the network is unknown. Additionally, we found that in many cases a simple heuristic based on historical edges yielded the best results, and in practice we suggest this method due to its performance, simplicity and computational advantage. Our surprising results indicate that the most influential nodes may not vary with time, even as the network topology does.
We emphasize that these results were shown under the SI diffusion model. It is possible that under a different model, e.g., IC or LT, Static (mem) may not perform as well. Indeed, since the SI model allows nodes to activate their neighbors at any time following infection (nodes never become inactive), it is possible that the timing of when links occur is less important than the absence or presence of a link at any time. For example, Gayraud et al. (2015) and Michalski et al. (2014) both highlight the importance of the timing of links and show that the performance of their methods suffers when all snapshots are simply aggregated, but this was for the IC and LT models. Even when using the SIR model, a close cousin of the SI model, Erkol et al. (2020) show that an approach based on aggregated temporal snapshots performed poorly. That being said, the results of Gayraud et al. and Erkol et al. were shown under the ex post assumption. Regardless, in practice it is paramount to understand the diffusion mechanism for the particular problem at hand, as the performance of these algorithms may vary.
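The SI mechanism discussed above can be sketched as a Monte Carlo simulation on a sequence of snapshots. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `si_spread` and the symbol `lam` for the per-contact infection probability are hypothetical, each infected node is assumed to infect each susceptible neighbor independently in every snapshot, and nodes infected during a snapshot only start transmitting in the next one.

```python
import numpy as np

def si_spread(snapshots, seeds, lam, n_mc=1000, rng=None):
    """Average number of infected nodes after SI diffusion over the snapshots."""
    rng = np.random.default_rng(rng)
    n = np.asarray(snapshots[0]).shape[0]
    total = 0
    for _ in range(n_mc):
        infected = np.zeros(n, dtype=bool)
        infected[list(seeds)] = True
        for A in snapshots:
            A = np.asarray(A)
            # Number of infected neighbors of each node in this snapshot.
            exposure = A[:, infected].sum(axis=1)
            # Probability that at least one infected neighbor transmits.
            p_inf = 1.0 - (1.0 - lam) ** exposure
            newly = (~infected) & (rng.random(n) < p_inf)
            infected |= newly               # SI: infected nodes never recover
        total += infected.sum()
    return total / n_mc

# Path 0-1-2 revealed over two snapshots: edge (0, 1) first, then edge (1, 2).
A1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
A2 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
spread = si_spread([A1, A2], seeds=[0], lam=1.0, n_mc=10, rng=0)
```

Because infected nodes stay infectious indefinitely, a seed can exploit any later contact of the nodes it has reached, which is the property the paragraph above appeals to when contrasting SI with IC, LT and SIR.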
There are several interesting avenues of future work. Using link prediction methods for downstream tasks has received relatively little attention and we have highlighted several challenges. We proposed a heuristic to side-step these challenges and it would be interesting to study other ex ante tasks and see if the same results hold. Additionally, any link prediction method inherently has uncertainty in its predicted edges. Another direction of future work is determining how to incorporate this uncertainty into the IM task. LogReg would be a sensible method to work with since it yields probabilities of edges, as compared to NMF or GNN, which only yield similarity scores.

TPR is the proportion of future edges that the method correctly predicted, averaged over all time steps. Here larger is better. MSE is the root mean squared error between the aggregated degrees of the future and predicted networks, divided by the future average degree. Mathematically, if $d_0$ and $d_1$ are the true and estimated aggregated degree sequences of $G_{p+1}, \ldots, G_T$, respectively, then

$$\text{MSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (d_{0,i} - d_{1,i})^2}}{\frac{1}{n}\sum_{i=1}^{n} d_{0,i}}.$$

Here smaller is better. The TPR is particularly low for each method for Synthetic, Reality and High School 1, which also corresponds to some of the weaker performances of the respective methods. Some of the best results occur for Email 4, which also corresponds to some of the largest influence spread results compared with Oracle. These results show how difficult link prediction is, as a TPR greater than 0.40 is never achieved, nor is an MSE below 0.60.

Table 2 True positive rate (TPR) and relative degree root mean squared error (MSE) for each method and each network. TPR is the proportion of future edges that the method correctly predicted, averaged over all time steps (larger is better). MSE is the root mean squared error between the aggregated degrees of the future and predicted networks, divided by the future average degree (smaller is better)

Fig. 1 Schematic of proposed setting

For the Email network, we compare the results on the remaining T − p = 9 graphs. We fix the infection probability at 0.05, vary the seed size k and use the Dynamic Degree algorithm. The results are in Fig. 3a. Several methods come within 98% of the performance of Oracle, including LogReg, LogRegSum and GNNSum. For k ≥ 15, Static (mem) yields the largest influence spread of the proposed methods, while Static (last) and JC do well for small k. NMF and NMFSum perform roughly the same and are the worst of all methods. LogRegSum performs about the same as LogReg, and GNNSum outperforms GNN.

Fig. 2 Influence maximization results on Synthetic and Reality networks

Fig. 3 Influence maximization results on Email4 and High School 1 networks

Fig. 4 Influence maximization results on Office and Hospital networks

Fig. 5 Influence maximization results on Copenhagen Bluetooth and College networks

Table 1
Summary statistics and temporal network measures for datasets under consideration. n: number of nodes; m: number of unique links; ρ: average edge density across time steps; T(p): number of time steps (number used for training); fNT: fraction of nodes present at half the sampling time; fLT: fraction of unique links present at half the sampling time; FNT: fraction of nodes present in the first and last 5% of the sampling time; FLT: fraction of unique links present in the first and last 5% of the sampling time; DA: degree assortativity

All other networks are real-world networks. Several record an edge if two people are within close proximity: Reality, High School 1, Hospital, Office and Copenhagen Bluetooth. The remaining networks come from online interactions: emails in Email4