A Systematic Framework of Modelling Epidemics on Temporal Networks

We present a modelling framework for the spreading of epidemics on temporal networks from which both the individual-based and pair-based models can be recovered. The proposed pair-based model that is systematically derived from this framework offers an improvement over existing pair-based models by moving away from edge-centric descriptions while keeping the description concise and relatively simple. For the contagion process, we consider the Susceptible-Infected-Recovered (SIR) model, which is realized on a network with time-varying edges. We show that the shift in perspective from individual-based to pair-based quantities enables exact modelling of Markovian epidemic processes on temporal tree networks. On arbitrary networks, the proposed pair-based model provides a substantial increase in accuracy at a low computational and conceptual cost compared to the individual-based model. From the pair-based model, we analytically find the condition necessary for an epidemic to occur, otherwise known as the epidemic threshold. Due to the fact that the SIR model has only one stable fixed point, which is the global non-infected state, we identify an epidemic by looking at the initial stability of the model.


Introduction
In recent years epidemiological modelling, along with many other fields, has seen renewed activity thanks to the emergence of network science. Approaching these models from the view of complex coupled systems has shed new light onto spreading processes where the early black-box ordinary differential equation (ODE) models from Kermack and McKendrick had its limitations. These ODE models assume homogeneous mixing of the entire population, which may be an appropriate approximation for small communities. However, when attempting to model the spread of arXiv:2009.11965v1 [physics.soc-ph] 24 Sep 2020 disease at a national or international level, they fail to capture how heterogeneities in both travel patterns and population distributions contribute to and affect the spread of disease. Epidemiological models on complex networks aim to solve this problem by moving away from averaged dynamics of populations and mean-field descriptions. Instead, the focus is on interactions between individuals or meta populations, where the spreading process is driven by contacts in the network.
There have been many improvements made in regards to network models, e.g., generalised multi-layer network structures or more specifically temporal networks that allow for the network structure to change with time. Temporal networks are a natural way of representing contacts and lead to an insightful interplay between the disease dynamics and the evolving network topology. With the ever growing availability of mobility and contact data it has become easier to provide accurate and high-resolution data to inform network models. The results can be extremely useful tools for public-health bodies and other stakeholders.
In previous works, a widely used epidemiological concept is the individual-based model. It assumes statistical independence in the state of each vertex. A major problem associated with such a model is that it suffers quite badly from an echo chamber effect due to the fact that there is no memory of past interactions due to statistical independence. There have been efforts to ameliorate this problem by introducing memory at the level of each vertex's direct neighbours. These models referred to as contact-based [1] or pair-based [2] and have been shown to significantly reduce the echo chamber effect, depending on the underlying network structure.
These two models differ in their initial approach. The contact-based model takes an edge-based perspective, which extends the message passing approach [3], and all dynamic equations are formulated in terms of edges. By contrast, the pair-based model keeps the vertex-based approach of the individual-based model and dynamic equations are in terms of vertices.
In this paper, we extend the pair-based model to a temporal setting giving a Temporal Pair-based Model (TPM). We show how it can be drastically reduced and simplified under a certain dynamical assumption. We deal specifically with susceptible-infected-recovered (SIR) models. Once the TPM is written in concise form, it is then possible to show that the contact-based model is equivalent to a linearised version of the TPM. We then establish the conditions for an epidemic to occur according to the TPM, also known as the epidemic threshold. We investigate how the TPM performs on a number of synthetic and empirical networks and investigate what kind of network topologies work best with the TPM.
In this paper, we extend the pair-based model to a temporal setting giving a Temporal Pair-based Model. We show how it can be drastically reduced and simplified under a certain dynamical assumption. We deal specifically with susceptibleinfected-recovered (SIR) models. Once the pair-based model is written in concise form, it is then possible to show that the contact-based model is equivalent to a linearised version of the pair-based model. We then establish the conditions for an epidemic to occur according to the pair-based model, also known as the epidemic threshold. We investigate how the pair-based model performs on a number of synthetic and empirical networks and investigate what kind of network topologies work best with the pair-based model.

SIR Network Model
Let us consider a temporal network G = {G 1 , . . . , G T } to be a series of network G t = (V, E t ), which all share the same vertex set V but differ in their edge set E t .
The adjacency matrix for the network at time t will be denoted by A [t] , and A implies an edge between vertices i and j at time t.

Reduced Master Equations
Let Ω be the set of compartments in the model, where in the SIR model Ω = {S, I, R}. Let X n = (X n 1 , X n 2 , . . . , X n N ) T be the vector whose i-th element refers to the state of the i-th vertex in the network at time step n, thus X n ∈ Ω N where |V | = N . The evolution of the disease is then described by the master equations [4], thus assuming the infection process is Markovian. P (X n+1 ) is the probability of the network being in the particular configuration of states given by X n+1 and P (X n+1 |X n ) is probability of the network moving from the configuration of states X n to X n+1 between their respective time steps. These equations describe the entire process on the network, however, in order to progress the system forward one step in time, the probabilities of all combinations of state vectors must be found, this usually is not feasible for network processes with potentially billions of vertices as for the SIR process the total combination of states is given by 3 N .
Instead, it is possible to describe the evolution of the disease using a system of Reduced Master Equations (RME) [5], which describes the evolution of subsystems within the network, such as individual vertices, removing the need to obtain every possible combination of states. An important note is that these RMEs are in fact not themselves true master equations as they are not necessarily linear due to the fact that the transition rates of the subsystems are non-linear combinations of the transitions rates of the original system. However, we shall continue to use the term RME introduced by the author of [5].
For notational convenience we use the following notation for the joint marginal probabilities, P (X n i1 , X n i2 , · · · , X n im ) = X n i1 X n i2 · · · X n im .
When we wish to specify a particular realisation of X n i , we denote it by S n i , I n i or R n i to imply X n i = S, X n i = I or X n i = R respectively. Employing this new notation we start with the RME which describes the evolution of individual vertices, For SIR dynamics, the evolution of each vertex in each compartment is given as the following, Where I n+1 i |S n i reads the probability vertex i is infected at time n+1 given it was susceptible at time n and similarly for R n+1 i |I n i . Note that R n i can be recovered using the conservation of the probabilities S n i + I n i + R n i = 1. In order to compute the transition rates we define the following quantities, the probability of infection on contact β, the rate of recovery µ. A [n] is the temporal adjacency matrix of the network on which the process is occurring. Following directly from [2], the transition rates of moving from S to I, and I to R are given by For the expression within the square brackets of Eq. (5a), the first term is the probability that vertex i is infected by some other vertex in the network, however, double counts events.
Each subsequent term accounts for double-counting and over-correcting in the previous.
These equations describe the probabilistic SIR process on temporal networks. However, the system of equations is not closed as it lacks a description for their joint probabilities. There are a number of ways which this problem can be tackled, usually by making a number of numerical or dynamical approximations [6][7][1] [8]. In the next sections we attempt to improve on and unify many existing approaches showing how they are derived from the system of RME's given by Eqs. (4) and (5).

Individual-based Model
One of the most commonly used epidemiological models on networks is the individual-based (IB) model [7]. This refers to the assumption of statistical independence of vertices or the mean field approximation, i.e, the factorisation X n i 1 X n i 2 . . . X n i M = X n i 1 X n i 2 . . . X n i M . By assuming independence of vertices, Eq. (5a) simplifies to which upon factorising can be concisely written as Upon substituting the transition rates I n+1 i |S n i and R n+1 i |I n i under the assumption of statistical independence the full IB model is written as This model closes the equation (5a) at the level of vertices, thus ignoring all correlations with other vertices at previous times. However by ignoring all past correlations causes the model to suffer quite badly from an echo chamber effect [9]. This echo chamber has the effect of vertices artificially amplifying each others probability of being infected I n i at each new time step as the marginal probability of each vertex is highly correlated with the rest of the network and the factorisation of Eq. (5a) means each vertex forgets its past interactions. This is nicely demonstrated in [9] where they demonstrate in the absence of a recovered compartment, a static network of two linked vertices for non-zero initial conditions the probabilities of being infected converge according to limn→∞ I n 0 = limn→∞ I n 1 = 1 for the IB model.

Pair-based Model
In contrast to the IB-model, instead of assuming independence of vertices we can approximate the marginal probabilities in terms of combinations of lower order marginals using some form of moment closure [2] [10]. Here we make an equivalent assumption to that of the message passing approaches [9] [3]. We assume the network contains no time-respecting non-backtracking cycles, in other words, starting at some initial vertex i that leaves via vertex j, there is no way to find a time-respecting path returning to this vertex that does not return via j. This is equivalent to a tree network when the temporal network is viewed in it's static embedding of the supra-adjacency representation []. This allows us to write all higher order moments in Eq. (5a) as a combination of pairs S n i I n k . To show why this is possible, consider the three vertices i, j, k connected by two edges through i. If conditional independence of these vertices is assumed given we have the state of i, then one can make the following assumption, This has the effect of assuming the network is tree-like in structure as it implies any interaction between vertices j and k must occur through vertex i, thus the process is exact on networks which contain no time-respecting non-backtracking cycles and otherwise provides an improved approximation of varying degree which depends on the true network structure. The result obtained in Eq. (9) is often referred to as the Kirkwood closure [11].
Under the assumption that the network is tree like, the following simplification is obtained for Eq. (5a), However, we run into the problem that we have no description for pairs of vertices, thus we derive expressions for their evolution from the RMEs for pairs of vertices which is given by, Note that the RME for S n i S n j is also required, which we find to be the following Since only the probabilities S n i I n j and S n i S n j are needed in order to describe the RMEs in Eq. (10), we consider those two combinations of states. From [2], we obtain the exact transition rates for pairs of vertices, though we find we can factorise the pair-wise transition rates similar to Eq. (5a). Here we give the expression for S n+1 i I n+1 j |S n i S n j only while the rest of the pair-wise transition rates are given in Appendix A.
In the above equation, the term in the first pair of square brackets corresponds to the probability that vertex i does not become infected and the term in the second pair of square brackets corresponds to the probability that vertex j becomes infected. Upon applying our moment closure technique Eq. (14) may be written as By introducing the following functions, the RMEs for pairs as well as the individual vertices can be written more concisely. The probability that vertex i does not become infected at time step n + 1, given that i is not infected at time step n is denoted by Similarly, the probability that vertex i does not become infected at time step n + 1, given that i is not infected at time step n while excluding any interaction with j, is given by Then, the evolution of the state of every vertex in the network is determined by the following closed set of equations, This approximation allows a large increase in accuracy compared to IB model while only adding two equations to the final model. All past dynamic correlations are now tracked by the model and so the echo chamber effect is eliminated, but only with direct neighbours (vertices which share an edge). A major benefit of this particular PB model over other existing iterations [1] [8] is that this model can be implemented as an element-wise sparse matrix multiplication rather than having to iterate through all edges for every time step making it extremely computationally efficient and fast on even large networks. It also benefits from a low conceptual cost by not deviating from a vertex-based perspective, like the contact-based models, which move to the perspective of edges and thus define the model in terms of the line-graphs and non-backtracking matrices [1].
Similar to [9], we can compare this PB model to the IB model using the two vertex example. We consider two vertices connected by an undirected static edge and give the two vertices some initial non-zero probability I 0 0 = I 0 1 = z of being infected. We then run the IB and PB models for some given parameters β and µ and compare it to the ground truth, which is the average of a number of Monte-Carlo realisations. Looking at Fig. 1. it becomes apparent how the IB model fails to capture the true SIR process on the network due to the previously discussed echo chamber induced by assuming statistical independence of vertices. It becomes clear that the PB model accurately describes the underlying SIR process for this simple example as each vertex is able to recover the dynamic correlations of past interactions with direct neighbours.

Equivalence Between The Contact-based and Pair-based Models
In the contact-based model [1], the central component is θ n ij , which is the probability that node j has not passed infection to node i up to time step n. From θ n ij , the author finds S n i may be computed as This equation is the basis for the contact-based model and allows us to easily compare with the pair-based model as it describes the same quantity as our Eq. (18a). The authors also assume that the evolution of θ n ij satisfies the following relation θ 0 ij = 1.
In the pair-based model, the evolution of the susceptible probability, given by Eq. (18a), can similarly be rewritten in terms of its initial conditions, From equating (21) and (19) it is clear that if the models are exactly equivalent then θij is defined by However, this contradicts the assumption made by Eq. (20). Thus the Pair-based and Contact-based models are only equivalent if the following linearisation is assumed: which then implies (22) can be written as are making the assumption that there are no recovered vertices in the network when the initial random infected seed is injected, thus giving the disease the best chance possible to cause an epidemic. By using the fact that I n j = S n i I n j + I n i I n j + R n i I n j we get S n i I n j = I n j − I n i I n j − R n i I n j (25a) and thus, ij + δij(1 − µ).
Following Ref. [7], we find that the condition required for an epidemic to occur is given by For the values of β and µ which the above is satisfied, implies that when a disease is introduced into the network the DFS is unstable for a period of time. What this means is that in the equivalent SIS model with the same parameters, the proportion of infected vertices never settles on the DFS.

Results
In this section we look at comparing the accuracy of the IB and PB models against the ground truth Monte-Carlo average. In short, we show how the PB model can offer a massive increase in accuracy and also discuss when it fails to accurately capture the true dynamics of the stochastic SIR process. Furthermore, we validate the analytical epidemic threshold.

Tree Network
The assumption in the PB model is conditional independence between vertices with a neighbour in common given the common neighbours state. This is equivalent to assuming the network contains no time-respecting non-backtracking cycles. To illustrate this reasoning, we consider a static tree network, that is, tree networks contain no cycles of length 3 or greater, made up of 100 vertices. All vertices start from some initial non-zero probability I 0 i = z of being infected. We then run the IB and PB models for some given parameters β and µ and compare it to the ground truth, which is the average of a number of Monte-Carlo realisations.  process for this simple example as each vertex is able to recover the dynamic correlations of past interactions with direct neighbours. As we will see from the next section, temporal networks that are well approximated by tree networks are also well approximated by the PB model.

Empirical Networks
In the following section, we have chosen 2 empirical temporal networks that all vary in both structure and temporal activity. For each of the the considered empirical networks we wish to test our our findings that the PB model offers an increase in accuracy over the IB model and the we observe a change in behaviour as the model parameters cross the epidemic threshold. We run the SIR IB and PB models for all our networks for different values of β and µ and then compare them to the average of a sufficiently large number of MC simulations. This allows us to quantify how well the different models approximate the dynamics of the true SIR process. At each time-step, the average prevalence of states within the network are collected and denoted as S n avg , I n avg and R n avg with the cumulative prevalence of infection being taken as I n avg + R n avg . Then, we validate our analytical finding for the epidemic threshold of the PB SIR process. For this purpose, we fix a value for µ and then for increasing values of β, perform a number of MC simulations for long times in order to get a distribution of the final out break size, which is given by limn→∞ I n avg + R n avg . In the long-term dynamics of the SIR process, limn→∞ I n avg + R n avg , will usually exceed the observation time of the network. Therefore, periodicity of the networks is assumed similarly to [7] when computing the final outbreak sizes.  Fig. 3 (a) indicates a scale-free behaviour often seen in empirical networks. The network appears to be quite sparse as is evident from Fig. 3  Therefore, it is reasonable to expect that the SIR process would be well approximated by the PB model for such a network. As shown in Fig. 5 the number of non-backtracking cycles as a fraction of the total number of paths in the network is very small peaking at

Conference Contacts
The second data set (cf. Tab. 1) is the Conference network described in Ref. [12]. It  Because of the small number of nodes in the network it is difficult to say much about the degree distribution but from Fig. 6 (a) there is a clear heavy tail with most vertices having a relatively small aggregated degree. In Fig. 6 (b) we see that the network activity in this case shows a number of peaks occurring then quickly dying out, these are explained by breaks between talks at the conference during which the participants converse and interact.
Because of the time scale and observation period of this particular temporal network it is not feasible to try and model the spread of disease as infection and recovery in unlikely to occur within the observation period (which is approximately 20 hours). However we could use our model to simulate the spread of viral information or "gossip" using the same dynamics as the SIR model. Infection is equivalent to receiving some information in such a way that it becomes interesting enough to for the individual to try and spread to those they contact in the future and recovery is equivalent to growing tired of the information and they no longer inform others they meet.   the PB model is that the IB model has a sustained higher first peak which uses up the pool of susceptible individuals leaving a smaller population available to catch the disease.
The reason we do not see a good agreement with the MC average for this particular data set is due to the underlying topology of the network which is a physical social interaction network where individuals congregate in groups and most or all in the group interact with one another, leading to large clusters that give rise to many non-backtracking cycles.
The more time-respecting non-backtracking cycles that occur, the worse the PB model will perform. It is for this reason that we see a relatively large deviation from the MC simulations for the PB model. This can be explained by Figure 8 (a) which in contrast to the cattle network shows that the number of NBT-cycles is relatively dense at many points in time, reaching as high as 12%.
In Fig. 8 (b) we see the distribution of final outbreak proportions against the critical β computed for the epidemic threshold of the PB model. Again, for values of β that are greater than the computed epidemic threshold, there is an obvious but gradual change in

Conclusions
In this paper, we forward work done on SIR pair-based models by extending it to a temporal setting and investigating the effect of non-backtracking cycles on the accuracy of the model on arbitrary network structures. We find that the existence of many such non-backtracking cycles leads to a deviation in the pair-based model from the true SIR process due to the echo chamber effect they induce. Thus the pair-based model is best suited to network structures which don't contain many cycles such as production networks for the reasons described earlier. We also find that our analytical finding for the epidemic threshold holds up when compared to numerical simulations by showing a qualitative change in the final outbreak proportion.