Dynamic centrality measures for cattle trade networks

We study network centrality measures that take into account the specific structure of networks with time-stamped edges. In particular, we explore how such measures can be used to identify nodes most relevant for the spread of epidemics on directed, temporal contact networks. We present a percolation study on the French cattle trade network, proving that time-aware centrality measures such as the TempoRank significantly outperform measures defined on the static network. In order to make TempoRank amenable to large-scale networks, we show how it can be efficiently computed through direct simulation of time-respecting random walks.

complexity of temporal networks is reflected by the high computational cost of some of these methods, many of which are prohibitively expensive for networks exceeding thousands of nodes, or more than a hundred time-slices.
In this work, our main contribution is to study SIR-type outbreaks on the French cattle trade network, in which nodes are cattle holdings, and time-stamped edges correspond to the commercial movement of animals between these holdings (Dutta et al. 2018;Natale et al. 2011). We focus on movements in 2014 and 2015, which gives rise to networks with over 170,000 vertices and over 2,250,000 time-stamped edges, with a daily granularity. Besides classical static measures, we will focus on the TempoRank measure, as defined in Rocha and Masuda (2014). This is an extension of PageRank centrality, defined using time-respecting random walks on the network. We will recall some of its properties and present a deterministic (exact) implementation. We will also present an extension of the TempoRank which might be more efficient in the context of epidemic prevention, by identifying vertices that have high downstream transmission potential. However, the matrix-based methods on which this implementation is based fail in the context of large networks, because of the absence of sparsity. We show how direct simulation can instead be used to approximate TempoRank computation, which makes this measure useful even for very large temporal networks. We show that our stochastic method is able to compute TempoRank centrality measures on the large BDNI network, and, using a numerical percolation study, that they outperform static centrality measures in mitigating outbreak sizes.
The paper is structured as follows: in the "Centrality measures for temporal networks" section, we will recall some of the definitions of centrality measures used and show how some of them are related to the properties of random walks on the network. Then, in the "Random walk centralities: definition and properties" section, we will present two versions of the TempoRank measure in the same framework, as well as a stochastic approximation algorithm that enables their computation even for large network sizes or large numbers of time-slices. Finally, in the "Centrality measures in cattle trade networks" section we will describe how these measures are correlated with the spread of infectious disease, using the French cattle trade network as an example.

Centrality measures for temporal networks
In this section, we will describe the framework in which we will define and study centrality measures. Since most temporal network centralities are generalizations of static centralities, we will first review some classical notions of centrality, which will also be used later in the numerical percolation study on the French cattle trade network.

Static PageRank centrality
Static Networks. Let G = (V , E) be a directed graph with |V | = N vertices and let A = (a i,j ) be its adjacency matrix. The aim of centrality measures is to associate to each vertex a numerical quantity representing its importance in the graph according to some criterion. In this paper, we will mainly consider centrality measures based on random walks, such that the most central vertices are the ones visited most often by random walkers on the network.
PageRank. The best-known such measure is PageRank centrality, defined as following. Let P be the transition matrix of a simple random walk on V , defined by p i,j = a i,j / k� =i a i,k . To avoid the possibility of random walkers getting stuck in vertices with no outgoing edges, we add a teleportation probability d , which corresponds to a transition from a vertex i to any other vertex j , uniformly chosen in V . The transition matrix of the random walk with teleportation is then: For d ∈ (0, 1] , P is then an irreducible (row-)stochastic matrix, regardless of the connectivity structure of the original network G . Hence, according to the Perron-Frobenius theorem, the left eigenspace associated to the dominant eigenvalue 1 is of dimension 1, and contains a positive eigenvector v . The PageRank vector v PR is then the normalized eigenvector The latter equality can be interpreted in probabilistic terms. Indeed, v PR defines a probability distribution on V , and (2) is simply the stationarity of this distribution with respect to the simple random walk with teleportation on G.

Temporal network centralities
Temporal networks. We will consider temporal networks G = (G t , t ∈ [1, T ]) defined over a discrete timespan t ∈ [1, T ] . For each time t , we require G t to be a network G t = (V , E t ) on a fixed vertex-set V with cardinal N . For convenience, we will label the vertices of G and assume that V = [1, N ] . By default, the networks we consider are directed, since directionality of trade is of great importance in the main application we have in mind, namely cattle trade networks, in which nodes are holdings and edges represent cattle being moved from one holding to another.
Representing temporal networks by a sequence of discrete snapshots naturally leads to a matrix representation (A t , t ∈ [1, T ]) , where A t = (a t,i,j , i, j ∈ V ) is the adjacency matrix of the network G t . However, it is also possible to embed the snapshots G t inside a single concatenated network G conc , which has vertex-set [1, T + 1] × V . In this representation, each edge (u, v) ∈ E t is seen as a directed edge from vertex (t, u) to (t + 1, v) . This representation is inspired by the application to cattle trade movements, in which an edge (u, v) at time t represents the movement of an animal between two holdings, such that that animal is present in holding u at time t , then in holding v at time t + 1 . Of course, it would also be possible to use edges in E t to represent movements from time-slice t − 1 to time-slice t . Using our convention however, in matrix form, the concatenated network G conc has an adjacency matrix defined by blocks: Note that this matrix is of dimension TN × (T + 1)N . We will also sometimes consider the wrapped network G wrap . This network has vertex-set [1, T ] × V and edges from E T go from layer T to layer 1. Its matrix representation is then Optionally, each of these networks can be augmented to include self-loops linking each vertex (t, i) to its representation (t + 1, i) in the next time-slice (using the convention that T + 1 = 1 in the case of the wrapped network). This will be useful when considering lazy random walks, which can stay in a given vertex for some amount of time before moving using an outgoing edge. TempoRank centrality. There are at least two extensions of PageRank centrality to the temporal networks. Here, we will focus on TempoRank centrality (Rocha and Masuda 2014), but we note that a different approach was considered in Rozenshtein and Gionis (2016), also taking advantage of the close relationship between PageRank scores and the stationary distributions of random walks on the network. The temporal PageRank approach of Rozenshtein and Gionis (2016) defines a time-dependent score r(u, t) for each node u and each time-stamp t recorded in the temporal network by considering the number of time-respecting random walks reaching u before time t . Nodes that can be reached by many time-respecting walks will then have a high temporal PageRank score at time t . This approach is particularly useful in the case of streaming graphs, and the authors describe an efficient algorithm to update the temporal PageRank scores as more interactions are recorded, accurately reflecting rising and falling influence of a given node as time passes. However, such an approach is not well-suited to our context of outbreak mitigation on cattle trade networks, due to the periodic nature of such networks. Indeed, there is a strong annual and seasonal periodicity in the activity of certain holdings. The corresponding nodes will then have low temporal PageRank scores after long periods of inactivity, while being of critical importance in epidemic transmission during their activity period.
By contrast, the TempoRank score introduced in Rocha and Masuda (2014) summarizes all temporal information contained in a set of time-stamped edges into a single score by averaging over all observation times. TempoRank centrality considers the wrapped network G wrap associated with a given temporal network G . With the application to cattle trade networks in mind, we will assume that G , hence also G wrap are directed. Let q ≥ 0 be a laziness parameter and d ≥ 0 be a teleportation parameter. The transition matrix for the lazy random walk with teleportation is then defined by: with the out-degree odeg t (i) defined by: In other words, random walkers stay in place with probability q (note that the authors of Rocha and Masuda (2014) used a degree-dependent probability q deg t (i) of staying in place). If they decide to move, they teleport to a uniformly chosen vertex with probability d , or move along a uniformly chosen outgoing edge with probability 1 − d . Note that if the current vertex is isolated at time t ( odeg i (t) = 0 ), there is no such outgoing edge, and they stay in place (this occurs with probability (1 − q)(1 − d) ). Using the wrapped network representation of (4), we then get the following transition matrix on G wrap : By definition, B wrap is a stochastic matrix, and, by the Perron-Frobenius theorem, admits a (unique up to a multiplicative constant) left 1-eigenvector with nonnegative elements. Writing left 1-eigenvectors under the form � = (π 1 . . . π T ) ∈ R TN with π i ∈ R N , we see that they need to satisfy the equations: which is equivalent to In other words, the π t are the stationary distributions of the random walks (X mT +t ) as m → ∞ , where X has one-step transition matrix given by (5). Averaging over time, we define: We will also consider the following version of TempoRank that takes into account the existence of outgoing edges in a given time-slice. We will call it out-TempoRank centrality. It gives no weight to vertices with no outgoing edges, by contrast with regular Tem-poRank centrality.

Stochastic approximation
We will now see how direct stochastic simulations can be used to approximate randomwalk based centrality measures. This makes their computation effectively possible on very large networks, for which matrix-based methods are not feasible. Indeed, eigenvalue methods for large matrices are relying on sparsity assumptions that are not satisfied here: for d > 0 , the transition matrices B t are always dense. Even if d = 0 , while the individual transition matrices might be sparse, depending on the connectivity structure of the underlying network, the cyclical products used in the definition of the Tempo-Rank will lose sparsity as the number of time-slices increases. For q ∈ [0, 1) and d ∈ [0, 1] , let then (X n , n ≥ 0) be a Markov chain on G wrap with transition matrix B wrap (7). There is an obvious projection map π : We are interested in the asymptotic behavior of the visit times V n , as well as the exit times O n , defined respectively by: We can use these quantities to approximate both TempoRank and out-TempoRank centrality. Indeed, if d > 0 , then X is an irreducible Markov chain on V , and Hence, we can compute TempoRank and out-TempoRank centrality by simulating timerespecting random walks on G wrap and by counting the time spent in each vertex, resp. the number of movements exiting a given vertex through an outgoing edge (not through teleportation). Both the deterministic (matrix-based) and the stochastic algorithm are freely available 1 . The use of such methods to compute centralities when the size of the graph makes matrix-based methods unusable is classical (Avrachenkov et al. 2007;Broder et al. 2006;Bahmani et al. 2010) and has been extensively studied in the context of internet search engines.

Convergence of random walks
In order to quantify the effect of the laziness parameter q and the teleportation parameter d on the speed of convergence of the quantities described above to their deterministic limit, we used the well-known Primary School temporal network . (Gemmetto et al. 2014). This network is constituted by face-to-face close interactions between high school students and teachers, as measured in 20-second intervals. We aggregated all contacts occurring in a given hour into a single time-slice, which resulted in a temporal network of 242 nodes and 26,603 edges over 19 time-slices.
We computed the vector of exit times for a random walk with 5 × 10 7 steps, logging interim values at increments of 5 × 10 5 steps, for varying values of d and q . We also computed the exact value of the out-TempoRank vector using (11) and compared it to the stochastic approximation at each iteration using Total Absolute Percentage Error: Results (shown in Fig. 1) indicate that, for this network at least, the fastest convergence is obtained for small values of q (including 0), which of course reflects the fact that laziness slows down the random walks. Most importantly, there also exists an optimal value of d for which convergence is fastest. For the schools network, this value lies around d = 0.3 . Understanding how this optimal parameter choice is related to the geometry of the wrapped temporal network could be of great importance in tuning the approximation algorithm in settings for which deterministic computations are unfeasible.

Centrality measures in cattle trade networks
In this section, we will use the previously defined centrality measures to identify the nodes most responsible for the spread of disease in a large network of cattle movements in France. Other than the static and temporal PageRank centralities defined above, we also consider Katz and betweenness centrality (definitions can be found in Additional file 1: section 1). Steps Fig. 1 Number of steps required to have total absolute percentage error between estimated and actual out-TempoRank score below 1%, as a function of laziness (q) and teleportation (d) parameters

The French BDNI network
The French Base de Données Nationale d'Identification (BDNI) is a national database logging all movements of cattle between two holdings in France. For the purpose of this work, we extracted all movements occurring in a two-year period between 2014-01-01 and 2015-12-31. We used these movements to construct a temporal network, with holdings as vertices and cattle movements as edges, with a time-stamp corresponding to the day of the movement. We summarized some statistics of the 2014 and 2015 networks in Table 1 below. A more detailed statistical analysis of the BDNI networks between 2005 and 2009 can be found in Dutta et al. (2014). An important feature of this network and more generally of cattle trade networks is the presence of markets and assembly centres, through which a significant part of all trades occur. They are the obvious most central vertices in any cattle trade network (Vidondo and Voelkl 2018), but they may not be hotspots of disease transmission, in particular for pathogens that require prolonged close contact for infection, such as paratuberculosis or BVDV (Gates et al. 2014). Indeed, animals remain in these holdings for a single day (or up to a week in the case of assembly centres), which might not be long enough for significant transmission to occur. To account for such dynamics, we consider two versions of the BDNI network: the first (dubbed the complete network), includes all movements to and from markets and assembly centres, whereas the second (the reconstructed network) bypasses these vertices by reconstructing movements between farms that involve a market or assembly centre as intermediary. More precisely, if an animal moves from a farm F 1 to a market or assembly centre at a date t 1 , then subsequently moves to another farm F 2 at a later date t 2 , we will log a single movement from F 1 to F 2 at time t 1 in the reconstructed network (Fig. 2). As can be seen in Table 1, the most striking difference between the complete and reconstructed networks lies in their degree distribution. Assuming a power-law distribution, we computed its tail exponent, which is such that for the degree D of a uniformly chosen vertex in the network. While the complete networks exhibit so-called scale-free behavior ( 1 < α < 3 ), the reconstructed networks have α > 3 , indicating a network without large hubs. The same behavior can be seen with the higher average out-degree in the reconstructed networks, caused by the splitting of single farm-to-market or farm-to-assembly-centre movements into several farm-to-farm movements. As we will see below, the different connectivity structures lead to markedly different epidemic behavior and thus, to different hierarchies between centrality measures used to control the epidemics.

Centrality in cattle trade networks
There have been several studies of centrality in cattle trade networks. Most approaches have focused on (temporal) path-counting methods, such as the Disease-Flow centralities in Natale et al. (2009), closely related to the outgoing and ingoing contact chains described in Nöremark et al. (2011). Such methods have proved useful in identifying epidemiologically important nodes (Büttner et al. 2013; Vidondo and Voelkl 2018) but fail for large-scale networks due to combinatorial explosion in the number of nodes or the number of time-steps. Other approaches are simulation-based, aiming at identifying suitable control strategies by directly simulating outbreaks (using, for instance, generic SI-type transmission dynamics) on the temporal networks (Payen et al. 2019;Bajardi et al. 2012).
We computed static and temporal centrality measures for the 2014 and 2015 BDNI networks, both in their complete and reconstructed version. Due to the size of the network, computation of the TempoRank measures using deterministic methods was impossible. We used the stochastic approximations of (14)-(15), with 10 8 steps for the random walks. We compared the centralities at increments of 10 6 steps to the final state to check convergence (Fig. 3). We observe that the laziness parameter has no influence on the speed of convergence, whereas for fixed q , the fastest convergence was achieved for low values of d.
In order to assess correlations between measures, we computed Spearman (rank) coefficients (Fig. 4). This analysis shows the high level of similarity between Katz and PageRank centrality, as well as moderate levels of correlation between Katz centrality, PageRank centrality and betweenness, as well as out-degree centrality and out-Tempo-Rank. Broadly speaking, this suggests a rough classification of the six measures considered here into three categories: those measuring the outgoing centrality of a node (outgoing degree and out-TempoRank) and those measuring its ingoing centrality (Tem-poRank, Katz and PageRank centrality). Betweenness centrality is moderately correlated to measures in both groups, since it takes both incoming and outcoming paths into account. In addition to this analysis, we also compared the rankings of vertices according to the different centralities, using rank-biased overlap (RBO) (Webber et al. 2010). This measure of similarity between two rankings (permutations) σ and τ of [1, N ] is defined by where σ 1:k is the set of the k highest-ranked elements in σ and where p ∈ (0, 1) is a parameter quantifying how much weight should be placed on comparisons between the highest-ranking elements. In essence, the smaller p , the less importance given to the overlap σ 1:k ∩ τ 1:k with high k . Figure 5 shows the pairwise RBO scores for p = 0.997 , chosen so that 95% of weight is given to the top 1000 vertices in the network. There is a striking dissimilarity between the complete and the reconstructed network. On the complete network, most centrality measures have moderately high overlap, with the exception of the TempoRank measure, which has low overlap with all other centralities. The latter is probably due to it highly ranking vertices with no or few outgoing edges, on which random walkers spend a disproportionate amount of time, thereby increasing their TempoRank score. On the reconstructed network however, almost all pairs of centralities have very low overlap, the most similar being TempoRank and Katz centrality as well as out-TempoRank and PageRank centrality. This shows that the problem of identifying high-centrality nodes is much more difficult on a diffuse network such as the reconstructed cattle trade network, whereas on scale-free networks, most measures have similar rankings, perhaps with slight differences in the order.

Percolation analysis
Ultimately, we wish to understand how centrality of a node is correlated with the role this node plays in epidemic outbreaks and if it can be a good predictor of the spread of infection on the temporal network under study. This is a classical problem in network analysis (Lü et al. 2016) and is usually studied either by direct simulation (Vidondo and Voelkl 2018;Payen et al. 2019) or by computation of a proxy of epidemic spread, such as the size of the largest connected component (Mweu et al. 2013;Büttner et al. 2013) or the so-called epidemic threshold (Valdano et al. , 2018. In this section, we will use as a metric the mean final outbreak size of simulated SIR processes. We will use the tsir package for fast outbreak simulation (Holme 2020). This package simulates epidemic outbreaks on temporal networks in the following manner: 1. Choose a node uniformly at random in the network and set its status to "infected" at the start of the observation period (2015-01-01 in our case). All other nodes are set to "susceptible". 2. Each infected node remains in that state for an exponentially distributed time with parameter γ > 0 , after which it becomes "removed". 3. For each time-stamped edge (t, u, v) such that u is infected and v is susceptible at time t , v becomes infected with probability β ∈ (0, 1]. For each centrality measure, we ranked vertices in decreasing order of centrality, the measures being computed on the corresponding 2014 movement network. We then removed an increasing proportion of vertices in the 2015 network, ranging from 0.1% to 5% of all vertices. When removing a vertex, we removed all edges from and to that vertex. We then simulated 10,000 iterations of a temporal SIR process on the percolated 2015 network, with per-contact infection probability β = 0.5 and recovery rate γ = 1 per day. On the static network aggregating all edges with timestamps in 2015, this corresponds to an epidemic with basic reproduction number (Durrett 2007) given by: where k and k 2 are the first and second moment of the degree distribution. This would give R 0 = 28.75 and R 0 = 10 on the complete and reconstructed networks, respectively. However, when taking temporality into account, contagion dynamics are much less explosive since the number of contacts of a given node need only be considered over a short time period (Enright and Kao 2018). Exact computation of the epidemic threshold for general temporal networks is possible (Valdano et al. 2018) but can be computationally unfeasible for large networks. Interestingly, we found that final outbreak sizes had low variance (see Additional file 1: section 2 for more data), perhaps reflecting the fact that once a pathogen reaches a node with high connectivity and high activity, outbreaks tend to be quite similar. We calibrated the TempoRank algorithm by computing centrality using different values of laziness q and teleportation d and comparing the mean final outcome curves (Figs. 6 and 7). Complete results for all examined combinations of q and d can be found in Additional file 1: section 6. Interestingly, the laziness parameter seemed to have only a very limited impact on final outbreak sizes, with higher laziness leading to higher outbreak sizes. This effect is only appearing for high fractions of removed nodes; it seems that, whatever value of q was chosen for out-TempoRank computation, the highest-ranked nodes were identical. A similar effect can be seen when comparing disintegration curves for varying levels of teleportation, although the differences are much more important. The best outcomes were obtained when teleportation was set to the lowest levels ( d = 0.01 ). We then used TempoRank and out-TempoRank scores computed with the optimal values q = 0 and d = 0.01 for comparison with other centrality measures.
Results of the percolation experiment with all centrality measures taken into account are shown in Fig. 8. In the complete graph, the rapid decline in final outbreak sizes, even for small numbers of removed nodes, is a well-known feature of scale-free networks when nodes are targeted appropriately (not uniformly at random). For most centrality measures, the highest-ranked nodes tend to be nodes with high degree, and their removal leads to rapid disintegration of the connectivity structure of the network. Indeed, in cattle trade networks, markets and assembly centres act as trade hubs (Hoscheit et al. 2017;Natale et al. 2009;Bajardi et al. 2011;Salines et al. 2017)  and they are highly ranked by most centralities (see Additional file 1: section 3), which leads to similar disintegration profiles. Nevertheless, out-TempoRank centrality is the best-performing measure by this metric. By contrast, TempoRank centrality performs similarly at first than the static measures we included in our analysis. However, the disintegration curves associated to most static measures exhibit a plateau after a rapid initial decline, which is not the case for TempoRank centrality. Interestingly, comparison of the fraction of markets and assembly centres in the highest-ranked nodes (Additional file 1) shows that out-TempoRank manages to identify those nodes as influential spreaders. By contrast, there is a smaller fraction of markets and assembly centres in the highly ranked nodes according to TempoRank, yet it still manages to effectively prevent epidemic spread at around 3% of removed nodes. The results we obtained, in particular the hierarchy of centrality measures, were consistent across the parameter space for the epidemic process (Additional file 1).
In the reconstructed graph, most disintegration profiles are linear, with different slopes depending on the centrality measure used to select the removed nodes. This is again as expected, because large hubs have been removed and replaced with many farm-to-farm edges (as can be seen in the exponent of the degree distribution in Table 1). This makes the node removal problem more difficult, since there are potentially many redundant paths that can be taken by infections. We see that the temporal centrality measures significantly outperform the static measures on the reconstructed network, which shows that temporal information plays a major part in the spread of infection on cattle trade networks, once its strong connectivity properties have been muted. On this network, the regular TempoRank measure outperforms out-TempoRank, suggesting that the reweighting of nodes by the fraction of time that they had active outgoing edges is a poor predictor of epidemic spread on this network. The difference with the complete network is striking in that regard. We observed no qualitative difference in the disintegration curves for outbreaks simulated on the 2015 networks, whether the centralities were computed on the 2014 or the 2015 data (see Additional file 1: section 4). This is indicative of high year-to-year similarity of the global structure of the networks, even though at the local level, there can be significant variation in the choice of trading partners . Also, since contemporary data is not always available at the time of implementation of control measures, this shows that relying on past data can still produce high-quality outbreak prevention and mitigation.

Conclusion
In this work, we showed how random-walk-based centrality measures are well-suited for large temporal networks, since they can be efficiently computed using stochastic simulations. We demonstrate their suitability for the purpose of epidemic mitigation by targeted removal of nodes, and their improved performance when compared to measures computed on the static (time-aggregated) network.
However, there are some limitations inherent to the definition of TempoRank and out-TempoRank centralities. Indeed, they are defined as time-averaged quantities as can be seen from their computation through the stationary distributions of random walks on the time-wrapped network G wrap . This makes them robust to small local changes in the network, but fails to take into account heterogeneous behaviour, such as seasonality (Vidondo and Voelkl 2018) or changes in network dynamics which would make the recent past a better predictor of future outbreaks than older temporal patterns. It should be possible to account for such ruptures, provided they can be accurately detected (Ranshous et al. 2015;Donnat and Holmes 2018;Monnig and Meyer 2018) by appropriately reweighting time-slices in the definition of TempoRank centrality (10). Other, so-called online centrality measures update the scores as new interactions are added to the dataset (Béres et al. 2018;Rozenshtein and Gionis 2016) but fail to identify older, periodically active nodes which might have an important role to play in epidemic spread in specific timeframes.
An important next step will be the use of centrality measures for targeted control measures more subtle than the outright removal procedure studied here. For instance, targeted vaccination campaigns or diagnostic strategies are some of the tools available for the control of cattle disease, but they need to be weighted against their economic impact. Better focused strategies could yield similar epidemiological outcome at lower economic cost.
In a similar fashion, the effect of parameters q and d on final epidemic size should be explored further. For instance, on temporal networks with different geometries (such as sexual contact networks), it would be interesting to understand whether low final outbreak sizes are always associated with low teleportation parameters in the TempoRank scores or if this is a specificity of networks that are similar to cattle trade networks.
It will also be important to obtain quantitative results on the speed of convergence of the stochastic algorithm to the TempoRank and out-TempoRank centrality vectors. It is well-known (Levin and Peres 2017) that mixing times of Markov chains are strongly