Multi-species temporal network of livestock movements for disease spread

The objective of this study is to show the importance of interspecies links and temporal network dynamics of a multi-species livestock movement network. Although both cattle and sheep networks have been previously studied, cattle-sheep multi-species networks have not generally been studied in-depth. The central question of this study is how the combination of cattle and sheep movements affects the potential for disease spread on the combined network. Our analysis considers static and temporal representations of networks based on recorded animal movements. We computed network-based node importance measures of two single-species networks, and compared the top-ranked premises with the ones in the multi-species network. We propose the use of a measure based on contact chains calculated in a network weighted with transmission probabilities to assess the importance of premises in an outbreak. To ground our investigation in infectious disease epidemiology, we compared this suggested measure with the results of disease simulation models with asymmetric probabilities of transmission between species. Our analysis of the temporal networks shows that the premises which are likely to drive the epidemic in this multi-species network differ from the ones in both the cattle and the sheep networks. Although sheep movements are highly seasonal, the estimated size of an epidemic is significantly larger in the multi-species network than in the cattle network, independently of the period of the year. Finally, we demonstrate that a measure based on contact chains allow us to identify around 30% of the key farms in a simulated epidemic, ignoring markets, whilst static network measures identify less than 10% of these farms. Our results ascertain the importance of combining species networks, as well as considering layers of temporal livestock movements in detail for the study of disease spread.


Introduction
Infectious diseases in livestock are of great concern as they pose an economic burden, compromise animal health and welfare, and threaten human health by contributing to the emergence of new zoonotic diseases. Mathematical models of infectious disease spread are useful tools to help us understand the drivers of an outbreak, and inform policy decisions. In a world where pandemics are becoming more likely (Morse 2001; as the ranking of importance of nodes. It is therefore crucial to explore the multi-species network in order to highlight and quantify potential consequences for disease risk. Livestock movement network analyses have been performed mostly on static networks, where most analytic results are available (Newman 2018). A static network assumes that the change in the set of contacts are negligible over the course of the epidemic (Enright and Kao 2018). In reality, livestock movements have an inherent temporal component, that are highly relevant to transmission, as they occur on a daily basis and constitute discrete events. As well as being intermittent, movements of livestock are not necessarily consistent over time (Bajardi et al. 2011). A number of studies have shown that dynamic network analyses of livestock movements outperform those from static network analyses, when the aim is an in-depth understanding of disease spreading processes (Lentz et al. 2016;Vidondo and Voelkl 2018;Rossi et al. 2017), or predictions of epidemic risks (Valdano et al. 2015). The study of the dynamics of the cattle-sheep network in Scotland is of interest, because as well as the general dynamics of livestock networks we consider two farming systems which have distinct seasonalities and varying trading behaviours, and the interaction between these systems.
The aim of this work is to understand how the sheep and cattle movement networks interact, and the implication for understanding disease spread. We first analyse the static movement networks, and compare the results of the single-and multi-species networks. Secondly we analyse and compare the cattle, sheep, and multi-species dynamic networks. Finally, we compared results of the static and dynamic network analyses with a disease simulation model explicitly incorporating the temporal dynamics of the network. The dynamic network analysis exhibited important differences between the singlespecies and the multi-species networks, providing evidence that the premises driving epidemics would not be the same in the single-species and the multi-species networks. These results would have important consequences for disease control. In addition, we showed that dynamic network measures outperform static network measures to identify the most important farms in the network.

Materials and methods
Cattle movement data were obtained from the Cattle Tracing System (CTS), Animal Plant and Health Agency (APHA), and sheep movement data were retrieved from ScotEID, the livestock traceability system for Scotland managed by the Scottish Agricultural Organization Society (SOAS) on behalf of the Scottish Government. We considered movements within Scotland only: between premises, which can be farms, markets or shows. Our interest is in the control of an outbreak after introduction, and therefore movements to or from outside of Scotland were ignored. Births, deaths, and movements to slaughterhouses were also ignored, because the length of the period considered in the study (i.e. four weeks) is short compared to the turnover in the population. Some characteristics of the data are summarised in Table 1.
Overall the sheep population is larger accounting for 6.83 million heads, while the cattle were 1.76 million. There were slightly more sheep farms than cattle farms; of these, 6,039 farms raised cattle and sheep on the same premises (i.e. 50% of the sheep farms, and 56% of the cattle farms). In addition to the farms, the data include 26 auction markets.
We constructed networks by considering each premises as a node, and animal movements between two premises as a directed link. If one movement of an animal between two premises occurred during the period considered, we assigned a permanent link between these two premises in the static network. The links were weighted depending on the number of animals moved and the probability of an animal being infected: where µ is the probability of an animal being infected, and n the number of animals moved. The probabilities depend on the type of movement and the species. We used the parameter values estimated by Kao et al. (2006) in the 2001 FMD epidemic in GB: µ 1 = 0.02 for a sheep movement between two farms; µ 2 = 1 for a cattle movement between two farms; µ 3 = 0.004 for a sheep movement from a market; µ 4 = 0.02 for a cattle movement from a market.
These weights are relevant for an infectious disease similar to FMD, where the infectiousness of sheep is lower than that of cattle (Geering 1967;Gibson and Donaldson 1986;Sørensen et al. 2000;Ferguson et al. 2001).
In the dynamic network each link was annotated with a time variable equal to the date of the animal movement (i.e. we assume these movements occur on a single day).

Static network analysis
We considered the static networks in successive 4-week periods. This allows us to highlight (i) short-term changes in the network structure, which would be relevant for the control of a fast-spreading disease, and (ii) temporal variation according to the season. Livestock movements are generally seasonal, depending on the species and type of production. In Scotland the cattle network typically shows two peaks; the largest is observed in spring, and the second largest in autumn (Robinson and Christley 2006), whereas the sheep network has one main period of high trading activity around September . These peaks can be seen in Fig. 1, which shows the number of cattle and sheep moved in each 4-week period of the year. The combined network is represented in Fig. 2, during the Spring and the Autumn peak, 5 th adn 10 th 4-week periods of the year respectively.

Table 1 General characteristics of the setting in figures
The number of nodes describe the total number of farms raising cattle or sheep reporting at least one animal movement during 2016. The number of movements (batches), the total number of animal moved per species, as well as the headcount of cattle and sheep in Scotland are shown. The number of distinct movements correspond to the number of unique pair of origin and destination in each network We examined the overall characteristics of each network by calculating the average path length, clustering coefficient, edge density, component structure (number of components and sizes of the giant strongly and weakly connected components (GSCC and GWCC respectively) and diameter (definitions in Table 2). These measures were calculated for the single-species networks and the multi-species network, for each 4-week period of the year 2016.   (Freeman 1978) Diameter Length of the shortest path between the two most distant nodes of the network (Wasserman and Faust 1994) Edge density Proportion of links between nodes that actually exists in the network, calculated as the number of links, divided by the possible number of links (Wasserman and Faust 1994) PageRank A variant of Eigenvector Centrality, primarily used for directed networks: measure of a node's importance while giving consideration to the importance of its neighbors in a directed network (Newman 2010) We then calculated node centrality measures for all premises of the network, using the geometric mean degree, betweenness and PageRank (definitions in Table 2). In our case, degree centrality corresponds to the number of trading partners a farmer has. Because our network is directed, we differentiate in-degree (denoted degree in ), i.e. number of premises a farmer buys animals from, and out-degree (denoted degree out ), i.e. number of premises a farmer sells animals to. The geometric mean of the degree degree in × degree out (denoted GM − Deg ), accounts for the risk of introducing the disease as well as spreading it further. Betweenness centrality is the frequency with which a premises is in the shortest path between pairs of premises in the network. Identifying high-betweenness premises is useful from a disease control point of view because these premises represent bridges, which can accelerate the epidemic by spreading diseases to previously unexposed communities of farms. PageRank centrality is based on an algorithm used by Google to rank web pages in their search engine (Page et al. 1999). Pag-eRank centrality can capture useful information relevant to diffusion processes, such as epidemics, in networks (Bucur and Holme 2019; Kandhway and Kuri 2017). Data manipulation and analysis have been conducted in R (R Core Team 2019); the 'igraph' package (Csardi and Nepusz 2006) was used for the network analysis.
We used these measures to rank the premises in each 4-week period for the singlespecies and multi-species networks respectively. The premises which showed the highest value (i.e. ranked first) was removed, and the measure was computed again. We focused on the top 100 premises in each network, and refer to these as the risky premises. These premises could be targeted for control strategies in the first stages of an epidemic.
We compared the set of risky premises from the multi-species network, with the set of risky premises in the cattle or sheep network by looking at the intersection. The size of the intersection in the set of risky premises between single-species and multi-species networks serves as a measure of how wrong one would be if considering only one species or the other, instead of the combination of both in the context of an outbreak where both species would be involved in the epidemiology.

Dynamic network analysis
Livestock movements for trade are occasional and not necessarily recurrent over time. Animal movements occur and are recorded on a daily basis, giving the network a temporal dimension. Thus, it is a system where network dynamics are both likely to be important and are well recorded. In the dynamic network, links are considered as an origin, a destination, and a date of occurrence. Two nodes are in contact if there exists a temporally logical path between them (see Fig. 3).
In order to assess the importance of premises in the dynamic network, we calculated temporal Outgoing Contact Chains (OCC) and Ingoing Contact Chains (ICC), which are derived from the reachability, as described by Holme (2005). Contact chains (CC) were used in the context of diseases in livestock systems by Dubé et al. (2008) under the name of infection chain. Here we used the method previously described by Konschake et al. (2013), where the OCC is defined as the number of premises that can be temporally reached from a primary infected node, considering an infectious period of k days. The ICC is the number of nodes from which a particular node can be temporally reached, accounting for the considered infectious period. We considered an infectious period of seven days, consistent with a fast-spreading FMD-like disease. In other words, the OCC of a premises corresponds to the largest possible epidemic size if the outbreak started in this premises; and the ICC of a premises is proportional to its probability of being infected if an epidemic starts somewhere in the network. We used a method based on a Breadth-First-Search algorithm to calculate the contact chains for limited periods of four weeks. Starting from a designated node, we traverse the network by exploring all the neighbor nodes at the present depth prior to moving An example of temporal paths. In this example there is a temporal path between node 1 and 3, if we consider an infectious period k, there is no path between 1 and 4. There is no temporal path between 1 and 5, since the movement from 2 to 5 occurs prior to the movement from 1 to 2; if t 3 − t 0 < 28 days, node 1 and 2 would be connected to 3, 4 and 5 in the static network on to the nodes at the next depth level. We chose to compute the measure for a period of four weeks because: (i) we are interested in the early stage of the epidemic before the outbreak is detected and a movement ban applied; (ii) this makes our results comparable with the results of the static network analysis which had been performed for the same periods.

Unweighted in-and out-going contact chains
We first considered unweighted links to avoid making assumptions about the characteristics of the disease. This corresponds to the worst case scenario where the probability of transmission is certain given a link between premises. We compared the sets of risky premises according to the geometric mean of their contact chains ( GM − CC ), defined as √ ICC × OCC , in the different networks, i.e. comparing the top hundred risky premises in the single-species network and the multi-species network. This measure has been proven useful to assess the infection potential for fast spreading disease (Rossi et al. 2017). We also looked at the changes in the set of risky premises according to geometric mean degree and geometric mean contact chain sizes for the same network, to understand the difference between considering a static or dynamic network.
In order to highlight potential shifts in estimated risk between the multi-species and the cattle systems, we looked at the difference in maximal epidemic size between these two systems, by quantifying the change in the OCC of cattle premises taking into account the movements of both species or cattle movements only (see schematic  Fig. 4). The maximal size of an epidemic is a critical parameter, often used in epidemiological studies to quantify the potential impact of an outbreak. Because we computed the OCC for a limited period of 28 days, the OCC is the potential size of the epidemic after 28 days of uncontrolled spread. We calculated for all cattle premises the factor by which their OCC was multiplied in the multi-species network; we called this factor the multiplication factor, defined as: where OCC M and OCC C are the OCC in the multi-species and cattle networks respectively.

Weighted in-and out-going contact chain in a multi-species network
Assuming that all movements are equally important-regardless of the species type, the number of animals, or the characteristics of the premises-neglects important and potentially useful information affecting the spread of a disease. We therefore also calculated weighted Outgoing Contact Chains ( OCC w ) where the weights are equal to Formula 1 and correspond to the probability of transmission given that the node is infected. We consider a network defined as a set of nodes V, and the set of edges E j t,w j −→i where i, j ∈ V , t is a time, w j is a weight. We denote the probability of being infected for a node i at time t, p I (i, t) , the complementary probability of not being infected p NI (i, t) . The probability of disease transmission for a movement from j to i at time t is consequently equal to p I (j, t − 1) × w j .
We adapted the algorithm used in the previous section, using a similar method to the one proposed by Enright and Kao (2016). In the initial conditions, all nodes are susceptible, except one root node u. At each discrete time step, we identify all edges E j t,w j −→i , where j has a non null probability of being infected. The probability of not being infected for the nodes i is updated by multiplying it by the probability for the edge j t,w j −→i to not transmit infection, which is 1 − p I (j, t − 1) × w j . We keep track of the probability of not having been infected so far, to consider cases of multiple potential infections. We present this algorithm as pseudo code in Algorithm 1.
(2) OCC M OCC C Likewise, we calculated the weighted ICC ( ICC w ) for all premises in the multi-species network and the different periods of the year. We ranked premises according to the geometric mean of weighted contact chains ( GM − CC w defined as √ ICC w × OCC w ), expecting this ranking to be relevant to the prioritisation of control strategies.

Disease simulations
To investigate agreement between the network analysis results and a more realistic situation, we stochastically simulated transmission of a fast-spreading disease in both cattle and sheep. The simulation is based on a Susceptible-Infected-Recovered (SIR) metapopulation model, compatible with an immunising infection. The time step is one day, to take into account the daily recorded animal movements, and disease transmission is frequency-dependent. We considered an infection with asymmetric transmission risk, where the rate of effective contacts β has the highest value between cattle, and the lowest from sheep to cattle (Table 3). The contact rate between sheep, and from cattle to sheep have intermediate values. The parameter values were chosen arbitrarily within the range of plausibility for a fast-spreading disease like FMD (Keeling 2005). Parameter values are given in Table 3.
For simplicity, we simulated epidemics starting only in premises having an OCC size greater than 100 premises, that is, premises that can potentially lead to an epidemic of 100 premises or more. The simulations were run for a limited period of four weeks, starting at the first day of each 4-week period of the year. We used the SimInf package (Widgren et al. 2019) in R to perform 100 simulations per seed, and recorded the size of the epidemic after four weeks, as well as the number of times a premises was involved in the outbreak over all simulations for each period.
We defined an indicator of the epidemic risk for each premises and each period as, ER = N E × N I , where N E is the average size of the epidemic at four weeks, and N I is the number of times the premises is infected during the epidemic and is proportional to the probability of getting infected.
To evaluate the performance of network measures in identifying the most important farms, we compared the 100 premises with the highest ER according to the simulations with the 100 most risky premises according to the different measures ( GM − Deg , betweenness, PageRank, GM − CC , and GM − CC w ). For this comparison we considered only farms in the ranking, because markets and shows are already known to be high risk and would be targeted first for control measures.

Results
The number of sheep movements was consistently higher than the number of cattle movements (Fig. 1). The highest volume of trading activity in the Scottish network occurs in late summer to early autumn, when the sheep movement volumes peak. Overall, most of the recorded movements went through markets, accounting for 75% of the trading operations for cattle, and 93% for sheep.

Static network analysis
The sheep network is more dense than the cattle and the multi-species networks, whereas the cattle network is generally more clustered (Table 4). As expected, average path length is longest in the cattle network, and shortest in the multi-species network. These shorter paths between pairs of premises enable faster spread of diseases. The sheep network is generally more connected, with typically only a single weakly connected component, while the cattle network is sparser, counting on average 187.1 components during any 4-week snapshot. The multi-species network is less fragmented than the cattle-movement network; this indicates that sheep movements connect different components of mixed or cattle premises in the multi-species network, which were disconnected in the cattle network.
The variation in sizes of the components follows the same pattern as the seasonality of the movements (Fig. 5), i.e. the sizes of the strongly and weakly connected components show two peaks in the cattle network during the 5 th and 11 th 4-week periods of the year, whereas the sizes of both components sharply increase around the 10 th period of the year in the sheep network. Moreover, the membership to components across periods is mostly consistent, with over 60% of the premises constituting the GWCC of the cattle networks being the same in periods 5 and 11, when cattle movements peak. In the multi-species network, over 70% of the premises in the GWCC are remaining the same between these periods. In addition, the size of the GSCC in the multi-species network is always larger than the sum of the GSCC in the cattle and the sheep networks. This highlights how interconnected the two farming systems are. This is important, because the size of the GSCC corresponds to a lower bound on the maximum number of nodes that a newly introduced infectious agent might reach. Figure 6 shows on the one hand that the three measures used are correlated, consistently identifying mostly the same premises as risky; but on the other hand that, for all periods of the year, the most risky premises in the multi-species network are more similar to the ones in the cattle network. The graph also shows, during the 9th and 10th periods, an increase in the number of identical risky premises between the sheep and the multi-species systems, whilst a decrease in this number is observed between the multispecies and cattle network. The majority (more than 96%, for all 4-week periods) of risky premises in the multi-species network were also considered most risky in the cattle network, for all three network measures considered. However, only around 20% of the risky premises in the multi-species network were also identified as most risky in the sheep network, in any 4-week period.

Dynamic network analysis
In the dynamic network analysis, the risky premises are the top 100 premises with the largest GM − CC . The set of identified risky premises in the multi-species network  are substantially different from the sets in the cattle or the sheep movement networks (Table 5). On average only 47.2% of the most risky premises in the multi-species network are most risky in the cattle networks as well, and 32.4% in the sheep networks.
Although for most of the year, the set of risky premises in the multi-species network is more similar to the one in the cattle network (Fig. 7), during the 9th and 10th periods of the year, the trend reverses with the risky premises in the multi-species network becoming more similar to the ones in the sheep network. In addition, over all periods on average 29% of the risky premises in the multi-species network are not identified as risky in any of the single-species networks. This suggests that some premises with the largest ICC and OCC in the multi-species network exhibit large contact chains only through combination of cattle and sheep movements. The proportion of risky premises of this kind can be as high as 72% during the 11th period.
These results differ considerably from those of the static analysis. We compared the sets of risky premises according to GM − Deg in the static networks, to the risky premises according to GM − CC in the dynamic networks (grey cells in Table 5). The percentage similarity between the risky premises according to these two measures is low, with on average only around 17% of the premises being the same.
Overall, 99.5% of the cattle premises considered have a larger OCC when including sheep movements, and therefore a multiplication factor greater than 1. Very few cattle premises see their OCC unchanged when sheep movements are considered, even during the period of low activity in the sheep network (Fig. 8). Sheep movements contribute to the construction of significantly larger OCC in most cases: over all periods of the year, half of the premises (54%) see their OCC multiplied by at least 2. As expected, large increases in OCC is more consistent during the period of high activity Almost all cattle premises see their OCC increased in the multi-species network (Fig. 9), but the range of OCC values as well as the range of increase are variable. The OCC values are similar in the 4th and 10th periods are similar for the cattle network, but differ considerably for the multi-species network, exhibiting sharper increases of the OCC during the 10th period. During the 10th period, on average 2513 additional premises can be reached in the multi-species network, which would not be reached through cattle movements only.

Disease simulations
The static network measures identified at most 11% of the top 100 farms involved in the simulation, and their performance is very poor in most time periods (Fig. 10).  . 7 Comparison of the single-species and multi-species dynamic network analysis results. Number of identical risky premises between networks after ranking premises according to geometric mean contact chains. In grey is shown the number of premises which are risky in the multi-species, the sheep, and the cattle networks, in blue and in green is shown the number of premises risky in the multi-species and the cattle network but not in the sheep network, and the multi-species and the sheep, but not in the cattle network respectively; the red area represents the premises which are risky in the multi-species network only Dynamic network measures offer a clear improvement. The static measures identified on average only 7% of the 100 farms potentially most important in the epidemic, whereas the dynamic measures identified on average over 30% of the main actors of an outbreak (after markets and shows).

Fig. 8
Multiplication factor distribution considering only cattle premises having an OCC of more than 10 premises Fig. 9 Multiplication factor of cattle premises in the multi-species network for three periods of the year. The graph shows the multiplication factor in log scale according to the OCC in the cattle network. Only premises with an OCC of at least ten premises are shown

Discussion
The first objective of this study was to explore the Scottish cattle-sheep multi-species network characteristics to determine if this network was substantially different than the single-species ones. The temporal multi-species network exhibited significant differences in its structure, compared to the temporal cattle network. We found that more than a half of the risky premises in the multi-species network, were not identified as risky in the cattle network. If the cattle network was used to identify risky premises in a context of a disease involving sheep and cattle, these premises would be missed. More importantly, a number of risky premises are identified as such in the multi-species network only: 72% in the 11th period of the year, indicating that their risk is derived from interaction between the two farming systems. These differences indicate that, not only are the risks associated with multi-species epidemics higher, the premises likely to be driving those risks are also different. These differences are not captured by a static network representation of the system, and underlines the importance of temporality in livestock movement networks.
In the temporal network, most OCC of cattle premises were significantly larger in the multi-species network than in the cattle network, for all periods of the year. By constructing longer contact chains, interaction between the farming systems increases risk of larger epidemics throughout the year, and not only during the period of intensive trading in the sheep farming system, as one may expect. Finally, when disease spread was simulated in the multi-species system, the temporal measures performed better at identifying the most important farms than the static network measures. The inability of static descriptors to reliably predict outbreak risk is expected (Vidondo and Voelkl 2018), when the pattern of contacts changes over timescales that are short compared to disease generation times. The measures based on contact chains take into account the more important aspects of the movement networks, such as temporal paths, which are relevant in the occurrence of an epidemic (Holme 2005;Lentz et al. 2016). Usefulness of a measure based on ingoing and outgoing contact chains for assessing disease risk was already confirmed by other studies (Frössling et al. 2012;Vidondo and Voelkl 2018). We used similar metrics, but took into account transmission probabilities. Our weighted contact chains performed better than the simpler contact chains. Here we used weightings in line with the 2001 FMD epidemic in the UK, which substantially improved the predictive power of the metric. This shows that even a mildly informed choice of transmission probability per link, without prior knowledge on the disease parameters, gives a better prediction of risk than an unweighted network.
Our work reinforces the importance of incorporating differences in transmission probability where they exist within a system (e.g. differences between host-species in a multi-species system (Dobson 2004)). Significant variation can lead to critically different disease dynamics (Lloyd-Smith et al. 2005) which must be captured for predictive modelling. In our weighted network, the weights incorporate variation due to both the volume and species traded, which allows our network metrics to explicitly include this information. Similar methods based on contact chains have included more explicit simulation rather than metric calculation: e.g. Knific et al. (2020) reports work which filters a weighted temporal network and then simulates scenarios with differing transmission probabilities, thus somewhat decoupling the disease model from the underlying network. Our metric differs in that it incorporates transmission probabilities directly into network and metric, and is thus most useful when consider a particular pathogen with known transmission probabilities that vary by category of edge.
Future work is possible to improve the applicability of this approach to FMD, to broaden its use to other diseases and systems, and to improve the computational performance of the method. While we demonstrated the importance of using a multi-species network to understand transmission of an FMD-like disease, additional work remains to be done. Cattle and sheep are not the only species vulnerable to FMD: in a future FMD epidemic, network layers including other species (e.g. pigs) could be included, as well as non-trade layers that could incorporate transmission risk due to shared equipment or human movements.
Because characteristics of the disease are included in our weighted temporal network construction, a number of adaptations would be needed to apply our work to other diseases. In particular, the time of aggregation has important consequences for the network's interaction with the pathogen (Bajardi et al. 2011). The time scale in our approach should therefore be adapted to correspond to both the infectious period of a disease and the time scale of network dynamics (Kao et al. 2007).
While we have implemented our methods with reasonably efficient code, our focus has been on assessing the usefulness of the approaches as opposed to producing code optimised for speed and memory requirement. Computational performance could likely be improved, and further work may be required to deploy these approaches on very large or very dense networks.
Weighted contact chains can be a powerful tool to inform decisions in the early stages of an epidemic because it only relies on animal movement data that are immediately available. As well as being easy and fast to compute, it is deterministic, which means the metric can be calculated in a single computational run. In addition, the method proposed showed that weighting the network with reasonable transmission probabilities helps to improve the prediction of risk, which could aid decision making in the early stages of an epidemic when disease parameters are still unknown.