Multi-species temporal network of livestock movements for disease spread

Ruget, Anne-Sophie; Rossi, Gianluigi; Pepler, P. Theo; Beaunée, Gaël; Banks, Christopher J.; Enright, Jessica; Kao, Rowland R.

doi:10.1007/s41109-021-00354-x

Research
Open access
Published: 18 February 2021

Applied Network Science volume 6, Article number: 15 (2021)

Abstract

Introduction

The objective of this study is to show the importance of interspecies links and temporal network dynamics of a multi-species livestock movement network. Although both cattle and sheep networks have been previously studied, cattle-sheep multi-species networks have not generally been studied in-depth. The central question of this study is how the combination of cattle and sheep movements affects the potential for disease spread on the combined network.

Materials and methods

Our analysis considers static and temporal representations of networks based on recorded animal movements. We computed network-based node importance measures of two single-species networks, and compared the top-ranked premises with the ones in the multi-species network. We propose the use of a measure based on contact chains calculated in a network weighted with transmission probabilities to assess the importance of premises in an outbreak. To ground our investigation in infectious disease epidemiology, we compared this suggested measure with the results of disease simulation models with asymmetric probabilities of transmission between species.

Results

Our analysis of the temporal networks shows that the premises which are likely to drive the epidemic in this multi-species network differ from the ones in both the cattle and the sheep networks. Although sheep movements are highly seasonal, the estimated size of an epidemic is significantly larger in the multi-species network than in the cattle network, independently of the period of the year. Finally, we demonstrate that a measure based on contact chains allows us to identify around 30% of the key farms in a simulated epidemic, ignoring markets, whilst static network measures identify fewer than 10% of these farms.

Conclusion

Our results confirm the importance of combining species networks, as well as considering layers of temporal livestock movements in detail for the study of disease spread.

Introduction

Infectious diseases in livestock are of great concern as they pose an economic burden, compromise animal health and welfare, and threaten human health by contributing to the emergence of new zoonotic diseases. Mathematical models of infectious disease spread are useful tools to help us understand the drivers of an outbreak and inform policy decisions. In a world where pandemics are becoming more likely (Morse 2001; Jones et al. 2008; Madhav et al. 2017), the usefulness of modelling techniques is well recognised (Colizza et al. 2007; Dye and Gay 2003; Lessler et al. 2014). Presently, models of infectious disease spread at a population scale are typically based on two phenomena: (i) the infection dynamics, which depend on the characteristics of the disease itself (transmission rate, infectious period, etc.), and (ii) the contact patterns allowing for disease transmission, which depend on the transmission routes of the disease.

Here our interest is in the transmission of infectious livestock diseases, where the movements of live animals between farms are known to be one of the main transmission routes (Fèvre et al. 2006). Better knowledge and understanding of contact patterns is a key element for building realistic and useful models. However, detailed models can be computationally costly, and require a substantial amount of data in order to be fitted properly.

When outbreaks occur, policy makers need rapid and robust information to define their strategy and support decisions at the early stages of the epidemic, when data are still limited. A better understanding of the structure of the livestock movement network and its characteristics is therefore useful, both to understand its role in the spread of endemic diseases such as bovine tuberculosis (Boehm et al. 2009; Palisson et al. 2016; Brooks-Pollock et al. 2014; Green et al. 2008) or bovine viral diarrhoea (BVD; Tinsley et al. 2012), and to inform policies to control a newly introduced disease at an early stage. The 2001 foot-and-mouth disease (FMD) epidemic provided considerable incentive to study and use livestock movements for network analysis (Ortiz-Pelaez et al. 2006; Christley et al. 2005; Kao et al. 2006; Robinson and Christley 2007; Robinson et al. 2007; Kiss et al. 2006; Vernon and Keeling 2009; Volkova et al. 2010). Analysis of contact networks has proven useful to help identify key actors in terms of disease spread.

Although most infectious diseases can affect several host species (Taylor et al. 2001), network analysis studies have generally focused on single species contact networks. Notable exceptions include Boehm et al. (2009), Nöremark et al. (2011), Kao et al. (2006), and Mohr et al. (2018). Practically, aggregation of movement data from different species is often difficult, because (i) data are recorded separately, often stored in different databases, and possibly managed by different administrative authorities; and (ii) the databases might have different formats or contain different levels of information, and therefore need to be homogenised before use.

The cattle and sheep farming systems are strongly linked in Scotland, because on approximately half of the cattle farms, sheep are also raised. This allows ample opportunity for transmission of diseases between the two species, with FMD and bluetongue virus (BTV) being notable examples of diseases affecting both species. As a consequence, mixed-species farms can link groups of farms that would not be in contact in the network if the species were considered separately. From the network point of view, this might also have consequences for metrics describing the general structure, as well as the ranking of importance of nodes. It is therefore crucial to explore the multi-species network in order to highlight and quantify potential consequences for disease risk.

Livestock movement network analyses have been performed mostly on static networks, for which most analytic results are available (Newman 2018). A static network assumes that changes in the set of contacts are negligible over the course of the epidemic (Enright and Kao 2018). In reality, livestock movements have an inherent temporal component that is highly relevant to transmission, as they occur on a daily basis and constitute discrete events. As well as being intermittent, movements of livestock are not necessarily consistent over time (Bajardi et al. 2011). A number of studies have shown that dynamic network analyses of livestock movements outperform static network analyses when the aim is an in-depth understanding of disease spreading processes (Lentz et al. 2016; Vidondo and Voelkl 2018; Rossi et al. 2017), or the prediction of epidemic risks (Valdano et al. 2015). The study of the dynamics of the cattle-sheep network in Scotland is of interest because, in addition to the general dynamics of livestock networks, we consider two farming systems with distinct seasonalities and varying trading behaviours, and the interaction between these systems.

The aim of this work is to understand how the sheep and cattle movement networks interact, and the implications for understanding disease spread. First, we analyse the static movement networks and compare the results of the single- and multi-species networks. Second, we analyse and compare the cattle, sheep, and multi-species dynamic networks. Finally, we compare the results of the static and dynamic network analyses with a disease simulation model explicitly incorporating the temporal dynamics of the network. The dynamic network analysis exhibits important differences between the single-species and the multi-species networks, providing evidence that the premises driving epidemics would not be the same in the single-species and the multi-species networks. These results would have important consequences for disease control. In addition, we show that dynamic network measures outperform static network measures in identifying the most important farms in the network.

Materials and methods

Cattle movement data were obtained from the Cattle Tracing System (CTS), Animal and Plant Health Agency (APHA), and sheep movement data were retrieved from ScotEID, the livestock traceability system for Scotland managed by the Scottish Agricultural Organisation Society (SAOS) on behalf of the Scottish Government. We considered only movements within Scotland between premises, which can be farms, markets or shows. Our interest is in the control of an outbreak after introduction, and therefore movements to or from outside of Scotland were ignored. Births, deaths, and movements to slaughterhouses were also ignored, because the length of the period considered in the study (i.e. four weeks) is short compared to the turnover in the population. Some characteristics of the data are summarised in Table 1.

Table 1 General characteristics of the setting in figures

Overall, the sheep population is larger, at 6.83 million head, compared with 1.76 million cattle. There were slightly more sheep farms than cattle farms; of these, 6,039 farms raised cattle and sheep on the same premises (i.e. 50% of the sheep farms, and 56% of the cattle farms). In addition to the farms, the data include 26 auction markets.

We constructed networks by considering each premises as a node, and animal movements between two premises as a directed link. If one movement of an animal between two premises occurred during the period considered, we assigned a permanent link between these two premises in the static network. The links were weighted depending on the number of animals moved and the probability of an animal being infected:

$$\begin{aligned} 1-(1-\mu )^n \end{aligned}$$

(1)

where $\mu$ is the probability of an animal being infected, and $n$ is the number of animals moved. The probabilities depend on the type of movement and the species. We used the parameter values estimated by Kao et al. (2006) for the 2001 FMD epidemic in Great Britain:

$\mu _1=0.02$ for a sheep movement between two farms;
$\mu _2=1$ for a cattle movement between two farms;
$\mu _3=0.004$ for a sheep movement from a market;
$\mu _4=0.02$ for a cattle movement from a market.

These weights are relevant for an infectious disease similar to FMD, where the infectiousness of sheep is lower than that of cattle (Geering 1967; Gibson and Donaldson 1986; Sørensen et al. 2000; Ferguson et al. 2001).
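As an illustration of Formula 1, the following minimal Python sketch computes the weight of a link from the species, the movement type and the batch size. The function name `link_weight` and the dictionary keys are hypothetical labels introduced here; only the four $\mu$ values and the formula come from the text above. The authors performed their analysis in R, so this is not their code.

```python
# Per-animal infection probabilities (mu) quoted in the text from Kao et al. (2006).
# The keys ("farm_to_farm", "from_market") are hypothetical labels for this sketch.
MU = {
    ("sheep", "farm_to_farm"): 0.02,
    ("cattle", "farm_to_farm"): 1.0,
    ("sheep", "from_market"): 0.004,
    ("cattle", "from_market"): 0.02,
}


def link_weight(species: str, movement_type: str, n_animals: int) -> float:
    """Probability that at least one of n moved animals is infected: 1 - (1 - mu)^n."""
    mu = MU[(species, movement_type)]
    return 1.0 - (1.0 - mu) ** n_animals


if __name__ == "__main__":
    # A batch of 50 sheep moved between two farms.
    print(link_weight("sheep", "farm_to_farm", 50))
    # A single cow moved between two farms always transmits under these values.
    print(link_weight("cattle", "farm_to_farm", 1))
```

With these values the weight saturates quickly with batch size: a farm-to-farm batch of 50 sheep already has a weight above 0.6, even though each individual sheep has a probability of only 0.02.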

In the dynamic network each link was annotated with a time variable equal to the date of the animal movement (i.e. we assume these movements occur on a single day).

Static network analysis

We considered the static networks in successive 4-week periods. This allows us to highlight (i) short-term changes in the network structure, which would be relevant for the control of a fast-spreading disease, and (ii) temporal variation according to the season. Livestock movements are generally seasonal, depending on the species and type of production. In Scotland the cattle network typically shows two peaks; the largest is observed in spring, and the second largest in autumn (Robinson and Christley 2006), whereas the sheep network has one main period of high trading activity around September (Kiss et al. 2006). These peaks can be seen in Fig. 1, which shows the number of cattle and sheep moved in each 4-week period of the year. The combined network is represented in Fig. 2 for the spring and autumn peaks (the 5th and 10th 4-week periods of the year, respectively).

We examined the overall characteristics of each network by calculating the average path length, clustering coefficient, edge density, component structure (number of components and sizes of the giant strongly and weakly connected components, GSCC and GWCC respectively), and diameter (definitions in Table 2). These measures were calculated for the single-species networks and the multi-species network, for each 4-week period of the year 2016.

Table 2 Network analysis terminology

We then calculated node centrality measures for all premises of the network, using the geometric mean degree, betweenness and PageRank (definitions in Table 2). In our case, degree centrality corresponds to the number of trading partners a farmer has. Because our network is directed, we differentiate in-degree (denoted $degree_{in}$), i.e. number of premises a farmer buys animals from, and out-degree (denoted $degree_{out}$), i.e. number of premises a farmer sells animals to. The geometric mean of the degree $\sqrt{degree_{in} \times degree_{out}}$ (denoted $GM-Deg$) accounts for the risk of introducing the disease as well as spreading it further. Betweenness centrality is the frequency with which a premises is in the shortest path between pairs of premises in the network. Identifying high-betweenness premises is useful from a disease control point of view because these premises represent bridges, which can accelerate the epidemic by spreading diseases to previously unexposed communities of farms. PageRank centrality is based on an algorithm used by Google to rank web pages in their search engine (Page et al. 1999). PageRank centrality can capture useful information relevant to diffusion processes, such as epidemics, in networks (Bucur and Holme 2019; Kandhway and Kuri 2017). Data manipulation and analysis were conducted in R (R Core Team 2019); the ‘igraph’ package (Csardi and Nepusz 2006) was used for the network analysis.

We used these measures to rank the premises in each 4-week period for the single-species and multi-species networks respectively. The premises which showed the highest value (i.e. ranked first) was removed, and the measure was computed again. We focused on the top 100 premises in each network, and refer to these as the risky premises. These premises could be targeted for control strategies in the first stages of an epidemic.
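The ranking procedure can be sketched as follows for the geometric mean degree. This is a minimal Python illustration of one reading of the text (take the top premises, remove it with all its links, recompute, repeat), not the authors' R/igraph code; the function names `gm_degree` and `rank_by_iterative_removal` and the alphabetical tie-breaking rule are assumptions introduced here.

```python
from math import sqrt


def gm_degree(edges):
    """Return {premises: sqrt(in_degree * out_degree)} for directed (origin, destination) links."""
    partners_in, partners_out = {}, {}
    for src, dst in set(edges):  # static network: repeated movements collapse to one link
        partners_out.setdefault(src, set()).add(dst)
        partners_in.setdefault(dst, set()).add(src)
    nodes = set(partners_in) | set(partners_out)
    return {
        v: sqrt(len(partners_in.get(v, ())) * len(partners_out.get(v, ())))
        for v in nodes
    }


def rank_by_iterative_removal(edges, top_n):
    """Repeatedly pick the top-scoring premises, remove it, and recompute the measure."""
    edges = set(edges)
    ranking = []
    while edges and len(ranking) < top_n:
        scores = gm_degree(edges)
        # Assumption: ties are broken alphabetically so that the result is deterministic.
        best = max(sorted(scores), key=scores.get)
        ranking.append(best)
        edges = {(s, d) for s, d in edges if best not in (s, d)}
    return ranking


if __name__ == "__main__":
    toy = [("A", "M"), ("B", "M"), ("M", "C"), ("M", "D"), ("C", "E"), ("E", "C")]
    print(gm_degree(toy)["M"])                # M buys from 2 and sells to 2 premises
    print(rank_by_iterative_removal(toy, 3))  # stops early once no links remain
```

In the toy network, premises that only buy or only sell have a geometric mean degree of zero, which is why this measure favours premises that can both receive and pass on infection.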

We compared the set of risky premises from the multi-species network, with the set of risky premises in the cattle or sheep network by looking at the intersection. The size of the intersection in the set of risky premises between single-species and multi-species networks serves as a measure of how wrong one would be if considering only one species or the other, instead of the combination of both in the context of an outbreak where both species would be involved in the epidemiology.

Dynamic network analysis

Livestock movements for trade are occasional and not necessarily recurrent over time. Animal movements occur and are recorded on a daily basis, giving the network a temporal dimension. Thus, it is a system where network dynamics are both likely to be important and are well recorded. In the dynamic network, links are considered as an origin, a destination, and a date of occurrence. Two nodes are in contact if there exists a temporally logical path between them (see Fig. 3).

In order to assess the importance of premises in the dynamic network, we calculated temporal Outgoing Contact Chains (OCC) and Ingoing Contact Chains (ICC), which are derived from the reachability, as described by Holme (2005). Contact chains (CC) were used in the context of diseases in livestock systems by Dubé et al. (2008) under the name of infection chain. Here we used the method previously described by Konschake et al. (2013), where the OCC is defined as the number of premises that can be temporally reached from a primary infected node, considering an infectious period of k days. The ICC is the number of nodes from which a particular node can be temporally reached, accounting for the considered infectious period. We considered an infectious period of seven days, consistent with a fast-spreading FMD-like disease. In other words, the OCC of a premises corresponds to the largest possible epidemic size if the outbreak started in this premises; and the ICC of a premises is proportional to its probability of being infected if an epidemic starts somewhere in the network. We used a method based on a Breadth-First-Search algorithm to calculate the contact chains for limited periods of four weeks. Starting from a designated node, we traverse the network by exploring all the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level. We chose to compute the measure for a period of four weeks because: (i) we are interested in the early stage of the epidemic before the outbreak is detected and a movement ban applied; (ii) this makes our results comparable with the results of the static network analysis which had been performed for the same periods.
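The contact chain definitions above can be sketched in Python as follows. Note the swapped-in technique: instead of the Breadth-First-Search used by the authors, this sketch sweeps the movements in chronological order, which yields the same kind of time-respecting reachability for small examples. The function names and the exact timing convention (a premises infected on day $d$ can transmit through movements on days $d+1$ to $d+k$, and the seed is infectious from the first day of the period) are assumptions made for illustration.

```python
from collections import defaultdict


def outgoing_chain(movements, root, start_day, k):
    """Set of premises reachable from `root` through time-respecting paths.

    movements: iterable of (origin, destination, day) with integer days.
    Assumptions of this sketch: the root is infectious from `start_day`, and a
    premises infected on day d transmits through movements on days d+1 .. d+k.
    """
    by_day = defaultdict(list)
    for src, dst, day in movements:
        if day >= start_day:
            by_day[day].append((src, dst))

    last_infected = {root: start_day - 1}  # most recent day each premises was (re)infected
    for day in sorted(by_day):
        # Evaluate every movement of the day against yesterday's state, then update in one batch.
        newly = [
            dst
            for src, dst in by_day[day]
            if src in last_infected and 1 <= day - last_infected[src] <= k
        ]
        for dst in newly:
            last_infected[dst] = day
    return set(last_infected) - {root}


def contact_chains(movements, start_day, k):
    """Return {premises: (ICC, OCC)} by running the sweep from every premises."""
    movements = list(movements)
    nodes = {n for src, dst, _ in movements for n in (src, dst)}
    reach = {v: outgoing_chain(movements, v, start_day, k) for v in nodes}
    return {v: (sum(v in reach[u] for u in nodes), len(reach[v])) for v in nodes}


if __name__ == "__main__":
    moves = [("A", "B", 1), ("B", "E", 1), ("B", "C", 3), ("C", "D", 12)]
    print(outgoing_chain(moves, "A", start_day=1, k=7))
    print(contact_chains(moves, start_day=1, k=7))
```

In this toy example the movement from C to D on day 12 falls outside the seven-day infectious period that follows the infection of C on day 3, so D is not in the outgoing chain of A. Running the sweep from every premises is quadratic in the worst case and is only intended for small illustrative networks.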

Unweighted in- and out-going contact chains

We first considered unweighted links to avoid making assumptions about the characteristics of the disease. This corresponds to the worst case scenario where the probability of transmission is certain given a link between premises. We compared the sets of risky premises according to the geometric mean of their contact chains ($GM-CC$), defined as $\sqrt{ICC \times OCC}$, in the different networks, i.e. comparing the top hundred risky premises in the single-species network and the multi-species network. This measure has proved useful to assess the infection potential for fast-spreading diseases (Rossi et al. 2017). We also looked at the changes in the set of risky premises according to geometric mean degree and geometric mean contact chain sizes for the same network, to understand the difference between considering a static or dynamic network.

In order to highlight potential shifts in estimated risk between the multi-species and the cattle systems, we looked at the difference in maximal epidemic size between these two systems, by quantifying the change in the OCC of cattle premises taking into account the movements of both species or cattle movements only (see schematic representation in Fig. 4). The maximal size of an epidemic is a critical parameter, often used in epidemiological studies to quantify the potential impact of an outbreak. Because we computed the OCC for a limited period of 28 days, the OCC is the potential size of the epidemic after 28 days of uncontrolled spread. We calculated for all cattle premises the factor by which their OCC was multiplied in the multi-species network; we called this factor the multiplication factor, defined as:

$$\begin{aligned} \frac{OCC_M}{OCC_C} \end{aligned}$$

(2)

where $OCC_M$ and $OCC_C$ are the OCC in the multi-species and cattle networks respectively.
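A short worked example of Formula 2 in Python is given below. The function name `multiplication_factor` is hypothetical, and returning `None` when $OCC_C$ is zero is an assumption of this sketch: the text does not state how premises with an empty cattle-only contact chain were handled.

```python
def multiplication_factor(occ_multi, occ_cattle):
    """OCC_M / OCC_C for each cattle premises.

    occ_multi, occ_cattle: {premises: OCC} in the multi-species and cattle networks.
    Assumption: the ratio is reported as None when OCC_C is zero (undefined ratio).
    """
    return {
        premises: (occ_multi[premises] / occ_c if occ_c > 0 else None)
        for premises, occ_c in occ_cattle.items()
    }


if __name__ == "__main__":
    occ_cattle = {"F1": 10, "F2": 4, "F3": 0}
    occ_multi = {"F1": 30, "F2": 4, "F3": 2}
    # F1 reaches three times as many premises once sheep movements are included;
    # F2 is unchanged; F3 has no cattle-only chain, so its ratio is undefined.
    print(multiplication_factor(occ_multi, occ_cattle))
```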

Weighted in- and out-going contact chain in a multi-species network

Assuming that all movements are equally important, regardless of the species, the number of animals, or the characteristics of the premises, neglects potentially useful information affecting the spread of a disease. We therefore also calculated weighted Outgoing Contact Chains ($OCC_w$), where the weights are given by Formula 1 and correspond to the probability of transmission given that the origin node is infected. We consider a network defined by a set of nodes $V$ and a set of edges $E$, where each edge $j{\mathop {\longrightarrow }\limits ^{t,w_j}}i$, with $i,j \in V$, carries a time $t$ and a weight $w_j$. We denote by $p_{I}(i,t)$ the probability that node $i$ is infected at time $t$, and by $p_{NI}(i,t)$ the complementary probability that it is not infected. The probability of disease transmission for a movement from $j$ to $i$ at time $t$ is consequently equal to $p_I(j,t-1) \times w_j$.

We adapted the algorithm used in the previous section, using a method similar to the one proposed by Enright and Kao (2016). Initially, all nodes are susceptible except one root node $u$. At each discrete time step, we identify all edges $j{\mathop {\longrightarrow }\limits ^{t,w_j}}i$ in $E$ for which $j$ has a non-zero probability of being infected. The probability of not being infected for node $i$ is updated by multiplying it by the probability that the edge $j{\mathop {\longrightarrow }\limits ^{t,w_j}}i$ does not transmit infection, which is $1-p_{I}(j, t-1)\times w_j$. We keep track of the probability of not having been infected so far, to account for multiple potential infections. We present this algorithm as pseudo code in Algorithm 1.
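The update rule described above can be sketched in Python as follows. This is not a reproduction of Algorithm 1: the function name is hypothetical, the sketch assumes that nodes stay infectious for the whole window (no infectious period is applied), and summarising $OCC_w$ as the sum of infection probabilities over all nodes other than the root is an assumption of this illustration. Multiplying the non-transmission probabilities also treats successive movements as independent events.

```python
from collections import defaultdict


def weighted_outgoing_chain(movements, root, nodes):
    """Propagate infection probabilities from `root` along weighted, dated movements.

    movements: iterable of (origin, destination, day, weight), weight as in Formula 1.
    Returns (occ_w, p_inf), where p_inf maps each node to its infection probability
    and occ_w is (by assumption) the sum of p_inf over all nodes except the root.
    """
    p_not_inf = {v: 1.0 for v in nodes}  # probability of not having been infected so far
    p_not_inf[root] = 0.0                # the root is infected with certainty

    by_day = defaultdict(list)
    for src, dst, day, weight in movements:
        by_day[day].append((src, dst, weight))

    for day in sorted(by_day):
        # p_I(j, t-1): infection probabilities at the end of the previous time step.
        p_inf_prev = {v: 1.0 - p for v, p in p_not_inf.items()}
        for src, dst, weight in by_day[day]:
            if p_inf_prev[src] > 0.0:
                p_not_inf[dst] *= 1.0 - p_inf_prev[src] * weight

    p_inf = {v: 1.0 - p for v, p in p_not_inf.items()}
    occ_w = sum(p for v, p in p_inf.items() if v != root)
    return occ_w, p_inf


if __name__ == "__main__":
    moves = [("A", "B", 1, 0.5), ("B", "C", 2, 0.4), ("A", "C", 2, 0.1)]
    occ_w, p_inf = weighted_outgoing_chain(moves, root="A", nodes={"A", "B", "C"})
    print(occ_w, p_inf)
```

In the toy example, C can be infected on day 2 either through B (probability $0.5 \times 0.4$) or directly from A (probability $0.1$), so its probability of escaping infection is $0.8 \times 0.9 = 0.72$.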

Likewise, we calculated the weighted ICC ($ICC_w$) for all premises in the multi-species network and the different periods of the year. We ranked premises according to the geometric mean of weighted contact chains ($GM-CC_w$, defined as $\sqrt{ICC_w \times OCC_w}$), expecting this ranking to be relevant to the prioritisation of control strategies.

Disease simulations

To investigate agreement between the network analysis results and a more realistic situation, we stochastically simulated transmission of a fast-spreading disease in both cattle and sheep. The simulation is based on a Susceptible-Infected-Recovered (SIR) metapopulation model, compatible with an immunising infection. The time step is one day, to take into account the daily recorded animal movements, and disease transmission is frequency-dependent. We considered an infection with asymmetric transmission risk, where the rate of effective contacts $\beta$ is highest between cattle and lowest from sheep to cattle, with intermediate values between sheep and from cattle to sheep (Table 3). The parameter values were chosen arbitrarily within the range of plausibility for a fast-spreading disease like FMD (Keeling 2005).

Table 3 Daily rates for the parameters in the simulation model

For simplicity, we simulated epidemics starting only in premises having an OCC size greater than 100 premises, that is, premises that can potentially lead to an epidemic of 100 premises or more. The simulations were run for a limited period of four weeks, starting at the first day of each 4-week period of the year. We used the SimInf package (Widgren et al. 2019) in R to perform 100 simulations per seed, and recorded the size of the epidemic after four weeks, as well as the number of times a premises was involved in the outbreak over all simulations for each period.
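To make the within-premises part of such a model concrete, the following Python sketch advances a single mixed-species premises by one day under a stochastic, frequency-dependent SIR process. It is not the SimInf model used by the authors and it omits animal movements between premises. All rate values are hypothetical placeholders (they are not the values of Table 3) and only their ordering follows the text; the function names are also introduced here for illustration.

```python
import math
import random

# Hypothetical daily rates, NOT the values of Table 3. Only the ordering follows the text:
# cattle-to-cattle highest, sheep-to-cattle lowest, the other two intermediate.
BETA = {  # key: (source species, recipient species)
    ("cattle", "cattle"): 0.9,
    ("cattle", "sheep"): 0.5,
    ("sheep", "sheep"): 0.5,
    ("sheep", "cattle"): 0.2,
}
GAMMA = 1 / 7  # hypothetical recovery rate (mean infectious period of seven days)


def binomial(rng, n, p):
    """Number of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))


def daily_step(state, rng):
    """Advance one premises by one day. state: {species: [S, I, R]}; returns a new state."""
    n_total = sum(sum(counts) for counts in state.values())
    new_state = {}
    for species, (s, i, r) in state.items():
        # Frequency-dependent force of infection on `species`, from infected animals of all species.
        force = (
            sum(BETA[(src, species)] * state[src][1] for src in state) / n_total
            if n_total
            else 0.0
        )
        infections = binomial(rng, s, 1 - math.exp(-force))
        recoveries = binomial(rng, i, 1 - math.exp(-GAMMA))
        new_state[species] = [s - infections, i + infections - recoveries, r + recoveries]
    return new_state


if __name__ == "__main__":
    rng = random.Random(42)
    state = {"cattle": [49, 1, 0], "sheep": [200, 0, 0]}
    for _ in range(28):  # four weeks of uncontrolled within-premises spread
        state = daily_step(state, rng)
    print(state)
```

A full metapopulation simulation would additionally move animals between premises each day according to the recorded movements, which is the part that links this local process to the temporal network.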

We defined an indicator of the epidemic risk for each premises and each period as $ER = N_E \times N_I$, where $N_E$ is the average size of the epidemic at four weeks, and $N_I$ is the number of times the premises is infected over all simulations, which is proportional to its probability of becoming infected.

To evaluate the performance of network measures in identifying the most important farms, we compared the 100 premises with the highest ER according to the simulations with the 100 most risky premises according to the different measures ($GM-Deg$, betweenness, PageRank, $GM-CC$, and $GM-CC_w$). For this comparison we considered only farms in the ranking, because markets and shows are already known to be high risk and would be targeted first for control measures.
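The comparison described above amounts to computing $ER$ per premises and measuring the overlap between two top-ranked sets. A minimal Python sketch is given below; the function names, the alphabetical tie-breaking and the toy scores are assumptions introduced for illustration and do not reproduce any result of the study.

```python
def epidemic_risk(mean_epidemic_size, times_infected):
    """ER = N_E * N_I for each premises present in `mean_epidemic_size`."""
    return {
        premises: n_e * times_infected.get(premises, 0)
        for premises, n_e in mean_epidemic_size.items()
    }


def top_set(scores, n, exclude=()):
    """The n highest-scoring premises, skipping `exclude` (e.g. markets and shows)."""
    candidates = [p for p in scores if p not in exclude]
    # Assumption: ties are broken alphabetically to keep the result deterministic.
    ranked = sorted(candidates, key=lambda p: (-scores[p], p))
    return set(ranked[:n])


def overlap_fraction(simulation_scores, measure_scores, n, exclude=()):
    """Share of the top-n premises by simulated ER that a network measure also ranks in its top n."""
    reference = top_set(simulation_scores, n, exclude)
    candidate = top_set(measure_scores, n, exclude)
    return len(reference & candidate) / n


if __name__ == "__main__":
    er = epidemic_risk({"F1": 10, "F2": 5, "F3": 2, "M1": 50}, {"F1": 3, "F2": 4, "F3": 5, "M1": 2})
    measure = {"F1": 5.0, "F2": 1.0, "F3": 4.0, "M1": 9.0}
    # Markets are excluded from the ranking, as in the text.
    print(overlap_fraction(er, measure, n=2, exclude={"M1"}))
```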

Results

The number of sheep movements was consistently higher than the number of cattle movements (Fig. 1). The highest volume of trading activity in the Scottish network occurs in late summer to early autumn, when the sheep movement volumes peak. Overall, most of the recorded movements went through markets, accounting for 75% of the trading operations for cattle, and 93% for sheep.