A framework for reconstructing transmission networks in infectious diseases
Applied Network Science volume 7, Article number: 85 (2022)
Abstract
In this paper, we propose a general framework for the reconstruction of the underlying cross-regional transmission network contributing to the spread of an infectious disease. We employ an autoregressive model that allows us to decompose the mean number of infections into three components that describe intra-locality infections, inter-locality infections, and infections from other sources such as travelers arriving in a country from abroad. This model is commonly used to identify spatiotemporal patterns in seasonal infectious diseases and thus to forecast infection counts. Our contribution, however, lies in identifying the inter-locality term as a time-evolving network: rather than using the model for forecasting, we focus on the network properties without any assumption on seasonality or recurrence of the disease. The topology of the network is then studied to gain insight into the disease dynamics. Building on this, and particularly on the centrality of the nodes of the identified network, a strategy for intervention and disease control is devised.
Introduction
Modeling the evolution of infectious diseases with the goal of forecasting the number of infections answers a wealth of questions in a range of disciplines, from medicine, genetics, and pharmacology to the social and economic sciences [1, 2]. It is at the heart of subsequent investigations on practical aspects: from the management of resources (assessment of the preparedness for disease containment and readiness of the healthcare system) to possible intervention measures (vaccination and testing strategies [6], government control measures) and their consequences [7]. Some of these studies have a regional focus, investigating the propagation of a disease in countries or in regions of a country, while others have considered its dynamics in larger geographical contexts [3,4,5].
The different types of modeling that have been applied to investigate the dynamics of infectious diseases have a long history in epidemiology. Compartmental models (e.g., SIR, SEIR) are a well-known class of models. They study the interplay between susceptible, infected and recovered individuals within communities, with different degrees of spatial refinement. For instance, in the so-called networked compartmental models, interactions between communities are encoded in a network [8], often to identify the spatial and temporal origin of the disease [9]. In statistics, spatial and/or temporal point processes are often employed to study the dynamics of the disease. Some models allow the number of infections to be triggered by those at previous times; others can incorporate, as covariates, additional available information such as demographics, human mobility, and policy decisions. Some recent work follows this direction [10,11,12]. Epidemic models can also be recast in a standard regression framework, where the time series of infection counts are fitted by specifying a distribution for the counts and the associated conditional mean function [13, 14]. In most of these studies, the goal is to predict and capture the spatiotemporal patterns of disease spread. The major contribution of [13, 14] is in the study of diseases that exhibit a certain periodicity or recurrence, such as influenza. Recently, this approach was applied to COVID-19 data [3,4,5], mainly to break down the factors contributing to the disease spread. We observe that this model’s richness has not been fully explored: it has an underlying time-evolving transmission network, which was never fully identified and whose properties have never been studied.
In this paper, we propose a generic framework for the recovery of the transmission network in infectious diseases. Our method does not assume any periodicity in the dynamics or any underlying recurrence. It hinges on the seminal work of [13, 14], which is a statistical regression analysis of infection counts over time, aggregated over the localities of a country, with a mean function that takes account of the spatial proximity of these localities. The fitted model is then used to reconstruct a weighted network, which constitutes the second component of our framework. The salient point of our method of network recovery is that smoothness conditions on the temporal data are not required [15], and neither is the near steady-state dynamics that is instead necessary for the perturbation/response approaches to work [16,17,18]. Our reconstruction is similar to that in [19], while it differs from procedures that rely on the deterministic evolution of the disease [20, 21]. The third component of our framework is the study of the changes in the topology of the recovered network and the computation of centrality measures (specifically the nodes’ betweenness centrality), from which the recommendation of optimal control measures ensues. We apply the method to the case of COVID-19 in Lebanon.
Model definition
The starting point of our analysis is a statistical model that captures the spatiotemporal dynamics of the infections, under the statistical framework discussed in [13]. Namely, we consider the numbers \(Y_{it}\) of infections recorded in a given locality i on a given day t as random variables that are independent, conditionally on the counts at previous times, and distributed according to a negative binomial distribution whose mean function \(\mu _{it} = E(Y_{i,t} \mid Y_{j,t-1}, e_i)\) is decomposed into three terms as follows:
\[\mu _{it} = \lambda _{it}\, Y_{i,t-1} + \phi _{it} \sum _{j \ne i} \omega _{ji}\, Y_{j,t-1} + \nu _{it}\, e_i. \qquad (1)\]
The first two terms constitute the autoregressive part of the model: one is the contribution to the mean infection \(\mu _{it}\) in locality i at time t due to the infections within i on the previous day; the other is the contribution to \(\mu _{it}\) due to positive cases from other localities j, also on the previous day. The final term accounts for all other contributions not captured by the first two, such as infected people who entered the country under study from abroad. For simplicity we will refer to the last term \(\nu _{it}e_i\) as the component due to travel and assume that it is proportional to the size of the population \(e_i\) of the locality. The log-transforms of the non-negative coefficients \(\lambda _{it}\) and \(\phi _{it}\), which quantify the contribution of past observations to future counts, and the log-transform of the parameter \(\nu _{it}\) are each modeled as a linear function of time, with a locality-specific slope to allow more flexibility across localities. Intercepts and slopes are estimated from the data. Finally, we model \(\omega _{ji}\) as a power function of the geographical distance \(d_{ij}\) between the localities: \(\omega _{ji} \propto d_{ij}^{-f}\). This is assumed because previous studies have shown that mobility flows are governed by power-law functions of inter-locality distances [22,23,24,25].
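As an illustration of the three-term decomposition described above, the following minimal sketch evaluates the intra-locality, inter-locality and travel contributions to the mean for a toy set of three localities. It assumes the mean has the form \(\lambda _{it} Y_{i,t-1} + \phi _{it} \sum _{j \ne i} \omega _{ji} Y_{j,t-1} + \nu _{it} e_i\) and uses unnormalised weights \(d_{ij}^{-f}\); all coordinates, coefficients and counts are made-up values, and the function names are hypothetical rather than part of any fitted model.

```python
import math

def power_law_weights(coords, f):
    """Unnormalised weights w[j][i] = d_ij^(-f) for j != i, zero on the diagonal."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                w[j][i] = math.dist(coords[i], coords[j]) ** (-f)
    return w

def mean_components(y_prev, lam, phi, nu, pop, w):
    """Return the (intra, inter, travel) contributions to mu_it for each locality i."""
    n = len(y_prev)
    intra = [lam[i] * y_prev[i] for i in range(n)]
    inter = [phi[i] * sum(w[j][i] * y_prev[j] for j in range(n) if j != i)
             for i in range(n)]
    travel = [nu[i] * pop[i] for i in range(n)]
    return intra, inter, travel

# Toy example (all numbers are illustrative assumptions).
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]   # locality positions
w = power_law_weights(coords, f=1.0)
y_prev = [10, 0, 4]                              # counts on day t-1
lam = [0.5, 0.5, 0.5]
phi = [0.2, 0.2, 0.2]
nu = [1e-4, 1e-4, 1e-4]
pop = [10000, 20000, 5000]                       # population sizes e_i

intra, inter, travel = mean_components(y_prev, lam, phi, nu, pop, w)
mu = [a + b + c for a, b, c in zip(intra, inter, travel)]
print(mu)
```

In this toy setting the second locality recorded no cases on the previous day, so its expected count comes entirely from the inter-locality and travel terms.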
Network identification and characterization
We now wish to focus on the inter-locality term and study it from a different perspective. To do so, observe that the second term in the mean equation (1) can be rewritten as \(\sum _{j \ne i}A_{ij}(t) Y_{j, t-1}\), where \(A_{ij}(t)=\phi _{it}\cdot \omega _{ji}\). It can be interpreted as the contribution to the cases at time t in locality i from the cases in locality j on the previous day. We can suggestively think of \(A=(A_{ij})\) as defining the weights of a network between localities: the transport network \(\omega\) describes the traffic flow between localities, and thus predates the disease, while \(\phi _{it}\) is the number of transported cases from i into neighboring localities. A(t) depends explicitly on t, since both the coefficients of \(\log \phi _{it}\), which is a linear function of t, and the power f in the definition of \(\omega\) are estimated over each interval [0, t].
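The construction of A(t) can be sketched as follows, assuming \(\log \phi _{it}\) is linear in t with a shared intercept and a locality-specific slope. The weight matrix, intercept and slopes below are hypothetical placeholders for fitted values, and, unlike the procedure in the text, the sketch reuses a single set of parameters at every t instead of re-estimating them on each interval [0, t].

```python
import math

def phi(t, intercept, slope):
    """phi_it under the assumption log(phi_it) = intercept + slope * t."""
    return math.exp(intercept + slope * t)

def network_at(t, intercept, slopes, w):
    """Weighted adjacency matrix with A[i][j] = phi_it * w[j][i] and a zero diagonal."""
    n = len(slopes)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        phi_it = phi(t, intercept, slopes[i])
        for j in range(n):
            if i != j:
                A[i][j] = phi_it * w[j][i]
    return A

# Hypothetical spatial weights and coefficients for three localities.
w = [[0.0, 1.0, 1.0 / 3.0],
     [1.0, 0.0, 0.5],
     [1.0 / 3.0, 0.5, 0.0]]
intercept = math.log(0.2)
slopes = [0.0, 0.01, -0.01]

A_0 = network_at(0, intercept, slopes, w)
A_100 = network_at(100, intercept, slopes, w)
print(A_0[1][0], A_100[1][0])
```

Rows with a positive slope gain weight over time while rows with a negative slope lose it, which is how the same spatial weights can give rise to a time-evolving network.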
This complex network A drives the cross-locality dynamics. We now suggest some useful summary metrics for A(t) and its time evolution in order to understand its properties and, accordingly, to prescribe adequate control measures.
One useful summary metric is the modularity, which is a measure of cluster formation in a network. More specifically, the modularity Q of a given network A is defined with respect to a given grouping of its nodes. We follow [27] where the grouping of the nodes is determined by a stochastic procedure that reveals densely connected subgraphs. An illustration of a grouping is given in Fig. 1.
Given this group membership, the modularity of A is then computed according to the formula:
\[Q = \frac{1}{2h} \sum _{i,j} \left( A_{ij} - \frac{k_i k_j}{2h} \right) \delta (c_i, c_j),\]
where h denotes the total number of edges, \(k_i\) and \(k_j\) are the degrees of nodes i and j respectively, \(c_i\) labels the group to which i belongs, and \(\delta\) is the Kronecker delta.
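A minimal sketch of the modularity computation, using the standard Newman form that matches the symbols defined above (h edges, degrees \(k_i\), group labels \(c_i\)). The group labels are supplied by hand here; the stochastic community detection of [27] is not reproduced, and the example graph is a made-up illustration.

```python
def modularity(A, groups):
    """Q = (1 / 2h) * sum_ij (A_ij - k_i k_j / 2h) * delta(c_i, c_j)."""
    n = len(A)
    k = [sum(row) for row in A]      # node degrees (or strengths if A is weighted)
    two_h = sum(k)                   # equals 2h for an undirected network
    q = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                q += A[i][j] - k[i] * k[j] / two_h
    return q / two_h

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix from a list of undirected edges."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A

# Two triangles joined by a single bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = adjacency(6, edges)

q_two_groups = modularity(A, [0, 0, 0, 1, 1, 1])
q_one_group = modularity(A, [0, 0, 0, 0, 0, 0])
print(q_two_groups, q_one_group)
```

Grouping each triangle separately gives Q = 5/14, while placing all nodes in one group gives Q = 0, which illustrates that modularity rewards groupings with dense internal connections.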
We also suggest analysing additional topological measures of the networks: mainly the clustering coefficient, the average path length, and the strength distribution [29, 30]. These are generally used to classify networks as random, scale-free, or regular. The clustering coefficient C of a network is a measure of transitivity that gives the ratio of the number of closed triplets to the number of all (closed and open) triplets. A triplet is closed if all three connections between the three nodes exist and is open if one of the links is missing. The average path length l of a network is the mean distance over all pairs of vertices, where the distance is the number of edges in the shortest path joining them. An illustration is shown in Fig. 2.
Finally, a node’s strength is the sum of the weights of its edges. Namely, for the ith node:
\[s_i = \sum _{j \ne i} A_{ij}.\]
Small-world or scale-free networks (that is, networks with node degrees and strengths distributed according to a power law) are characterized by high clustering coefficients and low average path lengths compared with those of regular/ordered graphs [29, 30]. Random graphs are, on the other hand, characterized by low average path lengths and low clustering coefficients compared to regular graphs. An illustration of the three different network types is shown in Fig. 3.
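The three measures just described can be sketched in a few lines. The example below treats the network as unweighted and undirected when computing the clustering coefficient and the average path length, which is a simplifying assumption; the graph and the weight matrix are toy inputs, not data from the study.

```python
from collections import deque

def transitivity(adj):
    """Ratio of closed triplets to all triplets; adj maps node -> set of neighbours."""
    closed, total = 0, 0
    for v, nbrs in adj.items():
        nb = sorted(nbrs)
        for a in range(len(nb)):
            for b in range(a + 1, len(nb)):
                total += 1                      # triplet centred at v
                if nb[b] in adj[nb[a]]:
                    closed += 1                 # the third link exists
    return closed / total if total else 0.0

def average_path_length(adj):
    """Mean shortest-path distance (in edges) over all connected ordered pairs."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                            # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def strengths(A):
    """Node strength s_i: the sum of the weights of the edges attached to node i."""
    return [sum(row) for row in A]

# Two triangles joined by a bridge between nodes 2 and 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
C = transitivity(adj)
l = average_path_length(adj)
s = strengths([[0.0, 0.5, 0.2], [0.5, 0.0, 0.1], [0.2, 0.1, 0.0]])
print(C, l, s)
```

For this graph 6 of the 10 triplets are closed, so C = 0.6, and the 15 node pairs are on average 1.8 edges apart.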
Data description and fitting of the regression model
As an application of the framework, the model (1) was applied to the COVID-19 data collected in Lebanon. On a daily basis, the laboratories of the public and private sectors report the confirmed cases to the Epidemiological Surveillance Program of Lebanon’s Ministry of Public Health (ESUMOH). The cases are later investigated in order to obtain additional information on demographics and health condition. The data are then archived in a national platform. Specifically, the data we have considered consist of counts of COVID-19 cases recorded daily in each of the 1544 localities of Lebanon from February 21, 2020 to January 20, 2022. These localities correspond to Lebanon’s smallest statistical units, called “circonscriptions foncières” or cadastral villages following the Central Administration for Statistics (CAS) nomenclature [26]. Recommendations on possible interventions and updates on the disease evolution were sought by the Ministry of Public Health at 20-day intervals. Model (1) was fitted using the R package surveillance [14] over the intervals [0, t], \(t=20 \cdot n\) for \(n=1, \ldots , 41\). This allows us to follow the evolution in time of the model parameters until day 820, the last observation point.
Figure 4 displays the aggregated counts over all localities, \(\sum _{i} Y_{i, t}\), and the fitted values over the complete time period of our study, broken down into the three components of the mean function. The fit appears quite adequate. It is in fact a better fit to the data than the model with counts assumed to be Poisson-distributed, which is an indication of overdispersion in the data. A further comparison of these two models in terms of AIC value and prediction errors is provided in Table 1.
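To illustrate how overdispersion shows up in such a comparison, the sketch below contrasts the AIC of a Poisson and a negative binomial fit on a small made-up series of counts. It assumes a negative binomial with mean \(\mu\) and variance \(\mu (1 + \psi \mu )\), a single common mean, and a method-of-moments estimate of \(\psi\) rather than a full maximum-likelihood fit, so it is only a toy version of the comparison reported in Table 1.

```python
import math

def poisson_loglik(ys, mu):
    """Log-likelihood of i.i.d. Poisson counts with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)

def negbin_loglik(ys, mu, psi):
    """Log-likelihood of i.i.d. negative binomial counts with mean mu
    and variance mu * (1 + psi * mu)."""
    r = 1.0 / psi
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in ys
    )

def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

counts = [0, 0, 1, 0, 12, 3, 0, 25, 1, 0, 7, 0]   # made-up, bursty daily counts
n = len(counts)
mean = sum(counts) / n
var = sum((y - mean) ** 2 for y in counts) / (n - 1)
psi = (var - mean) / mean ** 2                     # moment estimate of overdispersion

aic_poisson = aic(poisson_loglik(counts, mean), 1)
aic_negbin = aic(negbin_loglik(counts, mean, psi), 2)
print(aic_poisson, aic_negbin)
```

Because the sample variance far exceeds the sample mean, the negative binomial attains a lower AIC despite its extra parameter.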
We further note that the model of equation (1) considers a time lag of one day; that is, the future counts depend on the counts recorded on the previous day. Changing the time lag from one to a few days did not result in any noticeable difference. Figure 4 indicates that the inter-locality term makes the most important contribution to the increase in the mean number of infections compared to the intra-locality and travel terms. Our analysis thus provides evidence that inter-locality infection drives the overall transmission of the disease [22]. For this reason we shift our focus to the network that governs the interaction between localities and observe that it is not a purely static, spatially dependent network but rather a dynamic, time-evolving one: it is in fact the product of time-dependent coefficients with the spatial proximity matrix. This suggests that the inter-locality transmissions should be the main focus of analysis, and that what one learns from their study would be useful for disease control.
The parameter estimates of model (1) for all 1554 localities and their errors cannot be displayed in an uncluttered fashion but are available from the corresponding authors. A sample of the evolution of the inter-locality term \(\phi _{it}\) is shown in Fig. 5.
An example of the reconstructed network is provided in Fig. 6, which is a graphical representation of \(A^{(15)}(t=300)\). The superscript 15 in the notation of A indicates that the latter was estimated on the count data of 15 contiguous 20-day time intervals, that is, the 300-day time span \([0,20\cdot 15]\) from February 21, 2020 to December 16th, 2020.
Further, Fig. 7 shows the modularities of the 41 matrices \(A^{(41)}(t)\) at days \(t=20 \cdot n\) for \(n=1, \ldots , 41\). The superscript n in the notation \(A^{(n)}(t)\), as mentioned above, indicates that A is estimated using the counts of the first \(20\cdot n\) days of the study.
Figure 8 shows the clustering coefficients and the average path lengths for the matrices \(A^{(41)}(t)\). Similar behavior of both C and l was observed for all \(A^{(n)}\), with \(n \ge 10\).
One can see a jump in modularity at the tenth 20-day time interval, which we will denote by \(I_c\). This behavior may signal the onset of an emerging power law [28]. The evolution of both C and l gives additional evidence for a transition at \(I_c\): the clustering coefficient suddenly starts to increase, and there is an abrupt jump in the average path length as well (Fig. 8). This is an indication of scale-freeness of the network. This property expedites the spread of epidemics, unlike what would occur in ordered networks, which are characterized by a slower spread because they possess a high C and an l that scales with system size [31, 32].
To characterize the transition to scale-freeness, we now analyse the distribution of the strengths of the nodes, as additional evidence for a change in the network topology at the 10th interval \(I_c\). Figures 9 and 10 show the empirical and estimated distributions of the strengths (in fact, the survival function \(P(S> s)\)) of \(A^{(n)}(t = 20\cdot n)\), at the time intervals \(n=1, \ldots , 41\), on a log-log scale. We note that a transition occurs at \(t=20 \cdot 10\), where the distribution becomes linear, which is indicative of a power law (Pareto distribution): \(P(X \le x)=1-(\beta /x)^{\alpha -1}\), for \(x\ge \beta\). The exponent \(\alpha\) and the boundary value (scale) \(\beta\) are estimated by maximum likelihood following [33].
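For a Pareto distribution of the form \(P(X \le x)=1-(\beta /x)^{\alpha -1}\), the maximum-likelihood estimate of the exponent at a fixed \(\beta\) has the closed form \(\hat{\alpha } = 1 + n / \sum _i \ln (x_i/\beta )\), with approximate standard error \((\hat{\alpha }-1)/\sqrt{n}\). The sketch below checks this on synthetic samples. It treats \(\beta\) as known, whereas the text estimates it as well, and the samples are simulated, not the strengths from the study.

```python
import math
import random

def fit_alpha(samples, beta):
    """Closed-form MLE of the power-law exponent for the tail x >= beta."""
    tail = [x for x in samples if x >= beta]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / beta) for x in tail)
    stderr = (alpha - 1.0) / math.sqrt(n)       # asymptotic standard error
    return alpha, stderr

def survival(samples, s):
    """Empirical survival function P(S > s)."""
    return sum(1 for x in samples if x > s) / len(samples)

def sample_pareto(n, alpha, beta, rng):
    """Inverse-transform sampling from P(X <= x) = 1 - (beta / x)^(alpha - 1)."""
    return [beta * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(0)                           # fixed seed for reproducibility
true_alpha, beta = 2.5, 1.0                      # illustrative values
xs = sample_pareto(20000, true_alpha, beta, rng)

alpha_hat, se = fit_alpha(xs, beta)
print(alpha_hat, se, survival(xs, 10.0))
```

On a log-log plot the empirical survival function of such samples falls on a straight line with slope \(-(\alpha -1)\), which is the visual signature referred to in the text.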
Figure 11 summarizes the estimates of the exponents for these networks and their standard errors (obtained by nonparametric bootstrap).
After the 180th day, that is, for time intervals labeled by the index \(n \ge 10\), most power laws have very close exponents of about 2.5. This signals the stabilization of the network topology. Thus, \(I_c\) marks the onset of the emergence of the steady-state network. We think that only beyond this point is any prescription of control measures likely to be efficient, as the revealed network topology, which relies on the daily counts, has stabilized. One may wonder why the stable phase set in during this interval \(I_c\), and not before or after it. \(I_c\) chronologically coincides with the period between August 19, 2020, and September 7, 2020. Perhaps, following the blast in Beirut on August 4th and during the subsequent weeks of social protests, personal precaution measures (such as social distancing and wearing of masks) were compromised. Either of these occurrences may have contributed to the detected change in the network type. See the Appendix for the chronology.
Putting the analysis into action: control measures
Having fully characterized the network and identified the steady state, we now turn to a possible use of this analysis to guide an optimal strategy for disease control. The strategy will identify some localities as candidates for being isolated or for having their connections to other localities curtailed. The measure on which the identification is based is the centrality of a node. The betweenness centrality of a node v is defined as [30]:
\[g(v) = \sum _{i \ne v \ne j} \frac{\sigma _{ij}(v)}{\sigma _{ij}},\]
where \(\sigma _{ij}\) is the total number of shortest paths from node i to node j and \(\sigma _{ij}(v)\) is the number of those paths passing through v. Therefore, the more central the node is, the more its removal affects the network’s connectivity, since its removal would yield a network with more disconnected subgraphs. The control strategy we propose involves an iterative procedure, where at each step the centralities of the nodes are computed, the node with the highest resulting centrality is removed, and the matrix A is updated, as illustrated in Fig. 12.
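The iterative procedure can be sketched as follows: compute the betweenness of every node, remove the node with the highest value, update the network, and repeat. The sketch uses Brandes' algorithm on an unweighted, undirected graph, which is a simplifying assumption since A is weighted, and it tracks the size of the largest connected component as one possible proxy for connectivity; the example graph and function names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm; adj maps node -> set of neighbours (unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}              # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}   # each unordered pair was counted twice

def largest_component(adj):
    """Size of the largest connected component."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        best = max(best, len(comp))
    return best

def targeted_removal(adj, steps):
    """Repeatedly remove the node with the highest betweenness, recomputing each time."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    history = []
    for _ in range(steps):
        if not adj:
            break
        bc = betweenness(adj)
        target = max(bc, key=bc.get)
        for u in adj.pop(target):
            adj[u].discard(target)
        history.append((target, largest_component(adj)))
    return history

# Two triangles joined by a bridge between nodes 2 and 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bc = betweenness(adj)
history = targeted_removal(adj, steps=2)
print(bc, history)
```

In this toy graph the two bridge endpoints carry all paths between the triangles, so removing just one of them already splits the network into two parts.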
Other schemes for removing nodes from a network exist, but the one we have just described has been suggested to incur the highest loss of connectivity for scale-free networks [34,35,36,37,38,39]. In practice, the candidate targets for intervention are the localities corresponding to the nodes with the highest centralities. At the policy level, this strategy based on our analysis was indeed adopted: the localities we identified through it were given priority in the national vaccination campaign. On the other hand, the recommendations we put forward to target the high-centrality localities for lockdown and intervention measures were only partially adopted, as the decision-making process involved other ministries and stakeholders. Nevertheless, we conclude by considering theoretically the would-be repercussions of such an implementation. Clearly, the loss of connectivity would impede the evolution of the disease, since the localities that contribute the most to the infection would be isolated. For example, removing around \(20\%\) of the most connected localities on the basis of their betweenness centrality would lead to an \(80\%\) loss of connectivity, as shown in Fig. 13.
Specifically, the localities causing the \(80\%\) loss of connectivity are shown in Fig. 14, while the fitted model of the top sixteen localities is shown in Fig. 15. An animated map of the control strategy is available at https://www.dropbox.com/s/guhamz3p7op6b3y/animated.gif?dl=0.
Conclusion
In this paper, we have proposed a framework that can be used to inform control measures for epidemics in a country for which infection counts aggregated over local regions are available over time. In particular, and as an example, we have followed the evolution of the counts of COVID-19 cases in Lebanon at the level of local administrative units at a daily resolution. The framework entails fitting an autoregressive model to the data, recovering an underlying network over which the disease propagates, and analyzing this time-evolving network to identify topological measures of node centrality that suggest an optimal control of the spread of the disease. Specifically, for the COVID-19 data in Lebanon, the analysis of the topological metrics of the network has pointed to a transition to a steady-state structure that governs the interactions between localities. After identifying this steady-state network and characterizing it as scale-free, we have proposed control measures based on the betweenness centrality of its nodes. The findings were taken into consideration in the national vaccination campaign for COVID-19, with the identified localities given priority for vaccination.
Availability of data and material
The data are available upon request from the Ministry of Public Health’s Surveillance Program.
Abbreviations
ESUMOH: Epidemiological Surveillance Program of Lebanon’s Ministry of Public Health
CAS: Central Administration for Statistics
References
1. Mohamadou Y, Halidou A, Kapen PT (2020) A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of COVID-19. Appl Intell 50(11):3913–3925
2. Guan J, Wei Y, Zhao Y, Chen F (2020) Modeling the transmission dynamics of COVID-19 epidemic: a systematic review. J Biomed Res 34(6):422
3. Celani A, Giudici P (2021) Endemic-epidemic models to understand COVID-19 spatiotemporal evolution. Spat Stat 49:100528
4. Ssentongo P, Fronterre C, Geronimo A, Greybush SJ, Mbabazi PK, Muvawala J, Nahalamba SB, Omadi PO, Opar BT, Sinnar SA et al (2021) Pan-African evolution of within- and between-country COVID-19 dynamics. Proc Natl Acad Sci 118(28):e2026664118
5. Dickson MM, Espa G, Giuliani D, Santi F, Savadori L (2020) Assessing the effect of containment measures on the spatiotemporal dynamic of COVID-19 in Italy. Nonlinear Dyn 101(3):1833–1846
6. Gozzi N, Bajardi P, Perra N (2021) The importance of non-pharmaceutical interventions during the COVID-19 vaccine rollout. medRxiv
7. Perra N (2021) Non-pharmaceutical interventions during the COVID-19 pandemic: a review. Phys Rep 913:1–52
8. Brockmann D, Helbing D (2013) The hidden geometry of complex, network-driven contagion phenomena. Science 342(6164):1337–1342
9. Schlosser F, Brockmann D (2021) Finding disease outbreak locations from human mobility data. EPJ Data Sci 10(1):52
10. Zhu S, Bukharin A, Xie L, Santillana M, Yang S, Xie Y (2021) High-resolution spatiotemporal model for county-level COVID-19 activity in the US. ACM Trans Manag Inf Syst (TMIS) 12(4):1–20
11. Chiang WH, Liu X, Mohler G (2021) Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates. Int J Forecast 38:505–520
12. Giudici P, Pagnottoni P, Spelta A (2021) Network self-exciting point processes to measure health impacts of COVID-19. Available at SSRN 3892998
13. Held L, Höhle M, Hofmann M (2005) A statistical framework for the analysis of multivariate infectious disease surveillance counts. Stat Model 5:187–199
14. Meyer S, Held L, Höhle M (2014) Spatiotemporal analysis of epidemic phenomena using the R package surveillance. arXiv preprint arXiv:1411.0416
15. Shandilya SG, Timme M (2011) Inferring network topology from complex dynamics. New J Phys 13(1):013004
16. Prabakaran S, Gunawardena J, Sontag E (2014) Paradoxical results in perturbation-based signaling network reconstruction. Biophys J 106(12):2720–2728
17. Yu D (2010) Estimating the topology of complex dynamical networks by steady state control: generality and limitation. Automatica 46(12):2035–2040
18. Yu D, Parlitz U (2010) Inferring local dynamics and connectivity of spatially extended systems with long-range links based on steady-state stabilization. Phys Rev E 82(2):026108
19. Wan X, Liu J, Cheung WK, Tong T (2014) Inferring epidemic network topology from surveillance data. PLoS ONE 9(6):100661
20. Pajevic S, Plenz D (2009) Efficient network reconstruction from dynamical cascades identifies small-world topology of neuronal avalanches. PLoS Comput Biol 5(1):1000271
21. Braunstein A, Ingrosso A, Muntoni AP (2019) Network reconstruction from infection cascades. J R Soc Interface 16(151):20180844
22. Meyer S, Held L (2014) Power-law models for infectious disease spread. Ann Appl Stat 8(3):1612–1639
23. Zipf GK (1946) The P1 P2/D hypothesis: on the intercity movement of persons. Am Sociol Rev 11(6):677–686
24. Simini F, González MC, Maritan A, Barabási AL (2012) A universal model for mobility and migration patterns. Nature 484(7392):96–100
25. Barthélemy M (2011) Spatial networks. Phys Rep 499(1–3):1–101
26. Verdeil E, Faour G, Velut S, Hamzé M, Mermier F (2007) Atlas du LIBAN. Presses de l’Ifpo
27. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
28. Grindrod P, Higham DJ (2018) High modularity creates scaling laws. Sci Rep 8(1):1–9
29. Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47
30. Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
31. Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A (2015) Epidemic processes in complex networks. Rev Mod Phys 87(3):925
32. Newman ME (2002) Spread of epidemic disease on networks. Phys Rev E 66(1):016128
33. Clauset A, Shalizi CR, Newman ME (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703
34. Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature 406(6794):378–382
35. Dong G, Gao J, Du R, Tian L, Stanley HE, Havlin S (2013) Robustness of network of networks under targeted attack. Phys Rev E 87(5):052804
36. Valdez LD, Shekhtman L, La Rocca CE, Zhang X, Buldyrev SV, Trunfio PA, Braunstein LA, Havlin S (2020) Cascading failures in complex networks. J Complex Netw 8(2):013
37. Buldyrev SV, Parshani R, Paul G, Stanley HE, Havlin S (2010) Catastrophic cascade of failures in interdependent networks. Nature 464(7291):1025–1028
38. Albert R, Albert I, Nakarado GL (2004) Structural vulnerability of the North American power grid. Phys Rev E 69(2):025103
39. Edsberg Møllgaard P, Lehmann S, Alessandretti L (2021) Understanding components of mobility during the COVID-19 pandemic. Phil Trans R Soc A 380(2214):20210118
Acknowledgements
The authors thank Prof. Rima Habib from the AUB’s Faculty of Public Health for her unfailing support and invaluable advice to physicists and mathematicians as they worked with public data.
Funding
The authors received no funding.
Author information
Contributions
J.T contributed to the problem formulation and led the project, and together with S.N and S.M developed the model and numerical analysis involved in this manuscript. G.F and C.A worked on the geographical and the demographic data, R.H and H.S worked on the data processing of infection counts, N.G and H.H on the epidemiological analysis and the significance of the results and their implementation in the vaccination strategy. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Chronology of the COVID-19 pandemic in Lebanon
In this appendix we summarize the chronology of the COVID-19 pandemic in Lebanon from the first recorded case to February 2022. We divide this time interval into 5 periods and highlight the main governmental interventions taken to control the spread of the disease.
Period 1 (February 2020 to June 2020): The first cases are documented. Early lockdown measures are implemented, with airport closure. Testing is carried out for suspected cases, close contacts, and travelers. Cases are mainly within clusters. Aggressive contact tracing is adopted.
Period 2 (July 2020 to December 2020): The airport reopens in July 2020. The daily number of cases increases progressively and community transmission sets in. On August 4th, the Beirut blast occurs.
Period 3 (January 2021 to June 2021): The alpha variant is introduced. The case counts increase. Lockdown measures are implemented, resulting in a decrease in the recorded cases. However, after the lockdown is lifted, an increase in the number of infections is observed until mid-March, followed by a progressive and sustained decrease up to June.
Period 4 (July 2021 to December 2021): The delta variant is introduced and progressively replaces the alpha variant. Two waves of delta are observed: July–September and November–December.
Period 5 (January 2022 to February 2022): The omicron variant is introduced. The high transmissibility of the new variant leads to high daily case counts, reaching 10,000 on February 1, 2022.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Najem, S., Monni, S., Hatoum, R. et al. A framework for reconstructing transmission networks in infectious diseases. Appl Netw Sci 7, 85 (2022). https://doi.org/10.1007/s41109-022-00525-4
DOI: https://doi.org/10.1007/s41109-022-00525-4
Keywords
 Network reconstruction
 Betweenness centrality
 Autoregressive model
 COVID-19
 Optimal control