Heterogeneity in SIR epidemics modeling: superspreaders and herd immunity
Applied Network Science volume 5, Article number: 93 (2020)
Abstract
Deterministic epidemic models, such as the Susceptible-Infected-Recovered (SIR) model, are immensely useful even if they lack the nuance and complexity of social contacts at the heart of network science modeling. Here we present a simple modification of the SIR equations to include the heterogeneity of social connection networks. A typical power-law model of social interactions from network science reproduces the observation that individuals with a high number of contacts, “hubs” or “superspreaders”, can become the primary conduits for transmission. Conversely, once the tail of the distribution is saturated, herd immunity sets in at a smaller overall recovered fraction than in the analogous SIR model. The new dynamical equations suggest that cutting off the tail of the social connection distribution, i.e., stopping superspreaders, is an efficient non-pharmaceutical intervention to slow the spread of a pandemic, such as the Coronavirus Disease 2019 (COVID-19).
Introduction
According to recent results on the spread of COVID-19, a small fraction of the population is responsible for most infections (Adam et al. 2020; Schuchat 2020). Clusters in care facilities, restaurants and bars, workplaces, and music events (Furuse et al. 2020), or even at choir practice (Hamner et al. 2020) and dance events (Jang et al. 2020), are a hallmark of superspreading (Lloyd-Smith et al. 2005).
The most widespread deterministic model, the Kermack–McKendrick equations (Kermack and McKendrick 1927), or SIR equations, is not designed to model superspreading. Superspreading is usually introduced as an additional dispersion of the secondary infections (Lloyd-Smith et al. 2005) using a negative binomial distribution, possibly in the context of random network theory (Hébert-Dufresne et al. 2020), or modeled with stochastic Markov chain methods (Allen 2017).
The simplest way to modify the SIR equations to include superspreaders is to add a new class of susceptible individuals, P, the superspreaders (Mkhatshwa and Mummert 2010). However, network science (Barabási and Albert 1999) suggests a more complex picture. If we map individuals to the vertices of a graph and their connections to its edges, the degree distribution, i.e., the distribution of the number of social connections of an individual, typically follows a power law with a slope between 2 and 3. Recent analyses of COVID-19 data (Ziff and Ziff 2020) support such power-law behavior, which is the hallmark of small-world phenomena in network science. Our goal is to use network science insight to generalize the SIR formalism. The result is a set of equations similar to poly-SIR generalizations of the SIR formalism (Moghadas et al. 2020; Chikina and Pegden 2020), but it is more specific to network science and does not assume a linear contact matrix: nonlinearity is essential to superspreading. Note that here we focus on structural superspreading due to the average contact structure of the population as people go about their daily lives. In the discussion, we outline how to model transient superspreading, i.e., rare individual events that can dominate the case counts when the overall numbers are comparable to the cutoff of the contact (degree) distribution, and viral superspreading, from those (if they exist) who shed more virus than average.
The principal idea is to replace the susceptible-infected-recovered variables with (binned) distributions in terms of the number of contacts. In the next section, we derive the equations governing the time development of these distributions; in the “Results” section, we illustrate the dependence of the solutions on the network science parameters; and in the final section, we summarize and discuss the results.
Network science inspired generalization of the SIR equations
Let us denote the number of social connections or contacts of a person by k. If a person is represented as a vertex of a graph, k is the number of links of that vertex to other persons, or the degree of that vertex. Note that in reality this graph changes every day: for instance, a person going to a rock concert would have many contacts on that particular day. We assume that the contact distribution itself is stationary, at least over the fundamental timescales associated with the disease. This approximation smears out transient rare events and can be fixed only with Monte Carlo modeling, which is briefly discussed at the end. We focus on potentially infectious contacts, but do not track their quality. Encounters that are closer, longer, or in an enclosed space are more likely to lead to transmission. We average over these and assume that the resulting graph has a structure typical of other social networks.
The probability per day that a contact with an infected person produces a new infection is \(\beta = 1/T_c\), the inverse of the average time \(T_c\) in days that it takes such a connection to produce a new infection. In the usual SIR-type modeling, the time derivative is \(\dot{S} = -\beta (I/N) S\), i.e., the infected fraction of the population I/N times the S chances of infection, with probability \(\beta \) per day that an encounter results in a successful transmission. The original interpretation of \(\beta \) incorporates social interactions, while we split these off into the social contact distribution defined next.
To model the social connection network without a Monte Carlo representation of the actual social connection graph, we introduce the quantities \(S_k, I_k, R_k\), the susceptible, infected, and recovered numbers of the population with k social links. Their sum is \(S_k+ I_k+R_k = N_k \propto k^{-\alpha }\), the power-law form typical in network science. The range \(2< \alpha < 3\) (“ultra small world”) is the most interesting and practical limit for social networks; \(\alpha < 2\) is pathological, and \(\alpha > 3\) corresponds to a random graph (“small world”). Typical ultra-small-world social networks contain large “hubs”, individuals with a large number of connections. They are positioned in the tail of the distribution, and they correspond to superspreaders in an epidemiological network.
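The binned power-law population described above can be sketched in a few lines of Python. This is only an illustration: the function names and the total population of one million are assumptions made here, while the slope 2.5 and cutoff 30 are the fiducial values used later in the text.

```python
def degree_populations(n_total, alpha, k_max):
    """Split n_total people into bins k = 1..k_max with N_k proportional to k**(-alpha)."""
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    norm = sum(weights)
    return [n_total * w / norm for w in weights]


def link_share_above(N_k, k_min):
    """Fraction of all links that belong to people with at least k_min contacts."""
    links = [k * n for k, n in enumerate(N_k, start=1)]
    return sum(links[k_min - 1:]) / sum(links)


# Slope and cutoff follow the text; the population size is an arbitrary choice.
N_k = degree_populations(n_total=1.0e6, alpha=2.5, k_max=30)
mean_degree = sum(k * n for k, n in enumerate(N_k, start=1)) / sum(N_k)
people_share = sum(N_k[9:]) / sum(N_k)   # share of people with k >= 10
links_share = link_share_above(N_k, 10)  # share of links held by those people
print(f"mean degree {mean_degree:.2f}; k >= 10: "
      f"{people_share:.1%} of people hold {links_share:.1%} of links")
```

Comparing the two shares shows how a small tail of hubs can carry a disproportionate part of all links, which is the property the model exploits.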
With the above definitions, we write the network science inspired modification of the SIR equations as
\[
\dot{S}_k = -p_k S_k, \qquad \dot{I}_k = p_k S_k - \gamma I_k, \qquad \dot{R}_k = \gamma I_k,
\]
where \(p_k = p_k(\beta ,I_l)\) is the probability per day that a vertex with k links will be infected, and \(\gamma = 1/T_r\) is the usual inverse recovery time in units of 1/days. As noted earlier, the meaning of \(\beta \) is the average transmission fraction of potentially infectious encounters. This is subtly different from the original SIR model, where \(\beta \) includes the probability that a potentially infectious encounter actually happens. Note that the sum of any variable over the index k gives the corresponding traditional SIR variable, and the sum of the equations corresponds to an effective SIR equation with changing parameters.
The above equations describe how an initial distribution \(S_k\) responds to the disease dynamics. We estimate \(p_k\) next. It corresponds to the chance per day that a susceptible person with k social links will get infected. It is equal to \(1-q_k\), where \(q_k = (1-\beta p)^k\) is the probability that the person in question does not get infected during that day, and p is the probability that one random link carries an infection (here we assume no correlation between infected links, a zeroth-order approximation). Finally, in the spirit of the original SIR approximation, we approximate this probability with the ratio of the infected links to the total number of links, \(p \simeq {\sum _l l I_l}/{\sum _l l N_l}\). Putting all these together,
\[
p_k = 1 - \left( 1 - \beta \frac{\sum _l l I_l}{\sum _l l N_l} \right) ^k .
\]
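The estimate of \(p_k\) described above can be written as a short Python function. This is a minimal sketch: the function name and the three-bin toy numbers are made up for illustration and do not come from the author's code.

```python
def infection_probabilities(I_k, N_k, beta):
    """Daily infection probability p_k for each degree bin k = 1..k_max.

    I_k, N_k: infected and total population per bin (index 0 holds k = 1).
    beta: transmission probability per potentially infectious contact per day.
    """
    infected_links = sum(k * i for k, i in enumerate(I_k, start=1))
    total_links = sum(k * n for k, n in enumerate(N_k, start=1))
    p_link = infected_links / total_links  # chance that a random link is infected
    # A person with k links escapes infection only if all k links fail to transmit.
    return [1.0 - (1.0 - beta * p_link) ** k for k in range(1, len(N_k) + 1)]


# Toy example (made-up numbers): three bins with 1% of each bin infected.
N_k = [1000.0, 300.0, 100.0]
I_k = [10.0, 3.0, 1.0]
p = infection_probabilities(I_k, N_k, beta=0.07)
print(p)
```

In this toy case one percent of all links are infected, so \(p_1\) equals \(0.07 \times 0.01\), and the higher bins are infected at a rate that grows slightly slower than linearly with k.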
Note that we double counted both the infectious and the total number of links, which cancels in the ratio. This is the simplest approximation for \(p_k\), and it does not account for topological details and degree correlations of the graph representing social interactions. Nevertheless, the above random, nonassociative network assumption captures many features of network theory without a full graph Monte Carlo simulation.
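A short worked limit may help connect the expression for \(p_k\) to superspreading. As a sketch, assume the early phase of an outbreak, when few links are infected and \(\beta p \ll 1/k_{\mathrm {max}}\) (an assumption added here for illustration, not a condition stated in the text). Expanding the power to first order gives
\[
p_k = 1-(1-\beta p)^k \simeq k\,\beta \,p = k\,\beta \,\frac{\sum _l l I_l}{\sum _l l N_l},
\]
so a person with k contacts is roughly k times as likely to be infected per day as a person with a single contact. For larger \(\beta p\) the exact expression saturates below 1, and the dependence on k is no longer linear.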
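The sum of these variables over k obeys an effective SIR equation with an effective \(\beta \) parameter,
\[
\dot{S} = -\beta _{\mathrm {eff}} \frac{I S}{N}, \qquad \dot{I} = \beta _{\mathrm {eff}} \frac{I S}{N} - \gamma I, \qquad \dot{R} = \gamma I, \qquad \beta _{\mathrm {eff}} = \frac{N \sum _k p_k S_k}{I S},
\]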
with \(N = \sum _k N_k\), etc. Each summed variable inherits its time dependence from the variables of the underlying NSIR equations. We define the effective \(R_{{\mathrm {eff}},t}= \beta _{\mathrm {eff}}/\gamma \), and \(R_{{\mathrm {eff}},0}=R_{{\mathrm {eff}},t=0}\). These effective parameters depend on the social dynamics through the underlying k dependence and on the disease dynamics represented by \(\beta \) and \(\gamma \). Therefore a given pandemic, like COVID-19, could play out differently depending on the social interaction hierarchy of people.
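To make the effective parameters concrete, the sketch below evaluates \(\beta _{\mathrm {eff}}\) for a given state. It assumes that \(\beta _{\mathrm {eff}}\) is fixed by matching the summed infection term \(\sum _k p_k S_k\) to the SIR form \(\beta _{\mathrm {eff}} I S/N\); the function names, the population size, and the choice of seeding a single infected person are illustrative assumptions.

```python
def beta_effective(S_k, I_k, N_k, beta):
    """Effective SIR transmission rate implied by the summed infection term."""
    degrees = range(1, len(N_k) + 1)
    p_link = (sum(k * i for k, i in zip(degrees, I_k))
              / sum(k * n for k, n in zip(degrees, N_k)))
    new_infections = sum((1.0 - (1.0 - beta * p_link) ** k) * s
                         for k, s in zip(degrees, S_k))
    return sum(N_k) * new_infections / (sum(I_k) * sum(S_k))


def seeded_state(N_k, k_seed, n_seed=1.0):
    """Place n_seed infected people in the bin of degree k_seed; the rest are susceptible."""
    I_k = [n_seed if k == k_seed else 0.0 for k in range(1, len(N_k) + 1)]
    S_k = [n - i for n, i in zip(N_k, I_k)]
    return S_k, I_k


beta, gamma, alpha, k_max = 0.07, 0.1, 2.5, 30
weights = [k ** (-alpha) for k in range(1, k_max + 1)]
norm = sum(weights)
N_k = [1.0e6 * w / norm for w in weights]  # population size is an arbitrary choice

results = {}
for k_seed in (1, 10, 30):
    S_k, I_k = seeded_state(N_k, k_seed)
    results[k_seed] = beta_effective(S_k, I_k, N_k, beta)
    print(f"seed at k={k_seed:2d}: beta_eff={results[k_seed]:.3f}, "
          f"R_eff,0={results[k_seed] / gamma:.2f}")
```

Under these assumptions the initial \(\beta _{\mathrm {eff}}\) is approximately \(\beta \) times the degree of the seeded bin, which illustrates how strongly the early effective reproduction number depends on who is infected first.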
Results
We implemented the above model in Python. If only the \(k=1\) terms are different from zero, then \(p_k = \beta I/N\) and \(\beta _{\mathrm {eff}} = \beta \), i.e., our model is identical to the original SIR model. As a sanity check, we verified that this is the case.
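The \(k=1\) limit can be checked in a few lines. This is a sketch of such a check rather than the author's test: the population numbers are made up, and \(\beta _{\mathrm {eff}}\) is assumed to be defined by matching the summed infection term to the SIR form.

```python
def p_k(k, beta, p_link):
    """Daily infection probability of a person with k links."""
    return 1.0 - (1.0 - beta * p_link) ** k


beta = 0.2
N, I = 1000.0, 10.0          # made-up population with a single k = 1 bin
S = N - I
p_link = (1 * I) / (1 * N)   # infected links over total links when every k is 1
p1 = p_k(1, beta, p_link)
beta_eff = N * (p1 * S) / (I * S)

print(p1, beta * I / N)      # the two values agree up to rounding
print(beta_eff, beta)
```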
To illustrate the flexibility and the qualitative behavior of the new model, we choose a set of fiducial (reference) parameters and plot solutions with respect to changing each parameter. First, we select a traditional SIR model as a baseline: we choose \(\beta = 0.2\) and \(\gamma = 0.1\), which corresponds to \(R_0 = 2\). The typical behavior of this SIR model is shown on Fig. 1 with dotted lines.
For the network-inspired model, labeled NSIR, we choose \(\beta = 0.07\), \(\gamma = 0.1\), a slope of the vertex degree distribution \(\alpha = 2.5\), and a maximum number of connections (i.e., the cutoff of the vertex degree distribution, the largest superspreader) \(k_{\mathrm {max}} = 30\). In this case initially \(\beta _{\mathrm {eff}} \simeq 0.7\), or \(R_0 \simeq 7\); this is much higher than the corresponding \(R_0 = 2\) in our baseline SIR model. As illustrated in Fig. 1, even with this seemingly extreme initial condition, the time development of the disease dynamics is a milder version of the baseline SIR model. The peak infection rate is smaller, and a similarly smaller recovered fraction leads to herd immunity. Note that for the NSIR model we define the herd immunity level as the point where the total \(I = \sum I_k\) no longer grows; it can be read off from the figures as the peak of the red curve, where the tangent is parallel to the time axis. This is a generic feature of models with significant superspreaders: since they are responsible for a large fraction of infections, but are also prone to get infected quickly, they eliminate themselves from the susceptible pool early on. Once that happens, the transmission is no longer efficient. As we will show later (Fig. 5), this process expresses itself as a precipitous drop in \(R_t\).
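The way the herd immunity level is read off at the peak of the total infected curve can be sketched as follows. This is not the author's code: it uses a plain forward-Euler step, a population of one million, and an initial infection spread over all bins in proportion to their size, all of which are assumptions of this illustration.

```python
def simulate(N_k, I0_k, beta, gamma, days=300.0, dt=0.05):
    """Forward-Euler integration of the binned equations.

    Returns (peak infected fraction, recovered fraction at that peak,
    recovered fraction at the end of the run).
    """
    degrees = range(1, len(N_k) + 1)
    total_links = sum(k * n for k, n in zip(degrees, N_k))
    n_total = sum(N_k)
    S = [n - i for n, i in zip(N_k, I0_k)]
    I = list(I0_k)
    R = [0.0] * len(N_k)
    peak_i, r_at_peak = sum(I), 0.0
    for _ in range(int(round(days / dt))):
        p_link = sum(k * i for k, i in zip(degrees, I)) / total_links
        infections = [(1.0 - (1.0 - beta * p_link) ** k) * s * dt
                      for k, s in zip(degrees, S)]
        recoveries = [gamma * i * dt for i in I]
        S = [s - x for s, x in zip(S, infections)]
        I = [i + x - y for i, x, y in zip(I, infections, recoveries)]
        R = [r + y for r, y in zip(R, recoveries)]
        if sum(I) > peak_i:  # total infected still growing: update the peak
            peak_i, r_at_peak = sum(I), sum(R)
    return peak_i / n_total, r_at_peak / n_total, sum(R) / n_total


# Baseline SIR as a single k = 1 bin (beta = 0.2, gamma = 0.1, so R_0 = 2).
sir_peak, sir_herd, sir_final = simulate([1.0e6], [1.0], beta=0.2, gamma=0.1)

# NSIR with the fiducial network parameters; the proportional seeding is assumed.
alpha, k_max = 2.5, 30
weights = [k ** (-alpha) for k in range(1, k_max + 1)]
norm = sum(weights)
N_k = [1.0e6 * w / norm for w in weights]
I0_k = [1.0e-6 * n for n in N_k]
nsir_peak, nsir_herd, nsir_final = simulate(N_k, I0_k, beta=0.07, gamma=0.1)

print(f"SIR : peak I {sir_peak:.3f}, R at peak {sir_herd:.3f}, final R {sir_final:.3f}")
print(f"NSIR: peak I {nsir_peak:.3f}, R at peak {nsir_herd:.3f}, final R {nsir_final:.3f}")
```

For the single-bin SIR run the recovered fraction at the peak can be compared with the analytic value \(\ln (R_0)/R_0 \simeq 0.35\) for \(R_0 = 2\), which gives a quick check on the integrator before the NSIR numbers are interpreted.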
Next we demonstrate the behavior of the solutions in terms of individual parameters. We focus on the parameters describing the social network properties, the novel feature of our model. First, we look at the effect of inserting the infection at a node of a particular degree. As Fig. 2 demonstrates, the effect of introducing the infection at the highest or lowest degree node is relatively minor: infecting a highest degree individual (or hub) slightly speeds up the timeline, while infecting a low degree node slightly slows it down. Aside from a slight displacement of the peak, the effect is insignificant. Most importantly, the herd immunity level is not influenced by the transient behavior due to the insertion degree; therefore, the conclusions that follow should be robust against it.
Next we examine the cutoff of the degree distribution, corresponding to the highest degree hubs or superspreaders present in the population. While in the fiducial model we assumed at most \(k_{\mathrm {max}} = 30\) connections, Fig. 3 shows 11 (top) and 100 (bottom) connections. The higher the cutoff, the faster the time development and the higher the peak infected rate. Conversely, cutting off the superspreaders flattens the curve, as expected. The number of equations is three times the degree of the highest degree node: for \(k_{\mathrm {max}} = 11, 30, 100\), we solved 33, 90, and 300 coupled differential equations, respectively.
The slope of the degree distribution controls the relative contribution of the rare high degree nodes (hubs, superspreaders) compared to the more usual low degree nodes. Our fiducial parameter \(\alpha = 2.5\) is in the middle of the range, and Fig. 4 shows values close to the extremes of the interesting range from network science: \(\alpha = 2.1\) (top) corresponds to a distribution with a heavy tail, or many superspreaders, while \(\alpha = 2.9\) (bottom) approaches a random network (\(\alpha \ge 3\)). It is clear that increasing the proportion of superspreaders increases the severity of the disease timeline, as before.
As we have shown earlier, whether the infection is introduced at a high degree or a low degree node affects only the transients and hardly influences the top-line final results. To elucidate the difference it makes in the early stage of the epidemic, Fig. 5 plots \(R_t\) for several cases: the fiducial SIR model, our fiducial NSIR model, and two extremes, where the infection was introduced at a hub (superspreader) node or at a bottom (low social connection) node. Eventually, all three NSIR models converge to the same SIR model asymptote. If the infection starts with a hub, the early values of \(R_t\), corresponding to an effective \(R_0\), are extremely high and drop quickly when the superspreaders eliminate themselves from the susceptible pool. Conversely, if the infection starts at a low degree node, the initial \(R_0\) is much lower and even rises initially as the infection spreads towards superspreaders. Therefore, if this is a realistic model of disease transmission, measurements of \(R_0\) are fairly sensitive to serendipitous transient effects due to where exactly the infection was introduced.
Finally, we show the time development of each bin in the contact distribution of the population in our fiducial NSIR model. In Fig. 6, the solid red lines from top to bottom correspond to the degree distribution at a particular point in time from low to high degree. For reference, the total SIR variables are displayed with dots on this semilog plot: the red dots correspond to the sum of the red solid lines.
After transient effects, nodes with different degrees develop similarly, with high degree nodes (hubs or superspreaders) perhaps reaching their maxima slightly earlier. The one outlying solid line corresponds to the transient behavior of the node carrying the initial infection. After the first few weeks (while the infected fraction is still fairly low), the transient behavior joins the trend carved out by the rest of the curves. It took about two timescales of \(1/\gamma \) to achieve such statistical “thermalization” and erase the memory of where the infection started.
Summary and discussions
We have introduced a generalization of the SIR model, where each variable is a distribution. This generalization is a specific case of a poly-SIR model that could describe a variety of situations; e.g., Moghadas et al. (2020) and Chikina and Pegden (2020) used it to model the interactions of different age groups with a constant contact matrix. We focus on social connection distributions motivated by network science. The resulting equations are more nonlinear than a standard poly-SIR model with a fixed contact matrix.
There are a number of phenomenological parameters that influence the dynamics of the disease transmission. In our formulation, both parameters \(\beta \) and \(\gamma \) describe average properties of the viral infection, unlike in the traditional SIR model, where \(\beta \) is influenced by the properties of the disease as well as by social interactions. We model social interactions through the social contact distribution motivated by network science. The universality of social networks (Barabási and Albert 1999) motivates the assumption of a power-law degree distribution. We explore how the slope, the cutoff, and the contact degree of the initial infection drive the dynamics.
Our main result, consistent with expectations and results from other methods, is that if superspreaders, i.e., highly connected hub nodes, are important for disease transmission, a lower overall level of infection suffices for herd immunity than otherwise. Superspreaders tend to infect each other quickly and remove themselves from the susceptible pool earlier. Thus, the initial effective \(R_0\) drops more quickly than in the analogous SIR model.
The steeper the degree distribution and the lower its cutoff, the less important superspreaders become. Steepening the distribution, i.e., moving people to fewer connections than normal, corresponds to social distancing or lockdown. Moving the cutoff to lower degrees, i.e., removing superspreaders, appears to be an effective strategy to flatten the curve as well. If parameters are fit from realistic data, our results could inform non-pharmaceutical mitigation strategies during the COVID-19 pandemic.
We found that inserting the infection at a highly or poorly connected node results in widely varying effective \(R_0\) values if interpreted in terms of the traditional SIR model. We speculate that zoonotic viral transmission to humans is likely to occur at an average node (because there are more such nodes), while transmission by or after travel is more likely to occur through hub nodes. This is qualitatively consistent with the initial lower estimates of \(R_0 \simeq 2\text{--}3\) for COVID-19 (Zhao et al. 2020), which were revised up to \(R_0 \simeq 4\text{--}6\) (Sanche et al. 2020; Flaxman et al. 2020) after international spread.
Our work does not take into account the discrete transients of disease transmission: a particular person (or the corresponding node) is either infected or not; there are no fractional infections. In contrast, the solutions to SIR-like differential equations have a scaling property: a new solution is obtained by multiplying a solution by an arbitrary real number. This is clearly not a good approximation for smaller communities, and stochastic effects are important in the early phases of an epidemic even in larger communities. For instance, the first infection might die out or trigger a transient superspreader event, ultimately leading to vastly different outcomes.
It would be relatively straightforward to modify our model to include transients in a Monte Carlo fashion. Instead of multiplying by the probability \(p_k\), we could draw the number of infections from a multinomial distribution of \(S_k\) total numbers with \(p_k\) infection probability, assuming each infection is independent. Our differential equations then become discrete difference equations: \(\dot{S}_k \rightarrow \Delta S_k\), and so forth. Each solution corresponds to a Monte Carlo realization, and rare events, corresponding to mass gatherings, for instance, could now happen in a discrete fashion. If it is found that disease transmission probability varies widely among individuals, \(\beta \) itself could be a realization from a distribution. If that distribution has a long tail, this process would induce viral superspreading.
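A rough sketch of such a Monte Carlo variant is given below. It is an illustration under stated assumptions, not the author's implementation: the time step is one day, each bin's new infections are drawn as a binomial (the two-outcome case of the multinomial draw), recoveries are drawn with daily probability \(\gamma \), and the small population of 2000 with a single seed at \(k = 10\) is made up.

```python
import random


def binomial(rng, n, p):
    """Number of successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def monte_carlo_day(S, I, R, N_k, beta, gamma, rng):
    """Advance the integer counts by one day, in place, using random draws."""
    degrees = range(1, len(N_k) + 1)
    # The link infection probability uses the counts at the start of the day.
    p_link = (sum(k * i for k, i in zip(degrees, I))
              / sum(k * n for k, n in zip(degrees, N_k)))
    for idx, k in enumerate(degrees):
        p_k = 1.0 - (1.0 - beta * p_link) ** k
        new_inf = binomial(rng, S[idx], p_k)
        new_rec = binomial(rng, I[idx], gamma)
        S[idx] -= new_inf
        I[idx] += new_inf - new_rec
        R[idx] += new_rec


def run_realization(seed, n_total=2000, alpha=2.5, k_max=30, k_seed=10,
                    beta=0.07, gamma=0.1, days=150):
    """One realization starting from a single infected person of degree k_seed."""
    rng = random.Random(seed)
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    norm = sum(weights)
    N_k = [round(n_total * w / norm) for w in weights]  # integer bin sizes
    S, I, R = list(N_k), [0] * k_max, [0] * k_max
    S[k_seed - 1] -= 1
    I[k_seed - 1] += 1
    for _ in range(days):
        monte_carlo_day(S, I, R, N_k, beta, gamma, rng)
    return N_k, S, I, R


final_sizes = []
for seed in range(10):
    N_k, S, I, R = run_realization(seed)
    final_sizes.append(sum(R) + sum(I))  # everyone infected so far
print(sorted(final_sizes))
```

Running several seeds gives a spread of outbreak sizes from the same parameters, which is the kind of small-number variability that the differential equations average out.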
Several Monte Carlo realizations are needed to evaluate the statistics and small-number transients associated with the transmission. Nevertheless, such Monte Carlo simulations would still be less costly than a full network science Monte Carlo model. Once the total numbers are high enough, the multinomial distribution will narrow enough to produce results very similar to our equations 3. The above generalizations are left for future work.
The plausible simplifications that enabled the construction of the NSIR model need to be confronted with data. The more complex behavior of the effective \(R_t\), as shown in Fig. 5, would be the first qualitative sign of behavior beyond the simple SIR model. Detailed analyses of contact tracing information could affirm or reject the assumption that potentially infectious contacts have a structure similar to other social networks. Finally, investigating the time development of infections correlated with publicly available social networking data could reveal a structure similar to Fig. 6, a smoking gun for NSIR-like development.
If parameters of the NSIR model constrained by data confirm that superspreading is significant, these results can inform public health measures: non-pharmaceutical interventions could focus on potential superspreading situations and people, and the first wave of vaccines could be directed toward individuals in jobs or situations (e.g., living quarters) with superspreading potential. Such measures would be effective compared to policies blind to social networks. To end on a positive note, the NSIR model suggests that if superspreading is important in the spread of COVID-19, herd immunity levels are somewhat lower than the more pessimistic estimates from the high \(R_0\) estimated from the SIR model.
Availability of data and materials
The code and all data generated during this study are available from the author on reasonable request.
Abbreviations
SIR model: Susceptible-Infected-Recovered model
COVID-19: Coronavirus Disease 2019
NSIR model: Network science motivated modification of the SIR model
References
Adam D, Wu P, Wong J, Lau E, Tsang T, Cauchemez S, Leung G, Cowling B (2020) Clustering and superspreading potential of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections in Hong Kong. https://doi.org/10.21203/rs.3.rs29548/v1
Allen LJS (2017) A primer on stochastic epidemic models: formulation, numerical simulation, and analysis. Infect Dis Model 2(2):128–142. https://doi.org/10.1016/j.idm.2017.03.001
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512. https://doi.org/10.1126/science.286.5439.509
Chikina M, Pegden W (2020) Modeling strict age-targeted mitigation strategies for COVID-19. arXiv:2004.04144
Flaxman S, Mishra S, Gandy A, Unwin H, Coupland H, Mellan T, Zhu H, Berah T, Eaton J, Perez Guzman P, Schmit N, Cilloni L, Ainslie K, Baguelin M, Blake I, Boonyasiri A, Boyd O, Cattarino L, Ciavarella C, Cooper L, Cucunuba Perez Z, Cuomo-Dannenburg G, Dighe A, Djaafara A, Dorigatti I, Van Elsland S, Fitzjohn R, Fu H, Gaythorpe K, Geidelberg L, Grassly N, Green W, Hallett T, Hamlet A, Hinsley W, Jeffrey B, Jorgensen D, Knock E, Laydon D, Nedjati Gilani G, Nouvellet P, Parag K, Siveroni I, Thompson H, Verity R, Volz E, Walters C, Wang H, Wang Y, Watson O, Winskill P, Xi X, Whittaker C, Walker P, Ghani A, Donnelly C, Riley S, Okell L, Vollmer M, Ferguson N, Bhatt S (2020) Report 13: estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. https://doi.org/10.25561/77731
Furuse Y, Sando E, Tsuchiya N, Miyahara R, Yasuda I, Ko YK, Saito M, Morimoto K, Imamura T, Shobugawa Y, Nagata S, Jindai K, Imamura T, Sunagawa T, Suzuki M, Nishiura H, Oshitani H (2020) Clusters of coronavirus disease in communities, Japan, January–April 2020. Emerg Infect Dis. https://doi.org/10.3201/eid2609.202272
Hamner L, Dubbel P, Capron I, Ross A, Jordan A, Lee J, Lynn J, Ball A, Narwal S, Russell S, Patrick D, Leibrand H (2020) High SARS-CoV-2 attack rate following exposure at a choir practice – Skagit county, Washington, March 2020. MMWR Morb Mortal Wkly Rep 69(19):606–610. https://doi.org/10.15585/mmwr.mm6919e6
Hébert-Dufresne L, Althouse BM, Scarpino SV, Allard A (2020) Beyond R0: Heterogeneity in secondary infections and probabilistic epidemic forecasting. https://doi.org/10.1101/2020.02.10.20021725
Jang S, Han SH, Rhee JY (2020) Cluster of coronavirus disease associated with fitness dance classes, South Korea. Emerg Infect Dis. https://doi.org/10.3201/eid2608.200633
Kermack WO, McKendrick AG (1927) A contribution to the mathematical theory of epidemics. Proc R Soc Lond Ser A Contain Pap Math Phys Character 115(772):700–721. https://doi.org/10.1098/rspa.1927.0118
Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM (2005) Superspreading and the effect of individual variation on disease emergence. Nature 438(7066):355–359. https://doi.org/10.1038/nature04153
Mkhatshwa T, Mummert A (2010) Modeling superspreading events for infectious diseases: case study SARS. arXiv:1007.0908
Moghadas SM, Shoukat A, Fitzpatrick MC, Wells CR, Sah P, Pandey A, Sachs JD, Wang Z, Meyers LA, Singer BH, Galvani AP (2020) Projecting hospital utilization during the COVID-19 outbreaks in the United States. Proc Natl Acad Sci 117(16):9122–9126. https://doi.org/10.1073/pnas.2004064117
Sanche S, Lin YT, Xu C, Romero-Severson E, Hengartner N, Ke R (2020) High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerg Infect Dis 26(7):1470–1477. https://doi.org/10.3201/eid2607.200282
Schuchat A (2020) Public health response to the initiation and spread of pandemic COVID-19 in the United States, February 24–April 21, 2020. MMWR Morb Mortal Wkly Rep 69(18):551–556. https://doi.org/10.15585/mmwr.mm6918e2
Zhao S, Lin Q, Ran J, Musa SS, Yang G, Wang W, Lou Y, Gao D, Yang L, He D, Wang MH (2020) Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early phase of the outbreak. Int J Infect Dis 92:214–217. https://doi.org/10.1016/j.ijid.2020.01.050
Ziff AL, Ziff RM (2020) Fractal kinetics of COVID-19 pandemic. https://doi.org/10.1101/2020.02.16.20023820
Acknowledgements
The author thanks Lee Altenberg, Tom Blamey, Marguerite Butler, István Csabai, and Deveraux Talagi for useful comments.
Funding
The author used no external funding for this project.
Author information
Contributions
IS carried out all work for this manuscript. The author read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable to this theoretical project.
Consent for publication
Not applicable to this theoretical project.
Competing interests
The author declares that he has no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Szapudi, I. Heterogeneity in SIR epidemics modeling: superspreaders and herd immunity. Appl Netw Sci 5, 93 (2020). https://doi.org/10.1007/s41109020003365
DOI: https://doi.org/10.1007/s41109020003365
Keywords
COVID-19
Epidemiology
SIR equations
Disease dynamics
Network science
Superspreaders
Herd immunity