
Network-based prediction of COVID-19 epidemic spreading in Italy

Abstract

Initially emerged in the Chinese city Wuhan and subsequently spread almost worldwide causing a pandemic, the SARS-CoV-2 virus follows reasonably well the Susceptible–Infectious–Recovered (SIR) epidemic model on contact networks in the Chinese case. In this paper, we investigate the prediction accuracy of the SIR model on networks also for Italy. Specifically, the Italian regions are a metapopulation represented by network nodes and the network links are the interactions between those regions. Then, we modify the network-based SIR model in order to take into account the different lockdown measures adopted by the Italian Government in the various phases of the spreading of the COVID-19. Our results indicate that the network-based model better predicts the daily cumulative infected individuals when time-varying lockdown protocols are incorporated in the classical SIR model.

Introduction

The outbreak of the greatest epidemic of the twenty first century caused by the SARS-CoV-2 virus has stimulated researchers to understand and control the spread of the disease inside a population with the help of mathematical models developed in recent years (Hethcote 2000; Pastor-Satorras et al. 2015). A single outbreak of a disease is typically described by a SIR compartmental model, where each individual at a certain time t can only be in one of the three different disease stages: Susceptible (S), i.e. healthy, but vulnerable for the infection, Infected (I) and Recovered (R), i.e. the individual either recovers from the disease or, unfortunately, dies. A diffusion-like SIR epidemic spread on a contact network models the infection between individuals when they come into contact, close enough in space and long enough in time (Chu et al. 2020). By adopting the SIR model, Prasse et al. (2020) predict the spreading of the COVID-19 epidemic on a contact network consisting of 16 cities in the Chinese province Hubei via their Network Inference-based Prediction Algorithm (NIPA). Since the interactions between cities are unknown, Prasse et al. exploit their network reconstruction approach, described in Prasse and Van Mieghem (2020b), to estimate the contact network from the observations of the viral states.

In this paper, we use NIPA (Prasse and Van Mieghem 2020b; Prasse et al. 2020) to investigate the spreading of the COVID-19 epidemic in Italy by considering the 21 Italian regions, shown in Fig. 1, as nodes of the network. We extend NIPA to NIPA-LD (NIPA with LockDown), which takes into account the different lockdown measures adopted in the various phases of the COVID-19 spreading in Italy by adapting the ideas of Song et al. (2020). Song et al. (2020) pointed out that standard epidemiological models do not consider the containment measures, such as in-home isolation and restrictions on travel and social activities, that governments enforce to dampen the transmission rate over time. Because of these containment measures, the infection rates vary over time, and a prediction model should incorporate this variation to reflect the actual course of the epidemic and to provide more meaningful analyses.

We apply NIPA and its extension NIPA-LD to the period from March 1 to June 9, 2020. Our results indicate that NIPA-LD predicts the daily cumulative number of infected individuals more accurately, because it accounts for the time-varying lockdown restrictions.

Related work

In recent months, the number of papers studying the COVID-19 pandemic and proposing models to predict the evolution of the disease has skyrocketed. Estrada (2020) discusses how this pandemic is being modeled and proposes future research directions by reviewing the three main areas of modeling research against COVID-19: epidemiology, drug repurposing, and vaccine design. After the strict policies in China to reduce close contacts between people, which proved to be the best strategy to effectively block the virus diffusion, Italy and many other European countries imposed several containment measures, collectively called lockdown. Some researchers then investigated how mobility changed during the lockdown phases (Oliver et al. 2020; Klein et al. 2020; Galeazzi et al. 2020; Schlosser et al. 2020), while others have shown how lockdown can effectively slow down disease transmission. Flaxman et al. (2020) study the effect on COVID-19 transmission of the major non-pharmaceutical interventions (NPIs) across 11 European countries for the period from the start of the COVID-19 epidemics in February 2020 until May 4th 2020. In a more general work, Haug et al. (2020) quantify the effectiveness of the world-wide NPIs to mitigate the spreading of COVID-19 and SARS-CoV-2, showing that this effectiveness is strongly related to the economic development as well as the dimension of governance of a country. At a country level, Hadjidemetriou et al. (2020) use driving, walking and transit real-time data to investigate the impact of UK government control measures on human mobility reduction and consequent COVID-19 deaths. Pei et al. (2020) assess the effect of NPIs on COVID-19 spread in the United States, finding significant reductions of the basic reproductive numbers in major metropolitan areas when applying social distancing and other control measures. Di et al. (2020) study the case of Île-de-France using a stochastic age-structured transmission model, which combines data on age profile and social contacts, to evaluate the impact of lockdown and propose possible exit strategies. Finally, Lavezzo et al. (2020) study the Italian town of Vo’ Euganeo, evaluating the efficacy of the implemented control measures and providing insights into the transmission dynamics of asymptomatic individuals.

Concerning the modeling of the COVID-19 spreading with the imposed restrictions, Maier and Brockmann (2020), for instance, proposed a model that takes into account both quarantine of symptomatic infected individuals and population isolation due to containment policies, and showed that the model agrees with the observed growth of the epidemic in China. Arenas et al. (2020) defined a model that stratifies the Spanish population by age and predicts the incidence of the epidemics through time by considering control measures. They show that the results can be refined by taking into account mobility restrictions imposed at the level of municipalities. Chinazzi et al. (2020) used a global metapopulation disease transmission model to study the impact of travel limitations on the national and international spread of the epidemic in China. The NIPA-LD approach presented in this paper is different from the described proposals since it extends the NIPA method, which assumes no knowledge on the population flows and estimates the interactions between groups of individuals, by considering time-varying lockdown policies in the prediction phase.

Modeling the spread of COVID-19 in Italy has followed several approaches. Ferrari et al. (2020), for instance, use an adjusted time-dependent SIRD (Susceptible-Infected-Recovered-Died) model to predict the provincial cases. Caccavo (2020) proposes a modified SIRD model to describe both the Chinese and the Italian outbreaks. Giuliani et al. (2020) define a model with \(c = 8 \) compartments or stages of infection: susceptible (S), infected (I), diagnosed (D), ailing (A), recognized (R), threatened (T), healed (H) and extinct (E), collectively termed SIDARTHE. However, only one compartment is measured in the COVID-19 crisis, namely the number of active cases. Thus, for an epidemic model with many compartments, it is not possible to evaluate the accuracy in predicting compartments other than the number of active cases. In this work, we confine ourselves to the \(c = 3\) compartmental SIR model for the predictions by NIPA. Kozyreff (2020) provides an SIR modeling comparison between Belgium, France, Italy, Switzerland and New York City, suggesting that finer models are unnecessary given the available macroscopic data.

Background

In this section, we briefly review the epidemic SIR model on contact networks (Youssef and Scoglio 2011; Prasse and Van Mieghem 2020b) and the prediction of the COVID-19 infection, caused by the SARS-CoV-2 virus, based on the SIR model (Prasse et al. 2020). Then, we incorporate time-varying protocols introduced by the government to slow down the virus propagation.

We consider a network with N nodes, where each node i corresponds to the set of individuals living in the same place, like a city or a region. An individual at any discrete time \(k=1, 2, \ldots \) is in either one of the \(c=3\) compartments Susceptible (S), Infectious (I), Recovered (R). The SIR model assumes that infectious individuals become recovered and cannot infect any longer because of hospitalization, death, or quarantine measures. The viral state of any node i at time k is denoted by the \(3 \times 1\) vector \(v_i[k] =(S_i[k],I_i[k],R_i[k])^T\), where \(S_i[k],~I_i[k],~R_i[k]\) are the fractions of susceptible, infectious, and recovered individuals, respectively, satisfying the conservation law \(S_i[k]+I_i[k]+R_i[k]=1\). The discrete-time SIR model (Youssef and Scoglio 2011; Prasse and Van Mieghem 2020b) defines the evolution of the viral state \(v_i[k]\) of each node i as:

$$\begin{aligned} I_i[k+1]= & {} (1 - \delta _i)I_i[k] + (1- I_i[k] - R_i[k] )\sum \limits _{j=1}^N \beta _{ij} I_j[k] \end{aligned}$$
(1)
$$\begin{aligned} R_i[k+1]= & {} R_i[k] + \delta _i I_i[k] \end{aligned}$$
(2)

where \(\beta _{ij}\) denotes the infection probability when individuals move from place (also called region) j to place i. The self-infection probability \(\beta _{ii}\ne 0\), because individuals inside the same place interact. The \(N \times N\) infection probability matrix B specifies the contact transmission chance between each pair of regions. The curing probability \(\delta _i\) of place i quantifies the capability of individuals in place i to recover from the virus. We assume that both \(\beta _{ij}\) and \(\delta _i\) in the SIR model (1), (2) do not change over time.
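To make the update rule concrete, the following minimal Python sketch applies Eqs. (1) and (2) to all nodes for one time step. The function name `sir_step` and the two-region toy values are assumptions chosen for illustration only; they are not taken from the paper or from the NIPA implementation.

```python
def sir_step(I, R, B, delta):
    """Advance the discrete-time SIR model of Eqs. (1)-(2) by one step.

    I, R  : lists of length N with infectious / recovered fractions at time k
    B     : N x N nested list with B[i][j] = beta_ij
    delta : list of length N with the curing probabilities delta_i
    Returns the lists I[k+1] and R[k+1].
    """
    N = len(I)
    I_next, R_next = [], []
    for i in range(N):
        # Infection pressure on node i: sum_j beta_ij * I_j[k]
        pressure = sum(B[i][j] * I[j] for j in range(N))
        # Susceptible fraction from the conservation law S + I + R = 1
        susceptible = 1.0 - I[i] - R[i]
        I_next.append((1.0 - delta[i]) * I[i] + susceptible * pressure)
        R_next.append(R[i] + delta[i] * I[i])
    return I_next, R_next


# Toy example with two regions; all numbers are made up for illustration.
B = [[0.30, 0.05],
     [0.02, 0.25]]
delta = [0.10, 0.10]
I, R = [0.01, 0.0], [0.0, 0.0]
for _ in range(30):
    I, R = sir_step(I, R, B, delta)
```

Because the susceptible fraction is derived from the conservation law, only I and R need to be stored for each node.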

Prasse et al. (2020) proposed the Network Inference-based Prediction Algorithm (NIPA), which estimates the spreading parameters \(\delta _i\) and \(\beta _{ij}\) for each region i from the time series \(v_i[1], v_i[2], \ldots , v_i[n]\). These estimates, inserted in (1) and (2), predict the evolution of the virus at future times \(k>n\).

The SIR model has three compartments. In principle, with c compartments, we must have \(c-1\) independent measurements. The input to NIPA is only one compartment, the infectious compartment I, which is less than the \(c-1=2\) compartments necessary to reconstruct the network with the SIR model. NIPA creates observations of the R compartment by iterating over different candidate values of the curing rates \(\delta _i\) and assuming the initial condition \(R(0)=0\). Thus, we observe only one compartment, the infectious compartment I, and the recovered compartment R is obtained by Eq. (2) after estimating the curing probability \(\delta _i\) in the training phase.

To obtain the curing probability \(\delta _i\), 50 equidistant values between \(\delta _{min}\) and \(\delta _{max}\) have been considered, and then the value giving the best fit of model (1) has been used to estimate the matrix B based on the least absolute shrinkage and selection operator (LASSO). For a general class of dynamics on networks (including the SIR model), completely different network topologies can result in the same dynamics. Hence, it is not possible to deduce the network accurately from observations, regardless of the reconstruction method: two very different networks perfectly match the observations, and there is no reason to infer one network instead of the other. Thus, though NIPA accurately predicts the dynamics, the estimated network B can be very different from the true network (Prasse and Van Mieghem 2020c).
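As an illustration of how the R compartment can be generated from the observed I compartment, the sketch below applies Eq. (2) for one candidate curing probability and builds a grid of equidistant candidates. The function names, the toy observations and the bounds 0.01 and 1.0 are assumptions made for illustration. The LASSO estimation of B and the selection of the best-fitting candidate are deliberately not shown.

```python
def reconstruct_recovered(I_series, delta):
    """Rebuild the recovered fractions of one region from its observed
    infectious fractions via Eq. (2), starting from R = 0.

    The returned list has the same length as I_series, and entry k+1 is
    R[k] + delta * I[k].
    """
    R_series = [0.0]
    for I_k in I_series[:-1]:
        R_series.append(R_series[-1] + delta * I_k)
    return R_series


def candidate_deltas(delta_min, delta_max, count=50):
    """Return `count` equidistant candidate values, endpoints included."""
    step = (delta_max - delta_min) / (count - 1)
    return [delta_min + m * step for m in range(count)]


# Hypothetical usage: the observations and the bounds are made-up values.
I_obs = [0.001, 0.002, 0.004, 0.007]
grid = candidate_deltas(0.01, 1.0)
R_by_delta = {d: reconstruct_recovered(I_obs, d) for d in grid}
```

Each candidate yields a different synthetic R series, so the fit of model (1) can be compared across candidates.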

Let n be the number of days in which the infection has been observed. To evaluate the prediction accuracy, a fixed number \(n_{neglect}\) of the most recent days is withheld from the time series \(v_i[1], v_i[2], \ldots , v_i[n]\). The model is then trained on the days \(v_i[1], v_i[2], \ldots , v_i[n- n_{neglect}]\). Thereafter, the omitted \(n_{neglect}\) days (\(k=n- n_{neglect}+1, \ldots , n\)) are predicted. It is also possible to predict \(n_{predict}\) days (\(k=n+1, \ldots , n+n_{predict}\)) beyond the n available observations; in that case, however, we cannot evaluate the quality of the prediction.
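The hold-out scheme described above can be sketched in a few lines of Python. The helper name `split_series` is hypothetical, and the sketch only splits the observations; it does not train or predict anything.

```python
def split_series(series, n_neglect):
    """Split a time series into a training part and a held-out part.

    The last `n_neglect` entries are withheld for evaluating the prediction;
    the model would be trained on the remaining first n - n_neglect entries.
    """
    if not 0 < n_neglect < len(series):
        raise ValueError("n_neglect must be between 1 and len(series) - 1")
    return series[:-n_neglect], series[-n_neglect:]


# Hypothetical usage with ten daily observations (made-up values).
observations = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]
train, held_out = split_series(observations, n_neglect=3)
```

Here `train` holds the first seven observations and `held_out` the last three, which play the role of the days to be predicted.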

Prasse et al. (2020) showed that the approach accurately predicts the cumulative infections for \(n_{neglect} \le 5\). However, if the number of neglected days increases, then the prediction capability of NIPA decreases. NIPA assumes constant values for \(\beta _{ij}\), which, however, do not reflect the reality of the COVID-19 pandemic, because the containment measures imposed by the governments diminish \(\beta _{ij}\) and thus the spread of the infection. Hence, infection probabilities \(\beta _{ij}[k]\) which vary over time k should be considered in the epidemic model.

Extended SIR model with time-varying infection rate

Song et al. (2020) proposed the concept of transmission modifiers, which decrease the probability that a susceptible individual can come into contact with an infected one because of the quarantine measures.

At any discrete time k, let \(q_S[k]\) be the chance of an individual to be in home isolation, and \(q_I[k]\) the chance of an infected person to be in hospital quarantine. The transmission modifier \(\pi [k]\) is defined as follows:

$$\begin{aligned} \pi [k]=(1-q_S[k])(1-q_I[k]) \in [0,1] \end{aligned}$$
(3)

and if no quarantine is active, then \(\pi [k]=1\). In order to have a realistic infection rate \(\beta \), Song et al. (2020) multiply \(\beta \) by \(\pi [k]\) in the classic continuous SIR model. Thus, the infection rate now reflects the effective currently enforced quarantine measures taken in a country. In the extended SIR model, the curing probability \(\delta _i\) remains the same, but the infection probability \(\beta _{ij}\) is replaced by \(\beta _{ij}\pi [k] \). The same considerations can be applied to the discrete-time SIR model by modifying Eq. (1) above:

$$\begin{aligned} I_i[k+1] = (1 - \delta _i)I_i[k] + (1- I_i[k] - R_i[k] )\sum \limits _{j=1}^N \beta _{ij} \pi [k] I_j[k] \end{aligned}$$
(4)

The transmission modifier \(\pi [k]\), however, should be specified on the basis of the quarantine protocols actually undertaken in a specific region. Regarding the Hubei province in China, Song et al. (2020) suggest a step function mirroring the isolation measures established by the government.
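A minimal Python sketch of one step of Eq. (4), together with Eq. (2), may help to see where the transmission modifier enters. The function name `sir_step_lockdown` and the toy values are assumptions for illustration; the scalar `pi_k` stands for \(\pi [k]\) and is applied equally to every \(\beta _{ij}\), as in Eq. (4).

```python
def sir_step_lockdown(I, R, B, delta, pi_k):
    """One step of the extended model: Eq. (4) for I and Eq. (2) for R.

    pi_k is the transmission modifier at time k, a number in [0, 1];
    pi_k = 1 recovers the classical update of Eq. (1).
    """
    if not 0.0 <= pi_k <= 1.0:
        raise ValueError("pi_k must lie in [0, 1]")
    N = len(I)
    I_next, R_next = [], []
    for i in range(N):
        # Every infection probability beta_ij is scaled by the same pi_k
        pressure = sum(B[i][j] * pi_k * I[j] for j in range(N))
        susceptible = 1.0 - I[i] - R[i]
        I_next.append((1.0 - delta[i]) * I[i] + susceptible * pressure)
        R_next.append(R[i] + delta[i] * I[i])
    return I_next, R_next


# Toy comparison with made-up numbers: no measures versus halved transmission.
B = [[0.30, 0.05],
     [0.02, 0.25]]
delta = [0.10, 0.10]
I0, R0 = [0.01, 0.0], [0.0, 0.0]
free, _ = sir_step_lockdown(I0, R0, B, delta, pi_k=1.0)
damped, _ = sir_step_lockdown(I0, R0, B, delta, pi_k=0.5)
```

The curing term is untouched by the modifier, so only the number of new infections changes between the two calls.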

In the next section, the extended time-varying model (4) is applied to Italy, taking the 21 Italian regions as the nodes of the contact network.

Transmission modifier for Italy

In Italy, the outbreak of the COVID-19 epidemic started in February in the North of Italy. A map of Italy with the division in regions is shown in Fig. 1. On February 21st, the first case of infection appeared in the town of Codogno, in Lombardia, and two cases also in the town of Vo’ Euganeo in Veneto. These two towns were immediately declared red zones, and nobody could leave or enter them. On February 24th, the three regions of Lombardia, Veneto, and Emilia-Romagna registered 172, 33, and 18 cases of infections, respectively. After that date, the virus propagated all over Italy very fast.

During the first week, until the first days of March, no other particularly strict safety measures were enforced. On March 9th, however, Italy turned into a lockdown Phase 1 with several strong restrictions and quarantine protocols. Schools, universities, shops, and many offices were closed, travels were not allowed and exits were only allowed for work, health or necessity situations with a mandatory self-certification.

Phase 2 followed, in which countermeasures were adopted to reduce the pandemic. Finally, Phase 3 reopened almost all the activities and travels all over Italy. In order to define the values of the transmission modifier for the different quarantine periods, we identified the following time intervals:

  • \(\pi _0\): \(k \le \) March 9 soft measures;

  • \(\pi _1\): March 10 \(\le k\le \) April 13 lockdown;

  • \(\pi _2\): April 14 \(\le k \le \) May 3 libraries and stationeries reopen;

  • \(\pi _3\): May 4 \( \le k \le \) May 17 manufacturing, construction activities, wholesales reopen, meetings with relatives allowed;

  • \(\pi _4\): May 18 \(\le k \le \) May 24 hair dressers, beauty center, barber shops, bar, restaurants, retailers reopen, outdoor sport, baby parks allowed;

  • \(\pi _5\): May 25 \(\le k \le \) June 2 gyms, swimming pools, sport structures reopen;

  • \(\pi _6\): \(k \ge \) June 3 inter-regional mobility allowed.
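The intervals listed above define a step function in time. A minimal Python sketch of such a lookup follows; the interval start dates are taken from the list (year 2020), whereas the function names and the example modifier values are assumptions for illustration only.

```python
from datetime import date

# First day of the intervals pi_1 ... pi_6 (year 2020), from the list above.
INTERVAL_STARTS = [
    date(2020, 3, 10),  # pi_1: lockdown
    date(2020, 4, 14),  # pi_2
    date(2020, 5, 4),   # pi_3
    date(2020, 5, 18),  # pi_4
    date(2020, 5, 25),  # pi_5
    date(2020, 6, 3),   # pi_6: inter-regional mobility allowed
]


def interval_index(day):
    """Return m such that pi_m applies on `day` (m = 0 up to March 9)."""
    return sum(1 for start in INTERVAL_STARTS if day >= start)


def transmission_modifier(day, pi_values):
    """Look up the modifier for `day` from the seven values pi_0 ... pi_6."""
    if len(pi_values) != len(INTERVAL_STARTS) + 1:
        raise ValueError("expected one value per interval, pi_0 ... pi_6")
    return pi_values[interval_index(day)]


# Hypothetical modifier values, made up for this example.
example_pi = [1.0, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
pi_on_may_20 = transmission_modifier(date(2020, 5, 20), example_pi)
```

Keeping the start dates separate from the modifier values makes it easy to try several value vectors on the same schedule.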

Choosing transmission modifier values that faithfully reflect the quarantine protocols is not an easy task and deserves a deeper investigation. In the next sections, we study how the NIPA method improves when different lockdown levels, related to the quarantine strategies adopted by the authorities, are taken into account.

Data preprocessing

Our measurement data have been collected by the Italian Civil Protection Department and are published daily on a repository. The available data are national, regional and provincial. We selected the regional ones, which refer to the 21 regions depicted in Fig. 1: Abruzzo, Basilicata, P.A. Bolzano, Calabria, Campania, Emilia-Romagna, Friuli Venezia Giulia, Lazio, Liguria, Lombardia, Marche, Molise, Piemonte, Puglia, Sardegna, Sicilia, Toscana, P.A. Trento, Umbria, Valle d’Aosta, Veneto. Thus, for Italy, the entry \(\beta _{ij}\) of the \(21\times 21\) matrix B estimates the infection probability between the regions j and i. In the map, the regions are shown in four different colors representing the level of COVID-19 infected individuals. The red regions have been the most affected by COVID-19, followed by the yellow ones, the orange ones and the green regions with a lower number of cases.

For each observation day, we focused on the new positives to COVID-19. We considered observations from March 1, 2020 to June 9, 2020.

Transmission modifier analysis

To compare the NIPA method with the NIPA-LD implementing the lockdown measures, we considered the model generated by NIPA which, in the training phase, neglects \(n_{neglect}\) days, and then applied this model for the prediction phase by using different values of \(\pi \) and an increasing value of \(n_{neglect}\). After that, we computed the average percentage error reduction of NIPA-LD with respect to NIPA.

Let \(I_{CF,i}[k]\) be the observed cumulative fraction of infections of region i at time k:

$$\begin{aligned} I_{CF,i}[k]=\sum _{\tau =1}^{k} I_{i}[\tau ] \end{aligned}$$
(5)

To quantify the prediction accuracy we considered the Mean Absolute Percentage Error (MAPE) defined as:

$$\begin{aligned} e[k]=\frac{1}{N}\sum _{i=1}^{N}{\frac{\mid I_{CFpred,i}[k]-I_{CF,i}[k]\mid }{I_{CF,i}[k]}} \end{aligned}$$
(6)

where \(I_{CFpred,i}[k]\) is the predicted cumulative fraction of infected individuals in region i at time k.

Let e[k] and \(e_{LD}[k]\) denote the MAPE errors when \(I_{CFpred,i}[k]\) is computed by NIPA and NIPA-LD, respectively. The percentage error improvement of NIPA-LD over NIPA is then computed as

$$\begin{aligned} pe[k]=\frac{e[k] - e_{LD}[k]}{e[k]}\times 100 \end{aligned}$$
(7)
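The three quantities of Eqs. (5), (6) and (7) can be sketched in Python as follows. The function names and the numbers in the usage example are assumptions for illustration. As in Eq. (6), the error e[k] is returned as a fraction, and only Eq. (7) multiplies by 100.

```python
from itertools import accumulate


def cumulative(series):
    """Eq. (5): running sum of the daily values of one region."""
    return list(accumulate(series))


def mape(predicted, observed):
    """Eq. (6): error e[k] for one day k.

    `predicted` and `observed` hold the cumulative values of all N regions
    on that day; the observed values must be non-zero.
    """
    n_regions = len(observed)
    return sum(abs(p - o) / o for p, o in zip(predicted, observed)) / n_regions


def improvement(e_nipa, e_ld):
    """Eq. (7): percentage error improvement of NIPA-LD over NIPA."""
    return (e_nipa - e_ld) / e_nipa * 100.0


# Made-up example: two regions observed on one day.
observed = [100.0, 200.0]
e = mape([120.0, 230.0], observed)      # prediction without lockdown
e_ld = mape([105.0, 210.0], observed)   # prediction with lockdown
pe = improvement(e, e_ld)
```

A positive value of `pe` means that the lockdown-aware prediction has the smaller error, and a negative value means the opposite.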

In order to find a good transmission modifier which reflects the real situation best, we tested different \(\pi \) values by supposing a different response from people in respecting the quarantine measures imposed in the 3 months with varying levels of restrictions. Thus, we fixed increasing values of \(\pi \) which intuitively correspond to a lower compliance to the containment protocols by the individuals. In view of the Italian lockdown measures previously described, we considered the following transmission modifier values:

$$\begin{aligned} \pi _{LD1}= & {} [1~ 0.1 ~ 0.3~ 0.5 ~ 0.7 ~ 0.8 ~ 1] \\ \pi _{LD2}= & {} [1~ 0.2 ~ 0.4 ~ 0.6 ~ 0.8 ~ 0.9 ~ 1]\\ \pi _{LD3}= & {} [1~ 0.3 ~ 0.5 ~ 0.7 ~ 0.85 ~ 0.95~ 1]\\ \pi _{LD4}= & {} [1 ~ 0.4 ~ 0.55 ~ 0.75 ~ 0.85 ~ 0.95 ~ 1]\\ \pi _{LD5}= & {} [1~ 0.5 ~ 0.7 ~ 0.8 ~ 0.9~ 0.95~ 1]\\ \pi _{LD6}= & {} [1~ 0.6~ 0.75~ 0.85 ~ 0.95 ~ 0.99 ~ 1]\\ \pi _{LD7}= & {} [1 ~ 0.7 ~ 0.8 ~ 0.90 ~ 0.96~ 0.99 ~ 1] \end{aligned}$$

Table 1 reports the improvement of the percentage error of NIPA-LD with respect to NIPA, for the seven transmission modifiers and different numbers of predicted/omitted days, averaged over all the Italian regions and considering all the time windows under study, while Fig. 2 shows the mean absolute prediction error as a function of the predicted/omitted days. From the table we can observe that for \(n_{neglect}\) equal to 10, 30 and 40 the percentage improvement is substantial for most of the transmission modifier vectors. This suggests that NIPA-LD can be used for both short-term and long-term predictions. More specifically, for short-term predictions (\(n_{neglect}=10\)) low transmission modifier values are more suitable: \(\pi _{LD1}\), for example, achieves an improvement of 35.369%. For long-term predictions, on the contrary, where we neglect 30 or even 40 days and aim to predict them, higher transmission modifier values like those of \(\pi _{LD7}\) perform better. When \(n_{neglect} =20\) the error reduces, on average, only for \(\pi _{LD6}\) and \(\pi _{LD7}\). However, as Fig. 2b highlights, for \(\pi _{LD5}\) the prediction error is reduced from the 10th day onwards, and for \(\pi _{LD4}\), \(\pi _{LD3}\), \(\pi _{LD2}\) from later days, but not for \(\pi _{LD1}\). Hence, we can conclude that soft lockdown protocols, i.e. high transmission modifier values, yield an improvement in the error for all the considered values of the number of neglected days. Finally, Fig. 2e depicts a cone of error evolution for \(n_{neglect}=30\) when using as transmission modifiers \(\pi _{LD5},\pi _{LD6},\pi _{LD7}\), considering \(\pi _{LD5}\) and \(\pi _{LD7}\) as lower bound \(\pi _{lb}\) and upper bound \(\pi _{ub}\) of \(\pi _{LD6}\), respectively. We could then assume that the future evolution of the epidemic can be predicted with an error that falls between those of the predictions based on \(\pi _{ub}\) and \(\pi _{lb}\).

Figure 2 shows that the differences between the different lockdown measures are meaningful.

In the next section, a detailed analysis for all the Italian regions is performed to evaluate the prediction accuracy of NIPA and NIPA-LD.

Table 1 Percentage improvement of NIPA-LD over NIPA prediction for different transmission modifier values and increasing number of neglected days

Results

In this section, we evaluate the prediction accuracy of NIPA and NIPA-LD by computing the cumulative infections for each observation day when \(n_{neglect}=30\), using \(\pi _{LD6}\) as transmission modifier for the different quarantine periods, and comparing them to the true data. In this experiment, NIPA thus disregards the last 30 days of the observed daily data of newly infected individuals when estimating the curing probability \(\delta _i\) and the infection probability \(\beta _{ij}\). Then both NIPA and NIPA-LD predict the cumulative infections from May 10 until June 9, and the predictions are compared with (a) the true data and (b) the logistic function as a baseline. The logistic function, introduced in the 19th century by Verhulst to model population growth, approximates the solutions of the SIS and SIR models (Kermack and McKendrick 1927; Prasse and Van Mieghem 2020a). The cumulative number of infected cases \(y_i[t]\) at time t for the region i is assumed to follow:

$$\begin{aligned} y_i[t]=\frac{y_{\infty ,i}}{1+e^{-K_i(t-t_{0,i})}} \end{aligned}$$
(8)

where \(y_{\infty ,i}\) is the long-term fraction of infected individuals, \(K_i\) is the logistic growth rate, \(t_{0,i}\) is the inflection point of the curve, and t is the time in days.
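For reference, Eq. (8) can be evaluated with the short Python sketch below. The function name and the parameter values are assumptions for illustration; how the three parameters are fitted to the data of each region is not shown here.

```python
import math


def logistic(t, y_inf, K, t0):
    """Eq. (8): logistic curve with plateau y_inf, growth rate K and
    inflection point t0, evaluated at time t (in days)."""
    return y_inf / (1.0 + math.exp(-K * (t - t0)))


# Made-up parameters for one region.
y_inf, K, t0 = 0.004, 0.15, 40.0
curve = [logistic(t, y_inf, K, t0) for t in range(0, 101)]
```

At the inflection point t = t0 the curve equals half of the plateau value, and for large t it approaches the plateau from below.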

Due to lack of space, we only report the plots for a subset of the northern regions, namely those in the red and yellow zones that were highly affected by the virus (Piemonte, Lombardia, Veneto, Emilia-Romagna), for one representative region of the orange zone (Lazio) and for one of the green zone (Puglia). In the center and the south of Italy, the COVID-19 spreading was characterized by a lower number of cases, and for this reason we report only two representative regions. In Fig. 3, the cumulative infections for Piemonte are shown. Here, the lockdown-modified NIPA variant clearly outperforms the classical NIPA, which overestimates the number of infected individuals, and NIPA-LD better matches the true data. Moreover, for this region, a simple logistic regression is not able to predict the epidemic well. Figure 4 depicts the trend of the predictions for the most challenging region in Italy, Lombardia, which was the most affected by COVID-19. Again, the logistic regression substantially underestimates the cumulative infections. From May 10 to May 30, both the NIPA and NIPA-LD models match the number of cumulative infections well. However, for the following days, NIPA slightly overestimates the infections while NIPA-LD underestimates them. This is probably due to a much higher mobility of the population after the loosening of the lockdown rules on May 25. The Veneto case (Fig. 5), another northern region highly affected by COVID-19, is on the contrary accurately predicted by NIPA-LD, while classical NIPA without lockdown clearly overestimates the number of infections. Here, the logistic regression works better than in the previous regions but still underestimates the cumulative infections. For the last northern region, Emilia-Romagna, the cumulative infections are better predicted by the lockdown-modified NIPA, which slightly overestimates the infections but to a lesser extent than the classical NIPA (Fig. 6). The baseline, on the contrary, underestimates the infections. In Fig. 7, the results for Lazio confirm the better accuracy of NIPA-LD. Finally, Fig. 8 shows the results obtained for the Puglia region. We observe that NIPA with the lockdown transmission modifiers again accurately predicts the cumulative infections for this region, while the classical NIPA overestimates them from May 15 until June 9 and the logistic regression underestimates the infections already from May 10.

Figures 9 and 10 report the mean relative prediction error e[k] for the first 12 and for the last 9 regions, respectively, over an observation period of 30 days from May 10 to June 9.

For most of the regions (P.A. Bolzano, Emilia-Romagna, Friuli Venezia Giulia, Marche, Piemonte, Puglia, Sardegna, Sicilia, Toscana, P.A. Trento, Umbria, Valle d’Aosta, Veneto) NIPA-LD results in a substantially lower prediction error. In particular, a few days after the re-openings of May 18 (corresponding to the third day in the plots), when the population gradually started going again to bars, shops, hair dressers and other commercial activities and using other allowed services, the prediction error is much lower with the lockdown applied to NIPA. In other regions, like Abruzzo, Basilicata, Calabria, Campania, and Lazio, NIPA performs better than NIPA-LD for many days after May 16. This behavior could be due to the fact that on May 18 the mobility among the Italian regions was allowed, so that there was a high flow of people moving towards the southern regions. Thus, in spite of the restrictions imposed by the regional governors, often much stricter than the national ones, as for instance in Campania, the lockdown measures were not effective. For Liguria and Lombardia, characterized by many more COVID-19 cases than the other regions, NIPA results in a lower error. For these two regions as well, it seems that the lockdown measures did not work. Finally, Molise is the only case with no substantial difference between the prediction error with lockdown and without lockdown. This region had the lowest number of COVID-19 cases. Moreover, there was an erratic change in the number of infections in Molise, due to a single group of people who did not follow the quarantine measures imposed by the Italian Government.

Death prediction

The network-based SIR model described in the paper does not consider the death cases. To predict the number of deaths, a new compartment should be added. However, by substituting the cumulated cases of infected with those of dead people, the model can also be used to predict the deaths. Thereby, we assume that the number of deaths is proportional to the number of infections. Thus, we executed NIPA and NIPA-LD on these cumulated death cases to predict the deaths instead of the infections. Even if the death numbers are subject to greater variations and there are significantly fewer deaths than infections, the methods give good results. Table 2 reports for each region the average MAPE error for NIPA and NIPA-LD in predicting COVID-19 deaths. The lower error values are highlighted in italics. For this experiment we set the number of neglected days to 30 and used the same transmission modifier values as in the previous experiments. The table shows that the error values are very low and that NIPA-LD outperforms NIPA in 14 out of the 21 regions. It is worth pointing out that when NIPA performs better, the differences between error values are very low, except for Lombardia. This region had more than 16 thousand deaths in the considered period, and NIPA-LD in this case underestimates the number of deaths. Figure 11 shows the cumulative deaths predicted by these two methods and those predicted by using logistic regression. Note that the baseline function is not able to obtain a good prediction: it substantially overestimates the number of deaths.

Table 2 Average MAPE prediction error of COVID-19 deaths for NIPA and NIPA-LD, when the number of neglected days is 30

Discussion

The results reported in the previous section show that NIPA-LD is able to better predict the evolution of COVID-19 in Italy when compared to the original NIPA method, which does not consider the lockdown measures, and to the baseline prediction method. The main contribution of NIPA-LD is the capability of appreciably improving the long-term prediction of NIPA by implementing the different lockdown measures adopted in the various phases of the spreading of COVID-19 in Italy into the network-based prediction model. In fact, NIPA-LD obtains lower prediction errors than NIPA when the number of training days diminishes. The introduction of the concept of transmission modifiers in NIPA thus yields epidemic transmission rates that reflect well the changes in the containment measures imposed by the authorities.

However, the adoption of the same values of transmission modifier for all the Italian regions has some drawbacks. In Tables 3 and 4, we report the daily error fraction of NIPA-LD relative to NIPA for 30 neglected days. In the last column of Table 4, the average value of this error is also shown. When NIPA-LD outperforms NIPA, the daily error fraction is lower than 1. For most of the regions, NIPA-LD shows its superiority. Veneto, for example, is characterized by very low values with an average daily error of 0.15. Exceptions are Abruzzo, Basilicata, Calabria, Campania, Lazio, Liguria, and Lombardia, where NIPA performs better than NIPA-LD. Thus, though NIPA-LD improves the prediction on average, this improvement does not hold for all the regions. Future work will investigate specialized transmission modifiers for the different regions. A further limitation of NIPA-LD (and of classic NIPA) is that the infection probabilities \(\beta _{ij}\) are assumed constant, or at most scaled by the transmission modifier \(\pi [k]\), which may change over time. Similarly, our model assumes constant curing rates \(\delta \). However, (hopefully soon available) vaccinations may be deployed in a time-varying manner.

Another observation is that, although NIPA and NIPA-LD can obtain good short-term predictions, accurate long-term predictions are generally difficult. Beyond a certain time horizon, the forecasting accuracy starts to decrease. As a case study, Figs. 12, 13, 14, 15 and 16 show what happens when predicting the last 10, 20, 30, 40 and 50 days of cumulative infections, respectively, in Valle d’Aosta. In the short term, with 10 and 20 neglected days, both NIPA and NIPA-LD match the observed data well. When predicting the last 30 days until June 9, NIPA-LD predicts the infections better than NIPA. For 40 neglected days, NIPA-LD still predicts with reasonable accuracy, while NIPA clearly overestimates the cumulative infections. For 50 days, neither NIPA method accurately predicts the number of cumulative infections, whereas the logistic regression works better. Thus, when too many days are to be predicted, an accurate prediction is not possible with the NIPA-based methods. However, even though the transmission modifier is the same for all the regions, NIPA-LD generally performs better than NIPA, also for \(n_{neglect}=30\) and \(n_{neglect}=40\), which can be considered long-term predictions.
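The evaluation protocol described above, in which the last \(n_{neglect}\) days are withheld and then predicted, can be sketched as follows. The helper names, the mean relative error formula and the naive "persistence" forecast are assumptions used only to illustrate the split; they stand in for NIPA, NIPA-LD and the logistic baseline, which are not reproduced here.

```python
import numpy as np

def split_series(cumulative, n_neglect):
    """Split a cumulative-infection series into training and neglected days."""
    cumulative = np.asarray(cumulative, dtype=float)
    if not 0 < n_neglect < len(cumulative):
        raise ValueError("n_neglect must be between 1 and len(series) - 1")
    return cumulative[:-n_neglect], cumulative[-n_neglect:]

def mean_relative_error(predicted, observed):
    """Mean of |predicted - observed| / observed over the neglected days."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(predicted - observed) / observed))

# Made-up series of 10 days; the last 3 days are neglected during training.
series = np.arange(1, 11) * 10.0           # 10, 20, ..., 100
train, test = split_series(series, n_neglect=3)

# Stand-in predictor: repeat the last training value (not a NIPA method).
forecast = np.full(len(test), train[-1])
error = mean_relative_error(forecast, test)
```

Repeating the split with increasing `n_neglect` gives the kind of horizon comparison shown for Valle d’Aosta, where the error of any fixed predictor tends to grow as more days are withheld.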

Finally, we point out that this work is based on the discrete-time SIR model, which is characterized by 3 compartments. NIPA can be used for any compartmental epidemic model (Prasse and Van Mieghem 2020b) with c compartments, provided that \(c-1\) compartments are measured. The approach in this work observes only one compartment, the infectious compartment I; the recovered compartment R is obtained from Eq. (2) after estimating the curing probability \(\delta _i\) in the training phase. The advantage is that the fewer compartments we use, the less data we need to provide an accurate forecast. When only macroscopic data, such as those exploited here, are available, a simple epidemiological model like the SIR has been shown to be sufficient to predict the trend of the epidemic with high accuracy (Kozyreff 2020). Models more complicated than the SIR, such as SEIR and SIRD, which require additional states, do not necessarily obtain better accuracy.
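The reconstruction of the unobserved compartments from the infectious one can be sketched as below. The sketch assumes that Eq. (2) has the usual discrete-time form \(R_i[k+1] = R_i[k] + \delta _i I_i[k]\) and that a curing probability has already been estimated; the function name and the numbers are hypothetical.

```python
import numpy as np

def recovered_from_infectious(I, delta, R_init=0.0):
    """Rebuild R[k] (and S[k]) for one region from observed I[k].

    Assumes R[k+1] = R[k] + delta * I[k], with delta estimated beforehand.
    """
    I = np.asarray(I, dtype=float)
    R = np.empty_like(I)
    R[0] = R_init
    R[1:] = R_init + delta * np.cumsum(I[:-1])  # accumulate cured fractions
    S = 1.0 - I - R                             # remaining compartment
    return R, S

# Made-up infectious fractions for three days and a hypothetical delta.
I_obs = [0.01, 0.02, 0.03]
R_hat, S_hat = recovered_from_infectious(I_obs, delta=0.1)
```

Only the infectious series and one scalar per region are needed here, which illustrates why a model with fewer compartments places lighter demands on the available data.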

Table 3 Daily error fraction value between NIPA-LD and NIPA for 30 neglected days, from day 1 to day 15
Table 4 Daily error fraction value between NIPA-LD and NIPA for 30 neglected days, from day 16 to day 30

Conclusion

We exploited a network-based SIR model to predict the curves of the cumulative infections of individuals affected by the SARS-CoV-2 virus in Italy. The classic SIR epidemic model has been expanded by incorporating time-varying lockdown protocols, so that the epidemic transmission rates change as the government quarantine rules change. Tested on regional COVID-19 data for Italy, the resulting network-based prediction method achieves a higher prediction accuracy than the classical method, which does not consider the lockdown measures.

Experiments, however, pointed out that equal values of the transmission modifiers for all the Italian regions may not be appropriate, because of the differences in people's mobility. On the other hand, extending the NIPA method to account for the lockdown measures highlighted the considerable potential of an optimal transmission modifier. In fact, NIPA-LD could be used in practice to test which lockdown strategies are effective and which countermeasures are more appropriate to stop the spread of the COVID-19 epidemic. Future work will investigate how a transmission modifier can best be related to a quarantine strategy, also in the training phase of NIPA, in order to improve the prediction capability of the approach.

Fig. 1 The 21 Italian regions
Fig. 2 Mean prediction error when the number of the omitted days equals a \(n_{neglect} =10\), b \(n_{neglect}=20\), c \(n_{neglect} =30\) and d \(n_{neglect} =40\), for different transmission modifier vectors. e Cone of error evolution for \(n_{neglect}=30\)
Fig. 3 Cumulative infections for Piemonte
Fig. 4 Cumulative infections for Lombardia
Fig. 5 Cumulative infections for Veneto
Fig. 6 Cumulative infections for Emilia-Romagna
Fig. 7 Cumulative infections for Lazio
Fig. 8 Cumulative infections for Puglia
Fig. 9 Mean relative prediction error for the period from May 10th to June 9th: 12 regions
Fig. 10 Mean relative prediction error for the period from May 10th to June 9th: 9 regions
Fig. 11 Cumulative deaths for Lombardia with \(n_{neglect}=30\)
Fig. 12 Cumulative infections for Valle d’Aosta with \(n_{neglect}=10\)
Fig. 13 Cumulative infections for Valle d’Aosta with \(n_{neglect}=20\)
Fig. 14 Cumulative infections for Valle d’Aosta with \(n_{neglect}=30\)
Fig. 15 Cumulative infections for Valle d’Aosta with \(n_{neglect}=40\)
Fig. 16 Cumulative infections for Valle d’Aosta with \(n_{neglect}=50\)

Availability of data and materials

All data generated or analysed during this study can be downloaded from the Italian Civil Protection Department at the address https://github.com/pcm-dpc/COVID-19

Notes

  1. Here, we recall the main reopening steps of commercial activities and services.

  2. https://github.com/pcm-dpc/COVID-19.

Abbreviations

NIPA: Network Inference-based Prediction Algorithm

NIPA-LD: Network Inference-based Prediction Approach with LockDown

References

  1. Arenas A, Cota W, Gomez-Gardenes J, Gómez S, Granell C, Matamalas JT, Soriano-Panos D, Steinegger B (2020) A mathematical model for the spatiotemporal epidemic spreading of COVID19. medRxiv. https://doi.org/10.1101/2020.03.21.20040022v1

  2. Caccavo D (2020) Chinese and Italian COVID-19 outbreaks can be correctly described by a modified SIRD model. medRxiv. https://doi.org/10.1101/2020.03.19.20039388

  3. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, Piontti AP, Mu K, Rossi L, Sun K, Viboud C, Xiong X, Yu H, Halloran ME Jr, Vespignani A (2020) The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science 368(6489):395–400

  4. Chu DK, Akl EA, Duda S, Solo K, Yaacoub S, Schünemann HJ (2020) Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. Lancet. https://doi.org/10.1016/S0140-6736(20)31142-9

  5. Di LD, Pullano G, Sabbatini C, Boëlle P, Colizza V (2020) Impact of lockdown on COVID-19 epidemic in Île-de-France and possible exit strategies. BMC Med 18(1):240–240

  6. Estrada E (2020) COVID-19 and SARS-CoV-2. Modeling the present, looking at the future. Phys Rep 869:1–51

  7. Ferrari L, Gerardi G, Manzi G, Micheletti A, Nicolussi F, Salini S (2020) Modelling provincial COVID-19 epidemic data in Italy using an adjusted time-dependent SIRD model. arXiv:2005.12170

  8. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, Whittaker C, Zhu H, Berah T, Eaton JW et al (2020) Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature 584(7820):257–261

  9. Galeazzi A, Cinelli M, Bonaccorsi G, Pierri F, Schmidt AL, Scala A, Pammolli F, Quattrociocchi W (2020) Human mobility in response to COVID-19 in France, Italy and UK. arXiv:2005.06341

  10. Giuliani D, Dickson MM, Espa G, Santi F (2020) Modelling and predicting the spatio-temporal spread of coronavirus disease 2019 (COVID-19) in Italy. Available at SSRN 3559569

  11. Hadjidemetriou GM, Sasidharan M, Kouyialis G, Parlikad AK (2020) The impact of government measures and human mobility trend on COVID-19 related deaths in the UK. Transp Res Interdiscip Perspect 6:100167

  12. Haug N, Geyrhofer L, Londei A, Dervic E, Desvars-Larrive A, Loreto V, Pinior B, Thurner S, Klimek P (2020) Ranking the effectiveness of worldwide COVID-19 government interventions. medRxiv. https://doi.org/10.1101/2020.07.06.20147199

  13. Hethcote HW (2000) The mathematics of infectious diseases. SIAM Rev 42(4):599–653

  14. Kermack WO, McKendrick AG (1927) A contribution to the mathematical theory of epidemics. Proc R Soc Lond Ser A 115:700–721

  15. Klein B, LaRock T, McCabe S, Torres L, Privitera F, Lake B, Kraemer MUG, Brownstein JS, Lazer D, Eliassi-Rad T, Scarpino SV, Chinazzi M, Vespignani A (2020) Assessing changes in commuting and individual mobility in major metropolitan areas in the United States during the COVID-19 outbreak. https://www.networkscienceinstitute.org/publications

  16. Kozyreff G (2020) Hospitalization dynamics during the first COVID-19 pandemic wave: SIR modelling compared to Belgium, France, Italy, Switzerland and New York City data. arXiv:2007.01411

  17. Lavezzo E, Franchin E, Ciavarella C, Cuomo-Dannenburg G, Barzon L, Del Vecchio C, Rossi L, Manganelli R, Loregian A, Navarin N et al (2020) Suppression of COVID-19 outbreak in the municipality of VO, Italy. Nature 584:425–429. https://doi.org/10.1038/s41586-020-2488-1

  18. Maier BF, Brockmann D (2020) Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China. Science 4557(eabb4557):742–746

  19. Oliver N, Lepri B, Sterly H, Lambiotte R, Deletaille S, Nadai MD, Letouze E, Salah AA, Benjamins R, Cattuto C, Colizza V, de Cordes N, Fraiberger SP, Koebe T, Lehmann S, Murillo J, Pentland A, Pham PN, Pivetta F, Saramaki J, Scarpino SV, Tizzoni M, Verhulst S, Vinck P (2020) Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Science Advances 6(23):eabc0764

  20. Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A (2015) Epidemic processes in complex networks. Rev Mod Phys 87(3):925–979

  21. Pei S, Kandula S, Shaman J (2020) Differential effects of intervention timing on COVID-19 spread in the United States. medRxiv. https://doi.org/10.1101/2020.05.15.20103655

  22. Prasse B, Van Mieghem P (2020a) Fundamental limits of predicting epidemic outbreaks. Delft University of Technology, Delft

  23. Prasse B, Van Mieghem P (2020b) Network reconstruction and prediction of epidemic outbreaks for general group-based compartmental epidemic models. IEEE Trans Netw Sci Eng. https://doi.org/10.1109/TNSE.2020.2987771

  24. Prasse B, Van Mieghem P (2020c) Predicting dynamics on networks hardly depends on the topology. arXiv:2005.14575

  25. Prasse B, Achterberg MA, Ma L, Van Mieghem P (2020) Network-inference-based prediction of the COVID-19 epidemic outbreak in the Chinese province Hubei. Appl Netw Sci 5(1):1–11

  26. Schlosser F, Maier BF, Hinrichs D, Zachariae A, Brockmann D (2020) COVID-19 lockdown induces structural changes in mobility networks? Implication for mitigating disease dynamics. arXiv:2007.01583v2

  27. Song PX, Wang L, Zhou Y, He J, Zhu B, Wang F, Tang L, Eisenberg M (2020) An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China. medRxiv. https://doi.org/10.1101/2020.02.29.20029421

  28. Youssef M, Scoglio C (2011) An individual-based approach to SIR epidemics in contact networks. Journal of Theoretical Biology 283(1):136–144

Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Author information

Contributions

All authors contributed to the paper, read and approved the manuscript.

Corresponding author

Correspondence to Clara Pizzuti.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Funding

This work has been supported by the Universiteitsfonds Delft under the program TU Delft Covid-19 Response Fund.

About this article

Cite this article

Pizzuti, C., Socievole, A., Prasse, B. et al. Network-based prediction of COVID-19 epidemic spreading in Italy. Appl Netw Sci 5, 91 (2020). https://doi.org/10.1007/s41109-020-00333-8

Keywords

  • Network inference
  • Epidemiology
  • COVID-19
  • Coronavirus
  • SIR model
  • Transmission modifier