Network-based prediction of COVID-19 epidemic spreading in Italy

Initially emerged in the Chinese city Wuhan and subsequently spread almost worldwide causing a pandemic, the SARS-CoV-2 virus follows reasonably well the Susceptible–Infectious–Recovered (SIR) epidemic model on contact networks in the Chinese case. In this paper, we investigate the prediction accuracy of the SIR model on networks also for Italy. Specifically, the Italian regions are a metapopulation represented by network nodes and the network links are the interactions between those regions. Then, we modify the network-based SIR model in order to take into account the different lockdown measures adopted by the Italian Government in the various phases of the spreading of the COVID-19. Our results indicate that the network-based model better predicts the daily cumulative infected individuals when time-varying lockdown protocols are incorporated in the classical SIR model.


Introduction
The outbreak of the greatest epidemic of the 21st century caused by the SARS-CoV-2 virus has stimulated researchers to understand and control the spread of the disease inside a population with the help of mathematical models developed in recent years [13,20].A single outbreak of a disease is typically described by a SIR compartmental model, where each individual at a certain time t can only be in one of the three different disease stages: Susceptible (S), i.e. healthy, but vulnerable for the infection, Infected (I) and Recovered (R), i.e. the individual either recovers from the disease or, unfortunately, dies.A diffusion-like SIR epidemic spread on a contact network models the infection between individuals when they come into contact, close enough in space and long enough in time [4].By adopting the SIR model, Prasse et al. [23] predict the spreading of the COVID-19 epidemic on a contact network consisting of 16 cities in the Chinese province Hubei via their Network Inferencebased Prediction Algorithm (NIPA).Since the interactions between cities are unknown, Prasse et al. exploit their network reconstruction approach, described in [22], to estimate the contact network from the observations of the viral states.
In this paper, we use NIPA [22,23] to investigate the spreading of the COVID-19 epidemic in Italy by considering the 21 Italian regions, shown in Fig. 1, as nodes of the network.We extend NIPA to NIPA-LD (NIPA with LockDown), that takes into account the different lockdown measures adopted in the various phases of the COVID-19 spreading in Italy by adapting the ideas of Song et al. [27].Song et al. [27] pointed out that the epidemiological models do not consider the several containment measures, such as in-home isolation, travel and social activities restrictions, enforced by governments to dampen the transmission rate over time.Due to the containment measures, the infection rates vary over time, which should be incorporated in a prediction model to reflect the real situations of epidemic and provide more meaningful analyses.
We apply NIPA and the extension NIPA-LD to the period between the March 1 and June 9, 2020.Our results indicate that NIPA-LD is capable to better predict the daily cumulative infected individuals, because the time-varying lockdown restrictions are considered.

Related work
In the last months, the number of papers studying the COVID-19 pandemic and proposing models to predict the evolution of the disease sky-rocketed.In [6], Estrada discusses how this pandemic is actually modeled and proposes future research directions by reviewing the three main areas of modeling research against COVID-19: epidemiology, drug repurposing, and vaccine design.After the strict policies in China to reduce close contacts between people, which revealed the best strategy to effectively block the virus diffusion, Italy and many other European countries imposed several containment measures, called lockdown.Some researches then investigated how mobility changed during the lockdown phases [19,15,9,26], others have shown how lockdown can effectively slow down disease transmission.Flaxman et al. [8] study the effect on COVID-19 transmission of the major non-pharmaceutical interventions (NPIs) across 11 European countries for the period from the start of the COVID-19 epidemics in February 2020 until May 4 2020.In a more general work, Haug et al. [12] quantify the effectiveness of the world-wide NPIs to mitigate the spreading of COVID-19 and SARS-CoV-2 showing that this effectiveness is strongly related to the economic development as

Background
In this section, we briefly review the epidemic SIR model on contact networks [28,22] and the prediction of the COVID-19 infection, caused by the SARS-CoV-2 virus, based on the SIR model [23].Then, we incorporate time-varying protocols introduced by the government to slow down the virus propagation.
We consider a network with N nodes, where each node i corresponds to the set of individuals living in the same place, like a city or a region.An individual at any discrete time k = 1, 2, . . . is in either one of the c = 3 compartments Susceptible (S), Infectious (I), Recovered (R).The SIR model assumes that infectious individuals become recovered and cannot infect any longer because of hospitalization, death, or quarantine measures.The viral state of any node i at time k is denoted by the 3 are the fractions of susceptible, infectious, and recovered individuals, respectively, satisfying the conservation law The discrete-time SIR model [28,22] defines the evolution of the viral state v i [k] of each node i as: where β ij denotes the infection probability when individuals move from place (also called region) j to place i.The self-infection probability β ii = 0, because individuals inside the same place interact.The N × N infection probability matrix B specifies the contact transmission chance between each couple of regions.The curing probability δ i of place i quantifies the capability of individuals in place i to cure from the virus.We assume that the SIR model described in ( 1) and ( 2) has both β ij and δ i that do not change over time.
Prasse et al. [23] proposed the Network Inference-based Prediction Algorithm (NIPA), which estimates the spreading parameters δ i and β ij for each region i from the time series These estimates in (1) and ( 2) predict the evolution of the virus in the next future times k > n.
The SIR model has three compartments.In principle, with c compartments, we must have c − 1 independent measurements.The input to NIPA is only one compartment, the infectious compartment I, which is less than c−1 = 2 compartments necessary to reconstruct the network with the SIR model.NIPA creates observations of the R compartment by iterating over different candidate values of the curing rates δ i and assuming the initial condition R(0)=0.Thus, we observe only one compartment, the infectious compartment I, and the recovered compartment R is obtained by equation (2) after estimating the curing probability δ i in the training phase.
To obtain the curing probability δ i , 50 equidistant values between δ min and δ max have been considered, and then the value giving the best fit of model (1) has been used to estimate the matrix B based on the least absolute shrinkage and selection operator (LASSO).For a general class of dynamics on networks (including the SIR model), completely different network topologies can result in the same dynamics.Hence, it is not possible to deduce the network accurately from observations, regardless of the reconstruction method: two very different networks perfectly match the observations, and there is no reason to infer one network instead of the other.Thus, though NIPA accurately predicts the dynamics, the estimated network B can be very different from the true network [25].
Let n be the number of days in which the infection has been observed.To evaluate the prediction accuracy, a fixed number of days n neglect is removed prior to Thereafter, the omitted n neglect days (k = n−n neglect +1, ..., n) are predicted.It is possible to predict also n predict days (k = n+1, ..., n+n predict ) ahead the number n of available observations, however, in such a case, we cannot evaluate the goodness of the prediction.
Prasse et al. [23] showed that the approach accurately predicts the cumulative infections for n neglect ≤ 5.However, if the number of neglected days increases, then the prediction capability of NIPA decreases.NIPA assumes constant values for β ij , which, however, do not reflect the reality of the COVID-19 pandemic, because the containment measures imposed by the governments diminish β ij and thus the spread of the infection.Hence, infection probabilities β ij [k] which vary over time k should be considered in the epidemic model.

Extended SIR model with time-varying infection rate
Song et al. [27] proposed the concept of transmission modifiers, which decrease the probability that a susceptible individual can come into contact with an infected one because of the quarantine measures.
At any discrete time k, let q S [k] be the chance of an individual to be in home isolation, and q I [k] the chance of an infected person to be in hospital quarantine.The transmission modifier π[k] is defined as follows: and if no quarantine is active, then π[k] = 1.In order to have a realistic infection rate β, Song et al. [27] multiply β by π[k] in the classic continuous SIR model.Thus, the infection rate now reflects the effective currently enforced quarantine measures taken in a country.In the extended SIR model, the curing probability δ i remains the same, but the infection probability β ij is replaced by The same considerations can be applied to the discrete-time SIR model by modifying equation (1) above: The transmission modifier π[k], however, should be specified on the base of the effective quarantine protocols undertaken in a specific region.Regarding the Hubei province in China, Song et al. [27] suggest a step function mirroring the isolation measures established by the government.
In the next section, the extended time-varying model ( 4) is applied to Italy by considering as nodes of the contact network the 21 regions by which Italy is composed.

Transmission modifier for Italy
In Italy, the outbreak of the COVID-19 epidemic started in February in the North of Italy.A map of Italy with the division in regions is shown in Figure 1.On February 21, the first case of infection appeared in the town of Codogno, in Lombardia, and two cases also in the town of Vo' Euganeo in Veneto.These two towns where immediately declared red zones and nobody could either go out or come in.On February 24, the three regions of Lombardia, Veneto, and Emilia-Romagna registered 172, 33, and 18 cases of infections, respectively.After that date, the virus propagated all over Italy very fast.
During the first week, until the first days of March, no other particularly strict safety measures were enforced.On March 9, however, Italy turned into a lockdown Phase 1 with several strong restrictions and quarantine protocols.Schools, universities, shops, and many offices were closed, travels were not allowed and exits were only allowed for work, health or necessity situations with a mandatory self-certification.
The choice of the best values of the transmission modifier reflecting well the quarantine protocols is not an easy task and deserves a deep investigation.In the next sections, a study on the improvement of the NIPA method, when different lockdown levels related to the quarantine strategies are adopted by authorities, is performed.

Data preprocessing
Our measurement data have been collected by the Italian Civil Protection Department2 and are daily published on a repository.The available data are national, regional and provincial.We selected the regional ones which refer to the 21 regions depicted in Figure 1: Abruzzo, Basilicata, P.A. Bolzano, Calabria, Campania, Emilia-Romagna, Friuli Venezia Giulia, Lazio, Liguria, Lombardia, Marche, Molise, Piemonte, Puglia, Sardegna, Sicilia, Toscana, P.A. Trento, Umbria, Valle d'Aosta, Veneto.Thus, for Italy, the entry β ij of the 21 × 21 matrix B estimates the infection probability between the regions j and i.In the map, regions have been divided in 4 different colors representing the level of COVID-19 infected individuals.The red regions have been the most affected by COVID-19, followed by the yellow ones, the orange ones and the green regions with a lower number of cases.
For each observation day, we focused on the new positives to COVID-19.We considered observations from March 1 to June 9, 2020.

Transmission modifier analysis
To compare the NIPA method with the NIPA-LD implementing the lockdown measures, we considered the model generated by NIPA which, in the training phase, neglects n neglect days, and then applied this model for the prediction phase by using different values of π and an increasing value of n neglect .After that, we computed the average percentage error reduction of NIPA-LD with respect to NIPA.
Let I CF,i [k] be the observed cumulative fraction of infections of region i at time k: To quantify the prediction accuracy we considered the Mean Absolute Percentage Error (MAPE) defined as: where In order to find a good transmission modifier which reflects the real situation best, we tested different π values by supposing a different response from people in respecting the quarantine measures imposed in the three months with varying levels of restrictions.Thus, we fixed increasing values of π which intuitively correspond to a lower compliance to the containment protocols by the individuals.In view of the Italian lockdown measures previously described, we considered the following transmission modifier values: Table 1 reports the improvement of the percentage error of NIPA-LD with respect to NIPA, for the seven transmission modifiers and different numbers of predicted/omitted days, averaged over all the Italian regions and considering all the time windows under study, while Figure 2 shows the mean absolute prediction error as a function of the predicted/omitted days.From the table we can observe that for n neglect equals to 10, 30 and 40 the percentage of improvement is overall very significant for most of the transmission modifier vectors.This means that NIPA-LD can be used to When n neglect = 20 the error reduces, on average, only for π LD6 and π LD7 .However, as Figure 2(b) highlights, for π LD5 there is a reduction of the prediction error since the 10th day, and for π LD4 , π LD3 , π LD2 in the following next days, except for π LD1 .Hence, for this case, we can conclude that soft lockdown protocols are able to induce a positive improvement in the error for all the values of the number of neglected days.Finally, Figure 2(e) depicts a cone of error evolution for n neglect = 30 when using as transmission modifiers π LD5 , π LD6 , π LD7 , considering π LD5 and π LD7 as lower bound and upper bound of π LD6 , respectively.Then, we could assume that the future evolution of the epidemic can be predicted with an error that falls in between the predictions based on π ub and π lb .
Figure 2 shows that the differences between the different lockdown measures are meaningful.In the next section, a detailed analysis for all the Italian regions is performed to evaluate the prediction accuracy of NIPA and NIPA-LD.

Results
In this section, we evaluate the prediction accuracy of NIPA and NIPA-LD by computing the cumulative infections for each observation day when n neglect = 30 and compare them to the true data by using π LD6 as transmission modifier for the different quarantine periods.In this experiment, thus, NIPA does not consider the 30 last days of the observed daily data of the newly infected individuals for estimating the curing probability δ i and the infection probability β ij .Then, both NIPA and NIPA-LD predict the cumulative infections from May 10 until June 9 and the predictions are compared with (a) the true data, and (b) to the logistic function as baseline.The logistic function, introduced in the 19th century by Verhulst to model population growth, approximates the solutions of the SIS and SIR models [14,24].The cumulative number of infected cases y i [k] at time k for the region i is assumed to follow: where y ∞,i is the long-term fraction of infected individuals, K i is the logistic growth rate, t is the time in day.Due to lack of space, we only report the plots for a subset of the North regions, the ones highly affected by the virus spreading in the red and yellow zones (Piemonte, Lombardia, Veneto, Emilia-     Romagna), for one representative region of the orange zone (Lazio) and for one of the green zones (Puglia).For the center and the south of Italy, the COVID-19 spreading has been characterized by a lower number of cases and for this reason we report only two representative regions.In Figure 3, the cumulative infections for Piemonte are shown.Here, the lockdown modified NIPA variant clearly outperforms the classical NIPA, which overestimates the number of infected individuals.For Piemonte, NIPA-LD better matches the true data.Moreover, for this region, a simple logistic regression is not able to well predict the epidemic.Figure 4 depicts the trend of the predictions for the most challenging region in Italy, Lombardia, which has been mostly affected by the COVID-19.Again, the logistic regression excessively understimates the cumulative infections.From May 10 to May 30, both NIPA and NIPA-LD models well match the number of cumulative infections.However, for the next days, NIPA slightly overestimates the infections while NIPA-LD underestimates them.This is probably due to a much higher mobility of the population after the loosening of the lockdown rules on May 25.The Veneto case (Figure 5), another region of the North Italy highly affected by the COVID-19, on the contrary, is accurately predicted by NIPA-LD, while classical NIPA without lockdown clearly overestimates the number of infections.Here, the logistic regression works better than the previous regions but still understimates the cumulative infections.For the last North region, Emilia-Romagna, the cumulative infections are better predicted by the lockdown modified NIPA, which slightly overestimates the infections but to a lesser extent than the classical NIPA (Figure 6).The baseline on the contrary, underestimates the infections.In Figure 7, the results for Lazio confirm the better accuracy of NIPA-LD.Finally, Figure 8 shows the results obtained for the Puglia region.We observe that the NIPA prediction with the lockdown transmission modifiers is able again for this region to accurately predict the cumulative infections, while the classical NIPA overestimates them from May 15 until June 9 and the logistic regression underestimates the infections even from May 10.Figures 9, 10 and 11 report the mean relative prediction error e[k] for the 21 regions divided into 3 groups, over an observation period of 30 days from May 10 to June 9.For most of the regions (P.A. Bolzano, Emilia-Romagna, Friuli Venezia Giulia, Marche, Piemonte, Puglia, Sardegna, Sicilia, Toscana, P.A. Trento, Umbria, Valle d'Aosta, Veneto) NIPA-LD results in a substantially lower prediction error.In particular, after few days the re-openings of May 18 (corresponding to the third day in the plots), for which the population gradually started again going to bars, shops, hair dressers and other commercial activities and exploiting other kind of allowed services, the prediction error is much lower with the lockdown applied to NIPA.In other regions, like Abruzzo, Basilicata, Calabria, Campania, and Lazio, NIPA performs better than NIPA-LD for many days after May 16.This behavior could be due to the fact that on May 18 the mobility among the Italian region was allowed, thus there has been a high flow of people moving towards the southern regions.Thus, in spite of the restrictions made by the regional governor, often much more strict than the national ones, like, for instance in Campania, the lockdown measures where not effective.For Liguria and Lombardia, characterized by much more COVID-19 cases compared to the other regions, NIPA results in a lower error.Also for these two regions it seems that lockdown measures did not work.Finally, the Molise case is the only one having no substantial difference between the prediction error with lockdown and without lockdown.This region had the lowest number of COVID-19 cases.Moreover, there has been an erratic change in the number of infections in Molise, due to a single group of people, who did not follow the quarantine measures imposed by the Italian Government.

Death Prediction
The network-based SIR model described in the paper does not consider the death cases.To predict the number of deaths a new compartment should be added.However, by substituting the cumulated cases of infected with those of dead people, the model allows to predict the deaths.Thereby, we assume that the number of deaths is proportional to the number of infections.Thus, we executed NIPA and NIPA-LD on these cumulated death cases to predict the deaths instead of the infections.Even if the death numbers are subject to greater variations and there are significantly fewer deaths than infections, the methods give good results.Table 2 reports for each region the average MAPE error for NIPA and NIPA-LD in predicting COVID-19 deaths.For this experiment we set the number of neglected days to 30 by using the same transmission modifier values of the previous experiments.The table shows that the error values are very low and that NIPA-LD outperforms NIPA in 14 out of the 21 regions.It is worth pointing out that when NIPA performs better, the differences between error values are very low, except for Lombardia.As known, this region had more than 16 thousands deaths in the considered period.NIPA-LD in such a case underestimates the number of deaths.Figure 12 shows the predicted cumulative deaths of these two methods and those predicted by using logistic regression.Note that the baseline function is not able to obtain a good prediction, in fact it overestimates too much the number of deaths.

Discussion
The results reported in the previous section show that NIPA-LD is able to better predict the evolution of COVID-19 in Italy when compared to the original NIPA method, that does not consider the lockdown measures, and to the baseline prediction method.The main contribution of NIPA-LD is the capability of sensibly improving the long-term prediction of NIPA by implementing the different lockdown measures adopted in the various phases of the spreading of the COVID-19 in Italy into the network-based prediction model.In fact, NIPA-LD obtains lower prediction errors than NIPA when the number of training days diminishes.The introduction of the concept of transmission modifiers in NIPA thus allows to have epidemic transmission rates which well reflect the changes in the containment measured imposed by authorities.However, the adoption of the same values of transmission modifier for all the Italian regions has some drawbacks.In Tables 3 and 4, we report the daily error fraction value between NIPA-LD and NIPA for 30 neglected days.In the last column of Table 4, the average value of this error is also shown.When NIPA-LD outperforms NIPA, the daily error fraction is lower than 1.For most of the regions, NIPA-LD shows its superiority.Veneto, for example, is characterized by very low values with an average daily error of 0.15.Exceptions are Abruzzo, Basilicata, Calabria, Campania, Lazio, Liguria, and Lombardia, where NIPA performs better than NIPA-LD.Thus, though on average, NIPA-LD improves the prediction, this improvement is not for all the regions.Future works will investigate specialized transmission modifiers for the different regions.Moreover, whereas the transmission modifier π[k] may change over time, the infection rates β ij are assumed constant.Hence, in NIPA-LD (and classic NIPA) another limitation is that the probabilities of infection are assumed to be constant, or potentially scaled/multiplied by π[k].Similarly, our model assumes constant curing rates δ.However, (hopefully soon available) vaccinations may be deployed in a time-varying manner.
Another observation is that although NIPA and NIPA-LD can obtain good short-term predictions, accurate long-term predictions are generally difficult.When aiming at predicting the infections beyond some time horizons, the accuracy of the forecasting starts decreasing.To provide a case study, in Figures 13,14,15,16 and 17,we show what happens when trying to predict the last 10, 20, 30, 40, 50 days of cumulative infections, respectively, in Valle d'Aosta.In the short-term of 10 and 20 neglected days, both NIPA and NIPA-LD well match the observed data.When predicting the last 30 days until June 9, NIPA-LD predicts the infections better than NIPA.For 40 neglected days, NIPA-LD is still able to predict with a certain accuracy while NIPA definitely overestimate the cumulative infections.For 50 days, note that both the two NIPA methods are not able to accurately predict the number of cumulative infections while the logistic regression, on the contrary, works better.When thus adding too many predicted days, an accurate prediction is not possible with the NIPA-based methods.However, even if the transmission modifier is equal for all the regions, we point out that NIPA-LD performs generally better than NIPA, also for n neglect = 30 and n neglect = 40 which can be considered long-term predictions.
Finally, we point out that this work based on the discrete-time SIR model.This model is characterized by 3 compartments.NIPA can be used for any compartmental epidemic model [22] with c compartments, provided that c − 1 compartments are measured.We underline that the approach in this work observes only one compartment, the infectious compartment I, and the recovered compartment R is obtained by equation (2) after estimating the curing probability δ i in the training phase.Here, the advantage is that the less compartments we use, the less data we need to provide an accurate forecasting.When only macroscopic data, such as those exploited here, are available, a simple epidemiological model like the SIR has shown to be sufficient to predict with a high accuracy the trend of the epidemic [16].More complicated models than the SIR, such as SEIR, SIRD, which require more additional states, do not necessarily obtain better accuracy.

Conclusion
We exploited a network-based SIR model to predict the curves of the cumulative infections of individuals affected by the SARS-CoV-2 virus in Italy.The classic SIR epidemic model has been expanded by incorporating time-varying lockdown protocols in order to have epidemic transmission rates that change as the government quarantine rules change.Tested on regional data of the COVID-19 in Italy, the network-based prediction method results in a higher prediction accuracy when compared to the classical method that does not consider the lockdown measures.Experiments, however, pointed out that equal values of the transmission modifiers for all the Italian regions could not be appropriate, because of the differences in people mobility.On the other hand, the NIPA method extended to account for the lockdown measures highlighted the tremendous potential of an optimal transmission modifier.In fact NIPA-LD could be practically used to experiment which lockdown strategies are effective or not and which countermeasures are more appropriate to stop the spreading of COVID-19 epidemic.Future work will investigate how a transmission modifier might be best related to a quarantine strategy also in the training phase of NIPA, in order to improve the prediction capability of the approach.

Figure 2 :
Figure 2: Mean prediction error when the number of the omitted days equals (a) n neglect = 10, (b) n neglect = 20, (c) n neglect = 30 and (d) n neglect = 40, for different transmission modifier vectors.(e) Cone of error evolution for n neglect = 30.

Figure 9 :
Figure 9: Mean relative prediction error for the period from May 10 to June 9: first group of regions.

Figure 10 :Figure 11 :Figure 12 :
Figure 10: Mean relative prediction error for the period from May 10 to June 9: second group of regions.
the predicted cumulative fraction of infected individuals in region i at time k.Let e[k] and e LD [k] denote the MAPE errors when I CF pred,i [k] is computed by NIPA and NIPA-LD, respectively.The percentage error improvement of NIPA-LD over NIPA is then computed as

Table 1 :
Percentage improvement of NIPA-LD over NIPA prediction for different transmission modifier values and increasing number of neglected days.nneglect pe π LD1 short and long term predictions.More specifically, for the short term predictions (n neglect = 10) low transmission modifier values are more suitable: π LD1 , for example, is able to achieve an improvement of 35.369%.For the long term predictions, on the contrary, where we neglect 30 or even 40 days aiming to predict them, higher transmission modifier values like those of π LD7 perform better.

Table 2 :
Average MAPE prediction error of COVID-19 deaths for NIPA and NIPA-LD, when the number of neglected days is 30.

Table 3 :
Daily error fraction value between NIPA-LD and NIPA for 30 neglected days, from day 1 to day 15.