Modeling self-propagating malware with epidemiological models

Self-propagating malware (SPM) is responsible for large financial losses and major data breaches with devastating social impacts that cannot be understated. Well-known campaigns such as WannaCry and Colonial Pipeline have been able to propagate rapidly on the Internet and cause widespread service disruptions. To date, the propagation behavior of SPM is still not well understood. As result, our ability to defend against these cyber threats is still limited. Here, we address this gap by performing a comprehensive analysis of a newly proposed epidemiological-inspired model for SPM propagation, the Susceptible-Infected-Infected Dormant-Recovered (SIIDR) model. We perform a theoretical analysis of the SIIDR model by deriving its basic reproduction number and studying the stability of its disease-free equilibrium points in a homogeneous mixed system. We also characterize the SIIDR model on arbitrary graphs and discuss the conditions for stability of disease-free equilibrium points. We obtain access to 15 WannaCry attack traces generated under various conditions, derive the model’s transition rates, and show that SIIDR fits the real data well. We find that the SIIDR model outperforms more established compartmental models from epidemiology, such as SI, SIS, and SIR, at modeling SPM propagation.


Introduction
Self-propagating malware (SPM) is one of today's most concerning cybersecurity threats. Over past years, SPM resulted in huge financial losses and data breaches with high economic and societal impacts. For instance, the infamous WannaCry (Mike Azzara 2021) attack, first discovered in 2017 and still actively used by attackers nowadays, was estimated to have affected more than 200, 000 computers across 150 countries worldwide, with economic damages ranging from hundreds of millions to billions of dollars. In May 2021, the Colonial Pipeline (Wikipedia 2023a) cyber-attack caused the shut down of the entirety of the Colonial gasoline pipeline system for several days. It affected consumers and airlines along the East Coast of the United States and was deemed a national security threat. Another remarkable worldwide SPM attack is Petya (Wikipedia 2023b), first discovered in 2016 when it started spreading through phishing emails. Petya represents a family of various types of ransomware responsible for estimated economic damages of over 10 million dollars (Wikipedia 2023b).
Given the current cyber-crime landscape, with new threats emerging daily, tools designed for modeling SPM behavior become crucial. Indeed, a deep understanding of self-propagating malware characteristics provides us opportunities to identify threats, test control strategies, and design proactive defenses against attacks. A large body of research on the subject so far has been devoted to the design of methods to detect and mitigate self-propagating malware. Proposed techniques include network traffic signatures (Kim and Karp 2004;Kumar and Lim 2020;Ongun et al. 2021;Newsome et al. 2005) and host-level binary analysis (Chen and Bridges 2017;Ben Said et al. 2018) used to identify anomalous behavior, software-defined networking (SDN) for ransomware threat detection and mitigation (Akbanov et al. 2019;Alotaibi and Vassilakis 2021), as well as evasion-resilient methods for detecting adaptive worms (Li and Stafford 2014;Newsome et al. 2005;Ongun et al. 2021). However, less attention was dedicated to comparing and finding the most suitable models to capture SPM behavior. Additionally, the majority of existing works on SPM modeling focus on theoretical analyses of infection spreading (Guillén et al. 2017;Guillén and del Rey 2018;Mishra and Saini 2007;Martínez Martínez et al. 2021), lacking a thorough real-world evaluation of these models.
In this paper, we model the behavior of a well-known SPM attack, WannaCry, based on real-world attack traces. The similarities between the behavior of biological and computer viruses enable us to leverage compartmental models from epidemiology. We adopt a novel compartmental epidemic model called SIIDR (Chernikova et al. 2022), and conduct a thorough analysis to show that it can be used to accurately model SPM spreading dynamics.
First, we study the model assuming a homogeneous mixing of hosts and analytically derive its basic reproduction number R 0 (Dietz 1993;Kephart and White 1993; Van den Driessche and Watmough 2008). R 0 is the number of secondary cases generated by an infectious seed in a fully susceptible population. It describes the epidemic threshold, thus, the conditions necessary for a macroscopic outbreak ( R 0 > 1) (Fraser et al. 2009; Van den Driessche and Watmough 2008). We also investigate equilibrium or fixed points of SIIDR as they provide insights on how to contain or suppress the spreading.
Additionally, computer networks are often represented as graphs, where nodes denote the hosts in the network and edges represent the communication links between them. In any static graph, the propagation of contagion processes depends not only on the transition rates of SPM but also on the spectral properties of the graph (Newman 2018). To discuss the important characteristics of SIIDR that illustrate the ability of SPM to successfully propagate through the network in these settings, we represent SIIDR model as a Non-Linear Dynamical System (NLDS) and relaxing the homogeneous mixing assumption.
Finally, we reconstruct the dynamics of WannaCry spreading analysing real traffic logs. We use the Akaike Information Criterion (AIC) (Akaike 1974) to compare how different compartmental models fit the derived epidemic traces. We show that SIIDR captures malware spreading better than classical epidemic models such as SI, SIS, SIR. Indeed, the investigation of real WannaCry attacks showed that consecutive infection attempts originating from the same host are delayed by a variable time interval. This finding suggests the existence of "dormant" infected state, in which infected hosts temporarily cease to pass infection to their neighbors. Furthermore, calibrating the model to the real data via an Approximate Bayesian Computation technique we determine the transition rates (i.e., model parameters) that characterize WannaCry propagation.
To summarize, our contributions are the following: • We derive the basic reproduction number of the SIIDR model (Chernikova et al. 2022) and discuss the stability conditions of the disease-free equilibrium points of the system of ODEs that represents SIIDR under a homogeneous mixing assumption. • We derive the conditions for stability of the SIIDR disease-free equilibrium points on arbitrary graphs thus relaxing the homogeneous mixing assumption. • We reconstruct the spreading dynamics of an actual SPM (WannaCry) using realworld traces obtained by running a vulnerable version of Windows in a virtual environment. • We show that SIIDR outperforms several classical models in terms of capturing WannaCry behavior, and derive the model's transition rates from actual attacks.
We organize the rest of the paper as follows: first, we provide background information about the WannaCry malware and the most common compartmental models of epidemiology. We also define the threat model and problem statement. Then we introduce the SIIDR model, discuss the derivation of R 0 and the stability of the disease-free equilibrium points. In addition, we present the experimental results that support the findings of the paper. Table 1 includes common terminology used in the paper.

WannaCry malware
WannaCry is a self-propagating malware attack, which targets computers running the Microsoft Windows operating system by encrypting data and demanding ransom in Bitcoins. It automatically spreads through the network and scans for vulnerable systems, using the EternalBlue exploit to gain access, and the DoublePulsar backdoor tool to install and execute a copy of itself. WannaCry malware has a 'kill-switch' that appears to work like this: part of WannaCry's infection routine involves sending a request that checks for a web domain. If its request returns showing that the domain is alive or online, it will activate the 'kill-switch' , prompting WannaCry to exit the system and no longer proceed with its propagation and encryption routines. Otherwise, if the malicious program can not connect to the domain, it encrypts the computer's data, then attempts to exploit the vulnerability of Server Message Block protocol to spread out to random computers on the Internet, and laterally to computers on the same network (Wikipedia 2023c).

Epidemiological models
Compartmental epidemiological models are used to model the spread of infectious diseases (Brauer 2008;Keeling and Rohani 2008). This approach segments the population into groups (compartments) describing the various stages of infection. The compartmental structure varies according to the disease under study and the application of the model. Following disease evolution, individuals can transition at specific rates among compartments. Generally speaking, these transitions can be either spontaneous (e.g., recovery process) or resulting from interactions (e.g., infection process). In their simplest formulation, compartmental models assume homogeneous mixing. Said differently, each individual is potentially in contact with everyone else (Vespignani 2012).
The most common compartmental models are the SI, SIS, SIR and SEIR models. In Appendix 1 we will briefly review the formulation of these models by neglecting demographic changes in the population (i.e., the number of individuals is assumed to be fixed). More in detail, we represent them as systems of Ordinary Differential Equations (ODEs). This is a common approach to model epidemics in continuous time, even though it approximates the number of individuals in different compartments as continuous functions.

Problem statement and threat model
The objective of our work is to provide a rigorous mathematical analysis of realistic SPM attacks, and thus lay down the foundation of efficient defense strategies against these prevalent threats. Several works propose models to capture the behavior of SPM (Guillén et al. 2017;Guillén and del Rey 2018;Mishra and Saini 2007;Martínez Martínez et al. 2021), however, the vast majority of them have only theoretical analysis and do not incorporate the information about real-world SPM traces. Thus, they lack validation in real-world scenarios. Additionally, it is hard to perform comparative analysis to other models without presenting their performance using real-world data. Existing work that uses actual malware traces for modeling SPM (Levy et al. 2020) leverages minimal epidemiological models that, in their simplicity, fail to fully capture malware characteristics. To this end, here we use a more advanced compartmental model (called SIIDR) to describe epidemics resulting from SPM and apply it to real-world attack traces from a well-known malware, WannaCry.
Besides studying different epidemiological models according to their suitability to describe WannaCry epidemics, our second goal is to infer the parameters of the SIIDR epidemic model for different malware variants. Parameter inference is crucial for enabling attack simulations on real networks to measure the impact of the attack, as well as the effectiveness of defensive measures. Indeed, once the parameters of the attack are known, an analyst could estimate the basic reproduction number of the attack, and understand whether the attack might result in a macroscopic outbreak. Similarly, a defender might configure its network topology by performing edge or node hardening (Le et al. 2015;Tong et al. 2012;Torres et al. 2021), minimizing the leading eigenvalue of the graph to prevent the damage from self-propagating malware attacks, or using anomaly detection methods to detect the malware propagation (Ongun et al. 2021).
In this work, our focus is on modeling SPM propagation inside a local network (e.g., enterprise network, campus network) since we do not have global visibility on SPM propagation across different networks. We assume that the attacker gets a foothold inside the local network through a single initially infected host. From the 'patient zero' victim, the attack can propagate and infect other vulnerable machines in the subnet. We initially assume a homogeneous mixing model, meaning that every machine can contact all others. This is a valid assumption because in a subnet every machine is able to scan every other internal IP within the same subnet. We are assuming that none of the machines is immune to the exploited vulnerability at the beginning of the attack, thus, all of them may become infected during SPM propagation. Infectious machines become recovered when the malware is successfully detected and an efficient recovery process removes it. We assume that these machines cannot be reinfected again. We then relax the homogeneous mixing assumption and characterize the behavior of the model on arbitrary graph, considering that a contact between any two nodes in a network does not occur randomly with equal probabilities, but each node communicates with the particular subset of nodes in the network.

Related work
Numerous works propose to simulate and model malware propagation on different levels of fidelity and scalability (Perumalla and Sundaragopalan 2004). The research on modeling malware and worms propagation includes hardware testbeds (Vahdat et al. 2002;White et al. 2002), emulation systems (Durst et al. 1999;Wei et al. 2010), packetlevel simulations (Riley et al. 2004;Szymanski et al. 2003), fully-virtualized environments (Perumalla and Sundaragopalan 2004), mixed abstraction simulations (Guo et al. 2000;Kiddle et al. 2003), and epidemic models. In our work we focus on this last line of research. Similarly, Mishra and Jha (Mishra and Jha 2010) introduce the SEIQRS (Susceptible-Exposed-Infectious-Quarantined-Recovered-Susceptible) model for viruses and study the effect of the quarantined compartment on the number of recovered nodes. In their paper, the authors focus on the analysis of the threshold that determines the outcome of the disease. Mishra and Pandey (2014) introduce the SEIS-V model for viruses with a vaccinated state, while (Mishra and Saini 2007) study the SEIRS model to characterize the malicious objects' free equilibrium, formulating the stability of the results in terms of the threshold parameter. Toutonji et al. (2012) propose a VEISV (Vulnerable-Exposed-Infectious-Secured-Vulnerable) model and use the reproduction rate to derive global and local stability. With the help of simulation, they show the positive impact of increasing security countermeasures in the vulnerable state on wormexposed and infectious propagation waves. Guillén et al. (2019) introduce a SCIRAS (Susceptible-Carrier-Infectious-Recovered-Attacked-Susceptible) model. Authors study the local and global stability of its equilibrium points and compute the basic reproductive number. Ojha et al. (2021) develop a new SEIQRV (Susceptible-Exposed-Infected-Quarantined-Recovered-Vaccinated) model to capture the behavior of malware attacks in wireless sensor networks. In their work, authors obtain the equilibrium points of the proposed model, analyze the system stability under different conditions, and verify the performance of the model through simulations. Zheng et al. (2020) introduce the SLBQR (Susceptible-Latent-Breaking out-Quarantined-Recovered) model considering vaccination strategies with temporary immunity as well as quarantined strategies. The authors study the stability of the model, investigate a strategy based on quarantines aimed at suppressing the spread of the virus, and discuss the effect of the vaccination on permanent immunity. In order to verify their findings, the authors simulate the model exploring a range of temporary immune times and quarantine rates.
Recently, several attempts have been made to enhance the realism of the epidemic models. For instance, Guillén et al. (2017) study the SEIRS model with an improved incidence rate (i.e., new infected hosts per time unit). Additionally, the equilibrium points are computed and their local and global stability are studied. Finally, the authors derive the explicit expression of the basic reproductive number and propose efficient measures to control the epidemics. Martínez Martínez et al. (2021) introduce a dynamic version of SEIRS. The authors look at the performance of the model with different sets of parameters, propose optimal values, and discuss its applicability to model real-world malware. Gan et al. (2020) propose a dynamical SIP (Susceptible-Infected-Protected) model, find an equilibrium point, and discuss its local and global stability. Additionally, the authors perform the numerical simulations of the model to demonstrate the dependency on parameter values. Yao et al. (2018) present a time-delayed worm propagation model with variable infection rate. They analyze the stability of equilibrium and the threshold of Hopf bifurcation. The authors carry out the numerical analysis and simulation of the model.
Some papers explore malware propagation on networks comprised of different types of devices. For instance, Guillén and del Rey (2018) considers the special class of carrier devices whose operative systems are not targeted by malware (for example, iOS devices for Android malware); the authors introduce a new compartment (Carrier) to account for these devices, and analyze efficient control measures based on the basic reproductive number. Zhu et al. (2012) take into consideration the ability of viruses to infect not only computers, but also many kinds of external removable devices; in their model, internal devices can be in Susceptible, Infected, and Recovered states, while removable devices can be in Susceptible and Infected states.
None of these previous works perform model fitting to real-world malware scenarios, but only consider theoretical analyses of the proposed models. The closest to our work is Levy et al. (2020); the authors use real traces to fit malware propagation with SIR, a simplistic model that, as we have shown, performs poorly compared to SIIDR and fails to capture self-propagating malware dynamics.

Analysis of the SIIDR model
In this section, we introduce the main characteristics of WannaCry propagation dynamics, the proposed modeling framework (the SIIDR model), we discuss its basic reproduction number and the stability of disease-free equilibrium points. Table 2 includes common terminology used in this section.

SPM modeling with the SIIDR model
A detailed analysis of the WannaCry traces (Chernikova et al. 2022) revealed the following characteristics: • The time interval t between two consecutive malicious attempts from the same infected IP is not constant and has high variability. This intuition is supported by the results in Fig. 1 where we show the quartile coefficient of dispersion (QCoD) of these t for different Wannacry variants. The QCoD is defined as (Q 3 − Q 1 )/(Q 3 + Q 1 ) . As benchmark we show the hypothetical QCoD of exponentially distributed t with the same mean observed in the data. We chose the exponential distribution since time intervals lapsing between Poisson-like events happening at constant rate follow this distribution. From the figure we see that the QCoD of t obtained from the data is much higher ( ∼ 50% more across variants) than the one we would expect to see with constant frequency events. Fig. 1 Quartile coefficient of dispersion of t between two consecutive malicious attempts from the same infected IP and of exponential distribution with same mean for different WannaCry variants • The time interval t between the last attack from an infected IP and the end of the collected trace is large. The average values of t between two consecutive malicious attempts and t between the last attack attempt from an infected IP and the end of the epidemics are shown in Fig. 2. The mean value of the t in the second case are much larger then the t between two consecutive attack attempts.
Based on the first observation, an infected dormant state I D is included to capture the heterogeneous distribution of time windows between two malicious attack attempts. Therefore, an infected node can become dormant for some period of time and resume its malicious activity later. The second observation supports the presence of a Recovered state: once nodes recover, they will not become infectious or susceptible again, at least within a certain observation period. The transition diagram corresponding to the SIIDR model is illustrated in Fig. 3. Interacting with the infectious, a susceptible node can become infected with rate β , and afterwards, it may either  The evolution of the system can be modeled through the following ODEs system: where the total size of the population N is constant. It is important to stress how the system of ODEs assumes an homogeneous mixing in the host population.

SIIDR equilibrium points
While modeling SPM we are interested in equilibrium states when the number of infected individuals equals to 0 and does not change over time (i.e., disease-free equilibrium points). Thus, we need to derive the constant solutions of the ODE system corresponding to SIIDR model (Perko 2013 (1) For the SIIDR model we can find the equilibrium points by solving the following system: Thus, we find disease-free equilibrium points of the SIIDR model as E * = (S, 0, 0, R) where I = I D = 0 and S + R = N . The particular case is the beginning of the propagation process when the number of recovered individuals is 0: R = 0 or E * = (N , 0, 0, 0) . Therefore, we perform further analyses of SIIDR model based on this equilibrium point. There exists no endemic equilibrium point when I = 0 for SIIDR model. It is present only when µ = 0 (SIID model) and is equal to (0, I * , γ 1 I * γ 2 , 0).

The basic reproduction number
The basic reproduction number R 0 is the number of secondary cases generated by a single infectious seed in a fully susceptible population (Keeling and Rohani 2008). R 0 defines the epidemic threshold, that is the condition for a macroscopic outbreak. If R 0 > 1 , on average, infected individuals are able to sustain the spreading. If R 0 < 1 , on average, the disease will die out before any macroscopic outbreak. One way to derive the basic reproduction number is to use the next-generation matrix approach (Diekmann et al. 1990(Diekmann et al. , 2010Blackwood and Childs 2018). This states that the basic reproduction number is the largest eigenvalue of the next-generation matrix. The method takes into consideration the dynamics of compartments linked to new infections. For example the number of infected individuals in compartment i, i ∈ {1, . . . , k} , where k is the number of compartments with infected individuals, changes as follows: is the rate of transfer of individuals into compartment i and V − i (X) represents the rate of transfer of individuals out of compartment. If E * is a disease-free equilibrium, then we can define a next-generation matrix: where: In the case of SIIDR model, the matrix G can be represented at one of the disease-free equilibrium points DFE = (N , 0, 0, 0) as follows: Let v be an eigenvector of the matrix G, and its corresponding eigenvalue. The eigenvalue equation is (Bhatia 1997 where v is a nonzero vector, therefore det[ I − G] = 0 . Using G from equation 2, we obtain: which results in: 1) = 0 or 2) = β/µ . According to the next-generation matrix method (Diekmann et al. 1990(Diekmann et al. , 2010Blackwood and Childs 2018), the reproduction number R 0 is the largest eigenvalue of the next-generation matrix G, hence, R 0 = β µ , which is the same definition of R 0 of the SIR model. In other words, the introduction of the new compartment I D does not alter the conditions for a macroscopic outbreak. We note that, in general, the disease free equilibrium might contain individuals already immune to the disease, i.e., E * = (N − R, 0, 0, R) . This might be due to wave of infections caused by previous introductions of the virus. In this more general case we have: where in parenthesis we have the fraction of the susceptible population.

Stability analysis of SIIDR equilibrium points
A particularly important characteristic of a disease-free equilibrium point is its stability (Hirsch and Smale 1974), which indicates whether the system will be able to return to the equilibrium point after small perturbations. For example, a small perturbation can be a slight increase in the number of initially infected nodes.
Let us consider the system of ODEs that captures the dynamics of our SIIDR model (see Eqs. 1), governed by: Furthermore, let us assume that the system's initial state at t = 0 is X = X 0 . In this context, the stability of E * can be obtained answering to the following question: if the system starts near E * , how close will it remain to E * ? Beside this intuition, stability is more formally defined as follows (Hirsch and Smale 1974): Definition 2 The equilibrium point E * is stable if for any ǫ > 0 , there exists a δ > 0 such that: if the system's initial state X 0 lies in the ball of radius δ around E * (i.e., ||X 0 − E * || < δ ), then solutions X t exist for all t > 0 , and they stay in the ball of radius ǫ around E * (i.e., ||X t − E * || < ǫ).
In addition: Definition 3 We say that E * is locally asymptotically stable if it is stable and the solutions X t with initial state X 0 in the ball of radius δ converge to E * as t → ∞.

And:
Definition 4 We say that E * is stable in the sense of Lyapunov (i.e., Lyapunov stable) when there exists the continuously differentiable function L(X) such that: If L (X) < 0 and L (X) = 0 only when X = E * , then E * is locally asymptotically stable.
We next analyze the stability of the SIIDR disease-free equilibrium points and show that they are Lyapunov stable, if the reproduction number R 0 is smaller or equal to one. We formally state and prove it in the following theorem: where L is the valid Lyapunov function as long as it is nonnegative continuously differentiable scalar function which equals 0 at the disease-free equilibrium point ( I = I D = 0 ). The time-derivative of L is the following: where we used Eqs. 1 that describe the evolution of I and I D . Therefore, L ≤ 0 (Eq. 4) when: Given the basic reproduction number R 0 = βS µN , we obtain: Eq. 5 holds when R 0 ≤ 1 . Hence, L ≤ 0 when R 0 ≤ 1 . Furthermore, L (E * ) = 0 (since I = 0 when X = E * ), which concludes the proof that E * is a Lyapunov stable disease-free equilibrium point. ( Note that L (X) = 0 when I = 0 , even if X = E * (for instance, if I D = 0 ). Thus, E * is not locally asymptotically stable (see Definition 4).

SIIDR analysis on arbitrary graphs
Our analysis in previous sections was performed under the homogeneous-mixing assumption (Bansal et al. 2007;Vespignani 2012). In this limit, all hosts are well-mixed and potentially in contact. The homogeneous approximation might be a good representation of the contact dynamics in a local subnet where each machine can contact anyone else. However, the contact patterns in larger networks are complex. Indeed, many real networks (including the Internet) feature, among other properties, a heterogeneous connectivity distribution consisting of a few highly-connected 'hubs' , while the vast majority of nodes have much lower connectivity (Albert and Barabási 2002;Pastor-Satorras et al. 2015). In this section, we analyze the epidemiological dynamics of the SIIDR model on arbitrary graphs that capture heterogeneity in host contact patterns. In this case, the propagation of malware can be modeled with a discrete-time Non-Linear Dynamical System (Chakrabarti et al. 2008;Prakash et al. 2011).
A NLDS system is specified by the vector of probabilities at time step t + 1 as P t+1 = g(P t ) , where g is non-linear continuous function operating on a vector P t . We define the system equations based on the transition diagram of the model (Fig. 3).
First, we are computing the probability of node i of not getting infected at time step t: ζ i,t (I) , which happens when: (1) none of its neighbors are in state I, or (2) a neighbor is in state I but fails to infect i with probability (1 −β) , where β is the attack transmission probability over a contact-link. We note how β is generally different than the infection rate β introduced above. Indeed we can approximate β =β�k� t where k t is the average contact rate per unit time. Hence: Next, we develop the equations for probabilities P of node i to be in each of the possible states ( S, I, I D , R ) at time step t + 1.
For generality and clarity, we denote by α XY the probability of a node to transition from state X to Y, while α XX is the probability of a node to remain in state X. With this notation, the probability equations for each state are as follows: State S: A node i is in state S at time t + 1 if it was in state S at time t and it did not get infected: State I: A node i is in state I at time t + 1 if either: 1) it was in state S at time t and was successfully infected, or 2) it was in state I at time t and it remained there (i.e., it did not transition to states R or I D ), or 3) it was in state I D at time t and transitioned to state I.
(1 −βA i,j P I,j,t ) (7) P S,i,t+1 = P S,i,t · ζ i,t (I) (8) P I,i,t+1 = P S,i,t · (1 − ζ i,t (I)) + P I,i,t · α II + P I D ,i,t · α I D I State I D : A node i is in state I D at time t + 1 if either: 1) it was in state I at time t and transitioned to state I D , or 2) it was in state I D at time t and it remained there.
State R: We can compute P R,i,t using the relation: Now we can write down the system of equations for SIIDR using Eqs. 7-10 to define P t , the probability vector that completely describes the evolution of the system at any time step t:

Stability analysis
The next step in our analysis of the SIIDR propagation on complex networks represented as arbitrary graphs is to define the disease-free equilibrium points and analyze their stability.
Definition 5 An equilibrium point of NLDS is the probability vector P * that satisfies P t+1 = P t = P * for any t (Verhulst 2006).
Thus, for the SIIDR model we can define the disease-free equilibrium point as follows: One way to analyze the stability of the equilibrium point of a non-linear dynamical system is to approximate its dynamics at that point as a linear dynamical system (i.e., linearization) (Sayama 2015). In this case, the system behavior in an infinitesimally small area about the equilibrium point is approximated with a Jacobian matrix.
The largest eigenvalue J of the Jacobian matrix indicates whether the equilibrium point of the system is stable or not. Since we are considering the time as discrete, if | J | < 1 , the equilibrium point is asymptotically stable; even if small perturbations occur, the system asymptotically goes back to the equilibrium point. If | J | > 1 , the system is unstable and diverges away from the equilibrium point. If | J | = 1 , then the system may either diverge from, or converge to the equilibrium point (Bof et al. 2018;Dahleh et al. 2004;Haddad and Chellaboina 2011;Sayama 2015).
The Jacobian matrix of SIIDR modeled as NDLS and an analysis of its eigenvalues is presented in Appendix 3. We show that one of the eigenvalues of the Jacobian has value 1. This result is particularly significant. Asymptotic stability requires all the eigenvalues of the Jacobian matrix to be less than 1 in absolute values. Since the Jacobian matrix has at least one eigenvalue of value 1, the equilibrium point of the NLDS system cannot be asymptotically stable. However, the equilibrium point can still be Lyapunov stable.
(9) P I D ,i,t+1 = P I,i,t · α II D + P I D ,i,t · α I D I D (10) ∀i, t : P S,i,t + P I,i,t + P I D ,i,t + P R,i,t = 1 (11) P S,i,t+1 = P S,i,t · ζ i,t (I) P I,i,t+1 = P S,i,t · (1 − ζ i,t (I)) + P I,i,t · α II + P I D ,i,t · α I D I P I D ,i,t+1 = P I,i,t · α II D + P I D ,i,t · α I D I D P R,i,t+1 = 1 − P S,i,t − P I D ,i,t · (α I D I + α I D I D ) − P I,i,t · (α II + α II D ) P * = [P S , 0, 0, P R ] T , where P R + P S = 1 We show that the equilibrium points of SIIDR are indeed Lyapunov stable using Lyapunov's second stability criterion.

Definition 6
The equilibrium point P * of P t+1 = g(P t ) NLDS is Lyapunov stable if there exists a continuous function L, such that for any t:

Experimental results
In this section, we present the reconstruction of WannaCry dynamics from network logs captured with Zeek monitoring tool (The Zeek Project 2023). Additionally, we show supporting results that confirm that the SIIDR model fits WannaCry traces best. We also present our experiments for parameter estimation, providing the statistics from the posterior distribution of SIIDR transition rates. These results expand the results presented in our previous work where we introduced SIIDR model (Chernikova et al. 2022). Moreover, we study the basic reproduction number R 0 of the reconstructed attacks to understand its correlation with SPM dynamics (in particular, its propagation speed). We also discuss the issue of structural and practical identifiabiility of SIIDR parameters which is common in epidemiological modeling. Finally, we experimentally demonstrate that the condition for Lyapunov stability of the diseasefree equilibrium point holds when the networks are modeled as arbitrary graphs relaxing homogeneous mixing assumption.

WannaCry malware traces
We obtained realistic WannaCry attack traces by running the malware in a controlled virtual environment consisting of 51 virtual machines, configured with a version of Windows vulnerable to the EternalBlue SMB exploit. The external traffic generated by the VMs was blocked to isolate the environment and prevent external malware spread. The infection started from an initial victim IP, and then the attack propagated through the network as the infected IPs began to scan other IPs. In these experiments, WannaCry varied the number of threads used for scanning, which were set to 1, 4 or 8, and the time interval between scans, which was set to 500ms, 1 s, 5 s, 10 s or 20 s. Using the combination of these two parameters resulted in 15 WannaCry traces. While running WannaCry with this setup, the log traces were collected with the help of the open source Zeek network monitoring tool.

WannaCry reconstruction
To reconstruct the WannaCry dynamics we are using Zeek communication logs where we consider only communication between internal IPs. Since WannaCry attempts to exploit the SMB vulnerability, we label as malicious all the attempts of connections on destination port 445. The first attempt to establish the malicious connection is considered to be the start of the epidemics, and the end corresponds to the last communication event in the network. Each IP trying to establish a malicious connection for the first time at time t is considered infected at time t. The cumulative number of infected IPs through time represents the curve of the WannaCry epidemics.

WannaCry dynamics
We show the dynamics of WannaCry variants characterized by different numbers of scanning threads and time between scans in Fig. 4. These dynamics represent the cumulative number of infected nodes during the epidemic time. The trace which corresponds to 1 thread and 20 s sleeping time wc_1_20s has unusual behavior in the dynamics. It has a very small number of infected nodes until the end of the attack, when the infections rapidly increase to the 7 infected nodes at once. For all other WannaCry variants we observe that the attack reaches the maximum number of infected nodes quickly and is not able to infect any other nodes for a large time window before the end of the epidemic. These graphs confirm the fact that after an IP enters a recovered state it no longer has an opportunity to get back to susceptible or infected nodes. For modeling and parameter estimation experiments we exclude the time windows after which the number of infections does not change. Additionally, we present the number of contacted and infected IPs in Table 3. Interestingly, the overall percentage of infected nodes is small (around 25% on average) for all variants. The possible reason for this is the fact that some of the machines that do not get infected may have immunity to the malware.

Model selection
We select the model that fits WannaCry traces best among several representative compartmental epidemiological models: SI, SIS, SIR, SEIR and SIIDR assuming an homogenous mixing of machines. These models have different number of parameters and, therefore, different a-priori explaining power. The SIIDR model is also the one that has the largest number of parameters. To allow for a fair comparison among models, we considered the Akaike Information Criterion (AIC) as a metric to measure their performance. The AIC is calculated based on the maximum likelihood estimate and the number of free model parameters, thus, allowing comparison of models with different number of parameters. More information about AIC criteria can be found in "Model selection" section in Appendix 5. We perform model selection for all WannaCry traces. The lowest AIC score corresponds to the best model. We run the experiments on an uniform grid of model parameter values between 0 and 1. We select the lowest AIC score for each WannaCry trace and each compartmental model. The results are illustrated in Table 4. In bold, we highlight the minimum AIC value across all models for each Wanna-Cry trace. The SIIDR model has the lowest AIC score for all traces except for wc_1_20s. For instance, the AIC score associated with the SEIR model for wc_8_5s WannaCry trace is equal to -87, the SIS model score is 104, the SIR model score is -35, whereas for the SIIDR model the AIC is the lowest and has the value of -121. This trend is valid for all other WannaCry traces except for wc_1_20s where the SEIR model provides the best fit. However, this variant is an outlier. Therefore, we can conclude that, among the four epidemiological models, the SIIDR model fits the WannaCry attack traces best. For each compartmental model and each WannaCry trace, we plot the reconstruction curve of the number of infected nodes using the parameters corresponding to the lowest AIC score along with the true dynamics of infected nodes. The results are shown in Fig. 5. In the case of the SIS model, the orange line (representing the simulated dynamics of the number of infected nodes) is far from the blue one, which illustrates the empirical dynamic for all malware traces. In the case of the SIR and SEIR models the numbers of simulated infections are closer to the real ones, however, the SIIDR and actual dynamics curves are the closest.

Parameter estimation
We approximated the posterior distribution of SIIDR transition rates using the ABC-SMC-MNN technique (Filippi et al. 2013). The details of this technique are described in "Posterior distribution of transition rates" in Appendix 5. The mean values and standard deviation of the posterior distribution of SIIDR transition rates ( β , µ , γ 1 , γ 2 ) are represented in Table 5. The parameter dt is the integration step, which is calculated as: dt = (t N − t 0 )/T , where t N is the last timestamp, t 0 is the first timestamp, and T is the number of timestamps in WannaCry traces. dt differs by variant due to the different propagation speeds. The attack transmission probability β is related to attack transmission rate β as follows: β =β�k� t where k t is the average contact rate per unit time. In the WannaCry traces we have one communication or contact per dt, hence, the transmission probability β over a contact-link also equals β.
Based on estimated values of transition rates we calculated the basic reproduction number R 0 for all WannaCry traces. We also calculate the SPM propagation speed for all WannaCry traces as the average number of new infections per 100 s. The results are illustrated in Fig. 6. As expected, we observe that higher SPM propagation speed corresponds to a higher basic reproduction number R 0 .  . 6 The basic reproduction number R 0 calculated by using estimated values of transition rates compared to the speed of SPM propagation. Higher propagation speed corresponds to higher R 0 . We exclude the results for the wc_1* traces The mean values of the parameters' posterior distribution can be further used to simulate SPM with the SIIDR model. This provides an opportunity to create synthetic, but realistic, WannaCry scenarios and evaluate whether existing defenses are successful in preventing and stopping the malware from propagation in the networks. However, we notice that some of the WannaCry attack variants affect only a small number of nodes. For example, the wc_8_5s trace has only 4 infected nodes at the end of the trace which constitutes 14% of all nodes. Consequently, ABC-SMC-MNN is expected to perform worse in the estimation of transition rates for such traces. Thus, parameters estimated from the traces with higher numbers of infections are more reliable.

Identifiability of SIIDR transition rates
As long as the goals of modeling with SIIDR include inferences about the underlying propagation process, we are interested in the estimation of SIIDR parameter distribution corresponding to model outputs that best fit the observed data. However, parameters' estimation can only produce robust results if the model is identifiable meaning that it is possible to obtain a unique solution for all unknown parameters given the model structure and output. On the other hand, if parameters are not identifiable their similar values may yield considerably different model outputs (Chis et al. 2011;Tuncer and Le 2018).
The common problem of data uncertainty forces the issue of parameter identifiability to appear relevant in epidemiological modeling (Chowell 2017;Gallo et al. 2022;Weitz and Dushoff 2015;Valdez et al. 2015). The lack of identifiability in the model parameters may prevent reliable predictions of the epidemic dynamics. Therefore, it becomes crucial to investigate the parameter identifiability, and its limitations and propose solutions to improve it.
There exist notions of structural and practical identifiability. Structural identifiability is a property of the model structure itself given that the model is error-free and the observed data has no noise. Practical identifiability is connected to the quality of data leveraged for parameter estimation. It measures whether there is enough information to infer the transition rates (Dankwa et al. 2022).
We addressed the structural SIIDR parameters identifiability using the method of differential algebra (Chis et al. 2011;Miao et al. 2011) with the help of DAISY (Bellu et al. 2007) and SIAN (Hong et al. 2020;Ilmer et al. 2021) software and achieved the following result: Theorem 3 All parameters of the SIIDR model are globally structurally identifiable when incidence represents the output of the model and the size of population N is known. Otherwise, parameters N and β appear to be structurally non-identifiable while µ, γ 1 and γ 2 remain identifiable.
Therefore, we consider the SIIDR model to be structurally identifiable as long as the size of the computer networks is usually known. More information about SIIDR structural identifiability along with the results from DAISY software can be found in Appendix 2.
However, even when the model parameters are structurally identifiable, they may still be non-identifiable in practice due to the limited number of observed variables, the quality of data used for estimation, and the complexity of the model (the number of parameters that are jointly estimated).
To investigate practical identifiability we looked at the joint posterior distribution of SIIDR parameters. The plots can be found in "Joint posterior distributions of SIIDR parameters" in Appendix 2. For some of the WC variants, there exists a correlation between parameters β and µ . Additionally, some of the joint posterior distributions possess multimodality. Although on average the issue of non-identifiability is not dominant, it might appear in some parts of the phase space of the SIIDR model. One reason for this behavior is that the incidence represents the output of the fitted model and appears to be insufficient to characterize the whole model's dynamic. On the other hand, SIIDR has four parameters estimated jointly, therefore, it may contain multiple sets of parameters that lead to the same output of the model. Hence, measuring the data about other states rather than just the number of infected nodes as a function of time to characterize the system dynamics more extensively, should improve the practical SIIDR identifiability.

Threshold evaluation
In this section, we evaluate the conditions of SIIDR model equilibrium points to satisfy the Lyapunov stability. Specifically, we are interested in the equilibrium point which corresponds to the start of epidemics, when all nodes in the network have the following probability vector to appear in all of the states of SIIDR model P We study the stability of this point after the infection of the initial node by SPM (i.e., the system initial state P 0 lies in the ball of radius δ around P * ) by looking at the density of recovered nodes w.r.t to the stability threshold s and associated infection propagation dynamics P t .
We evaluate stability conditions on the variety of synthetic and real-world networks described in the following subsection.

Graphs characteristics
We consider synthetic networks generated with Barabási-Albert (BA) (Barabási and Albert 1999), Erdős-Rényi (ER) (Erdős and Rényi 1959), Watts-Strogatz (WS) (Watts and Strogatz 1998), Configuration Model (CM) (Newman 2003), and Scale-free (SF) (Barabási 2009) models along with three real-world graphs (Leskovec et al. 2005;Leskovec and Mcauley 2012;Leskovec et al. 2007). Real-world graphs include networks generated using Facebook data (Facebook), autonomous systems peering information inferred from Oregon route views (Oregon), and anonymized traffic data about incoming and outgoing emails between members of the European research institution (Email). All synthetic graphs have 1000 nodes and different topological characteristics. Thus, ER graphs have different leading eigenvalues that range from 11 to 999, BA networks have the leading eigenvalue between 35 and 508, and WS graphs -between 10 and 900. ER, BA, and WS networks have only one connected component. They have a larger diameter and average path length, and smaller density and transitivity in the graphs with smaller leading eigenvalues. CM and SF networks have more connected components and the values of other topological characteristics are similar to ER, BA, and WS graphs with small leading eigenvalues.
More details about the topological characteristics of considered networks are presented in Table 6. Chernikova et al. Applied Network Science (2023) 8:52 Phase transition To illustrate the results of Theorem 2 we plot the final number of recovered nodes in the network with respect to the threshold values s = A * β/µ in the range from 0 to 2. We achieve these results by fixing the transition rates µ = 0.5, γ 1 = 0.5, γ 2 = 0.5 and changing the value of β . For ER, BA and WS graphs infection propagation starts from one  . 7 The mean value of recovered nodes R with 50% and 95% reference ranges obtained from numerical simulations of the SIIDR model on Erdős-Rényi networks with respect to threshold 1 * β/µ value initially infected node, for SF, CM and real-world networks the fraction of infected nodes at t = 1 is 0.05. We average results over 100 stochastic realizations that we run considering 50 different seeds. Resulting phase transition plots are illustrated in Figs. 7,8,9,and 10. For all types of graphs, the total fraction of recovered nodes is negligible for values of s < 1 . As predicted by the theory, the epidemic threshold is s ∼ 1 . In the case of SF, CM, and real networks (see Fig. 10), the threshold appears to be for s < 1 . However, we note how in order to obtain macroscopic outbreaks in these graphs, we started the simulations with 5% of initially infected seeds, instead of a single one as done for the other networks. Hence, also for these networks, the phase transition takes place for s ∼ 1.

Fig. 8
The mean value of recovered nodes R with 50% and 95% reference range obtained from numerical simulations of the SIIDR model on Barabási-Albert networks with respect to threshold 1 * β/µ value Fig. 9 The mean value of recovered nodes R with 50% and 95% reference range obtained from numerical simulations of the SIIDR model on Watts-Strogatz networks with respect to threshold 1 * β/µ value In general, networks with larger diameters and average path lengths, smaller density, and transitivity have a smaller fraction of recovered nodes during the infection propagation.
These results demonstrate that for all t the solution P t stays in some ball of radius ǫ from the starting equilibrium point P * = P 0 when s < 1 , therefore, it is Lyapunov stable. Moreover, we see that SIIDR behaves the same as the SIR model in terms of the stability of equilibrium points: when the threshold s is less than one the SIIDR system solution converges to DFE when t tends to infinity. It can be explained by the fact that SIIDR model is very similar to a SIR model except for the particular configuration of transition rates.

Conclusions
We performed a comprehensive analysis of a new compartmental model, SIIDR, that captures the behavior of self-propagating malware. We showed that SIIDR fits real-world Wan-naCry traces much better than existing compartmental models such as SI, SIS, SIR, and SEIR (which were previously studied in the literature). Additionally, we estimated the posterior distribution of the model's parameters for real attack traces and showed how they characterize the WannaCry behavior. We also analytically derived the conditions when SPM is expected to become an epidemic and discussed the stability of model's disease-free equilibrium points. Our work demonstrates the impact of modeling the propagation of SPM, simulating real attacks on networks, and evaluating defensive techniques.

SI model
The SI model is used to describe diseases where infection is permanent. It features two compartments and one transition. The susceptible compartment S represents healthy Fig. 10 The mean value of recovered nodes R with 50% and 95% reference range obtained from numerical simulations of the SIIDR model on real-world networks along with scale-free and configuration model graphs with respect to threshold 1 * β/µ value individuals that interacting with infectious individuals in the compartment I can get infected ( S + I → 2I ). It can be translated in the following system of ODEs: Due to the homogeneous mixing assumption, the per capita rate at which susceptible individuals get infected can be written as the probability of interacting with an infected individual (I/N) times the transmission rate of the disease β . The state diagram for the SI model is shown in Fig. 11.

SIS model
The SIS model features two compartments and two transitions. Beside the infection process as in the SI model, SIS models have also a recovery process: infected individuals spontaneously recover at rate µ becoming susceptible to the disease again ( I → S ). Hence SIS models are used for diseases that can infect individuals multiple times. The system of ODEs associated with SIS model is: Note how, differently from infection, the recovery process is spontaneous and does not require any interaction. Hence, each infected individual has an average duration of infection of µ −1 . The state diagram for SIS model is shown in Fig. 12.

SIR model
The SIR model describes diseases that give permanent (or long-lasting) immunity. It features three compartments and two transitions. Differently from SIS models, within the SIR framework infected individuals that are no longer infectious transition to the recovered compartment R. The system of differential equations corresponding to the SIR model is the following: The state diagram for the SIR model is represented in Fig. 13.

SEIR model
The SEIR model describes diseases where susceptible individuals S remain exposed E after interaction with infected I individual before becoming infectious themselves. It features four compartments and three transitions. The system of differential equations corresponding to the SEIR model is the following: The state diagram for the SEIR model is represented in Fig. 14.  The size of the Jacobian matrix is 4N × 4N , where N is the number of nodes in the graph. Every row has 4 elements of size N × N . We use the following notation: I is the identity matrix of size N × N and O is a matrix of size N × N with all zeros. A is the adjacency matrix of the network represented as a graph, of size N × N .
The first row is a linear combination of the other rows, thus: Let us represent the Jacobian matrix as follows: where Let v of size 3N × 1 and J be the eigenvector and the eigenvalue of J respectively. Then we can define v to be composed of v 1 of size N × 1 and v 2 of size 2N × 1: v and J satisfy the following equation: which results in: Eq. 23 implies that: From Eq. 25 we have: 1. � v 2 = � 0 , or 2. v 2 is the eigenvector of Q 3 and J is the eigenvalue of Q 3 .
We look at the first case into more detail: if v 2 = 0, from Eq. 24, we obtain that Q 1 · � v 1 = J · � v 1 . That means either: (a) � v 1 = 0 , which is not feasible, because in this case � v = � 0 , or (b) J is the eigenvalue of Q 1 . Thus, the eigenvalues of the Jacobian matrix can be represented as eigenvalues of matrix Q 1 (when v 2 = 0) and eigenvalues of matrix Q 3 . Given the structure of Q 1 (i.e., identity matrix of size N × N ), the eigenvalues of Q 1 are equal to 1 . Thus, we can conclude that the Jacobian matrix has at least one eigenvalue equal to 1. where A is the largest eigenvalue of the adjacency matrix, β and µ are probabilities of infection and recovery respectively.

Appendix 4 SIIDR stability as the system of NLDS
Proof System (11) can be reduced to the first three equations because of linear dependency of P R,i,t+1 on other equations, and has the following representation in the matrix form: where matrices C and P T t BP t of size 3N × 3N correspond to the linear and non-linear part of the system, respectively. P T = {P T 1 , P T 2 , P T 3 } is a 3N × 9N matrix, where P T i is a 3N × 3N matrix with non-zero i th row P S , P I , P I D : has the size of 3N × 3N . Based on our system representation (11) matrix C is the following: and matrix B is: where A is the adjacency matrix of the corresponding graph.
Let L be the continuous function equal to P T K , where K is the 3N × 1 matrix: Then L is positive definite because it is equal to the sum of probabilities of all nodes in the graph be infected or infected dormant. The finite difference (13) in this case is equal to: where: Chernikova et al. Applied Network Science (2023) 8:52 thus, which results in the condition: or where P I is the 1 × N vector of node probabilities to be infected, P S is the 1 × N vector of node probabilities to be susceptible, and A is the adjacency matrix of the corresponding graph. Expression (27) means that the sum of probabilities of nodes to recover should be greater than the sum of probabilities of nodes to become infected at each time step for the equilibrium points of the system (11) to be Lyapunov stable.
As long as the maximum value of probabilities in the vector P S is 1, it is true that: So if we prove that: the condition (27) will be satisfied.
This condition can also be formulated by incorporating the nodes' degrees as follows:

Model selection
We use the AIC as guiding criterion to compare SIIDR to other epidemiological models, namely SI, SIS, SIR. The AIC is calculated based on the number of free parameters k and the maximum likelihood estimate of the model L as follows: The first term introduces a penalty that increases with the number of parameters and thus discourages overfitting. The second term rewards the goodness of fit that is assessed by the likelihood function. For the likelihood function, we use the least squares estimation. The best model is the one with the lowest AIC. In the case of the least squares estimation, the AIC can be expressed as: where: and ǫ i are the estimated residuals: with I sim t being the cumulative number of infected nodes from model simulations, and I real t the cumulative number of infected nodes from real-world observations, at time interval t.
We use stochastic simulations (Higham 2001) to obtain a numerical approximation of the propagation process described by the system of ODEs. Generally, statistical methods such as stochastic simulations are a good approximation for larger systems, while in the case of smaller systems stochastic fluctuations become more important. The transitions among compartments are implemented through chain binomial processes (Abbey 1952). At step t the number of entities in compartment X transiting to compartment Y is sampled from a binomial distribution Pr Bin (X(t), p X→Y (t)) , where p X→Y (t) is the transition probability. If multiple transitions can happen from X (e.g., X → Y , X → Z ), a multinomial distribution is used (e.g., Pr Mult (X(t), p X→Y (t), p X→Z (t))).
The model selection methodology is summarized in Algorithm 1. We start by creating a uniform grid of possible parameter values (lines 2-5). For each model and each set of parameter values p = (β, µ, γ 1 , γ 2 ) we perform several stochastic experiments simulating the model dynamics (the run_stochastic_avg procedure). Each stochastic realization consists of a time series, where S(t), I(t), I D (t), R(t) represent the number of nodes in each state at time interval t during the simulation. The cumulative infection I sim consists of the total number of nodes in states I, I D , and R, and is also a time series across all time intervals dt. Next, we compute the AIC using equation (38) by comparing the simulated to the actual dynamic. We select the minimum AIC score for each model; the best model is the one with the minimum AIC score overall. return M 17: end procedure SIIDR Parameters associated with the best AIC score In Table 7 we show the SIIDR parameters associated with the minimum AIC score for all WC variants.

Posterior distribution of transition rates
To find the best set of parameters for the SIIDR model we can approximate the posterior distribution of the parameters using Approximate Bayesian Computation (ABC) techniques (Minter and Retkute 2019). These techniques are based on the Bayes rule for determining the posterior distribution of parameters given the data: where P(θ) is the prior distribution of parameters that represents our belief about them and P(D|θ) is the likelihood function, i.e., the probability density function of the data given the parameters. Marginal likelihood of the data P(D) does not depend on θ , and therefore the posterior distribution P(θ |D) is proportional to the numerator in (40). ABC methods are useful when the likelihood function is unknown or is not feasible to estimate analytically. The simplest version of ABC techniques is called rejection algorithm and is illustrated in Algorithm 2. Despite it simplicity, the rejection algorithm is generally slow at converging. Indeed, each iteration is independent from the previous ones and the prior distribution from which parameters are sampled is never updated. Furthermore, it is often difficult to decide, a priori, a reasonable threshold value ǫ that guarantees both fast convergence and accurate results.
In alternative to the rejection algorithm, we use here a more advanced ABC technique that leverages Sequential Monte Carlo (ABC-SMC) (Toni et al. 2009;McKinley et al. 2018). The ABC-SMC approach iteratively constructs generations of prior distributions by decreasing the rejection threshold over time. At the first generation, a given number of parameter sets (i.e., particles) is accepted from the starting prior distribution, while each prior distribution used in following generations is obtained as a weighted sample from the previous generation θ * perturbed through a kernel K (θ |θ * ) . Common choices for the kernel are the uniform and multivariate normal distributions. A kernel with a large variance will prevent the algorithm from being stuck in the local modes, but will result in a huge number of particles being rejected, which is inefficient. Therefore, we use the multivariate normal distribution, where the covariance matrix is calculated considering M nearest neighbors (MNN) of the particles from the previous generation (Filippi et al. 2013). The ABC-SMC-MNN algorithm is illustrated in Algorithm 3.