In the simplest scenario, the geographic distribution of the population affects the relative risk of an infection event occurring in a specific local area in two ways: (i) through the local population density, and (ii) through the connectedness of the local area to the rest of the territory. If we assume that the contact rate of any two individuals depends on the distance between their households, we can quantify the average connectedness of an area in terms of its geographical centrality. Thus, the chances that the contagion reaches a given area ultimately depend on the area's local density and centrality. To formalize this idea, we focus on the relative attack rate \(\alpha\) and on the first infection time \(\tau\), defined in (1) and (2), starting from two simple models: homogeneous mixing (HM) and distance-based mixing (DM).

### Homogeneous mixing

At this level of description, the geography of an outbreak of size *R* can be described by means of a vector of indices \(O=(j_0, j_1,..., j_{R-1})\) such that \(j_h=i\) if and only if the *h*th infection occurs in \(V_i\) (simultaneous infection events are sorted in random order). If we assume that the population is homogeneously mixed, all individuals have, on average, the same chances of contracting the disease. The probability that \(j_h=i\) is thus \(S_i(t_h)/S(t_h)\), i.e., the probability that, at the time \(t_h\) when the *h*th infection occurs, a susceptible individual chosen uniformly at random belongs to \(V_i\). Under homogeneous mixing, for fixed \(t_h\), the expected value of the ratio \(S_i(t_h)/S(t_h)\) can be approximated by the local population density \(\rho _i=N_i/N\).

Under these assumptions, and conditional on the outbreak size *R*, \(V_i\)’s epidemic size \(R_i\) follows a binomial distribution with parameters *R* and \(\rho _i\); hence, its expected value is \(\rho _i {\mathbb {E}[R]}\). This gives:

$$\begin{aligned} \alpha _i = \rho _i\frac{N}{N_i} = 1 . \end{aligned}$$

(3)
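The result in (3) can be checked numerically. The following is a minimal sketch, not the procedure used in this work: it assumes that the relative attack rate of a tile is the ratio \((R_i/N_i)/(R/N)\), consistent with (3), and it models homogeneous mixing by drawing the infected individuals uniformly at random from the whole population. The function name, tile sizes, outbreak size, and seed are hypothetical.

```python
import random

def simulate_relative_attack_rates(tile_sizes, outbreak_size, runs=20, seed=0):
    """Average relative attack rate per tile under homogeneous mixing.

    Each run infects `outbreak_size` individuals chosen uniformly at random
    (without replacement) from the whole population.
    """
    rng = random.Random(seed)
    total = sum(tile_sizes)
    # tile_of[u] is the index of the tile where individual u lives.
    tile_of = [i for i, size in enumerate(tile_sizes) for _ in range(size)]
    sums = [0.0] * len(tile_sizes)
    for _ in range(runs):
        counts = [0] * len(tile_sizes)
        for u in rng.sample(range(total), outbreak_size):
            counts[tile_of[u]] += 1
        for i, size in enumerate(tile_sizes):
            # Assumed definition: alpha_i = (R_i / N_i) / (R / N).
            sums[i] += (counts[i] / size) / (outbreak_size / total)
    return [s / runs for s in sums]

alphas = simulate_relative_attack_rates([5000, 3000, 2000], outbreak_size=2000)
print([round(a, 2) for a in alphas])
```

Averaged over runs, every tile should show a relative attack rate close to 1, regardless of its size.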

Let \(W_i\) be the number of infection events needed to observe the first infection in \(V_i\), i.e., \(W_i=\min \{h:j_h=i\}\). Approximating the probability that any given infection event falls in \(V_i\) by \(\rho _i\), \(W_i\) follows a geometric distribution, \(\mathbb {P}(W_i=h)=\rho _i(1-\rho _i)^{h-1}\), whose expected value is \(1/\rho _i\). In the exponential phase of the epidemic, the number of contagion events grows exponentially in time, so we can assume that the *h*th contagion event occurs at time \(t_h \propto \ln (h)\). Evaluating \(t_h\) at \(h=1/\rho _i\) yields the following estimate for the first infection time:

$$\begin{aligned} \tau _i\simeq - \ln (\rho _i) \cdot {\text {const.}} = -\ln \left( \frac{N_i}{N}\right) \cdot {\text {const.}} \end{aligned}$$

(4)
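As an illustration of (4), the sketch below computes, for each tile, the local density, the expected number of infection events before the tile is first reached, and the resulting first infection time estimate. The function name and the `const` parameter (standing in for the unspecified constant in (4)) are hypothetical, and all tiles are assumed to be non-empty.

```python
import math

def hm_first_infection_estimates(tile_sizes, const=1.0):
    """Return (rho_i, E[W_i], tau_i) for each tile under homogeneous mixing."""
    total = sum(tile_sizes)
    estimates = []
    for n_i in tile_sizes:
        rho = n_i / total              # local density rho_i = N_i / N
        expected_wait = 1.0 / rho      # mean of the geometric distribution
        tau = -const * math.log(rho)   # Eq. (4); equals const * ln(E[W_i])
        estimates.append((rho, expected_wait, tau))
    return estimates

estimates = hm_first_infection_estimates([100, 900])
for rho, wait, tau in estimates:
    print(f"rho={rho:.2f}  E[W]={wait:.2f}  tau={tau:.3f}")
```

Under this estimate, a sparsely populated tile is reached later than a dense one, with a delay that grows only logarithmically as the density decreases.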

### Connection flow to a sub-population

In the HM case, the local density \(\rho _i=N_i/N\) is a good approximation of the probability that an infection reaches the *i*th tile, and it is therefore the key ingredient in determining both \(\alpha _i\) and \(\tau _i\). When mixing is not homogeneous, the contact probability \(p_{u,v}\) varies across pairs of individuals. In many cases, however, it is possible to partition the population into sub-populations \(\{V_i\}_{i=1}^n\) and to assume that, to a good approximation, \(p_{u,v}\approx p_{i,j}\) for all \(u\in V_i\) and \(v\in V_j\), i.e., that \(p_{u,v}\) depends mostly on the sub-populations of *u* and *v*. A simple example is the geographic partitioning into sub-populations obtained by dividing a territory with a tiling. More generally, one can partition a population based on a combination of geographic and demographic attributes, such as location and age.

In this case, the probability that at least one individual of \(V_i\) comes into contact with an infectious individual is intuitively proportional to the quantity

$$\begin{aligned} k_i = \frac{N_i\sum _{j} N_j p_{i,j}}{N} = \frac{N_i}{N}\sum _{j} N_j p_{i,j} . \end{aligned}$$

(5)

\(k_i\) can be interpreted as a *per node connection flow* to the *i*th tile: a randomly chosen node \(u\in V\) has on average \(k_i\) daily contacts with the nodes \(v \in V_i\). In other words, the dynamic network \(G^t\) contains, on average, \(k_i\) edges joining *u* to nodes of \(V_i\). If we know that none of the individuals in \(V_i\) is infectious, a better estimate is provided by

$$\begin{aligned} {\bar{k}}_i = \frac{N_i}{N}\sum _{j\ne i} N_j p_{i,j} , \end{aligned}$$

(6)

which differs from \(k_i\) only in that the sum excludes the members of \(V_i\) itself.
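The two connection flows in (5) and (6) can be computed directly from the tile populations and a tile-level contact probability matrix. The sketch below is illustrative only: the function name and the numerical values of the populations and of \(p_{i,j}\) are hypothetical.

```python
def connection_flows(tile_sizes, contact_prob):
    """Return (k, k_bar) following Eqs. (5) and (6).

    tile_sizes[i] is N_i and contact_prob[i][j] is p_{i,j}.
    """
    total = sum(tile_sizes)
    n = len(tile_sizes)
    k, k_bar = [], []
    for i in range(n):
        density = tile_sizes[i] / total
        # Eq. (5): the sum runs over all tiles, including tile i itself.
        k.append(density * sum(
            tile_sizes[j] * contact_prob[i][j] for j in range(n)))
        # Eq. (6): the sum skips tile i.
        k_bar.append(density * sum(
            tile_sizes[j] * contact_prob[i][j] for j in range(n) if j != i))
    return k, k_bar

sizes = [100, 300]
p = [[0.02, 0.001],
     [0.001, 0.02]]
k, k_bar = connection_flows(sizes, p)
print(k, k_bar)
```

In this two-tile example with symmetric \(p_{i,j}\), both tiles have the same \({\bar{k}}_i\), since each equals \(N_1 N_2 p_{1,2}/N\), while \(k_i\) is larger for the more populous tile.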

The same rationale used for the HM case now brings us to the following general estimate for \(\alpha _i\) in a partitioned population:

$$\begin{aligned} \alpha _i \simeq k_i\frac{N}{N_i} \cdot {\text {const.}} = \sum _{j} N_j p_{i,j} \cdot {\text {const.}} \end{aligned}$$

(7)

Along the same line, if the infection reaches \(V_i\) during the phase of exponential growth of the epidemic, \(\tau _i\) can be estimated as:

$$\begin{aligned} \tau _i\simeq - \ln ({\bar{k}}_i) \cdot {\text {const.}} = - \ln \left( \frac{N_i}{N}\sum _{j\ne i} N_j p_{i,j}\right) \cdot {\text {const.}} \end{aligned}$$

(8)
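A minimal sketch of how (7) and (8) could be evaluated for a small partitioned population follows. The function name, the two constants (`const_alpha` and `const_tau`, which stand in for the unspecified constants of (7) and (8)), and the example values are hypothetical. The sketch assumes that every tile has at least one contact partner outside itself, so that the logarithm is defined.

```python
import math

def partition_estimates(tile_sizes, contact_prob, const_alpha=1.0, const_tau=1.0):
    """Return (alpha, tau) estimates per tile, following Eqs. (7) and (8)."""
    total = sum(tile_sizes)
    n = len(tile_sizes)
    alpha, tau = [], []
    for i in range(n):
        # Eq. (7): alpha_i ~ sum_j N_j p_ij (the N_i / N factor of k_i cancels).
        alpha.append(const_alpha * sum(
            tile_sizes[j] * contact_prob[i][j] for j in range(n)))
        # Eq. (8): tau_i ~ -ln(k_bar_i), with the sum restricted to j != i.
        k_bar = tile_sizes[i] / total * sum(
            tile_sizes[j] * contact_prob[i][j] for j in range(n) if j != i)
        tau.append(-const_tau * math.log(k_bar))
    return alpha, tau

sizes = [200, 100, 50]
p = [[0.02, 0.002, 0.001],
     [0.002, 0.02, 0.002],
     [0.001, 0.002, 0.02]]
alpha, tau = partition_estimates(sizes, p)
print(alpha, tau)
```

Because both estimates are defined only up to a constant, the values are meaningful mainly for comparing tiles with one another.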

### Distance-based mixing

Let \(d_{i,j}\) be the normalized geographic distance between the centers of tiles \(V_i\) and \(V_j\). The normalization is obtained by dividing the distance by \(\frac{l}{2}\), where *l* is the tile side length. Additionally, we impose \(d_{i,i}=1\), so that individuals in the same tile are at half the distance of individuals living in adjacent tiles (for which \(d_{i,j}=2\)). Finally, we set \(d_{u,v}=d_{i,j}\) for all \(u\in V_i\) and \(v\in V_j\). This approximation is often inevitable, at least for small values of *l*, because population density data are generally discrete and we may only have access to the residence tile of a vertex.
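The construction of the normalized distances can be sketched as follows. This example assumes planar coordinates for the tile centers and Euclidean distance between them; the source only specifies a geographic distance, so both are assumptions, as are the function name and the coordinates used.

```python
import math

def normalized_distances(centers, tile_side):
    """Return the matrix d[i][j] of distances in units of half a tile side.

    centers is a list of (x, y) tile centers, assumed planar, and the
    distance between centers is assumed Euclidean.
    """
    half_side = tile_side / 2.0
    n = len(centers)
    d = [[1.0] * n for _ in range(n)]  # d[i][i] = 1 by convention
    for i in range(n):
        for j in range(n):
            if i != j:
                (xi, yi), (xj, yj) = centers[i], centers[j]
                d[i][j] = math.hypot(xi - xj, yi - yj) / half_side
    return d

centers = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5)]
d = normalized_distances(centers, tile_side=1.0)
print(d)
```

With unit tiles, two tiles sharing a side are at normalized distance 2, and two tiles touching at a corner are at \(2\sqrt{2}\).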

In line with previous empirical findings (Liben-Nowell et al. 2005; Illenberger et al. 2013; Kowald et al. 2013; Herrera-Yagüe et al. 2015; Büchel and Ehrlich 2020), to model distance-based mixing (DM) we assume that the probability that \(u\in V_i\) and \(v\in V_j\) come into contact decays as a power of the distance: \(p_{u,v}=p_{i,j}\propto d_{i,j}^{-\gamma }\) for \(\gamma >0\). Substituting \(p_{i,j} \propto d_{i,j}^{-\gamma }\) in (5) and (6) directly gives

$$\begin{aligned} k_i&\propto \frac{N_i}{N}\sum _{j} N_j d_{i,j}^{-\gamma } , \end{aligned}$$

(9)

$$\begin{aligned} {\bar{k}}_i&\propto \frac{N_i}{N}\sum _{j\ne i} N_j d_{i,j}^{-\gamma }. \end{aligned}$$

(10)

The right-hand sides of (9) and (10) are, in fact, measures of geographic centrality. This leads us to introduce the following metrics:

$$\begin{aligned}&\text {HC}^{\gamma }(v) = \frac{1}{N}\sum _{u\in V} d_{u,v}^{-\gamma } , \end{aligned}$$

(11)

$$\begin{aligned}&\text {HC}^{\gamma }_i = \text {HC}^{\gamma }(V_i) = \frac{N_i}{N}\sum _{j=1}^n N_j d_{i,j}^{-\gamma } , \end{aligned}$$

(12)

$$\begin{aligned}&{ \bar{\text {HC}}^{\gamma }(v) = \frac{1}{N}\sum _{u\in V\setminus V_i} d_{u,v}^{-\gamma } \quad {\text {for }} v \in V_i}, \end{aligned}$$

(13)

$$\begin{aligned}&\bar{\text {HC}}^{\gamma }_i = \bar{\text {HC}}^{\gamma }(V_i) = \frac{N_i}{N}\sum _{\begin{array}{c} j=1\\ j\ne i \end{array}}^n N_j d_{i,j}^{-\gamma } = \text {HC}^{\gamma }_i -\frac{N_i^2}{N}d_{i,i}^{-\gamma } . \end{aligned}$$

(14)

\(\text {HC}^{\gamma }(v)\) and \(\text {HC}^{\gamma }_i\) are the *harmonic geographic centrality* of order \(\gamma\) of vertex *v* and of tile \(V_i\), respectively. \(\bar{\text {HC}}^{\gamma }(v)\) and \(\bar{\text {HC}}^{\gamma }_i\) are the *outward* versions of \(\text {HC}^{\gamma }(v)\) and \(\text {HC}^{\gamma }_i\) that only measure the centrality of the individuals \(v\in V_i\) with respect to the population that does not belong to \(V_i\).
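The tile-level centralities can be computed from the tile populations and the normalized distance matrix. The sketch below is a hypothetical illustration: the function name, the three-tile row layout, and the value of \(\gamma\) are assumptions chosen for the example.

```python
def harmonic_centralities(tile_sizes, d, gamma):
    """Return (hc, hc_bar): tile-level harmonic geographic centrality of
    order gamma and its outward version, following Eqs. (12) and (14)."""
    total = sum(tile_sizes)
    n = len(tile_sizes)
    hc, hc_bar = [], []
    for i in range(n):
        density = tile_sizes[i] / total
        # Population of each tile j, discounted by its distance from tile i.
        weights = [tile_sizes[j] * d[i][j] ** (-gamma) for j in range(n)]
        hc.append(density * sum(weights))
        hc_bar.append(density * sum(w for j, w in enumerate(weights) if j != i))
    return hc, hc_bar

# Three equally populated tiles in a row; adjacent tiles are at distance 2.
sizes = [100, 100, 100]
d = [[1.0, 2.0, 4.0],
     [2.0, 1.0, 2.0],
     [4.0, 2.0, 1.0]]
hc, hc_bar = harmonic_centralities(sizes, d, gamma=1.0)
print(hc, hc_bar)
```

In this layout the middle tile has the highest centrality in both versions, because it is closest on average to the rest of the population.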

Substituting \(k_i{\propto }\text {HC}^{\gamma }_i\) and \({\bar{k}}_i{\propto }\bar{\text {HC}}^{\gamma }_i\) in (7) and (8), respectively, we obtain the following estimates for the relative attack rate and the first infection time under the DM model:

$$\begin{aligned}&\alpha _i \simeq \frac{N}{N_i}\text {HC}^{\gamma }_i \cdot {\text {const.}} ,\end{aligned}$$

(15)

$$\begin{aligned} & \tau _{i} \simeq - \ln (\bar{\text {HC}}^{\gamma }_{i}) \cdot {\text {const.}} \end{aligned}$$

(16)
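To illustrate (15) and (16), the sketch below evaluates both estimates on a hypothetical row of five equally populated tiles. The layout, the value of \(\gamma\), the function name, and the two constants are assumptions made for the example. Since the estimates are defined only up to unspecified constants, only the comparison across tiles is meaningful, not the absolute values.

```python
import math

def dm_estimates(tile_sizes, d, gamma, const_alpha=1.0, const_tau=1.0):
    """Return (alpha, tau) per tile under distance-based mixing,
    following Eqs. (15) and (16)."""
    total = sum(tile_sizes)
    n = len(tile_sizes)
    alpha, tau = [], []
    for i in range(n):
        weights = [tile_sizes[j] * d[i][j] ** (-gamma) for j in range(n)]
        # Eq. (15): (N / N_i) * HC_i reduces to sum_j N_j d_ij^(-gamma).
        alpha.append(const_alpha * sum(weights))
        # Eq. (16): outward centrality, which skips tile i itself.
        hc_bar = tile_sizes[i] / total * sum(
            w for j, w in enumerate(weights) if j != i)
        tau.append(-const_tau * math.log(hc_bar))
    return alpha, tau

# Five tiles in a row: d_ii = 1, and tiles k steps apart are at distance 2k.
n_tiles = 5
sizes = [100] * n_tiles
d = [[1.0 if i == j else 2.0 * abs(i - j) for j in range(n_tiles)]
     for i in range(n_tiles)]
alpha, tau = dm_estimates(sizes, d, gamma=2.0)
for i in range(n_tiles):
    print(i, round(alpha[i], 2), round(tau[i], 3))
```

With a uniform population, the differences across tiles come from centrality alone: the central tile has the largest \(\alpha _i\) and the smallest \(\tau _i\), while the two end tiles have the smallest \(\alpha _i\) and the largest \(\tau _i\).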