 Research
 Open Access
 Published:
Mobile phone data reveals the importance of predisaster intercity social ties for recovery after Hurricane Maria
Applied Network Science volume 4, Article number: 98 (2019)
Abstract
Recent disasters have shown the existence of large variance in recovery trajectories across cities that have experienced similar damage levels. Case studies of such events reveal the high complexity of the recovery process of cities, where intercity dependencies and intracity coupling of social and physical systems may affect the outcomes in unforeseen ways. Despite the large implications of understanding the recovery processes of cities after disasters for many domains including critical services, disaster management, and public health, little work have been performed to unravel this complexity. Rather, works are limited to analyzing and modeling cities as independent entities, and have largely neglected the effect that intercity connectivity may have on the recovery of each city. Large scale mobility data (e.g. mobile phone data, social media data) have enabled us to observe human mobility patterns within and across cities with high spatial and temporal granularity. In this paper, we investigate how intercity dependencies in both physical as well as social forms contribute to the recovery performances of cities after disasters, through a case study of the population recovery patterns of 78 Puerto Rican counties after Hurricane Maria using mobile phone location data. Various network metrics are used to quantify the types of intercity dependencies that play an important role for effective postdisaster recovery. We find that intercity social connectivity, which is measured by predisaster mobility patterns, is crucial for quicker recovery after Hurricane Maria. More specifically, counties that had more influx and outflux of people prior to the hurricane, were able to recover faster. Our findings highlight the importance of fostering the social connectivity between cities to prepare effectively for future disasters. This paper introduces a new perspective in the community resilience literature, where we take into account the intercity dependencies in the recovery process rather than analyzing each community as independent entities.
Introduction
Following the recent series of natural hazards with unprecedented severity and magnitude, including Hurricanes Harvey, Irma and Maria, the concept of urban resilience has gained significant global attention (Kull et al. 2016; Gitay et al. 2013). For many cities, it is of utmost importance to minimize economic loss and maintain the wellbeing of their citizens (Eakin et al. 2017; Kousky 2014). Recent disasters have shown the existence of large variance and disparities in recovery trajectories across communities that have experienced similar damage levels (Finch et al. 2010; Aldrich 2012). We have witnessed manifold cases where cities experience significant loss of population even after sufficient recovery of infrastructure systems (Myers et al. 2008; McCaughey et al. 2018). These case studies reveal the high complexity of the recovery of cities, where both the intercity dependencies and intracity coupling of social and physical systems may affect the recovery outcomes in an unforeseen and nonlinear manner (Klammler et al. 2018; Aerts et al. 2018).
Post disaster population displacement and recovery patterns on the city scale have long been studied based on surveys and census data (McCaughey et al. 2018; Cutter et al. 2008; Gray and Mueller 2012; Finch et al. 2010; Fussell et al. 2014; DeWaard et al. 2016; Sadri et al. 2018; Piguet et al. 2011). Such studies reveal factors that are associated with displacement and recovery; however, they often neglect the effect of interdependencies between cities. With the increase in the availability of large mobility datasets including mobile phone call detail records, Global Positioning System (GPS) logs, and social media posts, it has become possible to observe individual mobility patterns at an unprecedented high spatiotemporal granularity (Deville et al. 2014; Jiang et al. 2016; Wardrop et al. 2018; Gonzalez et al. 2008; Schneider et al. 2013). In the context of disasters, several studies have focused on analyzing human mobility during and after earthquakes (Lu et al. 2012; Song et al. 2014; Yabe et al. 2016), hurricanes (Wang and Taylor 2014; 2016) and other anomalous events (Bagrow et al. 2011). Most recently, cross comparative analysis of population recovery using large scale mobile phone GPS datasets from multiple disasters have unraveled general patterns of population recovery (Yabe et al. 2019). Moreover, the heterogeneity in initial and longterm displacement rates across various communities were explained by a set of key universal factors including the community’s median income level, population size, housing damage rate, and the connectivity to other regions. Despite such progress, the importance of intercity dependencies on recovery is understudied in the literature. The aforementioned work (Yabe et al. 2019) only focused on the proximity of regions to other regions on the road network. The proximity of region i to other regions was defined as \(d_{p}(i)= \frac {\sum _{j\in S(i)} N_{j}}{N_{i}}\), where N_{i} denotes the number of households in region i, and S(i) denotes the set of regions that can be reached within 1 hour by vehicles from region i. This metric fails to capture the social dimension of the connectivity between cities.
In this paper, we investigate how intercity dependencies in both physical as well as social forms contribute to the recovery of cities after disasters. We investigate this problem through a case study of the population recovery patterns of 78 Puerto Rican counties after Hurricane Maria. We analyze mobile phone location datasets, which include the GPS location data of more than 50,000 unique users from over 6 months before and after the Hurricane. Various metrics from the spatial networks literature are used to quantify the nodelevel characteristics of Puerto Rican counties on social and physical intercity networks to understand the types of intercity dependencies that play an important role for effective postdisaster recovery. We find that intercity social connectivity, which is measured using predisaster mobility patterns, is crucial for quicker recovery after Hurricane Maria. More specifically, counties that had more influx and outflux of people prior to the hurricane were able to recover faster. Our findings highlight the importance of fostering the social connectivity between cities as well as strengthening the physical infrastructure, to prepare effectively for future disasters. This paper introduces a new perspective in the community resilience literature, where we take into account the intercity dependencies in the recovery process rather than analyzing each community as independent entities.
The remainder of the paper is organized as follows. First, we describe the datasets (mobile phone data, socioeconomic data, and disaster damage data) used in this study. Next, we introduce our methods for 1) estimating the time until recovery using mobile phone data, 2) computing the node level network statistics that reflect the physical and social connectivity, and 3) spatial regression analysis. Then, we present the empirical results using data collected from Puerto Rico after Hurricane Maria in 2017. Finally, the results are discussed and the study is concluded in the final sections.
Data
Mobile phone data
Mobile phone location data were collected by Safegraph Inc.^{Footnote 1} during Hurricane Maria. GPS data were obtained from mobile phones of individuals who agreed to provide their location data for research purposes, and all information were anonymized to protect the identity and security of users. Each observation includes the spatial and temporal information (time, longitude, latitude) of mobile phones as well as a hashed unique identifier (ID), as shown in Table 1. The example ID is masked to protect privacy issues.
Figure 1 shows the probability density of the number of observations per day for a user in the location dataset. As shown in panel A), the probability density of the number of GPS logs per user follows a power law distribution. The average number of logs per user was 83 and the standard deviation was 262. For our analysis of population recovery on the county scale, it is important to account for the biases in population samples. To ensure that the data are not spatially biased, we plot the number of samples and sample rate (= observed IDs / population) proportional to census population in Fig. 1 for Puerto Rico. We can see that the number of mobile phone samples are highly correlated with census population for each county, with Pearson’s correlation coefficient of 0.985 Fig. 1b. Moreover, Fig. 1c shows that the sample rates have low correlations with census population. Thus, it is shown that the mobile phone data have little to no spatial bias in sample distributions.
However, after the Hurricane, the spatial bias in mobile phone user samples increases due to various reasons, including damages to mobile phone towers cutting off mobile phone users from the mobile network, and power outages causing people to not be able to recharge their phones. As explained more in depth in the “Methods” section, we overcome this issue by using data from only the temporal periods where the spatial bias of mobile phone user sampling rate was minimal.
Sociodemographic data
In this study, population and income data of each county were used for regression analysis, in accordance with the findings from previous studies (Yabe et al. 2019). Population data were obtained from the US National Census^{Footnote 2}, and median income data were obtained from the American Community Survey^{Footnote 3}.
Figure 2a and b show spatial plots of the population and income distribution of counties in Puerto Rico, respectively. We can observe that in Puerto Rico, the majority of the population is concentrated in the metro areas including San Juan County. The income disparity is less significant compared to the population disparity.
Disaster damage data
The tropical cyclone Hurricane Maria developed on September 16, 2017 in the Atlantic Ocean to the northeast of South America, and made landfall in southeastern Puerto Rico on September 20, 2017 with wind speeds of 155 miles per hour. Damage to Puerto Rico was severe and widespread following the Hurricane, with heavy rainfall, flooding, storm surge, and high winds causing considerable damage (Pasch et al. 2018). Various infrastructure systems were heavily damaged, causing power outages and water shortages for the entire island for months (United States Department of Energy and Restoration 2017). Fatalities as a consequence of Maria are still under investigation, however the most recent estimates suggest between 793 to 8,498 excess deaths occurred following the storm (Kishore et al. 2018). Total economic losses are estimated to be $ 92 billion.
Physical damage caused by the hurricane are measured by the housing damage rates in each county, which was provided through the “Housing Assistance Data” provided by the Federal Emergency Management Agency (FEMA). The raw data can be found through the link^{Footnote 4}. We defined “housing damage rate” for each county as the total number of houses that were inspected to have had more than $ 10,000 worth of damage due to the target hurricane, divided by the number of households in that county. Figure 2c shows the spatial distribution of the housing damages in Puerto Rico. They gray dotted lines show the trajectory of the hurricane, and we can observe higher housing damage rates near the trajectory. We can observe that many of the counties in Puerto Rico experienced high housing damage rates, between 20% and 60%.
Methods
Estimating population recovery from fragmented Mobile phone data
To estimate the recovery time of each county, the population displacement dynamics were estimated from mobile phone observations. First, the home locations of individuals were estimated from the mobile phone data observed prior to the hurricane. It is well known that human mobility trajectories show a high degree of temporal and spatial regularity, each individual having a significant probability to return to a few highly frequented locations, including his/her home location (Gonzalez et al. 2008). Due to this characteristic, it has been shown that home locations of individuals can be estimated with high accuracy by clustering the individual’s stay point locations over night (Calabrese et al. 2011). Home locations of each individual were estimated by applying meanshift clustering to the nighttime stay points (observed between 8PM and 6AM), weighted by the duration of stays in each location (Ashbrook and Starner 2003; Kanasugi et al. 2013). Mean shift clustering was implemented using the scikitlearn package on Python. An individual was detected to be displaced if the individual is estimated to be staying in a location outside the city of his/her estimated home location. Such home location estimates were used to check the representativeness of mobile phone user samples in Fig. 1.
Second, using the longitudinal observations during and after the hurricane, the displacement rate of each county was estimated. The nighttime stay points of each individual were estimated from their mobile phone location data observed during night time, using meanshift clustering of mobile phone observation points weighted by the length of stay at each point. We spatially aggregated the number of people estimated to have stayed the night in each county to obtain the daily population count for each county. However, we found that there was a significant decrease in the number of observed users during and after Hurricane Maria. The number of mobile phone users observed before, during and after Hurricane Maria are shown in Fig. 3a. We observe a significant decrease in the number of users over time, which is assumed to be due to various reasons including: 1) no signals due to damage to mobile phone towers, 2) dormant users due to lack of power to charge their phones, and 3) users who uninstalled their app which collected location data. However, since our interest is in the displacement rate of the users in each county, the absolute number of observed mobile phone users is less important in this study. Rather, whether or not the sampling rate of mobile phone users (# of mobile phone users estimated to be residents of county i / # of census population of county i) is uniform across all counties is important for our analysis. The Pearson Correlation between each county’s census population and observed mobile phone users for each day, are plotted in Fig. 3b. The red plots indicate days where the correlation is within 1% confidence interval from the mean correlation after December 15th, where the sample rates are stable. The blue plots indicate days where the correlation is significantly low, where sample rates of mobile phone user samples are biased across counties. We observe that until October 7th, the Pearson Correlation is significantly lower than usual, indicating that there is spatial inequality in the sampling rates of mobile phone users. However, after October 7th, which is 16 days from the Hurricane, the spatial bias becomes minimal (high correlation), and shows that we are able to construct the population recovery dynamics without significant sampling bias. Thus, we use population displacement data observed after October 7th to estimate the recovery time to avoid the effects of power outages and mobile phone tower outages.
The observed population dynamics are fitted with an exponential function. We use the exponential model instead of the linear recovery function due to better fit to data (see “Appendix” section). We define “time until recovery”, denoted by \(\tilde {T}_{i}^{\alpha }\), as the timing of the first time the city’s population recovers to \(M_{i}^{\alpha } = M_{i}^{\infty } \cdot \alpha \), which is characterized by the long term normalized population after the disaster \(\left (M_{i}^{\infty }\right)\) and a constant parameter 0<α<1. In cases where the long term population exceeds the original population \(\left (M_{i}^{bef} = 1 < M_{i}^{\infty }\right)\), the amount of time after exceeding the original population should not be considered as time used for recover (rather, the population is growing after the disaster). In these cases, \(\tilde {T}_{i}^{\alpha }\) is defined as the timing of the first time the city’s population recovers to α, which is the population of predisaster level multiplied by α. In sum, time until recovery is defined as the following:
where min(x,y) returns the smaller value between x and y. α=0.95 is used as the threshold parameter in the further results, since time until recovery are not affected by the parameter value significantly (see “Appendix” section). For simple notations, we write \(\tilde {T}_{i}^{0.95}\) as T_{i} and call the variable “Time Until Recovery” in the following analysis. Figure 4 shows the estimated recovery times T_{i} of each county. We observe a general trend where the counties around San Juan recover quickly (yellow), and recovery spreads to coastal cities, then interior regions (red).
Nodelevel network statistics
Distance metrics for edge weights
To investigate the effect of various types of intercity dependencies on postdisaster recovery, we construct multiple networks based on various edge weights between nodes (78 counties in Puerto Rico). Table 2 lists the 5 distance metrics that were used as edge weights to build the intercity network \(\mathcal {N}\) in this study. Given a distance metric x, we denote the network built using distance metric x as \(\mathcal {N}_{x}\). The distance metrics (edge weights) are Euclidean distance e, travel time TT, road distance RD, mobility flow F, and the number of overnight stays S. Euclidean distance is an undirected metric, calculated by \(e_{ij} = \sqrt {(x_{i}x_{j})^{2} + (y_{j}y_{j})^{2}}\) given center points of two counties (x_{i},y_{i}) and (x_{j},y_{j}). Travel time and road distance between counties i and j were calculated using Google Maps API^{Footnote 5} in the usual conditions (prior to the disaster). Although these metrics are almost symmetrical, there are differences in travel times and fastest routes depending on the direction of travel, thus would produce directed networks. The two social distance metrics are mobility flow F and the number of overnight stays S, which are both observed using mobile phone data from prior to the disaster. Mobility flow F_{ij} is defined as the inverse of the average number of travelers from node i to node j in the usual state (prior to disaster). The number of overnight stays S_{ij} is similar to mobility flow, and is defined as the inverse of the average number of visitors from node i who stay overnight at node j in the usual state (prior to disaster).
Figure 5 visualizes the networks built using the 5 distance metrics. Node sizes are proportional to the predisaster population of each county prior to the disaster. All of these metrics can be defined for all origindestination pairs, however for clarity of the figures, we only show the edges with the 500 highest edge weights. The weight of each edge is shown as the width of each edge. For the direct networks (BE), sum of the weights on both directions are visualized. From visual inspection, we can see a large difference in the structure of networks built based on physical (AC) and social metrics (D,E). In particular, in D) and E), we can see that some cities on the coast have large direct weights between San Juan, which mean that despite the large physical distance, there are many people who travel and/or stay overnight between these cities.
Table 3 shows the elementwise Pearson correlation between edge weights based on different distance metrics. We can see that Euclidean distance \(\mathcal {N}_{e}\) has high correlations with road distance \(\mathcal {N}_{RD}\) and travel times \(\mathcal {N}_{TT}\), thus we exclude Euclidean distance from our analysis, and focus on networks constructed by the four remaining distance metrics.
Network statistics
Using the four different networks built from physical and social distance metrics defined above, we attempt to understand what type of network statistic on these networks can explain the time until recovery well in Puerto Rico. Identifying the most highly correlated network statistic and distance metric could provide insights into the underlying process that dictates the recovery of counties after disasters. In a similar manner as the distance metrics, we test various nodelevel statistics to compute the importance of each node in each network. Table 4 lists the nodelevel statistics that we will test on the four different networks \(\mathcal {N}_{x}\). The six network statistics for node i in network \(\mathcal {N}_{x}\) are: weighted in, out, and total degrees (\(WI(\mathcal {N}_{x})_{i}\), \(WO(\mathcal {N}_{x})_{i}\), and \(W(\mathcal {N}_{x})_{i}\), respectively), weighted clustering coefficient \(CC(\mathcal {N}_{x})_{i}\), direct distance from the source of recovery (San Juan in the case of Puerto Rico) \(DSJ(\mathcal {N}_{x})_{i}\), and shortest path distance from the source of recovery \(SPSJ(\mathcal {N}_{x})_{i}\).
Such network statistic measures were proposed in the literature of weighted directed and spatial networks to quantify the characteristics of each node (Barthélemy 2011; Barrat et al. 2005; Barrat et al. 2004; Opsahl et al. 2010; Fagiolo 2007). Such metrics are applied in various domains to assess the importance of nodes in weighted networks, including urban networks (Zhong et al. 2014), urban traffic networks (De Montis et al. 2007), airport networks (Li and Cai 2004), and metabolic processes (Almaas et al. 2004).
A weighted network \(\mathcal {N}_{x}\) can be described with a N×N adjacency matrix W, where w_{ij} denotes the (i.j)th element and the weight assigned to edge i to j. The weighted in, out, total degrees, weighted clustering coefficient, and weighted betweenness centrality of node i were calculated for each weighted network \(\mathcal {N}_{x}\). The weighted indegree of node i is calculated by \(WI(\mathcal {N}_{x})_{i} = \sum _{j \in \mathcal {V}(i)} w_{ji}\), where \(\mathcal {V}(i)\) is the set of nodes connected to node i. Similarly, the weighted outdegree of node i is calculated by \(WO(\mathcal {N}_{x})_{i} = \sum _{j \in \mathcal {V}(i)} w_{ij}\). The weighted total degree of node i is the sum of in and out degrees \(W(\mathcal {N}_{x})_{i} = WI(\mathcal {N}_{x})_{i} + WO(\mathcal {N}_{x})_{i}\). The weighted clustering coefficient measures the statistical level of cohesiveness around node i. In general, this value decays with respect to degrees, shown in both airplane network and authors network (Barrat et al. 2004). This is because low degree nodes are connected to highly connected communities, while large degree nodes are connected to many nodes that are not directly connected. It is computed by \(CC(\mathcal {N}_{x})_{i} = \frac {(\hat {W}+\hat {W}^{T})_{ii}^{3}}{2 T_{i}^{D}}\), where \(\hat {W}=\left \{w_{ij}^{\frac {1}{3}}\right \}\) and \(T_{i}^{D} = d_{i}^{tot}\left (d_{i}^{tot}1\right)2d_{i}^{\leftrightarrow }\), \(d_{i}^{\leftrightarrow } = W^{2}_{ii}\). The direct distance from San Juan of node i is the simply the distance metric between node i and San Juan. San Juan is chosen as the destination because it was the major source of recovery after Hurricane Maria in Puerto Rico. Similarly, we also measure the shortest path distance from San Juan on network \(\mathcal {N}_{x}\).
Thus, in summary, we construct networks based on four different distance metrics, and define six nodelevel statistics for each defined network. This gives us 24 different network metrics that each quantify the physical and/or social characteristics of each node from different aspects. In the next section, we test whether these network metrics can explain the variance in recovery speed across counties in Puerto Rico after Hurricane Maria.
Spatial regression models
First, ordinary least squares (OLS) method is used to estimate the parameters of the general regression model specified below.
where, y is an n×1 vector representing the objective variable (recovery time), x is an n×k matrix of the independent variables, and β is a k×1 vector of the coefficients. Here, the error term ε is assumed to be an i.i.d. normal. When there is spatial dependence in the error term, the i.i.d. normal assumption is violated. Two approaches are taken to deal with spatial correlation (Ward and Gleditsch 2018). First is the spatial lag model, where the error term is decomposed into a spatially lagged term for the dependent variable and an independent error term, ε=ρWy+e, where W is the matrix that reflects the spatial proximity between areas which is commonly defined by encoding the knearest neighbors. This gives us the spatial lag model, described by the following equation:
where ε∼N(0,σ^{2}I). The parameters can be estimated using maximum likelihood estimation.
The other approach is to assume that the error is spatially correlated, instead of the objective variables affecting the objective variables of neighboring areas. We can write the spatial error model as follows:
where ε∼N(0,σ^{2}I). In the following regression analysis, we first test the general regression model and also test for significance in spatial lag ρ and spatial error λ terms. Then, we apply the appropriate spatial regression analysis accordingly.
Results
Table 5 shows the statistics of the independent and objective variables. In the regression models, recovery time T_{i} is set as the objective variable. The variables in the second block (county population, median income, and housing damage rate) and one variable from the third block (network statistics) are used as independent variables for each of the regression models. The network statistic variables are normalized by the following equation:
where max(x) and min(x) are maximum and minimum values of variable x.
The Pearson correlations between recovery time and each independent variable are shown in the right column of Table 5. We observe that all sociodemographic variables (county population, median income, and housing damage rate) have moderate correlations with recovery time, as shown in past studies (Yabe et al. 2019). Among the network statistic variables, all of the nodelevel statistics of the mobility flow based network and overnight stay network had significant correlations with recovery time, indicating the significant effect of intercity social connectivity on postdisaster recovery. In contrast to social connectivity, nodelevel statistics computed from physical networks were less significantly correlated with recovery time in Puerto Rico. To examine the collinearity effect of these network statistics, we test the regression models using each nodelevel network statistics. Figure 6 shows the Akaike Information Criterion (AIC) and adjusted R^{2} of each regression model. We observe that the regression model using the weighted total degree of the mobility flow network (\(W(\mathcal {N}_{F})\)) has the lowest AIC and adjusted R^{2} value. Table 6 shows the detailed regression results of the two regression models with and without the \(W(\mathcal {N}_{F})\) variable. The estimated regression coefficients and their significance levels are shown in stars. Using the network statistic, both the AIC and adjusted R^{2} improve significantly, and we observe that the population variable becomes insignificant when considering the intercity network variable. Moreover, the results show that the network metric variable negatively affects recovery time, meaning that the more the influx and outflux mobility flow before the disaster, the quicker the recovery.
The spatial dependence of recovery time is tested using various metrics in Table 7. All metrics including the Moran’s I, the lag Lagrange multiplier ρ, and the error Lagrange multiplier λ are significant, showing significant spatial dependence. Robust tests of both Lagrange multipliers show that the error Lagrange multiplier λ is more significant. Thus, we test the Spatial Error Model and compare the results with the OLS Model in Table 8. Results show that the AIC is lower in the Spatial Error Model, indicating that spatial dependence in the error term explains the heterogeneity in recovery time across the counties in Puerto Rico. In both models, income levels and the network metric (\(W(\mathcal {N}_{F})\)) have significant effects on the recovery time. Housing damage rates, however, become insignificant under the spatial error model.
Discussions
Our analyses based on observational data from Puerto Rico after Hurricane Maria confirmed that intercity network metrics, namely the predisaster mobility flow, has a significant positive influence on the speed of recovery. The results imply that the more socially connected an area is to other counties, the more easier it is for people living in those communities to receive support and to recovery quickly. This paper introduces a new perspective in the community resilience literature, where we take into account the intercity dependencies in the recovery process rather than analyzing each community as independent entities. These insights encourage communities to prepare for future hazards by not only preparing its physical infrastructure (e.g. roads), but also by strengthening their social connectivity with other cities, to have greater chances of receiving support in case of emergencies.
Now, we discuss future research opportunities that this study enables. First, Puerto Rico is a unique case study because of its island geography. It is valuable to examine whether the same rules apply to other regions with different geographical characteristics. We will start collecting additional data from other disaster events to test the generalizability of our method between different disaster events. The Haiti Earthquake is an example where a large disaster struck a lowincome island region. Another example where we observe severe damage is the Tohoku Tsunami (Japan) in 2011, where the coastal cities of the east coast of the Tohoku region are still struggling to recovery from the disaster. Comparing the analysis presented in this study across different disaster instances would be an interesting topic for future research.
Second, modeling the underlying process of recovery was not in the scope of this study. This work was limited to testing the statistical significance of network metrics using econometric models. To better understand, predict and control the recovery of communities after disasters, there is a need to model the underlying process that dictates the population recovery. Developing agent based models and system dynamics models for predicting community recovery based on the insights obtained from this study will be the next steps in our research.
Thirdly, the reliability of the results in this study could improve if we could increase the diversity of datasets to quantify the social connectivity between counties. One candidate would be Twitter data, where we can use textmining to determine counties that have frequent contacts via messaging or retweeting. Also, call records of mobile phones would be a good data source to quantify social connections between two counties. In Puerto Rico, social connectivity measured by the mobility of people was shown to explain recovery times. It would be valuable to investigate whether the social measures used in this study would apply to other regions with different social norms, such as Japan or mainland US. Using more datasets to investigate such questions on the intercity dependencies in the recovery process would be the focus of future studies.
Conclusions
Using large scale mobile phone data collected from Puerto Rico, we revealed the importance of intercity social connectivity on disaster recovery after Hurricane Maria. More specifically, we showed that observing the mobility patterns between counties prior to the disaster can increase the predictability of time until recovery of communities. These insights highlight the importance of communities and policy makers to invest more into developing the social networks across counties or nearby cities through the interaction of people prior to the disaster to prepare for future disasters, as well as investing into the physical infrastructure networks.
Appendix
On the choice of the exponential recovery model
The normalized population data were fitted using two different functions to denoise the observations. We test two functions: 1) linear model and 2) exponential model.
The exponential model is defined as follows:
where, τ_{i} is the recovery speed of city i, \(M_{i}^{0}\) is the initial population rate of city i, and \(M_{i}^{\infty }\) is the long term population rate of city i. t=T_{0} denotes the timing of initial population rate. Figure 7a shows an illustration of the exponential model.
The alternative is the linear model, defined as the following:
where, β_{i} is the coefficient of the linear recovery speed of city i, \(M_{i}^{0}\) is the initial population rate of city i, and \(M_{i}^{\infty }\) is the long term population rate of city i. t=T_{c} denotes the timing where the linear recovery line reaches \(M_{i}^{\infty }\). Figure 7b shows an illustration of the linear model.
We chose the model by testing which model is able to fit the empirical data obtain from Puerto Rico after Hurricane Maria. Parameters (τ_{i}, β_{i} and \(M_{i}^{\infty }\)) of the models were fitted using least squares method. \(M_{i}^{0}\), which is the initial population rate, is defined as \(M_{i}^{0} = \min _{t} M_{i}(t)\). Squared error (SE) and Pearson’s correlation coefficient ρ between the observed and fitted time series were used as evaluation metrics. SE is defined as:
where M(t) and \(\hat {M(t)}\) are observed and fitted values of the time series, and T is the total number of time steps. Pearson’s correlation coefficient ρ is defined as:
where Cov(x,y) is the covariance between x and y, σ_{x} is the standard deviation of x.
Figure 8 shows the histograms of the two metrics for the two models, to evaluate the goodness of fit of the two models. We can observe that the exponential model has lower SE and higher correlation, meaning that it is able to better approximate the observations. Thus, in our further analysis, we use the results obtained by fitting the observations to the exponential model.
On the robustness against parameter α for recovery time estimation
As shown in Fig. 9, Pearson’s correlation coefficients between \(\tilde {T}_{i}^{0.80}\), \(\tilde {T}_{i}^{0.85}\), \(\tilde {T}_{i}^{0.90}\), \(\tilde {T}_{i}^{0.95}\) are all very high (ρ>0.96). This implies that the \(\tilde {T}_{i}^{\alpha }\) values do not depend highly on the values of α. Thus, we use α=0.95 as a parameter for our further analysis. For notational simplicity, we write \(\tilde {T}_{i}^{0.95}\) as \(\tilde {T}_{i}\) and call the variable “time until recovery” in the analysis.
Availability of data and materials
Mobile phone data that support the findings of this study are available from Safegraph Inc. but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Safegraph Inc. Other collected data are available from official sources that are openly accessible. Computer codes used to process and analyze the data are posted on the author’s github page (https://github.com/takayabe0505).
Change history
09 January 2020
Following publication of the original article [1], the author reported to remove the name from the acknowledgements in the original article.
Abbreviations
 GPS:

Global positioning system
 ID:

Identifier
 OLS:

Ordinary least squares
 SE:

Squared error
References
Aerts, J, Botzen W, Clarke K, Cutter S, Hall JW, Merz B, MichelKerjan E, Mysiak J, Surminski S, Kunreuther H (2018) Integrating human behaviour dynamics into flood disaster risk assessment. Nat Clim Chang 8(3):193.
Aldrich, DP (2012) Building Resilience: Social Capital in Postdisaster Recovery. University of Chicago Press, Chicago.
Almaas, E, Kovacs B, Vicsek T, Oltvai Z, Barabási AL (2004) Global organization of metabolic fluxes in the bacterium escherichia coli. Nature 427(6977):839.
Ashbrook, D, Starner T (2003) Using gps to learn significant locations and predict movement across multiple users. Pers Ubiquit Comput 7(5):275–286.
Bagrow, JP, Wang D, Barabasi AL (2011) Collective response of human populations to largescale emergencies. PLoS ONE 6(3):17680.
Barrat, A, Barthelemy M, PastorSatorras R, Vespignani A (2004) The architecture of complex weighted networks. Proc Nat Acad Sci 101(11):3747–3752.
Barrat, A, Barthélemy M, Vespignani A (2005) The effects of spatial constraints on the evolution of weighted complex networks. J Stat Mech: Theory Exper 2005(05):05003.
Barthélemy, M (2011) Spatial networks. Phys Rep 499(13):1–101.
Calabrese, F, Di Lorenzo G, Liu L, Ratti C (2011) Estimating origindestination flows using opportunistically collected mobile phone location data from one million users in boston metropolitan area. IEEE Perv Comput 10(4):36–44.
Cutter, SL, Barnes L, Berry M, Burton C, Evans E, Tate E, Webb J (2008) A placebased model for understanding community resilience to natural disasters. Glob Environ Change 18(4):598–606.
De Montis, A, Barthélemy M, Chessa A, Vespignani A (2007) The structure of interurban traffic: a weighted network analysis. Environ Plan B: Plan Des 34(5):905–924.
Deville, P, Linard C, Martin S, Gilbert M, Stevens FR, Gaughan AE, Blondel VD, Tatem AJ (2014) Dynamic population mapping using mobile phone data. Proc Nat Acad Sci 111(45):15888–15893.
DeWaard, J, Curtis KJ, Fussell E (2016) Population recovery in new orleans after hurricane katrina: exploring the potential role of stage migration in migration systems. Popul Environ 37(4):449–463.
Eakin, H, BojórquezTapia LA, Janssen MA, Georgescu M, ManuelNavarrete D, Vivoni ER, Escalante AE, BaezaCastro A, MazariHiriart M, Lerner AM (2017) Opinion: urban resilience efforts must consider social and political forces. Proc Nat Acad Sci 114(2):186–189.
Fagiolo, G (2007) Clustering in complex directed networks. Phys Rev E 76(2):026107.
Finch, C, Emrich CT, Cutter SL (2010) Disaster disparities and differential recovery in new orleans. Popul Environ 31(4):179–202.
Fussell, E, Curtis KJ, DeWaard J (2014) Recovery migration to the city of new orleans after hurricane katrina: a migration systems approach. Popul Environ 35(3):305–322.
Gitay, H, Bettencourt S, Kull D, Reid R, McCall K, Simpson A, Krausing J, Ambrosi P, Arnold M, Arsovski T, et al. (2013) Building resilience: Integrating climate and disaster risk into developmentlessons from world bank group experience. The World Bank Experience.
Gonzalez, MC, Hidalgo CA, Barabasi AL (2008) Understanding individual human mobility patterns. Nature 453(7196):779.
Gray, CL, Mueller V (2012) Natural disasters and population mobility in bangladesh. Proc Nat Acad Sci 109(16):6000–6005.
Jiang, S, Yang Y, Gupta S, Veneziano D, Athavale S, González MC (2016) The timegeo modeling framework for urban mobility without travel surveys. Proc Nat Acad Sci 113(37):5370–5378.
Kanasugi, H, Sekimoto Y, Kurokawa M, Watanabe T, Muramatsu S, Shibasaki R (2013) Spatiotemporal route estimation consistent with human mobility using cellular network data In: Pervasive Computing and Communications Workshops (PERCOM Workshops), 2013 IEEE International Conference On, 267–272.. IEEE, San Diego.
Kishore, N, Marqués D, Mahmud A, Kiang MV, Rodriguez I, Fuller A, Ebner P, Sorensen C, Racy F, Lemery J, et al. (2018) Mortality in puerto rico after hurricane maria. New Engl J Med 379(2):162–170.
Klammler, H, Rao P, Hatfield K (2018) Modeling dynamic resilience in coupled technologicalsocial systems subjected to stochastic disturbance regimes. Environ Syst Decis 38(1):140–159.
Kousky, C (2014) Informing climate adaptation: A review of the economic costs of natural disasters. Energy Econ 46:576–592.
Kull, D, Gitay H, Bettencourt S, Reid R, Simpson A, McCall K (2016) Building resilience: World bank group experience in climate and disaster resilient development In: Climate Change Adaptation Strategies–An Upstreamdownstream Perspective, 255–270.. Springer International Publishing, Switzerland.
Li, W, Cai X (2004) Statistical analysis of airport network of china. Phys Rev E 69(4):046106.
Lu, X, Bengtsson L, Holme P (2012) Predictability of population displacement after the 2010 haiti earthquake. Proc Nat Acad Sci 109(29):11576–11581.
McCaughey, JW, Daly P, Mundir I, Mahdi S, Patt A (2018) Socioeconomic consequences of postdisaster reconstruction in hazardexposed areas. Nat Sustain 1(1):38.
Myers, CA, Slack T, Singelmann J (2008) Social vulnerability and migration in the wake of disaster: the case of hurricanes katrina and rita. Popul Environ 29(6):271–291.
Opsahl, T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: Generalizing degree and shortest paths. Soc Netw 32(3):245–251.
Pasch, RJ, Penny AB, Berg R (2018) National hurricane center tropical cyclone report: Hurricane maria:1–48. https://www.nhc.noaa.gov/data/tcr/AL152017_Maria.pdf.
Piguet, E, Pécoud A, De Guchteneire P (2011) Migration and climate change: An overview. Refug Surv Quart 30(3):1–23.
Sadri, AM, Ukkusuri SV, Lee S, Clawson R, Aldrich D, Nelson MS, Seipel J, Kelly D (2018) The role of social capital, personal networks, and emergency responders in postdisaster recovery and resilience: a study of rural communities in indiana. Nat Hazards 90(3):1377–1406.
Schneider, CM, Belik V, Couronné T, Smoreda Z, González MC (2013) Unravelling daily human mobility motifs. J Royal Soc Int 10(84):20130246.
Song, X, Zhang Q, Sekimoto Y, Shibasaki R (2014) Prediction of human emergency behavior and their mobility following largescale disaster In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 5–14.. ACM, New York.
United States Department of Energy, IS, Restoration E (2017) Hurricanes Nate, Maria, Irma, and Harvey Situation Reports. https://www.energy.gov/ceser/downloads/hurricanesnatemariairmaandharveysituationreports. Accessed 17 Sept 2018.
Wang, Q, Taylor JE (2014) Quantifying human mobility perturbation and resilience in hurricane sandy. PLoS ONE 9(11):112608.
Wang, Q, Taylor JE (2016) Patterns and limitations of urban human mobility resilience under the influence of multiple types of natural disaster. PLoS ONE 11(1):0147299.
Ward, MD, Gleditsch KS (2018) Spatial Regression Models, vol. 155. Sage Publications, United States of America.
Wardrop, N, Jochem W, Bird T, Chamberlain H, Clarke D, Kerr D, Bengtsson L, Juran S, Seaman V, Tatem A (2018) Spatially disaggregated population estimates in the absence of national population and housing census data. Proc Nat Acad Sci 115(14):3529–3537.
Yabe, T, Tsubouchi K, Fujiwara N, Sekimoto Y, Ukkusuri SV (2019) Universal population recovery patterns after disasters. In Rev. arXiv preprint arXiv:1905.01804.
Yabe, T, Tsubouchi K, Sudo A, Sekimoto Y (2016) A framework for evacuation hotspot detection after large scale disasters using location data from smartphones: case study of kumamoto earthquake In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 44.. ACM, California.
Zhong, C, Arisona SM, Huang X, Batty M, Schmitt G (2014) Detecting the dynamics of urban structure through spatial network analysis. Int J Geograph Inf Sci 28(11):2178–2199.
Acknowledgements
We thank Mr. Noah Yonack of Safegraph Inc. for providing the mobile phone data.
Funding
T.Y. is partly supported by the Doctoral Fellowship provided by the Purdue Systems Collaboratory. The work of T.Y. and S. V. U. is partly funded by NSF Grant No. 1638311 CRISP Type 2/Collaborative Research: Critical Transitions in the Resilience and Recovery of Interdependent Social and Physical Networks.
Author information
Affiliations
Contributions
TY, SVU, and PSCR designed research; TY and SVU analyzed data; TY, SVU, and PSCR wrote the paper. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Yabe, T., Ukkusuri, S.V. & C. Rao, P.S. Mobile phone data reveals the importance of predisaster intercity social ties for recovery after Hurricane Maria. Appl Netw Sci 4, 98 (2019). https://doi.org/10.1007/s4110901902215
Received:
Accepted:
Published:
Keywords
 Disaster recovery
 Human mobility
 Mobile phone data
 Intercity networks
 Social networks