
On strategies to help reduce contamination on public transit: a multilayer network approach


In times of a global pandemic, public transit can be crucial to spreading viruses, especially in big cities. Many works have shown that the human infection risk could be extremely high due to the length of exposure time, transmission routes, and structural characteristics during public transportation, and this can result in the rapid spread of the infection. Vaccines are often part of strategies to reduce contagion; however, they can be scarce in pandemic settings. Considering real-world and large-scale traffic data, this work proposes using time-varying multilayer networks to identify the main critical places to be prioritized in interventions, such as vaccination campaigns, to help reduce contagion on public transit. We exemplify our strategy in different vaccination scenarios. First, when considering only critical bus stops as priority vaccination points, determined by our approach, we indicate that focusing on vaccination in these locations reduces the spread of infection using fewer doses than a random vaccination. In another experiment, we demonstrate the flexibility of our approach in identifying other critical points of interest, healthcare units in this case. Vaccination in these vital health units could also be a viable strategy to curb contagion using a predetermined number of doses. The approach proposed in this study is not limited to vaccination strategies. It also applies to other problems that share similar properties, even in several different contexts, such as optimization in public transit or exploring different points of interest to gather insights from other issues of interest.


Severe acute respiratory syndrome (SARS) and Influenza A (H1N1) have been spreading rapidly worldwide, posing a huge threat to public health. Public transport vehicles are confined spaces that are conducive to the human-to-human transmission of infectious diseases. Consequently, several countries have reported many clusters of cases in public transport vehicles with infections caused by respiratory viruses, including Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) (Shen et al. 2020). Such conditions impose severe financial and public health burdens on our society (Zhu et al. 2012). We currently live amid the SARS-CoV-2 pandemic, which causes the disease COVID-19 and which, by January 2023, had already affected more than 700 million people worldwide, leading to the death of more than 6 million people (World Health Organization - WHO 2023).

In this sense, studying and understanding the transmission of infectious diseases spread through the air, especially in environments with a high density of occupants, such as public transit systems (Zhu et al. 2012; Sun et al. 2014), can help government officials to develop ways to minimize the effects of these transmissions.

The public transit system can be examined as a complex network system (Chodrow et al. 2016; Boccaletti et al. 2006; Estrada 2011), which is an efficient way to model and explain the mechanisms of collective behavior (Newman 2018). From this perspective, the most common tool to model such networks is the traditional graph, that is, one that does not account for the time dimension or other contextual attributes. However, this approach may represent an oversimplification of a much more complex reality, potentially leading to an under-representation of the system under study (Buldyrev et al. 2010; Cardillo et al. 2013).

More sophisticated modeling of complex systems can be done using several interdependent subsystems (or layers). The concept of a multilayer network goes far beyond a simple intention to capture data heterogeneity. Therefore, a layer in a network can vary according to the context (Kivelä et al. 2014). Thus, studies such as the spread of diseases, navigation, and synchronization in multilayer networks have attracted significant attention (Sahneh et al. 2019; Jacobsen et al. 2018; Lv et al. 2018).

In this work, we use a particular case of multilayer networks in which different nodes, instants of time, and aspects of the layers represent the model’s primary resources. Using public data, we established two multilayer network models and assessed the impact of disease spread and its prevention. Identifying critical points of interest from the models created, we use different vaccination strategies and measure their effects based on mobility patterns observed in the city.

The main contributions of this study can be summarized as follows:

  • Creation of time-varying multilayer models that integrate mobility data with other points of interest in the city regardless of the context;

  • Strategy for the identification of critical points of interest from established models based on the Percolation Centrality;

  • Study and application of the models created in vaccination strategies for disease prevention. We used real and large-scale public transit data for this study.

The rest of the study is organized as follows. “Related Works” Section shows related works, while “Materials and Methods” Section presents the methodological procedures and other relevant information to understand the results. “Experiments and Results” Section presents the results obtained and their analyses, and, finally, “Discussion and Conclusion” Section concludes the work with final discussions.

Related works

Multilayer networks

The applicability of multilayer models for extracting knowledge is a concept introduced previously in the literature (Oselio et al. 2014; Belyi et al. 2017). An example of using these models can be found in the transportation area. Domenico and colleagues (De Domenico et al. 2013) present a tensor structure for the study of multilayer networks, which is useful to address problems in complex multilayer systems, such as inference of who influences who in multichannel social networks or even assist in the development of routing techniques for multimodal transit systems.

In the same line of study, Kurant and Thiran (2006) present a layered model to facilitate the description and analysis of small complex networks. They studied three examples of transit networks, varying sizes from city to continent. For these examples, the bottom layer is the physical infrastructure, and the top is the layer that represents traffic flows. This layered view captures the fundamental differences between the actual load and the commonly used load estimators, explaining why these estimators fail to estimate the actual load.

In another work, De Domenico et al. (2015) point out that it is challenging to identify central agents in complex networks that are sometimes responsible for the faster spread of information, epidemics, failures, and congestion. In such a work, the authors describe a mathematical structure that calculates the centrality in networks and classifies the nodes accordingly, finding those that play essential roles in the cohesion of the entire structure, called the most versatile nodes.

Hristova et al. (2016) present a new network perspective on the interconnected nature of people and places, allowing them to capture the social diversity of urban locations through their visitor’s social networks and mobility patterns. The authors use four metrics of the social diversity of places to distinguish places that commonly meet strangers from those that tend to gather friends and places that attract diverse individuals instead of regular customers. The authors conclude that their analysis mirrors the relationship between the prosperity of people and places, distinguishing between different categories and urban geographies.

Some models of multilayer networks are proposed to represent the variation in time in the mobility study, such as the MAG (Multi-Aspect Graph) model (Wehmuth et al. 2016). Rodrigues et al. (2017) propose SMAFramework to analyze MAG-based mobility patterns. The authors used two data sources (Twitter and Yellow Taxis) to evaluate the method developed in that work. The authors proposed an algorithm for mobility analysis that allows users to analyze the time-space correlation between data collected from different data sources. According to the authors, this algorithm can help with user mapping between layers.

Identification of points of interest

Identifying and recommending points of interest (POIs) has increasingly proven to be a relevant research topic with real-world applications (Liu and Seah 2015).

Huang’s work (Huang et al. 2018) analyzes the relationship between POIs and human mobility network communities in various community detection methods. The authors used the taxi systems of Shanghai and Beijing as case studies. They established transit networks where the urban regions were the network nodes, and the connections between the areas were the edges, weighted by mobility flows. Among the community detection methods analyzed, the authors identified two best-correlated communities and POIs, showing the importance of understanding the formation of space communities and selecting the most appropriate community detection methods.

In Tang’s work (Tang et al. 2020), the authors propose a framework for discovering functional zones by analyzing urban structures and social behaviors. Using the city of Raleigh in the United States as a case study, the GPS (Global Positioning System) data of human activities and POIs were subjected to an unsupervised clustering method. The proposed approach models the internal influences between spatial locations and human activities. The results showed that this approach identifies the functional zones better than other methods for identifying important POIs, such as the distribution of POIs and collaborative POIs.

To explore the spatial analysis of the use of shared bicycles in New York, Bao et al. (2018) conducted a study that considered the diversity between different categories of bicycle stations using smart card data and POIs, which were collected through the Google Places API. The K-means clustering method was used to classify the stations. In contrast, the geographically weighted regression method was applied to establish the relationship between shared bicycles and the various influencing factors. The analysis confirmed the importance of building separate models for each category of bicycle-sharing stations instead of a joint model. Such an approach is relevant for public administrators in developing specific planning and management strategies in different bicycle-sharing stations in New York City.

Exploring social media data, Thomee et al. (2016) propose a method for automatically locating points of interest represented in photos taken by people worldwide—to keep the location of POIs up to date. This technique explores the geographical coordinates and compass direction provided by modern cameras while considering possible measurement errors due to the variability in the accuracy of the sensors that produced them. This method proved to be more efficient in estimating the geographical coordinates of POIs than the technique based on average. Also, Silva et al. (2013) present an application to identify points of interest in a city exploring Instagram data, illustrating the promising potential of this sort of data for studying city dynamics. More examples in this direction can be found in Silva et al. (2019).

Strategies for reducing disease contagion in public transit

Public transit systems are considered high-risk environments for spreading infections due to confinement conditions and limited ventilation in these means of transport (Zhu et al. 2012).

Mo et al. (2021) propose a weighted time-varying network of encounters to model the spread of infectious diseases through public transit systems, using Singapore as a case study. The authors used smart card data to better understand the general space-time dynamics of the spread of the SARS-CoV-2 pandemic in the public transit system and to assess the effects of various preventive measures taken by both public health agencies and the public transit system. The authors concluded that the partial closure of bus routes helps to slow down but cannot fully contain the spread of disease. Besides, they noted that identifying so-called ‘influential passengers’ using smart card data and isolating them earlier can also effectively reduce the spread of the epidemic.

Müller et al. (2020) simulate complete trajectories of individual mobility based on data from cell phones to calculate the risks of spreading COVID-19 in dynamic encounters in transit and buildings. They found that successful contact screening can reduce the reinfection rate by about 30–40%, which suggests the importance of considering time-varying human interactions for epidemic modeling.

Goscé and Johansson (2018) analyze the London metro trips to verify the correlation between using this modality of public transit and the contamination by infectious diseases transmitted by air. Using smart card data, they inferred the routes of passengers and, with the aid of a microscopic analytical model, estimated the spread of these diseases. The authors concluded that passengers coming from regions of the city with a higher rate of influenza also have a greater number of contacts along the journey, so disease-prevention actions are suggested for these agents.

In Ventresca and Aleman’s work (Ventresca and Aleman 2013), the 2006 census of Ontario, Canada, was used to build a weighted network and examine different aspects of the network, including the spread of a disease after hypothetical vaccination under six different strategies. These strategies were divided into four categories. The authors compared the ability of these strategies to contain the spread of disease, examining the pre- and post-vaccination reproduction numbers, the average of the local clustering coefficients, and the degree distributions. They conclude that vaccination strategies are most successful when the network is segmented into components of similar sizes.

Wang et al. (2020) propose a method of decision analysis based on case-based reasoning, which aims to solve emergency response problems for preventing and controlling the 2019 coronavirus (COVID-19) in urban rail transport. They use similarity calculations between historical cases recorded in the Tianjin rail transit system. They proposed expanding an existing CBR-based emergency response decision-making method with a new proposal for solving the problem of generating an emergency plan for public health emergencies.

Epidemic spread models with time-spatial behavior

Motivated by the COVID-19 pandemic, several works have recently been published concerned with modeling epidemics’ spread.

Fazio et al. (2022) propose an agent-based model to simulate the impact of mobility restrictions on the spread of COVID-19 at a large scale, considering different factors related to the diffusion and lethality of the virus and to population mobility patterns. The reproducibility and scalability of the proposed methodology allow it to be applied to different contexts and administrative levels, from the urban scale to the national one. Moreover, the model can serve as a decision-support tool for the design of strategic plans to counter pandemics of respiratory diseases.

Alexi et al. (2023) propose a novel pandemic intervention policy that relies on the strategic deployment of inspection units (IUs). These IUs are allocated in the environment, represented as a graph, and sample individuals who pass through the same node. If a sampled individual is identified as infected, she is extracted from the environment until she recovers (or dies). A realistic simulation-based evaluation of the Influenza A pathogen using both synthetic and real-world data is provided. The results demonstrate significant potential benefits of the proposal in mitigating a pandemic spread which can complement other standard policies such as social distancing and mask-wearing.

Khorev et al. (2023) analyze how an infection develops in an area, for example, a city, divided into several population districts. They consider the effect of the central district, which is the hub of infection, and investigate how the interaction strength influences the city’s level of epidemic development. They found that the final number of infected individuals in a district rises with an increasing degree of connection with the hub.

In addition, many works, such as Jang et al. (2020), Lazebnik and Alexi (2022, 2023), and Shen et al. (2020), addressed the issue with graph-based pandemic spread models. However, they did not take into account the geographic location of the POIs, for example, the GPS location of the bus stop or of the card entry.

Unlike previous efforts, this paper explores a time-varying multilayer network methodology to point out critical points of interest for vaccination in a scenario with conditions analogous to COVID-19 to obtain greater efficiency of immunization in public transit in large cities.

Models and metrics


This work uses a specific multilayer model, called Multi-Aspect Graph (MAG), a structure capable of representing a time-varying multilayer network. A MAG is given by \(H=(A,E)\), where \(E\) is the set of edges and \(A\) is the list of aspects that make up the model. Each aspect \(\alpha \in A\) is a finite set, and the number of aspects \(p\) is the order of the MAG. Each edge \(e \in E\) is a tuple with \(2 \times p\) elements. All edges have the form \((a_{1}, \ldots, a_{p}, b_{1}, \ldots, b_{p})\), where \(a_{1}, b_{1}\) are elements of the first aspect of \(H\), \(a_{2}, b_{2}\) are elements of the second aspect of \(H\), and so on, up to \(a_{p}, b_{p}\), which are elements of the \(p\)-th aspect of \(H\) (Wehmuth et al. 2016).
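As a minimal sketch of this definition, the structure can be stored as a list of aspect sets plus a set of \(2 \times p\)-element tuples. The class name `MAG`, the method `add_edge`, and the toy aspects are hypothetical and only illustrate the definition; they are not the implementation used in this work.

```python
class MAG:
    """Minimal Multi-Aspect Graph H = (A, E); illustrative names only."""

    def __init__(self, aspects):
        # A: one finite set per aspect; p = len(A) is the order of the MAG.
        self.aspects = [set(a) for a in aspects]
        self.edges = set()

    @property
    def order(self):
        return len(self.aspects)

    def add_edge(self, origin, destination):
        """Add the edge (a_1, ..., a_p, b_1, ..., b_p) after validating it."""
        p = self.order
        if len(origin) != p or len(destination) != p:
            raise ValueError("each endpoint needs one value per aspect")
        for k, (a_k, b_k) in enumerate(zip(origin, destination)):
            # a_k and b_k must both belong to the k-th aspect.
            if a_k not in self.aspects[k] or b_k not in self.aspects[k]:
                raise ValueError(f"value outside aspect {k + 1}")
        self.edges.add(tuple(origin) + tuple(destination))


# Toy order-2 MAG (assumed data): aspect 1 = location, aspect 2 = time instant.
toy = MAG([{"stop_A", "stop_B"}, {"t0", "t1"}])
toy.add_edge(("stop_A", "t0"), ("stop_B", "t1"))
```

Each stored edge has exactly \(2 \times p\) elements, so an order-2 MAG holds 4-tuples.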

To carry out the analysis of this work, we create two MAG models as described below.

MAG 1 model

The MAG 1 model is a Multi-Aspect Graph of order \(p=4\), i.e., with four aspects. The first aspect is the layer: the model has two layers, one referring to smart card users and the other to bus stops. The second aspect is an identifier (the user’s smart card or the bus stop number). The third aspect is one of the 75 neighborhoods of Curitiba (the neighborhood of the user’s home or the neighborhood where the bus stop is located). Finally, the fourth aspect is the date and time of the smart card entry. This model has edges between layers and between the nodes of the bus stop layer, as shown in Fig. 1. For each smart card entry, an edge is created between the user and the accessed bus stop. If it is not the user’s first entry of the day, an edge is also created between the previously accessed bus stop and the current one, tracing the user’s path during the day in the bus stop layer.

Fig. 1

MAG 1 model: a Multi-Aspect Graph with four aspects (bus stops, identifier, neighborhood, and smart card registration) and two layers (smart card users and bus stops)
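The edge-creation rule described for the MAG 1 model can be sketched as follows. This is an illustration under stated assumptions: the function name `build_mag1_edges`, the dictionary keys, and the choice of attaching the entry timestamp to both endpoints of a user-to-stop edge are hypothetical, and the sample entries are invented.

```python
from collections import Counter
from datetime import datetime


def build_mag1_edges(entries):
    """Return a Counter mapping each 8-element edge tuple to its weight.

    entries: iterable of dicts with the (assumed) keys
    card, home, stop, stop_nbhd, time.
    """
    edges = Counter()   # edge -> number of displacements (weight 1 each)
    last_stop = {}      # (card, date) -> previous bus stop vertex that day
    for e in sorted(entries, key=lambda e: e["time"]):
        # Composite vertices: (layer, identifier, neighborhood, date-time).
        user = ("user", e["card"], e["home"], e["time"])
        stop = ("stop", e["stop"], e["stop_nbhd"], e["time"])
        edges[user + stop] += 1                  # inter-layer edge
        key = (e["card"], e["time"].date())
        if key in last_stop:                     # not the first entry of the day
            edges[last_stop[key] + stop] += 1    # path edge in the bus stop layer
        last_stop[key] = stop
    return edges


# Invented sample: one card, two entries on one day and one on the next.
sample = [
    {"card": "c1", "home": "Centro", "stop": "s10", "stop_nbhd": "Centro",
     "time": datetime(2018, 10, 2, 7, 0)},
    {"card": "c1", "home": "Centro", "stop": "s20", "stop_nbhd": "Pinheirinho",
     "time": datetime(2018, 10, 2, 18, 0)},
    {"card": "c1", "home": "Centro", "stop": "s10", "stop_nbhd": "Centro",
     "time": datetime(2018, 10, 3, 7, 0)},
]
mag1_edges = build_mag1_edges(sample)
```

With this sample, three user-to-stop edges and one stop-to-stop edge are created, because only the second entry on October 2 has a previous stop on the same day.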

MAG 2 model

The MAG 2 model has three aspects and three layers. The first aspect is the layer: the two layers of the MAG 1 model plus a layer referring to the healthcare units. The second aspect is an identifier, for instance, a user’s smart card, a bus stop number, or a healthcare unit code. The third aspect is one of the 75 neighborhoods of Curitiba, i.e., the neighborhood where the user lives, where the bus stop is located, or where the healthcare unit is located.

Fig. 2

MAG 2 model: a Multi-Aspect Graph with three aspects and three layers (healthcare units, identifier, and neighborhood)

As Fig. 2 shows, in addition to the edges described for the MAG 1 model, MAG 2 has edges between the bus stop layer and the healthcare unit layer, created from the attendance records in the database. Each attendance has an associated public transit user, as described in “Use of Health Services” Section. Thus, for each smart card entry the user makes on the day of an attendance, an edge is created between the bus stop and healthcare unit layers. It is worth mentioning that, in both models, the edges are weighted to quantify the displacements along a given connection, where each displacement has a weight equal to 1.

Percolation centrality

Some studies have successfully modeled the spread of disease as a specific example of percolation in networks (Newman and Watts 2000; Sander et al. 2002; Meyers et al. 2006). We used the Percolation Centrality measure established by Piraveenan et al. (2013), which quantifies the relative importance of each node in the network for, in our case, disease spread, based on both its topological connectivity and its percolation state.

The percolation state of a node \(i\) at time \(t\) is denoted by \(x_{i}^{t}\). Only \(x_{i}\) is used when there is no temporal aspect. If \(x_{i}^{t}=0\), the node is in a non-percolated state. If \(x_{i}^{t}=1\), it is in a fully percolated state. It is also possible for a node to assume a partially percolated state, corresponding to \(0<x_{i}^{t}<1\).

The percolation centrality of a given node is defined as the proportion of percolated paths that pass through that node. A percolated path is a shortest path between a pair of nodes in which the origin node is percolated (in our study, infected). The target node can be percolated (\(x_{i}^{t}=1\)), non-percolated (\(x_{i}^{t}=0\)), or in a partially percolated state. Mathematically, the percolation centrality of a node \(v\) at time \(t\) is given by Eq. 1:

$$\begin{aligned} {PC}^{t}(v)=\frac{1}{(N-2)}\sum _{s\ne v\ne r}\frac{\sigma _{s,r}(v)}{\sigma _{s,r}}\frac{x_{s}^{t}}{\big [\sum x_{i}^{t}\big ]-x_{v}^{t}} \end{aligned}$$

where \((N-2)\) is a scaling factor (more details can be found in Piraveenan et al. 2013), \(\sigma _{s,r}\) is the number of shortest paths between the origin node \(s\) and the target node \(r\), while \(\sigma _{s,r}(v)\) is the number of shortest paths between the origin node \(s\) and the target node \(r\) that pass through the node \(v\).

Computing this centrality requires some nodes to be identified as infected. The resulting value, which indicates how critical each node is, is a decimal number between 0 and 1. In our models, only nodes from the smart card users layer could be infected. A node attribute named ‘percolation’ was used to identify such nodes, where 1 indicates infected and 0 indicates not infected.
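To make Eq. 1 concrete, the sketch below evaluates it directly on a small unweighted graph using breadth-first search. It is an illustration only: the function names, the adjacency-list input, and the four-node example are assumptions, and the direct triple loop is not meant to scale to the networks studied here. NetworkX also provides a `percolation_centrality` function that reads a node attribute named `percolation`; its normalization may differ from this direct reading of Eq. 1.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: hop distance and number of shortest paths per node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def percolation_centrality(adj, x):
    """Direct evaluation of Eq. 1; adj maps node -> neighbors, x maps node -> state."""
    nodes = list(adj)
    n = len(nodes)                         # assumes n > 2
    bfs = {s: shortest_path_counts(adj, s) for s in nodes}
    total = sum(x[i] for i in nodes)
    pc = {}
    for v in nodes:
        dist_v, sigma_v = bfs[v]
        denom = total - x[v]               # [sum_i x_i] - x_v
        acc = 0.0
        for s in nodes:
            if s == v or x[s] == 0 or denom == 0:
                continue                   # non-percolated sources add nothing
            dist_s, sigma_s = bfs[s]
            for r in nodes:
                if r in (s, v) or r not in dist_s:
                    continue
                # v lies on a shortest s-r path iff d(s,v) + d(v,r) = d(s,r).
                if (v in dist_s and r in dist_v
                        and dist_s[v] + dist_v[r] == dist_s[r]):
                    through_v = sigma_s[v] * sigma_v[r]   # sigma_{s,r}(v)
                    acc += through_v / sigma_s[r] * x[s] / denom
        pc[v] = acc / (n - 2)
    return pc


# Assumed toy network: infected user u1 - stop s1 - stop s2 - user u2.
adj = {"u1": ["s1"], "s1": ["u1", "s2"], "s2": ["s1", "u2"], "u2": ["s2"]}
state = {"u1": 1, "s1": 0, "s2": 0, "u2": 0}
pc = percolation_centrality(adj, state)
```

In this toy network, the stop adjacent to the infected user (`s1`) lies on every shortest path leaving `u1` and therefore receives the highest value.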

Radius of gyration

To analyze the urban displacement performed by the users of the smart card, the radius of gyration (González et al. 2008) was used, calculated according to Eq. 2:

$$\begin{aligned} {r}_{g}=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{({r}_{i}-{r}_{cm})^{2}}}, \end{aligned}$$

where \(n\) is the number of entries on the smart card, \(r_i\) is the geographic location of entry \(i\), given by latitude and longitude, and \(r_{cm}\) is the center of mass, given by the mean location of all entries on the smart card. The distance between each entry and the center of mass (\(r_i-r_{cm}\)) was calculated using the Euclidean distance.
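A minimal sketch of Eq. 2 follows, assuming each entry is already a pair of planar coordinates so that the Euclidean distance applies directly; the function name and the sample points are hypothetical.

```python
import math


def radius_of_gyration(points):
    """Radius of gyration of a list of (x, y) entry locations (Eq. 2)."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one entry is required")
    # Center of mass: mean location of all entries.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared Euclidean distance to the center of mass, then square root.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# Two entries 2 units apart: the center is the midpoint, so r_g = 1.
r_g = radius_of_gyration([(0.0, 0.0), (2.0, 0.0)])
```

A card whose entries all occur at the same location has a radius of gyration of 0, while a card used across distant neighborhoods has a larger value.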

Materials and methods

Curitiba and its public transit and health system

According to the IBGE, the municipality of Curitiba has 1,751,907 inhabitants in a total area of 434.967 km\(^2\). The North-South extension is 35 km, and the East-West extension is 20 km. Curitiba is divided into 10 Administrative Regions, covering its 75 neighborhoods, as shown in Fig. 3. For instance, the ‘Matriz’ region is composed of 18 neighborhoods, one of which is the ‘Centro’ neighborhood. All of these neighborhoods are covered by the public transit system in Curitiba. This transit system has exclusive corridors where the BRT (Bus Rapid Transit) buses circulate.

These buses pass through several integration terminals that receive the feeder buses in the neighborhood, allowing system integration. The system also has circular lines between communities, allowing users to move from one community to another without traveling through the city’s central region. Direct lines offer faster trips with fewer stops on the itinerary. These direct lines have a specific type of bus stop for them and the integration terminals. Another essential feature of public transit in Curitiba is the integrated fare. By paying only one ticket, citizens can compose their route, moving around the city.

Since 2002, Curitiba has used the electronic ticketing system for public transit. The implementation of this system was necessary to reduce the circulation of cash in the transit system, speed up the boarding and passage of users through turnstiles, discipline and measure the use of the transit system by categories that use free access, in addition to reducing the operating costs of the system (Taniguchi and Duarte 2012).

According to URBS (Urbanização de Curitiba S/A - URBS 2018), in 2018, an average of 1,365,615 passengers were transported each business day. Additionally, according to the same report, about 60.96% of fares were paid using the smart card. The number of active smart cards in 2018 was 1,928,184, divided into three of the four existing card categories: User Card, Exemption Card, Student Card, and Single Card (the data provided by URBS contain no information on the Single Card category).

Concerning the healthcare system, Curitiba has 111 Basic Health Units, 9 Emergency Care Units, and 8 Medical and Dental Specialty Centers. All neighborhoods in the city are covered by service units, as shown in Fig. 3. An average of 330,000 patients is served monthly, according to the management report provided by the municipal health department (Secretaria Municipal de Saúde de Curitiba 2018).

Fig. 3

Curitiba neighborhoods and healthcare units (figure adapted from IPPUC (2023))

Data pre-processing

This section describes relevant information about data pre-processing.

Inferring neighborhood of residence

The main objective of this stage was to establish information about the neighborhood where the public transit users live. Knowing that only the entry in the system is registered, as a result of this pre-processing, we keep only the data from which we could infer the origin based on the usage history of each smart card.

Fig. 4

Flowchart of pre-processing of data related to transit

Figure 4 shows the data pre-processing steps, from obtaining raw data to creating the database used in our study. Each of these steps is detailed below. The smart card usage data comprise 17,511,710 records, and the vehicle data comprise 129,461,200 records. We used different approaches to infer the neighborhood where each smart card was used. When users enter the transit system at tube stations or integration terminals (exclusive BRT bus stops and some special lines), the bus line code and the vehicle code linked to the entry are directly related to the bus stop location. Therefore, the neighborhood is identified with high precision. When the entry occurs at a regular bus stop, we look for a match using the line code, vehicle code, date, and time to obtain a GPS (Global Positioning System) coordinate for each smart card entry.

In Curitiba, bus updates, including GPS coordinates, are recorded every 5 s, although there are cases in which the recording interval is longer. The approximate area where the smart card was used is inferred from this information. Because a bus is not expected to travel a long distance in one minute, we keep only one update per minute for each vehicle. After this filter, the vehicle data comprise 18,594,495 records, which makes the inference procedure more efficient without considerably compromising quality.
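The per-minute filter can be sketched as below. The source does not state which update within a minute is retained, so keeping the earliest one is an assumption, as are the function name and the record layout `(vehicle, timestamp, lat, lon)`.

```python
from datetime import datetime


def one_update_per_minute(records):
    """Keep the earliest GPS record per vehicle and per clock minute.

    records: iterable of (vehicle, timestamp, lat, lon) tuples, in any order.
    """
    kept = {}
    for rec in sorted(records, key=lambda r: r[1]):
        vehicle, ts = rec[0], rec[1]
        # Truncate the timestamp to the minute to build the grouping key.
        key = (vehicle, ts.replace(second=0, microsecond=0))
        kept.setdefault(key, rec)   # later records in the same minute are ignored
    return list(kept.values())


# Invented sample: three updates for bus B1 (two in the same minute), one for B2.
gps = [
    ("B1", datetime(2018, 10, 2, 8, 0, 5), -25.4301, -49.2701),
    ("B1", datetime(2018, 10, 2, 8, 0, 0), -25.4300, -49.2700),
    ("B1", datetime(2018, 10, 2, 8, 1, 0), -25.4310, -49.2710),
    ("B2", datetime(2018, 10, 2, 8, 0, 30), -25.5000, -49.3000),
]
thinned = one_update_per_minute(gps)
```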

When matching the line code, vehicle code, date, and time between the smart card data and the bus records, we observed that some bus records had the line code and the vehicle code filled with a generic value. These records were deleted because it is impossible to guarantee the identification of the GPS location of the card entry.

To associate each smart card entry, now enriched with latitude and longitude, with the neighborhood of a specific bus stop, the inferred latitude and longitude were first matched to the closest bus stop on the line of the smart card entry. This search was done using the R-Tree data structure. The R-Tree was proposed by Antonin Guttman and is widely used as a spatial access method, allowing the indexing of multi-dimensional information such as geographic coordinates (Guttman 1984). At the end of this stage, we could infer the neighborhood for 5,388,638 smart card entries.

Some people lend their smart cards, with or without financial advantages, to others traveling on the same bus line, generating noise in the records. Also, some entries are made a few minutes apart on the same line and the same bus, generating many entries for the same smart card. To minimize the impact of these smart cards on our analysis, we exclude smart cards with more than 150 entries. The rationale behind this value is that the study period is 69 days (53 working days and 16 weekend days) and that each user typically uses, on average, at least two passes per day. Following a similar line of reasoning, smart cards with fewer than ten entries during the analysis period were excluded. They represent a small number of entries, on average less than one per week. By disregarding them, we also intend to make the results more robust. After these exclusions, the number of entries with an inferred neighborhood decreased to 5,225,573.
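The two exclusion thresholds amount to keeping cards whose number of entries lies between 10 and 150, inclusive. A minimal sketch, with a hypothetical function name and an assumed record layout in which the card identifier is the first field:

```python
from collections import Counter


def filter_cards(entries, min_entries=10, max_entries=150):
    """Drop all entries of cards with fewer than 10 or more than 150 entries."""
    counts = Counter(entry[0] for entry in entries)
    keep = {card for card, k in counts.items()
            if min_entries <= k <= max_entries}
    return [entry for entry in entries if entry[0] in keep]


# Invented sample: cards with 9, 10, 150, and 151 entries.
raw = ([("a", i) for i in range(9)] + [("b", i) for i in range(10)]
       + [("c", i) for i in range(150)] + [("d", i) for i in range(151)])
filtered = filter_cards(raw)
```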

A significant increase in the number of entries was observed in the data analysis at around 5 am. This growth reaches its peak around 7 am, and the phenomenon ends at 10 am, as shown in Fig. 5. Due to this scenario, we consider the entries made between 5 and 10 am for identifying the home neighborhood. This period is when users are likely to use public transit to move to an everyday activity, such as working or studying.

Fig. 5

Hourly smart card entries

Using the card entries between 5 and 10 am, we group the data by smart card and select the most frequent neighborhood as the home neighborhood, requiring at least five entries in this neighborhood. When analyzing the result, we verified that these thresholds preserve many smart cards without compromising the analysis. After this procedure, we obtained 92,857 unique smart cards with an inferred home neighborhood.
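The home-neighborhood rule can be sketched as follows. The function name, the record layout `(card, timestamp, neighborhood)`, and the treatment of the window as 05:00 inclusive to 10:00 exclusive are assumptions; ties between neighborhoods are resolved by first occurrence, which the source does not specify.

```python
from collections import Counter, defaultdict
from datetime import datetime


def infer_home(entries, start_hour=5, end_hour=10, min_entries=5):
    """Map each card to its most frequent morning neighborhood, if frequent enough."""
    per_card = defaultdict(Counter)
    for card, ts, nbhd in entries:
        if start_hour <= ts.hour < end_hour:      # morning window only
            per_card[card][nbhd] += 1
    homes = {}
    for card, counts in per_card.items():
        nbhd, k = counts.most_common(1)[0]
        if k >= min_entries:                      # at least five entries required
            homes[card] = nbhd
    return homes


# Invented sample: card "a" qualifies, card "b" has only four morning entries.
log = ([("a", datetime(2018, 10, d, 7, 0), "Centro") for d in range(1, 6)]
       + [("a", datetime(2018, 10, d, 8, 0), "Pinheirinho") for d in range(1, 3)]
       + [("a", datetime(2018, 10, d, 18, 0), "Pinheirinho") for d in range(1, 7)]
       + [("b", datetime(2018, 10, d, 6, 0), "Centro") for d in range(1, 5)])
homes = infer_home(log)
```

Evening entries do not count, so card "a" is assigned ‘Centro’ even though most of its entries overall are in ‘Pinheirinho’.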

Despite the possible limitations of our approach, we have obtained evidence that it is a good approximation, as explained below. To evaluate our approach, we constructed a different dataset of smart card entries classified manually based on interviews with the smart card owner. This way, we obtained the neighborhood of origin—this extra dataset is our ground truth. We interviewed users with different urban mobility patterns to build this ground truth dataset. When applying our approach to the ground truth, we identified all the information correctly. The construction of this ground truth dataset is very challenging for several reasons, such as identifying volunteers willing to disclose private information; however, this experiment is vital to help us get a sense of the quality of our approach.

Use of health services

The data referring to the use of services in the health units total 972,826 records. These records are distributed in 128 health units in Curitiba and cover the period from September to November 2018. Figure 6 shows the pre-processing stages from obtaining raw data to creating the database used in our study. Each of these steps is detailed below.

Fig. 6

Health data pre-processing flowchart

For each attendance, an attempt was made to associate one of the 92,857 smart cards mentioned in the previous section. The fields of the patient’s record used for this matching were birth date, sex, and neighborhood. The birth date and sex of the smart card user are already present in the data set itself, and the neighborhood of residence was inferred according to the methodology described in “Inferring Neighborhood of Residence” Section. This methodology involves a temporal approximation, to determine the nearest bus stop, and a spatial approximation, because the neighborhood of a bus stop is treated as the user’s home neighborhood. To minimize possible problems associated with these approximations, we created, for each smart card with an associated neighborhood, a list of candidate home neighborhoods.

We consider all latitudes and longitudes registered for each smart card entry and average those recorded between 5 and 10 am. This average location is used as the center of a circle with a 500 m radius, and all neighborhoods within this area are classified as candidate home neighborhoods for the user.
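The candidate-neighborhood step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, each neighborhood is assumed to be represented by a few sample points rather than a full polygon, and the window "between 5 and 10 am" is assumed to mean boardings with an hour from 5 up to, but not including, 10.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_neighborhoods(entries, neighborhood_points, radius_m=500.0):
    """Return the neighborhoods within radius_m of the average morning location.

    entries: list of (datetime, lat, lon) smart card entries for one card.
    neighborhood_points: dict name -> list of (lat, lon) sample points
        (assumption: a simplified stand-in for neighborhood polygons).
    """
    # Keep only entries registered in the morning window (assumed 05:00-09:59).
    morning = [(lat, lon) for ts, lat, lon in entries if 5 <= ts.hour < 10]
    if not morning:
        return set()
    # Average location of the morning entries is the center of the circle.
    lat_c = sum(lat for lat, _ in morning) / len(morning)
    lon_c = sum(lon for _, lon in morning) / len(morning)
    return {
        name
        for name, points in neighborhood_points.items()
        if any(haversine_m(lat_c, lon_c, la, lo) <= radius_m for la, lo in points)
    }
```

With real data, the sample points would be replaced by a proper circle-polygon intersection test against the official neighborhood boundaries.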

After associating the transit users with the healthcare attendance by employing the birth date, sex, and home neighborhood (considering the candidate home neighborhood for the users), it was necessary to verify whether the corresponding user used public transit on the day of the healthcare attendance.
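The matching described above can be sketched as a simple join on birth date and sex, followed by the neighborhood and same-day-travel checks. This is an illustrative sketch only: the record layout and field names are assumptions, and attendances that match more than one card are dropped after both checks, which is one possible reading of the procedure.

```python
from datetime import date


def match_attendances(attendances, cards):
    """Associate each healthcare attendance with at most one smart card.

    attendances: list of dicts with the (hypothetical) keys
        'id', 'birth_date', 'sex', 'neighborhood', 'date'.
    cards: list of dicts with the (hypothetical) keys
        'card_id', 'birth_date', 'sex',
        'candidate_neighborhoods' (set of names), 'travel_dates' (set of dates).
    Returns a dict attendance id -> card id for unambiguous matches only.
    """
    # Index the cards by (birth date, sex) to avoid a full scan per attendance.
    index = {}
    for card in cards:
        index.setdefault((card["birth_date"], card["sex"]), []).append(card)

    matches = {}
    for att in attendances:
        hits = [
            card["card_id"]
            for card in index.get((att["birth_date"], att["sex"]), [])
            # The patient's neighborhood must be a candidate home neighborhood...
            if att["neighborhood"] in card["candidate_neighborhoods"]
            # ...and the card must have been used on the day of the attendance.
            and att["date"] in card["travel_dates"]
        ]
        if len(hits) == 1:  # attendances linked to several cards are discarded
            matches[att["id"]] = hits[0]
    return matches
```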

At the end of this procedure, disregarding the attendances associated with more than one card, 24,786 records were kept in the database. These records were then checked for duplicates: if two records associated with the same user were less than two hours apart, one of them was considered invalid and excluded from the database. After these steps, 18,225 records with an associated public transit user remained.
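The two-hour duplicate check can be illustrated with the sketch below. The text does not say which record of a close pair is excluded, so this example assumes the earlier record is kept and each later record is compared with the last record kept for the same user; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def drop_near_duplicates(records, min_gap=timedelta(hours=2)):
    """Remove records of the same user that are less than min_gap apart.

    records: list of (user_id, datetime) tuples.
    Assumption: the earlier record of a close pair is the one kept.
    """
    kept = []
    last_kept = {}  # user_id -> timestamp of the last record kept
    for user_id, ts in sorted(records):
        previous = last_kept.get(user_id)
        if previous is None or ts - previous >= min_gap:
            kept.append((user_id, ts))
            last_kept[user_id] = ts
    return kept
```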

Experiments and results

The results described below comprise a period corresponding to seven consecutive days, from October 02 to October 08, 2018, covering both weekdays and weekends.

Analysis of vaccination scenarios

Three scenarios were established to analyze the evolution of the infection under different vaccination strategies:

  • Scenario 1: Vaccination of users of the transit system at random with 0.25 probability;

  • Scenario 2: Vaccination of users of the transit system that access the 100 most critical bus stops, with 0.25 probability;

  • Scenario 3: Vaccination of users of the transit system that access the 100 most critical bus stops, with 0.5 probability.

For these simulations, the MAG 1 Model was applied to two city neighborhoods, ‘Centro’ and ‘Pinheirinho,’ which correspond to strategic regions, as presented in our previous work (Santin et al. 2020). ‘Centro’ is of interest because it is the most connected neighborhood and carries high traffic. ‘Pinheirinho’ is of interest because it hosts one of the bus terminals most accessed by the population, which generates high traffic, even though it is quite distant from the city’s central region.

Five simulations were studied across the three scenarios presented: Scenarios 1, 2, and 3 for ‘Centro’, and Scenarios 1 and 3 (the most critical) for ‘Pinheirinho’ (see Footnote 3). Each simulation has an initial infection, since the percolation centrality is computed starting from the previously infected nodes, which in turn determines the critical bus stops.

When ‘Centro’ is the simulation’s focus (i.e., the infection starts on it), all users who lived in ‘Centro’ and used transit on the first day of the simulation were infected in all scenarios. Similarly, when ‘Pinheirinho’ is the focus, we infect all smart card users who live in ‘Pinheirinho’ and ride public transit on the first day of the simulation.

Critical bus stops were determined by applying the percolation centrality to the MAG 1 Model presented earlier, excluding the date and time aspect. With this exclusion, all records of smart card users are connected, so that routes with heavier traffic are represented by heavier edges.
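As a rough illustration of what excluding the date and time aspect does, the sketch below collapses time-stamped edges into weighted edges by counting repetitions. The `(source, target, timestamp)` edge representation and the function name are assumptions made for this example; the actual MAG 1 Model has more aspects than shown here.

```python
from collections import Counter
from datetime import datetime


def collapse_time(temporal_edges):
    """Drop the timestamp of each edge and use the repeat count as its weight.

    temporal_edges: iterable of (source, target, timestamp) tuples
        (a hypothetical, simplified edge representation).
    Returns a dict (source, target) -> weight.
    """
    weights = Counter()
    for source, target, _timestamp in temporal_edges:
        weights[(source, target)] += 1  # heavier traffic yields a heavier edge
    return dict(weights)
```

The resulting weighted graph is the kind of input on which a centrality measure such as percolation centrality can then be computed.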

Table 1 The hundred most critical bus stops by regional for the scenarios with infection starting in ‘Centro’ and ‘Pinheirinho’

At the end of each simulation, the 100 bus stops with the highest percolation values are selected and displayed by region in Table 1. The number of critical bus stops per region is similar in both scenarios, with differences mainly reflecting an increase around the neighborhood where the infection started.

After the scenario is instantiated with the users infected on the first day, newly infected users are counted at the end of each following day, and new vaccinations take place at the beginning of the next day. The infection rule is the same in all scenarios: a user is infected with a probability equal to the percolation value of each bus stop accessed on that day. As the infection is cumulative, an infected user remains infected until the end of the week. A user is considered vaccinated when selected with the probability defined in each scenario. Vaccinated users cannot be infected and thus do not spread the infection. In all scenarios, five repetitions were performed to reduce possible errors due to randomness in the selection of infected and vaccinated users.
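The daily infection and vaccination rules can be sketched as below. This is a simplified reading of the procedure and not the authors' code: the data layout and names are hypothetical, vaccination is assumed to happen before exposure on each day after the first, and already infected users are assumed not to be vaccinated.

```python
import random


def simulate_week(trips_by_day, percolation, seeds, critical_stops, p_vaccinate, rng):
    """Simulate cumulative infection with daily vaccination.

    trips_by_day: list with one dict per day, user -> list of bus stops accessed.
    percolation: dict bus stop -> percolation value in [0, 1].
    seeds: users infected on the first day.
    critical_stops: set of stops whose users are eligible for vaccination,
        or None to make every user eligible (random vaccination scenario).
    Returns a list of (infected so far, vaccinated so far), one item per day.
    """
    infected, vaccinated, history = set(seeds), set(), []
    for day, trips in enumerate(trips_by_day):
        if day > 0:
            # Vaccination at the beginning of the day.
            for user, stops in trips.items():
                if user in infected or user in vaccinated:
                    continue
                eligible = critical_stops is None or any(s in critical_stops for s in stops)
                if eligible and rng.random() < p_vaccinate:
                    vaccinated.add(user)
            # Exposure: one infection trial per bus stop accessed on the day.
            for user, stops in trips.items():
                if user in infected or user in vaccinated:
                    continue
                if any(rng.random() < percolation.get(s, 0.0) for s in stops):
                    infected.add(user)
        history.append((len(infected), len(vaccinated)))
    return history
```

Repeating the call with different seeds for `random.Random` and averaging the histories would mirror the five repetitions mentioned in the text.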

Fig. 7 Number of infected and vaccinated users normalized by the random vaccination scenario according to the initial infection

The objective of establishing different vaccination scenarios is to verify whether a targeted strategy can be significantly more beneficial than random vaccination in terms of the growth in the number of infected users and the number of doses used. Figure 7a and b, which show the scenarios with initial infection in ‘Centro,’ indicate that scenario 3 has advantages over random vaccination (scenario 1) in both the number of doses and the number of infected people: it yields a 35% decrease in infections at the end of the week while using less than 60% of the doses of the random vaccination scenario. The second scenario, in turn, shows that a targeted strategy with a very small number of doses ends up being worse than random vaccination regarding the spread of the infection.

The same analyses were carried out with the initial infection in ‘Pinheirinho’. Figure 7c and d show that the growth in the number of infected people is more gradual, which is expected since ‘Pinheirinho’ is a less connected neighborhood than ‘Centro’. However, the results are similar: with the vaccination strategy applied at the 100 critical bus stops, we observe a 30% decrease in infections at the end of the week, using less than 60% of the doses of the random vaccination scenario.

Analysis of groups of vaccinated users

In this analysis, users of public transit associated with attendance to healthcare units were referred to as patients. Three vaccination rules were created to differentiate the behavior and, consequently, the impact on the spread of the infection from different groups of vaccinated patients:

  • Rule 1: Random vaccination of patients with 0.25 probability;

  • Rule 2: Vaccination of patients who access the 20 most critical healthcare units determined by the infection started in ‘Centro’;

  • Rule 3: Vaccination of patients who access the 20 most critical healthcare units determined by the infection started in ‘Pinheirinho’.

Table 2 Twenty most critical healthcare units by regional

The percolation centrality in the MAG 2 Model was applied to determine the most critical healthcare units. Two scenarios were carried out. In the first, all users living in ‘Centro’ who used public transit on the first day of the simulation were infected. In the second, following the same idea, the users living in ‘Pinheirinho’ were infected. These scenarios correspond to the second and third vaccination rules, respectively. At the end of each scenario, the 20 healthcare units with the highest percolation values were selected and grouped by region in Table 2. The number of critical health units per region is similar in both scenarios, differing only in the neighboring regions ‘Matriz’ and ‘Santa Felicidade’. In the scenario where the infection started in ‘Centro’, the region to which this neighborhood belongs, ‘Matriz’, received a slightly higher number of critical units. ‘Santa Felicidade’ is an interesting case that shows the potential of the approach considered in this study: more critical units are observed in that region when ‘Pinheirinho’ is the focus, which indicates that transit users living in ‘Pinheirinho’ use health services in a different region.

In the application of rule 1, 25% of the patients who accessed any of the healthcare units in Curitiba were vaccinated daily. This random vaccination was repeated in 100 simulations to reduce errors due to randomness and increase confidence in the estimates. The median number of random-vaccination doses per day was computed across all simulations and used as the limit for daily doses in the other vaccination rules (2 and 3). That is, the same daily number of patients was vaccinated regardless of the rule, which allows comparisons and conclusions based on the same criteria.
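The dose budget described above can be sketched as follows: the random rule is simulated repeatedly, the per-day median number of doses is taken, and that median caps the other rules. The function names are hypothetical, and two details are assumptions of this example: a non-integer median is truncated, and when a targeted rule has more candidates than doses, the vaccinated patients are drawn at random.

```python
import random
import statistics


def daily_dose_caps(patients_by_day, p=0.25, n_runs=100, seed=0):
    """Median number of doses per day over n_runs random vaccinations.

    patients_by_day: list with one list of patient ids per day.
    """
    rng = random.Random(seed)
    caps = []
    for patients in patients_by_day:
        # Each run vaccinates every patient of the day with probability p.
        doses = [sum(rng.random() < p for _ in patients) for _ in range(n_runs)]
        caps.append(int(statistics.median(doses)))  # assumption: truncate x.5 medians
    return caps


def vaccinate_with_cap(candidates, cap, rng):
    """Vaccinate at most `cap` of the candidate patients of a targeted rule."""
    candidates = list(candidates)
    if len(candidates) <= cap:
        return candidates
    return rng.sample(candidates, cap)  # assumption: random choice when over budget
```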

Fig. 8 Displacement of groups of vaccinated users, measured by their radius of gyration

In applying rules 2 and 3, with the ‘Centro’ and ‘Pinheirinho’ neighborhoods as the focus, patients in the 20 most critical healthcare units were vaccinated, subject to the daily dose limit established in rule 1. The different vaccination rules aimed to assess how much a patient moves and, therefore, how much that patient can be considered a vector of disease transmission. Figure 8 shows the displacement of the groups of vaccinated users based on their radius of gyration. Although the difference is subtle, vaccination at pre-defined locations has an advantage over random vaccination with respect to the displacement of patients: the patients reached by these strategies move more and could spread infections to more city regions (see the median values in the boxplots).
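For reference, the radius of gyration of a user is commonly computed as the root mean square distance between the user's visited locations and their mean location. The sketch below follows that common definition, which may differ in detail from the one used in the study; the function names are hypothetical, and the mean of latitudes and longitudes is used as the center, which is a reasonable approximation at city scale.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def radius_of_gyration_m(points):
    """Root mean square distance (meters) of the points to their mean location.

    points: non-empty list of (lat, lon) boarding locations of one user;
        repeated visits appear as repeated points.
    """
    if not points:
        raise ValueError("at least one location is required")
    lat_c = sum(lat for lat, _ in points) / len(points)
    lon_c = sum(lon for _, lon in points) / len(points)
    squared = [haversine_m(lat, lon, lat_c, lon_c) ** 2 for lat, lon in points]
    return math.sqrt(sum(squared) / len(points))
```

Computing this value per patient and grouping the results by vaccination rule gives the kind of distribution that a boxplot comparison would summarize.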

We also examined how the patients vaccinated under each rule use public transit. The patients vaccinated in the pre-established healthcare units account for the largest number of entries at the peak times of the public transit system, as shown in Fig. 9, again showing an advantage of vaccination strategies based on points of interest. Thus, this result suggests that, by considering critical units for vaccination, we reach users who move over longer distances using public transit and who tend to board the system at peak times, and who therefore have plenty of contact with other users in public transit.
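An hourly profile of entries per group of vaccinated users can be obtained with a simple count by hour of day. The sketch below assumes a hypothetical input that maps each vaccination rule to the boarding timestamps of its vaccinated patients.

```python
from collections import Counter
from datetime import datetime


def hourly_entries(entries_by_group):
    """Count transit entries per hour of day for each group.

    entries_by_group: dict group name -> list of boarding datetimes.
    Returns a dict group name -> list of 24 counts (index = hour of day).
    """
    result = {}
    for group, timestamps in entries_by_group.items():
        counts = Counter(ts.hour for ts in timestamps)
        result[group] = [counts.get(hour, 0) for hour in range(24)]
    return result
```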

Fig. 9 Hourly entries by vaccinated user groups

Discussion and conclusion

This article presented a study on vaccination strategies to help reduce contagion in pre-established points of interest. It was possible to establish critical points of interest based on the percolation centrality using a time-varying multilayer network model.

We present two applications using a Multi-Aspect Graph (MAG) focused on vaccination scenarios, where temporal analysis is paramount to determining contagion and its reduction. Using large-scale data from a public transit system and the healthcare system of Curitiba, we established networks to analyze the strategies modeled in this work and their impact on the city’s mobility.

In the first application of MAG, we established a network with two layers and four aspects. The layers were related to smart card users and city bus stops. The aspects were the bus stops, an identifier, one of the neighborhoods in Curitiba, and the date and time of the smart card registration. Three scenarios were evaluated for a broader understanding of our approach.

We conclude that vaccination at the critical points of interest identified by our approach, specifically at the 100 most critical bus stops with a 0.5 probability of vaccinating users of the transit system, obtained promising results. This type of strategy has advantages over random vaccination in both the number of doses and the number of infected users in the evaluated cases. In this particular strategy, we observe at least 30% fewer infections using less than 60% of the doses used in the random vaccination scenario.

In the second application of MAG presented in this study, a network with three layers and three aspects was established. This analysis aimed to study the impact of the spread of infection on patients of healthcare units. When evaluating the displacement of vaccinated patients, vaccination strategies focused on critical healthcare units showed advantages over random vaccination: these patients tend to travel longer distances and could potentially spread diseases to more city regions. In addition, patients of these units tend to ride the public transit system at peak times, i.e., when more people are using the system and there are thus more opportunities for contamination.

Both applications presented in this study indicated that the vaccination strategies applied at critical points of interest were more efficient and effective than a random vaccination. Different factors must be evaluated in the contamination by airborne diseases, but the contact between people is certainly one of the most relevant.

It is important to note that the applications presented in this work are related to the current SARS-CoV-2 pandemic; however, the approach is not limited to that context. The MAG model shown in this work can be applied in other contexts, such as optimization in public transit, or with different points of interest. For example, by replacing the healthcare layer with schools or universities, it is possible to assess whether there is adequate and quality access to these institutions using public transit. Despite that, some limitations have been observed. The replicability of the MAG 2 model may be hindered by the need for the user’s date of birth and sex from the smart card to connect the transit user to the healthcare unit attendance, since this information is not commonly available in smart card data. In addition, there are some points of attention regarding scalability in terms of data and aspects. The data used in the models cover a period of seven days, and the maximum number of aspects used was four. Scaling to a considerably longer period, or to a context that requires a significant increase in the number of aspects, increases the complexity considerably. This can make analyzing and visualizing these models quite challenging, requiring specialized tools, techniques, and hardware.

This work showed that the proposal identifies critical points that may assist public administrators in disease prevention campaigns and vaccination strategies, optimizing resources, and reaching a larger population. In summary, the models and results presented here can be used by policymakers in the following way. By having access to a network with user mobility data in a specific region and assuming or knowing where an infection or information flow originated, it is possible to perform simulations similar to those presented to find the points that most favor the spread through the network and therefore establish strategies focused on those points. In order to slow down such spread more quickly or even act preventively, simulations can be performed with different scenarios, as exemplified in vaccination scenarios. These scenarios represent different types of investments to contain the spread, and thus decisions can be made taking into account the best cost-benefit.

There are several opportunities for future work. For instance, the model explored in this study could be coupled with an epidemic spreading model to show how these vaccination strategies, based on multi-aspect graphs and percolation centrality, can dynamically reduce the contagion and its impact in the different phases of the pandemic. Moreover, it would also be interesting to consider new evaluation scenarios to better understand the effective reduction of the spread under situations beyond the scope investigated here; additional types of visual presentations should be helpful in this context to ease the understanding of the results.

Availability of data and materials

All databases used can be accessed freely and without restriction through the Open Data Policy of the Curitiba City Hall.


Notes

  1. IBGE: Instituto Brasileiro de Geografia e Estatística.

  2. URBS (Urbanização de Curitiba S/A): the company that manages public transit in Curitiba.

  3. We opted to disregard Scenario 2 for ‘Pinheirinho’ to simplify the analysis, since the main message is conveyed by Scenario 3, the most critical one.



This study was supported in part by CAPES-DS, the project SocialNet (Process 2023/00148-0 from São Paulo Research Foundation - FAPESP), and National Council for Scientific and Technological Development - CNPq (Process 310998/2020-4).

Author information

Authors and Affiliations



FG and PS ran the analysis and made the visualizations. FG, PS, MF, AM, TS conceptualized the research, evaluated the results, and wrote the paper. MF, AM, TS revised the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Thiago H. Silva.

Ethics declarations

Ethics approval and consent to participate

The volunteers agreed to share their smart card information to build the ground truth dataset and consented to publish it. In the studied public transit dataset, users are anonymized, and no sensitive information is present—Curitiba City Hall curated these data and made them available.

Competing interests

All authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Gubert, F.R., Santin, P., Fonseca, M. et al. On strategies to help reduce contamination on public transit: a multilayer network approach. Appl Netw Sci 8, 37 (2023).
