Road network data
OSMnx (Boeing 2017) is a Python package that downloads road networks from OpenStreetMap (Haklay and Weber 2008) and constructs them as primal, nonplanar, weighted multidigraphs. That is, nodes and directed edges represent intersections and roads, respectively (primal); grade-separated roads such as overpasses and underpasses do not create an intersection where they cross (nonplanar); and geographic and spatial information about roads, such as road length, is stored in edge attributes that can be used as weights. We used OSMnx in our study because these features take the dynamic nature of road networks into account.
Framework
Urban road networks have hierarchical structures. High-level roads (e.g., motorways and arterial roads) transport a large number of vehicles at high speeds, while low-level roads (e.g., residential roads) have lower speed limits and provide access between high-level roads and local areas. Low-level roads have little impact on vehicular dynamics from the perspective of global transportation. However, some low-level roads may provide detours and shortcuts between subregions, or distribute traffic so that it avoids congestion on high-level roads. Thus, arbitrary loss of low-level roads may have a nontrivial impact on traffic flow, and topological context should be considered in any road network simplification process. In our method, we distinguish trivial, superfluous low-level roads from topologically important roads, then selectively omit the former so that the topological characteristics of the original network are preserved after simplification. To identify the redundant roads, we use three patterns in residential street networks suggested by Southworth and Ben-Joseph (2013): loops and lollipops, lollipops on a stick, and gridiron. Figure 1 shows an example network for each pattern.
The loops and lollipops pattern is characterized by the presence of loops and cul-de-sacs. Neither loops nor cul-de-sacs are likely to contribute to transportation functionality, since a loop tends to return to its starting point and a cul-de-sac does not provide any through passage to the rest of the network. The lollipops on a stick pattern consists of a few through streets with cul-de-sacs branching off from them, where the cul-de-sacs are considered redundant for the reason given above. Some studies highlight the effect of cul-de-sacs on road networks. Batac and Cirunay (2022) and Distel (2015) point out that travel from one dead-end node to another is sinuous, especially if the path is very short, which may degrade the quality of travel. In Li et al. (2022), cul-de-sacs may provide access points through which traffic flows into a traffic analysis zone (TAZ) of interest from outside the TAZ. As some features of a TAZ are computed from the number of access points to the TAZ, cul-de-sacs are not trivial in their study. Taken together, Batac and Cirunay (2022), Li et al. (2022), and Distel (2015) show that cul-de-sacs have local impacts on road networks. Since this study performs a city-level analysis of road networks and focuses on network-wide properties, we remove cul-de-sacs in the proposed simplification framework and ignore their potential local impacts.
The gridiron pattern is a simple system of two series of parallel streets crossing at right angles to form rectangular blocks, which provides many route choices. Distel (2015) and Daganzo et al. (2011) argue that the gridiron pattern may have a critical impact on the road network, as it may cause gridlock congestion, especially when traffic demand is very high. In our study, however, we simplify only gridiron patterns that consist entirely of low-level roads. These low-level roads largely remain unused, since they have the same direction and length as their nearest high-level roads, and come into use only when the high-level roads are highly degraded or disrupted. Thus we consider them trivial elements from a network-wide traffic perspective and remove them through simplification. We identify low-level roads using the road type tags in OSM: roads tagged residential are considered low-level roads.
Our method identifies the target patterns using the topological, geometric, and semantic information of a road network. The framework is depicted in Fig. 2 and consists of five steps:

1. Parallel edges are removed from the input graph, leaving only the shortest edge between two adjacent nodes. The removed edges are relatively long and are generally used to provide access from main roads to residential areas.

2. Self-loop edges are removed from the graph. Circular ends for easy turning at the end of roads, which are represented as self-loops in a graph, not only add unnecessary overhead but also make a dead-end node have at least two adjacent nodes (itself and its neighbor). Removing self-loops ensures that dead-end nodes have a single adjacent node for the next step.

3. Dead ends, that is, nodes with only one adjacent node, are removed together with their incident edges. These components are only used to provide access to the end node and can be collapsed to the entrance of each cul-de-sac.

4. Areas with a gridiron pattern are simplified by removing low-level components. A node is removed along with its incident edges if it satisfies all of the following conditions: (1) it has exactly 4 adjacent neighbors, (2) the maximum length of its incident edges is less than 300 m, (3) the road type of all its incident edges is residential, and (4) it is adjacent to at least one other node that satisfies conditions (1) to (3).

5. The interstitial nodes on a single road line are removed by replacing the sub-edges with a single unified edge. We used the method proposed by Boeing (2017) for this step.

These five steps are iterated until the input graph converges to a final graph on which no further simplification can be made.
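The first four steps can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions, not the implementation used in the study: edges are stored as hypothetical `(u, v, length_m, highway_tag)` tuples, edge direction is ignored, step 5 (merging interstitial nodes, done with OSMnx in the study) is omitted, and all function names are invented for this example.

```python
from collections import defaultdict

# Each edge is (u, v, length_m, highway_tag); direction is ignored in this sketch.

def remove_parallel_edges(edges):
    """Step 1: keep only the shortest edge between each pair of nodes."""
    shortest = {}
    for edge in edges:
        key = frozenset(edge[:2])
        if key not in shortest or edge[2] < shortest[key][2]:
            shortest[key] = edge
    return list(shortest.values())

def remove_self_loops(edges):
    """Step 2: drop edges whose two endpoints coincide."""
    return [e for e in edges if e[0] != e[1]]

def neighbours_of(edges):
    """Map every node to the set of nodes it shares an edge with."""
    nbrs = defaultdict(set)
    for u, v, *_ in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs

def remove_dead_ends(edges):
    """Step 3: drop nodes with exactly one adjacent node, plus their edges."""
    dead = {n for n, nb in neighbours_of(edges).items() if len(nb) == 1}
    return [e for e in edges if e[0] not in dead and e[1] not in dead]

def remove_gridiron(edges, max_length=300.0):
    """Step 4: drop adjacent 4-way nodes whose edges are all short and residential."""
    nbrs = neighbours_of(edges)
    incident = defaultdict(list)
    for e in edges:
        incident[e[0]].append(e)
        incident[e[1]].append(e)
    candidates = {
        n for n, inc in incident.items()
        if len(nbrs[n]) == 4
        and max(e[2] for e in inc) < max_length
        and all(e[3] == "residential" for e in inc)
    }
    # Condition (4): a candidate must touch at least one other candidate.
    targets = {n for n in candidates if nbrs[n] & candidates}
    return [e for e in edges if e[0] not in targets and e[1] not in targets]

def simplify(edges):
    """Repeat steps 1 to 4 until a full pass removes nothing."""
    steps = (remove_parallel_edges, remove_self_loops,
             remove_dead_ends, remove_gridiron)
    while True:
        reduced = edges
        for step in steps:
            reduced = step(reduced)
        if len(reduced) == len(edges):  # every step only removes edges
            return reduced
        edges = reduced

# A primary-road square A-B-C-D with a duplicate A-B edge and a
# residential cul-de-sac B-E-F that ends in a turning circle (self-loop).
edges = [
    ("A", "B", 100.0, "primary"), ("A", "B", 150.0, "primary"),
    ("B", "C", 100.0, "primary"), ("C", "D", 100.0, "primary"),
    ("D", "A", 100.0, "primary"),
    ("B", "E", 50.0, "residential"), ("E", "F", 40.0, "residential"),
    ("F", "F", 30.0, "residential"),
]
simplified = simplify(edges)
```

In this example the cul-de-sac disappears over two passes (first F, then E), which is why the steps are iterated rather than applied once.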
Tracking regional node density of the original network
The simplified network has lower node density than the original, especially in residential areas of the network, as the framework removes nodes in target areas. This can lead to different topological characteristics (e.g. centrality measures) after simplification. To circumvent this, we set a node attribute aggr_node_number to keep track of the regional node density in the original network. The value of the attribute is initialized to 1 for each node in the original graph. When a node or a group of nodes is removed for simplification, the aggr_node_number value of the node, or the summation of aggr_node_number values of the group of nodes being removed, is distributed equally to its neighbors. Intuitively, this attribute of a node in the simplified network represents the number of other nodes that were collapsed into it, which can be used to approximate the node density in the original network. In our study, we used it to better estimate the centrality measure of original networks from simplified networks as explained in detail later. There are other potential benefits of this attribute such as using it for generating origin and destination pairs in traffic simulations.
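The bookkeeping described above can be sketched as follows. This is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbor sets and assuming that "its neighbors" means the surviving nodes adjacent to the removed node or group; the function name `remove_group` and the variable names are hypothetical.

```python
def remove_group(adj, aggr, group):
    """Remove `group` from the graph and share its aggr_node_number total
    equally among the surviving neighbours of the group."""
    group = set(group)
    receivers = set().union(*(adj[n] for n in group)) - group
    if not receivers:
        raise ValueError("removed group has no surviving neighbour")
    share = sum(aggr[n] for n in group) / len(receivers)
    for n in receivers:
        aggr[n] += share
        adj[n] -= group          # detach the removed nodes
    for n in group:
        del adj[n], aggr[n]

# Triangle A-B-C with a cul-de-sac B-E-F hanging off node B.
adj = {
    "A": {"B", "C"}, "B": {"A", "C", "E"}, "C": {"A", "B"},
    "E": {"B", "F"}, "F": {"E"},
}
aggr = dict.fromkeys(adj, 1.0)   # aggr_node_number starts at 1 everywhere

remove_group(adj, aggr, {"F"})   # F collapses into E
remove_group(adj, aggr, {"E"})   # E (now carrying F) collapses into B
```

After both removals node B carries the weight of three original nodes, and the total over all remaining nodes still equals the original node count.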
Centrality measures and estimation
In graph theory and network analysis, the basic idea of centrality is that some nodes and edges in a network are more central or important than others. Since the first set of centrality indices was defined for social network analysis by Freeman (1978), various centrality indices have been proposed and widely applied in many other fields of study, including road network analysis (Porta et al. 2006; Park and Yilmaz 2010; Zhang et al. 2011; Huynh and Selvakumar 2020). In our study, we use two centrality indices: betweenness centrality and information centrality. Porta et al. (2006) showed that these indices capture the backbone structure of a road network and collective behaviors well. The difference in the distribution of centrality values before and after simplification provides a measure of how much a simplification method distorts the topological characteristics of a road network.
Edge betweenness centrality
Edge betweenness centrality generalizes Freeman’s betweenness centrality to edges (Girvan and Newman 2002). It measures how frequently an edge lies on the shortest paths connecting pairs of nodes in a graph. In a road network, the higher the betweenness of an edge, the more shortest routes it carries and the more it is likely to contribute to transportation in a city. The betweenness centrality \(C^B\) of an edge e is defined by
$$\begin{aligned} C^B(e) = \frac{1}{N(N-1)} \sum _{s,t \in V} \frac{\sigma (s,t \mid e)}{\sigma (s,t)} \end{aligned}$$
(1)
where N is the number of nodes in a graph, V is the set of nodes, \(\sigma (s,t)\) is the number of shortest paths between an origin and destination pair (s, t), and \(\sigma (s,t \mid e)\) is the number of those paths that pass through edge e.
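Eq. (1) can be evaluated directly on a small graph, as in the sketch below. It is a brute-force illustration rather than an efficient algorithm, and it rests on stated assumptions: shortest paths are measured in hops instead of road length, every road is stored as two directed edges (a symmetric adjacency list), and the value is reported per directed edge. The function names are hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return hop distances and shortest-path counts from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u is a predecessor of v
                sigma[v] += sigma[u]
    return dist, sigma

def edge_betweenness(adj):
    """Evaluate Eq. (1) for every directed edge (u, v) of a symmetric graph."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    result = {}
    for u in nodes:
        for v in adj[u]:
            total = 0.0
            for s in nodes:
                dist_s, sigma_s = info[s]
                if u not in dist_s:
                    continue
                for t in dist_s:
                    if t == s:
                        continue
                    # Symmetric graph: distance from v to t equals t to v.
                    dist_t, sigma_t = info[t]
                    if dist_s[u] + 1 + dist_t[v] == dist_s[t]:
                        # sigma(s,t|e) = paths s->u times paths v->t
                        total += sigma_s[u] * sigma_t[v] / sigma_s[t]
            result[(u, v)] = total / (n * (n - 1))
    return result

# Four intersections around one block: A-B-C-D-A.
square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
cb = edge_betweenness(square)
```

For the directed edge A to B, the pairs (A, B), (A, C) and (D, B) contribute 1, 1/2 and 1/2, so the sum is 2 and the normalized value is 2/12.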
Edge information centrality
Based on the concept of efficient propagation of information over a social network, Latora and Marchiori defined information centrality as the relative drop in network efficiency caused by the removal of a node (Latora and Marchiori 2004), where network efficiency represents how efficiently information is exchanged over the network (Latora and Marchiori 2001). Fortunato et al. (2004) generalized information centrality to edges and defined edge information centrality. Applying edge information centrality to a road network, we take network efficiency to be the sum, over all origin and destination pairs (s, t), of the ratio between the straight-line distance and the shortest-path length. The normalized network efficiency E for a weighted graph G, as proposed in Vragović et al. (2005), is given by:
$$\begin{aligned} E(G) = \frac{1}{N(N-1)} \sum _{s,t \in V; s \ne t } \varepsilon _{st} = \frac{1}{N(N-1)} \sum _{s,t \in V; s \ne t } \frac{d^{Eucl}_{st}}{d_{st}} \end{aligned}$$
(2)
where \(\varepsilon _{st}\) is the efficiency of travel from node s to t, \(d^{Eucl}_{st}\) is the Euclidean distance between nodes s and t, and \(d_{st}\) is the length of the shortest path from node s to t. If there is no path from s to t, then \(d_{st}=\infty\) and, consequently, \(\varepsilon _{st}=0\). Thus edge information centrality is well defined even for weakly connected or disconnected graphs. Removing an edge forces the origin and destination pairs that used it either to take alternative paths or, if none exists, to become disconnected; in either case network efficiency drops. For example, removing a bridge or the only connection to a large subgraph is likely to cause a large drop in efficiency, so such edges have high information centrality. The information centrality \(C^I\) of an edge e is defined by
$$\begin{aligned} C^I(e) = \frac{E(G_{org}) - E(G^e_{cut})}{E(G_{org})} \end{aligned}$$
(3)
where \(G_{org}\) is the original graph and \(G^e_{cut}\) is the graph obtained by removing edge e from \(G_{org}\).
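Eqs. (2) and (3) translate into a short computation, sketched below under stated assumptions: the graph is a symmetric dictionary of dictionaries holding edge lengths, node positions are planar coordinates in the same unit as the lengths, and removing an edge removes both of its directions. The function names are hypothetical, and the cost of this direct approach grows quickly with graph size.

```python
import heapq
import math

def shortest_lengths(adj, source):
    """Dijkstra: shortest-path length from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def efficiency(adj, xy):
    """Eq. (2): mean ratio of Euclidean distance to shortest-path length.
    Unreachable pairs are absent from `dist` and so contribute zero."""
    n = len(adj)
    total = 0.0
    for s in adj:
        for t, d in shortest_lengths(adj, s).items():
            if t != s:
                total += math.dist(xy[s], xy[t]) / d
    return total / (n * (n - 1))

def edge_information_centrality(adj, xy, u, v):
    """Eq. (3): relative efficiency drop after removing edge {u, v}."""
    cut = {a: {b: w for b, w in nbrs.items() if {a, b} != {u, v}}
           for a, nbrs in adj.items()}
    base = efficiency(adj, xy)
    return (base - efficiency(cut, xy)) / base

# Unit square block: A(0,0), B(1,0), C(1,1), D(0,1), all edges of length 1.
xy = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (0.0, 1.0)}
adj = {
    "A": {"B": 1.0, "D": 1.0}, "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0}, "D": {"C": 1.0, "A": 1.0},
}
ci_ab = edge_information_centrality(adj, xy, "A", "B")
```

In this example, removing A-B leaves every pair connected but lengthens the trip between A and B from 1 to 3, so the efficiency of that pair falls from 1 to 1/3 in each direction.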
Estimating centrality from a simplified network
Roads and intersections in residential areas are significantly simplified by our method. Although the removed nodes have minimal effect on transportation functionality, the decreased node density in those areas would result in different topological characteristics and centrality distributions in the simplified and original networks. To retain information about the original node density, we use the aggr_node_number attribute discussed above.
Specifically, we estimate centrality in the original network from a simplified network by using the value of the aggr_node_number attribute, which represents the number of nodes that were removed in the vicinity of a node. When computing centrality in a simplified network, we use the attribute as a weight to estimate the centrality of the same element in the original network.
The estimated betweenness centrality \({\hat{C}}^B\) of an edge e is defined by
$$\begin{aligned} {\hat{C}}^B(e) = \frac{1}{N(N-1)} \sum _{s,t \in V'} \frac{\sigma (s,t \mid e)}{\sigma (s,t)} \times aggr_s \times aggr_t \end{aligned}$$
(4)
where \(V'\) is the set of nodes in the simplified graph, and \(aggr_v\) is the aggr_node_number value of a node v. Similarly to \({\hat{C}}^B\), we estimate the network efficiency E of the original graph G from a simplified graph \(G_{simple}\). The estimated network efficiency \({\hat{E}}\) is defined by
$$\begin{aligned} {\hat{E}}(G_{simple}) = \frac{1}{N(N-1)} \sum _{s,t \in V'; s \ne t } \frac{d^{Eucl}_{st}}{d_{st}} \times aggr_s \times aggr_t \end{aligned}$$
(5)
By substituting \({\hat{E}}\) for E in Eq. (3), the estimated information centrality \({\hat{C}}^I\) can be derived.
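The weighted efficiency of Eq. (5) and its substitution into Eq. (3) can be sketched as below. Several points are assumptions of this illustration rather than statements from the study: the normalizing count `n` is passed in explicitly and set here to the original node count, all-pairs shortest paths are computed with the Floyd-Warshall recurrence for brevity, removing an edge removes both of its directions, and every name is hypothetical.

```python
import math

def all_pairs_lengths(adj):
    """Floyd-Warshall shortest-path lengths on a small weighted graph."""
    nodes = list(adj)
    d = {s: {t: (0.0 if s == t else adj[s].get(t, math.inf)) for t in nodes}
         for s in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def estimated_efficiency(adj, xy, aggr, n):
    """Eq. (5): each pair's efficiency is weighted by aggr_s * aggr_t.
    `n` is the node count used for normalization (assumed: original graph)."""
    d = all_pairs_lengths(adj)
    total = sum(
        math.dist(xy[s], xy[t]) / d[s][t] * aggr[s] * aggr[t]
        for s in adj for t in adj
        if s != t and d[s][t] < math.inf   # unreachable pairs add zero
    )
    return total / (n * (n - 1))

def estimated_information_centrality(adj, xy, aggr, n, u, v):
    """Eq. (3) with the estimated efficiency in place of E."""
    cut = {a: {b: w for b, w in nbrs.items() if {a, b} != {u, v}}
           for a, nbrs in adj.items()}
    base = estimated_efficiency(adj, xy, aggr, n)
    return (base - estimated_efficiency(cut, xy, aggr, n)) / base

# Simplified network: a straight road A-B-C where B absorbed two removed nodes.
xy = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (200.0, 0.0)}
adj = {"A": {"B": 100.0}, "B": {"A": 100.0, "C": 100.0}, "C": {"B": 100.0}}
aggr = {"A": 1.0, "B": 3.0, "C": 1.0}
n_original = 5                             # assumed original node count

e_hat = estimated_efficiency(adj, xy, aggr, n_original)
ci_hat = estimated_information_centrality(adj, xy, aggr, n_original, "A", "B")
```

Because the same `n` appears in the numerator and the denominator of Eq. (3), the normalization cancels in the estimated information centrality, so the choice of `n` affects only the estimated efficiency itself.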
The time complexity of computing centrality is \(O(VE)\) (Girvan and Newman 2002) for betweenness centrality and \(O(VE^3)\) (Fortunato et al. 2004) for information centrality, where V is the number of nodes and E is the number of edges in a network. The attribute aggr_node_number only enters as a multiplicative factor, so the time complexity is still dominated by the size of the graph, that is, by its number of nodes and edges. Because the simplified network has fewer nodes and edges, estimating the centrality measures of the original network from a simplified network has a substantially lower computational cost than computing them directly on the original network.