Normalized closeness centrality of urban networks: impact of the location of the catchment area and evaluation based on an idealized network

The decision of where to locate the catchment area of an urban network exerts significant influence on the indicator values and in this research this influence is referred to as the placement effect. Placement effect has significant impact on the studies at the neighborhood scale focusing on the structural properties of the network models, the network analysis results and centrality measures, the inferred movement patterns and the accessibility to destination. Placement effect becomes even more significant when multiple catchment areas are sampled to be compared or classified. This research examines placement effect on one of the most affected indicators, closeness centrality, and proposes using an idealized network as a reference to be compared with the real network in order to find a solution to mitigate the placement effects. By comparing the normalized closeness centrality in the real network with that in the idealized network, we can (1) evaluate the placement effect on the closeness centrality and (2) find the threshold distance in order to mitigate the placement effect. The results show that the closeness centrality of the same node varies remarkably depending on its position and how central it is in the chosen catchment area. Specifically, in the selected areas in this research, if the center point of a catchment area is moved by more than 100 m away from the original center point, the closeness centrality of the same node starts to be significantly influenced by the placement effect. The threshold distance of 100 m offers a recommendation that a direct comparison of the closeness centrality between different nodes in the same catchment area should be drawn only if these nodes are less than 100 m away from each other. In other words, when comparing two nodes located further than the threshold distance from each other, it is advisable to create two separate catchment areas, where these nodes serve as the center points. 
It should be noted that the threshold distance of 100 m derived specifically from the current research should not be generalized to other cases. The threshold distance of different case studies remains open for further investigation in the future as it may vary among cities or areas.


Introduction
An urban network system is a complex spatial system whose members connect and interact with each other. For any real-world spatial network analysis, it is essential to define an artificial border of the network model (Park 2009). Spatially confined networks or, in other words, local sub-networks of the entire global network system have been termed catchment areas (Chen and Dietrich 2021), contextual areas or bounded systems (Park 2009), subnetworks or regional networks (Rheinwalt et al. 2012).
Drawing an arbitrary boundary inevitably cuts the links connecting the catchment area under investigation with the rest of the network outside the boundary (Rheinwalt et al. 2012). However, the events, structures, behavior and dynamics of the entire global network system still affect the local sub-network inside the catchment area (Greenberg et al. 2020). The inevitable arbitrary delineation of the boundary may distort the results of the measures, which can subsequently bias the inferences based on these measures (Paul 2014). Such distortion is found to be more pronounced when the nodes or links are closer to the border of the catchment area (Okabe and Sugihara 2012).
These boundary determination problems, which have been termed the edge effect (Crucitti et al. 2006; Gil 2017; Ripley 2004) or the boundary effect (Park 2009; Okabe and Sugihara 2012), have significant impact on the studies focusing on the structural properties of the network models, the network analysis results and centrality measures, the inferred movement patterns and the accessibility to destinations. First of all, the choice of the boundaries decides the internal structure of the local network model that is spatially confined within the catchment area. This decision directly influences the members of the sub-structure and topology included in the network model and, therefore, affects our understanding of the spatial structure and functional properties of the network system (Laumann et al. 1989).
Secondly, the delineation of the model boundary can cause a certain bias in network analysis results (Ratti 2004; Joutsiniemi 2010) because the analytic algorithms of network analysis are relational (Okabe and Sugihara 2012) and "network data by definition includes dependencies among observations" (Laumann et al. 1983). Similarly, syntactic values are meaningful only with reference to a system boundary that a researcher chooses for his or her analysis (Park 2009). Excluding any elements or members of the entire global network system will affect the characteristics, performance and behavior of the measurement results of the local network models. In particular, path-based measures, such as closeness centrality and betweenness centrality, are very sensitive to the boundary effect. Distortion of the results can be induced when links are cut off by an arbitrary boundary and, therefore, are not included in the calculation. The nodes and links closer to the border appear less central and more peripheral only because of the presence of the boundary. Usually, the boundary effect on the path-based measures is prevalent in all nodes and links in the entire network model (Rheinwalt et al. 2012) and is particularly pronounced for those at the border of the catchment area. Nodes or links near the center of the catchment area tend to have higher closeness centrality (Gil 2017) and betweenness centrality (Chen and Dietrich 2021) compared with those close to the border.
Thirdly, boundary effects can also induce a bias on the inferences based on such distorted measure results. For example, Krafta (1994) has carried out tests for the correlation between different definitions of boundaries and pedestrian movement. Park (2009) has tested the predictability of human movement patterns under various boundary conditions and found that this predictability reaches its maximum at a certain radius from the boundary, which is also an indication of the presence of the size-independent boundary effects on the internal structure.
Finally, boundary definition also affects accessibility analysis. Previous studies (Sharkey and Horel 2008) related to public health issues and to nutrition and food accessibility in rural areas have been criticized for not considering resources outside of the study area, even though resources across the boundary may also affect behavior within the area under investigation (Sadler et al. 2011; van Meter et al. 2010). In response to this methodological deficiency, van Meter et al. (2010) and Sadler et al. (2011) have investigated the boundary effect on reaching the retail shops from the locations within the study area, which is typically within an arbitrary administrative boundary. The results show that the boundary effect has led to considerable bias through the mis-identification of food desert communities at the border of the study area, even if there is a source of food right across the border. The actual travel distance necessary for buying food has also been over-reported.
In order to improve the reliability and the consistency of the network analysis results across locations, a number of procedures and practices have been proposed to mitigate the boundary effect. The first approach, the catchment of catchment (Hillier et al. 1993), adds an additional boundary to create a buffer area outside the actual test area. The size of the boundary of the buffer area is larger than that of the catchment area. Network analysis is then carried out for both the catchment and the buffer area. However, the results of the network measure of the buffer area are not included in the analysis because they are distorted by the boundary effect (Gil 2017; Penn et al. 1998). Secondly, instead of one fixed boundary definition, the other mitigation method, the radius-radius analysis (Hillier 1996) or local radius analysis (Gil 2017), applies various boundary conditions by creating multiple circles around the center of the catchment area under investigation.
Although the catchment of catchment method and the radius-radius analysis have proved to be successful in mitigating the boundary effect in many empirical studies, the optimal size of the buffer and the radius remains open to further research. Gil (2017) finds that the results of the network centrality analysis are very unstable in small study areas (e.g. on a neighbourhood scale) and suggests that the study area should be embedded in a larger context. However, there remains the question of how large is a large enough context. In other words, the problem of delineating the boundary becomes the problem of deciding the radius or the size of the study area (Joutsiniemi 2005). In response to this open question, Chen and Dietrich (2021) conducted a series of experiments on the size-related boundary effects, i.e. the size effect, on the indicator values. Based on these experiments, they have suggested that, first of all, the average street length can be one of the indicators for determining the size of the catchment area and, secondly, "the size effect on the indicator is not very significant when the size of the catchment area is larger than 4000 × 4000 m². Therefore, any size larger than 4000 × 4000 m² would not be necessary" (Chen and Dietrich 2021).
Another mitigating method, namely the moving boundary approach, consists in shifting the center of the circular boundaries with fixed size and shape to calculate network measures (Penn et al. 1998; Hillier and Penn 2004; Turner 2007; Gil 2017). However, even when the moving network boundaries keep the same shape and size, Gil (2017) shows that indicators like closeness centrality still vary in different study areas and are particularly sensitive to the shift of network centers. Since the boundary shape and size are identical, the variation of the indicator values can only be explained by the location of the network. In other words, although the size-related boundary effect can be eliminated by adopting the moving boundary approach, the placement-related boundary problem, which is termed placement effect in this research, still exists. To be more specific, placement effect refers to the phenomenon that the variation of the indicator values depends on where the catchment area is retrieved. From this perspective, a very important question is where the center of the catchment area should be. The current research intends to further investigate this placement effect in order to provide more refined guidelines for deciding the location of the network model.

Presenting the placement effect on closeness centrality
The results of Gil's experiments (2017) show that one of the indicators most influenced by the placement effect is closeness centrality ($C_c$), which is the reciprocal of the sum of the shortest distances between the chosen node-v and all other nodes in the catchment area. The $C_c$ of node-v can be formally expressed as
$$C_c(v) = \frac{1}{\sum_{u=1}^{N} S(v, u)} \qquad (1)$$
where $C_c(v)$ refers to the closeness centrality of the chosen node-v, $S(v, u)$ refers to the length of the shortest path between the chosen node-v and another node u, and $N$ refers to the total number of nodes in the chosen catchment area. Closeness centrality measures how fast a node exerts influence on all other nodes. For example, if the target is to spread information in the network, a node with large closeness centrality is in a position to spread information quickly. Nodes with a higher value of closeness centrality can be important influencers in the network. Closeness centrality is not an absolute value as it may change depending on the location of the selected catchment area in the entire global network. A node in the center of the catchment area has the advantage of having more influence on other nodes and has a higher value of closeness centrality than a node located on the border of the network (Gil 2017). Therefore, the closeness centrality of a chosen node might not necessarily be small in the entire city street network, but it may be small in the selected catchment area only because it is not close to the center of the catchment area.
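The definition above can be illustrated with a minimal sketch. It is not the software used in this research: the helper names and the three-node toy network are hypothetical, link lengths are given in km, and the network is assumed to be connected so that every node is reachable from the chosen node.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: shortest path length from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length in graph[node].items():
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist

def closeness_centrality(graph, v):
    """Reciprocal of the sum of shortest distances from v to all other nodes."""
    dist = shortest_distances(graph, v)
    total = sum(d for u, d in dist.items() if u != v)
    return 1.0 / total

# Hypothetical toy network: a path a - b - c, link lengths in km.
toy = {
    "a": {"b": 0.1},
    "b": {"a": 0.1, "c": 0.2},
    "c": {"b": 0.2},
}
c_a = closeness_centrality(toy, "a")  # distances 0.1 and 0.3 km
c_b = closeness_centrality(toy, "b")  # distances 0.1 and 0.2 km
print(c_a, c_b)
```

In this toy case the middle node b has the larger value because its summed distance to the other nodes is smaller, which is the same mechanism that favors nodes near the center of a catchment area.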
In order to demonstrate the placement effect, the Plaza Luceros in Alicante has been selected as the center point of the study area. The size of the catchment area is 3000 × 3000 m² and the unit of closeness centrality is 1/km. We have chosen an area size that is larger than the acceptable walking distance for pedestrians because this allows more space to move the chosen node further away from the center in order to investigate the placement effect. One of our targets is to foster walking in cities and to help develop walkability, visibility and accessibility of points of interest. Therefore, we investigate a pedestrian network, and OpenStreetMap (OSM) can serve as a data source from which existing pedestrian networks in cities are extracted. The following OSM highway tags are selected to form the network within the catchment area: primary, secondary, tertiary, residential, pedestrian, steps, path and unclassified.
Eight catchment areas with different centers were selected and are presented in Table 1. The center of each catchment area is indicated by a blue center point. The placement effect on the chosen node, which is indicated by the red node in each catchment area, will be investigated by examining the changes in the indicator values of the eight selected study areas in Table 1.
These eight catchment areas differ in the distance between the blue center point and the red chosen node. In catchment area 1, the blue center point and the red chosen node coincide. The offset of the center increases from 50 m in catchment area 2 to 1500 m in catchment area 8. That is, from catchment area 2 to catchment area 8, the blue center point is moved 50 m, 100 m, 200 m, 300 m, 500 m, 1000 m and 1500 m to the north of the red chosen node.
The results in Table 1 show that the closeness centrality of the red chosen node changes from 0.000678 1/km in catchment area 1 to 0.000438 1/km in catchment area 8. This change shows that the closeness centrality of the red chosen node is affected by its distance to the blue center point in all eight catchment areas. Hence, there is a placement effect on the value of the closeness centrality of the red chosen node.
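The same mechanism can be reproduced on a synthetic network. The following is a minimal sketch and not the procedure behind Table 1: it assumes a hypothetical regular street grid with 50 m spacing, for which the shortest path inside a rectangular window equals the Manhattan distance, and it reuses the eight offsets described above. Because every offset is a multiple of the grid spacing, the number of nodes stays constant and any change in closeness centrality comes from the placement alone.

```python
def closeness_in_window(offset_north_m, side_m=3000, spacing_m=50):
    """Closeness (1/km) of the node at the origin inside a shifted square window.

    The chosen node sits at (0, 0); the window center sits offset_north_m
    to the north of it. Grid nodes lie at multiples of spacing_m.
    """
    half = side_m // 2
    xs = range(-half, half + 1, spacing_m)
    ys = range(offset_north_m - half, offset_north_m + half + 1, spacing_m)
    # On a full rectangular grid the shortest path is the Manhattan distance.
    total_m = sum(abs(x) + abs(y) for x in xs for y in ys)
    n_nodes = len(xs) * len(ys)
    return n_nodes, 1000.0 / total_m

offsets_m = [0, 50, 100, 200, 300, 500, 1000, 1500]
results = [closeness_in_window(o) for o in offsets_m]
for offset, (n_nodes, closeness) in zip(offsets_m, results):
    print(f"offset {offset:>4} m  N = {n_nodes}  C_c = {closeness:.7f} 1/km")
```

On this hypothetical grid the closeness centrality of the fixed node falls steadily as the window center moves away from it, even though the node itself and the network around it have not changed.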

Normalization of the closeness centrality in the real network
The process of normalization has been proposed to connect the number of nodes with the indicator (Masucci and Molinero 2016). The current research also applies the normalization procedure and creates an indicator, the normalized closeness centrality, $C_N$, so that the different number of nodes in different catchment areas is balanced through the normalization and the fact that the number of nodes changes with the location of the catchment area is taken into consideration. In this section, the results in Table 1 are used to explain the normalization of closeness centrality derived from two different indicators: (1) the closeness centrality and (2) the shortest distance between the chosen node and all other nodes.
A common way of determining the normalized closeness centrality is to multiply the closeness centrality of the chosen node by the number of other nodes in the catchment area. The normalized closeness centrality can be formally expressed as
$$C_N(v) = (N - 1)\, C_c(v) = \frac{N - 1}{\sum_{u=1}^{N} S(v, u)} \qquad (2)$$
where $C_N(v)$ refers to the normalized closeness centrality of node-v, $N$ refers to the total number of nodes in the catchment area, and $C_c(v)$ refers to the closeness centrality of the chosen node-v. $(N - 1)$ refers to the number of connections between the chosen node and all other nodes because the chosen node has no (or zero) connection with itself. In the case of a small urban area, in which $N$ is not very large, $(N - 1)$ is used. In the case where $N$ is very large, the '1' can be dropped from $(N - 1)$.
To illustrate this, we can use catchment area 1 in Table 1 as an example. The total number of nodes, $N$, is 1606 and the closeness centrality of the red chosen node-v, $C_c(v)$, is 0.0006784 1/km. Based on formula (2), the normalized closeness centrality of the red chosen node-v, $C_N(v)$, amounts to (1606 − 1) × 0.0006784 = 1.089 1/km.
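The arithmetic of this example can be checked with a short sketch. The function name is hypothetical; the inputs are the values quoted in the text for catchment areas 1 and 8 (1606 and 1919 nodes; 0.000438 1/km for area 8), and for area 1 the closeness value 0.0006784 1/km is used because it reproduces the stated result of 1.089 1/km.

```python
def normalized_closeness(n_nodes, closeness_per_km):
    """Formula (2): closeness centrality scaled by the number of other nodes."""
    return (n_nodes - 1) * closeness_per_km

c_n_area1 = normalized_closeness(1606, 0.0006784)
c_n_area8 = normalized_closeness(1919, 0.000438)
print(f"area 1: {c_n_area1:.3f} 1/km")
print(f"area 8: {c_n_area8:.3f} 1/km")
```

Since $(N - 1)$ divided by the sum of the shortest distances is the reciprocal of the average shortest distance, the value for catchment area 1 corresponds to an average network distance of roughly 0.92 km between the chosen node and the other nodes.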

Idealized network
Knowing the value of the normalized closeness centrality of the chosen node-v, $C_N(v)$, in the real network, we need a reference with which different networks can be compared in order to evaluate the value of $C_N(v)$. In this research we propose to use the idealized network as the reference. For our purposes, the idealized network is defined to a ... The length of any link between two nodes can be calculated with Pythagoras' theorem. In this section the center node of the idealized network is the chosen node-v, which is used to explain how the closeness centrality of this chosen node-v changes with (1) the number of nodes in the catchment area, $N$; and (2) the corresponding sum of the shortest distances between the chosen node-v and all other nodes, $\sum_{u=1}^{N} S(v, u)$. Figure 1 presents the idealized network. There is a direct connection between the center node and all other nodes. The nodes and links in such a network form a star-like pattern. The shortest distance from any node to the center node is indicated by the yellow arrow. It should be noted that such a network is only favorable to one node, which is the center node in this case. For all other nodes, such a network is not ideal because the sum of the shortest distances between any non-center node and all other nodes is larger than the sum of the distances between the center node and all other nodes. This type of network is often found in real urban networks, such as Place Charles de Gaulle in Paris or Connaught Place in New Delhi. It is designed to give the central place a high and exceptional importance.

Idealized network as the reference for comparison
This section explains (1) the calculation of the normalized closeness centrality in the idealized network and (2) the comparison of the normalized closeness centrality in the idealized and real networks.

Calculation of normalized closeness centrality of idealized network
In order to calculate the normalized closeness centrality in the idealized network, the first step is to acquire the sum of the shortest distances from the chosen node to all other nodes. Although the side length of the catchment area (which is indicated by the black rectangle in Fig. 2) is 3000 m, because of the symmetry it is sufficient to focus on just one quadrant of the catchment area, which has a side length of 1500 m and is indicated by the red rectangle in Fig. 2.
Figure 3 presents the relationship between the increasing number of links on each side of the 1500 m × 1500 m quadrant and the average distance between the center node and all other nodes in the idealized network. Table 2 presents the average distance between the center node and all other nodes in Fig. 2, with an increasing number of nodes on each side of the quadrant. As the number of nodes on each side of the catchment area increases, the node density of the network also increases. The results show that, with an increasing number of nodes on the side of the quadrant, the average distance between the center node and all other nodes, i.e. $\sum_{u=1}^{N} S(v, u)/(N - 1)$, decreases and reaches saturation at 1148 m. Applying formulas (1) and (2), one can calculate the normalized closeness centrality of the center node in the idealized network, namely
$$C_N(v) = \frac{N - 1}{\sum_{u=1}^{N} S(v, u)} = \frac{1}{1.148\ \text{km}} \approx 0.871\ \text{1/km}$$
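The saturation of the average distance can be checked numerically. The sketch below rests on assumptions that are not spelled out in the text: the nodes of the quadrant are taken to lie on a regular square lattice, the center node is taken to sit at one corner of the 1500 m quadrant, and every node is joined to the center by a straight link whose length follows from Pythagoras' theorem. The function name is hypothetical.

```python
import math

def average_distance_to_center(links_per_side, side_m=1500.0):
    """Mean straight-line distance (m) from the corner node to all other nodes."""
    d = side_m / links_per_side  # length of one lattice link
    total = 0.0
    for i in range(links_per_side + 1):
        for j in range(links_per_side + 1):
            total += math.hypot(i * d, j * d)  # the corner itself adds 0
    n_others = (links_per_side + 1) ** 2 - 1
    return total / n_others

for links in (1, 10, 100, 500):
    avg_m = average_distance_to_center(links)
    print(f"{links:>4} links per side: {avg_m:8.1f} m, C_N = {1000.0 / avg_m:.3f} 1/km")

# Continuous limit for a square of side D: D * (sqrt(2) + ln(1 + sqrt(2))) / 3
limit_m = 1500.0 * (math.sqrt(2) + math.log(1 + math.sqrt(2))) / 3
print(f"continuous limit: {limit_m:.1f} m")
```

Under these assumptions the average distance falls as the lattice becomes denser and levels off just below 1148 m, which matches the saturation value reported above and a normalized closeness centrality of about 0.871 1/km.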

Comparing the normalized closeness centrality in the idealized and the real network
Our next task is to compare the normalized closeness centrality in the idealized and real networks and to examine the relationship between the number of nodes and the closeness centrality. The closeness centrality of the chosen node in the eight catchment areas of the real network in Table 1 is plotted in Fig. 4a. The normalized closeness centrality of the chosen node in the eight catchment areas of the real network is shown by the red curved line in Fig. 4b. The blue line in Fig. 4b indicates the normalized closeness centrality of the center node in the idealized network.
As shown in Fig. 4a, the closeness centrality of the chosen node is not normalized by the number of nodes, and its value decreases as the center of the catchment area moves from catchment area 1 to catchment area 8. There can be several explanations for this observation.
One of the possible explanations is that, as the catchment area moves, the number of nodes increases. An increasing number of nodes indicates a higher node density and a greater total length of the shortest paths. It also leads to a smaller value of closeness centrality because, by definition, closeness centrality is the reciprocal of the sum of the shortest distances between the chosen node and all the other nodes. As mentioned before, one of the contributions of normalizing closeness centrality is that the number of nodes can be taken into consideration. Therefore, the changes in the normalized closeness centrality also reflect the changes in the number of nodes.
Further examination shows that catchment areas 6, 7 and 8 have a higher number of nodes (1916, 2062 and 1919 nodes, respectively) than catchment area 1, which has 1606 nodes. Also, compared with catchment area 1, the closeness centrality in catchment areas 6, 7 and 8 is smaller. In order to check whether the changes in the number of nodes can account for the decreasing closeness centrality shown in Fig. 4a, we examine the changes in the normalized closeness centrality, as shown in Fig. 4b.
When taking the number of nodes into consideration, the normalized closeness centrality of catchment area 1 becomes smaller than that of catchment area 6. The nearly straight line in Fig. 4a changes into the curved line in Fig. 4b, which demonstrates the effect of the normalization. This means that the decreasing closeness centrality can indeed be broadly explained by the increasing number of nodes in the catchment area.
Fig. 4 Comparing the closeness centrality in the real network, the normalized closeness centrality in the real network and in the idealized network
In sum, compared with the closeness centrality, the normalized values provide additional information about the network, such as whether there is a change in the number of nodes included in the catchment area.

Finding the threshold distance between two nodes in order to mitigate the placement effect
Furthermore, in the case of the selected catchment areas in Alicante, we can use the normalized closeness centrality of the center node in catchment area 1 as a reference and plot the deviation of the normalized closeness centrality of the other 7 catchment areas. The results in Fig. 5 show that, if the center node is placed within 100 m of the original place, the deviation is less than 2%, which is highlighted with the gray belt. This means that, if the center node of a catchment area is moved by more than 100 m away from the original center point, the closeness centrality of the same node starts to be significantly influenced by the placement effect. As shown in Fig. 5, the percentage of deviation starts to rise after 100 m. The threshold distance of 100 m offers a recommendation that a direct comparison of the closeness centrality between different nodes in the same catchment area should be drawn only if these nodes are less than 100 m away from each other.
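The deviation and the resulting rule of thumb can be sketched as follows. The exact definition of the deviation is not given in the text, so the relative difference to the reference value is an assumption, and both function names are hypothetical. The inputs are the values quoted for catchment areas 1 and 8, with 0.0006784 1/km used for area 1 because it reproduces the stated normalized value of 1.089 1/km.

```python
def percent_deviation(value, reference):
    """Relative difference to the reference in percent (assumed definition)."""
    return abs(value - reference) / reference * 100.0

def directly_comparable(distance_m, threshold_m=100.0):
    """True if two nodes are close enough to be compared in one catchment area."""
    return distance_m < threshold_m

c_n_reference = (1606 - 1) * 0.0006784  # catchment area 1, offset 0 m
c_n_area8 = (1919 - 1) * 0.000438       # catchment area 8, offset 1500 m
deviation_area8 = percent_deviation(c_n_area8, c_n_reference)

print(f"deviation of catchment area 8: {deviation_area8:.1f} %")
print(directly_comparable(80.0), directly_comparable(250.0))
```

The 100 m default reflects the threshold found for the selected areas in Alicante only; for another city the threshold would have to be re-derived before applying such a check.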

Conclusions and discussion
For the assessment of an urban network, it is important to define the location of the center point and a catchment area around it. The location of the catchment area, which is a sample of the entire network, exerts significant influence on the value of the indicators and this influence is referred to as the placement effect in the current research. This effect becomes even more significant when multiple catchment areas are sampled to be compared and classified.
This paper examines one of the most affected indicators, the closeness centrality. The closeness centrality of a chosen node changes remarkably depending on its position in the catchment area. If the value of the closeness centrality of a chosen node is the highest among all the nodes, one cannot be sure whether this is because the chosen node happens to be placed in the center of the catchment area or because it really is the most important node in the entire street network. In this research we propose that this problem can be mitigated to a certain degree by a normalization process, since the normalization takes into account the number of nodes in the catchment area. By comparing the normalized closeness centrality in the real network with that in the idealized network, we can (1) evaluate the placement effect on the closeness centrality and (2) find the threshold distance in order to mitigate the placement effect.
The results show that the closeness centrality of the same node varies remarkably depending on its position and how central it is in the chosen catchment area. In other words, the closeness centrality of a chosen node might not necessarily be small in the entire city street network, while it appears small in the selected catchment area just because the node is located far from the center of the catchment area. Furthermore, if the center node of a catchment area is moved by more than 100 m away from the original center point, the closeness centrality of the same node starts to be significantly influenced by the placement effect.
The first implication of these results is that, as a consequence of the placement effect, a direct comparison of closeness centrality between different nodes in the same catchment area is only possible if these nodes are less than 100 m away from each other. Secondly, when comparing two nodes that are further apart than the threshold distance, which is 100 m in the case of the current research, it is better to create two separate catchment areas where these two nodes are the respective center points. It should be noted that the threshold distance of 100 m derived specifically from the current research should not be generalized to other cases. The threshold distance for different case studies remains open for further investigation in the future as it may vary among cities or areas.
As a consequence, a map like Fig. 6 that is an output of software cannot be used to compare the absolute values of closeness centrality of different nodes in the entire street network.
Finally, we found that the relationship between the distribution of nodes and the difference in normalized closeness centrality between the real and idealized networks may deserve further investigation in future studies. In the idealized network, where the nodes are evenly distributed, the central node has the shortest possible sum of distances from all other nodes to the central node. Therefore, the closeness centrality of the central node in an idealized network has the highest possible value among all nodes. However, very often the closeness centrality of the central node in a real network does not have the highest value among all nodes. Taking Fig. 4b as an example, if the nodes in the real network were evenly distributed, the normalized closeness centrality of the central node could not be higher than that of the idealized network. However, the results in Fig. 4b show that the normalized closeness centrality of the central node is in fact higher than the one in the idealized network in all catchment areas except for catchment area 8. Therefore, we infer that the nodes in the real network are not evenly distributed and are more concentrated around the node under investigation. Further investigation of the Alicante map in Table 1 confirms that prediction; the nodes are highly concentrated in the historical old city and the surrounding extensions. For future studies, it would be interesting to investigate whether the distribution of the nodes around the node under investigation can indeed be derived by comparing the normalized closeness centrality in the real and idealized networks.

Fig. 2 Relationship between the catchment area and one quadrant of the catchment area

Fig. 3 Relationship between number of links on each side of quadrant and average distance between center and all nodes

Fig. 4 (a) Closeness centrality of the chosen node in the eight catchment areas in the real network. (b) Normalized closeness centrality of the chosen node in the eight catchment areas in the real network

Fig. 5 Relationship between percentage of deviation and distance between center point and the chosen node under investigation in Table 1

Fig. 6 Nodes colored by closeness centrality

C_N Normalized closeness centrality
d Length of a link of the idealized network
D Side length of the idealized network (link)
L Number of links on each side of the catchment area of the idealized network (link)

Table 1
Catchment areas with different center (blue) nodes and indicator values of (red) node under investigation in the selected catchment areas


Table 2
Changes of measurements in quadrant with different number of nodes on each side of the quadrant