Street context of various demographic groups in their daily mobility

We present an urban science framework to characterize phone users’ exposure to different street context types based on network science, geographical information systems (GIS), daily individual trajectories, and street imagery. We consider street context as the inferred usage of the street, based on its buildings and construction, categorized into nine possible labels. The labels define whether the street is residential, commercial or downtown, throughway or not, and other special categories. We apply the analysis to the City of Boston, considering daily trajectories synthetically generated with a model based on call detail records (CDR) and images from Google Street View. Images are categorized both manually and using artificial intelligence (AI). We focus on the city’s four main racial/ethnic demographic groups (White, Black, Hispanic and Asian), aiming to characterize the differences in what these groups of people see during their daily activities. Based on daily trajectories, we reconstruct the most common paths over the street network. We use street demand (the number of times a street is included in a trajectory) to detect each group’s most relevant streets and regions. Based on their street demand, we measure the street context distribution for each group. The inclusion of images allows us to quantitatively measure the prevalence of each context and points to qualitative differences in where that context takes place. Other AI methodologies can further exploit these differences. This approach presents the building blocks for further studies that relate mobile devices’ dynamic records to the differences in urban exposure among demographic groups. The addition of AI-based image analysis to street demand can strengthen urban planning methodologies, make it possible to compare multiple cities under a unified framework, and reduce the crudeness of GIS-only mobility analysis. Narrowing the gap between big data-driven analysis and traditional human classification analysis can help build smarter and more equal cities while reducing the effort necessary to study a city’s characteristics.

urban areas (Jiang and Claramunt 2004; Cheng et al. 2013). One of the most used representations is the street network, where nodes play the role of street intersections and edges the role of streets. This representation has been fruitful [see section B1 in Barthélemy (2011) for a detailed review], making it possible to characterize a city's possibilities in terms of potential mobility. Incorporating human mobility information (Jiang et al. 2013), for example in the form of Call Detail Records (CDR), can help contrast the potential view given by the street network with actual travel. Examples include estimating travel demand (Toole et al. 2015), measuring congestion during peak-hour traffic (Çolak et al. 2016), and relating road demand to departing car sources (Wang et al. 2012). Combining the two views can help to understand the relevance of streets (Toole et al. 2015; Wang et al. 2012), both in terms of the possibilities they offer and the demand they receive.
Adding mobility information improves our understanding of the street user's view of the street network, as it makes it possible to weight each street by the use it receives. Still, multiple dimensions regarding the perception of places and the opportunities they provide are not captured by this type of data. New technologies address this type of characterization through street imagery-based analysis (Gebru et al. 2017; Naik et al. 2014). Methodologies based on street imagery allow researchers to detect land use (Cao et al. 2018) and street context (Alhasoun and González 2019), measure the income of a neighborhood (Gebru et al. 2017) and its perceived safety (Naik et al. 2014), or even detect car accident risk (Kita and Kidziński 2019). While manual image classification is costly, artificial intelligence (AI) can classify images at massive scale, making the procedures scalable by multiple orders of magnitude without extra effort.
Racial segregation in U.S. cities has a long history (Iceland and Sharp 2013; Iceland and Weinberg 2002) and has been studied in depth by multiple authors. Segregated regions of a city are areas inhabited mainly by a single racial/ethnic group. The process by which these segregated regions arise is not trivial; it results from multiple factors regarding work opportunities, social connections, and the way the city was built (Sherman 2014; Sanchez et al. 2003; Fox 2017; Galster 1988). Big data and machine learning methodologies can help to detect, at a larger scale, the aspects of urban life in which segregation and racial inequalities in opportunities manifest (Benthall and Haynes 2019; Gebru et al. 2017; Naik et al. 2014; Candipan et al. 2021; Hofstra and de Schipper 2018; Zhou et al. 2020).
Here we propose a data analysis framework that combines the ease of managing the information layers provided by the network representation with the power of AI to handle large numbers of images (Alhasoun and González 2019). We study each racial/ethnic group's street demand in the city of Boston using a CDR-based model [TimeGeo (Jiang et al. 2016)]. TimeGeo allows us to represent human mobility through synthetically generated daily trajectories based on real CDR data. We use the street network to construct paths between different regions of the city. We add an extra information layer to the street network, representing each street's context, using both human and AI-based labeled street imagery. We characterize each group's street context during its daily activities using the algorithm presented in Alhasoun and González (2019). Combining mobility with massively gathered data characterized using AI points toward further studies that combine contextual information, like that recorded by mobile devices, with the regions through which people move daily.

Materials and methods
The proposed data analysis framework uses three sources of information: city information in GIS, street view imagery, and daily trajectories over the city. City information regarding population demographic and housing partition is obtained through the 2010 Census (2016), and the street network is obtained through OpenStreetMaps (2017). The street imagery is obtained through Google Street View (Anguelov et al. 2010; Alhasoun and González 2019). The daily trajectories are generated with the CDR-based model TimeGeo (Jiang et al. 2016).

Street view imagery
To provide a context for every street in the city of Boston, we worked with two sets of images downloaded from Google Street View (Anguelov et al. 2010). The first set, H, consists of 23,927 images, and the second set, A, consists of 5998 images. We use the geographic representation of the street network to obtain images over the streets based on their latitude-longitude locations. Each downloaded image represents a horizontal view with a field of view of 45°. To have a uniform sample over the streets, we sampled a subgroup of the GIS points representing the streets and downloaded two images at each of these points. This guarantees an equal representation of street images over all the geography represented by the streets.
The classification process can be summarized as follows (Alhasoun and González 2019):
• Side use context is determined based on land use information. This information separates images into commercial and residential use.
• Transportation context is determined based on the characteristics of the street. This includes highway, throughway, downtown, and neighborhood.
• Special conditions on the street lead to categories like Park, Alley, and Industrial.
The number of images of each type can be found in Table 1.
The AI-based labeling methodology follows Alhasoun and González (2019) using the set A of images. Images in set A were downloaded in pairs corresponding to the same latitude-longitude location, each one comprising half of a 90° view. We combine the paired images to cover the complete 90° angle, following the procedure indicated in Alhasoun and González (2019). Images in set A were selected so that both images in each pair have the same context, as required by Alhasoun and González (2019). We split set A into two sets, $A_T$ and $A_P$, with sizes 4804 and 1194. Images in $A_T$ are manually labeled in the same manner explained for set H. We use images in set $A_T$ to train a convolutional neural network (CNN) (Szegedy et al. 2016). This CNN identifies patterns in groups of pixels, seeking identifiable objects associated with each street context [see Fig. 6 in Alhasoun and González (2019)]. Several layers allow the CNN to associate parks with trees, borders of highways with highways, big streets, and certain shops with neighborhood commercial zones and other particular patterns for each category. Finally, we use the resulting CNN to predict the labels of images in $A_P$. After manually checking images in $A_P$, we found that 97% of the categories were correctly assigned in the label prediction, considering all categories together (the fraction of correctly assigned images per category in training set $A_T$ can be found in Table 2). The resulting number of images in each context for sets $A_T$ and $A_P$ can be found in Table 1. The overall result is above 95% and the mean above 97%.
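The accuracy figures quoted above can be computed from a list of manually checked labels and the corresponding predictions. The following is a minimal sketch, not the evaluation code used in this work: the function name `accuracy_report` and the toy labels are hypothetical, and reading "the mean" as the unweighted mean of the per-category accuracies is our assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def accuracy_report(
    true_labels: Sequence[Hashable], predicted_labels: Sequence[Hashable]
) -> Tuple[float, Dict[Hashable, float], float]:
    """Return (overall accuracy, per-category accuracy, mean of per-category accuracies)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    # Number of images per manually assigned category.
    per_category_total = Counter(true_labels)
    # Number of images per category whose prediction matches the manual label.
    per_category_correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    per_category = {
        label: per_category_correct[label] / count
        for label, count in per_category_total.items()
    }
    overall = sum(per_category_correct.values()) / len(true_labels)
    # Assumption: "mean" is the unweighted average over categories.
    mean = sum(per_category.values()) / len(per_category)
    return overall, per_category, mean


# Toy labels, invented for illustration only.
checked = ["Park", "Park", "Alley", "Highway"]
predicted = ["Park", "Alley", "Alley", "Highway"]
overall, per_category, mean = accuracy_report(checked, predicted)
```

The overall accuracy weights every image equally, while the per-category mean weights every category equally, so the two can differ when categories have very different sizes.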

GIS from census and street network
We used the US 2010 Decennial Census (2016) tracts and their associated information regarding the number of self-reported persons in the racial/ethnic groups of White, Black, Hispanic, and Asian. We categorized each census tract based on the population fraction in each group (fraction of self-reports in each racial/ethnic group). A tract is labeled after group g if g has the largest population fraction and that fraction is over 0.5. Tracts that do not meet these conditions are left unlabeled. The resulting group labels are shown in Fig. 1A, while Table 3 presents the number of tracts and the city area of each group. The city presents typical segregation patterns (Iceland and Weinberg 2002). Whites are the majority group, with more than half of the labeled tracts and half the area of the city. Blacks follow, with around 15% of the Boston area. Most of the Hispanic group is concentrated in the northern region of the city, while Asians are associated with only two tracts in the downtown. Figure 1B shows the street network of the city. Notice how Hispanic and White tracts in the north are heavily dependent on the highways connecting them to the rest of the city.
To associate street context with the streets, we assign a buffer radius of 10 m to each position where an image was sampled, and associate the corresponding image with every street intersecting the buffer. This buffer was selected to account for the width of the various types of streets. This process results in a set of categorized images for each street. We define the street context of street e as a vector $u_e$, where component $u_e^i$ represents the fraction of images associated with street section e and context i. Figure 1C shows the street network after labeling each street using the images in set H. Each street is labeled after the context i that maximizes $u_e^i$, provided that this maximum value is over 0.5. Streets in contexts like Highways and Commercial Trwy. connect the city.
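Tract labels and street labels follow the same rule: take the category with the largest fraction and keep it only if that fraction exceeds 0.5. A minimal sketch of this rule is given below; the helper names `fractions` and `majority_label` and all the counts are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Optional


def fractions(counts: Mapping[Hashable, float]) -> Dict[Hashable, float]:
    """Normalize raw counts into fractions that sum to 1 (empty if there are no counts)."""
    total = sum(counts.values())
    if total <= 0:
        return {}
    return {key: value / total for key, value in counts.items()}


def majority_label(
    counts: Mapping[Hashable, float], threshold: float = 0.5
) -> Optional[Hashable]:
    """Return the category with the largest fraction if it is over `threshold`, else None."""
    shares = fractions(counts)
    if not shares:
        return None
    best = max(shares, key=shares.get)
    return best if shares[best] > threshold else None


# Tract labeling from self-reported population counts (toy numbers).
tract_label = majority_label(
    {"White": 600, "Black": 200, "Hispanic": 150, "Asian": 50}
)
# No group is over 0.5, so this tract stays unlabeled (None).
mixed_label = majority_label({"White": 400, "Black": 350, "Hispanic": 250})

# Street context vector u_e from the labels of the images within the buffer.
street_images = Counter(["Park", "Park", "Neighborhood Res."])
u_e = fractions(street_images)
street_label = majority_label(street_images)
```

With a threshold of 0.5 at most one category can qualify, so the label is unambiguous whenever it exists.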

Daily mobility and TimeGeo trajectories
TimeGeo (Jiang et al. 2016) is a primarily CDR-based (Jiang et al. 2013) mechanistic modeling framework that generates urban mobility patterns with a resolution of 10 min and approximately 400 m. It generates daily trajectories, representing aspects like activity stay duration, visited Location Points (LPs) per day, and daily mobility networks. TimeGeo divides LPs into three categories: home, work, and other, depending on the activity performed in that LP. Both home and work locations are uniquely defined, meaning that each agent has only one associated home LP and one associated work LP, although the agent may visit each of them many times during its daily trajectory. Other-type activities behave differently: agents may arrive at multiple different other LPs during the day. The spatial choices of visited locations are modeled by a rank-based exploration and preferential return mechanism (Jiang et al. 2016). These locations are originally selected from a grid over the city, with 400-m side cells representing the region where the activity occurs. For the purposes of this work, we do not consider the different types of activities performed by each user. Furthermore, we convert the original trajectories, consisting of latitude-longitude pairs, into sequences of tracts, associating each point with the census tract containing it. A trajectory thus consists of a sequence of the tracts where the activities are performed. We discard from the trajectories consecutive activities performed in the same tract. Our focus is on the four different groups. Thus, we only consider trajectories departing from tracts with a defined group label, and label each trajectory with that group. Table 4 presents the number of considered trajectories per group. Note that TimeGeo generates trajectories proportionally to the tract population. As tracts are approximately equal in population, the number of trajectories per group is proportional to the number of tracts in that group.
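The conversion from points to tract sequences can be sketched as follows. This is an illustrative sketch under stated assumptions: `tract_of` stands for whatever point-in-polygon lookup is available (here replaced by a toy function), and the function names are hypothetical rather than part of TimeGeo.

```python
from itertools import groupby
from typing import Callable, Hashable, List, Mapping, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def to_tract_sequence(
    points: Sequence[Point], tract_of: Callable[[Point], Hashable]
) -> List[Hashable]:
    """Map each stay point to its tract and collapse consecutive repeats."""
    tracts = (tract_of(point) for point in points)
    # groupby yields one group per run of identical consecutive values.
    return [tract for tract, _run in groupby(tracts)]


def trajectory_group(
    tract_sequence: Sequence[Hashable], tract_label: Mapping[Hashable, str]
) -> Optional[str]:
    """Label a trajectory with the group of its departure tract (None if unlabeled)."""
    if not tract_sequence:
        return None
    return tract_label.get(tract_sequence[0])


def toy_tract_of(point: Point) -> int:
    # Hypothetical stand-in for a point-in-polygon lookup:
    # tracts are unit-wide latitude bands.
    return int(point[0])


stays = [(0.1, 0.0), (0.7, 0.2), (1.5, 0.1), (0.3, 0.4)]
sequence = to_tract_sequence(stays, toy_tract_of)
group = trajectory_group(sequence, {0: "White"})
```

Only consecutive repeats are removed, so a trajectory that leaves a tract and later returns to it keeps both visits.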
To relate the sequences of census tracts to the geographic representation of the street network, we use the OSMnx (Boeing 2017) Python package, which downloads information from OpenStreetMaps (2017). We construct the street path between two census tracts using the graph representation of the city streets. The network edges represent the street segments and the nodes represent the street intersections, so we refer to the edge corresponding to street segment e with the same letter. When going from census tract A to census tract B, the starting and ending points are selected as the centroids of tracts A and B, respectively. Then, we identify the street network nodes geographically nearest to the centroids using OSMnx, and calculate the shortest path between them in the graph representation of the street network. This process converts a trip between tract A and tract B into a sequence of network edges (each associated with a street segment) used to go from the centroid of A to the centroid of B. Ultimately, it converts every TimeGeo trajectory into a set of street network edges representing the shortest paths among the centroids of the tracts in which the trajectory's activities occur.
Assuming that all the trajectories from a tract start from its centroid collapses them into a single route, but this reduction can still be realistic. The majority of the trajectories are expected to follow the shortest and fastest routes between locations. Given that a trajectory departs from a location in tract A to reach another site in tract B, we can expect a significant portion of the route connecting both locations to follow the fastest route connecting A and B, which can be approximated by the one connecting their centroids. This approximation improves as the distance between locations increases.
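The centroid-to-centroid routing step can be illustrated with a small standard-library sketch. It is not the OSMnx-based pipeline used in this work: the toy graph, the planar coordinates, the use of Euclidean distance for the nearest-node search, and the function names are all assumptions made for illustration, and the shortest path is found with a plain Dijkstra search.

```python
import heapq
import math
from typing import Dict, Hashable, List, Mapping, Tuple

Coord = Tuple[float, float]


def nearest_node(nodes: Mapping[Hashable, Coord], point: Coord) -> Hashable:
    """Return the node closest to `point` (planar Euclidean distance, an assumption)."""
    return min(nodes, key=lambda n: math.dist(nodes[n], point))


def shortest_path_edges(
    adjacency: Mapping[Hashable, Mapping[Hashable, float]],
    source: Hashable,
    target: Hashable,
) -> List[Tuple[Hashable, Hashable]]:
    """Dijkstra search; returns the path as a list of edges (empty if none is needed or found)."""
    dist: Dict[Hashable, float] = {source: 0.0}
    prev: Dict[Hashable, Hashable] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, length in adjacency.get(u, {}).items():
            candidate = d + length
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if target not in dist:
        return []
    # Walk back from the target to the source to recover the edge sequence.
    edges = []
    node = target
    while node != source:
        edges.append((prev[node], node))
        node = prev[node]
    edges.reverse()
    return edges


# Toy street network: node coordinates and undirected edge lengths.
nodes = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "d": (1.0, 1.0)}
adjacency = {
    "a": {"b": 1.0, "d": 1.5},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.5},
    "d": {"a": 1.5, "c": 1.5},
}
start = nearest_node(nodes, (0.1, 0.1))  # centroid of tract A (toy value)
end = nearest_node(nodes, (1.9, 0.2))    # centroid of tract B (toy value)
path = shortest_path_edges(adjacency, start, end)
```

On real latitude-longitude data the nearest-node search should use a projected coordinate system or a geodesic distance, which is one reason to rely on a dedicated package in practice.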

Network weighting by edge demand
We use the sequence of edges constructed from each trajectory to weight each street network edge based on the number of times that edge appears in the trajectories of each group. The weight of an edge e, $d_e^g$, equals the number of times edge e is included in the trajectories departing from tracts in group g; $d_e^g$ represents the demand of group g over edge e. The total demand over edge e is calculated as $d_e = \sum_g d_e^g$. To consider how the demand over an edge is shared among the groups, we compute $f_e^g = d_e^g / d_e$, the fraction of the total demand over edge e corresponding to group g. To detect whether an edge has a particularly high fraction of demand, we compute the total demand of each group $d^g = \sum_e d_e^g$, the total demand $d = \sum_g d^g$, and the corresponding fraction of demand $f^g = d^g / d$. We set a threshold to separate the demand levels into three categories. This threshold measures the deviation of the fraction of demand over an edge ($f_e^g$) from what could be expected by chance ($f^g$). We consider that the demand for an edge e is high if $\log_2(f_e^g / f^g) > 0.2$, low if $\log_2(f_e^g / f^g) < -0.2$, and average otherwise. Figure 2 presents the street network for each racial/ethnic group, with edge width representing the square root of group demand $d_e^g$, and colors representing demand (high, average, or low). Notice that while there is a clear correlation with neighborhoods, Hispanic and Asian groups have great demand on the long routes over the city center. Also, while groups may have high demand over an edge (compared to the rest of the edges), they may not be its principal users (as other groups may have a higher demand over it).
Fig. 2 Each image represents the street network demand for each group, with edges colored based on each group fraction of demand and width representing each group's demand on the edge. Red indicates that the fraction of demand is higher than 1.14 times the global demand fraction for the group, blue indicates that the fraction of demand is lower than 0.87 times the global demand fraction for the group, and yellow indicates the region in between
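The demand bookkeeping described above can be written compactly. The sketch below is a hedged illustration: the function names and the toy trajectories are hypothetical, and treating a group with zero demand on an edge as "low" (where the logarithm is undefined) is our assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Mapping, Sequence, Tuple


def edge_demand(
    paths_by_group: Mapping[str, Sequence[Sequence[Hashable]]]
) -> Dict[Hashable, Dict[str, int]]:
    """Count how many times each edge appears in the trajectories of each group."""
    demand: Dict[Hashable, Dict[str, int]] = {}
    for group, paths in paths_by_group.items():
        for path in paths:
            for edge in path:
                per_group = demand.setdefault(edge, {})
                per_group[group] = per_group.get(group, 0) + 1
    return demand


def demand_levels(
    demand: Mapping[Hashable, Mapping[str, int]], threshold: float = 0.2
) -> Dict[Tuple[Hashable, str], str]:
    """Classify each (edge, group) pair as 'high', 'average' or 'low'."""
    group_total: Dict[str, int] = defaultdict(int)  # total demand of each group
    for per_group in demand.values():
        for group, count in per_group.items():
            group_total[group] += count
    total = sum(group_total.values())  # total demand over all groups

    levels = {}
    for edge, per_group in demand.items():
        d_e = sum(per_group.values())
        for group, d_g in group_total.items():
            f_e_g = per_group.get(group, 0) / d_e  # share of this edge's demand
            f_g = d_g / total                      # share expected by chance
            if f_e_g == 0:
                # Assumption: zero demand is reported as 'low' (log2 undefined).
                levels[(edge, group)] = "low"
                continue
            deviation = math.log2(f_e_g / f_g)
            if deviation > threshold:
                levels[(edge, group)] = "high"
            elif deviation < -threshold:
                levels[(edge, group)] = "low"
            else:
                levels[(edge, group)] = "average"
    return levels


# Toy trajectories, each already converted to a list of edges.
paths = {"White": [["e1", "e2"], ["e1"]], "Black": [["e2", "e3"]]}
demand = edge_demand(paths)
levels = demand_levels(demand)
```

In this toy example edge e2 is shared equally by both groups, yet it is "low" for the White group and "high" for the Black group, because the comparison is against each group's overall share of the demand (0.6 and 0.4 here).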
Considering edge e, we assign it to group g if $f_e^g > f^g$ only for group g. If $f_e^g > f^g$ for more than one group, we consider it as Mixed. The result of this partitioning can be found in Fig. 3. What we see for Whites and Blacks can be expected from the neighborhood distribution of Fig. 1A, as their most demanded streets correspond geographically with their neighborhoods. Hispanics and Asians show an interesting pattern of dispersion from the tracts associated with them to the rest of the city, through the principal routes of the city. Hispanics heavily use the highways connecting the north of the city.
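The assignment rule can be sketched in a few lines. The function name `assign_edge` and the numbers are hypothetical. Since both the edge fractions and the group fractions sum to one, no group can exceed its expected share only when every fraction equals it exactly; returning None in that case is our assumption.

```python
from typing import Mapping, Optional


def assign_edge(
    edge_fraction: Mapping[str, float], group_fraction: Mapping[str, float]
) -> Optional[str]:
    """Assign an edge to the single group above its expected share, or to 'Mixed'."""
    above = [
        group
        for group, fraction in edge_fraction.items()
        if fraction > group_fraction[group]
    ]
    if len(above) == 1:
        return above[0]
    if len(above) > 1:
        return "Mixed"
    return None  # assumption: every group sits exactly at its expected share


# Toy global demand fractions per group.
expected = {"White": 0.6, "Black": 0.25, "Hispanic": 0.15}
single = assign_edge({"White": 0.9, "Black": 0.1, "Hispanic": 0.0}, expected)
mixed = assign_edge({"White": 0.5, "Black": 0.3, "Hispanic": 0.2}, expected)
```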

Street context
Once we have defined each street network edge's demand, we can include the information regarding the different types of street contexts over each street. As mentioned, we associate with each street the images from set H within a distance of 10 m. We represent the result of this process with a vector $u_e$, whose component $u_e^i$ indicates the proportion of the images intersecting edge e that represent context i. We calculate the average fraction of context i for group g as $C_g^i = \frac{1}{d^g} \sum_e d_e^g u_e^i$. $C_g^i$ indicates the average fraction of streets in context i that a member of group g encounters during their daily activities. To compare with the general population, we also compute $C^i = \frac{1}{d} \sum_e d_e u_e^i$, the average fraction of context i in the total population considered. The ratio $C_g^i / C^i$ is presented in Fig. 4 for each group, calculated using the human and AI labeled images. The White group has a distribution of $C_g^i$ similar to the average, as it is the city's main group. Non-White groups have a low fraction of Park context compared to the average, while the White group has a slightly greater fraction than the average. The Black group has a high fraction of Residential contexts compared to the average. Both Hispanic and Asian groups have a high fraction of Downtown Commercial contexts compared to the average. The Hispanic group has a very high context of highways (Highway and Highway Ramp) compared to all other groups. Considering the partition of tracts into groups, the Black group's mobility occurs in the commercial and residential zones of its own group. The Asian group is concentrated within the city's downtown, with high usage of highways to reach other parts of the city. Compared to the rest, the Hispanic group has the highest context of highways, as its members have to reach the rest of the city from their principal region in the north.
Fig. 3 Representation of the street network, with edges colored based on their group demand and width representing their total demand. If more than one group has a demand higher than expected by chance, the edge is considered Mixed. Notice that while there is a clear correlation with neighborhoods, most central routes have a mixed usage, showing that they are heavily used by multiple groups
An important step is to test AI labeling methods, as this is the first step towards extending this type of analysis to places where street context data is not manually labeled. We applied the same analysis used with set H to set $A_P$. To compare the results, we randomly sampled 50 subsets of images from H, each equal in size to $A_P$, which we call $h_k$, with k indexing the sampled subset. We calculated the mean value and standard deviation of $C_g^i / C^i$ over the 50 $h_k$ samples, as presented in Fig. 4. The results are consistent, considering the error bars of the sampled results. The White and Black groups have the highest demand $d^g$ and the largest number of tracts, which reduces the variation produced by the smaller image sample size in their $C_g^i / C^i$ values. As the Hispanic and Asian groups have lower representation, extreme effects like the above-average Highway Ramp context in the Hispanic group are difficult to capture. Due to the low sample size and the low representation of Alley context in the city, the Alley fraction of context is not well captured by set $A_P$. Still, the general behavior is maintained at both sample sizes, and it is consistent considering the standard deviation. Increasing the sample size can reduce the variation by increasing the representation of each context.
Fig. 4 Relation between average fraction of street context per group $C_g^i$ and general average fraction of street context $C^i$ using set H (∼24,000 human labeled images). White group has values similar to the average, as they are the principal group of the city. Non-White groups have a low fraction of Park context. Blacks have a high fraction of Residential contexts. Both Hispanics and Asians have high fraction of Downtown Commercial contexts. Hispanics have a very high context of Highways. The result obtained using set $A_P$ (∼1000 AI labeled images) is similar in average to the one using sub-samples of H (samples of ∼1000 human labeled images), taking into account the error bars
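The demand-weighted context averages and the sub-sampling comparison can be sketched as follows. This is an illustration under stated assumptions rather than the code behind Fig. 4: the function names and toy values are hypothetical, edges without images contribute zero to every context, and `statistic` stands for any function that recomputes the quantity of interest from a subset of images.

```python
import random
import statistics
from typing import Callable, Dict, Hashable, Mapping, Sequence, Tuple


def average_context(
    demand: Mapping[Hashable, float],
    context: Mapping[Hashable, Mapping[str, float]],
) -> Dict[str, float]:
    """Demand-weighted average of the context vectors u_e (normalized by total demand)."""
    total = sum(demand.values())
    if total <= 0:
        return {}
    average: Dict[str, float] = {}
    for edge, weight in demand.items():
        # Assumption: an edge with no images adds nothing to any context.
        for label, fraction in context.get(edge, {}).items():
            average[label] = average.get(label, 0.0) + weight * fraction / total
    return average


def relative_context(
    group_demand: Mapping[Hashable, float],
    total_demand: Mapping[Hashable, float],
    context: Mapping[Hashable, Mapping[str, float]],
) -> Dict[str, float]:
    """Ratio between a group's average context and the overall average context."""
    group_avg = average_context(group_demand, context)
    overall_avg = average_context(total_demand, context)
    return {
        label: group_avg.get(label, 0.0) / value
        for label, value in overall_avg.items()
        if value > 0
    }


def subsample_mean_std(
    images: Sequence,
    size: int,
    statistic: Callable[[Sequence], float],
    n_samples: int = 50,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation of `statistic` over random subsets of `images`."""
    rng = random.Random(seed)
    values = [statistic(rng.sample(images, size)) for _ in range(n_samples)]
    return statistics.mean(values), statistics.stdev(values)


# Toy data: two edges, each with a single context.
context = {"e1": {"Park": 1.0}, "e2": {"Residential": 1.0}}
ratio = relative_context({"e1": 1, "e2": 3}, {"e1": 4, "e2": 4}, context)
```

In the toy example the group spends a quarter of its demand on the Park edge while the whole population spends half, so its Park ratio is below one and its Residential ratio is above one.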
The quantitative street context analysis points to general differences among the four groups, as their context distributions differ evidently. Still, the power of images does not stop here. To point to further possibilities that can be explored using other AI algorithms (Cao et al. 2018; Gebru et al. 2017; Naik et al. 2014; Kita and Kidziński 2019), we include images from set H from 4 contexts (Commercial Trwy., Neighborhood Comm., Neighborhood Res. and Residential Trwy.), one image per group, in Fig. 5. These images were selected from each group's 18 most demanded streets in the particular context.
Comparing the images from Commercial Trwy., for Whites we see a wide and newly made cobblestone street, with a boulevard and advertising in the background. For Blacks we see old-fashioned shops, asphalt streets, and a mailbox. For Hispanics we see a deteriorated cobblestone street, with big residential buildings and a plaza in the middle. For Asians we see downtown architecture with modern buildings and a traffic light. Comparing the four, Whites and Asians show many more cars and more modern architecture.
Comparing the Neighborhood Comm., we see local restaurants for each group. Whites show a coffee shop, in a region with big buildings. Blacks show multiple little shops, one with Caribbean food. Hispanics have a pedestrian street with a Hispanic restaurant. Asians show multiple Asian shops, particularly a big restaurant in a central street. All the building styles are different, with Whites and Asians in a central zone, Blacks in a more residential zone, and Hispanics in a pedestrian street.
Comparing Neighborhood Res., for Whites we see a little street with multiple trees and a pedestrian cross street to multiple houses with stairs. For Blacks we see a boulevard with buildings on one side. For Hispanics we see a varying type of architecture, with low and tall buildings, and a wasteland on the street's corner. For Asians we see bigger buildings and the smallest sidewalk of the four groups. While all the images show brick houses, Whites and Asians have many more cars parked, and Hispanics show elements absent in the others (like the wasteland).
Comparing Residential Trwy., for Whites we see again the newly made cobblestone street, from a point of view that shows how the street continues to residential areas. For Blacks, we see a similar building style to the one in Neighborhood Comm. and Commercial Trwy., this time in a boulevard without trees and a park-like zone at the side. For Hispanics we see a narrower street than for Whites and Blacks, with big buildings, lots of parked cars and a building under construction. For Asians we see a two-way street with lower houses. Notice that only the Hispanic street does not show any bike signals, and it also has a larger number of parked cars than the rest.
Overall, we see that Whites and Asians have a similar downtown-like context, with Asians appearing to be more restricted to the city center. Blacks' buildings have an older appearance, with more worn colors. Hispanics are located in more suburban places, including wastelands and old brick streets, narrower streets and fewer signals.
This comparison showed the presence (or absence) of unique elements in the images of each group that can be exploited by pattern recognition methodologies specifically designed for that purpose. The comparison also shows how even the same context can look very different based on the particular elements present for each group. The state of the cobblestone streets, wastelands, the language of a shop's name, bike-friendly indications, the age of buildings and parked cars can be used as indicators for further analysis beyond the initial categorization.

Conclusions
The presented work points to useful connections between two urban analysis branches: street demand, which is already a combination of street information and daily mobility, and contextual image analysis. This combination narrows the gap between what people actually encounter during their daily activities and what we are able to capture through the data traces they leave behind. Working with massive information regarding mobility and its connections to the city's possibilities has already proven useful, but its crudeness leaves out aspects of the street user's perception beyond what GIS data sets can offer (in the form of pre-made labels). The inclusion of images, and of AI methodologies to characterize them at massive scale, can help to carry out comparative studies between cities. This framework can strengthen urban planning and sociological analysis, allowing information from very different regions to be placed in a unified framework. We hope these results motivate the use of urban imagery to enrich information and communication data. This would inform applications aimed at improving ambient exposures and urban design considering diverse social groups.
Following the street network demand analysis, we saw long routes connecting different parts of the city. These routes are essential to Hispanic and Asian groups, which are concentrated in a few tracts but still move through the city. Assigning a group to each street results in a street partition similar to the tracts partition, with frontiers represented by streets with mixed demand between groups.
Street context analysis in combination with mobility patterns provides a novel way to explore each group's daily experiences. The context of each group reflects mostly the regions where its members live. Whites are similar to the average, as they are the city's most populous group. The non-White groups' exposure to Park context is below the Whites' exposure to Park context. As the Hispanic group is concentrated in the north of the city, it is heavily exposed to highways, more than any other group. This type of analysis can point out differences in neighborhood composition and in mobility patterns (for example, the use of highways that are not particularly attached to any neighborhood), helping urban planners conceive more accessible and equal cities.
AI-labeled and human-labeled images provide consistent results in the studied data set. AI labeling is a promising methodology, applicable at large scale based on machine learning and network science techniques, making the analysis portable and automatic worldwide. Overall, the proposed framework points to a useful combination of data sets, allowing the quantitative characterization of context based on both imagery and daily activities.
The comparison among groups for four particular contexts shows the usefulness of the images as starting material for analysis. Direct inspection shows different styles of restaurants, building construction and street makeup for each group, even while the street context is the same. The presence or absence of some elements (like street construction, bike signals and worn paint) suggests new challenges that other AI methodologies can further exploit by focusing on aspects like comfort, street makeup or the streets' safety.
We believe that this type of analysis can strengthen urban science methodologies based on street appearance, indicating where to focus (through demand) and reducing working costs (through AI), helping to conceive more accessible cities and easing the identification of inequalities between different demographic groups.