
Optimizing facility siting for probabilistic collection and distribution of information in support of urban transportation


Collecting and receiving information about the state of a transportation system is essential to effective planning for intelligent transportation systems, whether on the part of individual users or of system managers. However, efforts to collect or convey information about a system's status often require considerable investment in infrastructure and technology. Moreover, because the development and use of transportation systems vary over time, it is uncertain where and when demand for such services may arise. To address these problems, a model is proposed for minimizing the cost of siting facilities for providing and/or collecting information while ensuring that specified levels of demand are served at an acceptable level of reliability. To demonstrate the characteristics of the proposed formulation, it is coupled with another planning objective and applied to identify optimal sites for information provision/collection in a transportation system. Model solutions are then derived for multiple scenarios of system flow to explore how variations in the use of a transportation system can affect siting configurations.


Information provision and collection are essential to facilitating efficient movement in complex systems. Those navigating a system often benefit from updates on travel conditions, routing alternatives, and the availability of services, as well as a wide range of other information that can inform their movement decisions. Collecting information about network conditions is vital in this respect, as it often forms the basis of the intelligence conveyed to managers and users of a transportation system. Because transportation systems serve a diverse range of needs, collecting data and/or providing information to system users is complicated by variations in travel behavior and in the planning objectives of interest. For example, in the case of information provision, the value of information to travelers can be highly uncertain, as it depends on how the information relates to their activities and supporting path(s) of movement, as well as on their ability and desire to receive and use additional information in their decision-making. Likewise, in the case of information collection, the value of the collected information for a particular need, such as origin–destination (OD) flow estimation, depends on how well it represents the nature of movements in the transportation system and on how it affects the performance of the analytical task (Gentili and Mirchandani 2012).

Intelligent Transportation Systems (ITS) employ many technologies for the collection and distribution of information regarding the state of a transportation system. For example, variable message signs (VMS) provide visual traffic information and guidance to drivers at specific sites within a system. VMS can be used to disseminate a variety of information regarding incidents, detours and alternative routes, general information and warnings, availability of services (e.g., weigh stations, disease testing/vaccination sites), road conditions and weather, special events, high-occupancy vehicle and contraflow lane designations, and reversible lane control (Jindahra and Choocharukul 2013; Zhang et al. 2014; Romero et al. 2020). While VMS are usually stationary, the information content they distribute can be tailored to the needs of those traversing the site. Aside from VMS, other ITS technologies, such as vehicular ad hoc networks (VANETs), are being explored for providing vehicles with relevant information. Like VMS, VANETs involve locating facilities in a transportation system; however, these facilities serve to provide a virtual connection between vehicles and the infrastructure (Lu et al. 2019). Along with providing vehicles with information in support of travel decisions, ITS are increasingly employing data-driven applications, such as detection of traffic parameters and characteristics of individual vehicles, that rely on intensive data collection from sensors in the network (Zhang et al. 2011). Traffic data collected via cameras, speed sensors, automated vehicle counters, personal GPS receivers, and social media apps, for instance, are increasingly utilized in transportation and mobility analyses (Zheng et al. 2016). In the context of ITS, such data are used to provide real-time estimates of traffic conditions and insight into the number of trips between system origins and destinations (Yim and Lam 1998; Anderson and Souleyrette 2002).

Given that information provision and collection are expensive, resource-constrained tasks, minimizing the cost of the facilities required to conduct these tasks effectively is an important planning consideration. However, it can be difficult to predict exactly how the transportation system will be utilized at any time, as well as when and where the need (demand) for information dissemination or collection will arise. Therefore, instead of assuming a single traffic assignment protocol when modeling system usage, a range of assignment scenarios should be considered. As adequate resources for providing and/or collecting information are likely to be lacking, various provision/collection service thresholds may also need to be considered. In this paper, an optimization methodology is proposed to address these problems. First, background literature related to the proposed modeling approach is reviewed. Next, a probabilistic flow covering problem for siting facilities in a network is described. Following this, a multiobjective version of the model is applied to the siting of VMS to illustrate the tradeoffs between minimizing system cost and maximizing benefit to the system.


A variety of models have been proposed to assist in siting facilities in transportation systems to serve flow moving among network origins and destinations. In the context of providing information to travelers via facilities such as VMS, maximizing exposure to sited facilities is an important goal. The maximal covering location problem (MCLP) of Church and ReVelle (1974) and its network-based counterparts have proven to be effective tools for identifying facility siting configurations that best achieve such planning goals. The MCLP seeks to maximize coverage of demand in a geographic region given a limit on the number of facilities that can be sited. Demand is viewed as covered when it is within a specified service range of a sited facility. To address cases where the demand to be covered is movement in a networked system, Hodgson (1990) provides an extension to the MCLP, the flow capturing location model (FCLM). The FCLM is a linear-integer model that maximizes coverage of flow moving among pairs of origins and destinations (ODs) in a network given that a specified number of facilities are to be sited. In the basic FCLM, it is assumed that all flow between an origin and destination is assigned to a single path and that facilities can be located at nodes anywhere along a path. Facilities are also considered to provide equivalent service regardless of where they are positioned along the path. Variants of the FCLM have been described to address a range of planning problems. For instance, Kuby and Lim (2005) modify the basic FCLM to locate refueling stations for alternative-fuel vehicles, which sometimes require multiple fueling sites along a path. Matisziw (2019) also details a version of the FCLM in which multiple facilities can be sited along a path and the ability of each facility to serve flow is probabilistic.

Another important objective when planning for the provision of information in a transportation system is cost minimization. The costs to be minimized can be those associated with traversing the transportation system (Huynh et al. 2003; Henderson 2004; Toi et al. 2005; Chiu and Huynh 2007; Boyles and Waller 2011). In this sense, the more flow that can benefit from the provided information, the more efficient the transport system becomes. Alternatively, the costs to be minimized can be those associated with acquiring and operating a set of facilities needed to serve network-based demand. To this end, Berman et al. (1992) detail a location set covering problem (LSCP) to address cases in which a proportion \(\lambda \in [0, 1]\) of all flow must be covered. The paths \(m \in M\) each support flow \(f_{m}\) (demand) in the network and can be covered by facilities sited at network nodes (indexed \(i \in N\)). A binary-integer variable \(X_{i}\) is defined for each candidate facility to reflect the decision to site \(\left( {X_{i} = 1} \right)\) or not to site \(\left( {X_{i} = 0} \right)\). Likewise, a binary-integer variable \(Y_{m}\) is defined for each path to represent whether \((Y_{m} = 1)\) or not \((Y_{m} = 0)\) path \(m\) is covered by a sited facility. Their flow covering problem can then be formulated as follows.

$$\text{Minimize} \sum_{i \in N} X_{i} \tag{1}$$

Subject to:

$$\sum_{i \in R_{m}} X_{i} \ge Y_{m} \quad \forall m \in M \tag{2}$$
$$\sum_{m \in M} f_{m} Y_{m} \ge \lambda \sum_{m \in M} f_{m} \tag{3}$$
$$X_{i} \in \{0,1\} \quad \forall i \in N \tag{4}$$
$$Y_{m} \in \{0,1\} \quad \forall m \in M \tag{5}$$

Objective (1) minimizes the number of facilities to be sited, and Constraints (2) ensure that the demand on path \(m\) cannot be covered unless a facility is sited at one or more of the nodes \(i\) in the set \(R_{m}\) of nodes capable of serving the path (i.e., the nodes traversed by the path). Constraint (3) ensures that at least a proportion \(\lambda\) of the total network flow is served by the set of sited facilities. Constraints (4) and (5) are binary/integer restrictions on the decision variables.
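To make the structure of (1)-(5) concrete, the following is a minimal exhaustive-search sketch for toy instances only; it is not how such models are solved in practice (an integer programming solver would be used). The function name `solve_flow_covering` and the `(flow, node set)` input format are illustrative assumptions. Because \(Y_{m}\) appears only on the left-hand side of (3), setting \(Y_{m} = 1\) whenever (2) permits it is without loss of generality, so covered flow can be computed directly from the set of sited nodes.

```python
import itertools

def solve_flow_covering(nodes, paths, lam):
    """Exhaustively solve the flow covering model (1)-(5) on a toy instance.

    nodes: candidate facility nodes (the index set N)
    paths: list of (f_m, R_m) pairs, where R_m is the set of nodes path m traverses
    lam:   required proportion of total flow to cover (lambda in Constraint (3))
    Returns a smallest set of sited nodes, or None if no set qualifies.
    """
    total = sum(f_m for f_m, _ in paths)
    # Objective (1): try facility counts from smallest to largest.
    for k in range(len(nodes) + 1):
        for combo in itertools.combinations(sorted(nodes), k):
            sited = set(combo)
            # Constraints (2): Y_m can be 1 only if some node in R_m is sited;
            # setting Y_m = 1 whenever allowed never hurts Constraint (3).
            covered = sum(f_m for f_m, r_m in paths if sited & r_m)
            if covered >= lam * total:  # Constraint (3)
                return sited
    return None

paths = [(50, {1, 2}), (30, {3, 4}), (20, {4, 5})]  # hypothetical instance
print(solve_flow_covering({1, 2, 3, 4, 5}, paths, 0.5))  # {1}
print(solve_flow_covering({1, 2, 3, 4, 5}, paths, 1.0))  # {1, 4}
```

With \(\lambda = 1\) and every path carrying positive flow, the returned set touches every path, which is the condition later imposed directly by Constraints (6).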

In the context of locating sensors to collect information about a transportation system, such flow covering models have a variety of applications. For example, to help estimate the flows between origins and destinations, traffic flows are commonly recorded at locations throughout a transportation system. Given practical limits on how many sensors can be in operation at any one time, a variety of approaches for identifying the best locations for traffic sensors have been proposed. Yang et al. (2006), for instance, describe an integer model that minimizes the number of sensors to be located such that at least one sensor is placed along every path in the network. This condition is equivalent to setting \(\lambda = 1\) in (3) and replacing Constraints (2) with Constraints (6).

$$\sum_{i \in R_{m}} X_{i} \ge 1 \quad \forall m \in M \tag{6}$$

Gentili and Mirchandani (2012) also seek to minimize the number of sensors needed to ensure that flow along network paths can be accurately estimated, employing a similar flow covering model. However, they impose additional constraints to ensure that the selected set of arcs is sufficient to obtain a unique solution to the path flow estimation problem.

Coverage of demand by sited facilities can in many instances entail some level of uncertainty. That is, although a facility has been sited within a given service standard of a demand location, the probability \(p_{im}\) that facility \(i\) can effectively serve demand \(m\) may vary. Probability of coverage can be integrated into facility location models as part of the modeling objective and/or as one or more constraints, depending upon the desired outcome. For instance, the maximal covering location problem has been extended to maximize the probability of coverage (Daskin 1982, 1983; ReVelle and Hogan 1988), as has the FCLM (Matisziw 2019). Haight, ReVelle, and Snyder (2000) and ReVelle, Williams, and Boland (2002) account for probabilistic demand coverage in the form of a threshold constraint, as shown in (7).

$$\prod_{i \in R_{m}} \left(1 - p_{im}\right)^{X_{i}} \le \left(1 - \alpha_{m}\right)^{Y_{m}} \quad \forall m \in M \tag{7}$$

For each demand \(m\), it is assumed that a minimum level of service reliability \(\alpha_{m}\) must be achieved before \(m\) can be considered effectively covered. To this end, Constraints (7) state that \(m\) cannot be effectively covered unless the probability that it is not effectively served by the configuration of sited facilities is less than or equal to the acceptable level of ineffective service \(1 - \alpha_{m}\). When \(Y_{m} = 0\), the right-hand side equals 1 and the constraint is non-binding, since the left-hand side never exceeds 1. While this probabilistic threshold constraint is inherently non-linear, Haight, ReVelle, and Snyder (2000) demonstrate that it can be linearized through a log transformation, as in (8); because the logarithm is monotonically increasing, taking logs of both sides preserves the inequality.

$$\sum_{i \in R_{m}} \log\left(1 - p_{im}\right) X_{i} \le \log\left(1 - \alpha_{m}\right) Y_{m} \quad \forall m \in M \tag{8}$$
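The equivalence between the product form (7) and the log-linear form (8) can be checked numerically. The sketch below, with hypothetical helper names `product_form` and `log_form` and made-up values of \(p_{im}\) and \(\alpha_{m}\), evaluates both forms for every 0/1 setting of \(X\) and \(Y\) on a path with three candidate sites. It assumes \(0 < p_{im} < 1\) and \(0 < \alpha_{m} < 1\) so that every logarithm is defined.

```python
import itertools
import math

def product_form(p, x, alpha, y):
    """Constraint (7): prod_i (1 - p_i)^{X_i} <= (1 - alpha)^{Y}."""
    lhs = math.prod((1.0 - pi) ** xi for pi, xi in zip(p, x))
    return lhs <= (1.0 - alpha) ** y

def log_form(p, x, alpha, y):
    """Constraint (8): sum_i log(1 - p_i) X_i <= log(1 - alpha) Y."""
    lhs = sum(math.log(1.0 - pi) * xi for pi, xi in zip(p, x))
    return lhs <= math.log(1.0 - alpha) * y

p = [0.5, 0.6, 0.75]  # hypothetical p_im for three candidate sites on path m
alpha = 0.85          # hypothetical reliability threshold alpha_m

for x in itertools.product((0, 1), repeat=len(p)):
    for y in (0, 1):
        # Both forms must accept or reject exactly the same (X, Y) settings.
        assert product_form(p, x, alpha, y) == log_form(p, x, alpha, y)
        if y == 1 and product_form(p, x, alpha, y):
            print("path can be counted as covered with X =", x)
```

Here the loop reports \(X = (0,1,1)\), \((1,0,1)\), and \((1,1,1)\): each of these leaves a miss probability of at most 0.15, whereas siting only the first two facilities leaves \(0.5 \times 0.4 = 0.2\).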

In efforts to provide or collect information in a transportation system, it is important to determine how much flow among network origins and destinations would be served by a configuration of sited facilities. Provided estimates of demand for movement between OD pairs (i.e., OD flows) are available, there are a variety of ways in which those flows could be assigned to paths. For example, all flow between an OD pair can be assigned to the shortest path, distributed among the k shortest paths, or distributed among any set of paths thought to support movement between the OD pair (Lam and Chan 2001). Once flows are assigned to the OD paths accordingly, the potential impact of a facility configuration can be evaluated. In some applications, a single assignment of flow is considered (Henderson 2004; Matisziw 2019). In others, the assignment of flow to paths is allowed to vary, reflecting dynamic traffic conditions (Chiu and Huynh 2007; Basu and Maitra 2010). Some studies have specifically explored methodologies for addressing recurrent congestion (Yang 1999; Li et al. 2016), whereas others have focused on non-recurrent congestion (Huynh et al. 2003; Chiu and Huynh 2007), addressing the placement of information to best assist with the diversion of traffic to alternative routes.

All the facility siting approaches detailed in this section in some way address how demand for a service is met by a configuration of facilities. In planning for information provision and/or collection, minimizing the number of facilities needed to serve demand is critical given the expenses involved in such infrastructure development. Given that providing information to flow between all OD pairs in a network may not be feasible due to resource constraints, and that certain OD pairs may require differing levels of information, being able to ensure that a base level of service is available is also an important consideration (i.e., a threshold constraint on flow coverage). For information to be of use to network flow, it must not only be observable but also be effectively conveyed. However, because many factors influence whether information is actually received, conveying information is rife with uncertainties that need to be accounted for in the siting process (i.e., a probabilistic threshold constraint). Further contributing to the complexity of this problem is the fact that more than one path supporting movement from an origin to a destination typically exists. Thus, the OD flow, or demand needing service, is distributed over the network in some fashion. In the bulk of the flow capturing literature, only a single path between each OD pair is considered. Only in a few cases are multiple paths supporting flow among each OD pair postulated (Riemann et al. 2015; Matisziw 2019). Moreover, most applications consider only a single assignment of flow in a system, whether over one or more planning periods. However, given that the ways in which OD flow utilizes the system are constantly changing, there is a need to consider multiple potential assignments of flow in a system when making decisions regarding facility placement.
Next, to better account for the various conditions described above, a modeling approach for identifying optimal sites for provision and/or collection of information in a transportation system is proposed. Following the introduction of this model, an application to truck flow in a highway network is provided to highlight its computational characteristics.


Consider a transportation system represented as a directed graph \(G(N, A)\) with node set \(N\) and arc set \(A\). This system supports flows \(a_{od}\) among pairs of origin nodes \(o \in O \subseteq N\) and destination nodes \(d \in D \subseteq N\). It is assumed that the flow between each origin and destination is distributed over a set of viable network paths \(\phi_{od} \subseteq M\) according to some network assignment strategy. That is, each path \(m \in \phi_{od}\) supports a certain amount of flow \(f_{m}\), with \(\sum_{m \in \phi_{od}} f_{m} = a_{od}\). Facilities \(i\) can be sited along arcs (i.e., \(i \in A\)) (and/or at nodes) at a cost of \(\delta_{i}\). In keeping as much as possible with the notation presented earlier, a probabilistic flow covering problem is now formulated.

Probabilistic flow covering problem (PFCP)

$$\Psi = \text{Minimize} \sum_{i \in A} \delta_{i} X_{i} \tag{9}$$

Subject to:

$$\sum_{m \in \phi_{od}} f_{m} Y_{m} \ge \lambda_{od} \sum_{m \in \phi_{od}} f_{m} \quad \forall o \in O, d \in D \mid a_{od} \ne 0 \tag{10}$$
$$\sum_{i \in R_{m}} X_{i} \ln\left(1 - p_{im}\right) \le Y_{m} \ln\left(1 - \alpha_{m}\right) \quad \forall m \in M \tag{11}$$
$$Y_{m} \in \{0,1\} \quad \forall m \in M; \qquad X_{i} \in \{0,1\} \quad \forall i \in A \tag{12}$$

Objective (9) minimizes the cost of equipping network arcs with facilities that provide (and/or collect) information to network flows. Constraints (10) stipulate that at least a proportion \(\lambda_{od}\) of the flow between an OD pair is exposed to a facility and are akin to the threshold constraint (3) utilized by Berman et al. (1992). Thus, when \(\lambda_{od} = 1.0\), 100% of the flow between the OD pair must be served by the sited facilities. When \(0.0 \le \lambda_{od} < 1.0\), only a proportion \(\lambda_{od}\) of the flow is guaranteed to be covered. Constraints (11) follow the structure of the probabilistic threshold constraints (8) and state that path \(m\) cannot be effectively served unless the probability of ineffective service (e.g., insufficiently reliable exposure) provided by the sited facilities is less than or equal to \(1 - \alpha_{m}\). Given that multiple facilities may be needed to meet the probabilistic threshold for exposure, the path reduction techniques of Berman et al. (1992) are no longer applicable. Constraints (12) are binary/integer restrictions on all decision variables.
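For intuition about how (9)-(12) interact, the sketch below solves a toy PFCP instance by exhaustive enumeration over arc subsets. It is illustrative only (the study itself used the Gurobi solver), and the function name `solve_pfcp`, the dictionary layout, and the instance values are all hypothetical. As in the flow covering model, \(Y_{m}\) only helps to satisfy (10), so it is set to 1 whenever (11), checked here in its equivalent product form, allows it.

```python
import itertools
import math

def solve_pfcp(arc_cost, paths, lam, alpha):
    """Exhaustive search for the PFCP (9)-(12) on a toy instance.

    arc_cost: {arc i: delta_i}
    paths:    {path m: (od, f_m, {arc i on m: p_im})}
    lam:      {od: lambda_od}
    alpha:    {path m: alpha_m}
    Returns (minimum cost, set of sited arcs), or None if infeasible.
    """
    best = None
    arcs = sorted(arc_cost)
    for k in range(len(arcs) + 1):
        for combo in itertools.combinations(arcs, k):
            sited = set(combo)
            cost = sum(arc_cost[i] for i in sited)  # Objective (9)
            if best is not None and cost >= best[0]:
                continue  # cannot improve on the incumbent
            covered = dict.fromkeys(lam, 0.0)
            total = dict.fromkeys(lam, 0.0)
            for m, (od, f_m, p) in paths.items():
                total[od] += f_m
                # Constraint (11): Y_m = 1 only if the chance of missing every
                # sited arc on path m is at most 1 - alpha_m.
                miss = math.prod(1.0 - p[i] for i in sited if i in p)
                if miss <= 1.0 - alpha[m]:
                    covered[od] += f_m
            # Constraints (10): each OD pair needs lambda_od of its flow covered.
            if all(covered[od] >= lam[od] * total[od] - 1e-9 for od in lam):
                best = (cost, sited)
    return best

arc_cost = {"e1": 4, "e2": 3, "e3": 5, "e4": 2}
paths = {
    "m1": ("A-B", 60, {"e1": 0.7, "e2": 0.7}),
    "m2": ("A-B", 40, {"e3": 0.7, "e4": 0.7}),
    "m3": ("A-C", 50, {"e2": 0.7, "e4": 0.7}),
}
print(solve_pfcp(arc_cost, paths, {"A-B": 0.6, "A-C": 1.0}, dict.fromkeys(paths, 0.78)))
```

In this made-up instance, a single arc offers only 0.7 < 0.78, so every covered path needs both of its arcs sited. Covering all of A-C forces arcs e2 and e4, and meeting 60% of A-B then requires path m1, giving a minimum cost of 9 with arcs e1, e2, and e4 sited.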

While model (9)-(12) addresses the coverage of OD pairs individually, it is also possible to do so in aggregate. For instance, an origin-specific approach can be adopted whereby a certain proportion of total outflow from an origin to all destinations may require coverage. In other words, instead of imposing a threshold \(\lambda_{od}\) on flow among individual OD pairs, a threshold \(\lambda_{o}\) can be imposed on all flow out of an origin. This situation can be readily accommodated in the model as shown in (13).

$$\sum_{d \in N \mid d \ne o} \sum_{m \in \phi_{od}} f_{m} Y_{m} \ge \lambda_{o} \sum_{d \in N \mid d \ne o} a_{od} \quad \forall o \in O \tag{13}$$
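The practical difference between the OD-level constraints (10) and the origin-level constraint (13) can be seen in a small check. The function `origin_thresholds_met` and the data below are hypothetical. In the example, origin A meets a 50% outflow threshold even though its OD pair A-C receives no coverage at all, an outcome that (10) with \(\lambda_{od} = 0.5\) would not permit.

```python
from collections import defaultdict

def origin_thresholds_met(paths, covered, lam_o):
    """Evaluate Constraint (13) for each origin.

    paths:   {m: (origin, destination, f_m)}
    covered: set of paths with Y_m = 1
    lam_o:   required proportion of each origin's outflow
    """
    outflow = defaultdict(float)       # sum over d of a_od (= sum of path flows)
    covered_flow = defaultdict(float)  # sum over d and m of f_m * Y_m
    for m, (o, _d, f_m) in paths.items():
        outflow[o] += f_m
        if m in covered:
            covered_flow[o] += f_m
    return {o: covered_flow[o] >= lam_o * outflow[o] for o in outflow}

paths = {"m1": ("A", "B", 60), "m2": ("A", "B", 40), "m3": ("A", "C", 50)}
print(origin_thresholds_met(paths, {"m1", "m2"}, 0.5))  # {'A': True}
```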

In the proposed formulation, the way in which flow is assigned to the paths connecting an OD pair is an input to the model. In other words, it is assumed that the way in which the network will be utilized is known. This is in fact a very common assumption in the flow capturing literature. In many models, only the shortest path connecting an OD pair is considered (Upchurch and Kuby 2010). More recently, variants of the flow capturing models have been proposed that consider multiple, alternative paths of movement among ODs (Gzara and Erkut 2009; Matisziw 2019). Regardless of how flow is modeled to utilize a network at any given time, there will always be uncertainty as to whether, and to what extent, that representation of network use will manifest over time. Therefore, instead of considering one or a few alternative representations of network flow, it may be worth exploring many potential ways in which flow could be assigned to paths within a system. This facet can be addressed by identifying and comparing model solutions for a range of alternative flow assignment scenarios.

To explore the robustness of a siting solution to multiple scenarios of flow assignment, the following experimental framework can be employed. First, derive a representative set of flow assignment scenarios \(s \in S\). While infinitely many such scenarios exist, scenarios could be selected based on factors such as observed or hypothesized locations of disruption (e.g., accidents, congestion), different assignment strategies (e.g., all-or-nothing, user equilibrium), the proportion of flow to be served, and different levels of likelihood for observing information on each path. Next, the model can be solved for each flow assignment scenario in turn and the resulting siting configurations examined.

Empirical study

To illustrate the mechanics and applicability of the PFCP, a case study of siting VMS in a highway system is examined. In particular, the problem of identifying VMS siting configurations for providing information to truck flows utilizing the Interstate highway system in the state of Ohio, USA is considered. This system supports truck flow among 15 metropolitan statistical areas (MSAs), forming 210 OD pairs. Connectivity among the OD pairs is provided by 68 directed arcs (Fig. 1) representing 7,561 km of roadway. For this experimental network, a minimum of 210 paths are needed to connect the OD pairs (i.e., one path per OD pair), while at most 119,582 paths could theoretically function to support OD flow (Matisziw et al. 2007a, b). The number of paths that actually support flow among origins and destinations in this network likely lies somewhere between these two extremes.

Fig. 1

Ohio, USA interstate highway network

In the transportation sciences, a variety of ways of assigning OD flow to network paths have been proposed based on hypothesized travel behavior. Thus, rather than focusing on any single assignment of OD flow to network paths, a range of different assignments of flow between ODs is examined to better understand the solution characteristics of the model. Although 119,582 OD paths exist in the system, only 118,114 connect OD pairs having non-zero truck flows. From these paths, three subsets were selected to represent viable alternatives for movement between the OD pairs under different network flow assignment scenarios. Flow between each OD pair \(a_{od}\) was assigned to paths \(m \in \phi_{od}\) relative to the cost \(c_{m}\) of traversing alternative paths. Specifically, the inverse cost of each path is raised to the power \(\beta\) and evaluated relative to the sum of the powered inverse costs of all paths serving the OD pair, as shown in Eq. (14). Therefore, when \(\beta\) is high, the assignment of flow is more strongly influenced by less costly paths and flow is distributed over a small set of paths (i.e., fewer paths with \(f_{m} > 0\)). Conversely, when \(\beta\) is low, the assignment of flow is less influenced by path cost and flow is distributed over a larger set of paths (i.e., more paths with \(f_{m} > 0\)).

$$f_{m} = a_{od} \left( \frac{(1/c_{m})^{\beta}}{\sum_{l \in \phi_{od}} (1/c_{l})^{\beta}} \right) \tag{14}$$

Second, after this initial proportional assignment of flow, all paths allocated less than 1.0 unit of flow are removed from consideration and \(f_{m}\) is recomputed. Next, in increasing order of path cost \(c_{m}\), the flows assigned to paths are rounded down to integer values and the fractional remainders are tracked. Whenever at least 1.0 unit of accumulated remainder becomes available, it is added to the flow of the incumbent path. This process ensures an integer assignment of OD flow to viable paths while conserving total OD flow \(a_{od}\). In this study, these steps were repeated for \(\beta = 4\) (flows distributed over 4,017 total paths, i.e., many alternatives (~19) for each OD pair), \(\beta = 8\) (flows distributed over 970 total paths, a moderate number of alternatives (~5) for each OD pair), and \(\beta = 12\) (flows distributed over 599 total paths, a few alternatives (~3) for each OD pair). These three representations of network use (many paths, moderate paths, and few paths) are used to evaluate the sensitivity of the PFCP to different assignments of network flow.
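The two-stage assignment just described (Eq. (14), pruning of paths below 1.0 unit, then integer rounding with a carried remainder) can be sketched as follows. The function name `assign_od_flow`, the path costs, and the fallback for the case in which every path is pruned are assumptions made for illustration; the source does not say how that case was handled. The small tolerance only guards against floating-point error when the carried remainder should be exactly 1.

```python
import math

def assign_od_flow(a_od, path_costs, beta, tol=1e-9):
    """Integer assignment of OD flow a_od to paths: Eq. (14) plus rounding.

    a_od:       integer flow between the OD pair
    path_costs: {path m: c_m > 0} for the candidate paths in phi_od
    beta:       dispersion coefficient (higher -> flow concentrates on cheap paths)
    """
    def proportional(paths):
        # Eq. (14): f_m = a_od * (1/c_m)^beta / sum_l (1/c_l)^beta
        weights = {m: (1.0 / path_costs[m]) ** beta for m in paths}
        total = sum(weights.values())
        return {m: a_od * w / total for m, w in weights.items()}

    flows = proportional(path_costs)
    # Drop paths allocated less than 1.0 unit, then recompute Eq. (14).
    viable = [m for m, f in flows.items() if f >= 1.0]
    if not viable:
        # Assumption (not from the source): keep only the cheapest path.
        viable = [min(path_costs, key=path_costs.get)]
    flows = proportional(viable)

    # Round down in increasing cost order; whenever the carried fractional
    # remainder reaches 1.0, add its whole part to the current path.
    assigned, carry = {}, 0.0
    for m in sorted(viable, key=path_costs.get):
        whole = math.floor(flows[m])
        carry += flows[m] - whole
        if carry >= 1.0 - tol:
            extra = math.floor(carry + tol)
            whole += extra
            carry -= extra
        assigned[m] = whole
    return assigned

print(assign_od_flow(100, {"p1": 10.0, "p2": 12.0, "p3": 30.0}, beta=4))
# {'p1': 67, 'p2': 33}
```

With \(\beta = 4\), the costliest path first receives about 0.83 units, so it is pruned; the remaining 100 units are split roughly 67.46/32.54 and the rounding step returns 67 and 33, preserving the OD total.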

The probability \(p_{im}\) that information sited along an arc \(i\) will be observed by flow along a path \(m\) could be specified under many different assumptions. Here, it is assumed that all arcs provide the same base probability of exposure \(\tau\). It is also assumed that arcs that are longer, or involve more travel time, relative to the path as a whole are associated with a higher likelihood of exposure (since they provide more opportunity for the traveler to integrate the VMS content into their decision-making process). To account for this relationship, the length \(\eta_{i}\) of each arc \(i \in R_{m}\) is evaluated relative to that of the path, \(\sum_{j \in R_{m}} \eta_{j}\), and scaled by a coefficient \(\kappa\) representing the additional likelihood of exposure offered over the base level, as shown in (15).

$$p_{im} = \tau + \kappa \frac{\eta_{i}}{\sum_{j \in R_{m}} \eta_{j}} \tag{15}$$

Studies have reported wide variation in the proportion of drivers observing VMS messages, anywhere between 33 and 97% depending on the context of the study (Chatterjee et al. 2002). In this application, all arcs are assigned a base likelihood of exposure of 0.7 \((\tau = 0.7)\), so every arc offers at least this likelihood, and are scaled by \(\kappa = 0.1875\), allowing up to an additional ~19% likelihood to be added based on the length of the arc in relation to the length of the entire path.
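Eq. (15) with the application's values \(\tau = 0.7\) and \(\kappa = 0.1875\) can be evaluated as in the sketch below; the helper names and the two-arc path with made-up lengths are hypothetical. The combined reliability \(1 - \prod (1 - p_{im})\) is simply the complement of the left-hand side of (7). One arithmetic consequence worth noting: an arc that makes up an entire path attains at most \(\tau + \kappa = 0.8875\), so a path consisting of a single arc could never meet a reliability threshold above that value, which is one possible way a threshold can become infeasible.

```python
TAU = 0.7       # base probability of exposure (tau) used in the application
KAPPA = 0.1875  # maximum length-based increment (kappa) used in the application

def exposure_probabilities(arc_lengths):
    """Eq. (15): p_im = tau + kappa * eta_i / (sum of eta_j over arcs j of path m)."""
    path_length = sum(arc_lengths.values())
    return {i: TAU + KAPPA * eta / path_length for i, eta in arc_lengths.items()}

def reliability(probs, sited):
    """1 - prod over sited arcs of (1 - p_im): chance of observing at least one VMS."""
    miss = 1.0
    for i in sited:
        miss *= 1.0 - probs[i]
    return 1.0 - miss

p = exposure_probabilities({"a": 100.0, "b": 300.0})  # hypothetical arc lengths (km)
print(p)                           # a ~0.7469, b ~0.8406
print(reliability(p, {"b"}))       # ~0.8406: meets alpha_m = 0.82 but not 0.86
print(reliability(p, {"a", "b"}))  # ~0.9597: meets alpha_m = 0.86
```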

In this application, the cost of deploying VMS on an arc is assumed to equal the length of the arc on which the VMS is to be installed (i.e., \(\delta_{i} = \eta_{i}\)). The PFCP also requires selecting values for the proportion of OD flow that must be served (\(\lambda_{od}\) or \(\lambda_{o}\)) and the minimum required level of service reliability \((\alpha_{m})\), which would be determined by the planning goals of those managing the infrastructure. As these parameters can vary in practice, a range of values is explored to examine their general influence on model solution characteristics and output. For each of the three representations of the network (many paths, moderate paths, and few paths), 9 model parameterizations involving different combinations of \(\lambda_{o}\) and \(\alpha_{m}\) are considered. Three values of \(\lambda_{o}\) (the proportion of flow leaving each origin to be served) are examined: (a) a low threshold (\(\lambda_{o} = 0.20\)), (b) a moderate threshold (\(\lambda_{o} = 0.60\)), and (c) 100% service (\(\lambda_{o} = 1.0\)). Each \(\lambda_{o}\) value is then paired with three values of \(\alpha_{m}\) (the minimum level of service reliability required for effective exposure): (a) \(\alpha_{m} = 0.78\), (b) \(\alpha_{m} = 0.82\), and (c) \(\alpha_{m} = 0.86\). These values were chosen to be higher than the base probability of exposure that could be offered by any single arc. Values of \(\alpha_{m} > 0.86\) were also considered; however, in this application context there were cases in which no feasible solution to the model existed for those higher thresholds (i.e., there were not enough arcs available in some paths to permit the threshold to be exceeded). The Gurobi 9.0 optimization solver was used to identify optimal solutions to each of the PFCP model parameterizations.


Table 1 summarizes the optimal PFCP solutions for the 9 model parameterizations under each of the three network flow assignment scenarios. For each assignment scenario and parameterization of \(\lambda_{o}\) and \(\alpha_{m}\), the cost of the selected network arcs relative to the total cost of arcs in the network (MC) is reported, as is the amount of flow covered relative to total flow in the system (FC). For any given value of \(\alpha_{m}\) in Table 1, MC increases for higher \(\lambda_{o}\) coverage thresholds. For instance, in the scenario involving many OD paths with \(\alpha_{m} = 0.78\), MC increases from 6.9 to 14.7% as \(\lambda_{o}\) increases from 20 to 60%. That is, the cost of providing coverage to a minimum of 60% of the flow out of each origin is roughly double that required to meet the 20% threshold. Because \(\lambda_{o}\) is a minimum threshold on the flow out of each origin that should be served, the proportion of flow covered (FC) by the facility configuration for the system as a whole depends on how the flows are distributed in the network. For example, in the scenario involving many paths with \(\alpha_{m} = 0.78\), 44.5% of system flow is covered when the coverage threshold is \(\lambda_{o} = 0.2\), and 67.3% is covered when the threshold is \(\lambda_{o} = 0.6\). For the network assignment scenarios involving few paths, even greater amounts of flow are covered given that the OD flow is confined to a smaller set of paths. As the minimum required level of service reliability increases from \(\alpha_{m} = 0.78\) to \(\alpha_{m} = 0.86\), MC increases in all but a few instances. However, in many cases, the proportion of system flow (FC) that is covered decreases as the \(\alpha_{m}\) reliability threshold increases.

Table 1 Summary of optimal PFCP solutions

Figure 2a–c illustrates three example optimal siting configurations for the assignment scenario involving many OD paths that are summarized in Table 1. When the origin outflow coverage threshold is relatively low (20%) and the minimum level of exposure reliability is 0.78, cost is low (6.9%), as VMS are needed on only a small number of arcs, constituting two subgraphs of the network, to meet the threshold (Fig. 2a). As the threshold on coverage of origin outflow is increased to 60% and the minimum level of exposure reliability is increased slightly to 0.82, cost increases (17.3%) as VMS are needed on more arcs to meet the elevated requirements. In this case, the selected arcs constitute four subgraphs (Fig. 2b). However, the proportion of the network that would be involved in such a solution is still relatively small. When the threshold on coverage of origin outflow is increased to 100% and the minimum level of exposure reliability is increased to 0.86, most of the arcs (88.3% of the system) require VMS (Fig. 2c). In scenarios in which flow is distributed over moderate and few OD paths, the number of arcs needed to meet the flow coverage threshold is somewhat larger than in the many OD paths scenario. This is because the routing alternatives in the moderate and few OD paths scenarios involve a smaller portion of the network, and as such, fewer arcs are utilized to support movement between certain OD pairs.

Fig. 2

Arcs selected in optimal PFCP solutions for the ‘many OD paths’ network assignment scenario: a low flow threshold, lower reliability threshold \(\left( {\lambda_{o} = 0.2,\:\alpha_{m} = 0.78} \right)\), b medium flow threshold, moderate reliability threshold \(\left( {\lambda_{o} = 0.6,\:\alpha_{m} = 0.82} \right)\), c high flow threshold, higher reliability threshold \(\left( {\lambda_{o} = 1.0,\:\alpha_{m} = 0.86} \right)\)

One shortcoming of threshold-based optimization models is that once the coverage threshold(s) have been met, there is no incentive to provide further benefit to flow. For instance, if there is more than one way to cover at least 20% of the flow out of each origin by siting VMS on three arcs, then from a modeling standpoint any of these alternative optima will suffice, even if one covers more flow than the others. In such situations, other evaluation criteria can be included to further distinguish among the alternatives and to add comparative value to the solutions. For example, in addition to minimizing the cost of siting facilities in a network, one might also be interested in assessing some measure of the benefit the facilities provide to the system. To explore this notion, the PFCP cost minimization Objective (9) is paired with the opportunity for path diversion objective of Matisziw (2019):

$$\text{Maximize} \quad \Omega = \sum_{i \in A} b_{i} X_{i} \tag{16}$$

Objective (16) maximizes the benefit that the facilities provide to flow by supplying information that can help it identify alternative ways of proceeding to the destination (i.e., options for rerouting/diversion). The benefit \(\left( {b_{i} } \right)\) of locating information on a particular arc \(i \in R_{m}\) along a path \(m\) can be measured as the percentage of flow-weighted path cost that could be avoided, given that opportunities for diversion exist upon exiting arc \(i\). In this way, more benefit accrues when a greater proportion of flow-weighted path cost can be avoided by providing information at arc \(i\). Given the biobjective formulation (Objectives (9) and (16); Constraints (10)–(12)), Pareto optimal solutions for the experimental parameterizations were identified using the NISE method (Cohon et al. 1979) with the Gurobi 9.0 optimization solver. In total, 1,311 supported efficient (Pareto optimal) solutions were identified with this procedure.
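To make the benefit measure concrete, the Python sketch below computes one possible reading of \(b_{i}\): for every path that traverses arc \(i\), the cost of the path downstream of arc \(i\) is counted as avoidable, weighted by the path's flow, whenever the node at the end of arc \(i\) offers a diversion opportunity; the result is expressed as a percentage of total flow-weighted path cost. The set of diversion nodes, the tuple-based path representation, and the function names are hypothetical simplifications, and the exact definition used by Matisziw (2019) may differ.

```python
def diversion_benefit(paths, cost, head, diversion_nodes):
    """Hypothetical b_i: percentage of total flow-weighted path cost that lies
    downstream of arc i on paths able to divert when exiting arc i.

    paths: list of (flow, [arc ids in travel order])
    cost: arc id -> arc cost; head: arc id -> node reached on exiting the arc
    diversion_nodes: nodes at which an alternative route is available
    """
    total = sum(flow * sum(cost[a] for a in arcs) for flow, arcs in paths)
    benefit = {a: 0.0 for a in cost}
    for flow, arcs in paths:
        for k, a in enumerate(arcs):
            if head[a] in diversion_nodes:
                # Cost of the remainder of the path that diversion could avoid.
                benefit[a] += flow * sum(cost[x] for x in arcs[k + 1:])
    return {a: 100.0 * v / total for a, v in benefit.items()}


def omega(benefit, selected):
    """Objective (16): total benefit of the selected arcs (X_i = 1)."""
    return sum(benefit[i] for i in selected)


cost = {"a": 10, "b": 20, "c": 30}
head = {"a": "N2", "b": "N3", "c": "N4"}
b = diversion_benefit([(60, ["a", "b", "c"]), (40, ["c"])], cost, head, {"N2"})
print(b, omega(b, {"a", "c"}))
```

Here only arc a ends at a diversion node, so it alone carries benefit: 60 units of flow could avoid the 50 units of downstream cost on the first path, which is 62.5% of the 4,800 units of total flow-weighted path cost.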

While the 27 solutions described earlier relate explicitly to the PFCP, the 1,311 supported efficient solutions represent tradeoffs between cost minimization and benefit to path diversion. The 27 PFCP solutions are in fact a subset of the supported efficient solutions (they correspond to the cost-minimizing anchor points). Among the solutions for each parameterization, one attains the lowest cost (optimizing PFCP Objective (9)) and another attains the highest benefit to path diversion (optimizing Objective (16)); all other solutions represent tradeoffs between the two objectives. For example, the tradeoffs among the 115 Pareto optimal solutions identified for the \(\alpha_{m} = 0.78\), \(\lambda_{o} = 0.2\) parameterization of the many OD paths assignment scenario (Table 2) are shown in Fig. 3a. Three example solutions are labeled A, B, and C in Fig. 3a, and the arcs selected in these solutions are depicted in Fig. 4a–c. Solution A represents a complete focus on the cost minimization objective (Fig. 4a): fourteen arcs need to be outfitted with VMS to ensure that at least 20% of the outflow from each origin node is covered with 78% exposure reliability. The selected arcs form three subgraphs in different portions of the state. In all but two instances, the arcs representing both directions of movement between a pair of nodes were selected. Solution B provides nearly 75% more benefit for path diversion than solution A, but involves outfitting 37 arcs with VMS at more than four times the cost of solution A. The arcs in this solution (Fig. 4b) form a single subgraph in the central portion of the network. Solution C (Fig. 4c) represents an intermediate tradeoff between cost and benefit. Its 22 selected arcs build upon the three clusters in solution A, entailing about twice the cost of A and about half that of B, while providing approximately 42% more benefit to diversion than A; B, in turn, provides about 23% more benefit than C.
In other model parameterizations, there were solutions whose cost was very close to that of the cost-minimizing solution but that offered a significant improvement in opportunity for diversion. For example, among the 107 supported efficient solutions identified for the \(\alpha_{m} = 0.78\), \(\lambda_{o} = 0.6\) parameterization of the many OD paths assignment scenario (Table 2), there is only a 4% increase in cost between the solution that reflects a full focus on cost minimization (solution D) and a solution that provides a 25% increase in opportunity for diversion (solution E) (Fig. 3a).
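The NISE procedure (Cohon et al. 1979) used to generate these tradeoffs can be sketched compactly. Starting from the two anchor points (the cost-minimizing and the benefit-maximizing solutions), it repeatedly solves a weighted-sum problem whose weights are perpendicular to the segment joining two adjacent efficient points in objective space; if the weighted optimum lies strictly beyond that segment, it is a new supported efficient solution and both sub-segments are searched in turn. The Python sketch below is illustrative only: exhaustive enumeration stands in for the Gurobi solver used in the study, the arc costs and benefits are invented, and `covers` is a hypothetical placeholder for the PFCP constraints. Because it adds only points that strictly improve the weighted objective, it returns the extreme supported points.

```python
from itertools import combinations


def make_solver(items, feasible):
    """Exhaustive stand-in for an ILP solver.

    items: arc id -> (cost, benefit); feasible: selection -> bool.
    Returns solve(key), which picks the feasible (cost, benefit, selection)
    tuple minimizing key.
    """
    names = sorted(items)
    candidates = []
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            sel = frozenset(combo)
            if feasible(sel):
                c = sum(items[i][0] for i in sel)
                b = sum(items[i][1] for i in sel)
                candidates.append((c, b, sel))

    def solve(key):
        return min(candidates, key=key)

    return solve


def nise(items, feasible, tol=1e-9):
    """Extreme supported efficient points for (minimize cost, maximize benefit)."""
    solve = make_solver(items, feasible)
    # Anchor points: lexicographic cost minimum and benefit maximum.
    low = solve(lambda s: (s[0], -s[1]))
    high = solve(lambda s: (-s[1], s[0]))
    found = {low[:2]: low, high[:2]: high}
    stack = [(low, high)] if low[:2] != high[:2] else []
    while stack:
        p1, p2 = stack.pop()
        # Weights normal to the segment p1-p2: every point on the segment
        # has the same value of w_c * cost - w_b * benefit.
        w_c, w_b = p2[1] - p1[1], p2[0] - p1[0]
        new = solve(lambda s: w_c * s[0] - w_b * s[1])
        if w_c * new[0] - w_b * new[1] < w_c * p1[0] - w_b * p1[1] - tol:
            found[new[:2]] = new
            stack += [(p1, new), (new, p2)]
    return sorted(found.values(), key=lambda s: s[0])


def covers(sel):
    """Hypothetical coverage constraint: arc a or arc b must be selected."""
    return bool(sel & {"a", "b"})


# Hypothetical data: arc id -> (siting cost, diversion benefit).
items = {"a": (2, 3), "b": (3, 7), "c": (4, 6), "d": (1, 1)}
for cost, benefit, sel in nise(items, covers):
    print(cost, benefit, sorted(sel))
```

On this toy instance the sketch returns four points, (2, 3), (3, 7), (9, 16), and (10, 17) in (cost, benefit) terms. The selections {a, b} and {b, c} are also efficient but lie exactly on the segment between (3, 7) and (9, 16), so this simple version does not report them.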

Table 2 Number of supported efficient solutions
Fig. 3

Objective tradeoffs among the supported efficient solutions identified for each network assignment scenario and coverage threshold given a reliability threshold \(\alpha_{m} = 0.78\)

Fig. 4

Arcs selected in supported efficient solutions for the many OD path assignment scenario given flow coverage threshold \(\lambda_{o} = 0.2\) and reliability threshold \(\alpha_{m} = 0.78\) for: a solution A, b solution B, and c solution C

Figure 5 shows the number of times each network arc appears in the 115 supported efficient solutions for the \(\alpha_{m} = 0.78\), \(\lambda_{o} = 0.2\) parameterization of the many OD paths assignment scenario, giving a better perspective on which arcs are relevant to more solutions. Although every arc appears in some Pareto optimal solution, arcs on the periphery of the study region appear in far fewer solutions than arcs that provide more direct connectivity among the MSAs. Network assignment scenarios that involve a greater number of paths provide more alternatives for flow to move among the OD pairs and thus a richer set of Pareto optimal alternatives. While 115 Pareto optimal solutions were found for the many OD paths, \(\alpha_{m} = 0.78\), \(\lambda_{o} = 0.2\) parameterization, 91 were found for the same parameterization of the moderate paths scenario and only 5 for the few paths scenario. Similarly, for the many OD paths, \(\alpha_{m} = 0.78\), \(\lambda_{o} = 1.0\) parameterization, there were 45 Pareto optimal alternatives, but only 33 and 17 for the same parameterization of the moderate and few OD paths scenarios, respectively (Table 2). Greater numbers of paths with flow also translate into more opportunity for diversion. For example, all solutions for parameterizations based on the many OD paths assignment scenario (Fig. 3a) entail more opportunity for diversion (\(59{,}756 \le \Omega \le 119{,}726\)) than those based on the moderate (\(35{,}073 \le \Omega \le 54{,}887\)) (Fig. 3b) or few (\(17{,}114 \le \Omega \le 32{,}022\)) (Fig. 3c) OD paths scenarios. Typically, the lower the share of outflow from each origin that must be covered (e.g., \(\lambda_{o} = 0.2\)), the greater the variety of lower-cost solutions that can be found. However, as the threshold for coverage of flow out of each origin increases (e.g., \(\lambda_{o} = 0.6\) and \(\lambda_{o} = 1.0\)), the initial cost of simply satisfying the threshold becomes much higher before benefit for diversion becomes a major consideration.
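The arc frequencies mapped in Fig. 5 amount to a simple tally over the solution set. A minimal Python sketch, assuming each supported efficient solution is available as a collection of selected arc identifiers (the function name and toy data are hypothetical):

```python
from collections import Counter


def arc_selection_counts(solutions, all_arcs):
    """Count, for each arc, the number of solutions that select it.

    solutions: iterable of arc collections, one per supported efficient solution.
    all_arcs: every arc in the network; arcs never selected get a count of 0.
    """
    # set(sol) guards against an arc being listed twice within one solution.
    counts = Counter(arc for sol in solutions for arc in set(sol))
    # Most frequently selected arcs first; ties broken by arc id.
    return sorted(((a, counts[a]) for a in all_arcs), key=lambda t: (-t[1], t[0]))


sols = [{"a", "b"}, {"b", "c"}, ["b", "b"]]
print(arc_selection_counts(sols, ["a", "b", "c", "d"]))
```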

Fig. 5

Number of times arcs are selected over the 115 supported efficient solutions for the many OD path assignment scenario given flow coverage threshold \(\lambda_{o} = 0.2\) and reliability threshold \(\alpha_{m} = 0.78\)


This article presents a framework for siting facilities in a transportation system to provide information to (and/or collect information from) network flows. In particular, an optimization model is structured to minimize the cost of siting a configuration of facilities to serve flows between network origins and destinations. Unlike many other flow capturing models, any number of paths supporting flows among OD pairs can be considered. Given that there is typically uncertainty as to the extent to which information will be received by, and/or collected from, flows passing by sited facilities, probabilities of exposure are associated with candidate facilities. Probabilistic threshold constraints are then incorporated to ensure that flows are reliably exposed to the facilities before they can be considered effectively served. While this type of threshold formulation can guarantee a base level of service for network flows, it places no value on exceeding the thresholds when alternative optima make doing so possible. As such, a biobjective formulation is explored by introducing a maximization objective to better evaluate the characteristics of the model structure. A NISE algorithm is applied to identify all supported efficient facility configurations for distributing information to flows in an interstate highway system. To explore the sensitivity of the model to variations in the representation of the transportation system, the distribution of flows, exposure probabilities, and coverage criteria, a range of model parameterizations was examined.
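As an illustration of how a probabilistic threshold of this kind can be kept linear, the derivation below uses the logarithmic transformation common in probabilistic covering models. It assumes that exposure at different arcs is independent, that \(p_{i} < 1\) is the probability that flow passing a facility on arc \(i\) is exposed, and that \(\alpha_{m} < 1\). The path indicator \(Y_{m}\), the path flow \(f_{m}\), and the set \(M_{o}\) of paths leaving origin \(o\) are notation introduced here for the sketch, which is not necessarily identical to Constraints (10)–(12).

$$
\begin{aligned}
1 - \prod_{i \in R_{m}} \left(1 - p_{i}\right)^{X_{i}} \ge \alpha_{m}
&\iff \sum_{i \in R_{m}} X_{i} \ln\left(1 - p_{i}\right) \le \ln\left(1 - \alpha_{m}\right) \\
&\iff \sum_{i \in R_{m}} -\ln\left(1 - p_{i}\right) X_{i} \ge -\ln\left(1 - \alpha_{m}\right)
\end{aligned}
$$

$$
\sum_{i \in R_{m}} -\ln\left(1 - p_{i}\right) X_{i} \ge -\ln\left(1 - \alpha_{m}\right) Y_{m} \quad \forall m, \qquad
\sum_{m \in M_{o}} f_{m} Y_{m} \ge \lambda_{o} \sum_{m \in M_{o}} f_{m} \quad \forall o
$$

Setting \(Y_{m} = 1\) requires path \(m\) to reach exposure reliability \(\alpha_{m}\), while \(Y_{m} = 0\) relaxes the constraint because its left-hand side is nonnegative. For example, with \(p = 0.8\) and \(p = 0.5\) on two selected arcs of a path, the left-hand side is \(-\ln 0.2 - \ln 0.5 = \ln 10 \approx 2.30\), which exceeds \(-\ln(1 - 0.78) \approx 1.51\); this matches the direct calculation \(1 - 0.2 \times 0.5 = 0.9 \ge 0.78\).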

Information provision and collection in transportation systems can be resource intensive given the complexities of modern urban transportation infrastructure and the ways in which it is used. In this study, various configurations of paths supporting flows among OD pairs, and assignments of flow to those paths, were examined to reason about their impacts on solution characteristics. In the application, arc length was used to represent the cost of siting facilities to provide information to (or collect it from) network flows. However, facility costs could be operationalized in other ways in this type of modeling approach. For instance, arcs could be split into smaller management units. Alternatively, the number of facilities needed to effectively serve an arc could be explicitly calculated (e.g., based on some recommended minimum facility spacing). Given the spatial and temporal dynamics of flows in transportation systems, future work is needed to account for perturbations in flow when siting facilities for information collection/provision. Also, while managing the cost of siting facilities is an important consideration, other planning objectives often factor into the decision-making process. For instance, aside from general information provision, this study considered another planning criterion: providing information that facilitates the diversion of flows away from upcoming portions of their paths that may be obstructed. To better leverage the capabilities of facility configurations across a broader range of purposes, future work is also needed on how a wider range of planning criteria can be integrated into the modeling process. Further, while the network modeling constructs described here were motivated and demonstrated in terms of their utility for planning information provision/collection in a transportation system, they are also broadly applicable to many other networked systems at any spatial scale. Potential applications include the location of sensors for disease surveillance and environmental monitoring, the location of facilities to support humanitarian recovery efforts, the siting of law enforcement resources, and the positioning of information in social networks, among others.
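As an example of the spacing-based alternative, the cost of serving an arc could be derived from the number of signs it requires rather than from its length alone. The Python sketch below assumes a hypothetical rule of one facility per `spacing` units of arc length, with at least one facility on any selected arc, and a uniform unit cost; neither the rule nor the numbers come from the study.

```python
import math


def facilities_on_arc(length, spacing):
    """Hypothetical rule: one facility per `spacing` units of length, minimum one."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    return max(1, math.ceil(length / spacing))


def arc_cost(length, spacing, unit_cost):
    """Siting cost of an arc under the hypothetical spacing rule."""
    return facilities_on_arc(length, spacing) * unit_cost


# A 25-unit arc with 10-unit spacing needs 3 facilities; a 4-unit arc needs 1.
print(facilities_on_arc(25, 10), facilities_on_arc(4, 10), arc_cost(25, 10, 1000))
```

Costs computed this way could replace arc length as the cost coefficients in Objective (9) without changing the structure of the model.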

Availability of data and materials

The datasets supporting the conclusions of this article are available in the figshare repository,



Flow capturing location problem


Global positioning system


Intelligent transportation systems


Location set covering problem


Maximal covering location problem


Metropolitan statistical area


Non-inferior set estimation




Probabilistic flow covering problem


Vehicular ad hoc network


Variable message signs




Not applicable.


This material is based upon work supported by the National Science Foundation under Grant No. 2027891.

Author information

Authors and Affiliations



TCM formalized the mathematical model, contributed to the solution implementation routine, and led the manuscript development. AG tuned the optimization routine, analyzed a range of modeling scenarios and application results, and assisted with manuscript preparation. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Timothy C. Matisziw.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Matisziw, T.C., Gholamialam, A. Optimizing facility siting for probabilistic collection and distribution of information in support of urban transportation. Appl Netw Sci 6, 28 (2021).
