Estimation of flow trajectories in a multilines transportation network
Applied Network Science volume 8, Article number: 44 (2023)
Abstract
Characterizing a public transportation network, such as an urban network with multiple lines, requires the origin–destination trip counts during a given period. Yet, if automatic counting makes the embarkment (boarding) and disembarkment (alighting) counts in vehicles known, it often happens that pedestrian transfers between lines are harder to track, and require costly and invasive devices (e.g., facial recognition system) to be estimated. In this contribution, we propose a method, based on maximum entropy and involving an iterative fitting procedure, which estimates the passenger flow between origins and destinations solely based on embarkment and disembarkment counts. Moreover, this method is flexible enough to provide an adaptable framework in case additional data is known, such as attraction poles between certain nodes in the network, or percentages of transferring passengers between some lines. This method is tested on toy examples, as well as with the data of the public transportation network of the city of Lausanne provided by its Transportation Agency (tl), and gives arguably convincing estimations of the transportation flow.
Introduction
Transportation networks determine our mobility, require a considerable amount of planning and resources, and elicit many public hopes and criticisms. They also constitute an endless source of inspiration in formal modeling and optimization, as attested in operations research (classical optimal transportation, maximum flow problem), quantitative geography and spatial econometrics (spatial navigation, multimodality, gravity models for flows), and machine learning (recent developments in regularized optimal transportation, such as color transfer or image interpolation; see e.g. Peyré and Cuturi 2019).
This contribution addresses a straightforward, yet central question in public transportation networks: given a network made of many train, bus, subway, or tram lines, how can one estimate the real trips made by the travelers, on the sole basis of the embarkment (boarding) counts and disembarkment (alighting) counts in each vehicle? Although estimating origin–destination flows is a much-addressed issue in transportation modeling (see e.g. Bell and Lida 1997; Hazelton 2000; Ashok and Ben-Akiva 2002; Cui 2006 and references therein), the specific problem addressed in this contribution seems, to the best of our knowledge, original.
Pedestrian transfers of travelers between different lines here constitute the missing information, which planners traditionally attempt to estimate using censuses, or more recently with costly and invasive devices located at stations, such as mobile phone tracking or facial recognition systems. In this article, we take a different approach, which is to produce the optimal estimation of transportation trajectories without additional data, using the principle of maximum entropy. The proposed solution consists of three consecutive steps: a maximum-entropy computation of the trip distributions, obeying marginal constraints and with a given prior; an update of the prior distribution by shrinking the components responsible for transfer overflow; and an update of marginal distributions. This iterative procedure clearly evokes the EM algorithm (see e.g. Dempster et al. 1977; Bavaud 2009), with the first step corresponding to the “expectation step” and the last two steps to the “maximization step”. Only the first step is required for solving the single-line case, which is naturally much simpler yet not trivial, and exhibits a disembarking probability independent of the embarking stop (Markov property). This method also offers some flexibility, as it is possible to set a prior distribution taking into account attraction or repulsion poles among stops, and to fix hyperparameters limiting the number of transfers between lines.
As case studies, we test the proposed method on toy examples as well as with the data of the public transportation network^{Footnote 1} of the city of Lausanne, in Switzerland. Toy examples offer some kind of validation, as a transportation flow can be entirely set on toy networks and compared to the solution given by our algorithm. The real case scenario with the data from the Lausanne public transportation network demonstrates that this method is applicable on a real transportation network and can yield pertinent insights about traveler habits.
“Notations and formalism” Section introduces the notations and the formalism, as well as the statement of the problem and the proposed solution. “Case Studies” Section contains case studies with toy examples and the Lausanne public transportation network. “Conclusion” Section concludes the article. Data, code and extensive results can be found in the GitHub repository of the article.^{Footnote 2}
Notations and formalism
Lines, stops and junctions
Consider a transportation network made of lines numbered \(\ell =1,\ldots , q\), of respective lengths (number of stops) \(l_\ell\). Opposite lines, that is, parallel lines running in the back and forth directions, are considered distinct.
The \(l=\sum _{\ell =1}^ql_\ell\) stops constitute the nodes of the transportation network. Each stop \(i=1,\ldots ,l\) belongs to a single line, and defines a unique next or forward stop F(i) (unless i is the line terminus) and a unique backward stop B(i) (unless i is the line start), both on the same line.
Let \(S_i\) denote the set of stops which can be reached from stop i outside line connections (within, e.g., an acceptable walking distance), excluding i itself. A stop i is referred to as an isolated stop if \(S_i=\emptyset\), and as a junction otherwise.
Edges, trips, and the incidence matrix
Two sorts of oriented edges are involved in the transportation network:

Intraline edges \((i,j)=(i,F(i))\) belonging to a single line \(\ell (i)=\ell (j)\)

Interline or transfer edges (i, j) connecting different lines \(\ell (i)\ne \ell (j)\), involving walks from junction i to \(j\in S_i\). The set of transfer edges is denoted by T.
An st-trip, noted [s, t], consists of entering the network at stop s and leaving the network at t, by following the shortest path (i.e. achieving the minimum distance, minimum time, or minimum cost), supposed unique, leading from s to t.
The succession of edges (i, j) belonging to the st-trip, noted \((i,j)\in [s,t]\), is unique. Define the edge-trip incidence matrix as
\[ \chi _{ij}^{st} = \begin{cases} 1 & \text{if } (i,j)\in [s,t] \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]
Note that we can also forbid some aberrant trips across the network, for example, trips [s, t] where (s, t) is a transfer edge (travelers making this trip do not actually use the transportation network). The set of permitted trips across the network is denoted by P, and can be defined following some conditions (see, e.g., “Construction”).
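As an illustration, the edge-trip incidence matrix can be sketched in a few lines of Python. This is a minimal sketch, not the code of the article's repository: the five-stop network, the function names `shortest_path_edges` and `incidence`, and the use of an unweighted breadth-first search (shortest path in number of edges) are all assumptions made for the example. The matrix is stored sparsely, as the set of trips passing through each edge.

```python
from collections import deque


def shortest_path_edges(adj, s, t):
    """Edges of one breadth-first shortest path from s to t, or None if t is unreachable."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if t not in parent:
        return None
    edges = []
    while parent[t] is not None:  # walk back from t to s
        edges.append((parent[t], t))
        t = parent[t]
    return edges[::-1]


def incidence(adj, trips):
    """Sparse chi: maps each edge (i, j) to the set of trips (s, t) whose path uses it."""
    chi = {}
    for s, t in trips:
        for edge in shortest_path_edges(adj, s, t) or []:
            chi.setdefault(edge, set()).add((s, t))
    return chi


# Hypothetical network: line A is 0 -> 1 -> 2, line B is 3 -> 4, walking transfer 1 -> 3.
adj = {0: [1], 1: [2, 3], 3: [4]}
chi = incidence(adj, [(0, 2), (0, 4), (1, 4)])
print(sorted(chi[(1, 3)]))  # trips using the transfer edge (1, 3)
```

Storing only the non-zero entries is a design choice of the sketch: with l stops the dense matrix has on the order of l^4 entries, most of which are zero.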
Transportation flows
Let \(x_{ij}\) count the number of travelers using edge (i, j) in a given period, such as a given hour, day, week or year. The edge flow \(x_{ij}\) is denoted by \(y_{ij}\) for an intraline edge (i, j), and by \(z_{ij}\) for a transfer edge (i, j). By construction, \(x_{ij}=y_{ij}+z_{ij}\), where \(y_{ij}\, z_{ij}=0\).
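Let \(a_i\), respectively \(b_i\), be the number of passengers embarking, respectively disembarking, at stop i. By construction, the flow leaving stop i along its line equals the flow arriving at i, plus the embarkments and minus the disembarkments at i:
\[ y_{iF(i)} = y_{B(i)i} + a_i - b_i, \tag{2} \]
where a term vanishes when the corresponding edge does not exist (line start or terminus).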
Also, \(\textbf{a}\) and \(\textbf{b}\) must be consistent, in the sense that \(A_{B(i)}\ge B_i\), where \(A_i\) (respectively \(B_i\)) is the cumulated number of embarked (resp. disembarked) passengers on the line under consideration, recursively defined as \(A_{F(i)}=A_i+a_i\) (resp. \(B_{F(i)}=B_i+b_i\)). Moreover, \(A_i=B_i\) at a terminal line stop i. This common value yields the total number of passengers transported by the line.
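The consistency of the counts on one line, and the resulting intraline flows, can be checked by simple passenger conservation along the line. The following Python sketch is illustrative only: the function name `intraline_flows` and the counts are hypothetical, and stops are assumed to be listed in line order.

```python
def intraline_flows(a, b):
    """Load carried on each edge (i, F(i)) of one line, from boarding counts a and alighting counts b."""
    load, flows = 0, []
    for i, (a_i, b_i) in enumerate(zip(a, b)):
        if b_i > load:
            raise ValueError(f"stop {i}: more passengers alighting than on board")
        load += a_i - b_i  # passengers on board when leaving stop i
        flows.append(load)
    if load != 0:
        raise ValueError("unbalanced line: total boarding differs from total alighting")
    return flows[:-1]  # the terminus has no forward edge


a = [5, 3, 2, 0]  # hypothetical boarding counts along the line
b = [0, 1, 4, 5]  # hypothetical alighting counts along the line
print(intraline_flows(a, b))  # [5, 7, 5]
```

Real counts rarely pass such a check as they stand, which is why a rescaling step is needed before running the method.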
Let the transportation flow \(n_{st}\) denote the number of passengers following an st-trip, that is entering the network at s and leaving the network at t by using the shortest path. One gets from (1)
\[ x_{ij} = \sum _{st} \chi _{ij}^{st}\, n_{st}. \tag{3} \]
Among the passengers embarking in i, some transfer from another line, and some others enter into the network:
\[ a_i = z_{\bullet i} + n_{i\bullet }, \tag{4} \]
where “\(\bullet\)” denotes the summation over the replaced index, as in \(n_{i\bullet }=\sum _{j=1}^l n_{ij}\). Similarly, among the passengers disembarking in i, some transfer to another line, and some others leave the network:
\[ b_i = z_{i\bullet } + n_{\bullet i}. \tag{5} \]
By construction
\[ a_{\bullet } = b_{\bullet } = n_{\bullet \bullet } + z_{\bullet \bullet }, \]
where \(n_{\bullet \bullet }\) counts the number of passengers, and \(z_{\bullet \bullet }\) counts the number of transfers. \(z_{\bullet \bullet }/n_{\bullet \bullet }\) is the average number of transfers per passenger.
As explained in “Lines, stops and junctions” Section, transfers can only occur at junctions, that is \(z_{ij}>0\) implies \((i, j) \in T\). In particular, \(z_{ii}=0\): no traveler is supposed to disembark and re-embark later at the same stop.
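The relations between trips, edge flows and stop counts can be illustrated on a small example. The sketch below is a hypothetical illustration (network, trip counts and function names are assumptions): each trip contributes to the flow of every edge it uses, trips add boardings at their origin and alightings at their destination, and each transfer adds one alighting at its start and one boarding at its end.

```python
# Hypothetical network: line A is 0 -> 1 -> 2, line B is 3 -> 4, walking transfer 1 -> 3.
transfer_edges = {(1, 3)}
trip_edges = {  # edges of each st-trip (assumed shortest paths)
    (0, 2): [(0, 1), (1, 2)],
    (0, 4): [(0, 1), (1, 3), (3, 4)],
    (3, 4): [(3, 4)],
}
n = {(0, 2): 10, (0, 4): 4, (3, 4): 6}  # passengers per st-trip


def edge_flows(n, trip_edges):
    """x_ij: number of passengers using edge (i, j), summed over all trips."""
    x = {}
    for trip, count in n.items():
        for edge in trip_edges[trip]:
            x[edge] = x.get(edge, 0) + count
    return x


def boarding_alighting(n, x, transfer_edges, stops):
    """a_i and b_i: network entries/exits plus transfers arriving at / leaving stop i."""
    a = {i: 0 for i in stops}
    b = {i: 0 for i in stops}
    for (s, t), count in n.items():
        a[s] += count  # enters the network at s
        b[t] += count  # leaves the network at t
    for i, j in transfer_edges:
        z_ij = x.get((i, j), 0)
        b[i] += z_ij  # alights at i to walk to j
        a[j] += z_ij  # boards at j after the walk
    return a, b


x = edge_flows(n, trip_edges)
a, b = boarding_alighting(n, x, transfer_edges, range(5))
print(a, b)
```

In this example 20 passengers make 4 transfers, so the counters register 24 boardings and 24 alightings.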
Statement of the problem and solution method
Automatic passenger counters measure the number of passengers entering and leaving lines at each stop [Boyle, 1998], that is \(\textbf{a}\) and \(\textbf{b}\), which provide the basic raw data of the present study, kindly provided by the Lausanne Transportation Agency (tl) for the case study in “Real Data” Section. We will suppose here that this data obeys the necessary consistency conditions for embarkment and disembarkment counts, even if, in real case studies, a rescaling must usually be performed to balance inflows and outflows on each line (given in the appendix).
Intraline edge flows \(\textbf{Y}=(y_{ij})\) can be determined by (2), but transfer edge flows \(\textbf{Z}=(z_{ij})\) are, here and typically, unknown. The objective is to estimate the \(l\times l\) transportation flow matrix \(\textbf{N}=(n_{st})\). Many consistent solutions coexist in general, even for a single line with no transfers (“Markov property for a single line” Section). This issue of incompletely observed data can be tackled by the maximum entropy formalism (Jaynes 1957), which has often been invoked in transportation modeling research (Wilson 1967; Erlander and Stewart 1990).
Let \(f_{st}=n_{st}/n_{\bullet \bullet }\) be the distribution of st-trips (empirical distribution) and let \(g_{st}\) be some prior guess on its shape (theoretical distribution). Assuming some reasonable initial prior \(g_{st}\),

(1) We first suppose that the empirical margins \(\alpha _s=f_{s\bullet }\) and \(\beta _t=f_{\bullet t}\) are known. Then \(f_{st}\) can be determined as the maximum entropy solution (“Maximum entropy estimate of st-trips” Section), i.e. as the distribution closest to \(g_{st}\) in the Kullback–Leibler divergence sense under the margin constraints, to be calibrated by an iterative fitting inner loop.

(2) Then (“Updating the prior distribution” Section), the prior is updated to \(\tilde{g}_{st}\) by shrinking, if necessary, the priors \(g_{st}\), thus avoiding transfer overflow exceeding the embarking and disembarking counts at each stop. Moreover, a hyperparameter \(\theta \in [0, 1]\) is used at this stage in order to control the minimum proportion of passengers entering/leaving the network at each stop.

(3) Finally (“Updating the margin distributions” Section), the margins are updated to \(\tilde{\alpha }_s\) and \(\tilde{\beta }_t\).
With the new prior distribution \(\widetilde{g}_{st}\) and the new margin distributions \(\widetilde{\alpha }_s\), \(\widetilde{\beta }_t\), we can iterate the above steps, until convergence. The only free parameter is \(\theta\), whose effect is studied on toy examples in “Toy Examples” Section.
The above iterative solution method is somewhat reminiscent of the EM algorithm. As a matter of fact, the first step (maximum entropy) corresponds exactly to the “expectation step” of the EM algorithm (see e.g. Dempster et al. 1977; Bavaud 2009), but steps two and three, aiming at calibrating parameters \(g_{st}\), \(\alpha _s\) and \(\beta _t\), do not follow the maximum likelihood rationale of the “maximisation step”. Pseudocode of the algorithm is shown in Algorithm 1.
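Maximum entropy estimate of st-trips
As announced, the proportion of st-trips \(f_{st}=n_{st}/n_{\bullet \bullet }\) (empirical distribution) will be estimated from some prior guess \(g_{st}\) (theoretical distribution) and margin constraints \(\alpha _s\) and \(\beta _t\) for \(f_{st}\) by maximum entropy, i.e. by solving the problem
\[ \min _{f}\ \sum _{st} f_{st}\ln \frac{f_{st}}{g_{st}} \qquad \text{subject to} \qquad f_{s\bullet }=\alpha _s, \quad f_{\bullet t}=\beta _t. \tag{6} \]
The Lagrangian is
\[ \mathcal {L} = \sum _{st} f_{st}\ln \frac{f_{st}}{g_{st}} + \sum _s \lambda _s\,(\alpha _s - f_{s\bullet }) + \sum _t \mu _t\,(\beta _t - f_{\bullet t}), \]
which gives, after deriving and setting to zero,
\[ f_{st} = \phi _s\, \psi _t\, g_{st}, \tag{7} \]
where the non-negative factors \(\phi _s\) and \(\psi _t\) absorb the Lagrange multipliers \(\lambda _s\) and \(\mu _t\). Using constraints in (6), we find
\[ \phi _s = \frac{\alpha _s}{\sum _t g_{st}\psi _t}, \qquad \psi _t = \frac{\beta _t}{\sum _s g_{st}\phi _s}, \tag{8} \]
which yields the following iterative fitting algorithm: starting with some \(\psi ^{(0)}_t > 0\), one performs the iteration
\[ \phi ^{(r+1)}_s = \frac{\alpha _s}{\sum _t g_{st}\psi ^{(r)}_t}, \qquad \psi ^{(r+1)}_t = \frac{\beta _t}{\sum _s g_{st}\phi ^{(r+1)}_s} \]
until convergence to \(\phi _s\) and \(\psi _t\) obeying (8).^{Footnote 3}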
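The iterative fitting loop can be sketched as follows in plain Python. This is a minimal illustration under stated assumptions, not the implementation of the article's repository: the function name `iterative_fitting`, the stopping rule on the change of \(f_{st}\), the tolerance and the 2 × 2 example values are all hypothetical. Row factors and column factors are rescaled in turn until the fitted distribution stops changing.

```python
def iterative_fitting(g, alpha, beta, tol=1e-12, max_iter=10_000):
    """Scale the prior g (list of rows) into f_st = phi_s * psi_t * g_st with margins alpha and beta."""
    rows, cols = range(len(alpha)), range(len(beta))
    psi = [1.0 for _ in cols]  # any strictly positive starting value
    f = [[0.0 for _ in cols] for _ in rows]
    for _ in range(max_iter):
        phi = []
        for s in rows:  # fit the row margins
            denom = sum(g[s][t] * psi[t] for t in cols)
            phi.append(alpha[s] / denom if denom > 0 else 0.0)
        psi = []
        for t in cols:  # fit the column margins
            denom = sum(phi[s] * g[s][t] for s in rows)
            psi.append(beta[t] / denom if denom > 0 else 0.0)
        new_f = [[phi[s] * g[s][t] * psi[t] for t in cols] for s in rows]
        change = sum(abs(new_f[s][t] - f[s][t]) for s in rows for t in cols)
        f = new_f
        if change < tol:
            break
    return f


g = [[0.1, 0.4], [0.3, 0.2]]  # hypothetical prior
alpha, beta = [0.3, 0.7], [0.4, 0.6]  # hypothetical margins
f = iterative_fitting(g, alpha, beta)
print([sum(row) for row in f], [sum(col) for col in zip(*f)])
```

Because rows and columns are only rescaled, the cross-ratios of the prior are preserved in the solution; only the margins change.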
In view of (4) and (5), the postulated margins must satisfy, for each isolated stop i
\[ a_i = n_{\bullet \bullet }\, \alpha _i, \qquad b_i = n_{\bullet \bullet }\, \beta _i, \]
permitting to determine the total flow as \(n_{\bullet \bullet }=\frac{a_i}{\alpha _i}\), or \(n_{\bullet \bullet }=\frac{b_i}{\beta _i}\) for any isolated stop i, and thus the st-flow itself as
\[ n_{st} = n_{\bullet \bullet }\, f_{st}, \]
whose plugging into (3) yields the intraline edge flows \(\textbf{Y}=(y_{ij})\) and the transfer edge flows \(\textbf{Z}=(z_{ij})\). Only the latter is required for subsequent algorithm steps and can be computed with
\[ z_{ij} = I((i,j) \in T)\sum _{st} \chi _{ij}^{st}\, n_{st}, \]
where I(.) denotes the 0/1 indicator function.
Initialization of the prior and the margins
The geometry of the network makes it possible to define the set of permitted st-trips across the network, denoted by P. The initial prior was chosen as the uniform distribution on admissible paths, that is as
\[ g_{st} = \frac{I([s,t]\in P)}{\vert P \vert }. \]
The initial margins were chosen to initially match \(g_{st}\), namely \(\alpha _s=g_{s \bullet }\) and \(\beta _t=g_{\bullet t}\) for all stops. However, \(g_{st}\) could be chosen more carefully in case of additional data. As a matter of fact, \(g_{st}\) represents prior attractions between nodes in equation (7), before the margin correction given by \(\phi _s\) and \(\psi _t\) and subsequent algorithm steps. An urban planner could choose to increase or decrease some of these values in order to take into account prior knowledge on commuting habits of inhabitants.
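The initialization described above can be sketched in Python as follows. The function name `uniform_prior` and the list of permitted trips are hypothetical; the prior is stored sparsely, with trips outside P implicitly at zero.

```python
def uniform_prior(permitted):
    """Uniform prior g_st on the permitted trips, with its margins alpha_s and beta_t."""
    weight = 1.0 / len(permitted)
    g = {trip: weight for trip in permitted}
    alpha, beta = {}, {}
    for (s, t), w in g.items():
        alpha[s] = alpha.get(s, 0.0) + w  # g_{s.}
        beta[t] = beta.get(t, 0.0) + w  # g_{.t}
    return g, alpha, beta


# Hypothetical set P of permitted st-trips (each trip listed once).
g, alpha, beta = uniform_prior([(0, 2), (0, 4), (1, 4), (3, 4)])
print(alpha, beta)
```

Prior knowledge on attraction poles could be introduced at this stage by multiplying selected entries of `g` by a factor and renormalizing.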
Embarkment, disembarkment constraints and hyperparameter \(\theta\)
After a single iterative fitting step, the resulting transportation flow \(\textbf{N}=(n_{st})\) and transfer flow \(\textbf{Z}=(z_{ij})\) have little chance to fulfill constraints (4) and (5) defined by \(\textbf{a}\) and \(\textbf{b}\), and prior distributions \(g_{st}\), \(\alpha _s\) and \(\beta _t\) must be corrected accordingly.
When \(z_{\bullet i} > a_i\), or \(z_{i \bullet } > b_i\), the found solution predicts more passengers transferring into, respectively out of, the line at stop i than the measured number of passengers embarking, respectively disembarking, there. One could be tempted to correct prior distributions in order to have \(z_{\bullet i} = a_i\), or \(z_{i \bullet } = b_i\), on every problematic node i, but the latter solution would mean that all passengers entering (resp. exiting) the line at this stop are transiting, which seems unrealistic in most real-life scenarios.
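We define here the hyperparameter \(\theta \in [0, 1]\) as the minimum proportion of passengers (among \(a_i\) and \(b_i\)) entering/leaving the network at each stop (in other words, not transferring), that is
\[ n_{i\bullet } \ge \theta \, a_i, \qquad n_{\bullet i} \ge \theta \, b_i, \]
or equivalently
\[ z_{\bullet i} \le (1-\theta )\, a_i, \qquad z_{i\bullet } \le (1-\theta )\, b_i, \]
and updates of distributions will be made accordingly.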
Note that, if additional data were known, we could set a particular value of \(\theta\) for every node, possibly differing for embarkments and disembarkments. However, without additional information, we restrict ourselves to this simpler case.
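The role of \(\theta\) can be illustrated by a small check that flags the stops where the estimated transfers exceed the share \(1-\theta\) of the measured counts. The sketch below is hypothetical (function name, dictionaries and values are assumptions); `z_in[i]` stands for the transfers arriving at stop i and `z_out[i]` for the transfers leaving it.

```python
def overflowing_stops(a, b, z_in, z_out, theta):
    """Stops where transfers exceed the share (1 - theta) of boardings or alightings."""
    cap = 1.0 - theta
    return sorted(
        i for i in a
        if z_in.get(i, 0) > cap * a[i] or z_out.get(i, 0) > cap * b[i]
    )


a = {1: 10, 2: 20}  # hypothetical boarding counts
b = {1: 10, 2: 20}  # hypothetical alighting counts
z_in = {1: 9.5, 2: 5}  # hypothetical transfers arriving at each stop
z_out = {1: 2}  # hypothetical transfers leaving each stop
print(overflowing_stops(a, b, z_in, z_out, theta=0.1))  # [1]
print(overflowing_stops(a, b, z_in, z_out, theta=0.0))  # []
```

With \(\theta = 0.1\), at most 90% of the passengers boarding at stop 1 may come from a transfer, so 9.5 transfers out of 10 boardings is flagged; with \(\theta = 0\) the same solution is accepted.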
Updating the prior distribution
Overflow occurs in transfer edge (i, j) if \(z_{i \bullet } > (1 - \theta )b_i\) or \(z_{\bullet j} > (1 - \theta )a_j\). To avoid it, components \(g_{st}\) of the prior distribution will be shrunk by a suitable ratio whenever edge flows \((i,j)\in [s,t]\) exhibit overflow. For any edge (i, j), let us compute the flow ratio \(r_{ij}\) as
where \(r_{ij} > 1\) denotes an overflow through edge (i, j). For a given origin–destination [s, t], define the origin-destination flow ratio \(\bar{r}_{st}\) as the largest \(r_{ij}\) among edge flows \((i,j)\in [s,t]\), that is as
\[ \bar{r}_{st} = \max _{(i,j)\in [s,t]} r_{ij}. \]
By construction, \(\bar{r}_{st} > 1\) denotes an overflow on some transfer edge between s and t. To adjust the flow, we shall divide the previous flow by this ratio
and define the new prior distribution as
where \(\phi _s\) and \(\psi _t\) are the values (8) obtained in the previous maximum entropy step.
Updating the margin distributions
By construction, the corrected flow \(\widetilde{n}_{st}\) found in (17) now respects embarkment and disembarkment constraints. We can compute the new transfer flow on edges with
and, with (4) and (5), updating margin distributions is straightforward
Markov property for a single line
A “network” made of a single line contains no transfers, and flow estimates can be obtained at once by the maximum entropy step only.
Let \(i=1,\ldots , l\) enumerate the stops in increasing order, that is \(F(i)=i+1\). The initial prior is simply \(g_{st}=c\; I(s<t)\) and captures solely the unidirectional nature of trips, where \(c=\frac{1}{(l-1)(l-2)}\). The margins of the empirical distribution \(f_{st}\), as well as the total flow, are here known
\[ \alpha _s = \frac{a_s}{n_{\bullet \bullet }}, \qquad \beta _t = \frac{b_t}{n_{\bullet \bullet }}, \qquad n_{\bullet \bullet } = a_{\bullet } = b_{\bullet }. \]
Following (7), maximum entropy flows are of the form
\[ n_{st} = n_{\bullet \bullet }\; c\, \phi _s\, \psi _t\; I(s<t), \tag{22} \]
where (setting \(\Psi _s:=\sum _{t>s}\psi _t\) and \(\Phi _t:=\sum _{s<t}c\phi _s\)) the constraints (8) equivalently read
\[ c\, \phi _s\, \Psi _s = \alpha _s, \qquad \psi _t\, \Phi _t = \beta _t, \]
to be solved by iterative fitting.
Interestingly enough, the form (22) for the flows is reminiscent of the gravity flows of quantitative geography (Wilson 1967; Erlander and Stewart 1990; Bavaud 2002; Thomas-Agnan and LeSage 2021), where \(\phi _s\) is the push factor, \(\psi _t\) is the pull factor, and \(I(s<t)\) the distance deterrence function. Yet, instead of being symmetric in s, t and decreasing with the distance \(\vert s-t \vert\), the distance deterrence function is here asymmetric due to the line orientation, but otherwise constant.
This constancy entails the following Markovian behaviour for flows: let \(m_{st}\) be the number of travelers embarking at stop s and still inside the line at stop \(t>s\), and let \(\rho _{st}\) be the probability that travelers embarking at s will disembark at t. By (22),
\[ m_{st} = \sum _{u\ge t} n_{su} = n_{\bullet \bullet }\; c\, \phi _s\, (\psi _t+\Psi _t). \]
The empirical estimate of \(\rho _{st}\) is given by the proportion, among the travelers embarking at s and still present at \(t>s\), of travelers disembarking at t, that is
\[ \rho _{st} = \frac{n_{st}}{m_{st}} = \frac{\psi _t}{\psi _t+\Psi _t}, \]
which depends on t only: it appears that the disembarkment probability \(\rho _t=\frac{\psi _t}{\psi _t+\Psi _t}\) at t is independent of the embarkment stop s. Said otherwise, a traveler embarking at any stop s (and thus necessarily in the line at \(F(s)=s+1\)) experiences the same disembarkment probability at each further stop \(t>s\).
This Markov property, enjoyed by maximum-entropic flows, contrasts with other possible solutions, such as the “first in, first out” (FIFO) flows (homogenizing the traveled distances among users) or the “last in, first out” (LIFO) flows (tending to generate maximally contrasted traveled distances).
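The Markov property can be verified numerically on a small single line. The Python sketch below is an illustration under assumptions (function names, counts and the fixed number of iterations are hypothetical): it fits flows of the product form \(\phi _s \psi _t\) for \(s<t\) by iterative fitting, the constant c being absorbed in the factors, and then compares the disembarkment probabilities obtained for different embarkment stops.

```python
def fit_single_line(a, b, n_iter=2000):
    """Maximum entropy flows n_st on one line, proportional to phi_s * psi_t for s < t."""
    l, total = len(a), sum(a)
    alpha = [x / total for x in a]
    beta = [x / total for x in b]
    psi = [1.0] * l
    for _ in range(n_iter):
        phi = []
        for s in range(l):
            denom = sum(psi[s + 1:])  # destinations after s
            phi.append(alpha[s] / denom if denom > 0 else 0.0)
        psi = []
        for t in range(l):
            denom = sum(phi[:t])  # origins before t
            psi.append(beta[t] / denom if denom > 0 else 0.0)
    return [[total * phi[s] * psi[t] if s < t else 0.0 for t in range(l)] for s in range(l)]


def disembark_prob(n, s, t):
    """Share of the travelers from s still on board at t who alight at t."""
    return n[s][t] / sum(n[s][t:])


a = [5, 3, 2, 0]  # hypothetical boarding counts
b = [0, 1, 4, 5]  # hypothetical alighting counts
n = fit_single_line(a, b)
print(disembark_prob(n, 0, 2), disembark_prob(n, 1, 2))
```

In this example 7 passengers are on board when arriving at the third stop and 4 of them alight there, so both probabilities are close to 4/7 whatever the embarkment stop.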
Case studies
Case studies are divided in two sections. In the first section, we test the algorithm on toy examples, which are artificial networks where the transportation flow \(\textbf{N}_\text {ref} = (n^\text {ref}_{st})\) is randomly drawn. These examples enable some kind of validation of the algorithm, as the “real” transportation flow is known and can be compared to the solution \(\textbf{N} = (n_{st})\) given by our method. This setup differs from the second section, which is dedicated to applying the algorithm to the real case of the public transportation network of the city of Lausanne (tl), where embarkment and disembarkment flows are measured but the real transportation flow is unknown. This second case study shows that the algorithm is applicable to large, real datasets and can give insights about passengers' probable routes in the network.
Error measurements
In all case studies, we obtain an estimation of the transportation flow with the algorithm, noted \(\textbf{N} = (n_{st})\), starting from the real embarkment flow \(\textbf{a}_\text {ref}\) and disembarkment flow \(\textbf{b}_\text {ref}\). In toy examples, we also have access to the real transportation flow \(\textbf{N}_\text {ref}\). There are two types of dissimilarity measures between the data and the solution proposed by the algorithm: (1) if we have access to \(\textbf{N}_\text {ref}\), how much \(\textbf{N}\) differs from it, and (2) how well constraints defined by \(\textbf{a}_\text {ref}\) and \(\textbf{b}_\text {ref}\) are respected. The first dissimilarity is measured through the mean transportation error, denoted by \(\text {MTE}(\textbf{N})\), and computed as
and the second one with the mean margin error, noted \(\text {MME}(\textbf{N})\), defined as
where \(z_{ij} = I((i,j) \in T)\sum _{st} \chi _{ij}^{st} n_{st}\) is the flow on transfer edges T. Both errors can be interpreted as a weighted mean of error percentages.
Note that, by construction, the MME should be zero when the algorithm converges. However, it can be informative to track this error across iterations and, in some practical cases where margin constraints are impossible to fulfill, the algorithm convergence criterion is reached with MME \(> 0\).
Toy examples
Construction
All constructed toy examples are built following the same approach, which aims at being simple but somewhat realistic. We fix a number of round trips \(p \ge 2\), each of which is constituted of a forward line and a backward line, for a total of \(q = 2p\) lines. Every line has a starting and an ending node, which are isolated nodes, and possesses \(p-1\) intermediary nodes which allow transfers to the other round trips, yielding a total of \(n = 2p(p + 1)\) nodes in the network. Examples of these toy networks can be found in Fig. 1.
In order to be realistic, the permitted st-trips set P is constructed by considering all shortest paths between pairs of nodes, excluding:

s and t that are on the same line but with t preceding (or equal to) s in the line order.

s and t that are on the same round trip but on opposite lines.

s and t whose shortest path starts with a transfer edge, ends with a transfer edge, or possesses two (or more) consecutive transfer edges.
A transportation flow \(\textbf{N}_\text {ref} = (n^\text {ref}_{st})\) is drawn by setting a fixed number of passengers \(n^\text {ref}_{\bullet \bullet }\), and each passenger is assigned randomly to an (s, t) pair drawn uniformly among P. From this reference transportation flow \(\textbf{N}_\text {ref}\), using the edge-trip incidence matrix \({\varvec{\chi }}\) and equation (3), we can compute the flow on edges \(\textbf{X}_\text {ref}\) and, in turn, the number of passengers embarking \(\textbf{a}_\text {ref}\) and the number of passengers disembarking \(\textbf{b}_\text {ref}\) at each stop.
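The random draw of a reference flow can be sketched as follows in Python. This is a hypothetical illustration: the function name `draw_reference_flow`, the seed and the list of permitted trips are assumptions, and the actual construction of P on the toy networks is not reproduced here.

```python
import random


def draw_reference_flow(permitted, n_passengers, seed=0):
    """Assign each passenger to an st-trip drawn uniformly among the permitted trips."""
    rng = random.Random(seed)  # fixed seed, for reproducibility of the sketch
    flow = {trip: 0 for trip in permitted}
    for _ in range(n_passengers):
        flow[rng.choice(permitted)] += 1
    return flow


# Hypothetical permitted trips; a real toy network with p = 2 has 20 of them.
permitted = [(0, 2), (0, 4), (1, 4), (3, 4)]
n_ref = draw_reference_flow(permitted, n_passengers=50)
print(n_ref)
```

With few passengers the drawn counts deviate noticeably from the uniform expectation, which is the source of the residual error discussed below for small samples.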
Algorithm iterations
First, we exhibit some algorithm iterations on a toy example with \(p=2\), where 50 passengers were drawn uniformly across the \(\vert P \vert = 20\) possible st-trips. Some iterations of the algorithm with \(\theta =0.001\), along with MTE and MME errors, are shown in Fig. 2.
On this small example, we can see that the algorithm quickly finds an estimation giving a small MME, but still gives an MTE of 0.256. This result is due to the fact that only 50 passengers are drawn, so that the drawn flow deviates substantially from the solution found by the algorithm, which maximizes the entropy.
MTE study
The main goal of toy examples, since we have access to the real transportation flow, is to study how the resulting MTE behaves with respect to the passenger st-trips distribution and the hyperparameter \(\theta\), on different network sizes.
The randomness of the st-trips distribution is controlled by the number of drawn passengers and we can see that the algorithm performs better as this number increases, as seen in Fig. 3. This behavior can be expected as the method is constructed to find the solution maximizing the entropy while respecting embarkment and disembarkment constraints. In fact, all MTE should converge to 0 eventually, but the rate of convergence seems to be low.
As for the hyperparameter study, found in Fig. 4, we see that the optimal parameter seems to decrease as the network size increases, reaching a value close to 0 for 8 round trips. However, this is not the only factor which helps to calibrate this parameter. As a matter of fact, it seems unrealistic, in some real-life scenarios, that all passengers embarking or disembarking at junctions are transiting, and this value will be kept at 0.1 for our real-life study.
Real data
The dataset and algorithm setup
The dataset was constructed from data collected by the public transportation agency in the city of Lausanne (tl), in Switzerland. Automatic passenger counters were used to collect data on the number of passengers entering and leaving buses and subways at each stop. The dataset comprises 44 lines, 1361 stops, and over 115 million passengers per year. Each round trip forms two separate lines, which often have stops that correspond to both the inbound and outbound directions, but this is not always the case. Some stops may be unique to either the inbound or outbound direction. The initial dataset contained 13 million data rows, which were aggregated for the year 2019, and the passenger data was filtered to include only passengers who embarked and disembarked at each stop. Lines with the most errors were removed, resulting in a final dataset of 35 transportation lines and 1216 stops. A line was considered invalid when its total embarkment and disembarkment counts differed by more than 15% of their mean. For all other lines, the correction found in the appendix was applied to ensure consistency conditions.
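The validity threshold for lines can be written as a one-line test. The Python sketch below is only an illustration of the stated rule; the function name `line_is_valid` and the example totals are hypothetical.

```python
def line_is_valid(total_boarding, total_alighting, tolerance=0.15):
    """True if the two totals differ by at most `tolerance` times their mean."""
    mean = (total_boarding + total_alighting) / 2
    return abs(total_boarding - total_alighting) <= tolerance * mean


print(line_is_valid(1000, 900))  # True: difference 100, threshold 142.5
print(line_is_valid(1000, 800))  # False: difference 200, threshold 135
```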
Line edges were built from the given data, and to obtain the full network structure, we added transfer edges between different lines if the estimated walking time (using Open Trip Planner^{Footnote 4}; Morgan et al. 2019) was less than 120 s. Shortest paths between all pairs of stops were computed using the breadth-first search algorithm (West 2001) implemented in the igraph^{Footnote 5} R library (Csardi and Nepusz 2006), thus giving \({\varvec{\chi }}\). The matrix P, containing permitted st-trips, was built using the same conditions found in the toy example experiments (“Construction” Section) with an additional one. If the estimated pedestrian time between two stops s and t was shorter than the estimated time by taking the public transportation network, this st-trip was also removed from P. The graph structure, consisting of the edge-trip incidence matrix \({\varvec{\chi }}\), the permitted st-trips set P, and the transfer edges set T, was constructed before testing the method.
The algorithm was run with hyperparameter \(\theta = 0.1\). Tracking the mean margin error (MME) showed a rapid decrease (\(<0.001\) after 13 iterations) but the (conservative) convergence criterion (chosen as \(\sum _{st} \vert f^\text {prev}_{st} - f_{st} \vert < 10^{-6}\)) was reached after 316 iterations, which corresponds to approximately one hour of computing time with a single thread on an AMD Epyc2 7402 with 32GB of RAM.
Results
As the result of our algorithm consists in the \((1216 \times 1216)\) matrix \(\textbf{N} = (n_{st})\), it is difficult to present it succinctly. We chose here to focus on two aspects: (1) mapping the origin and destination profiles of stops with the help of Correspondence Analysis (CA) (Benzécri 1977); and (2) mapping the aggregated transfers between lines.
The first and second factorial dimensions of the destination and origin profiles of stops, obtained with CA applied on \(\textbf{N}\), can be found in Fig. 5. We can see that origin and destination coordinates of each stop are quite similar when displayed on the first two factorial axes. This result can be explained if we understand that, in fact, origin and destination profiles are, to some extent, symmetrical: at the beginning of a line, stops have several choices of destinations, but very few origins “point” at them, which is exactly the reverse for stops at the end of a line. Intermediary stops are generally located near the city center, and have similarly rich origin and destination profiles. Concerning the dimensions, we can see that the first component divides the city stops between west and east, a known dichotomy in the public transportation usage in Lausanne: the east of the city is less well served by public transport and inhabitants often use cars, while the west intensively uses public transport. The second component is harder to interpret, but seems to highlight the radial structure of the transportation network in Lausanne.
Figure 6 maps predicted transfer hubs, obtained by aggregating all transfer counts \(z_{ij}\) between lines when they occur between stops located in the same spatial cluster (same “superstop” name in the dataset). As expected, most transfers occur in the city center, with the top 5 largest hubs located at St-François (2,132,597 transfers), Lausanne-Gare (2,035,543 transfers), Bel-Air (1,908,168 transfers), Ours (1,057,282 transfers), and Chauderon (969,789 transfers). When taken individually, four of the top 5 transfer counts are predicted between line 1 and line 72 occurring at the train station (Lausanne-Gare). Line 72 is actually a subway line connecting the northern and southern regions of the city, and is by far the most used line in the network. Line 1 is also a highly frequented line and operates toward the western region of the city. It is not surprising to see most transfers occurring between these two lines, as they connect the region with a high density of public transportation network users (the western part of Lausanne) to the main artery of the network (line 72).
Conclusion
At first glance, the task of estimating the origin–destination trip counts of public transportation networks on the sole basis of embarkment and disembarkment stop counts might appear too ambitious. The project is indeed challenging, but we hope to have convinced the reader that meaningful estimates can be obtained by applying a succession of carefully chosen, principled yet flexible steps. Some of the ingredients (maximum entropy, iterative fitting) are familiar in transportation theory, and some others (shrinkage of priors, minimal amount of entering and leaving the network) seem original. Possible theoretical and algorithmic improvements are still under investigation. Due to length constraints, the exploitation and interpretation of the estimated flow has been kept to a minimum. Yet, many issues, such as disaggregating data and regimes (weekdays, weekends or holidays; time of day), assessing the centrality of stops and junctions, or characterizing the imbalance between uphill and downhill flows (Lausanne is known for its slopes) are of primary interest for urban planning and transportation geography, and will be developed in a forthcoming work. Also, visualizing the numerous and various quantities of interest, on a spatial map or otherwise, constitutes a challenge in itself, requiring a particular blend of creativity and rigor.
Availability of data and materials
All data, code, and results can be found on the GitHub repository of the project: https://github.com/sliunil/tl_study.
Notes
with possibly a small quantity \(\epsilon > 0\) added on null components of \(g_{st}\).
References
Ashok K, Ben-Akiva ME (2002) Estimation and prediction of time-dependent origin-destination flows with a stochastic mapping to path flows and link flows. Transp Sci 36(2):184–198
Bavaud F (2002) The quasisymmetric side of gravity modelling. Environ Plan A 34(1):61–79
Bavaud F (2009) Information theory, relative entropy and statistics. In: Sommaruga G (ed) Formal theories of information: from shannon to semantic information theory and general concepts of information, LNCS, vol 5363. Springer, Berlin, pp 54–78
Bell MG, Lida Y (1997) Transportation network analysis. Wiley, Chichester
Benzécri JP (1977) Histoire et préhistoire de l’analyse des données. partie v l’analyse des correspondances. Cahiers de l’analyse des données 2(1):9–40
Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9
Cui A (2006) Bus passenger origin-destination matrix estimation using automated data collection systems. PhD thesis, Massachusetts Institute of Technology
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc: Ser B (Methodol) 39(1):1–22
Erlander S, Stewart NF (1990) The gravity model in transportation analysis: theory and extensions, vol 3. VSP, Leiden
Hazelton ML (2000) Estimation of origin-destination matrices from link flows on uncongested networks. Transp Res Part B: Methodol 34(7):549–566
Jaynes ET (1957) Information theory and statistical mechanics. Phys Rev 106(4):620
Morgan M, Young M, Lovelace R, Hama L (2019) Opentripplanner for R. J Open Source Softw 4(44):1926. https://doi.org/10.21105/joss.01926
Peyré G, Cuturi M et al (2019) Computational optimal transport: with applications to data science. Found Trends® Mach Learn 11(5–6):355–607
Thomas-Agnan C, LeSage JP (2021) Spatial econometric OD-flow models. In: Fischer MM, Nijkamp P (eds) Handbook of regional science. Springer, Berlin, Heidelberg, pp 2179–2199
West DB et al (2001) Introduction to graph theory, vol 2. Prentice Hall, Upper Saddle River
Wilson A (1967) A statistical theory of spatial distribution models. Transp Res 1(3):253–269
Acknowledgements
This paper would not have been possible without the dataset provided by the transportation agency of the city of Lausanne (tl). We wish to thank the agency for its kindness and hope this research will support its future developments.
Funding
Open access funding provided by University of Lausanne.
Author information
Contributions
GG created the algorithm and participated in the case studies and the writing of the manuscript. RL participated in the case studies and the writing of the manuscript. FB participated in the writing of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Correction of the embarkment and disembarkment counts in a single line
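It may happen that, in a line \(\ell\) with stops indexed in order as \(1, \ldots , l\), the raw data \(\textbf{a}=(a_i)\) and \(\textbf{b}=(b_i)\) do not obey the following necessary consistency conditions:
\[ b_1 = a_l = 0, \qquad A_{i-1} \ge B_i \quad \text{for } i = 2, \ldots , l, \qquad A_l = B_l, \]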
where \(A_i\) (respectively \(B_i\)) is the cumulative number of embarked (resp. disembarked) passengers on the line at stop i, i.e. \(A_i = \sum _{j=1}^{i} a_j\) and \(B_i = \sum _{j=1}^{i} b_j\). The first condition is easy to correct (by setting both quantities to 0), and we will assume that it holds. The last two require an iterative procedure, explained here.
We first identify all stops i where \(A_{i-1} < B_i\) and store them, along with stops 1 and l, in an ordered set S. Let \(\text {Prev}(i)\) be the item right before i in the set S. For all \(i \in S \setminus \{1\}\), we do:
Before the last step, all conditions \(A_{i-1} \ge B_{i}\) should be satisfied except for node l. The last step ensures that \(A_{l} = B_{l}\). However, as this last step can sometimes lower the embarkment count and increase the disembarkment count on previous nodes (if \(A_l > B_l\) before correction), some new nodes can now violate \(A_{i-1} \ge B_i\). This is why the procedure must be iterated until all stops on the line satisfy the consistency conditions.
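As an illustration only, the consistency conditions discussed in this appendix can be checked on a single line with a few lines of Python. This sketch is not the correction procedure itself and is not taken from the project repository: the function name `check_line_counts` and the violation labels are hypothetical. It also assumes that the first condition means that nobody disembarks at the first stop and nobody embarks at the last stop (\(b_1 = a_l = 0\)).

```python
from itertools import accumulate


def check_line_counts(a, b):
    """Return a list of (label, stop) violations for one line; empty if consistent.

    a[i] / b[i] are the embarkment / disembarkment counts at stop i + 1
    (Python lists are 0-based, stops in the text are numbered 1..l).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")

    l = len(a)
    A = list(accumulate(a))  # A[i] = cumulative embarkments up to stop i + 1
    B = list(accumulate(b))  # B[i] = cumulative disembarkments up to stop i + 1
    violations = []

    # Assumed first condition: b_1 = 0 and a_l = 0.
    if b[0] != 0:
        violations.append(("disembark_at_first_stop", 1))
    if a[-1] != 0:
        violations.append(("embark_at_last_stop", l))

    # A_{i-1} >= B_i: passengers leaving at stop i must have boarded before it.
    for i in range(1, l):
        if A[i - 1] < B[i]:
            violations.append(("more_disembarked_than_embarked", i + 1))

    # A_l = B_l: everyone who boarded has left by the end of the line.
    if A[-1] != B[-1]:
        violations.append(("unbalanced_totals", l))

    return violations
```

For instance, `check_line_counts([2, 1, 0], [0, 3, 0])` flags stop 2, where three passengers disembark although only two have embarked so far. Such a check could be run before and after each pass of the correction to decide whether another iteration is needed.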
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Guex, G., Loup, R. & Bavaud, F. Estimation of flow trajectories in a multilines transportation network. Appl Netw Sci 8, 44 (2023). https://doi.org/10.1007/s41109-023-00570-7
DOI: https://doi.org/10.1007/s41109-023-00570-7