The Mobility Network of Scientists: Analyzing Temporal Correlations in Scientific Careers

The mobility of scientists between different universities and countries is important to foster knowledge exchange. At the same time, the potential mobility is restricted by geographic and institutional constraints, which leads to temporal correlations in the career trajectories of scientists. To quantify this effect, we extract 3.5 million career trajectories of scientists from two large-scale bibliographic data sets and analyze them by applying a novel method of higher-order networks. We study the effect of temporal correlations at three different levels of aggregation: universities, cities and countries. We find strong evidence for such correlations for the top 100 universities, i.e., scientists are likely to move between specific institutions. These correlations also exist at the level of countries, but cannot be found for cities. Our results allow us to draw conclusions about the institutional path dependence of scientific careers and the efficiency of mobility programs.


Introduction
Mobility of high-skill labour is a hot topic in economics, politics and research policy. One of the most prominent manifestations of this phenomenon is the international mobility of scientists. Global mobility of scientists has increased over the past decade and is regarded by the OECD (2017) as being a "key driver of knowledge circulation worldwide". In the past this phenomenon has caused fears of "Brain Drain", whereby high-skilled labour moves abroad to the benefit and detriment of the receiving and sending country, respectively. This pessimistic view has been challenged by recent research showing that both sending and receiving countries may benefit from this labour mobility (Saxenian, 2005; Agrawal et al., 2011; Petersen, 2018). Nevertheless, modern economies rely on high-skill labour to maintain their competitive advantage (Chambers et al., 1998; Beine et al., 2001; Beechler and Woodward, 2009; Bahar et al., 2012). For this reason, attracting and retaining scientists is becoming a key concern for migration policy (Boucher and Cerna, 2014).
Following the arguments from above, scientist mobility plays a central role in the exchange of knowledge. This implicitly assumes that scientists can move relatively freely and that there is a high degree of "mixing". Mixing in this context means that different scientific careers are possible, and they do not always follow the same pattern. However, we know that geography constrains scientists' career trajectories (Verginer and Riccaboni, 2018) in particular, and research and development activities in general (Scholl et al., 2018). Additionally, Deville et al. (2014) and Clauset et al. (2015) have shown that institutional prestige constrains scientific careers. Indeed, both geography and prestige are essential to capture complex features of scientists' mobility (Vaccario et al., 2018). In this work, we test whether scientific careers are indeed global and free, resulting in a worldwide scientist mobility network with high levels of exchange and mixing, or whether scientific careers often follow similar patterns and thus lead to weak mixing. We test this hypothesis by looking at scientific careers among the 100 top universities as ranked by the Times Higher Education World Reputation Rankings (2015).
Figure 1: The affiliation data can be aggregated from affiliation level (e.g., university) to city and country as shown in panel (a). In panel (b) the top 10% most common paths at city level in 2004 for the MEDLINE corpus are shown.
arXiv:1905.06142v3 [cs.SI] 4 Mar 2020
Previous works have projected career trajectories at the city, regional and country level to study their patterns and economic impact (Miguélez and Moreno, 2014;Petersen, 2018;Verginer and Riccaboni, 2018). These projections are obtained by aggregating trajectories traversing institutions located in the same cities, regions or countries. By doing so, scientists' mobility can be studied at different levels of aggregation: affiliation, city and country levels (see Fig. 1). For this reason, we complement our analysis by looking at career trajectories beyond the 100 top universities and consider also the city and country level. Hence with our study, we explore academic mobility at three different levels of aggregation.
To empirically address the importance of academic mobility and its consequences for knowledge exchange, we reconstruct the career trajectories of scientists from bibliographic data. Specifically, we use the affiliations reported on scientific publications to track scientific careers. We refer to these time-ordered sequences of institutions of individual scientists as "career trajectories", in line with the literature on labour mobility. Previous research (Clauset et al., 2015) has aggregated these trajectories to construct a static network where nodes represent research institutions and links represent the flow of scientists between them. However, recent advances in computer and network science have raised concerns over the naive aggregation of temporal sequences into static networks. In particular, neglecting temporal correlations can lead to erroneous conclusions about the accessibility and importance of nodes (Lentz et al., 2013) and the possible dynamic processes unfolding on networks (Pfitzner et al., 2013). To address this issue, we use Higher-Order Networks, a novel method to represent and model sequential data (Rosvall et al., 2014; Scholtes et al., 2014; Xu et al., 2016). This network representation allows us to capture temporal correlations by means of topological characteristics. Additionally, Scholtes (2017) has combined different higher-order networks to obtain a Multi-Order Graphical Model that encodes temporal correlations of varying lengths. We build on these recent developments in network science to contribute to the literature on scientists' mobility in particular and high-skill labour in general.
The rest of the paper is structured as follows. First, in Sect. 2, we introduce the data we will be using for the analysis. We then describe, in Sect. 3, how career trajectories may be represented using a network perspective and introduce higher-order network models. In Sect. 4, we estimate the importance of temporal correlations in our data by using multi-order network models. Moreover, we analyze the implications of the identified temporal correlations for knowledge exchange in Sect. 5. To complement the analysis at the university level, we also test for the existence of temporal correlations at city and country level, in Sect. 6. Finally, we discuss our findings and their implications in Sect. 7.

Reconstructing Mobility Paths
To study the mobility patterns of scientists, we need a large-scale dataset on the institutions at which scientists have worked. We extract this data from publication records. Indeed, the affiliation is a piece of meta-data available with virtually every published paper. We rely on this meta-data to reconstruct career trajectories. Precisely, we extract the sequence of affiliations for each disambiguated author from two large bibliographic datasets, namely MEDLINE and MAG. For the analysis at the institution level, i.e., the 100 most "prestigious" universities, we use the Microsoft Academic Graph (MAG) dataset, and for the comparison at city and country level, we use the MEDLINE corpus. Extracting affiliation sequences from these corpora yields sequences of time-stamped (i.e., by publication date) locations. An example of such a record, as found in the MEDLINE corpus, is shown in Table 1. This record is equivalent to the records extracted from MAG (see Table 2), except for the fact that MAG lists the affiliation but not the city or country.
For the analysis at affiliation level, we use the Microsoft Academic Graph (MAG) released for the KDD Cup competition in 2016. The KDD Cup version of the MAG data contains more than 126 million publications (Sinha et al., 2015). Each publication is endowed with various attributes such as unique ID, publication date, title, journal ID, author ID, and affiliations. From this data, we extracted the career trajectories of scientists at affiliation level by using the affiliations reported on their publications. For the precise details of the extraction procedure see Verginer and Riccaboni (2018). We obtain 235 935 scientist career trajectories moving through 100 universities. These 100 universities have been disambiguated manually and the list of matched universities is available upon request from the authors. We use the MAG for the analysis at university level because it has disambiguated authors and covers a wide array of journals across fields. This means that, especially for US- and EU-based institutions, we have comprehensive coverage of publications.
For the analysis at city and country level, we use the MEDLINE corpus together with the disambiguated authors and geo-coded affiliations of Torvik and Smalheiser (2009); Torvik (2015) to aggregate universities to the city and the country level. With this dataset, we can geo-localize the affiliation strings worldwide. The main disadvantage of MEDLINE is that it has a lower journal and discipline coverage.
We are aware that reconstructing career trajectories using bibliographic data is not without issues. First, the disambiguation of authors is never perfect. However, we rely on two high-quality and widely used datasets to mitigate this issue (Torvik and Smalheiser, 2009; Torvik, 2015; Sinha et al., 2015). Second, publications are a delayed rather than a live signal of presence, as the submission and publication dates differ. This means that the affiliation associated with a scientist in a given year might not be her current place of employment. At the same time, for the present study, we care primarily about the order of affiliations and not their precise timing. Third, we can only talk about scientists' mobility and not about migration, which would require us to know the nationality of the scientists, which we do not have. We do have information on the country of first publication, which might coincide with the nationality but is not guaranteed to do so.
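As a rough illustration of the reconstruction step, the following sketch orders publication records by date and collapses consecutive repeats of the same affiliation, so that only the order of affiliations is kept. The record layout (author ID, year, affiliation), the toy records and the function name `build_trajectories` are assumptions made for this example; the actual extraction procedure is the one described in Verginer and Riccaboni (2018).

```python
from collections import defaultdict
from itertools import groupby


def build_trajectories(records):
    """Turn (author_id, year, affiliation) records into ordered affiliation sequences."""
    by_author = defaultdict(list)
    for author_id, year, affiliation in records:
        by_author[author_id].append((year, affiliation))

    trajectories = {}
    for author_id, stamped in by_author.items():
        # Sort by publication year only; ties keep their input order.
        stamped.sort(key=lambda item: item[0])
        ordered = [affiliation for _, affiliation in stamped]
        # Collapse consecutive repeats: several papers at one place are one stay.
        trajectories[author_id] = [aff for aff, _ in groupby(ordered)]
    return trajectories


# Hypothetical toy records, not taken from MAG or MEDLINE.
records = [
    ("a1", 2005, "ETH"),
    ("a1", 2001, "MIT"),
    ("a1", 2003, "MIT"),
    ("a1", 2008, "MIT"),
    ("a2", 2002, "Stanford"),
]
trajectories = build_trajectories(records)
```

In this toy input, author `a1` yields a trajectory with two moves, while author `a2` never changes affiliation.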

Analyzing scientists' mobility as a network
To study the career trajectories of scientists, we adopt a network perspective. We represent universities as nodes and scientists' movements between universities as links. By aggregating the career trajectories, we reconstruct the mobility network that we show in Fig. 2. From this figure, we see that all universities belong to the same connected component. This fact implies that there exists a (network) path connecting any pair of universities. Moreover, the network diameter is six, and the average shortest path is about 2. Hence, it appears that a scientist from any university could potentially reach any other university within a small number of steps. In reality, however, both prestige and geography constrain scientists' career trajectories (Clauset et al., 2015; Verginer and Riccaboni, 2018) and thus rule out various potential trajectories. There is a "paradox" between the topological finding of a short average path and the realized possibilities of such short paths which motivates our more in-depth investigation.
Figure 2: The inter-institution mobility network generated from the 10% most common paths. Nodes represent universities and links scientists moving between them. The links are directed and they have to be followed clockwise.
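The two topological quantities mentioned above can be computed on any aggregated mobility network with a breadth-first search. The sketch below is a minimal, standard-library illustration on a hypothetical toy network; it assumes directed links and averages over reachable ordered pairs only, which may differ from the conventions used for Fig. 2.

```python
from collections import deque


def first_order_links(trajectories):
    """Aggregate trajectories into a set of directed links (u, v)."""
    links = set()
    for path in trajectories:
        links.update(zip(path, path[1:]))
    return links


def distances_from(links, source):
    """Breadth-first search distances from source along directed links."""
    successors = {}
    for u, v in links:
        successors.setdefault(u, set()).add(v)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in successors.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_mean_distance(links):
    """Longest and average shortest-path length over reachable ordered pairs."""
    nodes = {node for link in links for node in link}
    lengths = []
    for source in nodes:
        dist = distances_from(links, source)
        lengths.extend(d for target, d in dist.items() if target != source)
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical toy trajectories over five institutions A..E.
toy = [("A", "C", "D"), ("B", "C", "E"), ("D", "A"), ("E", "B")]
diameter, mean_distance = diameter_and_mean_distance(first_order_links(toy))
```

Note that this computation treats every link as usable regardless of which trajectory it came from, which is exactly the assumption questioned in the remainder of this section.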
We start our investigation by looking at the temporal correlations in scientists' career trajectories. Recent studies have shown that mobility patterns with temporal correlations cannot be captured using standard network models (Rosvall et al., 2014;Scholtes et al., 2014;Xu et al., 2016;Scholtes, 2017). For example, these studies have shown that random walk models on networks do not capture travel data of passengers well, due to temporal correlations. Indeed, most passengers flying from one airport to another take return flights or pass through specific hubs, and hence, the sequences of airports visited exhibit temporal correlations. To deal with temporal correlations in sequence data, several authors have proposed a new class of network models called higher-order networks (Rosvall et al., 2014;Scholtes et al., 2014;Xu et al., 2016).
Higher-order networks are mathematical objects which retain temporal information normally discarded in standard network models. For example, we see in Fig. 3 (a) four different career trajectories $p_1 = p_2 = \{A, C, D\}$ and $p_3 = p_4 = \{B, C, E\}$. If we were to represent these as a simple first-order network (see Fig. 3 (b)), we would imply that, starting from A, reaching D and reaching E (both via C) are equally likely. However, by doing so, we have discarded the (temporal) information that no trajectory connects A to E. To preserve this information, we represent the trajectories $\{p_1, p_2, p_3, p_4\}$ in a second-order network (see Fig. 3 (c)). In this network, we have four nodes (A-C), (C-D), (B-C) and (C-E) and two links (A-C) → (C-D) and (B-C) → (C-E). With this second-order network, we now respect the (temporal) order implicit in the data. Similar to standard networks, a second-order network may be represented using a transition matrix $T^{(2)}$. In this matrix an element $(T^{(2)})_{ee'}$ is the probability of observing the transition $e'$ (e.g., C → D) after observing the transition $e$ (e.g., A → C).
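The toy example of Fig. 3 can be reproduced in a few lines. The sketch below estimates the second-order transition probabilities directly from the four trajectories; the function name `second_order_transitions` and the dictionary representation of $T^{(2)}$ are choices made for this illustration only.

```python
from collections import Counter, defaultdict


def second_order_transitions(paths):
    """Estimate second-order transition probabilities between consecutive moves."""
    counts = defaultdict(Counter)
    for path in paths:
        # Each triple (a, b, c) is one transition from move (a, b) to move (b, c).
        for a, b, c in zip(path, path[1:], path[2:]):
            counts[(a, b)][(b, c)] += 1
    return {
        move: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for move, row in counts.items()
    }


# The four trajectories p1..p4 of the example.
paths = [("A", "C", "D"), ("A", "C", "D"), ("B", "C", "E"), ("B", "C", "E")]
T2 = second_order_transitions(paths)
```

Here the row of (A-C) contains only (C-D), so the second-order representation keeps the information that nobody moved from A via C to E.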
In Fig. 4, we represent scientists' career trajectories as a second-order network. From this visualization, we immediately note two features. First, we note the halo of points, which consists of many disconnected components of size two. These represent career trajectories that connect only pairs of universities, i.e., trajectories of the type (A-B) → (B-A). Second, the largest connected component contains only 23% of the nodes (i.e., university pairs) and it has a diameter of 24. These two key statistics indicate that the empirical career trajectories do not connect all university pairs, suggesting the existence of constraints in academic mobility.
We argue that the above finding is caused by the time ordering of universities in career trajectories. To show this, we depict the second-order network reconstructed using a "Markovian" temporal network in Fig. 5. This network is constructed from a maximum entropy second-order transition matrix, $\tilde{T}^{(2)}$ (Scholtes et al., 2014). An element $(\tilde{T}^{(2)})_{(A-C),(C-E)} = (T^{(1)})_{A,C} \times (T^{(1)})_{C,E}$, where $(T^{(1)})_{A,C}$ is the observed relative frequency of the transition A → C. By this, we obtain a transition matrix that does not preserve the empirical time ordering of career trajectories. In Fig. 5, the network links correspond to all the non-zero entries of the second-order transition matrix of the Markovian temporal network $\tilde{T}^{(2)}$. From this network visualization, one would deduce that all universities are connected through career trajectories, and scientists move freely from one university to another. Indeed, the Markovian second-order network has a diameter of six, and every node belongs to the giant connected component. The differences between the second-order networks reconstructed using the empirical data and $T^{(1)}$ hint at the presence of temporal correlation in scientists' career trajectories.
Figure 4: The inter-institution mobility network generated from the 10% most common career trajectories. Note that in the second-order network a node represents a first-order edge (e.g., MIT→UCLA) and an edge a link between edges (e.g., (MIT-UCLA)→(UCLA-ETH)). The higher-order network represents the actual 10% most common career trajectories.
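To make the contrast concrete, the following sketch builds the links of the Markovian second-order network from first-order transition frequencies, following the product form stated above, and compares them with the links observed in the data. It uses the four toy trajectories of Fig. 3; the helper names are hypothetical and the code is a simplified illustration, not the implementation used for Fig. 5.

```python
from collections import Counter, defaultdict


def first_order_matrix(paths):
    """Relative frequencies of observed transitions a -> b."""
    counts = defaultdict(Counter)
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}


def markovian_second_order(T1):
    """Entries of the Markovian second-order matrix as products of first-order entries."""
    entries = {}
    for a, row_a in T1.items():
        for b, p_ab in row_a.items():
            for c, p_bc in T1.get(b, {}).items():
                entries[((a, b), (b, c))] = p_ab * p_bc
    return entries


def empirical_second_order_links(paths):
    """Pairs of consecutive moves that actually occur in some trajectory."""
    return {((a, b), (b, c))
            for path in paths
            for a, b, c in zip(path, path[1:], path[2:])}


paths = [("A", "C", "D"), ("A", "C", "D"), ("B", "C", "E"), ("B", "C", "E")]
markov_links = set(markovian_second_order(first_order_matrix(paths)))
observed_links = empirical_second_order_links(paths)
spurious_links = markov_links - observed_links
```

In this toy case the Markovian model contains two links that no trajectory supports, which is the same kind of discrepancy that separates Fig. 5 from Fig. 4.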

Quantifying Temporal Correlations
A second-order network captures the temporal correlations of scientists' trajectories of length two without loss of information. In general, trajectories can have different lengths, and hence, can be represented by higher-order networks of different orders. Then, if we have a sample of career trajectories of different lengths, how do we choose the correct order to model all these trajectories at the same time? To address this problem, we use the multi-order graphical (MOG) models and the statistical test developed by Scholtes (2017). A MOG-model is a combination of higher-order networks up to order $k$. For a MOG-model, it is possible to compute its likelihood given its complexity (i.e., degrees of freedom) and the observed trajectories. Then, by using a likelihood ratio test between MOG-models with increasing $k$, we choose the model with the optimal order $k_{\mathrm{opt}}$. The MOG-model with order $k_{\mathrm{opt}}$ retains the statistically significant temporal correlations present in the data without over-fitting. For details about the MOG-model and the test see Scholtes (2017).
Figure 5: The inter-institution mobility network generated from the 10% most common career trajectories. Note that in the second-order network a node represents a first-order edge (e.g., MIT→UCLA) and an edge a link between edges (e.g., (MIT-UCLA)→(UCLA-ETH)). The second-order network representation implied by the network shown in Fig. 2.
G. Vaccario, L. Verginer, F. Schweitzer: The Mobility Network of Scientists. (Submitted, 28.02.2020)
Note that $k_{\mathrm{opt}} = 1$ would imply that a first-order network represents the data well, and hence, the trajectories are not significantly influenced by temporal correlations. In practice, this would mean that the next movement of a scientist depends only on his/her current location. $k_{\mathrm{opt}} > 1$, on the other hand, would imply that the next location visited by a scientist depends not only on his/her current location but also on the previous ones. To estimate $k_{\mathrm{opt}}$, we use Pathpy, the open-source path analysis library (Scholtes, 2019). When applying the statistical test of Scholtes (2017) to scientists' career trajectories, we find $k_{\mathrm{opt}} = 2$. Since $k_{\mathrm{opt}} > 1$, we have statistical evidence that scientists' mobility exhibits temporal correlations. Precisely, we find that the next university in a career trajectory depends on a scientist's current and previous universities.
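The idea behind the order selection can be illustrated without Pathpy. The sketch below is a deliberately simplified stand-in for the test of Scholtes (2017): it fits start, first-order and second-order probabilities by relative frequencies, computes the log-likelihood of the trajectories with and without the second-order layer, and forms the likelihood-ratio statistic. The start-node model, the toy data and all function names are assumptions of this example, and the degrees-of-freedom computation and the significance threshold of the actual test are omitted.

```python
import math
from collections import Counter, defaultdict


def normalise(counts):
    """Turn counts into relative frequencies."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def fit(paths):
    """Estimate start, first-order and second-order probabilities from paths."""
    start, first, second = Counter(), defaultdict(Counter), defaultdict(Counter)
    for path in paths:
        start[path[0]] += 1
        for a, b in zip(path, path[1:]):
            first[a][b] += 1
        for a, b, c in zip(path, path[1:], path[2:]):
            second[(a, b)][c] += 1
    return (normalise(start),
            {k: normalise(v) for k, v in first.items()},
            {k: normalise(v) for k, v in second.items()})


def log_likelihood(paths, model, max_order):
    """Log-likelihood of the paths, using the second-order layer only if max_order >= 2."""
    start, first, second = model
    total = 0.0
    for path in paths:
        total += math.log(start[path[0]])
        for i in range(1, len(path)):
            if max_order >= 2 and i >= 2:
                total += math.log(second[(path[i - 2], path[i - 1])][path[i]])
            else:
                total += math.log(first[path[i - 1]][path[i]])
    return total


paths = [("A", "C", "D")] * 2 + [("B", "C", "E")] * 2
model = fit(paths)
ll_first = log_likelihood(paths, model, max_order=1)
ll_second = log_likelihood(paths, model, max_order=2)
statistic = 2 * (ll_second - ll_first)  # likelihood-ratio statistic
```

A larger model always fits at least as well, so the statistic alone does not select an order; in the actual test it is compared against a reference distribution whose degrees of freedom reflect the additional model complexity.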
To better understand the source of the identified temporal correlation, we study the temporal motifs of length two. A temporal motif of length two is a sub-path of length two in a career trajectory, and it can have the following forms: X → Y → X (type I) and X → Y → Z (type II). Type I motifs represent pieces of career trajectories where a scientist first changes his/her university (X → Y) and then goes back to the previous working place (Y → X). Type II motifs, instead, represent career trajectories of scientists that, after changing university (X → Y), do not go back (Y → Z). For each type of motif, we can compare its empirical probability distribution $P^{(emp)}$ to the one expected from a first-order $P^{(1)}$ and a second-order $P^{(2)}$ network model. For more details on how these distributions are computed, see Appendix A. If we find that the $P^{(emp)}$ of both motif types are similar to $P^{(2)}$, but not $P^{(1)}$, we can conclude that the temporal correlations stem from both motif types. If, on the other hand, only the $P^{(emp)}$ of type I (type II) motifs is similar to $P^{(2)}$, we can conclude that the temporal correlations stem from type I (type II) motifs.
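The two motif types can be extracted mechanically from any set of trajectories. The following minimal sketch slides a window of three consecutive locations over each trajectory and labels the resulting sub-path; it assumes that consecutive repeats of the same institution have already been collapsed, and the toy trajectories are hypothetical.

```python
from collections import Counter


def motif_type(sub_path):
    """Classify a sub-path (x, y, z) as type I (return) or type II (no return)."""
    x, _, z = sub_path
    return "I" if x == z else "II"


def count_motifs(trajectories):
    """Count type I and type II motifs over all sub-paths of length two."""
    counts = Counter()
    for path in trajectories:
        for triple in zip(path, path[1:], path[2:]):
            counts[motif_type(triple)] += 1
    return counts


toy = [
    ("MIT", "ETH", "MIT"),
    ("MIT", "ETH", "Oxford", "ETH"),
    ("Stanford", "MIT"),  # too short to contain a motif
]
motif_counts = count_motifs(toy)
```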
We start by analyzing the motifs of type I, i.e., of the form X → Y → X. To compare the $P^{(emp)}$, $P^{(1)}$, and $P^{(2)}$ of these motifs, we use the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951). This measure is commonly used to describe the similarity between probability distributions of discrete and unordered variables, and hence, it suits our situation well.
For the motifs of type I, the KL-divergence between $P^{(emp)}$ and $P^{(1)}$ is 0.826, while that between $P^{(emp)}$ and $P^{(2)}$ is 0.038. The ratio between these two KL-divergences is ∼ 21.7 and we name it $R_{I}$. Hence, the KL-divergence between $P^{(emp)}$ and $P^{(2)}$ is more than 20 times smaller than the divergence between $P^{(emp)}$ and $P^{(1)}$. This means that the frequency of scientists that go back to their previous working institution is better captured by a second-order network.
For the motifs of type II, the KL-divergence between $P^{(emp)}$ and $P^{(1)}$ is 0.798, while that between $P^{(emp)}$ and $P^{(2)}$ is 0.183. The ratio between these two KL-divergences is ∼ 4.4 and we name it $R_{II}$. Hence, the KL-divergence between $P^{(emp)}$ and $P^{(2)}$ is about 4 times smaller than the divergence between $P^{(emp)}$ and $P^{(1)}$. We find again that the second-order model captures motifs of type II better.
By comparing $R_{I}$ and $R_{II}$, we also find that $R_{I}$ is about five times larger than $R_{II}$. This result indicates that the temporal correlations in the second-order model are mostly useful to capture motifs of type I. In the last section of the paper, we discuss and interpret this result.
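For reference, the comparison above can be sketched as follows. The KL-divergence function is the standard definition for discrete distributions (here in nats; the logarithm base used for the reported values is not stated). The three toy distributions are hypothetical and only show how the function is applied, whereas the ratios at the end are recomputed from the four divergences reported in the text.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) for discrete distributions given as dicts.

    Assumes q[x] > 0 wherever p[x] > 0.
    """
    return sum(p_x * math.log(p_x / q[x]) for x, p_x in p.items() if p_x > 0)


# Hypothetical distributions over three motifs, for illustration only.
p_emp = {"m1": 0.7, "m2": 0.2, "m3": 0.1}
p_first = {"m1": 1 / 3, "m2": 1 / 3, "m3": 1 / 3}
p_second = {"m1": 0.6, "m2": 0.25, "m3": 0.15}
toy_ratio = kl_divergence(p_emp, p_first) / kl_divergence(p_emp, p_second)

# Ratios recomputed from the divergences reported in the text.
r_one = 0.826 / 0.038   # type I motifs
r_two = 0.798 / 0.183   # type II motifs
r_one_over_r_two = r_one / r_two
```

The recomputed values agree with the rounded figures quoted above: roughly 21.7 and 4.4, with a quotient close to five.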

Career trajectories and mixing
In this section, we analyze the effect that temporal correlations have on mixing and diffusion processes unfolding on the scientist mobility network. We split the analysis of scientist trajectories into a qualitative and a quantitative part. For the former, we compare the alluvial diagrams (Lambiotte et al., 2019) of career trajectories created using the first- and second-order network models, while for the latter, we compute their entropy growth (Scholtes et al., 2014). Note that both the alluvial diagrams and the entropy growth describe and quantify the diffusion process unfolding on a temporal network. Specifically, for the case of academic mobility, one might think of a diffusion process in which scientists carry knowledge. Then the question arises how fast and how far this knowledge would propagate in the network through mobility alone. If this happens fast and many institutions are reached, then this corresponds to strong mixing. If, on the other hand, few institutions are reached, then we have weak mixing.
Figure 6: The possible diffusion processes following the career trajectories originating from the University of Cambridge. The thickness of an edge represents the probability to observe that move. The color of an edge is blue when it represents a move going back to the source node, i.e., the University of Cambridge, otherwise the color is orange. The diffusion process describes the probability to find a scientist in a given location after t moves starting from the source node. On the left side of the plot (a), the empirical diffusion (i.e., actual moves) is shown. On the right side of the plot (b), the Markovian diffusion is shown, i.e., the diffusion we would expect if career trajectories had no temporal correlations. For visualization purposes, the 50% most common career trajectories have been used for this figure.
In Fig. 6, we show two alluvial diagrams describing the career trajectories originating from the "University of Cambridge". In Fig. 6(b), we show the alluvial diagram for the Markovian diffusion. Starting from the University of Cambridge at $t_0$, scientists are expected to move to many institutions with similar probabilities, illustrated by the similar thickness of edges from $t_0$ to $t_1$. The universities reached at $t_1$ are many, suggesting the existence of strong mixing, and from step $t_1$ to $t_2$, the mixing is amplified.
In Fig. 6(a), we show the alluvial diagram for the empirical diffusion of career trajectories originating from the University of Cambridge. In this panel (a), we see that from t 0 to t 1 many institutions are reached, again suggesting the existence of strong mixing. However, moving from t 1 to t 2 this trend is reversed, since most paths originating at the University of Cambridge go back there. This is visually shown as the blue edges from t 1 to t 2 . The preferred destination reduces the probability to reach many other locations at t 2 , implying weak mixing.
Note that the segment from $t_0$ to $t_1$ is identical, albeit ordered differently, for both Fig. 6(a) and (b), since both representations capture the first transitions correctly. The differences in the diffusion process emerge in the second step from $t_1$ to $t_2$, where it matters whether the network model has $k_{\mathrm{opt}} = 1$ or $k_{\mathrm{opt}} > 1$. As the comparison of Fig. 6(a) and (b) demonstrates, disregarding temporal correlations in the Markovian model leads us to overestimate mixing. Precisely, the return rate is high in Fig. 6(a) and low in Fig. 6(b). This qualitative result is in line with the KL analysis, highlighting the importance of motifs of type I (X → Y → X).
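The difference between the two panels can be mimicked on toy data. The sketch below computes, for a chosen source, the distribution of locations after two moves once from the observed sub-paths and once from the product of first-order transition probabilities. The trajectories, the function names and the simplification of using only sub-paths that start at the source are assumptions of this example; it is not the procedure used to draw Fig. 6.

```python
from collections import Counter, defaultdict


def normalise(counts):
    """Turn counts into relative frequencies (empty input gives an empty dict)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def two_step_distributions(paths, source):
    """Location after two moves from source: observed sub-paths vs. Markovian model."""
    first = defaultdict(Counter)
    empirical = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            first[a][b] += 1
        for a, _, c in zip(path, path[1:], path[2:]):
            if a == source:
                empirical[c] += 1
    T1 = {a: normalise(row) for a, row in first.items()}
    markov = Counter()
    for b, p_ab in T1.get(source, {}).items():
        for c, p_bc in T1.get(b, {}).items():
            markov[c] += p_ab * p_bc
    return normalise(empirical), dict(markov)


# Hypothetical trajectories: everyone leaving Cambridge returns after one stay.
paths = ([("Cambridge", "MIT", "Cambridge")] * 3
         + [("Cambridge", "ETH", "Cambridge")] * 2
         + [("Oxford", "MIT", "ETH")] * 3
         + [("Oxford", "ETH", "MIT")] * 2)
empirical, markov = two_step_distributions(paths, "Cambridge")
```

In this toy case every observed path returns to the source, while the Markovian model spreads half of the probability over other institutions, which is the overestimation of mixing described above.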
To quantify the expected mixing of the two models given the observed data, we use the entropy growth ratio introduced by Scholtes et al. (2014). The interpretation of this measure is as follows. Given an observed transition $e = A \to B$, we compute the corresponding entropy from the second-order transition matrix, i.e., from the probabilities to transition to the next nodes $e'$:
$$h_e(T^{(2)}) = -\sum_{e' \in E} (T^{(2)})_{ee'} \log (T^{(2)})_{ee'},$$
where $E$ is the set of edges, i.e., possible career moves from one university to another.
The entropy $h_e(T^{(2)})$ quantifies to what extent the next transition in a trajectory is determined by the previous transition $e$. For example, if a row $(T^{(2)})_{e,\cdot}$ has only one element equal to 1, then the transition is deterministic, and hence, $h_e(T^{(2)}) = 0$, implying weak mixing. If instead all elements of a row are equal, then the next transitions happen uniformly at random, and hence, $h_e(T^{(2)})$ is high, implying strong mixing.
To compute the expected overall mixing, one could argue that the total entropy of the model is the sum of all the transition entropies, i.e., $\sum_{e \in E} h_e(T^{(2)})$. However, this would be only partially correct as some transitions are observed more often, and hence, should have a higher weight. To account for this, we weight each $h_e$ by the relative frequency of $e$. In formula, we have:
$$H(T^{(2)}) = \sum_{e \in E} \pi_e \, h_e(T^{(2)}),$$
where $\pi_e$ is the relative frequency of the transition $e$, that is given by:
$$\pi_e = \frac{n_e}{\sum_{e' \in E} n_{e'}},$$
where $n_e$ is the number of occurrences of $e$ in $S$, and $S$ is the multi-set of paths and sub-paths constructed using the empirical data. With the above definition, we can compute the entropy growth ratio between the first- and the second-order models:
$$\Lambda_H(T^{(2)}) = \frac{H(\tilde{T}^{(2)})}{H(T^{(2)})},$$
where $\tilde{T}^{(2)}$ is the second-order transition matrix implied by the first-order (Markovian) model. Note that for $\Lambda_H(T^{(2)}) > 1$ the diffusion process computed using the first-order network would over-estimate the level of mixing. If $\Lambda_H(T^{(2)}) < 1$, the opposite is true. We obtain $\Lambda_H(T^{(2)}) \sim 3.4 > 1$. This means that the growth of entropy coming from the first-order model is more than 3 times larger than expected from the second-order model. Hence, the mixing would be overestimated by using a first-order model. This result is in line with the qualitative findings from Fig. 6.
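A minimal numerical sketch of this ratio is given below. It assumes that the row of the Markovian model for a transition ending in node b is the first-order transition row of b, that each transition is weighted by its relative frequency among all observed transitions, and that rows without any observed continuation contribute zero entropy. These choices, the toy data and the function names are assumptions of this example and may differ from the conventions of Scholtes et al. (2014).

```python
import math
from collections import Counter, defaultdict


def normalise(counts):
    """Turn counts into relative frequencies (empty input gives an empty dict)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def row_entropy(row):
    """Shannon entropy (base 2) of one row of transition probabilities."""
    return -sum(p * math.log2(p) for p in row.values() if p > 0)


def entropy_growth_ratio(paths):
    """Weighted entropy of the Markovian model divided by that of the second-order model."""
    first = defaultdict(Counter)
    second = defaultdict(Counter)
    edge_counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            first[a][b] += 1
            edge_counts[(a, b)] += 1
        for a, b, c in zip(path, path[1:], path[2:]):
            second[(a, b)][c] += 1
    total = sum(edge_counts.values())
    h_second = h_markov = 0.0
    for (a, b), n in edge_counts.items():
        weight = n / total  # relative frequency of the transition (a, b)
        h_second += weight * row_entropy(normalise(second.get((a, b), {})))
        h_markov += weight * row_entropy(normalise(first.get(b, {})))
    return h_markov / h_second  # undefined if the second-order model is deterministic


# Hypothetical trajectories with a preference that depends on the origin.
paths = ([("A", "C", "D")] * 3 + [("A", "C", "E")]
         + [("B", "C", "E")] * 3 + [("B", "C", "D")])
ratio = entropy_growth_ratio(paths)
```

The ratio does not depend on the base of the logarithm, since the base cancels between numerator and denominator.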

City and Country Level
We now move our analysis from the top 100 universities to the city and country level. For this analysis, we aggregate trajectories traversing institutions located in the same cities or countries.
By not limiting the analysis to a specific set of universities, we increase the number of observed trajectories and scientists. Hence, we should expect more reliable statistics. At the same time, we should be aware that aggregating and projecting sequential data can distort its temporal properties as discussed previously, see also Scholtes et al. (2014).
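The aggregation step can be sketched as a simple relabelling followed by the removal of consecutive repeats. The lookup tables and the function name `aggregate` below are hypothetical; in the study, the mapping from affiliation strings to cities and countries comes from the geo-coded MEDLINE affiliations.

```python
from itertools import groupby


def aggregate(trajectory, mapping):
    """Project a trajectory onto a coarser level and drop moves within the same unit."""
    mapped = [mapping[place] for place in trajectory]
    return [place for place, _ in groupby(mapped)]


# Hypothetical lookup tables for illustration.
affiliation_to_city = {
    "MIT": "Cambridge (US)",
    "Harvard": "Cambridge (US)",
    "ETH": "Zurich",
}
city_to_country = {"Cambridge (US)": "USA", "Zurich": "Switzerland"}

trajectory = ["MIT", "Harvard", "ETH", "MIT"]
city_level = aggregate(trajectory, affiliation_to_city)
country_level = aggregate(city_level, city_to_country)
local_only = aggregate(["MIT", "Harvard"], affiliation_to_city)
```

A move between two institutions in the same city disappears at city level, so a trajectory with one move at affiliation level can become a trajectory of length zero after aggregation.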
We restrict our attention to 3 740 187 individual scientist trajectories across 215 countries between 1990 and 2009 with papers listed in MEDLINE. 89% of the trajectories are of length 0, meaning that most scientists do not change country. However, this statistic also means that longer career trajectories are rare but still numerous (411 137). The most frequent trajectories of length one are between the UK and USA, Japan and the USA, and the USA and the UK. The presence of the US in these paths stems from the fact that the US has the largest scientist population in the dataset.
When considering trajectories of length two, the most frequent ones are between (Japan, USA, Japan), (USA, UK, USA), and (UK, USA, UK). If we consider only those trajectories that do not go through the USA, we find that the most frequent trajectories of length two are across (UK, Australia, UK), (France, UK, France) and (Germany, UK, Germany). All these types of trajectories are reminiscent of the motifs of type I, i.e. X → Y → X, found at the institution level. When looking at trajectories at the city level, we find a similar pattern (see Appendix B). Therefore, we test for the existence of temporal correlations as we have done at the institution level.
By applying the multi-order modelling framework of Scholtes (2017), we identify $k_{\mathrm{opt}} = 2$, implying that a second-order network best represents the data at the country level. When applying the same framework at city level, we identify $k_{\mathrm{opt}} = 1$. Hence, we find evidence of temporal correlations at country level, but not at city level. This result has several implications. First, even though we observe a large number of trajectories of type I at both city and country level, these trajectories are still to be expected from a first-order network model at city, but not at country level. Second, temporal correlations have been detected at the affiliation and country level, but not at city level. Therefore, we find that the concerns raised by Butts (2009); Zweig (2011); Scholtes (2017) about the naive application of network-analytic models apply to research on academic mobility.

Discussion
Scientific knowledge is shared not only through artefacts, e.g., publications, but also through scientists carrying out their research. Thus, their mobility is thought to facilitate knowledge exchange. "Mobility as a vehicle of knowledge exchange" implicitly assumes that academic careers can span the globe and universities. In this work, we have addressed this implicit assumption by testing for the existence of temporal correlations in scientific careers. The existence of such correlations would imply the existence of constraints in academic mobility.
To perform our analysis we have reconstructed, from two large bibliographic datasets, MAG (Sinha et al., 2015) and MEDLINE (Torvik and Smalheiser, 2009;Torvik, 2015), the career trajectories across the 100 top universities and across cities and countries, respectively. We do not find evidence that temporal correlations influence career trajectories at city level. However, we do find evidence that mobility, especially among elite institutions, is influenced by temporal correlations.
At the university level, we find that the mobility of scientists is affected by temporal correlations. In particular, a large part of these correlations is determined by career trajectories of type I, i.e., scientists return to their original university after two career steps. Indeed, the KL-divergence ratio $R_{I}$ for these motifs is five times larger than $R_{II}$. This result has a direct impact on the knowledge exchange as proxied by scientists' mobility. Specifically, the implied mixing of knowledge through mobility is a lot lower than one would expect by using a standard network model. To quantify this statement, we have computed the entropy growth ratio $\Lambda_H(T^{(2)})$, which captures how much a standard network model overestimates the trajectory mixing. We find that $\Lambda_H(T^{(2)}) \sim 3.4$, showing that a simple network model would overestimate the mixing by a factor of more than 3. A possible but rather pessimistic explanation is that a simple mental short-cut employed by hiring committees to favour "known origins" reinforces the path dependency and possibly hampers knowledge exchange.
At the city and country level, we find a large number of career trajectories of type I. However, at city level, we do not find statistically significant temporal correlations. This result indicates that type I trajectories or returnees (Saxenian, 2005; Agrawal et al., 2006, 2011) are still to be expected from a standard network model. Additionally, the use of a standard network perspective is justified when studying academic mobility at city, but not at country level.
Overall from the results presented in this work, a picture emerges that scientific careers are a lot less diverse than one would expect. The existence of temporal correlations suggests that although among the 100 most prestigious institutions, there is substantial and continued exchange, the actual career trajectories are far more limited. Our results are relevant both methodologically and empirically, for research on high-skill labour mobility, in general, and the analysis of scientists' career trajectories, in particular.

Authors' contributions
GV, LV and FS conceived the study. GV and LV performed the analysis. GV, LV and FS wrote the manuscript.

Acknowledgements
The authors thank Dr. Giona Casiraghi for his useful suggestions on how to present the analysis for the entropy growth ratio.

Availability of data and materials
The raw XML data on all MEDLINE articles is available for download from the NIH at
• https://www.nlm.nih.gov/databases/download/pubmed_medline.html
• ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline
The disambiguation of authors (Authority) (Torvik and Smalheiser, 2009) and affiliations (MapAffil) (Torvik, 2015) has been obtained from
• http://abel.lis.illinois.edu/downloads.html
Access to this resource can be requested for free from the maintainers through the online form on the same page. Note that due to an agreement with the providers of Authority and MapAffil, these datasets may only be shared by requesting access through the previously mentioned online form.
The MAG data (Sinha et al., 2015) was downloaded from https://www.kdd.org/kdd-cup/view/kdd-cup-2016/Data. Unfortunately, the data is no longer available for download there. Hence, the MAG data used and analysed are available from the corresponding author on reasonable request.

A Distributions of motifs
To compute the probability to observe motifs of type I, i.e., X → Y → X, or type II, i.e., X → Y → Z, we note that these are all the possible (sub-)trajectories of length two. We call $S_k$ the multi-set containing all the sub-trajectories of length $k$. This means that $S_2$ contains all the motifs of type I and type II. Then, we define the probability to observe a motif $p \in S_2$
Note that $P^{(\mathrm{emp})}$ is always greater than or equal to zero and
$$\sum_{p \in \tilde{S}_2} P^{(\mathrm{emp})}(p) + \frac{|S_1|}{|S|} = 1,$$
where $\tilde{S}_2$ is the set containing the sub-trajectories of length two. This last relation follows from the fact that any trajectory $i \in S$ can be split into $l_i - 1$ sub-trajectories of length 2 when $l_i \geq 2$. Hence, using Eq. 6, we can show that
To compute the probability distributions in the first- and second-order networks, i.e., $P^{(1)}$ and $P^{(2)}$, we rely on the Pathpy implementation. Precisely, we use the function path_likelihood, which returns the probability to observe a path/transition. For more details, see (Scholtes, 2017, 2019).
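The splitting of trajectories into motifs described above can be sketched as follows. This is a minimal illustration and not the implementation used for the analysis: the helper names `length_two_motifs` and `motif_type` and the toy trajectories are hypothetical, and we assume that a trajectory is stored as the ordered list of visited locations, so that a trajectory of length l has l + 1 entries.

```python
from collections import Counter


def length_two_motifs(trajectory):
    """Return all sub-trajectories of length two (three consecutive locations).

    A trajectory with l steps has l + 1 locations and therefore yields
    l - 1 such sub-trajectories when l >= 2, and none otherwise.
    """
    return [tuple(trajectory[i:i + 3]) for i in range(len(trajectory) - 2)]


def motif_type(motif):
    """Type I: X -> Y -> X (return to the origin). Type II: X -> Y -> Z."""
    first, _, last = motif
    return "I" if first == last else "II"


# Hypothetical toy trajectories; the location labels are made up.
trajectories = [
    ["A", "B", "A"],       # length 2: one motif
    ["A", "B", "C", "B"],  # length 3: two motifs
    ["C", "A"],            # length 1: no motif of length two
]

# Multi-set of all observed motifs of length two, and their split by type.
motif_counts = Counter(m for t in trajectories for m in length_two_motifs(t))
type_counts = Counter(motif_type(m) for m in motif_counts.elements())
```

Trajectories of length one contribute no motif, which is why their share appears as a separate term in the normalisation relation above.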

A.2 Odds ratio
We split the set $\tilde{S}_2$ into two disjoint sets $\tilde{S}_2^{I}$ and $\tilde{S}_2^{II}$, containing motifs of type I and type II, respectively. The empirical odds ratio to observe motifs of type I compared to motifs of type II is given by
and we find from our data that $OR_{I,II} \sim 1.7$. This means that motifs of type I are almost twice as frequent as motifs of type II. When computing the odds ratio $OR_{I,II}$ using $P^{(1)}$, we get $\sim 0.14$, which is an order of magnitude smaller than the empirical one. In contrast, the odds ratio $OR_{I,II}$ obtained from $P^{(2)}$ is $\sim 1.4$, which is quite close to the empirical one.
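As a rough illustration of this comparison, the sketch below computes the ratio between the total probability of type I motifs and the total probability of type II motifs. This is an assumption on our side: it is only one plausible reading, suggested by the remark that type I motifs are almost twice as frequent as type II motifs, and it may differ from the exact expression for the odds ratio. The function names and the toy probabilities are hypothetical.

```python
def motif_type(motif):
    """Type I: X -> Y -> X. Type II: X -> Y -> Z."""
    first, _, last = motif
    return "I" if first == last else "II"


def type_mass_ratio(prob):
    """Ratio of the total probability of type I motifs to that of type II.

    ASSUMPTION: this ratio of probability masses is a stand-in for the
    odds ratio discussed in the text, not its confirmed definition.
    """
    mass_type_one = sum(p for m, p in prob.items() if motif_type(m) == "I")
    mass_type_two = sum(p for m, p in prob.items() if motif_type(m) == "II")
    return mass_type_one / mass_type_two


# Hypothetical motif probabilities, chosen only for illustration.
toy_prob = {
    ("A", "B", "A"): 0.5,
    ("B", "A", "B"): 0.25,
    ("A", "B", "C"): 0.25,
}

ratio = type_mass_ratio(toy_prob)
```

The same function can be applied to the empirical distribution and to the distributions of the first- and second-order models, so that the three ratios are directly comparable.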

A.3 Computing and visualizing the Kullback-Leibler divergence
Since we use $P^{(\mathrm{emp})}$, $P^{(1)}$, and $P^{(2)}$ to compute the Kullback-Leibler (KL) divergence for motifs of length 2, we normalize these probabilities to one. In other words, for $P^{(\mathrm{emp})}(p)$, we write
$$P^{(\mathrm{emp})}(p) \rightarrow \frac{P^{(\mathrm{emp})}(p)}{\sum_{p' \in \tilde{S}_2} P^{(\mathrm{emp})}(p')},$$
and we renormalise $P^{(1)}$ and $P^{(2)}$ in a similar way. The KL-divergence between $P^{(\mathrm{emp})}$ and $P^{(1)}$ is 1.51, while between $P^{(\mathrm{emp})}$ and $P^{(2)}$ it is 0.10. Hence, the KL-divergence of the first-order model is more than 10 times larger than that of the second-order model. In Fig. 7, we plot $P^{(\mathrm{emp})}$, $P^{(1)}$, and $P^{(2)}$ as green dots, blue + and orange ×, respectively. The order on the x-axis is created by sorting the motifs from the most to the least probable with respect to the empirical data. We see that the first-order network consistently underestimates the motifs that are more probable. In contrast, the second-order network produces probabilities quite close to the empirical ones.
To compute the Kullback-Leibler divergence only for motifs of type I (type II), we have to renormalise $P^{(\mathrm{emp})}$, $P^{(1)}$, and $P^{(2)}$ over the corresponding set. In other words, for type I, $P^{(\mathrm{emp})}(p) \rightarrow P^{(\mathrm{emp})}(p) / \sum_{p' \in \tilde{S}_2^{I}} P^{(\mathrm{emp})}(p')$, and similarly for $P^{(1)}$ and $P^{(2)}$. In Fig. 8(a) and (b), we compare the probability distributions for the motifs of type I and type II, respectively. Again, we show $P^{(\mathrm{emp})}$, $P^{(1)}$, and $P^{(2)}$ as green dots, blue + and orange ×, respectively. The order on the x-axis is created by sorting the motifs from the most to the least probable with respect to the empirical data. From Fig. 8(a), which depicts motifs of type I, it is evident that the first-order network underestimates the probabilities of these motifs. In Fig. 8(b), the situation is reversed.
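The renormalisation and the KL-divergence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the reported values: we assume the divergence is taken from the empirical distribution to the model distribution, we use the natural logarithm, and the function names and toy probabilities are hypothetical.

```python
import math


def renormalise(prob, motifs):
    """Restrict a distribution to `motifs` and rescale it to sum to one."""
    total = sum(prob[m] for m in motifs)
    return {m: prob[m] / total for m in motifs}


def kl_divergence(p, q):
    """D_KL(p || q) in nats (ASSUMPTION: natural log, empirical p first).

    Requires q[m] > 0 wherever p[m] > 0.
    """
    return sum(p[m] * math.log(p[m] / q[m]) for m in p if p[m] > 0)


# Hypothetical probabilities for three motifs. They need not sum to one
# before renormalisation, e.g. because of trajectories of length one.
p_emp = {("A", "B", "A"): 0.4, ("B", "A", "B"): 0.2, ("A", "B", "C"): 0.2}
p_first = {("A", "B", "A"): 0.1, ("B", "A", "B"): 0.1, ("A", "B", "C"): 0.6}

# Keep only motifs of type I (X -> Y -> X), then renormalise both models.
type_one = [m for m in p_emp if m[0] == m[2]]
p_emp_one = renormalise(p_emp, type_one)
p_first_one = renormalise(p_first, type_one)

kl_first = kl_divergence(p_emp_one, p_first_one)
```

Passing the full list of motifs to `renormalise` instead of `type_one` gives the normalisation over all motifs of length two.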

Figure 7: Probability distributions for all the motifs of type I and II. We plot $P^{(\mathrm{emp})}$, $P^{(1)}$, and $P^{(2)}$ as green dots, blue + and orange ×, respectively. The order on the x-axis is created by sorting the motifs from the most to the least probable with respect to the empirical data. On the y-axis, we report their respective probabilities.
Figure 8: Probability distributions for all the motifs of type I (a) and II (b). We plot $P^{(\mathrm{emp})}$, $P^{(1)}$, and $P^{(2)}$ as green dots, blue + and orange ×, respectively. The order on the x-axis is created by sorting the motifs from the most to the least probable with respect to the empirical data. On the y-axis, we report their respective probabilities.

C Trajectories at global affiliation level (MAG)
In the MAG, we have over 14 million disambiguated scientists among more than 19 000 affiliations. Among these, 2 591 784 trajectories have length 1 or higher and connect 18 522 affiliations. Specifically, 1 196 158 are of length 1, meaning that almost half (about 46%) of these scientists change affiliation only once. The remaining 1 395 626 trajectories are longer (e.g., 614 776 have length two). The most frequent trajectories of length one are between University of California and University of California, Berkeley; between University of California and University of California, Los Angeles; and between University of California and University of California, Davis. Note that University of California is a university system composed of 10 different campuses and is not an unambiguously defined location or affiliation. When considering longer trajectories, we find similar types of trajectories through ambiguously defined research institutions. For this reason, we do not use the MAG data to analyze scientists' career trajectories at the global level and use MEDLINE instead.
1 Note that we can distinguish between locations such as Stanford and Palo Alto only thanks to the fine-grained resolution of MapAffil (Torvik, 2015). Indeed, by manually checking the author affiliations that MapAffil places in Palo Alto, we find that many companies, such as Xerox, Hewlett-Packard, Lockheed Martin, and many biotech companies, have laboratories in Palo Alto.