The mobility network of scientists: analyzing temporal correlations in scientific careers
Applied Network Science volume 5, Article number: 36 (2020)
Abstract
The mobility of scientists between different universities and countries is important to foster knowledge exchange. At the same time, the potential mobility is restricted by geographic and institutional constraints, which leads to temporal correlations in the career trajectories of scientists. To quantify this effect, we extract 3.5 million career trajectories of scientists from two large-scale bibliographic data sets and analyze them by applying a novel method of higher-order networks. We study the effect of temporal correlations at three different levels of aggregation: universities, cities and countries. We find strong evidence for such correlations for the top 100 universities, i.e., scientists are likely to move between specific institutions. These correlations also exist at the level of countries, but cannot be found for cities. Our results allow us to draw conclusions about the institutional path dependence of scientific careers and the efficiency of mobility programs.
Introduction
Mobility of high-skill labour is a hot topic in economics, politics and research policy. One of the most prominent manifestations of this phenomenon is the international mobility of scientists. Global mobility of scientists has increased over the past decade and is regarded by the OECD (2017) as being a “key driver of knowledge circulation worldwide”. In the past this phenomenon has caused fears of “Brain Drain”, whereby high-skilled labour moves abroad to the benefit and detriment of the receiving and sending country, respectively. This pessimistic view has been challenged by recent research showing that both sending and receiving countries may benefit from this labour mobility (Agrawal et al. 2011; Petersen 2018; Saxenian 2005). Nevertheless, modern economies rely on high-skill labour to maintain their competitive advantage (Bahar et al. 2012; Beechler and Woodward 2009; Beine et al. 2001; Chambers et al. 1998). For this reason, attracting and retaining scientists is becoming a key concern for migration policy (Boucher and Cerna 2014).
Following the arguments from above, scientist mobility plays a central role in the exchange of knowledge. This implicitly assumes that scientists can move relatively freely and that there is a high degree of “mixing”. Mixing in this context means that different scientific careers are possible, and they do not always follow the same pattern. However, we know that geography constrains scientists’ career trajectories (Verginer and Riccaboni 2018) in particular, and research and development activities in general (Scholl et al. 2018). Additionally, Deville et al. (2014) and Clauset et al. (2015) have shown that institutional prestige constrains scientific careers. Indeed, both geography and prestige are essential to capture complex features of scientists’ mobility (Vaccario et al. 2018).
In this work, we test whether scientific careers are indeed global and free, resulting in a worldwide scientist mobility network with high levels of exchange and mixing, or whether they often follow similar patterns and thus lead to weak mixing. We test this hypothesis by looking at scientific careers among the 100 top universities as ranked by the Times Higher Education (World Reputation Rankings 2015).
Previous works have projected career trajectories at the city, regional and country level to study their patterns and economic impact (Miguélez and Moreno 2014; Petersen 2018; Verginer and Riccaboni 2018). These projections are obtained by aggregating trajectories traversing institutions located in the same cities, regions or countries. By doing so, scientists’ mobility can be studied at different levels of aggregation: affiliation, city and country levels (see Fig. 1). For this reason, we complement our analysis by looking at career trajectories beyond the 100 top universities and consider also the city and country level. Hence with our study, we explore academic mobility at three different levels of aggregation.
To empirically address the importance of academic mobility and its consequences for knowledge exchange, we reconstruct the career trajectories of scientists through bibliographic data. Specifically, we use the affiliations reported on scientific publications to track scientific careers. We refer to these time-ordered sequences of institutions of individual scientists as “career trajectories”, in line with the literature on labour mobility. Previous research (Clauset et al. 2015) has aggregated these trajectories to construct a static network where nodes represent research institutions and links represent the flow of scientists between them. However, recent advances in computer and network science have raised concerns over the naive aggregation of temporal sequences into static networks. In particular, neglecting temporal correlations can lead to erroneous conclusions about the accessibility and importance of nodes (Lentz et al. 2013) and the possible dynamic processes unfolding on networks (Pfitzner et al. 2013). To address this issue, we use higher-order networks, a novel method to represent and model sequential data (Rosvall et al. 2014; Scholtes et al. 2014; Xu et al. 2016). This network representation allows us to capture temporal correlations by means of topological characteristics. Additionally, Scholtes (2017) has combined different higher-order networks to obtain a multi-order graphical model that encodes temporal correlations of varying lengths. We build on these recent developments in network science to contribute to the literature on scientists’ mobility in particular and high-skill labour in general.
The rest of the paper is structured as follows. First, in “Reconstructing mobility paths”, we introduce the data we will be using for the analysis. We then describe, in “Analyzing scientists’ mobility as a network”, how career trajectories may be represented using a network perspective and introduce higher-order network models. In “Quantifying temporal correlations”, we estimate the importance of temporal correlations in our data by using multi-order network models. Moreover, we analyze the implications of the identified temporal correlations for knowledge exchange in “Career trajectories and mixing”. To complement the analysis at the university level, we also test for the existence of temporal correlations at city and country level, in “City and country level”. Finally, we discuss our findings and their implications in the “Discussion” section.
Reconstructing mobility paths
To study the mobility patterns of scientists, we need a large-scale dataset on the institutions at which scientists have worked. We extract this data from publication records. Indeed, the affiliation is a piece of metadata available with virtually every published paper. We rely on this metadata to reconstruct career trajectories. Precisely, we extract the sequence of affiliations for a given disambiguated author from two large bibliographic datasets, namely MEDLINE and MAG. Specifically, for the analysis at the institution level, i.e., the 100 most “prestigious” universities, we use the Microsoft Academic Graph (MAG) dataset, and for the comparison at city and country level, we use the MEDLINE corpus. Extracting affiliation sequences from these corpora yields sequences of timestamped (i.e., by publication date) locations. An example of such a record, as found in the MEDLINE corpus, is shown in Table 1. This record is equivalent to the records extracted from MAG (see Table 2), except for the fact that MAG lists the affiliation but not the city or country.
For the analysis at affiliation level, we use the Microsoft Academic Graph (MAG) released for the KDD Cup competition in 2016. The KDD Cup version of the MAG data contains more than 126 million publications (Sinha et al. 2015). Each publication is also endowed with various attributes such as unique ID, publication date, title, journal ID, author ID, and affiliations. From this data, we extracted the career trajectories of scientists at affiliation level by using the affiliations reported on their publications. For the precise details of the extraction procedure see Verginer and Riccaboni (2018). We obtain 235 935 scientist career trajectories moving through 100 universities. These 100 universities have been disambiguated manually and the list of matched universities is available upon request from the authors. We use the MAG for the analysis at university level because it has disambiguated authors and covers a wide array of journals across fields. This means that especially for US and EU based institutions, we have comprehensive coverage of publications.
However, the MAG has limitations that hinder a reliable analysis of scientists’ careers beyond the 100 top universities. First, the string-based matching employed by MAG is imperfect. For example, the MAG contains different affiliation IDs for the campuses of the University of California (e.g., UCLA, UCI, etc.) and a separate ID for the University of California UC. Hence, when creating career trajectories considering all the affiliation IDs, one finds trajectories with fictitious locations, e.g., UCLA→UC→UCLA→UC. Note that this is not a problem when analyzing top institutions as the trajectory UCLA→UC→UCLA→MIT would become UCLA→MIT. Second, there is no unambiguous definition of what constitutes a research center. For example, the University of California is composed of ten campuses (e.g., UCLA, UCI, etc.), but UC and its campuses have different IDs that do not reflect this hierarchical structure. Given these issues, to study also the careers of scientists not affiliated with the top 100 universities, we do not use affiliations but their location at city and country level. For this purpose, we use MEDLINE.
MEDLINE is the largest publicly available bibliographic database in the life sciences and is maintained by the U.S. National Library of Medicine (NLM). It contains over 26 million papers published in 5,200 journals and 40 languages, goes back in time until 1966 and is continuously updated. The corpus covers research in biomedicine and health predominantly. Moreover, we rely on the high-quality dataset by Torvik and Smalheiser (2009) and Torvik (2015) to aggregate universities to the city and the country level. With this dataset, we can geolocalize the affiliation strings worldwide. The main disadvantage of MEDLINE is that it has a lower journal and discipline coverage.
We are aware that reconstructing career trajectories using bibliographic data is not without issues. First, the disambiguation of authors is never perfect. However, we rely on two high-quality and widely used datasets to address this issue (Sinha et al. 2015; Torvik 2015; Torvik and Smalheiser 2009). Second, publications are a delayed signal of presence and not a live signal, as the submission and publication date are different. This means that the affiliation associated with a scientist in a given year might not be her current place of employment. At the same time, for the present study, we care primarily about the order of affiliations and not their precise timing. Third, we can only talk about scientists’ mobility and not about migration, which would require us to know the nationality of the scientists, which we do not have. We do have information on the country of first publication, which might coincide with the nationality but is not guaranteed to do so.
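The filtering step described above can be illustrated with a minimal sketch. It assumes a list of (publication year, affiliation) records for one disambiguated author; the function name `career_trajectory`, the record layout and the example data are illustrative assumptions and not the extraction procedure of Verginer and Riccaboni (2018). Affiliations outside a chosen set are dropped and consecutive repeats are collapsed, so that UCLA→UC→UCLA→MIT reduces to UCLA→MIT when the generic UC identifier is not part of the set.

```python
from typing import Iterable, List, Set, Tuple


def career_trajectory(
    records: Iterable[Tuple[int, str]],
    keep: Set[str],
) -> List[str]:
    """Return the time-ordered affiliations of one author.

    records: (publication_year, affiliation) pairs (hypothetical layout).
    keep:    affiliations to retain, e.g. a list of top universities.
    """
    trajectory: List[str] = []
    # Sort by publication year; Python's sort is stable, so records from
    # the same year keep their input order.
    for _, affiliation in sorted(records, key=lambda r: r[0]):
        if affiliation not in keep:
            continue  # drop affiliations outside the studied set
        if trajectory and trajectory[-1] == affiliation:
            continue  # collapse consecutive repeats of the same place
        trajectory.append(affiliation)
    return trajectory


# Toy records for a single author (made-up data).
records = [(2001, "UCLA"), (2003, "UC"), (2004, "UCLA"), (2007, "MIT")]
top = {"UCLA", "MIT"}
print(career_trajectory(records, top))  # ['UCLA', 'MIT']
```

Because only the order of affiliations matters for the analysis, the sketch discards the years once the records are sorted.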
Analyzing scientists’ mobility as a network
To study the career trajectories of scientists, we adopt a network perspective. We represent universities as nodes and scientists’ movements between universities as links. By aggregating the career trajectories, we reconstruct the mobility network that we show in Fig. 2. From this figure, we see that all universities belong to the same connected component. This fact implies that there exists a (network) path connecting any pair of universities. Moreover, the network diameter is six, and the average shortest path is about 2. Hence, it appears that a scientist from any university could potentially reach any other university within a small number of steps. In reality, however, both prestige and geography constrain scientists’ career trajectories (Clauset et al. 2015; Verginer and Riccaboni 2018) and thus rule out various potential trajectories. There is a “paradox” between the topological finding of a short average path and the realized possibilities of such short paths, which motivates our more in-depth investigation.
We start our investigation by looking at the temporal correlations in scientists’ career trajectories. Recent studies have shown that mobility patterns with temporal correlations cannot be captured using standard network models (Rosvall et al. 2014; Scholtes et al. 2014; Scholtes 2017; Xu et al. 2016). For example, these studies have shown that random walk models on networks do not capture travel data of passengers well, due to temporal correlations. Indeed, most passengers flying from one airport to another take return flights or pass through specific hubs, and hence, the sequences of airports visited exhibit temporal correlations. To deal with temporal correlations in sequence data, several authors have proposed a new class of network models called higher-order networks (Rosvall et al. 2014; Scholtes et al. 2014; Xu et al. 2016).
Higher-order networks are mathematical objects which retain temporal information normally discarded in standard network models. For example, we see in Fig. 3a four different career trajectories p_{1}=p_{2}={A,C,D} and p_{3}=p_{4}={B,C,E}. If we were to represent these as a simple first-order network (see Fig. 3b), we would imply that the two-step transitions from A to D and from A to E (both via C) are equally likely. However, by doing so, we have discarded the (temporal) information that no trajectory connects A to E. To preserve this information, we represent the trajectories {p_{1},p_{2},p_{3},p_{4}} in a second-order network (see Fig. 3c). In this network, we have four nodes (A−C),(C−D),(B−C) and (C−E) and two links (A−C)→(C−D) and (B−C)→(C−E). With this second-order network, we now respect the (temporal) order implicit in the data. Similar to standard networks, a second-order network may be represented using a transition matrix T^{(2)}. In this matrix, an element \(\left(\mathbf{T}^{(2)}\right)_{ee'}\) is the probability of observing the transition e^{′} (e.g., C→D) after observing the transition e (e.g., A→C).
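The toy example above can be reproduced with a short sketch. It estimates row-normalised transition probabilities of a first-order and a second-order model directly from the four trajectories; the function name `transition_probabilities` and the dictionary-based representation are assumptions made for illustration, not the implementation used in the study.

```python
from collections import Counter, defaultdict

# Toy trajectories from the example: p1 = p2 = (A, C, D), p3 = p4 = (B, C, E).
paths = [("A", "C", "D")] * 2 + [("B", "C", "E")] * 2


def transition_probabilities(paths, order):
    """Row-normalised transition probabilities of a k-th order model.

    A state is a tuple of `order` consecutive locations; for order=2 the
    states are transitions such as ('A', 'C').
    """
    counts = defaultdict(Counter)
    for path in paths:
        states = [tuple(path[i:i + order]) for i in range(len(path) - order + 1)]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


T1 = transition_probabilities(paths, order=1)
T2 = transition_probabilities(paths, order=2)

# First order: from C, the next step is D or E with probability 0.5 each,
# regardless of where the scientist came from.
print(T1[("C",)])
# Second order: after A->C only C->D is observed, after B->C only C->E.
print(T2[("A", "C")], T2[("B", "C")])
```

In the first-order model the origin of a scientist arriving at C is forgotten, whereas the second-order rows keep the two groups of trajectories apart.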
In Fig. 4, we represent scientists’ career trajectories as a second-order network. From this visualization, we immediately note two features. First, we note the halo of points that are many disconnected components of size two. These represent career trajectories that connect only pairs of universities, i.e., trajectories of the type (A−B)→(B−A). Second, the largest connected component contains only 23% of the nodes (i.e., university pairs) and it has a diameter of 24. These two key statistics indicate that the empirical career trajectories do not connect all university pairs, suggesting the existence of constraints in academic mobility.
We argue that the above finding is caused by the time ordering of universities in career trajectories. To show this, we depict the second-order network reconstructed using a “Markovian” temporal network in Fig. 5. This network is constructed from a maximum entropy second-order transition matrix, \(\tilde{\mathbf{T}}^{(2)}\) (Scholtes et al. 2014). An element of this matrix is given by \(\left(\tilde{\mathbf{T}}^{(2)}\right)_{(AC),(CE)} = \left(\mathbf{T}^{(1)}\right)_{A,C} \times \left(\mathbf{T}^{(1)}\right)_{C,E}\), where (T^{(1)})_{A,C} is the observed relative frequency of the transition A→C. By this, we obtain a transition matrix that does not preserve the empirical time ordering of career trajectories. In Fig. 5, the network links correspond to all the non-zero entries of the second-order transition matrix of the Markovian temporal network \(\tilde{\mathbf{T}}^{(2)}\). From this network visualization, one would deduce that all universities are connected through career trajectories, and scientists move freely from one university to another. Indeed, the Markovian second-order network has a diameter of six, and every node belongs to the giant connected component. The differences between the second-order networks reconstructed using the empirical data and T^{(1)} hint at the presence of temporal correlation in scientists’ career trajectories.
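The contrast between the empirical and the Markovian second-order network can be sketched on the toy trajectories of Fig. 3. The snippet below compares only the topology (which links exist), not the link weights; the helper `largest_component_fraction` and the use of weakly connected components are assumptions made for this illustration.

```python
from collections import defaultdict

paths = [("A", "C", "D")] * 2 + [("B", "C", "E")] * 2  # toy data

# First-order edges, i.e. the nodes of the second-order network.
edges = {(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)}

# Empirical second-order links: consecutive transitions seen in a path.
empirical = {
    ((p[i], p[i + 1]), (p[i + 1], p[i + 2]))
    for p in paths for i in range(len(p) - 2)
}
# "Markovian" links: every pair of edges that can be chained, whether or
# not the two-step path was actually observed.
markovian = {(e, f) for e in edges for f in edges if e[1] == f[0]}


def largest_component_fraction(nodes, links):
    """Share of nodes in the largest weakly connected component."""
    neighbours = defaultdict(set)
    for u, v in links:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        best = max(best, size)
    return best / len(nodes)


print(largest_component_fraction(edges, empirical))  # 0.5
print(largest_component_fraction(edges, markovian))  # 1.0
```

Even in this small example, ignoring the observed ordering merges two separate components into one, which mirrors the difference between Fig. 4 and Fig. 5.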
Quantifying temporal correlations
A second-order network captures the temporal correlations of scientists’ trajectories of length two without loss of information. In general, trajectories can have different lengths, and hence, can be represented by higher-order networks of different orders. Then, if we have a sample of career trajectories of different lengths, how do we choose the correct order to model all these trajectories at the same time? To address this problem, we use the multi-order graphical (MOG) models and the statistical test developed by Scholtes (2017). A MOG-model is a combination of higher-order networks up to order k.
For a MOG-model, it is possible to compute its likelihood given its complexity (i.e., degrees of freedom) and the observed trajectories. Then, by using a likelihood ratio test between MOG-models with increasing k, we choose the model with the optimal order k_{opt}. The MOG-model with order k_{opt} retains the statistically significant temporal correlations present in the data without overfitting. For details about the MOG-model and the test see Scholtes (2017).
Note that k_{opt}=1 would imply that a first-order network well represents the data, and hence, the trajectories are not significantly influenced by temporal correlations. In practice, this would mean that the next movement of a scientist depends only on his/her current location. k_{opt}>1, on the other hand, would imply that the next location visited by a scientist depends not only on his/her current location but also on the previous ones. To estimate k_{opt}, we use Pathpy, the open-source path analysis library (Scholtes 2019). When applying the statistical test of Scholtes (2017) to scientists’ career trajectories, we find k_{opt}=2. That means k_{opt}>1, therefore we have statistical evidence that scientists’ mobility exhibits temporal correlations. Precisely, we find that the next university in a career trajectory depends on a scientist’s current and previous universities.
To better understand the source of the identified temporal correlation, we study the temporal motifs of length two. A temporal motif of length two is a subpath of length two in a career trajectory, and it can have the following forms: X→Y→X (type I) and X→Y→Z (type II). Type I motifs represent pieces of career trajectories where a scientist first changes his/her university (X→Y) and then goes back to the previous working place (Y→X). Type II motifs, instead, represent career trajectories of scientists that, after changing university (X→Y), do not go back (Y→Z). For each type of motif, we can compare its empirical probability distribution P^{(emp)} to the one expected from a first-order P^{(1)} and a second-order P^{(2)} network model. For more details on how these distributions are computed, see Additional file 1. If we find that the P^{(emp)} of both motif types are similar to P^{(2)}, but not P^{(1)}, we can conclude that the temporal correlations stem from both motif types. If, on the other hand, only the P^{(emp)} of type I (type II) motifs is similar to P^{(2)}, we can conclude that the temporal correlations stem from type I (type II) motifs.
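The two motif types can be told apart with a few lines of code. The sketch below counts them in a handful of made-up trajectories; it assumes that consecutive entries of a trajectory are always different institutions, and the function name `motif_counts` is hypothetical. How the distributions P^{(emp)}, P^{(1)} and P^{(2)} are actually computed is described in Additional file 1 and is not reproduced here.

```python
from collections import Counter


def motif_counts(trajectories):
    """Count length-two motifs by type.

    Type I:  X -> Y -> X (return to the previous institution).
    Type II: X -> Y -> Z with Z different from X.
    """
    counts = Counter()
    for path in trajectories:
        # Slide a window of three consecutive institutions over the path.
        for x, y, z in zip(path, path[1:], path[2:]):
            counts["I" if z == x else "II"] += 1
    return counts


# Made-up trajectories, for illustration only.
toy = [
    ("Cambridge", "MIT", "Cambridge"),
    ("Cambridge", "MIT", "Oxford"),
    ("Oxford", "Cambridge", "Oxford", "MIT"),
]
print(motif_counts(toy))
```

A trajectory with more than three institutions contributes several motifs, one for each window of three consecutive institutions.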
We start by analyzing the motifs of type I, i.e., of the form X→Y→X. To compare the P^{(emp)}, P^{(1)}, and P^{(2)} of these motifs, we use the Kullback–Leibler (KL) divergence (Kullback and Leibler 1951). This measure is commonly used to describe the similarity between probability distributions of discrete and unordered variables, and hence, it suits our situation well.
For the motifs of type I, the KL-divergence between P^{(emp)} and P^{(1)} is 0.826, while that between P^{(emp)} and P^{(2)} is 0.038. The ratio between these two KL-divergences is ∼21.7 and we name it R_{I}. Hence, the KL-divergence between P^{(emp)} and P^{(2)} is more than 20 times smaller than the divergence between P^{(emp)} and P^{(1)}. This means that the frequency of scientists that go back to their previous working institution is better captured by a second-order network.
For the motifs of type II, the KL-divergence between P^{(emp)} and P^{(1)} is 0.798, while that between P^{(emp)} and P^{(2)} is 0.183. The ratio between these two KL-divergences is ∼4.4 and we name it R_{II}. Hence, the KL-divergence between P^{(emp)} and P^{(2)} is about 4 times smaller than the divergence between P^{(emp)} and P^{(1)}. We find again that the second-order model captures motifs of type II better.
By comparing R_{I} and R_{II}, we also find that R_{I} is about five times larger than R_{II}. This result indicates that the temporal correlations in the second-order model are mostly useful to capture motifs of type I. In the last section of the paper, we discuss and interpret this result.
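The ratios reported above follow directly from the four divergences, as the short sketch below shows. The KL-divergence function is a generic textbook implementation; the direction D(P^{(emp)} || P^{(model)}) and the base-2 logarithm are assumptions, since the text does not state them, and the three-outcome distributions are made up purely to exercise the function.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete
    distributions given as dicts mapping outcomes to probabilities.
    Assumes q[x] > 0 wherever p[x] > 0.
    """
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


# Made-up distributions over three motifs, for illustration only.
p_emp = {"m1": 0.7, "m2": 0.2, "m3": 0.1}
p_1 = {"m1": 0.4, "m2": 0.3, "m3": 0.3}    # hypothetical first-order model
p_2 = {"m1": 0.65, "m2": 0.25, "m3": 0.1}  # hypothetical second-order model
print(kl_divergence(p_emp, p_1) / kl_divergence(p_emp, p_2))

# Ratios recomputed from the divergences reported in the text.
R_I = 0.826 / 0.038    # about 21.7
R_II = 0.798 / 0.183   # about 4.4
print(round(R_I, 1), round(R_II, 1), round(R_I / R_II, 1))
```

The last line confirms that R_{I} is roughly five times R_{II}, in agreement with the comparison made in the text.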
Career trajectories and mixing
In this section, we analyze the effect that temporal correlations have on mixing and diffusion processes unfolding on the scientist mobility network. We split the analysis of scientist trajectories into a qualitative and a quantitative part. For the former, we compare the alluvial diagrams (Lambiotte et al. 2019) of career trajectories created using the first- and second-order network models, while for the latter, we compute their entropy growth (Scholtes et al. 2014). Note that both the alluvial diagrams and the entropy growth describe and quantify the diffusion process unfolding on a temporal network. Specifically, for the case of academic mobility, one might think of a diffusion process in which scientists carry knowledge. The question then arises of how fast and how far this knowledge would propagate in the network through mobility alone. If this happens fast and many institutions are reached, then this corresponds to strong mixing. If, on the other hand, few institutions are reached, then we have weak mixing.
In Fig. 6, we show two alluvial diagrams describing the career trajectories originating from the “University of Cambridge”. In Fig. 6b, we show the alluvial diagram for the Markovian diffusion. Starting from the University of Cambridge at t_{0}, scientists are expected to move to many institutions with similar probabilities, illustrated by the similar thickness of edges from t_{0} to t_{1}. The universities reached at t_{1} are many, suggesting the existence of strong mixing, and from step t_{1} to t_{2}, the mixing is amplified.
In Fig. 6a, we show the alluvial diagram for the empirical diffusion of career trajectories originating from the University of Cambridge. In this panel (a), we see that from t_{0} to t_{1} many institutions are reached, again suggesting the existence of strong mixing. However, moving from t_{1} to t_{2} this trend is reversed, since most paths originating at the University of Cambridge go back there. This is visually shown as the blue edges from t_{1} to t_{2}. The preferred destination reduces the probability to reach many other locations at t_{2}, implying weak mixing.
Note that the segment from t_{0} to t_{1} is identical, albeit ordered differently, for both Fig. 6a and b, since both representations capture the first transitions correctly. The differences in the diffusion process emerge in the second step from t_{1} to t_{2}, when it matters whether the optimal order is k_{opt}=1 or k_{opt}>1. As the comparison of Fig. 6a and b demonstrates, disregarding temporal correlations in the Markovian model leads us to overestimate mixing. Precisely, the return rate is high in Fig. 6a and low in Fig. 6b. This qualitative result is in line with the KL analysis, highlighting the importance of motifs of type I (X→Y→X).
To quantify the expected mixing of the two models given the observed data, we use the entropy growth ratio introduced by Scholtes et al. (2014). The interpretation of this measure is as follows. Given an observed transition e=A→B, we compute the corresponding entropy from the second-order transition matrix, i.e., from the probabilities to transition to the next nodes e^{′}:
\[ h_{e}\left(\mathbf{T}^{(2)}\right) = -\sum_{e' \in E} \left(\mathbf{T}^{(2)}\right)_{ee'} \log \left(\mathbf{T}^{(2)}\right)_{ee'} \]
where E is the set of edges, i.e., possible career moves from one university to another.
The entropy h_{e}(T^{(2)}) quantifies to what extent the next transition in a trajectory is determined by the previous transition e. For example, if a row (T^{(2)})_{e,·} has only one element equal to 1, then the transition is deterministic, and hence, h_{e}(T^{(2)})=0, implying weak mixing. If, instead, all elements of a row are equal, then the next transitions happen uniformly at random, and hence, h_{e}(T^{(2)}) is high, implying strong mixing.
To compute the expected overall mixing, one could argue that the total entropy of the model is the sum of all the transition entropies, i.e., \(\sum_{e\in E} h_{e}\left(\mathbf{T}^{(2)}\right)\). However, this would be only partially correct as some transitions are observed more often, and hence, should have a higher weight. To account for this, we weight each h_{e} by the relative frequency of e. In formula, we have:
\[ H\left(\mathbf{T}^{(2)}\right) = \sum_{e \in E} \pi_{e}\, h_{e}\left(\mathbf{T}^{(2)}\right) \]
where π_{e} is the relative frequency of the transition e in \(\mathcal{S}\), the multiset of paths and subpaths constructed using the empirical data. With the above definition, we can compute the entropy growth ratio between the first- and the second-order models:
\[ \Lambda_{H}\left(\mathbf{T}^{(2)}\right) = \frac{H\left(\tilde{\mathbf{T}}^{(2)}\right)}{H\left(\mathbf{T}^{(2)}\right)} \]
Note that for Λ_{H}(T^{(2)})>1 the diffusion process computed using the first-order network would overestimate the level of mixing. If Λ_{H}(T^{(2)})<1, the opposite is true. We obtain Λ_{H}(T^{(2)})∼3.4>1. This means that the growth of entropy coming from the first-order model is more than 3 times larger than expected from the second-order model. Hence, the mixing would be overestimated by using a first-order model. This result is in line with the qualitative findings from Fig. 6.
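To make the entropy growth ratio concrete, the following sketch computes it for a small set of made-up trajectories. Several choices are assumptions rather than the procedure of the study: entropies are measured in bits, the weights π_{e} are taken over the transitions e that have a successor, and the first-order (Markovian) prediction for the step after e=(X,Y) is taken to be the first-order transition probabilities out of Y.

```python
import math
from collections import Counter, defaultdict

# Made-up trajectories: scientists arriving at C from A mostly continue to D,
# those arriving from B mostly continue to E.
paths = (
    [("A", "C", "D")] * 3 + [("A", "C", "E")]
    + [("B", "C", "E")] * 3 + [("B", "C", "D")]
)

second = defaultdict(Counter)  # counts of e -> e' with e = (x, y), e' = (y, z)
first = defaultdict(Counter)   # counts of y -> z
for p in paths:
    for x, y in zip(p, p[1:]):
        first[x][y] += 1
    for x, y, z in zip(p, p[1:], p[2:]):
        second[(x, y)][(y, z)] += 1


def entropy(counter):
    """Shannon entropy (bits) of the distribution given by the counts."""
    total = sum(counter.values())
    return -sum(n / total * math.log2(n / total) for n in counter.values())


# pi_e: relative frequency of each transition e that has a successor.
n_e = {e: sum(row.values()) for e, row in second.items()}
total = sum(n_e.values())

# Empirical second-order model: the row of e depends on the whole of e.
H_second = sum(n / total * entropy(second[e]) for e, n in n_e.items())
# First-order (Markovian) prediction: the row of e = (x, y) depends on y only.
H_first = sum(n / total * entropy(first[e[1]]) for e, n in n_e.items())

ratio = H_first / H_second
print(round(ratio, 3))  # about 1.23
```

A ratio above 1 means that the first-order model spreads probability over more next steps than the data support, i.e., it overestimates mixing. In this toy case the effect is mild; the value of about 3.4 reported in the text indicates a much stronger effect in the empirical data.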
City and country level
We now move our analysis from the top 100 universities to the city and country level. For this analysis, we aggregate trajectories traversing institutions located in the same cities or countries. By not limiting the analysis to a specific set of universities, we increase the number of observed trajectories and scientists. Hence, we should expect more reliable statistics. At the same time, we should be aware that aggregating and projecting sequential data can distort its temporal properties as discussed previously, see also Scholtes et al. (2014).
We restrict our attention to 3 740 187 individual scientist trajectories across 215 countries between 1990 and 2009 with papers listed in MEDLINE. 89% of the trajectories are of length 0, meaning that most scientists do not change country. However, this statistic also means that longer career trajectories are rare but still numerous (411 137). The most frequent trajectories of length one are between the UK and USA, Japan and the USA, and the USA and the UK. The presence of the US in these paths stems from the fact that the US has the largest scientist population in the dataset.
When considering trajectories of length two, the most frequent ones are between (Japan, USA, Japan), (USA, UK, USA), and (UK, USA, UK). If we consider only those trajectories that do not go through the USA, we find that the most frequent trajectories of length two are across (UK, Australia, UK), (France, UK, France) and (Germany, UK, Germany). All these types of trajectories are reminiscent of the motifs of type I, i.e. X→Y→X, found at the institution level. When looking at trajectories at the city level, we find a similar pattern (see Additional file 1). Therefore, we test for the existence of temporal correlations as we have done at the institution level.
By applying the multi-order modelling framework of Scholtes (2017), we identify k_{opt}=2, implying that a second-order network best represents the data at the country level. When applying the same framework at city level, we identify k_{opt}=1. Hence, we find evidence of temporal correlations at the country level, but not at the city level. This result has several implications. First, even though we observe a large number of trajectories of type I at both city and country level, these trajectories are still to be expected from a first-order network model at city, but not at country level. Second, temporal correlations have been detected at the affiliation and country, but not at city level. Therefore, we find that the concerns raised by Butts (2009), Scholtes (2017) and Zweig (2011) about the naive application of network-analytic models apply to research on academic mobility.
Discussion
Scientific knowledge is shared not only through artefacts, e.g., publications, but also through scientists carrying out their research. Thus, their mobility is thought to facilitate knowledge exchange. “Mobility as a vehicle of knowledge exchange” implicitly assumes that academic careers can span the globe and universities. In this work, we have addressed this implicit assumption by testing for the existence of temporal correlations in scientific careers. The existence of such correlations would imply the existence of constraints in academic mobility.
To perform our analysis we have reconstructed, from two large bibliographic datasets, MAG (Sinha et al. 2015) and MEDLINE (Torvik and Smalheiser 2009; Torvik 2015), the career trajectories across the 100 top universities and across cities and countries, respectively. We do not find evidence that temporal correlations influence career trajectories at city level. However, we do find evidence that mobility, especially among elite institutions, is influenced by temporal correlations.
At the university level, we find that the mobility of scientists is affected by temporal correlations. In particular, a large part of these correlations is determined by career trajectories of type I, i.e., scientists return to their original university after two career steps. Indeed, the KL-divergence ratio R_{I} for these motifs is five times larger than R_{II}. This result has a direct impact on the knowledge exchange as proxied by scientists’ mobility. Specifically, the implied mixing of knowledge through mobility is considerably lower than one would expect when using a standard network model. To quantify this statement, we have computed the entropy growth ratio Λ_{H}(T^{(2)}), which captures how much a standard network model overestimates the trajectory mixing. We find that Λ_{H}(T^{(2)})∼3.4, showing that a simple network model would overestimate the mixing by a factor of more than 3. A possible but rather pessimistic explanation is that a simple mental shortcut employed by hiring committees to favour “known origins” reinforces the path dependency and possibly hampers knowledge exchange.
At the city and country level, we find a large number of career trajectories of type I. However, at city level, we do not find statistically significant temporal correlations. This result indicates that type I trajectories or returnees (Agrawal et al. 2006, 2011; Saxenian 2005) are still to be expected from a standard network model. Additionally, the use of a standard network perspective is justified when studying academic mobility at city, but not at country level.
Overall, from the results presented in this work, a picture emerges that scientific careers are considerably less diverse than one would expect. The existence of temporal correlations suggests that, although there is substantial and continued exchange among the 100 most prestigious institutions, the actual career trajectories are far more limited. Our results are relevant both methodologically and empirically, for research on high-skill labour mobility in general, and the analysis of scientists’ career trajectories in particular.
Abbreviations
KDD: Knowledge discovery and data mining
KL: Kullback–Leibler
MAG: Microsoft academic graph
MIT: Massachusetts Institute of Technology
MEDLINE: Medical literature analysis and retrieval system online
OECD: Organisation for economic co-operation and development
UCI: University of California Irvine
UCLA: University of California Los Angeles
References
Agrawal, A, Cockburn I, McHale J (2006) Gone but not forgotten: Knowledge flows, labor mobility, and enduring social relationships. J Econ Geogr 6(5):571–591. http://dx.doi.org/10.1093/jeg/lbl016. http://arxiv.org/abs/arXiv:1011.1669v3.
Agrawal, A, Kapur D, McHale J, Oettl A (2011) Brain drain or brain bank? The impact of skilled emigration on poor-country innovation. J Urban Econ 69(1):43–55. https://doi.org/10.1016/j.jue.2010.06.003. http://arxiv.org/abs/arXiv:1011.1669v3.
Bahar, D, Hausmann R, Hidalgo C (2012) International Knowledge Diffusion and the Comparative Advantage of Nations. SSRN Electron J 2014. https://doi.org/10.2139/ssrn.2087607.
Beechler, S, Woodward IC (2009) The global “war for talent”. J Int Manag 15(3):273–285. https://doi.org/10.1016/J.INTMAN.2009.01.002.
Beine, M, Docquier F, Rapoport H (2001) Brain Drain and Economic Growth: Theory and Evidence. J Dev Econ 64(1):275–289. https://doi.org/10.1016/S03043878(00)001334.
Boucher, A, Cerna L (2014) Current Policy Trends in Skilled Immigration Policy. Int Migr 52(3):21–25. https://doi.org/10.1111/imig.12152.
Butts, CT (2009) Revisiting the foundations of network analysis. Science 325(5939):414–416.
Chambers, E, Foulon M, Handfield-Jones H, Hankin S, Michael III E (1998) The war for talent. McKinsey Q 3:44–57. https://doi.org/10.1080/03071840308446873.
Clauset, A, Arbesman S, Larremore DB (2015) Systematic inequality and hierarchy in faculty hiring networks. Sci Adv 1(1):1400005.
Deville, P, Wang D, Sinatra R, Song C, Blondel VD, Barabási AL (2014) Career on the move: Geography, stratification, and scientific impact. Sci Rep 4:1–7. https://doi.org/10.1038/srep04770.
Kullback, S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86.
Lambiotte, R, Rosvall M, Scholtes I (2019) From networks to optimal higher-order models of complex systems. Nat Phys 15(4):313–320.
Lentz, HH, Selhorst T, Sokolov IM (2013) Unfolding accessibility provides a macroscopic approach to temporal networks. Phys Rev Lett 110(11):118701.
Miguélez, E, Moreno R (2014) What attracts knowledge workers? The role of space and social networks. J Reg Sci 54(1):33–60. https://doi.org/10.1111/jors.12069.
OECD (2017) OECD Science, Technology and Industry Scoreboard 2017: The digital transformation. OECD Publishing, Paris. https://doi.org/10.1787/9789264268821en.
Petersen, AM (2018) Multiscale impact of researcher mobility. J R Soc Interf 15(146):20180580. https://doi.org/10.1098/rsif.2018.0580.
Pfitzner, R, Scholtes I, Garas A, Tessone CJ, Schweitzer F (2013) Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks. Phys Rev Lett 110(19):198701.
Rosvall, M, Esquivel AV, Lancichinetti A, West JD, Lambiotte R (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nat Commun 5:4630.
Saxenian, A (2005) From Brain Drain to Brain Circulation: Transnational Communities and Regional Upgrading in India and China. Stud Comp Int Dev 40(2):35–61. https://doi.org/10.1007/BF02686293.
Scholl, T, Garas A, Schweitzer F (2018) The spatial component of R&D networks. J Evol Econ 28(2):417–436.
Scholtes, I (2017) When is a network a network?: Multi-order graphical model selection in pathways and temporal networks In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1037–1046. Association for Computing Machinery, New York. https://doi.org/10.1145/3097983.3098145.
Scholtes, I (2019) Python package pathpy. https://github.com/uzhdag/pathpy. Retrieved 2 Jan 2020.
Scholtes, I, Wider N, Pfitzner R, Garas A, Tessone CJ, Schweitzer F (2014) Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal networks. Nat Commun 5:5024.
Sinha, A, Shen Z, Song Y, Ma H, Eide D, Hsu BjP, Wang K (2015) An overview of Microsoft Academic Service (MAS) and applications In: Proceedings of the 24th International Conference on World Wide Web, 243–246. Association for Computing Machinery, New York. https://doi.org/10.1145/2740908.2742839.
Torvik, VI (2015) MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide. D-Lib Mag 21(11/12). https://doi.org/10.1045/november2015torvik.
Torvik, VI, Smalheiser NR (2009) Author name disambiguation in MEDLINE. ACM Trans Knowl Discov Data 3(3):1–29. https://doi.org/10.1145/1552303.1552304.
Vaccario, G, Luca V, Schweitzer F (2018) Reproducing scientists’ mobility: A data-driven model. arXiv:1811.07229. http://arxiv.org/abs/arXiv:1811.07229. Retrieved 2 Jan 2020.
Verginer, L, Riccaboni M (2018) Brain-Circulation Network: The Global Mobility of the Life Scientists. Working Papers 10/2018, IMT Institute for Advanced Studies Lucca. https://ideas.repec.org/p/ial/wpaper/42018.html. Retrieved 2 Jan 2020.
World Reputation Rankings (2015). https://www.timeshighereducation.com/worlduniversityrankings/2015/reputationranking#!/page/0/length/100/sort_by/rank/sort_order/asc/cols/undefined. Retrieved 2 Jan 2020.
Xu, J, Wickramarathne TL, Chawla NV (2016) Representing higher-order dependencies in networks. Sci Adv 2(5):1600028.
Zweig, KA (2011) Good versus optimal: Why network analytic methods need more systematic evaluation. Cent Eur J Comput Sci 1(1):137–153. https://doi.org/10.2478/s135370110009x.
Acknowledgments
The authors thank Dr. Giona Casiraghi for his useful suggestions on how to present the analysis for the entropy growth ratio.
Funding
Not applicable.
Contributions
GV, LV and FS conceived the study. GV and LV performed the analysis. GV, LV and FS wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1
Appendix.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Vaccario, G., Verginer, L. & Schweitzer, F. The mobility network of scientists: analyzing temporal correlations in scientific careers. Appl Netw Sci 5, 36 (2020). https://doi.org/10.1007/s4110902000279x
DOI: https://doi.org/10.1007/s4110902000279x
Keywords
 Global mobility
 Scientist careers
 Spatial networks
 Temporal correlations
 Multi-order graphical models