The academic wanderer: structure of collaboration network and relation with research performance

Paraskevopoulos, Pavlos; Boldrini, Chiara; Passarella, Andrea; Conti, Marco

doi:10.1007/s41109-021-00369-4

Research
Open access
Published: 14 April 2021

Pavlos Paraskevopoulos¹, Chiara Boldrini¹, Andrea Passarella¹ & Marco Conti¹

Applied Network Science volume 6, Article number: 31 (2021)

Abstract

Thanks to the widespread availability of large-scale datasets on scholarly outputs, science itself has come under the microscope with the aim of capturing a quantitative understanding of its workings. In this study, we leverage well-established cognitive models coming from anthropology in order to characterise the personal network of collaborations between scientists, i.e., the network considered from the standpoint of each individual researcher (referred to as ego network), in terms of the cognitive investment they devote to the different collaborations. Building upon these models, we study the interplay between the structure of academic collaborations, academic performance, and academic mobility at different career stages. We take into account both purely academic mobility (i.e., the number of affiliation changes) and geographical mobility (i.e., physical relocations to different countries). For our investigation, we rely on a dataset comprising the geo-referenced publications of a group of 81,500 authors extracted from Scopus, one of the biggest repositories of academic knowledge. Our main finding is that there is a clear correlation between the structure of co-authorship ego networks and academic performance indices: the more one publishes and the higher their impact, the larger their collaboration network. However, we observe a capacity bound effect, whereby, beyond a certain point, higher performances become increasingly less correlated with large collaboration networks. We also find that international academic migrants are better at growing their networks than researchers who only migrate within the same country, but the latter seem to be better at exploiting their collaborations to achieve higher impact. High academic mobility does not appear to translate into better academic performance or larger collaboration networks. This finding differs from the related literature, where scientific productivity is seen as directly linked to mobility. Our results show that, when looking at the impact of research, this is not necessarily the case.

Introduction

Science of science (Fortunato et al. 2018) is a recent discipline that aims to get a quantitative large-scale understanding of the ins and outs of the scientific endeavour, leveraging the unprecedented availability of digital data on scholarly output. Science of science touches upon several issues, such as the tension between innovation (high-risk, high-gain) and tradition (Foster et al. 2015), citation dynamics (Eom and Fortunato 2011), gender inequalities (Moss-Racusin et al. 2012), and the impact of teamwork (Wuchty et al. 2007). The most relevant to this work is the topic of career dynamics, which entails studying how scientific careers unfold over the years, often in response to milestone events like promotions or a move to a different institution. Scientific mobility can, indeed, yield major career opportunities for a scientist, like access to better research facilities or collaborations with prestigious colleagues. For this reason, it has attracted a lot of attention from the research community (Deville et al. 2014; Sugimoto et al. 2017; Petersen 2018). At the same time, each movement may affect the network of already-established collaborations between co-authors (Petersen 2018), adding new ones or churning old collaborations. Co-authorship partnerships have been shown to play a significant role in a scientist’s career, in special cases (referred to as super ties) contributing to above-average productivity and a 17% citation increase per publication (Petersen 2015). For this reason, they are a very important dimension to investigate.

Since academic mobility has the potential to disrupt academic partnerships, as geographic distance and different academic incentives at different institutions may make collaborating harder, in this work we study the effects of academic mobility on co-authorship collaborations. From the mobility standpoint, we consider both pure academic mobility (i.e., a change in affiliation) and geographical mobility (e.g., a relocation to a different country). For studying the collaboration networks of scientists, we leverage results from anthropology, and specifically the well-studied cognitive model known as the social brain hypothesis. The main idea behind the model is that our brain is allocated a certain capacity for social interactions and that this capacity is distributed unevenly across our social relationships. In particular, we are able to interact meaningfully (i.e., to engage in nurturing a relationship at least at the level of sending birthday/Christmas wishes once a year (Hill and Dunbar 2003)) with just a handful of our overall social relationships. This number, whose average is around 150 (Dunbar 1998), is widely known as Dunbar’s number. While traditionally associated with purely social contexts, Dunbar’s number has been shown to emerge also in many types of organizations (Dunbar 2010). Dunbar’s theories also characterise how our cognitive efforts are allocated to these 150 peers (typically referred to as alters). Specifically, these 150 relationships are structured in concentric groups (known as circles) around each individual, and each group is roughly three times larger than the one it contains. In fact, the typical sizes of the circles are 1.5, 5, 15, 50, 150. The innermost circles contain the strongest relationships, while the opposite holds for the outermost ones (Hill and Dunbar 2003; Zhou et al. 2005). Notably, this hierarchical social structure is known to impact significantly on who we trust (Sutcliffe et al. 2015), on the way information spreads in social networks (Arnaboldi et al. 2014), and on the diversity of information that can be acquired by users (Aral and Van Alstyne 2011). This layered social structure is typically represented using an ego network graph (Hill and Dunbar 2003; Everett and Borgatti 2005; Lin et al. 2001; McCarty 2002), in which the individual, referred to as ego, is at the center of the graph, and the edges connect her to the peers (alters) with whom she interacts. Figure 1 illustrates the classical ego network structure. Interestingly, this ego network structure has been confirmed also for co-authorship relationships (Arnaboldi et al. 2016) (please refer to "(Ego) networks of collaborations" section for details).

Building upon these results, in this work we set out to study the interplay between co-authorship ego networks, academic performance, and academic mobility. To the best of our knowledge, this problem has not been considered before in the related literature. For our study, we leverage a dataset extracted from the Scopus repository comprising all the publications of 81,500 scholars at different career stages (ranging from young researchers to distinguished professors). We characterise each scientist based on their academic performance (scientific productivity and impact) as well as their mobility profile (in terms, e.g., of number of institutions changed during the career and domestic vs international mobility). Then, we investigate how these different dimensions of a researcher’s career correlate with the structure of their collaboration networks (modelled as ego networks). Specifically, we tackle the following research questions:

RQ1: Is the productivity of a researcher associated with larger co-authorship circles in their ego network?
RQ2: Does the impact of a researcher translate into larger co-authorship circles?
RQ3: Is the mobility (international/domestic) of an author associated with larger collaboration circles? And how is it related to academic performance?
RQ4: Does any cognitive capacity bound, similar to those existing in social relationships, manifest also in terms of collaboration network structure?

Our main findings are the following:

In general, there is a clear positive correlation between sizes of layers and performance indices (both productivity and impact): the more one publishes or the higher their impact, the larger the layers of their ego network. However, the correlation is particularly high only for the external layers, in agreement with the theory on the different cognitive effort that is allocated to the collaborations in the different circles. The outermost circles only require low effort, while the innermost ones require intense nurturing, hence they cannot be grown at will.
The existence of a capacity bound effect on the size seems confirmed: beyond a certain point (in terms of performance) higher performances are less and less correlated with large collaboration networks. This confirms at a higher level of detail the similarity between cognitive constraints in general social networks and in scientific collaborations. Also, this result seems to support the findings in Petersen (2015) about stable partnerships as a strategy for success.
International and domestic migrants show different efficiency levels in exploiting their collaboration networks, respectively for productivity and impact. While international migrants seem more effective in exploiting larger networks to achieve a higher number of publications, domestic migrants seem to be more effective in exploiting them to achieve high impact. This suggests that international migrations (which typically require more effort on the side of the migrating person) are not always particularly effective in terms of impactful research.
International migrants enjoy a boost in their co-authorship ego networks earlier in their career than domestic migrants. The latter, however, are able to catch up and end their career at comparable ego network levels.
The most mobile authors neither publish more nor increase the number of their collaborations. Moreover, they are not able to capitalise on mobility in terms of impact: for them, bigger ego network sizes are less correlated with high impact than for regular authors.

The rest of the paper is organized as follows. In "Related work" section we present the related work. In "(Ego) networks of collaborations" section, we introduce the theory behind the concept of ego networks and we discuss why it is relevant for co-authorship networks. In "The dataset" section we overview the available repositories of academic knowledge and we discuss why the Scopus one, used in this paper, offers a clear advantage. We also explain how we collect and pre-process the data. In "Methodology" section we introduce the concept of researcher profile and we present the methodology used for analysing the data. Our findings are then discussed in "Results" section. Finally, "Conclusion" section concludes the paper.

Related work

Many studies have focused on the analysis of the effects of academic migrations and the variety of parameters that affect them. Moed et al. (2013) study the migration flows from one country to another and how language similarities affect them. Five countries (Germany, Italy, the Netherlands, UK, and USA) are considered, and authors are observed for a period of ten years. Moed et al. observe that increased international migration corresponds to increased international collaboration, but that the two processes are somewhat distinct. They also show that language similarity is one of the most important factors in relocation decisions. This is one of the few works in the literature that leverages Scopus data for the analysis of scientific mobility, as we do in this paper. Urbinati et al. (2019) aim at identifying the countries that play a central role in the international migrations of researchers. Recently, González et al. (2020) have used the Scopus data to study internal scholarly migrations in Mexico.

Petersen (2018) aims at measuring shifts in researcher profiles before and after a cross-border mobility event. For this study, the author uses the APS Research Review dataset, comprising information about 237,038 physicists and 425,369 scientific papers (filtered such that only 26,170 researchers matched the target career longevity and productivity criteria). Petersen’s focus is on geographic displacements and their effect on the number of citations and collaboration network. The take-home messages of this work are that (1) migration is associated with a significant churning in the collaboration network, up to the point where all former collaborations are curtailed (this happens only in 11% of cases), (2) the professional ties created after a migration event are less strong, hence people will tend to move more also in the future, (3) career benefits of mobility (measured in terms of number of citations and diversity of research topics and collaborations) are common to all ranks, not just elite scientists. Relying on data coming from the Web of Science database, Sugimoto et al. (2017) set out to study the benefits of global researcher mobility. They found that only a small portion of researchers in their dataset had more than one affiliation. And yet, mobile scholars always achieve higher citation rates than non-mobile ones. Researchers that do move do not sever their ties with their country of origin but instead work as bridges between different countries, keeping connections to their home country and sometimes eventually going back to it. However, Sugimoto et al. focus on authors that published their first paper after 2008, hence implicitly targeting only young researchers. Franzoni et al. (2014) focus on the effects of academic mobility on research performance (measured in terms of the impact factor of the papers). This study focuses on 4 specific research fields and leverages data from 16 countries. They highlight the positive effects on the career for researchers that migrate. This work relies on survey data comprising 14,299 scientists.

Researcher mobility is considered in Deville et al. (2014), James et al. (2018) and Vaccario et al. (2019), but from a different perspective. James et al. (2018) tackle the problem of predicting the next career move of a researcher. Vaccario et al. (2019) reconstruct the temporal graph of worldwide movements of scientists and study its properties. Deville et al. (2014) measure the impact of a movement to a higher- or a lower-ranked institution. The authors find that a movement to a lower-ranked institution is likely to negatively affect the profile of a researcher. The opposite, however, does not hold: the impact of authors that move to highly-ranked institutions remains the same.

The geographical composition of the collaboration networks is considered in Abramo et al. (2011), Kato and Ando (2013) and Panzarasa and Opsahl (2008). The main research question in this body of work is whether international collaborations are associated with better research performance with respect to national ones. A positive correlation is found by Abramo et al. (2011) and confirmed by Kato and Ando (2013). Panzarasa and Opsahl (2008) find that geographic distance among collaborators has positive effects on research performance (but the main focus of their work is the interplay between multi-authorship and specialization).

The goal of our contribution is to study the interplay among co-authorship ego networks, academic performance, and academic migrations. In contrast to Moed et al. (2013), Urbinati et al. (2019) and González et al. (2020), we are not interested in scholarly migration flows from the countries’ perspective. Nor do we focus on the ranking and reputation of institutions [as in Deville et al. (2014)] or on the prediction of the next career move (James et al. 2018). Differently from Abramo et al. (2011), Kato and Ando (2013) and Panzarasa and Opsahl (2008), we are interested in the migration of the “ego” author, not in the geographical affiliation of its collaborators (or alters, in ego network terminology). Closest to our work are Sugimoto et al. (2017), Petersen (2018) and Franzoni et al. (2014), since their main focus is on individual researchers and the relationship between their academic migrations and their scholarly performance. Differently from previous studies, though, we study co-authorship networks through the lens of ego networks (which provide a well-established framework for capturing the different relationship intensity), and we analyse the effect of both cross-border and domestic academic mobility on the structure of researchers’ co-authorship ego networks. Furthermore, we split the analysis into six career groups, based on the duration of the career of each researcher. Finally, we separately analyze exceptional cases such as researchers that tend to publish a lot or that have an exceptionally high number of collaborations or exceptionally high mobility, to distinguish their behaviour from that of the “average” researcher at the same career stage.

Please note that this paper builds upon our previous work in Arnaboldi et al. (2016) but it is not an extension of it. In Arnaboldi et al. (2016) we have validated the presence of ego network structures [see "(Ego) networks of collaborations" section for more details on the work carried out in Arnaboldi et al. (2016)] in co-authorship networks. Here we provide an in-depth analysis of how such structures change depending on the career stage, productivity, impact, and mobility of each researcher. A preliminary analysis of the effect of mobility on the ego networks of researchers has been recently presented in Paraskevopoulos et al. (2020), where we have investigated the variation of the size and composition of scholarly ego networks before and after a movement. In that work, though, neither the layered structure of ego networks nor the different types of migrations were studied. The results in Paraskevopoulos et al. (2020) can be considered as a preliminary dynamic analysis of how ego networks change upon movements, while the present work is an investigation of the static ego networks across the whole career. Thus, the two works provide distinct but complementary views on co-authorship ego networks.

(Ego) networks of collaborations

Each co-authored paper marks a milestone in a collaboration relationship. When a paper is out in the academic world, its authors have already invested a lot of their time and mental resources in its development. While it is well accepted that academic migrations foster new collaborations and novel results, there might be a price to pay in terms of old collaborations. Relationships may also be affected by geographical distance. Difficulty in interactions due to lack of face-to-face opportunities and time zone difference may also affect the collaboration network of more mobile researchers differently than for less mobile ones. Since the local network of collaborations features properties similar to those of the purely social one [as shown in Arnaboldi et al. (2016)], it is reasonable to expect that its dynamics may be similar as well. Specifically, findings from anthropology state that social relationships have a maintenance cost (Roberts and Dunbar 2015) and that humans can invest in a defined personal relationship with a limited number of individuals. As a result, as more relationships are added, the intimacy with old relationships inevitably decreases.

The availability of large-scale longitudinal data about researcher mobility and collaborations offers a unique playground for studying the effect of mobility on collaboration relationships. In previous work, co-authorship has been treated as a binary relationship between a pair of researchers, without considering the intensity of the collaboration. However, considering the intensity of collaborations may provide novel insights into how researchers structure their work-related network. At the same time, considering collaboration intensity allows us to study the effect of academic mobility on such networks. In the remainder of this section, we provide a brief overview of the main results from the social sciences ("Background" section) and we discuss the ego network model for co-authorship networks ("Ego networks of scientists" section).

Background

According to the social brain hypothesis from anthropology (Dunbar 1998), the social life of primates is constrained by the size of their neocortex. Specifically, for humans, the typical group size is estimated around an average of 150 members, a limit that goes under the name of Dunbar’s number. This limit is related to the cognitive capacity that humans are able to allocate to nurturing their social relationships, and it originates from the time and efforts that primates could devote to grooming. The 150 social relationships do not include acquaintances, but only people with which a coherent quality relationship is entertained. In a modern society, this entails, at the very least, exchanging birthday or holiday cards every year (Hill and Dunbar 2003).

Unsurprisingly, humans are not fair in how they distribute their cognitive attention among the 150 important persons they have around them. On the contrary, it is possible to group our social relationships in circles of increasing intimacy, as illustrated in Fig. 1. This layered social structure is typically referred to as ego network (Hill and Dunbar 2003; Everett and Borgatti 2005; Lin et al. 2001; McCarty 2002). The individual (ego) is at the center of the graph, and the edges connect her to the peers (called alters) with whom she interacts. Typically, each of us has at least four circles of intimacy (Hill and Dunbar 2003; Zhou et al. 2005), with the innermost one (support clique) including our close family and the outermost one (active network) made up of all the people one meaningfully interacts with at least once a year. The circles are conventionally concentric, with the inner layers contained in the outer ones. Many people also feature an additional inner layer (contained in the support clique) comprising on average 1.5 alters for which they have a very high emotional investment (Dunbar et al. 2015). The ego-alter tie strength is typically computed as a function of the frequency of interactions between the ego and the alter. The ego network structure is characterised by a striking regularity, with an approximately constant ratio of 3 between the size of consecutive layers.
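The scaling regularity mentioned above can be checked with a few lines of arithmetic. The snippet below is only an illustration: it uses the typical circle sizes quoted in this paper (1.5, 5, 15, 50, 150) and computes the ratio between consecutive circles. The variable names are ours and not part of the original model.

```python
# Typical circle sizes quoted in the text, innermost to outermost.
circle_sizes = [1.5, 5, 15, 50, 150]

# Ratio between each circle and the one it contains.
ratios = [outer / inner for inner, outer in zip(circle_sizes, circle_sizes[1:])]

for inner, outer, ratio in zip(circle_sizes, circle_sizes[1:], ratios):
    print(f"{inner:>5} -> {outer:>5}: ratio {ratio:.2f}")

mean_ratio = sum(ratios) / len(ratios)
print(f"mean ratio: {mean_ratio:.2f}")
```

The individual ratios alternate between 3 and about 3.33, which is why the regularity is described as approximate.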

The social brain theory was developed for the offline world, in which relationships were nurtured via face-to-face interactions, letters, or landline phone calls. However, Dunbar’s model was confirmed to hold also on several non-face-to-face communication means, such as email communications (Haerter et al. 2012), mobile phone calls (Miritello et al. 2013), and Online Social Networks (Dunbar et al. 2015; Gonçalves et al. 2011).

Ego networks of scientists

More interestingly with respect to the scope of this paper, many types of social organizations are arranged around the layered structure discussed in the previous section. This is the case of, among others, modern military organizations (Zhou et al. 2005) and investors networks (Johansen et al. 1999). In a previous work (Arnaboldi et al. 2016), we have confirmed the existence of this layered structure for academic collaborations as well. Our goal here is not to validate the presence of similar layered structures for co-authorship networks, as this has been already done in Arnaboldi et al. (2016). Our goal is to understand how this structure is related to the academic performance and the mobility profile of researchers, especially as far as the size of the layers is concerned. To this aim, we recall below the reference ego network model for academic collaborations defined in Arnaboldi et al. (2016).

We consider the researcher i as the ego and all the co-authors appearing in the research publications of the ego as the alters. From all the publications on which the ego i and the alter j have worked together, the intensity (i.e., tie strength) $t_{ij}$ of the collaboration between i and j can be computed as follows:

$$t_{ij} = \frac{1}{d_{ij}} \sum _{p \in P(i) \cap P(j)} \frac{1}{k(p)-1}, \tag{1}$$

where $d_{ij}$ represents the duration of the co-authorship relation (i.e., the time between the first paper co-authored by i and j and the last paper published by the ego i), k(p) denotes the number of authors of paper p, and P(i) denotes the set of papers authored by researcher i. As a filtering parameter, in order to have a sufficiently long observation period for each collaboration, we consider as adequate only the co-authorship relations longer than six months, thus discarding those that have been initiated in the last six months of the ego’s career. We have tested thresholds of both six and twelve months. The difference, on this specific dataset, was negligible, hence we opted for the 6-month threshold in analogy with Arnaboldi et al. (2016).

Each co-authored paper refreshes and nurtures the relationship, hence it adds a quantum of tie strength to the collaboration. This quantum is inversely proportional to the number of co-authors, thus this definition of tie strength embodies the notion that a smaller number of co-authors typically implies stronger collaborations. Note that the tie strength in Eq. 1 is basically a frequency of common co-authorship between i and j (modified according to the number of co-authors), as we divide by the length of time $d_{ij}$ since the co-authorship has been established. The latter implies that this tie strength definition factors in the ups and downs of collaborations, as it decreases if no new co-authored publications are produced over time. Analyses leveraging this tie strength definition are usually referred to as static, because this definition yields an average picture of the state of the collaboration for its entire duration. The complementary analysis, referred to as dynamic (Arnaboldi et al. 2013), entails studying the variation of the tie strength over time. This approach would be suitable, for example, to investigate whether the academic status of a collaboration affects the future evolution of the co-authorship ego network, or how the overall collaboration network growth process affects individual ego networks over time. Investigating such dynamic aspects for scholarly collaboration is left as future work. In this work, we focus on the static analysis, which has been shown to be very effective in identifying social cognitive constraints in the related literature (see "Background" section). Finally, note that the frequency of interaction is one of the typical measures of tie strength used in the literature (Dunbar et al. 2015). This definition also implies that we are working with weighted networks.
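To make Eq. 1 and the six-month filter concrete, the following is a minimal sketch and not the code used for the analysis. It assumes that each paper of the ego is available as a pair (publication date, set of author identifiers including the ego), that $d_{ij}$ is measured in years, and that six months corresponds to 183 days. The function name `tie_strength`, the parameter `min_days`, and the toy data are hypothetical.

```python
from datetime import date

def tie_strength(ego_papers, alter, min_days=183):
    """Tie strength of Eq. 1 between the ego and one alter.

    ego_papers: list of (publication_date, set_of_author_ids) for every paper
        of the ego; each author set is assumed to include the ego.
    Returns None when the alter never co-authored with the ego or when the
    relation is not longer than min_days (assumed six-month filter).
    """
    shared = [(d, authors) for d, authors in ego_papers if alter in authors]
    if not shared:
        return None
    first_shared = min(d for d, _ in shared)
    last_ego_paper = max(d for d, _ in ego_papers)
    duration_days = (last_ego_paper - first_shared).days
    if duration_days <= min_days:
        return None
    duration_years = duration_days / 365.25  # assumed unit for d_ij
    # Each shared paper p contributes 1 / (k(p) - 1).
    contribution = sum(1.0 / (len(authors) - 1) for _, authors in shared)
    return contribution / duration_years

# Toy publication record (made-up data).
papers = [
    (date(2010, 1, 1), {"ego", "a", "b"}),
    (date(2012, 1, 1), {"ego", "a"}),
    (date(2020, 1, 1), {"ego", "c"}),
]
strength_a = tie_strength(papers, "a")  # two shared papers over about 10 years
strength_b = tie_strength(papers, "b")  # one three-author paper
strength_c = tie_strength(papers, "c")  # started with the ego's last paper
print(strength_a, strength_b, strength_c)
```

In this toy record, alter "a" ends up with a stronger tie than "b" because of the additional two-author paper, while "c" is discarded by the filter because the collaboration started with the ego's last paper.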

Having computed the tie strength between the ego and the alters, we cluster the co-authors based on it, in order to obtain the circles of collaborations. For getting the optimal break points of each cluster, we use the Jenks natural breaks classification method (Jenks 1977). As we will see in "Results" section, five is typically the optimal number of circles for our dataset (this is in line with previous results, as discussed in "(Ego) networks of collaborations" section). Having confirmed this, we adopt the common practice in the related literature (Arnaboldi et al. 2016): we map all the egos’ collaborations into 5 circles, again computing the break points of the fixed 5 clusters with the Jenks method. This practice reduces the complexity of the analysis, because it allows us to analyse all the egos together, rather than splitting them based on the optimal number of collaboration circles of each ego. As a result, for each ego we obtain an ego network of 5 layers, where the first one contains the strongest collaborations and the fifth contains the weakest ones. The ego network size is defined as the total number of collaborations existing across all the ego network layers.
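The clustering step can be sketched as follows. This is an illustrative implementation and not necessarily the one used for the analysis: it finds, by dynamic programming, the split of the sorted tie strengths into k contiguous classes that minimises the total within-class sum of squared deviations, which is the objective that Jenks natural breaks optimises. The function name `jenks_groups` and the toy tie strengths are hypothetical. The cumulative sizes follow the concentric-circle convention recalled in "Background" section.

```python
from itertools import accumulate

def jenks_groups(values, k):
    """Split values into k contiguous classes (strongest first) minimising
    the total within-class sum of squared deviations from the class mean."""
    xs = sorted(values, reverse=True)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # Prefix sums give the squared deviation of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations of xs[i:j] (requires j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting xs[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i
    # Walk back through the stored cut points, last class first.
    groups, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        groups.append(xs[i:j])
        j = i
    return groups[::-1]

# Toy tie strengths for one ego (made-up values).
ties = [9.0, 8.8, 5.0, 4.8, 2.1, 2.0, 1.9, 0.6, 0.5, 0.45, 0.1, 0.09, 0.08]
groups = jenks_groups(ties, 5)
layer_sizes = [len(g) for g in groups]          # alters per layer
circle_sizes = list(accumulate(layer_sizes))    # concentric (cumulative) sizes
print(layer_sizes, circle_sizes)
```

The last cumulative value equals the total number of alters, i.e., the ego network size as defined above. The cubic-time recursion is adequate for a single ego network, but a dedicated library would be preferable for large inputs.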

The dataset

In this section, we discuss why we chose Scopus as our reference dataset ("The academic data landscape" section), how we collect our data ("Data collection" section), and how we preprocess and filter it ("Extraction of academic mobility profiles–Meaningful co-authorships" sections).

The academic data landscape

Finding a suitable dataset for studying academic mobility entails obtaining longitudinal information on the affiliations of a reference set of authors. Discarding unscalable approaches that involve surveys (Wang et al. 2019) or manually checking the website/CVs of the authors in order to compile the affiliation history, we can rely on several platforms that provide information about individual authors (Table 1). Some of them export free or premium APIs that can be used to retrieve information, some others offer dumps of their database covering a specific time span or a certain fraction of authors. When it comes to listing affiliations (which, as explained in "Extraction of academic mobility profile" section, we use as proxy for detecting mobility), some of them [like DBLP, used in Servia-Rodríguez et al. (2015)] do not provide the information at all. Google Scholar [used in Arnaboldi et al. (2016) and Servia-Rodríguez et al. (2015)] lists only the most recent affiliation if this information is provided by the authors themselves. In ORCID, researchers can enter their full affiliation history. The resulting affiliations are typically reliable and accurate, but the drawback of this approach is that, for most author profiles, the list of affiliations is often missing or incomplete, as it is provided on a voluntary basis. ORCID data has been used in Urbinati et al. (2019) and Bohannon and Doran (2017) for studying researcher mobility. A more scalable approach is based on the automatic extraction of the affiliation directly from papers and their metadata. Pursuing this strategy, repositories like APS, MedLine/PubMed (since 2014), Microsoft Academic, Scopus, and Web of Science (WoS) are able to provide affiliation information for each paper and for each author in their database. WoS, used in Sugimoto et al. (2017) for studying the global circulation of scholars, requires premium fees for accessing affiliation information. Hence, we do not consider it further in this study. APS has often been used as a reference repository in the related literature (Deville et al. 2014; Petersen 2018; James et al. 2018; Wang et al. 2013; Qi et al. 2017). MedLine was used in Vaccario et al. (2019). However, while the affiliation per se is useful in studying academic mobility [i.e., movements from one institution to another one, as in Deville et al. (2014)], it needs to be further processed if the goal is to study geographic mobility as well. For example, Petersen (2018) extracts geographical information from APS data by matching country names and codes in the affiliation text string (geoparsing). Another approach entails querying location services (such as Google Maps) in order to retrieve the position of the university a scientist is affiliated with (James et al. 2018). Much more convenient is to rely on databases that already provide preprocessed geographical information. The Scopus API does that, offering geo-referenced information about the affiliation up to the city level granularity. For this reason, in this work we focus on affiliation data coming from the Scopus repository.

Table 1 Summary of academic knowledge/career repositories

Data collection

In order to identify a set of authors to study, we focus on the group of authors whose profiles and publications were collected by Arnaboldi et al. (2016). Arnaboldi et al. crawled the authors’ profiles in Google Scholar starting from a single category (“computer science”), then accessed all the categories found in the visited profiles, iterating the procedure until no new categories were found. Since Google Scholar does not provide the history of affiliations, nor their geographical position, the dataset collected by Arnaboldi et al. cannot be used directly for our analysis. Thus, in order to leverage the georeferenced affiliation history provided by Scopus, we downloaded afresh from Scopus all the publications authored by the Google Scholar researchers studied in Arnaboldi et al. (2016). The download took place in November 2019. We used the official Scopus APIs.^{Footnote 1} Specifically, we used the Author Search API, the Author Retrieval API, and the Affiliation Retrieval API, as described below.

Since an automatic mapping between the researcher IDs across the two platforms (Google Scholar and Scopus) does not exist, we established this mapping ourselves. Given that two researchers can share the same name, it is not enough to match the first and last name in the two datasets. In order to resolve this ambiguity, we decided to match each Google Scholar researcher $r_{g}$ with a Scopus researcher $r_{s}$ of the same name that has authored at least one paper with the same title as one of those authored by $r_{g}$. In practical terms, we passed the name and surname of the researcher $r_{g}$ to the Scopus Author Search API, retrieving all the researchers in Scopus matching the name string. This allowed us to get the IDs of the potential matches $r_s$ in the Scopus database, together with their full publication records. Through these publication records, we compared the titles of $r_{g}$’s papers in the Scholar database with those of the potential matches in the Scopus database. We established a match between $r_{g}$ and a potential match $r_s$ if $r_s$ shares with $r_g$ at least one publication with the same title. Then, we used the Scopus ID of $r_s$ to get, via the Author Retrieval API, the full author profile: full name, h-index, number of publications, citations received and research areas. The full history of affiliations was obtained from the publication records. In case of incomplete affiliation information in the publication record, we queried the Affiliation Retrieval API.
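The title-based disambiguation step can be illustrated with a minimal sketch. It assumes that the same-name candidates and their publication titles have already been retrieved; the function names, the dictionary layout and the title normalisation (lower-casing and dropping punctuation) are illustrative assumptions, not the exact procedure used to build the dataset.

```python
from typing import Dict, List, Optional, Set


def normalise_title(title: str) -> str:
    """Lower-case a title and replace punctuation with spaces, so that
    trivially different spellings of the same title compare equal
    (an assumption: the text only requires 'the same title')."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def match_author(scholar_titles: List[str],
                 candidates: Dict[str, List[str]]) -> Optional[str]:
    """Return the ID of the first same-name candidate that shares at least
    one title with the Google Scholar author, or None if nobody does.

    `candidates` maps a (hypothetical) Scopus author ID to the titles
    found in that author's publication record.
    """
    wanted: Set[str] = {normalise_title(t) for t in scholar_titles}
    for scopus_id, titles in candidates.items():
        if wanted & {normalise_title(t) for t in titles}:
            return scopus_id
    return None
```

Authors for which the function returns `None` would simply be dropped from the pool of egos.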

In the end, we successfully identified 102,236 authors (out of the initial 285,577 in the Google Scholar dataset) that have authored 5,847,723 publications. For these authors, we have their ID in the Scopus database, the total number of citations received and their h-index (at the time we collected the data), as well as the complete list of their publications (geo-referenced, timestamped, with all co-authors). Please note that this Scholar-Scopus mapping does not directly affect the ego networks of authors. In fact, it simply reduces the pool of egos, since the Scholar ego authors for which a match is not found in Scopus are dropped. The mapping is not applied to co-authors (alters), so they are not affected. However, the ego networks extracted from Google Scholar may be different from those extracted from Scopus, since the papers indexed by the two platforms do not completely overlap.

Extraction of academic mobility profiles

The raw Scopus dataset yields a list of geo-referenced affiliations per author, timestamped with the publication date of each paper. Since the actual movements of a researcher cannot be directly tracked from a scholarly database, we need to use the affiliated institutions listed in papers as a proxy of the locations visited by a researcher. This approach is widespread in the related literature (Sugimoto et al. 2017; Petersen 2018; James et al. 2018), since it allows for large-scale automated collection of academic movements. However, there are several limitations. First, there is always a time lag between a change in affiliation and the first published paper that allows us to detect it. Second, a change in affiliation does not always imply a physical relocation to a new institution, especially for more senior researchers. Third, the detection and proper management of individual multiple affiliations is not trivial. On the one hand, the multiple affiliations tend to be quite noisy in the paper metadata, as there is no standard way for reporting them and this makes their extraction more complex. On the other hand, since a person cannot be in two geographical locations at exactly the same time, an author with multiple affiliations has to be assigned to a single one of them, according to a certain criterion (see below for a detailed discussion on the one we have chosen). Despite these limitations, at the moment, to the best of our knowledge, inferring mobility from the affiliation history in academic repositories is the only viable alternative to surveys, which are costly and do not allow researchers to obtain comparable large-scale datasets.

In order to reconstruct the academic trajectory of the researchers in our dataset, we create, for each of them, a time series where, for each point, the x-coordinate is given by the month and year (timeslot) in which the researcher has published a paper, and the y-coordinate by the geo-referenced affiliation reported by the researcher in the paper. Recall that each affiliation can provide geographical information at different granularity levels (country, city). When the specific granularity is not important, we will refer to them generically as locations. Two issues may be encountered in the construction of this time series. Specifically, the author may have multiple publications with different affiliations associated with the same timeslot, or she may have more than one affiliation per paper. In both cases, only the most representative affiliation should be kept. We developed a probabilistic model that keeps the affiliation that is referred to most frequently in a specific timeslot. In case this is not enough to extract the most representative affiliation, we take into consideration the affiliations used in the temporally closest (either in the past or in the future) timeslot.
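A minimal sketch of this selection rule is given below. The text does not specify how ties are resolved beyond looking at the temporally closest timeslot, so the neighbour ordering and the alphabetical last-resort rule are assumptions made for illustration, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Timeslot = Tuple[int, int]  # (year, month) of a publication


def month_index(slot: Timeslot) -> int:
    """Map (year, month) to a single integer so timeslots can be subtracted."""
    return slot[0] * 12 + slot[1]


def representative_series(records: Dict[Timeslot, List[str]]) -> Dict[Timeslot, str]:
    """Keep one affiliation per timeslot: the most frequent one, with ties
    broken by the frequencies observed in the temporally closest timeslots."""
    slots = sorted(records)
    series: Dict[Timeslot, str] = {}
    for slot in slots:
        counts = Counter(records[slot])
        top = max(counts.values())
        tied = sorted(a for a, c in counts.items() if c == top)
        choice = tied[0]  # assumed last-resort rule: alphabetical order
        if len(tied) > 1:
            # Visit the other timeslots from the closest to the farthest.
            neighbours = sorted(
                (s for s in slots if s != slot),
                key=lambda s: abs(month_index(s) - month_index(slot)),
            )
            for other in neighbours:
                nearby = Counter(a for a in records[other] if a in tied)
                if nearby:  # at least one tied affiliation appears there
                    choice = max(tied, key=lambda a: nearby[a])
                    break
        series[slot] = choice
    return series
```

Here each timeslot maps to the list of affiliations reported in the papers published in that month, with one entry per paper-affiliation pair.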

Once we have the location time series (each representing an academic and geographical trajectory), we can compute the distance covered by each movement. To this aim, we leverage the Google Maps API, to which we pass the locations of two consecutive timeslots in the time series. However, some affiliations may have missing georeferenced data, such as a missing city or country. In order to bypass this issue, whenever a city does not exist in the affiliation record, we query Google Maps for the address of the institution directly. A drawback that emerged was the mistaken identification of some institutions, a problem that introduced noise in the computation of the distances travelled. However, after manually checking these cases, we found that the percentage of mistaken identifications was lower than 1%. Thus, we did not handle this issue further.
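As an illustration of the distance computation, the sketch below uses the great-circle (haversine) distance between latitude/longitude pairs. This is a stand-in for the Google Maps queries described above, not the procedure actually used; it assumes that coordinates for each location are already available, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0          # mean Earth radius


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance in km between two points on a sphere."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def movement_distances(trajectory: List[Coordinate]) -> List[float]:
    """Distances between consecutive locations, skipping timeslots in
    which the researcher did not move."""
    return [haversine_km(p, q)
            for p, q in zip(trajectory, trajectory[1:]) if p != q]
```

The average of the returned list gives a per-author mean movement length.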

Meaningful co-authorships

As discussed in "(Ego) networks of collaborations" section, each collaboration must correspond to a non-negligible cognitive effort spent for maintaining the relationship. This requirement clashes with the presence, in the dataset, of papers with very many authors. In that case, it can hardly be argued that those authors have strong personal collaborations between themselves (at least, this is unlikely to be true for all the authors involved). In order to handle this problem, we focus on the distribution of the number of co-authors per publication. The average number of co-authors per paper in our dataset is 26, with a standard deviation of 219. The maximum number of co-authors in a paper is 5563, while 96.8% of the papers have fewer than 26 co-authors. Setting the latter as the threshold for assuming a meaningful collaboration, we dropped from the analysis the papers that have more than 26 authors (thus keeping approx. 97% of papers), reducing the number of usable papers to 5,659,268. Our main goal here is to discard the long tail of extremely large collaborations, in the absence of a clear reference threshold from the related literature.
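The filtering step amounts to a simple threshold on the author count of each paper. The sketch below assumes a hypothetical record layout in which every paper is a dictionary with an `authors` list; only the threshold of 26 comes from the text.

```python
from typing import Dict, List

MAX_AUTHORS = 26  # threshold stated in the text


def meaningful_papers(papers: List[Dict], max_authors: int = MAX_AUTHORS) -> List[Dict]:
    """Drop the papers that list more than `max_authors` authors."""
    return [p for p in papers if len(p["authors"]) <= max_authors]


def kept_fraction(papers: List[Dict], max_authors: int = MAX_AUTHORS) -> float:
    """Share of papers that survive the filter."""
    return len(meaningful_papers(papers, max_authors)) / len(papers)
```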

Finally, from these meaningful co-authorship relations, we construct the ego networks as described in "Ego networks of scientists" section. For a minority of egos (those with a small optimal number of circles according to Mean Shift), we could not populate the fixed five circles properly. Thus, we do not consider these scientists in our analysis. Please note, though, that this does not affect the finding about the layered structure of co-authorship ego networks, as the latter is confirmed (see "Researcher profiles overview" section) before dropping the authors with empty layers. In the end, the dataset we use for our analysis includes 81,500 authors with complete and geo-referenced ego networks.

Methodology

The goal of this work is to characterise the interplay between the personal networks of scientists, their mobility, and their academic performance. To this aim we first need to compile a complete researcher profile for each author, based on the data extracted from Scopus. This task is described in "Building the researcher profile" section below. Then, in "Correlation analysis" section we discuss our approach for establishing a relationship between the different dimensions of the researcher profiles.

Building the researcher profile

Leveraging the dataset described in "The dataset" section, we obtain, for each researcher, the following set of features (summarised in Table 2).

Table 2 The researcher profile

Research areas

The research areas a scholar has been actively working on are identified from the 27 master categories shown in Table 3, which correspond to the ones provided by the Scopus API.

Table 3 Research areas

Academic performance features

The two indexes of academic performance we consider are the number of papers a researcher has published and the h-index the researcher has achieved. While the former describes the productivity quantitatively, the latter is a measure of the impact of the work within the research community.

Career stage

Since publications are accumulated over time, comparing performance features between young and senior researchers is not sensible, as the same numbers that would categorise a young researcher as brilliant would put a senior researcher in the under-performing category. Thus, we apply a binning based on the duration of the research career, measured as the time distance between the last and first publication in our dataset. Note that studying the different career phases separately also allows us to mitigate the effect of the time scale at which the scientific collaboration networks evolve. We consider the categories of: PhD students (0–3 years), young researchers (3–6 years), assistant professors (6–10 years), associate professors (10–28 years), full professors (28–38 years) and distinguished professors (38+ years). The group definition and their sample size^{Footnote 2} in our dataset can be seen in Table 4. Please note that the career group names are intended to be a proxy for the career length in years, aimed at making the explanations more reader-friendly. In fact, the Scopus dataset does not allow us to derive the real professional position of individual scholars.
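The binning can be sketched as follows. The year ranges are the ones listed above; since consecutive ranges share their endpoints, the sketch assumes that a boundary value (e.g., exactly 3 years) falls in the later group, which is an illustrative choice rather than a documented one.

```python
from bisect import bisect_right

# Upper bounds (in years) separating consecutive career groups.
BOUNDS = [3, 6, 10, 28, 38]
LABELS = [
    "PhD student",
    "young researcher",
    "assistant professor",
    "associate professor",
    "full professor",
    "distinguished professor",
]


def career_stage(first_pub_year: int, last_pub_year: int) -> str:
    """Career group from the time between first and last publication.
    Assumption: boundary durations are assigned to the later group."""
    duration = last_pub_year - first_pub_year
    return LABELS[bisect_right(BOUNDS, duration)]
```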

Table 4 Career groups and corresponding sample size (number of researchers in our dataset)

Academic mobility features

Academic mobility can be studied at the institution level and at the geographical level. The main barrier to the latter is the granularity of geo-referenced data one can obtain/extract per researcher. As mentioned before, Scopus provides georeferenced affiliation information up to the city granularity. For each author we can, thus, provide: the number of institutions with which they have been affiliated over the course of their career, the number of different affiliation locations (both at the city and the country granularity), as well as the distance (in km) between consecutive affiliations.

A relocation to a new institution may take its toll on a researcher and their collaborations. This is particularly true when relocating implies moving to a different country. The geographical distance, the different academic culture and incentives might make it harder for a researcher to nurture their collaboration network. For this reason, among the mobility features we also measure whether the researcher has moved at least once to a different country or not. We refer to the scientists in the former category as international migrants, while the latter are referred to as domestic migrants. We acknowledge that this classification may seem loose, since a domestic migration in a very large country may be very different from one within a small country. However, movements within the same country typically enjoy a greater homogeneity of norms and practices with respect to international movements. For this reason, we believe that this classification is well-suited to capture the cognitive effort implied by the two classes of movements.
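Given the location time series of an author, the counting features and the international/domestic flag can be derived as in the sketch below. The tuple layout and the names are hypothetical; the flag simply checks whether more than one country appears in the trajectory.

```python
from typing import Dict, List, Tuple

Affiliation = Tuple[str, str, str]  # (institution, city, country)


def mobility_features(trajectory: List[Affiliation]) -> Dict[str, object]:
    """Count distinct institutions, cities and countries in a trajectory
    and flag the author as an international migrant if the trajectory
    touches more than one country."""
    institutions = {inst for inst, _, _ in trajectory}
    # Cities are paired with their country to keep homonymous cities apart.
    cities = {(city, country) for _, city, country in trajectory}
    countries = {country for _, _, country in trajectory}
    return {
        "n_institutions": len(institutions),
        "n_cities": len(cities),
        "n_countries": len(countries),
        "international_migrant": len(countries) > 1,
    }
```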

Collaboration network features

Finally, as explained in "Ego networks of scientists" section, we cluster each ego’s collaborators in five groups, generating five “concentric circles” according to the ego network model. In the following, we refer to the circles as C1, C2, ..., C5, from the most intimate one (strongest collaborations) to the least intimate one, respectively.

Correlation analysis

In this work we carry out an analysis of the relationship between the different dimensions in the researcher profile, with a particular focus on the co-authorship ego network. The specific relationships we investigate are explained in detail in "Results" section. In general, we study the correlation between different elements of a researcher’s profile, such as, e.g., the correlation between ego network layers size and number of affiliations changed. For studying how a pair of features X, Y is correlated, we use the Pearson correlation, which can be computed as follows:

$$\begin{aligned} r_{xy} = \frac{\sum _i (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum _i (x_i - \overline{x})^2 } \sqrt{\sum _i (y_i - \overline{y})^2 }}, \end{aligned}$$

(2)

for n points $\left\{ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})\right\} $. Note that the Pearson correlation detects associations between X and Y, not causal relationships; thus, we do not claim any causal effect between mobility and ego network features, just co-occurrence. Note also that the Pearson correlation does not make any assumption on the underlying distribution of the random variables X and Y beyond requiring their variance to be defined (Dekking et al. 2005). This is because, intuitively, the correlation is simply a standardized version of the covariance, allowing the latter to become independent of the units in which X and Y are measured. In "Results" section, we also study the top researchers (from the performance or mobility standpoint) separately from the others. Correlations are known to be strongly affected when the range of one of the variables is restricted. In this case, we apply a correction to the Pearson correlation known as the Thorndike correction (Sackett and Yang 2000; Thorndike 1949). It is well documented that corrected correlations are less biased than uncorrected ones, under a wide range of assumption violations (Gross and Fleischman 1983; Holmes 1990). Specifically, we use the Thorndike Case 2 correction when we directly restrict the range of the X variable ("Academic performance and co-authorship ego networks" section). Instead, we use the Thorndike Case 3 correction when we restrict the range of a third variable Z, which however might induce an indirect range restriction on X ("Mobility profile and co-authorship ego networks" section).
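For concreteness, Eq. (2) and the Case 2 correction can be sketched as follows. The correction is written in its standard textbook form, $r_c = rU/\sqrt{1 + r^2(U^2-1)}$, where $U$ is the ratio between the unrestricted and the restricted standard deviation of X; the symbol $U$ and the function names are introduced here for illustration only, and Case 3 is not covered.

```python
import math
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, as in Eq. (2)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den


def thorndike_case2(r_restricted: float,
                    sd_unrestricted: float,
                    sd_restricted: float) -> float:
    """Correct a correlation measured on a sample whose X range was
    directly restricted (standard Thorndike Case 2 formula)."""
    u = sd_unrestricted / sd_restricted
    return r_restricted * u / math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))
```

When the restricted and unrestricted standard deviations coincide ($U = 1$), the correction leaves the correlation unchanged; the stronger the restriction, the larger the upward adjustment.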

The confidence in the reported correlations is measured relying on bootstrapped confidence intervals. This non-parametric approach yields much more robust results than methods [e.g. Fisher transformation (Fisher 1915)] that assume X and Y to be approximately normal. Specifically, we use the bias-corrected and accelerated (BCa) bootstrap (Efron 1987), which adjusts for both bias and skewness in the bootstrap distribution. In this work, we use 95% confidence intervals, obtained by performing 10,000 bootstrap replicas.
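A compact sketch of a BCa interval for the Pearson correlation is given below, using only the Python standard library. It follows the usual construction (bias correction from the share of replicas below the point estimate, acceleration from jackknife estimates); the function names, the default number of replicas and the handling of degenerate resamples are assumptions of this sketch, not details taken from the analysis pipeline.

```python
import math
import random
from statistics import NormalDist, mean
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den


def bca_interval(x: Sequence[float], y: Sequence[float], replicas: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Bias-corrected and accelerated bootstrap interval for Pearson's r."""
    x, y = list(x), list(y)
    n, rng, norm = len(x), random.Random(seed), NormalDist()
    theta = pearson(x, y)

    # Bootstrap distribution: resample (x_i, y_i) pairs with replacement.
    boots = []
    while len(boots) < replicas:
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:  # skip zero-variance draws
            boots.append(pearson(xs, ys))
    boots.sort()

    # Bias correction: how far the point estimate sits from the median
    # of the bootstrap distribution (clamped to keep inv_cdf defined).
    share = sum(b < theta for b in boots) / replicas
    share = min(max(share, 1 / (replicas + 1)), replicas / (replicas + 1))
    z0 = norm.inv_cdf(share)

    # Acceleration: skewness of the leave-one-out (jackknife) estimates.
    jack = [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    jm = mean(jack)
    den = 6 * sum((jm - j) ** 2 for j in jack) ** 1.5
    acc = sum((jm - j) ** 3 for j in jack) / den if den else 0.0

    def endpoint(q: float) -> float:
        z = norm.inv_cdf(q)
        adjusted = norm.cdf(z0 + (z0 + z) / (1 - acc * (z0 + z)))
        k = min(max(int(adjusted * replicas), 0), replicas - 1)
        return boots[k]

    return endpoint(alpha / 2), endpoint(1 - alpha / 2)
```

Calling `bca_interval(x, y, replicas=10000)` would mirror the number of replicas mentioned above; two correlations are then treated as significantly different when their intervals do not overlap.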

Results

We first present, in "Researcher profiles overview" section, a brief exploratory analysis of the dataset. Then, in "Academic performance and co-authorship ego networks" section, we study how the structure of the co-authorship ego network correlates with the considered performance indices, i.e., the number of publications and the h-index of researchers. In "Mobility profile and co-authorship ego networks" section we focus on the correlation between the structure of the co-authorship ego network and the mobility characteristics of researchers (number of institutions visited, domestic or international migration).

Researcher profiles overview

In order to better understand the type of researcher profiles featured in our dataset, we briefly explore the main characteristics of the features discussed in "Building the researcher profile" section. We start by analysing the research areas (see Table 3) covered by the authors in our dataset (Fig. 2), in order to assess the representativeness of our sample. We observe that the most popular research areas never account for more than 8–9% of the sample, and almost all areas cover at least 1% of the sample. Hence, we do not expect the results to be substantially skewed by one “outlier” subject area.

We now move to analysing the career duration, whose histogram is shown in Fig. 3. The average career duration is 17.1 years and the standard deviation in our dataset is 9.2. Almost all the career groups have a sample size of some thousands (Table 4). The vast majority of the scholars in our sample can be classified as assistant professors or associate professors. PhD students are very few, hence they will not be considered further in the analysis.

Next, we focus on the academic performance indices. From the impact standpoint (Fig. 4), the dataset spans a significant range of h-index values, including also some top researchers with h-index $\ge 40$, plus some truly outstanding researchers with h-index above 60. The average h-index in the dataset is 17.2 (sd = 13.47), which is consistent with the expected h-index at the most popular career stages (mostly assistant/associate professors) in the dataset. In terms of productivity (Fig. 5), we observe that the average number of published papers is 68 ($sd=90.5$, median $= 40$), with a long tail of authors publishing up to 2773 papers in their career.

Let us now have a look at the mobility profile of the researchers in our dataset. In terms of affiliations (Fig. 6), our researchers are quite mobile, with the majority of them having changed between one and ten affiliations during their career. The observed average is 7.8 with a standard deviation of 10. Approximately 3% of these scientists can be classified as extremely mobile, having changed more than 25 affiliations in their career. We cannot exclude that these numbers are partially inflated by the way the affiliation time series is reconstructed. As discussed in "Extraction of academic mobility profiles" section, inferring the correct time series of affiliations from the time series of published papers is made difficult by the time lag between different types of publications (e.g., journal vs conference), which may create sequences like $\{$..., affiliation1, affiliation2, affiliation1, affiliation2$\}$ where the second appearance of affiliation1 is only due to a delay in publishing with that affiliation, rather than to an actual return to affiliation1. The preprocessing discussed in "Meaningful co-authorships" section was aimed at mitigating the problem but, due to the lack of ground truth, it is impossible to assess its efficacy.

Recalling that we are interested in distinguishing between domestic and international movements, we now look at the number of international vs domestic migrants. We observe a very balanced situation, with 40,749 international and 40,751 domestic migrants in our dataset. This implies that several of the affiliation changes detected in Fig. 6 are within the same country. When studying academic migrations, not all movements have the same impact. A relocation between two institutions in Europe is not comparable, in terms of personal effort and impact of the established collaboration, with a movement between, e.g., Europe and East Asia. Different time zones, large distances and different cultures make the move much more challenging, hence worth being studied separately. In Fig. 7 we plot the histogram of the average (per author) length of movements travelled by the authors in our dataset. While the majority of movements are in relatively close proximity, we observe some long-distance relocations. As a reference, the average distance between Europe and the US is approximately 8000 km. The average trip distance travelled by a researcher is 2560 km, with a standard deviation of 3200 km.

Finally, we study the ego networks of the authors in our dataset. As a preliminary step, we verify their optimal number of collaboration circles. To this aim, we leverage the methodology discussed in "Ego networks of scientists" section, applying the Mean Shift clustering algorithm to the authors in our dataset. Figure 8 shows the distribution of the obtained optimal number of circles. This distribution is centered around 5, as was the case for the distribution of the social circles of purely social interactions (Dunbar et al. 2015). Thus, without loss of generality, in the following we will restrict ego networks to having 5 layers, which are obtained using the Jenks clustering algorithm (Jenks 1977). In Table 5 we summarise the properties of the co-authorship ego networks for the scientists in our dataset. Specifically, we report the mean circle size at the different career stages, which we have computed as discussed in "Ego networks of scientists" section, and the scaling ratio, which is defined as the ratio between the sizes of consecutive circles. Recall that the typical sizes of circles in purely social networks are 1.5, 5, 15, 50, 150, with a scaling ratio around 3. These numbers should be treated as reference fingerprints of social ego network structures, with the actual sizes of circles varying, possibly quite significantly, around these reference values, depending on the specific social environment considered. One limitation of our analysis (common to all works relying on data automatically extracted from publication repositories) is that we are only able to capture the collaboration strength at a coarse granularity: e.g., we cannot exactly pinpoint the co-authors on which the collaboration efforts hinged. Currently, though, these datasets provide the best large-scale proxies of collaboration strengths available for research.
The results in Table 5 show that, as expected, the ego networks of scientists at the beginning of their career are smaller than those of more established colleagues. Despite this important difference, when focusing on individual career stages, we observe a striking regularity in the way collaborations (hence, cognitive efforts) are distributed across the layers, with a scaling ratio $\sim $ 2–3. In order to take into account the effect of the career stage on the size of ego network layers, in the following sections we always assess the impact of the career stage on the specific correlation under study.
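The scaling ratio is straightforward to compute from a list of circle sizes ordered from the innermost to the outermost circle. The sketch below applies it to the reference sizes of purely social ego networks quoted above; the function name is illustrative.

```python
from typing import List


def scaling_ratios(circle_sizes: List[float]) -> List[float]:
    """Ratio between the sizes of consecutive circles (outer / inner),
    for sizes ordered from the innermost to the outermost circle."""
    return [outer / inner
            for inner, outer in zip(circle_sizes, circle_sizes[1:])]


# Reference sizes of social ego network circles quoted in the text.
reference_sizes = [1.5, 5, 15, 50, 150]
reference_ratios = scaling_ratios(reference_sizes)
mean_reference_ratio = sum(reference_ratios) / len(reference_ratios)
```

For the reference sizes, all four ratios fall between 3 and 3.4, in line with the scaling ratio of around 3 reported for social networks.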

Table 5 Ego network circles at different career stages: mean size and scaling ratio

Academic performance and co-authorship ego networks

After the preliminary characterisation of the researchers in the dataset carried out in the previous section, we now start investigating the relationships between measures of productivity and impact, and the observed ego network structure. To this aim, we focus on the correlation between the features of interest, as discussed in "Correlation analysis" section. Recall that the ordering of ego network layers has an inverse relationship with the collaboration strength: the innermost layers are small and contain only the strongest connections, while the opposite holds for the outermost layers.

Productivity and circles of collaborations

The first question that we tackle is whether higher productivity is associated with more numerous collaborations in all ego network circles. On the one hand, it is expected that more productive people (people that publish a lot of papers) tend to also collaborate more. However, as discussed in "Background" section, in human social relationships each collaboration consumes cognitive efforts on the ego’s side. Hence the ego cannot increase her network and invest the same amount of time/efforts as before into each prior relationship. So, while it is easier to grow outermost layers (which include weak relationships), expanding innermost circles is much harder. We conjecture that similar properties may hold also for ego co-authorship networks. We provide the scatterplot of layer size vs productivity for the five circles in Fig. 9, and the corresponding correlation values in the first column of Table 6. While the linear trend is evident for the outermost circles, as we move inwards the correlation starts to decrease. When we reach the first circle (Fig. 9) the strong correlation ($r=0.80$) of C5 has significantly faded away ($r=0.26$ in C1). In Table 6, we also provide the correlation between the career length and productivity, in order to rule out that the strong correlation we observe at the outermost layers is only a side-effect of joint correlation on simpler features of the researcher profile (Table 2). Since the correlation between career length and productivity is generally lower than the correlation between productivity and layer size, we can rule out this hypothesis.

Table 6 Correlation of productivity, impact, and mobility with layer size and career length

Having established a clear association between productivity and ego network layer size, we next investigate how this relation is affected by the career stage and the mobility status (domestic vs international migrant). In Fig. 10a we show, for each career stage, the Pearson correlation between the number of publications and the size of ego network layers. We plot the results for international migrants in blue and those for domestic ones in red. Please note that each line offers a compact summary of the correlations extracted from a set of scatterplots equivalent to those in Fig. 9. By and large, the trend observed globally in Fig. 9 is confirmed for individual career stages: researchers can easily grow their outermost collaboration circles (this stems from the high correlation for those layers), but establishing strong collaborations that affect the inner layers is costly, and aggressive publication is not sufficient to invert this trend at any career stage. It is interesting to note that international migrants tend to exhibit slightly stronger correlations between circle sizes and productivity, but this difference is strongly significant (i.e., showing non-overlapping confidence intervals) only for associate and full professors. Considering the trends observed in Fig. 10a up to the level of full professorship, results indicate that, eventually, researchers who also migrate internationally might be able to “exploit” more efficiently collaborations in their community, resulting in a higher correlation between layer sizes and number of publications. However, such a higher efficiency manifests itself only from a certain career stage onwards, and the cumulative effect of international migration at the end of the career is not particularly significant. We also observe that, typically, the longer the career duration, the higher the correlation for all the ego network layers.

Table 7 Tally of heavier publishers (corresponding to the top 10% researchers from the productivity standpoint) in the different career and migration classes

The final test that we carry out focuses on the most productive authors. We classify authors as most productive if they are above the 90th percentile of the productivity distribution. The number of samples in the most-productive group per class is reported in Table 7.^{Footnote 3} For young researchers and distinguished professors, there are only a few samples of heavy publishers for certain mobility classes (e.g., only 67 distinguished professors in the domestic migrants class), hence these classes will be discarded from the analysis. The reason to analyse the most productive authors is the following. Results in Fig. 10a have established a clear correlation between sizes of layers and productivity, particularly at the external layers, for the authors in our dataset. This seems to suggest that heavy publishers have extremely large ego co-authorship networks. In order to investigate this claim, we compare the correlation observed for them (dashed lines in Fig. 10b) with the correlation observed for the rest of the population (continuous lines). Recall ("Correlation analysis" section) that both correlations are adjusted with the Thorndike correction, to take into account the range restriction due to the 10–90% split. In general, we observe that the correlation for heavy publishers is significantly lower than for regular scientists, with the only exception of young researchers. This result is quite interesting, as it shows that heavy publishers do not increase their ego network at the same pace as the general population, or, from the complementary standpoint, that nurturing very large collaboration networks does not necessarily lead to extremely high productivity. This can be interpreted as a capacity bound effect: beyond a certain level, the most prolific researchers hit a capacity bound (in terms of number of collaborations) that limits the growth of their ego co-authorship network as the number of publications increases.
This is a first result suggesting the presence of cognitive capacity bounds similar to the ones driving the structure of ego networks in general, also in the case of ego co-authorship networks. Note that the social capacity bound of Dunbar (1992) and Roberts et al. (2009) blends cognitive limits (e.g., for remembering personal details about the alters and acting upon them) with temporal constraints (e.g., time spent grooming among primates). This might as well be the case for scientific collaborations, where both temporal and cognitive constraints are expected to play a role in the scientist’s ability to keep up with many collaborators. An alternative explanation is that heavy publishers saturate the pool of potential co-authors, hence cannot grow their ego network layers at the same pace as regular researchers. While it seems unlikely that the finite number of potential collaborators plays a role here, we leave the exploration of this alternative hypothesis as future work.
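The 10–90% split used in this comparison can be sketched as below. The exact percentile convention is not stated, so the sketch assumes a nearest-rank 90th percentile and assigns to the top group only the authors strictly above it; the names and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def split_top_decile(papers_per_author: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split authors into heavy publishers (strictly above the 90th
    percentile of the number of papers) and the rest of the population.
    Assumption: nearest-rank percentile, computed with integer arithmetic."""
    ordered = sorted(papers_per_author.values())
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n)
    threshold = ordered[rank - 1]
    top = [a for a, n in papers_per_author.items() if n > threshold]
    rest = [a for a, n in papers_per_author.items() if n <= threshold]
    return top, rest
```

Correlations computed separately on the two groups are range-restricted in the number of publications, which is why a range-restriction correction is needed before comparing them.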

Research impact and collaboration circles

Impact is the other dimension of performance, beside productivity, on which researchers are typically evaluated. Impact is usually measured by the h-index, which, despite its limitations (Costas and Bordons 2007), is still by far the most widely used metric. In this section, we investigate whether high impact is associated with larger collaborations in the ego network circles. Note that the h-index of an author is well-known to grow approximately linearly at the beginning of the career and to stabilise towards the end (Hirsch 2005).

Table 8 Tally of researchers with high impact (corresponding to the top 10% researchers from the h-index standpoint) in the different career and migration classes


Preliminarily, the overall correlation (with no class distinction) is reported in Table 6 (second column). We observe that there is still a growing trend when moving from the innermost to the outermost layer. However, the correlation at the outermost circles remains lower than for the case of the number of publications. Similarly to what we did for productivity, in Table 6, we also provide the correlation between the career length and impact. Again, such correlation is generally lower than the correlation between impact and layer size, hence we can rule out the hypothesis that the measured correlation is just a by-product of the career length effect.

Figure 11a shows the correlation between h-index and circle size for domestic and international migrants (red solid line vs blue solid line) at different career stages. Similarly to the case of productivity, in Fig. 11b we also single out the top researchers in each category in terms of h-index (again, taking those above the 90th percentile) in order to study what happens for top performers (results shown as dashed lines). Recall that, for the 10–90% split, we apply the Thorndike correction ("Correlation analysis" section). Given that not enough samples are available for high-impact young researchers that are international migrants and distinguished professors that are domestic migrants (Table 8), we do not report their corresponding curves. We first focus on the general population of researchers (continuous lines, darkest colours) in Fig. 11a. We note that, while the general trend is the same as that observed for productivity, the absolute values of the correlation tend to be lower. More interestingly, while previously there were noticeable differences between domestic and international migrants, now these differences are less marked, with only a few cases where the difference is actually statistically significant (corresponding to non-overlapping confidence intervals). In general, differently from the case of productivity, here the correlation for international migrants is not greater than for domestic migrants. This suggests that, while in terms of number of publications international migrants could exploit their collaboration network more efficiently, in terms of impact, domestic and international migrants exploit collaboration networks in a similar way. This is a first result in our analysis showing that international migrations are not necessarily strategic in terms of publication performance.
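The significance criterion used above (non-overlapping confidence intervals) can be illustrated with a short Python sketch. It assumes intervals built with the Fisher z-transform at the 95% level; this is an assumption made for illustration, since the procedure actually used is the one defined in the "Correlation analysis" section. The names `fisher_ci` and `intervals_overlap`, and the numbers in the example, are hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate confidence interval for a Pearson correlation r from n samples.

    Uses the Fisher z-transform: atanh(r) is roughly normal with standard
    error 1 / sqrt(n - 3). The default z_crit corresponds to a 95% interval.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented correlations and sample sizes for two groups of authors.
domestic = fisher_ci(0.42, n=800)
international = fisher_ci(0.45, n=1200)
print(domestic, international, intervals_overlap(domestic, international))
```

Requiring disjoint intervals is a conservative test: two correlations can differ significantly even when their intervals overlap slightly, so differences flagged by this criterion are robust ones.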

If we distinguish between the general population and top researchers (from the h-index standpoint), the same properties can be observed in an even more evident way. Specifically, as was the case for heavy publishers, we observe significantly lower correlation between h-index and co-authorship circles for the most impactful researchers. Indeed, Fig. 11b shows a somewhat counter-intuitive result, as one would expect very impactful researchers to grow their network easily, as they are very appealing as collaborators and probably sought after by many. Instead, contrary to this intuition, the collaboration circles of top researchers do not exhibit outstanding growth. The career length seems to only slightly affect the absolute value of correlations, but the trends remain the same across all career stages. Moreover, this is another element suggesting that a capacity bound effect might be present: beyond a certain limit, having very big networks does not result in high performance, as there is not sufficient cognitive capacity to make those collaborations effective.

Take-home messages

Summarising the results on academic performance and ego network structure, our findings are the following:

In general, there is a clear correlation between sizes of layers and performance indices (both productivity and impact). Thus, as the productivity/impact grows, so do the collaboration circles. The correlation is particularly high for external layers, and this is in agreement with the theory on the different cognitive effort that is allocated to the collaborations in the different circles. The outermost circles only require low effort, while the innermost ones require intense nurturing, hence cannot be grown at will.
International and domestic migrants show different efficiency levels in exploiting their collaboration networks, respectively for productivity and impact. While international migrants seem more effective in exploiting larger networks for a higher number of publications, this advantage is not present when considering high impact. This suggests that international migrations (which typically require more effort on the side of the migrating person) are not always particularly effective in terms of impactful research.
The existence of a capacity bound effect on network size seems confirmed: beyond a certain point (in terms of performance), higher performance is less and less correlated with large collaboration networks. This confirms at a higher level of detail the similarity between cognitive constraints in general social networks and in scientific collaborations. Also, this result seems to support the findings in Petersen (2015) about stable partnerships as a strategy for success.

Mobility profile and co-authorship ego networks

In the previous section, we have considered, among others, the impact of domestic and international mobility on the co-evolution of academic performance measures and collaboration circles. We now provide a more focused analysis on the interplay between co-authorship ego networks and the different dimensions of academic mobility.

In Table 6 (third column) we report, for each ego network layer, the correlation between its size and the number of affiliation changes. The trend is similar to that observed for the h-index, again showing a limited ability to grow the innermost circles. For the outermost circles (corresponding to weaker collaborations), the growth is significant, but still less than what we observe for the number of publications. For growing the weakest links, it seems, it is more important to publish extensively than to strive for high impact or to move a lot. Similarly to the case of productivity and impact, we can rule out the hypothesis that the measured correlation is just a by-product of the effect of career length, since its correlation with mobility is lower.
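The per-layer figures of the kind reported in Table 6 can be thought of as one correlation coefficient per ego network layer, computed across authors. The Python sketch below illustrates this under the assumption that the coefficient is Pearson's r (consistent with the use of the Thorndike correction, but stated here only as an assumption). The names `pearson` and `layerwise_correlation` and the toy data are hypothetical and do not come from the study.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def layerwise_correlation(layer_sizes, feature):
    """Correlate each layer's size with a per-author feature.

    layer_sizes : one row per author, listing layer sizes from the innermost
                  to the outermost layer
    feature     : one value per author (e.g., number of affiliation changes)
    """
    n_layers = len(layer_sizes[0])
    return [
        pearson([row[k] for row in layer_sizes], feature)
        for k in range(n_layers)
    ]


# Four invented authors with three layers each, and their affiliation changes.
sizes = [[1, 2, 10], [1, 3, 20], [2, 3, 30], [1, 5, 40]]
moves = [1, 2, 3, 4]
print(layerwise_correlation(sizes, moves))
```

In this toy example the outermost layer tracks the feature much more closely than the innermost one, which mirrors the qualitative pattern described in the text.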

Figure 12 presents the correlation between co-authorship layer size and number of affiliations, separating the results per career stage and type of academic migration. We observe that, for international migrants, there are two distinct career phases. The first corresponds to the beginning of the career (young researcher and assistant professor stages), for which we observe a correlation, but not a particularly high one, especially for the innermost circles. In the second career phase (from associate professor on), the correlation significantly increases in the outermost circles with respect to the previous phase, but overall we observe minimal changes across the career stages in this second phase. Thus, it seems that young scholars have more difficulties in growing their networks as they move across institutions, while senior researchers are very effective in capitalising on movements. This might be due to the typically lower effort spent on individual papers by more established scientists with respect to younger researchers. Interestingly, this dichotomy is not present for domestic migrants, for which the correlation increases more or less steadily as the career progresses. Comparing the two cases, we observe that, on a layer-by-layer basis, the correlation values at the end of the career are basically equivalent between international and domestic migrants. However, the former class enjoys a bump in correlation early in the career. Specifically, already at the associate professor stage, international migrants seem to capitalise on mobility, in terms of growing their network of collaborations, at the same level as (or even a higher level than) what domestic migrants achieve at the end of their career.

Table 9 Tally of highly mobile researchers (corresponding to the top 10% researchers from the mobility standpoint) in the different career and migration classes


Similarly to what we did in "Productivity and circles of collaborations" section for heavy publishers and in "Research impact and collaboration circles" section for researchers with high impact, we now study very mobile scholars separately. The correlation between layer size and mobility is shown in shades of red for the most mobile authors in Fig. 13, and in shades of green for the others. Again, we consider as the most mobile authors those above the 90th percentile (in turn, the remaining scholars are assigned to the group of regular authors). Table 9 reports that we have a reasonable number of samples for all career stages with the exception of distinguished professors that migrate domestically, which we thus discard from the analysis. In Fig. 13, we observe different trends for international and domestic migrants. Let us first discuss the former (first five rows of Fig. 13). Highly mobile young researchers that migrate internationally exhibit larger correlations than their less mobile counterparts. This difference is statistically significant in all the outermost layers. Hence, the collaboration networks of highly mobile young researchers tend to be larger in the presence of international mobility. This relationship, though, wanes quickly as the career progresses: there is basically no difference between highly mobile and regular assistant professors. The trend flips starting from associate professors: we observe smaller correlation for highly mobile scientists than for the others. Thus, highly mobile international migrants easily grow their (weaker) collaborations at the beginning of their career, but rapidly fall behind their less mobile counterparts as the career progresses. Let us now examine the case of domestic migrants. For them, the differences in correlation between the highly mobile and the rest of the researchers are statistically negligible in the vast majority of cases.
Thus, there seems to be a dichotomic behaviour: for international migrants, high mobility correlates with the collaboration network size more than for regular scientists, but only at the very beginning of the career. After this initial phase, the relationship between mobility and collaboration network size instead becomes weaker for the highly mobile scientists with respect to the others. For domestic migrants, the “early boost” is not even present and highly mobile researchers tend to be almost indistinguishable from the others.

The next question we tackle is whether highly mobile researchers are different from the others when considering the relation between their productivity/impact and the size of their collaboration network. Note that in this set of results we rely on the Thorndike correction Case 3 (see "Correlation analysis" section). We start by investigating whether publishing more as a highly mobile researcher pays off in terms of collaborations with respect to the average researcher. We know, from Fig. 10, that productivity correlates with the size of the collaboration circles, especially in the outermost layers (weaker collaborations). We also observed that heavy publishers, however, exhibit lower correlation than their average counterparts, hinting at the existence of a capacity bound. Figure 14 shows how the correlation between ego co-authorship layer size and productivity varies at the different stages of career and for the different classes of mobility. In general, highly mobile researchers are statistically indistinguishable from the others, apart from distinguished professors that have migrated internationally. For them, the correlation of the highly mobile is even lower than for the other career stages. Thus, high mobility does not seem to pay off in terms of productivity and collaboration circles. International and domestic migrants are substantially equivalent from this standpoint. The only significant difference is that highly mobile domestic migrants exhibit slightly smaller correlation values than international ones. This might suggest that the benefit of international migrations carries over to highly mobile researchers. In Fig. 15 we focus on the impact of highly mobile researchers and on how it correlates with the number of collaborations at the different layers. At the beginning of the career, highly mobile researchers are basically indistinguishable from the others, in both migration classes.
However, starting from associate professors, being highly mobile entails having smaller collaboration circles as the impact increases, with respect to regular researchers. This difference is larger for international than for domestic migrants. Putting all the results together (from Figs. 13, 14 and 15), and considering that the authors in the most mobile groups are the same for the analysis of both performance features, we find that the most mobile authors are not able to capitalise on their mobility in terms of either productivity or impact, as for them having bigger ego network sizes is less correlated with high impact/number of publications with respect to regular authors. The same conclusion can also be obtained by inspecting the correlation between the two performance indices, i.e., number of publications and h-index (Fig. 16). Specifically, comparing the most mobile researchers with the rest, it is clear that their correlation is similar to or lower than that of average researchers. This confirms that high mobility does not seem to pay off in terms of research performance.
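In this set of results the split is made on mobility, while the correlation of interest is between a performance index and layer size, so the range restriction is indirect. A minimal Python sketch of the textbook Thorndike Case 3 formula for this situation is given below (see, e.g., Sackett and Yang 2000 for the typology). It is offered as an illustration under that assumption, not as the exact implementation used in the study; the function name `thorndike_case3` and the numeric inputs are hypothetical.

```python
import math


def thorndike_case3(r_xy, r_xz, r_yz, sd_z_unrestricted, sd_z_restricted):
    """Textbook correction for indirect range restriction (Thorndike Case 3).

    x, y : the two variables of interest (e.g., productivity and layer size)
    z    : the variable on which the selection was made (e.g., mobility)
    r_xy, r_xz, r_yz : correlations measured inside the selected group
    sd_z_unrestricted, sd_z_restricted : std. dev. of z in the full sample
                                         and in the selected group
    """
    k = (sd_z_unrestricted / sd_z_restricted) ** 2 - 1.0
    numerator = r_xy + r_xz * r_yz * k
    denominator = math.sqrt((1.0 + r_xz**2 * k) * (1.0 + r_yz**2 * k))
    return numerator / denominator


# Invented values for one career stage and one layer.
print(thorndike_case3(0.3, 0.5, 0.4, sd_z_unrestricted=2.0, sd_z_restricted=1.0))
```

If the selection variable is uncorrelated with both variables of interest, or if its spread is the same in the selected group and in the full sample, the formula returns the measured correlation unchanged.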

Take-home messages

In the "Mobility profile and co-authorship ego networks" section we have focused on the interplay between the mobility profile and the ego network structure. The main findings are summarised below.

International migrants enjoy a boost in their co-authorship ego networks earlier in the career with respect to domestic migrants. The latter, however, are able to catch up and end their career at comparable ego network levels.
The most mobile authors neither publish more nor increase the number of their collaborations with respect to less mobile researchers. Moreover, they are not able to capitalise on mobility in terms of impact, as for them having bigger ego network sizes is less correlated with high impact with respect to regular authors.

Conclusion

Within the Science of Science framework, in this work we have investigated the interplay between academic performance, academic collaborations, and academic migrations. The analysis was carried out leveraging a dataset of geo-referenced publications from 81,500 authors at different stages of career, collected from Scopus. The collaboration network is computed from the perspective of individual authors, and is linked to results from anthropology on hierarchical social structures. Specifically, the ego network structure of the collaborations is obtained for each author in our dataset. The innermost layer of an ego network contains the strongest collaborations, while the weakest ones can be found in the outermost layers. Our results show that increasing productivity and impact of a researcher correlate with larger ego network layers, but, while the outermost layers can be easily grown, the innermost ones are much less sensitive to the measured academic performance. In addition, we observed a capacity bound, whereby, beyond a certain point, higher performance is less and less correlated with larger collaboration networks. From the mobility standpoint, we have observed that international migration correlates with higher productivity more than with higher impact. Also, international migrants seem to benefit from a collaboration boost at earlier stages of their career compared to domestic migrants. However, the latter are able to catch up and end their career at approximately the same levels of collaborations. Finally, we observed a lack of pay-off for very high mobility: its correlation with research impact is smaller than for regular authors.

Availability of data and materials

The data that support the findings of this study are available from Scopus but restrictions apply to the availability of these data and so they are not publicly available. Data are however available from the authors upon reasonable request and with permission of Scopus. The list of scientists analysed in this study is available from the corresponding author on request.

Notes

1. Version: 2.0.1 [currently renamed to “pybliometrics” Rose and Kitchin (2019)]. https://dev.elsevier.com/scopus.html.
2. The low number of authors classified as PhD students in our dataset is due to the fact that we rely only on authors that were present in the Google Scholar dataset collected by Arnaboldi et al. (2016) in November 2013. Since then, the majority of those authors have progressed in their career.
3. Please note that the two columns do not add up exactly to the 10% of authors in the corresponding career group (see Table 4). This is due to the fact that some authors have exactly the same number of publications, and we retain all of them for the analysis.

References

Abramo G, D’Angelo CA, Solazzi M (2011) The relationship between scientists’ research performance and the degree of internationalization of their research. Scientometrics 86(3):629–643. https://doi.org/10.1007/s11192-010-0284-7
Aral S, Van Alstyne M (2011) The diversity-bandwidth trade-off. Am J Sociol 117(1):90–171
Arnaboldi V, Conti M, Passarella A, Dunbar R (2013) Dynamics of personal social relationships in online social networks: a study on Twitter. In: COSN’13. https://doi.org/10.1145/2512938.2512949
Arnaboldi V, Conti M, La Gala M, Passarella A, Pezzoni F (2014) Information diffusion in OSNs: the impact of nodes’ sociality. In: Proceedings of the 29th annual ACM symposium on applied computing, ACM, pp 616–621
Arnaboldi V, Dunbar RI, Passarella A, Conti M (2016) Analysis of co-authorship ego networks. In: International conference and school on network science, Springer, pp 82–96
Bohannon J, Doran K (2017) Introducing ORCID. Science (New York, NY) 356(6339):691–692. https://doi.org/10.1126/science.356.6339.691
Costas R, Bordons M (2007) The h-index: advantages, limitations and its relation with other bibliometric indicators at the micro level. J Inform 1(3):193–203. https://doi.org/10.1016/j.joi.2007.02.001
Dekking FM, Kraaikamp C, Lopuhaä HP, Meester LE (2005) A modern introduction to probability and statistics: understanding why and how. Springer, Berlin
Deville P, Wang D, Sinatra R, Song C, Blondel VD, Barabási A-L (2014) Career on the move: geography, stratification, and scientific impact. Sci Rep 4:4770
Dunbar RI (1992) Time: a hidden constraint on the behavioural ecology of baboons. Behav Ecol Sociobiol 31(1):35–49
Dunbar R (1998) The social brain hypothesis. Evol Anthropol 9(10):178–190
Dunbar R (2010) How many friends does one person need? Dunbar’s number and other evolutionary quirks. Faber & Faber, London
Dunbar RIM, Arnaboldi V, Conti M, Passarella A (2015) The structure of online social networks mirrors those in the offline world. Soc Netw 43:39–47
Efron B (1987) Better bootstrap confidence intervals. J Am Stat Assoc 82(397):171–185
Eom YH, Fortunato S (2011) Characterizing and modeling citation dynamics. PLoS ONE. https://doi.org/10.1371/journal.pone.0024926
Everett M, Borgatti SP (2005) Ego network betweenness. Soc Netw 27(1):31–38
Fisher RA (1915) Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10(4):507–521
Fortunato S, Bergstrom CT, Börner K, Evans JA, Helbing D, Milojević S, Petersen AM, Radicchi F, Sinatra R, Uzzi B, Vespignani A, Waltman L, Wang D, Barabási AL (2018) Science of science. Science (New York, NY) 359(6379):0185. https://doi.org/10.1126/science.aao0185
Foster JG, Rzhetsky A, Evans JA (2015) Tradition and innovation in scientists’ research strategies. Am Sociol Rev. https://doi.org/10.1177/0003122415601618. arXiv:1302.6906
Franzoni C, Scellato G, Stephan P (2014) The mover’s advantage: the superior performance of migrant scientists. Econ Lett 122(1):89–93
González AM, Aref S, Theile T, Zagheni E (2020) Scholarly migration within Mexico: analyzing internal migration among researchers using Scopus longitudinal bibliometric data. arXiv preprint arXiv:2004.06539
Gonçalves B, Perra N, Vespignani A (2011) Modeling users’ activity on Twitter networks: validation of Dunbar’s number. PLoS ONE 6(8):22656
Gross AL, Fleischman L (1983) Restriction of range corrections when both distribution and selection assumptions are violated. Appl Psychol Meas 7(2):227–237
Haerter JO, Jamtveit B, Mathiesen J (2012) Communication dynamics in finite capacity social networks. Phys Rev Lett 109(16):168701
Hill RA, Dunbar RI (2003) Social network size in humans. Hum Nat 14(1):53–72
Hirsch JE (2005) An index to quantify an individual’s scientific research output. Proc Natl Acad Sci 102(46):16569–16572. https://doi.org/10.1073/pnas.0507655102. https://www.pnas.org/content/102/46/16569.full.pdf
Holmes D (1990) The robustness of the usual correction for restriction in range due to explicit selection. Psychometrika 55(1):19–32
James C, Pappalardo L, Sîrbu A, Simini F (2018) Prediction of next career moves from scientific profiles. arXiv preprint arXiv:1802.04830
Jenks GF (1977) Optimal data classification for choropleth maps. Department of Geography, University of Kansas Occasional Paper
Johansen A, Sornette D, Ledoit O (1999) Predicting financial crashes using discrete scale invariance. J Risk 1:5–32
Kato M, Ando A (2013) The relationship between research performance and international collaboration in chemistry. Scientometrics 97(3):535–553. https://doi.org/10.1007/s11192-013-1011-y
Lin N, Cook KS, Burt RS (2001) Social capital: theory and research. Transaction Publishers, Piscataway
McCarty C (2002) Structure in personal networks. J Soc Struct 3(1):20
Miritello G, Moro E, Lara R, Martínez-López R, Belchamber J, Roberts SGB, Dunbar RIM (2013) Time as a limited resource: communication strategy in mobile phone networks. Soc Netw 35(1):89–95. https://doi.org/10.1016/j.socnet.2013.01.003
Moed HF, Aisati M, Plume A (2013) Studying scientific migration in Scopus. Scientometrics 94(3):929–942
Moss-Racusin CA, Dovidio JF, Brescoll VL, Graham MJ, Handelsman J (2012) Science faculty’s subtle gender biases favor male students. Proc Nat Acad Sci USA. https://doi.org/10.1073/pnas.1211286109
Panzarasa P, Opsahl T (2008) Patterns of scientific collaboration in business and management: the effects of network structure and interdisciplinarity on research performance. In: Proceedings of the international workshop and conference on network science (NetSci’08), Norwich BioScience Institutes, pp 23–27
Paraskevopoulos P, Boldrini C, Passarella A, Conti M (2020) Dynamics of scientific collaboration networks due to academic migrations. In: 12th International conference of social informatics (SocInfo)
Petersen AM (2015) Quantifying the impact of weak, strong, and super ties in scientific careers. Proc Nat Acad Sci USA 112(34):4671–4680. https://doi.org/10.1073/pnas.1501444112
Petersen AM (2018) Multiscale impact of researcher mobility. J R Soc Interface 15(146):20180580
Pramanik S, Gora ST, Sundaram R, Ganguly N, Mitra B (2019) On the migration of researchers across scientific domains. In: Proceedings of the international AAAI conference on web and social media, vol 13, Association for the Advancement of Artificial Intelligence, Menlo Park, pp 381–392. https://www.aaai.org/ojs/index.php/ICWSM/article/view/3238
Qi M, Zeng A, Li M, Fan Y, Di Z (2017) Standing on the shoulders of giants: the effect of outstanding scientists on young collaborators’ careers. Scientometrics 111(3):1839–1850. https://doi.org/10.1007/s11192-017-2328-8
Roberts SBG, Dunbar RIM (2015) Managing relationship decay. Hum Nat. https://doi.org/10.1007/s12110-015-9242-7
Roberts SG, Dunbar RI, Pollet TV, Kuppens T (2009) Exploring variation in active network size: constraints and ego characteristics. Soc Netw 31(2):138–146
Rose ME, Kitchin JR (2019) Pybliometrics: scriptable bibliometrics using a Python interface to Scopus. SoftwareX 10:100263. https://doi.org/10.1016/j.softx.2019.100263
Sackett PR, Yang H (2000) Correction for range restriction: an expanded typology. J Appl Psychol 85(1):112
Servia-Rodríguez S, Noulas A, Mascolo C, Fernández-Vilas A, Díaz-Redondo RP (2015) The evolution of your success lies at the centre of your co-authorship network. PLoS ONE 10(3):0114302. https://doi.org/10.1371/journal.pone.0114302
Sugimoto CR, Robinson-García N, Murray DS, Yegros-Yegros A, Costas R, Larivière V (2017) Scientists have most impact when they’re free to move. Nat News 550(7674):29
Sutcliffe AG, Wang D, Dunbar RI (2015) Modelling the role of trust in social relationships. ACM Trans Internet Technol (TOIT) 15(4):16
Thorndike RL (1949) Personnel selection; test and measurement techniques
Urbinati A, Galimberti E, Ruffo G (2019) Hubs and authorities of the scientific migration network. arXiv preprint arXiv:1907.07175
Vaccario G, Verginer L, Schweitzer F (2019) The mobility network of scientists: analyzing temporal correlations in scientific careers. arXiv preprint arXiv:1905.06142
Wang D, Song C, Barabási A-L (2013) Quantifying long-term scientific impact. Science (New York, NY) 342(6154):127–32. https://doi.org/10.1126/science.1237825
Wang J, Hooi R, Li AX, Chou M (2019) Collaboration patterns of mobile academics: the impact of international mobility. Sci Public Policy 46(3):450–462. https://doi.org/10.1093/scipol/scy073
Way SF, Morgan AC, Clauset A, Larremore DB (2017) The misleading narrative of the canonical faculty productivity trajectory. Proc Nat Acad Sci USA 114(44):9216–9223. https://doi.org/10.1073/pnas.1702121114
Way SF, Morgan AC, Larremore DB, Clauset A (2019) Productivity, prominence, and the effects of academic environment. Proc Nat Acad Sci 116(22):10729–10733. https://doi.org/10.1073/PNAS.1817431116
Wuchty S, Jones BF, Uzzi B (2007) The increasing dominance of teams in production of knowledge. Science. https://doi.org/10.1126/science.1136099
Zhou WX, Sornette D, Hill RA, Dunbar RIM (2005) Discrete hierarchical organization of social group sizes. Proc Biol Sci R Soc 272(1561):439–444

Funding

This work was partially funded by the SoBigData++ and HumaneAI-Net projects. The SoBigData++ project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No 871042. The HumaneAI-Net project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No 952026. The work of Pavlos Paraskevopoulos was supported by the ERCIM Alain Bensoussan Fellowship Program.

Author information

Authors and Affiliations

IIT, National Research Council, Via G. Moruzzi 1, 56124, Pisa, Italy
Pavlos Paraskevopoulos, Chiara Boldrini, Andrea Passarella & Marco Conti

Authors

Pavlos Paraskevopoulos
Chiara Boldrini
Andrea Passarella
Marco Conti

Contributions

Designed the study: PP CB AP MC. Collected and processed the data: PP. Analyzed the data: PP CB AP MC. Wrote the paper: PP CB AP. PP was a postdoctoral researcher at IIT-CNR at the time of the study. CB is a permanent researcher at IIT-CNR. AP and MC are research directors at IIT-CNR. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Pavlos Paraskevopoulos.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Paraskevopoulos, P., Boldrini, C., Passarella, A. et al. The academic wanderer: structure of collaboration network and relation with research performance. Appl Netw Sci 6, 31 (2021). https://doi.org/10.1007/s41109-021-00369-4


Received: 11 November 2020
Accepted: 16 March 2021
Published: 14 April 2021
DOI: https://doi.org/10.1007/s41109-021-00369-4
