Cooperation patterns in the ERASMUS student exchange network: an empirical study

The ERASMUS program is the most extensive cooperation network of European higher education institutions. The network involves 90% of European universities and hundreds of thousands of students. The allocated money and number of travelers in the program are growing yearly. By considering the interconnection of institutions, the study asks how the program’s budget performs, whether the program can achieve its expected goals, and how the program contributes to the development of a European identity, interactions among young people from different countries and learning among cultures. Our goal was to review and explore the elements of network structures that can be used to understand the complexity of the whole ERASMUS student mobility network at the institutional level. The results suggest some socioeconomic and individual behavioral factors underpinning the emergence of the network. While the nodes are spatially distributed, geographical distance does not play a role in the network’s structure, although parallel travelling strategies exist, i.e., preferences for short- and long-distance travel. The European regions of home and host countries also affect the network. One of the most considerable driving forces of edge formation between institutions is the set of subject areas represented by participating institutions. The study finds that faculties of institutions are connected rather than institutions, and a multilayer network model is suggested to explore the mechanisms of those connections. The results indicate that the information uncovered by the study is helpful to scholars and policymakers.


Introduction
The ERASMUS exchange network was established to play an essential role in connecting and developing European young people and to help in the creation of European identity in a culturally, economically, and linguistically divided Europe. Hundreds of thousands of participants in the exchange network connect institutions (HEIs) and regions and form a cooperation network at the European level. Participants select the host HEI through the contract system of their home institution depending on their preferences, goals and motivations. Institutional contractual relationships and individual travel decisions form a mobility network. We assume that the network structure is not random, and individual decisions, desires, and constraints significantly impact the network structure.
One of the roles of the ERASMUS network is to promote European identity and strengthen relations between European citizens. The achievement of these goals is made possible by overcoming spatial and cultural distances. Of particular interest is the effect of the financial support of movement on spatial patterns. We aim to infer the socioeconomic factors of edge formation in the ERASMUS network through a data-driven exploration of the network structure, in particular, the balanced connectivity of regions and countries, and to uncover other institutional-level factors that affect mobility.
In general, edge formation is constrained between geographically distributed nodes, and the probability of edges decreases with spatial distance (Expert et al. 2011; Barthélemy 2011). Gravity models have shown that geographical distance either does not affect or negatively affects the ERASMUS network (Barrioluengo and Flisi 2017). To the best of our knowledge, to date, no network science-based statistical study has been conducted on the spatial distance dependence of the ERASMUS network at the institutional level.
Europe's geographical distances and diversity are large enough for HEIs as vertices to be located in very different sociocultural environments. We know that the geographical location of students' origins may influence ERASMUS mobility motivation (Lesjak et al. 2015). From the point of view of origin, not all destinations are attractive in an educational or touristic sense. Our goal is to clarify this general assumption. What parts of the network are biased in terms of cooperation? Do HEIs in different regions cooperate with each other in balanced ways?
In this work, we examine the impact of institutional characteristics on mobility, mainly the represented subject area. The characteristics of HEIs/nodes in a network can be very different. What is the role of HEI profiles in shaping the structure of the network? Travels are categorized by subject areas; however, few studies appear to assume that the ERASMUS network has a multilayered structure where layers are subject areas.
The results obtained provide vital information for establishing future models that can be used to understand the network more deeply. In most studies, institutions are aggregated geographically (NUTS regions) or sampled by geographic area. In the literature, there is a lack of analysis of the whole network at the institutional level. We assume that this gap is due to the lack of a proper database. Gadár et al. (2020) geocoded the HEIs and merged them with the microdata of the first comprehensive database on European HEIs (ETER) (Project 2013) to expand the information available regarding the nodes in the network.
In the presented work, we aimed to identify the mobility patterns generated by the participants. The ERASMUS mobility network is a very complex system from a network science point of view. A multifaceted evaluation of the ERASMUS network allows us to infer the participants' decision patterns affected by the home and host HEIs, regions, countries, and geographical distance. The research reveals deviations from the random network in a data-driven way, presenting patterns and drivers that influence travel decisions.
The paper is organized as follows. After the Introduction section, we review the related social and network science-based research on the ERASMUS student network. In the Data section, we briefly present the dataset from which we constructed the network. In the Methods section, we present the data-driven network science tools used to discover the mobility patterns and network structure, mainly through comparison with a random network. The Results section presents the network's structural characteristics as well as a discussion, particularly with regard to the factors related to geographical distance and the modular structure, which strongly relates to the multilayer structure where layers are subject areas.

The ERASMUS program
The most extensive cooperation network of European higher education was initiated in 1987 by the European Union with a European Commission proposal (1987). The European Community Action Scheme for the Mobility of University Students (ERASMUS) program supports student mobility and includes exchanges among academic teachers. The program was designed to allow millions of European students to expand their studies to higher education institutions (HEIs) in other member states or European organizations. ERASMUS student exchanges have a duration of between 3 and 12 months, and approximately 90% of European universities are involved.
Initially, a few thousand students from 14 countries formed the ERASMUS student exchange network. The number of participants increased as more member states joined. The ERASMUS program became part of Socrates I (1994-1999), then Socrates II (2000-2006) and then the Lifelong Learning Program (2007-2013) (Parliament and the Council of the European Union 2006). At that time, the program aimed to enhance the quality and reinforce the European dimension of higher education by encouraging transnational cooperation between universities, boosting European mobility and improving the transparency and full academic recognition of studies and qualifications throughout the European Union (Parliament and the Council of the European Union 2000). Twenty-five member states of the European Union, two candidate countries and four countries in Europe participated in Socrates II. During this period, 1.2 million students were involved in the program. The ERASMUS+ program (2014-2020) involved other organizations (such as NGOs) and individuals (e.g., young people under 26 years of age) in education and training and youth and sport activities in member states. ERASMUS+ 2021-2027 is much more inclusive: it covers all types of education and training and goes far beyond higher education to expand learning opportunities in Europe. However, student, teacher and staff mobility between HEIs is still a decisive part of the program.

Social and economic view of the ERASMUS program
The presented research focuses on the ERASMUS program in 2007-2013 based on the available open data and student mobility for study purposes. The Bologna Process (1999), which began with the Bologna Declaration, established an intergovernmental process aimed at creating a 'European Area of Higher Education' (EHEA) (2005) by 2010, which occurred in this period. Emphasis on internationalization has become an essential element of the policy discourse of European higher education to make it more attractive and competitive worldwide. HEIs continuously seek their place in the European higher education sphere by increasing the need to create a more diverse range of international activities. The Bologna Process has furthered internationalization in the EHEA. This process is problematic because it is difficult to measure the impact of the Bologna Process on internationalization (Valiulis 2015). Internationalization is also taken into account in international higher education ranking systems.
The specific objective of the ERASMUS program is to support the EHEA. One of the Lifelong Learning Program's main aims is to contribute a European dimension in systems and practices in education and training. As a result of this program, by 2014, 3.3 million students and 4,000 institutions from 34 countries participated in the exchange program. The evolution of this network in education is still ongoing.
The ERASMUS exchange program reflects the European integration of institutions and people and the internationalization and openness of HEIs through its supporting system (Maggioni and Uberti 2009), and it has become one of the most important projects of European identity. Exchanges are a tool for personal career development and introduction to other cultures and improve future chances for more accessible and better opportunities for employability (Bracht et al. 2009;Teichler and Janson 2007;Waibel et al. 2017;Brandenburg 2014;Brandenburg et al. 2016). Several papers show that the formation of the "European dimension" is limited because contact with host students remains limited (Sigalas 2010;Van Mol 2013). Indeed, the ERASMUS program plays an important socio-economic role within Europe.
The internationalization of Europe with the ERASMUS program also indicates the birth of a new industry. A growing number of ERASMUS students are international travelers, and studying and/or improving academic knowledge appears to be a travel motive (Lesjak et al. 2015; Altbach and Knight 2007). A comparative study highlighted that a non-supporting family background, insufficient financial help, and credit recognition problems significantly deter participation in the ERASMUS student exchange program, and students aim to balance and minimize the risk (Souto-Otero et al. 2013). The push and pull motivations of 360 students who studied abroad in 2008/2009 and 2009/2010 with ERASMUS grants were surveyed (Lesjak et al. 2015). The main push motivations on the respondents' preference lists were to meet new people, have experiences, learn about other cultures, and study, which is well correlated with other non-European case studies (Ingraham and Peterson 2004; Pyvis and Chapman 2007). This phenomenon is probably an age-specific feature, and it has been shown that 'lifestyle', which refers to experiencing and rediscovering self-important factors, affects the travel habits of mostly 18- to 26-year-old people (Kim et al. 2007). Among the pull motivation factors were touristic attractions (Lesjak et al. 2015), but it should be noted that institutes' academic performance or excellence were not among the examined categories. These and other findings confirm that students engaged in the ERASMUS student exchange program mainly for fun, which appears to be the opposite of professional-oriented objectives (Findlay et al. 2010; González et al. 2011; Keogh and Russel-Roberts 2009).
In a study, student mobility was considered a type of tourist activity in Galicia (Spain). The main determinants driving the demand for academic tourism, including ERASMUS students' travels, were identified between 2001 and 2009 (Rodríguez et al. 2012). The authors showed that academic tourism demand was driven by noneconomic factors such as the ability to attract international students or the word-of-mouth reputation of HEIs and the ease of mobilization of the ERASMUS program. Economic factors show that travel costs are high, and a tendency was identified whereby an increasing number of students travel from countries with relatively low income levels to Galician universities. One of the conclusions of that study was that there is a need for cooperation between institutions and stakeholders to attract students who participate in the ERASMUS program.

Network science-based view of the ERASMUS network
The cooperation of HEIs and part-time studies within the framework of the ERASMUS program can be considered a collaboration network where nodes are institutes and links are the mobilities of students, academic teachers and staff between universities. HEIs belong to regions and countries. Thus, collaboration can be analyzed at a higher level of the geographical system.
The principal factors that affect ERASMUS student mobility (ESM) flows in 1995-2006 between countries were analyzed through migration theory and a gravity model at the country level (González et al. 2011). The authors identified differences between the original members of the ERASMUS program (EU-16), which reached a stationary level of outgoing students, and new participating countries where increasing travel numbers were experienced. The ratio of ESM per 1000 capita was much higher in EU-16 than in new member countries in 2006. The analysis of the variables of the linear gravity model revealed that distance negatively affects mobility but Mediterranean host countries positively affect mobility. Contrary to the case of newly participating countries, these factors were much less evident in the model. Interestingly, a higher price level of the host country positively affects the ESM, while in EU-16 countries, it negatively affects the ESM. The possible explanation offered by the authors for the price level effect is that the ESM had not yet spread to lower- or middle-income groups as in the EU-16.
A gravity model-based study analyzed the flow between NUTS2 regions by taking into account the size of the HEIs based on ETER data (Barrioluengo and Flisi 2017). Their findings, among others, show that geographical distance is negatively associated with mobility. At the same time, there is a positive relationship between the number of students at home and the host institutes and reputation. Reputation was evaluated using a dummy variable with a value of one if the HEI was included in the Times Higher Education Ranking. Maggioni and Uberti (2009) analyzed a NUTS2 region aggregated subnetwork of ERASMUS student mobility among 5 European countries in terms of knowledge transfer and internationalization and noted the success of the ERASMUS program. They found a core-periphery structure of the ERASMUS student network. Core HEIs in ERASMUS differ slightly from core institutions in terms of patents and FP5 research networks, which is caused by the profiles of HEIs (Kosztyán et al. 2021). EU-supported travel fosters students' geographical mobility, even when they are located in 'peripheral' regions. Topologically and structurally, the network shows similarity to random networks, and the geographical distance does not strongly influence the flows of students between regions (Maggioni and Uberti 2009). The degree correlation between nodes based on GDP per capita showed slight assortativity, which indicates that student travel is stimulated by economic similarity. These results are consistent with the outcome of previous research covering a wider set of countries, which showed that a country's position in the global student exchange network is significantly related to its GNP per capita and that clusters include countries with similar development (Barnett and Wu 1995; Barnett et al. 2016). In addition, the core and peripheral positions proved to be stable over time (Chen and Barnett 2000; Barnett et al. 2016).
According to Deutschmann (2022), transnational activities are primarily determined by geographical distance, travel and communication behaviour is still regionalized, and a truly global society is unlikely to emerge.
A self-respondent-based induced subnetwork of 37 HEIs from 25 countries also shows a core-periphery structure with a densely connected core composed of active institutions (Savić et al. 2017). The network of 134,330 student exchanges between 2,333 HEIs in 2003 was analyzed, and the authors found that the degree distribution of the ERASMUS student network is not scale-free. Random connection models facilitated an understanding of the network topology (Derzsi et al. 2011). According to the authors, this means that the network is not characterized by preferential linking. In contrast, Breznik (2017) found a power-law distribution in the subnetwork of student exchanges in an engineering subject area, but this finding was not rigorously tested statistically.

Data employed
The ERASMUS network, defined by nodes and edges, is considered a spatially embedded, multilabel and multidimensional network. The open and linked dataset of the ERASMUS exchange network, published in 2020 (Gadár et al. 2020), offers completely new analytical possibilities for network scientists. The compilation of the dataset aims to better understand ERASMUS mobility patterns and the motivational and attractiveness factors behind travel.
The ERASMUS network has become a spatial network by defining the geocoordinates of the HEIs as nodes. By identifying the spatial embeddedness of HEIs, the environment of the nodes can be better recognized, and the geographical distance of travels between the nodes can be estimated. In previous studies, the countries or regions of the institutions represented the geographical embeddedness and were used as nodes in the network.
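As a rough illustration of how the travel distance between two geocoded HEIs might be estimated, the sketch below computes the great-circle (haversine) distance between two coordinate pairs. The choice of the haversine formula, the Earth radius constant, and the example coordinates are assumptions made for illustration; the text does not state which distance measure was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical node coordinates (approximate city centres, not taken from the dataset).
budapest = (47.4979, 19.0402)
vienna = (48.2082, 16.3738)
distance_km = haversine_km(*budapest, *vienna)
```

Applied to every edge, such a function would yield the edge lengths needed for the distance-dependent statistics discussed in the Methods section.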
As the most significant European institutional database, the European Tertiary Education Register (ETER) is linked to the institutional data of the ERASMUS network. ETER represents most European tertiary education institutions by defining some specific thresholds for inclusion and exclusion (Lepori 2017). The Global Research Identifier Database (GRID) has been linked to institutional variables and contains institutions with research institute profiles. Thus, we know which institutions have research activity in addition to their educational function. ERASMUS student and academic teacher exchanges represent the educational profile of institutions. Linked databases provide labels to institutions, enriching the dataset for network science analysis.
Connections between HEIs in the ERASMUS network are defined by travelers participating in the exchange program. The EU Open Data Portal published travel data for 2008-2014 at the microdata level. The microdata include the traveler's subject area, level of training, and the length of the trip in detail. This information, as the labels of the edges, offers additional possibilities for network science analysis.
The results of this work are based on the ERASMUS student network in the 2013-2014 study year. The number of movements between HEIs provided in the open data is exactly the same as that identified in the official statistics published by the European Commission (2015).

Methods employed
The network science methodological toolbox is comprehensive and evolving very rapidly. This paper uses those methodological tools that help to answer the study's central question: to what extent does the ERASMUS program deepen European identity? In our view, the development of European identity assumes a balanced cooperation network of universities. The balance of a network can best be evaluated by comparisons with a random network. Real networks are nonrandom, which suggests that some mechanisms drive network formation (Newman 2003).
We begin by considering the overall network. In particular, we focus on structural elements that indicate inequalities. A power-law distribution of node degrees indicates some preference mechanism in the network. Most real-world networks follow a power-law degree distribution (Barabási 2014). However, there are also contradictory opinions (Tanaka 2005). Going further, significantly higher connectivity between nodes that are rich in connections or hosted students (a rich-club effect) may indicate an unequal distribution of resources (Opsahl et al. 2008), which makes it difficult for small network entities to prevail. Thus, examining the properties of prominent institutions can provide new insights into such inequalities.
Further investigation focuses on constraints and driving forces. In particular, we investigate the role of geographical distance using network statistics methods. After that, we uncover the network modules. By investigating the common properties of vertices belonging to the same module, we can estimate the driving forces behind modularization: why exactly do those institutions belong to a module?
With regard to the similarity of institutional properties, we investigate whether a node property has a significant influence on forming the links between nodes. Mobilities occur in particular subject areas. We consider the subject areas represented by the HEIs and the edges between them. It may be appropriate to assume that the ERASMUS student network at institutional level can be characterized as a multilayer network, where layers are subject areas.
In the following subsections, we describe the detailed methodological tools that were used.

Exploring structural network biases
The ERASMUS program aims to play an essential role in creating a European identity. It can achieve this by ensuring that, despite geographical, linguistic and cultural differences, all regions host a balanced mix of students from all over Europe. All institutions in any part of Europe have equal opportunities to send and receive students.
If the network structure is affected by preferential attachment, this is revealed by examining the degree distribution, because preferential attachment is responsible for the emergence of the scale-free property and a power-law degree distribution (Barabási 2014). We cannot measure the growth of the network with data from one academic year, but the degree distribution indicates the possibility of preferential attachment at any point in the evolution of the network. Preferred destinations indicate that several institutions or subregions are particularly attractive. If preferred destinations exchange students significantly more frequently among themselves, a rich-club effect occurs in the network, which indicates an inequality in the distribution of resources (Opsahl et al. 2008).
The phenomena are not a significant problem and may be natural; however, they indicate that the original objective may be compromised.

Degree distribution
One of the most important pieces of large-scale structural information about a network is the degree distribution of nodes, i.e., the frequency distribution of vertex degrees, which defines the characteristics of the network structure. The value $p_k$ is the probability that a randomly chosen node in the network has degree $k$. Truly random graphs, e.g., Erdős-Rényi (1960), have a Poisson degree distribution, which has a peak (the most frequent degree) near the average degree. Scale-free networks can be described by a power-law distribution, $P(k) \sim k^{-\gamma}$ (Albert and Barabási 2002), where the $\gamma$ parameter is the research focus. $\gamma$ relates to the scale-free property if $2 < \gamma < 3$ (Barabási 2014).
In our study, we follow the suggestion of Albert and Barabási (2002) and Barabási (2014) for the fitting strategy. The power-law distribution is fitted to the empirical distribution of weighted indegree and outdegree, and the linear section bounded by $k_{\min}$ and $k_{\mathrm{sat}}$ is computed. The p-value indicates whether we can reject the null hypothesis that the data follow a power-law distribution. The possibility of a log-normal or exponential distribution is also tested with Vuong's test statistic. The two-sided p-value of the comparison test shows whether both distributions are equally close to the empirical distribution or whether one of them can be rejected. The poweRlaw package in R (Gillespie 2014) was used for calculations.
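To make the fitting step more concrete, the following sketch estimates the exponent of a power-law tail with the standard continuous maximum-likelihood estimator, $\hat{\gamma} = 1 + n / \sum_i \ln(k_i / k_{\min})$. It is a simplified stand-in and not the procedure of the R package used in the study: the lower cutoff is assumed to be known rather than estimated, no goodness-of-fit or Vuong test is performed, and the data are synthetic. The function names are hypothetical.

```python
import math
import random


def fit_power_law_exponent(values, k_min):
    """Continuous maximum-likelihood estimate of gamma for observations >= k_min.

    Returns the estimate and the number of tail observations used.
    """
    tail = [v for v in values if v >= k_min]
    log_sum = sum(math.log(v / k_min) for v in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one observation strictly above k_min")
    return 1.0 + len(tail) / log_sum, len(tail)


def sample_power_law(n, gamma, k_min, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


# Synthetic strengths with a known exponent (illustrative values only).
rng = random.Random(42)
synthetic = sample_power_law(20000, 2.5, 1.0, rng)
gamma_hat, n_tail = fit_power_law_exponent(synthetic, 1.0)
```

For integer-valued degrees, a discrete estimator and a data-driven choice of the cutoff would be more appropriate; the sketch only shows the core of the likelihood-based fit.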

Rich club effect
Large-scale hierarchical organizations can develop into complex systems. Prominent elements at the top of the hierarchy share and control resources or avoid one another (Opsahl et al. 2008). The rich-club phenomenon refers to a small number of nodes with a large number of connections forming tightly interconnected communities (Colizza et al. 2006). Among the rich nodes, significantly more connections have been found to develop in Internet networks (Zhou and Mondragón 2004), in the brain network of people with schizophrenia (Van Den Heuvel et al. 2013), and in the global water trade network (Konar et al. 2011). The rich-club effect in the ERASMUS network refers to the inequality of resources. The competition between the universities is based on the number of received students.
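A minimal sketch of how the rich-club phenomenon can be quantified is given below. It computes the unweighted, undirected rich-club coefficient $\phi(k) = 2E_{>k} / (N_{>k}(N_{>k}-1))$, i.e., the link density among the $N_{>k}$ nodes with degree larger than $k$. This is a simplification: the weighted variant and the normalization against randomized networks, which are needed to judge significance, are omitted, and the toy graph and function name are hypothetical.

```python
def rich_club_coefficient(edges, k):
    """Density of links among nodes whose degree exceeds k (None if fewer than 2 such nodes)."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    rich = {node for node, nb in neighbours.items() if len(nb) > k}
    if len(rich) < 2:
        return None
    # Each link inside the club is seen from both endpoints, hence the division by 2.
    links = sum(1 for node in rich for other in neighbours[node] if other in rich) // 2
    return 2 * links / (len(rich) * (len(rich) - 1))


# Toy graph: three fully connected hubs, each with two peripheral partners.
toy_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("A", "a1"), ("A", "a2"),
    ("B", "b1"), ("B", "b2"),
    ("C", "c1"), ("C", "c2"),
]
phi_hubs = rich_club_coefficient(toy_edges, 3)  # only the hubs have degree > 3
phi_all = rich_club_coefficient(toy_edges, 0)   # all nodes have degree > 0
```

In the toy graph the hubs are completely interconnected while the overall density is low, which is the pattern a rich-club analysis looks for.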

Other large scale network structure information
The clustering coefficient, indicating the rate of common neighbors of two vertices, is an order of magnitude larger than if the relationships were formed randomly. The phenomenon is natural in social networks. It shows that a connection with a friend of friends is more likely to develop than with another node.
Components are units in which the vertices are connected, but there is no connection between the components (Barabási 2014). They are like islands of different sizes. The components developed in a mode typical of real networks.

Analysis of the effect of spatial constraint
The ERASMUS network can be interpreted as a spatially weighted directed graph where the weights of the edges are the number of travels between HEIs. Here, we want to empirically describe the effect of geographical distance on network structure with network science tools. Spatially distributed nodes include origin-destination (Gómez et al. 2018), air transport (Guimerà and Amaral 2004), travel (Bao et al. 2017), migration networks (Landesberger et al. 2016), and world trade webs (Picciolo et al. 2012), which successfully describe social and economic processes and identify structural biases affected by geographical distance. Spatial constraints in email (Goldenberg and Levy 2009), mobile communication (Lambiotte et al. 2008) and online social (Lengyel et al. 2015) networks have emerged. To minimize their effort (Zipf 1949) and to maintain social ties, most individuals connect with their spatial neighbors. Spatial constraints affect the structure of networks (Barthélemy 2011), e.g., the probability of triangles and link formation decreases with distance. Moreover, spatial constraints affect degree distribution because long-distance costs suppress nodes with high degrees (Expert et al. 2011).
Due to the increasing cost of edge formation caused by geographical distance, the probability of forming connections between distant nodes decreases. This phenomenon leads regional centers to have high degrees, the degree distribution in most cases is power-law, and the network groups are arranged according to geographical regions (Barthélemy 2011). In models of real geographical networks, the probability that a new node $j$ will link to a node $i$ at a distance $d_{i,j}$ exponentially decreases (Boccaletti et al. 2006).

Modeling the distance dependence of connections
The deterrence function describes the effect of geographical space in the general gravity law equation (Barthélemy 2011). As we have observed in some of the empirical research, gravity law-based models of the ERASMUS network show that geographical distance is negatively associated with forming relationships. However, the empirical deterrence function is missing in the literature. The deterrence function $f(d_{i,j})$ can be directly measured from the data through a binning procedure, as shown in Eq. 1 (Expert et al. 2011):
$$f(d) = \frac{\sum_{i,j \mid d_{i,j} = d} A_{i,j}}{\sum_{i,j \mid d_{i,j} = d} I_i^{out} I_j^{in}} \qquad (1)$$
This function is proportional to the weighted average of the probability of a link existing at distance $d$. The product $I_i^{out} I_j^{in}$ in Eq. 1 is the importance of nodes, e.g., the in/out degree or strength in the case of the configuration model, but socioeconomic indicators can also be utilized (Gadar et al. 2018). $A_{i,j}$ is the observed number of directed links between nodes $i$ and $j$, and $d_{i,j}$ is the distance between nodes $i$ and $j$. The distance dependence of the connection probability can be handled by explicit functions such as $f(d) = 1/d_{i,j}^{\delta}$ (Jung et al. 2008; Krings et al. 2009; Kaluza et al. 2010). The deterrence function is constant if the modeled edge formation probability in the denominator in Eq. 1 is equal to the edge probability in the expected network in each distance bin, and the deterrence function does not depend on distance (Expert et al. 2011); thus, distance does not play a role in the formation of edges.
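The binning procedure can be sketched as follows: for each distance bin, the observed link weight between all node pairs in that bin is divided by the weight expected from node importance alone. The sketch assumes that importance is measured by out-strength and in-strength, that distances are available for every ordered pair of distinct nodes, and that bins have equal width; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def deterrence_function(weights, distances, bin_width):
    """Binned ratio of observed link weight to the weight expected from node strengths.

    weights:   {(i, j): observed directed weight between i and j}
    distances: {(i, j): distance} for every ordered pair of distinct nodes
    Returns {bin_index: ratio}; bin b covers [b * bin_width, (b + 1) * bin_width).
    """
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    for (i, j), w in weights.items():
        out_strength[i] += w
        in_strength[j] += w

    observed = defaultdict(float)
    expected = defaultdict(float)
    for (i, j), d in distances.items():
        b = int(d // bin_width)
        observed[b] += weights.get((i, j), 0.0)          # numerator: observed weight
        expected[b] += out_strength[i] * in_strength[j]  # denominator: importance product
    return {b: observed[b] / expected[b] for b in sorted(expected) if expected[b] > 0}


# Toy example: X and Y are close (100 km); Z is far from both (900 km).
toy_weights = {("X", "Y"): 8.0, ("Y", "X"): 6.0, ("X", "Z"): 1.0, ("Z", "Y"): 1.0}
toy_distances = {
    ("X", "Y"): 100.0, ("Y", "X"): 100.0,
    ("X", "Z"): 900.0, ("Z", "X"): 900.0,
    ("Y", "Z"): 900.0, ("Z", "Y"): 900.0,
}
f = deterrence_function(toy_weights, toy_distances, bin_width=500.0)
```

In this toy case the ratio is larger in the short-distance bin than in the long-distance bin, i.e., distance deters links; a ratio that stays roughly constant across bins would indicate that distance plays no role.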

Distance dependence of contact with a common neighbor
Theoretically and empirically, the probability of a connection with a common neighbor, i.e., a triangle in network science, is higher between close nodes and decreases with distance. As a consequence of this phenomenon, the transitivity is higher in spatial networks than in networks without spatial constraints (Barthélemy 2011), and spatial networks are spatially clustered. There is some research on the distance-dependent local clustering coefficient. However, these studies can hardly handle distance because the local clustering coefficient is a node property while distance is an edge attribute. Lambiotte et al. (2008) considered the geographical extension of communication triangles and showed that triangles are not only composed of geographically adjacent nodes. They proposed a distance-dependent clustering coefficient, which calculates the probability that a link of length $d$ belongs to a triangle, shown in Eq. 2. Their results showed that $c(d)$ decreases with distance and reaches a plateau.
$$c(d) = \frac{C_d}{L_d} \qquad (2)$$
where $C_d$ is the number of links of length $d$ that belong to a triangle, and $L_d$ is the number of links of length $d$ (Lambiotte et al. 2008).
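The distance-dependent clustering coefficient can be sketched as the share of links in a given length bin whose endpoints have at least one common neighbor. The sketch below treats the network as undirected and unweighted without self-loops and uses equal-width length bins; these simplifications, the function name, and the toy data are assumptions.

```python
from collections import defaultdict


def distance_dependent_clustering(edges, bin_width):
    """Share of links per length bin that belong to at least one triangle.

    edges: {(u, v): length}, one entry per undirected link, no self-loops.
    Returns {bin_index: share}; bin b covers [b * bin_width, (b + 1) * bin_width).
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    in_triangle = defaultdict(int)  # links of this length that close a triangle
    total = defaultdict(int)        # all links of this length
    for (u, v), length in edges.items():
        b = int(length // bin_width)
        total[b] += 1
        if neighbours[u] & neighbours[v]:  # a common neighbour closes a triangle
            in_triangle[b] += 1
    return {b: in_triangle[b] / total[b] for b in sorted(total)}


# Toy example: a short-range triangle A-B-C and a long-range chain C-D-E.
toy_links = {
    ("A", "B"): 50.0, ("B", "C"): 60.0, ("A", "C"): 80.0,
    ("C", "D"): 700.0, ("D", "E"): 650.0,
}
c = distance_dependent_clustering(toy_links, bin_width=500.0)
```

Here all short links belong to a triangle and no long link does, which mimics the decreasing trend reported for spatially constrained networks.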

Search for grouping factors
Many networks divide naturally into groups or communities. Investigating groups within networks is a fruitful area of network science. However, it is difficult to identify communities. In this study, we use the modularity measure introduced by Newman (2006), represented in Eq. 3, to uncover modules and to establish the modular structure of the network. A network module is a subgraph whose vertices are more likely to be connected to each other than to vertices outside the subgraph. The interactions of the vertices are more closely related to some and less closely to others, leading to the formation of modules. Frequent connections have reason and/or function, and if several nodes in the network behave or function similarly, it leads to local relationship densities and the formation of modules (Newman 2018; Tunç and Verma 2015).
$$Q = \frac{1}{L} \sum_{i,j} \left[ A_{i,j} - \frac{k_i k_j}{L} \right] \delta(c_i, c_j) \qquad (3)$$
where $L$ is the total edge weight in the network, $A_{i,j}$ is the observed directed edge weight between nodes $i$ and $j$, $k_i$ and $k_j$ are the outstrength and instrength of nodes $i$ and $j$, respectively (in the case of directed and weighted networks), and the Kronecker $\delta$ is one if the community of $i$ ($c_i$) and the community of $j$ ($c_j$) are the same, and 0 otherwise. Modularity reflects the extent of observed edge weights relative to the random configuration network. The modularity maximization algorithm organizes the vertices into groups in which the edge weights within the modules are as large as possible compared to the configuration model.
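For a given partition, the modularity of a directed, weighted network can be evaluated directly from these definitions: the observed weight inside each community is compared with the product of outstrength and instrength divided by the total weight. The sketch below only scores a partition and does not search for the best one; the function name and the toy network are hypothetical, and the partition is assumed to cover every node.

```python
from collections import defaultdict


def directed_modularity(weights, community):
    """Modularity of a partition of a directed, weighted network.

    weights:   {(i, j): directed edge weight from i to j}
    community: {node: community label}, covering every node in the network
    """
    total = sum(weights.values())
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    for (i, j), w in weights.items():
        out_strength[i] += w
        in_strength[j] += w

    q = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:
                # observed weight minus configuration-model expectation
                q += weights.get((i, j), 0.0) - out_strength[i] * in_strength[j] / total
    return q / total


# Toy network: two reciprocal pairs joined by a single weak link.
toy_weights = {
    ("A", "B"): 5.0, ("B", "A"): 5.0,
    ("C", "D"): 5.0, ("D", "C"): 5.0,
    ("B", "C"): 1.0,
}
q_two_groups = directed_modularity(toy_weights, {"A": 0, "B": 0, "C": 1, "D": 1})
q_one_group = directed_modularity(toy_weights, {"A": 0, "B": 0, "C": 0, "D": 0})
```

Placing all nodes in one community gives a modularity of zero, whereas the natural two-group split scores clearly higher; a modularity maximization algorithm searches for the partition with the highest such score.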
The modular structure of the network strongly reflects the spatial constraint. If the reference model is the configuration model, then the clusters highlight the spatial proximity of the nodes. Thus, the modular structure follows the geographical structure. However, spatially independent clusters can be uncovered with an appropriate null model (Expert et al. 2011;Gadar et al. 2018). To examine the spatial characteristics of the network, we uncovered modules and examined the regional partitioning of the modules.
To define the modules, we use the Leiden algorithm (Traag et al. 2019) as implemented in R (Kelly 2020); it outperforms the popular Louvain algorithm (Newman and Girvan 2004).

Examining the relevance of a multilayer network model
This work's results show that HEIs represent different subject areas and that specialists and generalists are involved in the network. Mobilities between HEIs are strongly related to the subject areas studied by an ERASMUS student at the host institution. To the best of our knowledge, the impact of subject areas has not yet been studied.
Subject areas have a strong influence on the formation of modules, which will be discussed in "Overlap between subjects: a multilayer structure" section. It is reasonable to assume that subnetworks behave differently, and it is worth examining them separately. The subject composition of the institutions and the links between them will be examined. Suppose the links are typically related to one or two disciplines. In that case, it is recommended that a multilayer network model be used since the links between institutions are formed on a disciplinary basis.
The theory of multilayer networks is a rapidly growing field where layers represent the different contexts (Kivelä et al. 2014;Boccaletti et al. 2014). Multilayer network models are widely used in social network analysis (Dickison et al. 2016;Gadár and Abonyi 2019). Network layers are relevant when actors are connected through multiple, overlapping types of links. Nodes can be characterized by their activities on different layers (Cai et al. 2018). Multilayer network models provide a better understanding of network structure and the role of nodes and the connections between them (Battiston et al. 2014;Mollgaard et al. 2016).
A multilayer structure influences how information is spread and the resilience of the network. In most cases, network layers have different structural properties (Boccaletti et al. 2014). However, network layers may overlap. The centrality of nodes as an indicator of their importance may differ among the different network layers and their characteristics (Halu et al. 2013;Solé-Ribalta et al. 2014). There are several methods to detect dense communities in a multilayer network (Berlingerio et al. 2013;Jeub et al. 2017), and generally, they uncover different communities by layers. Examples show that complex network systems separated by relationship type facilitate the understanding of the systems.

Overview of the network
We first give an overview of the network through some topological and structural information. The characteristics describing the large-scale structure of the network are given in Table 1. The table shows the elements of the whole network structure and the subnetworks organized by subject areas. The subnetworks reflect the multilayer structure of the network, which is described in more detail in "Overlap between subjects: a multilayer structure" section.
Table 1 Structural metrics of the ERASMUS student network in 2013-2014 and subject-based subnetworks (Abbreviations in rows: N - number of nodes; W tot - the total number of weights/travels; L - the number of directed edges; E - the number of connections between HEIs not taking into account the direction and weight of edges; γ in, γ out - the fitting parameters of the power-law distribution for the in- and out-strengths; k sat and k cut - the minimum and maximum node (in or out) strengths between which the power-law distribution can be fitted, i.e., the linear range of the distribution on a log-log plot lies between k sat and k cut; G(N, L) - Erdős-Rényi random graph with the same number of nodes and edges, used as reference)
The ERASMUS network is also referred to as an exchange network. Only 10-30% of interinstitutional relations are reciprocal. The low ratio of mutually directed edges between the HEIs indicates that exchanges do not occur at the institutional level. The proportion of reciprocal connections in the whole network is higher than that in the subject-based subnetworks. This can be explained by the different subject areas of the mutual edges between the same pair of HEIs. The clustering coefficient is relatively high. The values of the random Erdős-Rényi network with the same number of nodes and edges are also included as a reference in Table 1. The clustering coefficient of the actual network is higher by an order of magnitude. This phenomenon suggests that existing cooperative links have an impact on the formation of new links. In other words, new links are more likely to be formed through a common neighbor.
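The reciprocity and clustering comparison can be sketched as follows. This is a minimal Python illustration under stated assumptions: reciprocity is taken as the share of connected HEI pairs linked in both directions, clustering is measured as the global transitivity of the undirected projection, and the Erdős-Rényi reference uses the fact that the expected clustering of such a graph equals its edge density. The text does not state which exact definitions were used, and all function names are hypothetical.

```python
from collections import defaultdict


def reciprocity(directed_edges):
    """Share of connected HEI pairs that are linked in both directions."""
    edges = {(u, v) for u, v in directed_edges if u != v}
    pairs = {frozenset(e) for e in edges}  # E: undirected connections
    mutual = sum(1 for u, v in edges if (v, u) in edges) // 2
    return mutual / len(pairs)


def transitivity(directed_edges):
    """Global clustering of the undirected projection (closed / all triplets)."""
    nbrs = defaultdict(set)
    for u, v in directed_edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    closed = triplets = 0
    for ns in nbrs.values():
        k = len(ns)
        triplets += k * (k - 1) // 2
        # neighbor pairs of this node that are themselves linked
        closed += sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
    return closed / triplets if triplets else 0.0


def er_expected_clustering(n_nodes, n_pairs):
    """Expected clustering of an Erdős-Rényi graph, i.e. its edge density."""
    return 2 * n_pairs / (n_nodes * (n_nodes - 1))


# Hypothetical toy edge list (home HEI, host HEI).
toy = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d")]
print(reciprocity(toy), transitivity(toy), er_expected_clustering(4, 4))
```

In a sparse network the Erdős-Rényi value is close to zero, so a clustering coefficient an order of magnitude above it points to triangle closure.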

The network has one giant and several smaller components consisting of 2-4 institutions, which is also true for subject area subnetworks.
The strength distribution of the nodes does not show a peak, so the average strength would be misleading. A power-law distribution could not be fitted to the weighted degree distribution, and the scale-free property can be statistically rejected, which confirms the previous finding on topology (Derzsi et al. 2011).
However, surprisingly, there are node strength ranges between k sat and k cut in the subject area subnetworks to which the power-law distribution can be fitted. Most of the fitting parameters (γ) are above 3, which, according to Barabási (2014), is the random regime. In the engineering, agricultural and services subject areas, the γ value lies in the scale-free range between 2 and 3, suggesting a preferential attachment phenomenon. However, it should be noted that, based on the pairwise Vuong comparison test, the lognormal and exponential distributions cannot be rejected for any subject area.

Spatial characteristics of the ERASMUS student network at institutional level
We examined the effect of geographic distance on the emerging ERASMUS student network using several pieces of information. The focus of this analysis is to measure the effect of distance on the probability of forming relationships. We examined whether the financial system of the ERASMUS program offsets the effect of geographical distance on the development of cooperation.

Distance distribution of connections
The distribution of distances of movements between HEIs in the ERASMUS student program helps increase our understanding of spatial patterns. The distribution of distances was compared with that of random movements between spatially embedded nodes, which served as references. Although the proportion of connected pairs of nodes at distance d decays exponentially in most real-world networks (Barthélemy 2011), it is also necessary to consider how many options are available to move there. If geographical distance affects the probability of travels between nodes at distance d, significant differences should be seen between the distributions of the real-world and random networks. The distance distribution of the random network represents the distance-independent case. The denominator of Eq. 1 corresponds to the configuration model-based random network if the importance I is the degree or strength. The first random model is the configuration model based on reweighting the existing weights of each directed edge. From the configuration model, we can derive Eq. 4.
$$\tilde{A}_{i,j} = \frac{s^{out}_i \, s^{in}_j}{L} \qquad (4)$$
where $\tilde{A}_{i,j}$ is the expected weight of the directed edge from node i to node j, $s^{out}_i$ is the out-strength (weighted degree) of node i, $s^{in}_j$ is the in-strength of node j, and $L$ is the sum of all weights in a directed network. The second random network is the real network rewired with the Erdős-Rényi algorithm (G(N, L) procedure) (Erdös and Rényi 2011), and the third random network is rewired with the degree-preserving algorithm (Barabási 2014). The comparison of distance distributions of the empirical and randomized networks is shown in Fig. 1.
The distributions have similar curves, with peaks in each curve. The weighted averages of movement distances in the empirical network, in the network reweighted by the configuration model, in the degree distribution preserving network and in the Erdős-Rényi random network are 1318 km, 1341 km, 1314 km and 1334 km, respectively, which are very close values. The Erdős-Rényi random network and the degree distribution preserving network do not consider that the ERASMUS program supports travel only to member states other than the country of origin. Therefore, these random networks overestimate the values of the real-world network for short travel distances. Interestingly, the reweighted network underestimates the number of short travel distances, and more people travel to neighboring countries than would be estimated from sending and receiving capacities. Overall, it can be concluded that the reweighted network (configuration model) estimates the empirical network well, indicating that distances play a small role in network formation.
Fig. 1 Distributions of movement distances, empirical network compared with Erdős-Rényi random graph, degree preserving random graph and configuration model based rewired random graph

Distance-independent probability of a common neighbor
The next piece of information on the effect of distance on the network structure is the probability of forming triangles. In spatial networks, increasing distance decreases the probability of forming edges between distant nodes and of their sharing common connections. The short-distance regime of such a network has a large clustering coefficient, and the probability of forming triangles decreases as the distance between two nodes increases. This phenomenon can be investigated with the probability c(d) that a link of length d belongs to a triangle; see Eq. 2 (Barthélemy 2011). The directed and weighted network was converted to an undirected one, and the edge weights were not taken into account when examining the triangles. The edges were grouped by distance, with a bin size of 100 km. The number of edges in each group was summed. The edges were matched with triangles, and the number of triangles was calculated for each edge. The ratios in each distance group are plotted in Fig. 2. Figure 2 shows the probability that an edge of length d belongs to 1, 2, 5, 10, 20, 50, and 100 triangles and the role of d in changing the probability. There is an increasing tendency in the probability at shorter distances (less than 300 km), whereas the probability tends to decrease at very long distances (more than 3500 km). The trend lines fitted to the ratio values indicate a slight decrease, but it can be considered negligible. The shape of the curves is not similar to the empirical results observed in other distance-dependent networks (Lambiotte et al. 2008), which suggest that the probability should decrease with distance. Empirically, the clustering in the ERASMUS network is rather independent of distance. The decreasing trend is also not observed as the number of neighbors increases.
If the likelihood of forming common neighbors is unaffected by distance, what effect does it have on forming relationships?

The effect of spatial distance on the likelihood of forming connections
The deterrence function f(d i,j) is measured directly from the data, as shown in Eq. 1. The bin size was set to 200 km. The sending and receiving capacity of institutes (the in- and out-strength of nodes) was used as the importance factor. The resulting empirical deterrence function is shown in Fig. 3. The deterrence function differs completely from what is expected for spatial networks (a decreasing trend with increasing distance, e.g., Kaluza et al. (2010)) and instead shows that distance plays a minimal role in the structure of the network. A smaller proportion of students prefer short distances, which can be seen in the left part of the deterrence function. Below 500 km, the observed number of journeys is 1.2-1.7 times higher than expected. Between 500 km and 3500 km, the number of travels is the same as that predicted by the configuration model. Most trips are made in this distance range, based on the distance distribution of the trips in Fig. 1. We can conclude that distance has no role in the probability of edge formation in that distance range. Finally, distance does play a role in travels that exceed 3700 km. However, scholarship trips to destinations closer than 4500 km remain more likely than expected, indicating that the ERASMUS scholarship helps fulfil some students' desires to reach distant destinations. The shape of the curve shows that, based on the sending and receiving capacities of the institutions, the configuration model appropriately approximates the spatial constraint.
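A binned deterrence function of this kind can be sketched in Python as the ratio of observed to expected trips per distance class. The sketch assumes that Eq. 1 has this observed-over-expected form with the configuration model expectation (out-strength times in-strength divided by the total weight) in the denominator, that distances are great-circle distances between institution coordinates, and that all ordered pairs of distinct institutions enter the denominator. The names `haversine_km` and `deterrence` and the input format are hypothetical.

```python
import math
from collections import defaultdict


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def deterrence(weights, coords, bin_km=200.0):
    """Observed / expected number of trips per distance bin.

    weights: {(i, j): number of trips from i to j}
    coords:  {node: (lat, lon)}
    """
    total = sum(weights.values())
    s_out, s_in = defaultdict(float), defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w
        s_in[j] += w

    observed, expected = defaultdict(float), defaultdict(float)
    for i in coords:
        for j in coords:
            if i == j:
                continue
            b = int(haversine_km(coords[i], coords[j]) // bin_km) * bin_km
            observed[b] += weights.get((i, j), 0.0)
            # configuration model expectation for the pair (i, j)
            expected[b] += s_out[i] * s_in[j] / total
    return {b: observed[b] / expected[b]
            for b in sorted(expected) if expected[b] > 0}


# Hypothetical toy data: three institutions on the equator.
toy_coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 10.0)}
toy_trips = {("A", "B"): 2, ("B", "A"): 2, ("A", "C"): 1, ("C", "A"): 1}
print(deterrence(toy_trips, toy_coords))
```

A value near one in a bin means that trips at that distance are as frequent as the sending and receiving capacities alone would predict. Restricting the outer loop to institutions of one country gives a per-country variant.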
The irregular shape of Europe raises the question of how the probability of the formation of edges at different distances is related to the position of origin. For example, the Iberian Peninsula, Scandinavia, and Turkey are geographically located on the periphery of Europe, while Germany is in the center. In 2013, the geographical center of the 27 European Union member states was in the state of Hesse in Germany, which is a good approximation of the geographical center of the ERASMUS network.
To determine the deterrence function by country of origin, we used the formula defined by Eq. 1 and filtered the edges according to the analyzed countries of origin. The results obtained are shown in Fig. 4.
Each country has a different characteristic of travel distances based on Fig. 4. The outgoing travels from Austria, Bulgaria, Poland, Romania and Spain show a deterrence function similar to that of the whole network. For travels less than 500 km, there is a peak in all Eastern countries (except the Czech Republic). Thus, more people traveled shorter distances than expected. One of the most significant differences between the observed and expected values for trips below 300 km is in the case of Hungary. Students are more likely to go to Hungarian-language institutions outside the border and to Eastern Austrian universities than expected under the configuration model.
The peak below 300 km, for short distances, refers to the frequency of travel to a neighboring country. Similarly, Maggioni and Uberti (2009) found that the gravity model shows a significant positive effect on travelling to neighboring countries. This mobility strategy suggests that although these students want to take advantage of EU support and get to know other cultures, they do not want to be far from home for some reason, and perhaps they want to travel back home frequently during the granted period.
Peaks can be seen in the section above 3000 km in the case of many countries. This phenomenon suggests that more people travel to distant host institutions than would be expected under the random model. The ERASMUS program opens up opportunities and helps university students achieve their dreams of getting to know other cultures.
The distance-independent phenomenon means that the ERASMUS program can connect nations without spatial constraints. However, different mobility strategy patterns can be seen. In the next section, we will examine whether other factors drive the formation of relationships.

Connectivity of regions as an effect of connections between institutions
The EU support system allows students to travel from the sending institution to the host institution, regardless of geographical distance. The question is whether socioeconomic factors have a role in the network structure. Do the various economic and cultural factors cause any bias in the network structure?
The connectivity in the network is most balanced in the random configuration model, so the configuration model serves as a reference to explore the modules and determine the deterrence function. The reference network only takes into account the sending and receiving capacities of the nodes. Extra trips relative to the reference can be attributed to attraction, and a deficit of trips to some aversion. The deviation from the reference is a good approximation of preference or disfavor from the viewpoint of the sending node.
The configuration model-based estimation of the relative strength between large regions of Europe is shown in Fig. 5. The rate of connectivity between regions is calculated as the ratio of the real and expected amount of mobility ($A_{i,j} / \tilde{A}_{i,j}$). A rate greater than one indicates more interest from the origin to the destination than expected from the random configuration model. A value equal to one indicates a balanced relationship. Eastern countries are BG, CZ, EE, HR, HU, LT, LV, MK, PL, RO, SI, SK; southern countries are CY, ES, GR, IT, MT, PT, TR; western countries are AT, BE, CH, DE, FR, IE, LI, LU, NL, UK; northern countries are DK, FI, IS, NO, SE. In the classification of countries, we followed De Blij et al. (2003, Fig. 1-13) with three exceptions. The UK and Ireland were classified as Western Europe because they would otherwise form a separate region. Turkey was added to the southern countries. Estonia, like the other Baltic states, was classified as Eastern Europe and not as Northern Europe.
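The regional connectivity rate can be sketched in Python by aggregating observed and expected trips over region pairs and dividing the two. This is an illustrative sketch, not the authors' code: the function name `region_connectivity`, the input dictionaries and the choice to exclude only self-pairs of institutions from the expectation are assumptions.

```python
from collections import defaultdict


def region_connectivity(weights, region):
    """Ratio of observed to expected trips for each (origin, destination) region.

    weights: {(i, j): trips from HEI i to HEI j}
    region:  {HEI: region label, e.g. "East"}
    """
    total = sum(weights.values())
    s_out, s_in = defaultdict(float), defaultdict(float)
    observed = defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w
        s_in[j] += w
        observed[(region[i], region[j])] += w

    # Expected trips under the configuration model, summed per region pair.
    expected = defaultdict(float)
    for i, so in s_out.items():
        for j, si in s_in.items():
            if i != j:
                expected[(region[i], region[j])] += so * si / total

    return {pair: observed[pair] / e for pair, e in expected.items() if e > 0}


# Hypothetical toy data: one eastern and two western institutions.
toy_region = {"a": "East", "b": "West", "c": "West"}
toy_trips = {("a", "b"): 2, ("b", "a"): 1, ("b", "c"): 1}
print(region_connectivity(toy_trips, toy_region))
```

Replacing the final division with a subtraction gives the difference between observed and expected mobility, and passing country labels instead of region labels gives the country-level view.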
The observed ERASMUS student exchange between western and southern countries is almost precisely the expected value. There are far more travels between western and northern countries than the configuration model would expect. The number of mobilities from eastern countries to western and northern countries is in line with expectations, but the proportion of trips in the opposite direction is much lower. Travels from east to south were as expected. More people travelled from south to east than expected under the configuration model. The extent of the north-south relationship is much smaller than expected. The countries of the western and southern regions are characterized by remaining in their region, as expected. In eastern countries, staying in the region is more common than the configuration model expects.
Moreover, participants from northern countries prefer to travel to other regions, typically to western countries.
The strength of connectivity between countries is shown in Fig. 6. The figure shows the difference between the actual and expected amount of mobility ($A_{i,j} - \tilde{A}_{i,j}$). The numbers of trips between countries are much smaller than those between regions, so we return to the logic behind Eq. 3 and plot differences.
One of the driving forces for travel from the southern region to the east is that participants from Turkey, Portugal, and Spain are more likely to choose Poland as their destination than expected. More people travel from Turkey, Italy and Germany to Eastern European countries than predicted by the configuration model. The eastern countries, Italy and Spain host more students from Portugal than expected. This relationship strength is likely due to economic reasons: the cost of living is relatively lower in the east. There is no country-to-country relationship from south to north that is stronger than we would expect. Climatic conditions presumably explain this. From the north, Greece, Cyprus and Malta receive more students than expected from sending and receiving capacities, but these are low-capacity countries. Western host countries for students from eastern countries are mainly in the German-speaking area, while France, England, and Ireland host fewer people from the east than expected by the random configuration model. From west to east, the number of trips from France to Romania and from The Netherlands and Germany to Hungary is higher than expected. However, all other relationships are weaker than expected. The strongest relationship, outstanding compared to the expected value, is the reciprocal travel between Spain and Italy.
Geographical distance has minimal influence on the formation of connections. However, the preferred destination depends on the starting location. The detailed deviations in the relationships between countries and regions compared to the random configuration model are informative: they indicate preferred destinations from the countries of origin. A more detailed examination of the social and economic factors behind the preference of a country or region as destination goes beyond the focus of this article.
In the next section, we analyze whether institutional clusters based on mutual preferences form significantly more links with each other than with the rest of the network.

Spatially organized modular structure
We could separate the groups of institutions in the network that prefer to collaborate with each other rather than with other HEIs in the network. The recently introduced Leiden algorithm was used to uncover the modules (Traag et al. 2019).
The stochastic algorithm yields slightly different clusters at each run. Therefore, we analyzed the partition with the highest modularity value. The network's modularity varied between 0.230 and 0.243, which does not indicate a strong modular structure. The number of nodes (N) in the modules is shown in Table 2. Additional modules consist of 2-3 vertices, but they are negligible and are not shown in Table 2. Although the modularity is relatively low, 57.1% of travel occurs within the modules, which shows that connections within the modules are denser than those between the modules. Table 2 summarizes the proportion of institutes by region in each module. The institutions of each region are represented significantly differently in the modules. Eastern institutes dominate Modules 1 and 6. Southern institutes are overrepresented in Module 1. Western institutes are overrepresented in Modules 2, 3, and 5. Northern HEIs are overrepresented in Modules 2 and 5.
In addition to the regional proportions of the modules, we examined the distances of the travels within modules. We found that modules are typically not organized by distance, and students choose destinations ignoring geographical distance in each module. The distance distributions of the movements have a shape similar to that shown in Fig. 1, so the averages of the distances can be compared. Module 6 can be considered to be organized by spatial distance. This module contains the fewest institutions, 60 in total. In addition, the majority of them are located in the eastern region of Europe.

Modular structure organized by subject areas
A significant relationship was found between the module structure and the subject areas of movements within modules. The subject areas of each movement between HEIs are known. We examined the subjects of interinstitutional travels within the modules, and the proportions of disciplines in modules are shown in Table 1. The p-value of the chi-square test is very close to zero, which indicates a strong relationship between the modules and subject areas.
The subject area proportion of movements is different in each module, and in none of the modules is the distribution the same as in the whole network. In Module 1, the humanities and arts and social sciences are underrepresented, but this module is the most similar to the whole network in terms of subject area proportion. Social sciences dominate in Module 2, while humanities and arts and social sciences dominate in Module 3. In Module 4, student exchanges take place almost exclusively in the humanities. Engineering and IT fields are overrepresented in Module 5. In Module 6, the subject areas are mainly services and social sciences.
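A chi-square test of independence on a module-by-subject count table can be sketched as follows. The Python function below computes only the Pearson statistic and the degrees of freedom; the function name `chi_square` and the toy counts are hypothetical and are not taken from the study's tables.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    table: list of rows, e.g. one row per module and one column per
    subject area, holding the number of within-module trips.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for row, rs in zip(table, row_sums):
        for obs, cs in zip(row, col_sums):
            exp = rs * cs / n  # expected count if modules and subjects were independent
            stat += (obs - exp) ** 2 / exp
    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, dof


# Hypothetical toy counts: two modules, two subject areas.
print(chi_square([[10, 20], [20, 10]]))
```

The p-value is then the upper tail probability of the chi-square distribution with `dof` degrees of freedom at `stat`. With very large trip counts even modest differences in proportions produce p-values close to zero, so the size of the deviations matters as well.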
Modules are illustrated on maps in "Appendix I: Maps of modules" section.

Module 1: Turkish destinations and other HEIs in the southern region
The module is oriented towards eastern and southern Europe. The vast majority of Turkish institutions are located in Module 1. Most of their students are hosted by Polish institutions. In this module, many people travel from Portugal to Poland and from southern Italy to western Spain. The proportions of subject areas of the travels are similar to those of the whole network, so the module is not organized by subjects.

Module 2: Social and economic science-specific HEIs
Western and northern HEIs are overrepresented in the module, but the largest number of trips is from England to Spain. Travels in the social sciences and economics subject areas represent the highest proportion compared to other modules. Geographical regions, cultural factors, and subjects have a slight influence on the organization of this module.

Module 3: The rich club and their connections
The module includes the largest host institutions. We examined whether there is a rich-club effect (Opsahl et al. 2008) in the ERASMUS network because the largest host HEIs organized in this module indicate a stronger relationship density between them. There is a significant rich-club effect. Large host institutions tend to be connected, which explains why they are organized into one module. It should be noted that not all disciplines have a rich-club effect.
The inclusion of high in-strength institutions in this module depends on the subject area represented by HEIs. Within the module, trips typically occur in the fields of humanities, social sciences, economics and law. A total of 62.7% of mobilities are taken in those subject areas. The high host capacity is strongly correlated with the subject area represented by the HEIs.
The module is also organized spatially. Western institutions make up the largest proportion, and all other regions are underrepresented. However, the large host institutions of the southern region belong to this module.

Module 4: Humanities and arts specific HEIs
Travels within the module are subject area specific. Mobilities between relatively small in-strength institutions are almost exclusively in the humanities and arts subject area. Spatially, almost all regions are represented to the same extent as in the whole network. In terms of the number of institutions, the eastern region is underrepresented.

Module 5: Engineering and IT specific HEIs
The nodes in this module are linked by engineering and IT subject areas. The vast majority of travels between the institutions in the module take place in these fields. Western and northern institutions are overrepresented, and eastern and southern institutions are underrepresented in the module, but the difference is not significant compared to the overall network.

Module 6: Eastern Military Academies
Module 6 is a module of eastern military academies. Although travel is linked to economics and so-called services, it is clear from the names of the institutions that it is linked to national defense. The module has a strong eastern European orientation.

Overlap between subjects: a multilayer structure
The modular structure of the network is strongly related to subject areas and weakly related to geographical regions, except Module 1. Almost all modules are organized by subject area. This phenomenon suggests that mobility interaction patterns are more subject area dependent.
The formation of institutional "circles of friends" means that the institutions do not represent all academic subjects. The subject dimensions of the edges (travels) are known. We determined the subject areas represented by each HEI based on the subject areas of its outgoing and incoming travels. Almost half of the institutions are specialists and represent 1-2 subject areas. Institutions are shown in Fig. 7, colored according to the diversity of disciplines, where their size is proportional to the number of students hosted.
In addition to the subject area composition of the vertices, we also examined the subject dimensions of the edges. A directed connection between two HEIs involves, on average, 1.21 different subject areas. That is, students from a sending institution usually arrive at a host institution in 1-2 different fields. We also examined the subject area composition of travels between institutions representing at least five disciplines. Out of 79,070 directed connections, 50,921 are between institutions representing at least five disciplines. On average, 1.29 different disciplines were found on connections between HEIs representing at least five subject areas. Connections between generalist HEIs are therefore also subject-specific. It can be stated that the relationships between the HEIs are strongly subject area specific. ERASMUS relations are de jure between institutions but de facto between faculties representing a subject area.
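The subject composition of nodes and edges described above can be sketched in Python from a list of trip records. The record format `(home_hei, host_hei, subject_area)`, the function name `subject_diversity` and the threshold of two subject areas for a specialist are assumptions made for the example, the last one following the 1-2 subject areas mentioned in the text.

```python
from collections import defaultdict


def subject_diversity(trips, specialist_max=2):
    """Summarize subject areas per directed connection and per institution.

    trips: iterable of (home_hei, host_hei, subject_area) records.
    Returns the mean number of distinct subject areas per directed
    connection, the set of specialist HEIs, and the subject areas per HEI.
    """
    per_edge = defaultdict(set)  # subject areas seen on each directed connection
    per_hei = defaultdict(set)   # subject areas seen at each institution
    for home, host, subject in trips:
        per_edge[(home, host)].add(subject)
        per_hei[home].add(subject)  # outgoing travel
        per_hei[host].add(subject)  # incoming travel

    mean_per_edge = sum(len(s) for s in per_edge.values()) / len(per_edge)
    specialists = {h for h, s in per_hei.items() if len(s) <= specialist_max}
    return mean_per_edge, specialists, dict(per_hei)


# Hypothetical toy records.
toy_trips = [
    ("A", "B", "Engineering"),
    ("A", "B", "Engineering"),
    ("A", "B", "Law"),
    ("B", "C", "Law"),
    ("C", "A", "Arts"),
]
mean_subjects, specialists, subjects_by_hei = subject_diversity(toy_trips)
print(mean_subjects, sorted(specialists))
```

Filtering `per_edge` to connections whose two endpoints both have at least five subject areas in `per_hei` would give the corresponding figure for generalist institutions.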
Due to the subject-specific dependence of the module structure and the narrow subject dimension composition of nodes and edges, it is highly recommended that the ERASMUS exchange network be defined as a multilayer network. Layers can be formed based on each subject area. Within the institutions, the departments and faculties can operate quasi independently, and this is also reflected in connection with ERASMUS travels.
Fig. 7 Activity of institutes in subject areas on map
Large-scale structural elements were examined for the subject area-based network layers, and the results are summarized in Table 1. The most important lesson learned is that a power-law distribution could be fitted to the strength distribution of the vertices in the layers. For the engineering, agricultural, and services subject areas, the γ values ranged from 2 to 3, which indicates a scale-free property and preferential attachment. The network layers of the education and services subject areas are the least connected and have the lowest clustering coefficients and the highest number of components.

Summary and conclusions
In this paper, we analyzed the relationship between the ERASMUS student exchange network structure and drivers of cooperation of HEIs based on the EU's open dataset. The network represents the most significant educational cooperation between Europe's HEIs. The structure is little affected by spatial distance thanks to the financial support of the ERASMUS program, although it is highly correlated with the subject areas represented by HEIs.
In conducting this research, we obtained detailed information about the drivers of the formation of collaborations with a network science toolset. We highlighted that network science could be used to fine-tune information gathering. We can examine structure parameters with network science tools, and, in addition, the exact location of each phenomenon can be determined.
The complexity of a network is caused by several endogenous factors that shape it at the same time. First, parallel travel strategies are present in terms of spatial distance. The ERASMUS student cooperation network is not affected by geographical distance. However, nodes are spatially distributed. The ERASMUS program covers the cost of travels or edge formation. Nonetheless, significantly more people choose a neighboring country compared with the random effect. Furthermore, we could distinguish which countries of origin are characterized by this strategy (see Fig. 4. Furthermore, significantly more people travel to far (more than 3000 km) destinations.
Second, the results suggest that the structure is supposedly influenced by the cost of living, cultural differences, and learning intent and that none of these is dominant in the network. The effect of cultural and economic factors on the network structure can be seen through connections of European regions. The regions represent different cultures and their economic development is various. Travels from west and north to east should have higher volumes in terms of the number of mobilities. Exchanges between northern and southern European regions are also deficient, presumably due to climatic, cultural and language reasons. The amount of scholarship for long-term study abroad is determined based on the host country's living costs. Eastern scholarship may seem low for northern and western countries, which may be one reason why participants still not travel to behind the former Iron Curtain. However, there may also be a correlation with the fact that more prestigious institutions are more densely located in the west and north. The results suggested from the comparisons of real and the random network provide input information for further research for more detailed explanations of drivers of connections. But it should be tested with appropriate model-based network statistical methods.
Third, we demonstrated that the subject area of travel plays a significant role in the network structure at the institutional level. The connections between HEIs are subject-specific. Moreover, more than half of the nodes (HEIs) are specialists, representing only one or two subject areas. Presumably, the faculties of generalist HEIs, each representing an individual discipline, can organize their education-related activities independently, so that the connections in the network are between faculties rather than between HEIs.

Implications for scholars
A deeper understanding of the system is recommended through the combined use of economic models and network science techniques. Results based on network science differ from, and are a good complement to, studies based on economic science.
We have shown with network science methods that the structure of the institutional network is independent of geographical distance, a result that agrees closely with previous results (Maggioni and Uberti 2009; González et al. 2011). Therefore, the applicability of distance-based gravity models is limited.
We recommend using subject-based multilayer network models to analyze the ERASMUS network because different subject layers (subnetworks) have different structures. Moreover, different HEIs contribute to different network layers according to the subjects they represent. Only generalist HEIs participate in several network layers.
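As a minimal sketch of what a subject-based multilayer representation could look like, the following example groups mobility records into one weighted edge list per subject area and counts the layers in which each HEI appears. The record format (home, host, subject), the function names, and the toy data are assumptions made for illustration. The threshold of two subject areas follows the description of specialists given above.

```python
from collections import defaultdict

def build_layers(mobilities):
    """Group (home, host, subject) records into one weighted edge dict per subject."""
    layers = defaultdict(lambda: defaultdict(int))
    for home, host, subject in mobilities:
        layers[subject][(home, host)] += 1  # edge weight = number of mobilities
    return layers

def layer_membership(layers):
    """Map each HEI to the set of subject layers in which it has at least one edge."""
    membership = defaultdict(set)
    for subject, edges in layers.items():
        for home, host in edges:
            membership[home].add(subject)
            membership[host].add(subject)
    return membership

def classify(membership, max_specialist_layers=2):
    """Label HEIs present in at most `max_specialist_layers` layers as specialists."""
    return {
        hei: "specialist" if len(subjects) <= max_specialist_layers else "generalist"
        for hei, subjects in membership.items()
    }

# Hypothetical toy records: (home HEI, host HEI, subject area).
toy_mobilities = [
    ("A", "B", "law"),
    ("A", "B", "law"),
    ("A", "C", "arts"),
    ("A", "D", "engineering"),
    ("B", "C", "law"),
]
toy_layers = build_layers(toy_mobilities)
toy_labels = classify(layer_membership(toy_layers))
```

Each layer can then be analyzed as a separate network, and the membership sets show which HEIs link layers together.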

Implications for policymakers
An important finding for policymakers is that, on average, the ERASMUS program allows students to travel anywhere regardless of spatial distance. However, the extent of mobilities between some regions is smaller than that under random conditions, which suggests that cultural, linguistic, and economic differences may affect the network. A rich club effect is present in the institutional network, which shows that a group of institutions has greater control over resources because they exchange students among themselves in greater numbers than we would expect. The network is not fully balanced compared to the random network, and some bias is present. This can affect the development of a European identity at a high level.
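For readers unfamiliar with the rich club effect, the raw rich-club coefficient can be sketched as follows. The example assumes an undirected, unweighted simplification of the institutional network; the function name and toy data are hypothetical, and the code is not the procedure used in the study. For a degree threshold k, the coefficient is the share of possible links among nodes with degree greater than k that are actually present.

```python
from collections import defaultdict

def rich_club_coefficient(edges, k):
    """Density of links among nodes whose degree exceeds k.

    edges: iterable of (u, v) pairs, treated as undirected and unweighted.
    Returns None when fewer than two nodes have degree > k.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    rich = {node for node, nbrs in neighbors.items() if len(nbrs) > k}
    if len(rich) < 2:
        return None

    # Each link inside the rich set is seen from both endpoints, hence // 2.
    links = sum(1 for u in rich for v in neighbors[u] if v in rich) // 2
    return 2 * links / (len(rich) * (len(rich) - 1))

# Hypothetical toy network: four fully connected hubs, each with one extra partner.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("A", "a1"), ("B", "b1"), ("C", "c1"), ("D", "d1"),
]
phi_hubs = rich_club_coefficient(toy_edges, k=1)
phi_all = rich_club_coefficient(toy_edges, k=0)
```

The raw value alone does not establish a rich club effect. It has to be compared with the same quantity computed on a randomized network, since high-degree nodes tend to be linked to each other by chance as well.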
Moreover, the message for educational policymakers is that it is problematic to compare and rank higher education institutions because they represent different disciplines. The role of an HEI in the whole network depends on its subject areas. Approximately half of the HEIs are subject-specific. In addition, HEIs representing several disciplines (generalists) are typically connected to each other through only one or two disciplines. Relationships are between faculties rather than between HEIs.