The role of higher education in spatial mobility

The role of higher education in social and spatial mobility has attracted considerable attention. However, there are very few countrywide databases that follow the career paths of graduates from their place of birth, through their enrollment in university, and ultimately to their workplace. In Hungary, the government’s Education Authority maintains an excellent source of career-track information, which allows one to follow all students from their place of birth, through their choice of higher education institution, to their workplace. By combining gravity-like economic models with the proposed mobility network, this paper examines the mediating and retaining role of institutions. It also proposes how to calculate the added value of location and institution in salaries and how to use these values to explain mobility between locations. Finally, it shows how economic inequities influence revealed application preferences through the asymmetry of the mobility network.


Introduction
finds that language, quality of higher education and university rankings have positive impacts while distance and migration costs have negative impacts on international student mobility to destination areas. The features and reputations of institutions also play a substantial role. Their quality is often measured by proxies (Lourenço and Sá 2019): for example, the employability of graduates (Sá et al. 2012;Lourenço and Sá 2019), the teacher-student ratio (Sá et al. 2004), student quality (Dotti et al. 2013), research effectiveness (Adkisson and Peach 2008), and the results of international rankings (Ciriaci 2014).
The reported effect of quality differs across studies (Lourenço and Sá 2019). One study reports that there is not enough difference to identify an impact of institutional quality (Sá et al. 2012). In other cases, quality seems to matter (Ciriaci 2014; Cooke and Boyle 2011). A recent paper investigates whether research quality is associated with a university's ability to attract students from other provinces in Italy. The estimates suggest that research performance is a significant predictor of student enrollment. Cross-country differences in the quality of higher education institutions (HEIs) may also play a substantial role in international and internal student mobility (Bratti and Verzillo 2019). Mobility decisions depend on the attractiveness of the origin and destination and one's age and may be justified by emotional or family reasons (Lourenço et al. 2020). The advantages of migration have also been explored from various perspectives: more interesting teaching services, distinct sociocultural skills, and the opportunity to leave one's home (Holdsworth 2009; Lourenço et al. 2020). Further research shows a negative effect of distance on students' mobility decisions (Sá et al. 2004). The direct cost of attending an HEI has received considerable attention in empirical work. Financial support packages, which at least partly cover the expenses of a college education, are available in most higher education systems. The amount of financial aid, in the form of grants and scholarships, is expected to have a positive effect on the probability of enrollment (Fuller et al. 1982; Catsiapis 1987). Nevertheless, these financial aid packages rarely cover all out-of-pocket expenses, such that students are often dependent on their families' financial resources.
Household income is among the most commonly evoked factors when discussing the decision to continue studying after the secondary level, with most studies finding that the higher the income of the household is, the higher the demand for post-secondary education and the propensity to be in school after the secondary level (see, for instance, Savoca 1990; Duchesne and Nonneman 1998;Checchi 2000;Hartog and Diaz-Serrano 2007). Parental educational level and occupational status are sometimes used either to proxy for this income effect or examined in their own right, and they exert a positive influence on young people's decisions to attend higher education (e.g., Checchi 2000;Nguyen et al. 2003;Hartog and Diaz-Serrano 2007).
Application to universities (so-called application mobility) is the first step in the spatial mobility of young people. The birth-to-school and school-to-work (so-called occupation mobility) transitions can be examined using both economic models (see, for instance, Tuckman 1970; Orsuwan and Heck 2009;Niu 2015), such as gravity models (e.g., Agasisti and Dal Bianco 2007;Alm and Winters 2009;Abbott and Silles 2016;Cullinan and Duggan 2016), and social network analysis (see, e.g., González Canché 2018;Bilecen et al. 2018;Kondakci et al. 2018).
This study demonstrates advantages of combining economic and data science methods in the analysis of application and occupation mobility. Gadar and Abonyi (2018) analyzed the school-to-work transition in the framework of a bipartite network model using the integrated database of the National Tax Administration, the National Health Insurance Fund, and the Education Authority. Based on the Education Authority database, application mobility is investigated using gravity models and logistic regressions (Telcs et al. 2015); however, to the best of our knowledge, no prior work has combined economic and social science methods to analyze the dynamic structure of the mobility network. Combining time-series gravity models and dynamic network science methods allows us to understand changes in the structure of the mobility network. These models can cross-validate one another, and in this way, they can be used for model triangulation (Modell 2015).

Spatial mobility
Spatial or geographical mobility is movement across different locations. It concerns the physical motion from one space to another (Powell and Finger 2013). It is usually distinguished from social mobility, which is the ability to move up or down in social class (typically defined in terms of wealth). However, sociologists argue that the social mobility of individuals with diverse backgrounds is adversely affected by educational experiences (Powell and Finger 2013; Haveman and Smeeding 2006; Brown 2013). Therefore, the spatial mobility of applicants (application mobility) and of early-career graduates (occupation mobility) is usually motivated by the promise of social advancement, such as a better salary, better living conditions, and better social esteem (Montmarquette et al. 2002; Hilmer and Hilmer 2012). Kazakis (2019) finds a positive and significant relationship between the migration flows of skilled individuals and innovation (patents as proxy), productivity (using, e.g., total factor productivity and labor productivity as proxies), higher population density, and higher investments in R&D. He reveals that technological development is highly correlated with education. Highly innovative regions are more attractive to professionals. Grogger and Hanson (2015) analyze how economic and political conditions influence foreign students' decisions to live in the United States after receiving PhDs from US universities. The authors find that students who receive merit-based fellowships or scholarships during their studies and have more educated parents are more likely to want to stay in the United States. The authors contend that a stronger US economy makes it more attractive for the graduates to remain in the United States instead of returning to a home country with a weaker economy.
Lucas (2001) reveals the factors affecting the international migration of highly skilled people and some benefits of the technology transfers, international trade and capital flows induced by "brain drain". Faggian and Franklin (2014) contend that the migration of highly educated persons is fundamental for policy makers because higher human capital can improve a region's relative position. They use a negative binomial regression to estimate a gravity-type model of the interstate migration of "college-bound" high school students in the US. Chetty (2020) measures children's outcomes in adulthood to advance the research on human capital development and finds that neighborhoods have causal effects on children's long-term outcomes.
Despite the divergent results on spatial mobility in the recent social science literature, there are studies examining geographical mobility trends within countries over time (Kulu et al. 2018;Chetty 2020), especially in the case of application and occupation mobility.

Methods for exploring spatial mobility
The diverse methodological approaches for measuring spatial mobility include, on the one hand, direct techniques to quantify the volume of people's movement in space. Examples such as traffic counts (Cascetta 1984), population census data (Ette et al. 2008), or commuting surveys (Rüger et al. 2011) rely to varying degrees on concrete measures of the flows between origin and destination; however, the availability of data for such studies is often limited. Another widely used methodological solution is, therefore, the application of indirect tools for exploring geographical mobility. Researchers estimate spatial mobility numbers, for instance, from GPS tracking data (Zheng et al. 2008;Siła-Nowicka et al. 2016;Zignani and Gaito 2010) or cell phone information (Mohall 2015;Candia et al. 2008;Gonzalez et al. 2008); such methodologies could provide reasonable assumptions on geographical mobility numbers.
Other indirect tools apply proxies, such as distance-based probabilities, to measure the volume of spatial mobility between regions. Rogerson (1990) applied geometric probability methods to estimate migration distances based on the spatial distribution of population and region shape data. It is also common to approximate spatial interactions or mobility volume between regions by taking the size of regions and spatial proximities into account, namely, by applying the gravity model approach from social physics in a geographical context. Moreover, Poot et al. (2016) explicitly highlight the use of gravity modeling in spatial mobility research.
Methodologies related to occupation (or labor) mobility account, for example, for the interaction between the returns to geographic mobility and to the level of education by applying distance functions. Based on a large dataset, Lemistre and Moreau (2009) calculated the distance between the place of education and the location of the first employment of graduated students. Their results suggest decreasing returns to spatial mobility in the distance covered and increasing returns to mobility with higher levels of education. Similarly, Magrini and Lemistre (2013) examine an income-distance tradeoff model and found that the most highly skilled young people do not receive a positive wage return from migration but that less-skilled young workers do. Early-career spatial mobility was also analyzed by Venhorst et al. (2015), who applied an instrumental variable approach in their model and found positive wage returns related to spatial mobility; however, when controlling for self-selection, a strong reduction was observed in the effect of spatial mobility on job match quality. Similarly, Javakhishvili Larsen and Mitze (2015) applied treatment variables in panel econometric models of individual-based longitudinal data to determine the interconnectedness of spatial mobility and early career effects.

The gravity model
The so-called "gravity" equations are widely used in empirical analysis of foreign trade, migration and even capital flows due to mass-based spatial movements (Anderson 1979). Gravity models have become popular due to their flexibility, simplicity and high explanatory power (excellent fit) (Anderson and Van Wincoop 2003). To analyze the spatial interaction between two or more locations using this mathematical model, one applies Newton's gravitational law by analogy with gravity in physics (Paas 2003). Countries and municipalities with high economic power exert attraction on smaller ones around them (Nemes Nagy and Tagai 2011). Attractive areas are geographical points (e.g., cities, small regions) where the attractiveness of the place is stronger than that of any other geographical point. Based on the physical analogy, there are two basic areas of application: the examination of spatial flow (the intensity of the flow) and the delimitation and demarcation of attractive areas (Nemes Nagy and Tagai 2011). Henri-Guillaume Desart developed a version of the gravity model for analyzing passenger travel and applied it to railway planning, and the American economist Henry Carey presented a statement that resembled the notion of a gravity model in 1858 (Odlyzko 2015). According to a survey by Fotheringham et al. (2000), Carey (1858) and Ravenstein (1885) observed that there is a parallel between the movement of individuals between cities and the law of universal attraction, namely, there is a more intense flow between larger cities than between smaller towns (Fotheringham et al. 2000; Ravenstein 1889).
International trade has its own gravity model, developed by Jan Tinbergen (1962). It is a multivariate linear regression model for modeling bilateral and regional trade that is employed for analyzing cross-sectional and panel data (Tinbergen 1962; Anderson 1979). The gravity model of foreign trade, like other gravity models in social science, predicts bilateral trade flows based on the economic size (usually measured in GDP) of the partner economies and the distance between them (Anderson 1979; Bergstrand 1985, 1989; Anderson and Van Wincoop 2003). It states that trade between two countries is directly related to the "gravitational" pull of their national incomes (GDP) and inversely proportional to the distance between them (Paas 2003). The gravity model has been widely used to estimate the impact of a variety of policy issues, including regional trading groups, currency unions, political blocs, various trade distortions and agreements, border region activities and historical linkages (Paas 2003; Westerlund and Wilhelmsson 2011).
The gravity model also underlies migration studies. Most studies using the gravity model approach have sought to rationalize labor force mobility across locations, especially internal migration waves. There has been highly accurate empirical work on this topic for countries such as the United States (Ashby 2007), China (Shen 1999; Poston and Zhang 2008), Germany (Bierens and Kontuly 2008), Hungary (Cseres-Gergely 2012) and Spain (Devillanova and García-Fontes 1998). Based on the objectives of the present study, the most relevant articles are those examining international migration streams. In addition to other works, we can also mention articles on international migration to the European Union (Breitenfellner et al. 2008; Warin and Svaton 2008) and to the United States (Karemera et al. 2000) as the two main regions that draw foreign immigration. These articles tend to include standard economic variables such as per capita gross domestic product (GDP) or population, sometimes in transformed form, as in Warin and Svaton (2008), whose variables are based on calculations related to GDP, while in Karemera et al. (2000), political variables are also included, which have a negative impact on migration flows. The leading sources of international students are China, India and other parts of Asia. Another study finds that "China is becoming an important destination for students due to the distinctiveness of the language, the rise of its universities in global rankings and the country's economic growth" (Ahmad and Shah 2018). According to the gravity model, the number of migrants between two regions is directly proportional to the population in each region and inversely proportional to the squared distance between the location they leave and the region they enter.
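The proportionality stated at the end of the preceding paragraph can be illustrated with a minimal sketch. The function name `gravity_flow`, the scaling constant `k` and the numerical values are hypothetical and serve only to show how the squared distance enters the estimate.

```python
def gravity_flow(pop_origin: float, pop_dest: float, distance: float, k: float = 1.0) -> float:
    """Classic gravity estimate of migration: k * P_i * P_j / d^2.

    k is a hypothetical scaling constant; in practice it is estimated from data.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * pop_origin * pop_dest / distance ** 2


# Illustrative (made-up) populations and distances in kilometers.
near = gravity_flow(100_000, 50_000, 50.0)
far = gravity_flow(100_000, 50_000, 100.0)

# Doubling the distance divides the estimated flow by four.
print(near / far)
```

In empirical work the exponent of distance is usually not fixed at two but estimated, which is the approach taken by the regression-based gravity models discussed in this paper.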
The gravity model approach is also common in higher-education-related spatial mobility studies. Bernela et al. (2018) tested the significance of the impact of the scientific size of regions and spatial proximity on PhD mobility in France by applying Heckman's (1979) two-step gravity model with a selection equation to evaluate the existence of potential spatial and nonspatial proximity effects. In addition, there are examples of measuring higher-education-related application mobility with large survey-type datasets. Extensive research on the spatial mobility of graduates was performed by Venhorst et al. (2011), who applied multinomial logit models to investigate the relationships between migration and both regional economic circumstances and individual characteristics. They found that the presence of a large labor market is the most important structural economic determinant of higher retention rates in regions.

The network model
Mobility can be modeled by networks. In this case, nodes represent locations, while directed arcs represent the mobility from one location to another. The weights of arcs represent the number of domestic migrants (spatial mobility) between given locations. In network science, null model creation is a common and useful tool. The null model assumes that the network is random (Newman and Girvan 2004) and thus that the weights of the arcs are independent of one another. The null model of Liu and Murata (2010) assumes that the probability of weights on an arc depends on the distance between locations, while Gadár et al. (2018) assume that the number of links (weights of arcs) fits well to the values predicted by a gravity model. Null models are also essential for modularity-based community detection. Modularity-based community analysis is performed in two separate phases: first, the detection of a meaningful community structure from a network and, second, the evaluation of the appropriateness of the detected community structure. Systematic deviations from a random configuration or from other null models without a characteristic modularity structure allow us to define a quantity called modularity, which is a measure of the quality of partitions. Newman and Girvan consider only the degree of nodes as a null model, which is equivalent to rewiring the network while preserving the degree sequence (Newman and Girvan 2004). This random model overlooks the economic nature of the network and thus of its modules. However, economic-based null models can connect these aspects: they allow modularity-based community detection to find and explain communities where mobility exceeds the expected value. Economic null models predict the weight of arcs and, in this way, establish a baseline and explain network properties such as density, asymmetry or clustering. Therefore, it is worth connecting gravity and link prediction models.
Based on link prediction, other properties of the network, such as asymmetry, can be evaluated. In this study, we show how to match the network asymmetry and the revealed preference matrix; therefore, the application preference order can also be modeled using economic models.
Most studies on student mobility focus on international movement (see, e.g., Beine et al. 2014; Shields 2013), and very few studies have investigated mobility within a country (see an exception in Bacci and Bertaccini 2020). The reason for the low number of papers in this field is that it is very difficult to access a reliable database containing data on students, employees, and institutions, while several databases on international mobility networks are freely available (Gadár et al. 2020). However, an increasing number of countries, including Italy, Estonia, and Hungary, are registering applicants, and this allows for the investigation of the mobility network. To the best of our knowledge, Hungary is the first in the field of data integration because it has registered applications in a central database since 2001, and since 2011, this database has been integrated into the early career database, which was already an integrated database. In addition, this anonymous database is freely available to researchers.

Contribution to the literature
To the best of our knowledge, very few studies have attempted to combine gravity-like and network models (see several exceptions in Gadár et al. 2018; Bacci and Bertaccini 2020). However, these studies do not combine them to explain mobility network formation. Although a few studies consider economic model-based link prediction, network properties such as asymmetry have not been modeled by economic inequities, nor have they been used to estimate the revealed application preferences. In this study, these aspects of describing young people's mobility are combined (see Fig. 1). The main contribution of this study is the application of economic gravity models to link prediction, which provides an explanation for why mobility is stronger between certain locations. We demonstrate the connection between revealed preferences and network asymmetries, and we show how the revealed application preferences can be explained by economics or rooted in spatial economic inequalities reflected by the gravity-based network asymmetries.

Methods
In section "Data sources", the common data sources are introduced as indicated in Fig. 1. Then, in section "Applied null models in mobility networks", the fundamental network properties are introduced; these will be evaluated for the network based on the economic gravity model, which is introduced in section "Applied gravity models". Finally, in section "Modeling application preferences via asymmetries in the application mobility network", we present the estimates of the inequities in the application preferences through network asymmetry.

Data sources
Several specific data sources are involved in the study. One of the main databases is the Hungarian central system for tracking graduates' careers (HCSTGC). Similar tracking systems were recently developed by Estonia and Italy (Bacci and Bertaccini 2020; Kovacs and Kasza 2018). At present, there are few such systems, but we believe that a career path tracking system is the key data source for analyzing student mobility and the impact of higher education on society and the economy. Therefore, it would be worthwhile for decision makers to consider introducing such a system at least at the European Union level. HCSTGC includes anonymized information on the location of residence (NUTS4 subregion), the city (or subregion) of the HEI, the county and subregion of the workplace, and the starting salary of all graduate employees who earned their absolutorium or degree between 09/01/2014 and 01/31/2015. Among these individuals, this study focuses on those who were employed as of May 2016, representing 47,165 graduates. The occupation and economic activity codes are also available for the work and workplace. The occupation coding uses the International Standard Classification of Occupations (ISCO) codes. The economic activity coding uses the International Standard Industrial Classification of all Economic Activities (ISIC) codes. Following the work and databases of Gadar and Abonyi (2018), the first two characters of the occupation codes are matched to thirteen scientific fields of graduation; we call these fields occupation categories.
The next applied database is the student application database (2006-2017), which contains anonymized data from applicants and HEIs. We used the following data for the analysis: the location (NUTS4 subregion) of the applicant, the location of the HEI, the BA/BSc/MA/MSc program applied for and the scientific field of the program. To maintain consistency between the student application database and HCSTGC, henceforward, only the thirteen matches for occupation-scientific field category (occupation category) are considered: (1) agriculture, (2) human studies, (3) social sciences, (4) information technology, (5) law & public administration, (6) military, (7) business & economics, (8) engineering, (9) health & medical sciences, (10) pedagogy, (11) sport sciences, (12) natural sciences, and (13) arts. The aim of the matching was to ensure consistent nomenclature. In this study, only categories of the ISCO are considered, but to ensure consistency, nomenclatures of thirteen scientific fields are used as occupation categories ($O_k$, $k = 1, 2, \ldots, 13$).
The remaining data sources are of a kind already available in all countries in the European Union (e.g., through Eurostat) and in most countries in the world. The per-capita gross domestic income (GDI/cap) of a location (i.e., subregion) between 2006 and 2017 comes from the Hungarian Central Statistical Office. From the Hungarian National Employment Service, we obtained the mean of the (gross) salary for all 19 counties and the capital city (Budapest) for 2015 and 2016. These national (not only for recent graduates) salary statistics are available by ISIC/ISCO code at the county level.

Applied null models in mobility networks
The mobility network can be described as a directed graph, that is, an ordered pair $G = (V, \mathcal{E})$, where $V$ is a set of vertices (locations) and $\mathcal{E} \subseteq \{(x, y) \mid (x, y) \in V^2,\ x \neq y\}$ is a set of edges (also called arcs) that are ordered pairs of distinct vertices (i.e., an edge is associated with two distinct locations in a mobility graph). The number of movements between locations is associated with the edges. $E$ is the adjacency matrix of graph $G$, where the elements of the matrix indicate whether pairs of vertices are adjacent in the graph.
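As a minimal sketch of this representation, the weighted matrix of a mobility graph can be assembled from a list of individual moves. The location labels and the helper name `build_weight_matrix` are hypothetical; moves that start and end at the same location are dropped here because arcs are defined between distinct vertices.

```python
from collections import Counter


def build_weight_matrix(moves, locations):
    """Return the weighted matrix E, where E[i][j] counts moves from location i to j."""
    index = {loc: i for i, loc in enumerate(locations)}
    n = len(locations)
    E = [[0] * n for _ in range(n)]
    for (src, dst), count in Counter(moves).items():
        if src != dst:  # arcs connect distinct locations only
            E[index[src]][index[dst]] = count
    return E


# Hypothetical individual moves (origin, destination).
moves = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("C", "C")]
E = build_weight_matrix(moves, ["A", "B", "C"])
print(E)
```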
Denote by $e_{ij}$ the element of the adjacency matrix $E$ of mobility graph $G$. The first null model that will be considered is the random configuration model, which calculates the arc probabilities $p^{NG}_{ij}$, assuming a random graph conditioned to preserve the degree sequence of the original network:

$$p^{NG}_{ij} = \frac{od_i \, id_j}{L},$$

where $id$ represents the in-degree and $od$ the out-degree, $id_j = \sum_i e_{ij}$, $od_i = \sum_j e_{ij}$, and $L$ is the number of arcs (links) between nodes.
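The degree-preserving null model can be sketched as follows, assuming the usual directed form in which the expected weight of arc $ij$ is the out-degree of $i$ times the in-degree of $j$, divided by the total weight $L$. The function name and the example matrix are hypothetical.

```python
def configuration_null_model(E):
    """Expected arc weights under a degree-preserving random model: od_i * id_j / L."""
    n = len(E)
    od = [sum(E[i][j] for j in range(n)) for i in range(n)]   # out-degrees (row sums)
    id_ = [sum(E[i][j] for i in range(n)) for j in range(n)]  # in-degrees (column sums)
    L = sum(od)                                               # total weight of all arcs
    if L == 0:
        raise ValueError("the network has no arcs")
    return [[od[i] * id_[j] / L for j in range(n)] for i in range(n)]


# Hypothetical three-location mobility matrix.
E = [[0, 2, 1],
     [1, 0, 0],
     [0, 0, 0]]
P = configuration_null_model(E)
total_expected = sum(sum(row) for row in P)
print(P[0][1], total_expected)
```

The expected weights sum to $L$, so the null model redistributes the observed total mobility according to the degrees alone.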
The distance-dependent (Liu and Murata 2010) version can also be used for null models.
Here, $p^{\alpha,\beta}$ denotes the distance-dependent null model, and $f(d_{ij})$ is a monotone distance-decay function. The parameters $\alpha$ and $\beta$, called importance values, are estimated by regression analysis.
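For the economic model we use the notation

$$q^{\Gamma}_{ij} = \frac{m_i^{\alpha} \, m_j^{\beta}}{d_{ij}^{\delta}},$$

where $m_i$ is an economic value, such as GDP, GDI or another economic quantity of the location of node $i$, and $d_{ij}$ is the distance between location $i$ and location $j$. $\alpha$, $\beta$, $\delta$ are importance values of the locations and of the distance between locations; they are estimated through a regression analysis. In this study, we propose a generalized gravity model for link prediction (null model):

$$q^{\Gamma}_{ij} = \gamma \, \frac{\prod_{k=1}^{N} m_{i,k}^{\alpha_k} \, m_{j,k}^{\beta_k}}{d_{ij}^{\delta}}, \tag{1}$$

where $N$ is the number of economic parameters, while $\alpha_k$, $\beta_k$, $\gamma$, $\delta$ are regression parameters.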
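A gravity-type link prediction with several economic quantities per location can be sketched as below. The multiplicative form, with $\gamma$ as a scaling constant and $\delta$ as the distance exponent, is an assumption made for illustration, as are the function name and all numerical values.

```python
def generalized_gravity(m_i, m_j, d_ij, alphas, betas, gamma=1.0, delta=2.0):
    """Predicted arc weight: gamma * prod_k(m_i[k]**alpha_k * m_j[k]**beta_k) / d_ij**delta.

    The multiplicative form and the role of gamma are assumptions of this sketch.
    """
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    if not (len(m_i) == len(m_j) == len(alphas) == len(betas)):
        raise ValueError("one alpha and one beta are needed per economic quantity")
    mass = 1.0
    for mi, mj, a, b in zip(m_i, m_j, alphas, betas):
        mass *= mi ** a * mj ** b
    return gamma * mass / d_ij ** delta


# One economic quantity (N = 1) with hypothetical values and exponents.
single = generalized_gravity([4.0], [9.0], 2.0, alphas=[0.5], betas=[0.5], delta=1.0)

# Two economic quantities (N = 2), e.g., income per capita and a salary level.
double = generalized_gravity([4.0, 2.0], [9.0, 3.0], 2.0,
                             alphas=[0.5, 1.0], betas=[0.5, 1.0], delta=1.0)
print(single, double)
```

With $N = 1$ and $\gamma = 1$, the sketch reduces to the basic economic gravity form with a single economic value per location.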
The time-series version of the null model predicts the arcs of the dynamic network.
Null models are mainly used in modularity-based community analysis. Originally, the method was specified for edges.
In the case of a directed network, this difference can be formulated as follows:

$$M = \frac{1}{L} \sum_{i,j} \left( e_{ij} - p_{ij} \right) \delta(C_i, C_j),$$

where $p_{ij}$ represents the number of estimated arcs/weights proceeding from the $i$-th to the $j$-th location, and $\delta(C_i, C_j)$ is the Kronecker delta function that is equal to one if the $i$-th and $j$-th locations are assigned to the same community and zero otherwise. The goal of modularity-based community analysis is to separate the network into groups of nodes that have fewer connections between groups than within them (Newman and Girvan 2004). The modularity of partition $C$ can be calculated as the sum of the modularities of the communities $C_c$, $c = 1, \ldots, n_c$:

$$M = \sum_{c=1}^{n_c} M_c.$$

Originally, these null models were specified for estimating arcs in a binary graph; they have since been extended to handle weighted graphs, and they can also be used to evaluate asymmetry.
The value of modularity $M_c$ of a cluster $C_c$ can be positive, negative or zero. Should it be equal to zero, then the community has as many links as the null model predicts. When modularity is positive, the $C_c$ subgraph tends to be a community that exhibits a stronger degree of internal cohesion than the model predicts.
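The sign interpretation above can be checked on a toy network. The sketch below assumes the common normalization by the total weight $L$ and uses the degree-preserving null model as the baseline; the function names, the example matrix and the partition are hypothetical.

```python
def configuration_null_model(E):
    """Degree-preserving baseline: expected weight od_i * id_j / L."""
    n = len(E)
    od = [sum(row) for row in E]
    id_ = [sum(E[i][j] for i in range(n)) for j in range(n)]
    L = sum(od)
    return [[od[i] * id_[j] / L for j in range(n)] for i in range(n)]


def community_modularities(E, P, communities):
    """Return {community: M_c}, summing (e_ij - p_ij) / L over pairs in the same community."""
    n = len(E)
    L = sum(sum(row) for row in E)
    result = {}
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                c = communities[i]
                result[c] = result.get(c, 0.0) + (E[i][j] - P[i][j]) / L
    return result


# Two tightly linked pairs of locations joined by a weak link (made-up weights).
E = [[0, 3, 0, 0],
     [3, 0, 1, 0],
     [0, 1, 0, 3],
     [0, 0, 3, 0]]
communities = [0, 0, 1, 1]  # hypothetical partition
M_c = community_modularities(E, configuration_null_model(E), communities)
M = sum(M_c.values())
print(M_c, M)
```

Both communities obtain a positive $M_c$, indicating more internal mobility than the baseline predicts; replacing the baseline with an economic null model changes only the matrix `P`.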
To obtain real communities, null models ($p$ in Eq. (10)) should approximate a given property, such as the probability of arcs, weights, reciprocity or edge asymmetry, as closely as possible. Therefore, when seeking null models, the deviation between the modeled parameter and its null-model estimate should be minimized, where the modeled parameter is, for example, the arc/weight between node $i$ and node $j$ or the asymmetry of arc $ij$.

Measuring asymmetry: We consider a directed weighted network specified by the (nonnegative) weight matrix $E$, where $e_{ij}$ indicates the weight of the directed arc from node $i$ to node $j$. In the case of no connection from $i$ to $j$, $e_{ij} = 0$. $E$ can be specified as the sum of a symmetric matrix $P$ and a skew-symmetric matrix $Q$,

$$E = P + Q, \qquad P = \frac{E + E^{T}}{2}, \qquad Q = \frac{E - E^{T}}{2}.$$

The edge asymmetry matrix $A$ is the element-wise ratio of the skew-symmetric ($Q$) and symmetric ($P$) components of the weight matrix $E$:

$$a_{ij} = \frac{e_{ij} - e_{ji}}{e_{ij} + e_{ji}}, \tag{14}$$

where $a_{ij} \in A$, and $A$ is also a skew-symmetric matrix. Edge asymmetry can only be interpreted for interconnected nodes $i$ and $j$, that is, for all $i \neq j$ where $e_{ij} + e_{ji} \neq 0$. It satisfies $-1 \leq a_{ij} = -a_{ji} \leq 1$; in binary networks, $a_{ij} \in \{-1, 0, 1\}$.
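The edge asymmetry can be computed directly from the weight matrix, as in the following minimal sketch. Pairs without any connection are marked with `None` because the asymmetry is not interpretable for them; the function name and the example weights are hypothetical.

```python
def edge_asymmetry(E):
    """Return A with a_ij = (e_ij - e_ji) / (e_ij + e_ji), or None where undefined."""
    n = len(E)
    A = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = E[i][j] + E[j][i]
            if i != j and total != 0:
                A[i][j] = (E[i][j] - E[j][i]) / total
    return A


# Made-up flows: 30 people move from location 0 to 1, 10 move back.
E = [[0, 30, 5],
     [10, 0, 0],
     [5, 0, 0]]
A = edge_asymmetry(E)
print(A[0][1], A[1][0], A[0][2], A[1][2])
```

Here the pair (0, 1) is asymmetric in favor of location 1, the pair (0, 2) is perfectly balanced, and the pair (1, 2) has no exchange at all.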
The economic null model for asymmetry (denoted as $\Gamma$) can be expressed as follows:

$$\Gamma_{i,j} = \frac{q^{\Gamma}_{i,j} - q^{\Gamma}_{j,i}}{q^{\Gamma}_{i,j} + q^{\Gamma}_{j,i}} = \frac{m_i^{\alpha} m_j^{\beta} - m_j^{\alpha} m_i^{\beta}}{m_i^{\alpha} m_j^{\beta} + m_j^{\alpha} m_i^{\beta}}. \tag{15}$$

Equation (15) also shows that while the economic version of the law of gravity is distance dependent (see Eq. (3)), by substituting it into Eq. (14), we obtain a distance-independent model, because the distance term, which is the same in both directions, cancels. In addition, the economic null model of asymmetry only depends on the economic parameters (denoted as $m$) of locations and their importance factors ($\alpha$, $\beta$). The indicator $\Gamma_{i,j}$ has economic content itself. If $q^{\Gamma}_{i,j}$ estimates domestic immigration and $q^{\Gamma}_{j,i}$ estimates domestic emigration between location $i$ and location $j$, then $q^{\Gamma}_{i,j} - q^{\Gamma}_{j,i}$ represents the net mobility exchange, while $q^{\Gamma}_{i,j} + q^{\Gamma}_{j,i}$ represents the gross mobility exchange between location $i$ and location $j$. If the arcs represent imported/exported goods, then the absolute value of $\Gamma_{i,j}$ is the well-known intra-industry trade (IIT) indicator, while in this study, this value is called the intramobility rate (IMR), which characterizes the asymmetry of the mobility between locations.
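The distance independence of the economic asymmetry can be verified numerically. The sketch assumes the basic gravity form $m_i^{\alpha} m_j^{\beta} / d^{\delta}$ with a symmetric distance; the function names and all values are hypothetical.

```python
def gravity_estimate(m_i, m_j, d, alpha, beta, delta):
    """Estimated flow from i to j under the basic gravity form (an assumption here)."""
    return m_i ** alpha * m_j ** beta / d ** delta


def economic_asymmetry(m_i, m_j, alpha, beta):
    """Asymmetry of the gravity estimates; the symmetric distance term cancels."""
    forward = m_i ** alpha * m_j ** beta
    backward = m_j ** alpha * m_i ** beta
    return (forward - backward) / (forward + backward)


m_i, m_j, alpha, beta = 2.0, 4.0, 1.0, 2.0   # made-up economic values and exponents
gamma_ij = economic_asymmetry(m_i, m_j, alpha, beta)

# The same value results from the distance-dependent estimates, whatever the distance.
values = []
for d in (10.0, 250.0):
    q_ij = gravity_estimate(m_i, m_j, d, alpha, beta, delta=2.0)
    q_ji = gravity_estimate(m_j, m_i, d, alpha, beta, delta=2.0)
    values.append((q_ij - q_ji) / (q_ij + q_ji))

imr = abs(gamma_ij)  # intramobility rate as the absolute value of the asymmetry
print(gamma_ij, values, imr)
```

When $\alpha = \beta$, the estimated flows in the two directions coincide and the asymmetry is zero, so any predicted asymmetry stems from the different importance of the source and target economies.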

Applied gravity models
According to the introduced data sources (see the "Data sources" section), we can identify and calculate the following parameters and coefficients (see Table 1). The added value measured by the salary premium, which is the difference between the starting salary and the mean salary of non-graduate employees, can be specified as the sum of a graduate premium, the added value of the location, the added value of the HEI and individual wage bargaining (see the waterfall plot in Fig. 2). Figure 2 shows that the added value of a location can be negative; indeed, except for the mean salary of undergraduates, every factor can be either positive or negative (Table 2).
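The additive split of the salary premium can be illustrated with a small worked example. The sketch assumes a telescoping definition in which each component is the difference between successive mean salaries (occupation level, location level, HEI level); this definition, the function name and all figures are hypothetical.

```python
def decompose_salary_premium(starting_salary, non_graduate_mean,
                             graduate_mean_occupation, graduate_mean_location,
                             graduate_mean_hei):
    """Split the salary premium into four additive parts (telescoping differences).

    The successive-means definition of the parts is an assumption of this sketch.
    """
    return {
        "graduate_premium": graduate_mean_occupation - non_graduate_mean,
        "location_added_value": graduate_mean_location - graduate_mean_occupation,
        "hei_added_value": graduate_mean_hei - graduate_mean_location,
        "individual_bargaining": starting_salary - graduate_mean_hei,
    }


# Made-up monthly salaries in thousands of a currency unit.
parts = decompose_salary_premium(
    starting_salary=320,
    non_graduate_mean=200,
    graduate_mean_occupation=280,
    graduate_mean_location=260,   # a location can lower the premium
    graduate_mean_hei=300,
)
premium = 320 - 200
print(parts, sum(parts.values()) == premium)
```

In this example the location contributes a negative added value while the HEI contributes a positive one, which mirrors the kind of pattern a waterfall plot is meant to display.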
Factors of starting salary: After matching education-occupation pairs and the calculation of average salaries, differences between these averages can be specified. Formally, the starting salary for employee $n$ can be explained as follows (see Eq. (16)):

$$S^{g}_{k,j,m}(n) - S_{k,j,m} = \Delta S^{g}_{k} + \Delta S^{g}_{k,j} + \Delta S^{g}_{k,j,m} + \epsilon_n, \tag{16}$$

where $\epsilon_n$ is the result of individual wage bargaining. Figure 2 explains how the salary premium ($S^{g}_{k,j,m}(n) - S_{k,j,m}$) for employee $n$ can be divided into four parts, namely, the graduate premium ($\Delta S^{g}_{k}$), the added value of the location ($\Delta S^{g}_{k,j}$), the added value of the HEI ($\Delta S^{g}_{k,j,m}$), that is, of $HEI_m$ at location $L_j$ for occupation $O_k$, and the individual bargaining ($\epsilon_n$), as described in Eq. (16). These values, serving as proxies for the added value to salaries, will be measured, and they will be applied as dependent variables in the proposed spatiotemporal model.

Two kinds of movement can be specified based on the Hungarian career tracking database: (1) application to an HEI (see the solid arrows in Fig. 3a) and (2) application to a job (see the dotted arrows in Fig. 3a). These movements can be separated (see Fig. 3b) into two distinct layers. Two kinds of mobility, occupation mobility ($Y_{L_i, L_j}$) and application mobility ($A_{L_i, HEI_j, t}$), can be explored. In addition, based on the application databases, application mobility can be modeled over the period 2006-2017 with a fixed effects gravity model. Occupation mobility is also separated in time (see Fig. 3b). One of the problems is that these movements differ over time. Therefore, it is a so-called two-layer spatiotemporal network. Nevertheless, if we only consider the movement between the location of residence before the application to an HEI and the location of a workplace, we can propose a so-called mobility network (see Fig. 3c). In this case, the edges between nodes connect different locations not only in space but also in time.

Fig. 3 Spatiotemporal networks for modeling both birth-to-school and the school-to-work mobility
There are two ways to understand the proposed mobility network. We can regard the mobility network as a spatiotemporal mobility network where the communities represent the "attractiveness" of a location. The sizes of communities can differ, regardless of whether an HEI is present. Counties that typically represent sources and sinks for different occupations can be specified. The role of an HEI in becoming a source or sink county can also be analyzed.
The other understanding is to regard mobility as a flow (of people) between locations. Gravity models make it possible to model the "attractiveness" of a location, such as the strength of the regional economy (GDP/cap, GDI/cap), salary opportunities, the distance dependencies between locations for every single occupation, and the role of institutions.
The logarithmic form of the proposed spatiotemporal gravity model for occupation mobility is specified in Eq. (17), where $u_{j,j'}$ is the residual of the regression model and the $\beta$s are the regression parameters to be estimated. Note that the years of the data values $I_{j,t_j} = \mathrm{GDI/cap}(L_j, t_j)$ and $I_{j',t_{j'}} = \mathrm{GDI/cap}(L_{j'}, t_{j'})$ are usually different. For the source location ($L_j$), $\mathrm{GDI/cap}(L_j, t_j)$ should be taken from the year of application, whereas for the target location ($L_{j'}$, the workplace), $\mathrm{GDI/cap}(L_{j'}, t_{j'})$ should be taken from the year of hiring at the workplace.
Using dummy variables, the role of HEIs is included in Eq. (17). GDI/cap values are proxies for the expected salaries; therefore, if the salary surplus (as an added value) of the location for an occupation $O_k$ is considered, a more detailed model can be specified (Eq. (18)).

Owing to the time-series data, application mobility can be explored with time-series (i.e., fixed effects gravity) models, for which similar indicators can be specified. Therefore, the dynamic null model predicts the dynamic mobility graph. Table 3 reports which null models are estimated with economic models.
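To make the estimation step concrete, the following minimal sketch fits a log-linear gravity model by ordinary least squares. It is an illustration under stated assumptions, not the authors' specification: the regressors (log GDI/cap at source and host, log distance, and two HEI dummies), the coefficient values and all observations are hypothetical, and the flows are generated without noise so that the known coefficients can be recovered.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(design, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(gdi_source, gdi_host, distance_km, hei_source, hei_host):
    """One row of the log-linear design matrix (assumed functional form)."""
    return [1.0, math.log(gdi_source), math.log(gdi_host),
            math.log(distance_km), float(hei_source), float(hei_host)]


# Hypothetical coefficients: intercept, ln I_j, ln I_j', ln distance, HEI_j, HEI_j'.
TRUE_BETAS = [0.5, 0.8, 1.2, -1.1, 0.6, 0.4]

# Hypothetical observations: (GDI/cap source, GDI/cap host, distance, HEI source, HEI host).
OBSERVATIONS = [
    (1.0, 2.0, 50, 0, 1), (1.5, 1.2, 120, 1, 0), (2.2, 3.0, 30, 1, 1),
    (0.9, 1.1, 200, 0, 0), (1.8, 2.5, 80, 0, 1), (1.3, 0.8, 150, 1, 0),
    (2.7, 1.9, 60, 1, 1), (1.1, 2.8, 95, 0, 0),
]

design = [design_row(*obs) for obs in OBSERVATIONS]
# Noise-free synthetic log flows, so the fit should return TRUE_BETAS.
log_flows = [sum(b * x for b, x in zip(TRUE_BETAS, row)) for row in design]
estimated_betas = fit_ols(design, log_flows)
```

With real data the residual $u_{j,j'}$ would be non-zero, and a statistical package would normally be used to obtain standard errors and fixed effects; the normal-equation solver here only keeps the sketch dependency-free.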

Modeling application preferences via asymmetries in the application mobility network
The Hungarian Education Authority has made all application data available to researchers. We considered the interval between 2006 and 2017. According to Telcs et al. (2016), individual application preferences can be aggregated, and in this way, professional, regional, and institutional aggregated preference matrices can be calculated. Here, an (i, j) cell of the aggregated preference matrix shows how many times the i-th institution preceded the j-th institution in student applications. Telcs et al. (2016) offered several heuristic methods to estimate aggregated preference orders from an aggregated preference matrix. They showed that these methods approximate the optimal preference order very well, where the optimal preference order is the preference order for which the number of opposite application preferences (i.e., the sum of the values in the lower triangle of the preference matrix) is minimal (Telcs et al. 2016). One possible choice is the column sum method.

Table 4 (unordered example): opposite preferences = 2+1+1 = 4; sum of the lower triangle = −10/3
The aggregated revealed preference is the order of the column sums (see Table 4a). Since Eq. (14) is a monotonic transformation, the order of the column sums is not modified (see Table 4b) and the asymmetry is not perturbed; therefore, we gain a model for the preference order (see Eq. (20)). Table 4 provides an example of an (unordered) aggregated preference matrix. Telcs et al. (2016) showed that if the matrix is (re)ordered by the preference order, then the opposite preferences (the sum of the values in the lower triangle) can be decreased. The algorithm stops when (re)ordering the preference matrix can no longer reduce the sum of opposite preferences (see Table 5).
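The column sum heuristic can be sketched as follows. This is a minimal illustration with a hypothetical 3 × 3 matrix, not the data of Table 4. It assumes the convention stated above, in which cell (i, j) counts how often institution i preceded institution j, so that a smaller column sum marks a more preferred institution; the sort direction is therefore an assumption of the sketch.

```python
def opposite_preferences(matrix):
    """Sum of the entries below the main diagonal (preferences against the order)."""
    return sum(matrix[i][j] for i in range(len(matrix)) for j in range(i))


def column_sum_order(matrix):
    """Indices sorted by ascending column sum (assumed: fewest 'preceded by' first)."""
    n = len(matrix)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return sorted(range(n), key=lambda j: column_sums[j])


def reorder(matrix, order):
    """Permute rows and columns of the matrix by the same order."""
    return [[matrix[a][b] for b in order] for a in order]


# Hypothetical aggregated preference matrix for three institutions.
PREFERENCES = [
    [0, 1, 2],
    [5, 0, 1],
    [4, 3, 0],
]

before = opposite_preferences(PREFERENCES)
order = column_sum_order(PREFERENCES)
ordered = reorder(PREFERENCES, order)
after = opposite_preferences(ordered)
```

In this toy case a single reordering lowers the number of opposite preferences from 12 to 4; on larger matrices the reordering would be repeated until no further reduction is possible.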
Equation (20) shows an example of the power of combining economic models and network science. Equation (20) explains the preference value between institutions HEI m and HEI m ′ in year t. If these models fit well, then the institutional preference order can also be modeled and explained.
Equation (20) defines the asymmetry between HEIs $m$ and $m'$. A generalized gravity model expresses the attractiveness of the HEIs and, through it, the asymmetry between the nodes of the network that represent the HEIs.
For shorter notation, we use $I_m$ instead of $\mathrm{GDI/cap}(L_{HEI_m})$, with a slight abuse of the original notation.

Table 5 The ordered aggregated preference and asymmetry matrices. Opposite preferences (E) = 1+1+1 = 3; sum of the lower triangle = −4

Results
In this section, we show how a location and its economic state influence students' mobility and career paths, and how these effects can be inferred with our models.

Added value of the locations and education
Following Eq. (16), the salary premium can be decomposed into four components. The first is the graduate premium $\Delta S^{g}_{k}$, which is the difference between the mean starting salary of graduated employees ($S^{g}_{k}$) and the mean starting salary of not graduated employees ($S^{n}_{k}$) in occupation category $k$. Figure 4 shows the graduate premiums by occupation category $k$. The highest mean salaries are in information technology, engineering and business & economics. The highest added value of a master's diploma is also in these three categories.
The second factor is the added value of the location. Figure 5 shows the added value of the location for business & economics (see Fig. 5a) and engineering (see Fig. 5b). The added value, which can be positive or negative, is ordered into ten deciles. In the case of business & economics, only the seventh decile contains positive values, and these values are concentrated in the center of Hungary, while in the case of engineering, added value is concentrated in the western part of the country. These results are highly correlated with the spatial distribution of companies. While the center of economics is concentrated in the capital city, the companies requiring engineers are concentrated in the more industrialized western part of Hungary.

The third factor is the added value of HEIs. The role of HEIs in spatial mobility is shown in Fig. 6. If an employee graduated in the eastern part of Hungary (e.g., from the University of Debrecen, see Fig. 6a), then he/she usually does not go to work in another part of the country, and vice versa. For example, employees who graduated from the University of Pannonia (see Fig. 6b), which has several campuses in the western part of Hungary, usually do not work in the eastern part of the country. The added value of rural universities is positive or neutral in the locations and neighboring locations of the campuses; however, it is usually negative in the capital (Budapest) and varies substantially in more distant subregions.
The remaining factor is the individual bargaining on salary, which is the unexplained (residual) part of the economic model in Eq. (16). This value follows a normal distribution.
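The four-part decomposition of Eq. (16) can be illustrated with a short sketch. The successive differences used here (occupation mean, then occupation-and-location mean, then occupation-location-HEI mean) are an assumed reading of how the components are obtained from average salaries, and all salary figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalaryPremium:
    graduate_premium: float       # Delta S^g_k
    location_added_value: float   # Delta S^g_{k,j}
    hei_added_value: float        # Delta S^g_{k,j,m}
    individual_bargaining: float  # epsilon_n

    @property
    def total(self) -> float:
        """Salary premium of the employee over the non-graduate mean."""
        return (self.graduate_premium + self.location_added_value
                + self.hei_added_value + self.individual_bargaining)


def decompose_salary(individual_salary: float,
                     mean_non_graduate: float,
                     mean_graduate: float,
                     mean_graduate_at_location: float,
                     mean_graduate_at_location_and_hei: float) -> SalaryPremium:
    """Split a starting salary into four components (assumed successive differences)."""
    return SalaryPremium(
        graduate_premium=mean_graduate - mean_non_graduate,
        location_added_value=mean_graduate_at_location - mean_graduate,
        hei_added_value=mean_graduate_at_location_and_hei - mean_graduate_at_location,
        individual_bargaining=individual_salary - mean_graduate_at_location_and_hei,
    )


# Hypothetical salaries in one occupation category (arbitrary currency units).
premium = decompose_salary(
    individual_salary=325,
    mean_non_graduate=200,
    mean_graduate=300,
    mean_graduate_at_location=280,
    mean_graduate_at_location_and_hei=310,
)
```

In this example the location lowers the salary by 20 units while the HEI adds 30, which mirrors the observation that individual components may be negative even when the total premium is positive.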

Analyzing spatial mobility to reflect the role of HEIs
In this section, application and occupation mobility are investigated. First, traditional gravity models are applied to determine the roles of economic and institutional indicators of mobility. Then, a mobility network is built and the main network properties are examined. In the null models, parameters, such as link weights, density and asymmetry, are predicted by the unified economic-network model. Then, the preference orders of institutes and subregions are explained by the proposed economic-network model.

Analyzing application mobility: the first step
Applying gravity models: With Eq. (19), the primary preferences can be analyzed within the period 2011-2017. We used a fixed effects gravity model. Table 6 shows the results. I = GDI/cap is only a proxy for the cost of living and expected salaries. It is not surprising that the most important value is the GDI/cap in the subregion of the HEI. Nevertheless, the second most important value is the GDI/cap in the subregion of residence of the applicants, which indicates that the economic properties of the sending subregion also play an important role in being accepted by an HEI.

Fig. 6 Added value of the HEIs to salaries (10 deciles) in business & economics
The unemployment rate (UR) of the subregion of the place of residence has a positive effect, while the unemployment rate of the subregion of the university has a negative effect on applications. This means that there are more applicants from subregions with higher unemployment rates but that fewer apply to institutions in subregions where this indicator is also high. In the national ranking of Hungarian HEIs, better HEIs have smaller rank values; therefore, a negative coefficient indicates that more reputable institutions attract more applicants.
Note that the role of institutional reputation is decreasing each year. The negative coefficient of distance is also in line with the gravity models. The larger the distance is, the higher the travel costs, resulting in a greater financial burden for parents and students. In addition, between 2011 and 2016, this coefficient linked to physical distance increased, which indicates a decrease in mobility since the geographical distances did not change. Telcs et al. (2015) suggested that gravity models should be used without HEIs in capital cities because the high centralization of institutes may distort the results.
The gravity-based potential model shows that the role of Budapest (BP) is increasing. The strength of Budapest-centric HEIs is reflected in the greater difference in potential values between Budapest and rural HEIs (see Fig. 7).
Applying network science models: Network analysis not only confirms the results of the gravity model but also offers additional insights. Two fundamental dynamic network indicators are the change in network density over time (see Fig. 8a) and the change in the structure of modules (see Fig. 8b). Figure 8a shows that between 2011 and 2013, the density of the mobility graphs was almost equal regardless of whether one includes or excludes the HEIs in Budapest. The linear trends show that the density is decreasing, which indicates that fewer locations are connected. Furthermore, not only has the number of applicants decreased, but students have applied to HEIs from fewer locations. The differences in the slope of the linear trend indicate that the decrease in mobility is greater if HEIs in Budapest are excluded. In 2017, there were 27% fewer applicants to HEIs than in 2011. This causes the lines to be thinner in Fig. 8b. The module-based community analysis identified catchment areas similar to those of the potential analysis. However, it indicates that the eastern part is more fragmented. In addition, the decreasing density (with the exception of 2016) indicates that fewer students applied to HEIs and that they applied from fewer places.

Fig. 8 (panels for 2011-2017). *The density is the number of edges between subregions of students' residence and the HEI / (number of HEIs × number of subregions). **The thickness of the edges is proportional to the number of applications; outward lines indicate loops.

The results of the fixed panel model show the increasing role of the distance between the subregion of the student's residence and the subregion of the HEI (see Table 6). The higher values of the distance deterrence function at low distances in 2017 also reflect these results; nevertheless, the distance deterrence function shows the nature of the distance distribution between the students' residences and the locations of HEIs. The maximum of all deterrence functions is at low distances between locations, which indicates the retention role of HEIs. In both curves, a break can be seen, showing that students readily apply to institutions in the center of the country (Budapest) but are far less likely to apply to those at either end of the country. This break is stronger in 2017. Figure 9 shows the distance deterrence function calculated by Eq. (2). These spline curves have different shapes in 2011 and 2017. The higher values at low distances in 2017 show that the role of distance is increasing: students apply to closer HEIs. Since Budapest is in the middle of Hungary, the distance deterrence function exhibits a break.
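The density definition given in the note to Fig. 8 can be computed directly from an application list. The sketch below is a minimal illustration with hypothetical subregion and HEI labels; it treats a (subregion, HEI) pair as an edge whenever at least one application was recorded, which is an assumption about how edges are counted.

```python
def application_density(applications, n_subregions, n_heis):
    """Number of distinct (subregion, HEI) edges / (number of HEIs x number of subregions)."""
    edges = {(subregion, hei) for subregion, hei, count in applications if count > 0}
    return len(edges) / (n_heis * n_subregions)


# Hypothetical yearly application counts: (subregion of residence, HEI, applications).
APPLICATIONS_YEAR = [
    ("S1", "H1", 12),
    ("S1", "H2", 3),
    ("S2", "H1", 7),
    ("S3", "H2", 0),  # no applications, so no edge
]

density = application_density(APPLICATIONS_YEAR, n_subregions=3, n_heis=2)
```

Computing this value for each year and fitting a linear trend would reproduce the kind of comparison shown in Fig. 8a, with or without the HEIs in Budapest.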
The combination of gravity and network science models generates a lower error value for the null models. These null models can be used to specify distance-based communities (see Fig. 11), which better reflect students' preferences and the catchment areas of HEIs. Therefore, the asymmetry of the application network can also be modeled (see Eq. (15)). The application preferences can be modeled using the asymmetry prediction (see Eq. (19)). Figure 10 shows the fits of the null models. Here, $e_{j,m} \in E$ represents the applications from location $j$ to the location of $HEI_m$, while matrix $P$ contains the estimated values (see Eq. (1) for Newman and Girvan's null model and Eqs. (4)-(5) for the economic null model). In this calculation, the economic null model (Eq. (5)) uses the coefficients from the application mobility gravity model (see Eq. (19)).
In this way, the gravity model and the economic null model are matched. First, the matched model is used for community detection (see Eq. (9)). Figure 11a shows that without considering distances, four modules can be defined. The economic-gravity-model-based communities (see Fig. 11b) better reflect the catchment areas of HEIs. Note that Newman and Girvan's null model is a distance-independent null model; therefore, if the subregions within a module are geographically connected, as in Fig. 11a, this shows that distance is an important factor that cannot be neglected in the null model (Gadár et al. 2018). The economic null model is already a distance-dependent null model in which the modules are more fragmented (see Fig. 11b). Nevertheless, we can see, for example, that an economic-based null model that is simultaneously a gravity model explains mobility better (see Table 6) than the module that provides catchment areas of HEIs (compare Figs. 7 and 11b).
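The comparison of null models by their error can be sketched as follows. The expected values below use the standard configuration-model form associated with Newman and Girvan, applied to a location-by-HEI matrix (row total × column total / grand total); this is an assumption about the form of Eq. (1), and the observed matrix is hypothetical. An economic null model would supply a different matrix $P$, for example from fitted gravity coefficients, and would be scored with the same error function.

```python
def newman_girvan_expected(observed):
    """Expected weights P[j][m] = (row total j) * (column total m) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def squared_error(observed, expected):
    """Sum of squared differences between the observed and expected weights."""
    return sum((o - e) ** 2
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical applications: rows are locations j, columns are HEIs m.
OBSERVED = [
    [4, 1, 0],
    [2, 2, 1],
]

expected = newman_girvan_expected(OBSERVED)
error = squared_error(OBSERVED, expected)
```

Because this null model depends only on the marginal totals, it ignores distance entirely, which is why a distance-dependent alternative can be expected to fit spatial flows with a lower error.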
The other approach is to combine gravity-based economic models and null models from network science, using the revealed preferences to model asymmetry with economic null models (see Eq. (15)). The first advantage of this model is that it is distance independent (see Eq. (15)); therefore, only differences between the importance values are estimated and analyzed. Table 7 shows the parameter estimation of Eq. (20). The results of the gravity models in Table 6 demonstrate that a more preferred HEI, which has a higher number of applicants, has a lower unemployment rate and a higher per-capita GDI in its region, as well as a better rank position (smaller rank order value) on faculty excellence. The results show that an HEI is more preferred if the prospects of living there are more favorable (higher per-capita GDI, lower unemployment rate) and the HEIs there are better, which is in line with the results of the gravity models (see Table 7). Nevertheless, this model also shows that the differences in the importance of the faculty excellence of HEIs are steadily diminishing.

By examining the coefficients over time, we can see that the coefficient of determination ($R^2$) and the adjusted coefficient of determination (adjusted $R^2$), with the exception of those of 2017, decrease between 2011 and 2016. This means that the importance of other, unmodeled parameters is increasing. The values of all coefficients are the same as in the gravity model. Nevertheless, in the gravity models, the decline in coefficients indicated that they were playing increasingly less of a role in the top preference of applicants. Here, however, a decrease in the values indicates that the importance of these factors is becoming more equal across institutions that are located in different places and occupy successively lower positions in the preference order. Where a good fit is found for asymmetry, it is worth comparing the real and modeled values.
In this case, we can answer the question of how well the preference ordering formed on the basis of the applications correlates with the ordering modeled on the basis of economic, unemployment, and faculty excellence data (see Table 8). As in section "Modeling application preferences via asymmetries in the application mobility network", where we showed how the aggregate preference matrix can be restored from the asymmetry matrix when the total number of students is known, we can estimate the preference order from the estimated asymmetry matrix; therefore, all applications and, more importantly for institutions, all first-place applications can be predicted. The policymakers of a given HEI can gain particular insight into their comparative advantages (disadvantages) by examining the factors that determine the asymmetry between them and their primary competitors.

Table 8 shows the preference order of the first 10 institutions (see Telcs et al. 2016), which receive the most applications. The Spearman rank correlation of the modeled and real preference orders is 0.61. To understand why the proposed method ranked certain institutions lower or higher, we report the per-capita GDI and the unemployment rate in the subregions of the HEIs and display the rank positions of the faculty excellence values. Budapest Business School (BGF, now BGE) is ranked lower in the model because its faculty excellence ($RANK_m$) is significantly lower than that of the other listed institutions. Based on the results presented above, although faculty excellence is an important factor, there are institutions that are popular even though this value is lower for them. The University of Miskolc (ME) would not be among the top ten institutions according to the model. The reason for this is that in the subregion of Miskolc, unemployment was very high in 2011, at above 10%.
The University of West Hungary (NYME), which no longer exists, would be ranked higher by the model despite its lower per-capita GDI ($I_m$), because it has a low unemployment rate ($UR_m$) and is in third place in faculty excellence ($RANK_m$). We can see that the model underestimates the number of applications for the first 10 institutions and overestimates them at the end of the ranking.
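The agreement between a modeled and an observed preference order can be quantified with the Spearman rank correlation, as reported above. The sketch below uses the textbook formula for rankings without ties; the two rankings are hypothetical and are not the values of Table 8.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    squared_diff = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * squared_diff / (n * (n ** 2 - 1))


# Hypothetical rank positions of five institutions.
REAL_RANKS = [1, 2, 3, 4, 5]
MODEL_RANKS = [2, 1, 3, 5, 4]

rho = spearman_rho(REAL_RANKS, MODEL_RANKS)
```

Two swapped neighboring pairs already reduce the correlation to 0.8 for five institutions, so a value of 0.61 for ten institutions is compatible with several institutions being placed a few positions away from their observed rank.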

Analyzing occupation mobility: the second step
First, Eq. (17) is applied to measure the role of HEIs in occupational mobility. Figure 12 shows the results of the gravity model, in which all parameters are significant and are arranged in descending order of their absolute $\beta$ values.
Not surprisingly, the highest coefficient is that of I = GDI/cap at the host location ($j'$), which is a proxy for expected salaries. The coefficient of GDI/cap of the source location $j$ is also positive, which resonates with the results of the applied gravity models. A high GDI/cap in the source location creates a chance for the student to attend university and later to remain at that location or go to another place to secure employment.
It is a very interesting and important result that, just behind the income data, the two dummy variables (whether there is an HEI in the host location $j'$ and in the source location $j$) rank ahead of the unemployment rates and distance. Mobility is facilitated both by having an HEI close to the workplace and even more so by having such an institution close to the source location. It is no coincidence that multinational companies prefer to settle around university cities, as these institutions play a significant role in both attracting students and keeping them there after graduation. Due to their importance in innovation, HEIs can have a positive impact on the social, economic and cultural development of a given region. This is because, owing to the continuous decrease in public expenditures, some HEIs are turning to the local public and business sectors and attempting to recruit more students from the immediate environment, as well as to increase revenues by providing professional services. The retention of students and the regional involvement of HEIs, which provide space and audiences for cultural events, are indisputable, in addition to professional relations with the business sector.

Mobility, albeit to a lesser extent, is positively affected if unemployment is high in the source location; however, the attractiveness of the host location is slightly decreased if the unemployment rate in the host location is high. The coefficient of the unemployment rate is lower than the coefficient of distance, which is, not surprisingly, a negative value. If the mean of local starting salaries in an occupation is considered instead of GDI/cap, then the adjusted coefficient of determination (adjusted $R^2$) can be increased (see Table 9). Table 9 shows the variable importance (measured by the contribution to the coefficient of determination) instead of betas. Since the HEI variables are dummy variables, variable importance better reflects the real role of HEIs.
The results show that, except for sport sciences, the most important value for occupational mobility is whether there are any HEIs close to the workplace. These results match well with the results on the added value of the HEIs, where Fig. 6 shows that students attempt to find jobs close to the university from which they graduate. Furthermore, larger companies prefer to settle around cities where HEIs operate.
By combining network science methods with gravity models, the impact of HEIs can be further investigated. The asymmetry in occupation mobility can be calculated for the subregions, and they can be ranked. Table 10 shows the top 10 most attractive subregions.
In line with the former results, it is not surprising that each of the first ten subregions has at least one HEI, which highlights the role of HEIs. Only the first two subregions have a positive balance of graduated employees. The first is the capital city of Hungary, and the second is located next to Budapest. The other subregions already have a negative balance. Only if the mobility of all people is considered is this balance positive in the top subregions. Nevertheless, the lack of freshly graduated employees can indicate that the structure of society may change over time.
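The balance of graduated employees per subregion can be illustrated with a short sketch. It assumes that the balance is simply inflow minus outflow in the occupation mobility network, which may differ from the asymmetry measure used for Table 10; the subregion labels and flows are hypothetical.

```python
from collections import defaultdict


def net_balance(flows):
    """Inflow minus outflow of graduates for every subregion."""
    balance = defaultdict(int)
    for (source, host), count in flows.items():
        balance[source] -= count  # graduates leaving the source subregion
        balance[host] += count    # graduates arriving at the host subregion
    return dict(balance)


def rank_by_balance(flows):
    """Subregions ordered from the most positive to the most negative balance."""
    balance = net_balance(flows)
    return sorted(balance, key=lambda region: balance[region], reverse=True)


# Hypothetical flows of graduates: (source subregion, host subregion) -> count.
FLOWS = {
    ("A", "B"): 10, ("B", "A"): 4,
    ("A", "C"): 1, ("C", "A"): 3,
    ("B", "C"): 2, ("C", "B"): 5,
}

balance = net_balance(FLOWS)
ranking = rank_by_balance(FLOWS)
```

Since every departure is an arrival elsewhere, the balances sum to zero, so a strongly positive balance in one or two subregions necessarily implies negative balances in the others.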

Discussion
Data-driven career tracking offers new insights for scholars to analyze application and occupation mobility. This database represents the whole population of graduated employees and applicants; therefore, more reliable models can be proposed. This database is also important for potential applicants when choosing an HEI. A pilot web page is already accessible at https://www.diplomantul.hu, where potential applicants can see the potential salaries by occupation category and the added value of HEIs (see Figs. 5 and 6). Nevertheless, future research should examine how the availability of this information reinforces the asymmetry that already exists in mobility networks. Modeling asymmetry is one of the key issues because the asymmetry makes it possible to use revealed preferences and the preference order to model economic inequities (see Table 7). In addition, asymmetries in occupational mobility may indicate changes in demographic and social characteristics and structure as well as subjective preferences, components that determine the popularity of HEIs. It is unquestionable that HEIs play a key role in changing social and demographic processes (see Table 10). The combination of network and economic models, on the one hand, can cross-validate the results and offers a technique for model triangulation. On the other hand, the economic models can explain the formation and properties, such as asymmetries, of mobility networks (see Tables 7 and 8). In addition, the combination offers better community detection (see Figs. 10 and 11) and a better explanation and prediction of revealed application preferences (see Table 8). Explaining network properties with economic models can open new horizons because it is possible to move from descriptive statistics on network properties to explanatory models, and thus, we can better understand the mechanism underlying the formation of networks.

Summary and conclusions
This study showed how to measure the role of HEIs in application and occupation mobility by combining economic and network models. These results are based on application and career tracking databases that include all applicants and all freshly graduated people within the explored time interval. The paper demonstrates how to calculate the graduate premium and the added value of the locations and the HEIs. One can see that the added value of locations (on starting salaries) varies across occupation categories and correlates with the distribution of the companies (see Fig. 5). The added value of HEIs is usually limited to the vicinity of the HEI in question. The impact of an HEI at a larger distance is questionable (see Fig. 6).
The results highlighted the role of HEIs, especially in occupational mobility (see the high importance values in Table 9). Nevertheless, in the case of application mobility, the coefficient of faculty excellence decreases over time (see Table 6). In addition, the declining R 2 values over the years make it probable that factors other than economic factors, unemployment and faculty excellence are increasingly influencing student mobility (see Table 7). Finally, the appropriate model for network asymmetry can explain the precedence orders and predict the number of applications of HEIs (see Table 8).
Although the authors only analyzed spatial (such as application and occupation) mobility in Hungary, the presented methods can also be generalized to data from other countries and other spatial and economic data. The proposed methods can cross-validate each other and create new insight to explain the formation of mobility networks. By using these methods, actual players in these processes may better understand their competitive position and design their strategies.