Analysis of ownership network of European companies using gravity models

Social network analysis is increasingly applied to modeling regional relationships. However, in this scenario, we cannot ignore the geographical economic and technological nature of the relationships. In this study, the tools of social network analysis and the gravity model are combined. Our study is based on the Amadeus database of European organizations, which includes 24 million companies. The ownership of parent subsidiaries was modeled using economic, technological, and geographic factors. Ownership was aggregated to the NUTS 3 regional level, to which average corporate profitability indicators, the GDP per capita characterizing the economic environment, and the number of patents, which is a proxy of the technological environment, were assigned to NUTS 3 regions. The formation of the ownership network between 2010 and 2018 was characterized using this dataset. As the proposed model accurately describes the formation of ownership relationships marked with edges, it is possible to estimate network properties, such as modularity and centrality.

While power-related connections between corporations play an important role in understanding our global corporate system (Vitali et al. 2011), few papers have investigated such networks. For example, Nakamoto et al. (2019) employed the so-called Orbis database (Dijk 2018) to identify and analyze high-risk intermediate companies used for international profit shifting. Using the same database, Khalife et al. (2021) modeled the ownership network to establish a methodology to extract and analyze meaningful patterns of capitalistic influence from the graph structure. Mizuno et al. (2020) applied a new model on this network to measure a shareholder's power to control corporations. Based on their findings, the landscape of global corporate control appears different if we adequately evaluate indirect influence via dispersed ownership. Finally, Takes et al. (2018) investigated the essential building blocks (multiplex motifs) of this graph to provide a better understanding of multiplex corporate networks.
This paper analyzes the European subset of the Orbis database called Amadeus 1 to provide further insights into the structure of European companies' ownership networks. To this end, we combined the tools of SNA with a gravity model containing different economic, technological, and geographic indicators. Ownership was aggregated to the NUTS 3 regional level, to which average corporate profitability indicators, the GDP per capita characterizing the economic environment, and the number of patents and industrial designs characterizing the technological environment were assigned to NUTS regions. The formation of the ownership network between 2010 and 2018 was then characterized using this dataset. As the proposed model accurately describes the formation of ownership relationships marked with edges, it is possible to estimate network properties, such as modularity and centrality.
The aims of the study are twofold, namely, proposing a new, economic null model and identify and analyze economic-investment communities (EICs). The stated aims (As) of this study are grouped as follows:

I. Propose a new gravity-based economic null model (GEN):
A 1 To improve link prediction with GEN. A 2 To predict derivative network characteristics such as centralities and modularity values.
II. Identify and analyze companies' ownership structure: A 3 To identify EICs. A 4 To analyze the stability of EICs over time.
Incorporating gravity models into null models offers us to predict links more accurately (see A 1 ). In this way, the formation of the companies' ownership network, as well as its

Data and methods
The study combines data-driven and model-driven approaches. The employed datadriven methods came from network sciences, such as calculating centralities to identify the key regions in investment and community-based modularity detection to identify communities of regions. On the other hand, the applied gravity model is a frequently used economic model, where the rate of flows, such as migration, trade exchanges-or in this case, the number of investments-has to be modeled. The employed data-driven approach, in contrast to the traditional model-driven approaches, could not be based on a preliminary research model and the associated research hypotheses. However, clearly defined research purposes and the associated research questions are formulated. In addition, the combination of data-driven and model-driven approaches allows scholars to formulate more specific research questions (RQs) as follows: I. Methodological research questions: RQ 1 Is it possible to improve link prediction via the proposed GEN null model? RQ 2 Is it possible to improve the derivative network coefficients, such as centralities and modularity values, by the proposed GEN link prediction?
II. Applications of null models: RQ 3 Do administrative (such as country) borders affect investments? RQ 4 How do investments change if distance does not play a role? RQ 5 What kind of EICs can be identified with the GEN null model? Are they stable in time and space? RQ 1 and RQ 2 are derived from A 1 and A 2 . One of the main goals of this study is to propose a better null model that better predicts the links (i.e., the number of owners) between regions. In a spatiotemporal network, which is the company ownership network, not only the distance but also the economic and technological environment, as well as the financial status of companies, can influence the links between nodes. Therefore, not only the links (see A 1 -RQ 1 ) but also the derived network characteristics, such as centrality and modularity values, can be predicted more accurately (see A 2 -RQ 2 ). Without underestimating the importance of methodological questions, the interesting questions can be centered around the interpretation of the results (see A 3 -A 4 and RQ 3 -RQ 5 ). The establishment of a new subsidiary can be considered an investment in which technology and knowledge transfer also take place.
The configuration model of Newman (2010) shows that if links between nodes are concentrated around geographical locations, then the distance between nodes should be considered in null models (Expert et al. 2011). At the same time, distance dependence alone does not explain why administrative boundaries are returned during a module search (see RQ 3 ). In that case, we can rightly assume that other economic, technological, or corporate characteristics also influence the decision of investments. Indeed, while in the European Union, the federalist and sovereignist positions fight with each other in almost all areas of decision-making (Saurugger 2018;Heidbreder 2022), an important question may be whether the administrative borders (here primarily the country borders) play a role.
Nevertheless, using the distance-dependent null model in community detection gives us the opportunity to ask questions about what kind of relationships would develop if distance did not play a role (see RQ 4 ). In addition, by using GEN-based null models, the following question can be answered: how does an EIC change in time and space (see RQ 5 ). Since EICs show regions where the connections between regions are denser than the gravity model predicts if administrative borders are returned as modules, then this indicates that the financial, economic, and technological differences are still decisive in the unified economic area.
Since the gravity model is a classical economic model that follows a model-driven approach, research hypotheses (RHs) can also be stated.

RH 1
The links within companies' ownership networks can be modeled by the distance between regions, the economic and technological properties of the regions, and the financial status of the companies. RH 2 The administrative borders play an important role in the formation of EICs, which are stable in space and time.
The RH 1 determines the groups of indicators involved in the gravity models. The applied GEN-model-based community detection specifies EICs, which mainly reflect the country's borders. We also assume that these EICs are stable in space and time (see RH 2 ).

Data employed
The database was collected and aggregated by employing the freely available database of Eurostat, 2 as well as the databases of Amadeus, 3 and PATSTAT. 4 Although the latter two are commercial databases, they collect freely available data from European companies and patents. Amadeus consists of over 24 million company data from all over Europe. In addition to the headquarters of companies, it contains balance sheet and income statement data, as well as ownership relations among companies. The PATSTAT database consists of filed and accepted patents, industrial designs, and trademarks from around the world, as well as the addresses of the inventor and the exploiter. Most European headquarters are assigned to a NUTS 3 region. In addition, we collected per capita GDP adjusted for purchasing power parity (PPP) and population data for NUTS 3 regions. Note that GDP (PPP) is often used as an indicator that is suitable for international comparisons (see, e.g., Abrham and Vosta 2010); however, within the European Union, there is a relatively moderate difference between nominal GDP and GDP PPP. 5 Two distinct data tables are specified. The first table provides the data of nodes (i.e., data of NUTS 3 regions). All indicators are attached and aggregated to a NUTS 3 region. Data from Amadeus (indicators m 2 − m 15 ) are also aggregated to NUTS 3 regions. The mean values of company data are calculated for indicators m 2 − m 14 . The edge dataset connects two regions (i and j), and the distances ( d i,j ) between two regions are collected from the official site of Eurostat. 6 Table 1 shows the list of applied indicators and the data sources. Note that in the Eurostat database, there is no information about the NUTS 3 GDP data for Iceland (2 regions), Liechtenstein (1 region), Switzerland (25 regions), and the United Kingdom (179 regions); therefore, we used the GDP per capita values for all countries. 7 To test RH 1 , data on companies' financial status ( m 1 − m 14 ), as well as regional economic ( m 15 ) and technological ( m 16 ) indicators and interregional distances ( d i,j )are collected. These variables were treated as independent variables, while the dependent variable was the number of owners of i region companies in the j region ( a i,j ).

Network representation of ownership
The parent-daughter relationship between firms was characterized by means of a binary adjacency matrix A , whose elements are defined as: 5 Source: https:// stati stics times. com/ econo my/ gdp-nomin al-vs-gdp-ppp. php, retrieved: 5 July 2022. Note that no significant changes in coefficients were experienced when we replaced GDP (PPP) with GDP in the gravity model. A similar conclusion can be found, e.g., in Paas et al. (2008), which examined international trade within the European Union using a gravity model. 6 Source: https:// ec. europa. eu/ Euros tat, retrieved: 5 May 2022.
Because of the difficulties of interpreting aggregations, the rate of ownership is not considered. The adjacency matrix A is further called the company ownership matrix (COM). The database contains the exact geographical location of each company. Moreover, since all our economic and technological indicators were provided at the NUTS 3 level and we also wanted to preserve the anonymity of the companies, we aggregated the data to the NUTS 3 level. However, these data are stored separately to use in link (i.e., the number of ownerships) prediction between NUTS 3 regions. Each settlement was assigned to a NUTS 3 region (county). Companies are assigned to geographic regions by the A [mo,NUTS 3] and A [da,NUTS 3] incidence matrices, whose elements are defined as: with element one if the headquarters of the i-th mother company is situated in the j-th NUTS 3 geographic region, -a [da,NUTS 3] i,j with element one if the i-th daughter is situated in the j-th NUTS 3 geographic region, Therefore, the directed weighted network that defines the number of investment connections between the regions can be defined as: (1) a i,j = 1 if the i-th company owns the j-th company. 0 otherwise where A [NUTS 3] is the (aggregated) company ownership matrix (ACOM). If both the subsidiary (daughter) and the parent company are stated in the same NUTS 3 region, then a self-loop is formated in a NUTS 3 level. a i,j ∈ A [NUTS 3] represents the number of owners between NUTS 3 region i and j.
The advantage of this company-to-local transformation is to create the opportunity to analyze connections between regions via yearly cross-sectional analysis.
If we examine several periods, we obtain three-dimensional arrays instead of adjacency matrices, where the third dimension is time (i.e., year). As we address intercounty relations throughout, the NUTS 3 notation is neglected. An adjacent matrix in year t is denoted as

Applied null models
Null models predict connections between nodes. The most widely applied null model is the random configuration model specified by Newman and Girvan (2004), which calculates the prediction assuming a random graph conditioned to preserve the degree sequence of the original network: = i a i,j , and L = i j a i,j . Note that self-loops created during the regional aggregation of the ownership network have to be treated. To this end, Arenas et al. (2008) proposed a multiresolution method called AFG (after the authors, Arenas, Fernandez, and Gomez) by adding r self-loops to each node. This algorithm increases the strength of a node without altering the topological characteristics of the original network, as follows: A r = A + r I , where I denotes the identity matrix and r the weight of the self-loops of each node. We used this correction in the case of finding modules; however, this compensation underestimates the self-loops.
The so-called randomized null model presented by Eq.
(3) is inaccurate in most realworld networks Liu et al. (2012b). Nevertheless, several community-based detection methods, such as modularity detection, are based on this random configuration model (Newman 2010).
One of the main disadvantages of the randomized null model is that it neglects the distance dependency between nodes (i.e., regions). The following null model can be specified by considering distance dependency and the use of the attractiveness or importance of nodes instead of the sum of incoming or outgoing edges (Barthélemy 2011;Expert et al. 2011): where j ) denotes the importance (or attractiveness) of nodes. α, β are fitting parameters. Since i j p i,j = i j a i,j , γ = can be directly measured from the data by means of a binning procedure, where the prediction error should be minimized, similar to that used in Expert et al. (2011): Note that Eq. (4) is the generalized version of Eq. (3). Additionally, these null models are identical if α = β = f (d) = 1, γ = 1/L . AFG correction can also be used in distancedependent predictions; however, if f (d ij ) = ∞, ∀d i,j = 0 then all self-loops can be predicted by Eq. (4). It is important to note that Eq. (4) is already a hybrid null model. Since Eq. (3) predicts links solely by network characteristics, such as incoming and outgoing edges, excluding any other influence indicator, which determines the weights of edges between nodes, Eq. (4) already includes the distance dependency in the model. In addition, regression parameters also allow distinguishing the importance of incoming and outgoing edges, which has already different meanings.
Only one step remains to estimate the probability of connections with a gravity model, where f (d i,j ) = d δ . Following the notation of gravity models, instead of I, we denote m as the characteristics of nodes (i.e., regions) (Gadár et al. 2018), such as GDP per capita and population. The null model is generalized as follows: where N is the number of indicators belonging to the nodes. α, β, γ , δ are regression coefficients. Eq. (6) further called this the gravity-based economic null model. If d i,j = 0 regression parameters can be estimated via the logarithmic version of GEN (see Eq. (6)): In this study, ∀m i > 0 , however, because of the self-loops, d i,i = 0 . If there is no exact knowledge about the distances, there are two ways to handle self-loops. One way is to add 1 km to every distance. In this way, log(d i,i + 1) = 0 , and Eq. (7) can be solved. Nevertheless, Burger et al. (2009) showed that this correction can distort the estimation; therefore, they suggested solving Eq. (6) by Poisson regression instead of solving Eq. (7) directly. At the same time, the geocoded location of the company exists; therefore, in the aggregation, the average distance is used in NUTS 3 regional self-loops instead of using only one correction value. Note that this average distance can be calculated for every pair of regions; however, this correction had no significant effect and was therefore only used in self-loops.
Since Eq. (7) provides a linear regression model, and all assumptions of regression models, such as normality, homogeneity and independence (i.e., there is no multicollinearity), must be satisfied. To test for multicollinearity, we used the variance inflation factor (VIF).
To reduce the multicollinearity, the greatest VIF should be less than 2.5 ( max i VIF i < 2.5 ) Johnston et al. (2018).
The proposed gravity-based economic null model (GEN, see Eq. (6)) and its logarithmic version (see Eq. (7)) are purely economic models, and there is no network property involved. At the same time, we assume that GEN provides better link predictions than other null models (see A 1 , RQ 1 ). In addition, via better link prediction, an estimated network can also be predicted where the network properties, such as centralities and modularity values, can be calculated. Better link prediction also provides lower prediction error in derived properties (see A 2 , RQ 2 ). At the same time, it must not be forgotten that the GEN is purely an economic model, which thus models not only the formation of edges but also the formation of centralities and modules.
The goodness of fit of null models is determined by how well edges are estimated. Therefore, if there are variable parameters, the absolute differences between the real and predicted edge values must be minimized. Formally: This section introduces three kinds of null models. Newman and Girvan (2004)'s model considers only network properties during link prediction. While Expert et al. (2011)'s distant dependent null model already considers the spatial dependencies between nodes, it can be considered a hybrid model because spatial and network properties are involved simultaneously. Several other null models can be found in Barthélemy (2011), but to the best of our knowledge, the proposed GEN model is the first, which predicts links based on purely spatial, economic, technological, and corporate financial data but does not employ network-property data.
Note that in the case of GEN, the minimization problem is very similar to the gravity-based economic models. Since in the regression model the square estimation error, while in the case of optimizing null models the absolute difference between the original and the predicted links should be minimized. This similarity provides for the employment of gravity models for link prediction and via link prediction the prediction of the company ownership network.

Communities
One of the main applications of null models is to detect communities. Classical modularity optimization-based community detection methods utilize f(C) metrics based on the difference between the internal number of edges and their link prediction (Newman and Girvan 2004;Yang and Leskovec 2015).
In the case of the proposed directed network, this difference can be formulated as where p i,j represents the number of estimated ownership relationships from region i to region j and δ C i , C j is the Kronecker delta function, which is equal to one if the i-th and j-th regions are assigned.
The modularity of the partition C can be calculated as the sum of the modularities of the C c , c = 1, . . . , n c communities: The value of the modularity M c of a cluster C c can be positive, negative or zero. Should it be equal to zero, the community has as many links as the null model predicts. When the modularity is positive, the C c subgraph tends to be a community that exhibits a stronger degree of internal cohesion than the model predicts. When specifying modules, Eq. (12) must be maximized. When using randomized null models, the modules specify communities where connections are stronger between members within a community than between members of two distinct communities (Newman 2010). A number of links between nodes are dependent on the distances on a spatial network (Expert et al. 2011); therefore, modules give a set of nodes that are close together in geographical terms; however, if they give larger regional units, such as countries, then other formation forces can also be guessed. Therefore, it is a question to be answered whether modules provide larger regions (see RQ 3 ).
The distance-dependent modules already compensate for the effect of spatial distances between regions. Therefore, the modules can be treated as a module without regional distances. In other words, we can analyze what happens if there are no spatial distances between regions (see RQ 4 ).
In the case of gravity models, modules specify the area of investments (Gadár et al. 2018); we further called economic-investment communities (EICs). EICs specify a set of regions where the strength of investments (modeled by a number of ownerships) are denser than the economic, financial, and technological opportunities, as well as the geographical distances predict. If EICs also give back the administrative boundaries, it indicates that the administrative boundaries are the main formation force in investments (see RH 2 ), which should be considered at the European Union level.
This study proposes a generalization of gravity null models (GEN). This model also highlights which economic and technological indicators influence the formation of investment areas of regions (see RH 1 ).
Eq. (12) is typically solved via Louvain's algorithm (Blondel et al. 2008); however, to increase the stability of the results, the recent Leiden's algorithm is applied in this study (Traag et al. 2019).
Since a company ownership network (CON) can represent a static network of ownership, if it is important to analyze CON in time, the one way is to specify a multilayer network, where every layer represents a year. Yearly null models deal only with one layer at once; therefore, all predictions can be performed simultaneously. The other way is to use the dynamic network, where edges between nodes are specified within a time frame. However, this model is better in the case of continuous time intervals. While the multilayer network represents a set of yearly static networks, the existing null models can be extended in an easier way to a multilayer network. Both algorithms can be generalized to multilayer networks, where layers represent a time slice. Thus, the proposed Gravity null models can be used to predict the links in a multilayer network, and modules can specify the yearly EICs.
The whole network formation can be modeled via link prediction; in this way, several network properties, such as centralities, can be estimated. In addition, the formation of these coefficients can be explained, and their changes over time can be predicted.

Multilayer network as a discrete model of a spatial-temporal network
., m}} is a family of (directed or undirected, weighted graphs (called layers of M ), where V α is the set of vertices (set of nodes), E α ⊆ V α × V α is the set of edges (links, or arcs), and W α : V α × V α → R + 0 is the weight matrix of edges of graph G α in layer α and is the set of interconnections between nodes of different layers G α , G β ∈ M with α = β.
In this study, the set of interconnections is not specified; therefore, it is assumed that C = ∅ . Note that in the case of spatial and temporal networks, a layer can represent a time slice (i.e., a year), α = t . In addition, the regions are time invariant; therefore, V t = V , ∀t . Only the weights of edges may change over time. Thus, the connections between regions can be estimated separately (see Eq. 14) using a yearly gravity model: Shifts in regression parameters indicate changes in the role of geographical, economic and technological indicators. The analysis of embeddedness using the multilayer version of centralities indicates shifts in role-player regions.
Finally, analyzing the shifts in modules in time and space indicates the changes in EICs, while calculating modules in a multilayer structure provides time-invariant economic-investment communities.

Centralities
Centralities are traditionally used as descriptive network properties in network science to identify key nodes (roleplayer) in a network. However, if not only links but also links, the whole network can be predicted, and the centralities can be calculated for the predicted network. In other words, in this way, the centralities are predicted. This prediction offers scholars to analyze which indicators influence a region to become a roleplayer. For this analysis, centralities should be modeled as much as possible (see A 2 ).
Since a directed graph is employed to distinguish the mother-daughter relationships of companies, Only the directed versions and generalized versions of centralities are used, such as in-degree, out-degree, betweenness, in-closeness, out-closeness, authorities, host, and PageRank centralities.
Degree centrality is defined as the number of links incident upon a node (i.e., the number of ties that a node has). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely, in-degree and outdegree. In-degree is a count of the number of ties directed to the node, and out-degree is the number of ties that the node directs to others. The degree centrality of a vertex v is defined as: In a connected graph, the normalized closeness centrality (or closeness) of a node is the average length of the shortest path between the node and all other nodes in the graph. Thus, the more central a node is, the closer it is to all other nodes.
Closeness is defined by Bavelas (1950) as the reciprocal of farness, that is: where d(v, w) is the graph distance between vertices v and w. Distances from or to all other nodes are irrelevant in undirected graphs, whereas they can produce totally different results in directed graphs.
Betweenness centrality ( C B ) quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. Vertices that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.
PageRank satisfies the following equation: where is the number of neighbors of node j. α ∈ [0, 1] , where N is the number of nodes.
Hub centrality ( C H ) and authority centrality ( C A ) are calculated to obtain the ranking results. The hub value is the centrality of a node in its ability to make a relation with other nodes, while the authority value is the centrality value of a node based on the number of relations to the node. The Newman and Girvan (2004), Expert et al. (2011) and proposed GEN models predict links and networks; thus, the centralities can be calculated for both the original and predicted networks. The absolute error of the centralities can be calculated as follows: where C(v) is the centrality measure for vertex v, N is the number of nodes, C is the original, and C is the predicted centrality measure.
Do not forget that in the case of low ǫ C , the GEN-based prediction, which uses purely economic, corporate financial, and technological indicators in the prediction, models centralities indirectly. This model shows which kind of mixture of spatial-economicfinancial-technological indicators can increase the role of a region.

Descriptive statistics
During the analysis, we investigated the ownership network of European companies between 2010 and 2018. The Amadeus database 8 contains data from 23,381,325 companies. From these data, we identified 1,872,272 companies as mother companies or subsidiaries within the examined time period. After data cleaning, we obtained 1,620,340 different parent companies and subsidiaries. The investigated subsidiaries and parent companies are related to 1,435 NUTS 3 regions, which form the nodes of our temporal network. The 87,708 identified ownership relations between these companies over the studied time period are indicated by the edges of the network. Note that connections within the same NUTS 3 region resulted in self-loops.
The 2 table contains descriptive statistics of the main economic and technological data for the examined time period.
The profit and loss (P/L) statement shows the mean value in thousand € for years considering all the companies, not aggregated for NUTS 3 regions. P/L before taxes is the mean value of companies by year in thousand €. In the cash flow line, we can see the mean value of cash flows for all companies by year in thousand €. The number of employees shows the mean values of the number of the companies' employees by year.
Except for the last 3 years of patents, all values are increasing over time. The patent information comes from the PATSTAT database; we have access to the Spring 2021 release of PATSTAT. Because the database itself contains only information about applications that have already been published and because the standard publishing time is usually more than 18 months, the number of PATSTAT entries is much lower in 2017 and 2018.

Null models as link prediction
Null models predict links. At the same time, via link prediction, a predicted network is proposed. Figure 1 shows the fits of null models, where A contains the adjacency matrix of the original company ownership network (CON); P represents the adjacency matrix of predicted networks. The Newman and Girvan (2004)'a model assumes a random network (see Fig. 1a); however, this model cannot explain loops, and the probability of links between spatial (21) Kosztyán et al. Applied Network Science (2022)    Mean value of P/L before taxes  (2018) nodes (i.e., NUTS 3 regions) is dependent on distance (see Fig. 2). The Expert et al. (2011) formula already considers the nonlinear distance dependency between nodes (see Fig. 1b), and in this way, the group loops disappear. The (f(d)) distance dependency is compensated for (see Eq. 4) by the spline function (see Fig. 2); nevertheless, the best fit ( ǫ = 0.0080 ) is produced by the proposed GEN model. When applying Eq. (6), the indicators with VIF > 2.5 were removed from the model. The adjusted R 2 was decreased only slightly (compare Tables 3 and 6 in "Appendix"); thus, the assumptions of normality, homogeneity and independence were satisfied for the remaining model. Table 3 shows the results of the proposed GEN model in the years studied. In addition, it summarizes the coefficients and their significance, as well as the absolute error of the fit of the prediction and centralities between the original and predicted network by the gravity model. For example, β PI i,2018 = −0.0233 means that if GDP decreases 1% from the source (i) region, the number of owners is expected to increase by 0.0233% . Positive (negative) significant coefficients on the source side indicate that an increase in the components may increase (decrease) ownership relations. Similarly, positive (negative) significant coefficients on the host site of the NUTS 3 regions show that an increase in such components may increase (decrease) investments and the development of new corporate sites.
Table 3(a) shows that the applied model (see Eq. 6) is significant, and the adjusted R 2 is slightly greater than 0.4. Among the independent variables examined, the strongly significant coefficients of fixed assets (FA) are positive on the source side (i)  and negative on the host side (j). This result indicates that parent companies typically own higher FA than their subsidiaries.
The current ratio (CR) is a liquidity measure that represents the quotient of current assets and liabilities. The coefficients of this variable are high and significant on the source side, but they are smaller and positive only until 2015 on the host side. The coefficients of the solvency ratio (SR) show the opposite effect to that of CR. This finding suggests that parent companies are typically much more liquid and less solvent than their subsidiaries.
The financial metrics return on capital employed (ROCE), denoted RCB in Table 3(a), can be applied to gauge companies' operational efficiency. The significant coefficients of these variables have a negative sign regardless of side, but this negative effect is greater and much more significant for subsidiaries.
To examine the effect of the economic and technological development of the NUTS 3 regions on the formation of ownership relations, GDP per capita (GDP) and the annual number of patents (PI) are applied. The coefficients of these indicators are negative on both sides; however, they are smaller and less significant on the source side. This observation shows that, in contrast to parent companies, subsidiaries are typically related to NUTS 3 regions that have smaller GDP and fewer patent applications.
Furthermore, we applied the number of companies (CO) within the given NUTS 3 region to control the size of the regions. In the case of this indicator, we obtained coefficients corresponding to our preliminary assumptions since the coefficients of the number of companies are highly positive, regardless of the side. The coefficients of the distance between parent and subsidiary companies are negative and relatively constant over the examined time period.
Finally, we predicted the original network based on the applied model (see Eq. 6); then, we calculated the mean absolute deviation between centrality measures calculated from the original and the predicted network. As shown in Table 3(b), these deviations are relatively stable over the examined time period.

Predicting network properties
Since null models predict links between nodes, they can predict networks as well. Therefore, centralities can also be calculated for the predicted networks. A good fit of the centralities assumes good link predictions. However, the differences between the real and predicted parameters provide new insight into the structure of corporate networks. Table 4 shows the mean absolute error of centralities between the original and predicted networks.
Three kinds of networks can be predicted. Newman and Girvan (2004)'s method provides a random network, where the links are predicted via Eq. (3). Thus, no organizing force is assumed. Edges between two regions are estimated as a proportion of incoming and outgoing edges. The spatial network is specified by the Expert et al. (2011) method based on the model of Eq. (4), which compensates for the distance dependency between nodes. Although the prediction errors are lower ( ǫ NG = 0.0191, ǫ spa = 0.0112 ), the absolute differences between centralities are very similar. The relevant changes are provided by only the gravity model. Since the mean absolute error of link prediction is ǫ grab = 0.0080 lower, the degree centrality error is much lower. Furthermore, the prediction error of betweenness centrality and PageRank centrality is greater. Table 5 shows an example of the power of modeling centralities. Better fits occur in the case of in-/out-degree centralities. Table 5 shows the top 5 regions with high indegree centralities. In other words, these regions are the most attractive regions to establish a new subsidiary company. Table 5 shows that the gravity (GEN) model better predicts the ranks of the top 5 regions than the distance-dependent model. These counties are often capitals (e.g., Rome, Warsaw, Madrid) or larger cities (e.g., Milan, Barcelona, Turin). The distancedependent models, as expected based on the errors ( ǫ ), estimate the regions poorly. This indicates that in addition to geographical distances, economic, technological and financial indicators should be included in the null model to predict top role players. Figure 3 shows the in-degree centralities of NUTS 3 regions. Figure 3a shows the original network, and Fig. 3b-d show the predicted networks. All the predicted networks use the same color bars for a clear comparison of the results.
All network predictions indicate low in-degree centrality for Germany, Benelux countries (Belgium, Netherlands, and Luxembourg), and the UK. Furthermore, the original network contains fewer high in-degree centrality nodes than do the predicted networks. The in-degree centralities are overestimated, especially in the case of random networks and spatial network models, for both South and Central European countries. Thus, the amount of investment (characterized by the making of a corporate site) for Southern Europe and Central Europe is much less than that predicted by any of the models. The investment is much less than allowed by the economic opportunities, including the spatial, technological and economic distances between regions. Figure 4 shows the closeness centralities of NUTS 3 regions. Figure 4a shows the original network, and Fig. 3b-d show the predicted networks. In this case, the predicted networks use the same color bars to ensure an easy comparison of the results. Figure 4 shows that the best predictions are provided by the gravity model. Both the random network and the spatial network models overestimate the in-closeness centralities. Both the original and gravity models indicate important roles for several eastern German, southern England, and northern French regions. In addition, all models and the original network indicate low in-closeness centralities for the Serbian NUTS 3 regions. Figure 5 shows the in-closeness centralities by year of a multilayer network predicted by the proposed GEN model. Figure 5 shows the changes in the roles of the NUTS 3 regions by investments. The changes in the yearly in-closeness centralities predict an increased role of Germany and the UK for investments.

Economic communities
In the case of module seeking, the group of nodes that are more connected than the predicted model estimates are identified. In the case of Newman and Girvan (2004)'s model (see Fig. 6a), the modules represent the set of NUTS 3 regions that are more connected to each other within a module than between two different modules. The spatial (see Fig. 6b) and proposed GEN model (see Fig. 6c) already consider spatial and economic properties during the prediction. Therefore, a distance-dependent module indicates communities in which there are stronger connections between regions within a module than are predicted by the distant-dependent model. In the case of seeking modules based on GEN prediction models, the economic communities are specified, where the connections are stronger than would be justified by geographical distance or economic and technological factors. Figure 6 shows modules by different null models. Modules with lower values (reddish regions) contain more regions, while (blueish) modules with fewer regions have a higher number. Modules marked in black contain only 1-1 regions. Since the employed database had no data for Turkey, NUTS 3 regions within Turkey are shown in white. As Gadár et al. (2018) showed, if Newman and Girvan (2004)'s method is employed to search for modules for a spatial network, the modules return the borders of the higher region area (in this case NUTS 1 regions, i.e., countries).
The compensation of distance dependency (see Fig. 6b) changes the shapes of the modules: the locations of modules appear more random. Modules are more separated, and there are more smaller modules. This structure can also be treated as organizing modules without spatial distance. Gadár et al. (2018) showed that gravity modules can specify investment catchment areas that can extend beyond administrative boundaries. Moreover, the fact that the gravity model primarily returned administrative boundaries, the results indicate that the established parent-subsidiary companies still have administrative borders. Economic communities are formed mainly within countries. Furthermore, most regions in Great Britain and Germany and in France and northern Italy form 1-1 economic blocks.  GEN predictions 2010Kosztyán et al. Applied Network Science (2022 Figure 7 shows the economic modules in the multilayer network, where the layers specify years. Figure 7 shows that administrative boundaries, which make it difficult for economic communities to form, can be observed every year. The largest block with the most regions remains the core of the European economy, Germany, Britain, France and northern Italy.

Discussion
The establishment of subsidiaries itself can be considered a kind of investment since the parent company establishes a subsidiary company in another region or country. The company takes the required technology to the new site and creates new jobs. Thus, the study of such networks is important. However, very few databases fully cover both economic and ownership relationships with firms. The paper formulates four research aims (see A 1 -A 4 ), five research questions (see RQ 1 -RQ 5 ) and two research hypotheses (see RH 1 -RH 2 ).
The proposed yearly gravity model has shown that company establishments are driven by technological and economic inequalities (see Table 3 and RH 1 ): capital flows from economically and technologically more developed regions to less developed ones. Moreover, the integration with network models has shown that this investment is primarily domestic (see Figs. 6 and 7, and RH 2 ). Although the European Union continues to push for higher integration, administrative boundaries still have a significant impact on ownership network formation (compare subfigures in Fig. 6, and see RQ 3 -RQ 4 ) and have remained broadly unchanged over the period under review (Fig. 7, and RQ 5 ).
The proposed (yearly) GEN models best explain the formation (see Fig. 1, and A 1 , RQ 1 ) of the corporate ownership network (see centrality estimates in Table 4, and A 2 , RQ 2 ). GEN also predicts the most attractive regions for investments well; see Table 5. By combining the proposed GEN model with the module search procedures, so-called economic communities can be defined and explained. In this way, the combined models show that several core countries, such as France, Germany, Great Britain and the Benelux countries, play a key role in ownership formation (see Figs. 3, 4, 5 and 6). Of particular interest is the strength of Great Britain's European integration (see, e.g., Fig. 5) of the core countries to create an economic community (see Fig. 6, and A 3 -A 4 , RQ 3 -RQ 5 ).

Summary and conclusion
This study attempts to combine networks using descriptive and economic factors in explanatory models, providing an opportunity to exploit the strengths of approaches (see A 1 -A 2 ). This paper proposes a generalized yearly gravity-based economic null (GEN) model to predict the spatial network of corporate ownership. Gravity models provide good estimates of the entire network properties, such as centrality, compared to the Newman and Girvan (2004) and Expert et al. (2011) model (see A 1 -A 2 ). The proposed gravity-based modularity provides economic communities (see A 3 -A 4 ). In addition, the proposed yearly model offers an analysis of the changes in economic communities (see RQ 5 ). The GEN model provides an opportunity to find a better explanation for the formation of networks.

Limitations and future works
In this research, the authors focused on European organizations; however, the corporate, patent, and GDP data are available globally. Nevertheless, NUTS 3 regions can only be interpreted within Europe. This study has shown that Great Britain, especially England, is connected to the European Union by a thousand strands. However, it may be interesting to examine the network structure after Brexit. In the proposed GEN model, we are currently undertaking a yearly estimation of a multilayer network, namely, the ownership network. Furthermore, the gravity model can be extended to estimate the connections of several networks organized into a so-called multiplex network. Finally, an industry-level analysis of ownership structure may provide additional information on the formation of parent-subsidiary relationships. 9 m 1 Total assets (thousand €): Total assets = (fixed assets + current assets). m 2 Solvency ratio (asset based) (%): (shareholder funds/total assets) * 100. m 3 Shareholders' funds (thousand €): total equity (capital + other shareholders funds). m 4 ROE using P/L before tax (%): return on equity (ROE) is a measure of financial performance calculated by dividing net income by shareholders' equity. Because shareholders' equity is equal to a company's assets minus its debt, ROE is considered the return on net assets. (Profit before tax/shareholder funds) * 100. m 5 ROCE using P/L before tax (%): return on capital employed (ROCE) is a financial ratio that can be used to assess a company's profitability and capital efficiency. In other words, this ratio can help to understand how well a company is generating profits from its capital as it is put to use. (Profit before tax + interest paid)/(shareholders' funds + noncurrent liabilities) * 100. m 6 Profit margin (%): (profit before tax/operating revenue) * 100. m 7 P/L for period (thousand €): net income. m 8 P/L before tax (thousand €): operating profit + financial profit. m 9 Operating revenue (turnover) (thousand €): total operating revenues (net sales + other operating revenues + stock variations). The figures do not include VAT. Local differences may occur regarding excises taxes and similar obligatory payments for specific markets of tobacco and alcoholic beverage industries. m 10 Fixed Assets (thousand €): total amount (after depreciation) of noncurrent assets (intangible assets + tangible assets + other fixed assets). m 11 Number of employees (employee): the number of employees. m 12 Current ratio (-): The current ratio is a liquidity ratio that measures a company's ability to pay short-term obligations or those due within 1 year. Current assets/ current liabilities. m 13 Cash Flow (thousand €): The term cash flow refers to the net amount of cash and cash equivalents being transferred in and out of a company. Cash received represents inflows, while money spent represents outflows. (Profit for period + depreciation). m 14 Number of companies: The accumulated number of companies in the NUTS3 region. m 14 Number of companies: The accumulated number of companies in the NUTS3 region. m 15 GDP per capita in purchasing power priority (thousand €): One popular macroeconomic analysis metric to compare economic productivity and standards of living between countries is PPP. PPP is an economic theory that compares different countries' currencies through a "basket of goods" approach, not to be confused with the Paycheck Protection Program created by the CARES Act. m 16 Patents: Patents protect technical inventions in all fields of technology. They are valid in individual countries for a specified period. Patents give holders the right to prevent third parties from commercially exploiting their invention. In return, applicants must fully disclose their invention. Patent applications and granted patents are published, which makes them a prime source of technical information.