Towards a Better Understanding of the Characteristics of Fractal Networks

The fractal nature of complex networks has received a great deal of research interest in the last two decades. Similarly to geometric fractals, the fractality of networks can be defined with the so-called box-covering method. A network is called fractal if the minimum number of boxes needed to cover the entire network follows a power-law relation with the size of the boxes. Over the years, the fractality of networks has been associated with various network properties, for example, disassortativity, repulsion between hubs, long-range-repulsive correlation, and small edge betweenness centralities. However, these assertions are usually based on tailor-made network models and on a small number of real networks, hence their ubiquity is often disputed. Since fractal networks have been shown to have important properties, such as robustness against intentional attacks, there is a pressing need to uncover the underlying mechanisms causing fractality. Hence, the main goal of this work is to get a better understanding of the origins of fractality in complex networks. To this end, we systematically review the previous results on the relationship between various network characteristics and fractality. Moreover, we perform a comprehensive analysis of these relations on five network models and a large number of real-world networks originating from six domains. We clarify which characteristics are universally present in fractal networks and which features are just artifacts or coincidences.


Introduction
Network science has received a great deal of research interest in the past two decades since networks can efficiently model numerous real-world structures and phenomena, including the Internet, the WWW, cellular networks, and social networks [1]. The primary goal of network science is to better understand the structure, origin, and evolution of real networks. For example, if we aim to efficiently stop or prevent a pandemic, it is important to explore the biological structure of the virus [2], the mechanisms underlying the disease [3] and the social interactions of communities [4,5].
The breakthrough in network science dates back to around the turn of the millennium, when the rapid and large-scale development of computer science made it possible to store and efficiently analyze complex networks [6]. It was also observed in these years that certain properties are present in a large number of networks regardless of their origin. The most important of these features include the scale-free property [7] and the small-world property [8].
The fractality of networks is another well-studied characteristic. While the notion of fractal scaling was originally introduced in geometry, it has been extended to complex networks as well [9]. The book of Rosenberg [10] and the survey of Wen and Cheong [11] give an extensive overview of fractal networks. Fractal scaling was verified in various real-world networks [9,12,13] and has been associated with numerous important properties, such as robustness against intentional attacks [14] and accelerated flow [15]. Consequently, there is a pressing need to uncover the underlying mechanisms causing fractality. Several studies exploring the origins of fractality have been published throughout the years [14,16,17,18], but no clear consensus has emerged.
In this work, we investigate which network characteristics influence the emergence of fractality in complex networks. To this end, we review the most influential studies, and we also extend the methodological approaches in the literature. Furthermore, we propose a completely different technique to gain a better understanding of the origins of fractality, namely, we utilize the tools of machine learning. To make our findings as universal as possible, all of the aforementioned analyses rely on our large collection of real-world and model-generated networks.
The investigated characteristics that have been connected to the fractality of networks are the following:
1 Yook et al. [12] and Song et al. [14] argued that fractality originates from the disassortativity of the network and the repulsion (disconnectedness) of the hubs.
2 Fujiki et al. [16] and Rybski et al. [19] demonstrated, using different approaches, that there is a connection between long-range anti-correlation and fractality.
3 Wei et al. [18] demonstrated that the distribution of edge betweenness centrality (BC) influences fractality and that even a few edges with high BC can destroy the fractal structure of a network.
4 Csányi and Szendrői [20] were the first to draw attention to the opposing relationship between fractality and small-worldness; however, among many others, they also mention that the transition between fractal and small-world is smooth and that these two properties can also be present simultaneously.
5 Finally, Kitsak et al. [17] argued that in fractal networks there is a weaker correlation between the degree and the betweenness centrality of the nodes than in non-fractal networks.
In the Foundations and preliminaries section, we first lay the foundation of our analyses by showing how fractality can be determined in networks, presenting different fractal network models, and describing our dataset, which forms the basis of the analyses. In the Analysis of network characteristics section, we examine one by one the aforementioned characteristics that have been associated with fractality, and in the A machine learning approach section, we use machine learning algorithms to study how the combination of the network characteristics influences fractality. Finally, in the Summary section, we summarize our findings and propose further research questions.

Foundations and preliminaries
In this section, we introduce the concept of fractal networks, lay the foundation of our analyses, including the determination of fractality and the description of the mathematical network models we use, and finally describe our collection of network data in detail.

Fractality of networks
Similarly to the case of geometric fractals, the fractality of networks can also be defined by the so-called box-covering method, using the length of the shortest path between two nodes as the distance metric. The method can be summarized as follows [9]: The nodes of the network are partitioned into boxes of size l_B in such a way that the distance between any two nodes of a box is less than l_B. The minimum number of boxes needed to cover the entire network with boxes of size l_B is denoted by N_B(l_B). A network is defined to be fractal if N_B(l_B) follows a power law in l_B, i.e.:
N_B(l_B) ∝ l_B^{-d_B}.
The exponent d_B is called the box dimension or fractal dimension of the network.
Box covering has been proved to be NP-hard [21]; therefore, no efficient algorithm is known that finds the exact solution, i.e. the optimal number of boxes N_B(l_B). However, numerous approximating methods have been proposed; for a collection and comparative analysis of box-covering algorithms, we refer to [22]. Here, we present only one of the most widely used methods, the Compact Box Burning (CBB) algorithm, which we use later for the boxing of our networks. The method works as follows [21]:
1 Let C be the set of uncovered nodes.
2 Randomly choose a node c ∈ C and remove it from C.
3 Remove from C every node that is at distance at least l_B from c.
4 Repeat steps 2 and 3 until C becomes empty. At this point, the chosen nodes c form a compact box, i.e. no other node could be added to this box.
5 Repeat steps 1-4 until the whole network is covered.
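The five steps of the CBB algorithm can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used for the analyses: it assumes that the network is given as a dictionary mapping every node to the set of its neighbors, that node labels are sortable (e.g. integers), and it recomputes a breadth-first search for every chosen node, which is simple but slow. The function names bfs_distances and compact_box_burning are our own.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def compact_box_burning(adj, l_b, seed=None):
    """Approximate box covering; returns a list of boxes (sets of nodes)."""
    rng = random.Random(seed)
    uncovered = set(adj)
    boxes = []
    while uncovered:                          # step 5: until all is covered
        candidates = set(uncovered)           # step 1: C = uncovered nodes
        box = set()
        while candidates:                     # step 4: until C is empty
            # step 2: random c from C (sorting keeps seeded runs reproducible)
            c = rng.choice(sorted(candidates))
            candidates.discard(c)
            box.add(c)
            dist = bfs_distances(adj, c)
            # step 3: drop candidates at distance >= l_b from c
            candidates = {v for v in candidates
                          if dist.get(v, float("inf")) < l_b}
        boxes.append(box)
        uncovered -= box
    return boxes
```

Since the node c is chosen at random, repeated runs can return different numbers of boxes, so in practice the result is usually aggregated over several runs.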

Determination of fractality
The identification of the fractal nature of networks is of great importance; however, it is a very challenging task, since most of the solutions rely on visual evaluations. To avoid the uncertainty of these techniques, we apply a more automated method to determine the presence of fractality in networks.
In theory, the fractality of a network can be determined by testing whether the minimal number of boxes N_B(l_B), determined by the box-covering method, scales as a power of the box size. A statistical framework for the detection of power-law behavior in empirical data was developed by Clauset et al. [23]; however, this framework is rarely used to quantify the fractality of networks due to the special nature of the problem. First of all, due to the NP-hard nature of box covering, the use of approximation algorithms is necessary, which makes the results less suitable for statistical analysis. Furthermore, for smaller networks or for those with small average distances, the number of points resulting from box covering is not large enough to obtain reliable information by these statistical tests. Moreover, the presence of different properties in networks is usually not pure, especially in real networks. It is a common phenomenon that fractal scaling holds only in a range (l_{B,MIN}, l_{B,MAX}) of l_B [23,24]. Often the power-law relation prevails for small l_B values, while an exponential relation holds for large l_B values. Consequently, one has to choose a range of l_B values in which to run the statistical tests, which is itself a challenging task and also reduces the sample size.
Due to the aforementioned difficulties, in practice, the most common technique for detecting the fractal nature of a network is to plot the (l_B, N_B(l_B)) data points on a log-log plot, fit a straight line, and decide about the goodness-of-fit by the mean squared error, the coefficient of determination, or by simply looking at the plots [16,25]. Obviously, these methods and the conclusions drawn from their results are highly influenced by personal decisions, as is also pointed out by Kovács et al. [22]. Furthermore, considering a large number of networks, the visual evaluation of plots becomes impracticable. Therefore, we will use a more automated way to decide about the fractality of networks.
We use a method introduced by Takemoto [26], which takes advantage of the observation that while in fractal networks the number of boxes N_B(l_B) decays as a power law of l_B, in non-fractal networks it decays exponentially [24]. Here, we apply a modified version presented by Akiba et al. [27]. Namely, we fit both a power-law and an exponential curve to the normalized (l_B, N_B(l_B)/N) points, where N is the number of nodes in the network. The fitting is done by excluding the point corresponding to l_B = 1 because it is usually an outlier. The fractality can be measured by the ratio of the root-mean-square errors (RMSE) of the two fits:
R = RMSE_{power law} / RMSE_{exponential}.
Normalizing the data points and using the RMSE allows us to compare the goodness-of-fit across different networks.
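The common straight-line fit on the log-log scale can be illustrated with a short sketch. It is only an example under our own assumptions: the function names fit_line and box_dimension are hypothetical, ordinary least squares is used, and the coefficient of determination is computed on the logarithmic scale. The slope of the fitted line is -d_B, so the estimate of the box dimension is the negated slope.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def box_dimension(box_sizes, box_counts):
    """Estimate d_B and R^2 from a straight-line fit on the log-log scale."""
    log_l = [math.log(l) for l in box_sizes]
    log_n = [math.log(n) for n in box_counts]
    slope, intercept = fit_line(log_l, log_n)
    mean_y = sum(log_n) / len(log_n)
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(log_l, log_n))
    ss_tot = sum((y - mean_y) ** 2 for y in log_n)
    # log N_B = -d_B * log l_B + const, hence d_B is the negated slope
    return -slope, 1.0 - ss_res / ss_tot
```

A high coefficient of determination alone does not prove fractality, because an exponential decay can also look nearly straight over a short range of box sizes.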
One might say that if R < 1, then the network is fractal, since in this case the power-law curve fits better than the exponential one, and otherwise it is non-fractal. However, as mentioned earlier, a network is not necessarily purely fractal, but it can still possess the fractal property for a given range. This metric also allows us to measure fractality on a continuous scale: the closer the R ratio is to 0, the more fractal the network is. However, in order to compare the characteristics of fractal and non-fractal networks, we still need to create a cut-off point. We observed in both real and model-generated networks that R = 0.65 is a reasonable choice. Since the boundary is fuzzy, one could make a stricter partition, but for our analyses, it would not make a significant difference. Here, we say that the investigated networks with R < 0.65 are rather fractal than non-fractal and vice versa. Figure 1 shows a few illustrative examples.
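The computation of the ratio R can be sketched as follows. The fitting procedure is not fixed by the description above, so this sketch makes its own assumption: both curves are fitted by linear least squares on the logarithm of the normalized box counts (log-log for the power law, semi-log for the exponential), while the RMSE is evaluated on the original normalized scale. The function names are hypothetical.

```python
import math


def _fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def _rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))


def fractality_ratio(box_sizes, box_counts, n_nodes):
    """R = RMSE of the power-law fit / RMSE of the exponential fit."""
    # normalize by the number of nodes and drop the point at l_B = 1
    points = [(l, nb / n_nodes)
              for l, nb in zip(box_sizes, box_counts) if l > 1]
    ls = [l for l, _ in points]
    ys = [y for _, y in points]
    log_y = [math.log(y) for y in ys]
    # power law: log y is linear in log l (assumed fitting procedure)
    slope, icpt = _fit_line([math.log(l) for l in ls], log_y)
    power = [math.exp(icpt) * l ** slope for l in ls]
    # exponential: log y is linear in l (assumed fitting procedure)
    slope, icpt = _fit_line(ls, log_y)
    expo = [math.exp(icpt + slope * l) for l in ls]
    rmse_exp = _rmse(ys, expo)
    return _rmse(ys, power) / rmse_exp if rmse_exp > 0 else math.inf
```

With the cut-off used here, a network would be labeled as rather fractal when the returned value is below 0.65.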
It is also important to note that the described method cannot be used for networks with a small diameter (e.g. smaller than 6). However, for these networks, the fractal nature can hardly be interpreted anyway. Furthermore, this method may also not give appropriate results for some mathematical network models where the fractal scaling only asymptotically holds. For this reason, we use this method for the identification of fractality only for real networks, while we stick to the theoretical findings in the case of model-generated networks.

Network models
Mathematical models play a crucial role in understanding the properties of networks. Numerous models have been introduced throughout the years to capture fractal scaling in networks and to better understand the relation between fractality and other network characteristics. In this section, we describe five such network models, with a special emphasis on the connection of their parameters with fractality.

Song-Havlin-Makse model
One of the most well-known fractal network models is the Song-Havlin-Makse model (SHM) [14]. The network grows dynamically, and the degree correlation (hub repulsion/attraction) of the emerging graph is driven by a predefined parameter p. The model is defined as follows:
1 The initial graph is a simple structure, e.g. two nodes connected via a link.
2 In iteration step t + 1 we connect m offspring to both endpoints of every edge, i.e. a node x gains m · deg_t(x) offspring, where m is a predefined parameter and deg_t(x) is the degree of node x at the end of step t.
3 In iteration step t + 1 every edge (x, y) is removed independently with probability p, where p is a predefined parameter. When an edge is removed, it is replaced by a new edge between the offspring of x and y.
Figure 2 illustrates two realizations that can be generated with the described model. The fractality is influenced by the choice of parameter p, namely, it can be shown that the generated network is fractal for p = 1, and non-fractal for p = 0 [14,28]. The intermediate values develop mixtures between the two properties. Our observation is that networks with p > 0.6 can be considered fractal, while those with p < 0.4 are clearly non-fractal, which is illustrated in Figure 3. It can also be seen that the transition from fractal to non-fractal is smooth, hence in the 0.4 ≤ p ≤ 0.6 range, it is questionable to assign the networks to either of the two categories. Later, in the A machine learning approach section, where a binary classification is carried out, we nevertheless use a cut-off point at p = 0.5, because we do not want to exclude the intermediate networks. The SHM model was introduced to show that fractal networks exhibit strong repulsive relations between their hubs; this conjecture is reviewed later in the Disassortativity and hub repulsion section.
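The growth rule of the SHM model can be illustrated with a compact generator. This is a sketch under stated assumptions rather than the code used for the analyses: the initial graph is a single edge, the function name shm_network is our own, and when an edge (x, y) is replaced, the new edge joins one randomly chosen offspring of x with one randomly chosen offspring of y among those created along that edge.

```python
import random


def shm_network(n_iter, m, p, seed=None):
    """Return (number_of_nodes, edge_list) after n_iter growth steps."""
    rng = random.Random(seed)
    edges = [(0, 1)]              # step 1: two nodes joined by one link
    n_nodes = 2
    for _ in range(n_iter):
        new_edges = []
        for x, y in edges:
            # step 2: m new offspring for each endpoint of the edge
            off_x = list(range(n_nodes, n_nodes + m))
            off_y = list(range(n_nodes + m, n_nodes + 2 * m))
            n_nodes += 2 * m
            new_edges.extend((x, c) for c in off_x)
            new_edges.extend((y, c) for c in off_y)
            # step 3: with probability p, swap (x, y) for an offspring link
            # (assumption: offspring are taken from this edge's own offspring)
            if rng.random() < p:
                new_edges.append((rng.choice(off_x), rng.choice(off_y)))
            else:
                new_edges.append((x, y))
        edges = new_edges
    return n_nodes, edges
```

For example, shm_network(4, 2, 1.0) corresponds to the purely fractal case p = 1 and shm_network(4, 2, 0.0) to the non-fractal case p = 0; in both cases every old edge contributes 2m new nodes and 2m new edges per iteration.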

Hub attraction dynamical growth model
Kuang et al. modified the Song-Havlin-Makse model (SHM) in such a way that the new mechanism can generate fractal networks with strong hub attraction [29]. The hub attraction dynamical growth (HADG) model is based on the previously described SHM model, with two modifications. First, the rewiring probability of the model is flexible; more precisely, it depends on the degrees of the endpoints of the links. Second, the model uses what the authors call the within-box link-growth method, which means that after an edge is rewired, the model adds additional edges between the newly added offspring in order to increase the clustering coefficient of the network. The evolution of the HADG model is defined as follows [29]:
1 The initial condition and the growth of the model are the same as in steps 1 and 2 of the Song-Havlin-Makse model.
2 We rewire the edge (x, y) at time t + 1 with probability
p_{xy} = a, if deg_t(x)/deg_t^max > T and deg_t(y)/deg_t^max > T; p_{xy} = b, otherwise,
where deg_t(x) is the degree of node x and deg_t^max is the maximum degree in the network at time t, and a, b, T ∈ [0, 1] are predefined parameters. Thus, if we define a < b, then hubs will have a higher probability to be connected than non-hubs.
3 At step t + 1, for each old node y, we add deg_t(y) edges between the newly generated offspring of y.
It should be mentioned that in the original paper, Kuang et al. used the notations a and b for the probabilities that an edge is not rewired [29]; consequently, as a slight abuse of notation, the probabilities we use here are equivalent to 1 − a and 1 − b with regard to the original article. Figure 4 illustrates two networks that can be generated with the model using different parameter settings. Kuang et al. conclude that there are fractal networks with assortative behavior, i.e. networks in which the most connected nodes can be connected to each other, since this model can generate such graphs with appropriate parameter settings [29].
We can support the observation of the authors, namely, we found that choosing b > 0.1 (with our notation) results in fractal networks, and with b ≤ 0.1 we can generate non-fractal networks, independently of parameter a given that a < b. This is well-illustrated in Figure 5.

(u, v)-flower
The family of (u, v)-flowers was introduced by Rozenfeld, Havlin, and Ben-Avraham [30]. Similarly to most of the previous models, this model also generates networks through iterations, but the edge replacement procedure is quite different. The model is defined as follows:
1 The initial graph is a cycle consisting of w = u + v nodes and edges, where u and v are predefined parameters, and we can assume that u ≤ v.
2 In iteration step t + 1 every edge (x, y) is replaced by two paths connecting x and y, one with length u and one with length v.
Two networks generated with different parameter settings are shown in Figure 6. Rozenfeld et al. showed that the model generates fractal networks when u > 1, and non-fractal ones when u = 1 [30]. This statement is illustrated in Figure 7. Furthermore, it was also shown in [30] that in the u = 1 case the resulting networks are small-world, which supports the idea that fractal and small-world are conflicting properties. This statement is investigated in the Small-world property section.
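The edge replacement rule of the (u, v)-flower is deterministic, so it can be sketched directly. The function name uv_flower and the integer node labels are our own choices, and the sketch assumes u + v ≥ 3 so that the initial cycle is a simple graph.

```python
def uv_flower(u, v, n_iter):
    """Return (number_of_nodes, edge_list) of a (u, v)-flower."""
    w = u + v
    # step 1: a cycle with w nodes and w edges
    edges = [(i, (i + 1) % w) for i in range(w)]
    n_nodes = w

    def path(x, y, length):
        """Edges of a fresh path with the given length between x and y."""
        nonlocal n_nodes
        inner = list(range(n_nodes, n_nodes + length - 1))
        n_nodes += length - 1
        chain = [x] + inner + [y]
        return list(zip(chain, chain[1:]))

    for _ in range(n_iter):
        # step 2: replace every edge by two paths of length u and v
        new_edges = []
        for x, y in edges:
            new_edges.extend(path(x, y, u))
            new_edges.extend(path(x, y, v))
        edges = new_edges
    return n_nodes, edges
```

Every iteration multiplies the number of edges by w and adds w − 2 new nodes per old edge. For u = 1, the path of length one is the old edge itself, so the old edges are kept, which is what produces the short distances of the non-fractal case.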

Repulsion based fractal model
In [31] we introduced the repulsion-based fractal (RBF) model, which is also based on the SHM model [14] and adapts some concepts of the HADG model as well [29].
Figure 7 Illustration of the fractality of the (u, v)-flower for different parameter settings. In all of the cases, the iteration number is set to four, and w = 7 is also fixed.
The model evolves through time and rewires edges with a probability based on the degrees of the endpoints to create repulsion among nodes. The within-box link-growth method of Kuang et al. [29] is also adapted by the model to increase the clustering coefficient, hence creating more realistic networks. The growing mechanism of the repulsion-based fractal model is as follows:
1 The initial condition and the growth of the model are the same as in steps 1 and 2 of the Song-Havlin-Makse model.
2 In iteration step t + 1 we remove every edge (x, y) with a probability p_{xy}, which depends on the parameter Y and on the degrees of the endpoints relative to the maximum degree [31]; here Y ∈ [0, 1] is a predefined parameter, deg_t(x) is the degree of node x, and deg_{t,max} is the maximum degree at step t. When an edge is removed, it is replaced with a new edge chosen uniformly at random between the offspring of its endpoints.
3 We add deg_t(y) edges among the newly generated offspring of every old node y. In order to avoid creating self-loops, this step is only executed when m > 1.
Parameter Y influences which group of nodes should repel each other (within the group). Figure 8 illustrates the two extreme cases of the model. This model generates fractal networks for all Y ∈ [0, 1], as can be seen in Figure 9, and hence suggests that the property that gives rise to fractality is repulsion, but the repulsion does not necessarily have to be among hubs [31].

Lattice small-world transition model
The lattice small-world transition model (LSwTM) was also introduced in [31]. It utilizes the fractal nature of grid-like structures and also adapts the preferential attachment mechanism to work against fractal scaling. The model is defined as follows:
1 We start with a d-dimensional (practically d = 2) grid graph with n_1 × n_2 × · · · × n_d vertices.
2 With probability p, every edge (x, y) is replaced by (x, z), where z is chosen with a probability that is proportional to p_z, a quantity determined by deg(z), deg_max and a [31]; here a is a positive constant, deg(z) is the degree of node z and deg_max is the maximum degree of the current graph. By default, y is replaced with z during the rewiring process; however, if the graph would become disconnected in this way, x is replaced instead.
Figure 10 Illustration of the LSwTM for (a) p = 0, (b) p = 0.1, using a 10 × 10 lattice as the initial graph. Subfigure (a) is from [31].
Even a small probability of rewiring results in a network that differs greatly from the initial grid graph, as illustrated in Figure 10. The fractality of the generated network depends on the choice of p. For p = 0 the network is purely fractal, and as p grows the model shows a transition from fractal to non-fractal networks [31]. As Figure 11 demonstrates, it is reasonable to choose p = 0.01 as a cut-off point. It was also shown in [31] that this model demonstrates a fractal-small-world transition.

Data
To gain a complete understanding of the relationship between fractality and other network properties, it is essential to consider a diverse and large-scale collection of real-world and model-generated networks as the basis of our analysis. Although mathematical models give insight into the evolution of networks and some distinguished network properties, they usually cannot capture every characteristic of real networks. To be as comprehensive as possible, we generated networks with the models introduced in the Network models section using various parameter settings. In addition, we collected a large number of real networks originating from six different domains. The analyses were performed in Python, and all network-related calculations, including the network generation process, were done using the NetworkX package [32].

Model-generated networks
We selected the parameters of the different models to get a representative sample of the space spanned by the network models while keeping the number of networks reasonably low for computational purposes. For this reason, we limited ourselves to networks with at most around 10,000 nodes. Our choices of the parameter values are summarized in Tables 1 and 2.
Table 1 Parameter settings for the Song-Havlin-Makse, Hub attraction dynamical growth, and Repulsion-based fractal models, with which the analyzed networks were generated. The parameter n denotes the number of iterations; for the meaning of the other parameters, see the Network models section. The resulting number of networks: 138 (SHM), 161 (HADGM), 69 (RBFM).
Table 2 Parameter settings for the (u, v)-flower and the Lattice small-world transition model, with which the analyzed networks were generated. The parameter n denotes the number of iterations; for the meaning of the other parameters, see the Network models section. In the case of the LSwTM, the values of the parameters n_1, n_2 are chosen in such a way that |n_1 − n_2| is minimal. The resulting number of networks: 118 ((u, v)-flower), 154 (LSwTM).
For those analyses, where the evaluation is done on a network-by-network basis by observing plots, we restricted ourselves to a smaller number of networks. We created three size categories of networks with approximately 800, 2000, and 5000 nodes. For every model, three to seven networks per size category were chosen including both fractal and non-fractal networks (except for the RBFM, where only fractal networks can be generated).

Real networks
Real networks were collected from various online repositories [33,34,35,36,37,38,39,40,41,42,43,44,45]. Table 3 gives a short description of the different domains from which we collected the networks together with the number of networks. In total, we work with 275 real-world networks. Some of their main features are listed in Table 4, aggregated by domains. For those analyses that require visual evaluation, we selected four to six networks from every domain, making sure to include both fractal and non-fractal networks from all size categories present in the domain.
Facebook, Twitter and collaboration networks 39
Table 3 Description of the network domains and the number of collected networks.
We decided on the fractality of the networks as described in the Determination of fractality section. In order to reduce the effect of the randomness of the box-covering algorithm, we repeated the procedure 15 times and averaged the outcomes. The resulting class distribution of model-generated networks, real networks, and all combined networks is shown in Figure 12. It can be seen that there are many more fractal networks among both the model-generated and the real networks, but the number of non-fractal networks is also significant.
Table 4 Some of the main features of the collected real networks. The average, minimum, and maximum values of the number of nodes, the number of edges, and the diameter of the networks by domains.

Analysis of network characteristics
In this section, we intend to give a comprehensive analysis of the relation of fractality with other network properties by revisiting some assertions from the literature.

Disassortativity and hub repulsion
The first network properties that were associated with the origin of fractality are disassortativity and repulsion between large degree nodes, i.e., hubs [12,14]. It has been much disputed whether the fractal nature of networks originates from these characteristics: there are papers that support this assertion [46], but there are more works that refute it [16,18,29,47]. The concepts of disassortativity and hub repulsion are often used interchangeably, although the latter can be considered only as the practical interpretation of the former. For this reason, we separate the two notions. First, we measure the assortativity of a network by the classic assortativity coefficient. Second, we define a novel hub connectivity score (HCS) as the number of edges among hubs divided by the number of hubs; since every such edge joins two hubs, the HCS equals half of the average number of hub neighbors of a hub. Formally: HCS = E_hub / N_hub, where N_hub denotes the number of hubs in the network and E_hub is the number of edges among these hubs. In this way, HCS is large for those networks in which hubs tend to connect to each other (strong attraction), and small when there are only a few or no edges among them (strong repulsion).
Here, we define hubs as nodes whose degrees are at least two times the average degree of the network. If there is no such node, we set the hub connectivity score of the network to −1. Furthermore, both the assortativity coefficient and the hub connectivity score are averaged over 15 realizations of the network models for each parameter setting.
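The hub connectivity score follows directly from its definition. The sketch below assumes a non-empty simple undirected network stored as a dictionary that maps every node to the set of its neighbors; the function name hub_connectivity_score is our own.

```python
def hub_connectivity_score(adj):
    """HCS = E_hub / N_hub, or -1 if the network has no hubs."""
    n = len(adj)
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    # hubs: nodes whose degree is at least twice the average degree
    hubs = {v for v, nbrs in adj.items() if len(nbrs) >= 2 * avg_degree}
    if not hubs:
        return -1
    # every hub-hub edge is seen from both endpoints, hence the halving
    e_hub = sum(1 for v in hubs for nb in adj[v] if nb in hubs) // 2
    return e_hub / len(hubs)
```

For the assortativity coefficient, NetworkX provides the function degree_assortativity_coefficient, which can be applied to the same network once it is stored as a NetworkX graph.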

Results for disassortativity
For mathematical network models, we study how the assortativity coefficient depends on the parameter of the model that influences the fractality of the network. Except for the LSwT model, all models generate disassortative fractal networks. However, the (u, v)-flower is the only model where the fractal networks are disassortative and the non-fractal networks are assortative, as Figure 13 (b) shows.
While the fractal networks that the Song-Havlin-Makse and the Hub attraction dynamical growth models generate are disassortative, the non-fractal networks generated by these two models are also disassortative. Although in the case of the SHM model, the figures in the supplementary material [48] suggest that for a specific parameter setting (m = 2) the fractal networks are more disassortative than the non-fractal ones, for m = 1 and m > 2, fractal networks typically have a higher assortativity coefficient than the non-fractal ones. Hence, in general, based on the (dis)assortativity of the network generated by the SHM or the HADG models, no conclusions can be drawn about whether the network is fractal or not.
Similarly, the RBF model also generates disassortative fractal networks, but since the RBF model can only generate fractal networks, based on this model, no conclusions can be drawn about the assortativity of the non-fractal networks.
Figure 13 Assortativity as a function of the parameter that influences fractality for (a) the Lattice small-world transition model and (b) the (u, v)-flower. The grey dotted reference line shows Assortativity = 0. The results for the other models can be found in the supplementary material [48].
The LSwTM serves as a counterexample to the aforementioned assertion because it not only generates fractal networks with assortative mixing, but we can also observe a positive correlation between fractality and assortativity, i.e., the "more fractal" the model is, the higher the assortativity is (see Figure 13 (a)). In this sense, the LSwT model behaves in the opposite way to the (u, v)-flower. In the case of real-world networks, we can say that fractal networks are often disassortative, but there are numerous examples of assortative cases too, which is well illustrated in Figure 14. Moreover, if we consider not only the binary fractal/non-fractal categories but the continuous R coefficient of the networks (see the Determination of fractality section), we cannot recognize any remarkable pattern in the assortativity. An illustration of this result can be found in the supplementary material [48].
Overall, our findings partially support the conclusion of Kuang et al., namely, that fractality is independent of the assortative mixing [29], because there are numerous counterexamples on both sides for the conjecture that fractality originates from disassortativity. However, disassortativity is still common amongst fractal networks.

Results for hub repulsion
Regarding hub repulsion, we can observe that most models support the conjecture that this property may lie behind fractality. For instance, for the (u, v)-flower, in the u = 1 (i.e., non-fractal) case, the HCS scores are much higher than in the fractal cases. Disregarding small networks (i.e. those with fewer than 100 nodes) due to the lack of hubs, we can say that fractal and non-fractal networks can be clearly separated according to their HCS. Fractal (u, v)-flowers have HCS close to 0, while for non-fractal (u, v)-flowers, this measure is at least 1, which can be seen in Figure 15.
Besides the (u, v)-flower, the Hub attraction dynamical growth model also seems to support the conjecture. Figure 16 shows how the hub connectivity score characterizes the different cases of the HADG model. For the non-fractal networks (i.e. when b ≤ 0.1) the HCS is higher than for the fractal networks. Furthermore, we can claim that the extent of the hub connection (or repulsion) depends on the parameter b, which influences the fractality of the network, and not on the parameter a, which creates the repulsion. Thus, as Kuang et al. [29] showed, the hubs in fractal networks can be directly connected, but our results show that the hub connectivity score is still capable of distinguishing the fractal and non-fractal networks generated by the HADG model. Similar behavior is demonstrated by the other models as well, that is, fractal networks show stronger hub repulsion than non-fractal ones. However, it must be mentioned that HCS usually stays between 0 and 1 for these models, and the difference concerning fractality can only be observed for each model separately. For example, the Repulsion-based fractal model is able to create fractal networks with HCS being close to or even above 1, while in the case of the Song-Havlin-Makse model, only the non-fractal networks possess such high scores.
Illustrations of the results for the SHM, RBF, and LSwT models can be found in the supplementary material [48]. Similarly, as Figure 15 (b) shows, no clear conclusion can be drawn on the conjecture based on real networks. There are examples of fractal networks with large HCS (i.e., strong hub attraction); however, networks with high HCS are typically non-fractal, although there are also many examples of non-fractal networks with lower scores. Similar observations can be made if we consider the R values of the networks (see [48]). It is important to note that the scores are generally higher for real networks than for those that are generated by models, regardless of fractality.
In conclusion, we can say that, similarly to disassortativity, strong hub repulsion is also common amongst fractal networks, but this property still cannot perfectly distinguish fractal and non-fractal networks, hence it cannot be considered as the reason behind fractality.

Long-range correlation
Besides direct degree correlation, the long-range correlation has also been associated with fractal scaling [16,19]. Both studies suggest, based on different approaches, that there is a connection between long-range anti-correlation and fractality. Here, we apply both of the methods [16,19] in addition to a more immediate extension of neighbor-level degree correlation measures, introduced in [49].
In [19] a fluctuation analysis approach was proposed to measure long-range correlations. The steps of this method can be summarized as follows:
1 Consider all shortest paths in the network of length $d$. For all of these paths, calculate the average degree of the nodes on the path.
2 Calculate $F(d)$, which is the standard deviation of the mean degrees calculated in step 1.
An extension of the concept of hub repulsion to long-range scales was proposed in [16]. The authors examined how the distribution of hub distances looks in fractal and non-fractal networks. The procedure can be summarized by the following steps:
1 Calculate the distance of all pairs of hubs.
2 For every distance $l$, calculate $\tilde{P}(l)$, which is the number of hub pairs separated by a shortest path of length $l$.
3 Calculate $P(l)$ by dividing $\tilde{P}(l)$ by the number of possible edges among hubs, i.e., $P(l) = \tilde{P}(l) / \binom{N_{\mathrm{hub}}}{2}$. In this way, $P(l)$ is the probability that a randomly selected pair of hubs is at distance $l$ from each other.
In order to be consistent with the results of [16], for this analysis, we cut off the hubs at the 98th percentile of the degree distribution.
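The two procedures above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the study: it assumes the networkx and numpy packages, an undirected graph, that the path average in step 1 includes both endpoints, that the population standard deviation is used, and that hubs are the nodes whose degree is at or above the chosen percentile. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import comb
import statistics

import networkx as nx
import numpy as np


def fluctuation_function(G):
    """Return {d: F(d)} for every shortest-path length d occurring in G."""
    deg = dict(G.degree())
    mean_degrees = defaultdict(list)
    for s, t in combinations(G.nodes(), 2):
        if not nx.has_path(G, s, t):
            continue  # s and t lie in different components
        # Step 1: mean degree along every shortest path between s and t
        for path in nx.all_shortest_paths(G, s, t):
            d = len(path) - 1
            mean_degrees[d].append(sum(deg[v] for v in path) / len(path))
    # Step 2: standard deviation of those means, separately for each d
    return {d: statistics.pstdev(vals) for d, vals in sorted(mean_degrees.items())}


def hub_distance_distribution(G, percentile=98):
    """Return {l: P(l)}, the fraction of hub pairs at distance l."""
    deg = dict(G.degree())
    # Assumption: a hub is a node with degree at or above the percentile cut-off
    threshold = np.percentile(list(deg.values()), percentile)
    hubs = [v for v, k in deg.items() if k >= threshold]
    n_pairs = comb(len(hubs), 2)  # number of possible edges among hubs
    counts = Counter()
    for a, b in combinations(hubs, 2):
        if nx.has_path(G, a, b):
            counts[nx.shortest_path_length(G, a, b)] += 1
    return {l: c / n_pairs for l, c in sorted(counts.items())}
```

Enumerating all shortest paths is expensive, so this sketch is only practical for small graphs; for larger networks some form of sampling would be needed.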
The third approach to capture long-range correlations was introduced in [49] and has not been used before to study the relationship between fractality and long-range correlation. This method extends the notion of neighbor connectivity to nodes at a distance larger than one. The main idea of the method can be summarized as follows.
1 Fix $m$, and for every node $x$, take the average degree of the nodes that are at distance $m$ from $x$.
2 Calculate $k_m(k)$ by taking the average of the outputs of step 1 over nodes with degree $k$.
3 Examine the relation of $k$ and $k_m(k)$.
Following the line of [49], we consider the values of $m$ up to 5, and assume a power-law relation between $k$ and $k_m(k)$.
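The three steps above can be illustrated with a short Python sketch. It assumes the networkx and numpy packages; the function names are hypothetical, nodes with no node at distance m are skipped (an assumption, since the text does not say how they are treated), and the exponent of the assumed power law is estimated by a least-squares line on the log-transformed data.

```python
from collections import defaultdict

import networkx as nx
import numpy as np


def long_range_neighbor_connectivity(G, m):
    """Return {k: k_m(k)}: mean degree of nodes at distance m, averaged per degree k."""
    deg = dict(G.degree())
    per_degree = defaultdict(list)
    for x in G.nodes():
        dist = nx.single_source_shortest_path_length(G, x, cutoff=m)
        shell = [v for v, d in dist.items() if d == m]
        if shell:  # Step 1: average degree of the nodes at distance m from x
            per_degree[deg[x]].append(sum(deg[v] for v in shell) / len(shell))
    # Step 2: average the step-1 values over all nodes with the same degree
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_degree.items())}


def correlation_exponent(knn):
    """Step 3: slope of log k_m(k) against log k (negative: anticorrelation)."""
    points = [(k, v) for k, v in knn.items() if k > 0 and v > 0]
    if len(points) < 2:
        return float("nan")
    log_k = np.log([p[0] for p in points])
    log_v = np.log([p[1] for p in points])
    return float(np.polyfit(log_k, log_v, 1)[0])
```

For m = 1 this reduces to the usual neighbor connectivity, so the sign of the fitted slope can be read in the same way as for direct degree correlations.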

Results with fluctuation analysis
The results of the fluctuation analysis are illustrated for some real-world and model-generated networks in Figure 17. Generally, it can be said that for the (u, v)-flower and the Song-Havlin-Makse model, F (d) scales as a power of d with exponent less than $-\frac{1}{2}$, while in the non-fractal cases the relation is rather exponential, which supports the observations of [19].
However, in the case of the Repulsion-based fractal model, F (d) does not follow a power law. It may not be immediately visible from Figure 17(b), but the exponential curve provides a better fit than the power law. For the fitting, we use the powerlaw Python package [50].
For the Hub attraction dynamical growth model, as b increases, the power-law relation indeed appears, but the transition is smooth and there are fractal networks that do not show the desired relation. In the case of the Lattice small-world transition model, none of the previously mentioned relations seem to hold on F (d) for the fractal networks.
Among real-world networks, there are some cases where the expected behavior of F (d) can be observed, as Figure 17(c) shows. However, there are also examples where a power-law relation cannot be detected, and thus long-range correlations cannot be concluded, as illustrated in Figure 17(d). In conclusion, we can say that long-range anticorrelation captured by fluctuation analysis is not a universal property of fractal networks.
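To make the distinction between power-law and exponential behavior of F (d) concrete, the following sketch compares two straight-line fits: log F against log d (power law) and log F against d (exponential). This is a simplified least-squares stand-in chosen for illustration, not the fitting procedure of the powerlaw package referenced above; the function names are hypothetical and only numpy is assumed.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Least-squares line through (x, y); return (slope, R^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y - np.mean(y)) ** 2)
    return float(slope), float(r2)


def compare_scaling(d, F):
    """Compare power-law and exponential descriptions of F(d)."""
    d = np.asarray(d, dtype=float)
    F = np.asarray(F, dtype=float)
    keep = (d > 0) & (F > 0)  # logarithms need positive values
    d, F = d[keep], F[keep]
    exponent, r2_power = linear_fit_r2(np.log(d), np.log(F))  # F ~ d^exponent
    rate, r2_exp = linear_fit_r2(d, np.log(F))                # F ~ exp(rate * d)
    better = "power law" if r2_power >= r2_exp else "exponential"
    return {"exponent": exponent, "r2_power": r2_power,
            "rate": rate, "r2_exp": r2_exp, "better": better}
```

With only a handful of distinct distances d, as is typical for small-diameter networks, the two R² values can be close, so such a comparison should be read as indicative only.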

Results with hub distances
Concerning the distribution of hub distances, we can say that the HADGM, RBFM, and the (u, v)-flower support the suggestion of [16], that in fractal networks hub distances have a wide range, while in non-fractal networks hubs cannot get far from each other.
However, a surprising observation can be made based on the Song-Havlin-Makse model. As Figure 18 In the case of the LSwTM, for p ≤ 0.1, i.e., when the model is fractal, no hubs are formed, hence this analysis cannot be carried out for this model.
Investigating real networks suggests that the examined property is independent of fractality. The first two subplots of Figure 18

Results with long-range neighbor connectivity
Finally, the results obtained by the third approach, i.e., the neighbor connectivity [49], suggest that there is no apparent connection between the fractality and the long-range correlation profile of networks.
The Hub attraction dynamical growth model seems to be the only exception, because the non-fractal networks generated by this model usually preserve their disassortative structure for large distances as well, while fractal networks mostly do not show any correlation for distance m ≥ 3.
In the case of the Repulsion-based fractal model and the Song-Havlin-Makse model, usually, no correlation can be detected for m ≥ 3 and until that, the correlation profile does not change.
Figure 19 Degree correlations of the two extreme cases of the LSwT model at distances from 1 to 5. The first row corresponds to the p = 0, and the second to the p = 1 case. $k_m(k)$ is plotted against k on a log-log scale, and the line fitted to the log-transformed data is also provided.
correlations, while non-fractal networks have negative correlations, even in the long-range scale (see Figure 19).
Figure 21 Degree correlations of two fractal social networks at distances from 1 to 5. $k_m(k)$ is plotted against k on a log-log scale, and the line fitted to the log-transformed data is also provided.
Degree correlations of the real networks, independently of their fractality, are usually preserved or reversed for larger distances but do not seem to disappear. Figure 20 shows two brain networks, one fractal and the other not, whose correlation profiles are very similar for all distances m. Figure 21 shows two fractal social networks with a negative correlation on the direct neighbor level and a positive correlation at m = 2. Illustrations of the results for all of the examined networks can be found in the supplementary material [48].
Overall, we can conclude from the results of all three approaches that fractality and long-range correlation profiles do not have a clear ubiquitous connection.

Edge betweenness centrality
In [18], the authors reported that even a small number of edges with high betweenness centrality (BC) can destroy the fractal scaling of a network. Although they investigated this conjecture from the perspective of minimum spanning trees, here we study the suggestion directly on the networks themselves. In other words, we examine whether fractal networks can have edges with high betweenness centrality. To this end, we calculate multiple measures concerning the edge betweenness centralities: the average and maximum BC and the average of the top 5% of edge betweenness centralities. We examine whether fractal networks tend to possess smaller values of the aforementioned measures.
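The three summary measures can be computed along the following lines. This is a sketch under stated assumptions: it uses networkx's normalized edge betweenness centrality, the function name is hypothetical, and the top 5% is taken as the ceiling of 5% of the edge count (at least one edge), which the text does not specify.

```python
from math import ceil

import networkx as nx


def edge_bc_summary(G, top_fraction=0.05):
    """Average, maximum, and top-fraction average of edge betweenness centrality."""
    values = sorted(
        nx.edge_betweenness_centrality(G, normalized=True).values(),
        reverse=True,
    )
    # Assumption: round the size of the "top 5%" group up, with at least one edge
    n_top = max(1, ceil(top_fraction * len(values)))
    return {
        "mean": sum(values) / len(values),
        "max": values[0],
        "top_mean": sum(values[:n_top]) / n_top,
    }
```

Because the normalized values depend on network size, comparisons are most meaningful between networks with a similar number of nodes.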

Results
Some of the investigated models can only generate networks whose edges have small betweenness centrality, while other models can also create networks with edges of quite large BC; therefore, a direct comparison between models cannot be made.
The Song-Havlin-Makse, the Hub attraction dynamical growth, and the Repulsion-based fractal models generate networks for which the examined measures range from 0 to 0.6, and they decrease as the number of nodes increases. A difference between fractal and non-fractal networks can be observed in these three models: fractal networks tend to obtain larger values than non-fractal networks of the same size (see the supplementary material [48]). Similar observations can be made on the Lattice small-world transition model (see Figure 22 (a)-(c)). However, here the edge betweenness centralities are low in general for all parameter settings, regardless of fractality. Furthermore, we can observe that as the value of parameter p grows, i.e., as the network becomes less fractal, the betweenness centrality of its edges decreases, which contradicts the conjecture.
Contrary to the previous models, for (u, v)-flowers, for any given network size, the maximal edge BC is typically larger for non-fractal than for fractal networks. The same observation can be made for the average of the top 5% betweenness centralities for small networks (fewer than 100 nodes); however, for larger networks, this property disappears, and the values are higher for fractal networks than for non-fractal ones. Moreover, the average of all betweenness centralities also shows that fractal networks have a higher average edge BC. The aforementioned results are well illustrated in Figure 22.
To sum up, we can conclude that fractal networks can have edges with high betweenness centrality as well. Furthermore, the related metric values seem to be higher on average for fractal than for non-fractal networks, which contradicts the suggested connection between edge betweenness and fractality.

Correlation of degree and betweenness centrality
In one of the earliest works on fractal networks, Kitsak et al. [17] studied the betweenness centrality of fractal and non-fractal networks. The authors analyzed seven SHM model-generated networks and four relatively large real-world networks. They found that the correlation between the betweenness centrality and the degree of a node is smaller in fractal networks than in non-fractal networks.
The authors argue that the Pearson correlation coefficient is not a suitable metric to characterize the difference between fractal and non-fractal networks because the average betweenness centrality for a given degree does not change much [17]. Hence, the authors measure the standard deviation of the betweenness centralities for a given degree and compare it to that of the counterpart networks made by the configuration model. Due to the high computational complexity of this procedure, here we apply a slightly different approach.
In this work, we measure the coefficient of variation (ratio of standard deviation to the mean) of the betweenness centralities for given degrees and then take the average along the degrees. A low mean coefficient of variation (CV) means that the correlation between the betweenness centrality and the degree is high, and similarly, a high CV means that the correlation is low. We also computed the Pearson correlation, and a weighted mean of the coefficient of variation, where similarly to Kitsak et al. [17], we weighted by the degree distribution.
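A minimal sketch of the mean CV computation is given below. It assumes the networkx and numpy packages and hypothetical function names. Two details are assumptions not stated in the text: degree classes whose mean betweenness is zero (for example, degree-one nodes) are skipped because the CV is undefined there, and the population standard deviation is used. The weighted variant weights each degree class by the number of nodes with that degree, which is proportional to the degree distribution.

```python
from collections import defaultdict

import networkx as nx
import numpy as np


def betweenness_cv_by_degree(G):
    """Return ({k: CV of node BC among degree-k nodes}, {k: number of such nodes})."""
    deg = dict(G.degree())
    bc = nx.betweenness_centrality(G)
    groups = defaultdict(list)
    for v, k in deg.items():
        groups[k].append(bc[v])
    cv, count = {}, {}
    for k, values in groups.items():
        mean = np.mean(values)
        if mean > 0:  # assumption: skip classes where the CV is undefined
            cv[k] = float(np.std(values) / mean)
            count[k] = len(values)
    return cv, count


def mean_cv(G, weighted=False):
    """Average the per-degree CVs, optionally weighted by the degree distribution."""
    cv, count = betweenness_cv_by_degree(G)
    if not cv:
        return float("nan")
    ks = sorted(cv)
    weights = [count[k] for k in ks] if weighted else None
    return float(np.average([cv[k] for k in ks], weights=weights))
```

A low value indicates that nodes of equal degree have similar betweenness, i.e., that degree largely determines betweenness.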

Results
We found that the different metrics for measuring the correlation between betweenness centrality and degree give consistent results. Here we discuss our findings concerning the mean CV in detail; for results about the other metrics, see the supplementary material [48]. Figure 23 shows the mean CV of the betweenness centralities for three network models and the real-world networks. In the case of network models, for each parameter setting, we took the average of the results of 15 graphs.
The Song-Havlin-Makse model (Figure 23 (b)) indeed supports the conjecture of Kitsak et al. (for networks with at least 100 nodes), but the other network models and real-world networks do not seem to be in alignment with this conjecture.
As Figure 23 (a) illustrates, in the case of the LSwT model, the dispersion (CV) of betweenness centrality is low for both pure fractal and pure non-fractal networks. Surprisingly, those networks have the highest dispersion (i.e., lowest correlation) that possess a mixture of the fractal and non-fractal properties (0 < p < 1).
In the case of the HADG model (Figure 23 (c)), it is true that the purely non-fractal network has lower variance (i.e., higher correlation between degree and betweenness centrality) and that the purely fractal network has high dispersion. However, it is also apparent that when b = 0.1 the model is still non-fractal, but the deviation of the betweenness centralities is as high as in the case of the fractal networks (b = 0.5 and b = 1.0).
We also studied the (u, v)-flower (the results are included in the supplementary material [48]), and we found that while on average the non-fractal networks (u = 1) have lower CV values, it is also possible to generate fractal networks that have nearly zero variance regarding the betweenness centralities for given degrees. Hence, the (u, v)-flower model also does not really support the observation of Kitsak et al. [17].
Finally, Figure 23 (d) demonstrates that in real networks the distribution of the coefficient of variation of the betweenness centrality in fractal and non-fractal networks does not differ.

Small-world property
Another widely studied topic is the relationship between the small-world and fractal properties of networks. Csányi and Szendrői suggest that these two are conflicting properties of networks; however, they also mention that a mixed property is possible, in such a way that a network is microscopically small-world but fractal on a macroscopic scale [20]. Several other papers also reported that there is a connection between the lack of the small-world property and the emergence of fractality [25,30,51,52,53]. On the other hand, plenty of models have been introduced that exhibit a transition from fractal to small-world networks [14,54,55,56,57]. Moreover, modifications of existing models have been proposed to demonstrate the simultaneous presence of the two properties in the generated networks [14,46,58,59,60].
The possible connection between the fractal and small-world property of networks has received a great deal of research interest, however, it has to be mentioned that the term "small-world network" is often used non-rigorously. By definition, a network can be considered small-world, if l ∼ log N holds [8], i.e., when the average distance in the network (l) grows proportionally to the logarithm of the number of nodes (N ). However, this definition can only be applied to networks evolving with time or where different states can be compared, so typically to network models. Consequently, networks having a relatively small diameter or average path length compared to their size are also often referred to as small-world networks [61].
Here, we distinguish between these two notions and examine the relation of fractality to the small-world property in two slightly different ways. First, we consider the diameter and the average path length of all networks to see whether fractal networks have larger distances. In the second approach, we consider growing network models to study whether the original concept of the small-world property and the fractal property can simultaneously be present in a network or whether these are exclusive characteristics.

Results with normalized average path length and diameter
For the first approach, to be able to compare the distances of networks of different sizes, both the average path length and the diameter are normalized by the logarithm of the number of nodes [45].
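The normalization can be sketched as follows, assuming the networkx package. The function name is hypothetical; using the natural logarithm and restricting the computation to the largest connected component are assumptions, since the text does not specify either.

```python
from math import log

import networkx as nx


def normalized_distances(G):
    """Average path length and diameter, each divided by log N."""
    # Assumption: distances are measured on the largest connected component
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    n = giant.number_of_nodes()
    return {
        "avg_path_length": nx.average_shortest_path_length(giant) / log(n),
        "diameter": nx.diameter(giant) / log(n),
    }
```

The choice of logarithm base only rescales both quantities by a constant, so it does not affect comparisons between networks, but it does affect any absolute cut-off values.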
In the case of the (u, v)-flower and the Lattice small-world transition model, apart from networks with very few nodes, the different cases determined by the main parameter (u for the (u, v)-flower and p for the LSwTM) clearly separate, and the distances are growing as the networks become fractal. This phenomenon is well illustrated by Figure 24 (a), which shows the change in the normalized diameter and average path length for the (u, v)-flower. In the case of the Song-Havlin-Makse and the HADG model, this clear separation can only be observed for fixed values of the (n, m) parameter pair. Namely, the parameter, which influences fractality, also affects the distances similarly to LSwTM and (u, v)-flower.
Since the Repulsion-based fractal model always generates fractal networks, the previous comparison-based examination cannot be applied to this model; however, it can be said that the average path length and the diameter of this model are similar to those of the fractal networks generated by the SHM and HADG models.
As Figure 24 (b) illustrates, similar observations can be made on the real networks, too. The distances are mostly quite small both in non-fractal and fractal networks and for small values the two classes can hardly be separated based on the normalized diameter and average path length. However, we can observe that non-fractal networks do not tend to have large distances, and based on our dataset, a cut-off point at 2.5 for the normalized average path length, and at 5 for the normalized diameter can be created. Above these values, there seem to be only fractal networks. However, we have to emphasize that there are numerous fractal networks below these cut-off points as well, which means that fractality does not originate just from large distances.

Results with growing network models
All the investigated network models undergo a transition from the small-world to the non-small-world property driven by the main parameter that also drives the fractality of the network: as fractality weakens, the small-world property arises. This transition is not sharp, and there are intermediate states where both properties are significant. The small-world transition can also be observed in the Repulsion-based fractal model, which generates only fractal networks. Figure 25 illustrates some cases of the Repulsion-based fractal model and the Lattice small-world transition model. It can be seen that there are a few states for both models where the small-world property is met while the networks are also fractal.
Except for the Lattice small-world transition model, the relation of the average path length to the size can also be examined by considering the iterations through which the network evolves. That is, we fix all the parameters except the iteration number (n), and see whether the average path length grows proportionally to the logarithm of the size or whether a power law holds instead. In this approach, for the cases where the resulting networks are fractal rather than non-fractal, a power law indeed holds, while in the non-fractal cases a logarithmic relation can be observed. The only exception is the Repulsion-based fractal model, because it has a few (fractal) cases where the connection is rather logarithmic, for which an illustration can be found in the supplementary material [48].
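As an illustration of the iteration-based analysis, the sketch below grows (u, v)-flowers and records how the average path length changes with size. It assumes the standard construction in which generation 1 is a cycle of length u + v and every edge is then replaced by two parallel paths of lengths u and v (with u ≤ v and u + v ≥ 3); this definition is not restated in this section, and the function names are hypothetical. Only networkx and numpy are assumed.

```python
import networkx as nx
import numpy as np


def uv_flower(u, v, n):
    """(u, v)-flower after n generations (generation 1 is a cycle of length u + v)."""
    G = nx.cycle_graph(u + v)
    next_id = G.number_of_nodes()
    for _generation in range(n - 1):
        H = nx.Graph()
        for a, b in G.edges():
            # Replace edge (a, b) by two parallel paths of lengths u and v
            for length in (u, v):
                prev = a
                for _step in range(length - 1):
                    H.add_edge(prev, next_id)
                    prev = next_id
                    next_id += 1
                H.add_edge(prev, b)
        G = H
    return G


def path_length_growth(u, v, generations):
    """Return arrays (N, average path length) over the given generations."""
    sizes, lengths = [], []
    for n in generations:
        G = uv_flower(u, v, n)
        sizes.append(G.number_of_nodes())
        lengths.append(nx.average_shortest_path_length(G))
    return np.array(sizes, dtype=float), np.array(lengths, dtype=float)


def linear_r2(x, y):
    """R^2 of the least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(1 - np.sum((y - slope * x - intercept) ** 2)
                 / np.sum((y - y.mean()) ** 2))
```

Given `N, l = path_length_growth(u, v, range(1, 6))`, comparing `linear_r2(np.log(N), l)` (logarithmic growth) with `linear_r2(np.log(N), np.log(l))` (power-law growth) gives a rough indication of which relation describes the model better; with so few generations this is a heuristic and not a statistical test.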
Overall, taking both approaches concerning the small-world property into consideration, we can say that there is a significant relation between fractality and the non-small-world property; however, the two do not necessarily exclude each other, since there are examples of networks that are fractal and small-world at the same time.

A machine learning approach
In the previous sections, we investigated the network characteristics that have been associated with fractality one by one. Here, we address the problem as a binary classification task and distinguish between fractal and non-fractal networks based on a few selected network characteristics. The benefit of this approach is that we can investigate numerous network metrics at the same time, identify the most important features and also recognize how the combination of metrics affects the fractal scaling of networks.
Here, we use three decision tree-based classification algorithms to distinguish between fractal and non-fractal networks: simple decision tree, random forest, and XGBoost. We select the explanatory variables to get a collection of characteristics that represents the structure of the networks well, but whose members are not too correlated. Moreover, we aim to make these metrics as independent of the network size as possible, hence where it is reasonable, normalization is also performed. We extend the set of metrics that we used in our earlier studies [45,62] with features that have been associated with fractality. The list of our explanatory variables together with their description can be found in Table 5.
Table 5 Name and description of the chosen explanatory variables for the classification task.
We consider three datasets to perform the task on: one consisting of the model-generated networks, one of the real networks, and one which combines the two sets, thus including all examined networks. Moreover, we drop the small networks from all datasets, i.e., the ones with fewer than 100 nodes, because in most of these cases fractality can hardly be defined, as was also mentioned earlier. We use 2/3 of each dataset for training and the remaining 1/3 for testing, so that performance is measured on unseen data. Two evaluation metrics are used to measure the performance of the algorithms: accuracy and the Area Under the ROC Curve (AUC). The hyperparameter optimization for the algorithms is carried out based on the latter one because, in the case of an unbalanced class distribution, the accuracy score can often be misleading. For the data preparation, training, and evaluation of the algorithms, we use the scikit-learn [63] and XGBoost [64] Python packages.
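The evaluation protocol described above could look roughly like the following scikit-learn sketch for the simple decision tree. It is an illustration only: the parameter grid, the five-fold cross-validation, the stratified split, and the function name are assumptions, and the data are synthetic placeholders that do not represent any of the studied networks.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def tune_and_evaluate(X, y, param_grid, seed=0):
    """2/3 train, 1/3 test; hyperparameters selected by AUC on the training part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=1 / 3, stratify=y, random_state=seed
    )
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=seed),
        param_grid,
        scoring="roc_auc",  # AUC is less misleading than accuracy on unbalanced classes
        cv=5,
    )
    search.fit(X_train, y_train)
    best = search.best_estimator_
    scores = {
        "accuracy": accuracy_score(y_test, best.predict(X_test)),
        "auc": roc_auc_score(y_test, best.predict_proba(X_test)[:, 1]),
    }
    return search.best_params_, scores


# Synthetic placeholder data with an unbalanced class distribution
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 3))
y_toy = (X_toy[:, 0] > 0.8).astype(int)
toy_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}  # hypothetical grid
```

The same pattern applies to the random forest and XGBoost classifiers by swapping the estimator and its parameter grid.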
To identify the most important variables, we calculate the permutation importance score of the features. This score shows how much the performance of the model decreases if the values of a given attribute are randomly permuted.
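Permutation importance is available in scikit-learn, and the sketch below shows how it could be applied to a fitted random forest on the held-out third of the data. The function name, the number of trees and repeats, and the use of AUC as the scoring function are assumptions; the data are synthetic placeholders in which only the first column carries information about the label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def rank_features(X, y, feature_names, seed=0):
    """Fit a random forest and rank features by permutation importance on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=1 / 3, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    # Mean drop in AUC when the values of one feature are shuffled
    result = permutation_importance(
        clf, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=seed
    )
    ranking = sorted(
        zip(feature_names, result.importances_mean),
        key=lambda item: item[1],
        reverse=True,
    )
    return auc, ranking


# Synthetic placeholder data: the label depends on the first column only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_names = ["informative", "noise_1", "noise_2"]  # hypothetical feature names
```

Strongly correlated features can share importance under permutation, which is one reason for keeping the explanatory variables weakly correlated.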

Results
The performance of the models measured on the test sets is summarized in Table 6. All of the algorithms solve the problem with high accuracy and AUC score, thus we can conclude that fractal and non-fractal networks indeed differ in the considered network characteristics.
Table 7 shows the three most important features with the corresponding permutation importance scores for every algorithm and dataset. We can observe that for model-generated networks, the normalized diameter and the hub connectivity score both have significant importance for all algorithms. In addition, the assortativity coefficient also seems to be important for most of the methods. In the case of real networks, the set of important features varies for the different machine learning (ML) models. The assortativity coefficient, the hub connectivity score, and the normalized diameter are among the most important attributes for two of the three algorithms, but the average clustering coefficient and the average or maximum degree can also be considered important features for some ML models. In the combined dataset, the normalized diameter and the hub connectivity score turned out to be the most important characteristics for all algorithms. The assortativity coefficient, average clustering coefficient, and average degree also seem to have notable importance for some of the methods.
Figure 26 shows two scatterplots of the combined dataset with respect to different network characteristics. It can be seen that while a large (normalized) diameter is a characteristic of only the fractal networks, an additional feature is still not enough to clearly separate fractal and non-fractal networks when the diameter is small. Similarly, most of the investigated fractal networks possess a small hub connectivity score, but there are a significant number of them with large HCS, and considering the normalized maximum degree as well, we still cannot separate the two classes clearly.
From the results detailed above, we can conclude that the magnitude of distances in a network is certainly connected to fractality. It may not be the distances between hubs, which influence fractality, but average distances generally. However, the connectivity of hubs, as well as assortativity indeed seem to have a distinguishing ability. Although alone they are not enough to separate fractal and non-fractal networks, together with other properties, they could contribute to the distinction. It seems that, although a single network characteristic does not clearly determine the fractal property, the combination of several metrics can achieve excellent distinguishing power.

Summary
In this work, we investigated which characteristics could cause the emergence of fractal scaling in complex networks. Our analyses relied on a large dataset of both real-world and model-generated networks, in order to avoid drawing conclusions based on coincidences. Our most important findings are summarized in Table 8.

[Table 8; surviving row labels: weak correlation between degree and node BC (N > 100); non-small-world property (average path length and diameter; growing network models).]
Table 8 The relation of network models and real networks to the examined conjectures concerning fractal characteristics. The check mark (✓) denotes that the model/real networks support the statement (for real networks it means that the distribution of the metric is statistically significantly different in fractal and non-fractal networks), the cross mark (✗) means that they contradict it, i.e., they behave in the opposite way, the circle (○) refers to mixed results, when no clear conclusion can be drawn, and the hyphen (-) denotes that the analysis could not be carried out due to the nature of the model. Finally, N denotes the number of nodes and n denotes the number of iterations.
Concerning the disassortativity of fractal networks, we have found that although most of the considered mathematical models suggest that fractality correlates with disassortativity, there is also one model, the Lattice small-world transition model, which completely contradicts the statement. Consequently, although disassortativity is common amongst fractal networks, we cannot clearly tell whether a network is fractal or not based on disassortativity alone, which is suggested by real networks as well. We conclude that disassortativity cannot be considered the reason behind fractality. Somewhat similar observations can be made in the case where hub repulsion was measured directly. All of the considered network models show that in fractal networks hubs are less connected than in non-fractal networks (smaller hub connectivity score). The real networks also suggest that a large hub connectivity score (hub attraction) is a property of non-fractal networks, but counterexamples on both sides make hub repulsion a non-universal characteristic of fractal networks.
The possible connection of long-range anticorrelation to fractality was reviewed using three different methods. Here we could not find a clear connection between the correlation of node degrees and fractal scaling. Although for all three methods, we could find examples of both model-generated and real networks, which support the suggestion of anticorrelation in fractal networks, even on the long-range scale, there are numerous counterexamples as well.
The suggestion of the connection of edge betweenness centrality with fractality was also reviewed. We examined whether fractal networks can possess edges with large betweenness centrality. We have come to the conclusion that fractal networks show no tendency to have edges mostly with small betweenness centrality. Almost all of the examined network models show the opposite of the statement, while real networks suggest that small edge betweenness centrality does not depend on fractality.
In addition to the connection of fractality and edge betweenness centrality, a suggestion regarding node betweenness centrality was also reviewed. Namely, we revisited the conjecture that the correlation between degree and node BC is weaker in fractal networks than in non-fractal ones. We have found that the Song-Havlin-Makse model supports this statement, but all the other mathematical models and the real networks rather contradict it.
We thoroughly investigated the suggested conflicting relation of fractality and the small-world property. We have found that those network models which are able to generate both fractal and non-fractal instances support the observation that the distances (average path length and diameter) are larger in fractal networks. In the case of real networks, we have also found that large distances are present only in fractal networks; however, small distances do not imply non-fractality. Moreover, a transition from "small-worldness" to large distances can be observed in the Repulsion-based fractal model as well, which generates only fractal networks. When we examined the small-world property on growing network models, we observed that the vast majority of the considered model-generated networks support the conflicting relation of fractality and the small-world property.
Finally, we introduced a novel approach to analyze the origin of fractal networks. We formulated a binary classification problem with the goal to distinguish fractal and non-fractal networks based on other network properties. We solved the problem with state-of-the-art machine learning algorithms and identified the characteristics with high distinguishing ability. The results suggest that although a single characteristic is not enough, a combination of several metrics can distinguish between fractal and non-fractal networks efficiently. The normalized average distance is possibly one of the most essential properties in recognizing fractal scaling, moreover, hub connectivity and assortativity can also contribute to the characterization of fractal networks together with other properties.
An important direction of further studies could be to directly examine the possible connection of the proposed joint properties to fractality. For these analyses, the extension of the dataset with additional models and real networks may be necessary. Furthermore, such studies should involve other network characteristics that have not been considered in previous works. Different approaches could also be used to distinguish fractal and non-fractal networks, such as network embedding techniques. If the networks could be embedded into a vector space where the two classes are well separated, then the properties of this space could reveal where the difference lies.