A study on the friendship paradox – quantitative analysis and relationship with assortative mixing
Applied Network Science volume 4, Article number: 71 (2019)
Abstract
The friendship paradox is the observation that friends of individuals tend to have more friends or be more popular than the individuals themselves. In this work, we first study local metrics to capture the strength of the paradox and the direction of the paradox from the perspective of individual nodes, i.e., an indication of whether the individual is more or less popular than its friends. These local metrics are aggregated, and global metrics are proposed to express the phenomenon on a network-wide level. Theoretical results show that the defined metrics are well-behaved enough to capture the friendship paradox. We also theoretically analyze the behavior of the friendship paradox for popular network models in order to understand regimes where the friendship paradox occurs. These theoretical findings are complemented by experimental results on both network models and real-world networks. By conducting a correlation study between the proposed metrics and degree assortativity, we experimentally demonstrate that the phenomenon of the friendship paradox is related to the well-known phenomenon of assortative mixing.
Introduction
Topologies of complex networks have been known to exhibit nontrivial heterogeneous properties such as heavy-tailed degree distributions (Barabási and Albert 1999) and assortative mixing (Newman 2002), to name a few. A similar nontrivial network heterogeneity is captured by the phenomenon of the friendship paradox. The friendship paradox was first studied by Scott Feld in the context of social networks (Feld 1991). It states that the average number of friends of the collection of friends of individuals in a social network will be higher than the average number of friends of the collection of the individuals themselves. The phenomenon results from a sampling bias that causes nodes to be counted in proportion to their degree, and it extends beyond popularity to other individual traits such as amount of content in social networks (Hodas et al. 2013), number of coauthors, citations and publications in scientific collaboration networks (Eom and Jo 2014), etc. This is known as the generalized friendship paradox and has been attributed to a correlation between these desirable traits and degree (Eom and Jo 2014; Fotouhi et al. 2014) although it has been proven to exist in absence of a significant correlation as well (Momeni and Rabbat 2016). The phenomenon has a measurable impact on people’s psyches because people tend to evaluate success in regard to a given trait through comparison to the social network around them (Bollen et al. 2017; Jackson 2016); an overshadowing effect of a collection of nodes on another is significant. The friendship paradox has been studied in the context of both online (Bagrow et al. 2017; Bollen et al. 2017; Hodas et al. 2013) and offline social networks (Feld 1991). The phenomenon is described as a global, network-wide characteristic. It is natural to ask if the friendship paradox holds at a more local level, e.g., at the level of individual nodes in a network (Hodas et al. 2013; Momeni and Rabbat 2016).
In this work, we mathematically quantify the friendship paradox so as to understand its strength both locally and globally in the network.
Inspired by the friendship paradox, we first study a local network metric for a node, which we call the “friendship index” (FI). For a vertex v, it is defined as \(FI(v) = \frac {\sum _{(v, u) \in E}{d_{u}}}{d_{v}^{2}}\), which is the ratio of the average degree of the neighbors of the vertex to its own degree. FI captures two significant characteristics related to the friendship paradox. First, it captures the ‘direction of influence’, i.e., it indicates whether, in accordance with the friendship paradox, the vertex is outperformed by its neighbors in popularity, or if it is one of the vertices for which the paradox does not hold because it is more popular than its neighbors. Second, it quantifies the ‘discrepancy’ compared to a scenario where the friendship paradox does not exist; FI values further from 1 indicate an extreme manifestation of the paradox whereas FI values closer to 1 indicate that the vertex experiences the paradox less severely. Notice that this measure is truly local: a node’s FI value is determined entirely by its own degree and the degrees of its neighbors, and the structure of the network as a whole does not influence it. This distinguishes FI from local assortativity (Thedchanamoorthy et al. 2014; Piraveenan et al. 2010), a node metric that captures a relative value normalized by other such values in the network. Having defined the local metric, we propose aggregating the indices over all vertices in order to quantify the strength of the overall network-wide phenomenon of the friendship paradox by taking an arithmetic mean, the log of the geometric mean, and the harmonic mean. We explore these measures theoretically and experimentally, noting their ability to capture network characteristics. We compare and contrast them and ultimately find value in the arithmetic mean and the log of the geometric mean, while discarding the harmonic mean because drastically different graphs have equivalent harmonic means.
Consistent with the nature of the friendship paradox, we prove a significant lower bound on these means that indicates the average FI is always greater than or equal to 1, meaning nodes feel outperformed by their neighbors on average. Because of this bound, we can say the aggregates indicate a stronger occurrence of the friendship paradox as they increase. Unlike FI, where the distance from 1 can be in either the positive or negative direction, the aggregate measures only increase with the strength of the paradox. We explore real-world networks that are categorized by their type, and find some consistencies in our aggregate measures within the individual categories.
A number of existing works have analysed the friendship paradox from different perspectives. The comparison of a node’s degree to both the mean and the median of its neighbors’ degrees has been used in the literature, primarily as a binary measure of whether a node is more or less popular compared to its neighbors (Jo and Eom 2014; Momeni and Rabbat 2016; Momeni and Rabbat 2018; Lee et al. 2019), while some (Hodas et al. 2013; Jackson 2016) have also used its absolute value to indicate the severity of the paradox. Hodas et al. (2013) use the binary measure obtained through the mean of neighbors’ degrees to investigate the friendship paradox in the Twitter network. Jo and Eom (2014) characterized the paradox holding probability of individual nodes, and studied its behavior for network models with tunable degree-degree and degree-attribute correlations. A significant finding was that the paradox holding probability may depend on the assortativity of the network, which acts as a strong motivation for our work. Momeni and Rabbat (2016) used both the mean- and median-based binary measures to investigate the prevalence of the friendship paradox in the Twitter network. Lee et al. (2019) studied three perception models of the friendship paradox: mean-based, median-based and fraction-based binary measures for the friendship paradox aggregated across the entire network. These measures were analysed in configuration models with tunable assortativity, and the impact of the perception model was studied on opinion formation. Our contribution has been to investigate the measure based on the mean of neighbors’ degrees as a network metric by studying its properties in popular network models, and illustrating its relationship with local and global measures of assortativity.
The phenomenon of assortativity, or assortative mixing, captures the preference for a network’s nodes to attach to others that are similar in some way. In this work, we focus on degree assortativity (Newman 2002) where the measure of similarity is in terms of the nodes’ degrees. Although assortativity was proposed as a global measure, local versions have also been proposed more recently (Piraveenan et al. 2010; Thedchanamoorthy et al. 2014). It is worth mentioning that these local assortativity measures are examples of node measures that are not truly local, because they are influenced by the global assortativity of the graph. Jo and Eom (2014), Lee et al. (2019), and Momeni and Rabbat (2018) conduct studies on networks that vary in assortativity and explore its impact on the friendship paradox. In this work, we further investigate the relationship between the friendship paradox and assortative mixing, by analysing the relationship between the FIs of nodes and their local assortativity measures. We observe that FI values close to 1.0 indicate strong local assortativity, while FI values further from 1.0 indicate weak assortativity. We also explore the relationship between our network-wide aggregate measures and network assortativity. Similar to the results of the local measures, when the aggregate values are extreme they indicate disassortativity and when they are moderate they indicate assortativity. These consistencies between the measures are revealed through theoretical arguments and experimental results. We conclude that the friendship index captures some information about assortative mixing in the network. We consider canonical graphs and common graph models and find that our aggregate measures produce well-behaved functions that smoothly reflect small adjustments in the graphs, whereas network assortativity follows a far less smooth curve. The present work is an extension of our previous work (Pal et al. 2018); here we present complete and detailed proofs of results partially proven in Pal et al. (2018) along with new theoretical findings on the proposed metrics in network models, such as Erdos-Renyi and Barabasi-Albert graphs. Additionally, a more thorough simulation study on network models is presented here.
The paper is organized as follows. In the “Preliminary” section we define the local and global notions of friendship index and assortativity. These metrics are then studied for various canonical graphs such as regular graphs, star graphs and complete bipartite graphs. In the “Theoretical results on friendship index” section, we highlight some theoretical results on the friendship index, followed by experimental results on network models and real networks in the “Experimental results” section.
Preliminary
Friendship index
Consider a graph G=(V, E), with V and E⊆V×V being the sets of nodes and edges respectively. For each node i∈V, let d_{i} denote the degree of node i and S_{i} denote the sum of i’s neighbors’ degrees. Therefore, we have
$$ S_{i} = \sum_{j \in N_{i}} d_{j}, $$
where N_{i} denotes the set of neighbors of node i. The friendship index (FI) of node i is defined as
$$ FI(i) = \frac{S_{i}}{d_{i}^{2}} = \frac{1}{d_{i}} \cdot \frac{\sum_{j \in N_{i}} d_{j}}{d_{i}}. $$
The friendship index of a node is the average degree of its neighbors normalized by its own degree. Therefore, the FI of a node is less than 1 if its own degree is larger than the average degree of its neighbors, greater than 1 if it is smaller, and equal to 1 if the two coincide.
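As a minimal sketch of the definition, the snippet below computes FI(i) as the sum of the neighbors' degrees divided by the squared degree of i. The adjacency-set representation, the function name `friendship_index`, and the three-node path are illustrative assumptions; returning 1 for an isolated node is a convention assumed here so that the ratio is always defined.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph stored as adjacency sets


def friendship_index(graph: Graph, node: Hashable) -> float:
    """Return FI(node) = (sum of neighbor degrees) / (degree of node)**2."""
    degree = len(graph[node])
    if degree == 0:
        return 1.0  # assumed convention for isolated nodes
    neighbor_degree_sum = sum(len(graph[u]) for u in graph[node])
    return neighbor_degree_sum / degree**2


# Hypothetical example: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
fi = {v: friendship_index(path, v) for v in sorted(path)}
print(fi)
```

In this toy path the two end nodes are less popular than their only neighbor (FI above 1), while the middle node is more popular than both of its neighbors (FI below 1).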
Another way to understand the friendship index is through the uniform and neighborhood node sampler (Leskovec and Faloutsos 2006; Momeni and Rabbat 2018). For a graph G=(V, E), we define the following node sampling techniques:
a) Uniform sampler ρ: A uniformly sampled node is chosen uniformly at random among the nodes in the graph. Therefore, \(\rho \sim \mathcal {U}(V)\).
b) Edge sampler σ: An edge is sampled uniformly at random, then one of its endpoints is chosen uniformly at random. For a uniformly chosen edge \((\sigma _{1},\sigma _{2}) \sim \mathcal {U}(E)\), \(\sigma \sim \mathcal {U}(\sigma _{1},\sigma _{2})\).
c) Neighborhood sampler μ: First define \(\mu (i) \sim \mathcal {U}(N_{i})\) to be a node sampled uniformly at random among the neighbors of node i. If node i has no neighbors, then set μ(i)=i. Next, we define μ(ρ) to be a node that is sampled uniformly at random from the neighbors of a uniformly sampled node ρ. Operationally, first sample \(\rho \sim \mathcal {U}(V)\) and then \(\mu \sim \mathcal {U}(N_{\rho })\); however, if N_{ρ}=∅, then set μ=ρ.
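We can equivalently define the friendship index as
$$ FI(i) = \frac{\mathbb{E}[d_{\mu(i)}]}{d_{i}}, $$
where the expectation is taken over the random neighbor μ(i) of node i.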
It is easy to see that the friendship index for regular graphs will always be 1 because all degrees are the same. This is a local network metric concerning a particular node. We could either aggregate the local measure to obtain a global measure or directly work with the local FIs {FI(i), i∈V}.
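The three samplers and the sampling view of FI can be sketched as follows. The function names, the use of `random.Random`, and the requirement that node labels be sortable are assumptions made for illustration; `exact_fi_from_sampler` evaluates the expectation over μ(i) exactly rather than by Monte Carlo.

```python
import random
from fractions import Fraction


def uniform_sampler(graph, rng):
    """rho: a node chosen uniformly at random from V."""
    return rng.choice(sorted(graph))


def edge_sampler(graph, rng):
    """sigma: a uniformly random endpoint of a uniformly random edge."""
    edges = sorted({tuple(sorted((u, v))) for u in graph for v in graph[u]})
    return rng.choice(rng.choice(edges))


def neighborhood_sampler(graph, rng, node=None):
    """mu: a uniform neighbor of `node` (or of a uniform node); the node itself if isolated."""
    if node is None:
        node = uniform_sampler(graph, rng)
    neighbors = sorted(graph[node])
    return rng.choice(neighbors) if neighbors else node


def exact_fi_from_sampler(graph, node):
    """FI(node) as E[d_mu(node)] / d_node, with the expectation computed exactly."""
    neighbors = graph[node]
    if not neighbors:
        return Fraction(1)  # assumed convention for isolated nodes
    expected_neighbor_degree = Fraction(
        sum(len(graph[u]) for u in neighbors), len(neighbors)
    )
    return expected_neighbor_degree / len(neighbors)


# Hypothetical example: a star with center 0 and three leaves.
star4 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(exact_fi_from_sampler(star4, 0), exact_fi_from_sampler(star4, 1))
```

Using `Fraction` keeps the toy values exact, which makes it easy to compare them with hand calculations.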
The local measures of the friendship paradox can be extended to the entire network in different ways. The following global notions of FI can be defined for a given graph G:

1. Arithmetic FI (AFI): We define AFI as the average, or arithmetic mean, of all the local FIs
$$ AFI(G) = \frac{1}{|V|}\sum_{i \in V} FI(i) $$(3)
Observe that
$$ AFI(G) = \frac{1}{|V|} \sum_{i \in V} \frac{\mathbb{E}[d_{\mu(i)}]}{d_{i}} = \mathbb{E}\left[\frac{d_{\mu(\rho)}}{d_{\rho}}\right] $$(4)
with the caveat that \(\frac {\mathbb {E}[d_{\mu(i)}]}{d_{i}}\) is set to 1 if d_{i}=0. Therefore, AFI is the expected ratio between the degree of a neighborhood sampled node μ(ρ) and that of the uniformly sampled node ρ.

2. Geometric FI (GFI): This metric is defined as the logarithm of the geometric mean of all the local FIs
$$ GFI(G) = \frac{1}{|V|} \sum_{v \in V} \log FI(v) $$
While the linearity of expectation allows the alternative representation for AFI through the notion of sampling, we do not have an equivalent expression for GFI. This leads to greater mathematical tractability in analysing the AFI of network models compared to GFI or other binary measures based on individual nodes involving Heaviside step functions. However, GFI has other benefits over AFI that will become apparent later.

3. Harmonic FI (HFI): This metric is defined as the harmonic mean of all the local FIs
$$ HFI(G) = \frac{1}{\frac{1}{|V|} \sum_{v \in V} \frac{1}{FI(v)}} $$
Again for HFI, we do not have an alternative representation like (4).
We will give theoretical results on AFI and experimental results on both AFI and GFI. In what follows we explain the relative advantages of considering AFI and GFI.
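The three aggregates can be computed directly from the local FIs. The sketch below is illustrative only: the helper names and the two toy generators (`star`, `cycle`) are assumptions, and isolated nodes are assigned FI = 1 in line with the caveat stated for AFI.

```python
import math


def local_fis(graph):
    """List of FI(i) for every node of an adjacency-set graph."""
    fis = []
    for neighbors in graph.values():
        degree = len(neighbors)
        if degree == 0:
            fis.append(1.0)  # convention for isolated nodes
        else:
            fis.append(sum(len(graph[u]) for u in neighbors) / degree**2)
    return fis


def afi(graph):
    fis = local_fis(graph)
    return sum(fis) / len(fis)  # arithmetic mean


def gfi(graph):
    fis = local_fis(graph)
    return sum(math.log(f) for f in fis) / len(fis)  # log of the geometric mean


def hfi(graph):
    fis = local_fis(graph)
    return len(fis) / sum(1.0 / f for f in fis)  # harmonic mean


def star(n):
    """Star on n nodes: node 0 is the center, nodes 1..n-1 are leaves."""
    graph = {0: set(range(1, n))}
    for leaf in range(1, n):
        graph[leaf] = {0}
    return graph


def cycle(n):
    """Cycle on n nodes, a 2-regular graph."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


for name, g in (("star(5)", star(5)), ("cycle(6)", cycle(6))):
    print(name, afi(g), gfi(g), hfi(g))
```

Comparing a star with a cycle shows how AFI and GFI separate the two graphs while HFI does not.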
Global and local assortativity
Newman (2002) defines assortativity as a measure of the similarity of degree in adjacent vertices.^{Footnote 1} Newman formally quantifies assortativity as the Pearson correlation coefficient of the degrees of the two vertices attached to every edge. This can be expressed as
$$ r = \frac{\mathbb{E}[d_{\sigma_{1}} d_{\sigma_{2}}] - \left(\mathbb{E}[d_{\sigma}]\right)^{2}}{\mathbb{E}[d_{\sigma}^{2}] - \left(\mathbb{E}[d_{\sigma}]\right)^{2}}, $$
where (σ_{1},σ_{2}) is a uniformly sampled edge and σ is a node sampled according to the random edge sampling. This difference in sampling nodes introduces inherent differences between the friendship index and assortativity.
The assortativity can also be defined in terms of the network degree distribution p, the excess or remaining degree distribution q, and the link distribution e_{j, k}. Note that the distribution p is the degree distribution of a uniformly sampled node ρ. The distribution q is related to the excess degree of a node arrived at through the random edge sampling procedure, given by
$$ q_{k} = \frac{(k+1)\, p_{k+1}}{\sum_{j} j\, p_{j}}. $$
We also define the joint probability distribution of the remaining degrees of the two nodes on either end of a randomly chosen link as e_{j, k}. We can equivalently define assortativity as
$$ r = \frac{1}{\sigma_{q}^{2}} \sum_{j,k} jk \left(e_{j,k} - q_{j} q_{k}\right), $$
where σ_{q} is the standard deviation of the distribution q.
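A minimal sketch of the global assortativity as a Pearson correlation over edge endpoints is given below. Each undirected edge is counted in both orientations so that the two ends have the same marginal distribution. The function name and the `regular_value` argument are assumptions: for regular graphs the correlation is formally 0/0, and the default of 1 is a convention chosen here to match the value quoted for regular graphs in this paper.

```python
def degree_assortativity(graph, regular_value=1.0):
    """Pearson correlation of the degrees at the two ends of a random edge."""
    # Every undirected edge contributes both (d_u, d_v) and (d_v, d_u).
    pairs = [(len(graph[u]), len(graph[v])) for u in graph for v in graph[u]]
    m = len(pairs)
    if m == 0:
        return regular_value  # no edges: treated like the degenerate case (assumption)
    sum_a = sum(a for a, _ in pairs)
    sum_ab = sum(a * b for a, b in pairs)
    sum_aa = sum(a * a for a, _ in pairs)
    # Integer arithmetic keeps the degenerate zero-variance test exact.
    cov_num = m * sum_ab - sum_a**2
    var_num = m * sum_aa - sum_a**2
    if var_num == 0:
        return regular_value  # all edge-end degrees equal (regular graph)
    return cov_num / var_num


# Hypothetical examples: a 4-node path and a 5-node star.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star5 = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree_assortativity(path4), degree_assortativity(star5))
```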
While this definition of assortativity is at the global level, several efforts have been made to quantify the assortativity of the network at a local level (Piraveenan et al. 2010; Thedchanamoorthy et al. 2014). Specifically, we first consider the definition of local assortativity presented by Piraveenan et al. (2010). This definition of local assortativity, which we call the p-local assortativity, is defined as
$$ r_{p}(i) = \frac{d_{i}\left((d_{i}-1)\bar{k} - \mu_{q}^{2}\right)}{2|E|\,\sigma_{q}^{2}}, $$
where \(\bar {k}\) is the average remaining degree of the node’s neighbors, and μ_{q} and σ_{q} are the mean and standard deviation of the distribution q. It also follows that the p-local assortativities of all the nodes sum up to the network assortativity r.
Note that r_{p}(i) is positive and large if \((d_{i}-1)\bar {k} \gg \mu _{q}^{2}\). This happens when a high degree node is connected to other nodes which have relatively high degree or if a node with degree in a medium range has neighbors with large degrees. And r_{p}(i) is large in magnitude and negative if \((d_{i}-1)\bar {k} \ll \mu _{q}^{2}\), which can happen when a node with low degree is connected to low degree nodes. While in such cases the local assortativity should be positive, the p-local assortativity will have a negative value.
We define another notion of local assortativity due to Thedchanamoorthy et al. (2014), which has been argued to be closer to the fundamental notion of assortativity compared to that proposed by Piraveenan et al. (2010). This concurs with our analysis in the previous paragraph that the p-local assortativity does not always capture what a local measure of assortativity should. For these reasons, we only show results for the local assortativity due to Thedchanamoorthy et al.
The local assortativity measure calculates the ‘average neighbor difference’, a direct indicator of a node’s disassortativity, given by
$$ \delta_{i} = \frac{1}{d_{i}} \sum_{j \in N_{i}} |d_{i} - d_{j}|, $$
which is then scaled by the sum of the neighbor differences across all the nodes, to obtain the normalized neighbor differences \(\bar {\delta _{i}}\). We therefore have
$$ \bar{\delta_{i}} = \frac{\delta_{i}}{\sum_{j \in V} \delta_{j}}. $$
The final assortativity of a node, which we call the T-local assortativity (TLA), is obtained by
$$ r_{T}(i) = \lambda - \bar{\delta_{i}}, $$
where \(\lambda = \frac {r+1}{|V|}\), such that the sum of all the local assortativities yields the global assortativity r.
Observe that if δ_{i} is small, then the degrees of node i and its neighbors are very close. Therefore, the local measure FI(i) should be close to 1. However, if δ_{i} is large, then at least one neighbor of i has a degree very different from that of i. Therefore, it is expected that FI(i) would be far from 1. If it is much less than 1, then the degree of i is more than its neighbors’ on average; while if it is greater than 1, then the degree of i is less than its neighbors’ on average. Therefore FI captures the ‘direction of influence’ in scenarios of local disassortativity. If the TLA at node i is close to 1, i.e., the degree of node i is similar to its neighbors’, then FI(i) would be close to 1; and if the TLA is small and close to 0, i.e., the degree of node i is significantly different from its neighbors’, then FI(i) is expected to be away from 1. Therefore, |FI(i)−1| is expected to be anticorrelated with TLA, i.e., as |FI(i)−1| increases, TLA is expected to reduce.
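The T-local assortativity can be sketched in a few lines: compute each node's average absolute degree difference to its neighbors, normalize these differences so that they sum to one, and subtract them from λ = (r+1)/|V|. The helper names are assumptions, and so are two conventions used below: an isolated node gets a neighbor difference of 0, and r is taken as 1 when the Pearson correlation is degenerate.

```python
import math


def degree_assortativity(graph):
    """Pearson correlation of end-point degrees over both orientations of each edge."""
    pairs = [(len(graph[u]), len(graph[v])) for u in graph for v in graph[u]]
    m = len(pairs)
    sum_a = sum(a for a, _ in pairs)
    cov_num = m * sum(a * b for a, b in pairs) - sum_a**2
    var_num = m * sum(a * a for a, _ in pairs) - sum_a**2
    return cov_num / var_num if var_num else 1.0  # assumed value in the degenerate case


def t_local_assortativity(graph):
    """Return {node: lambda - normalized average neighbor difference}."""
    degree = {v: len(graph[v]) for v in graph}
    delta = {
        v: sum(abs(degree[v] - degree[u]) for u in graph[v]) / degree[v]
        if degree[v]
        else 0.0  # assumption: isolated nodes contribute no difference
        for v in graph
    }
    total = sum(delta.values())
    lam = (degree_assortativity(graph) + 1.0) / len(graph)
    return {v: lam - (delta[v] / total if total else 0.0) for v in graph}


# Hypothetical example: the path 0 - 1 - 2 - 3.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
tla = t_local_assortativity(path4)
print(tla, sum(tla.values()))
```

On this path the end nodes have FI = 2 and the inner nodes FI = 0.75, so the end nodes deviate more from 1 on the log scale and also receive the lower TLA, in line with the expected anticorrelation.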
Results for special graphs

1. d-regular graphs: Since all nodes have the same degree d, FI=1 for all nodes. Therefore AFI=1, GFI=0 and HFI=1. A clique of n nodes is an (n−1)-regular graph. Thus, the result also holds for cliques. Therefore, for d-regular graphs, where the friendship paradox does not occur, we get the minimum values for AFI and GFI (see Theorems 1 and 2). Also, the assortativity r=1 for regular graphs, and all local assortativity values will be positive and equal \(\left (r_{T}=\frac {2}{n}\right)\).

2. Star graphs: Consider a star graph with n nodes. The FI for the center of the star will be \(\frac {1}{n-1}\), and n−1 for all leaves. One can show that the FI values for the center and the leaves of a star are the minimum and maximum possible values, respectively, for a graph with the given number of nodes. Thus the friendship paradox is rather severe for a star graph. We calculate \(AFI=n-2+\frac {1}{n-1}\), \(GFI=\frac {n-2}{n} \log (n-1)\) and HFI=1. An advantage of considering GFI is that the maximum and minimum possible values of FI have the same absolute value in the logarithmic scale. Surprisingly, HFI for a star is the same as that of a d-regular graph. The assortativity r=−1, while all the local assortativities r_{T} will be the minimum value of −1.

3. Complete bipartite graphs: Consider a complete bipartite graph with x nodes on one side and y nodes on the other. This is a generalization of the star graph (x=1, y=n−1). Let us denote this as Bipartite(x, y), with the set of x nodes on one side denoted as X, and the set of y nodes on the other side denoted as Y. For vertex v∈X, \(FI(v)=\frac {x}{y}\) and for v∈Y, \(FI(v) = \frac {y}{x}\). Therefore, if x>y then nodes in X will have FI>1 and nodes in Y will have FI<1. The \(AFI = 1 + \frac {(x-y)^{2}}{xy}\), which is always greater than 1 as long as x≠y. Furthermore, \(\frac {\partial AFI}{\partial y} = \frac {y^{2}-x^{2}}{xy^{2}}\), implying that \(\frac {\partial AFI}{\partial y}>0\) for y>x and \(\frac {\partial AFI}{\partial y}<0\) for y<x. Therefore, with x kept constant, the AFI increases when y is varied away from x. We also have \(GFI=\frac {1}{x+y} \log \left (\frac {x^{x} y^{y}}{x^{y} y^{x}} \right)>0\) for x≠y. We calculate \(\frac {\partial GFI}{\partial y} = \frac {2xy \log \left (\frac {y}{x} \right) + \left (y^{2} - x^{2}\right) }{(y+x)^{2}y}\), which again implies that the GFI will increase as y is varied away from x. Therefore, both AFI and GFI are well-behaved for the class of complete bipartite graphs in that both of the measures increase as y is varied away from x, indicating more prominence of the friendship paradox. Interestingly, for x≠y, assortativity r=−1, while for x=y assortativity r=1 by definition, indicating that the variation in assortativity is rather sharp. We also compute HFI=1 for all positive values of x and y.
We find HFI to be equal to 1 for widely different classes of graphs ranging from d-regular graphs to star graphs. Therefore in later sections, we will only continue investigation into AFI and GFI owing to their more well-behaved nature. Also from the star graph, we observed that in the logarithmic scale, the maximum and minimum values of the FI are equidistant from 0, the log FI value where the friendship paradox is not observed. Therefore in the “Experimental results” section, we study the correlation between \(|\log FI|\) and TLA, because \(|\log FI|\) better captures the deviation from the scenario where the friendship paradox does not occur.
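The closed forms for complete bipartite graphs can be checked numerically. The sketch below builds Bipartite(x, y) explicitly, computes AFI, GFI and HFI from the local FIs, and compares them with \(1 + \frac{(x-y)^2}{xy}\), \(\frac{(x-y)\log(x/y)}{x+y}\) (an equivalent way of writing the GFI expression) and 1. All function names are illustrative assumptions.

```python
import math


def complete_bipartite(x, y):
    """Adjacency sets of Bipartite(x, y): side X is 0..x-1, side Y is x..x+y-1."""
    side_x = set(range(x))
    side_y = set(range(x, x + y))
    graph = {v: set(side_y) for v in side_x}
    graph.update({v: set(side_x) for v in side_y})
    return graph


def aggregate_fis(graph):
    """Return (AFI, GFI, HFI) of a graph without isolated nodes."""
    fis = [
        sum(len(graph[u]) for u in neighbors) / len(neighbors) ** 2
        for neighbors in graph.values()
    ]
    n = len(fis)
    return (
        sum(fis) / n,
        sum(math.log(f) for f in fis) / n,
        n / sum(1.0 / f for f in fis),
    )


def closed_forms(x, y):
    """Closed-form (AFI, GFI, HFI) for Bipartite(x, y)."""
    return (
        1.0 + (x - y) ** 2 / (x * y),
        (x - y) * math.log(x / y) / (x + y),
        1.0,
    )


for x, y in ((3, 3), (3, 4), (3, 8), (1, 6)):
    print((x, y), aggregate_fis(complete_bipartite(x, y)), closed_forms(x, y))
```

With x fixed at 3, both AFI and GFI grow as y moves from 3 to 4 to 8, while HFI stays at 1.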
Theoretical results on friendship index
A lower bound on global measures of FI
We first state a simple result that will be used in this section.
Lemma 1
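For any positive numbers x, y and monotonically increasing functions \(f,g: \mathbb {R} \rightarrow \mathbb {R}\) such that f(x),g(x)≥0 for x≥1, we have
$$ \frac{f(x)}{g(y)} + \frac{f(y)}{g(x)} \geq \frac{f(x)}{g(x)} + \frac{f(y)}{g(y)}. $$
Proof
For x, y≥1 with g(x),g(y)>0, we have
$$ \frac{f(x)}{g(y)} + \frac{f(y)}{g(x)} - \frac{f(x)}{g(x)} - \frac{f(y)}{g(y)} = \frac{\left(f(x)-f(y)\right)\left(g(x)-g(y)\right)}{g(x)\,g(y)} \geq 0, $$
because f(x)−f(y) and g(x)−g(y) have the same sign when f and g are both monotonically increasing.
□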
With f(x)=x, and g(x)=x^{2}, the above lemma leads to the following corollary.
Corollary 1
For any edge e(i, j), we have \(\frac {d_{i}}{d_{j}^{2}}+ \frac {d_{j}}{d_{i}^{2}} \ge \frac {1}{d_{i}}+\frac {1}{d_{j}}\)
Theorem 1
For all graphs G, AFI(G)≥1. AFI(G)=1 only for the class of graphs where each connected component is a regular graph.
Proof
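For a graph G, we have
$$ AFI(G) = \frac{1}{|V|} \sum_{i \in V} \sum_{j \in N_{i}} \frac{d_{j}}{d_{i}^{2}}. $$(13)
Observe that for each edge (i, j), the term \(\frac {d_{j}}{d_{i}^{2}}\) appears when summing over i and \(\frac {d_{i}}{d_{j}^{2}}\) appears when summing over j. Therefore, rather than the double sum in (13), we can have
$$ AFI(G) = \frac{1}{|V|} \sum_{(i,j) \in E} \left(\frac{d_{j}}{d_{i}^{2}} + \frac{d_{i}}{d_{j}^{2}}\right) \geq \frac{1}{|V|} \sum_{(i,j) \in E} \left(\frac{1}{d_{i}} + \frac{1}{d_{j}}\right) = 1, $$(14)
where the inequality uses Corollary 1 and the last step follows by noticing that for each node i, the term \(\frac {1}{d_{i}}\) appears for exactly d_{i} edges. Also, equality in (14) occurs if and only if the nodes on either side of every edge have the same degree, i.e., all nodes in a connected component have the same degree. □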
With f(x)= logx, and g(x)=x, Lemma 1 leads to the following corollary.
Corollary 2
For any edge e(i, j), we have \(\frac {\log d_{i}}{d_{j}}+ \frac {\log d_{j}}{d_{i}} \ge \frac {\log d_{i}}{d_{i}}+\frac {\log d_{j}}{d_{j}}\)
Theorem 2
For all graphs G, GFI(G)≥0. GFI(G)=0 only for the class of graphs where each connected component is a regular graph.
Proof
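Using the expression for GFI, we obtain for a graph G
$$ GFI(G) = \frac{1}{|V|} \sum_{i \in V} \log \frac{\sum_{j \in N_{i}} d_{j}}{d_{i}^{2}} = \frac{1}{|V|} \sum_{i \in V} \left[\log \left(\frac{1}{d_{i}} \sum_{j \in N_{i}} d_{j}\right) - \log d_{i}\right]. $$(16)
Applying Jensen’s inequality in (16), we obtain
$$\begin{aligned} GFI(G) &\geq \frac{1}{|V|} \sum_{i \in V} \left[\frac{1}{d_{i}} \sum_{j \in N_{i}} \log d_{j} - \log d_{i}\right] \\ &= \frac{1}{|V|} \sum_{(i,j) \in E} \left(\frac{\log d_{j}}{d_{i}} + \frac{\log d_{i}}{d_{j}}\right) - \frac{1}{|V|} \sum_{i \in V} \log d_{i} \\ &\geq \frac{1}{|V|} \sum_{(i,j) \in E} \left(\frac{\log d_{i}}{d_{i}} + \frac{\log d_{j}}{d_{j}}\right) - \frac{1}{|V|} \sum_{i \in V} \log d_{i} = 0, \end{aligned}$$
where the second inequality uses Corollary 2 and the last step is obtained by noticing that for each node i, the term \(\frac {1}{d_{i}} \log d_{i}\) appears for exactly d_{i} edges. Again, equality occurs if and only if both nodes on either side of every edge have the same degree. □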
Theorem 1 has been proven by Jackson (2016, Lemma 1), while Theorem 2 is a new result. Theorems 1 & 2 show that both the AFI and GFI will be at their minimum values for the class of regular graphs, where the friendship paradox does not occur. For any other graph, the AFI and GFI will be strictly greater than their minimum values, thereby suggesting the occurrence of the friendship paradox due to an imbalance between neighbor degrees. This is in agreement with the observation due to Feld (1991), where he noted that the mean number of friends of friends is always greater than or equal to the mean number of friends, i.e.,
$$ \frac{\sum_{i \in V} d_{i}^{2}}{\sum_{i \in V} d_{i}} \geq \frac{1}{|V|} \sum_{i \in V} d_{i}, $$(18)
with equality taking place in (18) only for regular graphs where the empirical degree distribution has zero variance. This demonstrates a well-behaved property for both of the proposed global metrics.
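The bounds of Theorems 1 and 2, together with Feld's inequality between the mean number of friends of friends and the mean number of friends, can be sanity-checked on small random graphs. The sketch below is a numerical illustration and not a proof; the random-graph generator, the tolerances, the seeds, and the FI = 1 convention for isolated nodes are all assumptions.

```python
import math
import random


def random_graph(n, p, rng):
    """Each of the n*(n-1)/2 possible edges is included independently with probability p."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def afi_and_gfi(graph):
    fis = []
    for neighbors in graph.values():
        d = len(neighbors)
        fis.append(sum(len(graph[u]) for u in neighbors) / d**2 if d else 1.0)
    return sum(fis) / len(fis), sum(math.log(f) for f in fis) / len(fis)


def feld_gap(graph):
    """Mean degree of a friend (sum d^2 / sum d) minus mean degree (sum d / |V|)."""
    degrees = [len(neighbors) for neighbors in graph.values()]
    total = sum(degrees)
    if total == 0:
        return 0.0
    return sum(d * d for d in degrees) / total - total / len(degrees)


def bounds_hold(trials=200, seed=0, tol=1e-9):
    """Check AFI >= 1, GFI >= 0 and Feld's inequality on random graphs."""
    rng = random.Random(seed)
    for _ in range(trials):
        graph = random_graph(rng.randint(2, 30), rng.random(), rng)
        a, g = afi_and_gfi(graph)
        if a < 1.0 - tol or g < -tol or feld_gap(graph) < -tol:
            return False
    return True


print(bounds_hold())
```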
Erdos-Renyi graphs
We study the behavior of the friendship index for the class of Erdos-Renyi (ER) graphs (Erdos and Rényi 1960), a basic graph model for capturing random connections in networks. Consider a sequence of ER graphs \(\{\mathbb {G}_{1},\mathbb {G}_{2},\ldots \}\), such that \(\mathbb {G}_{n}=(V_{n},\mathbb {E}_{n})\) with V_{n} being the vertex set {1,2,…,n}, and \(\mathbb {E}_{n}\) being the random edge set with every edge occurring with probability p. We define the following notation: For nodes i and j, let {i∼j} denote the event of i being connected to j. Let D_{n, i} and \(\mathcal {N}_{n,i}\) denote the degree and the set of neighbors of node i in \(\mathbb {G}_{n}\). In a random graph setting, the expected friendship index of node i in \(\mathbb {G}_{n}\) can be expressed as
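Although we will primarily focus on the expected behavior of the FI of a node, it is worth noting that the AFI will exhibit similar behavior, because the nodes of an ER graph are statistically identical and therefore
$$ \mathbb{E}[{AFI}_{n}] = \frac{1}{n} \sum_{j=1}^{n} \mathbb{E}[{FI}_{n}(j)] = \mathbb{E}[{FI}_{n}(i)], $$
with AFI_{n} being the AFI of graph \(\mathbb {G}_{n}\).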
Case 1: np constant
Considering the first term in (19), we have
As n→∞, D_{n, i}⇒_{d}Π(c) under the regime np→c, where Π(c) denotes a Poisson random variable with mean c. We know that if X_{n}⇒_{d}X, then \(\mathbb {E}[{f(X_{n})}] \rightarrow \mathbb {E}[{f(X)}]\) for bounded Lipschitz functions f. The mapping \(f:x \to \frac {1}{x}, \ x \in \mathbb {N}\) is bounded by 1, and Lipschitz because \(|f(x)-f(y)| = \left | \frac {1}{x} - \frac {1}{y} \right | \leq |x-y|\). Using this result in (21), we obtain
under the regime np→c, with c>0. We set the limit as
Simple bounds on Friendship Index: Using (22), we first try bounding the FI, since the expression cannot be exactly computed.
Lemma 2
For a sequence of Erdos-Renyi graphs \(\{\mathbb {G}_{n}, n=1,2,\ldots \}\)
under the regime np→c.
Proof
Using the limiting expression of FI, we obtain for c>0
For obtaining the lower bound we only use the first term from the infinite sum
and also note that \(\mathbb {E}[{{FI}_{n}(i)}] \geq 1\) since \(\mathbb {E}[{{FI}_{n}(i)}] = \mathbb {E}[{{AFI}_{n}}] \geq 1\) using Theorem 1. □
Theorem 3
For a sequence of Erdos-Renyi graphs \(\{ \mathbb {G}_{n}, n=1,2,\ldots \}\) under the regime np→c, and node i, we have (a) \({\lim }_{n \to \infty } \mathbb {E}[{{FI}_{n}(i)}] \rightarrow 1\) as c→0; (b) \({\lim }_{n \rightarrow \infty } \mathbb {E}[{{FI}_{n}(i)}] \rightarrow 1\) as c→∞; (c) there exists δ>0 such that for all c∈(0,δ), \({\lim }_{n \rightarrow \infty } \mathbb {E}[{{FI}_{n}(i)}]>1\).
Proof
Part (a) of Theorem 3 can be proved using the upper bound of Lemma 2 and also by using the lower bound \(\mathbb {E}[{{FI}_{n}(i)}]= \mathbb {E}[{{AFI}_{n}}]\geq 1\).
Lemma 2 does not help in determining the behavior of β for c large, because the upper and lower bounds diverge as c increases. Therefore, we study the behavior of \(\beta : \mathbb {R} \rightarrow \mathbb {R}\) over the range of c analytically. For c∈(0,C), with C>0, the infinite sum β(c) converges uniformly on the specified range. Thus, differentiating β(c) w.r.t. c yields
For c large and β(c)>1, we have β^{′}(c)<0. Furthermore, for c large, β^{′}(c)≈−β(c)+1, implying that \({\lim }_{c \rightarrow \infty } \beta (c) = 1\). This leads to Theorem 3(b).
Part (c) follows from the lower bound in Lemma 2 and observing that the lower bound is greater than 1 for c∈(0,δ) (for some δ>0). □
Therefore, the result indicates that for c small, the effect of the friendship paradox is weak when measured in terms of AFI. This is probably because the graph is so sparse that it consists mostly of isolated nodes and isolated edges, and the paradox does not occur for either of these two structures. The effect is weak for c large as well because the graph becomes very dense. Also, Theorem 3(c) shows that the AFI is strictly greater than 1 for a certain range of the parameter c, suggesting that the friendship paradox is observed even in large Erdos-Renyi graphs.
Note that the average degree of the ER graph in this regime is (n−1)p, which converges to c. Theorem 3 can also be thought to imply that in random ER graphs, as the average degree of the graph becomes small or gets large, the friendship paradox becomes weaker. For c large, the growth in every node’s degree and in its neighbors’ degrees is such that the friendship paradox is not observed at any chosen node. This is rather intuitive, recalling that FI is defined as the ratio between the expected degree of a node’s random neighbor and its own degree. If the degrees get large in an ER graph where the degree random variables are uncorrelated, then the sampling bias is expected to be small, which is exactly what we observe.
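The qualitative shape described by Theorem 3 can be illustrated with a heuristic approximation. The sketch below is our own back-of-the-envelope calculation and is not necessarily the expression for β(c) used in the proofs: it assumes that, in the limit, the degree of node i is Poisson(c), that each neighbor has degree 1 plus an independent Poisson(c) variable, and that an isolated node has FI = 1. Under those assumptions the expected FI is \(e^{-c} + (1+c)\,e^{-c}\sum_{k \geq 1} \frac{c^{k}}{k \cdot k!}\). The function name `beta_heuristic` and the truncation length are hypothetical.

```python
import math


def beta_heuristic(c, terms=200):
    """Heuristic large-n value of E[FI] for ER graphs with n*p -> c (assumption-based)."""
    if c == 0:
        return 1.0
    series = 0.0
    power_over_factorial = 1.0  # holds c**k / k! for the current k
    for k in range(1, terms + 1):
        power_over_factorial *= c / k
        series += power_over_factorial / k
    # P(degree = 0) contributes FI = 1; otherwise the mean neighbor degree is 1 + c.
    return math.exp(-c) * (1.0 + (1.0 + c) * series)


for c in (0.01, 0.1, 1.0, 3.0, 10.0, 50.0):
    print(c, round(beta_heuristic(c), 4))
```

The printed values start near 1 for small c, rise above 1 at intermediate c, and fall back towards 1 for large c, matching parts (a) to (c) of Theorem 3 qualitatively.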
Case 2: p constant
We also address the question whether the friendship paradox is observed when the probability of connection p is kept constant as n goes to infinity. We return to the expression of the friendship index
As n gets large, the second term in (28) behaves as
We define the edge rvs as χ_{i, j}, where χ_{i, j}=1 if the edge exists between nodes i and j, 0 otherwise. We write the first expression in (28) as follows
where we have defined,
as the excess degree of node j with respect to the edge (i, j).
Lemma 3
Proof
We use the Borel-Cantelli lemma to show the a.s. convergence result. For ease of exposition the detailed proof is shown in Appendix “Proof of lemma 3”. □
Furthermore, we have
and
The convergence results (31)-(33), when applied to (30), yield
which together with (29) leads to Theorem 4.
Theorem 4
For a sequence of Erdos-Renyi graphs \(\{\mathbb {G}_{n}, n=1,2,\ldots \}\)
under the regime p constant.
Using the dominated convergence theorem, we can also prove convergence in expectation. Details are provided in Appendix “Proof of theorem 5”.
Theorem 5
For a sequence of Erdos-Renyi graphs \(\{\mathbb {G}_{n}, n=1,2,\ldots \}\)
under the regime p constant. Therefore,
Theorems 4 & 5 together imply that the friendship paradox is not observed for large ER graphs when p is kept constant. This suggests that the degree imbalances in every node’s neighborhood vanish for large graphs under the mentioned regime. One can show that with p held constant, the degrees of individual nodes diverge in the large graph limit. The intuition for this result is similar to that for the np constant regime with c large, although the two regimes are very different mathematically.
Barabasi-Albert graphs
We analyze the behavior of the AFI in Barabasi-Albert (BA) graphs (Barabási and Albert 1999), a well-known model for understanding power-law behavior in networks. The sequence of BA graphs \(\{ \mathbb {G}_{0}, \mathbb {G}_{1}, \ldots \}\) is such that \(\mathbb {G}_{0}\) is the initial graph, and the graph grows from \(\mathbb {G}_{t}\) to \(\mathbb {G}_{t+1}\) by adding a node labelled t+1 and an edge between t+1 and a node Q_{t} chosen from the graph \(\mathbb {G}_{t}\) preferentially on the basis of degree.
The joint degree distribution p_{T}(k,ℓ) is defined as the joint probability of having an edge between two nodes with degrees k and ℓ in graph \(\mathbb {G}_{T}\). This can be written as
where D_{T, t} and \(D_{T,Q_{t}}\) are the degrees of nodes t and Q_{t} in graph \(\mathbb {G}_{T}\) respectively.
The limiting joint degree distribution exists and is defined as (Fotouhi and Rabbat 2013),
Let the AFI for the graph \(\mathbb {G}_{T}\) be denoted by AFI_{T}. We prove that AFI_{T} diverges as T grows.
Theorem 6
For a sequence of Barabasi-Albert graphs \(\mathbb {G}_{T}, T=1,2,\ldots \), the expected value of the AFI, \(\mathbb {E}[{{AFI}_{T}}]\), diverges.
Proof
The following proof uses the fact that the sum of FIs for all the nodes is greater than the sum of the FIs of nodes with degree 1. Using this insight we have
We continue by exchanging the order of the finite double sum and only summing for degrees d≤M for some fixed positive integer M
Using (38) with k=d and ℓ=1, we obtain on letting T go to infinity
Observe that the sum in (41) diverges as M is taken to infinity, and therefore the result follows. □
The above result suggests that the friendship paradox keeps growing as the size of the graph grows. This is potentially due to the increasing influence of hubs as the size of the network grows. This is in stark contrast to the behavior of assortativity, which is known to converge to 0 as the size of the BA graph grows (Newman 2002).
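The growth of the AFI with graph size can be explored with a short simulation. This is a sketch under stated assumptions, not the authors' code: the initial graph is taken to be a single edge, one edge is added per new node, the friendship index is assumed to be the mean neighbor degree divided by the node's own degree, and the names `ba_graph` and `afi` are hypothetical.

```python
import random


def ba_graph(T, rng):
    """Grow a BA graph for T steps, one edge per new node, from a single edge (assumed G_0)."""
    adj = [[1], [0]]
    endpoints = [0, 1]  # each node appears once per incident edge
    for _ in range(T):
        new = len(adj)
        target = rng.choice(endpoints)  # uniform over endpoints = proportional to degree
        adj.append([target])
        adj[target].append(new)
        endpoints.extend((new, target))
    return adj


def afi(adj):
    """Arithmetic mean of the assumed FI (mean neighbor degree over own degree)."""
    deg = [len(nbrs) for nbrs in adj]
    total = 0.0
    for i, nbrs in enumerate(adj):
        if deg[i] == 0:
            total += 1.0  # convention assumed for isolated nodes
        else:
            total += sum(deg[j] for j in nbrs) / (deg[i] * deg[i])
    return total / len(adj)


if __name__ == "__main__":
    rng = random.Random(0)
    # Average over a few independent runs, since single runs fluctuate with hub sizes.
    for T in (100, 1000, 10000):
        runs = [afi(ba_graph(T, rng)) for _ in range(20)]
        print(T, round(sum(runs) / len(runs), 2))
```

Averaging over runs matters here because the theorem concerns the expected AFI, and individual realizations can vary noticeably depending on how large the early hubs become.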
Experimental results
Network models
We study three network models: Erdos-Renyi (ER) graphs for modeling random connections; Barabasi-Albert (BA) graphs for modeling power-law behavior; and Watts-Strogatz (WS) graphs, known for capturing small-world networks. We study how global metrics such as AFI, GFI, and assortativity behave as the parameters of the models are varied, and connect the experimental results with our theoretical findings on ER and BA graphs obtained in the previous section. We also study the correlation between the local FI metrics and local assortativity (TLA). Specifically, we compute the Pearson, Spearman, and Kendall-Tau correlation coefficients between |log FI| and TLA, because the intuition developed in “Theoretical results on friendship index” section suggests that |log FI| and TLA must be negatively correlated: any deviation from the scenario where the friendship paradox does not occur is best captured by |log FI|, and this measure is expected to increase as the local assortativity (TLA) decreases.
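For readers who want to reproduce this kind of correlation study, the three coefficients can be computed as sketched below. This is a minimal pure-Python illustration, not the authors' implementation; the function names are hypothetical, the Kendall coefficient shown is the simple tau-a variant, and the per-node TLA values are assumed to be supplied by the caller, since the TLA formula is not restated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(x):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def abs_log_fi(fi_values):
    """Map local FI values to |log FI|, the deviation from the no-paradox value FI = 1."""
    return [abs(math.log(f)) for f in fi_values]
```

A call such as `spearman(abs_log_fi(fi_values), tla_values)` then gives one point on a correlation plot, where `fi_values` and `tla_values` are the per-node sequences for one graph.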
Erdos-Renyi graphs (Erdos and Rényi 1960). In Fig. 1, we show the AFI, GFI and assortativity plots for n=100,200,300,400,500 and varying 0≤p≤1. We observe that the phenomenon of the friendship paradox is minimal for both small and large values of p, with the global FI indices reaching a maximum somewhere in between. The intuition behind this finding is that as the probability of connection, p, is varied keeping n fixed, three regimes occur: (a) for small values of p, the graph mostly consists of isolated nodes and edges, which leads to low values of AFI and GFI; (b) for large values of p, the graph is very close to a clique, which again exhibits a weak friendship paradox, and therefore low values of the global FI metrics; (c) the global FI metrics reach their maximum for intermediate values of p. From Theorems 3 and 4 we know that the friendship paradox occurs in the regime np constant and does not occur for p constant. From our experiments we infer that the friendship paradox is observed in an intermediate range of p, which shrinks and gets closer to 0 as the size of the graph grows. This corresponds to the np constant regime, where the friendship paradox is known to occur. Furthermore, we observe that for any fixed p large enough that the realized graphs are dense (p>0.04 in Fig. 1), increasing the size of the graph n reduces the AFI. This is consistent with Theorem 4, which states that the AFI should converge to 1 as n grows large with p fixed.
On the other hand, assortativity is observed to be very close to 0 for almost all values of n and p. Thus even for the class of ER graphs, the global FIs exhibit behavior somewhat different from assortativity. We also observe from the heatplot that the local FI distribution peaks at 1 for all values of p, with the peak being very prominent for low values of p, less prominent as p increases, and prominent again as p increases even further. This agrees with the plots of the global metrics AFI and GFI. In Fig. 4, we show the correlation coefficient between the |log FI| values and the T-local assortativities of individual nodes. We observe that as p increases, the correlation coefficients start from high negative values close to −1, then reduce in magnitude, and then again approach −1 as p is further increased. This implies that for sparse and dense ER graphs (p<0.8), |log FI| is strongly anticorrelated with TLA. This can be explained by our previous plots, which show that the friendship paradox is strong only for a range of values of p, while it is weak for small and large values of p. Our interpretation is that for this particular range of p values where the friendship paradox exists, the anticorrelation between |log FI| and TLA is at its weakest. However, when p is further increased beyond 0.8, the correlation coefficients become unstable because both the |log FI| and the TLA values approach 0.
Barabasi-Albert graphs (Barabási and Albert 1999). In Fig. 2, we observe that the AFI and GFI increase with the number of nodes n, keeping the number of edges m fixed, which agrees with the theoretical result Theorem 6, and decrease with increasing m for fixed values of n. The intuition behind this is that with fixed m, higher values of n lead to greater occurrence of the friendship paradox, because the disparity between the degrees of the hubs and those of the leaves or low-degree nodes becomes greater. This is evident by looking at star graphs and how the global FIs increase with graph size. Increasing m and keeping n fixed leads to lower global FIs, because the low-degree nodes now have larger degrees to begin with. The assortativity approaches 0 with increasing m for reasons similar to those for the FIs. However, assortativity approaches 0 with increasing n, because of the limiting behavior of BA graphs. The heatplot of the local FI distribution indicates a peak at 1, thereby agreeing with the plots of AFI and GFI.
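The star-graph remark can be made concrete with a short calculation. The worked example below assumes that the friendship index of a node is the mean degree of its neighbors divided by its own degree, and that AFI and GFI are the arithmetic and geometric means of these values; the symbol n here denotes the number of leaves of the star, which is a choice made for this illustration.

```latex
% Star graph with one hub and n leaves (n + 1 nodes in total).
% Hub: degree n, all neighbors have degree 1. Leaf: degree 1, its neighbor has degree n.
\[
  \mathrm{FI}(\text{hub}) = \frac{1}{n}, \qquad
  \mathrm{FI}(\text{leaf}) = n .
\]
\[
  \mathrm{AFI} = \frac{1}{n+1}\left( n \cdot n + \frac{1}{n} \right)
               = \frac{n^{3}+1}{n(n+1)}
               = n - 1 + \frac{1}{n},
  \qquad
  \mathrm{GFI} = \left( n^{n} \cdot \frac{1}{n} \right)^{\frac{1}{n+1}}
               = n^{\frac{n-1}{n+1}} .
\]
```

Both quantities grow without bound as n increases, roughly linearly in n, which matches the intuition that a widening gap between hub and leaf degrees strengthens the paradox.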
We also show the correlation coefficient plot in Fig. 4 between |log FI| and TLA for BA graphs with n=1000 and varying m. We observe the two measures to be anticorrelated, with small and larger values of m leading to stronger anticorrelation.
Watts-Strogatz graphs (Watts and Strogatz 1998). In Fig. 3 we show the behavior of AFI and assortativity as the parameters of the network model, such as the size of the graph n, the probability of rewiring p, or the average degree k, are changed. We observe that with graph size fixed at n=200 nodes, increasing p increases the AFI, and increasing k for fixed p reduces the AFI. The reason for this behavior is that with increasing p, the graphs become more heterogeneous and hence the friendship paradox strengthens. Also, fixing p and increasing k reduces the AFI because a larger average degree weakens the effect of the friendship paradox. We also observe that increasing n while keeping p and k fixed increases the AFI, which saturates beyond a certain point.
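The effect of rewiring on the AFI can be reproduced qualitatively with a small simulation. The following is a sketch under assumptions rather than the experimental code behind Fig. 3: the rewiring scheme is a common textbook variant of the Watts-Strogatz construction, the friendship index is assumed to be the mean neighbor degree divided by the node's own degree, and `watts_strogatz` and `afi` are hypothetical helper names.

```python
import random


def watts_strogatz(n, k, p, rng):
    """Ring lattice with k neighbors per node (k even); each lattice edge is rewired w.p. p."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    for step in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + step) % n
            if v in adj[u] and len(adj[u]) < n - 1 and rng.random() < p:
                # Replace edge (u, v) by (u, w), avoiding self-loops and duplicate edges.
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                w = rng.choice(candidates)
                adj[u].remove(v)
                adj[v].remove(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj


def afi(adj):
    """Arithmetic mean of the assumed FI (mean neighbor degree over own degree)."""
    deg = [len(nbrs) for nbrs in adj]
    total = 0.0
    for i, nbrs in enumerate(adj):
        if deg[i] == 0:
            total += 1.0  # convention assumed for isolated nodes
        else:
            total += sum(deg[j] for j in nbrs) / (deg[i] * deg[i])
    return total / len(adj)


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (4, 8):
        for p in (0.0, 0.1, 0.5, 1.0):
            print(k, p, round(afi(watts_strogatz(200, k, p, rng)), 4))
```

At p=0 the graph is a regular ring lattice, so the AFI equals 1 exactly; any increase at larger p comes purely from the degree heterogeneity introduced by rewiring.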
From Fig. 3, we also observe that increasing p reduces the assortativity, while for a fixed p, increasing k brings the assortativity closer to 0. The reason for this observation is similar to that stated in the previous paragraph, in that increasing p makes the graph more heterogeneous and leads to the formation of hubs, while fixing p and increasing k reduces the influence of hubs. We also observe that for a larger value of n, the graphs become disassortative for smaller values of p, the reason being that hubs form for smaller values of p in larger graphs. Fig. 4 also shows strong anticorrelation between |log FI| and TLA for the different parametric settings being considered.
While we study the behavior of local and global metrics on network models, we also observe that the FI is related to assortativity in that |log FI| is anticorrelated to TLA for most regimes in the three network models being studied. This strengthens the argument that the phenomenon of friendship paradox and assortative mixing are related.
Real world networks
Description of networks. We consider Social (S) networks like the Hamsterster, Brightkite, Douban, Gowalla and Hyves datasets; Human Social (HS) networks of Jazz musicians and the Zachary karate club; Human Contact (HC) networks at the ACM Hypertext conference held in Turin and the INFECTIOUS:STAY AWAY exhibit in Dublin, both in 2009; Computer (C) networks formed by Route views and the Internet topology; and Infrastructure (I) networks like the Power grid and the Euroroad datasets. All of the network datasets were taken from the Koblenz network collection (Kunegis 2013).
Study on the local metrics. In Figs. 5 and 6, we show the scatter plot between the local FI values and the T-local assortativities for the Hamsterster and the EuroRoad networks respectively. We observe that when T-local assortativity is positive the log FI is close to 0, while for negative local assortativity the log FI could be either more negative or positive. This gives rise to a bell-shaped curve between local assortativity and log FI, and negative correlation between local assortativity and |log FI| as observed in the scatter plots. Therefore, if the local assortativity is positive then the friendship paradox would be low, leading to log FI being close to 0. However, if the local assortativity is negative, then there would be a significant occurrence of the friendship paradox, thereby leading to either more negative or positive values of log FI, and hence higher |log FI|. This justifies the significant negative correlation between TLA and |log FI| as observed in Table 1.
Study on the global metrics. We consider 13 real networks of different types. We report basic network parameters such as the number of nodes and edges, global metrics such as assortativity r, and the proposed metrics AFI and GFI. We also report the Pearson correlation coefficient ρ_{P} and Spearman’s rank correlation coefficient ρ_{s} between the local FI measure in absolute value, |log FI|, and local assortativity r_{T}.
We observe that while most of the social networks have negative assortativity, all of them exhibit the friendship paradox very strongly. On the other hand, the human social and contact networks exhibit the friendship paradox very weakly, partly due to the small size of these networks. Computer networks like the Route views and Internet topology show negative assortativity and a strong friendship paradox, while infrastructure networks show positive assortativity and a weak friendship paradox. Experimental results from Momeni and Rabbat (2018) also show that networks with high assortativity (Collaboration in Figure 2 of the paper) have many nodes that do not exhibit the friendship paradox, while most nodes exhibit the paradox for networks with negative assortativity (Friendster). Lee et al. (2019) also observed that the median-based and the fraction-based perception models correlate negatively with assortativity, which does not quite hold for the mean-based model. Note that the aggregate measures used by Lee et al. (2019) are based on binary measures corresponding to each node, which is substantially different from our setting. As argued previously, we also observe that local assortativity is negatively correlated to |log FI| very strongly for most considered networks.
Concluding remarks
We first propose a metric called the friendship index that captures the phenomenon of the friendship paradox, locally, from the perspective of an individual node. We then aggregate these metrics to globally capture the network-level friendship paradox. The arithmetic mean of the FI values, AFI, is found to have operational significance in sampling, and is also found to be mathematically suited for analysis of network models. On the other hand, the geometric mean, GFI, is found to better adjust the range of FI values about the FI value for which the friendship paradox does not occur. We lower bound the global metrics, AFI and GFI, and show that the lower bound is achieved only for the class of regular graphs where the friendship paradox does not occur, thereby showing that the proposed aggregate measures are well-behaved. We theoretically demonstrate that large random (ER) graphs with average degree kept constant exhibit the friendship paradox. However, if the connection probability is kept constant, then large ER graphs do not exhibit the paradox anymore. The results indicate that the sampling bias disappears for large random graphs as the average degree of nodes gets large or very small. We also theoretically show that the paradox exists for the BA graph and gets stronger as the size of the graph grows. This behavior is very different from global assortativity, which is known to converge to 0 for both the ER and BA graphs. This is because the friendship index measures the sampling bias by choosing a random neighbor instead of a random node, and the assortativity only compares the degrees of the two endpoints of a random edge. Nevertheless, the two phenomena are not unrelated.
In fact, the local FI measure can shed light on the local assortativity of the network. Experimental results on network models and real-world networks suggest that the local FI measures and local assortativity (TLA) are closely related. High values of TLA would mean that the node is connected to similar nodes and the |log FI| measure would be close to 0, while small values of TLA would mean that the node is connected to dissimilar nodes and |log FI| would be greater. More broadly, the FI measure captures the imbalance between a node and its neighbors’ degrees along with the direction of imbalance, while assortativity only serves as a measure for indicating imbalance between two nodes connected by a random edge. In conclusion, although the friendship paradox and assortativity measures have very different functional forms due to their widely different motivations, they are nonetheless related to each other. Future work could focus on finding theoretical relationships between the two concepts in general graphs, or in certain classes of graphs where the analysis is tractable.
Appendix
Proof of lemma 3
To prove Lemma 3, we use the Borel-Cantelli lemma (Billingsley 2008), which we state below.
Lemma 4
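Let E_{1},E_{2},… be a sequence of events in some probability space. If the sum of the probabilities of {E_{n}} is finite,
\[ \sum_{n=1}^{\infty} \mathbb{P}\left[E_{n}\right] < \infty, \]
then the probability that infinitely many of them occur is 0, i.e.,
\[ \mathbb{P}\left[\limsup_{n \rightarrow \infty} E_{n}\right] = 0. \]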
For a fixed ε>0 and n=1,2,…, we define the following event,
We will use a bounding argument on these events and apply the Borel-Cantelli lemma to prove a.s. convergence.
Using Markov’s inequality, we have the upper bound
The second term in (42) goes to 0 exponentially fast. The first term can be written as
We work with the term in the inner expectation,
For a given choice of q, r, s, t, we define
The first term in the product can be written as
Observe that the terms (χ_{q, z}−p), z=r, s, t are mutually independent with respect to the terms \( \left (\Omega ^{\prime }_{i,z} - (n-5)p \right), \ z=r,s,t\). Therefore, such terms will not contribute. The product between terms of the form (χ_{q,z}−p), z=q, r, s, t will be finite and bounded by a constant, say c_{0}. Therefore, we need only consider the product of the terms \( \left (\Omega ^{\prime }_{i,z} - (n-5)p \right), \ z=q,r,s,t\). There are several types of products that have to be considered separately: (a) For q≠r≠s≠t, the contribution is 0 because all the terms are mutually independent; (b) For q=r, r≠s≠t, the terms \( \left (\Omega ^{\prime }_{i,z} - (n-5)p \right) \) for z=s, t are mutually independent with respect to z=q. Hence, such terms also do not make any contribution; (c) For q=r≠s=t, the two terms \( \left (\Omega ^{\prime }_{i,z} - (n-5)p \right)^{2}\), z=q, s will have a contribution, which will be discussed later; (d) For q=r=s≠t, the term \( \left (\Omega ^{\prime }_{i,t} - (n-5)p \right)\) is independent of the other three terms. The expectation of this term is 0, therefore leading to no contribution; (e) For q=r=s=t, the contribution of such terms needs to be considered.
Using the above insights and continuing from (44), we have
We write
and
where \(c_{0} = \mathbb {E}\left [{\left (\chi _{1,2} - p \right)^{2}}\right ]\), and \( c_{1} = \mathbb {E}\left [{\left (\chi _{1,2} - p \right)^{4}}\right ] \). Substituting (46) and (47) into (45), we obtain
Substituting (48) into (42), we obtain
Observe that,
holds, and therefore the result follows using the Borel-Cantelli lemma.
Proof of theorem 5
We will use the bounded convergence theorem (Billingsley 2008) to prove convergence in expectation of the FIs.
Recall that
Since 1[D_{n, i}=0]≤1, we only consider the first term in (51). In order to apply the Chernoff-Hoeffding bound, we fix δ>0 and consider the event {D_{n, i}<n(p−δ)}. We have the following decomposition
The first term in (52) is upper bounded as
where the upper bound follows by noting that the sum of the degrees of the neighbors of i is bounded by n^{2}. Furthermore, we also have the upper bound
where D(p‖p−δ) is the KL divergence between Bernoulli rvs with parameters p and p−δ, and the final step is obtained by the Chernoff-Hoeffding bound. Since n^{2}e^{−nD(p‖p−δ)}→0 as n→∞, there exists a constant c_{δ} such that
Therefore using the upper bounds (53) and (55), we obtain that
Since all the rvs FI_{n}(i) are bounded, the bounded convergence theorem yields the result.
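The decay step used in the proof, namely that n^{2}e^{−nD(p‖p−δ)} vanishes as n grows, can be checked numerically. The snippet below is only an illustration: the function names are hypothetical, the argument order of the divergence follows the text as written, and the values p=0.5 and δ=0.1 are arbitrary choices.

```python
import math


def kl_bernoulli(a, b):
    """KL divergence D(a || b) between Bernoulli(a) and Bernoulli(b), for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def tail_term(n, p, delta):
    """The quantity n^2 * exp(-n * D(p || p - delta)) appearing in the bound."""
    return n * n * math.exp(-n * kl_bernoulli(p, p - delta))


if __name__ == "__main__":
    # The polynomial factor n^2 wins at first, then the exponential decay takes over.
    for n in (10, 100, 1000, 5000):
        print(n, tail_term(n, 0.5, 0.1))
```

Because the term rises before it falls, the constant c_{δ} in the proof has to cover the small-n regime as well as the eventual exponential decay.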
Availability of data and materials
All of the network datasets were taken from the Koblenz network collection, which is available at http://konect.uni-koblenz.de/.
Notes
1. This is a specific type of homophily, using degree as a trait.
Abbreviations
AFI: Arithmetic friendship index
BA: Barabasi-Albert
ER: Erdos-Renyi
FI: Friendship index
GFI: Geometric friendship index
HFI: Harmonic friendship index
TLA: Thedchanamoorthy’s local assortativity
References
Bagrow, JP, Danforth CM, Mitchell L (2017) Which friends are more popular than you?: Contact strength and the friendship paradox in social networks. In: Proceedings of the IEEE/ACM Int’l Conference on Advances in Social Networks Analysis and Mining 2017, 103–108. ACM, New York.
Barabási, AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512.
Billingsley, P. (2008) Probability and measure. Wiley, New York.
Bollen, J, Gonçalves B, van de Leemput I, Ruan G (2017) The happiness paradox: your friends are happier than you. EPJ Data Sci 6(1):4.
Eom, YH, Jo HH (2014) Generalized friendship paradox in complex networks: The case of scientific collaboration. Sci Rep 4:4603.
Erdos, P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5(1):17–60.
Feld, SL (1991) Why your friends have more friends than you do. Am J Sociol 96(6):1464–77.
Fotouhi, B, Rabbat MG (2013) Degree correlation in scale-free graphs. Eur Phys J B 86(12):510.
Fotouhi, B, Momeni N, Rabbat MG (2014) Generalized friendship paradox: An analytical approach. In: Int Conf Soc Inform, 339–352. Springer, Cham.
Hodas, NO, Kooti F, Lerman K (2013) Friendship paradox redux: Your friends are more interesting than you. Int AAAI Conf Web Soc Med (ICWSM) 13:8–10.
Jackson, MO (2016) The friendship paradox and systematic biases in perceptions and social norms. Journal of Political Economy 127(2):777–818.
Jo, HH, Eom YH (2014) Generalized friendship paradox in networks with tunable degree-attribute correlation. Phys Rev E 90(2):022809.
Kunegis, J (2013) KONECT: The Koblenz network collection. http://konect.uni-koblenz.de/.
Lee, E, Lee S, Eom YH, Holme P, Jo HH (2019) Impact of perception models on friendship paradox and opinion formation. Phys Rev E 99(5):052302.
Leskovec, J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 631–636. ACM, New York.
Momeni, N, Rabbat M (2016) Qualities and inequalities in online social networks through the lens of the friendship paradox. PloS one 11(2):e0143633.
Momeni, N, Rabbat M (2018) Effectiveness of alter sampling in social networks. arXiv:1812.03096v2.
Newman, ME (2002) Assortative mixing in networks. Phys Rev Lett 89(20):208701.
Pal, S, Yu F, Novick Y, Swami A, Bar-Noy A (2018) Quantifying the strength of the friendship paradox. In: International Workshop on Complex Networks and Their Applications, 460–472. Springer, Cham.
Piraveenan, M, Prokopenko M, Zomaya AY (2010) Classifying complex networks using unbiased local assortativity. In: Conf Artif Life, 329–336.
Thedchanamoorthy, G, Piraveenan M, Kasthuriratna D, Senanayake U (2014) Node assortativity in complex networks: An alternative approach. Procedia Comput Sci 29:2449–2461.
Watts, DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440.
Acknowledgements
SP acknowledges that his prior discussions with Prof. Armand Makowski at the University of Maryland, College Park were helpful in connecting the definitions of friendship index and assortativity with those of various sampling methods.
Funding
Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF0920053 (the ARL Network Science CTA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.
Author information
Contributions
SP worked on conceptualization, methodology, theoretical analysis, experimental design, writing, review and editing, and acquisition of funding for the paper. FY worked on conceptualization, methodology, software, and theoretical analysis. YN worked on conceptualization, experimental analysis, and writing. AS worked on conceptualization, supervision of research, and review and editing. ABN worked on conceptualization, supervision of research and acquisition of funding. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Feng Yu.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Pal, S., Yu, F., Novick, Y. et al. A study on the friendship paradox – quantitative analysis and relationship with assortative mixing. Appl Netw Sci 4, 71 (2019). https://doi.org/10.1007/s41109-019-0190-8
Keywords
 Friendship paradox
 Assortativity
 Assortative mixing
 Network analysis
 Network models
 Homophily