The solution of Problem 1 includes two steps:

1
extract a backbone graph \(\mathcal {G}_b=(V,E_b,p_b)\) from the original graph \(\mathcal {G} = (V,E,p)\) such that \(|E_b| = \alpha |E|\), where \(\alpha\) is the sparsification ratio,

2
modify the probabilities of the edges in \(E_b\) such that each node’s ego betweenness is as close as possible to its value in the original graph. The resulting graph is a sparsified graph \(\mathcal {G}' = (V, E', p')\), where \(|E'| = |E_b| = \alpha |E|\). As we see in the “Solution framework” section, \(E' = E_b\) in the output of the GradientDescent algorithm (see the “Gradient-descent (GD)” section), whereas \(E' \subset E\) in the output of the ExpectationMaximization algorithm, with \(E'\) and \(E_b\) not necessarily equal (see the “Expectation-maximization (EM)” section).
Therefore, we define ego betweenness discrepancy as follows:
Definition 2
(ego betweenness discrepancy) Given a probabilistic network \(\mathcal {G}\) and a sparsified network \(\mathcal {G}'\), ego betweenness discrepancy of node u is:
$$\begin{aligned} \delta (u)\,=\,EB_{\mathcal {G}}(u) - EB_{\mathcal {G}'}(u) \end{aligned}$$
(3)
where \(EB_{\mathcal {G}}(u)\) is the ego betweenness of u in \(\mathcal {G}\) and \(EB_{\mathcal {G}'}(u)\) is its ego betweenness in \(\mathcal {G}'\).
Formally, in the second step of the solution we aim to minimize \(D = \sum _{v \in V} |\delta (v)|\). Linear programming (LP) can find the global minimum of D. However, it has been shown that LP is not only inefficient on large graphs but also does not explicitly reduce entropy (Parchas et al. 2018). Therefore, we adapt the GradientDescent (GD) and ExpectationMaximization (EM) algorithms of Parchas et al. (2018) to approximate the optimal probability adjustment in a small fraction of the time required by LP while decreasing the entropy. Since both algorithms require a differentiable objective function, and \(\sum _{v \in V} |\delta (v)|\) is not differentiable at 0, we use \(D = \sum _{v \in V} \delta ^2(v)\) (Parchas et al. 2018) as the objective function hereinafter.
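The two objective functions can be compared on a toy example. The following is a minimal sketch, assuming that the discrepancy is the signed difference of Eq. 3 and that the ego betweenness values of both graphs are already available as dictionaries; the function names and the toy values are illustrative and not taken from the original implementation.

```python
def discrepancies(eb_original, eb_sparsified):
    """Signed discrepancy delta(u) = EB_G(u) - EB_G'(u) for every node (Eq. 3)."""
    return {u: eb_original[u] - eb_sparsified.get(u, 0.0) for u in eb_original}


def absolute_objective(delta):
    """Sum of absolute discrepancies; not differentiable where a discrepancy is 0."""
    return sum(abs(d) for d in delta.values())


def squared_objective(delta):
    """Sum of squared discrepancies; the differentiable objective D used by GD and EM."""
    return sum(d * d for d in delta.values())


# Toy ego betweenness values (hypothetical numbers, for illustration only).
eb_g = {"a": 1.2, "b": 0.5, "c": 0.0}
eb_s = {"a": 0.9, "b": 0.7, "c": 0.0}

delta = discrepancies(eb_g, eb_s)
d_abs = absolute_objective(delta)
d_sq = squared_objective(delta)
```

Both objectives vanish exactly when every discrepancy is zero, but the squared form penalizes large individual discrepancies more strongly than many small ones.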
Backboning
In this section we briefly introduce the four backboning methods used in this research.

1
Noise corrected (NC): The first backboning method is a simplified version of the noise corrected method (Coscia and Neffke 2017; Coscia and Rossi 2019), in which an edge is kept if its probability is higher than the sum of the expected degrees of its incident nodes divided by the total number of edges incident to these two nodes. In the NC backboning algorithm, the edge under consideration is excluded from the calculation of this ratio.

2
Maximum Spanning Tree (MST): The second method is the iterative spanning tree method (Nagamochi and Ibaraki 1992; Parchas et al. 2018). First, we construct the backbone graph and initialize it with the same set of nodes as the original graph and an empty set of edges. Then, in the first iteration of the algorithm, we remove the edges of the spanning tree of \(\mathcal {G}\) and add them to the backbone. In each subsequent iteration, we compute the spanning tree/forest of the remaining graph and move the selected edges from the remaining graph to the backbone. This procedure is repeated until the backbone includes \(\alpha |E|\) edges.

3
Monte Carlo (MC): The third method is Monte Carlo sampling, through which \(\alpha |E|\) edges of the input graph are sampled.

4
Hybrid (MST/MC): The fourth method is the combination of the second and the third methods (Parchas et al. 2018). First, \(\alpha '|E|\) edges, where \(\alpha ' < \alpha\), are selected via the iterative spanning tree method, and then \((\alpha - \alpha ')|E|\) edges are sampled via the Monte Carlo sampling method.
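The iterative spanning tree method and its hybrid extension can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the spanning forest is taken to be the maximum-probability one (Kruskal's algorithm), the target size is rounded to the nearest integer, and the Monte Carlo step is assumed to pick a remaining edge uniformly at random and accept it with its own probability. All function names are hypothetical.

```python
import random


def max_spanning_forest(nodes, edges):
    """Kruskal's algorithm on edges sorted by decreasing probability."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    for (u, v), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.append((u, v))
    return chosen


def spanning_tree_backbone(nodes, edges, alpha):
    """Move spanning forests from the remaining graph into the backbone."""
    target = round(alpha * len(edges))
    remaining = dict(edges)
    backbone = {}
    while len(backbone) < target and remaining:
        forest = max_spanning_forest(nodes, remaining)
        if not forest:  # only self-loops left; nothing more can be moved
            break
        for e in forest:
            if len(backbone) == target:
                break
            backbone[e] = remaining.pop(e)
    return backbone


def hybrid_backbone(nodes, edges, alpha, alpha_prime, rng=None):
    """Spanning tree backbone with alpha' |E| edges, topped up by sampling."""
    rng = rng or random.Random(0)
    backbone = spanning_tree_backbone(nodes, edges, alpha_prime)
    target = round(alpha * len(edges))
    # Assumed Monte Carlo step: accept a random remaining edge with probability p_e.
    rest = [e for e in edges if e not in backbone and edges[e] > 0]
    while len(backbone) < target and rest:
        e = rng.choice(rest)
        if rng.random() < edges[e]:
            backbone[e] = edges[e]
            rest.remove(e)
    return backbone
```

Here `edges` maps an undirected edge `(u, v)` to its probability. With `alpha_prime` equal to `alpha`, the hybrid function reduces to the pure spanning tree method.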
We illustrate the differences between the backboning methods on the example of a complete graph \(K_{10}\), as shown in Fig. 2. For all backboning methods we repeat the procedures as long as the backbone remains a single connected component. The first column in Fig. 3 shows the four backbones with \(\alpha\) = 0.31 resulting from the NC, MST, MST/MC (\(\alpha '\) = 0.155) and MC methods, respectively. Although the edges with the highest probabilities are most likely to be represented, there are still considerable differences among the four resulting backbones (e.g., the edge with probability 0.95 is only present in three of the four backbones).
In the following two sections we describe the GradientDescent and the ExpectationMaximization algorithms. The first one only modifies edge probabilities, whereas the second one rewires the backbone in addition to modifying edge probabilities.
Gradient-descent (GD)
Given \(\mathcal {G}_b=(V,E_b,p_b)\), the GradientDescent algorithm picks one edge \(e \in E_b\) in each iteration and optimizes that edge’s probability. To achieve this goal, in each iteration we have to reduce the objective function \(D = \sum _{v \in V} \delta ^2(v)\). According to Eq. 3, if the probability of the edge \(e = (u,v)\) changes by \(\partial p_e^{i+1}\) at iteration \(i+1\), then the discrepancies of two groups of nodes will change: first, the discrepancies of the nodes incident to that edge, i.e., u and v, and second, the discrepancies of the common neighbors of the incident nodes. The discrepancies of the nodes that are not members of these two groups are not affected by \(\partial p_e^{i+1}\). The derivative of the objective function at iteration \(i+1\) with respect to the change of \(p_e\) at that iteration is then:
$$\begin{aligned} \frac{\partial D^{i+1}}{\partial p_e^{i+1}} = - 2 \frac{\partial \delta ^{i+1}(u)}{\partial p_e^{i+1}} - 2 \frac{\partial \delta ^{i+1}(v)}{\partial p_e^{i+1}} + 2 \sum _{w \in W(u,v)} \frac{\partial \delta ^{i+1}(w)}{\partial p_e^{i+1}} \end{aligned}$$
(4)
where W(u, v) is the set of common neighbors of nodes u and v.
In Fig. 4, if the probability of edge \(e = (u,v)\) increases, the ego betweenness of node u increases, because, first, nodes \(x_1\) and \(x_2\) rely more on u to be connected to v and, second, node v relies relatively more on node u to connect to the common neighbors \(w_1\) and \(w_2\) than on the direct connections \((v,w_1)\) and \((v,w_2)\). At the same time, if the probability of e increases, the ego betweenness of the common neighbors \(w_1\) and \(w_2\) decreases, as nodes v and u rely relatively more on their direct edge (u, v) than on the two-hop paths that cross nodes \(w_1\) and \(w_2\).
In the following we express the change of discrepancies based on the change of \(p_e\) (probability of edge (u, v) in Fig. 4) at iteration \(i+1\) based on the aforementioned intuitions. Equations 5 and 6 represent the change of discrepancies on nodes adjacent to the edge e, and Eq. 7 shows the change of discrepancies of the common neighbors of u and v.
$$\begin{aligned}&\delta ^{i+1}(u) = \delta ^{i}(u) - (p_{uv}^{i+1} - p_{uv}^{i}) \Bigg (\underbrace{\sum _{w \in W(u,v)}p_{uw}(1-p_{vw})+\sum _{x \in N(u)\backslash W(u,v)}p_{ux}}_{C_u}\Bigg ) \end{aligned}$$
(5)
$$\begin{aligned}&\delta ^{i+1}(v) = \delta ^{i}(v) - (p_{uv}^{i+1} - p_{uv}^{i}) \Bigg (\underbrace{\sum _{w \in W(u,v)}p_{vw}(1-p_{uw})+\sum _{y \in N(v)\backslash W(u,v)}p_{vy}}_{C_v} \Bigg ) \end{aligned}$$
(6)
$$\begin{aligned}&\delta ^{i+1}(w_j) = \delta ^{i}(w_j) - (p_{uv}^{i+1} - p_{uv}^{i})(\underbrace{-p_{uw_j}p_{vw_j}}_{C(w_j)}) \end{aligned}$$
(7)
where N(u) is the set of neighbors of u, \(W(u,v) = N(u) \cap N(v)\) is the set of common neighbors of u and v, and \(p_{uv}^{i}\) is the probability of edge (u, v) at iteration i. Hence, by calculating \(p_{uv}^{i+1}\) as follows, we are assured that \(\sum _{v \in V} \delta ^2(v)\) moves one step closer to a local minimum:
$$\begin{aligned} p_{uv}^{i+1} = p_{uv}^{i} + \Delta p \,\,\,\,,\,\,\,\, \Delta p = h \frac{\sum _{k \in \{u\} \cup \{v\} \cup W(u,v)} C_k \delta ^{i}(k)}{\sum _{k \in \{u\} \cup \{v\} \cup W(u,v)} {C_k}^2} \end{aligned}$$
(8)
where, \(0 < h \le 1\) is the gradient descent step size. For proof see Appendix 1.
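The update of a single edge probability according to Eqs. 5 to 8 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the sparsified graph is stored as a symmetric dictionary of dictionaries, the coefficient of a common neighbor is taken to be negative (raising \(p_{uv}\) lowers its ego betweenness, as argued for Fig. 4), the endpoint v is skipped when summing over the non-common neighbors of u (and vice versa), and the result is clamped to [0, 1]. The name `gd_step` is hypothetical.

```python
def gd_step(adj, delta, u, v, h=1.0):
    """Return the updated probability of edge (u, v) following Eq. 8.

    adj   : dict node -> dict neighbor -> probability (symmetric)
    delta : dict node -> signed discrepancy at the current iteration
    h     : gradient descent step size, 0 < h <= 1
    """
    common = (set(adj[u]) & set(adj[v])) - {u, v}

    coeff = {}
    # C_u: common neighbors weighted by the absence of the edge (v, w),
    # plus the remaining neighbors of u (v itself skipped, an assumption).
    coeff[u] = sum(adj[u][w] * (1.0 - adj[v][w]) for w in common) + sum(
        p for x, p in adj[u].items() if x not in common and x != v
    )
    # C_v: the same expression with the roles of u and v exchanged.
    coeff[v] = sum(adj[v][w] * (1.0 - adj[u][w]) for w in common) + sum(
        p for y, p in adj[v].items() if y not in common and y != u
    )
    # C(w): assumed negative for every common neighbor w.
    for w in common:
        coeff[w] = -adj[u][w] * adj[v][w]

    denom = sum(c * c for c in coeff.values())
    if denom == 0.0:
        return adj[u][v]  # the edge influences no discrepancy; leave it unchanged

    step = h * sum(coeff[k] * delta[k] for k in coeff) / denom
    # Keep the probability inside [0, 1].
    return min(1.0, max(0.0, adj[u][v] + step))
```

With h = 1 the step is the least-squares solution for the affected nodes, that is, the value of \(\Delta p\) minimizing \(\sum _k (\delta ^{i}(k) - C_k \Delta p)^2\); smaller values of h take a more conservative step in the same direction.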
Algorithm 1 illustrates the GradientDescent algorithm, in which the objective function converges to a local minimum. In line 1, the sparsified graph is initialized with the backbone graph (\(\mathcal {G}_b\)). Then, the algorithm takes iterative steps to reach a local minimum of D. In each iteration, it picks an edge and assigns a new probability to it according to Eq. 8 (line 5). Lines 6–10 ensure that the new probability value does not violate the constraint \(0 \le p \le 1\). The objective function is calculated at the beginning and at the end of each iteration, in lines 3 and 12 respectively. If the difference between these two values is equal to or lower than the input threshold \(\tau _{GD}\), the algorithm terminates.
The second column in Fig. 3 shows the sparsified graphs obtained by applying the GradientDescent algorithm to the four backbones.
Expectation-maximization (EM)
Algorithm 1 only modifies the probabilities of the edges of the input backbone graph \(\mathcal {G}_b\). Therefore, the output sparsified graph \(\mathcal {G}'\) depends not only on the probability modification of the GradientDescent (GD) algorithm but also on \(\mathcal {G}_b\). Parchas et al. (2018) proposed the ExpectationMaximization (EM) algorithm, which both rewires \(\mathcal {G}_b\) and modifies edge probabilities.
The objective function of the EM algorithm is \(\sum _{v \in V} \delta ^2(v)\). Algorithm 2 illustrates the EM algorithm. The EM algorithm first initializes \(\mathcal {G}'\) with the input backbone graph \(\mathcal {G}_b\). In lines 2–20, the algorithm replaces each edge in \(E'\) with an edge in \(E\backslash E'\) that yields a lower D. In more detail, in lines 5–6 the selected edge is removed from \(\mathcal {G}'\) and the discrepancies of all affected nodes are updated. Line 7 selects the node with the highest discrepancy, \(v_c\). In lines 8–15, all edges incident to \(v_c\) that are not present in the current \(\mathcal {G}'\) are examined, and the one with the maximum gain is added to \(\mathcal {G}'\). Line 18 runs the GD algorithm to modify the edge probabilities of \(\mathcal {G}'\) in that iteration. The objective function is calculated at the beginning and at the end of each iteration, and if the difference between these two values is equal to or lower than the input threshold \(\tau _{EM}\), the algorithm terminates.
The gain of a candidate edge \(e_c\) is calculated as follows:
$$\begin{aligned} gain(e_c) = D(\mathcal {G}') - D(\mathcal {G}'+e_c) \end{aligned}$$
(9)
where \(D(\mathcal {G}')\) is the objective function computed on \(\mathcal {G}'\) and \(D(\mathcal {G}'+e_c)\) is the objective function computed on \(\mathcal {G}'\) after adding \(e_c\).
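The gain of Eq. 9 can be evaluated by recomputing the objective before and after tentatively adding the candidate edge. The sketch below is only illustrative. Since ego betweenness is not restated in this section, it assumes the pairwise form \(EB(u) = \sum _{\{x,y\} \subseteq N(u)} p_{ux}\,p_{uy}\,(1-p_{xy})\), which is consistent with the coefficients in Eqs. 5 to 7 but is a stand-in rather than the paper's definition. The function names and the graph representation are likewise hypothetical.

```python
from itertools import combinations


def expected_ego_betweenness(adj, u):
    """Assumed form: u lies between x and y if both (u, x) and (u, y) exist
    and the direct edge (x, y) does not."""
    nbrs = adj.get(u, {})
    total = 0.0
    for x, y in combinations(list(nbrs), 2):
        p_xy = adj.get(x, {}).get(y, 0.0)
        total += nbrs[x] * nbrs[y] * (1.0 - p_xy)
    return total


def objective(adj_original, adj_sparse):
    """D = sum of squared discrepancies over all nodes of the original graph."""
    return sum(
        (expected_ego_betweenness(adj_original, u)
         - expected_ego_betweenness(adj_sparse, u)) ** 2
        for u in adj_original
    )


def gain(adj_original, adj_sparse, edge, p):
    """gain(e_c) = D(G') - D(G' + e_c) for a candidate edge with probability p."""
    u, v = edge
    before = objective(adj_original, adj_sparse)
    # Work on a copy so that the current sparsified graph is left untouched.
    trial = {n: dict(nb) for n, nb in adj_sparse.items()}
    trial.setdefault(u, {})[v] = p
    trial.setdefault(v, {})[u] = p
    return before - objective(adj_original, trial)
```

A positive gain means that adding the candidate edge lowers D. Recomputing D over all nodes is the simplest choice; because only the endpoints of the edge and their common neighbors are affected, an implementation concerned with speed would update just those discrepancies.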
It should be noted that if the probability of an edge becomes zero in the final output of the GradientDescent or ExpectationMaximization algorithms, that edge is removed from the sparsified graph. This is because, according to the definition of probabilistic networks, an edge probability has to be in the range (0, 1]. As a result, the condition \(|E'| = \alpha |E|\) becomes \(|E'| \approx \alpha |E|\). Notice that the condition \(|E'| = \alpha |E|\) cannot be satisfied exactly in practice anyway, because \(\alpha |E|\) may be a non-integer number of edges. As a result, the sparsified graph will only contain approximately \(\alpha |E|\) edges.
An easy way to prevent probabilities from reaching 0 would be to set a minimum probability of \(\epsilon > 0\) in Algorithm 1, line 9. A small \(\epsilon\) would keep the edge without having any significant impact on the measures. However, the objective of sparsification is to reduce the size and entropy of the network, so in our opinion an algorithm that may lead to a slightly lower size and entropy than requested is practically reasonable, without needing any ad hoc fine-tuning.