Understanding the limitations of network online learning
Applied Network Science volume 5, Article number: 60 (2020)
Abstract
Studies of networked phenomena, such as interactions in online social media, often rely on incomplete data, either because these phenomena are partially observed, or because the data is too large or expensive to acquire all at once. Analysis of incomplete data leads to skewed or misleading results. In this paper, we investigate limitations of learning to complete partially observed networks via node querying. Concretely, we study the following problem: given (i) a partially observed network, (ii) the ability to query nodes for their connections (e.g., by accessing an API), and (iii) a budget on the number of such queries, sequentially learn which nodes to query in order to maximally increase observability. We call this querying process Network Online Learning and present a family of algorithms called NOL*. These algorithms learn to choose which partially observed node to query next based on a parameterized model that is trained online through a process of exploration and exploitation. Extensive experiments on both synthetic and real world networks show that (i) it is possible to sequentially learn to choose which nodes are best to query in a network and (ii) some macroscopic properties of networks, such as the degree distribution and modular structure, impact the potential for learning and the optimal amount of random exploration.
Introduction
Incomplete datasets are common in the analysis of networks because the phenomena under study are often partially observed. It has been shown that analysis of incomplete networks may lead to biased results (Sanz et al. 2012; Ghosh et al. 2013; González-Bailón et al. 2014; Sampson et al. 2015; Alves et al. 2020). Our work seeks to address the following problem: Given a partially observed network with no information about how it was observed and a budget to query the partially observed nodes, can we learn to sequentially ask optimal queries relative to some objective? In addition to introducing new methodology, we study when learning is feasible in this problem (see Fig. 1).
We present a family of online learning algorithms, called NOL*, which are based on online regression and related to reinforcement learning techniques like approximate Q-learning. NOL* algorithms do not assume any a priori knowledge or estimate of the true underlying network structure, overall size, or the sampling method used to collect the partially observed network.
We describe two algorithms from the NOL* family, both of which learn an interpretable policy for growing an incomplete network through successive node queries. Interpretability of these algorithms is important because we want to be able to reason about when and why a policy is or is not learnable. The first algorithm, referred to as NOL^{Footnote 1}, uses online linear regression as a model for predicting the value of querying available nodes, then uses those predictions to choose the best node to query next. The second algorithm, NOL-HTR, uses a heavy-tail regression method to account for heavy-tailed (equivalently heterogeneous) reward functions. This is necessary because in the case of a heavy-tailed reward distribution such as node degree, NOL will underpredict the objective for hubs (which are the “big and rare” instances).
We conduct experiments using NOL* algorithms on graphs generated by synthetic models, as well as real-world network data.^{Footnote 2} We focus on the following objective or reward function: discover the maximum number of initially unobserved nodes in the network through successive node querying.
Formal problem definition Given an incomplete network \({\hat {G}}_{0} = \left \{{\hat {V}}_{0}, {\hat {E}}_{0}\right \}\), which is a partial observation of an underlying network G={V,E}, sequentially learn a policy that maximizes the number of nodes \(u \in {\hat {V}}_{b}, u \notin {\hat {V}}_{0}\) after b queries of the incomplete network. Querying the network involves selecting a node and asking an oracle or an API for all the neighbors of the selected node.
This problem is a sequential decision-learning task that can be formulated as a Markov Decision Process (MDP), where the state of the process at any time step is the partially observed network, the action space is the set of partially observed nodes available for probing, and the reward is a user-defined function (e.g., the increase in the number of observed nodes). The goal of an MDP learning algorithm is to learn a mapping from states to actions such that an agent using this mapping will maximize its expected reward. In our case, the state-action space of the problem can be arbitrarily large given that we make no assumptions about the underlying network; thus any solution will need to learn to generalize over states and actions from experience (i.e., answers to successive queries).
Our framework NOL* can be viewed as solving an MDP problem. Its algorithms learn models to predict the expected reward gained by probing^{Footnote 3} a partially observed node. The current model is then used at each time step to decide the best action (i.e., which node to query to observe as many new nodes as possible). In this way, NOL* algorithms choose the action that leads to maximizing the total reward over time within an arbitrary budget constraint.
NOL* algorithms query nodes in the partially observed network sequentially. That is, in each iteration the algorithm queries one node, adds all of its neighbors to the observed network, and updates the pool of partially observed nodes available to be queried in the next iteration. This makes NOL* algorithms adaptive, since the parameters of the model change based on recent experience. Initially, all nodes in the network are assumed to be partially observed, thus NOL* is agnostic to the underlying observation or sampling method. At any iteration, there are three “classes” of nodes: fully observed (probed), partially observed (unprobed but visible) and unobserved (unprobed and invisible).
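The querying step and the three node classes described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the adjacency-dict representation, the function names `probe` and `node_classes`, and the toy network are all assumptions made for the example.

```python
def probe(true_adj, observed_adj, probed, node):
    """Query `node` against the oracle and merge its neighbors into the sample.

    Returns the reward: the number of previously unobserved nodes revealed.
    """
    before = set(observed_adj)
    for nbr in true_adj[node]:
        observed_adj.setdefault(node, set()).add(nbr)
        observed_adj.setdefault(nbr, set()).add(node)
    probed.add(node)
    return len(set(observed_adj) - before)


def node_classes(true_adj, observed_adj, probed):
    """Split nodes into fully observed, partially observed, and unobserved."""
    visible = set(observed_adj)
    return probed & visible, visible - probed, set(true_adj) - visible


# Hypothetical underlying network G (hidden from the learner) and initial sample.
G = {0: {1, 2, 3}, 1: {0}, 2: {0, 4}, 3: {0}, 4: {2}}
sample = {0: {1}, 1: {0}}
probed = set()

reward = probe(G, sample, probed, 0)          # reveals nodes 2 and 3
full, partial, hidden = node_classes(G, sample, probed)
```

In a real setting the oracle would be an API call rather than a lookup in `true_adj`, and the set of unobserved nodes would of course not be available to the learner; it is computed here only to make the three classes concrete.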
In the present work, we assume networks that are undirected, unweighted, and static, meaning that the data being queried does not change over time (no insertion or deletion of nodes or edges). However, we note that our approach is flexible enough to be modified to incorporate features and objective functions that take edge directionality and weight into account. For example, one could incorporate in and out degree as features in the case of a directed network, and weighted degree in weighted networks. Our methodology can also be applied to discover previously unobserved nodes in a bipartite network, as long as there is a suitable method for querying the data. Finally, our method can be extended to allow for repeated probing of nodes under any API access model, which is useful when the network is not static.
Contributions Our contributions are as follows:

We propose a family of interpretable algorithms, NOL*, for learning to grow incomplete networks.

We present two algorithms from the NOL* family: an online regression algorithm, simply called NOL; and an algorithm that can effectively learn for heavy-tailed objective functions, called NOL-HTR.

Our extensive experiments on both synthetic and real network data showcase the limitations of online learning to improve incomplete networks.
In the next section we summarize the literature related to the network discovery problem. Then, we describe and evaluate NOL* before closing the paper with a discussion of future directions.
Related work
Incomplete data affects numerous areas of research. It has been shown to be a problem in social networks, including in public health (Gile 2011; Wejnert and Heckathorn 2008) and economics (Breza et al. 2017), as well as the mining of large systems such as the World Wide Web (WWW) (Cho et al. 1998; Avrachenkov et al. 2014) or Internet infrastructure (Vázquez et al. 2002). It is also a problem for understanding gene regulatory networks (Sanz et al. 2012). It has also been shown that oversimplified models, for example those that do not incorporate directionality or weight of edges, can lead to anomalies in the analysis of centrality (Alves et al. 2020).
The problem of network online learning is different from network sampling. In traditional network sampling, the goal is to gather a representative sample of the underlying network from which statistical characteristics are then estimated. In our setting, the data collection is guided by an initial sample graph and a user-defined objective function, which may or may not be directly related to a notion of statistical representativeness of the data. For an excellent survey on network sampling, we refer the reader to (Ahmed et al. 2013).
One approach to growing incomplete networks is to assume that the graph is being generated by an underlying network model, then use that model to infer the missing nodes and links. Examples of this approach are (Hanneke and Xing 2019) and (Peixoto 2018), both of which model and address sampling errors within a Stochastic Block Model (Wang and Wong 1987) framework. Another example is (Kim and Leskovec 2011), which assumes Kronecker graph models (Leskovec et al. 2010). Finally, we refer the reader to (Chen et al. 2019) for a recent paper on model selection for mechanistic network models.
Avrachenkov et al. (2014) explore methods for maximally covering a graph by adaptive crawling and querying. Their work introduces the Maximum Expected Uncovered Degree (MEUD) method and shows that for a certain class of random power law networks, MEUD reduces to choosing the node with maximum observed degree for each query (which is equivalent to our high degree heuristic; see Experiments section).
MaxOutProbe (Soundarajan et al. 2015) estimates the degree of each observed node and the average clustering coefficient of the graph; it does not assume knowledge of how the incomplete network was collected. MaxReach (Soundarajan et al. 2016) estimates the degree of each observed node and the per-degree average clustering coefficient. It assumes that the incomplete network was collected via random node or edge sampling and that one knows the number of nodes and edges in the fully observed graph.
Multi-armed bandit approaches are well-suited for the problem of growing incomplete networks because they are designed explicitly to facilitate the exploration vs. exploitation tradeoff. Soundarajan et al. (2017) present a multi-armed bandit approach to network completion that trades off densification vs. expansion. Their approach also focuses on the problem of probing edges rather than nodes. Murai et al. (2018) propose a multi-armed bandit algorithm for reducing network incompleteness that probabilistically chooses from an ensemble of classifiers that are trained simultaneously. Madhawa and Murata (2019) describe a nonparametric multi-armed bandit based on a k-nearest neighbor upper-confidence bound algorithm, which will be described in more detail in the Experiments section.
Active Exploration (Pfeiffer III et al. 2014) and Selective Harvesting (Murai et al. 2018) address variants of the problem definition where the general task is to iteratively search a partially observed network through node querying. The goal is to maximize the number of nodes in the expanded network with a particular binary target attribute. In this paper we address a different problem, where the goal is to maximally grow the network regardless of particular node attributes.
In the following section, we introduce NOL*, which provides a unified framework for interpretable and scalable network online learning. It incorporates exploration vs. exploitation and does not assume how the incomplete network was originally collected or a specific model generating the underlying network.
NOL* family of algorithms
Algorithm 1 presents the general NOL* framework. The goal is to sequentially learn to predict the reward value of partially observed nodes in an incomplete network under a resource constraint or budget, b, on the number of queries we can ask an oracle or API, leveraging previous queries as sample data. One node is queried at every time step t=0,1,2,…,b. The partially observed network \(\hat {G}_{t} =\left \{{\hat {V}}_{t},{\hat {E}}_{t}\right \} \subset G\), with nodes \({\hat {V}}_{t}\) and edges \({\hat {E}}_{t}\), and the list of already probed nodes P_{t} are updated by the algorithm after every query. The incomplete network \({\hat {G}}_{t}\) grows after every probe by incorporating the neighbors of the probed node j, selected from \(\left (\hat {V}_{t} - P_{t}\right)\). The number of new nodes added to the observed network by probing j in time step t is the reward earned in that time step, denoted by r_{t}(j).
The goal is to make a prediction about the reward to be earned by querying any partially observed node j that maximizes reward at time t. In general, given an initial state s_{0}, we wish to maximize the cumulative reward earned after b queries, given that the starting point was s_{0}:
In each time step t, we want to maximize the reward r_{t} earned by taking action a from state s_{t}. Therefore we learn to choose an action, which corresponds to choosing a node u to query, such that
where r_{t}(j) is the reward earned by querying node j from state s_{t}.
Although a more general Temporal Difference Learning (Sutton and Barto 2018) solution to this problem can be formulated, we did not find an advantage in using discounting or credit assignment in experiments. We conjecture that this is due to a combination of the finite resource constraint and the fact that we do not visit the same states multiple times in our setting, making standard planning tools less effective.
To predict the reward value of every node in \({\hat {G}}_{t}\), a feature vector \(\phi _{t}(j) \in {\mathbb {R}}^{d}\) is maintained that represents the knowledge available to the model at time t for node \(j \in \hat {G}_{t}\). Given features ϕ_{t}(i) for all nodes in \({\hat {G}}_{t}\), NOL* algorithms learn a function \({\mathcal {V}}_{\theta }: {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) with parameter θ to predict the expected reward to be earned by probing the node. Any available information about a node can be included as a feature, but since learning is to happen online, features should be feasible for online computation. Further, it is desirable for features to be interpretable so that the resulting regression parameters can be interpreted to help understand the performance of the algorithm in a given dataset (see Appendix A.2).
Given parameter θ_{t} at time t, the predicted number of unobserved nodes attached to node \(j \in {\hat {V}}_{t}\) is estimated as \({\mathcal {V}}_{\theta _{t}}(j) = \theta _{t}\phi _{t}(j)\) such that \({\mathcal {V}}_{\theta _{t}}(j)\) should be close to the observed reward value of probing the node j at time t. At each step, the expected loss \(E_{t}\left [{\mathcal {V}}_{\theta _{t}}(j) - r_{t}(j)\right ]\) is minimized, where r_{t}(j) is the true value of the reward function for node j at time t.
In NOL, θ is updated after each probe through online stochastic gradient descent based on Strehl and Littman (2007) (see lines 10-13 of Algorithm 1). If the variable of interest is heavy-tailed (e.g., the degree distributions of real networks exhibit this property), NOL-HTR adopts the generalized median of means approach to regression with heavy tails found in Hsu and Sabato (2016). These example functions illustrate the flexibility of NOL*: the choice of reward function and learning algorithm should correspond to individual goals and circumstances. We describe the details of how we learn the parameters for NOL-HTR in the next section and Algorithm 2.
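To make the online regression step concrete, the sketch below shows a plain stochastic gradient step on the squared error between the predicted and observed reward. It is an illustration under assumptions, not a transcription of Algorithm 1: the learning rate `alpha`, the function names, and the squared-error form of the loss are choices made for this example, and the exact update of Strehl and Littman (2007) may differ.

```python
def predict(theta, phi):
    """Linear prediction of the reward: the dot product of theta and phi."""
    return sum(w * x for w, x in zip(theta, phi))


def sgd_update(theta, phi, reward, alpha=0.1):
    """One gradient step on 0.5 * (prediction - reward)**2.

    alpha is a hypothetical learning rate chosen for illustration.
    """
    error = predict(theta, phi) - reward
    return [w - alpha * error * x for w, x in zip(theta, phi)]


# Toy example: one probed node with features phi and observed reward 3.
theta = [0.0, 0.0]
phi = [1.0, 2.0]
error_before = abs(predict(theta, phi) - 3.0)
theta = sgd_update(theta, phi, 3.0)
error_after = abs(predict(theta, phi) - 3.0)   # smaller than error_before
```

Because each update touches every feature weight exactly once, the cost of a step grows linearly with the number of features.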
Parameter estimation for NOL-HTR
Our goal at every time t is to choose a node that maximizes the earned reward. However, since the reward distribution is based on node degree, and the degree distribution in many real-world networks is heavy-tailed (e.g. hubs are present), we expect the reward distribution of an incomplete version of the network to be heavy-tailed as well. In order to deal with the heavy-tailed nature of the target variable, we adapt the methods from (Hsu and Sabato 2016) for regression in the presence of heavy-tailed distributions. This process is a generalization of the median of means approach to parameter estimation. Intuitively, the algorithm splits the previously observed feature vectors and associated rewards into k subsamples, then computes parameters ω for each subsample. Then, the algorithm chooses the set of parameters that has the minimum median distance from all of the other parameters. This procedure provides guarantees and confidence bounds on the distance of the learned parameters from the true parameters (Hsu and Sabato 2016), discussed further in the Scalability & guarantees section.
Algorithm 2 presents our adopted process. At time t, t−1 nodes with feature vectors x_{i}∈X have been probed, and their corresponding reward values r_{i}∈Y observed. We select an integer k≤t and randomly sample the data into k subsamples S_{i} of size \(\frac {t}{k}\). Then, the covariance matrix Σ_{i} and maximum likelihood regression estimate ω_{i} are computed for each S_{i}. For each i, the Mahalanobis distance between ω_{i} and every other ω_{j} is computed, and the median distance is assigned to m_{i}. Finally, the ω_{i} with the minimum m_{i} value is assigned to be the next set of parameters, θ.
There are two parameters in Algorithm 2: the number of subsamples k, which corresponds to the confidence parameter δ in Hsu and Sabato (2016), and the regularization parameter λ. The number of subsamples k should be set such that the size of the subsamples, \(\frac {n}{k}\), is larger than O(d log(d)), meaning each subsample has size at least the number of dimensions in the feature vector (Hsu and Sabato 2016). In our experiments, we set k to be ln(n), where n is the number of previously queried nodes (equivalently the time step t), which allows k to grow slowly as more data is gathered through the querying process. We set the regularization parameter λ to 0. However, our feature matrices may be singular (since some subsamplings can result in feature matrices without full rank), so when computing the regression in Algorithm 2 we use the Moore-Penrose pseudoinverse, which corresponds to computing the ℓ_{2} regularized parameters.
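The selection rule of the generalized median of means procedure can be illustrated with a deliberately simplified sketch. The assumptions here are substantial: features are scalar (d=1), each subsample's "covariance" is its mean squared feature value, the Mahalanobis distance is taken as \(\sqrt{\Sigma_{i}}\,|\omega_{i}-\omega_{j}|\), and the function names and toy data are hypothetical. It is not Algorithm 2 itself.

```python
import math
import random
from statistics import median


def fit_slope(xs, ys):
    """Least-squares slope through the origin for scalar features."""
    sxx = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / sxx if sxx else 0.0


def median_of_means_slope(xs, ys, k, rng):
    """Return the subsample estimate with the smallest median distance to the others."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    groups = [idx[i::k] for i in range(k)]  # k disjoint subsamples of similar size
    omegas, sigmas = [], []
    for g in groups:
        gx, gy = [xs[i] for i in g], [ys[i] for i in g]
        omegas.append(fit_slope(gx, gy))
        sigmas.append(sum(x * x for x in gx) / len(gx))  # scalar stand-in for the covariance
    medians = [
        median(math.sqrt(sigmas[i]) * abs(omegas[i] - omegas[j])
               for j in range(k) if j != i)
        for i in range(k)
    ]
    best = min(range(k), key=medians.__getitem__)
    return omegas[best]


# Toy data: reward = 2 * feature, with one "hub-like" outlier.
xs = [float(i) for i in range(1, 31)]
ys = [2.0 * x for x in xs]
ys[4] = 1000.0
robust = median_of_means_slope(xs, ys, 5, random.Random(0))
naive = fit_slope(xs, ys)  # pulled away from 2.0 by the outlier
```

With five subsamples, the single outlier contaminates only one of them, so the four clean estimates agree with each other and one of them is selected.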
ε-greedy exploration
NOL* algorithms need to learn adaptively over time because the reward distribution may change as nodes are queried. Choosing a node based on \(V_{\theta _{t}}\) exploits the current model by choosing the node with the maximum predicted reward. However, learning in networks is difficult precisely because the properties of nodes are diverse, thus it is desirable to introduce some randomness to the decision process in order to gather a diverse set of training examples. Therefore, we formulate NOL* algorithms as ε-greedy algorithms, meaning with probability ε the node to query is chosen uniformly at random from the set of unprobed nodes.
In order to increase the likelihood that our random samples are informative, we choose our random nodes from those that were present in the initial network \({\hat {G}}_{0}\). The rationale behind this choice is that, as a consequence of probing nodes sequentially, nodes that have been in the network for longer have more “complete” information, since they have had more opportunity to be connected to in the t−1 probes before time t. That is, if a node \(j \in {\hat {G}}_{0}\) has only a few neighbors after many queries have been made, it could be that j has very few neighbors, but it could also be that j connects to a neighborhood that the algorithm has yet to discover. Therefore, to explore the possibility of learning a better model using different information, NOL* algorithms select the random node from \({\hat {V}}_{0} - P_{t}\) to probe until all nodes in \({\hat {V}}_{0}\) are exhausted, after which NOL* chooses any unprobed node in \({\hat {G}}_{t}\) at random.
Since NOL* algorithms learn from all or most of the previous samples at every time step, it is not strictly necessary to continue random exploration through the entire querying process. This is consistent with the literature on ε-greedy algorithms, which often systematically lower the rate of exploration over time (Kirkpatrick et al. 1983; Cheng et al. 2010). NOL* adds an optional exponential decay to the initial random jump rate ε_{0}, such that at time t the jump rate is computed as \(\epsilon _{t} = \epsilon _{0} e^{-\frac {t}{b}}\). We have also experimented with data-driven methods for adaptive ε-greedy, such as in Tokic (2010), but did not find them to be advantageous and leave further investigation of their utility in this space for future work.
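The exploration rule in this subsection can be sketched as follows, assuming the decay has a negative exponent, \(\epsilon_{t} = \epsilon_{0} e^{-t/b}\), so that the jump rate shrinks over the budget. The function names and the dictionary of predicted rewards are hypothetical and used only for illustration.

```python
import math
import random


def epsilon_at(eps0, t, b):
    """Jump rate after t of b queries under exponential decay."""
    return eps0 * math.exp(-t / b)


def choose_node(predicted, initial_nodes, probed, eps, rng):
    """Epsilon-greedy choice over unprobed nodes.

    predicted: dict mapping each visible node to its predicted reward.
    initial_nodes: nodes that were present in the initial sample.
    """
    unprobed = [n for n in predicted if n not in probed]
    if rng.random() < eps:
        # Explore: prefer unprobed nodes from the initial sample while any remain.
        pool = [n for n in unprobed if n in initial_nodes] or unprobed
        return rng.choice(pool)
    # Exploit: take the node with the highest predicted reward.
    return max(unprobed, key=predicted.get)


predicted = {"a": 1.0, "b": 5.0, "c": 2.0}
rng = random.Random(0)
greedy_pick = choose_node(predicted, {"a"}, {"b"}, 0.0, rng)   # exploit only
random_pick = choose_node(predicted, {"a"}, {"b"}, 1.0, rng)   # explore only
```

Under this schedule the jump rate falls from ε_{0} at the first query to ε_{0}/e once the budget is spent, so exploration never switches off entirely.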
Scalability & guarantees
In general, the computational complexity of a NOL* algorithm is the product of (1) the budget, (2) the feature computation complexity and (3) the learning complexity. In symbols, O(b×O(features)×O(learning)).
In this section, we briefly discuss the complexity of the learning steps of NOL and NOL-HTR. We also note that the features used in NOL* algorithms are user defined and range from trivial to compute (e.g. degree) to computationally expensive (e.g. node embeddings). Due to this, we omit a detailed analysis of the specific features we use in our experiments (described in the Experiments section), but note that since the computations are to happen online, and only nodes whose features may have changed should be updated, the complexity depends not necessarily on the total number of nodes or edges in the graph, but on the size of the neighborhood around the queried node for which the features might have changed.
The complexity of a NOL online regression update depends only on the number of features, since the most expensive operation required is multiplication of the feature vector by a constant factor. Due to this, even with a relatively high-dimensional feature vector, the complexity of the learning step is trivial. In this case, the scalability of the entire process will likely hinge on the scalability of the feature value updates.
The procedure to learn parameters for NOL-HTR is more complicated. The data is first partitioned into k subsamples, the covariance of each subsample is computed and optimal regression parameters are estimated for each, and finally the median of the k parameters is computed. The most expensive operation is the regression parameter estimation, which requires computing the Moore-Penrose inverse of the regressor matrix. This can be computed via Singular Value Decomposition of the matrix, which has complexity O(nd^{2}), where n is the number of nodes in the computation and d is the number of features.
Hsu and Sabato (2016) also derive a bound on the loss and show that our learned parameters θ are within an ε of the true parameters if we use Algorithm 2 with the number of samples \(m\geq O\left (d\log \left (\frac {1}{\delta }\right)\right)\). In our case, m is the number of rows in X. The derived bound states that with probability (1−δ) the empirical loss is bounded by
where L_{∗} is the true loss with the optimal parameters and \(\delta =\frac {1}{e^{k}}\). In practice, we avoid wasted computation by limiting the total number of samples m to 2000, which is always larger than is necessary for the guarantees given the values of k in our experiments (which determine δ in our formulation of the algorithm).
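As a worked instance of how k fixes the confidence level, assume k is taken to be exactly ln(n), ignoring any rounding to an integer (the rounding rule is not stated here). Then

\(\delta = e^{-k} = e^{-\ln (n)} = \frac{1}{n}\),

so after n=100 queries the bound would hold with probability \(1-\delta = 0.99\), and the confidence level rises as more nodes are queried.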
Experiments
In this section, we show the utility of applying NOL* algorithms for network completion in a variety of datasets, including both synthetic and real world networks. We first describe the data we use to test our method, then explain our experimental methodology and present results.
Data
We test NOL* algorithms on a range of synthetic graphs, as well as five real world datasets.
Synthetic models
Synthetic network models are useful for generating test data that exhibits interesting properties often found in real world networks. We are interested in testing the performance of our algorithm on datasets that allow us to vary a few macroscopic properties of networks: the distribution of degrees, the extent of local clustering, and the modularity of the global structure.
The degree distribution is an important factor in understanding when learning is possible or helpful. There are two simplified extremes that can characterize degree distributions in complex networks: homogeneous and heterogeneous (or heavy-tailed) distributions. The important difference is that in a heterogeneous degree distribution some nodes accumulate far more connections than the majority of nodes in the network, resulting in these nodes becoming topological hubs. In the presence of hubs, the variance of the degree distribution can become large as the number of nodes in the network N grows, until it eventually diverges as N→∞. In contrast, a homogeneous degree distribution implies that hubs are not present, meaning the nodes are statistically equivalent with respect to degree. In the homogeneous case, the degree of a randomly chosen node will be a random variable following a distribution with well-defined variance.
A second node characteristic that we conjecture is important for learning is the clustering coefficient. The clustering coefficient (more precisely, local clustering coefficient) is a measure of the extent to which a node’s neighbors are connected to one another. Since the clustering coefficient is real-valued, it is often more intuitive to study the average clustering by degree, which is what we show in Fig. 2. Clustering is related to the local density of neighborhoods and can serve as a proxy for the amount a neighborhood has been explored.
Lastly, we are interested in studying the impact of the modularity of a network structure on algorithm performance, meaning the prevalence of within-community links relative to out-of-community links in a modularity-maximized partitioning of the nodes into communities. The extent of modularity in a network structure could be instructive in balancing exploration and exploitation, since learning to probe in a highly modular structure is more susceptible to settling on local minima by exploiting in a single community without discovering cross-community links.
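For readers who want the quantity behind "modularity" made explicit, the sketch below computes the modularity Q of a given partition, assuming the standard Newman-Girvan definition (the text does not spell out the formula). The adjacency-dict representation and the toy graph are illustrative assumptions.

```python
def modularity(adj, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    adj: dict node -> set of neighbors (symmetric, no self-loops).
    community: dict node -> community label.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    internal, degree_sum = {}, {}
    for u, nbrs in adj.items():
        c = community[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        internal[c] = internal.get(c, 0) + sum(1 for v in nbrs if community[v] == c)
    # internal[c] counts every within-community edge twice, hence the 2 * m.
    return sum(internal[c] / (2 * m) - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
two_blocks = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {n: "A" for n in adj}
q_split = modularity(adj, two_blocks)
q_single = modularity(adj, one_block)
```

Splitting the graph along the bridge gives a clearly positive Q, while placing every node in one community gives Q=0, which is the contrast the paragraph above relies on.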
To test the above conjectures on the limitations of learning in complex networks, we study five synthetic network models.

1. Erdős-Rényi (ER) (Erdös and Rényi 1959)^{Footnote 4}: In a network sampled from the ER model (specifically the ensemble G_{Np}), every possible (undirected) link between N nodes exists with probability p. Networks generated by the ER model have a homogeneous degree distribution (the exact distribution is Binomial, but it is often approximated by Poisson). Parameters: N=10000, p=0.001.

2. Barabási-Albert (BA) (Albert and Barabási 2002): The BA model generates networks through a growth and preferential attachment process where each node entering the network chooses a set of m neighbors to connect to with probability proportional to their relative degree. This process results in a heterogeneous degree distribution, which in the infinite limit follows a power law distribution with exponent 3. Parameters: N=10000, m=5, m_{0}=5. m_{0} denotes the size of the initial connected network. m denotes the number of existing nodes to which a new node connects.

3. Block Two-Level Erdős-Rényi (BTER) (Seshadhri et al. 2012): BTER is a flexible^{Footnote 5} model that combines properties of the ER and BA models. It consists of two phases: (i) construct a set of disconnected communities made up of dense ER networks, with the size distribution of the communities following a heavy-tailed distribution (i.e., a small number of large communities and many more small communities) and (ii) connect the communities to one another to achieve desired properties, such as a target value of global clustering coefficient. Parameters: N=10000, target maximum clustering coefficient = 0.95, target global clustering coefficient = 0.15, target average degree 〈k〉=10.

4. Lancichinetti-Fortunato-Radicchi (LFR) (Lancichinetti et al. 2008): In the LFR model, modular structure can be induced by varying the mixing parameter μ, which controls the extent to which nodes connect internally to a tight community (higher modularity) or loosely to the entire network (lower modularity), thus controlling the extent of modular community structure. Parameters: N=34546, mixing parameter μ∈{0,0.1,0.2,0.3,0.4,0.5,0.6,0.7}, degree exponent γ∈{2,2.25,2.5,2.75,3,3.25,3.5}, community size distribution exponent β=2, average degree 〈k〉=12, and maximum degree k_{max}=850.

5. k-regular networks: In a k-regular network, every node is connected to k other nodes such that connections are random and every node has the same degree. Parameters: N=10000, k=6.
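As a concrete illustration of the first model in the list above, the following sketch samples a small G(N, p) graph and checks its mean degree against the expected value (N−1)p. The function name and the reduced size (N=500, p=0.02) are choices made to keep the example fast; with the parameters reported above, (N−1)p = 9999 × 0.001, or roughly 10.

```python
import random


def erdos_renyi(n, p, rng):
    """Sample an undirected G(n, p) graph as a dict of neighbor sets."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Each of the n * (n - 1) / 2 possible links exists independently.
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


rng = random.Random(0)
g = erdos_renyi(500, 0.02, rng)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / len(g)
expected_degree = (500 - 1) * 0.02   # close to 10, like the setting in the text
```

The quadratic loop is fine at this size; for N=10000 a dedicated graph library or a skip-sampling generator would be the more practical choice.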
We note that we have left out analysis of the Watts-Strogatz (WS) model (Watts and Strogatz 1998), which is a model developed to study the small world property of random graphs. In WS, a rewiring parameter controls the tradeoff between nodes clustering into triangles and the average path length between all nodes, a proxy for the small-world property. We do not study this model because (a) its degree distribution is homogeneous, following a Poisson distribution, and (b) the model does not result in networks with modular structure. Therefore, despite many uses in other contexts, the WS model is not well suited for studying the network completion problem in the present case.
Real-world networks
We evaluate NOL* on real world datasets whose characteristics are summarized in Table 1. We show the number of nodes (N), number of edges (E), number of triangles (# △), and the modularity (Q), computed by finding a modularity maximizing partition with the Leiden algorithm (Traag et al. 2019), a recently proposed improvement on the well-known Louvain algorithm (Blondel et al. 2008) for finding node partitions with high modularity. Degree distribution, average clustering by degree and component size distribution are shown for each network in Fig. 2.
Sampling methods
NOL* is agnostic to the method of sampling used to collect the initial sample graph, \({\hat {G}}_{0}\). For the sake of consistency, all of the initial samples used in this paper were collected via node sampling with induction. In this technique, a set of nodes is chosen uniformly at random, then a subgraph is induced on the nodes (i.e. all of the links between them are included in the sample). Our samples are defined in terms of the proportion of the edges in the underlying network. To generate samples with the target proportion of edges, we choose a sample of nodes and induce a subgraph on them; if this subgraph has too many (or too few) edges, we repeat with a smaller (or larger) subset of nodes until we find an induced graph with an acceptable number of edges.
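The resampling procedure described above might look like the sketch below. The tolerance `tol`, the retry limit, the starting-size guess, and the one-node step used to grow or shrink the subset are all assumptions; the text only says that sampling is repeated until the edge count is acceptable.

```python
import random


def induced_subgraph(adj, nodes):
    """Keep only the chosen nodes and the links among them."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}


def count_edges(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def sample_with_induction(adj, edge_fraction, rng, tol=0.05, max_tries=1000):
    """Resample node subsets until the induced graph holds about edge_fraction of the edges."""
    total = count_edges(adj)
    target = edge_fraction * total
    nodes = list(adj)
    # Starting guess: induced edge counts scale roughly with the square of the subset size.
    size = max(1, round(len(nodes) * edge_fraction ** 0.5))
    for _ in range(max_tries):
        sub = induced_subgraph(adj, rng.sample(nodes, size))
        edges = count_edges(sub)
        if abs(edges - target) <= tol * total:
            return sub
        # Too few edges: grow the subset. Too many: shrink it.
        size = min(len(nodes), size + 1) if edges < target else max(1, size - 1)
    raise RuntimeError("no acceptable sample found")


# Toy check on a complete graph with 40 nodes (780 edges).
K = {u: {v for v in range(40) if v != u} for u in range(40)}
sub = sample_with_induction(K, 0.25, random.Random(0))
```

On sparse real networks many attempts may be needed, because the number of induced edges varies considerably from one random node subset to the next.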
In the main text of this paper we present results on samples collected via node sampling with induction, but we have also verified many of the results using random walk with jump sampling (Ahmed et al. 2013), which can be found in Appendix A.3.
Features
To accurately predict the number of unobserved neighbors of a partially observed node, NOL* algorithms require interpretable node features that are relevant across a variety of very different network structures. These features must be feasible to compute and update online for our algorithms to be scalable. We use the following features for each node i visible in the sample network:

\(\hat d(i)\): the normalized degree of node i in the sample network, which can be seen as a measure of the node’s centrality in the sample. The inclusion of this feature assumes that the degree of a node in the sample is relevant to its total degree.

\(\hat {cc}(i)\): the clustering coefficient of node i in the sample network. This feature captures the local neighborhood density of a node, particularly the tendency of the node and its neighbors to form triangles.

CompSize(i): the normalized size of the connected component in the sample network to which node i belongs, which can be used to facilitate exploration and exploitation based on where in the network the node is located.

pn(i): the fraction of node i’s neighbors which have already been probed. This feature indicates the extent to which the neighborhood of node i has been explored by the algorithm.

LostReward(i): the number of nodes that first connected to node i by being probed. Unlike pn(i), LostReward(i) only counts nodes that were not connected to i before they were probed. This feature mitigates the ordering effects of probing nodes. If nodes i and j both have the same unobserved neighbors, for instance, probing j first would normally lower the total reward of node i when i would be probed. LostReward(i) gives credit to i upon its probing, since it could have brought in as many new nodes as j.
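As a rough illustration of how the first four features could be computed on a sample graph, consider the sketch below. It assumes the sample is stored as a dictionary of neighbor sets and that `probed` is the set of nodes already queried. The normalizations (degree divided by the maximum sample degree, component size divided by the number of sampled nodes) are assumptions made for the example, since the text does not specify them, and LostReward is omitted because it depends on the probing history.

```python
from collections import deque


def component_size(adj, start):
    """Size of the connected component containing `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)


def clustering(adj, i):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = list(adj[i])
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(
        1
        for a in range(d)
        for b in range(a + 1, d)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * links / (d * (d - 1))


def node_features(adj, probed, i):
    """Feature dict for node i; both normalizations are illustrative assumptions."""
    max_deg = max(len(nbrs) for nbrs in adj.values()) or 1
    deg = len(adj[i])
    probed_nbrs = sum(1 for v in adj[i] if v in probed)
    return {
        "degree": deg / max_deg,
        "clustering": clustering(adj, i),
        "comp_size": component_size(adj, i) / len(adj),
        "probed_frac": probed_nbrs / deg if deg else 0.0,
    }
```

All four quantities depend only on the sample, so they can be recomputed for the affected nodes after each probe.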
Baseline methods
We compare the performance of NOL* with four heuristic baseline methods and a multi-armed bandit method. Our heuristic baselines are as follows:

1.
High degree: Query the node with maximum observed degree in every step. This has been shown to be optimal in some heavy-tailed networks (Avrachenkov et al. 2014).

2.
High degree with jump: With probability ε, query a node chosen uniformly at random from all partially observed nodes. With probability 1−ε, query the node with the maximum observed degree. We set ε=0.3.

3.
Random degree: Query a node chosen uniformly at random from all partially observed nodes.

4.
Low degree: Query the node with minimum observed degree in every step. This method is approximately^{Footnote 6} optimal for k-regular networks, where every node has the same degree, since the lowest degree node is always furthest from the uniform degree k (see Appendix A.1).
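The heuristic baselines listed above are simple enough to state in a few lines. The sketch below assumes `observed_degree` is a dictionary from each partially observed node to its degree in the sample. The function names and the tie-breaking behaviour (first maximal or minimal key in dictionary order) are illustrative assumptions.

```python
import random


def high_degree(observed_degree):
    """Query the partially observed node with maximum observed degree."""
    return max(observed_degree, key=observed_degree.get)


def low_degree(observed_degree):
    """Query the partially observed node with minimum observed degree."""
    return min(observed_degree, key=observed_degree.get)


def random_degree(observed_degree, rng=random):
    """Query a partially observed node chosen uniformly at random."""
    return rng.choice(sorted(observed_degree))


def high_degree_with_jump(observed_degree, epsilon=0.3, rng=random):
    """With probability epsilon jump to a random node, otherwise take the top degree."""
    if rng.random() < epsilon:
        return random_degree(observed_degree, rng)
    return high_degree(observed_degree)
```

Setting `epsilon=0` recovers the high degree baseline and `epsilon=1` recovers the random baseline, so the jump variant interpolates between the two.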
We also compare our approach with the nonparametric multi-armed bandit model proposed in Madhawa and Murata (2019), which we refer to as KNNUCB (k-nearest-neighbors upper confidence bound). This method similarly relies on computing a vector of structural features for each node, including degree, average neighbor degree, median neighbor degree and average fraction of probed neighbors. Each unprobed node is considered an arm in a multi-armed bandit formulation, and the next arm to pull (node to probe) is chosen by computing
\(\arg \max _{i} \left [\hat {f}(x_{i}) + \alpha \sigma (x_{i})\right ]\)
Here, \(\hat {f}(x_{i})\) is the expected reward of probing node i and σ(x_{i}) is the average distance to other points in the neighborhood. The expected reward is calculated as a weighted kNN regression on nodes within the kNN radius of node i, defined as the k nodes whose feature vectors have Euclidean distance less than r from the feature vector of node i. The term ασ(x_{i}) facilitates exploration by allowing the possibility of nodes without maximum expected reward to be probed, assuming α>0. In our experiments we fixed the value of k=20 and α=2, following the experiments in the original paper.
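A simplified version of this score, the expected reward plus α times the average neighbor distance, can be sketched as below. This is not the implementation of Madhawa and Murata (2019): it takes the k nearest previously probed points rather than using a radius r, it averages their rewards without weights, and it returns infinity when no history exists. All three choices are assumptions made to keep the example short.

```python
import math


def knn_ucb_score(x, history, k=20, alpha=2.0):
    """Score a candidate feature vector x against probed (features, reward) pairs.

    Simplified sketch: unweighted mean over the k nearest probed points.
    """
    nearest = sorted((math.dist(x, feats), reward) for feats, reward in history)[:k]
    if not nearest:
        # Assumption: with no history, every arm is maximally attractive.
        return float("inf")
    f_hat = sum(reward for _, reward in nearest) / len(nearest)
    sigma = sum(dist for dist, _ in nearest) / len(nearest)
    return f_hat + alpha * sigma


def choose_arm(candidates, history, k=20, alpha=2.0):
    """Pick the candidate node (name -> feature vector) with the highest score."""
    return max(candidates, key=lambda n: knn_ucb_score(candidates[n], history, k, alpha))
```

The exploration term grows for candidates whose feature vectors lie far from anything probed so far, which is how nodes without the highest expected reward can still be selected.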
Experimental setup
Across all networks, our experiments are run over 20 independent initial node samples of the network. In synthetic networks, the underlying network is a realization of the model using the parameters described in Synthetic models section.
Our experiments aim to (1) investigate how network properties impact the performance of NOLHTR, (2) exhibit the performance tradeoffs between NOLHTR and NOL, (3) show that NOL* algorithms outperform the baseline methods in settings where learning is possible and approximate the heuristic methods elsewhere, (4) analyze the prediction error of NOLHTR, and (5) analyze the evolution of the feature weights learned by NOLHTR over time.
Performance metrics
After each probe of the network, NOL* earns a reward, defined in this work as the number of previously unobserved nodes included in the network after a probe. Formally, the reward at time t is defined as r_{t}=V_{t+1}−V_{t}. We study the performance of each method by showing the cumulative reward, \(\hat {c}_{r}(T)\), where T is a time step in \(\{0, 1, 2, \dots, b\}\). Formally, \(\hat {c}_{r}(T) = {\sum \nolimits }_{t = 1}^{T} r_{t}\). We also want to quantify the utility of decisions made by NOL*. For this purpose, we study the prediction error of the model. This quantifies the extent to which our prediction, \({\mathcal {V}}_{\theta _{t}}(i) =\theta _{t}\phi _{t}(i)\), differs from the true reward value r_{t}. Thus, we calculate \(E(t) = {\mathcal {V}}_{\theta _{t}}(i) - r_{t}\).
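The two metrics can be made concrete with a short sketch. It assumes the number of observed nodes is recorded before the first probe and after every probe, and that the prediction is the dot product of the weight vector and the feature vector. The error is returned as a signed difference; whether an absolute value should be taken is not settled by the text, so that choice is left to the caller.

```python
from itertools import accumulate


def rewards_from_sizes(sizes):
    """r_t = V_{t+1} - V_t from the observed node counts recorded over time."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def cumulative_reward(rewards):
    """Running total of rewards, one entry per probe."""
    return list(accumulate(rewards))


def prediction_error(theta, phi, reward):
    """Signed gap between the linear prediction theta . phi and the true reward."""
    predicted = sum(w * x for w, x in zip(theta, phi))
    return predicted - reward
```

For example, node counts of 10, 13, 13 and 20 give rewards of 3, 0 and 7, and a cumulative reward of 10 after three probes.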
NOLHTR parameter search
We ran a two dimensional parameter search over values of k (1-16, 32, 64, 128, log10(n), loge(n), log2(n)) and ε (0, 0.1, 0.2, 0.3, 0.4, and exponential decay versions of each). We ran this search on all of the networks in Fig. 2, as well as 56 LFR networks (Lancichinetti et al. 2008) spanning a wide variety of network structures. We varied two parameters in the LFR networks: μ, which controls modular structure by adjusting the probability of cross-community links; and γ, which controls the exponent of the degree distribution of the entire network (Lancichinetti et al. 2008).
Across all of these networks, no parameter setting performed best consistently. No single choice or regime of parameters stood out, either within the experiments on any individual network or across networks.
However, we have found some evidence to suggest that performance on more modular structures is improved by more randomization, meaning a nonzero value of ε. We ran a linear regression using ε as the target variable and global clustering, modularity, and degree exponent as covariates. Results are presented in Table 2. All three covariates positively affected ε, with modularity having the strongest effect. This result is consistent with a network scientific understanding of the querying process: when querying a modular network structure, the likelihood of finding a local minimum within one highly connected and clustered module is high, meaning that adding more randomness to the algorithm will increase the likelihood that the algorithm sees examples that allow it to avoid settling on such a minimum.
We report results using a set of parameters that performed reasonably well across networks. These parameters are ε=0.3, decaying as \(\epsilon _{t} = 0.3 e^{-\frac {t}{b}}\), and k=ln(t), where t represents the number of nodes probed thus far in the experiment.
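A small sketch of these two schedules follows. It assumes the exponent in the ε schedule is negative, so that exploration decays from 0.3 as probes accumulate, and that b is the total probing budget. Rounding ln(t) to an integer with a floor of 1 is an assumption, since k must be a whole number of subsamples.

```python
import math


def epsilon_at(t, budget, eps0=0.3):
    """Exploration probability after t probes, decaying as eps0 * exp(-t / budget)."""
    return eps0 * math.exp(-t / budget)


def k_at(t):
    """Number of subsamples k = ln(t), rounded and floored at 1 (assumed rounding)."""
    if t < 1:
        return 1
    return max(1, round(math.log(t)))
```

Under this schedule the exploration probability falls to about 0.11 once the whole budget is spent, while k grows from 1 to only 7 over the first thousand probes.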
Results
In this section, we compare the performance of the heuristic baseline methods, the KNNUCB baseline approach, NOL, and NOLHTR. Broadly, we find that NOLHTR and NOL perform similarly in terms of average cumulative reward, but that the performance of NOLHTR is more consistent, indicated by standard deviations that are tighter around the mean across experiments.
Cumulative reward
We report the average cumulative reward \(\hat {c}_{r}(T)\) over a budget of thousands of probes on 6 networks in Fig. 3. The average and standard deviation of \(\hat {c}_{r}(T)\) are computed over experiments on 20 independent samples. We omit results on ER networks, noting that because all nodes are statistically equivalent in terms of structural properties, every probing method performs equivalently and neither learning nor heuristics provide any significant advantage (see Appendix A.1).
In BA networks (Fig. 3a), NOLHTR performs on par with the High Degree baseline, which is known to be near optimal in networks with heavy tailed degree distributions (Avrachenkov et al. 2014). Further, NOLHTR outperforms NOL by achieving both higher average reward and smaller standard deviation. NOLHTR also consistently outperforms the baseline methods in networks generated by the BTER model (Fig. 3b). Although NOLHTR outperforms NOL in both reward and variance towards the beginning of the experiment, after around 25% of the nodes have been probed NOL begins to outperform NOLHTR. See Comparing NOL and NOLHTR section for a discussion of some potential explanations for this observation. NOLHTR performed as well as NOL and the best heuristics in every real world network we experimented on. Comparing directly with NOL, NOLHTR is able to achieve similar or better performance, always with smaller standard deviation, on every network, regardless of the underlying distributions (compare networks in Fig. 2). We note that across experiments, the KNNUCB method was unable to match the performance of our model and often underperformed the baseline methods.^{Footnote 7}
Use case: Twitter social network
We present results on a social interaction network sampled from Twitter in Fig. 4. In this setting, querying a node corresponds to calling the Twitter API to obtain more data about a specific account. This is an example of a natural use case for NOLHTR, both because the network must be expanded by repeatedly accessing an API and because there is evidence that the distribution of social interactions is heavy-tailed (see e.g. Kossinets and Watts 2006 and Morales et al. 2012).
This Twitter Social Network data set consists of Twitter users connected to one another with an undirected edge if they mutually retweeted or mentioned each other throughout the course of a single day in 2009. We focus our experiment on the largest connected component of this interaction network. While this is only a small fraction of the Twitter network, the data was collected via the Twitter Firehose, and is therefore complete for that time span, which allows us to accurately simulate the process of growing a network by querying Twitter users. Some other characteristics of the dataset are outlined in Table 1 and the distributions of degree and clustering are shown in Fig. 2. Notably in our context, though the maximum degree in this network is not very large (≈100), choosing a random node with degree near the maximum is much less likely than choosing a node with low degree.
As shown in Fig. 4, NOL outperforms the heuristic baseline methods and NOLHTR.^{Footnote 8} However, the inset plot shows that, similar to the experiments on other networks, NOLHTR outperforms NOL early in the experiment before eventually being overtaken. We discuss our conjectures about why we observe this tradeoff behavior in Comparing NOL and NOLHTR section.
Prediction error
We analyze the ability of NOL* to learn by showing the measure of prediction error, E(t), defined in Performance metrics section, over time and across different initial sample sizes.
Cumulative prediction error, E(T), as a function of time is shown in Fig. 5. The error is averaged over 10 independent samples of the network for initial sample sizes of 1%, 2.5%, 5%, 7.5% and 10% of the complete network (as percent of edges in the network).
We show results in both the BA and BTER models. The predictions of NOLHTR are noisy in the beginning of the experiment and the resulting outliers skew the analysis of cumulative error, so we present the error starting from t=50. In both cases the prediction error is relatively stable over time, indicated by the slow growth of the curves. We also observe that the average prediction error is lowest when the initial sample is largest, and highest when the initial sample is smallest. This is intuitive: when the initial sample is large, the model is learning its predictions from more accurate information, whereas when the initial sample is small, the training information is very noisy.
Feature weight analysis
We qualitatively analyze the feature weights learned by NOLHTR over time (see Fig. 8 for visualization). Since the weights were computed on random subsamplings of the observed data at every time step, they fluctuated considerably between probes. Still, we found that across most networks (all but Caida and Enron), the degree of a node was weighted positively and large in magnitude, implying that it is the most salient feature to predict reward. This is intuitive, since we expect the target variable to correlate highly with the sample degree. There were no features with consistently highly negative weights, which would imply a negative predictor of reward. Instead, the other features typically fluctuated around 0, meaning they were not consistently predictive of either high or low rewards, but were nonzero so did contribute to the prediction.
The Caida and Enron networks were exceptions to the above general trends, though NOLHTR exhibits similar performance on these networks in terms of cumulative reward. In these experiments, the weight of the degree feature was centered around 0 along with the other features, but with very large fluctuations in both positive and negative directions. This suggests that the importance assigned to degree depended strongly on the particular subsampling of the data in an individual time step. It may also have implications for the impact of degree correlations (i.e. average neighbor degree), since the fluctuations are consistent with both low and high degree nodes predicting high reward.
Comparing NOL and NOLHTR
The above analysis illustrates some differences and tradeoffs between NOL and NOLHTR: (1) NOLHTR achieves similar performance to NOL and is more consistent. This is evidenced by the tighter standard deviations around the average cumulative reward. (2) NOLHTR outperforms NOL in the beginning of every experiment. This is consistent with our expectation based on how each of the algorithms works: NOLHTR is able to leverage outlier, high reward queries early on because it computes maximum likelihood parameters at every step, while NOL updates its parameters with online gradient descent using a fixed learning rate, and is therefore not able to adapt as effectively to high reward nodes. (3) As the number of probes increases, NOL often begins to outperform NOLHTR. There are a few possibilities for why this is happening. First, NOL is learning through gradient descent, so adjusting the learning rate of the algorithm could impact the number of examples in the tail it takes to reach highly predictive parameters, explaining the lag. Second, as the number of samples grows, the NOLHTR parameter setting of k=ln(n) grows very slowly. This means that more data is being subsampled into the same number of bins, thus each subsample may become more noisy and the outliers become less distinguishable. This could be alleviated by increasing the value of k more quickly as the number of samples grows, or by capping the size of the subsamples, or by choosing the sample size so that data from the tail is more likely to have an impact on the parameters.
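To see why a fixed learning rate limits how quickly NOL can react to a single high reward query, consider a generic online gradient descent step for a linear reward model. The squared-error loss and the value of `lr` are assumptions for illustration; the text states only that NOL uses online gradient descent with a fixed learning rate.

```python
def predict(theta, phi):
    """Linear reward prediction theta . phi."""
    return sum(w * x for w, x in zip(theta, phi))


def ogd_update(theta, phi, reward, lr=0.01):
    """One gradient step on the assumed loss 0.5 * (theta . phi - reward) ** 2."""
    err = predict(theta, phi) - reward
    return [w - lr * err * x for w, x in zip(theta, phi)]
```

With `lr=0.1`, weights of zero, features (1, 2) and an observed reward of 10, one step moves the weights to (1, 2) and the prediction from 0 to 5. The model therefore covers only part of the gap after one outlier, whereas refitting the parameters on all observed data at every step can absorb it immediately.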
Limitations of learning
We are interested in understanding the limitations of our approach in this setting. We experiment across a wide variety of network structures using the LFR community benchmark (Lancichinetti et al. 2008). We generated 56 LFR networks spanning two parameters: μ, which controls modular structure by adjusting the probability of cross-community links; and γ, which controls the exponent of the degree distribution of the entire network. Then, we ran the NOLHTR parameter search (see NOLHTR parameter search section) on each network, as well as random and high degree baseline methods. For each network, we found the set of NOLHTR parameters that resulted in the maximum average cumulative reward and computed the percent gain in performance over each of the baselines, i.e. the difference between the average cumulative reward of NOLHTR and that of the baseline, expressed as a percentage of the baseline's average cumulative reward.
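As a minimal sketch, assuming the conventional definition of percent gain (the difference in average cumulative reward divided by the baseline's average cumulative reward), the computation looks like this. The function name and arguments are hypothetical.

```python
def percent_gain(method_reward, baseline_reward):
    """Percent gain of a method over a baseline, both given as average cumulative reward."""
    return 100.0 * (method_reward - baseline_reward) / baseline_reward
```

Under this definition a method that collects 230 nodes where the baseline collects 100 shows a gain of 130%, and a negative value means the baseline did better.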
Results are shown in Fig. 6. Compared with high degree probing (left plot), NOLHTR is always able to gain in performance in higher modularity networks (Q>0.6), with performance gains spanning from 15-30% up to about 130%. When modularity is lower, including in the BA model realization, NOLHTR approximates high degree across degree exponents, with the maximum performance loss of less than 3% (−2.54%) compared to the heuristic.
Comparing with random degree probing (right plot), NOLHTR is always able to gain in performance, with minimum performance gain of 5% and maximum of 93%. The maximum gains are in networks with the most heterogeneous degree distributions. This corresponds to the most starlike network structures, meaning most randomly chosen nodes will have very low degree.
The performance compared to random degree probing in the real networks seems to fit with performance on LFR networks with similar parameters, allowing for some noise in either direction.
Comparing with high degree on BTER, Caida, Cora and Enron, performance appears to be about on par with similar LFR networks, again allowing for some noise. Performance gain by using NOLHTR rather than high degree in the DBLP network is substantial, but smaller than an LFR network with similar properties.
The degree exponent estimate for our Twitter sample is an outlier in this dataset. However, there is also substantial performance gain, though it might be smaller than what we would expect given an LFR network with similar properties.
Taken together, these results show that the ability of our model to learn in this setting is tied to the structural properties of the network. In some cases, a low computational cost heuristic such as high or random degree can perform well, but our experiments show that there is almost never a disadvantage to using a NOL* algorithm, and that the advantages can be substantial, up to 130% gains in performance.
Conclusion and future work
We proposed and evaluated algorithms that address the problem of reducing the incompleteness of a partially observed network via successive queries, framed as an online learning problem. We presented two algorithms in the NOL* family and highlighted NOLHTR for learning heavy-tailed reward distributions. We showed that NOLHTR is able to consistently outperform other methods, especially early on in the process of querying nodes when the extreme values are yet to be discovered. We also showed that macroscopic properties of the underlying network structure, specifically the degree distribution and extent of modularity, are important factors in understanding when learning will be relatively easy, difficult, or nearly impossible. Alongside experiments on multiple synthetic and real world networks, we presented experiments on a Twitter interaction network, a realistic use case for a network growth algorithm such as NOLHTR.
The problem of online network discovery remains fruitful for future work. The case of noisy observations from the query model has yet to be fully addressed. For example, a query may return only a sample of the neighbors, or a list of potential neighbors that may include false positives. The specific models we presented here do not account for this type of noise, but it could be addressed in extensions to NOL*. Similarly, an adversarial version of the problem can be formulated, where an adversary is intentionally poisoning the queries (via either the query that is sent or the data that is returned) and the model must include the adversary in order to make appropriate adjustments to decision making. Such noisy network discovery tasks could be formulated as Partially Observed Markov Decision Processes (POMDPs), which differ from our MDP formulation by the fact that the agent is uncertain about its current state (for example, we could model some error on the feature vectors).
Appendix A:
A.1 Results on ER and regular networks
In Fig. 7, we present cumulative reward results on an ER network (left) and a k-regular network (right). The result on the ER network shows that every querying strategy performs indistinguishably from the others. This is due to the fact that the degree distribution of the network is homogeneous, meaning that the expected number of neighbors of each node is well defined and the same for all nodes. Since the expected number of neighbors is the same for every node, but the exact number of neighbors for a given node is random, no querying strategy is able to outperform any other.
In the k-regular network, every node has exactly the same degree, but the particular neighbors are randomly chosen. Since every node has the same degree, choosing the lowest degree node is approximately optimal when maximizing the number of newly observed connections. For the same reason, choosing the highest degree node is usually far from optimal, since the maximum degree node in the sample will have degree nearly k, thus the minimum reward. This explains why high degree is the worst performer.
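The argument reduces to simple arithmetic: in a k-regular network a node with observed degree d has exactly k − d links that are not yet in the sample. The sketch below illustrates this with hypothetical node names. The number of hidden links is only an upper bound on the reward, since some of those links may lead to nodes that are already in the sample.

```python
def unobserved_links(k, observed_degree):
    """Hidden links per partially observed node in a k-regular network."""
    return {node: k - deg for node, deg in observed_degree.items()}


def best_by_hidden_links(k, observed_degree):
    """Node with the most hidden links, which is the lowest observed degree node."""
    hidden = unobserved_links(k, observed_degree)
    return max(hidden, key=hidden.get)
```

With k = 4 and observed degrees of 1, 3 and 4, the hidden link counts are 3, 1 and 0, so the lowest degree node is the best candidate and the highest degree node can return nothing new.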
A.2 Feature weight analysis
In Fig. 8, we present analysis of feature weights from a NOLHTR experiment, showing how the feature weights change as querying continues, averaged over 20 trials. The purpose is to show that it is possible to analyze the learned feature weights to help understand why NOL* algorithms perform the way they do. We choose NOLHTR here, but in principle any parameterized model could be temporally analyzed in a similar fashion.
In the BA example (Fig. 8a), the LostReward feature, which takes order effects of querying into account (see Experiments section), appears as the most positively weighted feature. This makes some intuitive sense: hubs accumulate larger values of lost reward over time, since many nodes brought in by other queries are connected after they eventually get queried. This means the hubs are “missing out” on reward and since they are in fact the best nodes to probe, this is reflected in the weights of the features. This implies that in a BA network, we expect that the degree and lost reward features will be correlated. This correlation provides a potential explanation for why degree is, somewhat counterintuitively, one of the least important features when querying a BA network: the degree of hub nodes is being accounted for in the lost reward feature, so degree itself does not have a strong impact.
In the BTER example (Fig. 8b), the normalized size of the connected component that a node is in is the most highly weighted feature. As a reminder, BTER networks are generated by constructing many ER networks (“communities”) of different sizes, where the size of the networks follows a power law distribution, then connecting them to one another. This means there are a small number of very large and relatively well connected “communities”, which are the best place to query to bring in new nodes.
A.3 Random walk sampling
In Fig. 9 we show results of NOL* algorithms starting from random walk samples. The sampling works as follows: a node is chosen uniformly at random, then a random walk from that node proceeds until the desired proportion of edges is discovered, jumping randomly 15% of the time. The resulting performance is similar to what we saw in the node sampling with induction samples: NOL* algorithms are able to learn to match or outperform the heuristic methods in almost every case. However, the performance of NOL appears to be more consistent on random walk samples, as evidenced by the fact that the standard deviation around the mean performance tends to be tighter. The performance of NOLHTR is approximately the same or even slightly worse (e.g. BTER) on the random walk samples. The low degree heuristic also appears to perform better on random walk samples.
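The sampling procedure described above can be sketched as follows. The sketch assumes the graph is a dictionary of neighbor sets without self-loops, that the sample consists of the edges the walk has traversed, and that a jump moves the walker to a node chosen uniformly at random. The `max_steps` guard is a hypothetical safety limit.

```python
import random


def random_walk_with_jump(adj, target, jump_prob=0.15, seed=0, max_steps=10**6):
    """Collect traversed edges until a `target` fraction of all edges is found."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    total = sum(len(nbrs) for nbrs in adj.values()) // 2
    current = rng.choice(nodes)
    edges = set()
    for _ in range(max_steps):
        if len(edges) >= target * total:
            break
        # Jump with probability jump_prob, or when the walker is stuck.
        if rng.random() < jump_prob or not adj[current]:
            current = rng.choice(nodes)
            continue
        nxt = rng.choice(sorted(adj[current]))
        edges.add(frozenset((current, nxt)))
        current = nxt
    return edges
```

Unlike node sampling with induction, the walk never produces isolated sampled nodes, so the initial sample is made of connected pieces joined by the jumps.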
A.4 Alternative features
In this section, we show some experiments using node2vec (Grover and Leskovec 2016) as features, rather than our hand selected features. Since node2vec is significantly more computationally expensive, we expect a tradeoff between computation time and performance increases due to more expressive features. However, the results shown in Fig. 10 present a mixed picture: In some cases, using the embedding features makes no improvement or can even degrade performance. Only in one case, the Cora network, do the node2vec features truly outperform the others, and even then only with one of the learning algorithms (NOLHTR). These results are inconclusive on the benefit of using embeddings as features and show that more experimentation and research needs to be done to understand the tradeoff between complexity of features and improvement in performance.
Availability of data and materials
Code and data to reproduce the results in this paper are available at https://github.com/tlarock/nol/. Datasets used in our experiments were downloaded from SNAP (Leskovec and Krevl 2014).
Notes
 1.
An earlier version of NOL was presented at the 14th Annual Workshop on Mining and Learning with Graphs, a nonarchival venue, in 2018 (LaRock et al. 2018).
 2.
We use the terms graph and network interchangeably in this paper.
 3.
We use the terms querying and probing interchangeably in this paper.
 4.
We omit almost all results on ER graphs from the paper because all methods perform indistinguishably on these graphs.
 5.
There are many other random graph models that provide flexibility similar to the BTER that we could have used to generate networks for our experiments. We chose BTER out of convenience because it allows us to easily specify target values for average degree and clustering parameters directly.
 6.
Cases where low degree is not optimal, even in networks with uniform degree, can be constructed. Specifically, if the neighbors of the node with lowest degree are already in the network, querying it will result in reward 0. If in the same case the neighbors of the node with 2nd lowest degree are almost all outside of the network, reward for querying that node will be larger.
 7.
We further note that the sample collection techniques used in the experiments in the KNNUCB paper (Madhawa and Murata 2019) were defined differently from those we employ here and that all of our experiments ran over larger probing budgets.
 8.
Our attempt to run the KNNUCB baseline on the Twitter network did not finish in a reasonable amount of time, therefore we omit it.
Abbreviations
 NOL:

Network Online Learning
 HTR:

Heavy Tail Regression
 API:

Application Programming Interface
 MDP:

Markov Decision Process
 POMDP:

Partially Observed Markov Decision Process
 WWW:

World Wide Web
 MEUD:

Maximum Expected Uncovered Degree
 ER:

Erdős-Rényi
 BTER:

Block Two-level Erdős-Rényi
 BA:

Barabási-Albert
 LFR:

Lancichinetti-Fortunato-Radicchi
 WS:

Watts-Strogatz
 KNNUCB:

k-nearest-neighbors upper confidence bound
 # △:

Number of Triangles
References
Ahmed, NK, Neville J, Kompella RR (2013) Network sampling: from static to streaming graphs. TKDD 8(2):7–1756.
Albert, R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47–97.
Alves, LGA, Aleta A, Rodrigues FA, Moreno Y, Nunes Amaral LA (2020) Centrality anomalies in complex networks as a result of model oversimplification. New J Phys 22(1):013043.
Avrachenkov, K, Basu P, Neglia G, Ribeiro BF, Towsley DF (2014) Pay few, influence most: online myopic network covering. In: INFOCOM Workshops, 813–818. IEEE, Toronto.
Blondel, VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008.
Breza, E, Chandrasekhar AG, McCormick TH, Pan M (2017) Using aggregated relational data to feasibly identify network structure without network data. NBER Working Paper (23491).
Chen, S, Mira A, Onnela JP (2019) Flexible model selection for mechanistic network models. J Complex Net 8(2). https://doi.org/10.1093/comnet/cnz024.
Cheng, R, Lo E, Yang XS, Luk M, Li X, Xie X (2010) Explore or exploit? effective strategies for disambiguating large databases. PVLDB 3(1):815–825.
Cho, J, GarciaMolina H, Page L (1998) Efficient crawling through URL ordering. Comput Netw 30(17):161–172.
Erdös, P, Rényi A (1959) On random graphs I. Publ Math 6:290–297.
Ghosh, S, Zafar MB, Bhattacharya P, Sharma NK, Ganguly N, Gummadi PK (2013) On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream. In: CIKM’13: 22nd ACM International Conference on Information and Knowledge Management San Francisco California USA October, 1739–1744. Association for Computing Machinery, New York.
Gile, KJ (2011) Improved inference for respondentdriven sampling data with application to HIV prevalence estimation. JASA 106(493):135–146.
GonzálezBailón, S, Wang N, Rivero A, BorgeHolthoefer J, Moreno Y (2014) Assessing the bias in samples of large online networks. Soc Networks 38:16–27.
Grover, A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: KDD’16: 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining San Francisco California USA August, 855–864. Association for Computing Machinery, New York.
Hanneke, S, Xing EP (2019) Network completion and survey sampling. In: AISTATS 2019: The 22nd International Conference on Artificial Intelligence and Statistics, 209–215. Proceedings of Machine Learning Research.
Hsu, D, Sabato S (2016) Loss minimization and parameter estimation with heavy tails. JMLR 17:1–40.
Pfeiffer III, JJP, Neville J, Bennett PN (2014) Active exploration in networks: using probabilistic relationships for learning and inference. In: CIKM ’14: 2014 ACM Conference on Information and Knowledge Management Shanghai China November, 639–648. Association for Computing Machinery, New York.
Kim, M, Leskovec J (2011) The network completion problem: inferring missing nodes and edges in networks In: SDM, 47–58.
Kirkpatrick, S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680.
Kossinets, G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757):88–90.
Lancichinetti, A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):46110. https://doi.org/10.1103/PhysRevE.78.046110, http://arxiv.org/abs/0805.4770.
LaRock, T, Sakharov T, Bhadra S, EliassiRad T (2018) Reducing network incompleteness through online learning: A feasibility study In: MLG ’18. http://www.mlgworkshop.org/2018/papers/MLG2018_paper_40.pdf.
Leskovec, J, Chakrabarti D, Kleinberg JM, Faloutsos C, Ghahramani Z (2010) Kronecker graphs: an approach to modeling networks. JMLR 11:985–1042.
Leskovec, J, Krevl A (2014) SNAP Datasets: stanford large network dataset collection. http://snap.stanford.edu/data.
Madhawa, K, Murata T (2019) A multiarmed bandit approach for exploring partially observed networks. Appl Netw Sci 4(1):26. https://doi.org/10.1007/s4110901901450.
Morales, AJ, Losada JC, Benito RM (2012) Users structure and behavior on an online social network during a political protest. Physica A 391(21):5244–5253.
Murai, F, Rennó D, Ribeiro B, Pappa GL, Towsley D, Gile K (2018) Selective harvesting over networks. Data Min Knowl Discov 32(1):187–217.
Peixoto, TP (2018) Reconstructing networks with unknown and heterogeneous errors. Phys Rev X 8(4):041011.
Sampson, J, Morstatter F, Maciejewski R, Liu H (2015) Surpassing the limit: keyword clustering to improve Twitter sample coverage. In: HT ’15: 26th ACM Conference on Hypertext and Social Media Guzelyurt Northern Cyprus September, 237–245. Association for Computing Machinery, New York.
Sanz, J, Cozzo E, BorgeHolthoefer J, Moreno Y (2012) Topological effects of data incompleteness of gene regulatory networks. BMC Syst Biol 6(1):110.
Seshadhri, C, Kolda TG, Pinar A (2012) Community structure and scalefree collections of ErdösRényi graphs. Phys Rev E 85(5):056109.
Soundarajan, S, EliassiRad T, Gallagher B, Pinar A (2015) Maxoutprobe: an algorithm for increasing the size of partially observed networks. CoRR abs/1511.06463.
Soundarajan, S, EliassiRad T, Gallagher B, Pinar A (2016) Maxreach: reducing network incompleteness through node probes. In: ASONAM, 152–157. IEEE, San Francisco.
Soundarajan, S, EliassiRad T, Gallagher B, Pinar A (2017) εWGX: adaptive edge probing for enhancing incomplete networks. In: Proceedings of the 2017 ACM on Web Science Conference, WebSci 2017, 161–170. Association for Computing Machinery, New York.
Strehl, AL, Littman ML (2007) Online linear regression and its application to modelbased reinforcement learning. In: Advances in Neural Information Processing Systems 20 (NIPS 2007), 1417–1424. Neural Information Processing Systems, San Diego.
Sutton, R, Barto A (2018) Reinforcement Learning: An Introduction. 2nd edn. MIT Press, Cambridge, MA.
Tokic, M (2010) Adaptive epsilon-greedy exploration in reinforcement learning based on value difference In: KI, 203–210. Springer, Karlsruhe.
Traag, VA, Waltman L, van Eck NJ (2019) From Louvain to Leiden: Guaranteeing well-connected communities. Sci Rep 9(1):5233.
Vázquez, A, Pastor-Satorras R, Vespignani A (2002) Internet topology at the router and autonomous system level. CoRR cond-mat/0206084.
Wang, YJ, Wong GY (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82(397):8–19.
Watts, DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440.
Wejnert, C, Heckathorn DD (2008) Web-based network sampling: efficiency and efficacy of respondent-driven sampling for online research. Sociol Methods Res 37(1):105–134.
Acknowledgements
The authors acknowledge Brennan Klein for his help designing Fig. 1.
Funding
This research was sponsored in part by (1) the National Science Foundation grants NSF CNS-1314603 and NSF IIS-1741197, (2) the Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA), and (3) the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the Under Secretary of Defense for Research and Engineering or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Author information
Contributions
This manuscript would not exist without TL; he was the lead contributor in all aspects of the manuscript. TS was involved in generating the synthetic data sets, preparing the Twitter data, and generating the feature importance results. SB was involved in the design of algorithms presented in the manuscript. TER supervised the work and contributed to methodology and experimental design. TL was the lead developer of the code and conducted all the experiments. TL was the lead contributor in writing the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
LaRock, T., Sakharov, T., Bhadra, S. et al. Understanding the limitations of network online learning. Appl Netw Sci 5, 60 (2020). https://doi.org/10.1007/s41109-020-00296-w
Keywords
 Partially observed networks
 Online learning
 Heavy-tailed target distributions