Compile and runtime approaches for the selection of efficient data structures for dynamic graph analysis
 Benjamin Schiller^{1},
 Clemens Deusser^{1},
 Jeronimo Castrillon^{2} and
 Thorsten Strufe^{1}
DOI: 10.1007/s41109-016-0011-2
© The Author(s) 2016
Received: 9 February 2016
Accepted: 25 July 2016
Published: 5 September 2016
Abstract
Graphs are used to model a wide range of systems from different disciplines including social network analysis, biology, and big data processing. When analyzing these constantly changing dynamic graphs at a high frequency, performance is the main concern. Depending on the graph size and structure, update frequency, and read accesses of the analysis, the use of different data structures can yield great performance variations. Even for expert programmers, it is not always obvious which data structure is the best choice for a given scenario.
In previous work, we presented an approach for handling the selection of the most efficient data structures automatically using a compile-time approach well-suited for constant workloads.
We extend this work with a measurement study of seven data structures and use the results to fit actual cost estimation functions. In addition, we evaluate our approach for the computations of seven different graph metrics. In analyses of real-world dynamic graphs with a constant workload, our approach achieves a speedup of up to 5.4× compared to basic data structure configurations.
Such a compile-time approach cannot yield optimal results when the behavior of the system changes later and the workload becomes nonconstant. To close this gap, we present a runtime approach which provides live profiling and facilitates automatic exchanges of data structures during execution. We analyze the performance of this approach using an artificial, nonconstant workload where our approach achieves speedups of up to 7.3× compared to basic configurations.
Keywords
Dynamic graph analysis; Data structures; Performance; Measurement study; Compile-time optimization
Introduction
There is an emerging application domain that deals with the analysis of dynamic graphs. They serve to model dynamic systems across different disciplines, such as biological (Candau et al. 1982; Marti 2000), transportation (Chabini 1998), computer (Gonçalves 2012), and social networks (Braha 2009; Kossinets 2006; Mucha 2010). The analysis of such dynamic graphs is challenging and its complexity arises from the frequent changes to their topologies and properties rather than their size alone. Due to a proliferation of applications and the ever increasing size of dynamic systems, performance has quickly become a major concern (Ediger 2010, 2012; Madduri and Bader 2009).
For performance reasons, dynamic graph analysis is implemented on an in-memory graph representation (Ediger et al. 2010; 2012). There are well understood representations of graphs, such as adjacency lists and matrices, on which algorithms, data structures, and complexity analyses have been studied extensively. For practical applications, however, it remains challenging to find the best suited match of algorithms and data structures as the result often depends on the combination of a number of factors. In the case of dynamic graphs this includes graph size and structure, frequency of updates to its topology, and access patterns of the metric computation. Different graph representations result in high performance deviations but are challenging for programmers to predict (Hunt and John 2011; Shirazi 2003).
There exist many frameworks for the efficient analysis of static graphs (Bader et al. 2008; Batagelj et al. 1998; Malewicz et al. 2010). While they are all built for efficient analysis, the graph representation is fixed and selected by the developers. Many graph databases have been developed to represent graphs over time (McColl et al. 2009). While they allow for complex queries of the graph over time and the storage of additional properties, they are suited neither for a large number of updates nor for the efficient computation of topological graph properties for specific states (Ciglan et al. 2012). A lot of work has been done to develop compact representations of graphs. These approaches do not focus on runtime efficiency but on obtaining a small memory footprint (Blandford et al. 2004). Often, they are not even applicable to arbitrary graphs as they are developed for separable or sparse graphs (Blandford et al. 2003; Sun et al. 2007). Special graph representations for dynamic graphs have also been developed. Their underlying data structures are tuned for memory (Madduri and Bader 2009) or runtime efficiency (Bader et al. 2009; Ediger et al. 2012; Macko 2014) but cannot be adapted to different scenarios.
Many approaches have been developed for profiling programs to facilitate their subsequent optimization. Frameworks like Pin (Luk et al. 2005) or JFluid (Dmitriev 2004) allow the instrumentation of existing programs to collect statistics about CPU usage, memory consumption, or call frequencies of code fragments. In addition to this instrumentation, Brainy (Jung et al. 2011) enables the optimization of the data structures used by a program. Based on benchmarks of available data structures, the approach uses machine learning to generate rules like, e.g., if operation o is called more than k times use data structure d. After the analysis of a complete execution of the program, data structures are exchanged based on these general rules. This approach is not applicable to the problem of dynamic graph analysis because the generated rules are generalized for all data types and do not take into account the specific runtime properties of handling vertices or edges in specific lists.
Other approaches attempt to exchange the used data structures during runtime. Just-in-Time data structures (JitDS) (DeWael et al. 2015) is an extension of the Java language enabling the combination of multiple representations for a single data structure. For each instance, swap rules can be defined by an expert programmer to declare when and how to switch between representations. While this approach is powerful, it relies on the programmer’s intuition and foresight to define such rules. Chameleon (Shacham et al. 2009) provides a framework for runtime profiling without the need to adapt the program. In case the program uses data structure wrappers provided by the framework, data structures can be replaced during runtime, which comes at the high cost of separately monitoring all data structures. CoCo (Xu 2013) is likewise based on fixed rules for exchanging data structures and requires the programmer to use wrappers provided by the framework in order to optimize the selected data structures during runtime. Because they rely on predefined rules that do not adapt to the current properties of the graph and the read accesses of the analysis, neither approach is suited for the analysis of dynamic graphs.
In previous work (Schiller et al. 2015), we presented a compile-time approach for optimized data structure selection in the context of dynamic graph analysis. We benchmarked five data structures as potential candidates and evaluated our approach for the computation of three graph metrics. In this article, we extend this work by benchmarking a total of seven data structures, creating actual estimation functions via curve fitting, and evaluating the impact on a total of seven graph metrics. Furthermore, we propose and evaluate a runtime approach for the selection of optimal data structures during the execution of an application to handle highly dynamic workloads.
The remainder of this article is structured as follows: We introduce our terminology in Section “Terminology and notation”. In Section “Compile-time selection of efficient data structures”, we describe our compile-time approach, discuss benchmarking and profiling results, and evaluate its performance benefits. We outline and evaluate our runtime approach in Section “Runtime selection of efficient data structures” and summarize our work in Section “Summary, conclusion, and outlook”.
Terminology and notation
In this Section, we introduce our terminology and notations for graphs, dynamic graphs, and their analysis. We introduce the different lists for representing graphs in memory as well as the operations required to adapt them over time and access them for analysis. Finally, we define the problem of selecting the best data structures for these lists.
Graphs and adjacency lists
A graph \(G=(V,E)\) consists of a vertex set \(V=\{v_{1},v_{2},\dots \}\) and an edge set \(E\). Edges are unordered pairs of vertices in undirected graphs and ordered pairs in directed graphs. The adjacency list of a vertex in an undirected graph is then defined as \(adj(v):=\{\{v,w\}\in E\}\). For directed graphs, incoming and outgoing adjacency lists are defined by \(in(v):=\{(w,v)\in E\}\) and \(out(v):=\{(v,w)\in E\}\). In addition, the vertices with bidirectional connections are commonly stored in the neighborhood list, i.e., \(n(v):=\{w\in V:(w,v)\in in(v)\wedge (v,w)\in out(v)\}\).
Dynamic graphs
As a dynamic graph, we consider a graph whose vertex and edge sets change over time. Each change is represented by an update of \(V\) or \(E\) that adds or removes an element. Applying any of these updates \(add(v)\), \(rem(v)\), \(add(e)\), and \(rem(e)\) implies the modification of \(V\), \(E\), and adjacency lists.
We consider a dynamic graph at an initial state \(G_{0}=(V_{0},E_{0})\) and its development over time: \(G_{0},G_{1},G_{2},\dots \). The transition between two states \(G_{i}\) and \(G_{i+1}\) of the graph can then be described by a set of updates we refer to as a batch \(B_{i+1}\). Then, the complete transition of a dynamic graph over time can be understood as the consecutive application of batches to it: \(G_{0} \stackrel {B_{1}}{\longrightarrow } G_{1} \stackrel {B_{2}}{\longrightarrow } G_{2} \stackrel {B_{3}}{\longrightarrow } \dots \).
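The batch model can be illustrated with a minimal Java sketch for undirected graphs. It is not the authors' implementation: the names BatchGraph, Update, apply, and applyBatch are hypothetical, and the adjacency lists are simplified to sets of neighbor identifiers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal undirected dynamic graph: V plus one adjacency set per vertex.
class BatchGraph {
    enum Kind { ADD_VERTEX, REM_VERTEX, ADD_EDGE, REM_EDGE }

    // One update of V or E; w is ignored for vertex updates.
    static final class Update {
        final Kind kind; final int v; final int w;
        Update(Kind kind, int v, int w) { this.kind = kind; this.v = v; this.w = w; }
    }

    final Map<Integer, Set<Integer>> adj = new HashMap<Integer, Set<Integer>>();

    // Applies one update and reports whether it succeeded (true) or failed (false).
    boolean apply(Update u) {
        switch (u.kind) {
            case ADD_VERTEX:
                if (adj.containsKey(u.v)) return false;
                adj.put(u.v, new HashSet<Integer>());
                return true;
            case REM_VERTEX:
                Set<Integer> neighbors = adj.remove(u.v);
                if (neighbors == null) return false;
                // Removing a vertex also removes all edges adjacent to it.
                for (int w : neighbors) adj.get(w).remove(u.v);
                return true;
            case ADD_EDGE:
                if (u.v == u.w || !adj.containsKey(u.v) || !adj.containsKey(u.w)) return false;
                if (!adj.get(u.v).add(u.w)) return false;
                adj.get(u.w).add(u.v);
                return true;
            case REM_EDGE:
                if (!adj.containsKey(u.v) || !adj.containsKey(u.w)) return false;
                if (!adj.get(u.v).remove(u.w)) return false;
                adj.get(u.w).remove(u.v);
                return true;
            default:
                return false;
        }
    }

    // Applies a batch B_{i+1}, turning state G_i into G_{i+1}.
    void applyBatch(List<Update> batch) {
        for (Update u : batch) apply(u);
    }

    // Each undirected edge is stored in two adjacency sets.
    int edgeCount() {
        int sum = 0;
        for (Set<Integer> s : adj.values()) sum += s.size();
        return sum / 2;
    }
}
```

Calling applyBatch once per batch reproduces the sequence of states on which the metrics are computed.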
Analysis of dynamic graphs
Analyzing a dynamic graph means to determine its topological properties at certain states, e.g., for \(G_{0},G_{1},G_{2},\dots \). Examples of such topological metrics are the degree distribution (DD), connected components (C), assortativity (ASS), clustering coefficient (CC), rich-club connectivity (RCC), all-pairs shortest paths (SP), and betweenness centrality (BC).
Representing a dynamic graph in memory
For directed and undirected graphs, different lists are required to represent the graph and all adjacencies in memory. For both types, the set of all vertices V and the set of all edges E must be stored. For each vertex of an undirected graph, the list of all adjacent edges adj must be represented. In the case of directed graphs, separate lists of incoming and outgoing edges (in and out) as well as neighboring vertices (n) must be maintained. Hence, there is a total of 6 different lists which we denote as \({\mathcal {L}} := \{V, E, adj, in, out, n\}\). Each list stores either edges (e) or vertices (v), denoted as \({\mathcal {T}} := \{v, e\}\). We refer to this element type stored in a list by \(t: {\mathcal {L}} \rightarrow {\mathcal {T}}\) with \(t(V)=t(n):=v\) and \(t(E)=t(in)=t(out)=t(adj):=e\).
Each list must provide operations to modify it and retrieve certain information. To create and maintain a list, it must provide means to be initialized (init), add elements to it (add), and remove existing elements (rem). It must provide operations to fetch a specific element using a unique identifier (get) or iterate over all elements (iter). Often, it is also necessary to retrieve a random element from a list (rand), determine its cardinality (size), or determine if a specified element is contained in the list (cont).
The execution of add, rem, and get can be successful or fail depending on the current state of the list. For example, adding vertex v to V fails in case it already exists while the removal of e from E is successful in case the edge exists. Likewise, a contains operation (cont) can return true or false, which we also consider as success or failure. Depending on the data structure used to implement a list for storing elements of a specific type, the runtime can differ significantly when an operation fails compared to a successful execution. We do not need to make this distinction for the other operations: size and iter cannot fail and rand returns null in case the list is empty.
Therefore, we distinguish between successful (s) and failed (f) execution of add, rem, get, and cont and consider a set \({\mathcal {O}}\) of 12 different operations: \(o \in {\mathcal {O}} := \{init, add_{s}, add_{f}, rem_{s}, rem_{f}, get_{s}, get_{f}, iter, rand, size, cont_{s}, cont_{f} \}\).
Problem definition
In this article, we consider the problem of finding the most efficient data structures for representing a dynamic graph during analysis in memory. Assume \({\mathcal {D}}\) to be a set of data structures that implement all required operations. Then, we must find the most efficient configuration cfg which maps each list to a data structure: \(cfg: {\mathcal {L}} \rightarrow {\mathcal {D}}\). For undirected graphs, this means to select data structures for V, E, and adj while directed graphs require data structures for in, out, and n in addition to V and E. In the following, we focus on undirected graphs since all results can be transferred to directed graphs.
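As an illustration of the mapping \(cfg: {\mathcal {L}} \rightarrow {\mathcal {D}}\), the following Java sketch stores a configuration in an EnumMap. The type names GraphList, DataStructure, and Configuration are hypothetical, and the enum of data structures assumes the seven candidates benchmarked later in this article.

```java
import java.util.EnumMap;
import java.util.Map;

// The six lists L from the text and the seven data structures D benchmarked later.
enum GraphList { V, E, ADJ, IN, OUT, N }
enum DataStructure { A, AL, HAL, HM, HS, HT, LL }

// Hypothetical holder for a configuration cfg: L -> D.
class Configuration {
    private final Map<GraphList, DataStructure> mapping =
            new EnumMap<GraphList, DataStructure>(GraphList.class);

    // Assigns data structure d to the given list; returns this for chaining.
    Configuration set(GraphList list, DataStructure d) {
        mapping.put(list, d);
        return this;
    }

    // Returns the data structure assigned to the list, or null if none is set.
    DataStructure get(GraphList list) {
        return mapping.get(list);
    }

    // Number of distinct configurations when each of the given lists picks one of |D| structures.
    static long searchSpace(int lists) {
        long n = 1;
        for (int i = 0; i < lists; i++) n *= DataStructure.values().length;
        return n;
    }
}
```

With seven candidates, an undirected graph (V, E, adj) admits 7^3 = 343 configurations and a directed graph (V, E, in, out, n) admits 7^5 = 16,807.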
Compile-time selection of efficient data structures
In this Section, we describe a compile-time approach for the selection of efficient data structures for the analysis of dynamic graphs. Afterwards, we discuss benchmarking results for different data structures and give examples. Then, we present results of operation counts obtained during profiling for the computation of graph metrics and the adaptation of a dynamic graph. Finally, we evaluate our approach on two real-world datasets and summarize our results.
Compile-time approach
Our approach for optimizing the data structure selection for dynamic graph analysis is based on the assumption that workload and characteristics of the dynamic graph do not change drastically over time. We refer to such a workload as constant and call a workload nonconstant in case access patterns or list sizes change significantly over time. For a constant workload, we can estimate the workload for the complete analysis based on the first batches and determine the best configuration.
To understand and estimate the performance of data structures when executing specific operations, we benchmark them beforehand. This preparation phase must be executed only once for a platform where the application should be executed.
Benchmarking
The runtime of executing an operation \(o \in {\mathcal {O}}\) on a list \(l \in {\mathcal {L}}\) depends on the element type \(t(l) \in {\mathcal {T}}\), the data structure \(d \in {\mathcal {D}}\) used to implement the list, and its size \(s_{l} \in \mathbb {N}^{+}\). To estimate this runtime, we perform measurements for data structures and element types with all operations and list sizes \(s \in [1,s_{max}]\). As a result, we obtain a set of measurements for each list size s: \(m_{d,t,o}: [1,s_{max}] \rightarrow \mathbb {R}^{k}\).
From these measurements, we derive an estimation function \(e_{d,t,o}\) by fitting two candidate functions:
\(f_{1}(x)=a+b\cdot x+c\cdot x^{2}\)
\(f_{2}(x)=a+b\cdot \log (x)\)
We chose these functions to reflect the complexity classes \(O(1)\), \(O(s)\), \(O(s^{2})\), and \(O(\log (s))\) of the operations on different data structures. We fit \(f_{1}\) and \(f_{2}\) via median value and standard deviation of the data points in \(m_{d,t,o}\) and select the function with the smallest error as \(e_{d,t,o}\).
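The selection between the two candidate functions can be sketched in Java as follows. The sketch assumes that the parameters a, b, and c have already been fitted and that the error is the sum of squared residuals against the measured median runtimes. Both are assumptions, since the text does not specify the error measure, and the fitting step itself is omitted. The class name EstimationSelection, its methods, and the use of the natural logarithm are likewise hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

// Sketch: pick the candidate estimation function with the smaller squared error.
class EstimationSelection {
    // f1(x) = a + b*x + c*x^2
    static DoubleUnaryOperator f1(final double a, final double b, final double c) {
        return x -> a + b * x + c * x * x;
    }

    // f2(x) = a + b*log(x), using the natural logarithm (assumption)
    static DoubleUnaryOperator f2(final double a, final double b) {
        return x -> a + b * Math.log(x);
    }

    // Sum of squared residuals of f over the measured (size, runtime) points.
    static double squaredError(DoubleUnaryOperator f, double[] sizes, double[] runtimes) {
        double err = 0.0;
        for (int i = 0; i < sizes.length; i++) {
            double r = f.applyAsDouble(sizes[i]) - runtimes[i];
            err += r * r;
        }
        return err;
    }

    // Returns whichever candidate has the smaller error; ties go to the first one.
    static DoubleUnaryOperator select(DoubleUnaryOperator g1, DoubleUnaryOperator g2,
                                      double[] sizes, double[] runtimes) {
        return squaredError(g1, sizes, runtimes) <= squaredError(g2, sizes, runtimes) ? g1 : g2;
    }
}
```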
Instrumentation, execution, and profiling
Two actions are performed during the analysis of a dynamic graph: graph modification and metric computation. Graph modification means that the in-memory representation is changed to reflect the updates that occur in the graph over time, i.e., add and rem. For the computation of metrics, read operations like iter, size, and cont are executed on certain lists depending on metrics and algorithms.
In the first part of our approach, we instrument a given application such that these accesses to data structures can be recorded. Then we execute the instrumented application for some batches and aggregate the recorded access statistics for each list l and operation o as \(c_{l}: {\mathcal {O}} \rightarrow \mathbb {N}\). We refer to \(c_{l}\) as operation counts. In addition, we record the average size of all instances of list l as \(s_{l}\). For example, \(c_{V}(add)\) records how many elements have been added to V and \(s_{adj}\) denotes the average size of all adjacency lists adj.
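The recorded statistics can be pictured with the following Java sketch. The names Op and ListProfile are hypothetical, and sampling the list size at every recorded operation is an assumption, since the text does not state when sizes are sampled.

```java
import java.util.EnumMap;
import java.util.Map;

// The 12 operations O distinguished in the text.
enum Op { INIT, ADD_S, ADD_F, REM_S, REM_F, GET_S, GET_F, ITER, RAND, SIZE, CONT_S, CONT_F }

// Hypothetical profiler for one list l: operation counts c_l and average size s_l.
class ListProfile {
    private final Map<Op, Long> counts = new EnumMap<Op, Long>(Op.class);
    private long sizeSum = 0;
    private long sizeSamples = 0;

    // Records one execution of operation o on a list instance of the given size.
    void record(Op o, int currentSize) {
        Long old = counts.get(o);
        counts.put(o, old == null ? 1L : old + 1L);
        sizeSum += currentSize;   // assumption: size is sampled on every operation
        sizeSamples++;
    }

    // c_l(o): how often operation o was executed on this list.
    long count(Op o) {
        Long c = counts.get(o);
        return c == null ? 0L : c;
    }

    // s_l: average size over all samples, 0 if nothing was recorded.
    double averageSize() {
        return sizeSamples == 0 ? 0.0 : (double) sizeSum / sizeSamples;
    }
}
```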
Analysis and recompilation
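The analysis component takes the operation counts and average list sizes recorded during profiling as well as the estimation functions obtained from benchmarking as input. It returns the configuration \(cfg^{*}\) which is estimated to be the most efficient for executing the operation counts for the given list sizes. Finally, we recompile the application to use \(cfg^{*}\).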
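The exact cost formula is not reproduced in this text. The Java sketch below therefore assumes a simple model in which the estimated cost of a data structure d for a list l is the sum over all operations o of \(c_{l}(o) \cdot e_{d,t(l),o}(s_{l})\), and the candidate with the lowest cost is recommended. This model and the names Analysis, Estimator, cost, and recommend are assumptions made for illustration.

```java
import java.util.Map;

// Sketch of the analysis step for a single list l (assumed cost model, see above).
class Analysis {
    // Estimation function e_{d,t,o}: estimated runtime of one operation at a given list size.
    interface Estimator {
        double estimate(String operation, double listSize);
    }

    // Estimated total cost of executing the recorded operation counts on one data structure.
    static double cost(Estimator e, Map<String, Long> counts, double avgSize) {
        double total = 0.0;
        for (Map.Entry<String, Long> c : counts.entrySet()) {
            total += c.getValue() * e.estimate(c.getKey(), avgSize);
        }
        return total;
    }

    // Returns the name of the candidate data structure with the lowest estimated cost.
    static String recommend(Map<String, Estimator> candidates,
                            Map<String, Long> counts, double avgSize) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Estimator> cand : candidates.entrySet()) {
            double c = cost(cand.getValue(), counts, avgSize);
            if (c < bestCost) {
                bestCost = c;
                best = cand.getKey();
            }
        }
        return best;
    }
}
```

Running recommend once per list yields one data structure for each of V, E, and adj under this model.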
Benchmarking results
We performed a measurement study of Java data structures to obtain \(m_{d,v,o}(s)\) and \(m_{d,e,o}(s)\) for sizes \(s \in [1,10^{5}]\) and seven data structures that provide the required operations: Array (A), ArrayList (AL), HashArrayList (HAL), HashMap (HM), HashSet (HS), HashTable (HT), and LinkedList (LL), i.e., \({\mathcal {D}} = \{A, AL, HAL, HM, HS, HT, LL\}\). HashArrayList is an implementation that stores all elements simultaneously in a HashSet and an ArrayList to take advantage of their respective performance for different operations as proposed by Xu (2013). For the other data structures, we used the default Java implementations.
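The idea behind HashArrayList can be sketched as follows. This is a simplified illustration, not the implementation used in the measurements; the class name HashArrayListSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the HashArrayList idea: every element lives in both
// a HashSet (fast contains) and an ArrayList (fast iteration and positional access).
class HashArrayListSketch<T> {
    private final Set<T> set = new HashSet<T>();
    private final List<T> list = new ArrayList<T>();

    // Returns true on success, false if the element is already present.
    boolean add(T element) {
        if (!set.add(element)) return false;
        list.add(element);
        return true;
    }

    // Returns true on success, false if the element is absent.
    boolean remove(T element) {
        if (!set.remove(element)) return false;
        list.remove(element); // linear scan in the list part
        return true;
    }

    // Answered by the hash part.
    boolean contains(T element) { return set.contains(element); }

    // Answered by the list part.
    T getAt(int index) { return list.get(index); }

    int size() { return list.size(); }
}
```

The trade-off is that every element is stored twice and each modification touches both parts.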
All measurements are executed on an HP ProLiant DL585 G7 server running a Debian operating system with 64 AMD Opteron(TM) 6282 SE processors at 2.6 GHz. We guaranteed that no more than 60 processes were running during the evaluation, which was executed using a 64-bit JVM, version 1.7. Our implementation of the benchmarking phase is available as an open-source repository^{2}.
Estimation functions of \(get_{s}\) and \(get_{f}\) depending on data structure and element type
t  d  \(e_{d,t,get_{s}}(x)\)  \(e_{d,t,get_{f}}(x)\)
v  A  \(23.74+0.91\cdot x-0.01\cdot x^{2}\)  \(16.72+0.15\cdot x-0.00\cdot x^{2}\)
v  AL  \(24.49+1.41\cdot x-0.01\cdot x^{2}\)  \(41.09+1.82\cdot x+0.04\cdot x^{2}\)
v  HAL  \(47.58+0.18\cdot x-0.00\cdot x^{2}\)  \(60.36+3.23\cdot x-0.00\cdot x^{2}\)
v  HM  \(73.57+0.93\cdot x-0.00\cdot x^{2}\)  \(57.48+15.46\cdot \log (x)\)
v  HS  \(56.20+40.23\cdot x-0.18\cdot x^{2}\)  \(54.05+40.99\cdot x-0.17\cdot x^{2}\)
v  HT  \(153.87+18.14\cdot \log (x)\)  \(98.70+19.96\cdot \log (x)\)
v  LL  \(39.80+0.24\cdot x-0.00\cdot x^{2}\)  \(26.28+14.04\cdot x+0.22\cdot x^{2}\)
e  A  \(22.92+1.88\cdot x+0.02\cdot x^{2}\)  \(27.78+1.51\cdot x+0.02\cdot x^{2}\)
e  AL  \(23.49+3.65\cdot x-0.00\cdot x^{2}\)  \(29.81+3.63\cdot x-0.00\cdot x^{2}\)
e  HAL  \(51.42+5.26\cdot x-0.02\cdot x^{2}\)  \(53.08+4.77\cdot x-0.02\cdot x^{2}\)
e  HM  \(371.51+1.38\cdot x-0.00\cdot x^{2}\)  \(357.04+1.44\cdot x-0.00\cdot x^{2}\)
e  HS  \(33.45+15.87\cdot x-0.04\cdot x^{2}\)  \(69.20+34.08\cdot x+0.01\cdot x^{2}\)
e  HT  \(442.95+2.09\cdot x-0.01\cdot x^{2}\)  \(407.83+5.01\cdot x-0.04\cdot x^{2}\)
e  LL  \(31.36+11.18\cdot x+0.10\cdot x^{2}\)  \(35.44+10.59\cdot x+0.11\cdot x^{2}\)
Table 2 Fastest data structure according to our estimation for different list sizes (columns: element type v or e, list size in parentheses)
o  v (10^{1})  v (10^{2})  v (10^{3})  v (10^{4})  v (10^{5})  e (10^{1})  e (10^{2})  e (10^{3})  e (10^{4})  e (10^{5})
init  LL  LL  LL  LL  LL  LL  LL  LL  LL  LL
\(add_{s}\)  AL  HS  HAL  HAL  HS  AL  AL  HS  HT  HT
\(add_{f}\)  A  A  A  HS  A  A  HS  HS  HS  HS
\(rem_{s}\)  A  A  A  A  A  A  A  HS  HM  HM
\(rem_{f}\)  A  A  A  A  A  AL  HS  HS  HS  HM
\(get_{s}\)  A  LL  A  A  LL  A  HAL  LL  HM  HT
\(get_{f}\)  A  A  A  A  A  A  HAL  LL  HM  HM
iter  AL  HAL  HAL  HAL  LL  AL  HAL  LL  LL  A
rand  A  HAL  A  A  A  AL  HAL  A  A  HAL
size  A  LL  A  A  A  A  A  A  A  HAL
\(cont_{s}\)  A  A  A  A  LL  A  HS  HS  HAL  HS
\(cont_{f}\)  A  A  A  A  HS  A  HS  LL  HM  HS
For storing vertices, Array and HashArrayList appear to be the fastest data structures overall (cf. Table 2). They perform best for most operations and list sizes.
When storing edges, Array and ArrayList are only fast for small lists of size 10. As the lists grow, the fastest data structure depends on the respective operation and even changes again the more the lists grow (cf. Table 2). For example, HashSet and HashTable perform best when executing \(add_{s}\) on lists of size ≥ 1,000 while ArrayList is fastest for lists of size 10 and 100.
The reason for the difference in performance when storing vertices or edges lies in the identification of elements. Vertices are identified by a unique identifier which can simply be used as the index of Array, ArrayList, or HashArrayList. Therefore, performing contains or get operations translates to a simple lookup at a deterministic location in memory. In contrast, hash-based data structures incur the overhead of looking up this identifier in the corresponding hash table and potentially determining its location in memory. Edges are identified by a hash computed from the two unique indexes of the adjacent vertices. Their lookup in an array-based data structure is time consuming since the complete list has to be scanned. Representing all possible indexes of an edge list in an array-based data structure would require each list to map all possible hash values, and hence always be of size 2^{32}, which is infeasible. While the lookup in array-based data structures is still faster for small lists, hash-based data structures are faster for larger lists as they only need to check for the respective hash in their hash table.
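The three lookup patterns described above can be contrasted in a short Java sketch. The class LookupSketch and the key function edgeKey are hypothetical; in particular, packing both vertex indexes into one long is only a stand-in for the edge hash, whose actual computation is not given in the text.

```java
import java.util.Map;

// Sketch contrasting the lookup patterns for vertices and edges.
class LookupSketch {
    // Vertex lookup: the unique identifier doubles as the array index.
    static String getVertex(String[] vertices, int id) {
        return (id >= 0 && id < vertices.length) ? vertices[id] : null;
    }

    // Edge lookup in an array-based list: scan until both endpoints match.
    static int[] getEdgeByScan(int[][] edges, int v, int w) {
        for (int[] e : edges) {
            if (e[0] == v && e[1] == w) return e;
        }
        return null;
    }

    // Edge lookup in a hash-based structure: one probe with a key derived from both endpoints.
    static int[] getEdgeByHash(Map<Long, int[]> edges, int v, int w) {
        return edges.get(edgeKey(v, w));
    }

    // Hypothetical key: packs both vertex indexes into one long.
    static long edgeKey(int v, int w) {
        return ((long) v << 32) | (w & 0xFFFFFFFFL);
    }
}
```

The scan visits up to all elements of the list, while the hash probe does not depend on the list length in the average case.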
From these results, we assume that array-based data structures should be recommended for storing vertices. Similarly, we see that for storing small edge lists, array-based data structures should be recommended as well. For larger edge lists with more than 100 elements, there is not a single data structure which appears best. Hash-based data structures perform better than Array and ArrayList, but which one is best depends on the combination and count of the performed operations.
Profiling results
We instrumented the graph component of DNA (Dynamic Network Analyzer)^{4}, a framework for the analysis of dynamic graphs (Schiller and Strufe 2013), to record \(c_{l}\) and \(s_{l}\) for all lists \(l \in {\mathcal L}\) during graph modification and metric computation using AspectJ (Kiczales et al. 2001). In the following, we present such results generated using the profiling component. With these operation counts and average list sizes, we can perform an analysis to estimate the most efficient configuration.
During the profiling phase, executed for each program at the beginning of our compile-time approach, the counts for graph modification as well as metric computation are recorded and used as basis for the recommendation.
Evaluation
Now, we evaluate our compile-time approach on the analysis of two real-world dynamic graphs: one that produces a constant workload (MD) and a second one that generates a nonconstant workload (FB). Our analysis scripts for performing the evaluation are available as an open-source repository^{6}.
Datasets
The FB dataset is a friendship graph of Facebook taken from KONECT, the Koblenz Network Collection (Kunegis 2013). It represents users and their friendship relations as a list of edges sorted by the timestamp at which they appeared. We take the initial graph consisting of the first 1,000 edges and 898 vertices. With each batch, the next 100 edges and corresponding vertices are added, creating a nonconstant workload. After 200 batches, the graph consists of 11,941 vertices and 21,000 edges (cf. Fig. 6 b).
For both datasets, we create the initial graph and apply the first 20 batches. After the application of each batch, one of the following metrics was computed: DD, C, RCC, ASS, SP, CC, or BC. Based on the operation counts \(c_{l}\) of the 20 batch applications and metric computations, we determine the recommended data structures for V, E, and adj.
Then, we perform the same computation with the recommended data structures, as well as configurations where V, E, and adj are all using Array, ArrayList, HashArrayList, HashMap, HashSet, HashTable, or LinkedList, referred to as basic configurations. In total, we compute the properties of MD for all 20,000 states and the properties of FB for 201 states. For comparison, we compute the runtime of all seven basic configurations relative to the configuration recommended by our approach. All results presented here are the median speedup of 50 repetitions.
Constant workload
Recommendations for V, E, and adj depending on workload and computed metric (MD: constant workload, FB: nonconstant workload)
Metric  V (MD)  E (MD)  adj (MD)  V (FB)  E (FB)  adj (FB)
All-pairs shortest paths  A  HM  AL  HAL  HAL  LL
Assortativity  A  HM  A  HAL  HAL  A
Betweenness centrality  HAL  HM  AL  LL  HAL  LL
Clustering coefficient  A  HM  A  HAL  HAL  AL
Degree distribution  A  HM  A  HAL  HAL  AL
Rich-club connectivity  A  HM  AL  HAL  HAL  AL
Connected components  A  HM  AL  HAL  HAL  AL
Nonconstant workload
After profiling for the first 20 batches of FB, our approach recommended the use of HashArrayList for representing E for all metrics. With a single exception, the same data structure was recommended for V while the use of either Array, ArrayList, or LinkedList was proposed for adj. We consider this workload to be nonconstant because the sizes of V and E increase with each batch. We expect that this significant change in list sizes renders the initial profiling meaningless for the far longer running analyses of all 200 batches. Based on the profiling during the first twenty batches, we assume a total number of 1,000+20·100=3,000 edges as input of our analysis. But after 200 batches, E grows to a total of 21,000 elements, 7× more than the list size we assume based on our initial profiling. Therefore, we expect that the recommendations generated by our approach are not always the best choice throughout an analysis and can be outperformed by the other configurations.
Summary of the compiletime approach
The fact that our recommended configurations outperform all other tested combinations for MD suggests that our estimation of the actual runtime based on \(e_{d,t,o}\) is accurate and the recommendation valid for all subsequent batches. We have shown that our compile-time approach achieves speedups over basic configurations in case of a constant workload. These recommendations are based on a short profiling phase and the results are independent of the duration of the analysis afterwards.
In contrast, our evaluation has shown that our compile-time approach is not always able to accelerate the analysis for all metrics when applying a nonconstant workload (FB). We assume that this is because of the increase of list sizes over the complete analysis period which also affects the operation counts.
Hence, we conclude that our compile-time approach is well suited for constant but not for nonconstant workloads. Therefore, we propose a runtime approach that analyzes the workload during the execution of an application and exchanges data structures accordingly to account for changes in list sizes and operation counts over time.
Runtime selection of efficient data structures
In this Section, we present a runtime approach for the selection of efficient data structures for the analysis of dynamic graphs. Then, we perform a performance analysis using an artificial workload. Finally, we summarize the insights gained from the analysis.
Runtime approach
For our runtime approach, we assume that the workload (i.e., list sizes or operation counts) of an application changes drastically over time. In such a case, there is not a single data structure configuration which performs best throughout the complete execution and it would be necessary to continually change the data structures during execution for optimal performance. Based on this assumption, we propose an approach to monitor the list sizes and operation counts at runtime, use that information to make regular recommendations for the best configuration for the current workload, and finally exchange the data structures used to represent the dynamic graph in memory.
The instrumentation adds capabilities to the program to record the access statistics and list sizes during execution and perform a hot swap of data structures if required. As in our compile-time approach, the profiling component regularly generates operation counts and average list sizes. The analysis component takes these statistics as well as the cost functions generated during the benchmarking phase as input to recommend a data structure configuration. In case this recommendation differs from the currently used configuration, the hot swap component replaces the lists in memory with new instances of the recommended data structure. Afterwards, the execution of the program is continued.
Hot swap
In our compile-time approach, the recommended data structures are assigned to the respective lists and the program is recompiled. In the runtime approach, these changes must be applied during the execution of the program. In case a new recommendation appears more efficient than the current one, we pause the execution and exchange the current data structures for the recommended ones. To exchange the data structure, we create new instances of the recommended data structure and fill them with the elements representing the current state of the graph. Afterwards, we update all references that point to the respective list.
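The exchange step can be sketched in Java as follows. The sketch simplifies the reference update by routing all accesses through a single holder object, so only one reference has to be repointed. This indirection and the names SwappableList, current, and swapTo are assumptions made for illustration, not a description of the actual hot swap component.

```java
import java.util.Collection;
import java.util.function.Supplier;

// Hypothetical holder that owns the single reference to a list's backing collection.
class SwappableList<T> {
    private Collection<T> backing;

    SwappableList(Collection<T> initial) {
        this.backing = initial;
    }

    // All reads and writes go through the currently active data structure.
    Collection<T> current() {
        return backing;
    }

    // Hot swap: create the recommended structure, copy the current state, repoint the reference.
    void swapTo(Supplier<? extends Collection<T>> factory) {
        Collection<T> replacement = factory.get();
        replacement.addAll(backing);
        backing = replacement;
    }
}
```

Copying all elements costs time proportional to the list size, which is the swap overhead that has to be weighed against the expected gain.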
Performance analysis
To analyze the performance of our runtime approach, we generated an artificial workload where the operations executed on V and E as well as their sizes change over time to investigate how our approach performs compared to basic configurations for highly dynamic scenarios. We execute this workload for each of the 7 basic data structure configurations we used before and for our runtime approach. The runtime approach always begins execution using Array as the data structure for all lists. For each execution, we measure the runtime for processing the workload as well as the overhead of recommending data structures and exchanging them. The workload consists of the following operations:
 1. cont:V, cont:E: 100k contains operations of random elements
 2. get:V, get:E: 100k get operations of random elements
 3. iter:V, iter:E: 10k iterations over all elements
 4. add:V, add:E: 1k additions of new elements
Each of these individual operations is performed 10 times before moving on to the next, forming a round consisting of 80 operations. We execute 4 such rounds, leading to a total of 320 separate operations.
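The structure of this workload can be enumerated with a short Java sketch. The class name WorkloadSchedule is hypothetical, and the order of the eight operations within a round follows the numbered list above, which is an assumption because the exact interleaving of operations on V and E is not stated.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerate the schedule of the artificial workload.
class WorkloadSchedule {
    // Assumed order within one round (follows the numbered list in the text).
    static final String[] OPERATIONS = {
        "cont:V", "cont:E", "get:V", "get:E", "iter:V", "iter:E", "add:V", "add:E"
    };
    static final int REPETITIONS = 10; // each operation is performed 10 times in a row
    static final int ROUNDS = 4;       // the whole sequence is executed 4 times

    // Returns the full schedule: 8 operations x 10 repetitions x 4 rounds.
    static List<String> build() {
        List<String> schedule = new ArrayList<String>();
        for (int round = 0; round < ROUNDS; round++) {
            for (String op : OPERATIONS) {
                for (int rep = 0; rep < REPETITIONS; rep++) {
                    schedule.add(op);
                }
            }
        }
        return schedule;
    }
}
```

The resulting schedule has 8 · 10 = 80 entries per round and 4 · 80 = 320 entries in total, matching the counts given above.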
All runtimes shown in the following are the average of 50 repetitions.
As the sizes of V and E do not change during the execution of cont, get, and iter, their runtimes only depend on the data structure used but remain similar for all repetitions. In contrast, each application of add:V and add:E increases the respective list size by 1k leading to an increase in their runtime with each repetition.
As indicated by our benchmarks, array-based data structures (Array, ArrayList, HashArrayList) are most efficient for the execution of cont:V, get:V, and iter:V. For add:V, hash-based data structures (HashArrayList, HashSet, HashTable) perform best.
For operations executed on E, the results are more diverse: While HashArrayList, HashMap, and HashSet are the best choices when executing cont:E, HashMap is the fastest data structure for obtaining elements (get:E). When executing iter:E, ArrayList performs best. When adding elements, all hash-based data structures (HashArrayList, HashMap, HashSet, HashTable) outperform the others.
HashArrayList always performs well when either HashSet or ArrayList do so. This is expected because HashArrayList takes advantage of their respective benefits to execute these operations and shows the usefulness of this combined data structure.
Recommended data structures (for workload and set size, underlined: swap required)
list size  cont:V  get:V  iter:V  add:V  cont:E  get:E  iter:E  add:E 

10k  A  A  AL  HS  HAL  HM  AL  HS 
20k  A  A  AL  HS  HS  HM  AL  HS, HM 
30k  A  A  AL  HS  HS  HM  A  HM 
40k  A  A  AL  HS  HS  HM  A  HM 
Our approach correctly recommends the data structure which ran the fastest during the execution using the basic configurations (cf. Fig. 11 a): For all investigated list sizes, Array is recommended for the execution of cont:V and get:V. When executing get:V, ArrayList is proposed and HashSet for adding vertices (add:V). When obtaining elements from E (get:E), HashMap is recommended for all sizes. For the execution of cont:E, HashArrayList is recommended for list sizes below 20k while HashSet is selected for larger ones. Similarly, Array is recommended for executing iter:E on lists with 30k and more elements but ArrayList for smaller ones. When executing add:E, the recommendation changes during the second round: HashSet is recommended for E≤21k and HashMap for larger ones.
The runtimes of our runtime approach (denoted as RT) for executing a single round of this workload are shown in Fig. 11 b. Our approach achieves runtimes consistent with what we expect from following our recommendation of the fastest basic configuration (cf. Fig. 11 a). The only anomalies introduced by the runtime approach are spikes that can occur on the first execution of each operation batch. The reason for this behavior is that we have to execute a new operation at least once on the old data structure before we can recognize that swapping the data structure would be beneficial. For example, take the execution of get:E: During the first execution of this operation, E is still stored in HashSet, the best choice for the previously executed cont:E. During this first execution, the executed operations are recorded by the profiling component and used by the analysis component to recommend a data structure that is best suited for this new workload. Afterwards, the hot swap component replaces the current data structures with the recommended ones, which leads to the performance improvement for the following executions.
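The interplay of profiling, analysis, and hot swap described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the framework's implementation: the class `SwappableSetSketch`, its two tracked operations, and the rule "prefer a hash set if membership tests dominate the batch" are all hypothetical, and elements are assumed to be unique, as vertices and edges are.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical sketch of profile -> analyze -> hot swap; not the DNA implementation.
class SwappableSetSketch<T> {
    enum Op { CONT, ITER }

    private Collection<T> data = new ArrayList<>();                  // current backing structure
    private final Map<Op, Integer> counts = new EnumMap<>(Op.class); // profile of the current batch

    // Assumes the caller never adds the same element twice.
    void add(T element) {
        data.add(element);
    }

    boolean contains(T element) {
        counts.merge(Op.CONT, 1, Integer::sum); // profiling: record the operation
        return data.contains(element);
    }

    int iterateAndCount() {
        counts.merge(Op.ITER, 1, Integer::sum); // profiling: record the operation
        int visited = 0;
        for (T ignored : data) {
            visited++;
        }
        return visited;
    }

    // Analysis and hot swap run only after a batch, so the first batch of a
    // new workload is still served by the old structure (the observed spike).
    void endOfBatch() {
        int cont = counts.getOrDefault(Op.CONT, 0);
        int iter = counts.getOrDefault(Op.ITER, 0);
        if (cont > iter && !(data instanceof HashSet)) {
            data = new HashSet<>(data);   // hot swap: copy all elements into a hash set
        } else if (iter > cont && !(data instanceof ArrayList)) {
            data = new ArrayList<>(data); // hot swap: copy all elements into an array list
        }
        counts.clear(); // start a fresh profile for the next batch
    }

    String backing() {
        return data.getClass().getSimpleName();
    }
}
```

The sketch makes the source of the spikes visible: a swap can only be triggered in `endOfBatch()`, after the new operations have already been executed once on the old structure.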
Summary of the runtime approach
We proposed a runtime approach for recommending and exchanging the data structures used to represent a dynamic graph in memory. We evaluated our approach using an artificial, regularly changing workload. Our approach outperformed basic configurations by up to 7.34×. This shows that in scenarios where the workload behavior changes over time, our approach has the potential to achieve significant performance improvements for the analysis of dynamic graphs. Some questions, however, remain open and need to be investigated in future work:
What is the best recommendation given a realistic execution history? We currently assume that any swap overhead is justified when making our recommendation, which is clearly not valid in general. Determining whether a system has shifted its workload far enough that the performance gain of a faster data structure outweighs the cost of swapping data structures is not trivial. This problem can be broken up into several subproblems:
How can the difference between a dynamic system changing its behavior and just making a few anomalous requests be determined? We currently assume that a realistic application of dynamic graph analysis will not erratically change its workload, but rather stay consistent with a slowly changing usage profile. We believe that this assumption is valid and supported by real-world data, but the degree of consistency and the velocity of overall change vary from application to application. Determining these factors is critical in order to answer the above question and make an accurate recommendation.
How much information should be taken into account when making our recommendations? This question pertains to how much of the execution history is relevant for our recommendation. On the one hand, correct processing of more information can never make the result less accurate. On the other hand, taking too much information into account might make the system inflexible over time and significantly increase the overhead of our recommendation.
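One conceivable way to approach these open questions is a break-even rule over a sliding window of recent batches: swap only if the savings estimated for the last w batches exceed the cost of the swap. The sketch below is an illustration of this idea and not part of the presented approach; the class `SwapDecisionSketch`, the window size, and the assumption that the near future resembles the window are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a break-even swap rule; not part of the presented approach.
class SwapDecisionSketch {
    private final int windowSize;                                  // batches of history considered
    private final Deque<Double> savingsPerBatch = new ArrayDeque<>();

    SwapDecisionSketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    // Records the estimated cost of one batch on the current and on the candidate structure.
    void recordBatch(double costOnCurrent, double costOnCandidate) {
        savingsPerBatch.addLast(costOnCurrent - costOnCandidate);
        if (savingsPerBatch.size() > windowSize) {
            savingsPerBatch.removeFirst(); // forget history older than the window
        }
    }

    // Swap only with a full window whose summed savings exceed the swap cost.
    boolean shouldSwap(double swapCost) {
        if (savingsPerBatch.size() < windowSize) {
            return false; // too little history, e.g. a few anomalous requests
        }
        double total = 0.0;
        for (double savings : savingsPerBatch) {
            total += savings;
        }
        return total > swapCost;
    }
}
```

In this sketch, a larger window makes the decision more robust against anomalous requests but slower to react to a genuine workload shift, which is the trade-off raised in the last question above.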
Given the sheer number and complexity of the involved variables, it may be unavoidable to use a certain degree of machine learning to make the best recommendation.
On a lower level, closer to the implementation of the data structures themselves, it should be investigated how the actual exchange of data structures can be improved. Instead of performing the swap between any two data structures through the same generic interfaces, more efficient ways to swap between specific pairs of data structures should be explored.
Summary, conclusion, and outlook
In this work, we considered the problem of finding the most efficient data structures for representing a graph for the application of dynamic graph analysis.
We proposed a compile-time approach for optimizing these data structures. As a case study, we performed a measurement study of seven data structures, fitted estimation functions from the results, implemented our approach on top of a Java-based framework for dynamic graph analysis, and evaluated it using real-world datasets. Our results show that our optimization achieves speedups of up to 5.4× over basic configurations on real-world datasets.
The data structure configuration proposed by our approach outperformed all seven default configurations for the computation of all metrics for a constant workload. For non-constant workloads, we achieved speedups in many but not all cases. Thus, our approach is well-suited for improving the analysis of dynamic graphs with a constant workload but not capable of adapting to the drastic changes of list sizes that can occur in non-constant workloads.
To close this gap, we developed a new runtime-based approach for the adaptation of graph data structures during the execution of an application. We analyzed the performance of our approach using a synthetic workload designed to capture most operations and generate a non-constant workload. In this scenario, our approach performed as expected and achieved speedups of up to 7.3× over basic configurations.
In future work, we will further investigate the benchmarking phase of our approaches to generate more appropriate cost estimation functions. In addition, we will perform an extensive parameter study to understand the different aspects of the proposed runtime approach and look for methods to determine when to use which approach.
Endnotes
^{1} http://gnuplot.sourceforge.net
^{2} https://github.com/BenjaminSchiller/DNA.gdsMeasurements
^{4} https://github.com/BenjaminSchiller/DNA
^{5} We omitted the computation of motif frequencies used in previous work because the resulting operation counts and runtimes are very similar to those observed for the clustering coefficient.
Declarations
Acknowledgements
This work is partly supported by the German Research Foundation (DFG) within the Cluster of Excellence “Center for Advancing Electronics Dresden” (cfaed) and the Collaborative Research Center (SFB 912) “Highly Adaptive Energy-efficient Computing” (HAEC).
Availability of supporting data
The source code for all components is available in open-source repositories on GitHub.
DNA framework, including the compile-time and runtime approaches:
https://github.com/BenjaminSchiller/DNA.
Sources for performing the measurement study:
https://github.com/BenjaminSchiller/DNA.gdsMeasurements.
Sources for performing the performance evaluation:
Authors’ contributions
BS developed the proposed approaches, implemented them, carried out the measurement study and the performance analysis, interpreted the results, and drafted the outline. CD implemented the function fitting component and aided in interpreting the results from the performance analysis. JC outlined the components of both approaches and formalized them using a common notation. TS drafted the outline and aided in the setup and evaluation of both approaches. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Ambedkar, C, Reddi KK, Muppalaneni NB, Kalyani D (2015) Application of centrality measures in the identification of critical genes in diabetes mellitus. Bioinformation 11(2): 90.View ArticleGoogle Scholar
 Bader, DA, Madduri K (2008) Snap, smallworld network analysis and partitioning: an opensource parallel graph framework for the exploration of largescale networks In: Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, 1–12.. IEEE.
 Bader DA, Berry J, AmosBinks A, ChavarríaMiranda D, Hastings C, Madduri K, Poulos SC2009. Stinger: Spatiotemporal interaction networks and graphs (sting) extensible representation. Georgia Institute of Technology, Tech. Rep.
 Batagelj, V, Mrvar A (1998) Pajekprogram for large network analysis. Connections 21(2): 47–57.MATHGoogle Scholar
 Blandford, DK, Blelloch GE, Kash IA (2003) Compact representations of separable graphs In: Proceedings of the fourteenth annual ACMSIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics. pp 679–688.
 Blandford, DK, et al. (2004) Experimental analysis of a compact graph representation.
 Braha, D, BarYam Y (2009) Timedependent complex networks: Dynamic centrality, dynamic motifs, and cycles of social interactions In: Adaptive Networks, 39–50.. Springer.
 Candau, S, Bastide J, Delsanti M (1982) Structural, elastic, and dynamic properties of swollen polymer networks In: Polymer Networks, 27–71.. Springer.
 Chabini, I (1998) Discrete dynamic shortest path problems in transportation applications: Complexity and algorithms with optimal run time. Transportation Research Record: J Transp Res Board1645: 170–175.View ArticleGoogle Scholar
 Ciglan, M, Averbuch A, Hluchy L (2012) Benchmarking traversal operations over graph databases In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on, 186–189.. IEEE.
 De Wael M, Marr S, De Koster J, Sartor JB, De Meuter W (2015) Justintime data structures In: 2015 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward!), 61–75.. ACM.
 Dmitriev, M (2004) Profiling Java applications using code hotswapping and dynamic call graph revelation In: ACM SIGSOFT Software Engineering Notes, 139–150.. ACM.
 Ediger, D, Jiang K, Riedy J, Bader DA (2010) Massive streaming data analytics: A case study with clustering coefficients In: Parallel and Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on., 1–8.. IEEE.
 Ediger, D, McColl R, Riedy J, Bader DA (2012) Stinger: High performance data structure for streaming graphs In: High Performance Extreme Computing (HPEC), 2012 IEEE Conference on, 1–5.. IEEE.
 Gonçalves, KC, Vieira AB, Almeida JM, da Silva APC, MarquesNeto H, Campos SVA (2012) Characterizing dynamic properties of the SopCast overlay network In: 2012 20th Euromicro International Conference on Parallel, Distributed and Networkbased Processing, 319–326.. IEEE.
 Hunt, C, John B (2011) Java performance.. Prentice Hall Press.
 Jung, C, Rus S, Railing BP, Clark N, Pande S (2011) Brainy: effective selection of data structures In: ACM SIGPLAN Notices. ACM, 86–97.
 Kiczales, G, Hilsdale E, Hugunin J, Kersten M, Palm J, Griswold WG (2001) An overview of AspectJ In: European Conference on ObjectOriented Programming, 327–354.. Springer.
 Kossinets, G, Watts DJ (2006) Empirical analysis of an evolving social network. Science311(5757): 88–90.ADSMathSciNetView ArticleMATHGoogle Scholar
 Kunegis, J (2013) Konect: the koblenz network collection In: Proceedings of the 22nd International Conference on World Wide Web, 1343–1350.. ACM.
 Luk, CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. ACM Sigplan Notices 40(6): 190–200.View ArticleGoogle Scholar
 Madduri, K, Bader DA (2009) Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis In: Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, 1–11.. IEEE.
 Malewicz, G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for largescale graph processing In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, 135–146.. ACM.
 Marti, J (2000) Dynamic properties of hydrogenbonded networks in supercritical water. Phys Rev E 61(1): 449.ADSView ArticleGoogle Scholar
 Broder, A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2009) Graph structure in the web. Comp Net. 33(1):309–320.
 Mucha, PJ, et al. (2010) Community structure in timedependent networks. Science 1:12011.
 Macko, P, et al. (2014) Llama: Efficient graph analytics using large multiversioned arrays. PhD thesis In: Ph. D. Dissertation. Harvard University.
 Schiller, B, Strufe T (2013) Dynamic network analyzer building a framework for the graphtheoretic analysis of dynamic networks In: Proceedings of the 2013 Summer Computer Simulation Conference, 49.. Society for Modeling & Simulation International.
 Schiller, B, Castrillon J, Strufe T (2015) Efficient data structures for dynamic graph analysis In: 2015 11th International Conference on SignalImage Technology & InternetBased Systems (SITIS), 497–504.. IEEE.
 Schiller, B, Jager S, Hamacher K, Strufe T (2015) StreaMA StreamBased Algorithm for Counting Motifs in Dynamic Graphs In: International Conference on Algorithms for Computational Biology, 53–67.. Springer.
 Shirazi, J (2003) Java performance tuning. O’Reilly Media, Inc.
 Shacham, O, Vechev M, Yahav E (2009) Chameleon: adaptive selection of collections In: ACM Sigplan Notices, 408–418.. ACM.
 Sun, J, Xie Yinglian, Zhang H, Faloutsos C (2007) Less is More: Compact Matrix Decomposition for Large Sparse Graphs. In: SDM, 366–377.. SIAM.
 Trequattrini, R, et al. (2015) Network analysis and football team performance: a first application In: Team Performance Management.
 Xu, G (2013) CoCo: sound and adaptive replacement of java collections In: European Conference on ObjectOriented Programming, 1–26.. Springer.
 Zhao, P, Nackman SM, Law CK (2015) On the application of betweenness centrality in chemical network analysis: Computational diagnostics and model reduction. Combustion and Flame 162(8): 2991–2998.View ArticleGoogle Scholar