TemporalRI: subgraph isomorphism in temporal networks with multiple contacts

Temporal networks are graphs where each edge is associated with a timestamp denoting when two nodes interact. Temporal Subgraph Isomorphism (TSI) aims at retrieving all the subgraphs of a temporal network (called target) matching a smaller temporal network (called query), such that matched target edges appear in the same chronological order of corresponding query edges. Few algorithms have been proposed to solve the TSI problem (or variants of it) and most of them are applicable only to small or specific queries. In this paper we present TemporalRI, a new subgraph isomorphism algorithm for temporal networks with multiple contacts between nodes, which is inspired by RI algorithm. TemporalRI introduces the notion of temporal flows and uses them to filter the search space of candidate nodes for the matching. Our algorithm can handle queries of any size and any topology. Experiments on real networks of different sizes show that TemporalRI is very efficient compared to the state-of-the-art, especially for large queries and targets.

one or more edges between any two nodes) where each edge is associated with an integer, called timestamp, denoting when two nodes interact.
Here, we focus on Temporal Subgraph Isomorphism (TSI) problem. Given two temporal graphs Q and T, called query and target, respectively, and a time interval , TSI problem aims at finding a subgraph S of T (called occurrence of Q in T) such that: (1) Q and S are isomorphic, i.e. structurally equivalent, (2) edges in S follow the same chronological order imposed by corresponding matched edges in Q, (3) all interactions in T are observed in a time interval less than or equal to . The problem can have more than one solution, i.e. there can exist two or more occurrences of Q in T. Figure 1 shows a toy example of TSI.
Subgraph isomorphism problem in static graphs have been widely investigated and several methods have been presented (Cordella et al. 2004;Carletti et al. 2017;Bonnici et al. 2013;Bonnici and Giugno 2017;Han et al. 2019;Sun and Luo 2020;Han et al. 2013;Bi et al. 2016). However, as far as we know, few algorithms have been proposed for TSI or similar definitions of the problem Cunningham 2013b, 2016;Mackey et al. 2018;Sun et al. 2019b;Kim et al. 2018). Cunningham (2013b, 2016) introduce for the first time the TSI problem, where: (1) queries are static graphs with no timestamps, (2) all paths in the matched target subgraph must be time-respecting, i.e. any outcoming event from a node has to follow any incoming event from the same node, and (3) consecutive events are d-adjacent, i.e. the difference between their timestamps must not exceed a threshold d. The authors also presented a modified version of the subgraph isomorphism algorithm VF2 (Cordella et al. 2004) capable to check the time constraints of the query during the search. The same definition of the TSI problem we introduced above is given in Mackey et al. (2018) and Sun et al. (2019b). Mackey et al. (2018) describe a general algorithm which orders both query and target edges before matching. By doing so, whenever a new match between a target edge t and a query edge q is found, the algorithm can continue the search from the target edge following t in the ordering. Sun et al. (2019b) illustrate a method for counting the number of occurrences of a temporal query in a temporal target, based on partitioning the graph into subgraphs and then counting the pattern in each subgraph while matching motif topology.
A problem closely related to TSI is temporal motifs search, which consists in finding all small and recurrent temporal subgraphs of interactions, called motifs, having k nodes and/or m edges (Kovanen et al. 2011;Paranjape et al. 2017;Liu et al. 2019;Hulovatyy et al. 2015). The main difference between TSI and motif search is that TSI enumerates all the subgraphs matching a specific query, while motif search counts the occurrences of all possible subgraphs with a specified number of nodes and edges. Due to the computational complexity of the problem, motif search algorithms usually focus on very small motifs or on specific topologies. Temporal motifs were introduced for the first time by Kovanen et al. (2011) and are defined as ordered sets of events such that: (1) the time difference between two consecutive events is within a threshold t and (2) adjacent events, i.e. events sharing a node, are consecutive, so the node cannot participate in any other event in the meanwhile. Hulovatyy et al. (2015) relax the latter constraint and introduce the concept of dynamic graphlets to capture how the neighborhood of a node changes over time. This is done to reduce the computational complexity while obtaining approximate results. As in Mackey et al. (2018) and Sun et al. (2019b), Paranjape et al. (2017) define a temporal motif as a subgraph with a temporal order of edges. The authors present an algorithm to efficiently calculate the frequencies of 2-node, 2-stars and 3-node triangle temporal motifs. For bigger motifs they use a naive algorithm that first computes static matches, then filters out occurrences which do not match the temporal constraints. To tackle with the NP-completeness of temporal motif search problem, Liu et al. (2019) propose a general sampling framework to estimate motif counts. It consists in partitioning time into intervals, finding exact counts of motifs in each interval and weighting counts to get the final estimate, using importance sampling.
Here, we present a new algorithm for the TSI problem, called TemporalRI, inspired by RI algorithm for subgraph matching in static networks (Bonnici et al. 2013;Bonnici and Giugno 2017). TemporalRI can be applied to queries of any size, in terms of number of nodes and edges. This paper extends in several directions the preliminary work presented in Locicero et al. (2021) where a first implementation of TemporalRI was presented. From a theoretical point of view, the definition of TSI has been refined by introducing a parameter to set a maximum temporal interval in which interactions should be observed. The concept of temporal flow introduced in Locicero et al. (2021) has been better formalised to include more types of flows, improve filtering and speedup the matching process. From an algorithmic point of view, TemporalRI has been redesigned in order to support multiple edges between two nodes. More specifically, the new version of the algorithm performs matching edge-by-edge, while the preliminary version presented in Locicero et al. (2021) executes matching node-by-node as the original RI algorithm (Bonnici et al. 2013;Bonnici and Giugno 2017). We compare TemporalRI with Mackey's algorithm (Mackey et al. 2018) on a dataset of real temporal networks, showing that our method is faster, especially for large targets and queries with many nodes and edges, independently of .

Preliminary definitions
In this section we formally define the Temporal Subgraph Isomorphism (TSI) problem and the concept of temporal flows, which are used by TemporalRI to filter candidate pairs of query and target nodes for the matching. Through the paper we will use the terms "graph" and "network" interchangeably.

Temporal subgraph isomorphism
A temporal graph (or network) is a pair G = (V , E) , where V is the set of nodes and E ⊆ V × V × R is the set of edges. Each edge is a triplet (s, d, t), where s is the source of e, d is the destination of e and t is the timestamp of e. Triplets in E are distinct, therefore multiple edges between two nodes having the same timestamp are not allowed.
If ∀(s, d, t) ∈ E also (d, s, t) ∈ E , then G is undirected, otherwise G is directed. With the notation e.source, e.dest and e.time we represent the source, the destination and the timestamp of an edge e, respectively. If (s, d, t) ∈ E or (d, s, t) ∈ E we say that s and d are neighbors. With Neigh(u) we denote the set of all neighbors of node u. The out-degree of a node u, outDeg (u), is the number of edges having u as source. Likewise, the in-degree of u, inDeg (u), is the number of edges having u as destination. The total degree of u is the sum of its in-and out-degree.
Given two temporal graphs Q = (V Q , E Q ) and T = (V T , E T ) , called query and target, respectively, and an integer , the Temporal Subgraph Isomorphism (TSI) problem consists in finding an injective function f : V Q → V T , called node mapping and an injective function g : E Q → E T , called edge mapping, such that the following conditions hold: Condition 1 ensures that edge mapping g is consistent with node mapping f. Condition 2 means that the chronological order of query edges based on their timestamps must be respected in the target too. So, timestamps of query edges are more like indexes denoting in which order target interactions should happen. Condition 3 implies that all matched target edges must be observed within a fixed time window. This is a reasonable constraint in many real contexts, because interactions that occur far apart in time are likely to be unrelated each other. For example, in a communication network, node B receives a message from node A and then after a finite amount of time it may reply back to A or act like a broker and send another message to a third node C. Notice that Condition 3 only applies to the target, indeed timestamps in the query can assume any value. The TSI problem can have one or more solutions. Given an edge mapping g, a match of Q in T is the set of pairs of query and target matched edges M = {(q 1 , g(q 1 )), (q 2 , g(q 2 )), . . . , (q k , g(q k )} , where k = |E Q |.
An occurrence of Q in T is a graph O formed by edges g(q 1 ), g(q 2 ), . . . , g(q k ) and all nodes that are sources or destinations of at least one of these edges. Figure 1 illustrates an example of application of the TSI problem with = 5 . There is exactly one match of query Q in target T, which is M = {((a, b, 1), (C, A, 11)), ((b, c, 2), (A, B, 13)), ((c, b, 3), (B, A, 15)), ((c, d, 3), (B, D, 15))} , and nodes and edges of the corresponding occurrence are drawn in red. The two subgraphs on the right, S 1 and S 2 , are not occurrences of Q in T because S 1 violates the constraint, while S 2 does not satisfy the chronological order imposed by query edges.

Temporal flows
Two edges e 1 and e 2 of a temporal graph G sharing at least one node u form a temporal flow, denoted as F = {e 1 , e 2 } . u is the center of the flow. Depending on the number of nodes shared by e 1 and e 2 , the timestamps of the two edges and whether G is directed or not, we can distinguish among different types of flows. Figure  For simplicity, each flow is uniquely identified by a code of the form " x − y " for undirected and " x − y − z " for directed networks, where: • x is the number of nodes involved in the flow (2 or 3); • y represent the temporal direction of the flow: • Asynchronous (AS) e 1 and e 2 have the same direction but different timestamps; • Synchronous (S) e 1 and e 2 have the same direction and equal timestamps; • Forward (F) a flow where e 1 .dest = e 2 .source and e 1 .time < e 2 .time (only in directed networks); • Backward (B) a flow where e 1 .dest = e 2 .source and e 1 .time > e 2 .time (only in directed networks).
Note that the undirected flow "2-S" is not allowed, because by definition of temporal graph we cannot have multiple edges between two nodes with the same timestamp. Given a node u, we can build, for each flow type C, the set of all flows of class C centered in u. The result is a vector of temporal features S(u) , called temporal signature, where each feature is a set of flows of a certain type. With S(u)[i] we denote the ith component of the vector and with |S(u)[i]| the cardinality of the corresponding set. Figure 3 shows the temporal signatures of each query and target node of Fig. 1.
By comparing temporal signatures of different nodes, we can derive a partial order binary relation , called temporal inclusion. Given two nodes u and v with temporal signatures S(u) and S(v) , respectively, of length l, we say that u is temporally included in In other words, u v iff u contains all the types of flows centered in v and, for each of such types, at least the same number of flows of that type centered in v.
By looking at the temporal signatures of nodes in Fig. 3, we observe that node A contains at least one occurrence of all types of flows centered in node b, so b A . Instead, b B because there is no "2-B-IO" flow centered in node B.

Description of RI algorithm
TemporalRI is inspired by the RI-DS algorithm for subgraph isomorphism in static graphs (Bonnici et al. 2013;Bonnici and Giugno 2017), which introduces in RI the concept of compatibility domains to filter candidate pairs of query and target nodes before starting the matching. The three main steps of RI are: (1) computation of compatibility domains, (2) computation of the ordering of query nodes for the matching, (3) matching process. In the following, we briefly describe each step.

Computation of compatibility domains
In the first step, RI computes, for each query node q, the compatibility domain Dom(q) which consists of the set of nodes in the target graph that could match q based on node in-and out-degrees. Formally, a target node t is compatible to a node q iff: (1) inDeg(q) ≤ inDeg(t) , (2) outDeg(q) ≤ outDeg(t) . This step speeds up the matching process, because only target nodes in Dom(q) are considered as possible candidates for a match to q during the search.

Ordering of query nodes
Before starting the matching, RI computes the order in which query nodes have to be processed during the search. The processing order is computed without considering the target graph. The key idea is that query nodes which both have high degree and are highly connected to nodes already present in the partial ordering come earlier in the final ordering. The first node of the ordering is randomly chosen among the nodes with highest total degree. The next node in the ordering is chosen among the remaining nodes as the one with the maximum number of neighbors already present in the ordering. In case of tie, the algorithm chooses the node with the maximum number of neighbors, which in turns are also neighbors of nodes already present in the ordering. In case of further tie, the node with highest total degree is chosen. The process is iterated until all query nodes are in the ordering.

Matching process
Following the previously defined ordering of query nodes, RI performs matching to find occurrences of the query within the target. The matching process starts with the first node of the ordering and an initially empty match M . If a new match between a query node q and a target node t is found, the pair (q, t) is added to M and RI continues the search with the next node in the ordering. If all query nodes have been matched, M constitutes a new match of Q in T, so it can be added to the list of matches found. Whenever all query nodes have been matched or no match has been found for a query node, the algorithm performs backtracking and continues the search from the last matched node. Target nodes evaluated for a match with a query node q are chosen from a list of candidates. For the first node of the ordering the set of candidate target nodes for matching is its compatibility domain, while for any other query node q candidates are both: (1) neighbors of the target node t ′ that has been already matched to the previous query node q ′ in the ordering, (2) compatible to q. The choice to add a candidate pair (q, t) to M is made based on the following feasibility rules: (1) t has not been already matched, (2) for every already mapped node q ′ neighbor of q, there must be an edge between t and f (q ′ ) in T (recall that f is the node mapping function). The latter rule ensures the consistency of the partial mapping M in case the new pair (q, t) is added to M.

Description of TemporalRI
As in the original RI-DS algorithm, the three main steps of TemporalRI are: (1) computation of compatibility domains, (2) ordering of processing query edges for the matching, (3) matching process. In the following subsections we will detail each step.
For the description of the algorithm and its time complexity analysis we refer to a temporal query Q = (V Q , E Q ) with k nodes and a temporal target T = (V T , E T ) with n nodes. To help the reader understand the functioning of TemporalRI, a toy example is presented in Fig. 4.

Compatibility domains
In order to build compatibility domains of query nodes, TemporalRI exploits not only node degrees but also the temporal signature of query and target nodes. Computation of compatibility domains is outlined in Algorithm 1.
First, temporal signatures for query nodes are built (lines 1-2). Then, for each target node t, we compute the relative signature (line 4) and check if t is mappable to some query node (lines 7-11). t is added to the domain of a query node q iff: The first two conditions apply the standard degree rules that are valid also for subgraph isomorphism in static networks. The third condition is based on the temporal signatures of both nodes.
To efficiently compute the temporal signature S(u) of a node u (lines 2 and 4), we scan all pairs of edges incident in u. Each pair of edges defines a flow of a certain class, that can be added to the corresponding entry of S(u).
In Fig. 4 the compatibility domain of query node b contains only target node A. In fact, b has both in-degree and out-degree equal to 1, while target nodes C and D have no incoming edges. Target node B is not compatible to b because there is a 2-B-IO flow centered in b that is missing for B.

Ordering of processing query edges
TemporalRI follows an iterative approach to search for a match between Q and T. Starting from an empty match, the algorithm looks for a mapping between a first query edge and a target edge. Once the first edge has been mapped, TemporalRI tries to match a second edge, and so on, until all query edges have been mapped.
To this aim, it is crucial to find an optimal ordering of processing query edges. A simple greedy approach consists in choosing at each step the query edge with the minimum number of candidates. Ideally, this choice should be done dynamically (i.e. during the search). However, this would be computationally expensive because the number of candidates also depends on the target edges and nodes we have already mapped. Instead, TemporalRI uses a static approach that consists in defining the ordering before starting the matching process. Algorithm 2 details the computation of the ordering.
First, TemporalRI computes an ordering of nodes µ mainly based on their total degree (lines 1-10). This is very similar to RI's ordering of query nodes, described in the previous Section.
The first node in the ordering is the one with highest degree (line 2). In case of tie, the node with smallest compatibility domain is chosen (line 3). In case of further tie, one of the candidate nodes is chosen randomly (line 4). A similar approach is followed for choosing the next nodes (lines 6-10), selecting, at each step, the node with the highest number of connections with nodes that are already present in the current ordering (line 7). Ties are handled as previously described (lines 8-9).
In the toy example of Fig. 4, the node ordering is µ = [b, c, a] . In fact, b is the node with highest degree. Nodes c and a are both linked to b, but c precedes a because c has a smaller compatibility domain.
Starting from the resulting ordering of query nodes µ , TemporalRI derives an ordering of query edges O (lines 11-18). For increasing values of i, we consider the ith node u in µ and all nodes that precede u in µ and are neighbors of u (lines 12-14). Following the ordering of such nodes in µ , we add all edges between them and u to O (lines 15-17). Therefore, in the computation of edge ordering each query edge is considered only once and immediately added to the partial ordering.
In Fig. 4 edges e q 2 and e q 3 link nodes b and c that come before a in µ . Hence, edge e q 1 which connects nodes a and b, must follow both e q 2 and e q 3 , yielding the final ordering O = [e q 2 , e q 3 , e q 1 ].

Matching process
Following the previously defined ordering of query edges, TemporalRI performs matching to find occurrences of the query within the target. The matching process is outlined in Algorithm 3.
Matching is done by building a node mapping function f : V Q → V T , an edge mapping function g : E Q → E T and the corresponding match M . The list of candidates to scan during the search is stored in variable Cand. To ensure that the chronological order imposed by query edges is satisfied each time the partial match is extended, a sorted list tTimes with timestamps of mapped target edges is also stored. A similar list qTimes is calculated for the query before starting the match (line 1). Finally, variables minTime and maxTime contain the lower and upper bounds of timestamps of candidate target edges during the search. All such information must be kept updated during the process to guarantee the correctness of the results. Figure 4b shows a computational graph describing all the steps performed during matching for the toy example of Fig. 4a. Each node of the computational graph is annotated with the state of the computation, i.e. the current values of all the auxiliary variables previously described. Edges correspond to transitions from one state of the computation to another one, which happen whenever a new match between a query edge and a candidate target edge is examined. Numbers attached to edges indicate the order in which transitions are performed.
The matching process starts with the first edge e in the ordering (line 2). An initial list of candidates Cand(e) is calculated (line 3) and the list is scanned starting from the first element. A given candidate c can be matched with e iff: (1) c has not been mapped yet, (2) by adding c, the resulting match M satisfy the chronological order imposed by query edges (line 12). The latter condition can be easily verified by comparing the rank of e's timestamp in qTimes with the rank of c's timestamp in tTimes. Whenever a new match is found, the current matching M is updated (line 13). If e was the last query edge to match, M is added to the list of matches found (line 15) and the search goes on with the next candidate (line 16). Otherwise, we first update the mapping and the target's time information (lines 18-25), then we continue the search with the next query edge to match (lines 26-27) and find the set of candidates for such edge (line 28). In the computational graph of Fig. 4, transitions 1 and 2 imply an update of mapping and time information, while transition 4 updates only current matching. If candidate c does not match e, the algorithm just skips to the next candidate (line 31). In Fig. 4b, this is represented by transition 3. When all candidates for e have been examined (line 6), Tempo-ralRI performs backtracking, i.e. restores mappings and target's time information to the previous values (line 7) and goes back to the previous query edge (lines 8-9). Backtracking implies removing: (1) the mapping for the last matched query edge e q , (2) the last match from M and (3) optionally, the mapping for one or both nodes of e q (e.g. in transition 7). Moreover, we need to remove the timestamp of the last target matched edge from tTimes and, if needed, update minTime and maxTime. In the computational graph of Fig. 4, backtracking corresponds to transitions 6 and 7 which restore the computation to the previous state. To guarantee that every time we do backtracking the search continues from last examined candidate for the previous query edge, TemporalRI uses a set of list iterators CandIndex, one for each query edge. CandIndex(e) contains the position of the last examined candidate in Cand(e). Every time we pass to the next candidate, the corresponding iterator is incremented (lines 16 and 31). The search ends when no more candidates are available for the first query edge. At the end of the process, TemporalRI returns the list of all matches found (line 32).
The procedure used to find the set of candidates Cand(e) for a query edge e = (u, v) is detailed in Algorithm 4.
The content of Cand(e) depends on whether u and/or v have already been mapped or not. Moreover, except for the first edge in the ordering, we can also use temporal information about already mapped target edges to constraint the search for candidates. In fact, minTime and maxTime define the time interval of the current match and is the maximum desired time interval between any two edges. This implies that we can look for edges whose timestamp is between minTime and maxTime, as well as below minTime and beyond maxTime of at most a quantity η = � − (maxTime − minTime) (line 4). So, the temporal interval where to search candidates is given by T = [minTime − η, maxTime + η] (line 5).
We distinguish four cases: 1 If both u and v are unmapped (line 1), then Cand(e) is the set of all edges between any target node compatible to u and any target node compatible to v (line 2); 2 If both u and v have been mapped (line 6), then Cand is the set of all possible target edges between f(u) and f(v) with timestamp t ∈ T (line 7); 3 If only u has been mapped (line 8), then Cand is the set of all possible target edges between f(u) and any node compatible to v with timestamp t ∈ T (line 9); 4 If only v has been mapped, then Cand is the set of all possible target edges between any node compatible to u and f(v) with timestamp t ∈ T (line 11).
Note that the case where both u and v are unmapped is possible iff e is the first edge in the ordering. In fact, by construction of the ordering O (Algorithm 2) each following edge must have one or two nodes in common with at least one of its predecessors in O.

Complexity analysis
In this subsection we analyze the time complexity of TemporalRI. Suppose, for simplicity, that = ∞ and m << n is the average number of edges (with different timestamps) connecting two nodes both in the query and in the target. Let d Q and d T the average number of ingoing and outgoing edges in query and target nodes, respectively. In the worst case both Q and T are complete graphs, so d Q = O(mk) and d T = O(mn) ≃ O(n) . Moreover, |E Q | = O(k 2 ) and |E T | = O(n 2 ).
The first step of TemporalRI is the computation of compatibility domains (Algorithm 1). The complexity of this step mainly depends on the efficiency of the computation of temporal signatures, especially in the target (line 4). To calculate the signature of a node n we have to consider all possible pairs of distinct edges incident in n. Then, for each such pair of edges we identify the flow type they form and increment the corresponding count. In the query, there are on average d Q * (d Q − 1)/2 possible pairs of edges to examine for each node. Therefore building the temporal signature for all k query nodes (lines 1-2) requires O(m 2 k 2 k) = O(m 2 k 3 ) . Likewise, computing the temporal signature of a target node t (line 4) takes O(n 2 ) . However, in practice is usually finite, so computing signatures is much faster because we can just consider pairs of edges for which the difference between timestamps is within . Since temporal signatures have small and finite lengths, checking if a query node q is temporally included in t (lines 6-9) can be done in constant time and the check for all query nodes only requires O(k). Therefore, the overall computation of domains takes n * O(n 2 ) * O(k) = O(kn 3 ) time.
Then, TemporalRI computes the ordering of query edges for the matching process (Algorithm 2). The preliminary ordering of query nodes (lines 1-10) is an iterative process. The first node of the ordering can be identified in O(k) time by simply looking at the total degree of all query nodes and eventually the cardinality of their domains (lines 2-3). For the next steps, we need to count, for each node u not yet in the ordering, how many neighbors of u are already in the ordering and this requires scanning the list of u's neighbors (line 7). Therefore, ordering nodes takes O(mk * k) = O(mk 2 ) time. Ordering of query edges (lines 12-17) requires scanning the list of neighbors of each node, so it takes O(mk 2 ) time.
The core of TemporalRI is the matching process (Algorithm 3). The computational complexity of this step mainly depends on the number of examined candidate edges for the matching, which can be retrieved in constant time from the adjacency lists of target nodes using Algorithm 4. Assuming = ∞ , all |E T | target edges are candidates for the first query edge in the ordering. Candidates for next query edges are chosen among the target edges sharing at least one node with one or more previously matched edges and they are at most d T = O(n) . Again, in practice is finite, so much less candidates are examined. Any combination of |E Q | candidates, one for each query edge, is a possible match between Q and T.
In the worst case (i.e. assuming no early backtracking), all possible combinations of candidates are examined and the number of combinations is given by Checking if a candidate edge c matches a query edge e (Algorithm 3, line 12) requires a comparison between two ranks in sorted lists of length at most |E Q | = O(k 2 ) which takes O(k 2 logk 2 ) . All remaining operations consist in updating mappings and time information and require constant time. Therefore, the complexity of the matching process is O(m k n 2k ) , which is also the time complexity of TemporalRI.

Experiments
In this section we assess the performance of TemporalRI considering a dataset of 7 real medium and large networks and queries of different sizes randomly extracted from such networks. As far as we know, this is the first comprehensive evaluation of temporal subgraph isomorphism algorithms, since all previous works only consider few small queries (Mackey et al. 2018;Redmond and Cunningham 2016;Sun et al. 2019b). For the comparison with other tools, we only focused on exact motifs counting or subgraph isomorphism algorithms having the same or very similar definition of temporal queries (Mackey et al. 2018;Sun et al. 2019b;Paranjape et al. 2017). Among these,  Mackey's algorithm (Mackey et al. 2018) was the only one we managed to compare with TemporalRI. However, we had to slightly modify the original implementation of Mackey's algorithm, because the latter does not work correctly with queries and/or networks having simultaneous contacts. For all other tools, it was impossible to make any comparison. Sun et al. (2019b) did not provide any implementation of their algorithm. SNAP temporal algorithm (Paranjape et al. 2017) works only for 2-node and specific 3-node  Fig. 1, b nodes of target T of Fig. 1. Based on this, only target node A temporally includes query node b, because A contains, for each flow type C, at least the same number of flows of type C centered in b Fig. 4 Toy example describing the matching process performed by TemporalRI. a A query Q, a target T and the values of all variables computed before starting the matching process, including compatibility domains Dom, ordering µ and O of processing query nodes and edges, respectively, and sorted list qTimes of query timestamps. b Computational graph describing all steps performed by the algorithm during matching. Each node represent a state of the process and is annotated with the values of all the auxiliary variables described in Algorithm 3. Edges correspond to transitions from one state of the computation to another one. Numbers indicate the order in which transitions from one state to another one in the SST are done queries, namely 3-edge stars and 3-edge cliques. The general algorithm described in their paper is not actually temporal, because it performs subgraph matching in a static graph and then, in a post-processing step, it removes the occurrences that do not match the temporal constraints. TemporalRI has been implemented in Java. All experiments have been performed on an Intel Core i5-8259U with 16 GB of RAM. The source code of TemporalRI is available on Github at https:// github. com/ GMica le/ Tempo ralRI. For reproducibility purpose, in the same Github repository, we also make available the source code of the modified version of Mackey's algorithm together with the temporal networks and queries used in our experiments.

Dataset and setup
Temporal networks for the experiments were downloaded from Network Repository (Rossi and Ahmed 2015; http:// netwo rkrep osito ry. com). Table 1 reports, for each network, the number of nodes, the number of edges, the number of static edges (i.e. ignoring timestamps), the number of timestamps and the resolution, i.e. the minimum difference between consecutive timestamps.
SFHH-conf is a proximity network describing the face-to-face interactions of 405 participants to the 2009 SFHH conference in Nice, France (Génois and Barrat 2018). as-topology is a peer-to-peer communication network of Autonomous Systems (AS) with data collected between February and March 2010. contact-dublin is a dynamic contact network of people participating to the Infectious SocioPatterns event that took place at the Science Gallery in Dublin, Ireland (Isella et al. 2011). email-enron describes e-mail exchanges between employees of the Enron corporation between 1999 and 2003 (Cohen 2009). digg-friends is a network of friendship links between users of the American news aggregator web service Digg collected over a period of one month in 2009 (Hogg and Lerman 2012). yahoo-messages contains email exchanges between users of Yahoo Mail in a one-month interval in 2010. prosper-loans is a directed network of transactions occurring from November 2005 to September 2011 between members of Prosper.com, a website where people can either invest in personal loans or request to borrow money (Redmond and Cunningham 2013a). Edges link lenders to borrowers and are timestamped with the origination date of the loan.
Queries with k = 3, 6, 9 edges were randomly extracted from each network. The query extraction procedure is iterative and works as follows. First, an edge is randomly sampled from the target. At each iteration, an edge having at least one node in common with (i.e. incident to) at least one of the already sampled edges is added to the partial query. To avoid any bias, this is done by picking uniformly at random one of such incident edges. If no incident edge exists, the extraction process is repeated from scratch starting from a new edge. The procedure ends when a query with k edges is obtained. Finally, edge timestamps of the extracted query are randomly sampled with replacement from the set {1, 2, . . . , k} . We extracted 100 queries for each value of k from each network, for a total number of 2100 queries.
For each network and each query, we ran TemporalRI and Mackey's algorithm with different values of time interval . The choice of is not easy, because depends both on the size of the query and on the rate at which interactions have been measured to build the target network. We considered = kr, 2kr, 3kr , where r is the resolution of the network. In each experiment, we set a timeout of 30 min for both algorithms.

Experimental results
To compare the performance of TemporalRI and Mackey's, we first focused on the experiments completed by both algorithms before the timeout. In Figs. 5, 6 and 7 we show boxplots of the running times obtained by TemporalRI and Mackey's on the set of queries completed by both algorithms with k = 3 , k = 6 and k = 9 edges, respectively, on varying . In all these experiments the two algorithms returned the same number of occurrences. For k = 9 , Mackey's didn't manage to complete any query before the timeout in the as-topology network (see also Table 4), therefore in this case boxplots for the two algorithms are not drawn. Results show that TemporalRI is almost always faster on average than Mackey's algorithm, independently of and the size of query. The speedup is higher for larger queries and denser networks. We believe that this is due to two aspects that distinguish Tempo-ralRI from Mackey's algorithm: (1) the effective filtering of candidates done using temporal flows, (2) the ordering of query edges based on the degrees of query nodes. Indeed, in Mackey's algorithm there is no a-priori filtering of candidate edges. This may affect a lot the running time, especially in the case of large queries, where there are more combinations of target edges to explore. In this scenario, temporal flows prove to be features that are both fast to compute and effective to speedup matching. Concerning the ordering of processing query edges, Mackey's algorithm just order them based on their timestamps. This strategy does not take into account the structure of the query. However, nodes with high degree and edges incident in such nodes tend to have less candidates. Though filtering seems to have the greatest impact on performance, there are several works in the context of subgraph matching in static graphs showing that ordering can be important too (Carletti et al. 2017;Bonnici et al. 2013;Bonnici and Giugno 2017;Han et al. 2019;Bi et al. 2016). In general, for both algorithms does not seem to impact a lot on the running times.
In order to analyze the behaviour of TemporalRI and Mackey's on the whole set of experiments, we calculated, for every combination of values of query size k and , in each network, the percentage of queries completed before the timeout by: (1) both algorithms; (2) only one of them; (3) none of them. Results are reporeted on Tables 2, 3 and  4. As expected, when the query size increases the number of completed tests drops down for both algorithms, but TemporalRI always manages to complete more experiments than Mackey's. Mackey's algorithm seems to suffer in as-topology and prosperloans. This could be related to the low number of distinct timestamps in these two networks, yielding to an increased average number of occurrences of tested queries, even for small values of . Once again, except for enron-email, does not seem to impact on the percentage of experiments completed before the timeout. Overall, out of 6300 experiments, there were just 11 cases (0.17%) where only Mackey's algorithm completed before the timeout, while in 909 queries (14.43%) only TemporalRI ended before the timeout. In 4675 (74.2%) queries and in 705 (11.19%) queries both algorithms and none of the two completed before the timeout, respectively.
Finally, we compared the performance of TemporalRI and Mackey's based on the topology of the query. For each query size k, results were grouped according to the topology class and the number of nodes of the query. We specified five distinct topology classes: paths, stars, trees, cliques and cyclic graphs. The notation "x-class" indicate a cluster of queries with x nodes and having one of the above topology classes (e.g. 3-path, 4-tree, etc.). Grouping was done ignoring edge direction, timestamps and possible multiple edges between two nodes. For each cluster, we computed the percentage of tests in which TemporalRI was the fastest algorithm and we did the same for Mackey's. We excluded from this analysis all the experiments in which none of the two algorithms ended before the timeout. Percentages for each cluster of queries are reported in Figs. 8,9 and 10. Interestingly, in all three cases ( k = 3, 6, 9 ), Mackey's algorithm seems to be generally faster than TemporalRI in queries with very few (2 or 3) nodes, while TemporalRI is almost always faster than Mackey's in queries with many (from 4-5 on) nodes.
To sum up, TemporalRI is generally faster than Mackey's algorithm, especially in large queries and targets that are either large or have a limited number of timestamps. In queries with very few nodes and many multiple contacts between nodes, Mackey's algorithm behaves generally better than TemporalRI. In all other cases, the latter is faster, independently of the topology of the query. show that TemporalRI is overall faster than the state-of-the-art Mackey's algorithm independently of and the number of query edges. The speedup is higher for large queries and large networks. It is worth noting that the concept of temporal flow and the techniques introduced to prune the search space are general and therefore can be, in principle, plugged into any subgraph matching algorithm to handle the TSI problem.
In this paper we decided not to consider labeled temporal graphs, i.e. graphs containing node and edge labels, even though both TemporalRI and Mackey's algorithm are designed to handle such graphs. This is due to the fact that adding labels just Table 3 Number of experiments with k = 6 completed by both algorithms, TemporalRI only, Mackey's algorithm only and none of them  Table 4 Number of experiments with k = 9 completed by both algorithms, TemporalRI only, Mackey's algorithm only and none of them

S-C (%) A-T (%) C-D (%) E-E (%) D-F (%) Y-M (%) P-L (%) Total (%)
= requires few additional checks without changing the structure of both algorithms and affecting results. For the future, we plan to optimize TemporalRI and implement it on top of a SPARK framework to deal with very large networks. We also aim to adapt TemporalRI as a general algorithm for counting temporal motifs. This implies the need of a clever strategy to identify and count occurrences of different temporal subgraphs without scanning the target several times.