Spreading of performance fluctuations on real-world project networks

Understanding the role of individual nodes is a key challenge in the study of spreading processes on networks. In this work we propose a novel metric, the reachability-heterogeneity (RH), to quantify the vulnerability of each node with respect to a spreading process on a network. We then introduce a dataset consisting of four large engineering projects described by their activity networks, including records of the performance of each activity; such data, describing the spreading of performance fluctuations across activities, can be used as a reliable ground truth for the study of spreading phenomena on networks. We test the validity of the RH metric on these project networks, and discover that nodes scoring low in RH tend to consistently perform better. We also compare RH and seven other node metrics, showing that the former is highly interdependent with activity performance. Given the context agnostic nature of RH, our results, based on real-world data, signify the role that network structure plays with respect to overall project performance.


Introduction
Spreading broadly refers to the notion of an entity propagating through a networked system, typically fueled by a dynamical process [1]. Spreading processes are a powerful set of tools for modelling a wide-range of real-world phenomena, including the dissemination of (dis)information on social media [2], the propagation of a pathogen within a population [3], cyber attacks on computer networks [4] and delays in transportation systems [5]. Node degree [6], betweenness centrality [7] and eigenvector centrality [8] are all examples of topological metrics used to approximate the role of individual nodes in the context of spreading processes, a problem that yet remains open in the extant literature [9,10].
The problem is further complicated by the scarcity of reliable ground truth. Datasets providing an individual-level description of a spreading process within a population are few [11,12], with aggregated reports being more common [13]. Even when working with real-world networks, researches often resort to simulations for what concerns the spreading dynamics itself [14,15]; and when information describing the network structure is also incomplete, the interplay between the two problems further amplifies the difficulty of the task [16].
A bountiful, yet underexploited, source of reliable data, describing both complete network structures and the fine-grained evolution of real spreading processes on them, can be found within the field of project management [17,18,19]. Projects are described by schedules, time-ordered lists of interconnected activities that can be naturally modelled as directed acyclic graphs (DAGs) [20].

arXiv:2012.00480v1 [cs.SI] 1 Dec 2020
Spreading can be used to describe performance fluctuations on project networks: activities completed behind or ahead of schedule can impact other activities downstream and initiate a spreading process [21,22]. Project schedules record both planned and real starting dates for all activities, therefore providing a complete record of the performance fluctuation dynamic.
Real-world projects often perform poorly in terms of both time and cost, a fact that holds true across different countries, companies, and industries [23,24]. As an example, studies have shown that, in the construction sector, almost nine out of ten projects are subject to cost overruns, for an average overrun cost estimated to be as high as 45% [25,26].
Large failures in projects often start as localised phenomena, with the performance of a single activity eventually impacting the performance of the entire project. Cases have been documented where an initial disruption located in a single activity ended up affecting almost a third of the entire project [27], or increasing its final cost by 20 to 40% [28]. In this respect, the networked structure of the schedule has been shown to play an important role [29,30].
Methodologically, most of the efforts aimed at modelling project performance through their associated networks have centered on cascade models [31], for example by focusing on how small-scale delays can trigger project-wide cascades [29,19], or by studying the role of indirect interactions between activities [32]. With the present study, we contribute to this line of work by developing a measure that draws a direct connection between topology and performance at the activity level, and then validate it using real performance data.
Our contribution is two-fold. First, building on prior work by Estrada [33] and by Ye and colleagues [34], we introduce a novel measure called reachabilityheterogeneity (RH), which quantifies heterogeneity on DAGs. The RH is defined both at the global (how heterogeneous is a network) and local level (how much a node contributes to the heterogeneity).
Heterogeneity plays an important role in determining how vulnerable a network is with respect to spreading processes [35]. If all nodes have equal spreading power, then the network is maximally robust, not presenting any weak spots to either targeted attacks or random failures [36]. Numerous studies quantify heterogeneity by examining the distribution of some node-level measure (examples including degree [37], memory [38,39], activity potential [40,41], attractiveness [42], burstiness [43] and modularity [44]), and examine the relationship between such heterogeneity and the spreading dynamics.
The novelty of our contribution consists in leveraging a topological feature that is intrinsically related to the spreading process: the number of descendants and of ancestors. Due to the absence of cycles, the size of the ancestry trees plays an especially important role in DAGs; and, to the best of our knowledge, there is no study examining the relevance of its heterogeneity in spreading processes. Our analysis qualitatively verifies that the global RH score is a good indicator of the heterogeneity of the ancestry and descendancy distributions.
Our second contribution consists in the introduction of a dataset describing the networks of activities that make up four real-world, complex projects; these data provide a reliable ground truth for benchmarking spreading processes. We experimentally validate the accuracy of RH against performance records from the projects' Table 1 For each of the four activity networks we report the number of activities (nodes), dependencies (directed links) and weakly connected components, and the size of the largest weakly connected component.

Project
Activities activities. Our results show that best-performing nodes tend to score low in RH, making our metric a good tool for their identification. Furthermore, we compare the local RH to seven other node metrics by computing the mutual information between them and the activity performance; RH reports the highest (or, in one case, third-highest) mutual information values among all candidates. Given the context agnostic nature of RH, our results signify the role that the network structure has with respect to overall project performance, and indicate that the RH score gives computational embodiment to the notion that a network is maximally robust against spreading when all nodes contribute equally to it.

Project Data
We use data from four complex engineering projects, where 'complex' refers to the non-triviality of underlying dependencies [45,46,47]. For each project, we use the schedule to generate the respective activity network [20]. The project schedule consists of a list of activities and in a list of dependencies between them. For each activity, the schedule contains the planned and actual start and end date. For each activity, the schedule contains the planned and actual start and end date. Target dates for an activity correspond to its start and end date as initially planned. Actual dates, as the name suggests, correspond to the dates when the activity was actually initiated and completed. The schedule naturally lends itself to be represented as a network, with activities taking the role of nodes and dependencies representing directed links among them (from now on, we will use the terms 'node' and 'activity' interchangeably). A link from node i to node j means that activity i must first be completed before activity j can start. At this stage, we remove from the network all isolated nodes, since these nodes are not capable of contributing to any sort of spreading in a meaningful way. Notice that activity networks are DAGs, as cyclic dependencies between activities are not allowed.
The four projects analysed here detail the construction of different kinds of infrastructure: a highway (HW), a data centre (DC), a wind farm (WF) and a power network (PN). The number of activities and dependencies for each project ranges from less than two hundred to more than a thousand (Table 1). Activity networks do not necessarily consist of a single component: projects may have a modular structure, being composed of independent sections. The number of weakly connected components for each network, and the size of the largest one, are also reported in Table 1. We verify that all four networks are acyclic, as expected. Figure 1 shows the reverse cumulative distribution of the number of ancestors and descendants for each project network, divided by the network's size. The four Figure 1 Reverse cumulative frequency distribution of the fraction of descendants (blue) and ancestors (orange) over the total number of nodes. The distributions vary widely in terms of largest ancestry and descendancy fraction (from less than 0.1 for HW, to more than 0.7 for WF), showing different degrees of heterogeneity.
dataset present significant differences between each other, with the most peaked (HW) having no ancestry or descendancy larger than 0.1, while WF and PN have numerous nodes with either descendancy or ancestry ranging between 0.2 and 0.5 of the entire network. In all cases the distribution of descendants has the longest tail of the two, although in the case of WF this is caused by the presence of a single node with a large number of descendants (more than 0.7 of all nodes). Overall, the four datasets show very different degrees of heterogeneity in their ancestry and descendancy distributions.

Activity Performance
Performance indicators for each activity can be constructed by comparing its target with the actual start and end dates. Here we focus on a particular form of performance, the Start Delay i.e., the difference between the target and the actual start date. The advantage of this metric is that it allows us to focus on performance fluctuations that occurred upstream of an activity, separating them from fluctuations that might occur while the activity is being carried out. A possible alternative performance indicator would be represented by the End Delay, i.e., the delay in the end date of an activity; this second measure would account for fluctuations that occur while the activity is taking place too, as well as for those that took place upstream.
Suppose, for example, that the completion of activity j is dependent on the completion of activity i, and the two activities are taking place at the same time. If a delay happens in i after the start of j, the same delay might end up propagating to Notice that a majority of activities starting early (i.e., with negative delay) does not necessarily result in an early completion of the project as a whole. As discussed in the Introduction, project overruns are often driven by localised disruptions in activities that can trigger fluctuations reaching the project's end, and affecting the overall performance.

Reachability-Heterogeneity Measure
To quantify the heterogeneity of a project network, we start from Estrada's heterogeneity measure [33], and particularly its extension to directed graphs [34]: Above, k in i and k out i represent the in-and out-degree of node i respectively, N is the set of all edges in the network G, and the summation is taken over the set of all G's (directed) edges E.
Since activity networks are DAGs, a performance fluctuation in node i can only propagate to its descendants. In turn, node i can only be affected by performance fluctuations occurring in its ancestors. By descendant of i, we mean any node j such that a directed path from i to j exists; by ancestor of i, we mean any node j such that a directed path from j to i exists. i is a descendant of j if and only if j is an ancestor of i.
In assessing the heterogeneity of an activity network with respect to performance fluctuation spreading, we make use of the more cogent notion of ancestor (descendant) instead of predecessor (successor). The contribution of a pair to the overall score is a function of the difference between the number of ancestors and descendants of the two nodes involved, rather than of their in-and out-degree, accounting for the impact of ancestors and descendants to the overall spreading process.
In formulae, we replace the in-and out-degree from Equation 1 with the number of ancestors and descendants of the two nodes respectively, and we extend the summation to all pair of connected nodes, leading to the following definition: In Equation 2, d i and a i represent the number of descendants and ancestors of node i, and C is the set of all ordered pairs of connected nodes. This metric is a global network property that allows comparison between different topologies and quantification of their heterogeneity with respect to the size of nodal ancestry lineages. In comparison, the measure in Equation 1 focuses exclusively on the immediate neighbourhood of the node.
In order to provide more actionable information, we introduce an additional version of the measure above, defined at the level of single nodes, in order to allow targeted interventions by project experts. Our aim in doing so is to answer the question: if a single node could be removed in order to make the topology less vulnerable, which one would be the best choice? The answer can simply be computed by taking the difference between the network scores before and after the removal: We call this measure Reachability-Heterogeneity (RH). Table 2 Global RH scores for the four activity networks. The comparison with Figure 1 shows a correspondence between higher score values and longer tail in the ancestry tree size distribution.   Figure 2 shows that it is also the only project that, among the four, reported delays significantly larger than zero.

Results
We first calculate the RH score for all nodes on all the four projects, as well as the four global RH scores, which are reported in Table 2. The global score provides a good characterisation of the shape of the ancestry and descendancy distributions shown in Figure 1, with the highest RH value (WF) being assigned to the distribution with the longest tail, and the other three following in order.
The distributions of node-level RH scores for all four projects are shown in Figure  3. All distributions show frequency values spanning over various orders of magnitude and a rather clearly identifiable peak, always close, but not always corresponding, to the zero value. HW, DC and PN bear some degree of similarity in shape, with a single-sided flat tail in the higher values, but differ in magnitude. Interestingly, WF, which is the only project to report significant positive delays (Figure 2), is also the only project with a significant left tail in the RH score distribution; it is worth remarking that the RH score is based on the network structure alone, and does not account for performance data.
To assess the effectiveness of RH in quantifying node vulnerability, we first use activity performance to build our ground truth. Specifically, we use the Start Delay indicator, as described in the Methods section. To mitigate the noise, we group the nodes in bins of equal width. [1] Within every bin, we calculate the Start Delay of each node and a number of summarising statistics, namely: mean, median, 50% and 68% Confidence Intervals (CIs).
The results for each project are reported in Figure 4, in the form of boxplots. In general, the Start Delay value increases for greater RH, showing that this newly introduced measure can provide a good indicator of activity performance. It is worth reminding that the Start Delay accounts for delays inherited from ancestors, signifying the relationship between performance and spreading (see the Data section for further discussion).
In particular, for the HW data the trend is especially evident in the mean and the lower end of the CIs. The upper end of the CIs seems to be capped at zero, as almost all Start Delay values are negative (see Figure 2). The trend is clearer for lower RH values,which then flattens towards the tail.
For the DC data, the trend is stronger in the mean. The clear separation between the mean value and the centre of the distribution confirms that Start Delay distributions within each bin are long-tailed, with longer tails in correspondence of lower RH values. Again, all Start Delay values are negative.
The WF data are the noisiest, possibly due to the smaller size of the dataset, leading to wider bins. Despite the noise, a trend, not captured by the median, can instead be seen in the CIs and mean.
Finally, in PN the same scenario as in DC is repeated, with the mean capturing a trend otherwise overlooked by the CIs, further reaffirming that low RH scores correspond to a greater presence of outliers from the (left) tail of the Start Delay distribution, the best-performing activities. Due to the extremely peaked shape of the performance distribution (Figure 2), the small size of the CIs was indeed to be expected.
As a further step towards validating the effectiveness of the local RH score, we benchmark it against seven other node metrics: in-degree, out-degree, betweenness centrality, closeness centrality, reverse closeness (i.e., closeness centrality computed on the network with edges' direction reversed), number of descendants and of ancestors. Once again we use the Start Delay as our target variable. For each of the eight metrics considered, we compute the mutual information between it and the target variable. [2] For each of the four networks, we proceed by computing a two-dimensional frequency matrix with the considered node metric as one dimension and the Start Delay as the other. For the purpose of computing frequencies, we group data in a number of uniform bins equal to the square root of the number of nodes, rounded [1] We use the OptBinnig Python package to choose the number of bins: http://gnpalencia.org/optbinning/. [2] Notice that the notion of target variable has a purely methodological significance in this context: mutual information is symmetric with respect to the 'candidate' and 'target' distributions. Figure 4 For each activity of each project, we report Start Delay (in days) and RH score (at the node level). Data are binned uniformly along the RH dimension to mitigate noise. A trend emerges in all four datasets with higher RH values corresponding to longer delays, i.e., worse performance. As it is particularly evident from DC and PN, a significant contribution to this phenomenon comes for the outliers in the Start Delay distribution, the best-performing activities, that tend to score low in RH. down (the same number of bins is used along both dimensions). The mutual information is then computed through the frequency matrix. [3] The results, displayed in Table 3, show that the local RH has the highest mutual information value of all the metrics considered on all datasets minus DC, where it ranks third.

Discussion
Project performance can be understood by focusing on how fluctuations spread within the project's underlying activity network. We leverage the context agnostic nature of the approach to develop a new heterogeneity measure (RH) which we then use to explore four distinct projects (a highway, a data centre, a wind farm, and a power network respectively). The size of the datasets varies between schedules too, from 1185 for DC to 129 for PN. The networks also have very different component structure, as summarised in Table 1.
In all four cases, frequencies of ancestry size, descendancy size, and performance, take values ranging over various orders of magnitude. The global RH score (Table 2) appears to be particularly effective in quantifying the heterogeneity of the descendancy and ancestry distributions (Figure 1), with longer-tailed distributions (i.e., more heterogeneous) corresponding to higher RH values. A systematic investigation of the nature of this correspondence is beyond the scope of this paper, and might provide the object of future works.
Our experimental results on the four datasets show that a general trend exists, according to which lower RH scores correspond to better performance (Figure 4). Looking at these results in detail, the cases of DC and PN are particularly interesting, with the mean of the binned data showing a clear trend that the median fails to capture. A similar behaviour is apparent in the other datasets too, though not as pronounced. This is due due to the trend being driven by outliers, i.e., bestperforming activities, located in the left tail of the Start Delay distribution; these are activities that take smaller RH values and hence amplify the difference between mean and median values within each bin. Such a feature might prove convenient, considering that a likely purpose of the RH measure is to identify cases of extremely high performance, although the opposite (identifying the poorly performing nodes) might also be the case in some instances.
The use of the Start Delay as a performance measure allows us to draw a direct connection between performance and vulnerability to spreading, since it accounts for delays inherited from upstream nodes (as discussed in the Data section). Three out of four projects (excluding WF) follow a similar Start Delay distribution, with a peak around zero and a tail in the negative values (corresponding to betterperforming nodes).
As reported in Table 3, we run a comparison between the local RH score and seven other node metrics (in-and out-degree, betweenness centrality, closeness and reverse closeness centrality, number of descendants and of ancestors). The purpose of the comparison is to quantify which of the candidate metrics carry the most information on node performance. To avoid making any assumption on the form of the dependency, we use mutual information, which is a non-parametric measure, capable of accounting for non-linear relationships. With the sole exception of DC, where it ranks third, the local RH carries the highest mutual information of all the metrics considered. No other candidate shows the same consistency across datasets; closeness centrality, for example, ranks first in DC and second in HW, but fourth in WF and fifth in PN. In-and out-degree are always the two worst performing metrics, reinforcing the point that an effective performance measure must look beyond the first-degree neighbourhood, in agreement with the existing literature [48].

Conclusions
In the present work, we tackle the question of quantifying and mitigating spreading phenomena from a topological perspective, focusing on how fluctuations in the completion time of certain activities can impact the performance of complex projects.
Our contribution is two-fold: first, we introduce a novel vulnerability measure that focuses on ancestry tree size, a quantity that plays a big role in spreading process across DAGs; second we apply this measure to an important but currently underrepresented domain -the delivery of complex projects -where we use ground truth data to test our proposed measure.
Using these data, we assess the effectiveness of RH in quantifying performance fluctuations of activities within projects. We show that higher values in RH correspond to worse performance, indicating its appropriateness in accounting for the propensity of such fluctuations to propagate. In addition, we compare RH with seven other node metrics, and show that RH carries the most amount of information about the activity performance on three out of four projects, strengthening its utility in identifying vulnerable nodes.
As well as introducing a new tool for the study of spreading processes on networks, and on directed acyclic graphs in particular, we hope that our work will stimulate the interest of the community in project management as a domain of application for network science.