
Rank is all you need: development and analysis of robust causal networks

Abstract

Financial networks can be constructed to model the intertemporal price dependencies within an asset market, giving rise to a causal network. These networks are traditionally inferred through multivariate predictive modelling. However, the application of these techniques to financial data is highly challenged. The interplay of social and economic factors produces unstable price behaviour that violates most conventional modelling assumptions, limiting the informational content of networks derived from standard inference tools. While robust estimation can mitigate these limitations, it remains unclear whether the improved accuracy of robustly estimated networks translates into qualitatively unique insight. This study provides an extended analysis of our recently introduced Rank-Vector-Autoregression model, demonstrating its capacity to identify properties that are undetected with standard methodology. We initially validate the superior accuracy of Rank-VAR through a simulation study on processes that contain adversarial abnormalities. When applied to a dataset of 261 cryptocurrencies, our rank network uniquely displays capitalisation-dependent hierarchical ordering, with outgoing influence being positively, and incoming influence negatively, correlated to total coin valuation. Applying our method to the squared deviations verifies that even under robust estimation, volatility networks display fundamentally differing dynamics to raw returns, with more connections, clustering, and causal cycles. Our results demonstrate the use of Rank-VAR to identify and verify unique properties in the causal structures of cryptocurrency markets.

Introduction

Grasping the underlying causes of price volatility in complex assets holds significant value. This knowledge equips investors and policymakers with the tools to decode risk structures and to develop forecasts capable of yielding profitable trading strategies. An important representation of causal effects involves generating networks that map the interdependencies between assets. Our study employs inferential forecasting methodologies with the objective of constructing these causal networks, aiming to expand our understanding of financial markets through complex networks analysis.

The rise of cryptocurrency markets presents a fertile context for deploying these analytical techniques. While these digital assets have garnered significant interest for their rapid expansion, they also present considerable complexities. These complexities stem from their significant volatility, which is the result of the complex interaction of social and economic factors, including market sentiment, news events, technological innovations, regulatory shifts, and unpredictable market participant behavior (Giudici et al. 2020). Given this complexity, the analysis and modeling of cryptocurrency prices has become an indispensable area of research. This study investigates cryptocurrency markets through the aforementioned causal network framework, providing methodological advancements to deal with the idiosyncrasies inherent to volatile market data. Despite this study’s focus on cryptocurrencies, the robust model we develop has potential applications across a myriad of financial instruments, including equities, futures, and fixed income assets, offering many directions for further studies.

The Vector Autoregression (VAR) framework stands as a prevalent method for multivariate forecasting (Toda and Phillips 1994; Stock and Watson 2001), extending univariate autoregression by incorporating cross-dependencies between multiple time series. This paper introduces an intuitive extension to the conventional VAR model, designed to increase estimation robustness—a critical adaptation prompted by the frequent violations of VAR’s standard assumptions within financial data. Our proposed Rank-VAR model utilizes a rank transformation of the original data series, mitigating the impact of outliers and highly-leveraged points.

Our method is validated through simulations that illustrate its advantages, followed by an analysis comparing standard and Rank-VAR methods applied to a dataset comprising 261 cryptocurrencies over the course of a year. We constructed networks to analyze both the mean response dependency and the volatility structure of returns. Our analysis reveals that robust techniques, such as Rank-VAR, yield fewer connections compared to the conventional method (1.57% vs 3.78%). However, these connections exhibit stronger correlations with market capitalization. We identify a significant inclination towards negatively-signed self-dependence within the cryptocurrency market, with the Rank-VAR method uncovering a greater prevalence of these mean-reversion links (90% vs 85%). In terms of volatility modeling, although the Rank-VAR method identifies fewer links than the standard VAR, both methods produce networks with considerably more connections in their volatility structures than in their mean response analyses (8% and 4%, respectively).

The original contributions that we build upon are:

  • An introduction to our extended, robust VAR framework, with an explanation of its intimate connections to both copula modelling and the Spearman correlation.

  • An extensive simulation study demonstrating the advantages of Rank-VAR when modelling processes with abnormal variations to the VAR assumptions.

    • The results show strong advantages in non-standard settings: in the most detrimental cases, where initial performance is poor, we observe an increase in the Area Under the Curve (AUC) from 0.562 to 0.707. For classification metrics, this corresponds to a 604% increase in the F1 score, with two other scenarios showing more than \(100\%\) improvement.

This extended study offers a detailed analysis of networks derived from applying Rank-VAR to 1 year of hourly return data from 261 cryptocurrencies, revealing the following key findings:

  • The Identification of Novel Network Features:

    • Rank-VAR’s enhanced precision allows for the identification of previously unobserved network features, such as local community effects and hierarchical linking patterns related to the ordering of market capitalization.

  • Differentiation of Network Profiles by Information Type:

    • We verify that directional and volatility-style information produce causal networks with differing network profiles. Even under the robust fitting of Rank-VAR, we see more connections, clusters, and causal cycles in the volatility networks, suggesting that volatility may rapidly escalate inside the network.

Related work

Numerous studies have delved into the construction of networks derived from the correlational patterns observed in time series data of financial assets. It is common for researchers to explore the concurrent correlations among asset prices, aiming to model the joint structure of these observations in a risk network (Boginski et al. 2005). Some studies analyze the evolution of these networks over some interval using ‘sliding window’ approaches (Kenett et al. 2010; Almog and Shmueli 2019), while others analyse evolution through an “asset graph”, constructed by the sequential introduction of nodes based on their market correlations (Onnela et al. 2003).

Compared to the risk networks that arise from simultaneous correlations, causal networks are derived from cross-correlational effects. This approach goes beyond monitoring the co-occurrence of returns, and captures the directional relationships between assets. A notable advantage of this approach lies in its provision of an ‘impulse response’ framework, where we can consider the consequential effects of activity stemming from a single node. Such a representation is particularly insightful in financial analysis, where an understanding of specific asset crash-contagion is crucial. Given this suitability, it is unsurprising that causal networks have seen financial application (Billio et al. 2011) and are frequently used to model the relationships between media sentiment and price (Souza and Aste 2019). For applications to cryptocurrency data, we observe a strong research emphasis on modelling the interactions between prices and other data sources, such as media sentiment or traditional asset prices (Aste 2019; Milunovich 2018; Amirzadeh et al. 2023; Elsayed et al. 2020; Azqueta-Gavaldón 2020). While VAR models have seen limited application to these kinds of cross-data causality networks (Ahelegbey et al. 2021), most analyses of this kind are done in a bivariate framework, lacking partial-effects considerations.

Cross-correlational effects are customarily modeled through a partial-effects approach to manage confounding influences. The VAR model is the primary tool for such analyses, and is a direct partial effects analogue of the cross-correlation matrix. This approach is intimately linked to the notions of causality developed by Granger (1969, 1980), and it can be shown that testing a set of variables for linear Granger causality is equivalent to determining whether the associated VAR parameters are nonzero (Luetkepohl 2005). Variations to the baseline VAR model have been developed to incorporate various specific features, such as coefficients that represent long range dependency (Johansen 2008), and structural restrictions that generate directed acyclic graphs (Ahelegbey et al. 2016). However, a large proportion of causality research occurs along two main axes. First, approaches that generalize the mean-response causality of VAR into more general/nonlinear causality, with particular research focus on capturing causality in variance (Cheung and Ng 1996; Woźniak 2018) and tail/extreme risk causality (Corsi et al. 2018; Hong et al. 2009). Second, approaches that improve linear causality detection in the presence of data abnormalities. While the method we introduce goes slightly beyond standard linearity, it mostly falls into the latter category.

A range of approaches for the robust estimation of VAR models have been introduced (Croux and Joossens 2008; Muler and Yohai 2013; Chang and Shi 2022), yet these strategies typically aim to improve forecasting accuracy while ensuring the model remains interpretable and that coefficients stay on their conventional scale. In scenarios where the emphasis is on network analysis alone, the intricate interpretability of each coefficient becomes less crucial. Our main goal is to determine the interdependence among time series, framing the issue in terms of a binary outcome. This circumvents several technical challenges, and enables simpler robust estimation methods, like the Rank-VAR method we present.

Methodology

The objective of causal network analysis is to construct a directed graph, or digraph, denoted by \(G=(V,E)\), which maps out causal relationships within a system. Here, V represents a set of nodes, corresponding to different variables or entities (in our case, cryptocurrencies), while E comprises a set of directed edges that indicate causal influences from one node to another. Specifically, an edge \(e_{ij} \in E\) signifies that the next observation of asset j is influenced by the previous values of asset i. This causal structure can be effectively visualized through an \(N \times N\) adjacency matrix W, where N equals the total number of nodes, and matrix elements \(W_{i,j}=1\) indicate a causal link from node i to node j, while \(W_{i,j}=0\) signifies no effect.

When developing these networks from empirical data, the primary methodological step involves selecting a model that can accurately identify and test these causal relationships, i.e., to infer E. This section provides an overview of the common techniques for this estimation problem, details our proposed extension, and reviews several commonly utilized tools for analyzing empirical networks.

Bivariate correlations

The simplest way to investigate the statistical dependencies observed between two series is to use the Pearson product moment correlation: \(r(X,Y)\). This metric is widely used across many analytical fields and tends to be the primary measure of bivariate relation. It quantifies the linear dependence between two series by taking the ratio of their covariance against the product of their standard deviations:

$$\begin{aligned} r(X,Y)=Corr(X,Y)=\frac{Cov(X,Y)}{\sigma _{X}\sigma _{Y}}. \end{aligned}$$
(1)

This representation demonstrates that the Pearson correlation is a normalisation of the covariance measure, ensuring that the resulting term falls within the interval \([-1,1]\).

The Pearson correlation offers numerous benefits, such as its intuitive appeal, ease of computation, and solid foundation in \(\mathcal {L}_2\) statistical methods. However, it exhibits high sensitivity to the distributions \(F_X(\cdot )\) and \(F_Y(\cdot )\) of the input data. Given its reliance on squared deviations, the presence of outliers or fat-tailed behaviour can distort the correlation values \(r(X,Y)\). This aspect becomes particularly critical in our causal analysis framework, where we determine links \(e_{ij}\) through binary classifications on test statistics that are intimately connected to these correlation figures \(r\). Significant bias or variance in this metric can adversely affect the precision of link identification within the causal network.

The Spearman rank correlation coefficient \(\rho (X,Y)\) generalises \(r(X,Y)\). It measures the degree to which two series are monotonically related, i.e., whether an increase in one variable is consistently associated with an increase (or decrease) in the other. It can be considered a simple extension to the Pearson correlation, as it is defined as the Pearson correlation of the variables after a rank transformation:

$$\begin{aligned} R(X_i)=|\{X_j: X_j < X_i\}|+0.5|\{X_j: X_j = X_i, j \ne i\}|+1 \end{aligned}$$
(2)

Spearman’s \(\rho\) often serves as a robust substitute for Pearson correlation, offering reduced sensitivity to outliers and requiring fewer assumptions about the relationship’s structure. Its utility shines when dealing with data characterized by significant skewness or kurtosis. By focusing on monotonicity, rather than linearity, Spearman’s \(\rho\) is adept at uncovering complex correlation patterns, proving beneficial in scenarios where the co-linear assumptions of the Pearson coefficient are inappropriate. Such qualities render it well-suited for the analysis of cryptocurrency markets, known for their intricate volatility and departure from the classical Gaussian model (Cornell et al. 2023).
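To make the connection concrete, Spearman's \(\rho\) can be computed exactly as described above: apply the mid-rank transform of Eq. 2, then take the Pearson correlation of the ranks. A minimal NumPy sketch (the function names `mid_rank`, `pearson`, and `spearman` are ours, for illustration only):

```python
import numpy as np

def mid_rank(x):
    """Mid-rank transform of Eq. 2: ties receive the average of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):            # average the ranks within tied groups
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(x, y):
    """Pearson correlation r(X, Y) of Eq. 1."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the mid-ranks."""
    return pearson(mid_rank(x), mid_rank(y))
```

For a strictly monotone but nonlinear pair (e.g. \(y = x^3\)), `spearman` returns exactly 1 while `pearson` falls below 1, illustrating the shift from co-linearity to co-monotonicity.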

Vector autoregression

Vector autoregression (VAR), a statistical model widely recognized for its application in macroeconometrics, was introduced by Christopher Sims (Sims 1980) to analyze the joint dynamics and causal relationships across multiple time series. It serves as a multivariate extension of the univariate autoregression (AR) model, which examines the time-dependent relationships within a series of observations. In the VAR(p) framework, the expected value of the data vector \(\varvec{y}_t\) at any given time is modeled as a linear combination of its \(p\) preceding values. The following Eqs. 3 and 4 illustrate this relationship for first-order and \(p\)-th order lagged versions:

$$\begin{aligned} \hbox {VAR(1):} \;\;\;&\varvec{y}_t =A_1\varvec{y}_{t-1}+\varvec{c}+\varvec{\epsilon }_t , \end{aligned}$$
(3)
$$\begin{aligned} \hbox {VAR(p):} \;\;\;&\varvec{y}_t =A_1\varvec{y}_{t-1}+A_2\varvec{y}_{t-2}+...+A_{p}\varvec{y}_{t-p}+\varvec{c}+\varvec{ \epsilon }_t , \end{aligned}$$
(4)

where \(\varvec{y}_t\) is an \(N \times 1\) vector capturing observations at time \(t\), with constant vector \(\varvec{c}\) capturing linear trends. The \(A_l\) are \(N \times N\) coefficient matrices corresponding to lags \(l=1,...,p\), and \(\varvec{\epsilon }_t\) denotes an \(N \times 1\) vector of error terms, characterized by zero mean and a covariance matrix \(\varSigma _\epsilon\). Typically, the error vector \(\varvec{\epsilon }_t\) is assumed to follow a Gaussian distribution; however, this is not a formal requirement. Under the VAR framework, it is posited that the present value of each variable is potentially influenced by both its own historical values and those of every other variable in the system, denoting a fully conditioned model.

Estimating a VAR model involves determining the values of the coefficient matrices \(A_l\) and the error covariance matrix \(\varSigma _\epsilon\). Our main focus is on uncovering the dataset’s causal influence structure, necessitating accurate estimates of \(A_l\) since they encapsulate these relationships. Typically, this estimation is achieved through the multivariate least squares (MLS) method, which treats VAR estimation as a general multivariate regression problem (matrix regression). This approach allows for the derivation of closed-form solutions through orthogonal projection (Luetkepohl 2005).
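As a sketch of this estimation step, the MLS fit of a VAR(1) can be written as a single matrix regression of \(\varvec{y}_t\) on \(\varvec{y}_{t-1}\) plus an intercept. The helper below is an illustrative reconstruction in NumPy, not the authors' code; `fit_var1` is a hypothetical name:

```python
import numpy as np

def fit_var1(y):
    """Multivariate least-squares fit of y_t = A y_{t-1} + c + eps_t (Eq. 3).

    y: (T, N) array of observations. Returns the (N, N) coefficient
    matrix A, the intercept vector c, and the residual matrix."""
    Y = y[1:]                                           # responses
    Z = np.hstack([y[:-1], np.ones((len(y) - 1, 1))])   # lagged values + const
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)           # (N+1, N) solution
    A, c = B[:-1].T, B[-1]                              # A[j, i]: effect i -> j
    return A, c, Y - Z @ B
```

On data simulated from a known stable \(A\), the recovered coefficients converge to the truth at the usual \(1/\sqrt{T}\) rate.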

Hypothesis testing for the statistical significance of the coefficient matrices’ elements can be performed by observing that our estimates, \(\hat{A}_l\), follow an asymptotic normal distribution under finite variance assumptions, i.e.,

$$\begin{aligned} \sqrt{N} \; \hbox {Vec}({\hat{A}_l-A_l})\xrightarrow []{d}\mathcal {N}(0, \varGamma ^{-1}\otimes \varSigma _\epsilon ) , \end{aligned}$$
(5)

where \(\varGamma =\ YY'/N\), \(\otimes\) indicates the Kronecker product and \(\hbox {Vec}(\cdot )\) refers to the process of transforming a matrix into a vector form. For a VAR(1) model, \(Y\) corresponds to the matrix form of our response data \(\varvec{y}_t\), indicating that \(\varGamma\) serves as an estimated covariance matrix of returns. With generalized VAR(p) models, this matrix’s complexity escalates, yet the formulation in Eq. 5 remains applicable.

To verify the presence of a link \(e_{ij}\) we calculate \(t\) values under the null hypothesis that \(A_{l,i,j}=0\), defined as \(\hat{t}_{i,j}={\hat{A}_{l,i,j}}/\hat{s}_{i,j}\), with \(\hat{s}_{i,j}\) being the standard error derived from the corresponding diagonal element of \(\varGamma ^{-1}\otimes \varSigma _\epsilon\). These \(t\) values facilitate the creation of a binary classification for links, targeting a false positive rate \(\alpha\) through the criterion \(\hat{e}_{ij}=|\hat{t}_{ij}|>\varPhi ^{-1}(1-\alpha /2)\), where \(\varPhi ^{-1}\) denotes the inverse of the standard normal cumulative distribution function.
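The link-testing recipe can be sketched end to end: fit the VAR(1) by least squares, form standard errors from the estimated coefficient covariance (the finite-sample analogue of Eq. 5), and threshold the \(t\) values. This is an illustrative reconstruction, not the authors' implementation; the function name and the degrees-of-freedom correction are our own choices:

```python
import numpy as np
from scipy.stats import norm

def causal_links(y, alpha=0.05):
    """Binary adjacency W from a VAR(1) fit: W[i, j] = 1 when the coefficient
    of series i in the equation for series j is significant at level alpha."""
    T, N = y.shape
    Z = np.hstack([y[:-1], np.ones((T - 1, 1))])        # lagged data + const
    B, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    resid = y[1:] - Z @ B
    sigma = resid.T @ resid / (T - 1 - (N + 1))         # error covariance
    zz_inv = np.linalg.inv(Z.T @ Z)
    se = np.sqrt(np.outer(np.diag(zz_inv), np.diag(sigma)))
    t_vals = B / se                                     # (N+1, N) t statistics
    return (np.abs(t_vals[:-1]) > norm.ppf(1 - alpha / 2)).astype(int)
```

On a simulated two-asset system with a one-directional dependence, the thresholded matrix recovers the true links.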

To streamline our analysis of causal networks, we primarily consider VAR processes of order 1, VAR(1), and exclude the index l from further discussion, with \(A=A_1\) unless stated otherwise. It is important to note that, as demonstrated in Appendix 7.1, any VAR(p) model can be directly converted to an augmented VAR(1) representation (Luetkepohl 2005), producing a recurrence structure analogous to Eq. 3. From either this expanded form or similar restructured equations, we can fit coefficients with a least-squares projection and find asymptotic coefficient distributions as per Eq. 5. Given the fundamental equivalence of these models, it is theoretically expected that the simulation results in “Simulation study” section should generalize to VAR(p). For the empirical networks discussed in “Data” section, our VAR(1) focus corresponds to an assumption that any causal relationship from i to j first manifests through immediate (order-1) effects, and that lag effects beyond order 1 cannot occur without an initial order-1 link. While this assumption is not certain, it is intuitive to assume that such situations would be relatively uncommon. Further, we may expect such effects to be weaker in magnitude than their order-1 counterparts, making the successful identification of such anomalies even more uncommon.

Rank vector autoregression

The introduced method, Rank-VAR, provides a robust extension to the traditional model by altering the recurrence relation to operate on the rank transformed data. This process yields a multivariate partial-effects counterpart to the Spearman rank correlation, with the following mathematical form:

$$\begin{aligned} \hbox {Rank-VAR(1):} \;\;\;&R(\varvec{y}_t) =A_1R(\varvec{y}_{t-1})+\varvec{c}+\varvec{\epsilon }_t , \end{aligned}$$
(6)
$$\begin{aligned} \hbox {Rank-VAR}(p): \;\;\;&R(\varvec{y}_t) =A_1R(\varvec{y}_{t-1})+...+A_{p}R(\varvec{y}_{t-p})+\varvec{c}+\varvec{ \epsilon }_t , \end{aligned}$$
(7)

where the rank transform \(R(\varvec{y}_t)\) is applied elementwise to the vector \(\varvec{y}_t\). The resulting error terms \(\varvec{ \epsilon }_t\) have a relatively complicated structure due to the discrete, bounded nature of \(R(\varvec{y}_t)\). However, we can infer some basic elements, such as zero expectation and bounded variance (as the ranks, \(R(\varvec{y}_t)\), are themselves bounded). As such, we meet the main requirement of finite variance for the asymptotic normality of \(\hat{A}\) discussed in “Vector autoregression” section.
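A minimal sketch of the Rank-VAR(1) fit of Eq. 6: transform each series to mid-ranks (SciPy's `rankdata` with `method="average"` implements the mid-rank convention), then apply the ordinary least-squares VAR fit to the ranks. The function name is ours, for illustration:

```python
import numpy as np
from scipy.stats import rankdata

def fit_rank_var1(y):
    """Rank-VAR(1) (Eq. 6): ordinary VAR(1) least squares applied to the
    column-wise mid-rank transform of the (T, N) data matrix y."""
    y = np.asarray(y, float)
    R = np.column_stack([rankdata(col, method="average") for col in y.T])
    Z = np.hstack([R[:-1], np.ones((len(R) - 1, 1))])
    B, *_ = np.linalg.lstsq(Z, R[1:], rcond=None)
    return B[:-1].T, B[-1]          # coefficient matrix A and intercept c
```

Because only ranks enter the regression, any monotone distortion of the marginals (e.g. exponentiating an AR process to induce heavy tails) leaves the detected self-dependence intact.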

The primary motivation for the rank transformation is to generalise \(\rho\) into a multivariate setting, allowing us to construct causal networks with enhanced resilience to the non-Gaussian behavior commonly observed in financial markets. Specifically, testing for co-linearity of the ranks corresponds to the relaxed assumption of co-monotonicity of the raw returns, and the bounded nature of the ranks limits the undue influence of outlier observations. However, an inherent limitation to this approach is a decrease in interpretability, as the model’s coefficients now correspond to ranks rather than the original return values. This alteration is likely to reduce the accuracy of forecasts in their original scale. Nonetheless, Rank-VAR is primarily designed for identification tasks, particularly for constructing causal networks. For such binary outcomes, parameter interpretability and forecast accuracy are largely irrelevant. These concerns, however, do explain the method’s absence from the general time series literature, and the restrained use of non-temporal rank regression methods beyond preliminary investigations (Iman and Conover 1979; Chen et al. 2014). It is important to recognize that these studies, including the current one, primarily employ an empirical approach. Given the gradual expansion in the use of these methods, there is an indication that future studies on the theoretical properties of rank-regression tools would be beneficial to the research community.

Before diving into simulations, it is crucial to clarify the theoretical connections between our method and existing time series approaches. Recognizing that linear/correlational models remain unchanged under scalar multiplication allows us to consider rank imputation as effectively a CDF transformation. Specifically, the rank of a data point \(R(X_i)\) quantifies the count of \(X\) values that are either equal to or less than \(X_i\). By dividing these ranks by \(N\), the total number of observations, we obtain an approximation of the empirical distribution function:

$$\begin{aligned} \hat{F}_X(X_i) = \frac{1}{N}\sum ^N_{j=1} 1 _{X_j\le X_i} \approx \frac{R(X_i)}{N}. \end{aligned}$$
(8)

The approximation stems from \(R(X_i)\) attributing only a half count to observations of identical value, as we adopt ‘mid ranks’. This view reveals parallels between Rank-VAR and copula modeling. In copula modeling, collections of series such as \(\{X, Y,..., Z\}\) undergo transformation through their respective marginal CDF functions:

$$\begin{aligned} \{U_X, U_Y,..., U_Z\} = \{F_X(X), F_Y(Y),..., F_Z(Z)\}, \end{aligned}$$

where the \(U_i\) are uniformly distributed. The joint cumulative distribution function \(C(u_x, u_y, \ldots , u_z) = P(U_X\le u_x, U_Y\le u_y,..., U_Z\le u_z)\) is then defined as the copula of \(\{X, Y,..., Z\}\).

This approach separates the dependency structure of joint distributions from the influences of the marginal distributions. It is supported by Sklar’s theorem, which posits that any multivariate joint distribution can be decomposed into its univariate marginal components and a copula that describes the cross-variable dependencies.

Hence, by emphasizing ranks we are using a copula-like technique to separate the issues of joint dependence identification and characterisation of the marginal distributions, offering a potentially streamlined approach to causal modeling. However, it’s essential to acknowledge that these parallels are not exact. Our method utilizes empirical, rather than true marginal distributions, and we analyse joint dependence through conditional expectation, rather than the direct CDF modelling found in traditional copula techniques.

Network metrics

From the network identification methods discussed in “Vector autoregression” and “Rank vector autoregression” sections, we have generated sets of links \(\hat{e}_{ij} \in \{0,1\}\) derived from our estimated \(\hat{t}_{ij}\) values. The adjacency matrix \(\hat{W}\) of our resulting graphs can then be easily constructed with terms \(\hat{W}_{ij}=\hat{e}_{ij}\). From this matrix representation we can calculate many common metrics to summarise key aspects of asset coupling behaviour.

Node degree

A simple metric of a complex network is the degree k(i) of a given node i. For a digraph this comes in two components: the out-degree \(k^+(i)=\sum _jW_{i,j}\) specifies the number of outgoing edges, while the in-degree \(k^-(i)=\sum _jW_{j,i}\) counts incoming edges. From the set of degrees k in our network we may construct degree distributions \(P^+(k)\) & \(P^-(k)\), which describe the probability of a randomly selected node having degree equal to k (Newman 2010). As a simple measure of a node’s tendency to be influential, rather than influenced, we can consider the net node degree: \(k^\varDelta =k^+-k^-\). This score will be influenced by the joint distribution of node degrees, and can be considered a basis for hierarchical ranking.
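The three degree quantities fall directly out of row and column sums of the adjacency matrix; a small sketch (function name ours):

```python
import numpy as np

def degrees(W):
    """Out-, in-, and net-degrees of each node from a binary adjacency matrix W,
    with the convention that W[i, j] = 1 denotes a causal link i -> j."""
    W = np.asarray(W)
    k_out = W.sum(axis=1)          # k+(i): edges leaving node i (row sums)
    k_in = W.sum(axis=0)           # k-(i): edges entering node i (column sums)
    return k_out, k_in, k_out - k_in
```

For instance, a node with two outgoing links and none incoming has net degree \(k^\varDelta = 2\), marking it as purely influential.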

Clustering

It is common to investigate a network’s tendency to form cliques, with the clustering coefficient being the standard metric. While many variants exist, we follow the specification used in Fagiolo (2007), which defines the clustering c(i) of a node as follows:

$$\begin{aligned} c(i)=\frac{(W+W^T)^3_{ii}}{2[k(i)(k(i)-1)-2k^{<>}(i)]}, \end{aligned}$$
(9)

where \(k(i)= k^+(i)+k^-(i)\) and \(k^{<>}(i)\) is the reciprocal degree, or the number of nodes j, such that there is both link \(e_{i,j}\) and \(e_{j,i}\). The clustering of a graph C(G) can be calculated as the mean node clustering \(1/N\sum _ic(i)\).
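Eq. 9 can be evaluated for all nodes at once with matrix operations; the sketch below is our own rendering of Fagiolo's binary directed clustering, returning 0 where the denominator is undefined:

```python
import numpy as np

def clustering(W):
    """Directed clustering of Fagiolo (2007), Eq. 9, per node, for a binary
    adjacency matrix W. Nodes with an undefined denominator receive 0."""
    W = np.asarray(W, float)
    S = W + W.T
    num = np.diag(np.linalg.matrix_power(S, 3))  # (W + W^T)^3 diagonal
    k_tot = W.sum(1) + W.sum(0)                  # k(i) = k+(i) + k-(i)
    k_rec = np.diag(W @ W)                       # reciprocal degree k<>(i)
    den = 2 * (k_tot * (k_tot - 1) - 2 * k_rec)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

As a sanity check, every node of a directed 3-cycle participates in exactly one (directed) triangle and scores \(c(i) = 0.5\) under this formula.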

Assortativity

The assortativity of a graph describes the proclivity for edges to combine nodes with similar features (usually node degrees: \(k^{\pm }(i)\), or some fitness parameter \(\lambda (i)\)). It is often defined as the correlation of the source and target attributes across the set of edges. For our implementation we use the Spearman rank correlation \(\rho\) of the market capitalisations \(\lambda\) of each edge’s source and target nodes, mathematically written as:

$$\begin{aligned} \text {Assortativity}(G)= \rho \left( \{\lambda (i)\}_{e_{ij}\in E},\; \{\lambda (j)\}_{e_{ij}\in E} \right) \end{aligned}$$
(10)

Reciprocity

The reciprocity of a graph or node measures the tendency of links to be bidirectional, indicating mutual connections. For a graph, the reciprocity is defined as the ratio of the number of mutual links to total links, with the single-node variant being the fraction of a node’s outgoing links which have an associated incoming link:

$$\begin{aligned} r(i)=\frac{k^{<>}(i)}{k(i)}. \end{aligned}$$
(11)

The entire graph Reciprocity, R, can be easily calculated from the adjacency matrix W:

$$\begin{aligned} R=\frac{\sum _{i} W^2_{ii}}{2|E|}. \end{aligned}$$
(12)
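Since \(\sum _i (W^2)_{ii}\) is the trace of \(W^2\), Eq. 12 is a one-liner; a sketch (function name ours):

```python
import numpy as np

def reciprocity(W):
    """Graph reciprocity of Eq. 12: trace(W^2) over twice the edge count,
    i.e. the fraction of mutual link pairs among all directed edges."""
    W = np.asarray(W)
    return np.trace(W @ W) / (2 * W.sum())
```

For example, a graph with one mutual pair among three directed edges has reciprocity \(2/(2\cdot 3)=1/3\).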

Causal cycles

A specific representation of cluster behavior, and an extension to reciprocity that we may consider, is the presence of causal cycles in our graph. A causal cycle of length n occurs when there is a path through n nodes that begins and ends at the same node, with directed edges connecting all nodes in the sequence. This indicates that a node’s outgoing influence can propagate through the network back into itself, thus generating causal feedback. The total number of length-n cycles within a digraph can be calculated by raising the adjacency matrix to the power of n:

$$\begin{aligned} \textit{Cycle}_n= \frac{1}{n}\sum _i (W^n)_{ii}, \end{aligned}$$
(13)

a format which demonstrates reciprocity as the normalised count of length 2 cycles.
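A sketch of Eq. 13 using matrix powers (function name ours; note that for \(n > 3\) the trace counts closed directed walks, which may revisit intermediate nodes):

```python
import numpy as np

def cycle_count(W, n):
    """Length-n cycle count of Eq. 13: trace(W^n) / n, dividing by n so that
    each cycle is counted once rather than once per starting node."""
    return np.trace(np.linalg.matrix_power(np.asarray(W), n)) / n
```

A directed 3-cycle, for instance, contains exactly one length-3 cycle and no length-2 cycles (zero reciprocity).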

Community structure

A complex network displays community structure when its nodes can be partitioned into highly connected subsets (communities), where most connections occur within these subsets rather than between them. To evaluate the suitability of a proposed community structure, we can assess the ‘fit’ of the partition by calculating the modularity, shown in Eq. 14:

$$\begin{aligned} Q=\sum ^n_{c=1}\left[ \frac{L_c}{m}-\gamma \left( \frac{k_c}{2m}\right) ^2\right] \end{aligned}$$
(14)

where \(L_c\) represents the number of links within a community, m is the total number of links in the network, \(k_c\) is the sum of the degrees of the nodes in a community, and \(\gamma\) is a hyperparameter that adjusts the preference for smaller communities. Detecting the ‘optimal’ community allocation poses significant challenges. In our approach, we employ the “Clauset–Newman–Moore greedy modularity maximization”, which provides a partition that greedily maximises Eq. 14 (Clauset et al. 2004). Non-greedy solutions exist; however, these often introduce randomisation and significantly elevated computation time. As community analysis is highly dependent on the hyperparameter \(\gamma\), we calculate the structures and associated modularity score across the interval \(\gamma = [0.5,1]\) (by \(\gamma =0.5\) there tends to be one dominating community, and for \(\gamma >1\) we see numerous single-member communities).
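A sketch of this resolution sweep using NetworkX's implementation of Clauset–Newman–Moore greedy modularity maximisation; the wrapper name and the symmetrisation of the digraph for the undirected routine are our own choices:

```python
import numpy as np
import networkx as nx

def communities_by_resolution(W, gammas=(0.5, 0.75, 1.0)):
    """Greedy modularity communities (Eq. 14) at several resolutions gamma.
    W is a binary adjacency matrix; the digraph is symmetrised first."""
    G = nx.from_numpy_array(np.asarray(W), create_using=nx.DiGraph).to_undirected()
    results = {}
    for g in gammas:
        parts = nx.community.greedy_modularity_communities(G, resolution=g)
        results[g] = (parts, nx.community.modularity(G, parts, resolution=g))
    return results
```

On a toy graph of two triangles joined by a single edge, the greedy procedure recovers the two triangles as communities at \(\gamma = 1\).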

Random graph validation

To validate whether the observed attributes of our networks exceed what would be expected by chance, we compare the calculated values against those derived from a set of Erdős–Rényi (ER) random graphs. Specifically, we generate 1000 ER random graphs, ensuring that the probability of edge creation (p) and the number of nodes (N) match the network whose features we’re verifying. We then calculate our metrics for these graphs, establishing a ‘null distribution’ of values in the absence of true attribute effects (e.g., uniquely modular behaviour, excessive clustering, etc.). This allows us to compare our causal networks against this null set to derive an indication of specific attribute behaviour. However, it is crucial to note that this ‘null’ model is rather limited. It does not replicate specific network idiosyncrasies, such as variations in the node degree distributions and capitalisation effects, both of which may influence the ‘baseline’ values.
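The null-distribution procedure can be sketched as follows: match the density of the observed network, sample ER digraphs, and evaluate the metric of interest on each (function name and self-loop convention are our own):

```python
import numpy as np

def er_null(metric, W, n_graphs=1000, seed=0):
    """Null distribution of a network metric over Erdos-Renyi digraphs matched
    to W in node count and edge density (self-loops excluded, as for a
    causal network)."""
    W = np.asarray(W)
    N = W.shape[0]
    off_diag = ~np.eye(N, dtype=bool)
    p = W[off_diag].mean()                       # empirical edge probability
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_graphs):
        G = ((rng.random((N, N)) < p) & off_diag).astype(int)
        samples.append(metric(G))
    return np.array(samples)
```

An observed clustering or cycle count can then be compared against the quantiles of the returned sample, e.g. flagging values beyond the null's 95th percentile.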

Simulation study

As discussed in “Network metrics” section, we aim to generate a binary directed graph \(\hat{W}\) that denotes the causal dependencies observed in our dataset. Since we cannot observe the ‘ground truth’ network structure, this task is analogous to unsupervised learning, where we aim to infer latent variables that cannot be directly viewed. Unlike supervised cases (such as forecasting), we cannot compare our estimated causal network against a true label, leaving us unable to validate our model improvements through a standard training/testing dataset approach.

Instead, we conduct a simulation study to validate our methodology’s ability to identify W. These simulations create controlled environments where the true network structure is known, allowing us to compare how accurately different models reconstruct the true causal network. As our introduced method is designed to be a robust alternative to standard VAR, we simulate and assess across a variety of processes containing violations to the standard form. This section first outlines the practical details of these simulations, as well as the theoretical aspects of the tested violations, before evaluating the simulation results across several key metrics.

Simulating VAR processes

A Vector Autoregressive model is characterized by the constant vector \(\varvec{c}\), a set of recurrence matrices denoted by \(A_i\), and the error process’s distribution: \(F_\mathcal {E}(\varvec{\epsilon }_t)\), which is intentionally designed to have a mean of zero. It is often assumed for simplicity that \(\varvec{\epsilon }_t\) follows a Gaussian distribution, leading to the entire distribution \(F_\mathcal {E}(\varvec{\epsilon }_t)\) being described by the error covariance matrix \(\varSigma _\epsilon\). However, empirical data from complex systems (particularly economic systems) frequently show deviations from Gaussian behavior. Consequently, our simulations aim to evaluate the performance of VAR and Rank-VAR models when faced with such abnormal distributions.

Generating \(\varSigma _\epsilon\)

The initial simulation phase involves creating a random covariance matrix that must be both symmetric and positive definite. This is accomplished by applying a random rotation to a collection of positive eigenvalues \(\{\lambda _i\}\), obtained through sampling the absolute values of normal distributions: \(\{\lambda _i\} \sim |\mathcal {N}(0,1)|\). The rotation matrix, \(Q\), is derived from the \(Q\) component of a QR decomposition performed on a matrix filled with random normal entries. Consequently, the constructed covariance matrix is expressed as:

$$\begin{aligned} \varSigma _\epsilon = QD(\lambda _i)Q^\top , \end{aligned}$$
(15)

with \(D(\lambda _i)\) representing the diagonal matrix composed of the eigenvalues.
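The construction of Eq. 15 can be sketched in a few lines of NumPy; the dimension and seed here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_covariance(n):
    """Sigma = Q D Q^T: positive eigenvalues under a random rotation Q."""
    eigvals = np.abs(rng.standard_normal(n))          # {lambda_i} ~ |N(0,1)|
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # Q from QR of a random normal matrix
    return q @ np.diag(eigvals) @ q.T

sigma = random_covariance(5)
```

Because the eigenvalues are strictly positive and Q is orthogonal, the result is symmetric and positive definite by construction.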

Generating A

The subsequent phase entails the generation of valid matrices that guarantee the stationarity of the VAR process. This requires that the eigenvalues of \(A_i\) are confined within the unit circle. Eigenvalues on the boundary (\(|\lambda _i|=1\)) lead to a drifting process, while eigenvalues beyond the unit circle (\(|\lambda _i|>1\)) indicate divergence. Reusing the method for \(\varSigma _\epsilon\) creation would yield dense recurrence matrices, which do not accurately reflect the sparse causal relationships we empirically expect. Therefore, we opt for constructing a sparse Erdős–Rényi graph, where each link has a probability \(p\) of being included, and populate it with Gaussian-distributed weights to form \(A^*\). If required to ensure stationarity, \(A^*\) is scaled down by slightly more than the magnitude of its largest eigenvalue:

$$\begin{aligned} A=\frac{A^*}{\max (|\lambda _i|)\times 1.05}, \end{aligned}$$
(16)

where the factor of 1.05 provides a margin to prevent process drift.
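A minimal sketch of this procedure, with illustrative size and link probability:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_recurrence(n, p):
    """Sparse ER-structured recurrence matrix, rescaled for stationarity (Eq. 16)."""
    mask = rng.random((n, n)) < p                  # ER link inclusion
    a = rng.standard_normal((n, n)) * mask         # Gaussian weights on included links
    rho = np.abs(np.linalg.eigvals(a)).max()       # spectral radius
    if rho >= 1.0:
        a = a / (rho * 1.05)                       # 1.05 margin against drift
    return a

A = random_recurrence(20, 0.1)
```

After scaling, the spectral radius is at most \(1/1.05 \approx 0.95\), keeping the simulated process safely stationary.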

Variations to model specifications

Fat-tailed distribution

Drawing errors from a fat-tailed distribution does not explicitly violate the VAR assumptions; however, it is unconventional because the typical (often implicit) assumption of narrow tails underpins the efficacy of \(\mathcal {L}_2\) estimators, such as the least squares projection. These estimators are generally sensitive to outliers, rendering them less effective for distributions characterized by high kurtosis.

Among the array of fat-tailed distributions, the mixed normal distribution stands out for its common use and analytical tractability. In this setup, the variable of interest, \(\epsilon _t\), is modeled as a probabilistic blend of two Gaussian distributions, \(X\sim \mathcal {N}(0,\sigma _X)\) and \(Y\sim \mathcal {N}(0,\sigma _Y)\), with distinct variances. To simulate \(\epsilon _t\), one draws observations from \(F_X(x)\) with a probability of \(q\) or from \(F_Y(y)\) with the complementary probability \((1-q)\). This mechanism yields \(\epsilon _t\)’s cumulative distribution function as:

$$\begin{aligned} F_\mathcal {E}(\varvec{\epsilon }_t) = qF_X(x) + (1-q)F_Y(y). \end{aligned}$$
(17)

To induce fat-tailed behavior in \(\epsilon _t\) we control \(q\) and \(\sigma _X/\sigma _Y\) to elevate kurtosis. Usually, this is accomplished through a combination of infrequent outliers, \((1-q)\gg q\), and a high variance ratio \(\sigma _X/\sigma _Y \gg 1\). When dealing with a fat-tailed error series \(\varvec{\epsilon }_t\), the process for generating a VAR simulation remains unchanged. The linear filter \(A\) is applied to the fat-tailed error series, allowing the fat tails to percolate throughout the system.
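Sampling from this mixture can be sketched as follows; the variance ratio and outlier probability are illustrative choices mirroring the simulation parameters described later:

```python
import numpy as np

rng = np.random.default_rng(3)

def fat_tailed_errors(t_len, n, q=0.03, sigma_x=np.sqrt(10), sigma_y=1.0):
    """Gaussian mixture errors: rare high-variance draws (prob. q) over a normal base."""
    outlier = rng.random((t_len, n)) < q       # which observations come from X
    return np.where(outlier,
                    rng.normal(0.0, sigma_x, (t_len, n)),
                    rng.normal(0.0, sigma_y, (t_len, n)))

eps = fat_tailed_errors(5000, 10)
```

With these parameters the population excess kurtosis is substantial, so the sample kurtosis of `eps` sits well above the Gaussian value of 3.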

Post-recurrence spiking

A transient, non-autoregressive error component \(\varvec{s}_t\) is introduced, characterized by its sparsity and significant ‘spike’ magnitude compared to the usual values of \(\varvec{y}_t\). This leads to the following first-order process:

$$\begin{aligned} \varvec{y}_t&=A_1(\varvec{y}_{t-1}-\varvec{s}_{t-1})+\varvec{c}+\varvec{\epsilon }_t+\varvec{s}_t, \end{aligned}$$
(18)

which may be more easily interpreted in its infinite order moving average (MA) representation:

$$\begin{aligned} \varvec{y}_t&=\sum ^t_{i=0} A^{t-i} \varvec{\epsilon }_i +\varvec{s}_t. \end{aligned}$$
(19)

In this representation, each observation \(\varvec{y}_t\) can be viewed as a smoothing filter applied to the entire residual series \(\varvec{\epsilon }_i\), plus an observation-specific error term \(\varvec{s}_t\).

Real-world datasets are often complex and seldom truly adhere to theoretical process descriptions (such as full autoregression). Consequently, most observation sequences are likely to involve some degree of this type of noise. Here, we are particularly concerned with scenarios of infrequent, yet significant spikes in measurement. For example, we let \(\varvec{s}_t\) adhere to the previously described mixture distributions, with a near-trivial standard variance but a highly impactful spike variance: \(\sigma _{s_x} \gg \sigma _{\epsilon } \gg \sigma _{s_y}\). Practical instances of these occurrences could involve anomalies in data collection or temporary economic disruptions that are not anticipated to endure or affect other variables in the network.
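A sketch of the spiking process of Eq. 18, assuming for illustration a simple diagonal recurrence matrix and sparse Gaussian spikes (the spike frequency and magnitude are stand-in parameters):

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_spiking_var(A, T, q=0.03, spike_sd=np.sqrt(10)):
    """VAR(1) with a transient, non-autoregressive spike component s_t (Eq. 18)."""
    n = A.shape[0]
    y = np.zeros((T, n))
    s_prev = np.zeros(n)
    for t in range(1, T):
        eps = rng.standard_normal(n)
        s = rng.standard_normal(n) * spike_sd * (rng.random(n) < q)  # sparse spikes
        # the spike enters the observation but is removed before the recurrence,
        # so it does not propagate to other variables
        y[t] = A @ (y[t - 1] - s_prev) + eps + s
        s_prev = s
    return y

A = 0.4 * np.eye(3)
y = simulate_spiking_var(A, 500)
```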

Conditional heteroskedasticity (VARCH)

Here we consider autocorrelation in the squared errors, \(\varvec{\epsilon }^2_t\). A univariate approach to modeling this involves the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) framework, where the variance of the residual series depends on both past squared residuals, \(\varvec{\epsilon }^2_t\), and prior variances, \(\sigma ^2_t=E[\varvec{\epsilon }^2_t]\):

$$\begin{aligned} \sigma ^2_t=\sum ^p_{i=1} \alpha _i\sigma ^2_{t-i} + \sum ^p_{i=1} \beta _i\varvec{\epsilon }^2_{t-i}+c, \end{aligned}$$
(20)

for recurrence coefficients \(\alpha _i\) and \(\beta _i\). The multivariate extension to this framework introduces substantial complexity, stemming from the potential to model each covariance element \(\varSigma _{ij}\) with autoregressive effects, requiring the autoregression of entire matrices. We utilise a simplified form where covariance elements respond solely to changing variances, without independent dynamics. This model is known as the constant conditional correlation form (CCC-GARCH). Under this framework, the covariance matrix of the residual series \(\varSigma _t\) is consistently expressible in a diagonalised form with a constant correlation matrix \(R\), i.e.,

$$\begin{aligned} \varSigma _t=D(\varvec{\sigma }_t)RD(\varvec{\sigma }_t) , \end{aligned}$$
(21)

with only the vector \(\varvec{\sigma }_t\) varying over time. This dynamic volatility then follows a VAR style recurrence relation, with \(\varvec{\epsilon }^2\) in place of the raw values:

$$\begin{aligned} \varvec{\sigma }^2_t=B_1\varvec{\epsilon }^2_{t-1}+B_2\varvec{\epsilon }^2_{t-2}+\cdots +B_p\varvec{\epsilon }^2_{t-p}+\varvec{c}. \end{aligned}$$
(22)

This approach omits the lagged \(\varvec{\sigma }^2_t\) terms, suggesting that volatility adheres to a VAR rather than a VARMA process. Our simulations adopt a simplified setting with \(p=1\), fostering short-memory volatility patterns. The matrix \(B_1\) is obtained by taking the elementwise absolute value of a matrix produced in accordance with “Simulating VAR processes” section. The error process is then generated recursively, with the residual variances \(\varvec{\sigma }_t\) being generated sequentially as per Eq. 22, before being diagonalised into \(\varSigma _{t}\).
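The recursive generation of these errors (Eq. 22 with \(p=1\)) can be sketched as follows. For brevity the sketch assumes an identity correlation matrix \(R=I\), so that \(\varSigma _t\) is diagonal; the coefficient matrix and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_varch(B, T, c=0.1):
    """CCC-style VARCH errors (Eq. 22), assuming R = I for simplicity."""
    n = B.shape[0]
    eps = np.zeros((T, n))
    sig2 = np.full(n, c)
    for t in range(1, T):
        sig2 = B @ eps[t - 1] ** 2 + c          # sigma_t^2 recurrence on squared errors
        eps[t] = rng.standard_normal(n) * np.sqrt(sig2)
    return eps

B = np.abs(0.3 * np.eye(4))   # elementwise |.| keeps the implied variances positive
eps = simulate_varch(B, 1000)
```

Taking the elementwise absolute value of `B` guarantees the generated variances stay positive, mirroring the construction described in the text.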

Nonlinear recurrence functions

Beyond altering the error distribution \(F_\mathcal {E}(\varvec{\epsilon }_t)\), we can consider alterations that directly affect the recurrence dynamics. Specifically, we introduce an elementwise monotonic activation of the recurrence input. Instead of the linear specification outlined in Eq. 3, we introduce the transformed recurrence formulation:

$$\begin{aligned} \varvec{y}_t&=A_1\phi (\varvec{y}_{t-1})+\varvec{c}+\varvec{\epsilon }_t, \end{aligned}$$
(23)

where \(\phi (\cdot )\) is a monotonic function applied to each element, with the added condition of being sub-linear to preserve the stationarity of the simulation outcomes. Such nonlinear dynamics are not inconceivable. For example, within the financial sector, one may encounter so-called draw-down correlation, characterized by returns being correlated when they are negative, yet largely independent when the returns are positive (ReLU-like behaviour).
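Equation 23 can be simulated with an elementwise activation such as the ReLU mentioned above; this sketch assumes a stable diagonal recurrence for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

def relu(x):
    """Elementwise monotone, sub-linear activation (draw-down-style asymmetry)."""
    return np.maximum(x, 0.0)

def simulate_nonlinear_var(A, T, phi=relu):
    """VAR(1) with an elementwise activation on the recurrence input (Eq. 23)."""
    n = A.shape[0]
    y = np.zeros((T, n))
    for t in range(1, T):
        y[t] = A @ phi(y[t - 1]) + rng.standard_normal(n)
    return y

y = simulate_nonlinear_var(0.5 * np.eye(3), 400)
```

A sigmoid (e.g. `np.tanh`) could be passed as `phi` in the same way, covering the other activation tested in the simulations.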

Parameters

For each variation-recurrence combination we generate 100 simulations with the following parameters:

  • \(N=100\) nodes per process (10,000 potential links) containing 1,000 observations, corresponding to \(\sim\)3 years of daily data.

  • Recurrence matrix and covariance matrix designed in accordance with the “Generating \(\varSigma _\epsilon\)” and “Generating A” sections, with constant \(\varvec{c}=0\)

  • Fat-tailed error and spike variances are both 10\(\times\) higher than for normal observations (\(\sim\)3.1\(\times\) higher SD), with \(q=3\%\) frequency of outlier observations (reflecting \(\sim\)monthly outliers).

ROC results

A key advantage of simulation is that we know the true causal structure. This knowledge enables us to construct Receiver Operating Characteristic (ROC) curves, as illustrated in Fig. 1. For each network, we estimate the absolute \(|t|\) values for all possible links and apply increasing thresholds to trace out the ROC curve. This curve illustrates a model’s ability to balance the identification of true positives against the incidence of false positives. Further, this is accomplished in a manner that is independent of the distribution of \(t\) values, as it concerns their ordinal ranking, not their absolute values. This provides insight into the theoretical predictive power while controlling for the influence of alterations to the empirical false positive rate: \(\hat{\alpha }\ne \alpha\).
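Because the AUC depends only on the ordinal ranking of the \(|t|\) values, it can be computed directly via the Mann-Whitney rank statistic without tracing the curve explicitly. This sketch ignores tied scores for brevity:

```python
import numpy as np

def auc_from_scores(t_abs, true_links):
    """AUC of |t| scores against the known binary link structure,
    via the Mann-Whitney rank statistic (threshold-free, ordinal)."""
    scores = np.asarray(t_abs, float).ravel()
    labels = np.asarray(true_links, bool).ravel()
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # ranks of the scores, ascending
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# perfectly separated scores give AUC = 1
assert auc_from_scores([3.0, 2.5, 0.1, 0.2], [1, 1, 0, 0]) == 1.0
```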

Figure 1 displays ROC curves alongside their associated Area Under Curve (AUC) values, together these show that:

  1. Rank-VAR is never significantly worse than standard VAR.

  2. Rank-VAR performs much better in some cases, notably:

     • for all processes including temporary spikes, regardless of the recurrence activation function, and

     • all cases of Sigmoid recurrence (though this improvement is minor in the poorly identified VARCH scenario).

In the most detrimental cases (Sigmoid with spiking, or VARCH errors), VAR loses almost all discriminative power (AUC near 0.5), essentially choosing links at random. The most notable enhancement is observed for the Sigmoid recurrence combined with Spiking, registering a 25.8% improvement in the AUC, elevating it from 0.562 to 0.707. The ‘Spike’ scenarios consistently demonstrate performance boosts for Rank-VAR, averaging an AUC improvement of 18%. Not surprisingly, such temporary spiking proves significantly disruptive for conventional link detection methods: spikes lack cross-asset correlation, yet two coincidental, random spikes can significantly distort estimations.

Fig. 1 ROC curves of VAR and Rank-VAR for all combinations of noise model and recurrence function. AUC indicates the ‘overall’ performance of each model

Fig. 2 Degree distribution CCDF plots. Results show mean response and volatility network curves for both standard and Rank-VAR

Classification results

For the generation of causal networks, binary classifications are assigned to each potential link \(e_{ij}\) contingent on whether its \(t_{ij}\) value exceeds the threshold \(t^*\) aligned with our predetermined false positive rate \(\alpha\). Here, we evaluate the classifications, \(\hat{e}_{ij}\), generated from both standard and Rank-VAR across several metrics on our simulated data.
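This thresholding step can be sketched as follows, assuming asymptotic normality of the \(t_{ij}\) statistics so that \(t^*\) is the two-sided standard normal quantile for the target rate; the example matrix is hypothetical:

```python
import numpy as np
from statistics import NormalDist

def classify_links(t_vals, alpha=0.01):
    """Binary edge classification: keep link (i, j) iff |t_ij| exceeds the
    two-sided normal quantile t* for the target false positive rate alpha."""
    t_star = NormalDist().inv_cdf(1 - alpha / 2)   # ~2.576 for alpha = 0.01
    return np.abs(t_vals) > t_star

W_hat = classify_links(np.array([[0.5, 3.2], [-2.7, 1.9]]))
```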

Table 1 Simulation results, displaying Rank-VAR (\(\rho\)), and standard VAR (r) classification metrics on processes containing various structural abnormalities
Table 2 Empirical network node-specific metrics, showing node degrees, \(k^{+-\varDelta }\) and the clustering co-efficient

Table 1 presents the outcomes of binary classification for simulated VAR processes, aiming for a false positive rate (\(\alpha\)) of 1%. The findings for the combined metric (the F1 score, i.e. the harmonic mean of precision and recall) are generally commensurate with the AUC trends previously described:

  1. Rank-VAR minutely underperforms the standard method for Linear and ReLU models with standard errors.

  2. Rank-VAR substantially outperforms standard VAR for several scenarios:

     • again in all post-recurrence spiking simulations,

     • all scenarios involving the VARCH type errors, and

     • in both nonlinear, fat-tailed \(\epsilon\) scenarios.

  All other scenarios displayed functionally equivalent performance.

The empirical false positive rate, \(\hat{\alpha }\), offers insight into the disparities between AUC and F1 score performances. Given that AUC is indifferent to the specific \(t\) values, focusing solely on their ranking, the impact of an increased \(\hat{\alpha }\) becomes apparent exclusively in binary classifications. For instance, in the VARCH scenarios the \(\hat{\alpha }\) values are approximately double the intended target for standard VAR, consequently reducing its F1 score.

This difference is particularly noticeable in the Linear VARCH, ReLU VARCH, and fat-tailed ReLU simulations, where ROC performance is almost the same, yet the rank-method achieves better classification outcomes. In these cases, Rank-VAR does not directly improve the identification of edges, rather, its advantage lies in achieving more reliable parameter convergence, with better realisation of the asymptotic normality of \(t_{ij}\).

Empirical study

Data

Our dataset comprises hourly price data for 261 cryptocurrencies from January 1, 2021, to January 1, 2022, along with their market capitalizations as of June 2022. Market capitalization is defined here as the product of a coin’s price and its total circulating supply. The cryptocurrencies included were selected from the top 750 by market capitalization as of June 2022. However, due to incomplete or missing historical price data we excluded a large proportion of the initially targeted 750 coins, mainly those with smaller market capitalizations. The final dataset represents 79.6% (836 billion of 1.05 trillion) of the total market capitalization. Hourly returns, denoted as \(y_t\), were calculated using the natural logarithm of the ratio between consecutive prices, \(p_t\), in the Coin/USD exchange rate, i.e., \(y_t = \log (p_t/p_{t-1})\). For an in-depth exploration of this dataset see Cornell et al. (2023). A key observation from our previous analysis is the significant capitalization-dependent non-normality within the dataset, indicating the potential benefit of employing robust methodologies, such as Rank-VAR.

Design

By applying both standard and Rank-VAR to the dataset described in “Data” section, we generate two forms of raw return causality networks (which model mean-response/distribution-center effects). We additionally generate two forms of volatility networks by running both methods on the squared and centered data series \((y_t-\bar{y})^2\). As these networks use the original series deviations, and not the residuals of the raw return networks, they correspond to the assumption of mean-response efficiency in the cryptocurrency market (i.e., a complete absence of statistically significant mean-response causality). To refer to our 4 networks, we use the shorthand notation VAR, R-VAR, VAR \(\sigma ^2\) and R-VAR \(\sigma ^2\). For the detection of edges in these networks we again use the \(\alpha =0.01\) value utilised in “Simulation study” section.

Node specific attributes

We first examine several per-node network attributes, before moving onto network structural properties.

Node degree log-log Complementary Cumulative Distribution Function (CCDF) plots, displayed in Fig. 2a and b, show:

  1. The distribution of out-degrees exhibits less curvature than that of in-degrees, indicating higher concentrations of outgoing, rather than incoming, influence within the network.

  2. In the VAR volatility network, both in-degree and out-degree distributions exhibit notable curvature. Such a pattern could suggest a high rate of false positives, given that a system heavily influenced by noise would yield near-binomial distributions with pronounced curvature.

We omit plotting the net node degree as the tail effects of \(k^\varDelta\) are dominated by \(k^+\), leading to near identical plots. If we presume the theoretical false positive rate \(\alpha =0.01\) to hold, the implied precision values in the raw networks are 73.3% for the standard VAR and 36.8% for the Rank-VAR. In the volatility networks, the implied precision values are 87.5% for the standard VAR and 78.6% for the Rank-VAR. These values are consistent with the mildly-to-moderately challenged identification scenarios we observe in “Simulation study” section.
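These implied precision figures are consistent with a back-of-envelope relation: with mean degree \(\bar{k}\) among \(N-1\) candidate links per node, the expected false links per node are roughly \(\alpha (N-1)\), giving precision \(\approx 1-\alpha (N-1)/\bar{k}\). The formula below is our reconstruction, not the paper's stated computation; it reproduces the reported values to within rounding (the exact figures presumably use exact link counts):

```python
def implied_precision(mean_degree, n_nodes=261, alpha=0.01):
    """Implied precision under a presumed false positive rate alpha:
    expected false links per node ~ alpha * (N - 1)."""
    return 1 - alpha * (n_nodes - 1) / mean_degree

# mean degrees from the raw return networks (standard VAR and Rank-VAR)
print(implied_precision(9.79), implied_precision(4.13))
```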

Summary statistics

for our node-specific network metrics can be found in Table 2a and b. Standard VAR analysis reveals statistically significant correlations between market capitalization and both out-degree and clustering. In contrast, in-degree does not show a significant correlation with capitalisation, nor does the resulting net-degree. Rank-VAR demonstrates similar, but somewhat elevated, correlations between out-degree and capitalization, shows no significant correlation for clustering, and finds in-degree to be negatively correlated. Unsurprisingly, the net node degree is highly correlated with market capitalisation. The most apparent overall difference between the networks is the varying number of links: Rank-VAR averages 4.13 (1.6%) links per node, lower than the 9.79 (3.7%) links per node observed in networks generated using standard VAR. Corroborating the CCDF observations, both networks display comparatively more skewed, heavy-tailed behaviour in the out-degree distributions, with higher mean/median ratios and proportionally elevated standard deviations relative to the in-degree distributions.

Table 2b presents data for the volatility networks. We observe a similar pattern for node degree counts, with the Rank network averaging 12.19 (4.66%) edges per node, whereas the standard VAR approach yields an average of 20.9 (8%) edges per node. Again, in both networks, the out-degree demonstrates greater skewness than the in-degree. The correlation patterns between the two networks show notable differences; in the VAR network, both node-degrees correlate with capitalization. In contrast, Rank-VAR shows (now positive) statistically significant correlations only between in-degree and market capitalization. The net-degree is uncorrelated with capitalisation for standard VAR, with mild negative correlation for the Rank network. Clustering shows the opposite trend, with positive correlation for VAR, and an absence for Rank-VAR. For both model types, the total number of edges is markedly higher in the volatility networks, indicating that a considerable proportion of causal dependencies occur in the higher order, or symmetric, moments.

Self-edges

suggest the presence of either return autocorrelation or self-excitatory dynamics. For the VAR-derived network 85.4% of potential self-edges are identified as statistically significant. In contrast with the earlier findings, Rank-VAR now displays an increased detection rate, with 90.0% of self-edges deemed statistically significant. Among these, 98.4% in the VAR network and 99.6% in the Rank-VAR network exhibit negative signs. Despite the overall sparsity of our networks, this reveals a densely connected subset of links that suggest the presence of systematic mean-reversion in cryptocurrency prices. A similar pattern emerges in the volatility networks, where Rank-VAR identifies 93.8% of self-links, surpassing the 88.8% detected by standard VAR. Here too, the effects are predominantly asymmetrical, with all self-links in Rank-VAR and 97.4% in standard VAR being positively signed, signaling self-excitatory behavior.

This strong self-causality we observe in the volatility networks is consistent with the well known behaviour of ‘volatility clustering’ in financial data (Mandelbrot 1967). This is often visible via statistically significant auto-correlations of the absolute value, or squared returns in many financial instruments (Granger and Ding 1995; Ding and Granger 1996), and is often directly modelled through Autoregressive Conditional Heteroskedasticity (ARCH) models. As this behaviour has been previously documented in cryptocurrency data (Panagiotidis et al. 2022; Peovski et al. 2023), it comes as no surprise that we see this strong diagonal structure in our Causal Network.

Graph attributes

In addition to node-specific attributes, we investigate network structural properties with the global metrics discussed in “Methodology” section. For these metrics, we exclude self-links as they artificially inflate many of the calculations. For instance, self-loops immediately inflate reciprocity, while every length-2 cycle gains a length-3 counterpart consisting of the original cycle plus a self-loop.

Fig. 3
figure 3

Community information as a function of modularity parameter \(\gamma\)

Fig. 4
figure 4

Raw return networks community structure

Fig. 5
figure 5

Volatility networks community structure

Table 3 displays these global metrics. The first five metrics (clustering, reciprocity, cycle counts of length 3 and 4, and modularity) are shown as the ratio of the empirically observed value to the expectation for an ER random graph, with statistical significance indications (*, **, ***) corresponding to whether this ratio is statistically significantly different from 1. We observe a repeating pattern for clustering, reciprocity and cycles of length 3 and 4. For these metrics we observe highly significant (** or ***) excess compared to random networks for all graphs barring the raw return Rank-VAR. The volatility networks generally contain more cycles, with excess cyclicity peaking at length 2, barring the VAR volatility network, which shows increasing excess for longer cycles. The clustering coefficient is higher for the standard VAR model; however, there is significant (**), yet relatively mild, excess clustering for the Rank volatility network (1.08 ratio). A limitation of our random graph validation is that for networks with low link counts it is difficult to establish statistical significance. For instance, the 1.1 clustering ratio for Rank-VAR exceeds the 1.08 value of the Rank-VAR volatility network, yet fails to reach statistical significance, while the lower magnitude effect for the volatility network is highly significant. Similarly, the 1.5 reciprocity ratio for Rank-VAR is of nontrivial magnitude, yet fails to reach statistical significance.

The presence of these cycles, particularly at shorter lengths, has potentially destabilising economic implications. For instance, volatility cycles may lead to repeated outlier feedback that rapidly excites the network beyond what would be expected in a more DAG-like structure. Similarly, cycles in the directional networks may produce price oscillations or aggressive short-term trends.

Both forms of raw return networks show nontrivial tendency to point “downstream” in terms of capitalisation (\(P^{-}>0.5\)), while the volatility networks generally lack this behaviour. The Rank method accentuates this effect, with around \(60\%\) of total links pointing to a less capitalised coin. All networks show moderate, yet highly significant capitalisation assortativity. This effect is again maximised by the Rank-VAR network, with 0.22 correlation between source and target capitalisation. Both VAR networks show significant correlation between degree types (\(k^+\) and \(k^-\)), with this effect being generally elevated for the volatility networks. Particularly, the VAR volatility network shows exceptionally strong coupling of in and out-degree, with a correlation of 0.653.

Community structure

Figure 3 displays several attributes of the identified network communities as a function of \(\gamma\). Within the return networks, the Rank-VAR network demonstrates higher modularity scores, with a greater number of communities and a smaller fraction of total nodes belonging to the largest community. In contrast, the standard VAR network typically forms one predominant ‘super-cluster’ accompanied by several insignificantly sized communities. Both types of volatility networks display a tendency to form a dominant initial community, with generally low modularity and few total communities. This pattern aligns with the elevated density of links in these networks, with high modularity being difficult to achieve in densely connected graphs. The modularity ratios (\(\gamma =0.75\)) observed in Table 3 show all networks having statistically significantly more modularity than random graphs, with the ratio being maximised by Rank-VAR, compounding with the lower link count to produce the strongest community structuring. This effect is visible in Figs. 4 and 5, which plot the identified community structures (\(\gamma =0.75\)). All networks, barring Rank-VAR, show a limited number of super-clusters, with only the raw return Rank network having a very meaningful spread of communities.

Table 3 Global network metrics, showing the ratios of empirical clustering, reciprocity, cycle counts and modularity against the values derived from random ER graphs, with statistical significance denoting the p-value associated with the null hypothesis of the ratio being 1, indicated by asterisks (\(*\rightarrow p\le 0.05, **\rightarrow p\le 0.01, ***\rightarrow p\le 0.001\)). The non-ratio metrics of assortativity and node-degree correlations have similar significance indicators, but for the null condition \(\rho =0\)

Summary and discussion

Table 4 Summary of network differentials, displaying a ± if both Rank, or both volatility, networks are higher/lower than their counterparts for a specific metric, with unique combination effects if a single network achieves substantially different values without any matching method or data effects

To disambiguate the structural consequences of using the rank method (method effect), and to clarify the differences in the way signed and unsigned (data effect) information propagates through the markets, we produce a table summarising graph differences. This table displays a method effect, ±, if both rank networks have a higher/lower attribute value than the baseline VAR graphs, with a data effect being similarly calculated for volatility against the raw-return networks. A unique effect is noted for a specific model-data pair if it is higher/lower than all others, without any matching method or data effects.

Table 4 shows the results. We observe a moderate inverse relationship between several Method and Data effects. For the link and cycle counts, as well as in-degree versus both out-degree and market capitalisation we see reduced values for the Rank Networks, and elevated values for the Volatility networks. The most unique effects are identified for the Rank-VAR network, followed by both volatility graphs.

Together, the results from Tables 4, 3 and 2 suggest there are strong idiosyncrasies in the way signed, and excitatory, information propagates throughout the market. The volatility networks have more connections, reciprocity, cycles and strong correlations between in and out-degree. The raw returns have low, or negative, correlation between in-degree and capitalisation, yet a strong relationship between out-degree and capitalisation. This, combined with the tendency for raw links to point ‘downstream’, suggests that directional information has a more hierarchical structure, while volatility is more systematic. Intuitively, we could consider the signed network nodes to exist along an influence-influenced spectrum, while the volatility nodes tend to have more of a connected-disconnected axis. The Rank-VAR method substantiates our observations, as even under the robust method, the symmetric moment graph shows comparatively more of the identified systematic behaviour.

The results from Tables 3 and 2 indicate that Rank-VAR tends to produce networks with fewer links, but higher connectivity in the subset related to self-influence. We see reduced clustering, cycle counts, and correlation between in-degree and both out-degree and capitalisation. Like the raw returns versus volatility effects, this suggests that a more hierarchical structure is identified with Rank-VAR. We observe several unique effects for the raw Rank-VAR network, with particularly high assortativity and elevated probability of down links, suggesting that connections often occur “across and down”. This network uniquely shows strong positive correlation between net-degree and capitalisation, indicating that the influence-influenced hierarchy is largely characterised by total valuation.

There are several economic implications to these findings.

  • The capitalisation dependent hierarchy suggested by Rank-VAR implies that the downfall of high valuation coins may trigger widespread draw-downs.

  • The clustered, cyclical connections in our volatility graphs suggest that outliers may promote rapid, local escalation chains. Further, as capitalisation is positively related to the number of incoming links, cycles that begin in low-capitalisation regions may ‘spill-upwards’ and cause fluctuations in high value coins.

Conclusion

This study is an extended investigation of our robust model, Rank-VAR, for the purposes of generating financial causality networks. The technique is validated in a simulation study featuring a range of violations of the standard VAR assumptions, with results showing Rank-VAR achieves increasingly superior classification results as we introduce more detrimental abnormalities. We conduct an expanded study on the application of Rank-VAR to a dataset of 261 cryptocurrencies. Networks are estimated from both standard return and centered deviation data, the latter providing insight into volatility dynamics. Our robust estimation technique substantiates the observed idiosyncrasies in the way directional and volatility information propagates through the market. Our findings suggest that directional information often flows in a hierarchical pattern, while volatility acts more systematically, with increased connections, reciprocity, clustering and cyclical patterns. Across both data types the Rank approach identified fewer links, but an elevated proportion of edges in the highly connected subset that corresponds to self-causation. Rank-VAR shows a particularly strong capitalisation-based hierarchy, with out-degree being positively, and in-degree negatively, correlated against market capitalisation. These findings have tangible implications for all market participants: they highlight that crashes in high influence coins may propagate through local communities, and that volatility can rapidly escalate through causal feedback and clustered connections.

Data availability

The dataset supporting the findings of this study was derived from publicly available data obtained from CoinMarketCap (https://coinmarketcap.com) through their API service. Given that this API is subject to non-static commercialization policies, we are unable to freely distribute the raw data directly.

References


Author information


Contributions

C.C wrote the draft manuscript text and prepared all figures. All authors contributed to the conceptual design, and reviewed and edited the draft manuscript.

Corresponding author

Correspondence to Cameron Cornell.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


VAR(P) Equivalence and estimation

The ability to convert a VAR(p) model, which involves multiple time lags (up to p lags), into a VAR(1) representation rests on vectorization and augmentation of the system. In a VAR(p) model, the future values of a vector of variables \(\varvec{y}_t\) are predicted by the past p values of the same vector. Mathematically, it is represented as:

$$\begin{aligned} \varvec{y}_t =A_1\varvec{y}_{t-1}+A_2\varvec{y}_{t-2}+\dots +A_{p}\varvec{y}_{t-p}+\varvec{c}+\varvec{ \epsilon }_t, \end{aligned}$$

where \(A_i\) are the coefficient matrices for lag i, and \(\varvec{ \epsilon }_t\) is the error term.

To transform this into a VAR(1) model, we define a new vector \(\textbf{z}_t\) that stacks the lags of \(\varvec{y}_t\) up to p:

$$\begin{aligned} \textbf{z}_t = \begin{bmatrix} \varvec{y}_t \\ \varvec{y}_{t-1} \\ \vdots \\ \varvec{y}_{t-p+1} \end{bmatrix} \end{aligned}$$

The VAR(1) representation then takes the form:

$$\begin{aligned} \textbf{z}_t = \textbf{A} \textbf{z}_{t-1} + \mathbf {\varepsilon }_t^* \end{aligned}$$

where \(\textbf{A}\) is a block matrix comprising blocks of \(A_i\) coefficients and identity matrices, and \(\mathbf {\varepsilon }_t^*\) is an augmented vector of the error terms. The block matrix \(\textbf{A}\) typically appears as:

$$\begin{aligned} \textbf{A} = \begin{bmatrix} A_1 &{} \quad A_2 &{} \quad \dots &{} \quad A_{p-1} &{} \quad A_p \\ \textbf{I} &{} \quad \textbf{0} &{} \quad \dots &{} \quad \textbf{0} &{} \quad \textbf{0} \\ \textbf{0} &{} \quad \textbf{I} &{} \quad \dots &{} \quad \textbf{0} &{} \quad \textbf{0} \\ \vdots &{} \quad \vdots &{} \quad \ddots &{} \quad \vdots &{} \quad \vdots \\ \textbf{0} &{} \quad \textbf{0} &{} \quad \dots &{} \quad \textbf{I} &{} \quad \textbf{0} \end{bmatrix}, \end{aligned}$$

with \(\mathbf {\varepsilon }_t^*\) as:

$$\begin{aligned} \mathbf {\varepsilon }_t^* = \begin{bmatrix} \varvec{ \epsilon }_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \end{aligned}$$

This formulation allows us to view the dynamics of a VAR(p) process through the single-step lag structure of an expanded VAR(1), capturing all the dependencies and dynamics of the original model. This form is generally used to show that the theoretical properties of VAR(1) processes generalise to VAR(p) processes (Luetkepohl 2005). In practice, we could use this form directly and estimate the entire \(\textbf{A}\) with least squares; however, the structure of the majority of this matrix is known a priori, and the set of \(A_i\) can be estimated more directly in a further transformed form (Luetkepohl 2005).
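The companion-form construction above can be sketched numerically. The following is a minimal illustration (not the estimation procedure used in the paper): it builds the block matrix \(\textbf{A}\) from a hypothetical set of coefficient matrices and checks that one step of the VAR(2) recursion matches one step of the companion VAR(1), with the constant and error terms omitted for simplicity. The coefficient values are invented for illustration only.

```python
import numpy as np

def companion_form(A_list):
    """Stack VAR(p) coefficient matrices [A_1, ..., A_p] into the
    VAR(1) companion block matrix: top block row holds the A_i,
    identity blocks sit on the sub-diagonal, zeros elsewhere."""
    p = len(A_list)
    n = A_list[0].shape[0]
    A = np.zeros((n * p, n * p))
    A[:n, :] = np.hstack(A_list)        # [A_1  A_2  ...  A_p]
    A[n:, :-n] = np.eye(n * (p - 1))    # shifts each lag block down by one
    return A

# Hypothetical 2-variable VAR(2) coefficients (illustrative values)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
A = companion_form([A1, A2])            # 4 x 4 companion matrix

# One deterministic step, both ways (constant and noise omitted):
y_lag1 = np.array([1.0, -1.0])          # y_{t-1}
y_lag2 = np.array([0.5, 2.0])           # y_{t-2}
y_direct = A1 @ y_lag1 + A2 @ y_lag2    # direct VAR(2) recursion
z_prev = np.concatenate([y_lag1, y_lag2])   # z_{t-1}
y_companion = (A @ z_prev)[:2]          # first block of z_t
assert np.allclose(y_direct, y_companion)
```

The lower blocks of \(\textbf{A}\) simply copy \(\varvec{y}_{t-1}\) into the second slot of \(\textbf{z}_t\), which is what makes the two recursions coincide.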

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Cornell, C., Mitchell, L. & Roughan, M. Rank is all you need: development and analysis of robust causal networks. Appl Netw Sci 9, 39 (2024). https://doi.org/10.1007/s41109-024-00648-w

