The global dynamical complexity of the human brain network

How much information do large brain networks integrate as a whole over the sum of their parts? Can the dynamical complexity of such networks be globally quantified in an information-theoretic way and be meaningfully coupled to brain function? Recently, measures of dynamical complexity such as integrated information have been proposed. However, problems related to the normalization and Bell number of partitions associated to these measures make these approaches computationally infeasible for large-scale brain networks. Our goal in this work is to address this problem. Our formulation of network integrated information is based on the Kullback-Leibler divergence between the multivariate distribution on the set of network states versus the corresponding factorized distribution over its parts. We find that implementing the maximum information partition optimizes computations. These methods are well-suited for large networks with linear stochastic dynamics. We compute the integrated information for both, the system’s attractor states, as well as non-stationary dynamical states of the network. We then apply this formalism to brain networks to compute the integrated information for the human brain’s connectome. Compared to a randomly re-wired network, we find that the specific topology of the brain generates greater information complexity.


Introduction
From a computational neuroscience perspective, the brain is oftentimes abstracted as a complex information processing network, that integrates sensory inputs from multiple modalities in order to generate action and cognition. In this paper, we ask a much simpler question: viewing the brain as a dynamical network of neural masses, how can one compute the information integrated by such networks in the course of dynamical transitions from one state to another? A possible approach, among others, is to look at information-theoretic complexity measures that seek to quantify information generated by all causal sub-processes in such a network. One candidate measure for global dynamical complexity is integrated information, usually denoted as . It was first introduced in neuroscience as a complexity measure for neural networks, and by extension, as a possible correlate of consciousness itself (Tononi et al. 1994). It is defined as the quantity of information generated by a network as a whole, due to its causal dynamical interactions, and one that is over and above the information generated independently by the disjoint sum of its parts. As a complexity measure, seeks to operationalize the intuition that complexity arises from simultaneous integration and differentiation of the network's structural and dynamical properties. As such, the interplay of integration and differentiation in a network's dynamics is hypothesized to generate information that is highly diversified yet integrated, thereby creating patterns of high complexity. The aim of this paper is to develop mathematical tools for computing integrated information (analytically when possible, otherwise numerically) for large networks. We then apply this framework to the large-scale structural connectivity network of the human brain.
Let us begin with a brief review of the rich history of this field. The earliest proposals defining integrated information were made in the pioneering work of (Tononi 2004;Tononi and Sporns 2003;Tononi et al. 1994). Since then, considerable progress has been made towards development of a normative theory as well as applications of integrated information Balduzzi and Tononi 2008;2009;Barrett and Seth 2011;Krohn and Ostwald 2016;Mediano et al. 2016;Oizumi et al. 2014;Tononi 2012). Similar information-based approaches have also been successfully applied to many-body problems in other domains, such as, for the problem of estimating microstates of statistical mechanical ensembles (Arsiwalla 2009). In fact, there are now several candidate measures of integrated information such as neural complexity (Tononi et al. 1994), causal density (Seth 2005), from integrated information theory: IIT 1.0, 2.0 & 3.0 (Balduzzi and Tononi 2008;Oizumi et al. 2014;Tononi 2004), stochastic interaction (Ay 2015;Wennekers and Ay 2005), empirical (Barrett and Seth 2011) and synergistic (Griffith 2014;Griffith and Koch 2014), plus several variations of these (see Tegmark (2016) for an overview). Table 1 summarizes these measures along with corresponding information metrics upon which they have been based.
Many of the above measures have been useful in different domains of validity. However, applications to realistic data and in particular to large-scale networks have proven computationally challenging. With a focus on developing computational tools, we discuss three of the above measures in more detail. The measure of (Balduzzi and Tononi 2008) has been quite useful for discrete-state, deterministic, Markovian systems with the maximum entropy distribution. On the other hand, the measure of (Barrett and Seth 2011) has been applied to continuous-state, stochastic, non-Markovian systems and in principle, admits dynamics with any empirical distribution (although in practice, it is easier to use assuming Gaussian distributions). The formulation in (Barrett and Seth 2011) is based on mutual information, whereas (Balduzzi and Tononi 2008) uses a measure based on the Kullback-Leibler divergence. Note however, that in some cases the measure of (Barrett and Seth 2011) can take negative values and that complicates its interpretation. The Kullback-Leibler based definition computes the information generated during  (Balduzzi and Tononi 2008;Barrett and Seth 2011) make use of a normalization scheme in their formulations. Normalization inadvertently introduces ambiguities in computations. The normalization is actually used for the purpose of determining the partition of the network that minimizes the integrated information, but a normalization dependent choice of partition ends up influencing the value and interpretation of . An alternate measure based on the Earth Mover's distance was proposed in (Oizumi et al. 2014). This does away with the normalization problem (though the current version is not formulated for continuous-state variables). However, the formulation of (Oizumi et al. 2014) lies outside the scope of standard information theory and is still very difficult for performing computations on large networks. A comment on network partitions is relevant at this point. The three measures of discussed above, make use of what is called the minimum information partition or minimum information bi-partition (MIP/MIB). The issue with this partitioning is that it leads to a combinatorial explosion in the number of configurations to be evaluated when working with networks having a large number of nodes. As a result, application of the above measures to compute integrated information of very large networks remains challenging, particularly for the scale of networks obtained from neuroimaging data. On the other hand, in earlier work (Arsiwalla and Verschure 2013), we have introduced a formulation of integrated information that overcomes both, the normalization and combinatorial problem by using a different partitioning of the network called the maximum information partition (MaxIP), which opens the prospect of large-scale applications. However, the formulation in (Arsiwalla and Verschure 2013) was only applicable for uncorrelated node dynamics, which may not be realistic enough for many biological systems.
In this paper, we seek to go beyond , starting with an extension of the formalism to include node correlations and also nonstationarity. In order to do that, we solve the discrete-time Lyapunov equation, the solution of which, is then used to get fully analytic expressions for with network correlations. We consider networks with linear stochastic dynamics, which generate multivariate time-series signals. Furthermore, our networks are plastic, in the sense that connection weights are scalable using a global coupling parameter. We compute as a function of this coupling. We also extend our framework to include non-stationary dynamics. This gives us as a function of time, computed through the temporal evolution of the system. The stationary solution yields at the fixed-point attractor, whereas the non-stationary solution leads to elsewhere in the phase space of the system.
As proof of principle, we apply our formulation to the structural connectivity network of white matter fiber tracts in the human cerebral cortex, obtained from diffusion spectrum imaging (Hagmann et al. 2008;Honey et al. 2009). This network has 998 nodes, representing neuronal populations. The edges are weighted fiber counts between populations. Implementing stochastic Gaussian dynamics on this network, we determine stationary solutions to the dynamical system from which we compute the information integrated in bits. To contrast with a null-model, we randomly re-wire the original network and repeat the computation. The original network scores higher on integrated information for all allowed couplings in the stationary as well as non-stationary regime.

Mathematical formulation
We consider networks with linear stochastic dynamics. The state of each node is given by a random variable pertaining to a given probability distribution. These variables may either be discrete-valued or continuous. However, for many biological applications, Gaussian distributed, continuous-valued state variables are fairly reasonable abstractions (for example, aggregate neural population firing rate, EEG or fMRI signals). The state of the network X t at time t is taken as a multivariate Gaussian variable with distribution P X t (x t ). x t denotes an instantiation of X t with components x i t (i going from 1 to n, n being the number of nodes). When the network makes a transition from an initial state X 0 to a state X 1 at time t = 1, observing the final state generates information about the system's initial state. The information generated equals the reduction in uncertainty regarding the initial state X 0 . This is given by the conditional entropy H(X 0 |X 1 ). In order to extract that part of the information generated by the system as a whole, over and above that generated individually by its parts, one computes the relative conditional entropy given by the Kullback-Leibler divergence of the conditional distribution P X 0 |X 1 =x (x) of the system with respect to the joint conditional distributions r k=1 P M k 0 |M k 1 =m of its nonoverlapping sub-systems demarcated with respect to a partition P r of the system into r distinct sub-systems. Denoting this as P r , we have where for an r partitioned system, the state variable X 0 can be decomposed as a direct sum of state variables of the sub-systems and similarly, X 1 decomposes as For stochastic systems, it is useful to work with a measure that is independent of any specific instantiation of the final state x . So we average with respect to final states to obtain an expectation value from Eq. (1). After some algebra, we get This is our definition of integrated information, which we use in the rest of this paper. Note that the measure described in (Balduzzi and Tononi 2008) is not applicable to networks with stochastic dynamics. They do use Eq.
(1) as their definition but endow their nodes with discrete states. On the other hand, (Barrett and Seth 2011) uses a different definition of integrated information, where conditional entropies as in Eq. (4) are replaced by conditional mutual information. This definition only matches the definition of Eq. (1) in special cases but not in general for any distribution. From an information theory perspective, the Kullback-Leibler divergence offers a principled way of comparing probability distributions, hence we follow that approach in formulating our measure in Eq. (4).
The state variable at each time t = 0 and t = 1 follows a multivariate Gaussian distribution The generative model for this system is equivalent to a multi-variate auto-regressive process (Barrett et al. 2010) where A is the weighted adjacency matrix of the network and E 1 is Gaussian noise. Next, taking the mean and covariance respectively on both sides of this equation, while holding the residual independent of the regression variables, yields In the absence of any external inputs, stationary solutions of a stochastic linear dynamical system as in Eq. (6) are fluctuations about the origin. Therefore, we can shift coordinates to set the meansx 0 and consequentlyx 1 to the zero. The second equality in Eq. (7) is the discrete-time Lyapunov equation and its solution will give us the covariance matrix of the state variables.
The conditional entropy for a multivariate Gaussian variable was computed in (Barrett and Seth 2011) which is fully specified by the conditional covariance matrix. Inserting this in Eq. (4) yields Now, in order to compute the conditional covariance matrix we make use of the identity (proof of this identity for the Gaussian case was demonstrated in (Barrett et al. 2010)) The appropriate covariance we will need to insert in this expression is which gives for the conditional covariance And similarly for the sub-systems where k indexes the partition such that M k 0 denotes the k th sub-system at t = 0 and A k denotes the restriction of the adjacency matrix to the k th sub-network.
Further, for linear multi-variate systems, a unique fixed point always exists. We try to find stable stationary solutions of the dynamical system. In that regime, the multivariate probability distribution of states approaches stationarity and the covariance matrix converges, such that t = 0 and t = 1 refer to time-points taken after the system converges to the fixed point. Then the discrete-time Lyapunov equations can be solved iteratively for the stable covariance matrix (X t ). For networks with symmetric adjacency matrix and independent Gaussian noise, the solution takes a particularly simple form and for the parts, we have given by the restriction of the full covariance matrix on the k th sub-network. Note that Eq. (16) is not the same as Eq. (15) on the restricted adjacency matrix as that would mean that the sub-network has been explicitly severed from the rest of the system. Indeed, Eq. (16) is precisely the covariance of the sub-network while it is still part of the network and yields the integrated and differentiated information of the whole network that is greater than the sum of these connected parts. Inserting Eqs. (12), (13), (15) and (16) into Eq. (9) yields as a function of network weights for symmetric and correlated networks. For the case of asymmetric weights, the entries of the covariance matrix cannot be explicitly expressed as a matrix equation. However, they may still be solved by Jordan decomposition of both sides of the Lyapunov equation.

The maximum information partition
Following Edlund et al. 2011), the maximum information partition (MaxIP) is defined as the partition of the system into its irreducible parts. This is the finest partition and is unique as there is only one way to combinatorially reduce a system into all of its sub-units. This partition can directly be found by construction and does not require a normalization scheme for sampling through the space of multipartitions in order to search for the one that either maximizes or minimizes the integrated information. Consequently, the resulting value of computed using the MaxIP is free from normalization dependencies.
Moreover, the MaxIP also helps reduce computational cost. This can be seen as follows. Prescriptions using the MIP/MIB are typically evaluated for a large class of network bi-partitions, whereas the MaxIP is uniquely defined. The number of bi-partitions of a set of n elements is given by the sum of binomial coefficients [n/2] p=1 n C p , where n C p = n! /p! (n − p)! with n! = n × (n − 1) × · · · × 1 and [ n/2] denotes the nearest integer less that or equal to n/2. Among all possible bi-partitions, MIP/MIB prescriptions usually restrict to those that divide the system into approximately equal parts. This still leaves us with n C [n/2] configurations for which has to be computed. Table 2 summarizes how this number scales with network size from a single node to a million nodes.
Another interesting feature of the MaxIP is that computed using this partition in fact accounts for the maximum amount of information that the network can integrate compared to any other bi-, tri-or multi-partition of the system. This is due to the fact that this partition cannot be decomposed further. Every other partition will be coarser than the MaxIP and will therefore have at least some of its parts as composites of the irreducible units in the MaxIP. As these composites integrate more information than its own irreducible units, subtracting the information of a composite (when treating the composite as a part) from the information of the whole system will always produce a smaller than that obtained by subtracting the information of each irreducible unit of the network from that of the whole network. Therefore computed using the MaxIP is the maximum possible integrated information of the system compared to computed using any other partition of the network. In that sense, unlike the MIP or MIB, the MaxIP in fact captures the complete information integrated by the network and is therefore a more natural choice for quantifying whole versus parts.

Analytic solutions for
Now that we have a rigorous analytic formulation of integrated information, let us first demonstrate examples of computations performed using artificial networks. In Fig. 1 we consider two artificial networks. For these cases, we want to compute the exact analytic solution for . Each of these networks have 8 dimensional adjacency matrices with bidirectional weights (though our analysis does not depend on that and works as well with directed graphs). We want to compute as a function of network weights, which we keep as free parameters. However, in order to constrain the space of parameters, we shall set all weights to a single parameter, the global coupling strength g. This gives us as an analytic function of g.

for attractor states
We first compute in the stationary regime, that is, when the system has converged to its fixed-point attractor state. The results for the two networks labeled a and b respectively are shown in Eqs. (17) corresponding to the stable stationary solution of the system.
−1 + g 2 4 1 − 8g 2 + 4g 4 6 1 − 17g 2 + 72g 4 − 64g 6 + 16g 8 8 (18) where Note that the mathematical framework described above is in no way limited by the size of the network and thus, in principle, can be applied to networks of any size, to yield exact results. The only practical difficulty would be in the form of available computing hardware resources. Hence, for very large data networks, such as those from brain imaging, numerical computations of would be more practical to perform.

for non-stationary dynamics
The mathematical formulation developed above can also be used to compute at nonstationary points in the solution space of networks with linear stochastic dynamics. We show this explicitly for Network B in Fig. 1. We compute for the complete temporal evolution of the system starting from an initial condition at t = 0 until the system stabilizes at the fixed point attractor. In the non-stationary case, Eqs. (14), (15) and (16) no longer hold. However, everything up to and including Eq. (13) are valid. Hence, the covariance matrix (X t ) is simply computed recursively following Eq. (7). Subsequently, (X t ) and are both computed for each time-point t. Figure 2 shows the multivariate time-series signals generated by Network B for two different coupling strengths g. The critical value of g for this network is 0.3023, at which the dynamics becomes unstable. For g ≤ 0.3023, the system converges to the fixed-point at the origin. In Fig. 3 we plot temporal profiles of for both the above values of g, which shows increasing integrated information for stronger coupled networks.

Application to brain connectomics
The framework described above, provides us with all the mathematical tools to compute how much information is integrated in bits in a single time-step, by a large network with linear stochastic (Gaussian) dynamics. We apply the above formulation to the whole brain structural connectivity network of the human cerebral cortex, using data published in the seminal work of (Hagmann et al. 2008;Honey et al. 2009). This data is acquired from high-resolution T1-weighted diffusion spectrum imaging (DSI). The data preprocessing pipeline, as described in (Hagmann et al. 2008), involves white and gray matter segmentation from the T1 images, followed by parcellation into 66 anatomical regions and subsequently 998 individual regions of interest (ROIs) based on Talairach coordinates. After that, whole brain tractography is performed to obtain estimates of axonal The plot on the left shows simulated data corresponding to network coupling strength g = 0.3000 and variance of noise σ = 1. The plot on the right refers to data from the same network with coupling g = 0.3022 and the same noise amplitude. Each plot shows 8 time-series profiles, corresponding to the 8 nodes of the network (note that several of these profiles intersect or overlap with each other, hence in the above plots they appear to be clustered together). The time-series for each node is shown in a different color and the color scheme is the same for the plot on the left and the one on the right. Stability of the system is guaranteed until the critical coupling at g = 0.3023. Closer to the critical point, the system takes longer to converge to the fixed-point attractor at x = 0 trajectories across the entire white matter. From this, connection weights between pairs of ROIs are determined, resulting in a weighted network of structural connectivity across the entire brain. We have displayed the data in matrix form as a 998 dimensional matrix on the left-hand side of Fig. 4. The 998 voxels (ROIs) represent nodes of the network. Each node is physically a population of neurons. The edges are weighted fiber counts between populations. Additionally, we include a global coupling variable g, multiplying the entire matrix, that can be used to tune the overall strength of the weights.
To simulate brain dynamics, one may chose from among a variety of possible models, discussed in (Arsiwalla et al. , 2015aGalán 2008). To run these simulations, one may use customizable tools such as those described in (Betella et al. 2014a,b;Omedas et al. 2014). The simplest model among the ones mentioned above is the linear stochastic Wilson-Cowen model. In fact, it can be seen from (Galán 2008) that Eq. (6) is precisely a special case of the discrete-time limit of the linear stochastic Wilson-Cowen model. That is what we use here. The brain's state of spontaneous activity or resting-state is usually identified as the attractor state of these models. This corresponds to finding  The connectivity matrix on the left is a weighted matrix with the color-bar (in the middle) indicating connection strengths. The randomized matrix on the right is obtained by randomly shuffling positions of weights from the connectivity matrix stable stationary solutions of the system. This is precisely the regime in which we compute in bits as a function of the coupling g. The results are shown in the red profile in Fig. 5. Further, in order to contrast this result with a null model, we also rewired the edges of the connectome network randomly, while preserving the magnitude of the weights. This generates the randomized data matrix shown on the right-hand side of Fig. 4. We also compute for this matrix. The resulting profile is the blue curve in Fig. 5. For extremely small couplings, the two networks are indistinguishable on scores, however, as g grows, the architecture of the brain's network turns out to perform better at integrating information than its randomized counterpart.
While Fig. 5 depicts the fixed-point behavior of as a function of g, in Fig. 6 we show the full time-course of for both the connectome network as well as its randomized counterpart at a specific value of the coupling g = 1.488. The non-stationary behavior is computed using linearized dynamics as discussed above. Asymptotic values of in Fig. 6 converge to those in Fig. 5 at g = 1.488. Once again we find that the connectomic for the data (shown as red points) and for the randomized network (shown as blue points). Stationary solutions exist up to g = 1.49, the critical point of the data network Fig. 6 A comparison of temporal profiles of for the brain connectome network versus its randomized counterpart, both computed at a fixed coupling g = 1.488. The asymptotes of these profiles match the stationary values of in Fig. 5 for the given coupling network completely dominates its randomized counterpart in the quantity of information it integrates and this difference only gets more pronounced upon approaching the attractor state. Note that for a more thorough comparison, one might also want to check the above against an entire distribution of random networks. However, the main point of this paper is to demonstrate a systematic computation of how much information a realistic large network integrates. Functionally, what this corresponds to in terms of brain function or disease is an interesting question by itself. A possible approach towards addressing those issues would be to perform computations as the one demonstrated above for a large repertoire of neuroimaging studies ranging from task-based paradigms to disease states and use that to calibrate brain functional states on a scale of information complexity. Another question on which there is still no consensus concerns consciousness ). While it is generally agreed that information integration is a necessary component of phenomenological consciousness, by itself, it may not be sufficient , Verschure 2016).

Discussion
In this paper, we have demonstrated a computational framework, built on a rich body of earlier work on information-theoretic complexity measures and applied it to compute the integrated information of large networks endowed with linear stochastic dynamics. Integrated information is interesting as a global measure of a system's dynamical complexity. Whereas local complexity measures such as Granger causality, transfer entropy or synergistic mutual information have been very successful at quantifying local information processes of complex systems (Wibral et al. 2014), global measures such as integrated information serve to complement local measures and give insights on the system's collective behavior.
Earlier attempts to compute integrated information have been limited to relatively small networks. This was mainly due to normalization ambiguities and explosive combinatorics associated with bi-partitions used therein. Instead, what we find is that the finest partitioning of the system solves all these problems and opens the window of applicability to large-scale networks. In particular, we apply our formulation to the human brain connectome network. This network is constructed from white matter tractography data from the human cerebral cortex and consists of 998 nodes with about 28,000 symmetric and weighted connections between them (Hagmann et al. 2008;Honey et al. 2009). Using a discrete-time linear stochastic neuronal population model to generate the dynamics of neural activity on this network, we compute the integrated information of this dynamical system during state transitions for both, stationary as well as non-stationary dynamics. For the linearized system, the stationary solution corresponds to the network's resting state attractor. The computed integrated information depends on both, the structural anatomy as well as the network's dynamical operating point, that is, the value of the global coupling g.
We see potentially useful applications of our information-based measures for other types of physiological data as well, for example, tracing studies or detailed microscopic connectivity data. As for neuroimaging studies, information-based methods offer a useful way to quantify complexity of brain functions. The clinical utility of our measure would be in identifying information-based differences between healthy subjects and patients of neurodegenerative diseases. Just as we identified a transitionary phase after which an anatomical network strongly differs in information integration and differentiation from a randomly rewired network, similar comparative analysis for patients compared to healthy controls might provide a quantification of the extent of the disorder and even provide an analytic way to suggest diagnostic surgical rewiring to restore network processing.