Ranking of communities in multiplex spatiotemporal models of brain dynamics

As a relatively new field, network neuroscience has tended to focus on aggregate behaviours of the brain averaged over many successive experiments or over long recordings in order to construct robust brain models. These models are limited in their ability to explain dynamic state changes in the brain which occurs spontaneously as a result of normal brain function. Hidden Markov Models (HMMs) trained on neuroimaging time series data have since arisen as a method to produce dynamical models that are easy to train but can be difficult to fully parametrise or analyse. We propose an interpretation of these neural HMMs as multiplex brain state graph models we term Hidden Markov Graph Models. This interpretation allows for dynamic brain activity to be analysed using the full repertoire of network analysis techniques. Furthermore, we propose a general method for selecting HMM hyperparameters in the absence of external data, based on the principle of maximum entropy, and use this to select the number of layers in the multiplex model. We produce a new tool for determining important communities of brain regions using a spatiotemporal random walk-based procedure that takes advantage of the underlying Markov structure of the model. Our analysis of real multi-subject fMRI data provides new results that corroborate the modular processing hypothesis of the brain at rest as well as contributing new evidence of functional overlap between and within dynamic brain state communities. Our analysis pipeline provides a way to characterise dynamic network activity of the brain under novel behaviours or conditions.

to observe these reconfigurations as spatial patterns of metabolic or electrophysiological activity, termed functional activity (Papo 2019). In order to generate these patterns, brain regions must coordinate through transfer of information. This exchange between brain regions defines the state's functional connectivity. Functional activity can therefore be interpreted as a realisation from a brain state graph model which describes brain dynamics and the relationships between brain regions in the state (Bassett and Sporns 2017). This relates to models of the relationship between observed state and environment in which states are realisations of a so-called Markov blanket taking input from the environment to create an internal model of the external and internal environment (Hipólito et al. 2021;Kirchhoff et al. 2018). In these graph models, nodes are anatomically or functionally defined brain regions and edge strength is determined by the level of information shared between these regions (their functional connectivity).
The dynamics of communities of brain regions are of particular interest due to the important functional roles some communities play. Previous work has focused on deriving communities of brain regions using a number of methods including dynamic community detection (Martinet et al. 2020). State space models have also been proposed that focus on the changing community structure within brain states from inferred functional connectivity (Ting et al. 2020;Liu et al. 2018). Our novel framework uses a Hidden Markov Model (HMM) approach to construct a model, we term a Hidden Markov Graph Model (HMGM). This framework is fully unsupervised requiring no sliding window-based estimation or thresholding of the functional connectivity, and no prior assumptions about the number of states or embedding dimension.
We analyse brain state dynamics as a multiplex graph with modular (community) structure at both the temporal (state switching dynamics) and spatial (brain region communication) levels. In order to differentiate the temporal communities of states and the less functionally relevant spatial communities from the most relevant we use the term network. Network here is used exclusively to refer to modular subgraphs of coordinated brain regions within a state that are functionally important (rather than being synonymous with the term graph). These brain networks form the basis of our understanding of the functional connectivity pathways within the brain and are integral to our understanding of the role of changing brain configurations in wakefulness and beyond (Rosazza and Minati 2011).
We have developed a method based on the HMGM framework to identify the importance of possible brain networks using random walks to ascribe to each module in each state an importance or T-score based on their functional connectivity and co-activation. Notably, the method does not apply random walk information to partition the graph but rather to determine the relative importance of communities within a partition (Rosvall and Bergstrom 2008). Our method provides a means to characterise dynamic functional activity under novel conditions or behaviours. As a proof of principle, we apply our pipeline to neuroimaging data from subjects at rest and provide new evidence for both modular and nested functional activity in the awake brain.

Static brain state models
In the simplest brain state models (see Fig. 1A), functional activity arises as noisy realisations of a single static brain state. Considerable progress has been made using this static framework to characterise the vast repertoire of activity patterns observed during wakefulness. Models using both weighted and unweighted graph structures derived from Independent and Principal Component Analysis (ICA and PCA respectively) have revealed key modules within the brain across a wide range of conditions. These include both behavioural and task-based conditions (sensory, motor etc.) and resting state conditions in the absence of direct stimulation (Calhoun and Adali 2012;Smith et al. 2009;Kokkonen et al. 2009;Sämann et al. 2011;Calhoun et al. 2004). Recent results from both electrophysiological data derived from Electroencephalography (EEG) and Blood Oxygen Dependent (BOLD) data derived from functional MRI (fMRI), suggest that weighted network models produce more reliably reproducible and robust results than do binarised network models (Jalili 2016;Ran et al. 2020;Smith et al. 2017).
Studies using static models have helped neuroscientists to build up vast libraries of associations between cognitive functions and specific brain regions . However, the static approach makes it difficult to account for inter-subject variability as well as dynamic changes in state that occur in time as different cognitive and Fig. 1 This figure shows how brain activity can be modelled as being generated by a system of either static (A) or dynamic (B) states, and, in particular, how such a dynamic brain state model can be interpreted as a multiplex network with modular structure (C). In A a static pattern (left) of functional activity (colour of functional activity map) and connectivity (edges between regions) is observed (green arrow) as a stationary multi-ROI multi-subject time series (right) in which each dimension is the activity observed for a particular Region of Interest (ROI) in each subject (separated by a dotted line). B Shows state dynamics for a multi-subject system with multiple states (left). In this system each state is represented by a colour and arrow length indicates its duration in time. This is observed as a multivariate time series composed of weakly stationary segments (right). Segment colour indicates the state that generated it. In C we use temporal relationships between states to represent the system as a dynamic multiplex graph. This system is decomposed (purple arrow) into its essential temporal (coloured ellipses) and spatial modules or functional networks (coloured subgraphs). Dashed circles around states show state hubs, important states in each community which are central to the dynamics and facilitation of brain activity across subjects functional demands are placed on the brain (Michael et al. 2014). These demands result in activity in one moment that is often functionally incompatible with activity in the the next, driving the need for dynamic approaches to brain state modelling (Sridharan et al. 2008).

Dynamic brain state models
Moving window-based approaches produce a series of snapshots of the activity pattern of the brain. Although these methods have proved incredibly useful in understanding changing brain state, they are limited in their ability to reliably detect changes in functional connectivity between regions over time (Hindriks et al. 2016). By contrast, state space models (Fig. 1B) and in particular Hidden Markov Models (HMMs) (Vidaurre et al. 2017;Chen et al. 2016), have arisen as an alternative to the sliding window approach and use a number of simplifying assumptions to improve on these models' tractability and specifiability (Suk et al. 2016). More recently, dynamic community detection methods have been proposed which capture many of the same features as dynamic state space models, however these methods often still rely on sliding window approximations of functional connectivity to construct a series of dynamic networks (Martinet et al. 2020;Liu et al. 2018).
The chief underlying assumption of HMMs is that brain dynamics can be parametrised by a finite state, positive recurrent, Markov process where functional activity and connectivity is determined by an observation model, typically a multivariate normal distribution (Vidaurre et al. 2016;Ting et al. 2020). In these models, dynamic switching between states can be interpreted as a temporal graph of probable state transitions (Fig. 1C). The full model can thus be interpreted as a nested, or multiplex graph in which the layers are brain states (with brain regions as nodes) and the interlayer directed edges are transition probabilities between state layers.

Novel multiplex approach
A state characterises a pattern of activity across the whole brain at a given time; however, it is most often characterised in terms of just a few key subgraphs of interacting brain regions (see Fig. 1C) (Ting et al. 2020;Liu et al. 2018). Much progress has been made to characterise the vast repertoire of activity patterns observed during resting states and task performance. These enquiries have given rise to a number of re-occurring and important networks, associated with a wide range of brain functions and behaviours (Shulman et al. 1997;Biswal et al. 1995;Menon 2011). The most prevalent and widely characterised of these are the so-called resting state networks, termed the Default-Mode (DMN), Salience (SN) and Central Executive (CEN) Networks as well as those active during sensory and motor tasks including: the sensorimotor, visual and auditory networks (Ryali et al. 2016). The mechanisms underlying these networks are interdependent with recruitment of one network often necessitating the further recruitment of other networks (Karahanoğlu and Van De Ville 2017). Conversely, some networks are known to be largely mutually antagonistic in activity, with DMN and SN activity generally being anticorrelated with sensorimotor-like activity in resting wakefulness (Vidaurre et al. 2018).
Although state space modelling of brain dynamics is a relatively young field, one key finding has been the multi-scale modularity of brain states. In particular, Louvain modularity-based community detection applied to the temporal graph of state transitions has shown that states are organised modularly into communities under a variety of conscious conditions including resting wakefulness and sleep (Vidaurre et al. 2017;Stevner et al. 2019).
In order to construct a set of plausible brain states models we train a number of HMMs with different numbers of states on resting state data. We then utilise our novel cross-validated maximum entropy procedure, based on the maximum entropy principle, to select the HMM that best generalises across subjects (Jaynes 1957). We convert the selected HMM into a dynamic graph model by transforming the state covariance matrices into weighted, directed graphs based on the regional correlations within each state and node attributes given by the state mean activity. The intralayer network which we term the Markov Information Matrix of the states is motivated by an interpretation of brain states as realisations of an underlying Markov blanket or network as in Hipólito et al. (2021).

Ranking the importance of networks within the brain
We perform two-level Louvain community detection to discover important communities of brain states (temporal communities) and brain regions within a state (spatial communities). We use community centrality statistics to identify the hub states of key activity in each network. Within each hub state we look at spatial community structure to determine the key actors in the dynamics of the model that may be important to the overall dynamics of wakefulness across subjects.
Random walks provide an effective way to construct representative samples from a graph in a way that preserves local structure (Dupont et al. 2006;Leskovec and Faloutsos 2006). In complex interdependent data sets random walk sampling can be used to remove baseline levels of interdependence and discern the most robust relationships in a one dimensional model, by conditioning out local inhomogeneity in noisy activity (Luecken et al. 2018). Here, we extend this principle to network sampling across two dimensions, space and time. Our method is based on a non-parametric random walk statistic that combines a temporal walk between layers with a spatial walk between regions. We use random walks to sample plausible patterns of functional network activity from the local functional activity background. We then use the samples as a benchmark against which to score functional coordination in our spatial communities. This statistical score, termed the T-score, is simple to compute given the graph model and putative network and is inspired by a similar method for analysing large, complex protein graphs with metalayer information (Luecken et al. 2018).
Our method allows us to determine which spatial communities are highly co-activated or inactivated relative to the expected dynamics across states in that brain area, providing a generalisable procedure to determine functionally relevant brain state communities. Our within state community functional associations largely agree with macroscopic analysis of the state functional activity maps, but provide an additional layer of information in the form of networks that provide clarification and depth to our understanding of brain states at the mesoscale.

Metatextual and network analysis of brain state models
We use the powerful metanalysis tool, Neurosynth (Yarkoni et al. 2011), to determine functional associations between each brain state, it's most important networks and important functional terms from the literature. Neurosynth provides scores based on either correlations between brain images and the occurrence of a predefined set of terms in the literature or, in conjunction with the NIMARE package (NiMARE 2019), a posteriori probabilities of associations between the image and an exhaustive list of literature terms. Using these tools and images derived from our brain states, termed functional activity maps, we provide evidence to corroborate the modular processing hypothesis in resting wakefulness (Reichardt and Bornholdt 2006). Key to our findings is that the states associated with resting state networks tend to self-associate while being anticorrelated with sensorimotor associated states.

Methods
In the following sections, "Acquisition and pre-processing of fMRI data for HMM modelling" and "Model specification and generalisability" sections, we explain the preprocessing of the data and define the state space (HMM) model and novel model selection criterion. We will see that each brain state s can be thought of as a pattern of activity represented by a weighted graph G(s) = {V , a(s), W (s)} in which each node is a brain region x ∈ V , (with |V | = D nodes), each with a level of functional activity a(s) As we shall show, Hidden Markov Modelling with our new model selection method, provides a means to construct a dynamic state space model from multi-subject fMRI time series data in a data driven way. We use inter-regional correlations to determine the state graphs and use the temporal relationships between states to determine the directed interlayer edges (see Fig. 1C). Lastly, in "Louvain and hierarchical temporal clustering", "Community hub selection", "Identifying functionally important spatial communities" and "Analysis of states and communities with NeuroSynth" sections we set out methods to explore the spatiotemporal modular and functional structure of these multiplex brain state models.

Acquisition and pre-processing of fMRI data for HMM modelling
Ten minutes of whole brain fMRI activity were recorded separately for each of N = 15 wakeful subjects (with eyes closed) as part of a previous study (Mhuircheartaigh et al. 2013). The brain volumes produced by the scanner were aligned to the MNI152 standard brain template (Fonov et al. 2011). This resulted in a high dimensional time series of each subject's fMRI (BOLD) signal for each voxel, with a temporal resolution of 3 s and a spatial resolution of 2 mm 3 (Woolrich et al. 2009).
Recordings were collected separately from each subject. Of the 200 volumes recorded per subject (each time point is one volume), four dummy volumes were removed to exclude any non-steady-state magnetisation effects. This was followed by motion correction with MCFLIRT (Motion Correction FMRIB's Linear Image Registration Tool), spatial smoothing using a Gaussian kernel of 5 mm full width half-maximum, global intensity normalisation, and temporal high-pass filtering with a cutoff of 0.02 Hz to remove low frequency scanner drift. Automated removal of non-brain tissue was initially performed before statistical analysis using BET (Brain Extraction Tool), with further manual correction in FSLview. Further spatiotemporal artefact removal was carried out by independent component in FSL melodic (Woolrich et al. 2009).
We selected regions of interest in our study based on the Harvard-Oxford (HO) probabilistic cortical and subcortical brain parcellations, which assigns to each voxel a probability for each brain region. We assign each voxel a unique region identity according to the maximum probability across regions in the HO parcellation. Excluding white matter regions the resulting parcellation of 63 Regions of Interest (ROIs) includes 48 cortical and 15 subcortical brain regions (Caviness et al. 1996;Makris et al. 2006). ROI time series were calculated using the ROI spatial mean BOLD signal at each time point. This results in a D = 63 dimensional time series with T = 196 time points per subject. Each of the D constituent ROI time series were temporal mean subtracted and normalised by the standard deviation.
Model fitting presents two challenges, the first is that the time taken to fit the model scales with parametric complexity, and the second is that a poorly parametrised model may lead to overfitting or underfitting. To address these challenges, dimensionality reduction by principal components of the original D dimensional time series was performed to reduce parametric complexity while also reducing overall noise. This approach is justified by the generally low embedding dimension of most real world data, including neuroimaging data (Ma et al. 2018;Shen and Meyer 2008). In order to balance dimensionality reduction and retention of signal, Parallel Analysis is used (see Additional file 1: Section 1) to obtain a D × d eigenmatrix A of the first d < D eigenvectors (Horn 1965). This method assumes roughly linear separability of uncorrelated noise from signal, but has been shown to outperform a number of methods, including maximum likelihood estimation, in simulation (Humphreys and Montanelli 1975). The reduced d dimensional time series {X * n,t } t∈N T is then inputted to train a noise reduced HMM model of the data.

Model specification and generalisability
We use the HMM-MAR package to train HMMs with multivariate normal observations by Variational Bayes (Vidaurre et al. 2016), whilst separating the data by subject into distinct trials of length T. For further details on model fitting see (Vidaurre et al. 2016). Figure 2A shows how observations of the fMRI BOLD signal at each time point are modelled across subjects. Dynamics for each subject are modelled and fitted using a shared set of states S with finite S = {1, 2, . . . , K } and Markov transition matrix P.
We give a brief overview of HMM dynamics. We note that a key parameter, for these dynamics, the number of brain states, K = |S| , that best generalises these dynamics across subjects is unknown. Consequently, we introduce a novel framework for selecting K based on an information theoretic criterion that maximises generalisability by maximising entropy of the state dynamics across subjects.
In each HMM state trajectory, the initial state of each subject's trial is selected independently at random. Under the Markov assumption of the model the resulting subject-specific state dynamics are assumed independent realisations of the same stochastic process, S n,t . For t > 1 , S n,t is conditionally dependent on the previous time step S n,t−1 so that for s ′ ∈ S . Each brain state s ∈ S is associated with an observation model O(s) ∼ MVN (µ * (s), � * (s)) . The O(S n,t ) model the row dimensionally reduced brain data X * n,t . In order to obtain the full model, the reduced model is then back-projected into D dimensional brain region space [see Eq. (2)].

Novel model selection criterion based on fractional occupancy
The Markov chain defined by P and any given initial state s 0 ∈ S , has a unique stationary distribution π s that is independent of s 0 assuming the chain is irreducible and the states are positive recurrent. The probability π s is the long run probability of the re-occurrence of state s. Selection of the number of these hidden states is carried out by cross-validated entropy maximisation over the related fractional occupancy distribution. The fractional occupancy distribution κ is defined by subject n for each state s and given by where P(S n,t = s|M, X) is the posterior probability of state s occurring at time t given the model M and data X. The fractional is the probability of finding subject n in s over the entire trial of length T. The distribution κ for subject n is related to the stationary distribution π s by the well-known limit That is to say that κ asymptotically approximates the long run average state dynamics of the model as trial length increases. Knowing this, our goal is to select the model whose (See figure on next page.) Fig. 2 Flow diagram of the graph modelling and analysis pipeline. Following preprocessing of the fMRI data we obtain multivariate regional brain activity time series for all N subjects. Variational Bayes inference is then used to train HMMs (using the HMM-MAR package (Vidaurre et al. 2016)). A is a sketch of an HMM fitted to the X n,t data for subject n and time point t. Each hidden brain state S n,t �(S n,t ) has mean activity µ(S n,t ) and covariance �(S n,t ) (after backprojection). State change from S n,t is determined by the transition matrix P. B The number of hidden states, K, is determined using mean subjectwise cross-validated maximum entropy, which is calculated over the fractional occupancies, κ s,n,k for each subject-state pair up to K states. C Adjacency matrix of the interlayer temporal directed transition graph determined by the Markov transition matrix of the HMM, with temporal communities in red along the diagonal. D Each state itself can be considered a layer with edges relating brain regions by their correlation in activity derived from their modelled covariance �(s) , with node weights (regional mean activity) determined by µ(s) . Of the states, some are highly connected state hubs, h(U), belonging to a temporal community U (red shading). E Each hub state (layer) h(U) is analysed and internal spatial communities are determined. F Internal communities are ranked according to their level of coherent brain activity compared to many repeated random walk samples from the multiplex model. G The results of ranking summarised by the community T-score. High T-score corresponds to a higher than expected level of community coherent activity when compared to the rest of the multiplex graph in this brain area. We propose functions for highly ranked communities by mapping these regions onto a 3D functional activity map and compared them to maps and terms drawn from the neuroscience literature with NeuroSynth fractional occupancy maximises the entropy pooled across subjects by maximising the objective function where the model M(n, k) is the model trained using all trials except the data from subject n assuming k hidden states, and X n is the trial data from subject n (see Fig. 2B). By selecting the initial number of states K = arg max H(k) , we appeal to the information theoretic principle of maximum entropy which states that the model which κ(s, n|M(n, k), X n ) log[κ(s, n|M(n, k), X n )] maximises the uncertainty over the data tends to be the one that best approximates the true data distribution (Jaynes 1957). More specifically, our goal is to obtain a set of states with similar uncertainty about subject behaviour over the course of the experiment. We shall see in "Entropy relates to model selection" section that the goal of state-subject uncertainty maximisation relates closely to that of optimal model selection. We note that to the best of our knowledge this is the first application of such a subject-specific entropic criterion in state space model selection.

The state Markov information graph
First model parameters µ * (s) and � * (s) for state s from the HMM model M are backprojected using the transpose eigenmatrix A to obtain a model in D dimensional brain space so that the full D dimensional model has mean µ(s) and variance �(s) defined over the ROIs and given by Using the full model, each state s has normally distributed observations with mean µ(s) x and covariance �(s) x,y , for brain regions x, y ∈ V . We use these to define a graph G(s) = (V , a(s), W (s)) over the set of R brain regions, node weights a(s) and edge weights W(s), which we take to be a proxy for the information flow between regions. More specifically, we estimate the weights W(s) by the correlation matrix |ρ(s)| , as derived from the state covariance matrix �(s).
Here, a(s) x = µ(s) x are the mean regional functional activity at brain region x in s. The weighted edge (directed information flow) from regions x to y are The resulting edge weights matrix W(s), defines a Markov transition matrix, a model of information flow between brain regions in state s in which information flow between x and y is defined both into x from y, W (s) x,y and out of x to y, W (s) y,x . Note this defines a potentially asymmetric and directed graph with edges (information flow) both into and out of x. The rationale for using such a Markov transition matrix to define edge weights is to convert the entire network into a dynamic Markov graph in which information is propagated probabilistically both in time and space. This is useful in particular in "Analysis of states and communities with NeuroSynth" section.

Louvain and hierarchical temporal clustering
We perform Louvain modularity detection on the directed Markov transition and information graphs (Blondel et al. 2008). Suppose G = (V , E, W ) is a potentially directed and weighted graph with vertex set V, edge set E and weight matrix W. The Louvain algorithm involves the greedy optimisation of an objective function Q(U ) , termed the modularity score for U a partition of V (see Additional file 1: Section 2) (Girvan and Newman 2002;Newman 2006). The algorithm allows for a resolution parameter γ which determines the relative size of communities and goes to one as γ → ∞ ).
(2) �(s) = A� * (s)A T and µ(s) = µ * (s)A T . (3) We use a form of the Louvain optimisation algorithm originally designed for undirected networks but complement this with a version of the modularity Q(U ) which has been adapted for directed networks in Nicosia et al. (2009) and Leicht and Newman (2008). In order to assess the validity of this approach, a rough measure of the degree of symmetry in a weight matrix W can be given by the fraction of the energy of the adjacency matrix (as measured by the Frobenius norm) that is contributed by the symmetric part, Sym(W ) (see Additional file 1: Section 3) (Aggarwal 2020).
In the case of temporal communities, we determine the significance of the community partitioning by comparing Q(U ) to an empirical distribution composed of modularity scores from 10,000 partitions constructed by random permutation of the community labels. In addition, in order to examine the state-subject relationships directly, we perform agglomerative hierarchical linkage clustering based on correlation in fractional occupancy κ using Ward's method (Ward 1963).

Community hub selection
State hubs are the states most central to the dynamics of the model and facilitate the switching dynamics within each community. These are selected by maximising the community centrality z-score, z(s), for each community U ⊂ S (Guimera and Amaral 2005; Shine et al. 2016). This score measures the within community degree centrality of a node relative to the mean community connectivity (see Supplementary Information 4). Hubs are then analysed for their community structure, using the same Louvain algorithm as in "Louvain and hierarchical temporal clustering" section but this time on the directed brain state graph G(s).

Identifying functionally important spatial communities
Not all detected communities are as relevant to a state's functional role as others. Performance of these roles requires both functional activation and coordination of brain regions. To discern which communities are the most functionally cohesive, we rank communities by comparing to samples of regional activity from the full multiplex graph model (see Fig. 2C, D). We used random walks to sample plausible patterns of functional network activity and employ them as a benchmark against which to measure the level of coordination within spatial communities. Controlling for the local level of background activity in space and time allows for a more representative indication of functional cohesion within brain networks identified by community detection than naive comparison of communities by community mean functional activity.
We introduce to neuroimaging the Functional Homogeneity, FH, as our community coherence measure, a statistic derived from the mean activity µ(s) and �(s) that is high when the community mean activity is most in agreement with the directions of maximum community functional connectivity and low otherwise. It is a measure of the alignment between the two key features of spatial communities, their level of shared information and activation. This measure is well suited for neuroimaging data, and is well established in computer vision and image classification where it is known as the covariance metric and measures the agreement between and within image classes (Li et al. 2019). The FH for a community C in a state s is where the superscript C refers to the submatrix given by removal of all rows and columns not corresponding to regions in community C. This metric is key to the community ranking procedure which follows a six step process: 1 Given a community C ⊂ V in state G(s) we calculate FH(s, C). 2 Sample a state s ′ from the stationary distribution π. 3 Select a region x ∈ C and sample |C| nodes from G(s ′ ) starting at x ∈ V in G(s ′ ). 4 Repeat steps 2 and 3 to construct a representative sample of paired states and brain regions (s 1 , C 1 ), (s 2 , C 2 ) . . . , (s L , C L ) 5 Calculate the T-score for functional cohesiveness of a subgraph where I is the standard indicator function and rank the communities in s by decreasing T-score. 6 Determine whether the community represents a correlated or anticorrelated brain subgraph by the sign of The T-scores of all the communities in a specific state can then be used to order the states in terms of which are most likely to contribute to the functional cohesion of the state. Note that T(s, C) is a score between zero and one, with one implying that the community C is much more functionally cohesive than other comparable brain subgraphs in space and time. T-scores are not designed to be compared across states. These steps are summarised by steps E to F in Fig. 2.

Analysis of states and communities with NeuroSynth
NeuroSynth is a meta-analysis tool that takes in 3D images of brain activity (termed functional activity maps) in MNI152 standard space and returns a scored association (based on the Pearson correlation) between the activity maps and other images from published articles that directly reference a given term i (Yarkoni et al. 2011). We choose the six terms most clearly associated with resting state activity default mode, salience, executive, these are the resting state network terms and sensorimotor, auditory and visual, sensory network terms. We used these to characterise the mean activity of a given state s by projecting the activity pattern µ(s) back into 3D brain standard space (see Supplementary Figure S-2A) and inputting the resulting map into NeuroSynth. The resulting score for a state s and term i is denoted θ i,s ∈ [−1, 1] , with 1 indicating perfect correlation between the state's mean functional activity map and i and -1 indicating perfectly anticorrelated activity. We note that although these terms, while chosen to relate to known resting state patterns, are not equivalent and should be thought of as suggestive of a global pattern of activity (or its absence). We explore the activity of actual networks in our spatial community analysis "Community rankings reveal spatiotemporal modules of functional activity" section.
Page 13 of 22 Wilsenach et al. Applied Network Science (2022) 7:15 We propose that the global score θ can also be considered a dynamically changing property of the system. Given a score θ i,s for a term i and state s, the one step ahead predicted score is We use this predicted score to examine the global properties of the activity observed after reaching a given state.
NeuroSynth can also be used in conjunction with the newly developed package NiMARE to directly calculate the posterior probability of terms from a large corpus of neuroimaging journal abstracts and images given a selection of brain voxels in standard space (NiMARE 2019). Due to the variability in brain region size, regions selected by community membership are downsampled by selecting 10,000 voxels with replacement from each community which was found to produce stable posterior probabilities up to the third decimal place.
We use NeuroSynth with NiMARE to determine a plausible function for each of our spatial brain region communities, selecting only those terms that are most a posteriori probable and which had a functional rather than anatomical interpretation (see Supplementary Figure S-2B). We pass each community from each hub state through our spatiotemporal community ranking method resulting in a ranked list of communities of brain regions per state and then pass each top ranked community through the NiMARE/Neu-roSynth method to determine their most likely functional term associations. In order to be comparable with the global score θ , the NeuroSynth score is either a positive or negative association depending on the mean activity of the regions as suggested in "Identifying functionally important spatial communities" section.

Validation of model framework
A detailed validation of key features of the modelling and analysis framework was carried out using synthetic data (see Additional file 1: Section 5). This includes validation of the dimensionality reduction method as a means to reduce the computational demand of modelling while retaining community structure using the Adjusted Rand Index (ARI) (Rand 1971). Validation is also performed for the Markov Information Graph-based community detection and model selection procedures. Other key components of the model such as the HMM inference procedure have already been validated using synthetic data with detailed simulations (Vidaurre et al. 2016(Vidaurre et al. , 2018.
Not all components of the modelling and analysis framework could be validated by simulation as it was considered beyond the scope of this document to generate realistic synthetic community functional homogeneity and NeuroSynth scores. The community importance ranking procedure is instead validated using real annotation metadata and the NeuroSynth tool.

Results
Results for our multisubject HMM model training and multiplex graph model analysis are given below.

Dimensionality reduction
We select the appropriate number of principal components using the method of parallel analysis outlined in Additional file 1: Section 1. This resulted in a reduced set of d = 9 dimensions that account for roughly 75% of the total variance, which are then used in fitting the model. Validation of this approach using synthetic data is explored in Additional file 1: Sections 5.1 and 5.2.

Entropy relates to model selection
Applying our cross validated maximum entropy Hidden Markov Model selection criteria by maximising the cross-validated entropy H(k), we obtain an HMM with K = 33 initial states. Figure 3 shows that the entropy maximum also coincides with the maximisation of the cross-validated Bayesian log-likelihood, which is a general indicator of model fit. To further reduce the risk of overfitting, we exclude those states that occur in less than 25% of subjects and renormalise P so that the rows again sum to one. The resulting model has a total of K = 27 brain states. Table 1 shows that states positively correlated with resting state activity terms are significantly more likely to transition to states with similar associations and vice versa (see Supplementary Figure S-3 for linear model comparison). In contrast, states correlated with resting state terms tended to transition to states that are negatively correlated with the sensory terms. This suggests that states associated with the former resting state networks tend to co-occur to the exclusion of sensory and sensorimotor patterns of activity. These results indicate a spatiotemporal separation between resting state network activity and sensory activity.

Network dynamics indicate clustering of activity patterns in space and time
States with high scores for sensory activity terms show a far weaker positive affinity for transition to each other than do the former resting state network terms. This suggests that concurrent activity in space and time is most likely between states with high resting state network activity. This pattern of concurrent activity is only weakly suggestive Fig. 3 Selection of the number of hidden states by minimising the negative cross-validated entropy. The axes show the negative cross-validated log-likelihood −cvLL (left) and negative cross-validated entropy −cvH (right). Qualitative similarities are evident between the two criteria suggesting deeper similarities between likelihood and entropy maximisation for sensory modes of activity. In contrast, robust mutually antagonistic spatiotemporal relationships between sensory and resting state network associations are present. We shall see in "Community rankings reveal spatiotemporal modules of functional activity" section this pattern of mutual exclusivity is mirrored by the most central states in the network or hub states at both the global (functional activity map) and the local (network community) levels. States show a general trend of transitioning from terms with one global activity association to another state that scores highly for the same association, suggesting some level of brain state inertia in the global pattern of functional activity.

Evidence for metatastate structure in wakefulness
In order to demonstrate the presence of temporal community structure, we performed hierarchical linkage clustering using the correlation in κ between subjects and states. We also calculated the normalised degree of symmetry in P, Sym(P) = 0.9921 indicating a degree of symmetry in P (with Sym(P) = 1 when P is completely symmetric). Figure 4A suggests a temporally clustered pattern of state fractional occupancy in which certain states are more likely to co-occur in one subset of subjects than in the other. Figure 4B shows the transition probability matrix P organised into communities by Louvain community detection, where γ = 0.48 (as selected by Variation of Information minimisation) . Temporal communities indicate modules of clustered state transitions. This temporal community partition was tested for robustness by comparing the Q modularity statistic to 10,000 random partitions with the same community labels ( p = 1e−4).
Each community, U ⊂ S , is characterised by a hub state h(U) determined by the state with the highest community degree z-score, a measure of state centrality to the temporal network (see Supplementary Figure S-1). Figure 4C, shows the long run probability of state s re-occurence π s . Re-occurence and centrality to a community appear to be strongly correlated as states more central to their communities according to the z-score, z(s), also tended to have a higher stationary probability π s , with correlation coefficient ρ = 0.537 ( p = 0.004 ). This observation suggests that as mediators of network dynamics, community hub states tend to re-occur, playing a central role in the overall network dynamics as well as in their own community. Table 1 This table shows the relationships between NeuroSynth terms scores, calculated using the mean activity brain map for each state and the one step ahead projected score for each term according to the model (see Eq. 5) Term scores for each state are correlated with the projected term scores one time step into the future (denoted by subscript t + 1 ) from the current state (bold is positive correlation, italic negative). False discovery rate corrected t test significance is marked as ** ( p < 0.01 ), * ( p < 0.05 ) and (*) for marginal results ( p < 0.1 ). The comparison between state scores for each term and the one step ahead predicted scores shows that there is a spatiotemporal relationship between resting state terms which are anticorrelated with sensory terms

Community rankings reveal spatiotemporal modules of functional activity
Louvain community detection was performed for each of the community hub state graphs G(h(U)) for each community U in partition U . We assessed the degree of symmetry in the Markov Information graph of each hub states and found that Sym(W (h(U ))) > 0.99 for all communities U. Here, the Variation of Information was not used to select γ as differing recommended γ between hubs was found to produce communities of inconsistent and incomparable sizes; we thus select the resolution as γ = 2 for all hub states. This was found to produce median spatial community network sizes that were sufficiently small on average (roughly 4 regions per community) for our community ranking method to efficiently sample the graph while also being large enough to detect functionally conserved brain state networks. We perform NeuroSynth analysis by taking the mean functional activity maps generated for the hub states as input in combination with the resting state network terms default mode, salience, executive and the sensory network terms sensorimotor, auditory and visual (see Supplementary Figure S-2A for algorithmic explanation). The results in Table 2 suggests a separation between sensory and resting state activity in space and time with hub states scoring highly for either resting state or sensory terms but rarely both. Table 2 gives the highest ranked functional terms (filtering out purely  ). B The log of the state transition matrix P is shown, where states have been grouped along the diagonal, according to their community membership. C Pie chart showing the state occupancy at equilibrium (the probability of finding a subject in a state in the limit as time goes to infinity). Wedges in this pie chart are the individual hub states in each community according to the community z-score. These two scores share a significant 0.537 ( p = 0.004 ), indicating the importance of community centrality to long run behaviour anatomical terms) in each hub state for the top three ranked spatial network communities (using our ranking method). The top terms for each of the networks (communities) in the states largely coincide with the functional associations ascribed to each of the hub states themselves.
Exploring these relationships, we see that in some cases the connections between spatial community function and hubs are direct. State 23 shows a positive association with observation and action in dominant spatial communities and a strong association with all three sensory network terms. State 11 shows a clear association with auditory activity as well as a top ranked community association with the term voice. In state 15, which shows a strong correlation with visual activity, the top ranked communities include positive associations with the face (a common object of visual processing).
In some states we see both strong positive and negative associations. Global negative asssociations are difficult to interpret in isolation as evidence of anticorrelated network behaviour within a state, however when paired with mesoscale information from the top ranked communities a stronger case is possible. State 32 appears mixed in activity but shows strong to moderate negative correlations with visual and auditory processing. The latter of these is corroborated by the anticorrelated speech network. State 30 is another state with mixed associations based purely on global functional activity, however, we see both moderate negative correlation globally with visual activity, and a specific negatively correlated community related to visual tasks or processing, suggesting a visual down state. A similar explanation can be used for state 5. State 23 is a sensory associated state with sensory associations at both the global and network scales. State 23 is negatively correlated with default mode activity. The default mode network is involved in language comprehension and reasoning, explaining the anticorrelated network associated with syntactic processing. Negatively associated communities may more generally suggest The terms scored by NeuroSynth are the resting state terms default mode (DM), salience (S), executive (E) and sensory terms sensorimotor (SM), visual (V) and auditory (A). The first rows of the table under hub states show the NeuroSynth correlation score between each of the hub states' brain maps and the terms on the left (see Additional file 1: Figure S-2). The second section under terms shows the most probable terms associated with each of the top three communities identified by our ranking method, providing further information on the component functional communities of these states. The sign next to each term indicates whether the association is positive or negative (depending on the sign of the brain regions involved)

Discussion
In this paper we present a fully unsupervised pipeline for characterising the spatiotemporal activity of neuronal brain states in terms of a multiplex brain state graph model. This pipeline involves the training of an HMM in order to obtain a multiplex spatiotemporal directed brain state graph that represents the dynamics of subjects in resting wakefulness. We present a method for obtaining a set of states (layers) that generalises well over subjects and use this method to determine key states in the network dynamics. Lastly, we characterise the spatiotemporal components of the model that are most central and most functionally coherent, characterising these using metatextual image analysis of the neuroscience literature. Our HMGM-based methodology reveals a rich array of complementary communities acting together to produce modes of neural behaviour during resting wakefulness. Crucially, we have shown that patterns of activity resembling the resting state networks tend to co-occur and that these patterns tend to preclude sensory and sensorimotor patterns of activity. This modularity of brain state function has been suggested by others (Smallwood et al. 2012;Vidaurre et al. 2017), but metaanalysis of terms associated with these functions allows us to characterise individual states and quantify their change in character through time.
Within each hub brain state the division between functions was not clearly partitioned, with many terms featuring communities with memory or autobiographical associations, possibly suggesting an undercurrent of narrative thought which persists across numerous states. Alternatively, this may be due to artefacts caused by auditory memoryrelated tasks studies in the NeuroSynth database. It is important to note that spatiotemporal state-based activity analysis is novel and so terms in the literature which derive from static models of activity may not map accurately onto dynamic patterns of activity. In particular, transient states may be smoothed out of these analyses meaning that new studies will need to be performed focusing on dynamic functional activity change at much shorter time scales in order to build up an understanding of function in dynamic brain states.
Some of the state global functional activity term associations, particularly negative ones, remain difficult to interpret. In state 13, there is a strong association with the term sensorimotor, however all of the top ranked communities for this state are negatively associated with functions that may have a closer association to resting state activity. This could be due to putative link between the central executive activity and reward observed in primates (Sigmund et al. 2001), but may also be due to ranking error or noise in our graph model. However, the roles of many states become more clear when combining functional information from either anticorrelated or correlated mesoscale communities with global tendencies in functional activity. We hypothesize that strongly cohesive anticorrelated networks may be entering a coordinated down state due to changes in metabolic or functional demand (Tomasi et al. 2017;Passow et al. 2015;Thompson 2018).
One issue with our approach is that the Louvain implementation we use with directed modularity does not fully capture the signal of edge directionality in community