Dataset
As a case study for our analysis, we employ a snapshot of the bipartite network of US mutual fund holdings corresponding to the third quarter of 2012. The network was inferred from data in the Survivor-Bias-Free US Mutual Funds database provided by the Center for Research in Security Prices, The University of Chicago Booth School of Business. We only considered equity funds with a reported Total Net Assets (TNA) greater than or equal to one million US dollars (USD) and, in addition, we discarded funds classified as International or Global. We parsed the data for just the holdings that correspond to stocks with a valid ticker. Assets with coupon or maturity information and derivatives were not considered. The filtered network has 3497 different funds investing in 9015 different assets. The number of holdings is 550,554, for a total asset value of 2.61 trillion USD. In the following, we will use the generic term “holders” to refer, in particular, to US mutual funds and their asset managers, and “holdings” to indicate the different stocks in the funds’ investment portfolios.
Model of bipartite network of holdings
We model a network of holdings with m portfolios and an investment universe of n different stocks as a bipartite graph (Caldarelli 2007; Newman 2010). We first introduce the network’s incidence matrix B=(b_{ij}), where b_{ij}=1 if portfolio i invests in stock j and b_{ij}=0 otherwise. The number of holdings is the number of nonzero elements of B; we will indicate it as n_{h}=|E|, where E={(i,j)} stands for the set of holdings and |E| for its cardinality. Let n_{ij} be the number of shares of stock j in portfolio i and s_{j} the stock price. The share matrix N=(n_{ij}) can be regarded as the weighted incidence matrix.
The TNA of portfolio i is \(p_{i}=\sum _{j=1}^{n} n_{ij}\,s_{j} = (\mathbf {N}\,\mathbf {s})_{i}\) where \(\mathbf {s}=(s_{1}, s_{2}, \dots, s_{n})\) is the vector of stock prices. By introducing the value matrix V=(v_{ij}), whose elements v_{ij}=n_{ij}s_{j} are the holding values, the TNA can be written equivalently as p_{i}=(V1_{n})_{i}, where 1_{n} is the vector with all n components equal to one. In the following we will also refer to the portfolio weight w_{ij}=v_{ij}/p_{i} of each position, representing the relative weight of an individual position in a stock with respect to the portfolio’s TNA.
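These definitions can be sketched with a few lines of numpy; the toy share matrix and price vector below are illustrative and not taken from the dataset:

```python
import numpy as np

# Toy share matrix N (m = 2 portfolios, n = 3 stocks) and price vector s.
N = np.array([[100.0, 0.0, 50.0],
              [0.0, 200.0, 25.0]])
s = np.array([10.0, 5.0, 20.0])

V = N * s                     # value matrix, v_ij = n_ij * s_j
p = V.sum(axis=1)             # TNAs, p_i = (V 1_n)_i
W = V / p[:, None]            # portfolio weights, w_ij = v_ij / p_i

# the two TNA formulas agree: p_i = (N s)_i = (V 1_n)_i
assert np.allclose(p, N @ s)
# the weights of each portfolio sum to one
assert np.allclose(W.sum(axis=1), 1.0)
```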
Such a network is a subsystem of the whole “market”. The number N_{j} of outstanding shares of each stock \(j=1,\dots, n\) is assumed to be constant in the market: we do not take into account share splitting or new issues. This implies that holders’ portfolios cannot hold, at any time t, more shares than those outstanding:
$$ \sum_{i} n_{ij}(t) \le N_{j}\;,\quad j=1,\dots, n\,. $$
(1)
The above inequalities represent global constraints. These need to be satisfied during all stages of the execution of the randomization algorithms and at any time during the propagation of exogenous shocks through the portfolio network.
The exact values of N_{j} cannot be retrieved easily. However, if we think of the holdings network as a small-scale representation of the market, we can introduce a scaling parameter c and imagine that the total value of each stock in holders’ portfolios initially is a fraction 1/c of its value in the whole market. In this spirit, and for the purposes of simulations, we make the following choice for the outstanding shares:
$$ N_{j} = c\,\sum_{i} n_{ij}(0)\;. $$
(2)
Parameter c captures the size of the network, in total asset terms, relative to the market: the larger c, the smaller the holdings network is with respect to its embedding market. Considering the total asset value of our test network (see “Dataset” section), and that the New York Stock Exchange had a market capitalization of 30.1 trillion USD as of February 2018, c=10 can be considered a sensible choice. We will also compare the results for this value to those for c=2 and c=100, to account for scenarios where the holdings network represents a large or a small fraction of the market, respectively.
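A sketch of choice (2) with an illustrative share matrix; the constraint (1) is then satisfied by construction at t=0:

```python
import numpy as np

# Toy initial share matrix n_ij(0); rows are funds, columns are stocks.
n0 = np.array([[100.0, 0.0, 50.0],
               [0.0, 200.0, 25.0]])
c = 10  # the network initially holds a fraction 1/c of each stock's value

N_out = c * n0.sum(axis=0)    # outstanding shares N_j, Eq. (2)

# global constraint (1) holds with room to spare at t = 0
assert np.all(n0.sum(axis=0) <= N_out)
```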
Market model of flow-induced trading
We are interested in studying how the aggregate fragility of the network depends on the overlap of portfolios and in testing the interaction between portfolio diversification and overlap. We consider a case where market-wide negative events can determine mass investor outflows from funds. In such events, a fund will be forced to liquidate part of its asset positions in order to repay leaving investors. This trading activity will have a (negative) impact on stock prices, which will cause funds that were not hit by the original event to experience losses afterwards.
Usually funds do not retain cash: a flow of investors into a fund is followed by an expansion of asset positions, while an outflow is associated with a shrinkage of positions. An outflow corresponds to fund investors asking for redemption of their shares and to fund managers liquidating some assets to meet those requests. Such trading activity by funds is referred to as flow-induced trading. We are interested in studying the network’s reaction to idiosyncratic shocks on a small time scale, when portfolio managers fire-sell assets with the effect of amplifying the shocks and triggering new ones. In particular, we aim at measuring how fast the total value of the network is eroded and to what extent network fragility depends on asset commonality between portfolios. Accordingly, we consider a model where portfolio managers can only liquidate assets, and we make the working assumption that the market outside the network is large and liquid enough to absorb such supply.
Consider the dynamics of a portfolio along a generic trading period. Let p(t−1) be the end-of-period TNA for period t−1 and assume that this value includes the fund’s flow along the period. If the portfolio’s composition were not altered, its return r(t) along period t would depend only on market price variations and the amount p(t−1) would grow at rate r(t) into the amount [1+r(t)] p(t−1). We define the fund flow along period t as the difference between the closing TNA and the accrued value:
$$f(t) = p(t) - [1+r(t)]\, p(t-1)\;. $$
Because flow-induced trading is needed to repay leaving investors, we assume that it comes with no modification of the portfolios’ investment strategies, by which we mean that the stocks in the portfolios will remain the same. This implies that portfolio managers liquidate every asset in the same proportion (proportional selling). Let η_{i}(t)<0 be such that −η_{i}(t) is the fraction of TNA that is liquidated along period t:
$$\sum_{j} n_{ij}(t)\,s_{j}(t-1) = p_{i}(t-1) \left[ 1 + \eta_{i}(t) \right]\;. $$
Then proportional selling corresponds to the solution n_{ij}(t)=[1+η_{i}(t)] n_{ij}(t−1) of the previous equation, with η_{i}(t)<0.
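Proportional selling for a single portfolio can be sketched as follows (toy numbers):

```python
import numpy as np

n_prev = np.array([100.0, 200.0, 50.0])  # shares held at t-1
s_prev = np.array([10.0, 5.0, 20.0])     # prices at t-1
eta = -0.05                              # eta_i(t) < 0: 5% of TNA liquidated

n_new = (1.0 + eta) * n_prev             # proportional selling

# the value sold equals -eta times the previous TNA
p_prev = n_prev @ s_prev
assert np.isclose((n_prev - n_new) @ s_prev, -eta * p_prev)
```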
Flow-induced trading has an impact on the prices of the securities being traded. The price impact I_{j}(x) of trading x shares of a stock is defined as the corresponding relative price variation. In the literature it is often assumed that a linear relationship exists (Kyle 1985) of the form I_{j}(x)=x/λ_{j}. The stock-specific parameter λ_{j} is called the stock’s market depth: the larger λ_{j}, the more liquid the stock and the smaller the price impact of trading it. We assume that the market depth of a stock can be approximated by the total number of shares that have been issued and purchased by all participants in the market (outstanding shares). We thus take λ_{j}=N_{j}; this parameter accounts for the heterogeneity in the liquidity characteristics of stocks. The price impact on stock j along period t will be
$$ I_{j}(t) = \frac{\Delta s_{j}(t)}{s_{j}(t-1)} = \frac{\sum_{i} \Delta n_{ij}(t)}{\lambda_{j}} = \frac{\sum_{i}\eta_{i}(t)\, n_{ij}(t-1)}{N_{j}} <0\;. $$
(3)
Note that λ_{j}=N_{j} and the scaling relationship (2) imply that the larger c is, the smaller the price impact of asset liquidations.
Taking into account both the price variations and the asset liquidations, the portfolio TNA at the end of the period will be
$$p_{i}(t) = [1+\eta_{i}(t)] \sum_{j} \left[1+I_{j}(t) \right]\,n_{ij}(t-1)\,s_{j}(t-1)\;. $$
We can motivate a simple choice for the amount of assets liquidated. Consider an individual investor soon after a negative return r_{i}(t−1) has occurred to her fund’s portfolio. The larger |r_{i}| is, the higher the chance that the investor will ask for redemption of one of her fund shares. Because 0≤|r_{i}|≤1, we can interpret the absolute value of the (negative) portfolio return as a proxy for this probability. If we consider homogeneous investors and a uniform redemption probability across the fund’s shares, the average fraction of redemptions will be |r_{i}(t−1)|. Thus, as a basic approximation and discarding fluctuations, we take η_{i}(t)=r_{i}(t−1).
To summarize, we consider a dynamics of shock propagation that proceeds as follows:

1. A negative shock to stock prices \(\boldsymbol {\delta }(t) = (\delta _{1}(t),\dots,\delta _{n}(t))\) hits the market, with δ_{j}=Δs_{j}/s_{j}∈[−1,0];

2. This determines negative portfolio returns r_{i}(t) for those portfolios that invest in the stocks hit;

3. Negative returns determine an outflow from funds and a corresponding reallocation of portfolio positions n_{ij}(t)→n_{ij}(t+1)=[1+η_{i}(t+1)] n_{ij}(t), where η_{i}(t+1)=r_{i}(t);

4. Flow-induced asset selling has a negative price impact δ(t+1)=I(t+1) as given by Eq. (3), and the process starts again from step 1:

$$\boldsymbol{\delta}(t)\longrightarrow \mathbf{r}(t) \longrightarrow \Delta n_{ij}(t+1) \longrightarrow \mathbf{I}(t+1) = \boldsymbol{\delta}(t+1)\;. $$
For our purposes we assume that the first shock δ(1) is exogenous: no outflow occurs during the first trading period and the first variation of the TNAs is completely determined by the negative returns.
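The propagation steps above can be sketched as a single update function. The two-fund, two-stock network, the value c=10 and the exogenous first shock below are illustrative, and the return computation is a simplified pure-price return on the post-liquidation positions:

```python
import numpy as np

def propagate(Nsh, s, r_prev, N_out):
    """One round of steps 3-4: flow-induced selling with eta_i(t+1) = r_i(t),
    price impact I_j with lambda_j = N_j (Eq. 3), and the new returns."""
    eta = np.minimum(r_prev, 0.0)        # only negative returns trigger outflows
    dn = eta[:, None] * Nsh              # Delta n_ij = eta_i * n_ij (proportional)
    impact = dn.sum(axis=0) / N_out      # I_j = sum_i Delta n_ij / N_j < 0
    Nsh_new = Nsh + dn
    s_new = s * (1.0 + impact)
    r_new = (Nsh_new @ s_new) / (Nsh_new @ s) - 1.0   # pure price return
    return Nsh_new, s_new, r_new

# Toy network: 2 funds, 2 stocks, c = 10
Nsh = np.array([[100.0, 50.0], [50.0, 100.0]])
s_init = np.array([10.0, 10.0])
N_out = 10 * Nsh.sum(axis=0)
delta = np.array([-0.05, 0.0])           # exogenous first shock (step 1)
s1 = s_init * (1.0 + delta)
r = (Nsh @ s1) / (Nsh @ s_init) - 1.0    # step 2: first portfolio returns
Nsh, s1, r = propagate(Nsh, s1, r, N_out)
```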
Systemic damage
The dynamics described above implies that the value of every portfolio shrinks by the end of the trading period, or stays the same if the portfolio does not invest in the stocks that went through a price downturn. In the same spirit as Delpini et al. (2019), we consider the relative variation in the total value of the portfolios \(D(t) = \sum _{i} p_{i}(t) / \sum _{i} p_{i}(t-1) - 1\) as an indicator of network fragility. Comparison of the real network and its counterparts from random allocation scenarios will indicate which configuration is more robust in the light of global market constraints, as well as provide insight into diversification–riskiness and differentiation–riskiness correlations.
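A minimal sketch of the fragility indicator (negative values of D correspond to losses; the TNA vectors are illustrative):

```python
import numpy as np

def damage(p_prev, p_now):
    """Relative variation of the total portfolio value between two periods."""
    return p_now.sum() / p_prev.sum() - 1.0

# a 5 USD loss on a 200 USD network gives D = -2.5%
D = damage(np.array([100.0, 100.0]), np.array([95.0, 100.0]))
assert np.isclose(D, -0.025)
```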
A comprehensive measure of the fragility of the holding network would require measuring D with respect to all possible combinations of one or multiple shocks δ_{j}(1). This is impractical due to computational time constraints, and sensible choices are needed to select appropriate values for δ(1). A shock to a single random stock, albeit large, is unlikely to produce large-scale effects in a few trading periods. On the other hand, a combination of very large shocks to many assets will produce an unrealistically fast degradation of portfolio values. A possible strategy is to consider worst-case scenarios where a subset of targets is selected according to some criterion of asset network “centrality”. This was done in Delpini et al. (2019), where stress tests are performed by applying a uniform shock to the most popular assets. There, the degree centrality k of a stock in the bipartite network was chosen to select the targets. Let us indicate this choice as the “k-targets” scheme. We will compare it with the following alternative choices.
Indeed, it may not be obvious which measure of centrality is most appropriate. In principle, there could be assets that are very popular but account for a small fraction of the network’s value or, on the contrary, assets that are owned by few large holders with large portfolio weights. In order to take into account the actual monetary weight that stocks have in portfolios as well as their popularity, we propose to alternatively select targets by maximizing the following indicator:
$$h_{j} = \left[ \sum_{i=1}^{m} \left(\frac{v_{ij}}{s_{j}\,N_{j}} \right)^{2} \right]^{-1} = \left[ \sum_{i=1}^{m} \left(\frac{n_{ij}}{N_{j}} \right)^{2} \right]^{-1}\;. $$
If holders owned all outstanding shares in the market, h_{j} would count the number of funds that own the largest portions of the stock’s market value, or simply the number of leading holders of stock j. Taking into account relationship (2), such a number actually gets multiplied by a uniform factor c^{2}. Nevertheless, we continue to think of it as a number of effective holders and, following this interpretation, we will say that the stocks with the highest h_{j} are the “most owned” in the network. In this sense, h_{j} is a rescaled version of the Herfindahl–Hirschman index, computed for stock j in terms of the ratios of the numbers of shares in each portfolio to the total number of outstanding shares. Let us indicate a scenario where target stocks are selected by maximizing h_{j} as the “h-targets” scheme. In both this scheme and the “k-targets” scheme we apply a uniform random shock of magnitude |δ_{j}|∈[δ_{min},δ_{max}) to the n_{target} stocks with the largest values of k_{j} and h_{j} respectively. We take n_{target}=0.1×n and arbitrarily set δ_{min}=1% in order to produce an appreciable systemic damage.
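The h-targets selection can be sketched as follows; in this illustrative two-stock example the two funds own all outstanding shares, so h is directly the effective number of holders:

```python
import numpy as np

def h_index(n_shares, N_out):
    """Effective number of leading holders of each stock: the inverse HHI
    of the ratios n_ij / N_j."""
    frac = n_shares / N_out
    return 1.0 / (frac ** 2).sum(axis=0)

# stock 0 is held evenly by two funds; stock 1 is dominated by one fund
n_shares = np.array([[50.0, 90.0],
                     [50.0, 10.0]])
N_out = np.array([100.0, 100.0])
h = h_index(n_shares, N_out)
targets = np.argsort(h)[::-1]   # "h-targets": most-owned stocks first
```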
Crises can come with negative returns that are widespread across many stocks. In order to account for that, we also consider an “all-targets” scheme where a uniform shock of magnitude |δ_{j}|∈[0,δ_{max}) is applied to all stocks. This choice also frees us from the need to arbitrarily select a subset of targets. In this case we allow stock prices to remain unchanged, that is δ_{min}=0%.
Trading of a given stock on a stock exchange is suspended if the relative price loss exceeds a stopping threshold. This threshold depends on the exchange’s regulation and is usually less than 10%. We therefore set δ_{max}=10% for all cases.
For every scheme and different underlying holding network, we perform n_{mc}=100 Monte Carlo runs of shock propagation over T=10 trading periods. Systemic damage is computed as the average value of D over the runs.
Portfolio diversification and similarity
We measure diversification by the portfolio’s Herfindahl–Hirschman index
$$h_{i} = \left[ \sum_{j=1}^{n} \left(\frac{v_{ij}}{p_{i}} \right)^{2} \right]^{-1} = \left[ \sum_{j=1}^{n} w_{ij}^{2} \right]^{-1}, $$
which is interpreted as the number of leading holdings in a portfolio.
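As a quick numerical check of this interpretation, with illustrative weight vectors:

```python
import numpy as np

def diversification(w):
    """Inverse HHI of the portfolio weights w_ij."""
    return 1.0 / np.sum(w ** 2)

# a perfectly balanced 4-stock portfolio attains h = k = 4
assert np.isclose(diversification(np.full(4, 0.25)), 4.0)
# a concentrated portfolio has fewer effective leading holdings
assert diversification(np.array([0.7, 0.1, 0.1, 0.1])) < 4.0
```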
Similarity between two portfolios can be quantified by their cosine similarity:
$$s_{i i^{\prime}} = \frac{\sum_{j} v_{i j}\,v_{i^{\prime} j}}{\lVert{\vec{v}_{i}}\rVert\, \lVert{\vec{v}_{i^{\prime}}}\rVert} = \frac{\sum_{j} w_{i j}\,w_{i^{\prime} j}}{\lVert{\vec{w}_{i}}\rVert\, \lVert{\vec{w}_{i^{\prime}}}\rVert}\;, $$
where \(\vec {v}_{i}=(v_{i\,1},v_{i\,2},\dots)\) is the vector of holdings of portfolio i and \(\vec {w}_{i}=(w_{i\,1},w_{i\,2},\dots)\) is the corresponding vector of portfolio weights. It is worth noticing that both indicators h_{i} and \(s_{i i'}\) are independent of the portfolio’s TNA and can be equivalently expressed in terms of either the holding values or the portfolio weights. Throughout the paper, we also refer to the subsidiary notion of differentiation between two portfolios. In symbols, we define it as the complement of similarity, \(1-s_{i i'}\); both quantities take values in [0,1]. The less similar two portfolios are, the more different they are.
Consider two portfolios of degree k_{i} and \(\phantom {\dot {i}\!}k_{i'}\) that invest in a perfectly balanced way, with uniform positions v_{ij}=p_{i}/k_{i} and \(\phantom {\dot {i}\!}v_{i' j}=p_{i'}/k_{i'}\) if stock j is in the portfolio, and 0 otherwise. Their similarity will be \(s_{i i'} = k_{i i'}/ \sqrt {k_{i}\,k_{i'}}\phantom {\dot {i}\!}\) where \(\phantom {\dot {i}\!}k_{i i'}\) is the number of common assets. For a given degree sequence \(k_{1}, k_{2}, \dots, k_{m}\) the average similarity across the network is
$$\bar{s} = \frac{2}{m\,(m-1)} \sum_{i=1}^{m}\,\sum_{i^{\prime}=i+1}^{m} s_{i i^{\prime}} = \frac{2}{m\,(m-1)} \sum_{i=1}^{m}\,\sum_{i^{\prime}=i+1}^{m} \frac{k_{i i^{\prime}}}{ \sqrt{k_{i}\,k_{i^{\prime}}}}\;. $$
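The balanced-portfolio formula \(s_{i i'} = k_{i i'}/\sqrt{k_i\,k_{i'}}\) can be checked numerically on a toy pair of weight vectors:

```python
import numpy as np

def cosine_similarity(v1, v2):
    """Cosine similarity of two holding (or weight) vectors."""
    return (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# balanced portfolios with degrees k1 = 2 and k2 = 4, one common stock
v1 = np.array([0.5, 0.5, 0.0, 0.0, 0.0])
v2 = np.array([0.25, 0.0, 0.25, 0.25, 0.25])
# closed form for balanced portfolios: k_12 / sqrt(k1 * k2)
assert np.isclose(cosine_similarity(v1, v2), 1.0 / np.sqrt(2.0 * 4.0))
```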
Consider also a network where every portfolio invests by selecting uniformly at random the same number of stocks it owns in the real network. Since stocks are treated indifferently, it is natural to expect that such an idealized investor would invest the same amount of wealth in every stock. We will refer to this limit scenario as an “unconstrained random holdings” network (URH). Because of the balanced positions, in this network investors maximize diversification conditionally on k_{i}, achieving \(h_{i}=1/\sum _{j} (v_{ij}/p_{i})^{2}=1/\sum _{j'} (1/k_{i})^{2}=k_{i}\), where the second summation is limited to the stocks j^{′} that are actually present in the portfolio.
If we consider many realizations of such a network, the expected value of the number of common assets will be \(\langle {k_{i i'}}\rangle \approx \frac {k_{i}\,k_{i'}}{n}\), where the approximation holds as long as the probability of choosing the same stock multiple times can be neglected. Under this approximation, the expected value of the average network similarity reads:
$$ s_{0} = \langle{\bar{s}}\rangle \approx \frac{2}{n\,m\,(m-1)} \sum_{i=1}^{m} \sqrt{k_{i}} \left(\sum_{i^{\prime}=i+1}^{m} \sqrt{k_{i^{\prime}}} \right) \;, $$
(4)
and this can be considered a benchmark value with respect to which we evaluate deviations from randomness. As for the diversification, since in the URH case the degree sequence is constrained and each portfolio achieves the maximum diversification h_{i}=k_{i}, we see that the average network diversification is identically equal to the average degree \(h_{0} = \bar {h} = \sum _{i} h_{i} /m\). In the following it will be useful to compare different scenarios in terms of the network’s average diversification and similarity relative to the benchmarks h_{0} and s_{0}.
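A sketch of the two URH benchmarks; the 1/n factor in the similarity follows from \(\langle k_{ii'}\rangle \approx k_i k_{i'}/n\), and the degree sequence used below is illustrative:

```python
import numpy as np

def urh_benchmarks(degrees, n):
    """Benchmark similarity s_0 and diversification h_0 for the URH network.
    Uses <k_ii'> ~ k_i k_i' / n, hence <s_ii'> ~ sqrt(k_i k_i') / n."""
    k = np.sqrt(np.asarray(degrees, dtype=float))
    m = len(k)
    # sum of sqrt(k_i k_i') over pairs i < i'
    pair_sum = (k.sum() ** 2 - (k ** 2).sum()) / 2.0
    s0 = 2.0 * pair_sum / (m * (m - 1) * n)
    h0 = float(np.mean(degrees))     # h_i = k_i in the URH network
    return s0, h0

s0, h0 = urh_benchmarks([4, 4, 4], n=10)
assert np.isclose(s0, 4.0 / 10.0)    # for equal degrees k, s_0 = k / n
assert h0 == 4.0
```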
Portfolio rebalancing
In a perfectly balanced portfolio every position has the same value w=p/k. Conditionally on k, such a portfolio has maximum diversification h=k. If a holder, starting from an unbalanced portfolio, wants to increase her diversification without opening positions in new stocks, that is, conditionally on k, she can try to rebalance her positions in the original assets, steering them toward the optimal level w=p/k. At first, some positions are larger than w, some are smaller, and possibly some are exactly at that level. The portfolio’s TNA can be decomposed in the following way
$$\begin{array}{*{20}l} p &= \sum_{j} v_{j} = \sum_{j\in J^{+}} v_{j} + \sum_{k\in J^{-}} v_{k} + \sum_{l\in J^{0}} v_{l} \\ &= \sum_{j\in J^{+}} (w + \Delta v_{j}) + \sum_{k\in J^{-}} (w - \Delta v_{k}) + \sum_{l\in J^{0}} w \\ &= k\,w + \sum_{j\in J^{+}} \Delta v_{j} - \sum_{k\in J^{-}} \Delta v_{k} \\ &= p + \sum_{j\in J^{+}} \Delta v_{j} - \sum_{k\in J^{-}} \Delta v_{k}\;, \end{array} $$
where J^{+}, J^{−} and J^{0} are the sets of stocks corresponding to redundant, deficient or balanced positions, while Δv_{j} is the difference between the actual position on stock j and the balanced position w. From the previous relationship it follows that
$$\sum_{j\in J^{+}} \Delta v_{j} = \sum_{k\in J^{-}} \Delta v_{k}\;, $$
which simply states that the total excess equals the total defect. A holder can liquidate the redundant positions and use the proceeds to buy more shares and balance the deficient positions. In the absence of limits on the number of shares that can be bought or sold, the holder will achieve perfect diversification. Otherwise, she will end up with a suboptimal portfolio, but the value of h will have increased anyway.
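The rebalancing step without limits on tradable shares can be sketched as follows; the excess liquidated from redundant positions exactly funds the deficient ones (the position values are illustrative):

```python
import numpy as np

def rebalance(v):
    """Steer every position of a portfolio toward the balanced value w = p/k.
    Without limits on tradable shares, the result is perfectly balanced."""
    p, k = v.sum(), len(v)
    w = p / k
    excess = np.clip(v - w, 0.0, None)    # redundant part, stocks in J+
    deficit = np.clip(w - v, 0.0, None)   # deficient part, stocks in J-
    assert np.isclose(excess.sum(), deficit.sum())  # total excess = total defect
    return v - excess + deficit

v = np.array([600.0, 300.0, 100.0])
v_new = rebalance(v)
assert np.isclose(v_new.sum(), v.sum())   # the TNA is unchanged
assert np.isclose(1.0 / np.sum((v_new / v_new.sum()) ** 2), 3.0)  # h = k
```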
Algorithms
Portfolios can be similar with respect to how many stocks they invest in, which stocks they choose, and how much money is invested in each one. We introduce two synthetic models where the original holdings are randomized with different strategies. We will refer to them by the symbols H_{1} and H_{2} respectively, distinguishing them from the original holdings network H_{or}. In both we take the portfolio TNAs and the portfolio degree sequence as given. In other words, we focus on how stocks are chosen, conditionally on the number of stocks k_{i} and the portfolio wealth p_{i}.
In model H_{1}, each holder goes through a process of replacement of its original stocks with new ones chosen uniformly at random. We refer to this as a “shuffling of holdings”. The degree sequence of stocks is not preserved, and the degree of each stock is approximately a binomial random variable with mean n_{h}/n. As discussed in Delpini et al. (2019), the degree distribution of the real assets decays slowly and there exist very popular assets owned by thousands of portfolios. In order to account for the role of these hubs, we also consider model H_{2}. It differs from H_{1} in that stock replacement is performed by means of a double-edge swap strategy that preserves the degree sequence of stocks as well as that of holders. The number of swaps to be performed is fixed upfront and expressed as a fraction f of the number of holdings, that is n_{swap}=f×n_{h}.
In both models the ideal investor uses no information during stock selection and treats stocks as equivalent, were it not for the number of available shares, which is heterogeneous and subject to change during the allocation process. For both scenarios H_{1} and H_{2} we consider two variants. In the first one, holders try to reallocate exactly the same amount of money as in each original position. This case will allow us to compare scenarios with equal portfolio concentrations and will be referred to as an “unbalanced scenario”. In the second one, funds try to achieve the highest degree of diversification possible by going through the rebalancing process discussed in the previous section. In a way, this is also coherent with the idea of a random allocation: since stocks are treated equivalently, an investor will have no reason to build unbalanced positions, provided that enough shares are available to buy on the market. Rebalancing is performed after randomization: first a random network is generated where portfolio positions are exactly the same as the original ones, be it H_{1} or H_{2}, and positions are rebalanced afterwards. We will consider a rebalanced network for the original topology as well, and we will indicate the three balanced cases as H_{or,b}, H_{1,b} and H_{2,b} respectively, to distinguish them from their original or unbalanced counterparts.
The algorithms for rebalancing and randomization that we use are required to enforce the global constraints (1) at each stage of their execution. They ensure that, when holders open new positions, no more shares of a stock can be bought than are actually available on the market at that very moment. This aspect represents a major contribution of our work. Indeed, the simple randomization algorithms in Delpini et al. (2019) preserved portfolio TNAs and degrees, but they were otherwise unconstrained. Because no constraints were imposed on the numbers of shares, the total value of every asset in the network of portfolios was allowed to change. Those routines were instrumental in comparing the actual network topology with its random counterparts, but the corresponding network models are to be considered approximations valid in the limit of infinite numbers of shares. On the contrary, here the n global constraints (1) are required to be satisfied at all times. In particular, they limit the maximum diversification holders can reach without altering their degree, and this can be critical for systemic risk assessment.
In both scenarios H_{1} and H_{2}, one holding at a time is replaced. When position v_{ij} is to be reallocated, v_{ij}/s_{j} shares of stock j are sold and \(v_{ij'}/s_{j'}\) shares of stock j^{′} need to be bought from the market. Since the number of outstanding shares is limited, the random reallocation of an individual portfolio can fail. It can happen that the shares of j^{′} that are available on the market at a given time are not sufficient to reallocate the desired position. It can also be the case, in scenario H_{2}, that the stock that was sampled as a replacement cannot be swapped with the original one without either violating the global constraints or changing the stocks’ degree sequence. In such cases, a failure is registered and a new stock is randomly sampled for replacement. When a limit number of consecutive failures is reached during the allocation of a given portfolio, the whole randomization process is reset and restarted from the very beginning after a reshuffle of all stocks and holders. This is done to stochastically escape configurations that would not allow full reallocation of all portfolios and to find possible solutions that may be reached through a different sequence of holding replacements.
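The retry-and-restart logic can be sketched for the H_{1} case. The helper below is a simplification of the actual algorithm: positions are represented directly by their values, and the per-stock budget stands for the value of the outstanding shares still available on the market:

```python
import random

def constrained_shuffle(values, stock_budget, max_fail=50, max_restart=1000):
    """Sketch of H_1 randomization: each position value is reassigned to a
    stock drawn uniformly at random, never exceeding the per-stock budget.
    After max_fail consecutive failures on one position, the whole allocation
    is reset and restarted (stochastic escape from dead ends)."""
    stocks = list(stock_budget)
    for _ in range(max_restart):
        left = dict(stock_budget)     # fresh copy of the global budgets
        out = []
        random.shuffle(values)        # reshuffle before each restart
        for v in values:
            fails, placed = 0, False
            while fails < max_fail:
                j = random.choice(stocks)
                if left[j] >= v:      # enough shares remain on the market
                    left[j] -= v
                    out.append((v, j))
                    placed = True
                    break
                fails += 1
            if not placed:
                break                 # dead end: restart the whole allocation
        else:
            return out
    raise RuntimeError("no feasible allocation found")

alloc = constrained_shuffle([5.0, 3.0, 2.0], {"A": 6.0, "B": 6.0})
# every per-stock budget is respected in the returned allocation
spent = {}
for v, j in alloc:
    spent[j] = spent.get(j, 0.0) + v
assert all(spent[j] <= 6.0 for j in spent)
```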