Coalitions and coordination in Washington think tanks: board interlock among Washington D.C.-based policy research and planning organizations

Think tanks have become central players in the political and policy ecosystem of the United States, yet the communication and coordination strategies and connections between them remain relatively unexamined. This paper begins to remedy that by using IRS 990 data to construct and analyze the largest board interlock network among policy planning organizations to date. An analysis of 277 policy research organizations located in Washington D.C. reveals high levels of resource seeking, issue-area coordination and weak evidence of ideological coordination among these substantively important organizations in American politics.

1984), this study catalogues policy research and analysis organizations active in Washington D.C. In particular, I focus on the connections between these organizations, examining how they coordinate and share information, as well as mitigate systemic risk. I take an inductive approach to how these central, yet understudied, actors in the American political landscape coordinate among themselves. I conduct the largest mapping of the Washington D.C. think tank ecosystem to date (Burris 2008). Following an approach common in organizational sociology, I leverage interlocking directorates of organizations to examine patterns of organizational coordination (Mizruchi 1996). Two organizations are said to have an "interlock" between their directorates when one (or more) people sit on the boards of both organizations. This signifies a strong organizational tie between these two groups. I construct and analyze the board interlock network for 277 Washington D.C.-based organizations, using IRS 990 disclosure data.
Prior interlock scholarship has tended to focus largely on connections between corporate boards (Mizruchi 1996). However, even in these contexts scholars have found that interlock has been central in enabling political activity (Mizruchi 1982), with highly interlocked directors being more active in policy associations (Useem 1979). Research on bank control over corporate boards has indicated that centrality in the interlock network can be understood as power or importance in the community (Mariolis 1975), with some arguing that these ties are particularly important as they facilitate social capital and access to information flowing through the network of organizations (Davis 1991; Mizruchi 1996).
A more thorough accounting of policy-planning organizations and the manner in which they coordinate is of central importance in understanding contemporary American political institutions. As legislative staff capacity has declined in members' offices (Crosson et al. 2019), policy-making in Congress has become centralized in party leadership (Curry 2015). We can understand this partisan legislating in the context of a recently prominent theory of American politics, which conceptualizes political parties as extended networks of policy demanding constituencies, organizations and interest groups (Bawn et al. 2012). This theory is usually applied to the role these networks play in setting party agendas or nominating and supporting candidates for elected office. To be sure, the nomination and election process serves a fundamental role in identifying and prioritizing issues for government attention. However, I argue that think tanks serve as the policy-apparatus of these extended party networks of policy-demanders. As such, we should expect them to coordinate along ideological dimensions in their attempts to support and influence partisan and ideological lawmakers (Noel 2014).
Prior interlock research among policy planning organizations has focused on small subsets of the full network, with particular attention given to the largest actors. Burris (2008) evaluated interlock among twelve prominent think tanks, and Salas-Porras (2018) focused on a more inclusive set of connections (beyond board interlock) among 33 economic policy think tanks during the financial crisis. These studies have found strong evidence of ideological coordination among these elite policy planning actors. In a study of policy networks around estuaries, Berardo and Scholz (2010) demonstrate that organizations connect to both popular and well-resourced organizations. These connections enable both information flow in the network and the establishment of bonding structures that facilitate higher-stakes coordination. This paper differs from this prior work on think-tank interlock in two important ways. First, I analyze a far larger set of organizations, as policy information, especially on niche issues, may come from players beyond the few most prominent. Current work gives us no sense of whether or how these organizations are integrated into the network. Second, I use exponential random graph modeling to more systematically evaluate the interlock network. Beyond simply describing the observed characteristics of the network (Burris 2008; Salas-Porras 2018), this enables me to conduct a statistical assessment of what features of individual organizations are directly associated with greater connectivity in this network.
Because of the inductive nature of this investigation of coordination and coalition formation among policy-planning organizations, I do not detail explicit theory-driven hypotheses to test. However, it is useful to lay out some general expectations about what evidence of different network structures and interlock patterns might indicate. Following the findings in Burris (2008) and Salas-Porras (2018), I expect that organizations will be more likely to foster ties with other organizations that are ideologically aligned with them. However, it remains to be seen if this effect extends to this broader set of organizations beyond the top, often quite ideological, players. I also expect organizations with access to greater financial resources to be more heavily interlocked, as access to resources is a substantial motivator of interlock in other contexts (Mizruchi 1996).
I do not have strong prior beliefs about the direction or magnitude of the relationships between other dyadic and organizational factors and the probability of connection. Common explanations for strategic interlock in the corporate context include preferences for diverse connections to increase information gathering breadth, and preferences for connections to other similar organizations to monitor competition or engage in collusion (Mizruchi 1996). Similar explanations map to this case: organizations may seek to build cross-issue coalitions, increasing the scope of their information networks (issue heterophily), or they may choose ties within their local issue space to increase efficiency and avoid duplicated effort (issue homophily). Similarly, we might imagine justifications for either homophilic or heterophilic preferences in other dimensions of organization type, like whether the organization engages in lobbying, has dues-paying members, or hires contractors.
Following Berardo and Scholz (2010), we can interpret the presence of centrally located bridging ties as indicative of the network's ability to facilitate information transfer and mitigate low-level risk, while a tendency for transitivity and triadic closure can be understood as offering the potential for greater collaboration within communities.
In this paper, I have three main goals. First, I set out to describe the scope and properties of the U.S. federal policy-planning network, at a much larger scale than has previous work. Second, I model the organizational factors which are associated with a greater embeddedness in this community. Finally, I detail the communication and coordination strategies that these interlock trends imply.

Describing the U.S. Federal Policy-planning network
In this project I construct and analyze the board interlock network of policy planning organizations headquartered in Washington D.C. I analyze all organizations active between 2008 and 2015 with average annual budgets over $100,000 that are classified as subtype "Research Institutes & Public Policy Analysis" according to the IRS' National Taxonomy of Exempt Entities Core Codes (NTEE-CC). As part of their determination of tax exemption status, "Determination specialists" at the IRS classify organizations by type using the NTEE-CC. The first character of the NTEE-CC is a letter A-Z, which represents the "Major Group," a broad sectoral categorization (e.g. Education, Environment, Crime & Legal-Related etc.). The second and third digits of the NTEE-CC subdivide organizations by specific areas, organization types, or activities. The NTEE includes a set of so-called "common codes" that code for particular types of activity that are common across all major groups, such as advocacy, technical assistance, fundraising, or research. In this analysis, I include all organizations classified with the "05" common code for "Research Institutes & Public Policy Analysis," regardless of their major group. Data about these organizations, including their classifications, complete listings of their boards of directors, annual revenues, membership dues, lobbying expenditures, and contracting, come from their Internal Revenue Service (IRS) form 990 mandatory annual disclosures, as collected and digitized by GuideStar. In addition to IRS 990 data, I gathered campaign contributions made by employees of these organizations, summed by party using data from the Center for Responsive Politics, and organizations' ideal points, IGScores (Crosson et al. 2020

Fig. 1 Breakdown of research institutes & public policy analysis by NTEE-CC major codes
left-right scale based on organizations' public position-taking activity between 2005 and 2016, which are particularly well-suited to this application (Crosson et al. 2020). Edges have been dichotomized to code for the existence of an interlock tie between organizations. Table 1, below, shows descriptive statistics for the variables which are used in the subsequent analysis.
As Table 1 shows, IGScores were only available for 28.9% of the organizations included in the analysis. Because of this missingness, I employed multiple imputation using Amelia (Honaker et al. 2011) to generate imputed values for IGScores for think tanks without scores. In service of a more accurate imputation model, I collected two campaign finance variables for each organization using data from the Center for Responsive Politics: (1) total contributions to Democratic candidates from the organizations' employees, and (2) total contributions to Republican candidates from the organizations' employees. I found contributions to Democratic candidates from 61.0% of organizations and contributions to Republican candidates from 54.1% of organizations. Organizations with no contributions found from employees to either Republicans or Democrats were treated as missing. I also included variables for the average IGScore of the three and five organizations with the most semantically similar names in a 300-dimensional vector space constructed using latent semantic analysis (Furnas et al. 1988; Deerwester et al. 1990; Rehurek and Sojka 2011). Both the contribution variables and the proximate IGScore variables were used in the imputation model, along with average revenue, average membership dues, average number of contractors, average total revenue, NTEE code, and average end-of-year assets during the period 2008-2015. Contributions to Democrats, contributions to Republicans, revenue, membership dues, total revenue and end-of-year assets were logged, while NTEE was treated as a nominal variable. Bounds for the IGScores to be imputed, as well as for the campaign finance variables, were set at the empirically observed minimum and maximum values in the dataset. I generate 100 imputed datasets and conduct my subsequent analysis in parallel across datasets, combining results. The distribution of real versus imputed IGScore values is shown in Fig. 2, below.
The distribution of imputed values for missing IGScores is substantially more unimodal and centrally aligned than the distribution of actual IGScores. Additional imputation diagnostics are shown in Additional file 1: Electronic Supplementary Information, "Appendix A", in Figures 7 and 8. It is important to note that because I use these scores to measure ideological distance between organizations, the moderate unimodality of the imputed scores will tend to yield conservative estimates of ideological distance, and bias against finding substantive ideological results.
In expectation, overly moderate IGScores for these organizations will artificially shrink the distance between organizations, which is the key dyadic measure of ideological alignment. This should attenuate estimated effects, and as a result the test of dyadic alignment presented here is a conservative one.

Detecting think tank interlock
To facilitate the detection of interlocks, I standardized directors' names using a three-stage process. First, I used key collision clustering with a two-letter fingerprint, with manual verification, to correct for typos. Next, I used key collision clustering based on word tokens, with manual verification, to identify names that may have been entered in different formats (e.g. FirstName LastName vs. LastName, FirstName). This clustering and merging was done in OpenRefine (Verborgh and De Wilde 2013; Kusumasari et al. 2016). Finally, I removed titles, honorifics, and post-nominal letters or initials such as Esq., PhD or Jr., as they are applied extremely inconsistently throughout the data. 1 Following this name standardization procedure, the 277 think tanks included in this analysis had, in total, 9469 unique directors between 2008 and 2015. Two organizations were

Fig. 2 Real versus imputed IGScores for organizations in the interlock network
1 While it is certainly true that titles and post-nominal letters provide additional information which may distinguish between otherwise identically-named individuals, they are applied so inconsistently (even when the same person's name is being entered by the same organization in successive years) that matching on names including titles and post-nominals would induce substantial false negatives.
coded as having an interlock tie between them if they had directors with identical post-standardization names at any point during that eight-year window. Using this method, 234 of these directors served on the boards of more than one organization in the dataset. For example, Ambassador C. Boyden Gray, founding partner of the DC-based law firm Boyden Gray & Associates LLP, served on the boards of 5 organizations in the dataset during this period, the most of any director: American Action Forum, Atlantic Council of the US, The European Institute, Center for Global Development, and Freedomworks Foundation.
Because having a common board member within this time window is coded as an interlock tie, organizations can be coded as being interlocked without actually having a board member simultaneously serve on both boards. For example, if John Smith serves on the board of organization A from 2008 to 2011 and then organization B from 2013 to 2015, my procedure would code organizations A and B as being interlocked although they were not, in the strictest sense, actually interlocked. In this analysis I attempt to analyze the structure of the policy-planning network during the Obama Administration, looking at the whole time period. Because I use interlock ties as an indicator of coordination and communication, I count asynchronous ties like the hypothetical one between organizations A and B. As a member of the board of directors for organization B, John Smith would retain social-network connections to the board members he formerly served with on the board of organization A. Because these existing social ties would still serve as effective means of coordination and communication, I consider these asynchronous interlocks as valid for my analytic purposes.
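The coding rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the analysis: the `fingerprint` function is a simplified stand-in for token-based key collision clustering, the `HONORIFICS` set is an assumed and incomplete sample, and the board records are invented to mirror the John Smith example.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed, incomplete sample of titles and post-nominals to strip.
HONORIFICS = {"dr", "mr", "mrs", "ms", "hon", "amb", "esq", "phd", "jr", "sr"}

def fingerprint(name):
    """Token-based key: lowercase, drop punctuation and honorifics, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(sorted({t for t in tokens if t not in HONORIFICS}))

def interlock_edges(records):
    """Return the set of interlocked organization pairs.

    records: iterable of (organization, director_name, year) tuples.
    Two organizations are tied if any standardized name appears on both
    boards at any point in the window, whether or not the terms overlap.
    """
    boards = defaultdict(set)
    for org, director, _year in records:
        boards[fingerprint(director)].add(org)
    edges = set()
    for orgs in boards.values():
        edges.update(combinations(sorted(orgs), 2))
    return edges

# Hypothetical records: the same person entered three different ways.
records = [
    ("Org A", "Smith, John", 2008),
    ("Org A", "John Smith", 2011),
    ("Org B", "Dr. John Smith, PhD", 2013),
    ("Org C", "Jane Doe", 2010),
]
print(interlock_edges(records))
```

Because the year is ignored when ties are formed, the asynchronous service on the boards of Org A and Org B still produces an edge, which is the behavior the paragraph above argues for. A real pipeline would add the manual verification step, since identical keys can belong to different people.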

Properties of the think tank interlock network
The interlock network of these D.C.-based research institutes and public policy analysis organizations is shown in Fig. 3, below. The degree distribution of this network is shown in Fig. 4.
Following Gerber et al. (2013), who conduct a similar network analysis on the regional planning network in California, I report a series of network statistics on the observed interlock network. These are shown in Table 2. The 277-node undirected interlock network has a density of 0.005, which means that only about 1 in 200 of all possible dyads are connected by an interlock. The average organization in the network has a degree of 1.4 other organizations, while the most connected organization, the Atlantic Council of the US, is connected to 20 other organizations. There are 127 organizations in the largest connected component of the network, which comprises ∼46% of the network. Path length, or the degree of separation between nodes, refers to the minimum number of "hops" that are required to travel from one node to the other. Mean path length is equal to $\frac{1}{n(n-1)} \sum_{i \neq j} d(v_i, v_j)$, where $n$ is the number of vertices in the network and $d(v_i, v_j)$ is the length of the shortest path between vertex $i$ ($v_i$) and vertex $j$ ($v_j$). Networks with longer mean path lengths are more disparately connected, while those with shorter mean path lengths are more closely connected. The interlock network has a mean path length of 4.825, substantially lower than the expected average path length of 9.29 calculated from 100 randomly generated Erdős-Rényi random graphs (Erdős and Rényi 1960; Newman et al. 2001), in which the probability of ties between any two of the 277 nodes was set equal to the observed density in the interlock network. The global clustering coefficient is the share of all triplets (sets of three nodes) which are closed (i.e. all connected to each other). Networks with higher global clustering coefficients will tend to have more tightly-knit clusters of nodes. The interlock network has a global clustering coefficient of 0.158, which far exceeds the expected global clustering coefficient estimated from Erdős-Rényi random graphs.
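The statistics reported above can be computed with a short, dependency-free sketch. The function names and the toy graph are illustrative assumptions; in particular, `mean_path_length` averages only over pairs that are actually connected, which is one common convention for disconnected networks and may differ from the software used for Table 2. The `erdos_renyi` helper draws a single random baseline graph at a chosen density.

```python
import random
from collections import deque
from itertools import combinations

def build_adjacency(nodes, edges):
    """Undirected adjacency sets; isolates are kept as empty sets."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def density(adj):
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2
    return m / (n * (n - 1) / 2)

def mean_path_length(adj):
    """Average shortest-path length over ordered pairs that are connected."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")

def global_clustering(adj):
    """Closed triplets divided by all connected triplets."""
    closed, triplets = 0, 0
    for nb in adj.values():
        k = len(nb)
        triplets += k * (k - 1) / 2
        closed += sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    return closed / triplets if triplets else 0.0

def erdos_renyi(n, p, seed=None):
    """One G(n, p) draw: each dyad is tied independently with probability p."""
    rng = random.Random(seed)
    edges = [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
    return build_adjacency(range(n), edges)

# Toy graph: a triangle (a, b, c), a pendant node d, and an isolate e.
toy = build_adjacency("abcde", [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(density(toy), mean_path_length(toy), global_clustering(toy))

# A single random baseline at the density reported in the text.
baseline = erdos_renyi(277, 0.005, seed=1)
print(mean_path_length(baseline), global_clustering(baseline))
```

Repeating the baseline draw many times and averaging would approximate the random-graph comparison described above, although the exact values depend on how disconnected pairs are handled.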
This pattern of comparatively small mean path length and a substantially higher clustering coefficient than we would expect by random chance suggests that the interlock network is a "small-world network" (Watts 1999), a type of network common in social and other real-world phenomena that tends to have more highly connected subgraphs with high-degree hubs and relatively short paths between nodes.
Scholars have long noted the importance of an actor's position in a network for a variety of salient outcomes such as access to information (Granovetter 1985; Carpenter et al. 2004), ability to engage in brokerage (Burt 1992; Heaney 2006), and status (Podolny 2010; Heaney and Lorenz 2013). Following Heaney and Lorenz (2013), I focus on betweenness centrality to identify potentially influential organizations in the interlock network. Betweenness centrality measures how frequently a given organization lies on the shortest path between other organizations in the network (Freeman 1978). For descriptive purposes I present the top twenty most central nodes according to betweenness centrality in Table 3.
In Fig. 3, node size is proportional to organizations' average total revenue during this period, and color is a function of the organizations' ideological ideal point (IGScore), where blue indicates more liberal and red indicates more conservative. The network is displayed using the Fruchterman and Reingold force-directed algorithm (Fruchterman and Reingold 1991).
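To make the betweenness measure concrete, the following sketch computes unnormalized betweenness centrality for a small undirected graph using Brandes' accumulation scheme. The algorithm choice and the toy graphs are assumptions made for illustration; the source does not say which implementation produced Table 3.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph.

    adj maps each node to the set of its neighbors.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse order of distance from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Toy star: the hub lies on the shortest path between every pair of leaves.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(betweenness(star))

# Toy path a - b - c - d: the two interior nodes each bridge two pairs.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(betweenness(path))
```

In the star, the hub scores 3 (one for each pair of leaves) and the leaves score 0, which is the sense in which a high-betweenness organization sits between otherwise unconnected organizations.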
All but one of the twenty most central organizations, the Albert Shanker Institute, have average yearly revenues above the median revenue in the network of $1,189,176. This suggests that revenue is strongly related to connectedness in the network, a proposition which I test more rigorously below.

Modeling think tank interlock connections
In this section I model the individual organizational, dyadic, and endogenous network factors which shape the patterns we observe in the interlock network I have described above.

Methods
I use an exponential-family random graph model (ERGM) to interrogate the interlock network described above. Interdependence between ties presents a critical challenge to inference when working with network data, as the higher-order relational structure of the data violates the usual assumptions of simple linear regression models. ERGMs allow an analyst to explicitly model these higher-order network structures in addition to node-level and dyad-level covariates of interest (Robins et al. 2007). Exponential random graph modeling treats the full observed network as the dependent variable, and node variables, dyadic variables, and endogenous network structures act as independent variables which influence the probability of any given tie forming within that network. The observed network is understood as a random draw from a probability distribution of possible networks given those independent variables, with the likelihood maximized via Monte Carlo maximum likelihood estimation (MCMLE).
The ERGMs in this paper are fit using MCMLE (Handcock et al. 2008), using a single chain of 33,562,620 iterations and a thinning interval of 8196 for an effective sample size of 4090.
The full model of board interlock estimated here includes the following "exogenous" terms:
Ideological Distance (absdiff): a dyadic covariate for each (i, j), defined as the absolute value of the difference between the IGScores of nodes i and j.
Match NTEE (nodematch): a statistic for "uniform homophily." Accounts for the propensity for groups to connect with others of the same NTEE-CC type (assumes this probability is uniform across types).
Match Dues Collecting (nodematch): a statistic for "uniform homophily." Accounts for the propensity for groups to connect with others that also (do not) have dues-paying members.
Match Lobbying (nodematch): a statistic for "uniform homophily." Accounts for the propensity for groups to connect with others that also (do not) spend money on lobbying.
Match Contracting (nodematch): a statistic for "uniform homophily." Accounts for the propensity for groups to connect with others that also (do not) hire contractors.
Log(Revenue) (nodecov): a node covariate which captures the propensity for organizations with higher revenues to have more connections. Logged to aid convergence, as the distribution of revenue is extremely right-skewed.
Membership Dues (nodecov): a node covariate which captures the propensity for organizations to connect to others with larger revenues from membership dues specifically.
Lobbying Fees (nodecov): a node covariate which captures the propensity for organizations to connect to others that spend more money on lobbying.
When I take endogenous network structure into account using the ERGM, I include the following "structural" terms:
Edges (edges): a count of the number of edges in the graph. Can be thought of as analogous to an intercept in a linear model.
Concurrent (concurrent): a network statistic equal to the number of nodes with degree two or higher.
Isolates (isolates): a network statistic equal to the number of nodes with degree zero.
GWESP (gwesp.fixed): a statistic equal to the geometrically weighted edgewise shared partner distribution with a decay parameter of θ = 0.8. See Hunter (2007) for details. This statistic captures the tendency of the network towards transitivity, but is less sensitive to degeneracy than a simple triangles statistic.
The Edges term is included to capture the base-rate of connections within the network, and the Concurrent, Isolates, and GWESP terms are included to account for endogenous network dependency in the interlock network. These terms capture the tendency of the interlock network to form clusters while having a relatively large percentage of isolates.
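As an illustration of what the GWESP term measures, the sketch below computes the statistic for a small graph. It assumes the standard geometrically weighted form, $e^{\theta} \sum_{k \geq 1} [1 - (1 - e^{-\theta})^{k}] \, ESP_k$, where $ESP_k$ is the number of edges whose endpoints share exactly $k$ partners. The function name and toy graphs are hypothetical, and the code is not the estimation routine used in the paper.

```python
import math
from collections import Counter

def gwesp(adj, decay):
    """Geometrically weighted edgewise shared partner statistic.

    adj maps each node to the set of its neighbors (undirected graph).
    decay is the fixed parameter theta.
    """
    esp = Counter()  # esp[k] = number of edges with exactly k shared partners
    seen = set()
    for u in adj:
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in seen:
                continue
            seen.add(edge)
            esp[len(adj[u] & adj[v])] += 1
    base = 1 - math.exp(-decay)
    weighted = sum((1 - base ** k) * count for k, count in esp.items() if k > 0)
    return math.exp(decay) * weighted

# Toy triangle: three edges, each with one shared partner.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(gwesp(triangle, 0.8))

# Toy four-clique: six edges, each with two shared partners.
clique = {v: set("abcd") - {v} for v in "abcd"}
print(gwesp(clique, 0.8))
```

Each additional shared partner on an edge adds less than the one before it, which is why the statistic is less prone to degeneracy than a raw triangle count: in the four-clique each edge contributes about 1.55 rather than 2.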
Following best practices (Cranmer and Desmarais 2011; Schrodt 2014), I report several intermediary models prior to the full ERGM specification. I present models which include only the primary variables of interest (ideological distance and NTEE type homophily), as well as models without endogenous network controls. The models are run using identical specifications across 100 imputed datasets. Each of these datasets is identical in all regards other than that they contain different values of the imputed IGScores used to calculate the ideological distance between nodes. Results from these 100 separate estimates are combined using Rubin's rules (Rubin 1987; Honaker et al. 2011).
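The combination step can be illustrated with a short sketch of Rubin's rules for a single coefficient. The function name and the three input values are hypothetical and stand in for the 100 per-dataset estimates; they are not results from the paper.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled point estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical coefficient estimates and standard errors from three imputations.
estimate, se = pool_rubin([0.1, 0.2, 0.3], [0.1, 0.1, 0.1])
print(estimate, se)
```

The between-imputation term is what propagates uncertainty about the imputed IGScores: when the estimates vary a great deal across datasets, the pooled standard error grows even if each individual model is precisely estimated.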

Results
The results from the ERGM analysis are presented in Table 4. ERGM coefficients can be interpreted similarly to those of a logistic regression. In fact, because models 1-3 contain no model terms for endogenous network effects, the ERGM estimation procedure defaults to logistic regression on the network dyads. Coefficients are the log-odds of a particular tie given the rest of the interlock network (Leifeld and Schneider 2012). Model 4 is the full ERGM including endogenous network statistics, estimated using MCMC MLE on the 100 imputed datasets. If the p value from the multivariate Geweke test was greater than or equal to 0.95, indicating poor convergence of the MCMC model, the model was re-estimated for that dataset. ERGM convergence and goodness-of-fit diagnostics are presented in the Additional file 1: Supplementary Information.
Compared to the simpler specifications, Model 4, which includes all substantive covariates as well as endogenous network effects, offers the best fit as indicated by the lowest AIC and BIC. It is worth noting, however, that across all specifications there is strong evidence of homophily by issue area of the think tank, as indicated by the Match NTEE parameter estimates. By contrast, the results in Table 4 offer mixed evidence as to whether ideological distance is associated with a decreased probability of connection between organizations. In the complete model, think tanks are less likely to be connected to those that are ideologically more distant from them. However, the magnitude of this effect is quite small, and after propagating uncertainty from the imputation using 100 imputed datasets the effect is not significant at conventional levels. In order to assess whether this uncertainty is the result of missingness and imputation, I plot the separate estimates for each imputed dataset in Fig. 5, below. This plot shows the marginal effect of an increase of 1 unit in distance between organizations i and j on the probability of a tie between those two organizations across each of the ERGMs estimated. In 85 out of 100 replicates (85%), ideological distance was associated with a significantly lower likelihood of a tie between two organizations. This suggests that these results are substantially attenuated by measurement error induced by missingness in the ideology measure, and variance introduced by imputation.
The most persistent and substantively significant finding across models 2-4 is that organizations are significantly more likely to be connected to each other if they are both of the same NTEE-CC classification. With log-odds of ∼1.07, a given think tank is ∼190% more likely to be connected to another think tank that works in the same issue area (according to NTEE classification) than it is to be connected to a think tank that works in a different issue area. This is strong evidence of issue-based homophily in the interlock network. This is consistent with several existing explanations for interlock behavior. Because the information transfer and coordination enabled by interlock can mitigate risk (Mizruchi 1996), scholars have argued that organizations tend towards homophilic interlock when they are dependent on other organizations in their sector. For example, organizations active within the same policy area may need to rely on the same sets of funders, and knowledge of each other's activities may allow them to avoid challenging each other's funding streams.
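The conversion from a log-odds coefficient to the percentage figure quoted above is a one-line calculation, sketched here. The function names are illustrative assumptions. Strictly, exponentiating a coefficient gives a ratio of odds; in a network this sparse (density 0.005) the odds of a tie and the probability of a tie are nearly equal, so reading the result as "more likely" is a reasonable approximation.

```python
import math

def pct_change_in_odds(log_odds):
    """Percentage change in the odds of a tie implied by a log-odds coefficient."""
    return (math.exp(log_odds) - 1) * 100

def log_odds_from_pct(pct):
    """Inverse mapping: the coefficient implied by a given percentage change."""
    return math.log(1 + pct / 100)

# Match NTEE coefficient of roughly 1.07, as reported in the text.
print(round(pct_change_in_odds(1.07), 1))

# Coefficient that would correspond to a 10% increase in the odds.
print(round(log_odds_from_pct(10), 3))
```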
While this tendency is not as strong as the propensity for organizations to connect with those working in the same area, organizations that hire lobbyists are also ∼10% more likely to be connected to other organizations that lobby. Similarly, organizations are ∼11% more likely to interlock if they both hire contractors. One possible explanation for this is that contractors and lobbyists may serve as vectors of informal relationships between organizations, which ultimately facilitate later institutionalization via interlock. In contrast, membership organizations that collect dues are significantly less likely to be connected to others that collect membership dues. Organizations connect to those that are dissimilar along this dimension, suggesting that they find benefit in the differential expertise and resources that the other can provide: non-member organizations and member organizations appear to seek each other out for interlock. However, neither the amount of lobbying an organization does nor the amount of membership dues it collects is associated with interlock.
As expected, think tanks are substantially more likely to have interlocking boards as the revenue of the organizations increases; particularly successful fundraisers appear to be more attractive targets for interlock. Because the dependent variable in an ERGM is represented as log odds, the estimated effect of revenue can be interpreted as in a log-log model, in which the proportional change in the probability of a tie associated with a $p$ percent increase in revenue is $\exp(a \beta_R)$, where $a = \log([100 + p]/100)$ and $\beta_R$ is the coefficient estimated for Log(Revenue) (Benoit 2011). Using this, an increase in revenue of 100% is associated with a 19.1% increase in the likelihood of a tie. An organization with revenue at the 75th percentile ($3,433,207) has a 65.0% higher probability of a given tie than an organization with revenue at the 25th percentile ($473,935).
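The two revenue figures above can be checked against each other with a small worked calculation. The coefficient itself is not stated in this section, so the sketch backs out an implied value of $\beta_R$ from the reported 19.1% effect of doubling revenue; that implied value and the function name are assumptions made only for illustration.

```python
import math

def tie_ratio(beta, pct_increase):
    """Proportional change in tie probability for a given percent rise in revenue."""
    a = math.log((100 + pct_increase) / 100)
    return math.exp(a * beta)

# Implied coefficient, backed out from the reported 19.1% effect of doubling revenue.
beta_r = math.log(1.191) / math.log(2)

# Percent increase in revenue from the 25th to the 75th percentile.
p25, p75 = 473_935, 3_433_207
pct_increase = (p75 / p25 - 1) * 100

ratio = tie_ratio(beta_r, pct_increase)
print(round(beta_r, 3), round((ratio - 1) * 100, 1))
```

Under this assumption the interquartile comparison comes out at roughly 65%, in line with the figure in the text up to rounding of the reported inputs.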
In order to better understand the within-issue-area homophily that the ERGM results indicate, I present a breakdown of the subgraph density of the interlock network by NTEE-CC code of the organization. These results are shown in Table 5. Density within the subgraph composed of the 47 organizations in the International, Foreign Affairs & National Security sector is almost five times higher (0.024) than in the full interlock network (0.005). In fact, the five most common categories all have higher subgraph density than the density in the full graph. This suggests that organizations form interlock communities with each other based on their activity in the same policy sector. This effect is particularly strong among the most common types of organizations in the network.
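Subgraph density by sector is the number of ties among the organizations in a category divided by the number of possible ties among them. The sketch below shows the calculation on an invented five-organization network; the organization labels, sector names, and edge list are all hypothetical.

```python
def subgraph_density(edges, members):
    """Density of the subgraph induced by a set of member nodes.

    edges is a collection of unique undirected (u, v) pairs.
    """
    members = set(members)
    n = len(members)
    if n < 2:
        return float("nan")
    internal = sum(1 for u, v in edges if u in members and v in members)
    return internal / (n * (n - 1) / 2)

# Hypothetical organizations labeled by sector.
sector = {"o1": "intl", "o2": "intl", "o3": "intl", "o4": "health", "o5": "health"}
edges = [("o1", "o2"), ("o2", "o3"), ("o3", "o4")]

for code in sorted(set(sector.values())):
    members = [org for org, s in sector.items() if s == code]
    print(code, subgraph_density(edges, members))
print("full", subgraph_density(edges, sector))
```

For scale, a subgraph of 47 organizations contains 47 × 46 / 2 = 1081 possible dyads, so a density of 0.024 would correspond to roughly 26 interlock ties within that sector.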

Discussion
The results presented here represent the most extensive look at the network structure of interlock among active policy-planning organizations in Washington D.C. The network appears to exhibit small-world properties, suggesting it is relatively efficient for transmitting information within the connected component. The tendency for triadic closure represented by the positive effect of the GWESP parameter in the ERGM is consistent with organizations forming dense ties within their local communities, a form of network structure often thought to facilitate collaboration.
In contrast to prior work on small subsets of the policy network, I find mixed and inconclusive evidence of ideological alignment as a substantial driver of interlock. This may in part reflect ideology playing a substantially less significant role among niche organizations, for whom issue area is a more salient part of their organizational identity than ideology. However, substantial missingness in my measure of ideology, together with a conservative imputation process, has likely attenuated these results as well.
Organizations tend to interlock with those with the most resources, and those active in their own issue areas. In the case of membership organizations, the network structure indicates that organizations tend to connect with organizations unlike themselves; membership organizations are more likely to connect with non-membership organizations and vice versa. This is particularly interesting because, by virtue of their organizational structure, membership organizations may tend to have better access to diffuse information from their membership, or a greater ability to mobilize an outside lobbying campaign (Kollman 1998).