Identifying key sectors in the regional economy: a network analysis approach using input–output data

By applying network analysis techniques to large input–output systems, we identify key sectors in the local/regional economy. We overcome the limitations of traditional measures of centrality by using random-walk based measures, as an extension of Blöchl et al. (Phys Rev E 83(4):046127, 2011). These are more appropriate for analyzing very dense networks, i.e. those in which most nodes are connected to all other nodes. These measures also allow for the presence of recursive ties (loops), which are common in economic systems (depending on the level of aggregation, most firms buy from and sell to other firms in the same industrial sector). The centrality measures we present are well suited for capturing sectoral effects missing from the usual output and employment multipliers. We also develop and make available an R implementation for computing the newly developed measures.


Introduction
Wassily Leontief won the Nobel prize in economics in 1973 for developing, in the years before and during WWII, the Input-Output (I-O) modeling system. The original purpose of the I-O approach was to identify flows among the economic sectors of a region or a country. Those flows represent the exchanges that take place among sectors; in market economies, they are a fair representation of sectors buying from and selling to one another. Over time, the identification of key sectors in the economy emerged as a new and exciting application, as shown in the early contributions of Rasmussen (1956), Laumas (1975), Schultz (1977) and Hewings (1982). At the time, the implication was that any industrial policy intended to protect or stimulate specific sectors would start with the proper identification of their importance to the system. This application remains relevant today, even though I-O systems themselves have been superseded by more sophisticated and complex economic modeling approaches, such as general equilibrium models (Jorgenson 2016). Local and regional policy makers and planners can certainly refer to the […] to evaluate and prioritize various centrality measures (e.g., Harrison et al. 2016; Meng et al. 2017); context and interpretability, however, remain the most valid determinants of their selection and application.
As we see it, we contribute to the advancement of the field by pursuing three objectives. First, we want to consolidate definitions and interpretations from existing frameworks. As is common with methods that extend into other fields, multiple terms are used to refer to the same concept, and vice versa. We are not aware of similar efforts. Second, we test whether the newly developed network metrics are applicable to I-O systems at the subnational level (state, and even municipal). This is not a technical issue, but one of interpretation. As we show, the same metrics have different values at the national, state and local level, hinting at the different roles that specific industrial sectors play at different levels of geographic aggregation. We have not identified other contributions in the literature that analyze subnational geographies in a similar fashion. Finally, we implement all the metrics discussed in the paper in R. The code is publicly available and, with relatively little pre-processing, a user can compute all metrics for any network. 1

Theoretical considerations
As discussed above, I-O systems can be represented by networks in which the nodes (also referred to as vertices) are economic sectors or industries, and the links connecting them (also conceptualized as ties) represent the flows among those industries. More precisely, I-O systems are very dense, valued, directed networks. In a network, density refers to the number of links that actually exist as a share of all possible links. I-O networks are very dense because, especially at high levels of aggregation, most if not all nodes (industries) will be connected to almost all other nodes. That is, network density (d) approaches 1 (in symbols, d → 1). 2 They are valued networks because each link represents not only the presence of a connection but also its magnitude. Finally, they are directed networks because I-O systems represent bidirectional flows between economic sectors: each pair of nodes is connected by two links, one for each direction in which transactions may take place, typically with differing values. At greater levels of sectoral disaggregation, a more granular classification captures increasingly narrow definitions of industries and commodities. This results in more differentiation in the flows between sectors, with one direction potentially overshadowing the other by orders of magnitude (Lovász 2009; Miller and Blair 2009). Ties with similar values in both directions are extremely rare in I-O systems.
If the objective is to determine how important a sector is in the economic network, one may consider using vertex centrality measures such as those introduced by Freeman (Freeman 1977, 1978; Freeman et al. 1991). Common vertex centrality measures are of ambiguous applicability in the case of I-O networks. The difficulties associated with applying them to I-O systems become apparent when considering some common features of such networks. Of particular relevance are the values of the ties between vertices, loops representing recursive trade within an industry, and the overall density of I-O networks. 3 The simplest of the measures Freeman defined, degree, is the number of ties incident upon a node. 4 As such, the degree of a node describes how often each industry participates in the production function of others, and which sectors are part of its own production function. Given the high density of I-O networks, however, the number of links incident upon any given node is not likely to vary greatly throughout the network, making degree a relatively poor measure of an industry's relative prominence in a local economy. 5 In addition, because degree measures only direct access to others, it fails to capture the larger systemic effects distributed throughout the wider network.
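A small numerical illustration of the density problem may help. The sketch below uses Python and an invented four-sector flow matrix (not IMPLAN data) in which every sector trades with every other. It contrasts out-degree with out-strength, the sum of outgoing tie values; strength is a standard weighted analogue of degree that we introduce here only for comparison.

```python
# Hypothetical 4-sector flow matrix: A[i][j] is the value flowing from sector i to sector j.
# All numbers are invented for illustration; they are not IMPLAN data.
A = [
    [5.0, 120.0, 0.4, 30.0],
    [2.0, 1.0, 0.3, 0.9],
    [60.0, 8.0, 250.0, 4.0],
    [0.5, 0.2, 0.1, 0.7],
]


def out_degree(A, i):
    """Number of outgoing ties of sector i (the self-loop on the diagonal is excluded)."""
    return sum(1 for j, a in enumerate(A[i]) if j != i and a > 0)


def out_strength(A, i):
    """Sum of outgoing tie values of sector i (the self-loop is excluded)."""
    return sum(a for j, a in enumerate(A[i]) if j != i)


degrees = [out_degree(A, i) for i in range(len(A))]
strengths = [out_strength(A, i) for i in range(len(A))]
print("degree:  ", degrees)    # [3, 3, 3, 3]: degree cannot tell the sectors apart
print("strength:", strengths)  # spans more than two orders of magnitude
```

Degree is identical for every sector because the network is complete, while the tie values still separate a dominant supplier from a marginal one.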
Path-based measures were introduced to take into account a node's place within the larger network. Two such measures are closeness and betweenness (Freeman 1978). Closeness measures the inverse distance between a node and all other nodes reachable from it. More specifically, closeness centrality for a given node i is calculated as

$$C_C(i) = \left[\sum_{j=1}^{n} d_{ij}\right]^{-1} \tag{1}$$

where $d_{ij}$ is the length of the shortest path (i.e., geodesic distance, or simply geodesic) between node i and any other node j. In this manner, closeness measures a node's strategic positioning within a network in terms of the speed or efficiency with which flows within the network pass through that node. Larger values of closeness may, for example, indicate a node that is able to access information or materials more frequently or quickly than nodes with lower values.
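As a sketch of Eq. (1), assuming binary directed ties stored as an adjacency matrix, closeness can be computed with a breadth-first search from each node. Both toy networks below are hypothetical; the complete one shows how, at density 1, every node receives exactly the same closeness.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Breadth-first search: number of steps from source to every node (None if unreachable)."""
    n = len(adj)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            # A self-loop is skipped automatically because dist[u] is already set.
            if adj[u][v] and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, i):
    """Eq. (1): inverse of the summed geodesic distances from i to the nodes reachable from it."""
    d = geodesic_distances(adj, i)
    total = sum(x for j, x in enumerate(d) if j != i and x is not None)
    return 1.0 / total if total > 0 else 0.0


# A complete directed network with loops (density 1), as in a highly aggregated I-O table.
n = 5
complete = [[1] * n for _ in range(n)]
print([closeness(complete, i) for i in range(n)])  # [0.25, 0.25, 0.25, 0.25, 0.25]

# A sparse chain 0 -> 1 -> 2, for contrast.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print([closeness(chain, i) for i in range(3)])
```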
Another means of conceptualizing the prominence of a node in terms of flows through a network is betweenness, which measures a node's potential to capture, enable, or impede the passage of information or materials in a network. It is calculated as

$$C_B(i) = \sum_{j \neq k,\; j,k \neq i} \frac{g_{jk}(i)}{g_{jk}} \tag{2}$$

where $g_{jk}$ is the number of shortest paths (i.e., geodesics) between node j and node k, and $g_{jk}(i)$ is the number of those paths that include i. In this manner, betweenness measures the degree to which a particular node i commands strategic junctures within the network. A major weakness of both betweenness and closeness, as they are commonly used, is that neither measure was conceived to take the value of the ties into account. For I-O networks this is a major shortcoming, as the actual magnitudes of intersectoral flows are a critical consideration. A solution has been proposed by Dijkstra (1959), Brandes (2001) and Newman (2001), and further modified by Opsahl et al. (2010): seek the path with the least cumulative impedance, as opposed to the one with the fewest steps. The idea behind this modification is that ties with lower values transmit less, and may be considered to impede flow more than ties with greater values. The resulting implementation is a weighted distance measure given by

$$d^{w\alpha}(i,j) = \min\left(\frac{1}{(w_{ih})^{\alpha}} + \cdots + \frac{1}{(w_{hj})^{\alpha}}\right) \tag{3}$$

where the inverse tie values (i.e., weights) $w$ along each path between node i and node j are summed, h denotes the intermediate nodes on that path, and the path of least resistance (i.e., the one with the lowest value) is selected. The α coefficient is a tuning parameter that emphasizes or deemphasizes the number of steps. Setting α = 0 produces the same measure as if the ties were binary; setting α = 1 sums the inverse tie values; setting 0 < α < 1 favors fewer steps; and setting α > 1 favors stronger tie weights in calculating shortest path distances. Opsahl et al. (2010) employed the weighted geodesic measure of Eq. (3) in both closeness (4) and betweenness (5) in a fairly straightforward manner:

$$C_C^{w\alpha}(i) = \left[\sum_{j=1}^{n} d^{w\alpha}(i,j)\right]^{-1} \tag{4}$$

$$C_B^{w\alpha}(i) = \sum_{j \neq k,\; j,k \neq i} \frac{g^{w\alpha}_{jk}(i)}{g^{w\alpha}_{jk}} \tag{5}$$
In each case, the weighted geodesic (i.e., shortest path) distance has been substituted for the binary form. In weighted closeness, depending on the α setting, the summed inverse tie values replace the count of steps in each path when selecting the lowest value. In weighted betweenness, $g^{w\alpha}_{jk}$ is the number of weighted geodesics between node j and node k, and $g^{w\alpha}_{jk}(i)$ is the number of those that pass through i (Opsahl 2015).
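The weighted distance of Eq. (3) and weighted closeness of Eq. (4) can be sketched with Dijkstra's algorithm, using 1 / w^α as the cost of each tie. The three-node network is hypothetical and chosen so that changing α changes which path is shortest; this is an illustration, not the implementation used by Opsahl et al. (2010) or by the authors.

```python
import heapq


def weighted_distances(W, source, alpha=1.0):
    """Eq. (3): least cumulative impedance from source, where a tie of value w costs 1 / w**alpha.

    Dijkstra's algorithm; alpha = 0 reproduces step counts, alpha = 1 sums inverse tie values.
    """
    n = len(W)
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter route to u was already found
        for v in range(n):
            if v != u and W[u][v] > 0:
                nd = d + 1.0 / (W[u][v] ** alpha)
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist


def weighted_closeness(W, i, alpha=1.0):
    """Eq. (4): inverse of the summed weighted distances from i to the nodes reachable from it."""
    d = weighted_distances(W, i, alpha)
    total = sum(x for j, x in enumerate(d) if j != i and x != float("inf"))
    return 1.0 / total if total > 0 else 0.0


# Hypothetical flows: a weak direct tie 0 -> 2 and a strong two-step route 0 -> 1 -> 2.
W = [[0.0, 4.0, 1.0],
     [0.0, 0.0, 4.0],
     [0.0, 0.0, 0.0]]
print(weighted_distances(W, 0, alpha=0.0))  # [0.0, 1.0, 1.0]: the direct tie wins on steps
print(weighted_distances(W, 0, alpha=1.0))  # [0.0, 0.25, 0.5]: the strong route wins
```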
Weighted path-based centrality measures hold the potential to reveal the relative prominence of nodes in a valued network. Each is well suited for use with valued networks, though each has shortcomings that should be noted regarding its application to I-O networks. Three characteristics of I-O networks pose a challenge for both standard and weighted network metrics and merit consideration when modeling the flow of resources: the values of their ties, the recursive loops present in aggregated networks, and their density.
If one considers the weighted distance of Eq. (3) as used to modify closeness (4) and betweenness (5), it should quickly become apparent that such a weighting will function similarly to the measures designed for dichotomous ties (Eqs. (1) and (2)) only when the tie values are limited to a relatively narrow range of integer values. Given that I-O tie values can take a theoretically unlimited range of positive continuous values, it becomes increasingly likely that the measure will produce one unique shortest path for any given node pair. Such a solution would emphasize the prominence of nodes situated in some of the most prominent production sectors in the region being evaluated. Although prominence within key sectors produces useful information, the lack of alternative shortest paths between nodes may mask the relative importance of other nodes of secondary importance, something that also matters from a planning and disaster mitigation standpoint. The ability to tune the measure using the α setting helps reduce this tendency, but it also requires a standardized approach to tuning that has not yet been evaluated for I-O networks. An additional challenge to the analysis of I-O networks arises from recursive loops. When a set of nodes (firms) that would normally trade among themselves is aggregated into a metanode representing an entire industrial sector, loops are the best way to model the flows within that sector, and, depending on the level of aggregation, their presence within I-O networks can be substantial. However, both the classic binary and the weighted assessments of shortest paths will logically always ignore such loops, as a loop never constitutes a "shorter" path through the network. This is not a realistic representation of the actual behavior of flows within an I-O network.
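The claim that shortest-path measures ignore loops can be checked directly. The hypothetical sketch below repeats a Dijkstra search for Eq. (3) but deliberately relaxes self-loops like any other tie. Because every tie has a positive cost 1 / w^α, returning to the current node can never shorten a path, so the distances are identical whether within-sector trade is negligible or very large. The flow values are invented.

```python
import heapq


def weighted_distances(W, source, alpha=1.0):
    """Least-impedance distances with tie cost 1 / w**alpha; self-loops are NOT special-cased."""
    n = len(W)
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v in range(n):
            if W[u][v] > 0:  # includes v == u, i.e. the loop
                nd = d + 1.0 / (W[u][v] ** alpha)
                if nd < dist[v]:  # a loop gives nd > dist[u], so it never wins
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist


base = [[0.0, 3.0, 1.0],
        [2.0, 0.0, 5.0],
        [1.0, 4.0, 0.0]]
# Same network, but every sector now trades heavily with itself.
with_loops = [row[:] for row in base]
for i in range(3):
    with_loops[i][i] = 1000.0

same = all(weighted_distances(base, s) == weighted_distances(with_loops, s) for s in range(3))
print(same)  # True: the loops leave every shortest-path distance unchanged
```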
The final consideration that should be important to those modeling I-O networks is the high density of ties within the network. Using path-based measures that treat ties as binary tends to make it look as though all nodes are relatively "close" to one another since, on average, the paths throughout exceptionally dense networks will be of roughly the same length. This tendency is somewhat reduced by using weighted ties. When employing weighted measures, both closeness and betweenness will produce measures of prominence that are relative to the total network. They do not provide any special consideration for the more immediate neighborhood (e.g., manufacturing sector) around each node.
One way of avoiding most of these limitations is to adopt measures of centrality based on random walks. Newman (2005) explored and further developed a measure of centrality based on random walks to overcome the need for a pre-determined, known flow from each source (s) to each target (t), and stated that "the random-walk betweenness of a vertex i is equal to the number of times that a random walk starting at s and ending at t passes through i along the way, averaged over all s and t." Experts in I-O modeling who are not yet familiar with network metrics perceive the notion of a random walk as one that does not correctly represent the relationships in the economic system. Their complaints are not without merit because, after all, industrial sectors do not consume a random collection of inputs in the hopes of producing a very specific output; those inputs are well defined by a sector's production function. Instead, one could conceptualize random walks as a way of exhausting all possible combinations of inputs that produce all outputs in the economy. Because the computation of any random walk-based measure involves the "value" of the flow between two given sectors, only the ties that are present will influence the magnitude of the metric. Therefore, all random walk-based measures of an I-O system will reflect the true underlying production functions that link those sectors.
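To make the last point concrete, the sketch below simulates a random walk in which the probability of moving from sector i to sector j is proportional to the tie value, so a tie of value zero is never traversed. The three-sector matrix and the seed are invented for illustration; the transition rule anticipates the one formalized in the Methods section.

```python
import random


def random_walk(A, start, steps, rng):
    """Simulate a walk that moves from i to j with probability A[i][j] / sum(A[i]).

    Ties with value 0 are never traversed, so only existing flows shape the walk.
    Every row must contain at least one positive value.
    """
    nodes = range(len(A))
    path = [start]
    node = start
    for _ in range(steps):
        node = rng.choices(nodes, weights=A[node], k=1)[0]
        path.append(node)
    return path


# Hypothetical flows among three sectors: no tie 0 -> 2 and no tie 2 -> 1.
A = [[2.0, 6.0, 0.0],
     [1.0, 1.0, 2.0],
     [3.0, 0.0, 1.0]]

rng = random.Random(42)  # fixed seed for reproducibility
path = random_walk(A, start=0, steps=20000, rng=rng)
moves = list(zip(path, path[1:]))
from_zero = [b for a, b in moves if a == 0]
print("0 -> 2 moves:", sum(1 for m in moves if m == (0, 2)))  # 0: the tie does not exist
print("share of 0 -> 1 among moves out of 0:", from_zero.count(1) / len(from_zero))  # near 6/8
```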
Network metrics that use random walks could identify the sectors that participate in other sectors' production functions more often and with greater incidence. This is one of the most significant differences between simple measures of centrality (which identify the "presence" of a tie in an on-off fashion) and random walk-based measures of centrality, which also represent the intensity of the tie.

Methods
Following Friedkin (1991), we present a means of measuring immediate effects and mediative effects, in addition to the already defined total effects. Lee (2006) provides an extended discussion of the equivalence between total, immediate and mediative effects and eigenvector, closeness and betweenness centrality, respectively. In this section we discuss approaches that provide a better definition of mediative and immediate effects within a network, as presented in García Muñiz et al. (2008) and Blöchl et al. (2011). Table 1 presents our own attempt to establish the equivalence between the measures proposed by each author.
What follows is a discussion of two alternative conceptualizations and corresponding definitions of closeness and betweenness centrality. In each case, these metrics provide a measurement that more closely mimics flows within I-O networks than the more widely used versions of Freeman (1978). We have elected to implement the definitions set by Blöchl et al. (2011). The metrics proposed in García Muñiz et al. (2008) are the inverse of those considered in Blöchl et al. (2011). This does not alter the outcome in terms of, for instance, sectoral rankings, because the information conveyed by the two sets of metrics is equivalent.
One question remains: is it appropriate to add the immediate and mediative effects to obtain a sort of total effects? Friedkin (1991) refers to total effects in the context of social networks, but we have not found any work in which this definition is applied to I-O networks. Our cursory examination shows that immediate and mediative effects can point in opposite directions, making the interpretation of total effects ambiguous. This is perhaps a computational confirmation of our intuition that immediate and mediative effects correspond to two substantially different processes, and that their combination is, at the very least, questionable. This is certainly an area for further investigation.

Random walk centrality for immediate effects
Sectors whose effects are transmitted over long sequences of economic relations have lower economic impacts than sectors that have a large number of direct linkages. The immediate effects measure is the reciprocal of the mean length of the sequences of relations from the jth sector to all others. Consider a weighted network, either directed or undirected, with n nodes denoted by j = 1, ..., n, and a random walk process on this network with a transition matrix M. 6 The element $m_{ij}$ of M is the probability that a random walker that has reached node i proceeds directly to node j. These probabilities are defined by Eq. (6):

$$m_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}} \tag{6}$$
where $a_{ij}$ is the (i, j)th element of the weighting matrix A of the network. When there is no tie between two nodes, the corresponding element of A is zero. In the I-O system as we implement it, A is a matrix of regional absorption coefficients. Equation (7) defines the random walk closeness centrality of a node i as the inverse of the average mean first passage time to that node:

$$C^{RW}_i = \frac{n}{\sum_{j=1}^{n} H_{ji}} \tag{7}$$

where $H_{ji}$ is the mean first passage time from node j to node i, defined below (with $H_{ii} = 0$).
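Eq. (6) amounts to rescaling each row of A so that it sums to one. A minimal Python sketch, assuming A is a non-negative nested list whose rows index the sector the walker is leaving (whether rows or columns correspond to selling sectors depends on how the absorption matrix is oriented, which should be checked against the data). The numbers are hypothetical.

```python
def transition_matrix(A):
    """Eq. (6): m_ij = a_ij / sum_k a_ik, i.e. each row of A rescaled to sum to one."""
    M = []
    for i, row in enumerate(A):
        total = sum(row)
        if total <= 0:
            raise ValueError(f"row {i} has no positive ties; its transition probabilities are undefined")
        M.append([a / total for a in row])
    return M


# Hypothetical 3-sector weighting matrix; the diagonal holds within-sector (loop) flows.
A = [[0.2, 0.6, 0.2],
     [0.1, 0.0, 0.3],
     [0.5, 0.5, 0.0]]
M = transition_matrix(A)
for row in M:
    print([round(m, 3) for m in row])
```

Loops need no special handling: a positive diagonal entry simply becomes the probability of staying in the same sector for one step.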
The mean first passage time (MFPT) from node i to node j is the expected number of steps it takes for the process to reach node j from node i for the first time. Random walk centrality is thus the inverse of the average MFPT to a given sector, and the MFPT is the starting point in the computation of any random walk-based measure. Following Noh and Rieger (2004), the MFPT $H_{ij}$ of a random walker who starts at source i and must reach target j for the first time is

$$H_{ij} = \sum_{r=1}^{\infty} r\, P(i,j,r) \tag{8}$$
where $P(i,j,r)$ denotes the probability that it takes exactly r steps to reach j from i for the first time. To calculate these probabilities, it is useful to regard the target node as an absorbing one and to introduce a transformation of M (Eq. (6)), denoted $M_{-j}$, obtained by deleting its jth row and column.
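Before turning to the closed form, the definition in Eq. (8) can be checked by brute force: simulate many walks from i, stop each one the first time it reaches the absorbing target j, and average the number of steps. This Monte Carlo sketch is only a sanity check of the definition, not the computation the authors use. The two-sector matrix is hypothetical; for it the first-passage probabilities are exactly P(0, 1, r) = 0.5^r, so H = 2.

```python
import random


def first_passage_time(M, i, j, rng):
    """Steps taken by one simulated walk from i (i != j) until it first reaches j.

    j must be reachable from i, otherwise the loop never ends.
    """
    nodes = range(len(M))
    node, steps = i, 0
    while True:
        node = rng.choices(nodes, weights=M[node], k=1)[0]
        steps += 1
        if node == j:  # j acts as an absorbing node: the walk stops here
            return steps


def estimate_first_passage(M, i, j, walks=20000, seed=1):
    """Monte Carlo estimates of P(i, j, r) for each observed r and of H_ij = sum_r r P(i, j, r)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(walks):
        r = first_passage_time(M, i, j, rng)
        counts[r] = counts.get(r, 0) + 1
    P = {r: c / walks for r, c in sorted(counts.items())}
    H = sum(r * p for r, p in P.items())
    return P, H


# Two sectors that each keep half of their flows and send half to the other.
M = [[0.5, 0.5], [0.5, 0.5]]
P, H = estimate_first_passage(M, 0, 1)
print(round(P[1], 3), round(P[2], 3), round(H, 3))  # theoretical values: 0.5, 0.25 and 2
```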
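Because the probability that a process starting at i is in k after r − 1 steps without having visited j is given by the (i, k)th element of $M_{-j}^{r-1}$, P(i, j, r) can be expressed as:

$$P(i,j,r) = \sum_{k \neq j} \left(M_{-j}^{r-1}\right)_{ik} m_{kj} = \left(M_{-j}^{r-1} m_j\right)_i \tag{9}$$

where $m_j$ is the jth column of M with the element $m_{jj}$ deleted. Substituting Eq. (9) into Eq. (8) gives $H(\cdot,j) = \sum_{r=1}^{\infty} r\, M_{-j}^{r-1} m_j = (I - M_{-j})^{-2} m_j$, and because the rows of M sum to one, $m_j = (I - M_{-j})e$. Vectorizing for computational convenience therefore yields:

$$H(\cdot,j) = \left(I - M_{-j}\right)^{-1} e \tag{10}$$

where $H(\cdot,j)$ is the vector of first passage times for walks ending at node j, e is an (n − 1)-dimensional vector of ones and I is the (n − 1) × (n − 1) identity matrix. The result is then used in Eq. (7). The calculation of random walk betweenness centrality is very computationally intensive; for that reason we conducted most of our calculations at the 86-sector level of aggregation 7 rather than at the 536-sector aggregation 8, which proved to be excessively challenging.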
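Eq. (10) turns the infinite sum into one linear solve per target node. The sketch below (pure Python, with a small Gaussian-elimination helper instead of a numerical library) computes H(·, j) by solving (I − M_{−j})h = e rather than forming the inverse, then applies Eq. (7). Setting H(j, j) = 0 so that the average in Eq. (7) runs over all n nodes is our assumption. The star-shaped matrix is hypothetical; its center is reached fastest and receives the highest centrality.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[r]] + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: the target cannot be reached from every node")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def mfpt_to(M, j):
    """Eq. (10): mean first passage times H(i, j) from every node i to target j, with H(j, j) = 0."""
    n = len(M)
    others = [i for i in range(n) if i != j]
    I_minus_Mj = [[(1.0 if a == b else 0.0) - M[a][b] for b in others] for a in others]
    h = solve(I_minus_Mj, [1.0] * (n - 1))
    H = [0.0] * n
    for idx, i in enumerate(others):
        H[i] = h[idx]
    return H


def random_walk_centrality(M):
    """Eq. (7): C_i = n / sum_j H(j, i), the inverse of the average MFPT into node i."""
    n = len(M)
    return [n / sum(mfpt_to(M, i)) for i in range(n)]


# Hypothetical star: sector 1 trades with both others, which trade only with sector 1.
star = [[0.0, 1.0, 0.0],
        [0.5, 0.0, 0.5],
        [0.0, 1.0, 0.0]]
print(mfpt_to(star, 0))               # approximately [0, 3, 4]
print(random_walk_centrality(star))   # approximately [3/7, 3/2, 3/7]
```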

Counting betweenness for mediative effects
Mediative effects capture the prominence of sectors as instruments for the transmission of total effects. The underlying idea is that the measure captures the involvement of a sector in the paths connecting other sectors, as if it were an intermediary in their transactions. The more paths a sector participates in, the greater its relevance as a connector or conduit in the overall economy. Counting betweenness generalizes Newman's random-walk betweenness to directed networks with loops. It measures how often a sector (a node in the network) is visited on first-passage walks, averaged over all source–target pairs, and is shown in Eq. (11).
In Eq. (11), $N_{jk}(i)$ is a measure of the frequency with which a random walker reaches node i while going from j to k. 9
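One way to compute Eq. (11) uses the fundamental matrix of the absorbing walk: for a target k, entry (j, i) of (I − M_{−k})^{−1} is the expected number of visits to i by a walk that starts at j and stops on reaching k. The sketch below relies on that standard absorbing-Markov-chain result; counting the start as a visit and the target as visited exactly once is our assumption about the boundary convention and may differ from Blöchl et al. (2011). Solving the transposed system against a vector of ones gives the sums over all sources j in one solve per target. The matrices are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[r]] + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: the target cannot be reached from every node")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def counting_betweenness(M):
    """Eq. (11): sum of N_jk(i) over ordered pairs j != k, divided by n(n - 1).

    For i != k, N_jk(i) is entry (j, i) of (I - M_{-k})^{-1} (start counted as a visit);
    the target k itself is counted as visited once. This convention is an assumption.
    """
    n = len(M)
    totals = [0.0] * n
    for k in range(n):
        others = [i for i in range(n) if i != k]
        # Transpose of (I - M_{-k}): solving against ones returns the column sums of the
        # fundamental matrix, i.e. sum over sources j of N_jk(i), for every i != k.
        At = [[(1.0 if a == b else 0.0) - M[b][a] for b in others] for a in others]
        col_sums = solve(At, [1.0] * (n - 1))
        for idx, i in enumerate(others):
            totals[i] += col_sums[idx]
        totals[k] += n - 1  # each of the n - 1 walks ending at k reaches k once
    return [t / (n * (n - 1)) for t in totals]


star = [[0.0, 1.0, 0.0],
        [0.5, 0.0, 0.5],
        [0.0, 1.0, 0.0]]
print(counting_betweenness(star))  # approximately [1, 5/3, 1]: the hub is visited most
```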

Data and application to two cases
In the U.S., the commercial product IMPLAN® is regarded as an industry standard in terms of its widespread use as a tool for regional economic analysis. IMPLAN captures the inter-sector relationships with the parameter "gross absorption," which is the total amount of each commodity that is needed for production in one sector. 10 We decided to use "regional absorption," which is the amount of the required inputs (gross absorption) purchased locally. 11 This, we think, captures the relationships at the local level better than gross absorption, and is a better starting point in the identification of key sectors in the local economy. A sector's relative importance in the local economy increases as it uses more locally sourced inputs.
• Gross Absorption describes the total amount of each commodity that is needed in the production of a sector's output.
• Regional Absorption describes the amount of the required inputs that can be obtained locally (the total amount of production requirements that can be sourced within the model's geography).
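The relationship between the two concepts can be sketched as follows, assuming (in line with footnote 11) that regional absorption equals gross absorption multiplied, commodity by commodity, by a regional purchase coefficient (RPC), the share of regional demand met by regional producers. All commodity names and coefficient values are hypothetical, and the exact computation inside IMPLAN may differ.

```python
# Hypothetical gross absorption coefficients for one buying sector
# (dollars of each commodity needed per dollar of the sector's output).
gross_absorption = {"electricity": 0.04, "chemicals": 0.10, "trucking": 0.02}

# Hypothetical regional purchase coefficients: share of regional demand for each
# commodity that is met by producers located in the region.
rpc = {"electricity": 0.95, "chemicals": 0.20, "trucking": 0.80}

# Assumed relationship: regional absorption = gross absorption x RPC.
regional_absorption = {c: gross_absorption[c] * rpc[c] for c in gross_absorption}

local_share = sum(regional_absorption.values()) / sum(gross_absorption.values())
print(regional_absorption)
print(f"share of input requirements sourced locally: {local_share:.1%}")
```

A sector with a high local share contributes more strongly to the regional weighting matrix, which is the property the centrality measures are meant to pick up.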
$$C^{CB}_i = \frac{\sum_{j=1}^{n} \sum_{k \neq j} N_{jk}(i)}{n(n-1)} \tag{11}$$

7 Equivalent to the 3-digit level of NAICS.
8 Full specification of the IMPLAN dataset.
9 For a detailed discussion of the calculations, see Blöchl et al. (2011). They provide the corresponding Matlab code, but it did not work as the authors stated.
10 IMPLAN granted us special permission to use internal components of their software for the development of our research, for which we are grateful.
11 The relationship between gross and regional absorption is captured by "regional purchase coefficients".
We developed two different settings for the application of our algorithms. In the first case, we examine the economic structure of Monterey County, California. Here we show the differences between the more commonly used multipliers (total output and employment multipliers) and the centrality measures defined above. For the second application, we compare a metro area (Wayne County, Michigan, where the Detroit metropolitan region is located), a state (Michigan), and the whole country (United States). In this case, we show how the same centrality measures vary with the geographic scale and offer some interpretations.
At the time of the analysis, the IMPLAN system provided data at the local level (counties or ZIP codes) using a 536-sector aggregation schema. Sectoral aggregations at the 2- and 3-digit NAICS levels were also available. We produced datasets at all three levels: 2-digit NAICS (20 sectors); 3-digit NAICS (86 sectors); and native IMPLAN format (536 sectors). At the 86-sector level, the base matrix is a grid whose cells are 86 by 86 matrices, that is, its full dimension is 7396 by 7396, or almost 55 million cells; at the 536-sector level, the full dimension is over 82 billion cells. These values are a measure of the computational demands under each aggregation schema. 12 The results presented below correspond to the 86-sector, 3-digit NAICS structure because we think it is the best sectoral map for this application: there are enough sectors to distinguish between subactivities that would be masked under the 20-sector schema, but not so many that processing would require huge matrices. The 3-digit NAICS aggregation has the additional advantage of providing a realistic description of almost any local or regional economy, with very few missing sectors.
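The cell counts quoted above follow from squaring twice: the grid has sectors × sectors blocks, each of size sectors × sectors, so its side is sectors² and its cell count sectors⁴. A quick check:

```python
def full_dimension(sectors):
    """Side length and cell count of a grid of sectors x sectors blocks, each block sectors x sectors."""
    side = sectors * sectors
    return side, side * side


for sectors in (86, 536):
    side, cells = full_dimension(sectors)
    print(f"{sectors} sectors: {side:,} x {side:,} = {cells:,} cells")
# 86 sectors: 7,396 x 7,396 = 54,700,816 cells
# 536 sectors: 287,296 x 287,296 = 82,538,991,616 cells
```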

Results and interpretation
When applying our algorithm to the datasets in Blöchl et al. (2011), we replicated their results for a variety of countries. Although they do not report the actual values of their metrics, we are confident our algorithm performs as expected because of the high concordance between the rankings they report and our own computations. The R code is available on RPubs. 13 For Monterey County, "Appendix 1" shows both sets of results, that is, the traditional multipliers and the network metrics developed in our research. It shows the top-10 sectors ranked by each of the measures (output multiplier, employment multiplier, Random Walk Centrality and Counting Betweenness). "Appendix 2" shows the ranking of all 86 sectors for the same four measures.
For the metropolitan area of Detroit, the state of Michigan, and the United States, "Appendix 3", "Appendix 4" and "Appendix 5" show the top ten sectors for Random Walk Centrality and Counting Betweenness, and the complete ranks in all three measures for all 86 sectors, respectively.
Both measures, Random Walk Centrality (RWC) and Counting Betweenness (CBET), are dynamic in nature, as they capture the impacts on each sector as effects propagate throughout the local economy. 14 Random walk centrality measures the extent to which a sector will be impacted earlier or later during the shock transmission process. Counting betweenness is a measure of impedance in the flow of a shock. A shock will reach a sector with a higher RWC before a sector with a lower RWC. A shock will transit faster from an industry with a larger value of CBET, or will reach more sectors from it. The commonly used output and employment multipliers cannot capture these dynamic effects. In fact, if the objective is to identify systemic weaknesses in the local economy, the multipliers could be misleading. Compared to a game of falling dominoes, the multipliers would be equivalent to the size of a single domino, while RWC and CBET would show how fast and in which direction the flow of falling dominoes will move once the initial wave is set in motion (i.e., a shock). 15 In the case of Monterey County, the sectors at the top of the RWC and CBET rankings are quite different from those for the output and employment multipliers. Professional and Scientific Services and Management of Companies are first and second for RWC and first and eighth for CBET, while they are 32nd and 12th, and 41st and 50th, for output and employment multipliers, respectively. One interpretation is that RWC and CBET for those sectors show that, although their output and employment levels are not very high, they do participate in many (or most) other sectors' production functions. This is one way of assigning preeminence to a sector in the context of the local economy.

12 IMPLAN has since shifted to a browser-based implementation and the data structure might have changed.
13 Contact the corresponding author for additional details.
14 In this context, I-O modelers can simulate "shocks" to the system by, for instance, increasing or decreasing the value of output in a sector.
When comparing the results for the Detroit metro region, the state of Michigan and the United States, the rankings of RWC and CBET show some interesting effects. For example, sector #9, Utilities, ranks 8th, 10th and 13th in RWC for the three geographies, respectively. That would imply that at the smaller geography (Detroit metro), the Utilities sector plays a more important role than in the larger geographies (although it is quite important in all of them). The rankings in CBET show a similar effect. Conversely, for sector #20, Chemical Manufacturing, the effects seem to be the opposite: its ranking is much lower for the Detroit metro area (70) than for the state and the nation (37 and 12, respectively). This means one of two things: (1) there is some production in the sector in the Detroit metro region, but its output is consumed mostly outside the region, and therefore its local effect is lower; or (2) there is some local production, but other local sectors use inputs from outside the region (this could be the result of the sectoral aggregation being too coarse to capture sub-sector effects). At the other end of the geographic scale, at the national level the Chemical Manufacturing sector provides inputs to all local production. That distinction becomes meaningless when considering that "local" in this case refers to the whole of the United States.
The reader can derive similar inferences for all other sectors shown in "Appendix 5". Regardless of the remaining ambiguity, the interpretation confirms that both RWC and CBET performed as expected in identifying the sectors that are more relevant as contributors of locally sourced production. This attribute is not directly captured by output multipliers, which makes the identification of key sectors in the local economy more difficult.

DePaolis et al. Applied Network Science (2022) 7:86

Closing remarks
Following the work of Blöchl et al. (2011) and García Muñiz et al. (2008), we implemented random walk-based measures of centrality in I-O networks to identify key sectors in the local and regional economy, using IMPLAN data and our own R code. The brief examination of the literature confirmed that the need for identifying key sectors remains and that improved network analysis techniques are apt instruments for the task. Thus we completed the three original objectives of our research: (1) implementing RWC and CBET in R; (2) testing the implementation on subnational regional datasets; and (3) presenting a preliminary consolidated version of network metrics from the existing literature. Future explorations include the analysis of other applications of these network metrics, such as the identification of industrial clusters. We posit that the measures developed in this paper could uncover previously unidentified clusters and analyze their evolution over time; both could be instrumental in informing future policy actions.