Identifying key sectors in the regional economy: a network analysis approach using input–output data

Abstract

By applying network analysis techniques to large input–output systems, we identify key sectors in the local/regional economy. We overcome the limitations of traditional measures of centrality by using random-walk based measures, as an extension of Blöchl et al. (Phys Rev E 83(4):046127, 2011). These are more appropriate to analyze very dense networks, i.e. those in which most nodes are connected to all other nodes. These measures also allow for the presence of recursive ties (loops), since these are common in economic systems (depending on the level of aggregation, most firms buy from and sell to other firms in the same industrial sector). The centrality measures we present are well suited for capturing sectoral effects missing from the usual output and employment multipliers. We also develop and make available an R implementation for computing the newly developed measures.

Introduction

Wassily Leontief won the Nobel prize in economics in 1973 for developing the Input-Output [I–O] modeling system, in the years before and during WWII. The original purpose of the I–O approach was to identify flows among economic sectors of a region or a country. Those flows represent the exchanges that take place among sectors. In market economies, this functions as a fair representation of sectors buying from and selling to one another. Over time it became clear that the identification of key sectors in the economy was emerging as a new and exciting application, as shown in the early contributions of Rasmussen (1956), Laumas (1975), Schultz (1977) and Hewings (1982). In those early times, the implication was that any industrial policy intended to protect or stimulate specific sectors would start with the proper identification of their importance to the system. Today this application remains relevant, even though I–O systems themselves have been superseded by more sophisticated and complex economic modeling approaches, such as general equilibrium (Jorgenson 2016). Local and regional policy makers and planners can certainly refer to the sectors thus identified to allocate infrastructure funds, grant tax exemptions, offer employment support programs, and implement other policy actions geared toward strengthening the economic system and ensuring its resilience.

The identification of the most relevant sectors is not, however, an easy task. Is the most important sector the one that produces the highest output? Or the one with the highest employment? Or the one that buys the most from local suppliers? The choice of measurement will greatly affect the results; different sectors are “key” under different assumptions, and for different purposes.

The I–O approach provides a rather straightforward process for describing local economies. This was discovered very early by economic geographers, planners and others who began using it, first, to describe and, later, to analyze the relationships among local sectors and the emergence or presence of clusters or other groupings that were meaningful in the performance of the local/regional economy as a system (Hubbell 1965; Streit 1969; Roepke et al. 1974 and Campbell 1975).

In recent years, there has been a growing interest in revisiting those early attempts, but this time with the aid of network analysis techniques. For the most part those efforts have focused on national economies, either at the country level (Giuliani 2013), the regional block level (Guo and Planting 2000; García Muñiz et al. 2008; Montresor and Marzetti 2009; Aroche Reyes and Marquez Mendoza 2012; García Muñiz 2013) or at the global scale (Blöchl et al. 2011). On the other hand, there are relatively few examples of this type of analysis at the sub-national level.

Therefore, there remains a clear and strong motivation to find alternative ways for the identification of key sectors in the local and regional economy, or at the very least the systematic analysis of the connections among sectors (Reid et al. 2008). One such approach is social network analysis (SNA). Though the roots of SNA, as a field, stretch back to the 1930s with Jacob Moreno’s sociometry (Moreno 1934), the field experienced its greatest initial growth and expansion beginning with advances in computing power and ubiquity in the 1970s (Freeman 2004). Although the vast majority of early efforts in SNA applied to networks that are interpersonal and social in nature, the field has since bloomed and has become well-developed, with extensions into a variety of disciplines, such as biology, genetics, physics, and economics. At the core of the network analysis approach is a set of measures that provide a variety of conceptualizations of how one may operationalize the concept of relative prominence of each of the nodes that constitute the network. At its simplest, prominence may be measured as a function of the frequency with which a node is connected to others. Somewhat more nuanced conceptualizations of such measures, however, will consider the structure of the pathways that the ties create and relative distances between nodes in the network when traveling such paths. Frequency, path, and distance considerations have been incorporated into dozens of centrality measures (degree, closeness, betweenness, eigenvector, reach, and flow, just to mention some of the most commonly used measures). The concept of prominence within a network, as operationalized through centrality measures, remains fairly fluid and context-dependent. Even in one of the earliest treatments of the topic, Freeman (1978) asserted that there was no unanimity on what centrality is or on the proper method(s) for its measurement. Despite attempts to evaluate and prioritize various centrality measures (e.g., Harrison et al. 2016; Meng et al. 2017), context and interpretability remain the most valid determinants of their selection and application.

As we see it, we contribute to the advancement of the field by setting multiple objectives. First, we want to consolidate definitions and interpretations from existing frameworks. As is common with methods that extend into other fields, multiple terms are used to refer to the same concept, and vice-versa. We are not aware of similar efforts. Second, we test whether the newly developed network metrics are applicable to I–O systems at the subnational level (state, and even municipal). This is not a technical issue, but one of interpretation. As we show, the same metrics have different values at the national, state and local level, hinting at the different role that specific industrial sectors play at different levels of geographic aggregation. We have not identified other contributions in the literature that analyze subnational geographies in similar fashion. Finally, we implement all the metrics discussed in the paper in R. The code is publicly available and, with relatively little pre-processing, a user can compute all metrics for any network.Footnote 1

Theoretical considerations

As discussed above, I–O systems can be represented by networks in which the nodes (also referred to as vertices) are economic sectors or industries, and the links connecting them (also conceptualized as ties) represent the flows among those industries. More precisely, I–O systems are very dense, valued, directed networks. In a network, density refers to the proportion of links that actually exist as a share of all possible links. I–O networks are very dense because, especially at high levels of aggregation, most, if not all, nodes (industries) will be connected to almost all other nodes. That is, network density (d) approaches 1 (in symbols, \(d \rightarrow 1\)).Footnote 2 They are valued networks because the links do not only represent the presence of a connection, but such a connection has a specific magnitude. Finally, they are also directed networks because I–O systems represent bi-directional flows between economic sectors. That is, each pair of nodes is connected by two links, one for each of the directions in which transactions may take place, typically with differing values. At greater levels of sectoral disaggregation, a more granular classification captures narrower and narrower definitions of industries and commodities. This results in more differentiation in the flows between sectors, with one direction potentially overshadowing the other by orders of magnitude (Lovász 2009; Miller and Blair 2009). Ties with similar values in both directions are extremely rare in I–O systems.
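The density definition above can be illustrated with a minimal sketch. The 3-sector flow matrix and the function name `density` are hypothetical, chosen only for illustration; the `loops` switch reflects the assumption that within-sector ties may or may not be counted among the possible links.

```python
def density(flows, loops=True):
    """Share of possible directed ties that carry a positive flow."""
    n = len(flows)
    present = sum(
        1
        for i in range(n)
        for j in range(n)
        if flows[i][j] > 0 and (loops or i != j)
    )
    possible = n * n if loops else n * (n - 1)
    return present / possible


# Hypothetical 3-sector flow matrix (row = selling sector, column = buying sector).
flows = [
    [5.0, 2.0, 0.0],
    [1.0, 3.0, 4.0],
    [0.5, 0.0, 6.0],
]
d_with_loops = density(flows)            # 7 of 9 possible ties are present
d_without_loops = density(flows, False)  # 4 of 6 off-diagonal ties are present
```

Even this small example shows why counting ties says little in a dense network: most cells are non-zero, while the magnitudes differ by an order of magnitude.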

If the objective is to determine how important a sector is in the economic network, one may consider using vertex centrality measures such as those introduced by Freeman (Freeman 1977, 1978; Freeman et al. 1991). Common vertex centrality measures are of ambiguous applicability in the case of I–O networks. The difficulties associated with applying such vertex centrality measures to I–O systems become apparent when considering some common features that describe such networks. Of particular relevance are the values of the ties between vertices, loops representing recursive trade within an industry, and the overall density of I–O networks.Footnote 3

The simplest of the measures Freeman defined, degree, is calculated as the number of ties that are incident upon a node.Footnote 4 As such, the degree of a node describes how often each industry participates in the production function of others, and which sectors are part of its own production function. Given the high density found in I–O networks, however, the number of links that are incident upon any given node is not likely to vary greatly throughout the network, making degree a relatively poor measure of a given industry’s relative prominence in a local economy.Footnote 5 In addition, because degree measures only direct access to others, it fails to capture the larger systemic effects that are distributed throughout the wider network.

Path-based measures were introduced to take into account a node’s place within the larger network. Two measures that were introduced to take the entire network into account were closeness and betweenness (Freeman 1978). Closeness provides a measure of the inverse distance between a node and all other nodes reachable from it. More specifically, closeness centrality for a given node i is calculated as

$$\begin{aligned} C_i^{CLO}=\left[ {{\sum _{j=1}^n}d_{(i,j)}}\right] ^{-1} \end{aligned}$$
(1)

where \(d_{(i,j)}\) is the distance of the shortest path (i.e., geodesic distance, or, simply, geodesic) between node i and any other node j. In this manner, closeness provides a measure of a node’s strategic positioning within a network in terms of the speed or efficiency with which the flows within a network will reach a particular node. Larger measures of closeness may, for example, indicate a node that will be able to access information or materials more frequently or quickly than others with lower values.
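As a minimal sketch of Eq. (1) for binary ties, the geodesic distances can be obtained with a breadth-first search. The small directed network and the function names below are hypothetical; nodes that cannot be reached are simply left out of the sum, which is one of several possible conventions.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Breadth-first search: number of steps on the shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, i):
    """Eq. (1): inverse of the summed geodesic distances from i to reachable nodes."""
    dist = geodesic_distances(adj, i)
    total = sum(d for node, d in dist.items() if node != i)
    return 1.0 / total if total > 0 else 0.0


# Hypothetical directed binary network: a -> b -> c -> a, plus a -> c.
adj = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = {node: closeness(adj, node) for node in adj}
```

Node a reaches both other nodes in one step, so its closeness (1/2) exceeds that of b and c (1/3 each).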

Another means of conceptualizing prominence of a node in terms of flows through a network is betweenness, which measures a node’s potential for being able to capture, enable, or impede the passage of information or materials in a network. As such, it is calculated as

$$\begin{aligned} C_i^{BET}=\frac{g_{jk}(i)}{g_{jk}} \end{aligned}$$
(2)

where \(g_{jk}\) is the number of shortest paths (i.e., geodesics) between node j and node k, and \(g_{jk}(i)\) is the number of those paths that include i. In this manner, betweenness measures the degree to which a particular node i commands strategic junctures within the network.

A major weakness of both betweenness and closeness, as they are commonly used, is that neither measure was conceived to take the value of the ties into account. For I–O networks, this is a major shortcoming, as the actual magnitudes of intersectoral flows are a critical consideration. A solution has been proposed by Dijkstra (1959), Brandes (2001) and Newman (2001) and further modified by Opsahl et al. (2010) to seek the path with the least cumulative impedance, as opposed to the one with the fewest steps. The idea that drives this modification is that ties with lower values transmit less, and may be considered to impede flow more than ties with greater tie values. The resulting implementation is a weighted distance measure given by

$$\begin{aligned} d^{w\alpha }_{(i,j)}=\min \left( \frac{1}{(w_{ih})^\alpha }+\cdots +\frac{1}{(w_{hj})^\alpha }\right) \end{aligned}$$
(3)

where the inverse of the tie values (i.e., weights) \(w_{ij}\) is summed for each of the paths between node i and node j, and the path of least resistance (i.e., the one with the lowest value) is selected. The \(\alpha\) coefficient functions as a tuning parameter that is used to either emphasize or deemphasize whether the number of steps should be taken into account. Setting \(\alpha =0\) produces the same measure as if the ties were of binary values; setting \(\alpha =1\) sums the inverse tie values; setting \(0<\alpha <1\) favors fewer steps; and setting \(\alpha >1\) favors stronger tie weights in calculating shortest path distances.
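The effect of the tuning parameter in Eq. (3) can be sketched with a standard Dijkstra search in which each tie costs \(1/w^\alpha\). This is an illustrative sketch, not the implementation used by Opsahl et al. (2010); the network, its weights, and the function name `weighted_distance` are hypothetical.

```python
import heapq


def weighted_distance(weights, source, target, alpha=1.0):
    """Eq. (3): least summed (1 / w**alpha) over all paths from source to target."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            return cost
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in weights.get(u, {}).items():
            new_cost = cost + 1.0 / (w ** alpha)
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return float("inf")


# Hypothetical flows: a weak direct tie a -> c and a strong two-step route via b.
weights = {"a": {"b": 4.0, "c": 1.0}, "b": {"c": 4.0}, "c": {}}

d_binary = weighted_distance(weights, "a", "c", alpha=0.0)   # direct tie, one step
d_inverse = weighted_distance(weights, "a", "c", alpha=1.0)  # 1/4 + 1/4 via b
```

With \(\alpha =0\) the direct tie wins because only the number of steps matters; with \(\alpha =1\) the two-step route through b is shorter because its ties are four times stronger.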

Opsahl et al. (2010) employed the weighted geodesic measure shown in (3) in both closeness (4) and betweenness (5) in a manner that is fairly straightforward.

$$\begin{aligned} C_i^{WCLO}= & {} \left[ {{\sum _{j=1}^n}d^{w\alpha }_{(i,j)}}\right] ^{-1} \end{aligned}$$
(4)
$$\begin{aligned} C_i^{WBET}= & {} \frac{g_{jk}^{w\alpha }(i)}{g^{w\alpha }_{jk}} \end{aligned}$$
(5)

In each case, the weighted geodesic (i.e., shortest path) distance measure has been substituted for the binary form. In the case of weighted closeness, depending on the \(\alpha\) setting, the relative distance in terms of summed inverse tie values is substituted for a count of the number of steps in each path when selecting the lowest value. For weighted betweenness, on the other hand, \({g^{w\alpha }_{jk}}\) is a count of the number of geodesics occurring between node j and node k (Opsahl 2015).

Weighted path-based centrality measures hold the potential to reveal the relative prominence of nodes in a valued network. Each is well suited for use with valued networks, though each has shortcomings that should be noted in regard to its potential for application in I–O networks. Three characteristics of I–O networks make them a challenge for both standard and weighted network metrics and merit consideration in modeling the flow of resources: the values of their ties, the recursive loops present in aggregated networks, and their density.

If one considers the weighted distance equation given in Eq. (3) to modify closeness (4) and betweenness (5), it should quickly become apparent that such a weighting metric will function in a manner similar to the measures designed for dichotomous ties (Eqs. (1) and (2)) only when the tie values are limited to a relatively narrow range of integer values. Given that ties in I–O networks are expected to take a theoretically unlimited range of positive continuous values, it becomes increasingly likely that the measure will produce one unique shortest path for any given node pair. Such a solution would emphasize the prominence of nodes that are situated in some of the most prominent production sectors in the region being evaluated. Although prominence within key sectors will produce useful information, the lack of alternate shortest paths between nodes holds the potential to mask the relative importance of other nodes that may hold secondary importance, something that is also important from a planning and disaster mitigation standpoint. The ability to tune the measure using the \(\alpha\) setting helps to reduce this tendency, but also requires a more standardized approach to tuning that has not yet been evaluated in I–O networks.

An additional challenge to the analysis of I–O networks arises from recursive loops. Loops model flows within a system in which a set of nodes that would normally trade amongst themselves has been aggregated into a metanode that represents an entire industrial sector. Depending on the level of aggregation, the presence of loops within I–O networks can be substantial. However, both the classic binary and the weighted assessments of shortest paths through a network will logically always ignore such loops, as they would not normally constitute a “shorter” path through the network. This is not a realistic representation of the actual behavior of flows within an I–O network.

The final consideration that should be important to those modeling I–O networks is the high density of ties within the network. Using path-based measures that treat ties as binary tends to make it look as though all nodes are relatively “close” to one another since, on average, the paths throughout exceptionally dense networks will be of roughly the same length. This tendency is somewhat reduced by using weighted ties. When employing weighted measures, both closeness and betweenness will produce measures of prominence that are relative to the total network. They do not provide any special consideration for the more immediate neighborhood (e.g., manufacturing sector) around each node.

One way of avoiding most of those limitations is adopting measures of centrality based on random walks. Newman (2005) explored and further developed the notion of a measure of centrality based on random walks to overcome the need for a pre-determined, known flow from each source (s) to each target (t), and stated that “the random-walk betweenness of a vertex i is equal to the number of times that a random walk starting at s and ending at t passes through i along the way, averaged over all s and t.”

Experts in I–O modeling, not yet familiar with network metrics, perceive the notion of random walk as one that does not correctly represent the relationships in the economic system. Their complaints are not without merit because, after all, industrial sectors do not consume a random collection of inputs in the hopes of producing a very specific output. Those inputs are indeed very well defined by a sector’s production function. Nevertheless, one could conceptualize random walks as representing a way of exhausting all possible combinations of inputs that produce all outputs in the economy. Because the computation of any random walk-based measure involves the “value” of the flow between two given sectors, only the ties that are present will influence the magnitude of the metric. Therefore, all random walk-based measures of an I–O system will reflect the true underlying production functions that link those sectors.

Network metrics that use random walks could identify those sectors that participate in all other sectors’ production functions more often and with a greater incidence. This is one of the most significant differences between simple measures of centrality (which would identify the “presence” of the tie, in an on–off fashion) and random walk-based measures of centrality, which also include an apt representation of the intensity of the tie.

Methods

Following Friedkin (1991) we present a means of measuring immediate effects and mediative effects, in addition to the already defined total effects. Lee (2006) provides a more extended discussion about the equivalency between total, immediate and mediative effects and eigenvector, closeness and betweenness centrality, respectively. In this section we discuss approaches that provide a better definition of mediative and immediate effects within a network, as presented in García Muñiz et al. (2008) and Blöchl et al. (2011). Table 1 presents our own attempt to establish the equivalence between the measures proposed by each author.

Table 1 Equivalency of measurements

What follows is a discussion of two alternative conceptualizations and corresponding definitions for closeness and betweenness centrality. In each case, these metrics provide a measurement that more closely mimics flows within I–O networks than the more widely shared (Freeman 1978) versions. We have elected to implement the definitions set by Blöchl et al. (2011). The metrics proposed in García Muñiz et al. (2008) are the inverse of those considered in Blöchl et al. (2011). This does not alter the outcome in terms of–for instance–sectoral rankings because the information transmitted by those metrics is equivalent.

One question remains. Is it appropriate to add the immediate and mediative effects to obtain a sort of total effects? Friedkin (1991) refers to total effects in the context of social networks but we have not found any work in which such definition is applied to I–O networks. Our cursory examination shows that immediate and mediative effects could show opposite directions, producing ambiguous interpretation of total effects. This is perhaps a computational confirmation of our intuition, which is that immediate and mediative effects correspond to two substantially different processes, and their combination is–at least–questionable. This is certainly an area for further investigation.

Random walk centrality for immediate effects

Sectors that have effects transmitted over long sequences of economic relations have lower economic impacts than those sectors that have a large number of direct linkages. The immediate effects measure is the reciprocal of the mean length of the sequences of relations from the \(j\)th sector to all others.

Consider a weighted network, either directed or undirected, with n nodes denoted by j = 1,..., n; and a random walk process on this network with a transition matrix M.Footnote 6 The element \(m_{ij}\) of M gives the probability that a random walker that has reached node i proceeds directly to node j. These probabilities are defined by Eq. (6).

$$\begin{aligned} m_{ij} = \frac{a_{ij}}{ \sum \nolimits _{k=1}^{n}a_{ik}} \end{aligned}$$
(6)

where \(a_{ij}\) is the (i, j)th element of the weighting matrix A of the network. When there is no tie between two nodes, the corresponding element of the A matrix is zero. In the case of the I–O system as we implement it, A is a matrix of regional absorption coefficients. Equation (7) shows the random walk closeness centrality of a node i as a function of the inverse of the average mean first passage time to that node.

$$\begin{aligned} C_{i}^{RWC} = \frac{n}{ \sum \nolimits _{j=1}^{n}H_{ji}} \end{aligned}$$
(7)
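Before turning to the mean first passage time, the row normalization in Eq. (6) can be sketched as follows. The 3-sector weight matrix is hypothetical, and the function name `transition_matrix` is an illustrative choice rather than part of the published R code.

```python
def transition_matrix(a):
    """Eq. (6): divide each row of the weight matrix by its row sum."""
    m = []
    for row in a:
        total = sum(row)
        if total <= 0:
            raise ValueError("every node needs at least one outgoing tie")
        m.append([value / total for value in row])
    return m


# Hypothetical 3-sector weight matrix; the diagonal holds within-sector loops.
a = [
    [2.0, 1.0, 1.0],
    [0.0, 3.0, 1.0],
    [1.0, 1.0, 0.0],
]
m = transition_matrix(a)
```

Each row of the result sums to one, so row i can be read as the distribution of the walker's next step from sector i, loops included.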

The mean first passage time (MFPT) from node i to node j is the expected number of steps it takes for the process to reach node j from node i for the first time (Eq. (8)). Random walk centrality is the inverse of MFPT to a given sector. MFPT is the starting point in the computation of a random walk-based measure. Noh and Rieger (2004) define it as the expected number of steps a random walker who starts at source i needs to reach target j for the first time, \(H_{ij}\).

$$\begin{aligned} H_{ij}= \sum _{r=1}^{\infty }rP_{ijr} \end{aligned}$$
(8)

where \(P_{ijr}\) denotes the probability that it takes exactly r steps to reach j from i for the first time. To calculate these probabilities of reaching a node for the first time in r steps, it is useful to regard the target node as an absorbing one, and introduce a transformation of M (in Eq. (6)) by deleting its \(j\)th row and column and denoting it by \(M_{-j}\).

As the probability of a process starting at i and being in k after r − 1 steps is simply given by the (i, k)th element of \(M_{-j}^{r-1}\), \(P_{ijr}\) can be expressed as:

$$\begin{aligned} P_{ijr}= \sum _{k\ne {j}}\left( M_{-j}^{r-1}\right) _{ik}m_{kj} \end{aligned}$$
(9)

where the \(m_{kj}\) (for \(k \ne j\)) are the elements of the \(j\)th column of M with the element \(m_{jj}\) deleted. Substituting Eq. (9) into Eq. (8), and vectorizing for computational convenience yields:

$$\begin{aligned} H(\bullet ,j)=(I-M_{-j})^{-1}e \end{aligned}$$
(10)

where H(\(\bullet\),j) is the vector of first passage times for a walk ending at node j, e is an (n − 1)-dimensional vector of ones and I is the identity matrix. The result is then used in Eq. (7). The calculation of random walk betweenness centrality is very computationally intensive; for that reason we conducted most of our calculations at the 86-sector level of aggregationFootnote 7 rather than at the 536-sector aggregationFootnote 8, which proved to be excessively challenging.
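The chain from Eq. (10) to Eq. (7) can be sketched in a few lines. This is an illustrative sketch and not the published R code: the transition matrix is hypothetical, the helper `solve` is a plain Gaussian elimination written here to keep the example dependency-free, and the first passage time from a node to itself is assumed to be zero so that it drops out of the sum in Eq. (7).

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def first_passage_times(m, j):
    """Eq. (10): H(., j) = (I - M_{-j})^{-1} e, for every start node except j."""
    keep = [k for k in range(len(m)) if k != j]
    system = [[(1.0 if r == c else 0.0) - m[r][c] for c in keep] for r in keep]
    times = solve(system, [1.0] * len(keep))
    return dict(zip(keep, times))


def random_walk_centrality(m):
    """Eq. (7): n divided by the summed first passage times into each node."""
    n = len(m)
    return [n / sum(first_passage_times(m, j).values()) for j in range(n)]


# Hypothetical row-stochastic transition matrix for three sectors.
m = [
    [0.50, 0.25, 0.25],
    [0.00, 0.75, 0.25],
    [0.50, 0.50, 0.00],
]
h_to_2 = first_passage_times(m, 2)  # expected steps to reach node 2 from 0 and 1
rwc = random_walk_centrality(m)
```

In this example node 1 is reached soonest on average and therefore has the highest random walk centrality. Solving the linear system rather than inverting the matrix explicitly is a design choice that keeps the sketch short; for 86 sectors it means 86 solves of size 85.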

Counting betweenness for mediative effects

Mediative effects capture the prominence of sectors as instruments of transmission of total effects. The measure captures the involvement of a sector in the paths connecting other sectors, as if it were an intermediary in the transactions among them. The more paths a sector participates in, the greater its relevance as a connector or conduit in the overall economy. Counting Betweenness generalizes Newman’s random walk betweenness to directed networks with loops. It measures how often a sector (a node in the network) is visited on first-passage walks, averaged over all source-target pairs, as shown in Eq. (11).

$$\begin{aligned} C_i^{CB} = \frac{\sum^{n} _{j=1} \sum^{n} _{k \ne j} N^{jk}(i)}{n(n-1)} \end{aligned}$$
(11)

where \(N^{jk}(i)\) is a measure of the frequency with which a random walker reaches node i while going from j to k.Footnote 9
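A minimal sketch of Eq. (11) follows, under assumptions that should be checked against Blöchl et al. (2011) before use. It relies on a standard Markov chain result: when the target k is made absorbing, the (j, i) entry of \((I-M_{-k})^{-1}\) is the expected number of visits to i for a walk that starts at j. The sketch assumes that the starting position counts as a visit and that the target itself contributes zero visits; both conventions, the matrix, and the function names are illustrative and are not taken from the published R code.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def counting_betweenness(m):
    """Eq. (11): expected visits to each node, averaged over source-target pairs."""
    n = len(m)
    totals = [0.0] * n
    for k in range(n):  # k is the absorbing target
        keep = [s for s in range(n) if s != k]
        system = [[(1.0 if r == c else 0.0) - m[r][c] for c in keep] for r in keep]
        for pos, i in enumerate(keep):  # i is the visited node
            unit = [1.0 if p == pos else 0.0 for p in range(len(keep))]
            # visits[p] = expected visits to i for a walk starting at keep[p]
            visits = solve(system, unit)
            totals[i] += sum(visits)
    return [t / (n * (n - 1)) for t in totals]


# Hypothetical row-stochastic transition matrix for three sectors.
m = [
    [0.50, 0.25, 0.25],
    [0.00, 0.75, 0.25],
    [0.50, 0.50, 0.00],
]
cb = counting_betweenness(m)
```

Node 1, which has the largest loop (0.75), accumulates the most visits. This shows how loops raise the measure, unlike shortest-path betweenness, which ignores them.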

Data and application to two cases

In the U.S., the commercial product IMPLAN® is regarded as an industry standard in terms of its widespread use as a tool for regional economic analysis. IMPLAN captures the inter-sector relationships with the parameter “gross absorption,” which is the total amount of each commodity that is needed for production in one sector.Footnote 10 We decided to use “regional absorption,” which is the amount of the required inputs (gross absorption) purchased locally.Footnote 11 This, we think, captures the relationships at the local level better than gross absorption, and is a better starting point in the identification of key sectors in the local economy. A sector’s relative importance in the local economy increases as it uses more locally sourced inputs.

  • Gross Absorption describes the total amount of each commodity that is needed in the production of a sector’s output.

  • Regional Absorption describes the amount of the required inputs that can be obtained locally (the total amount of production requirements that can be sourced within the model’s geography).

We developed two different settings for the application of our algorithms. In the first case, we examine the economic structure of Monterey County, California. Here we show the differences between the more commonly used multipliers (total output and employment multipliers) and the centrality measures defined above. For the second application, we compare a metro area (Wayne County, Michigan, where the Detroit metropolitan region is located), a state (Michigan), and the whole country (United States). In this case, we show how the same centrality measures vary with the geographic scale and offer some interpretations.

At the time of the analysis, the IMPLAN system provided data at the local level, counties or ZIP codes, using a 536-sector aggregation schema. Sectoral aggregations at 2- and 3-digit NAICS were also available. We produced datasets at all three levels: 2-digit NAICS (20 sectors); 3-digit NAICS (86 sectors); and native IMPLAN format (536 sectors). At the 86-sector level, the base matrix is a grid whose cells are 86 by 86 matrices, that is, its full dimension is 7396 by 7396, or almost 55 million cells; at the 536-sector level, the full matrix has over 82 billion cells. These values are a measure of the computational demands under each aggregation schema.Footnote 12 The results presented below correspond to the 86-sector, 3-digit NAICS structure because we think that is the best sectoral map for this application. There are enough sectors to distinguish between sub-activities that would be masked under the 20-sector schema, but not so many that huge matrices must be processed. The 3-digit NAICS aggregation has the additional advantage of providing a realistic description of almost any local or regional economy, with very few missing sectors.
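The matrix sizes quoted above follow from simple arithmetic, sketched here with a hypothetical helper named `full_matrix_cells`.

```python
def full_matrix_cells(sectors):
    """Side length and cell count of a sectors-by-sectors grid of
    sectors-by-sectors blocks."""
    side = sectors * sectors
    return side, side * side


side_86, cells_86 = full_matrix_cells(86)     # 7396 and 54,700,816
side_536, cells_536 = full_matrix_cells(536)  # 287,296 and 82,538,991,616
```

The cell count grows with the fourth power of the number of sectors, which is why moving from 86 to 536 sectors multiplies the storage requirement by roughly 1,500.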

Results and interpretation

When applying our algorithm to the datasets in Blöchl et al. (2011), we replicated their results for a variety of countries. Although they do not report the actual values of their metrics, we are confident our algorithm performs as expected because of the high concordance between the rankings they report and our own computations. The R code is available on RPubs.Footnote 13

For Monterey County, “Appendix 1” shows both sets of results, that is the traditional multipliers and the network metrics developed in our research. It shows the top-10 sectors, ranked by each of the measures (Output multiplier, Employment multiplier, Random Walk Centrality and Counting Betweenness). “Appendix 2” shows the ranking of all 86 sectors for the same four measures.

For the metropolitan area of Detroit, the state of Michigan, and the United States, “Appendix 3”, “Appendix 4” and “Appendix 5” show the top ten sectors for Random Walk Centrality and Counting Betweenness, and the complete ranks in all three measures for all 86 sectors, respectively.

Both measures, Random Walk Centrality (RWC) and Counting Betweenness (CBET), are dynamic in nature as they capture the impacts on each sector as effects propagate throughout the local economy.Footnote 14 Random walk centrality measures to what extent a sector will be impacted earlier or later during the shock transmission process. Counting betweenness is a measure of impedance in the flow of a shock. A shock will reach a sector with a higher RWC before a sector with lower value of RWC. A shock will transit faster–or will reach more sectors from–an industry with a larger value of CBET. The commonly used output and employment multipliers cannot capture these dynamic effects. In fact, if the objective is to identify systemic weakness in the local economy, the multipliers could be misleading. Compared to a game of falling dominoes, the multipliers would be equivalent to the size of a single domino, while RWC and CBET would show how fast and in which direction the flow of falling dominoes will move, once the initial wave is set in motion (i.e. a shock).Footnote 15

In the case of Monterey County, the sectors at the top of the RWC and CBET rankings are quite different from those for the output and employment multipliers. Professional and Scientific Services and Management of Companies are first and second for RWC and first and eighth for CBET, while they are 32nd and 12th, and 41st and 50th for output and employment multipliers, respectively. One interpretation is that RWC and CBET for those sectors show that, although their output and employment levels are not very high, they do participate in many (or most) other sectors’ production functions. This is one way of assigning preeminence to a sector in the context of the local economy.

When comparing the results for the Detroit metro region, the state of Michigan and the United States, the rankings of RWC and CBET show some interesting effects. For example, sector #9, Utilities, has RWC ranking values 8, 10 and 13 respectively for each geography. That would imply that at the smaller geography (Detroit metro), the “Utilities” sector plays a more important role than in the larger geographies (although it is quite important in all of them). The rankings in CBET show a similar effect. Conversely, for sector #20, Chemical Manufacturing, the effects seem to be opposite. The ranking is much lower for the Detroit metro area (70) than for the state and the nation (37 and 12, respectively). This means one of two things: (1) There is some production for the sector in the Detroit metro region but its output is consumed mostly outside the region, and therefore its local effect is lower; or (2) There is some local production but other local sectors utilize inputs from outside the region (this could be the result of the sectoral aggregation being too coarse to capture sub-sector effects). On the other end of the geographic scale, at the national level the Chemical Manufacturing sector provides inputs to all local production. That becomes meaningless when considering that “local” in this case refers to the whole of the United States.

The reader can derive similar inferences for all other sectors shown in “Appendix 5”. Regardless of the remaining ambiguity, the interpretation confirms that both RWC and CBET measures performed as expected in identifying those sectors that are more relevant as contributors of locally sourced production. This attribute is not directly captured by output multipliers, which makes the identification of key sectors in the local economy more difficult.

Closing remarks

Following the work of Blöchl et al. (2011) and García Muñiz et al. (2008), we implemented random walk-based measures of centrality in I–O networks to identify key sectors in the local and regional economy, using IMPLAN data and our own R code. The brief examination of the literature confirmed that the need for identifying key sectors remains and that improved network analysis techniques are apt instruments for the task. Thus we completed the three original objectives of our research: (1) implement RWC and CBET in R; (2) test the implementation on subnational regional datasets; and (3) present a preliminary consolidated version of network metrics from the existing literature.
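To make the idea of a random walk-based centrality concrete, the following Python sketch computes a random walk centrality from mean first passage times. It is an illustration only, not the authors' R implementation. It assumes the formulation C(i) = n / Σ_j H(j, i), where H(j, i) is the expected number of steps for a walker starting at j to first reach i, which is our reading of Blöchl et al. (2011); the function names and the toy flow matrix are hypothetical, and the sketch requires every sector to be reachable from every other sector.

```python
import numpy as np


def mean_first_passage_times(M, target):
    """Expected number of steps to first reach `target` from each node.

    M is a row-stochastic transition matrix. The entry for `target`
    itself is set to 0 by convention in this sketch.
    """
    n = M.shape[0]
    keep = [k for k in range(n) if k != target]
    M_sub = M[np.ix_(keep, keep)]  # drop the target's row and column
    # Standard Markov chain result: h = (I - M_sub)^(-1) * 1
    h = np.linalg.solve(np.eye(n - 1) - M_sub, np.ones(n - 1))
    H = np.zeros(n)
    H[keep] = h
    return H


def random_walk_centrality(M):
    """Assumed definition: n divided by the summed passage times into each node."""
    n = M.shape[0]
    return np.array(
        [n / mean_first_passage_times(M, i).sum() for i in range(n)]
    )


# Toy 3-sector flow matrix (illustrative numbers, NOT IMPLAN data).
# Z[i, j] = sales from sector i to sector j; the diagonal holds loops.
Z = np.array([
    [5.0, 2.0, 0.0],
    [1.0, 3.0, 4.0],
    [2.0, 0.0, 6.0],
])
M = Z / Z.sum(axis=1, keepdims=True)  # row-normalize flows into probabilities

rwc = random_walk_centrality(M)
ranking = np.argsort(-rwc) + 1  # sector IDs (1-based), most central first
print(rwc, ranking)
```

Sectors that a random walker reaches quickly from everywhere else receive a high score, which is why loops and very dense networks pose no difficulty for this type of measure.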

Future explorations include the analysis of other applications of these network metrics, such as the identification of industrial clusters. We posit that the measures developed in this paper could uncover previously unidentified clusters and analyze their evolution over time; both could be instrumental in informing future policy actions.

Availability of data and materials

The R code is available on RPubs (https://rpubs.com/RStudio_knight/368268). Contact the corresponding author for sample datasets and additional details.

Notes

  1. NOTE: In the field of network analytics it is very common to find multiple definitions and implementations for the same metric, or one with very slight variations. Whenever possible, we refer to the earliest definition of a measure or concept.

  2. One reason for high density is high levels of sectoral aggregation. But even highly disaggregated systems, with hundreds of nodes, can have, for example, d > 0.7, meaning that more than 70% of all possible ties are present.

  3. Loops may disappear (or be drastically reduced) at higher levels of sectoral aggregation.

  4. In directed networks, the degree measure can be separated into in-degree and out-degree to account for ties coming to or going from a node, respectively.

  5. When running the analysis at the 2-digit NAICS level (the North American Industrial Classification System level that contains 20 sectors), we found degree values averaging around 19, meaning that almost every sector was connected to every other sector.

  6. In a random walk process (such as a Markov chain), for each pair of “states” there is a transition probability of going from the source node to the target node. The “transition matrix” M contains these probabilities, one for each step the process can take from a source to a target. For detailed discussions, see Blum et al. (2015) and Schulman (2016).

  7. Equivalent to the 3-digit level of NAICS.

  8. Full specification of the IMPLAN dataset.

  9. For a detailed discussion of the calculations, see Blöchl et al. (2011). They provide the corresponding Matlab code, but it did not work as stated by the authors.

  10. IMPLAN granted us special permission to use internal components of their software for the development of our research, for which we are grateful.

  11. The relationship between gross and regional absorption is captured by “regional purchase coefficients”.

  12. IMPLAN has since shifted to a browser-based implementation and the data structure might have changed.

  13. Contact the corresponding author for additional details.

  14. In this context, I–O modelers can simulate “shocks” to the system by, for instance, increasing or decreasing the value of output in a sector.

  15. The description of the process gives the appearance of dynamics, but in reality we are just computing the aggregate effects as if it were happening instantaneously. In our metrics, there is no explicit modeling of time.
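Notes 2, 4 and 6 refer to network density, in- and out-degree, and the transition matrix M. The following Python sketch shows how these quantities could be computed from a small flow matrix. It is an illustration under stated assumptions, not the authors' R code: the matrix `Z` is toy data (not IMPLAN data), loops are counted as possible ties when computing density, and M is obtained by dividing each row of flows by its row sum.

```python
import numpy as np

# Toy 3-sector flow matrix (illustrative numbers, NOT IMPLAN data).
# Z[i, j] = value of sales from sector i to sector j; the diagonal holds
# loops (transactions within the same sector).
Z = np.array([
    [5.0, 2.0, 0.0],
    [1.0, 3.0, 4.0],
    [2.0, 0.0, 6.0],
])

n = Z.shape[0]
ties = Z > 0  # binary adjacency matrix, loops included

# Density d: share of possible ties that are present. With loops allowed
# there are n * n possible directed ties (an assumption of this sketch).
density = ties.sum() / (n * n)

# Out-degree counts ties going from a node (row sums of the adjacency);
# in-degree counts ties coming to a node (column sums).
out_degree = ties.sum(axis=1)
in_degree = ties.sum(axis=0)

# Transition matrix M: M[i, j] is the probability that a random walker
# currently at sector i moves next to sector j.
M = Z / Z.sum(axis=1, keepdims=True)

print(f"density = {density:.3f}")
print("out-degree:", out_degree, "in-degree:", in_degree)
print(M)
```

Each row of M sums to one, so M can be used directly as the transition matrix of a Markov chain over sectors.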

References

  • Aroche Reyes F, Marquez Mendoza MA (2012) An economic network in North America. MPRA Paper 61391, University Library of Munich, Germany

  • Blöchl F, Theis FJ, Vega-Redondo F, Fisher E (2011) Vertex centralities in input–output networks reveal the structure of modern economies. Phys Rev E 83(4):046127

  • Blum A, Hopcroft J, Kannan R (2015) Foundations of data science

  • Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25:163–177

  • Campbell J (1975) Application of graph theoretic analysis to inter-industry relationships. Reg Sci Urban Econ 5:91–106

  • Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271

  • Freeman L (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41

  • Freeman L (1978/79) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239

  • Freeman L, Borgatti S, White D (1991) Centrality in valued graphs: a measure of betweenness based on network flow. Soc Netw 13:141–154

  • Freeman LC (2004) The development of social network analysis: a study of the sociology of science. Empirical Press, Vancouver

  • Friedkin N (1991) Theoretical foundations for centrality measures. Am J Sociol 96(6):1478–1504

  • García Muñiz A (2013) Modelling linkages versus leakages networks: the case of Spain. Reg Sect Econ Stud 13(1):43–54

  • García Muñiz A, Morillas Raya A, Ramos Carvajal C (2008) Key sectors: a new proposal from network theory. Reg Stud 42(7):1013–1030

  • Giuliani E (2013) Network dynamics in regional clusters: evidence from Chile. Res Policy 42(8):1406–1419

  • Guo J, Planting M (2000) Using input–output analysis to measure the U.S. economic structural change over a 24 year period. BEA Papers 0004, Bureau of Economic Analysis

  • Harrison K, Ventresca M, Ombuki-Berman B (2016) A meta-analysis of centrality measures for comparing and generating complex network models. J Comput Sci 17:205–215

  • Hewings GJD (1982) The empirical identification of key sectors in an economy: a regional perspective. Dev Econ 20(2):173–195

  • Hubbell C (1965) An input–output approach to clique identification. Sociometry 28(4):377–399

  • Jorgenson D (2016) Econometric general equilibrium modeling. J Policy Model 38(3):436–447

  • Laumas P (1975) Key sectors in some underdeveloped countries. Kyklos 28(1):62–79

  • Lee C-Y (2006) Correlations among centrality measures in complex networks. ArXiv Physics e-prints

  • Lovász L (2009) Very large graphs. Curr Dev Math 2008:67–128

  • Meng F, Gu Y, Fu S, Wang M, Guo Y (2017) Comparison of different centrality measures to find influential nodes in complex networks. In: Wang G, Atiquzzaman M, Yan Z, Choo K (eds) Security, privacy, and anonymity in computation, communication, and storage. Springer, pp 415–423

  • Miller R, Blair P (2009) Input–output analysis, foundations and extensions. Cambridge University Press, Cambridge

  • Montresor S, Marzetti G (2009) Applying social network analysis to input–output based innovation matrices: an illustrative application to six OECD technological systems for middle 1990s. Econ Syst Res 21(2):129–149

  • Moreno J (1934) Who shall survive? Nervous and Mental Disease Publishing Company, Washington

  • Newman MEJ (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64:016132

  • Newman M (2005) A measure of betweenness centrality based on random walks. Soc Netw 27(1):39–54

  • Noh J, Rieger H (2004) Random walks on complex networks. Phys Rev Lett 92:118701

  • Opsahl T (2015) tnet: software for analysis of weighted, two-mode, and longitudinal networks

  • Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 32:245–251

  • Rasmussen PN (1956) Studies in inter-sectoral relations. E. Harck, Copenhagen

  • Reid N, Smith B, Carroll M (2008) Cluster regions. Econ Dev Q 22(4):345–352

  • Roepke H, Adams D, Wiseman R (1974) A new approach to the identification of industrial complexes using input–output data. J Reg Sci 14(1):15–29

  • Schulman LS (2016) Transition matrix from a random walk. ArXiv e-prints arXiv:1605.04282

  • Schultz S (1977) Approaches to identifying key sectors empirically by means of input–output analysis. J Dev Stud 14(1):77–96

  • Streit M (1969) Spatial associations and economic linkages between industries. J Reg Sci 9(2):177–187

Author information

Contributions

DePaolis and Murphy contributed equally. De Paolis Kaluza contributed to the code development. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Fernando DePaolis.

Ethics declarations

Competing interests

The authors declare that there are no competing interests or conflicts related to this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Top 10 sectors by selected indicators for Monterey County

| Order | Output multiplier | Employment multiplier | Random walk centrality | Counting betweenness |
|---|---|---|---|---|
| 1 | Securities, other financial | Social assistance | Professional, scientific, tech svcs | Professional, scientific, tech svcs |
| 2 | Funds, trusts, other finan | Misc retailers | Management of companies | Real estate |
| 3 | Electronics, appliances stores | Personal, laundry svcs | Real estate | Admin support svcs |
| 4 | Broadcasting | Private households | Admin support svcs | Wholesale Trade |
| 5 | Food products | Admin support svcs | Wholesale Trade | Construction |
| 6 | Personal, laundry svcs | Educational svcs | Construction | Insurance carriers, related |
| 7 | Social assistance | Nursing, residential care | Government, non NAICs | Monetary authorities |
| 8 | Sightseeing transportation | Sports, hobby, book, music stores | Monetary authorities | Management of companies |
| 9 | Religious, grantmaking, similar orgs | Food svcs, drinking places | Securities, other financial | Securities, other financial |
| 10 | Telecommunications | Ag, Forestry Svcs | Food svcs, drinking places | Government, non NAICs |

Appendix 2: Complete rankings for Monterey County

| Sector ID | Sector description | Output multiplier | Employment multiplier | Random walk centrality | Counting betweenness centrality |
|---|---|---|---|---|---|
| 1 | Crop Farming | 66 | 36 | 38 | 37 |
| 2 | Livestock | 20 | 66 | 49 | 48 |
| 3 | Forestry & Logging | 35 | 29 | 78 | 79 |
| 4 | Fishing- Hunting & Trapping | 70 | 19 | 79 | 72 |
| 5 | Ag & Forestry Svcs | 55 | 10 | 81 | 49 |
| 6 | Oil & gas extraction | 72 | 68 | 35 | 34 |
| 7 | Mining | 83 | 73 | 71 | 71 |
| 8 | Mining services | 58 | 53 | 37 | 31 |
| 9 | Utilities | 85 | 85 | 27 | 20 |
| 10 | Construction | 23 | 44 | 6 | 5 |
| 11 | Food products | 5 | 62 | 26 | 23 |
| 12 | Beverage & Tobacco | 36 | 70 | 64 | 64 |
| 13 | Textile Mills | 78 | 58 | 74 | 75 |
| 14 | Textile Products | 42 | 43 | 70 | 70 |
| 15 | Leather & Allied | 15 | 31 | 80 | 80 |
| 16 | Wood Products | 59 | 46 | 65 | 65 |
| 17 | Paper Manufacturing | 71 | 74 | 75 | 76 |
| 18 | Printing & Related | 47 | 37 | 56 | 57 |
| 19 | Petroleum & coal prod | 27 | 83 | 59 | 59 |
| 20 | Chemical Manufacturing | 79 | 84 | 46 | 44 |
| 21 | Plastics & rubber prod | 81 | 69 | 76 | 77 |
| 22 | Nonmetal mineral prod | 69 | 65 | 72 | 73 |
| 23 | Primary metal mfg | 61 | 61 | 69 | 69 |
| 24 | Fabricated metal prod | 74 | 55 | 60 | 60 |
| 25 | Machinery Mfg | 75 | 76 | 30 | 28 |
| 26 | Computer & oth electron | 65 | 71 | 36 | 35 |
| 27 | Electrical eqpt & appliances | 77 | 67 | 77 | 78 |
| 28 | Transportation eqpmt | 80 | 80 | 53 | 51 |
| 29 | Furniture & related prod | 60 | 47 | 58 | 58 |
| 30 | Miscellaneous mfg | 51 | 51 | 68 | 68 |
| 31 | Wholesale Trade | 54 | 60 | 5 | 4 |
| 32 | Motor veh & parts dealers | 67 | 40 | 33 | 36 |
| 33 | Furniture & home furnishings | 37 | 27 | 61 | 61 |
| 34 | Electronics & appliances stores | 3 | 11 | 66 | 66 |
| 35 | Bldg materials & garden dealers | 33 | 25 | 45 | 46 |
| 36 | Food & beverage stores | 50 | 15 | 51 | 52 |
| 37 | Health & personal care stores | 26 | 20 | 39 | 39 |
| 38 | Gasoline stations | 34 | 34 | 43 | 43 |
| 39 | Clothing & accessories stores | 49 | 22 | 21 | 24 |
| 40 | Sports- hobby- book & music stores | 38 | 8 | 52 | 53 |
| 41 | General merch stores | 48 | 16 | 31 | 32 |
| 42 | Misc retailers | 17 | 2 | 41 | 41 |
| 43 | Non-store retailers | 68 | 35 | 15 | 16 |
| 44 | Air transportation | 41 | 64 | 32 | 33 |
| 45 | Rail Transportation | 29 | 59 | 62 | 62 |
| 46 | Water transportation | 86 | 86 | 86 | 86 |
| 47 | Truck transportation | 28 | 39 | 19 | 22 |
| 48 | Transit & ground passengers | 40 | 17 | 20 | 30 |
| 49 | Pipeline transportation | 45 | 63 | 73 | 74 |
| 50 | Sightseeing transportation | 8 | 32 | 18 | 15 |
| 51 | Couriers & messengers | 18 | 14 | 24 | 25 |
| 52 | Warehousing & storage | 30 | 33 | 28 | 27 |
| 53 | Publishing industries | 76 | 72 | 57 | 56 |
| 54 | Motion picture & sound recording | 82 | 77 | 34 | 26 |
| 55 | Broadcasting | 4 | 56 | 47 | 42 |
| 56 | Internet publishing and broadcasting | 62 | 82 | 12 | 13 |
| 57 | Telecommunications | 10 | 52 | 48 | 47 |
| 58 | Internet & data process svcs | 64 | 81 | 63 | 63 |
| 59 | Other information services | 43 | 75 | 50 | 50 |
| 60 | Monetary authorities | 16 | 45 | 8 | 7 |
| 61 | Credit intermediation & related | 19 | 48 | 14 | 14 |
| 62 | Securities & other financial | 1 | 13 | 9 | 9 |
| 63 | Insurance carriers & related | 57 | 54 | 13 | 6 |
| 64 | Funds- trusts & other finan | 2 | 18 | 55 | 55 |
| 65 | Real estate | 84 | 78 | 3 | 2 |
| 66 | Rental & leasing svcs | 63 | 49 | 17 | 21 |
| 67 | Lessor of nonfinance intang assets | 56 | 79 | 16 | 18 |
| 68 | Professional- scientific & tech svcs | 32 | 41 | 1 | 1 |
| 69 | Management of companies | 12 | 50 | 2 | 8 |
| 70 | Admin support svcs | 39 | 5 | 4 | 3 |
| 71 | Waste mgmt & remediation svcs | 53 | 57 | 22 | 17 |
| 72 | Educational svcs | 21 | 6 | 54 | 54 |
| 73 | Ambulatory health care | 25 | 30 | 67 | 67 |
| 74 | Hospitals | 22 | 38 | 82 | 81 |
| 75 | Nursing & residential care | 13 | 7 | 84 | 83 |
| 76 | Social assistance | 7 | 1 | 84 | 83 |
| 77 | Performing arts & spectator sports | 14 | 21 | 23 | 19 |
| 78 | Museums & similar | 11 | 26 | 25 | 83 |
| 79 | Amusement- gambling & recreation | 31 | 23 | 44 | 45 |
| 80 | Accommodations | 46 | 24 | 29 | 29 |
| 81 | Food svcs & drinking places | 24 | 9 | 10 | 11 |
| 82 | Repair & maintenance | 52 | 28 | 11 | 12 |
| 83 | Personal & laundry svcs | 6 | 3 | 42 | 40 |
| 84 | Religious- grantmaking- & similar orgs | 9 | 12 | 40 | 38 |
| 85 | Private households | 44 | 4 | 86 | 86 |
| 86 | Government & non NAICs | 73 | 42 | 7 | 10 |

Appendix 3: Top 10 sectors by random walk centrality for Detroit (Wayne County), State of Michigan, and the United States

| Order | Detroit | Michigan | United States |
|---|---|---|---|
| 1 | Professional- scientific and tech svcs | Professional- scientific and tech svcs | Professional- scientific and tech svcs |
| 2 | Management of companies | Management of companies | Management of companies |
| 3 | Real estate | Real estate | Real estate |
| 4 | Admin support svcs | Admin support svcs | Admin support svcs |
| 5 | Construction | Construction | Wholesale Trade |
| 6 | Wholesale Trade | Wholesale Trade | Construction |
| 7 | Petroleum and coal prod | Insurance carriers and related | Petroleum and coal prod |
| 8 | Utilities | Monetary authorities | Securities and other financial |
| 9 | Monetary authorities | Securities and other financial | Monetary authorities |
| 10 | Insurance carriers and related | Utilities | Oil and gas extraction |

Appendix 4: Top 10 sectors by counting betweenness for Detroit (Wayne County), State of Michigan, and the United States

| Order | Detroit | Michigan | United States |
|---|---|---|---|
| 1 | Professional- scientific and tech svcs | Professional- scientific and tech svcs | Professional- scientific and tech svcs |
| 2 | Utilities | Insurance carriers and related | Insurance carriers and related |
| 3 | Real estate | Real estate | Real estate |
| 4 | Admin support svcs | Admin support svcs | Admin support svcs |
| 5 | Insurance carriers and related | Utilities | Utilities |
| 6 | Construction | Construction | Chemical Manufacturing |
| 7 | Wholesale Trade | Wholesale Trade | Wholesale Trade |
| 8 | Petroleum and coal prod | Monetary authorities | Securities and other financial |
| 9 | Management of companies | Securities and other financial | Management of companies |
| 10 | Monetary authorities | Management of companies | Monetary authorities |

Appendix 5: Complete rankings for Detroit (Wayne County), State of Michigan, and the United States

| Sector ID | Sector description | RWC: Detroit | RWC: Michigan | RWC: United States | CBET: Detroit | CBET: Michigan | CBET: United States |
|---|---|---|---|---|---|---|---|
| 1 | Crop Farming | 11 | 61 | 55 | 70 | 59 | 54 |
| 2 | Livestock | 78 | 69 | 58 | 76 | 64 | 55 |
| 3 | Forestry and Logging | 83 | 65 | 66 | 81 | 56 | 58 |
| 4 | Fishing- Hunting and Trapping | 82 | 68 | 69 | 77 | 79 | 77 |
| 5 | Ag and Forestry Svcs | 80 | 67 | 67 | 78 | 66 | 65 |
| 6 | Oil and gas extraction | 31 | 23 | 10 | 28 | 24 | 14 |
| 7 | Mining | 79 | 46 | 29 | 79 | 46 | 25 |
| 8 | Mining services | 63 | 35 | 14 | 60 | 33 | 13 |
| 9 | Utilities | 8 | 10 | 13 | 2 | 5 | 5 |
| 10 | Construction | 5 | 5 | 6 | 6 | 6 | 11 |
| 11 | Food products | 39 | 48 | 41 | 31 | 45 | 35 |
| 12 | Beverage and Tobacco | 71 | 63 | 62 | 68 | 62 | 61 |
| 13 | Textile Mills | 75 | 78 | 73 | 73 | 76 | 70 |
| 14 | Textile Products | 74 | 81 | 79 | 72 | 80 | 78 |
| 15 | Leather and Allied | 81 | 82 | 82 | 80 | 81 | 81 |
| 16 | Wood Products | 62 | 44 | 49 | 57 | 42 | 42 |
| 17 | Paper Manufacturing | 77 | 53 | 35 | 75 | 49 | 32 |
| 18 | Printing and Related | 57 | 54 | 45 | 55 | 54 | 44 |
| 19 | Petroleum and coal prod | 7 | 20 | 7 | 8 | 20 | 12 |
| 20 | Chemical Manufacturing | 70 | 37 | 12 | 67 | 34 | 6 |
| 21 | Plastics and rubber prod | 33 | 41 | 31 | 29 | 40 | 31 |
| 22 | Nonmetal mineral prod | 76 | 34 | 42 | 74 | 32 | 41 |
| 23 | Primary metal mfg | 47 | 49 | 23 | 43 | 47 | 19 |
| 24 | Fabricated metal prod | 45 | 29 | 18 | 42 | 30 | 17 |
| 25 | Machinery Mfg | 18 | 55 | 21 | 15 | 55 | 22 |
| 26 | Computer and oth electron | 40 | 73 | 25 | 34 | 70 | 18 |
| 27 | Electrical eqpt and appliances | 35 | 64 | 51 | 32 | 65 | 52 |
| 28 | Transportation eqpmt | 23 | 33 | 34 | 17 | 29 | 28 |
| 29 | Furniture and related prod | 50 | 74 | 64 | 48 | 72 | 62 |
| 30 | Miscellaneous mfg | 66 | 75 | 65 | 62 | 73 | 64 |
| 31 | Wholesale Trade | 6 | 6 | 5 | 7 | 7 | 7 |
| 32 | Motor veh and parts dealers | 36 | 38 | 57 | 36 | 38 | 57 |
| 33 | Furniture and home furnishings | 69 | 76 | 78 | 66 | 74 | 76 |
| 34 | Electronics and appliances stores | 73 | 79 | 81 | 71 | 78 | 80 |
| 35 | Bldg materials and garden dealers | 52 | 56 | 74 | 49 | 57 | 72 |
| 36 | Food and beverage stores | 61 | 71 | 76 | 59 | 69 | 74 |
| 37 | Health and personal care stores | 44 | 45 | 63 | 41 | 48 | 66 |
| 38 | Gasoline stations | 55 | 58 | 72 | 53 | 58 | 71 |
| 39 | Clothing and accessories stores | 25 | 30 | 50 | 25 | 31 | 51 |
| 40 | Sports- hobby- book and music stores | 60 | 66 | 77 | 58 | 67 | 75 |
| 41 | General merch stores | 34 | 36 | 56 | 33 | 35 | 56 |
| 42 | Misc retailers | 46 | 51 | 71 | 46 | 51 | 69 |
| 43 | Non-store retailers | 27 | 26 | 43 | 27 | 27 | 43 |
| 44 | Air transportation | 42 | 25 | 33 | 38 | 25 | 37 |
| 45 | Rail Transportation | 54 | 62 | 46 | 51 | 63 | 47 |
| 46 | Water transportation | 68 | 72 | 70 | 65 | 71 | 68 |
| 47 | Truck transportation | 16 | 17 | 27 | 16 | 17 | 30 |
| 48 | Transit and ground passengers | 17 | 18 | 28 | 39 | 44 | 46 |
| 49 | Pipeline transportation | 41 | 39 | 48 | 37 | 39 | 48 |
| 50 | Sightseeing transportation | 49 | 24 | 30 | 47 | 22 | 24 |
| 51 | Couriers and messengers | 24 | 22 | 32 | 24 | 19 | 33 |
| 52 | Warehousing and storage | 22 | 28 | 38 | 21 | 28 | 39 |
| 53 | Publishing industries | 59 | 42 | 52 | 56 | 37 | 50 |
| 54 | Motion picture and sound recording | 51 | 47 | 47 | 44 | 41 | 38 |
| 55 | Broadcasting | 58 | 57 | 54 | 54 | 52 | 49 |
| 56 | Internet publishing and broadcasting | 21 | 15 | 16 | 19 | 11 | 15 |
| 57 | Telecommunications | 43 | 19 | 24 | 40 | 21 | 27 |
| 58 | Internet and data process svcs | 67 | 52 | 61 | 64 | 53 | 63 |
| 59 | Other information services | 56 | 43 | 40 | 52 | 43 | 40 |
| 60 | Monetary authorities | 9 | 8 | 9 | 10 | 8 | 10 |
| 61 | Credit intermediation and related | 19 | 14 | 19 | 18 | 15 | 20 |
| 62 | Securities and other financial | 14 | 9 | 8 | 11 | 9 | 8 |
| 63 | Insurance carriers and related | 10 | 7 | 11 | 5 | 2 | 2 |
| 64 | Funds- trusts and other finan | 64 | 59 | 68 | 61 | 60 | 67 |
| 65 | Real estate | 3 | 3 | 3 | 3 | 3 | 3 |
| 66 | Rental and leasing svcs | 20 | 21 | 26 | 20 | 23 | 29 |
| 67 | Lessor of nonfinance intang assets | 26 | 16 | 22 | 26 | 16 | 26 |
| 68 | Professional- scientific and tech svcs | 1 | 1 | 1 | 1 | 1 | 1 |
| 69 | Management of companies | 2 | 2 | 2 | 9 | 10 | 9 |
| 70 | Admin support svcs | 4 | 4 | 4 | 4 | 4 | 4 |
| 71 | Waste mgmt and remediation svcs | 29 | 27 | 36 | 23 | 18 | 36 |
| 72 | Educational svcs | 65 | 70 | 75 | 63 | 68 | 73 |
| 73 | Ambulatory health care | 72 | 80 | 80 | 69 | 77 | 79 |
| 74 | Hospitals | 84 | 83 | 83 | 82 | 82 | 82 |
| 75 | Nursing and residential care | 84 | 83 | 83 | 83 | 83 | 83 |
| 76 | Social assistance | 84 | 83 | 83 | 83 | 83 | 83 |
| 77 | Performing arts and spectator sports | 28 | 31 | 37 | 22 | 26 | 34 |
| 78 | Museums and similar | 30 | 32 | 39 | 83 | 83 | 83 |
| 79 | Amusement- gambling and recreation | 53 | 60 | 60 | 50 | 61 | 60 |
| 80 | Accommodations | 32 | 77 | 44 | 30 | 75 | 45 |
| 81 | Food svcs and drinking places | 13 | 12 | 17 | 13 | 13 | 21 |
| 82 | Repair and maintenance | 15 | 13 | 20 | 14 | 14 | 23 |
| 83 | Personal and laundry svcs | 48 | 50 | 59 | 45 | 50 | 59 |
| 84 | Religious- grantmaking- and similar orgs | 37 | 40 | 53 | 35 | 36 | 53 |
| 85 | Private households | 38 | 86 | 86 | 86 | 86 | 86 |

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

DePaolis, F., Murphy, P. & De Paolis Kaluza, M.C. Identifying key sectors in the regional economy: a network analysis approach using input–output data. Appl Netw Sci 7, 86 (2022). https://doi.org/10.1007/s41109-022-00519-2

  • DOI: https://doi.org/10.1007/s41109-022-00519-2
