Skip to main content

Skills-approximate occupations: using networks to guide jobs retraining


An issue often confronting economic development agencies is how to minimize unemployment due to disruptions like technological change, trade wars, recessions, or other economic shocks. Decision makers are left to craft policies that can absorb surplus labor with as little pain to workers as possible. The questions they face include how to re-employ displaced workers and how to fill labor shortages. To address such questions, we quantify the proximity of any two occupations based on the skills inherent in each. Taking labor skills as nodes, we model US labor as a weighted network of interdependent skills, deriving link values from geographical patterns of skill co-occurrence. We use this network to locate occupations, measure their proximity to each other, and identify which missing skills may inhibit workers from easily transitioning from one occupation to another. Thus, given that an occupation is a bundle of skills, we use our skills network to help policy makers identify which other occupations are most proximate a worker’s current occupation. Finally, we apply our method to assess various worker retraining pathways for metropolitan Washington, DC, USA, whose economy was simultaneously disrupted by both the COVID-19 pandemic and the arrival of a second headquarters for Amazon.


The field of economic geography has increasingly embraced a complexity science perspective to understand economic phenomena. Such studies apply techniques from complex systems science to empirical data on the geography of economic activities to characterize economies and understand development pathways. Two major themes emerging in this literature are those of relatedness metrics and complexity metrics (Hidalgo 2021). While both of these themes use methods from network analysis as a primary means of analyzing economic components of interest, relatedness approaches typically focus on one class of economic entities (e.g., industries, occupations, products, skill, etc.), creating a network, or ‘space’, to be analyzed. In such networks, nodes represent the economic entities of study (e.g., occupations) while link values are determined by some measure of proximity between pairs of nodes. The resulting network (e.g., “occupation space”) is then characterized and analyzed to better understand economic phenomena of interest.

Among the earliest and most influential of these studies was the creation and characterization of a global “product space” (Hidalgo C. A. et al. 2007). In this network nodes represent products or goods and proximity of two products i and j is defined as the conditional probability that a country exports i given that it exports j. The authors found the network to be heterogenous, with more complex goods located near the network’s core and less complex goods located primarily near its periphery. The authors go on to suggest that countries exporting those products found primarily at the network’s periphery would have difficulty transitioning into the production and export of the more complex goods found in the network’s core.

Networks – or spaces – of other economic entities soon followed. Industry space has been mapped using link values based on the co-occurrence of products in manufacturing plants (Neffke et al. 2011) and inter-industry labor flows (Neffke and Henning 2013). Knowledge space has been mapped with link values derived from the co-occurrence patterns of patent technology classes across patent applications (Kogler et al. 2013) while technology space has been mapped based on the conditional probability that a region specializes in a given patent technology class (Boschma et al. 2015). Similarly, occupation space has been mapped with link values derived from co-occurrence patterns of occupations across metropolitan areas (Muneepeerakul et al. 2013; Farinha et al. 2019). More recently, skills space has been mapped with link values derived from the co-occurrence patterns of skills both across occupations (Alabdulkareem et al. 2018) and across metropolitan areas (Shutters and Waters 2020).

As this body of scholarly literature has grown over the last 15 years, regional economic development agencies have simultaneously faced an increasing urgency to grow and protect jobs. In a survey of US elected officials, nearly half indicated that growing jobs was their top priority (Bartik 2003). Similarly, in a 2016 report from the US President’s Office, “finding and acquiring a good job, a quality education, and appropriate training” was listed first among challenges facing Americans (PCAST 2016, p. 8).

Thus, there exists a clear need and opportunity to translate the growing body of network-framed studies in economic geography into applications and tools that can inform regional economic policy.

Here, we explore this opportunity by translating recent findings into a policy-relevant decision support metric and applying it to a regional economy case study. We use recently developed metrics to quantify the skills proximity between occupations, thereby informing policy makers of which occupations could most easily absorb a surplus of workers in occupation X or which occupations could most easily be a source of workers for needed occupation Y. The first of these metrics, called transition potential, was originally created to measure the proximity between a single occupation and the full set of occupations present in a given economy (Muneepeerakul et al. 2013). Here, we use this formulation to instead measure the proximity between a single skill and the full set of skills present in a given occupation. The second metric, originally developed to measure the proximity of a starting economy and a target economy by averaging occupational transition potentials (Shutters et al. 2016), is modified here to average the transition potentials between the skills present in starting occupation and those in a target occupation. We take this aggregated transition potential to be a measure of the proximity of two occupations.

Our approach can be used to inform policy in at least three ways.

  • First, for policy makers seeking to re-employ displaced workers, such as those laid-off when a coal-mine closes, proximity can be used to identify other occupations to which unemployed workers could most easily transition, given the skills profile of their previous occupation.

  • Second, for policy makers facing a critical shortage of workers in a particular occupation, proximity can be used to identify other occupations that are most similar and thus offer the most efficient source of workers that could be retrained to undertake jobs in the high-demand occupation.

  • Third, for policy makers addressing either of the transitions described above, our network-metrics can identify which skills are likely to be the largest obstacles to successful retraining, enabling training program to prioritize curricula options.

In any case, proximity alone should not be used to prioritize new job targets but should be used in conjunction with numerous other factors such as an occupation’s wages, its anticipated future demand, and its fit with local long-term vision. Thus, our network-based measures are meant to augment the suite of information available to decision makers not to replace them.

Application case study: The regional economy of metropolitan Washington, DC

To illustrate how our approach might be used by policy makers, we apply our methodology to the economic situation in the US metropolitan statistical area (MSA) of Washington, DC. This regional economy has been recently impacted simultaneous by two major disruptions affecting its labor force.

First, the COVID-19 pandemic led to a rapid and dramatic decline in the region’s leisure and hospitality employment, which is supported by Washington’s numerous well-known tourist attractions and museums. From April 2019 to April 2020, employment in the MSA’s leisure and hospitality sector decreased from 336,000 in April 2019 to 158,000 in April 2020, a decline of 52.9%. The sector recovered somewhat by December 2020 with employment down 32.5% compared to December 2019 (U.S. Bureau of Labor Statistics 2022).

Second, in November 2018, Amazon announced it would build its second headquarters in metropolitan Washington, DC, and expected to create more than 25,000 technical and administration jobs. Technology workers were already scarce before the announcement, but once Amazon began hiring for computer-related occupations, the availability of qualified workers became scarcer.

Thus, the Washington, DC regional economy faces the dual issues of high unemployment among leisure and hospitality workers and a severe shortage of qualified workers for computer-related occupations, and offers an ideal case for which to demonstrate the applicability of our methodology.

Data and methods

We use two publicly available datasets to conduct our analysis. The first is the O*NET dataset published by the Occupational Information Network. O*NET data decomposes US occupations o into several hundred characteristics which are referred to as “elements” i. For each of 161 elements a value of level li,o (0, 7) is assigned for each occupation, indicating the degree to which the element is required or needed to perform the activities of that occupation (National Center for O*NET Development 2020). We use O*NET version 24.2 to obtain level values for each element-occupation pair. O*Net groups elements into a number of categories such as skills, work activities, abilities, and knowledge. However, it has become conventional in studies using O*NET elements to refer to all elements, regardless of category, simply as skills (e.g., Florida et al. 2012; Kok and Weel 2014; Alabdulkareem et al. 2018; Deming and Noray 2018; Vona et al. 2018; Farinha et al. 2019). We adopt this convention and hereafter refer to all O*NET elements simply as skills.

The second dataset we use is the Occupational Employment and Wage Statistics (OEWS) dataset published by the US Bureau of Labor Statistics (BLS). The OEWS dataset is published annually each May and provides estimates of employment and wages by occupation for various US geographies, including nation, state, and metropolitan statistical area. In this study we use the May 2018 dataset (U.S. Bureau of Labor Statistics 2018).

Note that O*NET uses an 8-digit occupation code while the OEWS reports employment using the more aggregated 6-digit Standard Occupation Coding (SOC) code. Thus, for each a few OEWS occupation there exist multiple corresponding occupations in O*NET data. In those cases, we use simple average to collapse the skill values of the multiple O*NET occupations into a single value for the corresponding OEWS occupation code following previous literature (Shutters and Waters 2020). See Additional files 1 and 2 for more detail. 

We use only employment data for US metropolitan statistical areas (MSAs). MSAs are defined as cohesive regional economic units comprising one or more counties, based primarily on commuting patterns. Note that, while the BLS states that OEWS data is aggregated by MSA, in the six states comprising New England the data are actually aggregated by an alternative geographic unit known as New England City and Town Areas (NECTAs). Hereafter, we adopt the convention of the OEWS data and refer to both MSAs and NECTAs collectively as MSAs.

Constructing the skills network

Given the set of skills i defined by O*NET, we construct a skills network \(\mathcal{G}=\left(\mathcal{N},\mathcal{L}\right),\) with nodes \(\mathcal{N}=\left\{{i}_{1}, {i}_{2},\dots {, i}_{161}\right\}\). The network is complete and undirected, with the matrix of link values X derived from co-occurrence patterns of skills across MSAs as follows. Following Shutters and Waters (2020), we first measure the aggregate level s of each skill i in each MSA m as

$$s_{i,m} = \mathop \sum \limits_{o} l_{i,o} w_{o,m}$$

where \({l}_{i,o}\) is the level of skill i in occupation o taken from the O*NET dataset, and wo,m is the number of workers w employed in occupation o in MSA m taken from the OEWS dataset. To standardize across MSAs of different sizes we take the location quotient or LQ (Isard 1960; Treyz 1993; Hidalgo 2021) of each skill in each MSA as

$$LQ_{i,m} = \frac{{\left( { s_{i,m} /\mathop \sum \nolimits_{i} s_{i,m} } \right)}}{{\left( {\mathop \sum \nolimits_{m} s_{i,m} /\mathop \sum \nolimits_{m} \mathop \sum \nolimits_{i} s_{i,m} } \right)}} .$$

The LQ value, which is also referred to in related literature as Relative Comparative Advantage or RCA (e.g. Alabdulkareem et al. 2018), is then used to designate each skill i as either present or absent in each MSA m such that if LQi,m ≥ 1, i is considered present in m, and if LQi,m < 1, i is considered absent in m. The co-occurrence patterns of present/absent skills across MSAs are then used to quantify an interdependence x between each pair of skills i and j as

$$x_{i,j} = \frac{{P\left[ {LQ_{i,m} > 1,LQ_{j,m} > 1} \right]}}{{P\left[ {LQ_{{i,m^{\prime}}} > 1} \right]P\left[ {LQ_{{j,m^{\prime\prime}}} > 1} \right]}} - 1,$$

where m, m’ and m” are randomly selected MSAs. Thus, interdependence xi,j is the conditional probability that i and j are present in the same MSA divided by the product of their marginal probabilities of being present in a random MSA. The result is a symmetric skill \(\times\) skill matrix X which we take as the values of the links \(\mathcal{L}\) in the skills network \(\mathcal{G}\). Skill pairs that co-locate in MSAs more often than expected by chance have xi,j > 0 while pairs that co-locate less often than expected by chance have xi,j < 0. Self-links are ignored.

Locating individual occupations in the skills network

Having constructed the full US skills network \(\mathcal{G}\), we next identify the “location” of each occupation o within that network as the subnetwork of skills present in o: \({\mathcal{G}}_{o}=({\mathcal{N}}_{o}, {\mathcal{L}}_{o})\) where \({\mathcal{N}}_{o}\) \(\subseteq \mathcal{N}\) includes only the skills i that are present in occupation o, and \({\mathcal{L}}_{o}\subseteq \mathcal{L}\) includes only the links between the members of \({\mathcal{N}}_{o}.\) Nearly every skill has some positive value for every occupation and thus, to make a meaningful determination of which skills are relevant to each occupation, we again use a location quotient to designate each skill i as either present or absent within a particular occupation o as

$$LQ_{i,o} = \frac{{\left( { l_{i,o} w_{o} /\mathop \sum \nolimits_{i} l_{i,o} w_{o} } \right)}}{{\left( {\mathop \sum \nolimits_{o} l_{i,o} w_{o} /\mathop \sum \nolimits_{o} \mathop \sum \nolimits_{i} l_{i,o} w_{o} } \right)}} .$$

where \({l}_{i,o}\) is the level of skill i in occupation o and \({w}_{o}\) is the total employment in occupation o across all MSAs. We take LQi,o ≥ 1 to indicate that skill i is present in occupation o and LQi,o < 1 to indicate that i is absent in o. Thus, \({\mathcal{N}}_{o}\) is the set of all skills in occupation o for which LQi,o ≥ 1. Note that Eq. (4) is the LQ of skills across occupations, while Eq. (2) is the LQ of skills across MSAs.

Measuring proximity between occupations

Having defined the full skills network \(\mathcal{G}\) and the location of each occupation o as a sub-network \({\mathcal{G}}_{o}\), we next adapt a measure of proximity between \({\mathcal{G}}_{o}\) and a single skill i known as the transition potential \({V}_{i}({\mathcal{G}}_{o})\)(Muneepeerakul et al. 2013). If skill i is not present in occupation o \(\left(i\notin {\mathcal{N}}_{o}\right)\), the transition potential of i is calculated as

$$V_{{i \notin {\mathcal{N}}_{o} }} \left( {{\mathcal{G}}_{o} } \right) = 1 - \mathop \prod \limits_{{j \in {\mathcal{N}}_{o} }} \left( {1 - c\left( {1 + x_{i,j} } \right)P\left[ {LQ_{i} > 1} \right]} \right)$$

where i and j are different skills, xi,j is the link value between i and j, LQi is the probability that a skill is present in a random occupation, and c is an arbitrary scaling parameter which we set to c = 0.002 for consistency with Muneepeerakul et al. (2013). For skills i that are present in occupation o, by definition transition potential \({V}_{i\in {\mathcal{N}}_{o}}\left({\mathcal{G}}_{o}\right)= 1.\)

Finally, using an adapted version of the creative jobs index developed in Shutters et al. (2016), we calculate the average transition potential between each skill i present in target occupation T and the subnetwork of starting occupation \({\mathcal{G}}_{S}\). We define this average as \({P}_{S\to T}\), the proximity of starting occupation S to target occupation T:

$$P_{S \to T} = \frac{1}{{n_{T} }}\mathop \sum \limits_{{i \in {\mathcal{N}}_{T} }} V_{i} \left( {{\mathcal{G}}_{S} } \right)^{{1 - \delta_{i} }}$$

where i is a skill present in target occupation T, \({n}_{T}\) is the total number of skills present in T, and δ is a binary indicator such that δ = 1 if \({i\in \mathcal{N}}_{S}\) and δ = 0 if \(i{\notin \mathcal{N}}_{S}\). Thus, δ sets the transition potential to Vi = 1 if skill \({i\in \mathcal{N}}_{T}\) is already present in the starting occupation S.

It is important to note that proximity P is directional, meaning that \({P}_{S\to T}\) does not necessarily equal \({P}_{T\to S}\). To illustrate, consider two occupations: a comprised of 100 skills, and b comprised of 10 skills, all of which are also present in a. A worker transitioning from occupation b to a must acquire 90 new skills while a worker transitioning from a to b already possesses all required skills. Thus, \({P}_{b\to a}\) is likely much lower than \({P}_{a\to b}\).


Skills interdependence

Using Eq. (3) we derive an interdependence value x between each pair of skills i and j. We take the full set of interdependence values to be the matrix of the link values X in our US skills network. Note that interdependence is symmetric so that xi,j = xj,i. Self-links are ignored and xi,i = 0 for all i. Skill pairs having the five highest interdependence values are provided in Table 1. Thus, the skills “Complex Problem Solving” and “Thinking Creatively” have the strongest co-occurrence pattern of any skill pair, meaning that it is quite rare for an MSA to include one of these skills without the other.

Table 1 Five highest skill-pair interdependence values (\({x}_{i,j}\))

Skills-based occupation proximity

We next apply Eqs. (4)–(6) to derive proximity \({P}_{S\to T}\) between each of 566,256 ordered pairs of US occupations, one of which in each case is the starting occupation S and the other is the target occupation T. The distribution of these proximities (Fig. 1) is roughly symmetric with mean = 0.495 and standard deviation = 0.189. Occupational pairs having the five highest and lowest proximities are presented in Table 2. Note that for three occupation pairs \({P}_{S\to T}\)= 1, meaning that the starting occupation in each case already possess all the skills required by the target occupation. Two of these pairs are reciprocals of each other, “Substitute Teachers” and “Teachers and Instructors, All Other, Except Substitute Teachers”, meaning these two occupations are composed of the same set of skills.

Fig. 1
figure 1

Distribution of occupational proximities \({P}_{S\to T}\). The distribution is approximately symmetric with mean = 0.45 and standard deviation = 0.189. Note that each occupation pair will have two proximities depending on which is the starting occupation S and which is the target T

Table 2 Highest five and lowest five occupation proximity values \({P}_{S\to T}\) between occupation pairs

On the other hand, when “Woodworking Machine Setters, Operators, and Tenders, Except Sawing” is the starting occupation S, it already possesses all 59 skills present in the target “Molding, Coremaking, and Casting Machine Setters, Operators, and Tenders, Metal and Plastic” T, meaning that \({P}_{S\to T}=1\). However, the reverse is not true as “Woodworking Machine Setters, Operators, and Tenders, Except Sawing” has 12 additional skills and \({P}_{T\to S} = 0.838\). The lowest proximity value occurs when the starting occupation is “Labor Relations Specialists” and the target is “Molding, Coremaking, and Casting Machine Setters “. Thus, workers seeking to make this transition would likely face one of the most difficult retraining pathways of any transition.

We reiterate three possible uses for our methodology:

  • To identify other occupations to which unemployed workers could most easily transition,

  • To identify other occupations that may most efficiently supply labor for an occupation having a shortage of workers, and

  • To identify which skills that are likely to be significant obstacles to successful retraining, enabling training program to prioritize curricula options.

With these uses in mind we turn to a case study to illustrate how our methodology might be applied in an actual policy setting.

Case study: Washington, DC metropolitan statistical area (MSA)

Due to the nearly simultaneous disruptions of the COVID-19 pandemic fallout and being chosen as the site of Amazon’s second headquarters, the Washington, DC MSA faces the dual issues of high unemployment among leisure and hospitality workers and a severe shortage of qualified workers for computer-related jobs. One possible strategy by policy makers is to retrain some of the surplus of unemployed hospitality workers so that they may fill some of the many vacant computer-related jobs.

For our demonstration case study, we use standard occupations “Waiters and Waitresses” (SOC 35–3031) as the starting occupation and both “Computer User Support Specialists” (SOC 15–1151) and “Computer Network Support Specialists” (SOC 15–1152) as target occupations. Using the network-based proximities of “Waiters and Waitresses” to both targets, we inform potential policy decisions regarding possible retraining pathways. Furthermore, we use proximities of two computer occupations from all others to identify workers besides “Waiters and Waitresses” that may be a quality source of workers for high-demand computer-related jobs.

Assessing targets, given a starting occupation

Given that workers are displaced from starting occupation S = “Waiters and Waitresses”, we compare the proximity from S to two possible target occupations T. When T is “Computer User Support Specialists” \({P}_{S\to T}\)= 0.388, while when T is “Computer Network Support Specialists” \({P}_{S\to T}\)= 0.273. Thus, our proximity suggests that, between the two target occupations, “Waiters and Waitresses” would require less retraining to transition to “Computer User Support Specialists” than to “Computer Network Support Specialists”.

Note, however, that the occupational proximities of both targets in this example are less than the mean \({P}_{S\to T}\) of 0.495. This suggests that transitioning from “Waiters and Waitresses” to either computer-related occupation may be more difficult than transitioning to many other occupations. Thus, we can use our approach to further inform policy makers of target occupations having high proximity as alternatives to computer-related jobs. Table 3 presents five target occupations having the highest proximity to “Waiters and Waitresses” along with the occupations’ 2018 mean annual wages and LQs for the Washington, DC MSA.

Table 3 Occupations most proximate to starting occupation “Waiters and Waitresses”, with mean annual wages and LQ values for the Washington, DC MSA

While Table 3 is insightful, policy makers typically must simultaneously address multiple objectives when evaluating policy options. For example, policy makers seeking to lower unemployment, may also seek to increase average wages, protect against automation, and attract jobs with high growth potential. Thus, we present alternative results for the case of “Waiters and Waitresses” in Washington, DC, but highlight occupations that have a combination of both high proximity and high wages. Table 4 lists the five occupations with highest proximity to “Waiters and Waitresses” that also earn more than $75,000 in annual wages in the Washington, DC MSA. Thus, “Waiters and Waitresses” would likely require more retraining to transition to “Nurse Midwives” than they would to “Orderlies”, but they would earn significantly more in wages. Our approach does not optimize across multiple objectives but instead gives policy makers critical information so they can navigate these tradeoffs more easily.

Table 4 Occupations most proximate to starting occupation “Waiters and Waitresses” having annual wages greater than $75,000 for the Washington, DC MSA

Assessing starting occupation, given a target

In addition to the need to re-employ displace workers, policy makers may face scenarios in which critical target occupations go unfilled in the local economy. These vacancies may, for example, be hindering growth in important industries or preventing the establishment of new industries. One possible policy option to address this situation is to incentivize workers in existing occupations that could be efficiently retrained for the target occupation. Here we use our methodology to identify starting occupations with high proximity to a target occupation. We again use “Computer User Support Specialists” which are in critical demand in the Washington DC, MSA exacerbated by the introduction of a new Amazon headquarters. Table 5 lists the five starting occupations S with highest proximity to “Computer User Support Specialists” T. Thus, in our example, policy makers may wish to inform existing workers in jobs such as “Biological Technicians” and “Information Security Analysts” about career opportunities in computer-related occupations.

Table 5 Starting occupations with five highest proximity values to target occupation “Computer User Support Specialists”

Identifying skills likely to be problematic for a worker transition

To identify skills that may present the largest obstacles to a worker transition, we compare the transition potentials (Eq. 5) of each skill that is missing in a starting occupation S but is required in the target occupation T. Again, we use “Waiters and Waitresses” as the starting occupation S and “Computer User Support Specialists” as the target T. The skills present in each of these occupations are highlighted in Fig. 2 as sets of nodes \({\mathcal{N}}_{S}\) and \({\mathcal{N}}_{T}\) within the full US skills network \(\mathcal{G}\). Links with negative values are excluded for clarity and visualizations were prepared with Pajek using the Kamada-Kawaii algorithm (Mrvar and Batagelj).

Fig. 2
figure 2

Occupation locations of (A) \({\mathcal{N}}_{S}\) and (B) \({\mathcal{N}}_{T}\) within the full skills network \(\mathcal{G}.\) Here the starting occupation S is “Waiters and Waitresses” and the target occupation T is “Computer User Support Specialist”. Skills present in \({\mathcal{N}}_{S}\) and \({\mathcal{N}}_{T}\) are highlighted while those absent are greyed out. Though negative link values are used to calculate occupational proximity \({P}_{S\to T}\), they are excluded from these visualizations, which were rendered in Pajek using the Kamada-Kawaii algorithm

Among skills required to transition from “Waiters and Waitresses” to “Computer User Support Specialists”, those with the highest and lowest transition potentials \({V}_{i\in {\mathcal{N}}_{T}}\left({\mathcal{G}}_{s}\right)\) are shown in Table 6. This ignores skills required in the target occupation that are already present in the starting occupation, meaning \({V}_{i}\left({\mathcal{G}}_{s}\right)=1\). Skills more likely to hinder a transition are “Thinking Creatively” and “Complex Problem Solving”, which have low transition potentials. These skills would likely require more emphasis in a worker retraining program. In contrast, “Waiters and Waitresses” would likely require less training (if any) in “Public Safety and Security” and “Wrist-Finger Speed”. These skills represent capabilities of “Waiters and Waitresses” that could help facilitate a transition to “Computer User Support Specialists”.

Table 6 Skill transition potentials from waiters and waitresses S to computer user support specialists T

Limitations of our approach and future directions

At its most fundamental level, our approach depends on measuring similarity of networks, particularly subnetworks within a parent network. However, there is no universally accepted best method of calculating such measures. In our case the links of those networks have non-binary values including negative values, which complicates measures of similarity. In addition, the nodes of our networks also have non-binary values, though in this study we collapse those values into a binary present or absent designation. Thus, there is much research to be done on the best ways to measure similarity of networks, including those that have link and node values. It is highly likely that improvements to such measures will also improve our ability to use economic networks to inform policy makers.


Here we present a network-based methodology for identifying the proximity of two occupations given the skills present in each. We propose that this methodology may help inform regional economic developers concerned with minimizing labor disruptions due, for instance, to a surplus of unemployed workers or a shortage of workers for needed occupations. We demonstrate the potential utility of this approach through an application to a case study of the US metropolitan area of Washington, DC, which has recently experienced significant economic disruption. We demonstrate, in this case, how policy makers may prioritize targets for re-employment of displaced workers and how they might identify specific skills that require added emphasis in worker-retraining programs. While we certainly would not suggest that our method replace existing approaches to economic development, we believe it does additional and new information, and would thus fit well into the toolbox of a policy maker seeking to efficiently guide a regional economy through a disruption or transformation.

Availability of data and materials

Data used in this study are publicly available from O*NET at and the US Bureau of Labor Statistics at



Bureau of labor statistics


Metropolitan statistical area


North American industrial classification system


Occupational information network


Occupational employment and wage statistics


Standard occupation classifications


New England city and town area


Location quotient


Download references


The authors thank researchers at Indiana University’s Indiana Business Research Center, attendees at the Western Regional Science Association 2021 virtual conference, and three anonymous reviewers for helpful feedback.


This publication was prepared by Keith Waters at George Mason University and Shade T. Shutters at Arizona State University using Federal funds awarded to the Trustees of Indiana University and as a sub-component under award number ED17HDQ3120040 from the US Economic Development Administration, US Department of Commerce. The statements, findings, conclusions, and recommendations are those of the author(s) and do not necessarily reflect the views of the Economic Development Administration or the US Department of Commerce.

Author information

Authors and Affiliations



KW analyzed the data, operationalized the modifications noted, and wrote the paper. STS conceptualized the paper and wrote the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Keith Waters.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

This file describes how the O*NET data are mapped to BLS data.

Additional file 2:

This file provides the full mapping from O*NET to BLS data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Waters, K., Shutters, S.T. Skills-approximate occupations: using networks to guide jobs retraining. Appl Netw Sci 7, 43 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: