
Community structure in co-inventor networks affects time to first citation for patents


We have investigated community structure in the co-inventor network of a given cohort of patents and related this structure to the dynamics of how these patents acquire their first citation. A statistically significant difference in the time lag until first citation is linked to whether or not this citation comes from a patent whose listed inventors share membership in the same communities as the inventors of the cited patent. Although the inventor-community structures identified by different community-detection algorithms differ in several aspects, including the community-size distribution, the magnitude of the difference in time to first citation is robust across algorithms. Our work quantifies the expected acceleration of knowledge flow within inventor communities and thereby further establishes the utility of network-analysis tools for studying innovation dynamics.

Introduction and motivation

Inventions can be codified in patent applications that, if granted, bestow certain rights to the assignees. Patent documents contain citations to other patents as part of the requirement to acknowledge the state of prior art and to delimit the invention’s legal scope. These citations are understood to represent knowledge flows between inventors and have been used widely to study various economic and social aspects of innovation dynamics (Hall et al. 2002; Breschi and Lissoni 2004; Jaffe and de Rassenfosse 2017).

The degree to which social and working relationships between inventors influence patenting and knowledge propagation in technological-innovation space has been the subject of recent interest (Stolpe 2002; Balconi et al. 2004; Singh 2005; Sorenson et al. 2006; Breschi and Lissoni 2009; Breschi and Lenzi 2016; Tóth et al. 2018). We shed new light on this topic from a network-analysis perspective by studying the effect of community structure in the co-inventor network on patent-citation dynamics. It can be intuitively expected that knowledge about inventions will be transmitted, and thus become available for utilization, faster within groups of inventors that have collaborated before. One of the main aims of our present work is to verify this expectation and quantify the acceleration of knowledge flow through inventor communities. We use the time lag until first citation (Gay et al. 2005; Lee and Sohn 2017) as a proxy measure for the fastest speed at which knowledge about an invention can propagate. This is motivated by the fact that, as in the case of an electrical pulse whose leading edge is the real carrier of information, independent of the rest of its line shape, the time to first citation is more representative of the speed of information flow than any other, aggregate or average, citation measure.

In previous studies, inventor communities were identified using attributes such as geographical co-location (Jaffe et al. 1993; Singh 2005; Breschi and Lenzi 2016) or employment at the same firm (Kogut and Zander 1992; Stolpe 2002; Singh 2005; Breschi and Lissoni 2009). Here our approach is different. We use established community-detection algorithms (Fortunato 2010; Fortunato and Hric 2016) to identify communities of inventors in the co-inventor network that has been constructed by projection from a bipartite inventor-and-patent network. This method can directly reveal communities based on collaborative inventive activity without relying on any externally observed relationship proxies. On the other hand, the algorithmic identification of communities may introduce biases arising from idiosyncrasies inherent to particular detection methods. Our present work also investigates how robustly community effects in patent-citation dynamics can be identified using the generally different community structures obtained by various established detection algorithms.

Community structures in citation networks for scientific articles have been analyzed recently (see, for example, Lambiotte and Panzarasa 2009; Chen and Redner 2010; Liu et al. 2017), and opportunities for similar studies on the large amounts of rich patent-citation data now available are also beginning to be realized (Nakai et al. 2018; Gao et al. 2018). Our study offers an example of the type of interesting insights that can be gained by applying network-analysis-based community-detection methods in the important area of technical innovation.

The remainder of this article is structured as follows. In the following “Methods” section, we start by discussing the data used to construct the co-inventor network and the five community-detection algorithms utilized in our work. Basic properties of communities identified on the largest connected component of the co-inventor network by the different algorithms are compared before details are given on how the citation-lag distributions are assembled. The subsequent “Results and discussion” section presents a detailed analysis of the observed distributions for the time to first citation, focusing in particular on how these depend on whether inventors from the citing and the cited patent share membership in one of the algorithmically detected communities. Our “Conclusions” are summarized in the final section. A list of abbreviations used throughout this article is given in Table 1.

Table 1 List of abbreviations

Methods


The availability of disambiguated US-patent data (Li et al. 2014) makes it possible to construct the co-inventor network associated with a particular cohort of patents. To be specific, we use patents granted by the US Patent and Trademark Office (USPTO) during 1995-1999 and assigned to one of the following classes from the US Patent Classification System (USPC) (US Patent Classification 2013): 257 (Active solid-state devices), 326 (Electronic digital logic circuitry), or 438 (Semiconductor device manufacturing: process) (see footnote 1). The projection of the bipartite patent-and-inventor network onto inventors yields the co-inventor network (Balconi et al. 2004) associated with this cohort. Going further than using a simple projection as was done in previous work (Balconi et al. 2004; Singh 2005), we introduce edge weights \(w_{ij}\) reflecting the frequency and intensity of the co-inventing activity (Newman 2001b) between two inventors i and j:

$$ w_{ij} = \sum\limits_{\alpha} \frac{\delta_{i}^{(\alpha)}\, \delta_{j}^{(\alpha)}}{n_{\alpha} - 1}\,. \tag{1} $$

The sum in Eq. (1) is over all patents in the chosen cohort that list more than one inventor, and \(\delta _{i}^{(\alpha)} = 1\) (\(\delta _{i}^{(\alpha)} = 0\)) if inventor i is (not) among the \(n_{\alpha}\) inventors listed on the particular patent \(\alpha\). In the following, we restrict ourselves to considering only the largest connected component (LCC) of the thus-constructed co-inventor network, which can be expected to capture the most relevant aspects of inventors’ connectedness (Newman 2001a; Breschi and Lissoni 2009). Amounting to approximately 31% of the total co-inventor network by number of nodes, the LCC comprises a diverse assortment of inventors as evidenced by the spread of firms their patents are assigned to. For example, IBM, Hitachi LTD, Motorola INC, Kabushiki Kaisha Toshiba, and Texas Instruments represent 14%, 7.2%, 4.6%, 4.1%, and 4.0% of nodes in the LCC, respectively. Other companies each account for portions smaller than 4% of the total in the component. Table 2 provides an overview of relevant summary statistics pertaining to the patent cohort and co-inventor network considered here.
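The construction of the weighted co-inventor network and its largest connected component can be illustrated with a minimal sketch. It uses only the Python standard library; the input format (a dictionary from patent id to inventor list), the function names, and the toy inventors are hypothetical and are not taken from the data set or tooling used in this study.

```python
from collections import defaultdict
from itertools import combinations


def coinventor_weights(patents):
    """Return {(i, j): w_ij} with i < j, following Eq. (1).

    `patents` maps a patent id to the list of its inventors (hypothetical
    input format). Each patent with n > 1 inventors adds 1 / (n - 1) to
    the weight of every pair of its inventors.
    """
    weights = defaultdict(float)
    for inventors in patents.values():
        team = sorted(set(inventors))
        n = len(team)
        if n < 2:
            continue  # single-inventor patents do not create edges
        for i, j in combinations(team, 2):
            weights[(i, j)] += 1.0 / (n - 1)
    return dict(weights)


def largest_connected_component(weights):
    """Return the set of inventors in the largest connected component."""
    neighbours = defaultdict(set)
    for i, j in weights:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first search collecting everything reachable from `start`.
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy cohort (invented for illustration only).
patents = {
    "P1": ["ann", "bob", "cho"],  # n = 3, each pair gains 1/2
    "P2": ["ann", "bob"],         # n = 2, the pair gains 1
    "P3": ["dee", "eve"],         # separate two-inventor component
    "P4": ["fay"],                # single inventor, ignored
}
w = coinventor_weights(patents)
lcc = largest_connected_component(w)
```

In this toy cohort the pair (ann, bob) collaborates on two patents and therefore receives a larger weight than pairs that appear together only once on a three-inventor patent, which is the behaviour Eq. (1) is designed to capture.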

Table 2 Summary statistics of patent data utilized for this study

We have used five established community-detection algorithms to analyze the LCC of the co-inventor network: Greedy (Newman 2004), Louvain (Blondel et al. 2008), Infomap (Rosvall and Bergstrom 2008), Random Walks (Pons and Latapy 2006), and Propagating Labels (Raghavan et al. 2007). Although different algorithms generally yield different community structures, clear similarities are exhibited between the structures obtained by conceptually related approaches such as Greedy and Louvain on the one hand, and Infomap, Random Walks, and Propagating Labels on the other. In particular, we observe the well-known resolution-limit issue (Fortunato and Barthélemy 2007) where the communities delineated by approaches that maximise modularity (Newman and Girvan 2004) (Greedy and Louvain) are typically larger in size and generally subsume the many smaller communities identified by other approaches (Infomap, Random Walks, Propagating Labels). This may be inferred graphically from visualizations of community structures, such as those shown in Fig. 1. A more quantitative comparison is possible based on the size distributions for communities obtained by application of each of the five different detection algorithms to the LCC, given in Fig. 2, and the corresponding community-structure-related summary statistics provided in Table 3. We also use the adjusted Rand index (ARI) (Hubert and Arabie 1985) to measure similarity between community structures generated by different algorithms as well as those from different runs of the same algorithm. ARI scores close to 1 are obtained when comparing the results of multiple runs of the same algorithm on the LCC of the co-inventor network, indicating the generally very good or, for Propagating Labels, at least satisfactory robustness of each method. The ARI value of 0.8 found when comparing the community structures generated from the Greedy and Louvain algorithms attests to their high degree of similarity. 
A much weaker similarity is exhibited between the partitions arising from the Infomap and Propagating-Labels methods, as well as between those of Greedy/Louvain and Random Walks. All other pairwise comparisons yield very small ARI scores of 0.2 or below.
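To make the partition comparison concrete, the following sketch computes the adjusted Rand index from two label lists using the standard pair-counting formula of Hubert and Arabie (1985). The function name and the toy labelings are hypothetical; this is an illustration of the measure, not the implementation used to produce the values quoted above.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two partitions given as equal-length label lists.

    Position k of each list holds the community label of node k in the
    respective partition. Label names themselves are irrelevant.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no node pairs to compare
    # Pairs of nodes grouped together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are all singletons
    return (together_both - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # identical up to relabeling
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # maximally mismatched pairs
```

Identical partitions give a score of 1 regardless of how the communities are named, whereas unrelated partitions give scores near 0 and can even be negative, which is why values of 0.2 or below indicate little common structure.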

Fig. 1

Community structure within the co-inventor network. The upper (lower) panel indicates communities in the largest connected component of the co-inventor network identified by the Louvain (Infomap) algorithm using different colors (but with some colors being reused in the lower panel due to the overall large number of communities yielded by Infomap). Note how certain groups of communities identified as separate by Infomap are clustered together by Louvain. The size of the circle indicating a node is proportional to the number of edges attached to that node. Images created using Gephi (Bastian et al. 2009)

Fig. 2

Size distributions of inventor communities found by different algorithms. Shown here are distributions for the number of nodes within the communities identified on the largest connected component of the co-inventor network by the Greedy, Louvain, Infomap, Random-Walks, and Propagating-Labels algorithms. Note the different bin widths: 50 (5) nodes in the plots in the first (second, third) row. For Propagating Labels (Random Walks), one community with 262 nodes (twelve communities with 163, 169, 172, 175, 194, 222, 346, 387, 393, 583, 656, and 1 060 nodes, respectively) is (are) outside the displayed range. See also Table 3 for community-structure-related summary statistics

Table 3 Comparison of inventor-community structures found by different algorithms

Having identified inventor communities in the LCC of the co-inventor network, we track the citations acquired by patents from the chosen cohort over a ten-year period from each individual patent’s time of grant. Following the usual convention (Hall et al. 2002; Mehta et al. 2010), we assign the time of citation to be the application date of the citing patent. Hence, citations to patents in the cohort considered here can only originate from patents applied for before 2010. All citation-related data are sourced directly from the USPTO (USPTO Bulk Data Storage System 2016). For the purpose of the present work, we specifically consider the time Δt1 elapsed after each patent’s time of grant until it acquires its first citation (Gay et al. 2005; Lee and Sohn 2017) and determine this citation to be either a self-citation, an in-community citation, or an out-of-community citation based on the previously obtained community structure. Here a self-citation occurs if any of the inventors listed on the citing patent is also listed as an inventor on the cited patent, which is the case independently of any community structure. An in-community citation is a non-self-citation from a patent where at least one of the listed inventors shares membership in one of the algorithmically identified communities with at least one of the inventors listed on the cited patent. An out-of-community citation is from a patent whose listed inventors belong to different communities than the inventors of the cited patent. To focus most precisely on how community structure affects the knowledge flow through the co-inventor network, we only consider citations from patents that have at least one of their listed inventors belonging to the LCC of the co-inventor network defined via the cohort of cited patents (see footnote 2). Table 2 provides citation-related summary statistics for the analyzed patent cohort.

Distributions of time lags Δt1 to first citation obtained for each subset of citation type (self, in-community, and out-of-community) are juxtaposed in Fig. 3. Their distinctive properties are discussed in greater detail in the next section. Note that, because the time of citation is the application date of the citing patent, and the time lag Δt1 is measured from the grant date of the cited patent, negative values of Δt1 are possible and, in fact, occur quite frequently. That patents have often already accrued citations by the time of grant from other patents that were applied for before that date but granted afterwards is a well-known feature of patent-citation dynamics (Mehta et al. 2010).
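The classification rules and the sign convention for Δt1 can be sketched as follows. The data layout (a dictionary `community_of` from LCC inventors to community labels, and citations as pairs of application date and inventor list), the function names, and the whole-month resolution of the lag are assumptions made for illustration; the ten-year citation window is not enforced in this sketch.

```python
from datetime import date


def classify_citation(citing_inventors, cited_inventors, community_of):
    """Label one citation as 'self', 'in-community', or 'out-of-community'."""
    citing, cited = set(citing_inventors), set(cited_inventors)
    if citing & cited:
        return "self"  # a shared inventor, independent of any community
    citing_groups = {community_of[i] for i in citing if i in community_of}
    cited_groups = {community_of[i] for i in cited if i in community_of}
    if citing_groups & cited_groups:
        return "in-community"
    return "out-of-community"


def lag_in_months(grant_date, application_date):
    """Signed lag in whole calendar months; negative if applied before grant."""
    return ((application_date.year - grant_date.year) * 12
            + application_date.month - grant_date.month)


def first_citation(grant_date, cited_inventors, citations, community_of):
    """Return (lag in months, citation type) for the earliest eligible citation.

    `citations` is a list of (application_date, citing_inventors) pairs.
    Only citing patents with at least one inventor in `community_of`
    (i.e. in the LCC) are eligible; returns None if there are none.
    """
    eligible = [c for c in citations if any(i in community_of for i in c[1])]
    if not eligible:
        return None
    applied, citing = min(eligible, key=lambda c: c[0])
    return (lag_in_months(grant_date, applied),
            classify_citation(citing, cited_inventors, community_of))


community_of = {"ann": 1, "bob": 1, "cho": 1, "zed": 2}  # toy partition
citations = [
    (date(1998, 2, 1), ["cho"]),       # in-community, but later
    (date(1997, 3, 15), ["zed"]),      # earliest eligible citation
    (date(1996, 1, 1), ["outsider"]),  # not in the LCC, so ignored
]
result = first_citation(date(1997, 6, 10), ["ann", "bob"], citations, community_of)
```

In the toy example the earliest eligible citing patent was applied for three months before the cited patent's grant date, so the lag is negative, mirroring the situation described above.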

Fig. 3

Time to first citation differentiated according to origin of the first citation. The empirically obtained distributions of the time lag Δt1 until first citation are shown for the three different categories of first citations considered in this work: self-citations, in-community citations, and out-of-community citations. Results shown here for the latter two types of citation have been obtained based on the community structure identified by the Infomap algorithm on the largest connected component of the co-inventor network. The bin width is 2 months

Results and discussion

As is apparent from the examples shown in Fig. 3, distributions of time lags to first citation are generally broad and skewed. We find that log-normal distributions provide a successful fit to their line shapes, except for the distinctive peak for the Δt1=0 bin exhibited by the time-lag distribution of first citations that are self-citations. Further research is needed to elucidate the origin of the inflated probability for a zero time lag for self-citations; we can only speculate at this point that it could be the result of larger firms’ patenting strategies.

The analysis of time-lag distributions for in-community vs. out-of-community first citations enables us to discuss the influence of community structure on citation dynamics, for each set of communities identified by the five different algorithms used in this work. Specifically, we consider distributional averages (mean and median values), as well as the most probable value (mode). Table 4 gives the results obtained for these by two different methods: direct calculation using the empirical data for the time-lag distributions, and derivation from the parameters of the fitted log-normal distributions. For the mean and median values of time lags for in-community first citations, there is excellent agreement, within uncertainties, between the two approaches for all five community structures considered. Similarly good agreement exists between the medians and modes of time lags for out-of-community first citations. The disagreement between the modes calculated directly and from the log-normal fit for in-community first-citation time lags obtained based on the Infomap and Propagating Labels algorithms is due to the slightly inflated probability for the Δt1=0 bin exhibited in these cases (see Figs. 3 and 4). The magnitude of the inflated peak at Δt1=0 for in-community first-citation time lags is generally comparable to the size of statistical fluctuations and overall much smaller than that observed in the time-lag distribution of first self-citations. The consistently higher mean obtained for the out-of-community first-citation time lag using the raw data as compared with that derived from the log-normal-distribution fit parameters can be traced to the fact that the fit systematically underestimates probabilities in the tail of the time-lag distribution for this type of citation. Overall, the magnitudes of the means and medians found in this work for the first-citation time lag agree very well with those reported previously in the literature (Trajtenberg 1990; Gay et al. 2005; Fisch et al. 2017; Lee and Sohn 2017).
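For reference, the characteristic values of a log-normal distribution follow directly from its fit parameters. The relations below are the standard ones for the two-parameter log-normal, where \(\mu\) and \(\sigma\) denote the mean and standard deviation of \(\ln \Delta t_{1}\); these symbols, and the assumption that no offset was applied to the time lag before fitting, are introduced here for illustration only, as the parametrization of the fit is not specified in the text.

$$ \text{mean} = e^{\mu + \sigma^{2}/2}, \qquad \text{median} = e^{\mu}, \qquad \text{mode} = e^{\mu - \sigma^{2}} . $$

For any \(\sigma > 0\) these satisfy mode < median < mean, the ordering expected for a right-skewed distribution. If a constant offset was added to the lags before fitting, it would have to be subtracted again from all three values.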

Fig. 4

In-community first-citation time lag distributions for different community structures. The distributions of the time lag Δt1 until first citation when this is an in-community citation are shown for the different community structures on the largest connected component of the co-inventor network obtained by the Greedy, Louvain, Random-Walks, and Propagating-Labels methods

Table 4 Community-related differences in the time lag Δt1 until first citation

The specific community structure determined for the co-inventor network is observed to depend sensitively on the employed algorithm. The Greedy and Louvain algorithms yield a comparatively small number of larger communities, whereas the Infomap and Propagating-Labels algorithms identify mostly much smaller communities. The community structure obtained using Random Walks lies somewhat in between these two extremes. See Fig. 2 and Table 3. Interestingly, the distributional properties of the time lag for in-community first citations, and even more so those pertaining to out-of-community first citations, are found to be quite similar irrespective of the significant differences between the underlying inventor-community structures. Figure 4 shows the distributions of time lags for in-community citations based on the community structures obtained from running the Greedy, Louvain, Random-Walks, and Propagating-Labels algorithms on the LCC of the co-inventor network. Corresponding results for the community structure identified by the Infomap algorithm are given in Fig. 3. The distributions of the time lag for out-of-community first citations obtained based on the communities identified by the five different algorithms are visually barely distinguishable and therefore not shown. This suggests that algorithms producing larger communities have generally agglomerated parts of the co-inventor network that are separate communities as far as citing behavior is concerned. Hence, our study provides another real-world verification of the resolution limit (Fortunato and Barthélemy 2007) associated with algorithms based on optimizing modularity.

Comparison of the median and mean values for the time lags to first citation found for the in-community and out-of-community types, respectively, shows systematic differences. See Table 4. In particular, the median time lags to in-community first citations are about 2-3 months shorter than the corresponding medians for out-of-community first citations. The difference between the means of the time lag for in-community and out-of-community first citations is about 2-5 months. Given that we have analyzed time lags to first citation for patents whose inventors are all connected by prior co-authorship of patents, this indicates a strong influence of community structure on patent-citation dynamics and the associated knowledge flow. Further comparison with the mean, median, and modal values found for the time lag to first citations that are self-citations is very instructive. See Table 5. The mean and median calculated using the raw data, including the inflated peak at Δt1=0, are found to be essentially the same as for in-community citations. Hence, on average, in-community first citations and first-self-citations occur after similar time periods that are, again on average, shorter by about 3 months than the time lag for out-of-community first citations. However, the distributions of time lags for the in-community and self types of first citations differ significantly. This can be illustrated by analyzing the adjusted distribution of time lags for self-citations where the inflated probability at Δt1=0 has been replaced with an interpolated value, or with the peak removed entirely. The mean, median and modal values found for this modified distribution agree almost exactly with those found for out-of-community citations. See again Table 5. This interesting combination of properties invites further investigation. 
A tentative speculation could be made that both the out-of-community type of first citation and the first-self-citations outside the Δt1=0 peak originate largely from examiners, whereas the in-community type and the self-citations from the inflated Δt1=0 peak are generally made by inventors themselves. As inventor and examiner citations have been separately identified on USPTO patents granted since 2001, a follow-up study utilizing a large-enough cohort of post-2001 patents with sufficiently long time window for citation accrual should soon be possible.

Table 5 Distributional properties of the time lag until first citation when this is a self-citation

We employed Welch’s t-test (Welch 1947) to establish whether the difference between the means of the citation lag for in-community and out-of-community citations is statistically significant. Results are summarized in Table 4. The t-scores, ranging between 6.6 and 8.7, indicate that the differences in the means of citation lags are highly statistically significant. Although the t-test is expected to yield reliable scores for non-normal distributions (Lumley et al. 2002), we also applied it to the distribution of the logarithm of the time lag to first citation (after shifting the latter by a constant to eliminate negative values), which is normal to a good approximation. In this case too, similarly large t-scores reject the possibility of the means being equal with high confidence.
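For readers unfamiliar with the test, a minimal sketch of Welch's unequal-variance t statistic, together with the Welch-Satterthwaite estimate of the degrees of freedom, is given below. The helper for the shifted logarithm takes the shift as an explicit argument because its value is not stated in the text; all function names and the toy samples are hypothetical.

```python
from math import log, sqrt
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test.

    t is positive when sample_a has the larger mean.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def shifted_log(lags, shift):
    """Logarithm of the lags after adding a constant `shift` (an assumed value)."""
    if min(lags) + shift <= 0:
        raise ValueError("shift must make every lag strictly positive")
    return [log(x + shift) for x in lags]


# Toy samples (invented numbers, in months).
out_of_community = [2, 4, 6, 8, 10]
in_community = [1, 2, 3, 4, 5]
t, df = welch_t_test(out_of_community, in_community)
t_log, df_log = welch_t_test(shifted_log(out_of_community, 1.0),
                             shifted_log(in_community, 1.0))
```

Unlike Student's t-test, this statistic does not pool the two sample variances, which matters here because the in-community and out-of-community samples differ greatly in size and need not share a common variance.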

The high t-values generated in our application of the Welch test are likely a consequence of the very large sample sizes (1415 data points from in-community citations, 12 272 from out-of-community citations) that increase sensitivity to small differences. To illustrate the effect sample size has on the outcome of the t-test, we independently drew a random subset of 300 citations from each of the in-community and out-of-community datasets and performed Welch’s t-test on these small samples. This was repeated 500 times to obtain averages of such t-values, which are also listed in Table 4. In the case of the Infomap-generated community partition, the mean t-value resulting from this procedure is 2.9, and 83% of the small-sample simulations find the difference between the distributions’ means to be significant at the 5% level (α = 0.05). This indicates that sample sizes of at least 300 first citations are needed to detect with sufficient confidence the time delay reported here between those originating from within and outside of co-inventor communities.
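The subsampling procedure can be sketched as follows. The lag values below are synthetic draws from normal distributions with made-up parameters (only the two sample sizes are taken from the text), the critical value of 1.96 is the large-sample normal approximation to the two-sided 5% threshold, and the function names are hypothetical. The sketch therefore illustrates the procedure rather than reproducing the numbers in Table 4.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for mean(sample_a) - mean(sample_b)."""
    se2 = variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / sqrt(se2)


def subsampled_welch(in_lags, out_lags, size=300, repeats=500,
                     critical=1.96, seed=0):
    """Repeat Welch's test on random subsamples drawn without replacement.

    Returns (mean t over all repeats, share of repeats with |t| > critical).
    `critical` is an assumed normal-approximation threshold for alpha = 0.05.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        sub_in = rng.sample(in_lags, size)
        sub_out = rng.sample(out_lags, size)
        scores.append(welch_t(sub_out, sub_in))  # positive if out-lags are longer
    share = sum(abs(t) > critical for t in scores) / repeats
    return mean(scores), share


# Synthetic lags in months; distribution parameters are invented.
data_rng = random.Random(1)
in_lags = [data_rng.gauss(21.0, 15.0) for _ in range(1415)]
out_lags = [data_rng.gauss(24.0, 15.0) for _ in range(12272)]
mean_t, share_significant = subsampled_welch(in_lags, out_lags)
```

Fixing the seed makes a run reproducible; varying it gives a sense of how much the mean t-value and the share of significant subsamples fluctuate between repetitions of the whole procedure.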

As further evidence for the inventor-community-related cause of acceleration in patent-citation dynamics, we performed the following control experiment. Starting with the Infomap-generated community partition of the LCC, inventors were randomly re-assigned to communities within this given structure. As a result, the high-level morphology (number of communities and their individual sizes) of the partition was left unchanged, but the groupings of inventors within each community became completely random, as indicated by the ARI value of 0.0002 obtained when comparing the initial and randomized partitions. Repeating the analysis of first citations, we observe that the total number of in-community citations occurring in the randomized community structure has dropped precipitously to 129, i.e., about 10% of the in-community citations present in the original structure. Furthermore, the mean (median) time to a first in-community citation in the randomized structure is determined to be 23.9 months (18 months), which is no longer significantly earlier than the corresponding value of 24.4 months (19 months) found for the out-of-community first citations. The Welch’s t-test score of 0.24 also indicates that there is no significant difference between the means of the distributions of times to first in-community and out-of-community citations in the situation with randomized inventor assignment to communities. The significantly reduced total number of in-community citations, and the disappearing difference between mean times to first citations that originate from within and outside of communities, both convincingly indicate that the originally established inventor communities are indeed the platforms for accelerated patent-citation dynamics.
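One simple way to realize such a size-preserving randomization is to permute the community labels among the inventors, as in the sketch below. This is an assumed implementation of the control described above, with a hypothetical function name and toy partition; the exact procedure used in the study is not specified beyond the description given.

```python
import random
from collections import Counter


def shuffle_memberships(community_of, seed=0):
    """Randomly re-assign inventors to communities, keeping all community sizes.

    `community_of` maps each inventor to a community label. The labels are
    permuted among the inventors, so every community keeps its size but
    receives a random set of members.
    """
    rng = random.Random(seed)
    inventors = sorted(community_of)
    labels = [community_of[i] for i in inventors]
    rng.shuffle(labels)
    return dict(zip(inventors, labels))


# Toy partition with communities of sizes 3, 2, and 1.
community_of = {"ann": 1, "bob": 1, "cho": 1, "dee": 2, "eve": 2, "fay": 3}
randomized = shuffle_memberships(community_of, seed=42)
sizes_before = Counter(community_of.values())
sizes_after = Counter(randomized.values())
```

Because only the assignment of labels changes, any difference in citation lags that survives the shuffle would have to stem from the size distribution of the communities rather than from who actually collaborated with whom.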

We close this section by discussing several confounding variables whose influence may weaken our inference of a general non-trivial community-structure effect on citation dynamics. Inventor team size: All other things being equal, inventors working in larger teams will have a greater chance of acquiring in-community first citations to their patents, which would also be likely to occur more quickly. However, the scope for this trivial mechanism causing the observed effect is severely limited by the fact that large inventor teams are actually quite rare. See, e.g., the data presented in Crescenzi et al. (2016). The average community size found in our work certainly exceeds the average inventor-team size given for the period 1995-1999 in Fig. 4 of that article. Firm-level associations: Certain inventor communities may just be a reflection of the inventors’ employment at the same firm and, in their case, intra-firm information channels acting in parallel to prior co-invention activity may drive the observed acceleration in the in-community citation dynamics. However, for large firms, especially those with multiple geographically separated R&D centers, this alternative mechanism could be ineffective. Disentangling trivial firm-association effects from knowledge flows established via real inventor collaboration would be an interesting direction for future research. Niche-technology associations: The patent cohort studied in this work relates to the broad and technologically crowded semiconductor industry, and our detected communities may correlate with very specific technology types. The fact that citations are likely to come first from inventors in the same community could then arise simply because inventors that are working in the same industrial niche are likely to build on each other’s technological advances before inventors working in technologically more distant fields.
To clarify this issue, the distributions of technology specializations across the respective in-community and out-of-community citing-patent cohorts would need to be studied in greater detail using a suitable proxy measure for technological similarity that could be defined either within (Trajtenberg et al. 1997) or beyond (Kay et al. 2014) existing classification schemes.

Conclusions


We have investigated community structure on the co-inventor network associated with a particular cohort of USPTO patents where edge weights reflect the frequency of inventors’ collaborative patenting activity. Five established community-detection algorithms (Greedy, Louvain, Infomap, Random Walks, and Propagating Labels) were deployed to identify communities on this network’s largest connected component. The sizes and numbers of communities found by the different algorithms varied, with some similarities exhibited by algorithms using related methodology. Table 3 and Fig. 2 provide details enabling a quantitative comparison between the properties of the algorithmically detected community structures.

To investigate the effect of inventor communities on patent-citation dynamics, we analyzed the time lag to the first citation received by the patents associated with inventors from the largest connected component of the co-inventor network. Only citations originating from patents co-authored by at least one of the inventors from the largest connected component were counted for the present study. Three different types of first citation were distinguished: self-citations, in-community (non-self-)citations, and out-of-community citations. The distributions of time lags for each type of first citation were observed to have distinctive properties. Figure 3 shows results obtained based on the community structure found by the Infomap algorithm. The mean, median, and modal values for each type of distribution were determined to enable a quantitative comparison of the speeds of knowledge flow within and between different inventor communities. Results for all community structures investigated in the present work are summarized in Table 4. The median time delay to first citation for the out-of-community type turns out to be typically 3 months longer than for the in-community type. Self-citations and in-community-type citations have approximately the same median time lag, even though the distributions of time lags for these two types of first citation are markedly different. The difference between the mean time lag observed for out-of-community and in-community first citations is generally even larger than that found for the corresponding median values. Although the communities identified by the different detection algorithms utilized in this work differ in some detail, the observed influence of community structure on patent-citation dynamics was found to agree closely, even on a quantitative level. Thus our results provide a rather general quantification for the accelerated knowledge flow through inventor communities formed via collaborative patenting. 
Furthermore, the observation that association with distinct inventor communities based on previous co-authorship on patents results in faster citation of an inventor’s later patents by members of that community provides strong further evidence in support of the fundamental importance of such collaboration-based social connections (Singh 2005; Agrawal et al. 2006; Breschi and Lissoni 2009; Breschi and Lenzi 2016), which could not always be clearly observed in previous work (Stolpe 2002).

Our focus on the time lag to patents’ first citation was motivated by the expectation that this quantity will likely be the best proxy measure for a real difference in time scales for knowledge propagation within and outside of co-inventor communities. It would be interesting to also compare the mean time lags between later (i.e., second, third, …, nth) in-community and out-of-community citations. Such a study should be able to observe how the in-community advantage for knowing earlier about an invention diminishes over its repeated utilization.

The results of our work point the way to other interesting directions for future research. For example, analyzing the characteristics of the algorithmically identified inventor communities could yield useful information regarding the structure of effective innovation teams, extending previous work that focused only on direct collaborations between inventors (Crescenzi et al. 2016). Studying the dependence of the observed acceleration of information flow through inventor communities on the field and type of inventions may yield a measure for the speed at which the knowledge frontier moves in different parts of innovation space. Community-detection methods could also be deployed to elucidate relationships shaping innovation activity beyond the network of inventors, e.g., on the level of firms (Schilling and Phelps 2007) or other organizations. Thus opportunities abound for the useful application of modern network-analysis tools to innovation economics (Acemoglu et al. 2016) and related social-science studies (Hidalgo 2016).


  1. The objectives of a particular investigation will generally determine the choice of cohort. Ours is motivated by three main needs: (i) to leave a sufficiently long time window for citation accrual, (ii) to have a sufficiently large data set to reduce statistical noise, and (iii) to include patents from similar fields that are granted around the same time to minimize variations in the inventors’ work environment.

  2. Excluding citations by inventors outside the LCC amounts to neglecting trivial out-of-community citations that would likely influence the time-lag distribution for this type of citation mostly by shifting weight to larger Δt1 (Singh 2005; Sorenson et al. 2006; Breschi and Lissoni 2009). If at all relevant, this could only further accentuate the differences between characteristic values (mean, median, and mode) for the time lag to first citation found for in-community and out-of-community citations.

  3. Binned into 2-month-wide bins, as shown in Figs. 3 and 4.


  1. Acemoglu, D, Akcigit U, Kerr WR (2016) Innovation network. Proc Natl Acad Sci USA 113:11483–11488.

  2. Agrawal, A, Cockburn I, McHale J (2006) Gone but not forgotten: knowledge flows, labor mobility, and enduring social relationships. J Econ Geogr 6:571–591.

  3. Balconi, M, Breschi S, Lissoni F (2004) Networks of inventors and the role of academia: an exploration of Italian patent data. Res Policy 33:127–145.

  4. Bastian, M, Heymann S, Jacomy M (2009) Gephi: An open source software for exploring and manipulating networks. In: Adar E, Hurst M, Finin T, Glance N, Nicolov N, Tseng B (eds) Proceedings of the Third International AAAI Conference on Weblogs and Social Media, 361–362. AAAI Press.

  5. Blondel, VD, Guillaume J, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech 2008:P10008.

  6. Breschi, S, Lenzi C (2016) Co-invention networks and inventive productivity in US cities. J Urban Econ 92:66–75.

  7. Breschi, S, Lissoni F (2004) Knowledge networks from patent data: Methodological issues and research targets. In: Moed HF, Glänzel W, Schmoch U (eds) Handbook of Quantitative Science and Technology Research, 613–643. Kluwer, Dordrecht.

  8. Breschi, S, Lissoni F (2009) Mobility of skilled workers and co-invention networks: an anatomy of localized knowledge flows. J Econ Geogr 9:439–468.

  9. Chen, P, Redner S (2010) Community structure of the physical review citation network. J Informetrics 4:278–290.

  10. Crescenzi, R, Nathan M, Rodríguez-Pose A (2016) Do inventors talk to strangers? On proximity and collaborative knowledge creation. Res Policy 45:177–194.

  11. Fisch, C, Sandner P, Regner L (2017) The value of Chinese patents: An empirical investigation of citation lags. China Econ Rev 45.

  12. Fortunato, S (2010) Community detection in graphs. Phys Rep 486:75–174.

  13. Fortunato, S, Barthélemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci USA 104:36–41.

  14. Fortunato, S, Hric D (2016) Community detection in networks: A user guide. Phys Rep 659:1–44.

  15. Gao, Y, Zhu Z, Kali R, Riccaboni M (2018) Community evolution in patent networks: technological change and network dynamics. Appl Netw Sci 3:26.

  16. Gay, C, Le Bas C, Patel P, Touach K (2005) The determinants of patent citations: an empirical analysis of French and British patents in the US. Econ Innov New Technol 14:339–350.

  17. Hall, BH, Jaffe AB, Trajtenberg M (2002) The NBER patent citations data file: Lessons, insights and methodological tools. In: Jaffe AB, Trajtenberg M (eds) Patents, Citations, and Innovations: A Window on the Knowledge Economy, 403–460. MIT Press, Cambridge, MA.

  18. Hidalgo, CA (2016) Disconnected, fragmented, or united? a trans-disciplinary review of network science. Appl Netw Sci 1:6.

  19. Hubert, L, Arabie P (1985) Comparing partitions. J Classif 2:193–218.

  20. Jaffe, AB, de Rassenfosse G (2017) Patent citation data in social science research: Overview and best practices. J Assoc Inf Sci Technol 68:1360–1374.

  21. Jaffe, AB, Trajtenberg M, Henderson R (1993) Geographic localization of knowledge spillovers as evidenced by patent citations. Quart J Econ 108:577–598.

  22. Kay, L, Newman N, Youtie J, Porter AL, Rafols I (2014) Patent overlay mapping: Visualizing technological distance. J Assoc Inf Sci Technol 65:2432–2443.

  23. Kogut, B, Zander U (1992) Knowledge of the firm, combinative capabilities, and the replication of technology. Organ Sci 3:383–397.

  24. Lambiotte, R, Panzarasa P (2009) Communities, knowledge creation, and information diffusion. J Informetrics 3:180–190.

  25. Lee, J, Sohn SY (2017) What makes the first forward citation of a patent occur earlier? Scientometrics 113:279–298.

  26. Li, G-C, Lai R, D’Amour A, Doolin DM, Sun Y, Torvik VI, Yu AZ, Fleming L (2014) Disambiguation and co-authorship networks of the US patent inventor database (1975–2010). Res Policy 43:941–955.

  27. Liu, W, Nanetti A, Cheong SA (2017) Knowledge evolution in physics research: An analysis of bibliographic coupling networks. PLoS ONE 12:0184821.

  28. Lumley, T, Diehr P, Emerson S, Chen L (2002) The importance of the normality assumption in large public health data sets. Annu Rev Public Health 23:151–169.

  29. Mehta, A, Rysman M, Simcoe T (2010) Identifying the age profile of patent citations: new estimates of knowledge diffusion. J Appl Econ 25:1179–1204.

  30. Nakai, K, Nonaka H, Hentona A, Kanai Y, Sakumoto T, Kataoka S, Carreón ECA, Hiraoka T (2018) Community detection and growth potential prediction using the Stochastic Block Model and the long short-term memory from patent citation networks. In: 2018 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), 1884–1888.

  31. Newman, MEJ (2001a) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64:016131.

  32. Newman, MEJ (2001b) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64:016132.

  33. Newman, MEJ (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69:066133.

  34. Newman, MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113.

  35. Pons, P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms Appl 10:191–218.

  36. Raghavan, UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76:036106.

  37. Rosvall, M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105:1118–1123.

  38. Schilling, MA, Phelps CC (2007) Interfirm collaboration networks: The impact of large-scale network structure on firm innovation. Manage Sci 53:1113–1126.

  39. Singh, J (2005) Collaborative networks as determinants of knowledge diffusion patterns. Manage Sci 51:756–770.

  40. Sorenson, O, Rivkin JW, Fleming L (2006) Complexity, networks and knowledge flow. Res Policy 35:994–1017.

  41. Stolpe, M (2002) Determinants of knowledge diffusion as evidenced in patent data: the case of liquid crystal display technology. Res Policy 31:1181–1198.

  42. Tóth, G, Juhász S, Elekes Z, Lengyel B (2018) Inventor collaboration and its persistence across European regions. preprint arXiv:1807.07637.

  43. Trajtenberg, M (1990) A penny for your quotes: patent citations and the value of innovations. RAND J Econom 21:172–187.

  44. Trajtenberg, M, Henderson R, Jaffe A (1997) University versus corporate patents: A window on the basicness of invention. Econ Innov New Technol 5:19–50.

  45. US Patent Classification (2013). Accessed 8 Nov 2018.

  46. USPTO Bulk Data Storage System (2016) Patent Technology Monitoring Team - Custom Bibliographic Patent Data Extract DVD-ROM Available for Download (2014). Version of 2016-09-21. Accessed 8 Nov 2018.

  47. Welch, BL (1947) The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34:28–35.


Acknowledgements

The authors obtained useful insights into companies’ patenting practices from conversations with L. Colombo and F. Natali, discussed community-detection algorithms with S. Marsland and D. O’Neale, and received advice on statistical testing from I. Doonan.


Funding

The work of WD was supported by a Victoria University of Wellington Faculty of Science Strategic Grant (GMS grant no. 217470). Funding from the New Zealand Tertiary Education Commission’s Centres of Research Excellence Fund via Te Pūnaha Matatini is gratefully acknowledged by KWH, MG and UZ.

Availability of data and materials

The datasets supporting the conclusions of this article are available in the Zenodo repository.

Author information




Contributions

KWH, MG and UZ designed the research. WD constructed the co-inventor network and performed the data analysis, with assistance from KWH. All authors contributed to analyzing the results. UZ wrote the paper with input from all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ulrich Zülicke.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Doonan, W., Higham, K.W., Governale, M. et al. Community structure in co-inventor networks affects time to first citation for patents. Appl Netw Sci 4, 17 (2019).


  • Patent citations
  • Community detection
  • Knowledge propagation