- Open Access
Local bow-tie structure of the web
© The Author(s) 2019
- Received: 12 November 2018
- Accepted: 2 April 2019
- Published: 18 April 2019
Social networks often have the graph structure of a giant strongly connected component (GSCC) together with its upstream and downstream portions (IN and OUT), known as a bow-tie structure since a pioneering study on the World Wide Web (WWW). The GSCC, on the other hand, has community structure, namely tightly knit clusters, reflecting how the networks developed over time. By using our visualization method, which combines an enhanced multidimensional scaling (MDS) with force-directed graph drawing for large directed graphs, we discovered that a bow-tie in the WWW usually has clusters. These clusters are locally located mini bow-ties that are loosely connected to each other and together form the GSCC as a whole. To quantify the mutual connectivity among such local bow-ties, we define a quantity that measures how a local bow-tie connects to others in comparison with random graphs. We found a striking difference between the WWW and other social and artificial networks in which a global bow-tie exists, including a nationwide supply-chain network of a million firms in Japan and the dependency network of thousands of symbols in the Emacs LISP programming language. Presumably, the difference comes from the self-similar structure and development of the WWW suggested by others.
Two decades of studies of complex networks have revealed several aspects of complex systems. Many network indices (centralities) have been proposed to understand complex networks quantitatively, and many new methods for community extraction have been proposed and have clarified the inner structure of complex networks. However, there is only one notion characterizing the overall structure of complex networks: the “bow-tie” structure discovered by Broder et al. (2000) when they investigated two AltaVista crawls from 1999 with over 200 million pages and 1.5 billion links (Broder et al. 2000). The bow-tie structure has been actively examined in information science, especially in studies of the topology of the World Wide Web. Beyond the web, the bow-tie structure also plays a vital role in metabolic networks in biology (see, for example, Csete and Doyle 2004; Kitano 2004; Zhao et al. 2006; Kawakami et al. 2016).
Components of some web graphs
Zhu et al. (2008) investigated the Chinese web in 2006 at three levels of a hierarchy: the page level (830 million pages), the host level (17 million hosts), and the domain level (0.8 million domains) (Zhu et al. 2008). They found that the page-level web has a “teapot” structure (a large GSCC, a medium-sized IN, and a small OUT). They also found that the Chinese web becomes increasingly close to a daisy structure as the aggregation level increases from the page level to the host and domain levels. This implies the absence of self-similarity between the page level and the host/domain levels.
Meusel et al. (2014, 2015) investigated the publicly accessible crawl of the web gathered by the Common Crawl Foundation in 2012 (CC12) (Meusel et al. 2014; 2015). Unlike the crawls held by companies such as Google, Yahoo!, Yandex, and Microsoft, CC12 is available to the public; it contains over 3.5 billion web pages and 128.7 billion links. They analyzed CC12 at three levels of aggregation: page, host, and pay-level domain (PLD, one “dot level” above public suffixes). They obtained almost the same results as Zhu et al. (2008). Thus, there is no self-similarity between the page level and the host/PLD levels.
When we study complex networks, it is rarely possible to obtain a complete set of the nodes and edges of the network under study. Thus, we tend to give up on clarifying its overall structure. However, if the network has a self-similar property, we can expect to elucidate the overall structure by studying subgraphs. Although the previous studies explained above suggest the absence of self-similarity, Dill et al. (2002) found that the web exhibits self-similarity; that is, each thematically unified region (for instance, the pages on a site or the pages about a topic) exhibits the same characteristics as the web at large (Dill et al. 2002). This finding implies that the web is thematically self-similar. Thus, we can elucidate the overall structure from thematically unified subgraphs, i.e., communities.
The purpose of this paper, therefore, is to extract communities of networks by using a modern community-extraction method based on the map equation, and to show the self-similar property, i.e., the bow-tie structure of each community, by using visualization techniques and by introducing a new measure that quantifies the local bow-tie structure. In the next section, we explain the datasets used in this paper. We summarize the methods of network analysis used in this article in the “Methods” section. The “Analysis and results” section is the main part of this paper, in which we show the self-similar property, i.e., the bow-tie structure of each community in the web. The last section is devoted to conclusions and discussion.
Basic statistics of the web graph data: Web graph from Google, Web graph from Stanford.edu, Web graph of Berkeley and Stanford, and Web graph from Notre Dame
Basic statistics of the Japanese production network and the Emacs24 LISP network: production network of Japan and LISP symbol relations
To visualize the large-scale structure of a directed graph, we use our graph-drawing method, which is based on so-called force-directed graph drawing (Fruchterman and Reingold 1991) and computes a layout for a large directed graph in an aesthetically pleasing way. Let us briefly explain our algorithm (see Fujita et al. 2016 for more details).
The algorithm has two steps. The first step determines the initial positions of the nodes by using a multidimensional scaling (MDS) algorithm; in this paper, we assume that the initial positions lie in a two-dimensional Euclidean space. In the second step, we perform a physical simulation in which the nodes carry electric charges of the same sign (say, positive) and repel each other pairwise according to Coulomb's law, while the edges are regarded as “springs” that exert attractive forces between adjacent nodes according to Hooke's law. In addition, each edge is magnetized and tends to align with a globally given magnetic field (see, for example, Sugiyama and Misue 1995). Frictional forces are also applied to the nodes so that the physical system relaxes into a quasi-stable configuration, which is the visualization result that we use.
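The second step can be illustrated with a small, self-contained sketch of one integration step. It is not the authors' implementation (that is described in Fujita et al. 2016 and released in the ngraph repository). The force constants, the explicit Euler integrator with velocity damping standing in for friction, the upward field direction `field=(0, 1)`, and the simplified magnetic term, which pulls each edge vector u→v toward the field direction while keeping its length, are all assumptions made for readability. The pairwise Coulomb loop is the naive O(n²) version.

```python
import math


def layout_step(pos, vel, edges, dt=0.05, k_coulomb=1.0, k_spring=1.0,
                rest_length=1.0, k_magnet=0.5, field=(0.0, 1.0), friction=0.9):
    """Advance a toy magnetic-spring layout by one explicit Euler step.

    pos, vel: lists of (x, y) tuples; edges: list of directed (u, v) pairs.
    All constants and the simplified magnetic term are illustrative assumptions.
    """
    n = len(pos)
    force = [[0.0, 0.0] for _ in range(n)]
    # Coulomb repulsion between every pair of like-charged nodes (naive O(n^2)).
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d2 = dx * dx + dy * dy + 1e-9   # softening avoids division by zero
            d = math.sqrt(d2)
            f = k_coulomb / d2
            force[i][0] += f * dx / d
            force[i][1] += f * dy / d
            force[j][0] -= f * dx / d
            force[j][1] -= f * dy / d
    for u, v in edges:
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        d = math.hypot(dx, dy) + 1e-9
        # Hooke's law: pulls u and v together when stretched beyond rest_length,
        # pushes them apart when compressed.
        f = k_spring * (d - rest_length)
        force[u][0] += f * dx / d
        force[u][1] += f * dy / d
        force[v][0] -= f * dx / d
        force[v][1] -= f * dy / d
        # Simplified magnetic term: pulls the edge vector (dx, dy) toward field * d.
        mx = k_magnet * (field[0] * d - dx)
        my = k_magnet * (field[1] * d - dy)
        force[v][0] += mx
        force[v][1] += my
        force[u][0] -= mx
        force[u][1] -= my
    new_pos, new_vel = [], []
    for i in range(n):
        # Friction: damping the velocity lets the system relax to a quasi-stable state.
        vx = friction * (vel[i][0] + dt * force[i][0])
        vy = friction * (vel[i][1] + dt * force[i][1])
        new_vel.append((vx, vy))
        new_pos.append((pos[i][0] + dt * vx, pos[i][1] + dt * vy))
    return new_pos, new_vel


# Usage: a horizontal chain 0 -> 1 -> 2 turns to follow the upward field.
pos = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
vel = [(0.0, 0.0)] * 3
for _ in range(2000):
    pos, vel = layout_step(pos, vel, [(0, 1), (1, 2)])
```

After enough steps, each edge points along the field, so the tail of every edge ends up below its head; this is how the magnetic term makes edge direction visible in the final drawing.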
We call the algorithm DMDS (direction-aware MDS), because it can capture the direction of edges appropriately in the following manner.
We use definition (2) to determine the similarity (or dissimilarity) between nodes in the standard MDS calculation.
Thus, we can determine the initial configuration by using MDS in the first step of the graph layout. In practice, the computational cost of MDS, both in space and time (memory usage and computation time), can be reduced by using a sophisticated method devised by Brandes and Pich (2006).
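A minimal sketch of the pivot-based idea of Brandes and Pich (2006), under stated assumptions: the dissimilarity is the hop distance in an undirected, connected version of the graph (a stand-in, since the direction-aware definition (2) is not reproduced here), node 0 is the first pivot, further pivots are chosen by max-min distance, and k = 3 pivots suffice for the toy example. Only an n × k distance matrix is computed, which is where the savings in memory and time come from.

```python
import math
from collections import deque


def bfs_hops(adj, src):
    """Hop distance from src to every node of an undirected adjacency list."""
    dist = [-1] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def top_eigvec(mat, prev, iters=200):
    """Power iteration for the leading eigenvector of a symmetric matrix,
    restricted to the subspace orthogonal to the vectors in prev."""
    k = len(mat)
    v = [1.0 + j for j in range(k)]
    for _ in range(iters):
        for p in prev:                      # deflation by Gram-Schmidt
            dot = sum(a * b for a, b in zip(v, p))
            v = [a - dot * b for a, b in zip(v, p)]
        w = [sum(mat[r][c] * v[c] for c in range(k)) for r in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:                    # remaining spectrum is (numerically) zero
            return [0.0] * k
        v = [x / norm for x in w]
    return v


def pivot_mds(adj, k=3, dims=2):
    """Pivot MDS layout of a connected graph from hop distances to k pivots."""
    n = len(adj)
    cols = [bfs_hops(adj, 0)]               # first pivot: node 0 (arbitrary)
    nearest = cols[0][:]
    while len(cols) < min(k, n):
        # Max-min selection: next pivot is the node farthest from all pivots so far.
        p = max(range(n), key=lambda v: nearest[v])
        cols.append(bfs_hops(adj, p))
        nearest = [min(a, b) for a, b in zip(nearest, cols[-1])]
    kk = len(cols)
    sq = [[cols[j][i] ** 2 for j in range(kk)] for i in range(n)]
    row = [sum(r) / kk for r in sq]
    col = [sum(sq[i][j] for i in range(n)) / n for j in range(kk)]
    grand = sum(row) / n
    # Double centering of the n x k matrix of squared distances.
    c = [[-0.5 * (sq[i][j] - row[i] - col[j] + grand) for j in range(kk)]
         for i in range(n)]
    # The top eigenvectors of the small k x k matrix C^T C give the layout axes.
    ctc = [[sum(c[i][a] * c[i][b] for i in range(n)) for b in range(kk)]
           for a in range(kk)]
    vecs = []
    for _ in range(dims):
        vecs.append(top_eigvec(ctc, vecs))
    return [tuple(sum(c[i][j] * v[j] for j in range(kk)) for v in vecs)
            for i in range(n)]


coords = pivot_mds([[1], [0, 2], [1, 3], [2, 4], [3]])   # a path of five nodes
```

For the path graph, the first coordinate orders the nodes along the path, which is the kind of initial configuration the subsequent force-directed step then refines.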
Also, because a naive calculation of the Coulomb interaction takes time proportional to the square of the number of nodes, one can employ a well-known algorithm developed in astronomy (Barnes and Hut 1986) that reduces the computation time to O(N log N). Additionally, we used the Phantom-GRAPE library (Tanikawa et al. 2012), which exploits SIMD instructions on x86 processors, to accelerate the computation.
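The following sketch illustrates the Barnes-Hut idea in two dimensions: node positions are inserted into a quadtree, and a distant cell is replaced by its total charge placed at its centroid whenever the cell width divided by the distance is below an opening parameter `theta`. It does not reproduce Phantom-GRAPE's SIMD kernels; `theta`, the depth limit `MAX_DEPTH`, and the softening `eps` are illustrative assumptions.

```python
import math

MAX_DEPTH = 32  # stop subdividing below this depth (e.g. for coincident points)


class Cell:
    """Square quadtree cell holding its total charge (point count) and centroid sums."""

    def __init__(self, cx, cy, half, depth=0):
        self.cx, self.cy, self.half, self.depth = cx, cy, half, depth
        self.idx = []            # point indices, kept only while the cell is a leaf
        self.count = 0
        self.sx = self.sy = 0.0
        self.children = None

    def insert(self, i, pts):
        x, y = pts[i]
        self.count += 1
        self.sx += x
        self.sy += y
        if self.children is None:
            self.idx.append(i)
            if len(self.idx) > 1 and self.depth < MAX_DEPTH:
                self._split(pts)
            return
        self._child_for(x, y).insert(i, pts)

    def _split(self, pts):
        h = self.half / 2
        self.children = [Cell(self.cx + ox * h, self.cy + oy * h, h, self.depth + 1)
                         for ox in (-1, 1) for oy in (-1, 1)]
        old, self.idx = self.idx, []
        for j in old:
            self._child_for(*pts[j]).insert(j, pts)

    def _child_for(self, x, y):
        return self.children[(2 if x >= self.cx else 0) + (1 if y >= self.cy else 0)]


def build_tree(pts):
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Cell((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for i in range(len(pts)):
        root.insert(i, pts)
    return root


def repulsion(cell, i, pts, theta=0.5, k=1.0, eps=1e-9):
    """Approximate Coulomb force (magnitude k/d^2) on point i from all other points."""
    if cell.count == 0:
        return 0.0, 0.0
    x, y = pts[i]
    if cell.children is None:
        fx = fy = 0.0
        for j in cell.idx:            # leaf: exact pairwise terms
            if j != i:
                dx, dy = x - pts[j][0], y - pts[j][1]
                d2 = dx * dx + dy * dy + eps
                f = k / (d2 * math.sqrt(d2))
                fx += f * dx
                fy += f * dy
        return fx, fy
    mx, my = cell.sx / cell.count, cell.sy / cell.count
    dx, dy = x - mx, y - my
    d = math.hypot(dx, dy)
    inside = abs(x - cell.cx) <= cell.half and abs(y - cell.cy) <= cell.half
    if not inside and d > 0 and 2 * cell.half / d < theta:
        # Far cell: treat its total charge as a single charge at the centroid.
        d2 = d * d + eps
        f = k * cell.count / (d2 * math.sqrt(d2))
        return f * dx, f * dy
    fx = fy = 0.0
    for child in cell.children:
        cfx, cfy = repulsion(child, i, pts, theta, k, eps)
        fx += cfx
        fy += cfy
    return fx, fy
```

With `theta=0` no cell is ever approximated, so the result equals the exact pairwise sum; larger `theta` trades accuracy for speed.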
For our purpose, we need a community detection algorithm that can be applied to a large directed graph. We use the well-known Infomap algorithm, first proposed by Rosvall and Bergstrom (2008), which minimizes the so-called map equation, a flow-based objective defined through the dynamics of a random walk on the network. The algorithm works for directed links and can cluster tightly interconnected nodes into modules (two-level clustering) or into the optimal number of nested modules (multilevel clustering) in a hierarchical way (Rosvall and Bergstrom 2011). See the original papers and the references therein. We use the code provided by the original authors of the algorithm.
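To make the objective concrete, the following sketch evaluates the two-level map equation L(M) = q↷H(Q) + Σ_i p↻^i H(P^i) for a given partition of a directed graph, using its expanded form in terms of node visit rates p_α and module exit rates q_i↷. It only scores a partition and does not search for the optimum, which is the job of the Infomap code. The flow model (teleportation with probability τ = 0.15 to a uniformly chosen node, with dangling nodes always teleporting) is an assumption; the released Infomap software offers other flow models, so its numbers may differ.

```python
import math


def plogp(x):
    return x * math.log2(x) if x > 0 else 0.0


def flow_rates(n, out, tau, iters=500):
    """Visit rates of a walker that follows a random out-link with probability
    1 - tau and otherwise (or at a dangling node) teleports uniformly."""
    p = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        lost = 0.0
        for u in range(n):
            if out[u]:
                share = (1 - tau) * p[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:
                lost += (1 - tau) * p[u]
        base = (tau + lost) / n            # teleported mass, spread uniformly
        p = [x + base for x in new]
    return p


def map_equation(n, edges, module, tau=0.15):
    """Two-level map equation L(M) in bits for the partition module[node]."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    p = flow_rates(n, out, tau)
    size, flow, exit_rate = {}, {}, {}
    for a in range(n):
        m = module[a]
        size[m] = size.get(m, 0) + 1
        flow[m] = flow.get(m, 0.0) + p[a]
        exit_rate.setdefault(m, 0.0)
    for a in range(n):
        m = module[a]
        if out[a]:
            leaving = sum(1 for b in out[a] if module[b] != m) / len(out[a])
        else:                              # dangling node: its step is a teleport
            leaving = (n - size[m]) / n
        exit_rate[m] += (1 - tau) * p[a] * leaving
    for m in size:                         # teleport steps that land outside module m
        exit_rate[m] += tau * (n - size[m]) / n * flow[m]
    q = sum(exit_rate.values())
    # L = q log q - 2 sum_i q_i log q_i - sum_a p_a log p_a + sum_i (q_i + p_i) log(q_i + p_i)
    return (plogp(q)
            - 2 * sum(plogp(e) for e in exit_rate.values())
            - sum(plogp(x) for x in p)
            + sum(plogp(exit_rate[m] + flow[m]) for m in size))


# Usage: two directed 10-cycles joined by one link in each direction.
n = 20
edges = [(i, (i + 1) % 10) for i in range(10)]
edges += [(10 + i, 10 + (i + 1) % 10) for i in range(10)]
edges += [(9, 10), (19, 0)]
one = map_equation(n, edges, [0] * n)
two = map_equation(n, edges, [0] * 10 + [1] * 10)
```

A single module reduces L(M) to the entropy of the visit rates; splitting the graph along its two cycles gives a shorter description, which is what Infomap searches for.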
To examine the graph structure of the giant strongly connected component (GSCC) and its upstream and downstream portions (IN and OUT), we used a well-known graph-search algorithm in the following way.
Any pair of vertices in the set Fw(A)∩Bk(A) is connected by some path, as one can easily prove.
We use the well-known breadth-first search algorithm to calculate (3), and then identify the GSCC as the largest SCC found in the search.
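A minimal sketch of this procedure, assuming the graph is given as a list of directed edges (u, v): for a not-yet-assigned node a, Fw(a)∩Bk(a) is the SCC containing a; the largest such set is taken as the GSCC, and IN and OUT are the nodes that reach, or are reached from, the GSCC without belonging to it, following the usual definitions of Broder et al. (2000). Tendrils and tubes are not separated here. This repeated-BFS approach costs O(n(n + m)) in the worst case, whereas Tarjan's algorithm finds all SCCs in linear time.

```python
from collections import deque


def reach(adj, sources):
    """Breadth-first search: every node reachable from any node in sources."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def bow_tie(edges):
    """Return (IN, GSCC, OUT) node sets of a directed graph given as (u, v) edges."""
    fwd, bwd, nodes = {}, {}, set()
    for u, v in edges:
        fwd.setdefault(u, []).append(v)   # forward adjacency for Fw(.)
        bwd.setdefault(v, []).append(u)   # reversed adjacency for Bk(.)
        nodes.update((u, v))
    gscc, unassigned = set(), set(nodes)
    while unassigned:
        a = next(iter(unassigned))
        scc = reach(fwd, [a]) & reach(bwd, [a])   # Fw(a) ∩ Bk(a) is the SCC of a
        unassigned -= scc
        if len(scc) > len(gscc):
            gscc = scc
    out_part = reach(fwd, list(gscc)) - gscc
    in_part = reach(bwd, list(gscc)) - gscc
    return in_part, gscc, out_part


inn, core, outp = bow_tie([(0, 1), (1, 2), (2, 0), (3, 0), (2, 4), (5, 6)])
```

In this example the cycle 0→1→2→0 is the GSCC, node 3 (which only points into it) is IN, node 4 (which is only reached from it) is OUT, and the isolated link 5→6 belongs to none of the three.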
In this section, we present our analysis and point out a feature of the community structure that is particular to the network of the World Wide Web.
Visualization and qualitative analysis
In the upper right inset, we can clearly see that the network has no bow-tie structure as its global feature. The network is a union of tightly connected communities, within which local bow-tie structures are found. These localized bow-ties are actually communities, as can be seen in the lower left inset. To look into the communities, the lower right inset shows several of them, where we can see that each community has its own bow-tie.
Quantitative identification of locality
Summary of the community detection using the Infomap algorithm for the Google web network and the Japanese production network, showing the number of communities, the modularity of the GSCC corresponding to the Infomap partition, and the ratio of the number of GSCC nodes in the subnetworks to that in the original network
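The caption above mentions the modularity of the GSCC under the Infomap partition, but the definition used is not stated here. As an assumption, the sketch below uses the directed modularity of Leicht and Newman, Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j), where m is the number of links; it evaluates the double sum through per-community totals of out- and in-degrees.

```python
def directed_modularity(edges, community):
    """Directed modularity of a partition; community maps each node to a label."""
    m = len(edges)
    inside = 0
    out_total, in_total = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        out_total[cu] = out_total.get(cu, 0) + 1   # sum of out-degrees in community cu
        in_total[cv] = in_total.get(cv, 0) + 1     # sum of in-degrees in community cv
        if cu == cv:
            inside += 1                            # links that stay inside a community
    # sum_ij k_i^out k_j^in delta(c_i, c_j) = sum_c K_c^out K_c^in
    expected = sum(k * in_total.get(c, 0) for c, k in out_total.items()) / m
    return (inside - expected) / m


two_cycles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
q_split = directed_modularity(two_cycles, [0, 0, 0, 1, 1, 1])
q_whole = directed_modularity(two_cycles, [0] * 6)
```

For two disjoint directed triangles, splitting along the triangles gives Q = 0.5, while putting every node in one community gives Q = 0.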
Next, let us dig further into the communities and see how those subnetworks are linked to other parts of the network.
To find the connection characteristics, we developed the bow-tie modularity index as follows, to see how the “IN” and “OUT” segments act.
The bow-tie modularity index is defined for each subgraph. Let G be a graph and Gp a subgraph of G. Let Out(Gp), SCC(Gp), and In(Gp) be the bow-tie segments (if any) of the subgraph Gp.
Let Bf(Gp) be the set of links that bridge Out(Gp) and the outside of Gp in the forward direction (from the viewpoint of Out(Gp)), i.e., links from Out(Gp) to G−Gp. Likewise, let Bb(Gp) be the set of links that bridge In(Gp) and G−Gp in the backward direction, i.e., links from G−Gp into In(Gp). We put the letter f (b) on Bf (Bb) to signify that it is a “forward” (or “backward”) bridge.
Intuitively, Eq. 6 measures the ratio of the outside connectivity through the IN or OUT segment to the inside connectivity of the bow-tie. If this index is small, the bow-tie of the subgraph (or community) concerned is relatively independent of the rest of the network.
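Because Eq. 6 itself is not reproduced here, the sketch below computes only the ingredients defined in the text: the forward bridges Bf(Gp) (links from Out(Gp) to nodes outside Gp) and the backward bridges Bb(Gp) (links from outside Gp into In(Gp)). The function `naive_bridge_ratio` is hypothetical: it divides |Bf| + |Bb| by the number of links inside the bow-tie, which follows the intuition stated above but omits the comparison with random graphs, so it is not the paper's index. The segments In(Gp), SCC(Gp), and Out(Gp) are assumed to have been computed beforehand, for example by a bow-tie decomposition of Gp.

```python
def bridges(edges, members, in_seg, out_seg):
    """Return (Bf, Bb) for a subgraph Gp whose node set is `members`.

    Bf: links from Out(Gp) to nodes outside Gp (forward bridges).
    Bb: links from nodes outside Gp into In(Gp) (backward bridges).
    """
    bf = [(u, v) for u, v in edges if u in out_seg and v not in members]
    bb = [(u, v) for u, v in edges if v in in_seg and u not in members]
    return bf, bb


def naive_bridge_ratio(edges, members, in_seg, scc_seg, out_seg):
    """HYPOTHETICAL ratio: bridges through IN/OUT per link inside the bow-tie.

    This is NOT Eq. 6: the paper's index also compares against random graphs.
    """
    bowtie = in_seg | scc_seg | out_seg
    internal = sum(1 for u, v in edges if u in bowtie and v in bowtie)
    bf, bb = bridges(edges, members, in_seg, out_seg)
    return (len(bf) + len(bb)) / internal if internal else float("inf")


# Gp = {0, ..., 4}: SCC {0, 1, 2}, In {3}, Out {4}; nodes 7, 8, 9 lie outside Gp.
edges = [(0, 1), (1, 2), (2, 0), (3, 0), (2, 4), (4, 7), (8, 3), (9, 1)]
members = {0, 1, 2, 3, 4}
bf, bb = bridges(edges, members, {3}, {4})
ratio = naive_bridge_ratio(edges, members, {3}, {0, 1, 2}, {4})
```

The link 9→1 enters the SCC directly rather than through In(Gp), so it is counted in neither bridge set.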
Average values of the locality index
It is known that the communities of the Japanese production network are closely related to industrial sectors (see Chakraborty et al. 2018). Each industrial sector has a definite economic role within the production network, and that role gives the corresponding collection of nodes an appropriate niche within the whole economic network. For example, the wholesale sector is a customer of the manufacturing sector and a supplier of the retail sector. As we have seen in the Introduction section, the IN and OUT segments of a bow-tie structure work as an interface to the outside of the network. Consequently, the IN or OUT segment of the wholesale sector should be tightly connected to the outside, namely to the communities of the manufacturing and retail sectors. The meaning of “external” depends on the network. For a production subnetwork, the outside is either 1) other parts of the production network beyond that segment or 2) economic agents located outside the production network (such as consumers or governmental organizations).
In the case of the web network, the IN or OUT part has less need to be connected to other parts of the network. Perhaps viewers come directly to the IN segment rather than by following links from other web content, although we have no way to check this supposition so far.
The bow-tie structure was first proposed almost twenty years ago as the overall feature of the World Wide Web. Since then, this structure has served several different purposes, including evaluating the connectivity of directed networks, dividing a network into disjoint parts using connectivity information only, estimating or analyzing the robustness of a network, and analyzing overall network features.
In this study, we discovered that the bow-tie structure of the web network, for which the structure was originally proposed as an overall feature, is actually a locally limited structure within relatively densely connected subnetworks (communities). We found this by using our own two-stage directed-graph visualization method. We named this property “bow-tie locality” and developed an index that evaluates the degree of locality quantitatively. Locality is quantitatively confirmed by the fact that all the web network data housed in the SNAP Datasets (Leskovec and Krevl 2014) show more locality than an economic network (the Japanese production network) or a programming-symbol network (Emacs LISP function definition relations).
Large-scale complex networks generally consist of multiple networks connected together rather than a monolithic structure, which has been studied by various community detection methods. The concept of bow-tie locality is useful for analyzing how these components (communities) are connected with each other; these connections are so diverse that it is often difficult to find a research direction. The Japanese production network, one of the reference datasets in this study, is a complex of heterogeneous communities. In the near future, we will apply bow-tie locality to understand the internal structure of this network and to analyze the relations between industrial sectors.
We would like to thank Hideaki Aoyama for helpful discussions and encouragement. We also thank Atsushi Kawai for his kind support in using Phantom GRAPE and for providing the GRAPE interface.
This study was supported in part by the Project “Large-scale Simulation and Analysis of Economic Network for Macro Prudential Policy” undertaken at Research Institute of Economy, Trade and Industry (RIETI), MEXT as Exploratory Challenges on Post-K computer (Studies of Multi-level Spatiotemporal Simulation of Socioeconomic Phenomena), Grant-in-Aid for Scientific Research (KAKENHI) by JSPS Grant Number 17H02041.
Availability of data and materials
The datasets used and/or analysed during the current study are publicly available from (Leskovec and Krevl 2014) for the web networks. The programming-language symbol definition relation data are available from the corresponding author on reasonable request. The Japanese production network data of this study are available from Tokyo Shoko Research, LTD. (TOKYO SHOKO RESEARCH 2016), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of Tokyo Shoko Research, LTD. The analysis tools (computer programs) of this research are publicly available from the authors' repository https://github.com/fjt/ngraph/.
YF: Visualization algorithm development, algorithm implementation, data acquisition, analysis and interpretation of data. YK: Initial analysis and interpretation of data, modularity analysis. YF: Acquisition of funding, visualization algorithm, data acquisition and important advice for analysis. WS: Acquisition of funding, data acquisition and analysis design. HI: Overall analysis design and important advice on the overall logical structure of research. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Barnes, J, Hut P (1986) A hierarchical O(N log N) force-calculation algorithm. Nature 324(6096):446–449.
- Brandes, U, Pich C (2006) Eigensolver methods for progressive multidimensional scaling of large data. LNCS 4372. https://doi.org/10.1007/978-3-540-70904-6_6.
- Broder, A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2000) Graph structure in the web. Comput Netw 33:309–320.
- Chakraborty, A, Kichikawa Y, Iyetomi H, Iino T, Inoue H, Fujiwara Y, Aoyama H (2018) Hierarchical communities in the walnut structure of Japanese production networks. PLoS ONE 13. https://doi.org/10.1371/journal.pone.0202739.
- Csete, M, Doyle J (2004) Bow ties, metabolism and disease. Trends Biotechnol 22:446–450. https://doi.org/10.1016/j.tibtech.2004.07.007.
- Dill, S, Kumar R, McCurley KS, Rajagopalan S, Sivakumar D, Tomkins A (2002) Self-similarity in the web. ACM Trans Internet Technol (TOIT) 2(3):205–223.
- Donato, D, Leonardi S, Millozzi S, Tsaparas P (2005) Mining the inner structure of the web graph. 8th Int Work Web Database:145–150.
- Donato, D, Leonardi S, Millozzi S, Tsaparas P (2008) Mining the inner structure of the web graph. J Phys A Math Theor 41(22):224017.
- Fruchterman, TMJ, Reingold EM (1991) Graph drawing by force-directed placement. Softw Pract Experience 21(11):1129–1164.
- Fujita, Y, Fujiwara Y, Souma W (2016) Large directed-graph layout and its application to a million-firms economic network. Evol Inst Econ Rev 13(2):397–408. https://doi.org/10.1007/s40844-016-0059-9.
- Kawakami, E, Singh VK, Matsubara K, Ishii T, Matsuoka Y, Hase T (2016) Network analyses based on comprehensive molecular interaction maps reveal robust control structures in yeast stress response pathways. npj Syst Biol Appl 2. https://doi.org/10.1038/npjsba.2015.18.
- Kitano, H (2004) Biological robustness. Nat Rev Genet 5(11):826.
- Leskovec, J, Krevl A (2014) SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. Accessed 12 Oct 2018.
- Leskovec, J, Lang KJ, Dasgupta A, Mahoney MW (2008) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. arXiv:0810.1335.
- Meusel, R, Vigna S, Lehmberg O, Bizer C (2014) Graph structure in the web—revisited: a trick of the heavy tail. In: Proceedings of the 23rd International Conference on World Wide Web, 427–432. ACM. https://doi.org/10.1145/2567948.2576928.
- Meusel, R, Vigna S, Lehmberg O, Bizer C (2015) The graph structure in the web: Analyzed on different aggregation levels. J Web Sci 1(1):33–47.
- Rosvall, M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. PNAS 105(4):1118–1123. https://doi.org/10.1073/pnas.0706851105.
- Rosvall, M, Bergstrom CT (2011) Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE 6(4). https://doi.org/10.1371/journal.pone.0018209.
- Sugiyama, K, Misue K (1995) Graph drawing by the magnetic spring model. J Vis Lang Comput 6:217–231.
- Tanikawa, A, et al. (2012) Phantom-GRAPE: numerical software library to accelerate collisionless N-body simulation with SIMD instruction set on x86 architecture. arXiv:1203.4037.
- TOKYO SHOKO RESEARCH, LTD. (2016). http://www.tsr-net.co.jp. Accessed Dec 2016.
- Zhao, J, Yu H, Luo J-H, Cao Z-W, Li Y-X (2006) Hierarchical modularity of nested bow-ties in metabolic networks. BMC Bioinformatics 7(386). https://doi.org/10.1186/1471-2105-7-386.
- Zhu, JJH, Meng T, Xie Z, Li G, Li X (2008) A teapot graph and its hierarchical structure of the Chinese web. In: Proceedings of the 17th International Conference on World Wide Web, 1133–1134. ACM. https://doi.org/10.1145/1367497.1367692.