A generalized uniform placement of alters on spherical surface (U-PASS) for visualizing general weighted networks
Applied Network Science volume 9, Article number: 37 (2024)
Abstract
Network visualization is an important tool for extracting information from the structure and configuration of a network, especially when the network includes weighted edges and nodes with attribute information. Previous studies have demonstrated an effective visualization method that projects the network onto a spherical surface. In this work, we extend this method, known as Uniform Placement of Alters on Spherical Surface (U-PASS), to general weighted networks. This extension enables the uniform distribution of network nodes across multiple layers of concentric spheres. In addition, we discuss the lower bound of a criterion called generalized spherical cap discrepancy, which is used to evaluate the uniformity of node distribution on a collapsed spherical surface.
Introduction
In contemporary society, networks have become indispensable, playing pivotal roles in both daily activities and various academic disciplines. Their significance lies not only in connectivity but also in their capacity to visualize intricate relationships. This burgeoning field enhances intuitive comprehension of complex network structures, thereby enriching academic research and impacting diverse fields like business, science, and sociology. As network dimensions expand in the era of big data, the need for innovative visualization methods becomes increasingly urgent to address the challenges inherent in large-scale network analysis.
Various methods for network visualization have been developed, but their applicability varies depending on the specific insights sought from the network structure. This article focuses on the challenge of efficiently identifying nodes related to a specific node in a weighted edge network and evaluating the strength of these connections. Our proposed visualization method aims to address this issue, particularly benefiting networks such as patent and citation networks where understanding node relationships is crucial. Current visualization methods, exemplified in Fig. 1 (left), allow node observation but often fail to effectively convey the strength of correlations between nodes. Even approaches like Fig. 1 (right), which use edge thickness to denote correlation strength, do not facilitate quick identification of nodes related to the target node. Hence, new visualization techniques are needed to overcome these limitations and improve the clarity and efficiency of extracting relevant information from network structures.
One significant limitation hindering Fig. 1’s effectiveness in displaying network structure is the constraint of display space. Expanding network visualization into three-dimensional (3D) space, as opposed to two-dimensional (2D) planes, can substantially enhance information conveyance. Current efforts to represent networks in 3D employ two primary approaches. The first involves combining multiple 2D plane networks to create a multilayer structure that illustrates correlations between different layers. For instance, Ahmed et al. (2005) assigned nodes to corresponding planes based on node degrees across layers, applied notably in protein-protein interactions, IEEE InfoVis citations, and collaboration networks. Additionally, Škrlj et al. (2019) developed a Python suite capable of graphically laying out multilayered networks, comparing its efficacy with solutions in other programming languages. However, these methods commonly utilize force-directed algorithms for node allocation, balancing attractive and repulsive forces between nodes but occasionally encountering node overlap issues. Moreover, they typically assume network relationships as simple connections without considering variations in edge strengths.
Another approach gaining traction is to represent networks on the surface of a sphere, for which diverse algorithms have been developed. Two primary methods prevail. The first continues to rely on force-directed algorithms. For example, Schulz (2018) proposed an algorithm to lay out networks on a sphere, projecting them into lower dimensions to assess distances from a selected focal node to others. Among these algorithms, Doubly Stochastic Neighbor Embedding on Spheres (DOSNES) (Lu et al. 2019) is a method based on stochastic neighborhoods that effectively normalizes node distribution in highly imbalanced data. The second method employs the self-organizing map (SOM) algorithm, an unsupervised neural network technique designed to project high-dimensional data into a two-dimensional space while preserving the topology of the original data. Fu et al. (2007) extended SOM to distribute email networks on surfaces, yielding small-world networks but with challenges such as node and edge overlap. These challenges are addressed in part by the two-stage SOM (SOM2) algorithm of Wu and Takatsuka (2006), which enhances clarity at the cost of computational efficiency.
Despite these advancements, a persistent issue remains: node overlap, complicating structural network analysis. A promising solution involves increasing distance between node pairs proportionate to their closeness, akin to minimum energy design principles in experimental design. Building on this concept, our recent study (Huang and Phoa 2023) introduced Uniform Placement of Alters on Spherical Surface (U-PASS), tailored for visualizing ego-centric networks on a spherical surface. Ego-centric networks, characterized by a central node connected to all others, are uniformly distributed across a sphere using a three-step algorithm that maximizes minimum distances between nodes, optimized through particle swarm optimization (PSO). However, real-world network data often transcends ego-centric structures. Allocating all network nodes on a sphere may insufficiently represent network structure, prompting extension to multi-layer spaces, such as the two-layer concentric sphere proposed in Huang and Phoa (2024). Here, nodes connected to a randomly selected ego populate the first layer, while unconnected nodes populate the second.
Central to U-PASS is the imperative of uniform node distribution, linked to minimum energy design (MED) principles as discussed in Huang and Phoa (2023). Evaluating uniformity across methods commonly employs criteria like discrepancy, measuring deviations between empirical and theoretical distributions, critical for validating existing designs and searching for uniform solutions. Various criteria, from star discrepancy to Lee discrepancy, were comprehensively explored in Fang et al. (2018), yet standardized criteria for network node distribution remain lacking.
When considering a domain on a unit sphere \({\mathbb {S}}^2=\left\{ z\in {\mathbb {R}}^3:\Vert z\Vert = 1\right\}\), traditional criteria are no longer directly applicable. Introduced by Aistleitner et al. (2012), the spherical cap discrepancy (SCD) offers a natural criterion for assessing uniformity:
\[SCD_N^C(D)=\sup _{C(w,h)}\left| \frac{1}{N}\sum _{k=1}^N 1_{C(w,h)}(x_k)-\sigma ^{*}(C(w,h))\right| ,\]
where \(D=\left\{ x_1,\dots ,x_N\right\}\) represents an experimental design of N points on the sphere, \(1_{C(w,h)}\) is the indicator function of the cap, C(w, h) denotes a spherical cap with center w and height h, and \(\sigma ^{*}(C(w,h))=\frac{\sigma (C(w,h))}{\sigma ({\mathbb {S}}^2)}\) normalizes the surface area of C(w, h). The best-known lower bound of \(SCD_N^C(D)\), established in Beck (1984), is
\[SCD_N^C(D)\ge c(d)\,N^{-\frac{1}{2}-\frac{1}{2d}}\]
for any N-element set \(D\subset {\mathbb {S}}^d\), where c(d) is a constant dependent only on d. SCD was originally designed for independent points, whereas our context concerns network nodes, which are dependent. Therefore, relying solely on spherical cap discrepancy proves inadequate for evaluating uniformity within a network on a spherical surface. Recognizing this, Huang and Phoa (2024) introduced the generalized spherical cap discrepancy (gSCD) criterion to address these complexities. This criterion accounts for varying node degrees, where nodes with larger degrees typically occupy smaller solid angles, akin to independent points with less weight. The formulation of gSCD is:
\[gSCD_N^C(D)=\sup _{C(w,h)}\left| \frac{1}{N}\sum _{k=1}^N G_k\,1_{C(w,h)}(x_k)-\sigma ^{*}(C(w,h))\right| ,\]
where \(G_j=\frac{1/\beta _j}{\sum _{i=1}^N1/\beta _i}N\) and \(\sum \limits _{i=1}^N G_i=N\). However, the statistical properties of gSCD have yet to be fully explored. Therefore, this work aims to investigate its lower bounds comprehensively.
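As a rough illustration of how such a criterion can be evaluated numerically, the sketch below estimates a weighted cap discrepancy by sampling random caps. It assumes the weighted form in which each point's indicator is multiplied by its weight \(G_k\) before averaging, and it parameterizes a cap by its center w and height h so that a point x lies in the cap when \(\langle w,x\rangle \ge 1-h\), giving a normalized area of h/2. The function names and the Monte Carlo search are illustrative choices, not the authors' implementation, and a random search only gives a lower estimate of the supremum.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a point uniformly on the unit sphere in R^3."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def node_weights(betas):
    """G_j = N * (1/beta_j) / sum_i (1/beta_i); the weights sum to N."""
    inv = [1.0 / b for b in betas]
    total = sum(inv)
    n = len(betas)
    return [n * v / total for v in inv]


def estimate_discrepancy(points, weights, n_caps=2000, seed=0):
    """Largest local discrepancy found over n_caps randomly drawn caps.

    With all weights equal to 1 this reduces to an estimate of SCD.
    """
    rng = random.Random(seed)
    n = len(points)
    worst = 0.0
    for _ in range(n_caps):
        w = random_unit_vector(rng)
        h = rng.uniform(0.0, 2.0)          # cap height (assumed convention)
        threshold = 1.0 - h                # x is in the cap if <w, x> >= 1 - h
        inside = sum(
            g for p, g in zip(points, weights)
            if p[0] * w[0] + p[1] * w[1] + p[2] * w[2] >= threshold
        )
        area_fraction = h / 2.0            # normalized cap area on the unit sphere
        worst = max(worst, abs(inside / n - area_fraction))
    return worst
```

Passing `node_weights(betas)` as `weights` gives the weighted estimate, while passing a list of ones gives the unweighted one, so the two can be compared on the same layout.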
In this work, we propose an extended algorithm derived from U-PASS to uniformly distribute network nodes on a multi-layer concentric sphere, applicable to general networks. Moreover, we introduce a novel criterion to evaluate the uniformity of network nodes on a spherical surface. The notations and definitions used in this study are detailed in Sect. 2. Section 3 outlines our method and discusses the lower bounds of the generalized spherical cap discrepancy. Simulations, result comparisons, and discussions are presented in Sects. 4 and 5, respectively.
Notations and definitions
We use notation similar to Huang and Phoa (2023), with slight modifications. We consider an undirected network G(V, E), where V is the set of nodes and E the set of edges. One node, denoted as \(v_{00}\), is designated as the ego node. The network contains k clusters, \(\left\{ C_1,\dots ,C_k \right\}\), each with sizes \(|C_i|=c_i\) for \(i=1,\dots ,k\).
Nodes within each cluster at distance d from the ego are denoted \(v_{ij}^d\) for \(i=1,\dots ,k\) and \(j=1,\dots ,c_i\). Nodes connected to the ego node at distance d, excluding those within clusters, are denoted \(v_{0j}^d\) for \(j=1,\dots ,N\), where \(N=|V|-\sum _i c_i -1\). \(E_i\) represents the set of edges in cluster \(C_i\) with size \(|E_i|=e_i\) for \(i=1,\dots ,k\). The total number of nodes |V| satisfies \(|V|=\sum \limits _{d=1}|V^d|+1\), where \(V^d\) is the set of nodes at distance d from the ego.
Let A be the \((|V|-1) \times (|V|-1)\) adjacency matrix of \(V\backslash \left\{ v_{00}\right\}\), where \(a_{v_{is}^dv_{jt}^l}=1\) if \((v_{is}^d,v_{jt}^l)\in E\) and 0 otherwise, for \(i,j=1,\cdots ,k\), \(s=1,\cdots ,c_i\) and \(t=1,\cdots ,c_j\). We focus on the network structure under varying edge weights, defining a \(|V|\times |V|\) similarity matrix S, where \(S_{v_{is}^dv_{jt}^l}\) indicates the degree of dependence between network nodes, satisfying \(0\le S_{v_{is}^dv_{jt}^l}\le 1\), \(i,j=0,\cdots ,k\), \(s=1,\cdots ,c_i\) and \(t=1,\cdots ,c_j\).
In practice, nodes \(v_{ij}^d\) and clusters \(C_i\) are represented as spherical caps, where each node \(v_{ij}^d\) resides at the apex characterized by polar angle \(\theta _i^j\) and solid angle \(\Omega _i^j=2\pi (1-\cos \theta _i^j)\). Similarly, each cluster \(C_i\) is characterized by solid angle \(\Omega _i\).
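As a quick check of the solid-angle formula, consider a cap with polar angle \(\theta =\pi /3\) (a value chosen here only for illustration):
\[\Omega =2\pi \left( 1-\cos \frac{\pi }{3}\right) =2\pi \cdot \frac{1}{2}=\pi ,\qquad \frac{\Omega }{4\pi }=\frac{1}{4}.\]
The cap therefore covers one quarter of the unit sphere. Conversely, a cap that should cover a fraction r of the sphere has polar angle \(\theta =\cos ^{-1}(1-2r)\), and \(r=1/4\) gives back \(\theta =\cos ^{-1}(1/2)=\pi /3\).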
To define optimality in uniform node allocation, we first measure the distance \(\phi _{jj'}\) between two points \(v_{ij}^d\) and \(v_{ij'}^d\) on a unit sphere centered at \(v_{00}\), defined as \(\phi _{jj'}=\cos ^{-1}\left( |{\textbf{v}}_{ij}^d \cdot {\textbf{v}}_{ij'}^d|\right)\), where \({\textbf{v}}_{ij}^d\) represents the coordinates of node \(v_{ij}^d\). Uniform allocation is deemed optimal if it maximizes the minimum \(\phi _{jj'}\) value across all nodes \(v_{ij}^d\) and clusters \(C_i\) in G(V, E). Note that adjustments are made to the distance between node pairs within a cluster based on the cluster’s edge density. The conventional Beta index \(\beta =\frac{\text {total number of edges}}{\text {total number of nodes}}\) poses a zero-denominator problem for a single point, leading us to define an adjusted Beta index \(\beta _i=\frac{e_i+c_i}{c_i}\) to quantify the connectivity degree of \(C_i\).
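The two quantities defined above are simple to compute. The following minimal sketch, with function names chosen here for illustration, evaluates the angular distance (including the absolute value used in the definition) and the adjusted Beta index.

```python
import math


def angular_distance(u, v):
    """phi = arccos(|u . v|) for two unit vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))      # guard against rounding outside [-1, 1]
    return math.acos(abs(dot))


def adjusted_beta(num_edges, num_nodes):
    """beta_i = (e_i + c_i) / c_i for a cluster with e_i edges and c_i nodes."""
    return (num_edges + num_nodes) / num_nodes


def min_pairwise_distance(points):
    """Smallest angular distance over all pairs; the quantity to be maximized."""
    return min(
        angular_distance(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
```

Because of the absolute value, two antipodal points have distance 0 under this definition, and the largest possible distance is \(\pi /2\). A single isolated node (one node, no edges) has adjusted Beta index 1.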
Three-stage optimization method
We consider an arbitrary network and select a node of interest as the ego, while all other nodes are treated as alters. Using a pre-determined similarity matrix, we determine the degree of similarity of all alters to the ego. Based on these similarity degrees, we represent the network as an egocentric multi-layer concentric sphere network, assigning nodes to layers according to their similarity to the ego. Nodes with higher similarity are positioned closer to the ego on smaller radius spheres, while nodes with lower similarity are placed on larger radius spheres. The radius of concentric spheres is proportionally determined by the reciprocal of the degree of similarity. Each node \(v_{ij}^d\) is placed on a layer with a radius determined by \(d=\frac{1}{S_{v_{ij}^d,v_{00}}}\). This approach ensures a more accurate representation of the relationships among network nodes, facilitating insightful visualizations. This visualization method resembles an ego-centric network and allows modifications to the U-PASS algorithm from Huang and Phoa (2023) to accommodate this more complex network structure.
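The layer assignment can be sketched in a few lines. The example below maps a similarity to the ego onto a radius via \(d=1/S\) and scales a unit-sphere position out to that layer. The function names are hypothetical, and rejecting a similarity of 0 (which has no finite radius) is a choice made for this sketch rather than something the text specifies.

```python
def layer_radius(similarity):
    """Radius d = 1 / S for a node whose similarity to the ego is S in (0, 1]."""
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must lie in (0, 1] to give a finite radius")
    return 1.0 / similarity


def project_to_layer(unit_point, similarity):
    """Scale a position on the unit sphere out to the node's own layer."""
    d = layer_radius(similarity)
    return tuple(d * c for c in unit_point)
```

A node with similarity 1 stays on the unit sphere, while a node with similarity 0.25 is placed on a sphere of radius 4, so more similar nodes sit closer to the ego.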
To ensure optimal visibility of node distribution within the sphere network, we aim for uniform distribution, whether nodes are on their own layers or projected onto the same layer. Initially, all nodes are evenly distributed on the same layer to prevent overlap, then projected one by one onto their corresponding layers. The algorithm is listed in Algorithm 1, and the details are described below.
Step 1: Allocation of Cluster Center. This step mirrors Step 1 of Huang and Phoa (2023). Each cluster is treated as a spherical cap with a distinct solid angle. The size of the polar angle of each spherical cap is determined by the number of nodes and edges in the cluster, expressed as \(\theta _i=\cos ^{-1}(1-2r_i)\) where \(r_i\) represents the weight proportion of the i-th scatter node. The weight function is defined as \(w(c_i,e_i)=\frac{c_i}{\beta _i}=\frac{c_i^2}{e_i+c_i}\). The objective is to optimize the function \(f(p_i,p_{i^{\prime }})=\min \limits _{i\ne i^{\prime },i,i^{\prime }\in \left\{ 1,\cdots ,k\right\} }\frac{\phi _{ii^{\prime }}}{\theta _i+\theta _{i^{\prime }}}\), where \(\phi _{ii^{\prime }}\) denotes the angle between the cluster centers \(p_i\) and \(p_{i^{\prime }}\), aiming to maximize the angle between the spherical caps. Figure 2 (top left) illustrates the result of allocating the two cluster centers.
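The quantities in Step 1 can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, \(\phi _{ii^{\prime }}\) is taken as the plain angle between cluster centers, and the weight proportion \(r_i\) is assumed to be the cluster weight divided by the total weight of all clusters plus one unit of weight per unclustered alter (since \(w(1,0)=1\)). The optimizer that searches over candidate centers is omitted.

```python
import math


def cluster_weight(c, e):
    """w(c_i, e_i) = c_i / beta_i = c_i^2 / (e_i + c_i)."""
    return c * c / (e + c)


def cap_polar_angles(clusters, n_free_alters=0):
    """theta_i = arccos(1 - 2 r_i) for clusters given as (c_i, e_i) pairs.

    Assumption: r_i = w_i / (sum of cluster weights + n_free_alters).
    """
    weights = [cluster_weight(c, e) for c, e in clusters]
    total = sum(weights) + n_free_alters
    return [math.acos(1.0 - 2.0 * w / total) for w in weights]


def separation_objective(centers, thetas):
    """min over cluster pairs of phi_ii' / (theta_i + theta_i')."""
    best = math.inf
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            dot = sum(a * b for a, b in zip(centers[i], centers[j]))
            phi = math.acos(max(-1.0, min(1.0, dot)))
            best = min(best, phi / (thetas[i] + thetas[j]))
    return best
```

An objective value of at least 1 means that every pair of centers is separated by at least the sum of the two polar angles, so no two caps overlap.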
Step 2: Alter Allocation within Each Cluster. Nodes within each cluster are positioned on different layers based on their similarities to the ego. Initially, all nodes are considered collectively and assumed to be divided into M communities. Each community’s placement range is defined as a circular sector m, with an angle of \(\alpha _m=2\pi w_m(\sum \limits _{m=1}^Mw_m)^{-1}\) as shown in Fig. 2 (top right), where \(w_m\) represents the weight function introduced in Step 1. The outermost nodes are uniformly positioned on the unit sphere within each sector, followed by the allocation of inner nodes to the remaining space, ensuring non-overlapping positions. This uniform distribution process maximizes the angle \(\min \limits _{j\ne j^{\prime }}\phi _{jj^{\prime }}\) between pairs of nodes. Finally, the distances between the nodes and the ego are adjusted, and nodes are projected back onto the corresponding sphere.
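The sector angles in Step 2 follow directly from the community weights. A minimal sketch, with hypothetical function names and with sectors laid out consecutively from angle 0 (an ordering assumed here for illustration):

```python
import math


def sector_angles(weights):
    """alpha_m = 2 * pi * w_m / sum(w) for each community weight w_m."""
    total = sum(weights)
    return [2.0 * math.pi * w / total for w in weights]


def sector_bounds(weights):
    """(start, end) angle of each sector when sectors are laid out in order."""
    bounds = []
    start = 0.0
    for alpha in sector_angles(weights):
        bounds.append((start, start + alpha))
        start += alpha
    return bounds
```

For weights 1, 1 and 2, the sectors span \(\pi /2\), \(\pi /2\) and \(\pi\), which together cover the full \(2\pi\).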
Step 3: Allocation of Remaining Degree-1 Alters. This step closely resembles Step 3 in Huang and Phoa (2023). All nodes that do not belong to any cluster are distributed to the remaining space of the sphere by maximizing the minimum distance between pairs of nodes. Here, we also consider the similarity between alters and the ego. After determining the positions of the nodes, they are projected back onto the sphere where they should exist. Figure 2 (bottom) presents the diagram where all degree-1 alters have the same similarity.
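To make the maximin idea concrete, the sketch below spreads n points on the unit sphere with a simple accept-if-not-worse random search. This is a stand-in for illustration only: the original U-PASS work uses particle swarm optimization, the function names are hypothetical, and the plain angle between points is used as the distance. Cluster caps and similarity-based projection are not modeled.

```python
import math
import random


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def min_pairwise_angle(points):
    """Smallest plain angle between any two points on the unit sphere."""
    best = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dot = sum(a * b for a, b in zip(points[i], points[j]))
            best = min(best, math.acos(max(-1.0, min(1.0, dot))))
    return best


def maximin_random_search(n, iters=2000, step=0.3, seed=0):
    """Perturb one point at a time; keep the move if the minimum angle does not drop."""
    rng = random.Random(seed)
    pts = [normalize(tuple(rng.gauss(0.0, 1.0) for _ in range(3))) for _ in range(n)]
    best = min_pairwise_angle(pts)
    for _ in range(iters):
        i = rng.randrange(n)
        old = pts[i]
        pts[i] = normalize(tuple(c + step * rng.gauss(0.0, 1.0) for c in old))
        score = min_pairwise_angle(pts)
        if score >= best:
            best = score
        else:
            pts[i] = old            # reject the move
    return pts, best
```

The returned minimum angle never decreases over the iterations, so it can be compared directly against the starting layout to see how much the spread improved.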
Generalized spherical cap discrepancy and its lower bound
As reviewed in the introduction, SCD (spherical cap discrepancy) is a widely used criterion for evaluating the uniformity of point distributions on a sphere, and gSCD (generalized SCD) is a new criterion that accounts for the degree of network nodes. Uniformity is critical in the U-PASS method, and a quantitative criterion allows the uniformity of results obtained by different methods to be compared numerically rather than by visual inspection alone. Therefore, it is essential to discuss the nature of this criterion.
In this section, we discuss the lower bound of gSCD. Understanding the lower bound of gSCD is significant because it represents the optimal uniformity of node allocation. By knowing this lower bound, we can determine how far our results are from the optimal outcome. Unlike SCD, gSCD additionally considers the weight of nodes. Since the distribution position of nodes with different weights significantly impacts gSCD, it is not straightforward to directly derive the lower bound of gSCD. However, from a mathematical standpoint, it is reasonable to assume that gSCD should be larger than the value of SCD. Therefore, SCD can serve as a valuable reference point for estimating the lower bound of gSCD. In the simulation, we compared the gSCD of U-PASS (Huang and Phoa 2023) with other approaches mentioned in Huang and Phoa (2023), including SOM (Fu et al. 2007), two-stage SOM (SOM2) (Wu and Takatsuka 2006), Christian Schulz’s method (CS) (Schulz 2018), and DOSNES (Lu et al. 2019).
The simulation initially focused on a network containing 50 nodes, with detailed network structure provided in Appendix A. Following a preferential attachment model, the network was expanded incrementally by adding 50 nodes in each step, resulting in calculations for 10 distinct network sizes. The gSCD was computed for each network size using five different methods, and the results are presented in Fig. 3.
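The growth step can be sketched as below. The text states only that a preferential attachment model was used, so the number of links per new node (`m`), the seeding, and the function name are assumptions made for illustration. New nodes attach to existing nodes with probability proportional to their current degree.

```python
import random


def grow_preferential(edges, n_new, m=1, seed=0):
    """Add n_new nodes, each linked to m distinct existing nodes chosen by degree."""
    rng = random.Random(seed)
    edges = list(edges)
    # Each node appears once per incident edge, so a uniform draw from this
    # list selects a node with probability proportional to its degree.
    stubs = [v for edge in edges for v in edge]
    next_id = max(stubs) + 1
    for _ in range(n_new):
        k = min(m, len(set(stubs)))     # cannot link to more nodes than exist
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((t, next_id))
            stubs.extend((t, next_id))
        next_id += 1
    return edges
```

Calling this repeatedly with `n_new=50` on the seed network would produce a sequence of nested networks of increasing size. Note that nodes with no edges in the seed network are never selected under this scheme.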
As illustrated in Fig. 3, both gSCD and SCD exhibit a declining trend as the number of nodes increases, with comparable rates of decrease. Remarkably, U-PASS consistently demonstrates a lower gSCD, indicating a more uniform distribution of nodes. The advantages of U-PASS are particularly pronounced when the number of nodes is small.
Real data example
In this section, we use real data to demonstrate the visualization capabilities of U-PASS. The dataset comes from the Department of Intellectual Property and Technology Transfer of Academia Sinica’s website, containing 1,094 patents in various fields. For each patent, it includes attributes and a brief abstract describing its purpose and function. Typically, patents can be clustered by attributes such as the developing institute. However, the relationships between these patents have become more complex, making traditional labels less useful for describing cross-field patents and often meaningless for visualization purposes.
To address this challenge, we use the Doc2vec method, a robust technique for converting text to vectors based on each patent’s abstract. We then construct pairwise similarities between patents by computing the Euclidean distance. A heuristic clustering method is applied to the similarity matrix, and U-PASS is used to visualize the relationships between patents. This approach differs from traditional clustering methods that rely solely on predefined attributes, focusing more on the patents themselves. The U-PASS algorithm facilitates easier analysis of the network and helps uncover insights about the patents. The main objective of this study is to enhance the accessibility and practical utility of patent datasets through meticulous and insightful visual representation.
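The step from document vectors to a similarity matrix can be sketched as follows. The text does not state how Euclidean distances are mapped to similarities in [0, 1], so the conversion \(s=1/(1+\text {distance})\) used here is an assumption for illustration, as is the function name. The Doc2vec embedding itself is not shown, and the input is any list of equal-length numeric vectors.

```python
import math


def similarity_matrix(vectors):
    """Pairwise similarities s_ij = 1 / (1 + ||x_i - x_j||), each in (0, 1]."""
    n = len(vectors)
    return [
        [1.0 / (1.0 + math.dist(vectors[i], vectors[j])) for j in range(n)]
        for i in range(n)
    ]
```

Under this mapping, identical vectors have similarity 1 and similarity decreases toward 0 as the distance grows, which matches the range required of the similarity matrix S.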
In subsequent stages, we select a patent of specific interest as the ego node. The remaining patent nodes are allocated to different layers based on their similarity to the ego patent. In the visualization, four nodes (03A-910313, 09A-971225, 10A-971218, and 13A-990614) are used as illustrative examples. As shown in Fig. 4, nodes with varying similarities to the ego are assigned to different layers, with colors indicating the clusters to which the nodes belong. Nodes with lower similarity to the ego are represented with lighter colors. This visualization allows for clear observation and comparison of the similarities between the ego and other nodes through distance, and it also helps in identifying other patents related to the ego within different clusters.
In conclusion, the layer structure of the graph layout vividly illustrates the correlation between alter nodes and the selected ego patent node. This graphical representation not only enhances understanding of the intricate relationships within a patent dataset but also simplifies the identification of patents closest to the ego patent by exploiting discernible distance differences in the visual representation. Overall, this multifaceted approach facilitates a more detailed exploration and interpretation of patent datasets, contributing to a deeper understanding of underlying relationships and patterns.
Conclusion
In this work, we extend the application of the U-PASS method beyond ego-centric networks. Our approach involves placing network nodes on the surfaces of concentric spheres based on their similarity to the selected ego node. This expanded visualization method not only helps identify the relationships between ego nodes and alters but also enhances the clarity of observing cluster structures from different ego perspectives.
Furthermore, our study examined the lower bound of the generalized spherical cap discrepancy (gSCD) criterion. We provide methods for estimating this lower bound, thereby enhancing the evaluation of our algorithm for uniform distributions.
The multi-layered visualization approach proposed in this study enables a nuanced exploration and interpretation of patent datasets, enriching our comprehension of underlying relationships and patterns. By combining advanced visualization techniques with real-world data, our work provides valuable insights for utilizing patent information more effectively.
In future work, we aim to expand the applicability of U-PASS, exploring its utility in network structures that account for directed edges, such as citation networks. The ongoing development of U-PASS will focus on enhancing its performance and scalability, ensuring its adaptability to larger and more intricate network datasets.
Availability of data and materials
The simulation data can be found in the appendix. The patent dataset is generated from the raw data obtained from the Department of Intellectual Property and Technology Transfer, Academia Sinica, Taiwan. The website is https://iptt.sinica.edu.tw/pages/1256.
References
Ahmed A, Dwyer T, Hong S, Murray C, Song L, Wu YX (2005) Visualisation and analysis of large and complex scale-free networks. IEEE VGTC conference on visualization, 239–246
Aistleitner C, Brauchart JS, Dick J (2012) Point sets on the sphere \({\mathbb{S} }^{2}\) with small spherical cap discrepancy. Discrete Comput Geom 48(4):990–1024
Beck J (1984) Sums of distances between points on a sphere - an application of the theory of irregularities of distribution to discrete geometry. Mathematika 31(1):33–41
Fang K, Liu M, Qin H, Zhou Y (2018) Theory and application of uniform experimental designs. Springer
Fu X, Hong S, Nikolov NS, Shen X, Wu YX, Xu K (2007) Visualization and analysis of email networks. In Proceedings of the 2007 6th international asia-pacific symposium on visualization, 1–8
Huang EC, Phoa FKH (2024) An extended uniform placement of alters on spherical surface (U-PASS) method for visualizing general networks. In Proceedings of complex networks & their applications XII. COMPLEX NETWORKS 2023. Studies in computational intelligence, 1144, 291–300. Springer, Cham
Huang EC, Phoa FKH (2023) A uniform placement of alters on spherical surface (U-PASS) for ego-centric networks with community structure and alter attributes. Adv Complex Syst 26(3):2340003
Lee WJ, Lee WK, Sohn SY (2016) Patent network analysis and quadratic assignment procedures to identify the convergence of robot technologies. PLoS ONE 11(10):e0165091
Lu Y, Corander J, Yang Z (2019) Doubly stochastic neighbor embedding on spheres. Pattern Recogn Lett 128:100–106
Schulz C (2018) Visualizing spreading phenomena on complex networks. arXiv preprint arXiv:1807.01390
Škrlj B, Kralj J, Lavrač N (2019) Py3plex toolkit for visualization and analysis of multilayer networks. Appl Netw Sci 4(1):1–24
Wu Y, Takatsuka M (2006) Visualizing multivariate network on the surface of a sphere. In Proceedings of the 2006 Asia-Pacific symposium on information visualisation, 60, 77–83. Australian Computer Society, Inc
Yan B, Luo J (2016) Measuring technological distance for patent mapping. J Am Soc Inf Sci 68(2):423–437
Acknowledgements
ECHH would like to thank the Institute of Statistical Science, Academia Sinica, Taiwan, for the doctoral student scholarship. All authors would like to thank the Department of Intellectual Property and Technology Transfer, Academia Sinica, for help with the acquisition of data.
Funding
This work is partially funded by National Science and Technology Council grant numbers NSTC-111-2118-M-001-007-MY2 and NSTC-113-2628-M-001-010-MY3, and Academia Sinica grant number AS-IA-112-M03.
Author information
Contributions
ECHH is responsible for the design of the work, analysis and interpretation of data, programming, and the draft of the manuscript. FKHP is responsible for the conception and design of the work, acquisition of data, programming, and the draft of the manuscript. YHC is responsible for the acquisition of data and programming. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no conflict of interest.
Appendix A
A.1 Simulation data
The existing edges in the network are listed, where A-B indicates an edge connecting node A and node B.
03-05,03-10,04-06,04-09,04-11,04-13,04-45,05-08,05-43,06-09,06-11,06-13,07-08,07-10,07-40,08-40,09-11,09-13,09-44,09-46,10-41,10-42,11-13,13-44,20-21,20-22,20-23,20-28,21-22,22-50,23-30,23-47,23-51,27-28,27-29,28-29,29-47,29-48,30-31,30-48,31-32,32-49,40-41,40-42,41-42,42-43,44-45,44-46,45-46,46-47,46-48,47-50,47-51,48-49,48-51,48-52,49-50
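An edge list in this A-B format can be loaded into an adjacency structure with a few lines of code. The sketch below uses hypothetical function names and, for brevity, only the first four edges of the list above.

```python
def parse_edges(text):
    """Parse 'A-B,C-D,...' into a list of integer node pairs."""
    edges = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        a, b = token.split("-")
        edges.append((int(a), int(b)))
    return edges


def adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbouring nodes."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


sample = "03-05,03-10,04-06,04-09"
adj = adjacency(parse_edges(sample))
print(sorted(adj[3]))   # [5, 10]
print(len(adj[4]))      # 2
```

Node degrees follow as the sizes of the neighbour sets, and nodes that never appear in the edge list are absent from the map.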
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, E.CH., Phoa, F.K.H. & Chen, YH. A generalized uniform placement of alters on spherical surface (U-PASS) for visualizing general weighted networks. Appl Netw Sci 9, 37 (2024). https://doi.org/10.1007/s41109-024-00646-y
DOI: https://doi.org/10.1007/s41109-024-00646-y