Graph-based data clustering via multiscale community detection

Liu, Zijing; Barahona, Mauricio

doi:10.1007/s41109-019-0248-7

Applied Network Science

Table 3 Comparison of performance of clustering methods on eleven real UCI datasets


Dataset	k-means	Gaussian mixture	Hierarchical	Spectral N-cut	Spectral NJW	MS on CkNN graph
Normalised Mutual Information (NMI), number of clusters fixed to the ground truth (c = c^∗)
Iris (c^∗=3)	0.7582	0.8997†	0.7221	0.7578	0.7665	0.7980
Glass (c^∗=6)	0.3163	0.4033†	0.1548	0.2899	0.3175	0.3932
Wine (c^∗=3)	0.8759	0.9120	0.6144	0.9276†	0.9276†	0.8347
WBDC (c^∗=2)	0.5546	0.5275	0.0102	0.5434	0.5565	0.7231†
Control chart (c^∗=6)	0.6907	0.5348	0.7070	0.7924	0.7891	0.8489†
Parkinson (c^∗=2)	0.0970	0.0387	0.0519	0.2189†	0.2189†	0.1334
Vertebral (c^∗=3)	0.4120	0.5559	0.1280	0.3969	0.3901	0.5971†
Breast tissue (c^∗=6)	0.5446	0.5487†	0.4745	0.5481	0.5366	0.5291
Seeds (c^∗=3)	0.7279	0.8111†	0.7010	0.6881	0.6881	0.7142
Image Seg. (c^∗=7)	0.5840	0.5697	0.0359	0.5929	0.6466†	0.5816
Yeast (c^∗=10)	0.2955†	0.2071	0.1846	0.2794	0.2801	0.2909
Average	0.5324	0.5462	0.3440	0.5487	0.5561	0.5858
Adjusted Rand Index (ARI), number of clusters fixed to the ground truth (c = c^∗)
Iris (c^∗=3)	0.7302	0.9039†	0.6423	0.7430	0.7570	0.7455
Glass (c^∗=6)	0.1722	0.2649†	0.0350	0.1104	0.1483	0.2380
Wine (c^∗=3)	0.8975	0.9325	0.5771	0.9471†	0.9471†	0.8471
WBDC (c^∗=2)	0.6707	0.6436	0.0048	0.6661	0.6777	0.8244†
Control chart (c^∗=6)	0.5207	0.2699	0.5374	0.6835	0.6803	0.7056†
Parkinson (c^∗=2)	-0.0978	-0.0502	-0.0710	0.1608	0.1608	0.2659†
Vertebral (c^∗=3)	0.3147	0.5256	-0.0327	0.2986	0.2888	0.6096†
Breast tissue (c^∗=6)	0.2876	0.2872	0.1895	0.3983†	0.3806	0.3659
Seeds (c^∗=3)	0.7733	0.8167†	0.6863	0.7189	0.7189	0.7432
Image Seg. (c^∗=7)	0.4605	0.4522	0.0011	0.3360	0.4849†	0.4150
Yeast (c^∗=10)	0.1658	0.0972	0.1083	0.1483	0.1481	0.1746†
Average	0.4450	0.4676	0.2435	0.4737	0.4902	0.5395

The number of clusters is fixed to the number of clusters in the ground truth and is provided as an input. The best performance for each dataset is marked with †. MS: Markov Stability; CkNN: continuous k-nearest neighbours.
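Both scores in the table compare a predicted partition against the ground-truth classes through their contingency table. The sketch below is a plain-Python illustration of the two metrics, not the authors' evaluation code. It assumes the arithmetic-mean normalisation for NMI (the normalisation used for Table 3 is not stated here), and the function names `nmi` and `ari` are hypothetical.

```python
from collections import Counter
from math import comb, log


def _contingency(labels_true, labels_pred):
    """Count how often each (true class, predicted cluster) pair occurs."""
    return Counter(zip(labels_true, labels_pred))


def nmi(labels_true, labels_pred):
    """Normalised mutual information, I(U;V) / ((H(U) + H(V)) / 2).

    Assumption: arithmetic-mean normalisation. Other variants divide by
    sqrt(H(U) * H(V)) or max(H(U), H(V)) and give slightly different values.
    """
    n = len(labels_true)
    joint = _contingency(labels_true, labels_pred)
    a = Counter(labels_true)  # class sizes
    b = Counter(labels_pred)  # cluster sizes
    mi = sum((nij / n) * log(n * nij / (a[i] * b[j]))
             for (i, j), nij in joint.items())
    h_u = -sum((ai / n) * log(ai / n) for ai in a.values())
    h_v = -sum((bj / n) * log(bj / n) for bj in b.values())
    denom = (h_u + h_v) / 2
    return 1.0 if denom == 0 else mi / denom


def ari(labels_true, labels_pred):
    """Adjusted Rand index: pair-counting agreement corrected for chance."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:  # fewer than two samples: nothing to compare
        return 1.0
    joint = _contingency(labels_true, labels_pred)
    sum_ij = sum(comb(nij, 2) for nij in joint.values())
    sum_a = sum(comb(ai, 2) for ai in Counter(labels_true).values())
    sum_b = sum(comb(bj, 2) for bj in Counter(labels_pred).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
# Perfect agreement up to relabelling: both scores equal 1.
print(nmi(truth, [1, 1, 1, 0, 0, 0]), ari(truth, [1, 1, 1, 0, 0, 0]))
# A partition unrelated to the classes: NMI is 0, ARI is about -0.5.
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]), ari([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because ARI is corrected for chance, it drops below zero when agreement is worse than random, as in several entries of the Parkinson and Vertebral rows, whereas NMI is bounded below by zero.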

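The Average rows are consistent with unweighted means over the eleven datasets, and the best method for a dataset is simply the row maximum. The sketch below re-derives both for the NMI block from the values transcribed from the table; `column_averages` and `best_methods` are hypothetical helper names, and the same code applies unchanged to the ARI block.

```python
METHODS = ["k-means", "Gaussian mixture", "Hierarchical",
           "Spectral N-cut", "Spectral NJW", "MS on CkNN graph"]

# NMI block of Table 3, one row per dataset, columns in METHODS order.
NMI = {
    "Iris":          [0.7582, 0.8997, 0.7221, 0.7578, 0.7665, 0.7980],
    "Glass":         [0.3163, 0.4033, 0.1548, 0.2899, 0.3175, 0.3932],
    "Wine":          [0.8759, 0.9120, 0.6144, 0.9276, 0.9276, 0.8347],
    "WBDC":          [0.5546, 0.5275, 0.0102, 0.5434, 0.5565, 0.7231],
    "Control chart": [0.6907, 0.5348, 0.7070, 0.7924, 0.7891, 0.8489],
    "Parkinson":     [0.0970, 0.0387, 0.0519, 0.2189, 0.2189, 0.1334],
    "Vertebral":     [0.4120, 0.5559, 0.1280, 0.3969, 0.3901, 0.5971],
    "Breast tissue": [0.5446, 0.5487, 0.4745, 0.5481, 0.5366, 0.5291],
    "Seeds":         [0.7279, 0.8111, 0.7010, 0.6881, 0.6881, 0.7142],
    "Image Seg.":    [0.5840, 0.5697, 0.0359, 0.5929, 0.6466, 0.5816],
    "Yeast":         [0.2955, 0.2071, 0.1846, 0.2794, 0.2801, 0.2909],
}
REPORTED_AVERAGE = [0.5324, 0.5462, 0.3440, 0.5487, 0.5561, 0.5858]


def column_averages(scores):
    """Unweighted mean over datasets, one value per method."""
    rows = list(scores.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def best_methods(scores):
    """Map each dataset to the method(s) with the highest score, keeping ties."""
    best = {}
    for dataset, row in scores.items():
        top = max(row)
        best[dataset] = [m for m, s in zip(METHODS, row) if s == top]
    return best


averages = column_averages(NMI)
for method, avg, reported in zip(METHODS, averages, REPORTED_AVERAGE):
    # The table rounds to four decimals, so agreement within 5e-5 is expected.
    assert abs(avg - reported) < 5e-5, (method, avg, reported)
    print(f"{method:18s} {avg:.4f}")

for dataset, winners in best_methods(NMI).items():
    print(f"{dataset:14s} {', '.join(winners)}")
```

Keeping ties matters here: on Wine and Parkinson the two spectral variants score identically, so both count as best for those datasets.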