
Table 3 Comparison of performance of clustering methods on eleven real UCI datasets

From: Graph-based data clustering via multiscale community detection

Normalised Mutual Information (NMI) (number of clusters fixed to ground truth: c=c∗)

| Dataset | k-means | Gaussian mixture | Hierarchical | Spectral N-cut | Spectral NJW | MS on CkNN graph |
|---|---|---|---|---|---|---|
| Iris (c∗=3) | 0.7582 | 0.8997 | 0.7221 | 0.7578 | 0.7665 | 0.7980 |
| Glass (c∗=6) | 0.3163 | 0.4033 | 0.1548 | 0.2899 | 0.3175 | 0.3932 |
| Wine (c∗=3) | 0.8759 | 0.9120 | 0.6144 | 0.9276 | 0.9276 | 0.8347 |
| WBDC (c∗=2) | 0.5546 | 0.5275 | 0.0102 | 0.5434 | 0.5565 | 0.7231 |
| Control chart (c∗=6) | 0.6907 | 0.5348 | 0.7070 | 0.7924 | 0.7891 | 0.8489 |
| Parkinson (c∗=2) | 0.0970 | 0.0387 | 0.0519 | 0.2189 | 0.2189 | 0.1334 |
| Vertebral (c∗=3) | 0.4120 | 0.5559 | 0.1280 | 0.3969 | 0.3901 | 0.5971 |
| Breast tissue (c∗=6) | 0.5446 | 0.5487 | 0.4745 | 0.5481 | 0.5366 | 0.5291 |
| Seeds (c∗=3) | 0.7279 | 0.8111 | 0.7010 | 0.6881 | 0.6881 | 0.7142 |
| Image Seg. (c∗=7) | 0.5840 | 0.5697 | 0.0359 | 0.5929 | 0.6466 | 0.5816 |
| Yeast (c∗=10) | 0.2955 | 0.2071 | 0.1846 | 0.2794 | 0.2801 | 0.2909 |
| Average | 0.5324 | 0.5462 | 0.3440 | 0.5487 | 0.5561 | 0.5858 |

Adjusted Rand Index (ARI) (number of clusters fixed to ground truth: c=c∗)

| Dataset | k-means | Gaussian mixture | Hierarchical | Spectral N-cut | Spectral NJW | MS on CkNN graph |
|---|---|---|---|---|---|---|
| Iris (c∗=3) | 0.7302 | 0.9039 | 0.6423 | 0.7430 | 0.7570 | 0.7455 |
| Glass (c∗=6) | 0.1722 | 0.2649 | 0.0350 | 0.1104 | 0.1483 | 0.2380 |
| Wine (c∗=3) | 0.8975 | 0.9325 | 0.5771 | 0.9471 | 0.9471 | 0.8471 |
| WBDC (c∗=2) | 0.6707 | 0.6436 | 0.0048 | 0.6661 | 0.6777 | 0.8244 |
| Control chart (c∗=6) | 0.5207 | 0.2699 | 0.5374 | 0.6835 | 0.6803 | 0.7056 |
| Parkinson (c∗=2) | -0.0978 | -0.0502 | -0.0710 | 0.1608 | 0.1608 | 0.2659 |
| Vertebral (c∗=3) | 0.3147 | 0.5256 | -0.0327 | 0.2986 | 0.2888 | 0.6096 |
| Breast tissue (c∗=6) | 0.2876 | 0.2872 | 0.1895 | 0.3983 | 0.3806 | 0.3659 |
| Seeds (c∗=3) | 0.7733 | 0.8167 | 0.6863 | 0.7189 | 0.7189 | 0.7432 |
| Image Seg. (c∗=7) | 0.4605 | 0.4522 | 0.0011 | 0.3360 | 0.4849 | 0.4150 |
| Yeast (c∗=10) | 0.1658 | 0.0972 | 0.1083 | 0.1483 | 0.1481 | 0.1746 |
| Average | 0.4450 | 0.4676 | 0.2435 | 0.4737 | 0.4902 | 0.5395 |

  1. The number of clusters is fixed to the number of clusters of the ground truth and given as an input. The best performance for each dataset is indicated in boldface.
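
For readers unfamiliar with the two scores reported in the table, the following is a minimal sketch of how NMI and ARI can be computed from a ground-truth labelling and a predicted clustering. It is not taken from the article: the function names are illustrative, and the NMI normalisation used here (mutual information divided by the arithmetic mean of the two entropies) is an assumption, since the table does not state which variant the authors used. The ARI follows the standard chance-corrected pair-counting form, which also shows why values slightly below zero (as in the Parkinson row) are possible: a clustering can agree with the ground truth less often than expected by chance.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected pair-counting agreement between two labellings."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: treat as perfect agreement
    # Pairs of points placed together in both, in truth only, in prediction only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case: both labellings are trivial
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalisation (an assumed variant)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    # Mutual information between the two labellings, in nats.
    mi = sum((c / n) * log(c * n / (rows[a] * cols[b])) for (a, b), c in joint.items())
    h_true = -sum((c / n) * log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * log(c / n) for c in cols.values())
    denom = 0.5 * (h_true + h_pred)
    if denom == 0:
        return 1.0  # both labellings consist of a single cluster
    return mi / denom


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = [1, 1, 0, 0, 2, 2]  # same partition, different label names
    print(adjusted_rand_index(truth, pred), normalized_mutual_info(truth, pred))
```

Both scores depend only on the partition, not on the label names, so a clustering that recovers the ground-truth groups under permuted labels still scores as a perfect match.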