
Table 1 Summary of internal cluster validation indices used to determine the optimal clustering configuration

From: Applications of node-based resilience graph theoretic framework to clustering autism spectrum disorders phenotypes

Validation Metric

Mathematical Description

Optimal Value

Silhouette index (SI)

\(\frac {1}{k}\sum _{i}\bigg \{\frac {1}{n_{i}}\sum _{x \in C_{i}}\frac {b(x)-a(x)}{\max [b(x),a(x)]}\bigg \}\), where

\(a(x) = \frac {1}{n_{i} - 1} \sum _{y \in C_{i}, y \neq x} d(x,y)\) and

\(b(x) = \min _{j, j\neq i}\bigg [ \frac {1}{n_{j}} \sum _{y \in C_{j}} d(x,y) \bigg ]\)

Max

Calinski-Harabasz index (CH)

\(\frac {{\sum \nolimits }_{i}^{}n_{i}d^{2}(c_{i},c)/(k-1)}{{\sum \nolimits }_{i}^{}{\sum \nolimits }_{x \in C_{i}}^{}d^{2}(x,c_{i})/(N-k)}\)

Max

Davies-Bouldin index (DB)

\(\frac {1}{k}\sum _{i}\max _{j, j \neq i} \bigg [ \frac {\frac {1}{n_{i}}\sum _{x \in C_{i}}d(x,c_{i}) + \frac {1}{n_{j}}\sum _{x \in C_{j}}d(x, c_{j})}{d(c_{i},c_{j})}\bigg ]\)

Min

Dunn’s index

\(\min _{i}\bigg [\min _{j, j \neq i}\frac {\min _{x \in C_{i}, y \in C_{j}}d(x,y)}{\max _{l} \{\max _{x,y \in C_{l}} d(x,y) \}}\bigg ]\)

Max

Xie-Beni index (XB)

\(\frac {\sum _{i}\sum _{x \in C_{i}}d^{2}(x,c_{i})} { N \min _{i,j \neq i}d^{2}(c_{i},c_{j})}\)

Min

SD validity index (SD)

\(Dis(k_{max})\,Scat(k) + Dis(k)\), where

\(Scat(k) = \frac {1}{k}\sum _{i}||\sigma (C_{i})|| / ||\sigma (D)||\) and

\(Dis(k) = \frac {\max _{i,j}d(c_{i},c_{j})}{\min _{i,j \neq i}d(c_{i},c_{j})}\sum _{i}\Big [\sum _{j}d(c_{i},c_{j})\Big ]^{-1}\)

Min

S_Dbw validity index (S_Dbw)

\(Scat(k) + Dens\_bw(k)\), where

\(Dens\_bw(k) = \frac {1}{k(k-1)} \sum \limits _{i} \bigg [ \sum \limits _{j, j \neq i} \frac {\sum \limits _{x \in C_{i} \cup C_{j}}f(x, u_{ij})}{\max \big \{ \sum \limits _{x \in C_{i}}f(x,c_{i}),\sum \limits _{x \in C_{j}}f(x,c_{j}) \big \}} \bigg ]\)

Min

I index

\(\bigg [\frac {\sum _{x \in D}d(x,c)}{k \sum _{i}\sum _{x \in C_{i}}d(x,c_{i})}\max _{i,j}d(c_{i},c_{j})\bigg ]^{p}\)

Max

CVNN index

\(\frac {Sep(k,NN)}{\max \limits _{k}Sep(k,NN)} + \frac {Com(k)}{\max \limits _{k}Com(k)}\), with both maxima taken over the candidate numbers of clusters k, where

\(Com(k) = \sum _{i}\Big [\frac {2}{n_{i}(n_{i}-1)}\sum \limits _{x,y \in C_{i}}d(x,y)\Big ]\) and

\(Sep(k,NN) = \max _{i}\Big (\frac {1}{n_{i}}\sum _{j=1}^{n_{i}}\frac {q_{j}}{NN}\Big )\), where \(O_{j}\) is the j-th object in \(C_{i}\) and \(q_{j}\) is the number of the NN nearest neighbors of \(O_{j}\) that are not in cluster \(C_{i}\).

Min

 
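  1. D: the data set; N: number of objects in D; c: center of D;
  2. k: number of clusters; kmax: largest number of clusters considered; Ci: the i-th cluster; ni: number of objects in Ci;
  3. ci: center of Ci; d(x,y): distance between x and y; NN: number of nearest neighbors;
  4. σ(·): variance vector; ||·||: Euclidean norm; uij: midpoint of ci and cj; f(x,u) = 1 if d(x,u) ≤ stdev and 0 otherwise, with stdev = (1/k)·sqrt(Σi ||σ(Ci)||); p: a positive exponent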
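The silhouette index in the table can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code: it assumes Euclidean distance for d(x, y) and that every cluster has at least two objects (otherwise a(x) divides by zero), and the function names are hypothetical. It follows the table's form, averaging the per-cluster mean silhouettes; scikit-learn's `sklearn.metrics.silhouette_score` instead averages over all objects, so the two coincide only when the clusters are equally sized.

```python
import numpy as np

def _distance_matrix(X):
    """Euclidean distance matrix (assumed choice of d(x, y))."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def silhouette_index(X, labels):
    """SI as in the table: mean over clusters of each cluster's mean silhouette.

    Assumption: every cluster has at least two objects.
    """
    X, labels = np.asarray(X, float), np.asarray(labels)
    D = _distance_matrix(X)
    clusters = np.unique(labels)
    per_cluster = []
    for ci in clusters:
        in_i = labels == ci
        n_i = in_i.sum()
        scores = []
        for x in np.flatnonzero(in_i):
            # a(x): mean distance to the other objects of x's own cluster
            # (D[x, x] = 0, so summing over the whole cluster and dividing by n_i - 1 is exact)
            a = D[x, in_i].sum() / (n_i - 1)
            # b(x): smallest mean distance from x to the objects of another cluster
            b = min(D[x, labels == cj].mean() for cj in clusters if cj != ci)
            scores.append((b - a) / max(a, b))
        per_cluster.append(np.mean(scores))
    return float(np.mean(per_cluster))
```

For two tight, well-separated pairs of points, such as `silhouette_index([[0, 0], [0, 1], [10, 0], [10, 1]], [0, 0, 1, 1])`, the value is close to its maximum of 1.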
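The Calinski-Harabasz and Davies-Bouldin indices in the table depend only on cluster centers and distances to them. A minimal sketch, assuming Euclidean distance and taking every center (each c_i, and c for the whole data set) to be a mean; the helper and function names are hypothetical. For Euclidean data these should agree with `calinski_harabasz_score` and `davies_bouldin_score` from `sklearn.metrics`, which are the safer choice in practice.

```python
import numpy as np

def _centers(X, labels):
    """Cluster ids, centers c_i (assumed to be cluster means) and sizes n_i."""
    ids = np.unique(labels)
    centers = np.array([X[labels == c].mean(axis=0) for c in ids])
    sizes = np.array([(labels == c).sum() for c in ids])
    return ids, centers, sizes

def calinski_harabasz(X, labels):
    X, labels = np.asarray(X, float), np.asarray(labels)
    ids, C, n = _centers(X, labels)
    N, k = len(X), len(ids)
    c = X.mean(axis=0)  # center of the whole data set D
    # between-cluster term: sum_i n_i d^2(c_i, c) / (k - 1)
    between = (n * ((C - c) ** 2).sum(axis=1)).sum() / (k - 1)
    # within-cluster term: sum_i sum_{x in C_i} d^2(x, c_i) / (N - k)
    within = sum(((X[labels == cid] - C[i]) ** 2).sum()
                 for i, cid in enumerate(ids)) / (N - k)
    return float(between / within)

def davies_bouldin(X, labels):
    X, labels = np.asarray(X, float), np.asarray(labels)
    ids, C, _ = _centers(X, labels)
    k = len(ids)
    # mean distance of each cluster's members to its own center
    S = np.array([np.linalg.norm(X[labels == cid] - C[i], axis=1).mean()
                  for i, cid in enumerate(ids)])
    M = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)  # d(c_i, c_j)
    # for each cluster, its worst (largest) ratio against any other cluster
    worst = [max((S[i] + S[j]) / M[i, j] for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))
```

Higher Calinski-Harabasz and lower Davies-Bouldin values both indicate compact, well-separated clusters, matching the Max and Min entries in the table.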
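Because the denominator of Dunn's index does not depend on i or j, the nested minimum reduces to the smallest distance between objects of two different clusters divided by the largest cluster diameter. A sketch under the same assumptions (Euclidean distance, hypothetical function name); it divides by zero if every cluster contains a single object.

```python
import numpy as np

def dunn_index(X, labels):
    # Assumption: Euclidean d(x, y); at least one cluster has two or more objects.
    X, labels = np.asarray(X, float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    masks = [labels == c for c in np.unique(labels)]
    # denominator: the largest cluster diameter, max_l max_{x,y in C_l} d(x, y)
    diameter = max(D[np.ix_(m, m)].max() for m in masks)
    # numerator: the smallest distance between objects of two different clusters
    separation = min(D[np.ix_(a, b)].min()
                     for i, a in enumerate(masks)
                     for j, b in enumerate(masks) if j != i)
    return float(separation / diameter)
```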
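The Xie-Beni and I indices reuse the same center-based quantities. This sketch assumes Euclidean distance, mean centers and hard (crisp) cluster labels, and uses p = 2 as a default exponent for the I index; that value is an assumption here, since the table leaves p unspecified. Function names are hypothetical.

```python
import numpy as np

def _centers(X, labels):
    """Cluster ids and centers c_i (assumed to be cluster means)."""
    ids = np.unique(labels)
    return ids, np.array([X[labels == c].mean(axis=0) for c in ids])

def xie_beni(X, labels):
    X, labels = np.asarray(X, float), np.asarray(labels)
    ids, C = _centers(X, labels)
    # numerator: total squared distance of every object to its own cluster center
    compact = sum(((X[labels == cid] - C[i]) ** 2).sum() for i, cid in enumerate(ids))
    # denominator: N times the smallest squared distance between two distinct centers
    M2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    min_sep = M2[~np.eye(len(ids), dtype=bool)].min()
    return float(compact / (len(X) * min_sep))

def i_index(X, labels, p=2):
    # Assumption: p = 2 by default; the table does not fix p.
    X, labels = np.asarray(X, float), np.asarray(labels)
    ids, C = _centers(X, labels)
    k = len(ids)
    c = X.mean(axis=0)                                   # center of the whole data set D
    total = np.linalg.norm(X - c, axis=1).sum()          # sum over D of d(x, c)
    within = sum(np.linalg.norm(X[labels == cid] - C[i], axis=1).sum()
                 for i, cid in enumerate(ids))           # sum_i sum_{x in C_i} d(x, c_i)
    max_sep = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1).max()
    return float((total / (k * within) * max_sep) ** p)
```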
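The SD and S_Dbw indices share the Scat(k) term. A sketch, assuming Euclidean distance, mean centers, σ(·) as the per-attribute population variance vector, and the usual S_Dbw density f(x, u) = 1 when d(x, u) ≤ stdev and 0 otherwise, with stdev = (1/k)·sqrt(Σi ||σ(Ci)||). Because Dis(kmax) weights the scattering term, the hypothetical `sd_index` takes a second label array for the partition with the largest number of clusters tried. It also assumes every center has at least one member within stdev, so the S_Dbw denominator is nonzero. All function names are illustrative.

```python
import numpy as np

def _clusters(X, labels):
    """Members of each cluster and its center c_i (assumed to be the mean)."""
    members = [X[labels == c] for c in np.unique(labels)]
    return members, np.array([m.mean(axis=0) for m in members])

def scat(X, labels):
    X, labels = np.asarray(X, float), np.asarray(labels)
    members, _ = _clusters(X, labels)
    # Assumption: sigma(.) is the per-attribute population variance vector
    return float(np.mean([np.linalg.norm(m.var(axis=0)) for m in members])
                 / np.linalg.norm(X.var(axis=0)))

def dis(X, labels):
    X, labels = np.asarray(X, float), np.asarray(labels)
    _, C = _clusters(X, labels)
    M = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)  # d(c_i, c_j)
    off_diag = M[~np.eye(len(C), dtype=bool)]                   # distinct pairs only
    # Row sums include d(c_i, c_i) = 0, which leaves them unchanged.
    return float(M.max() / off_diag.min() * (1.0 / M.sum(axis=1)).sum())

def sd_index(X, labels, labels_kmax):
    # Dis(kmax) comes from the partition with the largest number of clusters tried.
    return dis(X, labels_kmax) * scat(X, labels) + dis(X, labels)

def s_dbw(X, labels):
    X, labels = np.asarray(X, float), np.asarray(labels)
    members, C = _clusters(X, labels)
    k = len(members)
    stdev = np.sqrt(sum(np.linalg.norm(m.var(axis=0)) for m in members)) / k

    def density(points, u):
        # sum of f(x, u) over points: count of points within stdev of u
        return int((np.linalg.norm(points - u, axis=1) <= stdev).sum())

    total = 0.0
    for i in range(k):
        for j in range(k):
            if j != i:
                u_ij = (C[i] + C[j]) / 2  # midpoint of the two centers
                both = np.vstack([members[i], members[j]])
                # Assumption: the max in the denominator is nonzero
                total += density(both, u_ij) / max(density(members[i], C[i]),
                                                   density(members[j], C[j]))
    return scat(X, labels) + total / (k * (k - 1))
```

For well-separated clusters no objects lie near the midpoints uij, so Dens_bw(k) is 0 and S_Dbw reduces to Scat(k).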
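Since CVNN normalizes Sep and Com by their maxima over the candidate numbers of clusters, it has to score a set of partitions together. In this sketch the hypothetical `partitions` argument is a dict mapping k to a label array, NN defaults to 10 (an assumed value), distances are Euclidean, every cluster must contain at least two objects, and the guard for an all-zero Sep is an added assumption rather than part of the definition. Function names are illustrative.

```python
import numpy as np

def _distances(X):
    """Euclidean distance matrix (assumed choice of d(x, y))."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def com(D, labels):
    total = 0.0
    for c in np.unique(labels):
        m = labels == c
        n_i = m.sum()
        # The block sum counts each unordered pair twice, so
        # 2/(n_i(n_i-1)) * (sum over pairs) == block sum / (n_i(n_i-1)).
        total += D[np.ix_(m, m)].sum() / (n_i * (n_i - 1))
    return total

def sep(D, labels, nn):
    D = D.copy()
    np.fill_diagonal(D, np.inf)                 # an object is not its own neighbor
    neighbors = np.argsort(D, axis=1)[:, :nn]   # indices of the NN nearest neighbors
    q = (labels[neighbors] != labels[:, None]).sum(axis=1)  # q_j for every object
    return max(q[labels == c].mean() / nn for c in np.unique(labels))

def cvnn(X, partitions, nn=10):
    """Score each candidate partition; `partitions` maps k -> label array."""
    X = np.asarray(X, float)
    D = _distances(X)
    labs = {k: np.asarray(l) for k, l in partitions.items()}
    seps = {k: sep(D, l, nn) for k, l in labs.items()}
    coms = {k: com(D, l) for k, l in labs.items()}
    # Guard (assumption): if every partition has Sep = 0, divide by 1 instead of 0.
    max_sep = max(seps.values()) or 1.0
    max_com = max(coms.values())
    return {k: seps[k] / max_sep + coms[k] / max_com for k in labs}
```

The preferred number of clusters is the key with the smallest score, for example `min(scores, key=scores.get)`.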