Table 1 Summary of internal cluster validation used to determine optimal clustering configuration

From: Applications of node-based resilience graph theoretic framework to clustering autism spectrum disorders phenotypes

| Validation metric | Mathematical description | Optimal value |
| --- | --- | --- |
| Silhouette index (SI) | \(\frac{1}{k}\sum_{i}\bigg\{\frac{1}{n_{i}}\sum_{x \in C_{i}}\frac{b(x)-a(x)}{\max[b(x),a(x)]}\bigg\}\), where \(a(x) = \frac{1}{n_{i}-1}\sum_{y \in C_{i}, y \neq x} d(x,y)\) and \(b(x) = \min_{j, j \neq i}\bigg[\frac{1}{n_{j}}\sum_{y \in C_{j}} d(x,y)\bigg]\) | Max |
| Calinski-Harabasz index (CH) | \(\frac{\sum_{i} n_{i} d^{2}(c_{i},c)/(k-1)}{\sum_{i}\sum_{x \in C_{i}} d^{2}(x,c_{i})/(N-k)}\) | Max |
| Davies-Bouldin index (DB) | \(\frac{1}{k}\sum_{i}\max_{j, j \neq i}\bigg[\frac{\frac{1}{n_{i}}\sum_{x \in C_{i}} d(x,c_{i}) + \frac{1}{n_{j}}\sum_{x \in C_{j}} d(x,c_{j})}{d(c_{i},c_{j})}\bigg]\) | Min |
| Dunn's index | \(\min_{i}\bigg[\min_{j}\frac{\min_{x \in C_{i}, y \in C_{j}} d(x,y)}{\max_{k}\{\max_{x,y \in C_{k}} d(x,y)\}}\bigg]\) | Max |
| Xie-Beni index (XB) | \(\frac{\sum_{i}\sum_{x \in C_{i}} d^{2}(x,c_{i})}{N \min_{i,j \neq i} d^{2}(c_{i},c_{j})}\) | Min |
| SD validity index (SD) | \(Dis(k_{max})\,Scat(k) + Dis(k)\), where \(Scat(k) = \frac{1}{k}\sum_{i}\lVert\sigma(C_{i})\rVert / \lVert\sigma(D)\rVert\) and \(Dis(k) = \frac{\max_{i,j} d(c_{i},c_{j})}{\min_{i,j} d(c_{i},c_{j})}\sum_{i}\Big[\sum_{j} d(c_{i},c_{j})\Big]^{-1}\) | Min |
| S_Dbw validity index (SD_Dbw) | \(Scat(k) + Dens\_bw(k)\), where \(Scat(k)\) is as defined for the SD validity index and \(Dens\_bw(k) = \frac{1}{k(k-1)}\sum_{i}\bigg[\sum_{j, j \neq i}\frac{\sum_{x \in C_{i} \cup C_{j}} f(x,u_{ij})}{\max\big\{\sum_{x \in C_{i}} f(x,c_{i}), \sum_{x \in C_{j}} f(x,c_{j})\big\}}\bigg]\) | Min |
| I index | \(\bigg[\frac{\sum_{x \in D} d(x,c)}{k\sum_{i}\sum_{x \in C_{i}} d(x,c_{i})}\max_{i,j} d(c_{i},c_{j})\bigg]^{p}\) | Max |
| CVNN index | \(\frac{Sep(k,NN)}{\max_{k} Sep(k,NN)} + \frac{Com(k)}{\max_{k} Com(k)}\), where \(Com(k) = \sum_{i}\Big[\frac{2}{n_{i}(n_{i}-1)}\sum_{x,y \in C_{i}} d(x,y)\Big]\) and \(Sep(k,NN) = \max_{i}\Big(\frac{1}{n_{i}}\sum_{j=1}^{n_{i}}\frac{q_{j}}{NN}\Big)\); here \(O_{j}\) is the \(j\)th object in \(C_{i}\), and \(q_{j}\) is the number of nearest neighbors of \(O_{j}\) that are not in cluster \(C_{i}\) | Min |

  1. \(D\): the data set; \(N\): number of objects in \(D\); \(c\): center of \(D\);
  2. \(k\): number of clusters; \(C_{i}\): the \(i\)-th cluster; \(n_{i}\): number of objects in \(C_{i}\);
  3. \(c_{i}\): center of \(C_{i}\); \(d(x,y)\): distance between \(x\) and \(y\); \(NN\): number of nearest neighbors
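
To make three of the definitions in Table 1 concrete, the following is a minimal NumPy sketch of the SI, CH, and DB indices written directly from the formulas. It is an illustration only, not the authors' implementation. It assumes that \(d\) is the Euclidean distance, that cluster centers are arithmetic means, and that every cluster contains at least two distinct points. The function names (`pairwise`, `silhouette_index`, `calinski_harabasz`, `davies_bouldin`) are hypothetical.

```python
import numpy as np

def pairwise(X, Y):
    """Euclidean distances d(x, y) between rows of X and rows of Y (assumed metric)."""
    return np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))

def silhouette_index(X, labels):
    """SI: mean over clusters of the mean per-point silhouette (maximize)."""
    ids = np.unique(labels)
    D = pairwise(X, X)
    per_cluster = []
    for i in ids:
        in_i = labels == i
        n_i = in_i.sum()
        scores = []
        for x in np.where(in_i)[0]:
            # a(x): mean distance to the other members of C_i (d(x, x) = 0)
            a = D[x, in_i].sum() / (n_i - 1)
            # b(x): smallest mean distance to the members of any other cluster
            b = min(D[x, labels == j].mean() for j in ids if j != i)
            scores.append((b - a) / max(a, b))
        per_cluster.append(np.mean(scores))
    return float(np.mean(per_cluster))

def calinski_harabasz(X, labels):
    """CH: between-cluster over within-cluster dispersion (maximize)."""
    ids = np.unique(labels)
    N, k = len(X), len(ids)
    c = X.mean(axis=0)                      # center of the data set D
    between = within = 0.0
    for i in ids:
        Ci = X[labels == i]
        ci = Ci.mean(axis=0)                # center of cluster C_i
        between += len(Ci) * ((ci - c) ** 2).sum()
        within += ((Ci - ci) ** 2).sum()
    return float((between / (k - 1)) / (within / (N - k)))

def davies_bouldin(X, labels):
    """DB: mean over clusters of the worst scatter-to-separation ratio (minimize)."""
    ids = np.unique(labels)
    k = len(ids)
    cents = np.array([X[labels == i].mean(axis=0) for i in ids])
    # mean distance of each cluster's members to its own center
    scatter = np.array([
        np.linalg.norm(X[labels == i] - cents[n], axis=1).mean()
        for n, i in enumerate(ids)
    ])
    sep = pairwise(cents, cents)            # d(c_i, c_j)
    worst = [
        max((scatter[i] + scatter[j]) / sep[i, j] for j in range(k) if j != i)
        for i in range(k)
    ]
    return float(np.mean(worst))

# Toy data: two tight, well-separated pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])   # groups nearby points together
bad = np.array([0, 1, 0, 1])    # groups distant points together

for name, labels in [("good", good), ("bad", bad)]:
    print(name,
          silhouette_index(X, labels),
          calinski_harabasz(X, labels),
          davies_bouldin(X, labels))
```

On this toy data the labeling that groups nearby points scores higher on SI and CH and lower on DB than the labeling that groups distant points, which matches the "Optimal value" column of the table.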