# Table 1. Summary of internal cluster validation indices used to determine the optimal clustering configuration

| Validation Metric | Mathematical Description | Optimal Value |
|---|---|---|
| Silhouette index (SI) | $\frac{1}{k}\sum_{i}\Big\lbrace\frac{1}{n_i}\sum_{x\in C_i}\frac{b(x)-a(x)}{\max[b(x),a(x)]}\Big\rbrace$<br>where $a(x)=\frac{1}{n_i-1}\sum_{y\in C_i,\,y\neq x}d(x,y)$<br>and $b(x)=\min_{j,\,j\neq i}\Big[\frac{1}{n_j}\sum_{y\in C_j}d(x,y)\Big]$ | Max |
| Calinski-Harabasz index (CH) | $\frac{\sum_{i}n_i\,d^2(c_i,c)/(k-1)}{\sum_{i}\sum_{x\in C_i}d^2(x,c_i)/(N-k)}$ | Max |
| Davies-Bouldin index (DB) | $\frac{1}{k}\sum_{i}\max_{j,\,j\neq i}\Big[\frac{\frac{1}{n_i}\sum_{x\in C_i}d(x,c_i)+\frac{1}{n_j}\sum_{x\in C_j}d(x,c_j)}{d(c_i,c_j)}\Big]$ | Min |
| Dunn's index | $\min_{i}\Big[\min_{j,\,j\neq i}\frac{\min_{x\in C_i,\,y\in C_j}d(x,y)}{\max_{l}\lbrace\max_{x,y\in C_l}d(x,y)\rbrace}\Big]$ | Max |
| Xie-Beni index (XB) | $\frac{\sum_{i}\sum_{x\in C_i}d^2(x,c_i)}{N\min_{i,\,j\neq i}d^2(c_i,c_j)}$ | Min |
| SD validity index (SD) | $Dis(k_{\max})\,Scat(k)+Dis(k)$<br>where $Scat(k)=\frac{1}{k}\sum_{i}\lVert\sigma(C_i)\rVert/\lVert\sigma(D)\rVert$<br>and $Dis(k)=\frac{\max_{i,j}d(c_i,c_j)}{\min_{i,\,j\neq i}d(c_i,c_j)}\sum_{i}\Big[\sum_{j}d(c_i,c_j)\Big]^{-1}$ | Min |
| S_Dbw validity index (S_Dbw) | $Scat(k)+Dens\_bw(k)$<br>where $Dens\_bw(k)=\frac{1}{k(k-1)}\sum_{i}\Big[\sum_{j,\,j\neq i}\frac{\sum_{x\in C_i\cup C_j}f(x,u_{ij})}{\max\big\lbrace\sum_{x\in C_i}f(x,c_i),\,\sum_{x\in C_j}f(x,c_j)\big\rbrace}\Big]$ | Min |
| I index | $\Big[\frac{\sum_{x\in D}d(x,c)}{k\sum_{i}\sum_{x\in C_i}d(x,c_i)}\max_{i,j}d(c_i,c_j)\Big]^{p}$ | Max |
| CVNN index | $\frac{Sep(k,NN)}{\max_{k}Sep(k,NN)}+\frac{Com(k)}{\max_{k}Com(k)}$<br>where $Com(k)=\sum_{i}\Big[\frac{2}{n_i(n_i-1)}\sum_{x,y\in C_i}d(x,y)\Big]$<br>and $Sep(k,NN)=\max_{i}\Big(\frac{1}{n_i}\sum_{j=1}^{n_i}\frac{q_j}{NN}\Big)$, where $O_j$ is the $j$-th object in $C_i$ and $q_j$ is the number of the $NN$ nearest neighbors of $O_j$ that are not in $C_i$ | Min |
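1. $D$: the data set; $N$: number of objects in $D$; $c$: center of $D$;
2. $k$: number of clusters; $C_i$: the $i$-th cluster; $n_i$: number of objects in $C_i$;
3. $c_i$: center of $C_i$; $d(x,y)$: distance between $x$ and $y$; $NN$: number of nearest neighbors;
4. $\sigma(\cdot)$: variance vector of a set of objects; $k_{\max}$: largest number of clusters considered; $u_{ij}$: midpoint of $c_i$ and $c_j$.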
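The silhouette index in Table 1 can be computed directly from its definition. The sketch below is a minimal NumPy version, assuming Euclidean distance for $d(x,y)$ and at least two clusters with at least two members each; the function name `silhouette_index` is hypothetical. Note that the table averages silhouette widths within each cluster first and then over the $k$ clusters.

```python
import numpy as np

def silhouette_index(X, labels):
    """Silhouette index as written in Table 1 (Euclidean distance assumed).

    Mean silhouette width is computed per cluster, then averaged over clusters.
    Assumes k >= 2 and every cluster has at least two members.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Full pairwise distance matrix d(x, y).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    per_cluster = []
    for ci in clusters:
        members = np.flatnonzero(labels == ci)
        n_i = len(members)
        widths = []
        for x in members:
            # a(x): mean distance to the other members of x's own cluster
            # (the self-distance is 0, so summing over all members is safe).
            a = dist[x, members].sum() / (n_i - 1)
            # b(x): smallest mean distance from x to the members of another cluster.
            b = min(dist[x, labels == cj].mean() for cj in clusters if cj != ci)
            widths.append((b - a) / max(a, b))
        per_cluster.append(np.mean(widths))
    return float(np.mean(per_cluster))
```

scikit-learn's `silhouette_score` averages over all samples instead, so the two values coincide when all clusters have the same size and can differ otherwise.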
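The Calinski-Harabasz and Davies-Bouldin indices depend only on cluster centroids and distances to them. A minimal sketch, assuming Euclidean distance and taking each center $c_i$ (and the data-set center $c$) as the arithmetic mean; the helper and function names are hypothetical:

```python
import numpy as np

def _centroids(X, labels):
    # Distinct cluster labels and their centroids c_i (arithmetic means).
    ks = np.unique(labels)
    return ks, np.array([X[labels == k].mean(axis=0) for k in ks])

def calinski_harabasz(X, labels):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ks, cents = _centroids(X, labels)
    N, k = len(X), len(ks)
    c = X.mean(axis=0)  # center of the whole data set D
    # Between-cluster dispersion: sum_i n_i * d^2(c_i, c).
    between = sum((labels == ki).sum() * np.sum((ci - c) ** 2)
                  for ki, ci in zip(ks, cents))
    # Within-cluster dispersion: sum_i sum_{x in C_i} d^2(x, c_i).
    within = sum(np.sum((X[labels == ki] - ci) ** 2) for ki, ci in zip(ks, cents))
    return float((between / (k - 1)) / (within / (N - k)))

def davies_bouldin(X, labels):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ks, cents = _centroids(X, labels)
    k = len(ks)
    # Average distance of each cluster's members to its own centroid.
    scatter = [np.linalg.norm(X[labels == ki] - ci, axis=1).mean()
               for ki, ci in zip(ks, cents)]
    # For each cluster, the worst (largest) similarity ratio to any other cluster.
    ratios = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(k) if j != i)
              for i in range(k)]
    return float(np.mean(ratios))
```

With Euclidean distance these should agree with `sklearn.metrics.calinski_harabasz_score` and `sklearn.metrics.davies_bouldin_score`.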
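Dunn's index and the Xie-Beni index both compare within-cluster spread against the closest pair of clusters. Because the denominator of Dunn's index does not depend on $i$ or $j$, it reduces to the smallest inter-cluster point distance divided by the largest cluster diameter. A minimal sketch, assuming Euclidean distance and mean centroids; the function names are hypothetical, and the Xie-Beni version shown is the crisp (hard-assignment) form given in the table.

```python
import numpy as np

def _pairwise(X):
    # Pairwise Euclidean distance matrix.
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def dunn_index(X, labels):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    dist = _pairwise(X)
    masks = [labels == k for k in np.unique(labels)]
    # Largest cluster diameter: max_l max_{x,y in C_l} d(x, y).
    diameter = max(dist[np.ix_(m, m)].max() for m in masks)
    # Smallest distance between objects of two different clusters.
    separation = min(dist[np.ix_(masks[i], masks[j])].min()
                     for i in range(len(masks)) for j in range(len(masks)) if i != j)
    return float(separation / diameter)

def xie_beni(X, labels):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # Numerator: total squared distance of objects to their own centroid.
    compact = sum(np.sum((X[labels == k] - c) ** 2) for k, c in zip(ks, cents))
    cd = _pairwise(cents)
    np.fill_diagonal(cd, np.inf)  # exclude i == j from the minimum
    return float(compact / (len(X) * cd.min() ** 2))
```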
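The SD and S_Dbw indices share the $Scat(k)$ term. The sketch below assumes Euclidean distance, mean centroids, and $\sigma(\cdot)$ as the per-dimension population variance (`np.var` with its default `ddof=0`). Table 1 does not define $f$, so the S_Dbw part assumes the commonly used definition: $f(x,u)=1$ if $d(x,u)\le stdev$ and $0$ otherwise, with $stdev=\frac{1}{k}\sqrt{\sum_i\lVert\sigma(C_i)\rVert}$. The weight $Dis(k_{\max})$ is passed in because it must be computed once on the partition with the largest $k$. All function names are hypothetical.

```python
import numpy as np

def _clusters(X, labels):
    # Member arrays and mean centroids of each cluster.
    groups = [X[labels == k] for k in np.unique(labels)]
    return groups, np.array([g.mean(axis=0) for g in groups])

def scat(X, labels):
    # Mean norm of the cluster variance vectors, relative to that of D.
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    groups, _ = _clusters(X, labels)
    return float(np.mean([np.linalg.norm(g.var(axis=0)) for g in groups])
                 / np.linalg.norm(X.var(axis=0)))

def dis(X, labels):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    _, cents = _clusters(X, labels)
    cd = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1)
    off = cd[~np.eye(len(cents), dtype=bool)]  # centroid distances with i != j
    return float(off.max() / off.min() * np.sum(1.0 / cd.sum(axis=1)))

def sd_index(X, labels, dis_kmax):
    # dis_kmax: Dis(k_max), computed on the partition with the most clusters.
    return dis_kmax * scat(X, labels) + dis(X, labels)

def s_dbw(X, labels):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    groups, cents = _clusters(X, labels)
    k = len(groups)
    # Assumed neighbourhood radius (not given in Table 1).
    stdev = np.sqrt(sum(np.linalg.norm(g.var(axis=0)) for g in groups)) / k

    def density(points, u):
        # Sum of f(x, u): number of points within stdev of u.
        return int(np.sum(np.linalg.norm(points - u, axis=1) <= stdev))

    total = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            u_ij = (cents[i] + cents[j]) / 2  # midpoint of c_i and c_j
            denom = max(density(groups[i], cents[i]), density(groups[j], cents[j]))
            if denom > 0:  # assumption: skip pairs whose centers have no close points
                total += density(np.vstack([groups[i], groups[j]]), u_ij) / denom
    return scat(X, labels) + total / (k * (k - 1))
```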
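The I index multiplies three factors: the ratio of total distance to the data-set center over total distance to the cluster centers, the factor $1/k$, and the largest distance between two cluster centers, all raised to the power $p$. Table 1 leaves $p$ unspecified; the sketch below assumes $p=2$ as a default, together with Euclidean distance and mean centroids. The function name `i_index` is hypothetical.

```python
import numpy as np

def i_index(X, labels, p=2):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    c = X.mean(axis=0)  # center of the whole data set D
    # Total distance of all objects to the data-set center c.
    e1 = np.linalg.norm(X - c, axis=1).sum()
    # Total distance of objects to their own cluster centers c_i.
    ek = sum(np.linalg.norm(X[labels == k] - ck, axis=1).sum()
             for k, ck in zip(ks, cents))
    # Largest distance between two cluster centers.
    dk = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1).max()
    return float((e1 / (len(ks) * ek) * dk) ** p)
```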
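Unlike the other indices, CVNN normalizes $Sep$ and $Com$ by their maxima over all candidate partitions, so it can only be evaluated on a set of clusterings at once. A minimal sketch, assuming Euclidean distance, clusters with at least two members, and ties among nearest neighbors broken by sort order; the function names are hypothetical, and the caller must choose `nn`.

```python
import numpy as np

def _com(X, labels):
    # Com(k): sum over clusters of the mean pairwise distance inside each cluster.
    total = 0.0
    for k in np.unique(labels):
        P = X[labels == k]
        n = len(P)
        d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
        # d.sum() counts each unordered pair twice, so this equals
        # 2 / (n (n - 1)) times the sum over unordered pairs.
        total += d.sum() / (n * (n - 1))
    return total

def _sep(X, labels, nn):
    # Sep(k, NN): largest per-cluster mean fraction of neighbours outside the cluster.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # an object is not its own neighbour
    neigh = np.argsort(d, axis=1)[:, :nn]  # indices of the nn nearest neighbours
    q = (labels[neigh] != labels[:, None]).sum(axis=1)  # q_j for every object
    return max(np.mean(q[labels == k]) / nn for k in np.unique(labels))

def cvnn(X, candidate_labelings, nn):
    X = np.asarray(X, dtype=float)
    labelings = [np.asarray(l) for l in candidate_labelings]
    seps = np.array([_sep(X, l, nn) for l in labelings])
    coms = np.array([_com(X, l) for l in labelings])
    # Normalize by the maximum over all candidate partitions (guard against 0).
    sep_max = seps.max() if seps.max() > 0 else 1.0
    com_max = coms.max() if coms.max() > 0 else 1.0
    return seps / sep_max + coms / com_max
```

The returned array holds one score per candidate partition; the partition with the smallest score is preferred.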