TY - JOUR
T1 - Ensemble clustering using extended fuzzy k-means for cancer data analysis
AU - Khan, Imran
AU - Luo, Zongwei
AU - Shaikh, Abdul Khalique
AU - Hedjam, Rachid
N1 - Funding Information:
This work is partially supported by BNU startup research fund and UIC startup research fund (R72021110).
Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2021/6/15
Y1 - 2021/6/15
N2 - Clustering analysis is a significant research topic in discovering cancer using different gene expression profiles, which is very important for the successful diagnosis and treatment of cancer. Many ensemble clustering methods have been developed to perform clustering on tumor data. Only a few of them incorporate a significant number of input clusterings, the optimal number of clusters in each input clustering, and an appropriate ensemble method to combine the input clusterings into a final clustering. In this paper, we introduce two new steps into the standard fuzzy k-means algorithm to determine the optimal number of input clusterings and the optimal number of clusters in each clustering for ensemble clustering. The first incorporates a penalty term that makes the algorithm insensitive to the initialization of cluster centroids. The second automates a clustering process that iteratively updates the feature weights; this step addresses noisy values in the dataset. We propose an ensemble clustering method that combines a set of input clusterings into a final clustering with better overall quality. Experiments on real cancer gene expression profiles show that the proposed algorithm outperforms well-known clustering algorithms.
AB - Clustering analysis is a significant research topic in discovering cancer using different gene expression profiles, which is very important for the successful diagnosis and treatment of cancer. Many ensemble clustering methods have been developed to perform clustering on tumor data. Only a few of them incorporate a significant number of input clusterings, the optimal number of clusters in each input clustering, and an appropriate ensemble method to combine the input clusterings into a final clustering. In this paper, we introduce two new steps into the standard fuzzy k-means algorithm to determine the optimal number of input clusterings and the optimal number of clusters in each clustering for ensemble clustering. The first incorporates a penalty term that makes the algorithm insensitive to the initialization of cluster centroids. The second automates a clustering process that iteratively updates the feature weights; this step addresses noisy values in the dataset. We propose an ensemble clustering method that combines a set of input clusterings into a final clustering with better overall quality. Experiments on real cancer gene expression profiles show that the proposed algorithm outperforms well-known clustering algorithms.
KW - Cancer data
KW - Cluster analysis
KW - Fuzzy k-means
KW - Variable weights
UR - http://www.scopus.com/inward/record.url?scp=85100691973&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100691973&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2021.114622
DO - 10.1016/j.eswa.2021.114622
M3 - Article
AN - SCOPUS:85100691973
SN - 0957-4174
VL - 172
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 114622
ER -