TY - GEN
T1 - A novel three staged clustering algorithm
AU - Al-Shaqsi, Jamil
AU - Wang, Wenjia
PY - 2009
Y1 - 2009
N2 - This paper presents a novel three-staged clustering algorithm and a new similarity measure. The first stage creates the initial clusters, the second stage refines the initial clusters, and the third stage refines the initial BASES, if necessary. The novelty of our algorithm originates mainly from three aspects: automatically estimating the value of k, a new similarity measure, and starting the clustering process with a promising BASE. A BASE acts similarly to a centroid or a medoid in common clustering methods but is determined differently in our method. The new similarity measure is defined specifically to reflect the degree of relative change between data samples and to accommodate both numerical and categorical variables. Moreover, an additional function has been devised within this algorithm to automatically estimate the most appropriate number of clusters for a given dataset. The proposed algorithm has been tested on 3 benchmark datasets and compared with 7 other commonly used methods, including TwoStep, k-means, k-modes, GAClust, Squeezer, and some ensemble-based methods such as k-ANMI. The experimental results indicate that our algorithm identified the appropriate number of clusters for the tested datasets and achieved better overall clustering performance than the compared algorithms.
AB - This paper presents a novel three-staged clustering algorithm and a new similarity measure. The first stage creates the initial clusters, the second stage refines the initial clusters, and the third stage refines the initial BASES, if necessary. The novelty of our algorithm originates mainly from three aspects: automatically estimating the value of k, a new similarity measure, and starting the clustering process with a promising BASE. A BASE acts similarly to a centroid or a medoid in common clustering methods but is determined differently in our method. The new similarity measure is defined specifically to reflect the degree of relative change between data samples and to accommodate both numerical and categorical variables. Moreover, an additional function has been devised within this algorithm to automatically estimate the most appropriate number of clusters for a given dataset. The proposed algorithm has been tested on 3 benchmark datasets and compared with 7 other commonly used methods, including TwoStep, k-means, k-modes, GAClust, Squeezer, and some ensemble-based methods such as k-ANMI. The experimental results indicate that our algorithm identified the appropriate number of clusters for the tested datasets and achieved better overall clustering performance than the compared algorithms.
KW - Automatic cluster detection
KW - Centroid selection
KW - Clustering
KW - Similarity measures
UR - http://www.scopus.com/inward/record.url?scp=77955603665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955603665&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:77955603665
SN - 9789728924881
T3 - Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
SP - 19
EP - 26
BT - Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
T2 - IADIS European Conference on Data Mining 2009, ECDM'09. Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
Y2 - 18 June 2009 through 20 June 2009
ER -