A novel three staged clustering algorithm

Jamil Al-Shaqsi, Wenjia Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper presents a novel three-stage clustering algorithm and a new similarity measure. The main objective of the first stage is to create the initial clusters, that of the second stage is to refine those clusters, and that of the third stage is to refine the initial BASES, if necessary. The novelty of our algorithm comes mainly from three aspects: automatic estimation of the value of k, a new similarity measure, and starting the clustering process with a promising BASE. A BASE acts similarly to a centroid or medoid in common clustering methods but is determined differently in our method. The new similarity measure is defined specifically to reflect the degree of relative change between data samples and to accommodate both numerical and categorical variables. Moreover, an additional function has been devised within the algorithm to automatically estimate the most appropriate number of clusters for a given dataset. The proposed algorithm has been tested on 3 benchmark datasets and compared with 7 other commonly used methods, including TwoStep, k-means, k-modes, GAClust, Squeezer, and several ensemble-based methods such as k-ANMI. The experimental results indicate that our algorithm identified the appropriate number of clusters for the tested datasets and showed better overall clustering performance than the compared algorithms.
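The abstract names the ingredients (a similarity measure based on relative change that handles numerical and categorical variables together, BASES that play the role of centroids or medoids, and staged refinement of clusters and BASES) without defining them. The following is a minimal, hypothetical sketch of a medoid-style clustering loop built from those ingredients, not the authors' method. Everything in it is an assumption for illustration: the function names (`relative_similarity`, `initial_bases`, `assign`, `refine_bases`, `cluster`), the relative-change formula 1 - |x - y| / max(|x|, |y|), the farthest-first seeding, and the medoid-style BASE update. The number of clusters `k` is passed in explicitly; the paper's automatic estimation of k is not reproduced.

```python
from typing import List, Sequence, Union

Value = Union[float, int, str]
Row = Sequence[Value]


def relative_similarity(a: Row, b: Row) -> float:
    """Hypothetical mixed-type similarity in [0, 1] (not the paper's measure).

    Numerical attributes score 1 - |x - y| / max(|x|, |y|), clamped at 0, so
    they reflect relative rather than absolute change; two zeros count as
    identical. Categorical (string) attributes score 1 if equal, else 0.
    The result is the mean score over all attributes (rows must be non-empty).
    """
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 1.0 if x == y else 0.0
        else:
            denom = max(abs(x), abs(y))
            total += 1.0 if denom == 0 else max(0.0, 1.0 - abs(x - y) / denom)
    return total / len(a)


def initial_bases(data: List[Row], k: int) -> List[int]:
    """Hypothetical seeding (assumes k <= len(data)): the most central sample
    first, then repeatedly the sample least similar to its closest BASE."""
    n = len(data)
    first = max(range(n), key=lambda i: sum(
        relative_similarity(data[i], data[j]) for j in range(n)))
    bases = [first]
    while len(bases) < k:
        nxt = min((i for i in range(n) if i not in bases),
                  key=lambda i: max(relative_similarity(data[i], data[b]) for b in bases))
        bases.append(nxt)
    return bases


def assign(data: List[Row], bases: List[int]) -> List[int]:
    """Label each sample with the index of its most similar BASE."""
    return [max(range(len(bases)),
                key=lambda c: relative_similarity(row, data[bases[c]]))
            for row in data]


def refine_bases(data: List[Row], labels: List[int], bases: List[int]) -> List[int]:
    """Medoid-style update: pick the member most similar to the rest of its cluster."""
    new_bases = []
    for c, old in enumerate(bases):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            new_bases.append(old)  # keep the old BASE if its cluster emptied
            continue
        new_bases.append(max(members, key=lambda i: sum(
            relative_similarity(data[i], data[j]) for j in members)))
    return new_bases


def cluster(data: List[Row], k: int, max_iter: int = 20) -> List[int]:
    bases = initial_bases(data, k)      # initial BASES
    labels = assign(data, bases)        # initial clusters
    for _ in range(max_iter):
        new_bases = refine_bases(data, labels, bases)  # refine BASES
        if new_bases == bases:          # stop once the BASES are stable
            break
        bases = new_bases
        labels = assign(data, bases)    # refine clusters around the new BASES
    return labels


if __name__ == "__main__":
    data = [
        (1.0, 10.0, "red"), (1.1, 11.0, "red"), (0.9, 9.5, "red"),
        (50.0, 500.0, "blue"), (52.0, 480.0, "blue"), (49.0, 510.0, "blue"),
    ]
    print(cluster(data, k=2))  # [1, 1, 1, 0, 0, 0]
```

Scoring numerical attributes by relative change means that 1 vs 1.1 and 50 vs 55 count as equally close, which is one plausible reading of "relative change between data samples"; the paper may define it differently.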

Original language: English
Title of host publication: Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
Pages: 19-26
Number of pages: 8
Publication status: Published - 2009
Event: IADIS European Conference on Data Mining 2009, ECDM'09. Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009 - Algarve, Portugal
Duration: Jun 18 2009 to Jun 20 2009

Country: Portugal
City: Algarve
Period: 6/18/09 to 6/20/09

Keywords

  • Automatic cluster detection
  • Centroid selection
  • Clustering
  • Similarity measures

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Software

Citation

Al-Shaqsi, J., & Wang, W. (2009). A novel three staged clustering algorithm. In Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009 (pp. 19-26).