A novel three staged clustering algorithm

Jamil Al-Shaqsi, Wenjia Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper presents a novel three-stage clustering algorithm and a new similarity measure. The first stage creates the initial clusters, the second stage refines them, and the third stage refines the initial BASES, if necessary. The novelty of our algorithm stems mainly from three aspects: automatic estimation of the value of k, a new similarity measure, and starting the clustering process from a promising BASE. A BASE acts similarly to a centroid or a medoid in common clustering methods but is determined differently in our method. The new similarity measure is defined to reflect the degree of relative change between data samples and to accommodate both numerical and categorical variables. Moreover, an additional function has been devised within the algorithm to automatically estimate the most appropriate number of clusters for a given dataset. The proposed algorithm was tested on 3 benchmark datasets and compared with 7 other commonly used methods, including TwoStep, k-means, k-modes, GAClust, Squeezer, and some ensemble-based methods including k-ANMI. The experimental results indicate that our algorithm identified the appropriate number of clusters for the tested datasets and achieved better overall clustering performance than the compared algorithms.
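The abstract does not give the formula for the new similarity measure, so the following is only an illustrative sketch of what a similarity based on relative change over mixed numerical and categorical variables could look like. The function names, the scaling by the larger magnitude, the simple-matching rule for categorical values, and the unweighted average are all assumptions made for this example, not the authors' definition.

```python
from typing import Sequence, Union

Value = Union[int, float, str]


def feature_similarity(a: Value, b: Value) -> float:
    """Similarity in [0, 1] for one feature (hypothetical, not the paper's formula)."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        scale = max(abs(a), abs(b))
        if scale == 0:
            return 1.0  # both values are zero, so there is no change
        # Assumed notion of relative change: absolute difference
        # divided by the larger magnitude, capped at 1.
        return 1.0 - min(abs(a - b) / scale, 1.0)
    # Categorical values: assumed simple matching (equal or not).
    return 1.0 if a == b else 0.0


def sample_similarity(x: Sequence[Value], y: Sequence[Value]) -> float:
    """Unweighted mean of per-feature similarities between two samples."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal length")
    return sum(feature_similarity(a, b) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # One numerical and one categorical variable per sample.
    print(sample_similarity([10, "red"], [5, "red"]))
```

Under these assumptions, a sample could be assigned to the BASE with the highest `sample_similarity`, in the same way that k-means assigns a sample to its nearest centroid.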

Original language: English
Title of host publication: Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
Pages: 19-26
Number of pages: 8
Publication status: Published - 2009
Event: IADIS European Conference on Data Mining 2009, ECDM'09. Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009 - Algarve, Portugal
Duration: Jun 18 2009 to Jun 20 2009

Other

Other: IADIS European Conference on Data Mining 2009, ECDM'09. Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
Country: Portugal
City: Algarve
Period: 6/18/09 to 6/20/09

Fingerprint

Clustering algorithms

Keywords

  • Automatic cluster detection
  • Centroid selection
  • Clustering
  • Similarity measures

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Al-Shaqsi, J., & Wang, W. (2009). A novel three staged clustering algorithm. In Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009 (pp. 19-26)

@inproceedings{74ae59d4436a4a168683fc5f96436ea8,
title = "A novel three staged clustering algorithm",
abstract = "This paper presents a novel three-stage clustering algorithm and a new similarity measure. The first stage creates the initial clusters, the second stage refines them, and the third stage refines the initial BASES, if necessary. The novelty of our algorithm stems mainly from three aspects: automatic estimation of the value of k, a new similarity measure, and starting the clustering process from a promising BASE. A BASE acts similarly to a centroid or a medoid in common clustering methods but is determined differently in our method. The new similarity measure is defined to reflect the degree of relative change between data samples and to accommodate both numerical and categorical variables. Moreover, an additional function has been devised within the algorithm to automatically estimate the most appropriate number of clusters for a given dataset. The proposed algorithm was tested on 3 benchmark datasets and compared with 7 other commonly used methods, including TwoStep, k-means, k-modes, GAClust, Squeezer, and some ensemble-based methods including k-ANMI. The experimental results indicate that our algorithm identified the appropriate number of clusters for the tested datasets and achieved better overall clustering performance than the compared algorithms.",
keywords = "Automatic cluster detection, Centroid selection, Clustering, Similarity measures",
author = "Jamil Al-Shaqsi and Wenjia Wang",
year = "2009",
language = "English",
isbn = "9789728924881",
pages = "19--26",
booktitle = "Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009",

}

TY  - GEN
T1  - A novel three staged clustering algorithm
AU  - Al-Shaqsi, Jamil
AU  - Wang, Wenjia
PY  - 2009
Y1  - 2009
AB  - This paper presents a novel three-stage clustering algorithm and a new similarity measure. The first stage creates the initial clusters, the second stage refines them, and the third stage refines the initial BASES, if necessary. The novelty of our algorithm stems mainly from three aspects: automatic estimation of the value of k, a new similarity measure, and starting the clustering process from a promising BASE. A BASE acts similarly to a centroid or a medoid in common clustering methods but is determined differently in our method. The new similarity measure is defined to reflect the degree of relative change between data samples and to accommodate both numerical and categorical variables. Moreover, an additional function has been devised within the algorithm to automatically estimate the most appropriate number of clusters for a given dataset. The proposed algorithm was tested on 3 benchmark datasets and compared with 7 other commonly used methods, including TwoStep, k-means, k-modes, GAClust, Squeezer, and some ensemble-based methods including k-ANMI. The experimental results indicate that our algorithm identified the appropriate number of clusters for the tested datasets and achieved better overall clustering performance than the compared algorithms.
KW  - Automatic cluster detection
KW  - Centroid selection
KW  - Clustering
KW  - Similarity measures
UR  - http://www.scopus.com/inward/record.url?scp=77955603665&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77955603665&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:77955603665
SN  - 9789728924881
SP  - 19
EP  - 26
BT  - Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09 Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
ER  -