A hybrid method for estimating the predominant number of clusters in a data set

Jamil Alshaqsi; Wenjia Wang

doi:10.1109/ICMLA.2012.146

Jamil Alshaqsi^*, Wenjia Wang

^*Corresponding author for this work

Information Systems

Research output: Conference contribution

Abstract

In cluster analysis, finding the number of clusters, K, for a given dataset is an important yet very tricky task, because there is often no universally accepted right or wrong answer for non-trivial real-world problems, and the answer also depends on the context and purpose of the cluster study. This paper presents a new hybrid method for estimating the predominant number of clusters automatically. It employs a new similarity measure, calculates the length of the constant similarity intervals, L, and takes the longest constant intervals as representing the most probable numbers of clusters in the set context. An error function is defined to measure and evaluate the goodness of the estimates. The proposed method has been tested on 3 synthetic datasets and 8 real-world benchmark datasets, and compared with some other popular methods. The experimental results show that the proposed method is able to determine the desired number of clusters for all the simulated datasets and most of the benchmark datasets, and the statistical tests indicate that the method is significantly better than the compared methods.
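The abstract does not define the similarity measure or the error function, so the following is only a minimal sketch of the "longest constant interval" idea, not the authors' method. It assumes a hypothetical list `cluster_counts` holding the number of clusters observed at each step of a sweep over some similarity threshold, and it returns the count that stays constant over the longest run of consecutive steps. The function name and the sample data are illustrative assumptions.

```python
from itertools import groupby


def longest_constant_run(cluster_counts):
    """Return (k, length) for the longest run of equal consecutive values.

    cluster_counts is a hypothetical sequence: the number of clusters
    observed at each step of a threshold sweep. Ties are resolved in
    favour of the earliest run. An empty input yields (None, 0).
    """
    best_k, best_len = None, 0
    for k, run in groupby(cluster_counts):
        length = sum(1 for _ in run)
        if length > best_len:
            best_k, best_len = k, length
    return best_k, best_len


# Illustrative (made-up) cluster counts from a threshold sweep.
counts = [1, 1, 2, 3, 3, 3, 3, 4, 6, 9]
k, length = longest_constant_run(counts)
print(k, length)  # prints "3 4"
```

In this toy sweep the count 3 persists over four consecutive steps, longer than any other value, so it would be taken as the predominant number of clusters. How the paper actually builds the intervals, and how it treats ties, is not stated in the abstract.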

Original language	English
Title of host publication	Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012
Pages	569-573
Number of pages	5
DOIs	https://doi.org/10.1109/ICMLA.2012.146
Publication status	Published - 2012
Event	11th IEEE International Conference on Machine Learning and Applications, ICMLA 2012 - Boca Raton, FL, United States. Duration: 12 December 2012 → 15 December 2012

Publication series

Name	Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012
Volume	2

Other

Other	11th IEEE International Conference on Machine Learning and Applications, ICMLA 2012
Country/Territory	United States
City	Boca Raton, FL
Period	12 December 2012 → 15 December 2012

ASJC Scopus subject areas

Human-Computer Interaction (ASJC 1709)
Education (ASJC 3304)

Access to document

10.1109/ICMLA.2012.146

Cite this

Alshaqsi, J., & Wang, W. (2012). A hybrid method for estimating the predominant number of clusters in a data set. In Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012 (pp. 569-573). Article 6406797 (Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012; Vol. 2). https://doi.org/10.1109/ICMLA.2012.146

A hybrid method for estimating the predominant number of clusters in a data set. / Alshaqsi, Jamil; Wang, Wenjia.
Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012. 2012. p. 569-573 6406797 (Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012; Vol. 2).

Alshaqsi, J & Wang, W 2012, A hybrid method for estimating the predominant number of clusters in a data set. in Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012., 6406797, Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012, vol. 2, pp. 569-573, 11th IEEE International Conference on Machine Learning and Applications, ICMLA 2012, Boca Raton, FL, United States, 12/12/12. https://doi.org/10.1109/ICMLA.2012.146

@inproceedings{6e67b4e07501457c9e002be882d07b98,

title = "A hybrid method for estimating the predominant number of clusters in a data set",

abstract = "In cluster analysis, finding out the number of clusters, K, for a given dataset is an important yet very tricky task, simply because there is often no universally accepted correct or wrong answer for non-trivial real world problems and it also depends on the context and purpose of a cluster study. This paper presents a new hybrid method for estimating the predominant number of clusters automatically. It employs a new similarity measure and then calculates the length of constant similarity intervals, L and considers the longest consistent intervals representing the most probable numbers of the clusters under the set context. An error function is defined to measure and evaluate the goodness of estimations. The proposed method has been tested on 3 synthetic datasets and 8 real-world benchmark datasets, and compared with some other popular methods. The experimental results showed that the proposed method is able to determine the desired number of clusters for all the simulated datasets and most of the benchmark datasets, and the statistical tests indicate that our method is significantly better.",

keywords = "cluster analysis, cluster number, similarity measure",

author = "Jamil Alshaqsi and Wenjia Wang",

year = "2012",

doi = "10.1109/ICMLA.2012.146",

language = "English",

isbn = "9780769549132",

series = "Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012",

pages = "569--573",

booktitle = "Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012",

note = "11th IEEE International Conference on Machine Learning and Applications, ICMLA 2012 ; Conference date: 12-12-2012 Through 15-12-2012",

}

TY - GEN

T1 - A hybrid method for estimating the predominant number of clusters in a data set

AU - Alshaqsi, Jamil

AU - Wang, Wenjia

PY - 2012

Y1 - 2012

N2 - In cluster analysis, finding out the number of clusters, K, for a given dataset is an important yet very tricky task, simply because there is often no universally accepted correct or wrong answer for non-trivial real world problems and it also depends on the context and purpose of a cluster study. This paper presents a new hybrid method for estimating the predominant number of clusters automatically. It employs a new similarity measure and then calculates the length of constant similarity intervals, L and considers the longest consistent intervals representing the most probable numbers of the clusters under the set context. An error function is defined to measure and evaluate the goodness of estimations. The proposed method has been tested on 3 synthetic datasets and 8 real-world benchmark datasets, and compared with some other popular methods. The experimental results showed that the proposed method is able to determine the desired number of clusters for all the simulated datasets and most of the benchmark datasets, and the statistical tests indicate that our method is significantly better.

AB - In cluster analysis, finding out the number of clusters, K, for a given dataset is an important yet very tricky task, simply because there is often no universally accepted correct or wrong answer for non-trivial real world problems and it also depends on the context and purpose of a cluster study. This paper presents a new hybrid method for estimating the predominant number of clusters automatically. It employs a new similarity measure and then calculates the length of constant similarity intervals, L and considers the longest consistent intervals representing the most probable numbers of the clusters under the set context. An error function is defined to measure and evaluate the goodness of estimations. The proposed method has been tested on 3 synthetic datasets and 8 real-world benchmark datasets, and compared with some other popular methods. The experimental results showed that the proposed method is able to determine the desired number of clusters for all the simulated datasets and most of the benchmark datasets, and the statistical tests indicate that our method is significantly better.

KW - cluster analysis

KW - cluster number

KW - similarity measure

UR - http://www.scopus.com/inward/record.url?scp=84873589657&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873589657&partnerID=8YFLogxK

U2 - 10.1109/ICMLA.2012.146

DO - 10.1109/ICMLA.2012.146

M3 - Conference contribution

AN - SCOPUS:84873589657

SN - 9780769549132

T3 - Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012

SP - 569

EP - 573

BT - Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012

T2 - 11th IEEE International Conference on Machine Learning and Applications, ICMLA 2012

Y2 - 12 December 2012 through 15 December 2012

ER -
