A novel clustering algorithm with a new similarity measure and ensemble methods for mixed data clustering

Research output: Doctoral Thesis

Abstract

This thesis addressed four specific issues in clustering: (1) clustering algorithms, (2) similarity measures, (3) the number of clusters, K, and (4) clustering ensemble methods. Following an in-depth review of clustering methods, a new three-staged (3-Staged) clustering algorithm is proposed, with three new key aspects: (1) a new method for automatically estimating the K value, (2) a new similarity measure and (3) initiating the clustering process with a promising BASE. A BASE is a real sample that acts like a centroid or a medoid in common clustering methods, but it is determined differently in our approach. The new similarity measure is defined specifically to reflect the degree of relative change between data samples and, more importantly, to accommodate both numerical and categorical variables. We have proven mathematically that the proposed similarity measure satisfies the three properties of a metric. This research also investigated the problem of determining the appropriate number of clusters in a dataset and devised a novel function, integrated into our 3-Staged clustering algorithm, that automatically estimates the most appropriate number of clusters, K. Based on the new 3-Staged clustering algorithm, we developed two new ensemble algorithms. For all experiments, we used publicly available real-world benchmark datasets, as these datasets have been commonly used by other researchers. Experimental results showed that the 3-Staged clustering algorithm performed better than the individual methods it was compared with, including K-means and TwoStep, and also better than some ensemble-based methods such as K-ANMI and ccdByEnsemble. They also showed that the proposed similarity measure is very effective in improving clustering quality. In addition, they showed that our proposed method for estimating the K value identified the correct number of clusters for most of the tested datasets.
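
The abstract does not give the formula for the thesis's similarity measure, so the following is only an illustrative sketch of the general idea of a relative-change dissimilarity over mixed data, not the measure proposed in the thesis. It assumes a Canberra-style term for numerical features and a simple 0/1 mismatch for categorical features, averaged over all features; the function name `mixed_dissimilarity` and the sample records are hypothetical. Both component terms are known metrics, so the combination has the properties usually required of a metric (zero distance between identical samples, symmetry, and the triangle inequality).

```python
from typing import Sequence, Union

Value = Union[int, float, str]


def mixed_dissimilarity(x: Sequence[Value], y: Sequence[Value]) -> float:
    """Illustrative dissimilarity for records with numeric and categorical fields.

    NOT the thesis's measure: this is an assumed stand-in that averages
    a Canberra-style relative-change term (numeric fields) and a simple
    mismatch term (categorical fields).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("records must be non-empty and of equal length")

    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            # Relative change, bounded in [0, 1]; defined as 0 when both are 0.
            denom = abs(a) + abs(b)
            total += abs(a - b) / denom if denom else 0.0
        else:
            # Categorical field: 0 if the values match, 1 otherwise.
            total += 0.0 if a == b else 1.0
    return total / len(x)


# Hypothetical records: (numeric feature, categorical feature)
x = (10.0, "red")
y = (5.0, "blue")
z = (20.0, "red")

d_xy = mixed_dissimilarity(x, y)
d_xz = mixed_dissimilarity(x, z)
d_yz = mixed_dissimilarity(y, z)
```

Averaging over features keeps the result in the range 0 to 1 regardless of how many variables a record has, which is one simple way to put numerical and categorical contributions on the same scale.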
Original language: English
Qualification: Doctor of Philosophy
Awarding institution
  • University of East Anglia
Supervisors/Advisors
  • Wang, Wenjia, Supervisor, External person
  • Rayward Smith, Vic, Supervisor, External person
Award date: October 10, 2010
Place of publication: United Kingdom
Edition: 1
Electronic ISBN: 0000 0004 2700 321X
Publication status: Published - October 10, 2010
