A dynamic K-means clustering for data mining

Md Zakir Hossain; Md Nasim Akhtar; R. Badlishah Ahmad; Mostafijur Rahman

doi:10.11591/ijeecs.v13.i2.pp521-526

Md Zakir Hossain^*, Md Nasim Akhtar, R. Badlishah Ahmad, Mostafijur Rahman

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

91 Citations (Scopus)

Abstract

Data mining is the process of finding structure in large data sets. With this process, decision makers can make particular decisions toward solving real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the familiar techniques for clustering large data sets. It partitions the data set on the assumption that the number of clusters is fixed. The main problem with this method is that if the number of clusters is chosen to be small, there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters is chosen to be high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm that performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-Means, and the clusters are formed based on this value. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, the two data points are placed in the same group. Otherwise, the proposed method creates a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
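The threshold rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the threshold is computed, so here it is simply passed in as a parameter, and the function name `dynamic_threshold_clustering`, the choice of the first point as the initial centroid, and the comparison of each point against cluster centroids (rather than against other points) are all assumptions made for illustration.

```python
import math


def dynamic_threshold_clustering(points, threshold, max_iter=100):
    """Cluster points without fixing the number of clusters in advance.

    Hypothetical sketch: a point joins its nearest centroid when the
    Euclidean distance is <= threshold; otherwise it seeds a new cluster.
    """
    points = [tuple(p) for p in points]
    centroids = [points[0]]  # assumption: the first point seeds the first cluster
    labels = []
    for _ in range(max_iter):
        n_before = len(centroids)
        labels = []
        # Assignment step with dynamic cluster creation.
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            nearest = min(range(len(centroids)), key=dists.__getitem__)
            if dists[nearest] <= threshold:
                labels.append(nearest)
            else:
                centroids.append(p)  # dissimilar point starts a new cluster
                labels.append(len(centroids) - 1)
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        # Stop once no cluster was added and no centroid moved.
        converged = len(centroids) == n_before and new_centroids == centroids
        centroids = new_centroids
        if converged:
            break
    return labels, centroids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = dynamic_threshold_clustering(pts, threshold=3.0)
print(labels, len(centroids))  # -> [0, 0, 0, 1, 1, 1] 2
```

With a large threshold the two well-separated groups are recovered without specifying k. Shrinking the threshold (for example to 0.5) makes every point in this toy data set its own cluster, which shows how the threshold takes over the role that the fixed cluster count plays in standard K-means.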

Original language	English
Pages (from-to)	521-526
Number of pages	6
Journal	Indonesian Journal of Electrical Engineering and Computer Science
Volume	13
Issue number	2
DOIs	https://doi.org/10.11591/ijeecs.v13.i2.pp521-526
Publication status	Published - Feb 1 2019
Externally published	Yes

Keywords

Centroid
Clustering
Data mining
Euclidean distance
K-Means
Threshold value

ASJC Scopus subject areas

Signal Processing
Information Systems
Hardware and Architecture
Computer Networks and Communications
Control and Optimization
Electrical and Electronic Engineering

Cite this

@article{1687eb8594d54bfd9bc249e191160669,
  title = "A dynamic K-means clustering for data mining",
  abstract = "Data mining is the process of finding structure in large data sets. With this process, decision makers can make particular decisions toward solving real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the familiar techniques for clustering large data sets. It partitions the data set on the assumption that the number of clusters is fixed. The main problem with this method is that if the number of clusters is chosen to be small, there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters is chosen to be high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm that performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-Means, and the clusters are formed based on this value. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, the two data points are placed in the same group. Otherwise, the proposed method creates a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.",
  keywords = "Centroid, Clustering, Data mining, Euclidean distance, K-Means, Threshold value",
  author = "Hossain, {Md Zakir} and Akhtar, {Md Nasim} and Ahmad, {R. Badlishah} and Rahman, Mostafijur",
  year = "2019",
  month = feb,
  day = "1",
  doi = "10.11591/ijeecs.v13.i2.pp521-526",
  language = "English",
  volume = "13",
  number = "2",
  pages = "521--526",
  journal = "Indonesian Journal of Electrical Engineering and Computer Science",
  issn = "2502-4752",
  publisher = "Institute of Advanced Engineering and Science (IAES)",
}

TY  - JOUR
T1  - A dynamic K-means clustering for data mining
AU  - Hossain, Md Zakir
AU  - Akhtar, Md Nasim
AU  - Ahmad, R. Badlishah
AU  - Rahman, Mostafijur
PY  - 2019/2/1
Y1  - 2019/2/1
AB  - Data mining is the process of finding structure in large data sets. With this process, decision makers can make particular decisions toward solving real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the familiar techniques for clustering large data sets. It partitions the data set on the assumption that the number of clusters is fixed. The main problem with this method is that if the number of clusters is chosen to be small, there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters is chosen to be high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm that performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-Means, and the clusters are formed based on this value. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, the two data points are placed in the same group. Otherwise, the proposed method creates a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
KW  - Centroid
KW  - Clustering
KW  - Data mining
KW  - Euclidean distance
KW  - K-Means
KW  - Threshold value
UR  - http://www.scopus.com/inward/record.url?scp=85060868369&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85060868369&partnerID=8YFLogxK
U2  - 10.11591/ijeecs.v13.i2.pp521-526
DO  - 10.11591/ijeecs.v13.i2.pp521-526
M3  - Article
AN  - SCOPUS:85060868369
SN  - 2502-4752
VL  - 13
SP  - 521
EP  - 526
JO  - Indonesian Journal of Electrical Engineering and Computer Science
JF  - Indonesian Journal of Electrical Engineering and Computer Science
IS  - 2
ER  -
