Examining Techniques to Solving Imbalanced Datasets in Educational Data Mining Systems

Shubair Abdulkareem Abdullah Abdullah

doi:10.47839/ijc.21.2.2589

Instructional & Learning Technologies

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Research in educational data mining has contributed to the development of policies that improve student learning at different levels of educational institutions. A common challenge in building accurate classification and prediction systems is the imbalanced distribution of classes in the collected data. This study investigates data-level techniques and algorithm-level techniques for handling this problem. Six classifiers from each technique are used to explore how effectively they handle imbalanced data when predicting students’ graduation grades based on their performance at the first stage. The classifiers are tested using k-fold cross-validation before and after applying the data-level and algorithm-level techniques. Performance is evaluated with several metrics, including accuracy, precision, recall, and F1-score. The results show that the classifiers do not perform well on the imbalanced dataset and that performance can be improved by using these techniques, although the degree of improvement varies from one technique to another.
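To illustrate why accuracy alone can mislead on imbalanced data, and what a data-level technique does, here is a minimal Python sketch. It is not the study's code: the record does not name the specific techniques or classifiers used, so random oversampling is assumed here as one common data-level technique, and the "pass"/"fail" labels, the 95/5 split, and the function names `classification_metrics` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter


def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1, treating one class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 95 "pass" records and 5 "fail" records.
labels = ["pass"] * 95 + ["fail"] * 5
rows = [[i] for i in range(100)]

# A trivial model that always predicts the majority class.
always_pass = ["pass"] * 100
baseline = classification_metrics(labels, always_pass, positive="fail")

# Data-level rebalancing of the same toy data.
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

In this toy case the majority-class predictor scores high accuracy while its recall and F1 for the minority class are zero, which is why the abstract lists several metrics. When combined with k-fold cross-validation, resampling is normally applied to the training folds only, so that duplicated records do not leak into the test fold.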

Original language	English
Pages (from-to)	205-213
Number of pages	9
Journal	International Journal of Computing
Volume	21
Issue number	2
DOIs	https://doi.org/10.47839/ijc.21.2.2589
Publication status	Published - Jun 30 2022

Keywords

Educational data mining
Imbalanced datasets
Machine learning
Prediction
Student grade

ASJC Scopus subject areas

Computer Science (miscellaneous)
Software
Information Systems
Hardware and Architecture
Computer Networks and Communications

Access to Document

10.47839/ijc.21.2.2589

Cite this

@article{36ffa0e2ef9d4ecfb31921295711dd3b,
title = "Examining Techniques to Solving Imbalanced Datasets in Educational Data Mining Systems",
abstract = "The educational data mining research attempts have contributed in developing policies to improve student learning in different levels of educational institutions. One of the common challenges to building accurate classification and prediction systems is the imbalanced distribution of classes in the data collected. This study investigates data-level techniques and algorithm-level techniques. Six classifiers from each technique are used to explore their effectiveness to handle the imbalanced data problem while predicting students{\textquoteright} graduation grade based on their performance at the first stage. The classifiers are tested using the k-fold cross-validation approach before and after applying the data-level and algorithm-level techniques. For the purpose of evaluation, various evaluation metrics have been used such as accuracy, precision, recall, and f1-score. The results showed that the classifiers do not perform well with imbalanced dataset, and the performance could be improved by using these techniques. As for the level of improvement, it varies from one technique to another.",
keywords = "Educational data mining, Imbalanced datasets, Machine learning, Prediction, Student grade",
author = "{Abdulkareem Abdullah Abdullah}, Shubair",
year = "2022",
month = jun,
day = "30",
doi = "10.47839/ijc.21.2.2589",
language = "English",
volume = "21",
pages = "205--213",
journal = "International Journal of Computing",
issn = "1727-6209",
publisher = "Research Institute of Intelligent Computer Systems",
number = "2",
}

TY - JOUR
T1 - Examining Techniques to Solving Imbalanced Datasets in Educational Data Mining Systems
AU - Abdulkareem Abdullah Abdullah, Shubair
PY - 2022/6/30
Y1 - 2022/6/30
N2 - The educational data mining research attempts have contributed in developing policies to improve student learning in different levels of educational institutions. One of the common challenges to building accurate classification and prediction systems is the imbalanced distribution of classes in the data collected. This study investigates data-level techniques and algorithm-level techniques. Six classifiers from each technique are used to explore their effectiveness to handle the imbalanced data problem while predicting students’ graduation grade based on their performance at the first stage. The classifiers are tested using the k-fold cross-validation approach before and after applying the data-level and algorithm-level techniques. For the purpose of evaluation, various evaluation metrics have been used such as accuracy, precision, recall, and f1-score. The results showed that the classifiers do not perform well with imbalanced dataset, and the performance could be improved by using these techniques. As for the level of improvement, it varies from one technique to another.
AB - The educational data mining research attempts have contributed in developing policies to improve student learning in different levels of educational institutions. One of the common challenges to building accurate classification and prediction systems is the imbalanced distribution of classes in the data collected. This study investigates data-level techniques and algorithm-level techniques. Six classifiers from each technique are used to explore their effectiveness to handle the imbalanced data problem while predicting students’ graduation grade based on their performance at the first stage. The classifiers are tested using the k-fold cross-validation approach before and after applying the data-level and algorithm-level techniques. For the purpose of evaluation, various evaluation metrics have been used such as accuracy, precision, recall, and f1-score. The results showed that the classifiers do not perform well with imbalanced dataset, and the performance could be improved by using these techniques. As for the level of improvement, it varies from one technique to another.
KW - Educational data mining
KW - Imbalanced datasets
KW - Machine learning
KW - Prediction
KW - Student grade
UR - http://www.scopus.com/inward/record.url?scp=85133525855&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133525855&partnerID=8YFLogxK
U2 - 10.47839/ijc.21.2.2589
DO - 10.47839/ijc.21.2.2589
M3 - Article
SN - 1727-6209
VL - 21
SP - 205
EP - 213
JO - International Journal of Computing
JF - International Journal of Computing
IS - 2
ER -
