A Modified Decision Tree and its Application to Assess Variable Importance

Francis Fuller Bbosa, Ronald Wesonga, Peter Nabende, Josephine Nabukenya

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper presents an approach to improve the data reduction abilities of the traditional C4.5 algorithm by integrating the information gain ratio with forward stepwise regression. The work is motivated by the fact that the traditional C4.5 algorithm uses the full set of antecedent attributes without accounting for irrelevant ones, which can lead to spurious predictive model estimates. To overcome this drawback, the study develops and evaluates an importance-based attribute selection algorithm, C4.5-Forward Stepwise (C4.5-FS). Five datasets with dimensionality ranging from 6 to 10,000 attributes were used to evaluate model performance. The goodness of fit of the modified and traditional C4.5 classifiers was assessed using k-fold cross-validation based on a confusion matrix. Experimental results showed that the C4.5-FS algorithm, trained on fewer antecedent attributes, achieved higher accuracy than the traditional C4.5 algorithm trained on the full set of antecedent attributes, thereby improving on its data reduction capabilities.
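The idea of pairing a gain-ratio measure with forward stepwise selection can be illustrated with a minimal sketch. This is not the authors' C4.5-FS implementation, whose details the abstract does not give. It assumes categorical attributes, assumes candidates are visited in descending gain-ratio order, and uses a hypothetical `lookup_accuracy` score (training accuracy of a majority-vote lookup) as a stand-in for the cross-validated accuracy of a C4.5 tree. All function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain ratio of the categorical attribute at column index attr."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def lookup_accuracy(rows, labels, attrs):
    """Hypothetical stand-in score: training accuracy of a majority-vote lookup
    keyed on the selected attributes. A real evaluation would use the
    cross-validated accuracy of a decision tree instead."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(label)
    correct = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return correct / len(rows)


def forward_stepwise(rows, labels, score, min_improvement=1e-9):
    """Visit attributes in descending gain-ratio order (an assumption) and keep
    each one only if it improves the score of the current subset."""
    ranked = sorted(range(len(rows[0])),
                    key=lambda a: gain_ratio(rows, labels, a),
                    reverse=True)
    selected, best = [], float("-inf")
    for attr in ranked:
        candidate = score(selected + [attr])
        if candidate > best + min_improvement:
            selected.append(attr)
            best = candidate
    return selected


# Toy data: column 0 determines the label, column 1 is irrelevant.
rows = [("sunny", "a"), ("sunny", "b"), ("rain", "a"), ("rain", "b")]
labels = ["no", "no", "yes", "yes"]

selected = forward_stepwise(rows, labels,
                            lambda attrs: lookup_accuracy(rows, labels, attrs))
print(selected)  # [0]
```

On the toy data, only the informative column is retained, which is the kind of reduction in antecedent attributes the abstract describes. The score function is passed in as a parameter so that a cross-validated classifier can replace the lookup without changing the selection loop.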

Original language: English
Title of host publication: 2021 4th International Conference on Data Science and Information Technology, DSIT 2021
Publisher: Association for Computing Machinery
Pages: 468-475
Number of pages: 8
ISBN (Electronic): 9781450390248
DOIs
Publication status: Published - Jul 23 2021
Externally published: Yes
Event: 4th International Conference on Data Science and Information Technology, DSIT 2021 - Shanghai, China
Duration: Jul 23 2021 - Jul 25 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 4th International Conference on Data Science and Information Technology, DSIT 2021
Country/Territory: China
City: Shanghai
Period: 7/23/21 - 7/25/21

Keywords

  • big data
  • C4.5
  • data reduction
  • Machine learning
  • modified
  • significant

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
