TY - GEN
T1 - A Modified Decision Tree and its Application to Assess Variable Importance
AU - Fuller Bbosa, Francis
AU - Wesonga, Ronald
AU - Nabende, Peter
AU - Nabukenya, Josephine
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/7/23
Y1 - 2021/7/23
AB - This paper presents an approach to further improve the data reduction abilities of the traditional C4.5 algorithm by integrating the information gain ratio and forward stepwise regression algorithms. The work is motivated by the fact that the traditional C4.5 algorithm uses the full set of antecedent attributes without screening out irrelevant ones, which is a precursor to spurious predictive model estimates. To overcome this drawback, the study develops and evaluates an importance-based attribute selection algorithm called C4.5-Forward Stepwise (C4.5-FS). Five datasets with dimensionality ranging from 6 to 10,000 attributes were used to evaluate model performance. The goodness of fit of the modified and traditional C4.5 classifiers was assessed using k-fold cross-validation based on a confusion matrix. Experimental results showed that the C4.5-FS algorithm, trained on fewer antecedent attributes, achieved higher accuracy than the traditional C4.5 algorithm trained on the full set of antecedent attributes, thereby improving on its data reduction capabilities.
KW - big data
KW - C4.5
KW - data reduction
KW - Machine learning
KW - modified
KW - significant
UR - http://www.scopus.com/inward/record.url?scp=85116527236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116527236&partnerID=8YFLogxK
U2 - 10.1145/3478905.3479245
DO - 10.1145/3478905.3479245
M3 - Conference contribution
AN - SCOPUS:85116527236
T3 - ACM International Conference Proceeding Series
SP - 468
EP - 475
BT - 2021 4th International Conference on Data Science and Information Technology, DSIT 2021
PB - Association for Computing Machinery
T2 - 4th International Conference on Data Science and Information Technology, DSIT 2021
Y2 - 23 July 2021 through 25 July 2021
ER -