TY - JOUR
T1 - Improving intrusion detection model prediction by threshold adaptation
AU - Al Tobi, Amjad M.
AU - Duncan, Ishbel
N1 - Funding Information:
Funding: This research was supported and funded by the Government of the Sultanate of Oman represented by the Ministry of Higher Education and the Sultan Qaboos University.
Publisher Copyright:
© 2019 by the authors.
PY - 2019
Y1 - 2019
N2 - Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the accuracy of anomaly-based network intrusion detection systems (IDS) that are built using predictive models in a batch learning setup. This work investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these intrusion detection models. Specifically, this research studied the adaptability features of three well-known machine learning algorithms: C5.0, Random Forest and Support Vector Machine. Each algorithm's ability to adapt its prediction thresholds was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. Multiple IDS datasets were used for the analysis, including a newly generated dataset (STA2018). This research demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation traffic have different statistical properties. Tests were undertaken to analyse the effects of feature selection and data balancing on model accuracy when different significant features in traffic were used. The effects of threshold adaptation on improving accuracy were statistically analysed. Of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates.
AB - Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the accuracy of anomaly-based network intrusion detection systems (IDS) that are built using predictive models in a batch learning setup. This work investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these intrusion detection models. Specifically, this research studied the adaptability features of three well-known machine learning algorithms: C5.0, Random Forest and Support Vector Machine. Each algorithm's ability to adapt its prediction thresholds was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. Multiple IDS datasets were used for the analysis, including a newly generated dataset (STA2018). This research demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation traffic have different statistical properties. Tests were undertaken to analyse the effects of feature selection and data balancing on model accuracy when different significant features in traffic were used. The effects of threshold adaptation on improving accuracy were statistically analysed. Of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates.
KW - Anomaly-based IDS
KW - C5.0
KW - Intrusion Detection System
KW - Machine Learning
KW - Prediction accuracy improvement
KW - Random Forest
KW - STA2018 dataset
KW - Support Vector Machine
KW - Threshold adaptation
UR - http://www.scopus.com/inward/record.url?scp=85065886636&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065886636&partnerID=8YFLogxK
U2 - 10.3390/info10050159
DO - 10.3390/info10050159
M3 - Article
AN - SCOPUS:85065886636
SN - 2078-2489
VL - 10
JO - Information (Switzerland)
JF - Information (Switzerland)
IS - 5
M1 - 159
ER -