A data variability index: Quantifying complexity of models and analyzing adversarial data

Rami Al-Hmouz*, Witold Pedrycz, Ahmed Chiheb Ammari, Ahmed Al-Hmouz

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In system modeling, a fundamental question arises about the level of difficulty one may encounter when designing a model on the basis of some training data. In this study, we advocate that this level of difficulty inherently depends upon the variability of the available function (data). If, for a pair of inputs that differ only slightly, the corresponding outputs differ substantially, then building a model from such data becomes more challenging than in cases where the differences in the outputs are far more limited. Building on this observation, we introduce a variability index that quantifies the nature of data in terms of the variability observed in the input and output data, respectively. The proposed index is model-neutral (model-agnostic): it describes and quantifies the modeling challenge implied by the data irrespective of the specific model to be constructed. In the case of functions, we show that the Lipschitz constant plays a role similar to that of the variability index computed for experimental data. We introduce an original way of reducing the values of the variability index through a nonlinear transformation of the original data, realized by a fuzzy rule-based model. It is shown that such a rule-based architecture gives rise to a piecewise linear transformation (multipoint linear approximation) exhibiting the required contraction-dilation characteristics. This transformation is optimized with a Particle Swarm Optimization (PSO) algorithm. We also demonstrate that the index can be used to quantify the concept of adversarial data. Along this line, we introduce a granular characterization of the adversarial nature of individual data points. A series of experiments on publicly available data offers a thorough illustration of the approach and detailed insight into the nature of these data.
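The abstract does not state the formula of the variability index, so the following is only a minimal sketch of the underlying intuition, assuming a Lipschitz-style ratio |y_i - y_j| / ||x_i - x_j|| computed over all pairs of samples. It is not the authors' definition. The helper names `ratio_matrix`, `variability_sketch`, and `pointwise_scores`, the `eps` guard, and the use of the mean ratio as a per-point "adversarial" score are all hypothetical illustrations, not the paper's granular characterization. For data sampled from a function with Lipschitz constant L, every pairwise ratio is bounded by L, which is the connection the abstract draws.

```python
import numpy as np

# HYPOTHETICAL sketch: an empirical, Lipschitz-style ratio used as a stand-in
# for the variability index. It is NOT the formula defined in the paper.

def ratio_matrix(X, y, eps=1e-12):
    """Return R[i, j] = |y_i - y_j| / ||x_i - x_j|| for all pairs of samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if X.ndim == 1:  # treat a 1-D array as N samples of a single feature
        X = X[:, None]
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # input distances
    dy = np.abs(y[:, None] - y[None, :])                          # output differences
    R = dy / np.maximum(dx, eps)  # eps guards against identical inputs
    np.fill_diagonal(R, 0.0)      # a sample is not compared with itself
    return R

def variability_sketch(X, y):
    """Largest pairwise ratio, i.e. an empirical Lipschitz constant of the data."""
    return float(ratio_matrix(X, y).max())

def pointwise_scores(X, y):
    """Mean ratio of each sample to all others (hypothetical per-point score);
    unusually large values flag candidate 'adversarial' points."""
    R = ratio_matrix(X, y)
    return R.sum(axis=1) / (len(R) - 1)

x = np.linspace(0.0, 1.0, 11)
y_smooth = 2.0 * x        # samples of f(x) = 2x, whose Lipschitz constant is 2
y_jump = y_smooth.copy()
y_jump[5] += 5.0          # one output perturbed sharply at x = 0.5

print(variability_sketch(x, y_smooth))                # about 2.0
print(variability_sketch(x, y_jump))                  # about 52.0
print(int(np.argmax(pointwise_scores(x, y_jump))))    # 5, the perturbed sample
```

The single perturbed output raises the maximum ratio from about 2 to about 52, mirroring the abstract's point that small input differences paired with large output differences make the data harder to model. The all-pairs matrix costs O(N²) memory, which is acceptable for a small illustration but not for large datasets.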

Original language: English
Article number: 11
Pages (from-to): 8412-8435
Number of pages: 24
Journal: International Journal of Intelligent Systems
Volume: 37
Issue number: 11
Publication status: Published - Jun 29 2022

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
  • Artificial Intelligence
