A data variability index: Quantifying complexity of models and analyzing adversarial data

Rami Al-Hmouz*, Witold Pedrycz, Ahmed Chiheb Ammari, Ahmed Al-Hmouz

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

A fundamental question in system modeling concerns the level of difficulty one may encounter when designing a model on the basis of some training data. In this study, we advocate that this level of difficulty inherently depends upon the variability of the available function (data). If, for pairs of input data that exhibit small differences, the differences between the corresponding outputs are substantial, then building a model from such data becomes more challenging than in cases where the differences in the output data are far more limited. Building on this observation, we introduce a variability index that quantifies the nature of the data in terms of the variability observed in the input and output data, respectively. The proposed index is model-neutral (model-agnostic); namely, it describes and quantifies the modeling challenge implied by the data irrespective of the specific model to be constructed. In the case of functions, we show that the Lipschitz constant plays a role similar to that of the variability index computed for experimental data. An original way of reducing the values of the variability index through a nonlinear transformation of the original data, carried out by a fuzzy rule-based model, is introduced. It is shown that such a rule-based architecture gives rise to a piecewise linear transformation (multipoint linear approximation) exhibiting the required contraction-dilation characteristics. The optimization of this transformation is carried out using a Particle Swarm Optimization algorithm. We also demonstrate that the index can be used to quantify the concept of adversarial data. Along this line, we introduce a granular characterization of the adversarial nature of individual data points. A series of experiments provides a thorough illustration of the approach and a detailed characterization of publicly available data.
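
The idea that small input differences paired with large output differences make data hard to model can be illustrated with a minimal sketch. The abstract does not give the index's formula, so the code below is not the authors' definition: it assumes an empirical Lipschitz-style ratio, |y_i - y_j| / ||x_i - x_j||, taken over all pairs of data points, with the maximum serving as a stand-in for the index. The function names `pairwise_ratios` and `empirical_lipschitz` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_ratios(xs, ys):
    """Output difference divided by input distance, for every pair of points.

    ASSUMPTION: this ratio is an illustrative stand-in for the paper's
    variability index, whose exact definition is not stated in the abstract.
    xs is a sequence of equal-length numeric tuples, ys a sequence of floats.
    """
    ratios = []
    for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2):
        dx = math.dist(xi, xj)  # Euclidean distance between the two inputs
        if dx > 0:  # skip duplicate inputs to avoid division by zero
            ratios.append(abs(yi - yj) / dx)
    return ratios


def empirical_lipschitz(xs, ys):
    """Largest pairwise ratio: a data-driven analogue of a Lipschitz constant."""
    ratios = pairwise_ratios(xs, ys)
    return max(ratios) if ratios else 0.0


# Smooth data: y = 2x, so every pair of points has the same ratio.
smooth_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
smooth_y = [0.0, 2.0, 4.0, 6.0]

# Data with a jump: inputs 1.0 and 1.25 are close, yet outputs differ by 4.
jumpy_x = [(0.0,), (1.0,), (1.25,), (2.0,)]
jumpy_y = [0.0, 1.0, 5.0, 5.0]

print(empirical_lipschitz(smooth_x, smooth_y))
print(empirical_lipschitz(jumpy_x, jumpy_y))
```

Under this assumed definition, the linear data set yields a ratio of 2 for every pair, while the data set with the jump is dominated by the single close pair whose outputs differ sharply. Because the quantity is computed from the data alone, it reflects the model-agnostic character the abstract describes.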

Original language: English
Pages (from-to): 8412-8435
Number of pages: 24
Journal: International Journal of Intelligent Systems
Volume: 37
Issue number: 11
Publication status: Published - Nov 2022

Keywords

  • adversarial data
  • fuzzy rule-based model
  • granular computing
  • Lipschitz constant
  • system modeling
  • variability index
  • variability of input–output data

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
  • Artificial Intelligence
