TY - JOUR
T1 - A data variability index
T2 - Quantifying complexity of models and analyzing adversarial data
AU - Al-Hmouz, Rami
AU - Pedrycz, Witold
AU - Ammari, Ahmed Chiheb
AU - Al-Hmouz, Ahmed
N1 - Funding Information:
This project was funded by Sultan Qaboos University under Grant (IG/ENG/ECED/22/01). The authors, therefore, acknowledge with thanks the technical and financial support.
Publisher Copyright:
© 2022 Wiley Periodicals LLC.
PY - 2022/6/29
Y1 - 2022/6/29
N2 - A fundamental question in system modeling concerns the level of difficulty one may encounter when designing a model on the basis of some training data. In this study, we argue that this level of difficulty inherently depends upon the variability of the available function (data). If a pair of inputs that differ only slightly produces outputs that differ substantially, then building a model in the presence of such data is more challenging than for data where the output differences are far more limited. Building on this observation, we introduce a variability index that quantifies the nature of data in terms of the variability observed in the input and output data, respectively. The proposed index is model-neutral (model-agnostic): it describes and quantifies the modeling challenge implied by the data irrespective of the specific model to be constructed. In the case of functions, we show that the Lipschitz constant plays a role similar to that of the variability index computed for experimental data. We introduce an original way of reducing the values of the variability index through a nonlinear transformation of the original data, carried out by a fuzzy rule-based model. It is shown that such a rule-based architecture gives rise to a piecewise linear transformation (multipoint linear approximation) exhibiting the required contraction-dilation characteristics. This transformation is optimized with a Particle Swarm Optimization algorithm. We also demonstrate that the index can be used to quantify the concept of adversarial data. Along this line, we introduce a granular characterization of the adversarial nature of individual data points. A series of experiments offers a thorough illustration of, and detailed insight into, the nature of publicly available data.
AB - A fundamental question in system modeling concerns the level of difficulty one may encounter when designing a model on the basis of some training data. In this study, we argue that this level of difficulty inherently depends upon the variability of the available function (data). If a pair of inputs that differ only slightly produces outputs that differ substantially, then building a model in the presence of such data is more challenging than for data where the output differences are far more limited. Building on this observation, we introduce a variability index that quantifies the nature of data in terms of the variability observed in the input and output data, respectively. The proposed index is model-neutral (model-agnostic): it describes and quantifies the modeling challenge implied by the data irrespective of the specific model to be constructed. In the case of functions, we show that the Lipschitz constant plays a role similar to that of the variability index computed for experimental data. We introduce an original way of reducing the values of the variability index through a nonlinear transformation of the original data, carried out by a fuzzy rule-based model. It is shown that such a rule-based architecture gives rise to a piecewise linear transformation (multipoint linear approximation) exhibiting the required contraction-dilation characteristics. This transformation is optimized with a Particle Swarm Optimization algorithm. We also demonstrate that the index can be used to quantify the concept of adversarial data. Along this line, we introduce a granular characterization of the adversarial nature of individual data points. A series of experiments offers a thorough illustration of, and detailed insight into, the nature of publicly available data.
KW - adversarial data
KW - fuzzy rule-based model
KW - granular computing
KW - Lipschitz constant
KW - system modeling
KW - variability index
KW - variability of input–output data
UR - http://www.scopus.com/inward/record.url?scp=85132856135&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132856135&partnerID=8YFLogxK
U2 - 10.1002/int.22947
DO - 10.1002/int.22947
M3 - Article
AN - SCOPUS:85132856135
SN - 0884-8173
VL - 37
SP - 8412
EP - 8435
JO - International Journal of Intelligent Systems
JF - International Journal of Intelligent Systems
IS - 11
M1 - 11
ER -