TY - GEN
T1 - Semantic of data dependencies to improve the data quality
AU - Zaidi, Houda
AU - Pollet, Yann
AU - Boufarès, Faouzi
AU - Kraiem, Naoufel
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Data quality in databases is a critical challenge because the cost of anomalies may be very high, especially for large databases. Correcting these anomalies has therefore become increasingly important in both enterprises and academia. In this work, we address the problems of intra-column and inter-column anomalies in big data. We propose a new approach to data cleaning that takes into account the semantic dependencies between the columns of a data source. The novelty of our proposal is that it uses data semantics to reduce the size of the search space in functional dependency discovery. In this paper, we present the first steps of our work, which make it possible to recognize the semantics of data and to correct intra-column anomalies.
AB - Data quality in databases is a critical challenge because the cost of anomalies may be very high, especially for large databases. Correcting these anomalies has therefore become increasingly important in both enterprises and academia. In this work, we address the problems of intra-column and inter-column anomalies in big data. We propose a new approach to data cleaning that takes into account the semantic dependencies between the columns of a data source. The novelty of our proposal is that it uses data semantics to reduce the size of the search space in functional dependency discovery. In this paper, we present the first steps of our work, which make it possible to recognize the semantics of data and to correct intra-column anomalies.
KW - Big data
KW - Data cleaning
KW - Data quality
KW - Data structure
KW - Functional dependencies
KW - Semantic dependencies
UR - http://www.scopus.com/inward/record.url?scp=84951813826&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84951813826&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-23781-7_5
DO - 10.1007/978-3-319-23781-7_5
M3 - Conference contribution
AN - SCOPUS:84951813826
SN - 9783319237800
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 53
EP - 61
BT - Model and Data Engineering - 5th International Conference, MEDI 2015, Proceedings
A2 - Manolopoulos, Yannis
A2 - Bellatreche, Ladjel
PB - Springer Verlag
T2 - 5th International Conference on Model and Data Engineering, MEDI 2015
Y2 - 26 September 2015 through 28 September 2015
ER -