Semantic of data dependencies to improve the data quality

Houda Zaidi, Yann Pollet, Faouzi Boufarès, Naoufel Kraiem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Data quality in databases is a critical challenge because the cost of anomalies may be very high, especially for large databases. Correcting these anomalies has therefore become increasingly important in both enterprises and academia. In this work, we address the problems of intra-column and inter-column anomalies in big data. We propose a new approach to data cleaning that takes into account the semantic dependencies between the columns of a data source. The novelty of our proposal is that it uses data semantics to reduce the size of the search space in functional dependency discovery. In this paper, we present the first steps of our work, which make it possible to recognize the semantics of data and to correct intra-column anomalies.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 53-61
Number of pages: 9
Volume: 9344
ISBN (Print): 9783319237800
DOIs: https://doi.org/10.1007/978-3-319-23781-7_5
Publication status: Published - 2015
Event: 5th International Conference on Model and Data Engineering, MEDI 2015 - Rhodes, Greece
Duration: Sep 26 2015 → Sep 28 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9344
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Conference on Model and Data Engineering, MEDI 2015
Country: Greece
City: Rhodes
Period: 9/26/15 → 9/28/15

Fingerprint

Data Dependency
Data Quality
Semantics
Anomaly
Cleaning
Functional Dependency
Search Space
Costs
Industry

Keywords

  • Big data
  • Data cleaning
  • Data quality
  • Data structure
  • Functional dependencies
  • Semantic dependencies

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Zaidi, H., Pollet, Y., Boufarès, F., & Kraiem, N. (2015). Semantic of data dependencies to improve the data quality. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9344, pp. 53-61). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9344). Springer Verlag. https://doi.org/10.1007/978-3-319-23781-7_5

Semantic of data dependencies to improve the data quality. / Zaidi, Houda; Pollet, Yann; Boufarès, Faouzi; Kraiem, Naoufel.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9344 Springer Verlag, 2015. p. 53-61 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9344).

Zaidi, H, Pollet, Y, Boufarès, F & Kraiem, N 2015, Semantic of data dependencies to improve the data quality. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 9344, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9344, Springer Verlag, pp. 53-61, 5th International Conference on Model and Data Engineering, MEDI 2015, Rhodes, Greece, 9/26/15. https://doi.org/10.1007/978-3-319-23781-7_5
Zaidi H, Pollet Y, Boufarès F, Kraiem N. Semantic of data dependencies to improve the data quality. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9344. Springer Verlag. 2015. p. 53-61. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-23781-7_5
Zaidi, Houda ; Pollet, Yann ; Boufarès, Faouzi ; Kraiem, Naoufel. / Semantic of data dependencies to improve the data quality. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9344 Springer Verlag, 2015. pp. 53-61 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{05ebb96801b14a78bf3066143f46410a,
title = "Semantic of data dependencies to improve the data quality",
abstract = "Data quality in databases is a critical challenge because the cost of anomalies may be very high, especially for large databases. Therefore, the correction of these anomalies represents an issue that has become more and more important both in enterprises and in academia. In this work, we address the problems of intra-column and inter-columns anomalies in big data. We propose a new approach for data cleaning that takes into account the semantic dependencies between the columns of a data source. The novelty of our proposal is the reduction of the size of the search space in the process of functional dependency discovery based on data semantics. In this paper, we present the first steps of our work. They allow recognizing the semantics of data and correct intra-column anomalies.",
keywords = "Big data, Data cleaning, Data quality, Data structure, Functional dependencies, Semantic dependencies",
author = "Houda Zaidi and Yann Pollet and Faouzi Boufar{\`e}s and Naoufel Kraiem",
year = "2015",
doi = "10.1007/978-3-319-23781-7_5",
language = "English",
isbn = "9783319237800",
volume = "9344",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "53--61",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
}

TY  - GEN
T1  - Semantic of data dependencies to improve the data quality
AU  - Zaidi, Houda
AU  - Pollet, Yann
AU  - Boufarès, Faouzi
AU  - Kraiem, Naoufel
PY  - 2015
Y1  - 2015
N2  - Data quality in databases is a critical challenge because the cost of anomalies may be very high, especially for large databases. Therefore, the correction of these anomalies represents an issue that has become more and more important both in enterprises and in academia. In this work, we address the problems of intra-column and inter-columns anomalies in big data. We propose a new approach for data cleaning that takes into account the semantic dependencies between the columns of a data source. The novelty of our proposal is the reduction of the size of the search space in the process of functional dependency discovery based on data semantics. In this paper, we present the first steps of our work. They allow recognizing the semantics of data and correct intra-column anomalies.
AB  - Data quality in databases is a critical challenge because the cost of anomalies may be very high, especially for large databases. Therefore, the correction of these anomalies represents an issue that has become more and more important both in enterprises and in academia. In this work, we address the problems of intra-column and inter-columns anomalies in big data. We propose a new approach for data cleaning that takes into account the semantic dependencies between the columns of a data source. The novelty of our proposal is the reduction of the size of the search space in the process of functional dependency discovery based on data semantics. In this paper, we present the first steps of our work. They allow recognizing the semantics of data and correct intra-column anomalies.
KW  - Big data
KW  - Data cleaning
KW  - Data quality
KW  - Data structure
KW  - Functional dependencies
KW  - Semantic dependencies
UR  - http://www.scopus.com/inward/record.url?scp=84951813826&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84951813826&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-23781-7_5
DO  - 10.1007/978-3-319-23781-7_5
M3  - Conference contribution
AN  - SCOPUS:84951813826
SN  - 9783319237800
VL  - 9344
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 53
EP  - 61
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB  - Springer Verlag
ER  -