Semantic of data dependencies to improve the data quality

Houda Zaidi*, Yann Pollet, Faouzi Boufarès, Naoufel Kraiem

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Data quality in databases is a critical challenge because the cost of anomalies can be very high, especially in large databases. Correcting these anomalies has therefore become increasingly important in both enterprises and academia. In this work, we address the problems of intra-column and inter-column anomalies in big data. We propose a new approach to data cleaning that takes into account the semantic dependencies between the columns of a data source. The novelty of our proposal is that it uses data semantics to reduce the size of the search space in the functional dependency discovery process. In this paper, we present the first steps of our work, which make it possible to recognize the semantics of data and to correct intra-column anomalies.

Original language: English
Title of host publication: Model and Data Engineering - 5th International Conference, MEDI 2015, Proceedings
Editors: Yannis Manolopoulos, Ladjel Bellatreche
Publisher: Springer Verlag
Pages: 53-61
Number of pages: 9
ISBN (Print): 9783319237800
Publication status: Published - 2015
Externally published: Yes
Event: 5th International Conference on Model and Data Engineering, MEDI 2015 - Rhodes, Greece
Duration: Sept 26 2015 to Sept 28 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9344
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Conference on Model and Data Engineering, MEDI 2015
Country/Territory: Greece
City: Rhodes
Period: 9/26/15 to 9/28/15

Keywords

  • Big data
  • Data cleaning
  • Data quality
  • Data structure
  • Functional dependencies
  • Semantic dependencies

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
