Part of Speech (POS) tag sets reduction and analysis using rough set techniques

Mohamed Elhadi*, Amjd Al-Tobi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The motivation for this work stems from earlier work in which text was transformed into strings of syntactic structures, generated by a POS tagger, and compared using a sequence algorithm to calculate similarity. The performance of these computations was greatly affected by the size of the string, which is itself determined by the type of tags used. Generated tag sets range from a few general tags (a minimum of nine) to hundreds of detailed ones. Determining which tags, and which combinations of tags, affect the realization of the meanings, dependencies, or relationships that exist in the text is an important issue. Reducing the tag set using rough sets, and consequently reducing the string, improved the efficiency of similarity calculations between documents while maintaining the same level of accuracy. This finding is very encouraging.
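As a rough illustration of why a smaller tag set shortens the strings being compared, the following Python sketch filters two tag sequences down to a retained subset and compares them with a longest-common-subsequence (LCS) measure. Everything here is an assumption made for illustration: the Penn Treebank-style tag sequences, the hand-picked `KEEP` subset, the helper names, and the choice of LCS as the sequence algorithm. It does not reproduce the rough set reduction described in the abstract, only the effect of applying a reduced tag set once one has been chosen.

```python
from typing import Iterable, List, Sequence, Set


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length normalised by the longer sequence (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def reduce_tags(tags: Iterable[str], keep: Set[str]) -> List[str]:
    """Drop every tag that is not in the retained subset."""
    return [t for t in tags if t in keep]


# Hypothetical tag sequences for two short sentences (illustrative only).
doc_a = ["DT", "JJ", "NN", "VBZ", "IN", "DT", "NN"]
doc_b = ["DT", "NN", "VBD", "IN", "DT", "JJ", "NN"]

# Hypothetical retained subset: hand-picked here, NOT derived by rough sets.
KEEP = {"NN", "VBZ", "VBD", "IN"}

reduced_a = reduce_tags(doc_a, KEEP)
reduced_b = reduce_tags(doc_b, KEEP)

full_score = similarity(doc_a, doc_b)
reduced_score = similarity(reduced_a, reduced_b)

print(len(doc_a), len(reduced_a))
print(full_score, reduced_score)
```

Because the dynamic-programming comparison takes time proportional to the product of the two sequence lengths, shortening each sequence reduces the work per document pair. Whether accuracy is preserved depends on which tags are retained, which is the question the rough set analysis is meant to answer.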

Original language: English
Title of host publication: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing - 12th International Conference, RSFDGrC 2009, Proceedings
Number of pages: 8
Publication status: Published - 2009
Externally published: Yes
Event: 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, RSFDGrC 2009 - Delhi, India
Duration: Dec 15 2009 to Dec 18 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5908 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, RSFDGrC 2009


Keywords

  • Data reduction
  • POS tagging
  • Rough sets
  • Similarity calculations
  • String comparison

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
