Abstract
Arabic is a highly inflected language, so stemming and root extraction remain a challenge for researchers. A new method is presented for extracting the stem and lemma of Arabic text. Stemming sometimes alters the meaning of a word, whereas lemmatization preserves it. The approach is based on pattern extraction. It uses a special encoding that divides letters into original and non-original letters. Codes are automatically generated for each pattern and then matched against the input text to extract the root, pattern, and lemma of a word. A comparison with other methods shows promising results, with accuracy of up to 96%.
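The abstract does not give the encoding itself, so the following is only a minimal sketch of pattern-based root extraction under stated assumptions, not the authors' method. It assumes that "original" letters are the root slots written with the conventional placeholders ف, ع, ل in a pattern, and that every other pattern letter is a non-original letter that must match the word literally. The names `encode_pattern`, `match_pattern`, `extract_root`, and the three-entry `PATTERNS` list are hypothetical, and lemma extraction is not covered.

```python
# Illustrative sketch only: the pattern list and encoding are assumptions,
# not the encoding described in the paper.

ROOT_SLOTS = "فعل"  # placeholder letters marking root (original) positions

# A tiny, hypothetical pattern inventory.
PATTERNS = ["فاعل", "مفعول", "فعال"]


def encode_pattern(pattern):
    """Turn a pattern into a code: None for a root slot, the letter otherwise."""
    return [None if ch in ROOT_SLOTS else ch for ch in pattern]


def match_pattern(word, pattern):
    """Return the root if the word fits the pattern, else None."""
    code = encode_pattern(pattern)
    if len(word) != len(code):
        return None
    root = []
    for letter, slot in zip(word, code):
        if slot is None:
            root.append(letter)   # original letter: part of the root
        elif slot != letter:
            return None           # non-original letter must match exactly
    return "".join(root)


def extract_root(word):
    """Try each pattern in order; return (root, pattern) or None."""
    for pattern in PATTERNS:
        root = match_pattern(word, pattern)
        if root is not None:
            return root, pattern
    return None


if __name__ == "__main__":
    for w in ["كاتب", "مكتوب", "كتاب"]:
        print(w, extract_root(w))
```

In this sketch the three sample words all resolve to the same three-letter root through different patterns. A realistic system would need a much larger pattern inventory, handling of prefixes and suffixes such as the definite article, and a way to rank competing matches.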
Original language | English |
---|---|
Title of host publication | 10th International Conference on Information Sciences, Signal Processing and their Applications, ISSPA 2010 |
Pages | 642-645 |
Number of pages | 4 |
Publication status | Published - 2010 |
Event | 10th International Conference on Information Sciences, Signal Processing and their Applications, ISSPA 2010 - Kuala Lumpur, Malaysia; Duration: May 10, 2010 → May 13, 2010 |
Other
Other | 10th International Conference on Information Sciences, Signal Processing and their Applications, ISSPA 2010 |
---|---|
Country | Malaysia |
City | Kuala Lumpur |
Period | 5/10/10 → 5/13/10 |
Keywords
- Morphological analyzer
- Natural language processing
- Root extraction
ASJC Scopus subject areas
- Computer Science Applications
- Information Systems
- Signal Processing