Selecting topics for web resource discovery: Efficiency issues in a database approach

Abdullah Al-Hamdani, Gultekin Ozsoyoglu

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

This paper discusses algorithms for topic selection queries, which query a database containing metadata about web information resources. The metadata database contains topics and relationships among topics, called metalinks. Topics in the database have associated importance scores. The topic selection operator TSelection selects, within time T, topics that satisfy a given selection formula and have output importance scores above a given threshold value or in the top-k. The selection formula contains expensive predicates, in the form of user-defined functions. To minimize the number of expensive predicate evaluations (probes) in the TSelection algorithm, we introduce and evaluate three heuristics. Also, due to the time constraint T, the TSelection algorithm may terminate without locating all output tuples. To maximize the number of output tuples found, we introduce and evaluate three heuristics for choosing the tuple to evaluate at a given time.
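The setting described in the abstract can be illustrated with a small sketch. This is not the authors' TSelection algorithm or any of their heuristics, which the abstract does not specify. It is a generic baseline under stated assumptions: each topic is a hypothetical `(name, score)` pair, the output importance score is taken to equal the stored score, the cheap threshold check runs before any expensive probe, and candidates are probed in descending score order until the time budget or the top-k quota is exhausted. The function name `tselection_sketch` and all parameters are illustrative.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical topic representation: (topic name, importance score).
Topic = Tuple[str, float]


def tselection_sketch(
    topics: Iterable[Topic],
    expensive_predicate: Callable[[Topic], bool],
    time_budget_s: float,
    threshold: Optional[float] = None,
    k: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Topic], int]:
    """Return (selected topics, number of probes) within a time budget.

    Assumption: probing high-score topics first is a reasonable default
    ordering; the paper's own heuristics may differ.
    """
    deadline = clock() + time_budget_s

    # Cheap filter first: topics at or below the threshold can never be
    # output, so they are never probed.
    candidates = [t for t in topics if threshold is None or t[1] > threshold]
    candidates.sort(key=lambda t: t[1], reverse=True)

    output: List[Topic] = []
    probes = 0
    for topic in candidates:
        if k is not None and len(output) >= k:
            break  # top-k quota filled; later topics have lower scores
        if clock() >= deadline:
            break  # time constraint T reached; the result may be partial
        probes += 1  # one expensive predicate evaluation
        if expensive_predicate(topic):
            output.append(topic)
    return output, probes


if __name__ == "__main__":
    sample = [("a", 0.9), ("b", 0.2), ("c", 0.7), ("d", 0.5)]
    selected, n_probes = tselection_sketch(
        sample, lambda t: t[0] != "c", time_budget_s=1.0, threshold=0.3
    )
    print(selected, n_probes)
```

Because candidates are visited in descending score order, stopping once `k` topics have passed the predicate yields the top-k among satisfying topics without probing the remainder. When the deadline arrives first, the returned list is a partial result, which matches the abstract's remark that the operator may terminate without locating all output tuples.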

Original language: English
Pages (from-to): 792-802
Number of pages: 11
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2736
Publication status: Published - 2003

Fingerprint

Resource Discovery
Metadata
Predicate
Evaluate
Output
Heuristics
Query
Terminate
Threshold Value
Probe
Maximise
Minimise
Resources
Evaluation
Operator

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

@article{b69b5188aba7452199474d294ae28855,
title = "Selecting topics for web resource discovery: Efficiency issues in a database approach",
author = "Abdullah Al-Hamdani and Gultekin Ozsoyoglu",
year = "2003",
language = "English",
volume = "2736",
pages = "792--802",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY - JOUR
T1 - Selecting topics for web resource discovery
T2 - Efficiency issues in a database approach
AU - Al-Hamdani, Abdullah
AU - Ozsoyoglu, Gultekin
PY - 2003
Y1 - 2003
UR - http://www.scopus.com/inward/record.url?scp=35248889991&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35248889991&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:35248889991
VL - 2736
SP - 792
EP - 802
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
SN - 0302-9743
ER -