Abstracting Keywords from Hypertext Documents

This paper presents a process for abstracting keywords from hypertext or plain-text documents. The abstracted keywords, like the keywords listed in a paper, identify the contents of a document. The proposed process can be used, for example, to identify the contents of HTML documents returned by a search engine, so that users can quickly find the information they need. Like related work, the process considers how frequently a word occurs in a document, but it also considers how frequently the word's synonyms occur. In addition, it considers key phrases consisting of two or three words. To make the word frequency counts more accurate, a stemming algorithm is used to remove suffixes. Our tests show that the stemming algorithm consumed on average 56.7% of the total computation time, and that the proposed process abstracted on average 52% of the keywords provided by the authors of the tested documents.
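The steps named in the abstract (suffix stripping, folding synonyms into one count, and counting phrases of two or three words) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `stem` function is a deliberately crude stand-in for a real stemming algorithm, and `STOP_WORDS`, `SYNONYMS`, `count_terms`, `extract_keywords`, and the `min_phrase_count` threshold are all hypothetical names and choices introduced here for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list and synonym table; the abstract specifies neither.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for", "on"}
SYNONYMS = {"automobile": "car", "vehicle": "car"}  # synonym -> canonical word


def stem(word):
    """Crude suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Compare synonyms on stems so inflected forms map to the same entry.
SYNONYM_STEMS = {stem(k): stem(v) for k, v in SYNONYMS.items()}


def normalize(text):
    """Strip HTML tags, lowercase, stem, and fold synonyms together.

    Stop words become None so that phrases never span them.
    """
    text = re.sub(r"<[^>]+>", " ", text)
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            terms.append(None)
        else:
            stemmed = stem(token)
            terms.append(SYNONYM_STEMS.get(stemmed, stemmed))
    return terms


def count_terms(text):
    """Count single words plus two- and three-word phrases."""
    terms = normalize(text)
    counts = Counter(t for t in terms if t is not None)
    for n in (2, 3):
        for i in range(len(terms) - n + 1):
            window = terms[i : i + n]
            if None not in window:
                counts[" ".join(window)] += 1
    return counts


def extract_keywords(text, top_n=5, min_phrase_count=2):
    """Return the top_n candidates; phrases must repeat to qualify."""
    candidates = [
        (term, count)
        for term, count in count_terms(text).items()
        if " " not in term or count >= min_phrase_count
    ]
    # Sort by descending count, then alphabetically, for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]


sample = "<p>Search engines index cars. A search engine ranks an automobile.</p>"
keywords = extract_keywords(sample, top_n=4)
```

In this sample, "cars" and "automobile" contribute to the same count, and "search engines" and "search engine" are counted as one two-word phrase. This sketch ignores sentence boundaries when forming phrases, which a fuller implementation would likely handle.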
