Approaches to Searching Text

We have been working for some years on approaches to "personalizing the web". The first work in this area was PAINT, a forerunner of today's bookmarks, built for Mosaic. The work then evolved into searching text for distinguishing "words", first using the statistical approach found in WORC, then using the simple semantic relationships found in the SRG work. Read below for more detail.

PAINT:

  • A Tool for Personalizing the Web, 2nd World Wide Web Conference, pp. 49-59, Chicago, 1994.
PAINT (Personalized, Adaptable Internet Navigation Tool) was an attempt to increase the effectiveness of the Mosaic browser of the time by letting the user cache useful web pages, organized in a hierarchical file system. Functionality like that of this interface has since shown up in many browsers.

WORC:

  • Finding Salient Features for Personal Web Page Categories, Computer Networks and ISDN Systems, Vol. 29, pp. 1147-1156, 1997 (also in the 6th WWW Conference)
WORC (Web Organization by Robot Categorization) was an attempt to take a user's personal web pages (as bookmarked by that user), analyze them, and find similar pages (text of any kind, but web pages in particular). Early attempts focused on categorizing pages based on words, but subsequent research showed that categorizing words based on the pages they appear in was very effective at identifying useful word groups that pick out similar pages.
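The idea of categorizing words by pages, rather than pages by words, can be illustrated with a minimal sketch. This is not the WORC implementation: the function names, the Jaccard overlap measure, the single-link grouping, and the `threshold` and `min_pages` parameters are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
import re


def word_page_sets(pages):
    """Map each word to the set of page indices it appears in."""
    occurrences = defaultdict(set)
    for idx, text in enumerate(pages):
        for word in set(re.findall(r"[a-z]+", text.lower())):
            occurrences[word].add(idx)
    return occurrences


def jaccard(a, b):
    """Overlap between two non-empty sets of page indices."""
    return len(a & b) / len(a | b)


def group_words(pages, threshold=0.8, min_pages=2):
    """Group words whose page sets overlap by at least `threshold`.

    Assumed for illustration: words seen on fewer than `min_pages`
    pages are dropped, and groups are formed by single-link merging.
    """
    occ = {w: p for w, p in word_page_sets(pages).items()
           if len(p) >= min_pages}
    parent = {w: w for w in occ}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(sorted(occ), 2):
        if jaccard(occ[a], occ[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for w in occ:
        groups[find(w)].add(w)
    return sorted(sorted(g) for g in groups.values())
```

Given a handful of bookmarked pages on two topics, `group_words` returns one word group per set of co-occurring words; each group could then serve as a signature for finding similar pages.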

SRGs:

  • Automated Concept Extraction from Plain Text, AAAI Workshop on Learning for Text Categorization, Madison, July 1998.
SRGs (Semantic Relationship Graphs) extend the work of identifying significant groups of words in a group of text documents by identifying semantic relationships between the words and building a graph based on those relationships. These SRGs disambiguate word meaning and introduce "bridging" concepts that semantically connect words in the graph.
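A toy sketch of the graph-building idea follows. It is not the SRG algorithm itself: the `RELATED` table is a hypothetical, hand-written stand-in for a real source of semantic relationships, and the rule that a concept is kept only when at least two document words share it is an assumption made to show how a bridging concept can also settle an ambiguous word.

```python
from collections import defaultdict

# Hypothetical relation table (word -> directly related concepts),
# hand-written for this example only.
RELATED = {
    "guitar": {"instrument"},
    "piano": {"instrument"},
    "bass": {"instrument", "fish"},
    "trout": {"fish"},
}


def build_graph(words, related=RELATED):
    """Link document words through concepts that bridge two or more of them.

    Returns an undirected graph as a dict of adjacency sets. A concept
    shared by only one document word is left out, so an ambiguous word
    keeps only the senses that other words in the text support.
    """
    concept_to_words = defaultdict(set)
    for word in set(words):
        for concept in related.get(word, ()):
            concept_to_words[concept].add(word)

    graph = defaultdict(set)
    for concept, members in concept_to_words.items():
        if len(members) >= 2:  # the concept bridges several words
            for word in members:
                graph[word].add(concept)
                graph[concept].add(word)
    return dict(graph)
```

For the words "guitar", "piano", and "bass", the only concept added is "instrument", so "bass" is connected through its musical sense and the "fish" sense never enters the graph.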

try to find me via finger:
finger punch
email address:
punch@cse.msu.edu
snail mail address:
3115 Engineering Building
Dept. of Computer Science
East Lansing, MI 48823
phone: 517-353-3541
fax: 517-432-1061
office: 3147 Engineering Bldg.