Tuesday, April 13, 2010

Seeing is retrieving: building information context from what the user sees

Comments
not yet...

Summary
In "Seeing is retrieving: building information context from what the user sees," the researchers present a new system called "SeeTrieve" that helps classify documents based on the context in which they are used, and not just the actual contents of the file. Instead of storing the file contents, SeeTrieve stores the text that appears on screen around the document, that is, the text in the application's interface (such as the options used) and other data shown within the application. It stores these text snippets and maps them to the documents that were in use while the snippets were on the screen. Because SeeTrieve only captures text the user actually viewed, it is more accurate for content recollection than a system that indexes large amounts of HTML data the user never saw or many pages of a PDF that were never viewed. The picture below shows the way that text snippets are linked to files using a term index:


SeeTrieve records user actions as a stream of events, each with a timestamp. Whenever a file is opened and later closed in an application, all of the text snippets captured while that file was open are associated with the file. It acquires the text snippets primarily through the accessibility functionality that most applications provide. Whenever any window changes visibility, a text snippet is taken of the window and inserted into the trace of events; another snapshot is taken every 3 seconds to catch cases where the text has changed but the visibility of the window has not.
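
To make the association step a bit more concrete, here is a minimal Python sketch of how it might work. The `Event` tuple, the event names ("open", "close", "snippet"), the file "notes.txt", and the `associate_snippets` function are all my own assumptions for illustration and are not taken from the paper; the real system pulls its snippets from accessibility APIs, which this sketch does not attempt to do.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds since the trace started
    kind: str         # "open", "close", or "snippet" (hypothetical names)
    payload: str      # file path for open/close, captured text for snippet


def associate_snippets(events):
    """Map each file to the snippets captured while that file was open."""
    open_files = set()
    snippets_by_file = defaultdict(list)
    # Process the trace in timestamp order, as a stream would arrive.
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "open":
            open_files.add(event.payload)
        elif event.kind == "close":
            open_files.discard(event.payload)
        elif event.kind == "snippet":
            # Every file open at this moment gets the on-screen text.
            for path in open_files:
                snippets_by_file[path].append(event.payload)
    return dict(snippets_by_file)


# Made-up trace: two files whose lifetimes overlap.
trace = [
    Event(0.0, "open", "notes.txt"),
    Event(1.0, "snippet", "Chaos: Making a New Science"),
    Event(3.0, "open", "log1.jpg"),
    Event(4.0, "snippet", "James Gleick"),
    Event(6.0, "close", "notes.txt"),
    Event(7.0, "snippet", "butterfly effect"),
    Event(9.0, "close", "log1.jpg"),
]
result = associate_snippets(trace)
```

In this made-up trace, the snippet at 4.0 seconds is attached to both files because both were open at the time, while the one at 7.0 seconds is attached only to "log1.jpg".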

Evaluation of this retrieval system showed that it was much more successful at finding content than a traditional content-based search engine such as Google Desktop. One example was a search for the name "James Gleick": SeeTrieve found the file "log1.jpg", because the text "James Gleick" was shown on the screen while that image was being viewed, but Google Desktop did not find the file, because the name was not anywhere in its content. This showed that context-based searching can be much more effective than searching just by content.
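
The "James Gleick" example can be imitated with a toy term index in Python. This is only a sketch under my own assumptions: the function names (`tokenize`, `build_term_index`, `search`), the file "essay.txt", the snippet text, and the choice to require every query term to match are all hypothetical, and neither SeeTrieve nor Google Desktop necessarily works this way.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_term_index(texts_by_file):
    """Build a term -> set of files index from the texts tied to each file."""
    index = defaultdict(set)
    for path, texts in texts_by_file.items():
        for text in texts:
            for term in tokenize(text):
                index[term].add(path)
    return index


def search(index, query):
    """Return the files associated with every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Made-up data: the image has no text content of its own to index,
# but a snippet was on screen while it was being viewed.
content = {
    "log1.jpg": [],
    "essay.txt": ["notes on chaos theory"],
}
context = {
    "log1.jpg": ["Photo of James Gleick at a reading"],
    "essay.txt": ["notes on chaos theory"],
}

content_hits = search(build_term_index(content), "James Gleick")
context_hits = search(build_term_index(context), "James Gleick")
```

With this made-up data, the content-only index returns nothing for "James Gleick", while the context index returns "log1.jpg".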

Discussion
I thought this system sounded amazing, and I would not be surprised if we saw it included on most systems in the future. I have always thought that the one drawback of traditional search systems, whether on a local computer or on the internet, is that they can only search file names, descriptions, or the text inside (if it's a text file), and so can lose semantic information about the way a file is used or stored. While Google Images does this in a way (because it also searches the text on the webpage where an image is found, and not just the name and description of the image), it is not as thorough or useful as this system.

2 comments:

  1. That is really cool.
    We're making searching smarter every day, it seems.

  2. It's good to see that people are still working on making searching better. I can't tell you how many times I've tried to find information on something and received really awful webpages as some of the first hits. I read an article about taking pictures and actually searching with them. Like the picture itself would be the query you enter into a search bar. I think this would be interesting to try.
