Automated Translation White Papers

Statistical Machine Learning for Information Retrieval

Overview

This paper introduces and experimentally validates a framework, based on statistical machine learning, for handling a broad range of problems in information retrieval (IR). Probably the most important single component of this framework is a parametric statistical model of word relatedness. A longstanding problem in IR has been to develop a mathematically principled model for document processing which acknowledges that one sequence of words may be closely related to another even if the two have few (or no) words in common. The fact that a document contains the word "automobile," for example, suggests that it may be relevant to the queries "Where can one find information on motor vehicles?" and "Tell me about car transmissions," even though the word "automobile" itself appears nowhere in these queries. Likewise, a document containing the words "plumbing," "caulk," "paint," and "gutters" might best be summarized as "common house repairs," even if none of the three words in this candidate summary ever appears in the document.

Further White Paper Details
Publisher: Carnegie Mellon University
File Format: PDF (requires Acrobat Reader 5)
Date Published: April 2001
Downloads: 75
Format: White Papers