Automated Translation White Papers
Statistical Machine Learning for Information Retrieval
Overview
The purpose of this paper is to introduce and experimentally validate a framework, based on statistical machine learning, for handling a broad range of problems in information retrieval (IR). Probably the most important single component of this framework is a parametric statistical model of word relatedness. A longstanding problem in IR has been to develop a mathematically principled model for document processing which acknowledges that one sequence of words may be closely related to another even if the two have few (or no) words in common. The fact that a document contains the word "automobile", for example, suggests that it may be relevant to the queries "Where can one find information on motor vehicles?" and "Tell me about car transmissions", even though the word "automobile" itself appears nowhere in these queries. Similarly, a document containing the words "plumbing", "caulk", "paint", and "gutters" might best be summarized as "common house repairs", even if none of the three words in this candidate summary ever appears in the document.
| Field | Value |
|---|---|
| Publisher | Carnegie Mellon University |
| Date Published | April 2001 |
| Format | White Papers |
| File Format | PDF (requires Acrobat Reader 5) |
| Downloads | 75 |



