Lev Ratinov, Dan Roth and Vivek Srikumar (Authors listed alphabetically)
University of Illinois Technical Report 2008.

Abstract

The most fundamental problem in information retrieval is that of interpreting the information needs of users, typically expressed in a short query. Using the surface-level representation of the query is especially unsatisfactory when the information needs are topic-specific, such as "US politics" or "Space Science", which seem to require understanding what the query means rather than its surface form. We suggest that a newly proposed semantic representation of words (GabrilovichMa2007) can be used to support Conceptual Search; namely, it allows retrieving documents on a given topic even when existing keyword-based search approaches fail. The method we develop allows us to categorize and retrieve documents topically on the fly, without looking at the data collection ahead of time, without knowing a priori the topics of interest, and without training topic categorization classifiers. We compare our approach experimentally to state-of-the-art IR techniques and to machine-learning-based text categorization techniques and demonstrate a significant improvement in performance. Moreover, as we show, our method is intrinsically adaptable to new text collections and domains.
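The idea of matching texts through a shared concept space, rather than shared keywords, can be illustrated with a small sketch. This is not the report's implementation: the word-to-concept table `CONCEPTS` and its weights are invented for illustration, and the helper names (`concept_vector`, `cosine`, `categorize`, `search`) are hypothetical. A text is mapped to the sum of its words' concept vectors, and documents are compared to a topic label by cosine similarity, so no classifier is trained and no topic list is fixed in advance.

```python
import math
from collections import defaultdict

# Hypothetical word -> {concept: weight} table. A real semantic representation
# would be derived from a large concept inventory; these weights are made up.
CONCEPTS = {
    "senate":    {"US_Congress": 0.9, "Politics": 0.6},
    "election":  {"Politics": 0.8, "Voting": 0.7},
    "president": {"Politics": 0.7, "US_Government": 0.8},
    "rocket":    {"Spaceflight": 0.9, "Engineering": 0.4},
    "orbit":     {"Spaceflight": 0.7, "Astronomy": 0.6},
    "galaxy":    {"Astronomy": 0.9},
    "us":        {"US_Government": 0.5},
    "politics":  {"Politics": 1.0},
    "space":     {"Spaceflight": 0.6, "Astronomy": 0.6},
    "science":   {"Science": 1.0, "Astronomy": 0.2},
}

def concept_vector(text):
    """Bag-of-concepts: sum the concept vectors of the words we know."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in CONCEPTS.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse concept vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def categorize(doc, topics):
    """Pick the topic label whose concept vector is closest to the document's."""
    dv = concept_vector(doc)
    return max(topics, key=lambda t: cosine(dv, concept_vector(t)))

def search(topic, docs):
    """Rank documents by conceptual similarity to a topic label."""
    tv = concept_vector(topic)
    return sorted(docs, key=lambda d: cosine(concept_vector(d), tv), reverse=True)

if __name__ == "__main__":
    topics = ["US politics", "Space science"]
    print(categorize("The senate delayed the election", topics))  # → US politics
    print(categorize("The rocket reached orbit", topics))         # → Space science
```

Note that "The senate delayed the election" shares no word with "US politics", so a pure keyword match would score it zero; in this toy setup the two only meet through the shared `Politics` concept.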

Bib Entry

@techreport{RRS2008,
  author = {L. Ratinov and D. Roth and V. Srikumar},
  title = {Conceptual Search and Text Categorization},
  number = {UIUCDCS-R-2008-2932},
  institution = {University of Illinois},
  year = {2008}
}