University of Illinois Technical Report, UIUCDCS-R-2008-2932, 2008.
Abstract
The most fundamental problem in information retrieval is that of interpreting the information needs of users, which are typically expressed as a short query. Using the surface-level representation of the query is especially unsatisfactory when the information needs are topic-specific, such as "US politics" or "Space Science", which seem to require understanding what the query means rather than what it is. We suggest that a recently proposed semantic representation of words (Gabrilovich and Markovitch, 2007) can be used to support Conceptual Search. Namely, it allows retrieving documents on a given topic even when existing keyword-based search approaches fail. The method we develop allows us to categorize and retrieve documents topically on the fly, without looking at the data collection ahead of time, without knowing a priori the topics of interest, and without training topic categorization classifiers. We compare our approach experimentally to state-of-the-art IR techniques and to machine-learning-based text categorization techniques and demonstrate significant improvement in performance. Moreover, as we show, our method is intrinsically adaptable to new text collections and domains.
Bib Entry
@techreport{ratinov2008conceptual,
  author = {Ratinov, Lev and Roth, Dan and Srikumar, Vivek},
  title = {{Conceptual Search and Text Categorization}},
  institution = {University of Illinois},
  number = {UIUCDCS-R-2008-2932},
  year = {2008}
}