Main.PredictingQueryPerformance r1.2 - 03 Oct 2007 - 15:59 - MichaelBendersky

Predicting Query Performance

Date | Place | Author | Keyword
2002 | SIGIR | Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft | query clarity

Summary

This paper develops an original method for predicting query performance. Query performance is predicted by calculating a clarity score, which is the relative entropy (KL divergence) between the query language model and the background (collection) language model. It is suggested that low clarity scores indicate query ambiguity and are correlated with poor query performance.
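
Written out, the score has the form below. This is a sketch in notation chosen for this summary: V stands for the vocabulary, P(w|Q) for the query language model and P(w|C) for the collection language model. The base-2 logarithm (a score in bits) is an assumption here.

```latex
\mathrm{clarity}(Q) \;=\; \mathrm{KL}(Q \,\|\, C)
  \;=\; \sum_{w \in V} P(w \mid Q)\,\log_2 \frac{P(w \mid Q)}{P(w \mid C)}
```

The sum is zero when the query model equals the collection model and grows as the query model concentrates its probability on a few terms that are rare in the collection.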

Background

This paper leverages two language modeling ideas for query clarity score computation:

  • (query) relevance models
  • Kullback-Leibler divergence

The paper is based on the following two hypotheses:

Hypothesis 1: Highly coherent and "clear" queries (queries about a single topic) produce relevance models characterized by unusually large probabilities for a small number of topical terms. On the other hand, ambiguous queries produce relevance models that are much "smoother", and hence closer to the background (collection) language model.

Hypothesis 2: There is a strong correlation between query clarity score and query performance.

Contribution

Computing the query clarity score

  • The clarity score is computed as KL(Q||C), where Q is the query language model and C is the background (collection) language model.
  • The score is the divergence itself: the further Q is from C, the higher the query clarity score.
  • Q is built from the relevance model induced by the query.
  • The RM-1 method is used to build the relevance model, i.e., it is assumed that all terms are sampled from the same model.
  • Note that if Q is smoothed, KL(Q||C) involves a summation over all terms in the vocabulary.
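
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy collection, the linear smoothing weight `lam`, the uniform document prior, and the use of the whole toy collection as the retrieved set are all choices made for this example.

```python
import math
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def clarity_score(query, docs, lam=0.6):
    """KL(Q||C) in bits for a tokenized query over a list of tokenized documents.

    Assumptions of this sketch: every document counts as retrieved, the
    document prior is uniform, and lam is an illustrative smoothing weight.
    """
    collection = unigram_model([w for d in docs for w in d])
    doc_models = [unigram_model(d) for d in docs]

    def p_w_d(w, doc_model):
        # Linearly smoothed document model: mix with the collection model.
        return lam * doc_model.get(w, 0.0) + (1 - lam) * collection[w]

    # Ignore query terms that never occur in the collection.
    query = [q for q in query if q in collection]

    # RM-1 style weighting: P(D|Q) is proportional to prod_q P(q|D).
    likelihoods = [math.prod(p_w_d(q, dm) for q in query) for dm in doc_models]
    norm = sum(likelihoods)
    posteriors = [lik / norm for lik in likelihoods]

    # Because the document models are smoothed, the sum runs over the
    # whole vocabulary, not only over terms seen in top documents.
    score = 0.0
    for w, p_coll in collection.items():
        p_q = sum(p_w_d(w, dm) * pd for dm, pd in zip(doc_models, posteriors))
        score += p_q * math.log2(p_q / p_coll)
    return score

# Hypothetical toy collection: "the" is spread evenly, "engine" is topical.
TOY_DOCS = [
    ["the", "apple", "apple", "fruit"],
    ["the", "engine", "engine", "car"],
    ["the", "river", "river", "water"],
]
```

On this toy data, `clarity_score(["the"], TOY_DOCS)` is essentially zero because the resulting query model coincides with the collection model, while `clarity_score(["engine"], TOY_DOCS)` is positive because the query model concentrates on one document's terms.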

Query clarity score applications

Correlation with MAP scores on TREC

  • The authors use the Spearman rank correlation test to determine the correlation between clarity scores and MAP performance on TREC data.
  • On a scale of [-1, 1] (-1: opposite rankings, 1: identical rankings), the correlation ranges from 0.368 to 0.577 across various TREC corpora, which shows a significant correlation between query MAP performance and clarity score.
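
For readers unfamiliar with the test, here is a minimal pure-Python sketch of Spearman rank correlation: rank both lists (ties share their average rank) and take the Pearson correlation of the ranks. The function names and the example numbers are made up for illustration and are not taken from the paper.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-query values, for illustration only.
clarity = [0.4, 1.9, 1.1, 2.7, 0.8]
avg_precision = [0.05, 0.30, 0.35, 0.60, 0.10]
rho = spearman(clarity, avg_precision)
```

Because only the rank order matters, the measure is insensitive to the scale of the clarity scores, which makes it a natural fit for comparing a predictor against a retrieval metric.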

Automatic Query Classification

  • In this task, a query is considered either "good" or "bad" based on its clarity score. A "good" query is one that should yield coherent results, so its clarity score will be high. A "bad" query is an ambiguous one, which should yield a low clarity score.
  • A simple thresholding rule is proposed to detect "bad" queries: a query is deemed clear enough if an estimated 80% or more of single-term queries would have a lower clarity score.
  • The performance of this rule compares favorably to the optimal situation, where all relevance judgments in the collection are known.
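
The thresholding rule can be sketched as follows. This summary does not say how the fraction of lower-scoring single-term queries is estimated, so the sketch assumes a plain empirical fraction over a sample of single-term clarity scores; the function names and the `cutoff` parameter are hypothetical.

```python
def fraction_below(score, single_term_scores):
    """Share of the sampled single-term queries scoring strictly lower."""
    lower = sum(1 for s in single_term_scores if s < score)
    return lower / len(single_term_scores)

def is_clear(score, single_term_scores, cutoff=0.8):
    """True if at least `cutoff` of single-term queries have lower clarity."""
    return fraction_below(score, single_term_scores) >= cutoff
```

A query flagged as not clear by `is_clear` would be treated as a "bad" query in the sense used above.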

Related work

  • "Query performance prediction in Web Search Environments" by Zhou and Croft presents additional techniques for performance prediction that outperform the original query-clarity method on web corpora.
  • "What makes a query difficult?" by D. Carmel et al. presents an approach to performance prediction that relies on corpus information as well as query information, i.e., how "hard" the corpus is for the retrieval task in general.
