A Test Collection of Preference Judgments

We have assembled a collection of preference judgments over documents judged for the Topic Distillation task of the TREC 2003 Web track. Assessors were shown two documents and asked which they preferred for a given query. These judgments are meant to serve as a starting point for research into questions of evaluation and learning over non-binary, multi-item judgments.

Data Overview

A Test Collection of Preference Judgments by Ben Carterette and Paul N. Bennett (pdf).

When using this data, please cite the above paper as:

B. Carterette and P.N. Bennett. A Test Collection of Preference Judgments. In SIGIR 2008 Workshops: Beyond Binary Relevance: Preferences, Diversity, and Set-Level Judgments. Edited by P. Bennett, B. Carterette, O. Chapelle, and T. Joachims. URL: http://ciir.cs.umass.edu/~carteret/bbr-overview.pdf.

Usage

For SIGIR 2008, we request that authors making use of these preference judgments submit to the Beyond Binary Relevance workshop, while those focused solely on LETOR, without use of the preference data or preference-related issues, submit to the Learning to Rank workshop.

Download

If you'd like to be notified of data updates, please let us know!

Evaluation script
This Perl script calculates the evaluation measures described in the overview paper. It expects preference judgments in the format we describe, and ranked results in either the 6-column TREC format or a 3-column format (query, docID, score). Usage is: perl prefeval.pl -p -r [-trec]
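The two ranked-result layouts mentioned above can be illustrated with a minimal Python sketch. It is not part of the distributed script. It assumes the 6-column layout is the usual TREC run format (query ID, iteration, docID, rank, score, run tag) and that columns are whitespace-separated; the function names and the sample query and document IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_run_line(line: str) -> Tuple[str, str, float]:
    """Return (query, doc_id, score) from one 3- or 6-column result line."""
    fields = line.split()
    if len(fields) == 6:
        # Assumed TREC run layout: qid, iteration ("Q0"), docID, rank, score, run tag.
        query, _iteration, doc_id, _rank, score, _tag = fields
    elif len(fields) == 3:
        # 3-column layout named in the text: query, docID, score.
        query, doc_id, score = fields
    else:
        raise ValueError(f"expected 3 or 6 columns, got {len(fields)}: {line!r}")
    return query, doc_id, float(score)


def load_run(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group results by query and sort each query's documents by descending score."""
    run: Dict[str, List[Tuple[str, float]]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        query, doc_id, score = parse_run_line(line)
        run.setdefault(query, []).append((doc_id, score))
    for docs in run.values():
        docs.sort(key=lambda pair: pair[1], reverse=True)
    return run


if __name__ == "__main__":
    # Hypothetical sample lines mixing both layouts.
    sample = [
        "TD1 Q0 docA 1 2.5 myrun",
        "TD1 docB 0.75",
    ]
    for query, docs in load_run(sample).items():
        for doc_id, score in docs:
            # Emit the 3-column form: query, docID, score.
            print(query, doc_id, score)
```

Normalizing both layouts to (query, docID, score) keeps the rank and run-tag columns out of downstream code; whether prefeval.pl orders results by the score column or by the rank column is not stated here, so sorting by score is an assumption of this sketch.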

Version 1 (released 12 May 2008)