A Test Collection of Preference Judgments by Ben Carterette and Paul N. Bennett (pdf).
When using this data, please cite the above paper as:
B. Carterette and P.N. Bennett. A Test Collection of Preference Judgments. In SIGIR 2008 Workshops: Beyond Binary Relevance: Preferences, Diversity, and Set-Level Judgments. Edited by P. Bennett, B. Carterette, O. Chapelle, and T. Joachims. URL: http://ciir.cs.umass.edu/~carteret/bbr-overview.pdf.
For SIGIR 2008, we request that authors making use of these preference judgments submit to the Beyond Binary Relevance workshop, while those focused solely on LETOR, without use of the preference data or preference-related issues, submit to the Learning to Rank workshop.
If you'd like to be notified of data updates, please let us know!
Evaluation script
This Perl script calculates the evaluation measures described in the overview paper. It expects preference judgments in the format we describe, and ranked results in either 6-column TREC format or 3-column format (query, docID, score). Usage is:
perl prefeval.pl -p-r [-trec]
Version 1 (released 12 May 2008)
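To make the two ranked-results layouts mentioned above concrete, here is a minimal Python sketch of a reader for both. It is not part of prefeval.pl and does not reproduce how that script parses its input. The function name parse_ranked_results, the whitespace delimiter, and the choice to order documents by descending score are all assumptions made for illustration. The 6-column layout is taken to be the standard TREC run format (query, Q0, docID, rank, score, run tag).

```python
from collections import defaultdict


def parse_ranked_results(lines, trec=False):
    """Group (doc_id, score) pairs by query, ordered by descending score.

    Hypothetical helper, not part of prefeval.pl. Assumes
    whitespace-separated fields:
      3-column: query docID score
      6-column: query Q0 docID rank score run_tag  (standard TREC run file)
    """
    runs = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        expected = 6 if trec else 3
        if len(fields) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} fields, got {len(fields)}"
            )
        if trec:
            # The literal "Q0", the rank, and the run tag are not needed here.
            query, _q0, doc_id, _rank, score, _tag = fields
        else:
            query, doc_id, score = fields
        runs[query].append((doc_id, float(score)))
    # Assumption: a higher score means a better rank.
    return {
        query: sorted(docs, key=lambda pair: pair[1], reverse=True)
        for query, docs in runs.items()
    }


if __name__ == "__main__":
    three_col = ["q1 docA 0.5", "q1 docB 0.9", "q2 docC 1.0"]
    for query, docs in parse_ranked_results(three_col).items():
        print(query, docs)
```

Passing trec=True plays the same role as the -trec switch in the usage line: it selects the 6-column layout instead of the 3-column one.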