Experimental Data and Annotations
Some of my published work relied on annotated data that was not readily available from traditional sources such as TREC. When possible, I will publish this data here to promote the reproducibility of this research.
Syntactic Annotation of Search Queries
- M. Bendersky, W. B. Croft, D. A. Smith: "Structural Annotation of Search Queries Using Pseudo-Relevance Feedback" In Proceedings of CIKM 2010 [pdf]
- M. Bendersky, W. B. Croft, D. A. Smith: "Joint Annotation of Search Queries" In Proceedings of ACL-HLT 2011 [pdf]
- In these two papers, we annotated 250 search queries from a search log with capitalization, POS tags, and segmentation. The annotations can be found in this tar.gz file.
Finding Text Reuse on the Web
- M. Bendersky, W. B. Croft: "Finding Text Reuse on the Web" In Proceedings of WSDM 2009 [pdf]
- In this paper, we used a set of 50 "text-reuse" queries. These queries were essentially sentence-long excerpts from news articles.
- This text file provides a list of these queries. Each query is associated with a source date, to enable reproducing the source date detection results discussed in the paper.
Discovering Key Concepts in Verbose Queries
- M. Bendersky, W. B. Croft: "Discovering Key Concepts in Verbose Queries" In Proceedings of SIGIR 2008 [pdf]
- In this paper, we annotated 500 TREC "description" queries with a "key concept". The key concept is defined as a single noun phrase that best represents the information need underlying the query. These annotations were used to train a concept weighting method.
- This tar.gz file contains the annotated key concepts, as well as the structured Indri queries containing the concept weights learned by our method. See README.txt in the file for details.
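For readers who want to inspect one of the tar.gz archives above before unpacking it, the following is a minimal Python sketch using only the standard-library tarfile module. The function name read_readme and the archive path passed to it are hypothetical. The sketch assumes nothing about the layout of the archive beyond the presence of a README.txt, which the description above mentions.

```python
import tarfile


def read_readme(archive_path, readme_name="README.txt"):
    """Return (member_names, readme_text) for a .tar.gz archive.

    readme_text is None if no regular file in the archive has a base
    name equal to readme_name. Both names are illustrative assumptions.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        readme_text = None
        for member in archive.getmembers():
            # Compare only the base name, since the README may sit
            # inside a top-level directory within the archive.
            if member.isfile() and member.name.split("/")[-1] == readme_name:
                with archive.extractfile(member) as handle:
                    readme_text = handle.read().decode("utf-8", errors="replace")
                break
    return names, readme_text
```

Calling read_readme with the local path of a downloaded archive returns the list of files it contains together with the README text, without extracting anything to disk.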