Background

PsgRobust is an answer passage collection built upon the Robust04 collection without manual annotation. We built this collection for our research on iterative relevance feedback for the following reasons. Relevance feedback experiments require collections in which each query has multiple relevant answer passages, but few existing collections meet this requirement: most popular question-answering datasets consist of queries that have only one relevant answer. For detailed information, please see our paper:

Keping Bi, Qingyao Ai, and W. Bruce Croft. "Iterative Relevance Feedback for Answer Passage Retrieval with Passage-level Semantic Match." In Proceedings of the 41st European Conference on Information Retrieval, 14 pages, Springer, 2019.

This collection was built based on two assumptions:
1. Passages ranked at the top positions by a strong ranker are relevant if they come from relevant documents.
2. All passages in non-relevant documents are non-relevant.

First, the top 100 documents were retrieved for each title query in Robust04 with the Sequential Dependency Model (SDM) [1]. Then a sliding window of 2 or 3 sentences was used to split these documents into non-overlapping passages; whether a passage contains 2 or 3 sentences was decided randomly. Each passage was assigned an ID formed by joining the DOCNO of its document and the passage number within the document with '_', i.e., DOCNO_passageNO. After that, the top 100 passages were retrieved with SDM for the same title queries, and passages from relevant documents were treated as relevant. The recall of the top 100 documents is 0.43, which means that, on average, 43% of the relevant documents for each query were included in the passage collection. Overall, there are 246 queries with relevant passages in PsgRobust. Query 672 does not have relevant documents in the Robust04 collection.
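The document-splitting step above can be sketched as follows. This is a minimal illustration, not the released tooling: the function name, the random seeding, and starting the passage numbering at 0 are our own assumptions.

```python
import random

def split_into_passages(docno, sentences, seed=0):
    """Split a document (given as a list of sentences) into
    non-overlapping passages of 2 or 3 sentences, with the window
    size chosen at random, and assign each passage an ID of the
    form DOCNO_passageNO.

    Note: seeding and 0-based passage numbering are assumptions made
    for this sketch; the original build may differ.
    """
    rng = random.Random(seed)
    passages = []
    i = 0
    passage_no = 0
    while i < len(sentences):
        size = rng.choice([2, 3])  # 2 or 3 sentences, decided randomly
        text = " ".join(sentences[i:i + size])
        passages.append(("%s_%d" % (docno, passage_no), text))
        i += size
        passage_no += 1
    return passages
```

For example, a five-sentence document yields two or three passages whose IDs share the document's DOCNO prefix and whose concatenation reproduces the original sentence sequence.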
Queries 309, 314, and 412 do not have relevant passages in PsgRobust.

Dataset

There are 22,403 unique documents and 383,036 passages in total, including 6,589 relevant passages for the 246 queries, drawn from 3,544 documents.

The compressed folder contains:

Queries:
  cv_query/            The training and test queries of the 5 folds used for cross-validation in our experiments.
  PsgRobust.descs.tsv  Questions (description queries) with relevant answer passages in the PsgRobust collection.
  robust04.descs.tsv   The description queries in Robust04.
  robust04.titles.tsv  The title queries in Robust04.

Labels:
  PsgRobust.qrels      Labels for relevant passages in PsgRobust.
  robust04.qrels       Labels for relevant documents in Robust04.

Passages:
  doc.trectext         All candidate passages in PsgRobust in TREC format. These passages were generated by the sliding window of 2 or 3 sentences over the top 100 retrieved documents.

readme.txt             The file you are currently reading.

Acknowledgements

This work was supported in part by the Center for Intelligent Information Retrieval and in part by NSF IIS-1715095. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.

References

[1] Donald Metzler and W. Bruce Croft. 2005. A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 472–479.
[2] Keping Bi, Qingyao Ai, and W. Bruce Croft. 2019. Iterative Relevance Feedback for Answer Passage Retrieval with Passage-level Semantic Match. In Proceedings of the 41st European Conference on Information Retrieval. Springer, 14 pages.
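Assuming the qrels files use the standard four-column TREC layout (query ID, iteration, document or passage ID, relevance label), a minimal loader might look like the sketch below; the function name is ours, and the column layout should be verified against the released files.

```python
def load_qrels(path):
    """Parse a TREC-style qrels file into {qid: {docid: rel}}.

    Assumes whitespace-separated lines of the form:
        qid  iter  docid  rel
    (standard TREC qrels; verify against the released files).
    """
    qrels = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            qid, _, docid, rel = fields
            qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels
```

The same loader would apply to both PsgRobust.qrels (passage IDs of the form DOCNO_passageNO) and robust04.qrels (document DOCNOs), since only the ID column differs.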