Background

The UMass Amherst Center for Intelligent Information Retrieval created two high-quality evaluation datasets for non-factoid questions, each with multiple associated answer types. An example of such a question and its answer types is the following:

Question: How do I get rid of mice humanely?
Potential Answer Types: Use Traps, Natural Predators.

Two levels of data have been provided, which can be used for evaluation:
(1) NFPassageQA_Sim : Passage Similarity
(2) NFPassageQA_Div : Passage Diversity

NFPassageQA_Sim dataset

This dataset consists of similarity annotations between pairs of relevant answers for a question. To generate this set, 128 questions with at least 10 relevant answers were sampled from the test set of the ANTIQUE dataset [2]. The pairs of relevant answers and the corresponding question were then shown to annotators on the Mechanical Turk platform, who selected one of four similarity labels:

4 : Highly Similar, where both passages answer the question and the answers contain the same information, even if they may be worded differently.
3 : Moderately Similar, where both passages answer the question and belong to the same answer type, but may also contain non-relevant information or information from other answer types.
2 : Dissimilar, where both passages answer the question, but the answers belong to different answer types.
1 : At least one of the passages does not answer the question.

For more details and examples of the annotation process, please refer to the paper [1].

Two files are included, which are tab separated and in the format given below (a minimal loading sketch follows the list):
(a) Query file (queries_sim.txt), Format :
(b) Label file (labels_sim.qrel), Format :
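As a rough illustration only, the sketch below loads the two files. It assumes, hypothetically, that queries_sim.txt holds a query ID followed by the question text, and that labels_sim.qrel follows a qrel-style layout whose first column is the query ID and whose last column is the similarity label (1-4); adjust the column indices to match the actual format lines above.

```python
import csv
from collections import defaultdict

def load_queries(path="queries_sim.txt"):
    """Load tab-separated queries; assumed layout: query_id <TAB> question text."""
    queries = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:
                queries[row[0]] = row[1]
    return queries

def load_labels(path="labels_sim.qrel"):
    """Load tab-separated similarity judgments; assumed layout: first column is the
    query ID, last column is the similarity label (1-4), middle columns identify
    the passage pair."""
    labels = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:
                qid, label = row[0], int(row[-1])
                labels[qid].append((tuple(row[1:-1]), label))
    return labels

if __name__ == "__main__":
    queries = load_queries()
    labels = load_labels()
    print(f"{len(queries)} queries, "
          f"{sum(len(v) for v in labels.values())} labeled passage pairs")
```

This is only a starting point for inspection; once the actual format of the two files is confirmed, the column indices and types in the two loaders should be updated accordingly.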