Understanding the Relevance of Text Passages

Principal Investigator:
W. Bruce Croft

Center for Intelligent Information Retrieval (CIIR)
College of Information and Computer Sciences
140 Governors Drive
University of Massachusetts
Amherst, MA 01003-9264

Project Goals

Some information retrieval (IR) queries are best answered with a web page, and others with a single fact or named entity. Many other queries, however, would best be answered with a text passage, and we propose to develop new techniques for this task that significantly improve on the current state of the art. Effective passage retrieval would have a major effect on search tools by greatly extending the range of queries that can be answered directly with text passages retrieved from the web. This is particularly important for mobile search applications, where output bandwidth is limited by a small screen or speech output; in this setting, the ability to use passages to reduce the amount of output while maintaining high relevance will be critical.

We are studying research issues that have been either ignored or only partially addressed in prior research: showing whether passages can be better answers than documents for some queries, predicting which queries have good answers at the passage level, ranking passages to retrieve the best answers, and evaluating the effectiveness of passages as answers. To address these issues, we are developing new retrieval models that can define and rank “answers” at different text granularities, such as sentences and passages; models of the query properties associated with good passage-level answers; and models that differentiate between topicality and information content. Understanding the relevance of text passages will also involve obtaining new types of relevance assessments at passage granularity and developing new evaluation metrics that combine relevance with the size of the result output.

Significant Results (2016 Report):

The main result of our recent research is significant progress in the development of neural network models for the non-factoid question answering task. We expect to publish a paper on this work in the next period.

In addition, a new dataset for question answering research was made freely available. The Web Answer Passages (WebAP) dataset is based on the Gov2 collection from the 2004 TREC Terabyte Track and contains 8,027 answer passages for 82 TREC queries. Each answer passage is annotated with one of four quality measures.

Students Involved in the Project:

Liu Yang

Publications:


Park, J., and Croft, W.B. “Using Key Concepts in a Translation Model for Retrieval”, in the Proceedings of the 38th Annual ACM SIGIR Conference (SIGIR 2015), Santiago, Chile, 2015, pp. 927-930.

Chen, R-C., Spina, D., Croft, W.B., and Sanderson, M., "Harnessing Semantics for Answer Sentence Retrieval," Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR '15), pp. 21-27, 2015.

Yang, L., Guo, Q., Song, Y., Meng, S., Shokouhi, M., McDonald, K., and Croft, W.B., "Modeling User Interests for Zero-query Ranking," in the Proceedings of the 38th European Conference on Information Retrieval (ECIR 2016), Padova, Italy, March 20-23, 2016, pp. 171-184.

Yang, L., Ai, Q., Spina, D., Chen, R., Pang, L., Croft, W.B., Guo, J., and Scholer, F., "Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval," in the Proceedings of the 38th European Conference on Information Retrieval (ECIR 2016), Padova, Italy, March 20-23, 2016, pp. 115-128.

Ai, Q., Yang, L., Guo, J., and Croft, W.B., "Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval," to appear in the Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Italy.

This work is supported in part by the Center for Intelligent Information Retrieval (CIIR) and in part by the National Science Foundation (NSF IIS-1419693).