Transforming Long Queries
Principal Investigator:
W. Bruce Croft, PI
croft@cs.umass.edu
Center for Intelligent Information Retrieval (CIIR)
Department of Computer Science
140 Governors Drive
University of Massachusetts
Amherst, MA 01003-9264
Project Summary
Long queries currently represent a small but significant percentage of the queries submitted to web search engines. In other applications, such as collaborative question answering, where people ask questions for other people to answer, long queries are typical rather than unusual. Many information needs can be expressed more easily using longer, sentence-length queries, but the inadequacies of current search engines force people to think up the right combination of keywords to find relevant documents. This can be very difficult and often leads to search failures. Yet when people do submit long queries, current search engines handle them poorly. This is due at least in part to these queries being part of the “long tail”: they are infrequent and lack many of the statistical features that are used for effective ranking of short queries. Being able to handle long queries effectively would represent a significant advance in the capability of search engines from the user’s point of view, and should substantially improve our understanding of the underlying information retrieval process. In this project, we are studying long queries from web query logs and other sources, such as TREC collections, in order to develop new retrieval models and techniques for effective ranking. In particular, we focus on techniques for transforming long queries into equivalent queries that are more likely to perform well.
Query transformation steps such as stemming and expansion have been studied for many years, and segmentation has become an important part of processing web queries. In this project, we are working on two major changes: developing an integrated model of query transformation that includes all of these steps as part of retrieval, and focusing on long queries for which there is little click data. These changes will enable us to incorporate additional information that can be derived from a long query, such as relationships, and will be a significant development in the state of the art of retrieval models.
Research in this area will have a direct impact on the ability of web search engines to provide effective answers for more complex questions. Given that search is one of the two most common activities on the web and people often have trouble finding good answers to many questions, this research could have a very broad impact, both in the home and the office.
View details on the project's recent activities and findings.
Publications:
IR-739: (2009) Bendersky, M., Metzler, D. and Croft, W. B., "Learning Concept Importance Using a Weighted Dependence Model," in the Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), pp. 31-40.
IR-751: (2010) Bendersky, M., Croft, W. B. and Smith, D., "Structural Annotation of Search Queries Using Pseudo-Relevance Feedback," CIIR Technical Report.
IR-755: (2010) Balasubramanian, N., Bendersky, M. and Allan, J., "Cost-Effective Combination of Multiple Rankers: Learning When Not To Query," NESCAI 2010, Amherst, MA, April 15-17, 2010.
IR-760: (2010) Dang, V., Bendersky, M. and Croft, W. B., "Learning to Rank Query Reformulations," in the Proceedings of the 33rd Annual ACM SIGIR Conference (SIGIR 2010), Geneva, Switzerland, July 19-23, 2010, pp. 807-808.
IR-764: (2010) Croft, W. B. and Bendersky, M., "Do Longer Queries Retrieve More Diverse Results?," CIIR Technical Report.
IR-783: (2010) Bendersky, M., Croft, W. B. and Diao, Y., "Quality-Biased Ranking of Web Documents," Proceedings of the Fourth International Conference on Web Search and Data Mining (WSDM 2011), pp. 95-104.
IR-799: (2010) Bendersky, M., Fisher, D. and Croft, W. B., "UMass at TREC 2010 Web Track: Term Dependence, Spam Filtering and Quality Bias," Proceedings of Text REtrieval Conference (TREC 2010), Gaithersburg, MD, November 15-19, 2010.
IR-805: (2011) Bendersky, M., Metzler, D. and Croft, W. B., "Parameterized Concept Weighting in Verbose Queries," in the Proceedings of the 34th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR'11), pp. 605-614.
IR-806: (2011) Park, J., Croft, W. B. and Smith, D., "Quasi-Synchronous Dependence Model for Information Retrieval," in the Proceedings of The ACM Conference on Information and Knowledge Management (CIKM 2011), pp. 17-26.
IR-810: (2011) Lee, C. and Croft, W. B., "Effective Query Generation from Web Page Content," submitted to WSDM 2012, Seattle, WA, February 8-12, 2012.
IR-813: (2011) Xue, X. and Croft, W. B., "Modeling Reformulation Using Query Distributions," CIIR Technical Report.
IR-824: (2011) Bendersky, M., Croft, W. B. and Smith, D., "Joint Annotation of Search Queries," in the Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), pp. 102-111.
IR-827: (2011) Xue, X. and Croft, W. B., "Modeling Subset Distributions for Verbose Queries," Proceedings of the 34th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR'11), pp. 1133-1134.
IR-846: (2011) Bendersky, M., Metzler, D. and Croft, W. B., "Effective Query Formulation with Multiple Information Sources," submitted to Fifth ACM International Conference on Web Search and Data Mining, Seattle, WA, February 8-12, 2012.
IR-847: (2011) Dang, V., Xue, X. and Croft, W. B., "Inferring Query Aspects from Reformulations Using Clustering," Proceedings of The ACM Conference on Information and Knowledge Management (CIKM 2011), pp. 2117-2120.
IR-854: (2011) Kim, Y., Seo, J., Croft, W. B. and Smith, D., "Improving Academic Searches using Concept Query Generation," submitted to the 34th European Conference on Information Retrieval (ECIR 12), Barcelona, Spain, 1-5 April 2012.
IR-860: (2011) Lee, C. and Croft, W. B., "Evaluating Search in Personal Social Media Collections," submitted to the Fifth ACM International Conference on Web Search and Data Mining (WSDM 2012), Seattle, WA, February 8-12, 2012.
NSF Project Abstract
This work is supported in part by the Center for Intelligent Information Retrieval (CIIR) and in part by the National Science Foundation (NSF IIS-0914442).