SIGIR 2011: Query Representation and Understanding


Understanding the user's intent or information need that underlies a query has long been recognized as a crucial part of effective information retrieval. Despite this, retrieval models, in general, have not focused on explicitly representing intent, and query processing has been limited to simple transformations such as stemming or spelling correction. With the recent availability of large amounts of data about user behavior and queries in web search logs, there has been an upsurge in interest in new approaches to query understanding and representing intent.

This is the second workshop on query representation and understanding at SIGIR. The first workshop was held at SIGIR 2010. These workshops have the goal of bringing together the different strands of research on query understanding, increasing the dialogue between researchers working in this relatively new area, and developing some common themes and directions, including definitions of tasks, evaluation methodology, and reusable data collections.

This year, the workshop will include invited talks, a poster session, and panel discussions.

Invited Speakers


Workshop proceedings are now available online.

Dataset

Microsoft Research released the first QRU dataset that can be used for research on query representation & understanding, query transformation & reformulation, and relevance ranking.

Workshop Program

8:30-8:50 Poster Setup

8:50-10:00 Invited Talk by Nick Craswell [slides]

  • Title: Query Understanding for Relevance Measurement
  • Abstract: Understanding the user needs underlying a query can be very difficult, even for a human relevance judge. When evaluating our algorithms, particularly those with a sophisticated query model, it may be wise to use real queries and a notion of relevance that is aligned with real user needs. I will present two lines of work in this area. One is the TREC Web Track, where we attempt to incorporate real Web tasks, real queries and a diverse set of user intents for each query. Click-based clustering or crowdsourcing has been used to identify possible intents. The other line of work is click-based experimentation using result interleaving. Compared to TREC methods, interleaving can detect more subtle and personalized preferences. It is sensitive enough to get significant results from tens of users who install a browser toolbar. These approaches are compared in terms of statistical power, ease/availability of use, and fidelity to real user preferences.

10:00-10:30 Coffee Break

10:30-11:00 Accepted Talks I

  • Ricardo Campos, Alipio Jorge, Gael Dias: "Using Web Snippets and Query-logs to Measure Implicit Temporal Intents in Queries" [paper] [slides]
  • Rishiraj Saha Roy, Niloy Ganguly, Monojit Choudhury, Naveen Singh: "Complex Network Analysis Reveals Kernel-Periphery Structure in Web Search Queries" [paper] [slides]
  • Lidong Bing, Wai Lam: "Investigation of Web Query Refinement via Topic Analysis and Learning with Personalization" [paper] [slides]

11:00-12:10 Invited Talk by Ricardo Baeza-Yates [slides]

  • Title: Multi-faceted Query Intent Prediction
  • Abstract: In this presentation we report results for automatic classification of queries across a wide set of facets that are useful for identifying query intent. Our hypothesis is that the performance of single-faceted classification of queries can be improved by introducing information from multi-faceted training samples into the learning process. We test our hypothesis by performing supervised and unsupervised multi-faceted classification of queries based on the combination of correlated facets. Our experimental results show that this idea can significantly improve the quality of the classification. Since most previous work in query intent classification is based on single facets, these results are a first step toward an integrated query intent classification model. This is joint work with Liliana Calderon and Cristina Gonzalez.

12:10-13:45 Lunch Break

13:45-14:05 Accepted Talks II

  • Debora Donato, Pinar Donmez, Sunil Noronha: "Toward a deeper understanding of user intent and query expressiveness" [paper] [slides]
  • Lu Bai, Jiafeng Guo, Xueqi Cheng, Xiubo Geng, Pan Du: "Exploring the Query-Flow Graph with a Mixture Model for Query Recommendation" [paper] [slides]

14:05-15:15 Invited Talk by Maarten de Rijke [slides]

  • Title: Using Linked Open Data for Understanding Queries (and Other Short Text Segments)
  • Abstract: One way of capturing what queries are about is to map them to concepts in the Linked Open Data cloud. In the talk, I will compare various methods for addressing this task, using a mixture of information retrieval and machine learning techniques. Features used include query features, concept features as well as search-history features. A simple lexical match performs poorly, and so does retrieval by itself, but when retrieval is complemented with a learning-to-re-rank approach, we obtain significant improvements. Time permitting, I will describe ongoing work on an extension of these ideas for capturing the about-ness of tweets. On top of the features used for understanding queries, we explore the use of tweet-specific features, based on hash tags, re-tweets, user mentions, etc. The talk is based on joint work with Edgar Meij, Marc Bron, Laura Hollink, Bouke Huurnink and Wouter Weerkamp.

15:15-16:00 Coffee Break and Poster Discussion

16:00-17:30 Panel Discussion

Call for Papers

We solicit short papers that would be presented as posters during the workshop. Relevant topics include, but are not limited to:

  • Release of new datasets or creative ways to collect data that can be beneficial for research on query representation and understanding
  • Evaluation methodologies for various query processing tasks
  • Models and algorithms for query understanding and representing user intent
  • Empirical studies on user behavior and different types of queries
  • Applications or user scenarios that involve query representation and understanding
  • New retrieval models or systems incorporating query representation and understanding

Submitted papers should be in the ACM Conference style (for LaTeX, use the "Option 2" style) and not exceed 4 pages in 9 point font. Papers must be submitted electronically in PDF via the submission page. Submissions must not substantially duplicate work that any of the authors have published elsewhere or have submitted in parallel to any other conferences or journals. All submissions must be in English and will be reviewed by at least three members of the program committee. At least one author of each accepted paper will be expected to attend and prepare a poster as well as a short presentation at the workshop.

Important Dates

Deadlines for workshop poster submissions are:

  • Submissions Due: June 17, 2011
  • Acceptance Notification: July 06, 2011
  • Camera-ready Submission: July 09, 2011
  • Workshop: July 28, 2011


Program Committee

  • Alex Kotov (University of Illinois at Urbana-Champaign)
  • Craig MacDonald (University of Glasgow)
  • Daxin Jiang (Microsoft Research)
  • Donald Metzler (ISI, University of Southern California)
  • Fernando Diaz (Yahoo! Research)
  • Fuchun Peng (Microsoft Bing)
  • Jiafeng Guo (ICT, Chinese Academy of Sciences)
  • Jian-Yun Nie (University of Montreal)
  • Krisztian Balog (Norwegian University of Science and Technology)
  • Le Zhao (Carnegie Mellon University)
  • Matthias Hagen (Bauhaus University Weimar)
  • Min Zhang (Tsinghua University)
  • Oren Kurland (Technion, Haifa)
  • Patrick Pantel (Microsoft Research)
  • Pu-Jen Cheng (National Taiwan University)
  • Rodrygo Santos (University of Glasgow)
  • Vanja Josifovski (Yahoo! Research)
  • Wessel Kraaij (Radboud University Nijmegen)
  • Xiaobing Xue (University of Massachusetts)
  • Yi Chang (Yahoo! Labs)
  • Yumao Lu (Microsoft Bing)

