SIGIR 2010: Query Representation and Understanding

The Second Workshop on Query Representation and Understanding has been announced!


Understanding the user's intent or information need that underlies a query has long been recognized as a crucial part of effective information retrieval. Despite this, retrieval models, in general, have not focused on explicitly representing intent, and query processing has been limited to simple transformations such as stemming or spelling correction. With the recent availability of large amounts of data about user behavior and queries in web search logs, there has been an upsurge in interest in new approaches to query understanding and representing intent.

This workshop aims to bring together the different strands of research on query understanding, to increase the dialogue between researchers working in this relatively new area, and to develop some common themes and directions, including definitions of tasks and evaluation methodology. We hope the workshop will bring together researchers from IR, ML, NLP, and other areas of computer and information science who are working on or interested in this area, and provide a forum for them to identify the issues and challenges, to share their latest research results, to express a diverse range of opinions about this topic, and to discuss future directions.

The workshop program includes three main sessions: invited talks, a poster session, and a panel discussion. Ten short invited talks by academic and industrial researchers will give participants a sense of the different aspects of query understanding and of the current state-of-the-art results in this research area. In the poster session, eight accepted papers will be presented in the form of an "elevator pitch" plus a printed poster. The panel discussion will focus on issues related to query representation and understanding research, including a rigorous definition of the task, modeling for the task, challenges and opportunities, implications for IR, and future research directions.

Proceedings & Workshop Report

The full workshop proceedings are now available online.

The workshop report is now available from SIGIR Forum.

Invited Speakers

  • Eugene Agichtein, Emory University
  • Fernando Diaz, Yahoo! Research
  • Rosie Jones, Akamai
  • Donald Metzler, University of Southern California
  • Jian-Yun Nie, University of Montreal
  • Patrick Pantel, Microsoft Research
  • Fuchun Peng, Microsoft Bing
  • Cheng Xiang Zhai, University of Illinois at Urbana-Champaign
  • Michael Bendersky, University of Massachusetts Amherst
  • Gu Xu, Microsoft Research Asia

Accepted Papers

  • Xiaofei Zhu, Jiafeng Guo, Xueqi Cheng (ICT, CAS, Beijing, P.R. China): "Recommending Diverse and Relevant Queries with A Manifold Ranking Based Approach"
  • Wei-Yen Day, Pu-Jen Cheng (National Taiwan University): "Visualizing Image Query Senses by Social Tags"
  • Xiaobing Xue, W. Bruce Croft (UMass Amherst): "Representing Queries as Distributions"
  • Grzegorz Chrupala, Georgiana Dinu, Benjamin Roth (Saarland University): "Enriched syntax-based meaning representation for answer extraction"
  • Maarten Van der Heijden, Max Hinne, Suzan Verberne, Eduard Hoenkamp, Theo van der Weide, Wessel Kraaij (Radboud University Nijmegen): "When is a query a question? Reconstructing wh-requests from ad hoc-queries"
  • Liliana Calderon-Benavides (UPF), Cristina Gonzalez-Caro (UPF), Ricardo Baeza-Yates (Yahoo! Research): "Towards a Deeper Understanding of the User's Query Intent"
  • Sumio Fujita (Yahoo! Japan Corporation), Tatsuya Uchiyama (Yahoo! Japan Corporation), Georges Dupret (Yahoo! Labs), Ricardo Baeza-Yates (Yahoo! Research): "Search Facet Creation from Click Logs"
  • Kevyn Collins-Thompson (Microsoft Research), Joshua Dillon (Georgia Institute of Technology): "Controlling the search for expanded query representations by constrained optimization in latent variable space"

Workshop Program

9:15-10:15 Invited Talks (Part I)

  • Fuchun Peng, Microsoft Bing: "Concepts Identification from Queries and Its Application for Search Relevance"

    Abstract: Search engines often treat a query as a bag of words, but when people formulate queries, they use "concepts" as building blocks. Can we automatically segment a query to recover these concepts? How can we use the identified concepts to improve search relevance? In this talk, I will present some techniques for query concept identification and show how they can be used to improve search relevance through query rewriting and machine-learned ranking.

  • Rosie Jones, Akamai: "Searching for Myself"

    Abstract: A single user query represents a user's information need. A short sequence of queries may represent a reformulation sequence as the user attempts to match the expression of that information need to the available documents. Longer sequences may reveal patterns in a user's interests, as well as clues about who the user is and how he or she is feeling. In this talk, I will show examples of evidence that a user's personal qualities and emotional state can emerge from increasingly long sequences of queries and click behavior. I will also present some open problems where a greater understanding of user intent can create opportunities in online computational advertising.

  • Cheng Xiang Zhai, University of Illinois at Urbana-Champaign: "Putting Query Representation and Understanding in Context: A Decision-Theoretic Framework for Optimal Interactive Retrieval through Dynamic User Modeling"

    Abstract: A query is inherently associated with rich context information, including, for example, the user who posed the current query, the other queries entered by this user in the current retrieval session, and any documents viewed or skipped by the user in both the current session and past sessions. All these context variables provide important clues about the user's intent behind a query and should therefore be exploited to understand query intent as accurately as possible. In this talk, I will present a general Bayesian decision-theoretic framework for incorporating all kinds of context information to model a user's information need dynamically as the user interacts with a retrieval system, and for exploiting this dynamic user modeling to optimize results in an interactive retrieval system. I will also discuss how the framework has been used in the UCAIR project to naturally support statistical language models for query representation and to achieve personalized search without requiring any user effort.

10:15-10:45 Coffee Break

10:45-12:00 Invited Talks (Part II)

  • Jian-Yun Nie, Lixin Shi, University of Montreal: "Integrating Term Dependencies according to Their Utility"

    Abstract: Terms in a query can be strongly dependent. A number of previous studies have shown the benefits of taking the dependencies between query terms into account. Typically, a dependency model is defined and interpolated with the traditional bag-of-words model. Such an approach can take advantage of both the more robust bag-of-words model and the more precise dependency model. However, we notice that most previous approaches assign a fixed weight to each model in the interpolation, regardless of the dependencies in the given query. In reality, the strength of a dependency and its utility for IR vary from one pair of words to another and from one query to another. A uniform interpolation cannot correctly account for the variable strength and utility of term dependencies in queries. In this talk, we will describe a new dependency model in which the dependency between a pair of words is taken into account in document retrieval according to its strength and utility: the more useful a dependency is, the more importance is assigned to it. An SVM is used to learn the expected utility of a dependency from a set of features. We tested this model on several TREC and NTCIR collections, in both English and Chinese. Our results show that the new model outperforms existing ones on almost all the collections. This demonstrates the need to integrate term dependencies in a variable manner, according to their utility for IR.

  • Fernando Diaz, Yahoo! Research: "Intent Triage: Quantifying the Severity of Poor Performance on Intent Classes"

    Abstract: Information retrieval systems, especially portal web search engines, use a variety of query analysis techniques to detect user intent. While new intent classes are introduced every year, there has been little work comparing the relative importance of performance across intent classes. In this talk, I will discuss the relative importance of intent classes in order to motivate severity-based system design.

  • Eugene Agichtein, Emory University: "Inferring User Intent from Interactions with the Search Results"

    Abstract: Search engines' understanding of user queries has advanced greatly over the last decade. Yet expressing a searcher's information needs still relies primarily on guessing the "right" search keywords, often requiring multiple rounds of trial and error from the searcher. At the same time, searcher interaction data is becoming increasingly available, on both the server side and the client side. Extracting meaningful signals from this data would enable a search engine to accurately infer user intent for tasks such as real-time result reranking, dynamic result presentation, and contextualized query suggestion. This talk gives an overview of our recent progress on modeling and exploiting client-side searcher interaction data for intent inference.

12:00-14:00 Lunch Break

14:00-15:20 Invited Talks (Part III)

  • Patrick Pantel, Microsoft Research: "Entity Extraction for Query Interpretation"

    Abstract: Entity lists are vital for semantically analyzing web queries. In this talk, we propose a general information extraction framework that achieves large gains in entity extraction by combining state-of-the-art distributional and pattern-based extractors with a large set of features from a 600-million-document web crawl, one year of query logs, and a snapshot of Wikipedia. We explore the hypothesis that although distributional and pattern-based algorithms are complementary, they do not exhaust the semantic space; other sources of evidence can be leveraged to combine them better. A detailed analysis of feature correlations and interactions shows that query log and web crawl features yield the highest gains, but easily accessible Wikipedia features also improve over current state-of-the-art systems. We also study the impact of editor-chosen seeds on extraction performance. We show that, in general, few seeds are needed to saturate a distributional model, and that expansion performance is very sensitive to seed composition, resulting in tremendous variance. Studying the latter further, we show that untrained editors are terrible at choosing the right seeds, and we propose an algorithm to help editors choose better seeds.

  • Donald Metzler, University of Southern California: "Specialized Query Understanding"

    Abstract: Most of the query understanding and representation research done today is either very general ("one size fits all") or highly specialized. The "one size fits all" approaches are general but tend to fail for certain classes of queries. At the other end of the spectrum are the specialized approaches that often require a great deal of domain knowledge and search expertise. In this talk, I will discuss the pros and cons of these two competing approaches and propose a challenge to develop a robust, fully automatic approach to specialized query understanding.

  • Michael Bendersky, University of Massachusetts Amherst: "Representing Queries as Structures"

    Abstract: Traditionally, queries in information retrieval applications are represented as bags of words, and query terms are assumed to be independent. In this talk, we formulate a retrieval framework that represents queries as structures. We demonstrate that such a formulation allows us to relax the independence assumption made in previous work and to create richer and more realistic query representations. Finally, we show how the structural query representation can serve as a basis for both existing and novel retrieval models.

  • Gu Xu, Microsoft Research Asia: "Enrich Query Representation by Query Understanding"

    Abstract: Bags of words have been regarded as the basis of information retrieval for years. However, words are often too simple to convey clear semantic meanings, and they are an important cause of mismatch problems. In this talk, we introduce a different way of looking at the problem of query representation. Query understanding can be conducted at different levels or granularities of semantics, i.e., the word level, sense level, topic level, and structure level. The outputs of query understanding can be attached to queries as enriched representations and help answer queries of varying difficulty. We will also discuss some published work that can be considered small steps in this direction.

15:20-15:40 Short Poster Presentations

15:40-16:10 Poster Session & Coffee Break

16:10-17:30 Panel Discussion

Call For Papers

We solicit short position and research papers to be presented as posters during the workshop. Relevant topics include, but are not limited to:

  • Models and algorithms for query understanding and representing user intent
  • Empirical studies on user behavior and different types of queries
  • Applications or user scenarios using query understanding and modeling
  • New retrieval models or systems incorporating query representation and query understanding
  • Evaluation methodologies for various query processing tasks

We solicit research papers, position papers, and papers describing research in progress, to be presented as posters. Submitted papers should be in the ACM Conference style (for LaTeX, use the "Option 2" style) and should not exceed 4 pages in 9-point font. Papers must be submitted electronically in PDF format via the submission page. Submissions should not substantially duplicate work that any of the authors have published elsewhere or have submitted in parallel to any other conference or journal. All submissions must be in English and will be reviewed by at least three members of the program committee. At least one author of each accepted paper will be expected to attend the workshop and to prepare a poster as well as a short presentation.

Important Dates

The deadlines for workshop poster submissions are as follows (note that the submission deadline has been extended):

  • Submissions Due: June 10, 2010
  • Acceptance Notification: June 30, 2010
  • Camera-ready Submission: July 10, 2010
  • Workshop: July 23, 2010


Program Committee

  • Claudia Hauff, University of Twente
  • Dou Shen, Microsoft
  • Evgeniy Gabrilovich, Yahoo! Research
  • Hema Raghavan, Yahoo! Labs
  • Iadh Ounis, University of Glasgow
  • Jian-Yun Nie, University of Montreal
  • Kaushik Chakrabarti, Microsoft Research
  • Kevyn Collins-Thompson, Microsoft Research
  • Matt Lease, University of Texas at Austin
  • Nan Sun, National University of Singapore
  • Oren Kurland, Technion, Israel Institute of Technology
  • Pu-Jen Cheng, National Taiwan University
  • Ruihua Song, Microsoft Research Asia
  • Steven M. Beitzel, Illinois Institute of Technology
  • Yuanhua Lv, University of Illinois at Urbana-Champaign
  • Yumao Lu, Yahoo! Labs

