Main.IRseminar08-0404, r1.16 - 05 Apr 2008 - 23:18 - MichaelBendersky


IR seminar, April 4, 2008 (this week at 9:30am in Room 142)

Topic: Evaluation, with XingYi and XiaobingXue leading

Background:

This week’s session will discuss the challenges of reliably evaluating different IR systems when relevance judgments are incomplete, biased, or noisy. The goal is to understand previous work on this topic, along with some state-of-the-art techniques and results; we may then broadly discuss any interesting ideas related to the topic or suggest plausible solutions. Previous research falls roughly into three groups: 1) analyzing the problems caused by incomplete relevance judgments and their impact on existing IR measures; 2) designing new IR measures that are more reliable and robust when judgments are incomplete or noisy; 3) designing new techniques that judge as few documents as possible without sacrificing effectiveness and reliability.
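To make the first group concrete, here is a minimal sketch of how a single missing judgment can shift a standard measure. It is an illustration only, not taken from the papers for this session: the function name `average_precision`, the `condensed` flag, and the toy ranking are assumptions made up for the example. It computes average precision for one topic, either treating unjudged documents as nonrelevant (the usual convention) or dropping them from the ranking before scoring (a "condensed list").

```python
from typing import Dict, List


def average_precision(ranking: List[str], qrels: Dict[str, bool],
                      condensed: bool = False) -> float:
    """Average precision for one topic.

    qrels maps judged doc ids to True (relevant) or False (nonrelevant).
    Doc ids missing from qrels are unjudged. With condensed=False they
    count as nonrelevant; with condensed=True they are removed from the
    ranking before scoring.
    """
    if condensed:
        ranking = [doc for doc in ranking if doc in qrels]
    num_relevant = sum(1 for rel in qrels.values() if rel)
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, False):
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / num_relevant


# Hypothetical toy data: one ranking, complete vs. incomplete judgments.
ranking = ["d1", "d2", "d3", "d4", "d5"]
complete = {"d1": True, "d2": False, "d3": True, "d4": False, "d5": True}
incomplete = {d: r for d, r in complete.items() if d != "d3"}  # d3 unjudged

ap_complete = average_precision(ranking, complete)
ap_incomplete = average_precision(ranking, incomplete)
ap_condensed = average_precision(ranking, incomplete, condensed=True)
print(ap_complete, ap_incomplete, ap_condensed)
```

In this toy case the score moves as soon as one relevant document loses its judgment, and the two ways of handling the unjudged document give different values. Which treatment is preferable in practice is exactly the kind of question the session is meant to discuss.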

Required papers:

We list four required papers here. Since most people are already familiar with Ben's work, the effective number of required papers is still three.

Recommended:

Tangential but interesting:

Background

Your Questions

Please add your questions and thoughts about this topic directly to the wiki (or email them to us) by Friday 6am at the latest. When you save your edits, please select "Release edit lock" so that other people can edit the page immediately.

  • In Büttcher et al., when an SVM classifier is applied to predict relevance, it is interesting to note that precision is quite high (~75%) while recall is low (~35%). As the classifier is based on textual similarity, this seems to support the "cluster hypothesis" to some extent. This suggests the following idea. First, we could use this text-based classifier to rank the documents in the collection by their probability of belonging to the "relevant" class. Then, we could use this ranking to select more documents to judge (according to the results, a high percentage of these documents will be relevant). We could iterate this process, potentially accumulating a fuller sample of the "relevant" class after each step, which could improve recall as more and more relevant documents are discovered. -- Michael

  • The "Minimal Test Collections" and "Reliable Information Retrieval Evaluation" papers each present a method for assigning relevance judgments to unjudged documents. However, this makes the resulting qrels unique, which makes experiments difficult to reproduce. How could these evaluation techniques be integrated into the current IR research framework (e.g., should we have multiple versions of qrels floating around for each corpus)? -- HenryFeild

  • I wonder how much domain knowledge can be put to use in this task. Is it easier to determine relevance/nonrelevance when you have a much narrower scope of topicality, or do you start splitting hairs? I would like to see the effects of using some of the automatic judging techniques (particularly the ones from "Reliable Information Retrieval Evaluation") when the KL divergence will most likely be dampened due to narrower topicality. I suspect SVMs would perform better here. -- MarcCartright

  • How can this traditional evaluation method be extended to evaluate personalized (or contextual) retrieval, where the notion of relevance may depend on the user (or context)? While some current research uses click-through data on a mixed ranked list from the baseline and the improved method, would TREC-style evaluation be possible? One way seems to be to provide a textual representation of the user (or context) so that each participant can make the best use of it, and to have relevance judgments created by the user (or by someone who understands the context). -- JinyoungKim

  • The papers discuss evaluation issues for document retrieval. What about other tasks, such as passage retrieval? To what extent can judgments for passages be generated from document-level judgments on the documents that contain them? -- Elif
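The iterative judging loop suggested in Michael's question above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method from Büttcher et al.: the scoring function is a toy term-overlap stand-in for an SVM, `assess` is a hypothetical callback standing in for a human assessor, and all names and data are made up for the example.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Docs = Dict[str, List[str]]  # doc id -> list of tokens


def term_profile(doc_ids: Iterable[str], docs: Docs) -> Counter:
    """Sum term counts over a set of documents."""
    profile: Counter = Counter()
    for doc_id in doc_ids:
        profile.update(docs[doc_id])
    return profile


def score(tokens: List[str], rel: Counter, nonrel: Counter) -> float:
    """Toy stand-in for a classifier score (assumption, not an SVM):
    relevant-class term mass minus nonrelevant-class term mass."""
    rel_total = sum(rel.values()) or 1
    nonrel_total = sum(nonrel.values()) or 1
    return sum(rel[t] / rel_total - nonrel[t] / nonrel_total for t in tokens)


def iterative_judging(docs: Docs, seed: Dict[str, bool],
                      assess: Callable[[str], bool],
                      rounds: int = 3, batch: int = 2) -> Dict[str, bool]:
    """Repeatedly rank unjudged docs by score and judge the top `batch`."""
    judged = dict(seed)
    for _ in range(rounds):
        rel = term_profile((d for d, r in judged.items() if r), docs)
        nonrel = term_profile((d for d, r in judged.items() if not r), docs)
        unjudged = [d for d in docs if d not in judged]
        if not unjudged:
            break
        unjudged.sort(key=lambda d: score(docs[d], rel, nonrel), reverse=True)
        for doc_id in unjudged[:batch]:
            judged[doc_id] = assess(doc_id)  # hypothetical human judgment
    return judged


# Hypothetical toy collection for the topic "jaguar the animal".
docs = {
    "d1": ["jaguar", "cat", "habitat"],
    "d2": ["jaguar", "car", "engine"],
    "d3": ["cat", "habitat", "prey"],
    "d4": ["car", "engine", "dealer"],
    "d5": ["prey", "cat", "jungle"],
    "d6": ["dealer", "price", "car"],
}
truth = {"d1": True, "d2": False, "d3": True,
         "d4": False, "d5": True, "d6": False}
seed = {"d1": True, "d2": False}

result = iterative_judging(docs, seed, lambda d: truth[d], rounds=1, batch=2)
print(sorted(d for d, r in result.items() if r))
```

With the toy data, one round spends both new judgments on documents that turn out to be relevant, which is the effect the suggestion hopes for. Whether a real classifier keeps that precision over many rounds, and how much bias the loop adds to the resulting judgments, remains an open question for discussion.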


Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.