Main.IRseminar08-0328, r1.15 - 28 Mar 2008 - 21:16 - MichaelBendersky


IR seminar, March 28, 2008

Topic : Web & IR - JinyoungKim leading.

Background:

The Web has been a primary driving force of IR research, providing problems and challenges and suggesting clues to known problems. As IR researchers, it is therefore worthwhile to keep an eye on what is happening on the Web and its implications for IR research. In this seminar, we'll look through and discuss the influence of the Web on IR research in general, with special emphasis on two hot trends: 'Web 2.0' and the 'Semantic Web'. While my original scope was Web 2.0, I felt that the two trends are two sides of the same coin and both essential to understanding the future Web.

Readings

3/26 : Notice - the required papers have been changed to more technical ones - JinyoungKim

Required:

I included these papers because they show interesting research problems regarding Web 2.0 and the Semantic Web.

Recommended:

I would expect all participants to be familiar with the relationship between the Web and IR research in general, and with the basic ideas of 'Web 2.0' and the 'Semantic Web'.

Tangential but interesting:

While the above postings can be a starting point for understanding the concepts, the following articles may offer interesting perspectives on these technologies and where they may overlap.

Additional Papers

Your Questions

As someone suggested at the last meeting, you can post your questions here by editing this page. If you prefer, you can e-mail your questions to me and James as before.

  • I wonder whether the whole Semantic Web idea and particularly the use of an inference system could aid in tasks such as text summarization. Could this be used as a tool to better understand text, infer facts, and as a result produce more meaningful summaries of (web) texts? (Elif)

  • At the end of the "Optimizing web search using social annotations" paper, the authors mention the inevitability of annotation spamming and offer two general solutions. My questions are: To what extent could social annotations be spammed? Do you (we) think the solutions suggested by the authors would work? What would annotation spamming reduce to (e.g. traditional web spamming or email spam), or is it its own thing? (Henry)

  • Web 2.0 has one important property: it uses users' collaborative endeavours to perform some task or enhance the system, e.g. Yahoo!'s question answering system, or the many search engines that mine the web query log to help IR. The Semantic Web likewise uses human annotation to help computers do inference. My question is whether the Semantic Web approach could help with a general IR task (it may work for IR in specific domains such as medicine or law). How trustworthy is the users' data in Web 2.0? What is the role of the computer in a Web 2.0 system: is it only a tool to filter spam and provide a platform for users' activity? I heard that some company (maybe MS) is developing a search engine so that people can search together -- is there something new for IR techniques in this kind of system? (Xing)

  • My question is somewhat related to Xing's. Given the Web 2.0 promise of massive tagging of various resources, to what extent can this user-generated data be used as "labeling" for further scientific research? For example, can we assume that a document's tag distribution defines the true probabilities of the clusters to which it may belong, or its true topic distribution (and thus can be used as training and validation data for a clustering algorithm)? Or, even more related to IR, can document tags be used as relevance judgments of sorts and/or as a means to do pseudo-relevance feedback? That is, if we have a document tagged with "statistics poisson", can it be assumed relevant for the query "Find documents describing the poisson distribution"? (Michael)

  • I wonder what the privacy implications are of implementing something as comprehensive as Web 2.0. Currently the Web is a medium where a huge number of people are able to interact, usually without any restrictions on their behavior. Should we be taking a closer look at what it may mean to start implicitly providing information? I mean, in tagging information with metainformation, we are providing information on how we view the world, which may be more prone to manipulation. (Marc)

  • One of the underlying assumptions of the Semantic Web is that there exist semantic ontologies (taxonomies) which, with reasonable accuracy and coverage, represent actionable information, so that software agents can use them to infer, reason and act with minimal user intervention. From the perspective of IR, these semantic annotations can help improve retrieval performance. However, we know from experience with resources like WordNet that we have poor coverage and the problem of sense disambiguation to deal with. (Niranjan)

    • What is the fundamental difference between Semantic Web resources and existing WordNet-like thesauri when it comes to helping retrieval?

  • Furthermore, as an alternative, user tagging of resources, while more susceptible to spamming, is probably a lower-precision but higher-coverage resource that is gaining momentum. It seems like the Semantic Web ought to focus on establishing representation and communication standards for use by software agents that perform specific types of applications, with the potential for improving retrieval performance being more of an afterthought. (Niranjan)

    • The idea of using a restricted vocabulary is enticing from the perspective of retrieval (it reduces variance in representing information). Is there scope for providing suggestions during the annotation process that could allow users to freely use their own vocabulary but also help the retrieval system by tying it to some known restricted vocabulary?

  • Recommending New Stories - The news items are already clustered on the popular news websites to coalesce duplicates. (Niranjan)

    • It is not clear if the authors of this paper handled duplicate news stories in any way. It would make sense to have a cluster level granularity for recommendation of news stories.

    • The discussion of the results on the live traffic was inconclusive. Why would the combination method do worse than the individual methods for recommending stories?
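
To make Michael's question above a bit more concrete, here is a minimal sketch of the two ideas it raises: reading a document's tags as a topic distribution, and treating query/tag overlap as a crude relevance judgment. The function names, the overlap threshold, and the sample tags are all hypothetical and chosen only for illustration; whether such a "judgment" is trustworthy is exactly the open question.

```python
from collections import Counter


def tag_distribution(tags):
    """Normalize raw tag counts for one document into a probability distribution.

    Assumption: each tag occurrence is one user's vote for that tag.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def pseudo_relevant(query, doc_tags, min_overlap=1):
    """Naive judgment: the document counts as relevant if at least
    `min_overlap` query terms also appear among its tags.

    This is a hypothetical heuristic, not a method from the readings.
    """
    query_terms = set(query.lower().split())
    tag_terms = {t.lower() for t in doc_tags}
    return len(query_terms & tag_terms) >= min_overlap


# Hypothetical document tagged by several users.
doc_tags = ["statistics", "poisson", "statistics", "math"]

dist = tag_distribution(doc_tags)
is_rel = pseudo_relevant(
    "Find documents describing the poisson distribution", doc_tags
)
print(dist)
print(is_rel)
```

Note that a single shared term is enough to trigger a "relevant" label here, so the sketch also shows how easily tag spam could pollute such judgments, which ties back to Henry's question.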


Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.