Topic: NLP in IR
People leading: Manjunatha Jagalur
Background:
Most text documents are written in natural language, and user needs are easily expressed as natural language queries, so intuition suggests that IR (Information Retrieval) systems built on NLP (Natural Language Processing) should have an enormous advantage over systems based purely on statistical methods. This intuition has not yet translated into a successful general-purpose search engine. However, numerous studies have shown that concepts from NLP can be incorporated successfully into systems that address particular IR problems, such as QA (Question Answering). In this seminar we shall discuss the utility of NLP in IR, the application of NLP in QA systems, and some NLP-based search engines that are still in development.
Required papers:
- Sharon Flank, A layered approach to NLP-based information retrieval, Proceedings of the 17th international conference on Computational linguistics, pp 10-14, 1998.
- Ellen M Voorhees, Natural Language Processing and Information Retrieval, Information Extraction: Towards Scalable, Adaptable Systems, vol 1714, pp 32-48, 1999.
- Matthew W. Bilotti, Paul Ogilvie, Jamie Callan and Eric Nyberg, Structured retrieval for question answering, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pp 351-358, 2007.
Required Browsing:
Recommended:
- Ronald M. Kaplan et al., Speed and accuracy in shallow and deep stochastic parsing, Proceedings of NAACL, 2004.
- Andrei Broder, Marcus Fontoura, Vanja Josifovski and Lance Riedel, A semantic approach to contextual advertising, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pp 559-566, 2007.
- I. Androutsopoulos, G.D. Ritchie, and P. Thanisch, Natural Language Interfaces to Databases – An Introduction, Natural Language Engineering, vol 1, pp 29-81.
Questions
- Although most past research is quite skeptical about this, is there really no opportunity to improve ad-hoc retrieval using linguistic processing? If the scarcity of evidence in a short query is the problem, how about using query expansion to extract named entities and phrases related to the query? Also, the linguistic structure in longer queries (the TREC description and narrative fields) does not seem to be utilized to its full potential. - JinyoungKim
- The Voorhees paper seemed to imply that anything other than simple NLP is effectively useless for improving IR unless it's perfect or the task is limited (i.e. only QA). Should we even bother worrying about developments in NLP if this is the case? Will we ever find a natural dovetailing of deep NLP and IR? - Marc Cartright
- IR has now begun to consider organizing the information in corpora so that users can easily access or discover new information, e.g. by building links between events or by incident threading. Might NLP be more helpful for these complicated new IR tasks? Could ideas from typical NLP tasks such as coreference resolution be borrowed? - Xing Yi
- My question is one of usability. After browsing several NLP search engines, I found that they are all centered on the QA task. However, it seems that most users are content with short keyword queries and have no desire to write explicit natural language queries. Can we instead use NLP to infer, when possible, the question that underlies a short query? Google/Yahoo/Live already do this to some extent. E.g., typing "time amherst ma" gives an immediate answer: "12:29pm Friday (EDT) - Time in Amherst, Massachusetts". Or "capital germany" gives "Germany — Capital: Berlin". It would be interesting to see how this can be done for more complicated question types. --- Michael
- Some advanced NLP techniques seem useful in IR only under restrictions. I think their utility is greatest when the target text is short and the query is long. It is not easy to discover plausible linguistic structures in very long text. It is equally hard to infer a user's context or intention from short queries. The claim that NLP is useful for Q&A, and the approaches taken by NLP search engines, seem to be driven by these limitations. -- Jangwon
- In terms of generic web retrieval (i.e., not QA), I believe that very low-level NLP could aid in the process. For example, after determining that the user is posing a "what-is" query, more weight could be given to hits from Wikipedia or some other source of "what-is"-like definitions. "How-far-is" questions could trigger Google/Yahoo/Microsoft maps to be the first result. Such NLP parsing is cheap and, I think, pending user studies, could be very useful. -- HenryFeild
- "The more complex the question to answer becomes, the more NLP we need." – What do you think about this? Is there a way to get around this in an IR-like fashion? Example complex QA questions: "When was America discovered and how?" or "Who is the wife of a former president who is a candidate for this year’s elections?" Inference is indispensable for these kinds of queries. Maybe incorporating NLP features in addition to keywords for determining ranking might be a good idea to tackle this. --Elif
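As a discussion aid, the cheap question-type detection suggested in the "what-is" / "how-far-is" question above might be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the regular expressions, the source labels ("encyclopedia", "maps"), the `boost` factor, and the function names `classify_query` and `rerank` are all hypothetical and do not describe how any real search engine works.

```python
import re

# Hypothetical pattern table: (query type, pattern, preferred source).
# Patterns and source labels are illustrative assumptions only.
QUERY_TYPES = [
    ("what-is", re.compile(r"^what\s+(is|are)\b", re.IGNORECASE), "encyclopedia"),
    ("how-far-is", re.compile(r"^how\s+far\b", re.IGNORECASE), "maps"),
]


def classify_query(query):
    """Return (query_type, preferred_source), or (None, None) if nothing matches."""
    text = query.strip()
    for label, pattern, source in QUERY_TYPES:
        if pattern.search(text):
            return label, source
    return None, None


def rerank(query, hits, boost=2.0):
    """Boost hits from the preferred source, then sort by descending score.

    `hits` is a list of (document, score, source) tuples.
    """
    _, preferred = classify_query(query)
    rescored = [
        (doc, score * boost if source == preferred else score, source)
        for doc, score, source in hits
    ]
    return sorted(rescored, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    hits = [
        ("news article", 1.2, "news"),
        ("definition page", 0.9, "encyclopedia"),
    ]
    print(classify_query("What is information retrieval?"))
    print(rerank("What is information retrieval?", hits))
```

A keyword query such as "time amherst ma" matches neither pattern, so the ranking is left unchanged. Handling such queries would require the implicit question inference raised in the usability question above.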