Markov Random Fields for IR (Thesis Topic)
Publications:
- Metzler, D., and Croft, W.B. "Beyond Bags of Words: Modeling Implicit User Preferences in Information Retrieval," in the Proceedings of AAAI'06 (Nectar Track), 2006. [pdf][abstract]
- Metzler, D. and Croft, W.B., "Modeling Query Term Dependencies in Information Retrieval with Markov Random Fields," Proceedings of the North East Student Colloquium on Artificial Intelligence (NESCAI), 2006. [pdf][abstract][note: shortened, slightly revised version of "A Markov Random Field Model for Term Dependencies"]
- Metzler, D., "Direct Maximization of Rank-based Metrics," CIIR Technical Report. [pdf][abstract]
- Metzler, D. and Croft, W.B., "A Markov Random Field Model for Term Dependencies," Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2005), 472-479, 2005. [pdf][abstract][Best Student Paper]
Presentations:
- Beyond Bags of Words: Modeling Implicit User Preferences in Information Retrieval -- Presented at AAAI 2006. [pdf]
- Beyond Bags of Words: A Markov Random Field Model for Information Retrieval -- Presented at UMass Machine Learning and Friends Lunch, April 2006. [pdf]
- Modeling Query Term Dependencies in Information Retrieval with Markov Random Fields -- Presented at NESCAI 2006. [pdf]
- A Markov Random Field Model for Term Dependencies -- Presented at SIGIR 2005. [pdf]
Related Publications:
- Eguchi, K. and Croft, W.B. "Query Structuring with Two-stage Term Dependence in the Japanese Language," CIIR Technical Report, 2006. [pdf]
- Eguchi, K. "NTCIR-5 Query Expansion Experiments using Term Dependence Models," in the Proceedings of the Fifth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, Tokyo, Japan, December 2005, 494-501. [pdf]
Indri
Description: Indri is an effective, scalable search engine with a robust query language. The project is a collaborative effort with Trevor Strohman, Howard Turtle, Bruce Croft and Carnegie Mellon University. It synthesizes and enhances the Lemur and InQuery search tools. My contribution to the project focused on the development of the underlying retrieval model and query language.
For more information go here.
Publications:
- Strohman, T., Metzler, D., Turtle, H., and Croft, W.B., "Indri: A language model-based search engine for complex queries," in the online Proceedings of the International Conference on Intelligence Analysis. [pdf][abstract]
- Metzler, D., Strohman, T., Turtle, H., and Croft, W.B., "Indri at TREC 2004: Terabyte Track," in the Online Proceedings of the 2004 Text REtrieval Conference (TREC 2004). [pdf][abstract]
- Metzler, D. and Croft, W.B., "Combining the Language Model and Inference Network Approaches to Retrieval," Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004. [pdf][abstract]
Presentations:
- An Overview of the Indri Search Engine -- Presented to ChengXiang Zhai's Information Retrieval group at the University of Illinois, Urbana-Champaign. [ppt][pdf]
- Indri at TREC 2004: UMass Terabyte Track Overview -- CIIR lab talk. [ppt][pdf]
- An Overview of Indri -- CIIR lab talk. [ppt][pdf]
- Formal Multinomial and Multiple-Bernoulli Language Models -- CIIR lab talk. [ppt][pdf]
- Incorporating Language Modeling into the Inference Network Retrieval Framework -- CIIR lab talk. [ppt][pdf]
Posters:
- Metzler, D., Lavrenko, V., and Croft, W. B., "Formal Multiple-Bernoulli Models for Language Modeling," Proceedings of ACM SIGIR 2004, 540-541, 2004. [paper pdf][paper abstract][poster pub][poster pdf]
Other information:
- Quick query language overview with examples [html][text][pdf]
- Slightly more detailed query language spec [coming soon]
Download:
- Download Indri.
RECAP
Description: RECAP is a system for exploring, analyzing and visualizing various types of text reuse in a collection of documents. Text reuse appears in many different forms, ranging from summarization to plagiarism. Such reuse is very common in newswire data. A trusted source, such as the Associated Press, will produce a story, and other sources, such as the LA Times or Wall Street Journal, will pick up the article. Sometimes the article is copied exactly, with no modifications. Other times, it is rewritten to adhere to stylistic, political, or other considerations. In still other cases, an article borrows from multiple past sources. Detecting such reuse is useful for a number of reasons, including determining how authoritative a piece of text is, uncovering the primary source of a fact, and detecting plagiarism.
RECAP is a collaborative effort with Yaniv Bernstein, Justin Zobel, and Alistair Moffat.
Publications:
- Metzler, D., Bernstein, Y., Croft, W.B., Moffat, A., and Zobel, J. "Similarity Measures for Tracking Information Flow," Proceedings of the ACM Conference on Information and Knowledge Management (CIKM 2005), 517-524, 2005. [pdf][abstract]
Presentations:
- Similarity Measures for Tracking Information Flow -- Presented at CIKM 2005. [pdf]
Demonstration:
- Metzler, D., Bernstein, Y., Croft, W.B., Moffat, A., and Zobel, J., "The Recap System for Identifying Information Flow," Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2005), 678, 2005. [pdf]
Download:
- Email me if you are interested in obtaining a demo.
Image Retrieval
Description: This work was done in collaboration with R. Manmatha.
Publications:
- Metzler, D., and Manmatha, R., "An Inference Network Approach to Image Retrieval," Proceedings of the International Conference on Image and Video Retrieval (CIVR-2004), 42-50, 2004. [pdf][abstract]
Question Classification
Description: Question classification is the task of determining the expected answer type of a question. For example, for the question "Where was Thomas Edison born?", we expect the answer to be a location. Question classification systems are often used as subcomponents of question answering systems to prune the types of answers returned. Much of the previous work on question classification has relied on hand-constructed rules. Writing such rules is tedious, and the resulting rules do not generalize well across domains. In this work, we focus on statistical machine learning approaches to the problem: rather than hand-crafting rules, we automatically learn a classifier. We employ a support vector machine (SVM) and explore both syntactic and semantic features. We present experimental results on TREC QA track questions, UIUC questions, and MadSci questions, and show that combining syntactic features (unigrams, bigrams) with semantic features (WordNet hypernym expansion of headwords) yields promising results.
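A minimal sketch of the kind of sparse feature vector described above, assuming binary unigram and bigram indicator features plus hypernym features for a headword supplied by the caller. The `question_features` function and the `TOY_HYPERNYMS` table are hypothetical: the toy table only stands in for WordNet hypernym expansion, headword identification is not shown, and the resulting dictionary would still need to be handed to an SVM library of your choice.

```python
import re
from typing import Dict, List, Optional

# Hypothetical stand-in for WordNet: maps a headword to its hypernym chain.
# A real system would query WordNet; this toy table is for illustration only.
TOY_HYPERNYMS: Dict[str, List[str]] = {
    "city": ["municipality", "urban_area", "geographical_area", "location"],
    "inventor": ["creator", "person"],
}


def tokenize(question: str) -> List[str]:
    """Lowercase the question and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", question.lower()) if t]


def question_features(question: str, headword: Optional[str] = None) -> Dict[str, int]:
    """Build binary unigram, bigram, and (optional) headword-hypernym features."""
    tokens = tokenize(question)
    feats: Dict[str, int] = {}
    # Syntactic features: one indicator per unigram.
    for tok in tokens:
        feats["uni=" + tok] = 1
    # Syntactic features: one indicator per adjacent token pair.
    for a, b in zip(tokens, tokens[1:]):
        feats["bi=" + a + "_" + b] = 1
    # Semantic features: expand the headword with its (toy) hypernyms.
    if headword is not None:
        for hyp in TOY_HYPERNYMS.get(headword.lower(), []):
            feats["hyp=" + hyp] = 1
    return feats


f = question_features("Where was Thomas Edison born?")
print(len(f))  # 5 unigrams + 4 bigrams = 9
g = question_features("What city hosted the 1996 Olympics?", headword="city")
print(g["hyp=location"])  # 1
```

Keeping features as named indicators in a dictionary makes it easy to see which syntactic and semantic cues fired for a given question before mapping them to the integer indices an SVM package expects.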
Publications:
- Metzler, D. and Croft, W.B., "Analysis of Statistical Question Classification for Fact-based Questions," to appear in Information Retrieval. [pdf][abstract]
Last updated: Fri Sep 8 20:20:42 EDT 2006