
Markov Random Fields for IR (Thesis Topic)

Publications:

Presentations:

Related Publications:


Indri

Description:

Indri is an effective, scalable search engine with a robust query language. The project is a collaborative effort with Trevor Strohman, Howard Turtle, Bruce Croft and Carnegie Mellon University. It synthesizes and enhances the Lemur and InQuery search tools. My contribution to the project focused on the development of the underlying retrieval model and query language.

For more information go here.

Publications:

Presentations:

Posters:

Other information:

Download


RECAP

Description:

RECAP is a system for exploring, analyzing, and visualizing various types of text reuse in a collection of documents. Text reuse appears in many different forms, ranging from summarization to plagiarism. Such reuse is very common in newswire data. A trusted source, such as the Associated Press, will produce a story, and other sources, such as the LA Times or Wall Street Journal, will pick up the article. Sometimes the article is copied exactly, with no modifications. Other times, it is rewritten to reflect stylistic, political, or other considerations. In still other cases, an article borrows from multiple past sources. Detecting such reuse is of interest for a number of reasons, including determining how authoritative a piece of text is, uncovering the primary source of a fact, and detecting plagiarism.

RECAP is a collaborative effort with Yaniv Bernstein, Justin Zobel, and Alistair Moffat.

Publications:

Presentations:

Demonstration:

Download:


Image Retrieval

Description:

This work was done in collaboration with R. Manmatha.

Publications:

Presentations:


Question Classification

Description:

Question classification is the task of determining the expected answer type of a question. For example, for the question "Where was Thomas Edison born?", we expect the answer to be a location. Question classification systems are often used as subcomponents of question answering systems to prune the type(s) of answer returned. Much of the previous work on question classification has dealt with constructing rules by hand. Such a task is tedious, and the resulting rules do not generalize well across domains. In this work, we focus on statistical machine learning approaches to the problem. Instead of hand-crafting rules, we automatically learn a classifier. We employ a support vector machine (SVM) approach and explore syntactic and semantic features. Experimental results on TREC QA track questions, UIUC questions, and MadSci questions are presented. We show that using a combination of syntactic features (unigrams, bigrams) and semantic features (WordNet hypernym expansion of headwords) yields promising results.
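To make the learned-classifier idea concrete, here is a minimal sketch in Python. It is not the system described above: it extracts only the unigram and bigram features, omits the WordNet hypernym expansion, and uses a multi-class perceptron as a simple linear stand-in for the SVM so that it runs with the standard library alone. The label names and all training questions except the Thomas Edison example are hypothetical toy data.

```python
from collections import Counter, defaultdict

# Toy training set. Only the first question comes from the description
# above; the other questions and all label names are hypothetical.
TOY = [
    ("Where was Thomas Edison born?", "LOCATION"),
    ("Where is the Eiffel Tower?", "LOCATION"),
    ("Who invented the telephone?", "PERSON"),
    ("Who wrote Hamlet?", "PERSON"),
    ("When did World War II end?", "DATE"),
    ("When was the telephone invented?", "DATE"),
]


def extract_features(question):
    """Count unigrams and bigrams of a lowercased, whitespace-tokenized question."""
    tokens = question.lower().rstrip("?").split()
    feats = Counter()
    for tok in tokens:
        feats["uni=" + tok] += 1
    for left, right in zip(tokens, tokens[1:]):
        feats["bi=" + left + "_" + right] += 1
    return feats


def predict(weights, feats):
    """Return the label whose weight vector scores the features highest."""
    def score(label):
        w = weights[label]
        return sum(w.get(name, 0.0) * value for name, value in feats.items())
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(weights), key=score)


def train_perceptron(examples, epochs=10):
    """Train a multi-class perceptron (a stand-in for the SVM, not an SVM)."""
    labels = sorted({label for _, label in examples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for question, gold in examples:
            feats = extract_features(question)
            guess = predict(weights, feats)
            if guess != gold:
                # Move weights toward the gold label, away from the wrong guess.
                for name, value in feats.items():
                    weights[gold][name] += value
                    weights[guess][name] -= value
    return weights


def classify(weights, question):
    """Predict the expected answer type of a raw question string."""
    return predict(weights, extract_features(question))


if __name__ == "__main__":
    model = train_perceptron(TOY)
    for q in ["Where is Paris?", "Who discovered penicillin?"]:
        print(q, "->", classify(model, q))
```

On data this small the model mostly keys on the question word. Semantic features such as the hypernym expansion of headwords mentioned above are meant to help where the question word alone is ambiguous, as in "What city ..." versus "What year ...".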

Publications:



Last updated: Fri Sep 8 20:20:42 EDT 2006