Department of Computer Science
 

The Student Workshops in Information Retrieval and Language Presents:

Jaime Teevan, Jason Rennie, and Percy Liang
MIT
Computer Science and Artificial Intelligence Laboratory

Tuesday, November 9, 2004
Computer Science Building, Room 151
4:00 PM

Faculty Host: James Allan

"Personalized Web Search: Uncommon Responses to Common Queries" - Jaime Teevan

Today, search is the same for everyone.  If two people enter the query "IR", they get the same results, regardless of whether they are interested in information retrieval research or stock quotes for the Ingersoll-Rand Company.  In this talk, I will present research that deals with this problem by personalizing the Web search experience based on an index of information that the user has seen before.  Developing personalized search gives rise to a number of challenges, from how to evaluate result quality to how to richly yet efficiently represent both the user and the corpus.  I will address these issues and present a text-based search algorithm that significantly outperforms relevance feedback in which the user has fully specified the relevant document set.

Jaime Teevan is a doctoral candidate at the Computer Science and Artificial Intelligence Lab at the Massachusetts Institute of Technology, where her research focuses on helping people re-find information in dynamic information environments.  She has published in the areas of HCI, Information Retrieval, and Machine Learning.  She received an SM ('01) from MIT for research on learning probabilistic information retrieval models, and a BS in Computer Science from Yale University.  She also worked for several years at Infoseek.

4:20pm: "A Hybrid Model for Co-reference Resolution" - Jason Rennie

The problem of co-reference resolution can be compared to both clustering and classification.  In one light, co-reference resolution can be viewed as the problem of finding the antecedent for each noun phrase.  In another, it can be seen as identity resolution.  The first view begs for a classification model; the second, a clustering model with a learned distance metric.  We present a hybrid approach---a conditional, probabilistic model that subsumes both of these views.


Jason Rennie has been interested in applying learning techniques to text and language since his first year at Carnegie Mellon.  He developed one of the first learning-based e-mail filters, ifile.  He also worked with Andrew McCallum on Cora, a research paper search engine; they developed Reinforcement Learning-based algorithms for focused Web crawling.  In 1999, he graduated and joined the PhD program at MIT with Tommi Jaakkola as his advisor.  He completed his S.M. in 2001 with a treatise on text classification with Naive Bayes.  His interest in text classification has continued; he worked with other students to analyze various multi-class classification techniques, uncover ways to fix the most serious flaws of Naive Bayes, and develop techniques for applying super-linear classification algorithms to very large data sets.  He has since turned his attention to the domain of natural language and is currently working with his advisor on algorithms for named entity extraction and co-reference resolution.

4:40pm: "Word clustering for segmentation" - Percy Liang

We attempt to use unlabeled data (raw text) to boost the performance of supervised techniques for NP chunking, named-entity recognition, and Chinese word segmentation.  By using a discriminative model (the averaged perceptron), we are able to include arbitrary features in the model.  We perform clustering on the raw text and use the cluster identities of words as features.  Also, instead of using the standard BIO tagging for segmentation, we use semi-Markov models, a more natural and powerful framework for segmentation tasks.

Percy Liang is a master's student of Michael Collins at MIT.  His current research is in semi-supervised natural language processing.  As an undergraduate at MIT, he worked on algorithms involving hypertrees and applied them to machine learning applications.  He has participated in many programming contests such as the IOI and ACM, and is now involved in the coaching and organizing aspects of the contests.