The Student Workshops in Information Retrieval and Language Presents:

Jaime Teevan, Jason Rennie, and Percy Liang

Tuesday, November 9, 2004

Faculty Host: James Allan

"Personalized Web Search: Uncommon Responses to Common Queries" - Jaime Teevan

Today, search is the same for everyone. If two people enter the query "IR", they get the same results, regardless of whether they are interested in information retrieval research or in stock quotes for the Ingersoll-Rand Company. In this talk, I will present research that addresses this problem by personalizing the Web search experience based on an index of information that the user has seen before. Developing personalized search gives rise to a number of challenges, from how to evaluate result quality to how to represent both the user and the corpus richly yet efficiently. I will address these issues and present a text-based search algorithm that significantly outperforms relevance feedback where the user has fully specified the relevant document set.

Jaime Teevan is a doctoral candidate at the Computer Science and Artificial Intelligence Lab at the Massachusetts Institute of Technology, where her research focuses on helping people re-find information in dynamic information environments. She has published in the areas of HCI, Information Retrieval, and Machine Learning. She received an SM ('01) from MIT for research on learning probabilistic information retrieval models, and a BS in Computer Science from Yale University. She also worked for several years at Infoseek.

4:20pm: "A Hybrid Model for Co-reference Resolution" - Jason Rennie

The problem of co-reference resolution can be compared to both clustering and classification. In one light, co-reference resolution can be viewed as the problem of finding the antecedent for each noun phrase. In another, it can be seen as identity resolution. The first view begs for a classification model; the second, a clustering model with a learned distance metric. We present a hybrid approach: a conditional, probabilistic model that subsumes both of these views.

4:40pm: "Word clustering for segmentation" - Percy Liang

We attempt to use unlabeled data (raw text) to boost the performance of supervised techniques for NP chunking, named-entity recognition, and Chinese word segmentation. By using a discriminative model (the averaged perceptron), we are able to include arbitrary features in the model. We perform clustering on the raw text and use the cluster identities of words as features. Also, instead of using the standard BIO tagging for segmentation, we use semi-Markov models, a more natural and powerful framework for segmentation tasks.

Percy Liang is a master's student of Michael Collins at MIT. His current research is in semi-supervised natural language processing. As an undergraduate at MIT, he worked on algorithms involving hypertrees and applied them to machine learning applications. He has participated in many programming contests, such as the IOI and ACM, and is now involved in coaching and organizing the contests.