
Research Faculty Candidate Talk

David Smith

Johns Hopkins University
The Center for Language and Speech Processing

Friday, March 7th, 2008
Computer Science Building, Room 151
Time: 11:00 a.m.

Faculty Host: Bruce Croft

"Bootstrapping Monolingual Parsers from Multilingual Data"

The creation of the Penn Treebank and similar datasets ca. 1990 produced a flowering of research on empirically trained syntactic parsers, research that is now bearing fruit in information extraction and machine translation. This revolution has bypassed most languages and domains, however, because treebanks are expensive to create. Semi-supervised learning methods such as bootstrapping and co-training have the potential to leverage diverse sources of knowledge for robust statistical parsing in these new settings.

Drawing on Abney's (2004) analysis of the Yarowsky algorithm, I present a view of bootstrapping as optimization. For projective syntax, this optimization is performed with standard dynamic programming. For non-projective syntax, it is performed with a new model of graph spanning trees that allows crossing dependency links, as found in languages such as Czech, Danish, and Dutch. Finally, I show how to draw features for a parser in one language from parse trees in another language. These quasi-synchronous grammars extend prior bootstrapping work with synchronous grammars and also have applications in translation modeling.

Bio:

David Smith is currently a Ph.D. student in the Computer Science Department and the Center for Language and Speech Processing at Johns Hopkins University, and an NSF graduate fellow. He received his A.B. in classics from Harvard University. His interests are in machine translation, natural language parsing, and semi-supervised machine learning methods. David was formerly head programmer for the Perseus Digital Library Project at Tufts University, where he strayed from the path of classical philology toward text mining, geocoding, and information extraction.

Revision r1.1 - 17 Feb 2008 - 15:11 - HenryFeild