Main.DavidSmithTalk r1.1 - 17 Feb 2008 - 15:11 - HenryFeild

Research Faculty Candidate Talk

David Smith

Johns Hopkins University
The Center for Language and Speech Processing

Friday, March 7th, 2008
Computer Science Building, Room 151
Time: 11:00 a.m.

Faculty Host: Bruce Croft

"Bootstrapping Monolingual Parsers from Multilingual Data"

The creation of the Penn Treebank and similar datasets ca. 1990 produced a flowering of research on empirically trained syntactic parsers, which is now bearing fruit in information extraction and machine translation. This revolution has bypassed most languages and domains, however, due to the expense of creating treebanks. Semi-supervised learning methods such as bootstrapping and cotraining have the potential to leverage diverse sources of knowledge for robust statistical parsing in these new settings.

Drawing on Abney's (2004) analysis of the Yarowsky algorithm, I present a view of bootstrapping as optimization. This optimization is performed with standard dynamic programming for projective syntax or with a new model of graph spanning trees for non-projective syntax, which allows trees with crossing dependency links in languages such as Czech, Danish, and Dutch. Finally, I show how to draw features for a parser in one language from parse trees in another language. These quasi-synchronous grammars extend prior bootstrapping work with synchronous grammars and also have applications in translation modeling.

Bio:

David Smith is currently a Ph.D. student in Johns Hopkins University's Computer Science Department and Center for Language and Speech Processing and an NSF graduate fellow. He received his A.B. in classics from Harvard University. His interests are in machine translation, natural language parsing, and semi-supervised machine learning methods. David was formerly head programmer for the Perseus Digital Library Project at Tufts University, where he strayed from the path of classical philology toward text mining, geocoding, and information extraction.

Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.