CRI: Developing the Lemur Toolkit into a Community Resource

PI: Jamie Callan, Professor, Computer Science, Carnegie Mellon University
Co-PI: W. Bruce Croft, Distinguished Professor, Computer Science, University of Massachusetts Amherst

This National Science Foundation project (through a subcontract with Carnegie Mellon University) focuses on the continued development of Lemur, the open-source software toolkit for language modeling and information retrieval that Carnegie Mellon University and the University of Massachusetts jointly developed under other funding. UMass Amherst has primary responsibility for the following areas of activity:

  • Continued improvement of the INDRI indexing and retrieval software, for example to improve ease of use, ease of installation, memory efficiency, and speed of indexing and/or retrieval;

  • Continued improvement of INDRI’s support for documents represented in XML;

  • Improved support for “small Web” collections (e.g., collections of at least ten million HTML documents);

  • Continued improvement of Lemur’s multilingual and cross-lingual capabilities;

  • Improved support for heavily annotated documents, with particular emphasis on ease of use for annotations used commonly in the research community (e.g., Treebank part-of-speech and part-of-syntax tags, IdentiFinder named entity tags, tags produced by GATE);

  • Joint development with CMU of a Lemur component for adaptive information filtering and topic detection and tracking (TDT);

  • Active participation with CMU in maintaining Lemur software, including fixing software errors (“bugs”) and retrofitting new capabilities and API changes into older software; and

  • Active participation with CMU in documenting changes made at UMass Amherst.

    Graduate students and researchers/programmers involved in this project:

  • Kevyn Collins-Thompson, Graduate Student, Carnegie Mellon University
  • David Fisher, Senior Software Engineer, University of Massachusetts Amherst
  • Mark Hoy, Senior Research Programmer, Carnegie Mellon University
  • Xiaoyong Liu, Graduate Student, University of Massachusetts Amherst
  • Donald Metzler, Graduate Student, University of Massachusetts Amherst
  • Paul Ogilvie, Graduate Student, Carnegie Mellon University
  • Trevor Strohman, Graduate Student, University of Massachusetts Amherst
  • Xing Wei, Graduate Student, University of Massachusetts Amherst
  • Le Zhao, Graduate Student, Carnegie Mellon University

    More details on the Lemur project, along with the software download, are available at http://www.lemurproject.org/.

    This project is sponsored by National Science Foundation grant #CNS-0454018.