
Current CIIR Projects:

Indri (ARDA/NSF)

Sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation, Indri is a new search engine from the Lemur project.

LEMUR Project (ARDA/NSF)

Sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation, the Lemur Project is a collaboration between the CIIR and the School of Computer Science at Carnegie Mellon University. The Lemur Toolkit is designed to facilitate research in language modeling and information retrieval, where IR is broadly interpreted to include such technologies as ad hoc and distributed retrieval, cross-language IR, summarization, filtering, and classification. As an extension to the Lemur project, the CIIR has developed Indri, a language-model-based search engine for complex queries.

Nightingale (DARPA)

The CIIR is embarking on a five-year DARPA project under the Global Autonomous Language Exploitation (GALE) program. The goal of GALE is to make foreign-language (Arabic and Chinese) speech and text accessible to monolingual English speakers, particularly in military settings. The Nightingale research team includes UMass Amherst, Columbia University, International Computer Science Institute (ICSI), IDIAP Research Institute, HNC/Fair Isaac Corporation, New York University, National Research Council (NRC) Canada, Purdue University, RWTH Aachen University, University of California San Diego, University of Washington, Systran Software, and SRI International. The UMass Amherst team, led by Associate Professor James Allan (PI), Distinguished Professor Bruce Croft (co-PI), and Associate Professor Andrew McCallum (co-PI), focuses on highly accurate retrieval, dynamic topic models, social network discovery, and statistical machine translation.

NSF CRI: CRD - Supporting User Data, Privacy, and Evaluation in the Lemur Toolkit and CRI: Developing the Lemur Toolkit into a Community Resource (NSF/Carnegie Mellon University)

In this collaborative research project between UMass Amherst and CMU, the team is focusing on the continued development of the open-source Lemur toolkit for language modeling and information retrieval, which Carnegie Mellon University and the University of Massachusetts have jointly developed with other funding (see the Lemur Project above).

NSF - Searching Archives of Community Knowledge

In this project, we are studying the task of finding good answers in Q&A archives by investigating techniques for question retrieval and comparing them with alternatives such as direct answer retrieval. The techniques we are developing to search Q&A archives could also significantly affect all types of search engines. Large Q&A archives can be used as training data for models of text transformation. In other words, by developing models that use these resources to learn how to recognize questions, we will also learn how the same concepts or topics can be expressed in different ways. These transformation models could then be used to make the topic models in search engines significantly more robust, which would in turn substantially improve retrieval effectiveness.

NSF - Text Reuse and Information Flow

In this project we are studying a range of approaches to detecting reuse at the sentence level, and a range of approaches for combining sentence-level evidence into document-level evidence. We are also developing algorithms for inferring information flow from timelines, sources, and reuse measures. Given the importance of the Web as a source for detecting reuse, we also focus on techniques that can make efficient use of this huge but unwieldy resource. The research is being evaluated using a range of corpora, such as news, Web crawls, and blogs, in order to explore the dimensions of reuse and information flow in different situations.

NSF ITR - Machine Learning for Sequences and Structured Data: Tools for Non-Experts

In this collaborative research project between UMass Amherst, UPenn, and CMU, the team is researching ways to dramatically improve the ability of people who are not experts in machine learning to design and automatically train models for analyzing and transforming sequences and other structured data such as text, signals, handwriting, and biological sequences.

NSF ITR - Unified Graphical Models

"Unified Graphical Models of Information Extraction and Data Mining with Application to Social Network Analysis" is a research project that aims to improve the ability to mine information previously locked in unstructured natural language text. The research focuses on developing novel statistical models in which information extraction and data mining are so tightly integrated that the boundaries between them disappear, resulting in a powerful unified framework for extraction and mining.

Statistical Models for Information Extraction for REFLEX (BBN/DARPA)

In this project, UMass Amherst is a subcontractor to BBN Technologies on a DARPA-sponsored project to develop statistical models for information extraction that combine many sources of information in novel, integrated ways.

Automated Diagnosis of Usability Problems Using Statistical Computational Methods (Aptima)

The effects of poor usability range from mere inconvenience to disaster. Human factors specialists employ usability analysis to reduce the likelihood or impact of such failures. However, good usability analysis requires usability reports, which are rarely collected, rarely complete, and difficult to analyze. The CIIR and Aptima have partnered on this AFOSR STTR project to develop a usability analysis system that addresses these problems.

CALO Project (DARPA)

As part of DARPA’s Perceptive Agent that Learns (PAL) program, SRI and team members including the CIIR are working on developing a next-generation "Cognitive Agent that Learns and Organizes" (CALO).

Confidence Measures for Information Extraction of Entities, Relations and Object Correspondence (NSF KDD)

In this NSF KDD project, UMass Amherst intends to advance the state of the art in associating confidence measures with information extracted from unstructured text. The team will build on its previously successful research in probabilistic models for assessing confidence in individual extracted text segments, and will provide new capabilities for assessing confidence in object correspondence and in relations between entities.



Recent Projects


© 2008 University of Massachusetts Amherst. Site Policies.
This site is maintained by Department of Computer Science/Center for Intelligent Information Retrieval