Celebrating 25 Years of the Center for Intelligent Information Retrieval

For 25 years, the Center for Intelligent Information Retrieval (CIIR) has been at the forefront of research in information retrieval and applied natural language processing and engaged in groundbreaking industry and government collaborations.

“When the center was first proposed, it would have been hard to envision the dramatic changes in the Internet and web search that would take place over the next twenty years, but we found that the relevance of the CIIR’s research focus increased with each change,” observes Distinguished Professor W. Bruce Croft, director of the CIIR since its creation in September 1992 and former dean of CICS.

The CIIR began with an eight-year National Science Foundation (NSF) State/Industry University Cooperative Research Center (S/IUCRC) program grant. Croft was the director, Professors Rick Adrion, Wendy Lehnert, Victor Lesser, and Edwina Rissland were co-principal investigators, and Paul McOwen served as administrative director. The S/IUCRC program provided $300,000 annually from the NSF and $300,000 from the state, with matching funds provided by the CIIR industry members. In total, 18 faculty have been involved in the CIIR, including Professor James Allan (current CIIR co-director and chair of the faculty in CICS) and Jamie Callan (Professor at Carnegie Mellon University and former CIIR assistant director). Professor Andrew McCallum joined the CIIR in 2002 and is now the Director of the Center for Data Science.

From the start, the CIIR was unique in its use of an innovative non-profit technology transfer corporation, ACSIOM (Applied Computing Systems Institute of Massachusetts), to quickly license and commercialize technology developed within the center. ACSIOM and Rick Adrion, the chair of the ACSIOM Board, were critical to the success of the proposal and the technology transfer operations of the center.

Information retrieval (IR) studies how people access and understand information and how computer-based systems can support that process. The CIIR is recognized as one of the world’s leading IR research groups. Its researchers have made significant contributions to the field, including:

  • Understanding and improving information access through probabilistic retrieval models, including the first description of a retrieval system based on statistical language models.
  • Introducing and improving numerous techniques for text and query representation, such as phrase representations, passages, “named entities,” statistical stemming, and query expansion.
  • Leading the development of techniques for distributed search based on automatically representing databases and combining local searches.
  • Producing the first high-capacity probabilistic filtering architecture and carrying out some of the earliest evaluations of machine learning algorithms for filtering.
  • Helping define and evaluate the first versions of event detection and tracking software.
  • Carrying out some of the earliest research on ranking and representation techniques for Asian languages, and showing how bilingual dictionaries can be an effective basis for a cross-lingual system.
  • Developing some of the first approaches to information extraction that emphasized learning.
  • Evaluating novel techniques for indexing images and video using joint models with associated text.
  • Developing search engine software that has been used by thousands of academic groups, government agencies, and companies.

As a testament to their impact, CIIR researchers have received six ACM SIGIR Test of Time Awards (and additional honorable mentions) recognizing their “long-lasting influence” on IR research. Croft and co-authors received the awards for leading research in the areas of relevance-based language models, query expansion using document analysis techniques, and inference networks for searching distributed collections. Allan and co-authors received the award for their influential work on topic detection and tracking.

A continuing priority for the CIIR is providing information retrieval research tools to the academic community and to the public. The Lemur Toolkit, a collaboration with CMU’s Jamie Callan, is open-source software that has become a standard resource for researchers in the field of information retrieval. Academic mentoring has also been a key focus, and hundreds of talented research staff, students, and visiting researchers/postdocs were involved with the CIIR over its 25-year history.

In addition to producing fundamental research advances, the CIIR has been involved in many innovative industry collaborations. In the early 1990s, corporate partners used the CIIR’s search technology in numerous “firsts” as they began using the web. For instance, Lotus Development Corporation, well known for its Lotus 1-2-3 spreadsheet application, worked with the CIIR to create a worldwide online customer support system based on CIIR technology. The early web search company Infoseek used CIIR technology, and West Publishing (now Thomson Reuters) based its WIN legal document retrieval system on information retrieval technology from the CIIR. In a well-publicized example of industrial collaboration, the CIIR worked with IBM on the question-answering technology behind the company’s intelligent computing system, Watson, which defeated two of the quiz show’s top-winning contestants in 2011’s first-ever human vs. machine “Jeopardy!” competition. Watson used a variant of the CIIR’s Indri search engine as one of its two main document search engines.

The CIIR was also involved in many firsts with its government partners. In the early 1990s, the CIIR developed Govbot, the first federal government information portal, which served as a one-stop search site for government information and technology across the entire federal government (all .gov and .mil sites). Using the CIIR InQuery software as its search engine, the Clinton/Gore White House became the first presidential administration to provide a web search capability for presidential speeches, press briefings, and other documents.

The U.S. Department of Commerce chose the CIIR to make its National Trade Databank searchable over the Internet for the first time. In another first, the U.S. Holocaust Memorial Museum collaborated with the CIIR on making the museum’s entire collection of text, audio, images, and films available and searchable over the Internet. The CIIR worked with the U.S. Library of Congress (LoC) on a number of groundbreaking and high-visibility projects, and CIIR technology was used in the THOMAS System, a searchable corpus of Congressional Research Reports, Public Policy File, and all existing federal law and pending bills. THOMAS was an enormous success emulated in countries around the world. The CIIR also worked with the LoC to provide access to several collections, including the American Memory collection of historic photographic archives and the Global Legal Information Network. Other CIIR government partners included the Patent and Trademark Office, Internal Revenue Service, National Library of Medicine, and the U.S. Department of Transportation.

“A goal of the NSF S/IUCRC program was for centers to become self-sustaining after the NSF funding ended. The CIIR achieved that goal and remains a thriving center today, 25 years after the original funding began,” noted Croft. “With the advent of mobile search and voice-based search, and new developments in neural net models, there is enormous potential for new and exciting research and industry collaborations in this area and we expect the CIIR to be an important part of that future.”

A timeline with highlights from a quarter century of the CIIR is available at ciir.cs.umass.edu/timeline.

By the Numbers, 1992-2017

  • $75 million+ funding
  • 500+ CIIR personnel
  • Nearly 400 students supported (50% grad; 50% undergrad)
  • 75 PhDs produced
  • 18 faculty involved
  • 60 staff
  • 100+ CIIR industry/government members
  • 1,000+ publications (journals and conference papers)
  • 300+ software licenses
  • Over 100K search software downloads

The full CIIR anniversary article with photos appeared in the CICS Significant Bits Fall 2017 newsletter (see pages 8-9).