History

The Center for Intelligent Information Retrieval (CIIR) was formed in September 1992 with W. Bruce Croft as Director. Croft had been working in the field of information retrieval since he was a graduate student in 1975, and the advent of new storage media and networks in the 1990s created more general interest in the technology of search engines. The CIIR was a National Science Foundation State/Industry University Cooperative Research Center (S/IUCRC) from 1992 to 2001, with Professors Rick Adrion, Wendy Lehnert, Victor Lesser, and Edwina Rissland as co-Principal Investigators.

The current faculty involved in the CIIR includes Distinguished Professor Croft, Professor James Allan, Professor Andrew McCallum, Assistant Professor Benjamin Marlin, Assistant Professor Hanna Wallach, Research Associate Professor R. Manmatha, and Research Assistant Professor David Smith.

Croft joined the Department of Computer Science at UMass Amherst in 1979. Dr. Allan joined the CIIR in 1994 as a Senior Postdoctoral Research Associate and later became a Research Assistant Professor. He received a tenure-track professorship in the Department in 1998 and served as co-Director of the CIIR while Dr. Croft was Department Chair from 2001 to 2007. Dr. McCallum joined the CIIR in 2002 as a Research Associate Professor and was promoted to tenure-track Associate Professor in 2003 and to full Professor in 2009. Benjamin Marlin joined the CIIR in 2011 as an Assistant Professor. Hanna Wallach joined the CIIR in 2007 as a Senior Postdoctoral Research Associate and became a tenure-track Assistant Professor in 2010. After receiving his Ph.D. from UMass in 1997, Manmatha started as a Postdoctoral Researcher and was promoted to Research Assistant Professor in 1998 and to Research Associate Professor in 2006. David Smith joined the CIIR in 2008 as a Research Assistant Professor.

From 1995 to 1999, Dr. Jamie Callan was Assistant Director of the CIIR. After receiving his Ph.D. from the Department in 1993, Dr. Callan joined the CIIR as a Senior Postdoctoral Research Associate and later became a Research Assistant Professor before leaving in 1999 for a tenure-track faculty position at Carnegie Mellon University.

Since 1992, we have employed more than 150 graduate students and 140 undergraduates. Seventy-three of the Center's students received Ph.D.s, and 42 received M.S. degrees.

The original mission of the Center for Intelligent Information Retrieval (CIIR) was to “develop technology that supports the emerging information infrastructure into the next century” (i.e., the 21st century). This mission was important in 1992 when the Center began, and became even more critical with the advent of the World-Wide Web and the Internet community.

The research carried out in the Center has been described in more than 900 journal and refereed conference papers. Some of the contributions we made during our first ten years include the following:

  • We made significant contributions to understanding and improving the retrieval process through probabilistic models, including the first description of a retrieval system based on statistical language models.
  • We introduced and improved a number of techniques for text and query representation, such as phrase representations, passages, "named entities", statistical stemming, and query expansion.
  • We led the development of techniques for distributed search based on automatically representing databases and combining local searches.
  • We produced the first high-capacity probabilistic filtering architecture and carried out some of the earliest evaluations of machine learning algorithms for filtering.
  • We helped to define and evaluate the first versions of event detection and tracking software.
  • We carried out some of the earliest research on ranking and representation techniques for Asian languages, and showed how bilingual dictionaries can be an effective basis for a cross-lingual system.
  • We developed some of the first approaches to information extraction that emphasized learning.
  • We evaluated novel techniques for indexing images and video.

The CIIR was also involved in a number of ground-breaking industry collaborations during its early years. Examples of those collaborations include:

  • West Publishing's "WIN" legal document retrieval system is based on information retrieval technology from the CIIR.
  • Lotus Development created a world-wide customer support technical reference retrieval system based on CIIR technology.
  • Infoseek licensed INQUERY to provide low-cost, high-speed searches of the Internet.
  • The U.S. Library of Congress used INQUERY to provide access to a number of collections including the American Memory collection of historic photographic archives and the Global Legal Information Network. Another INQUERY system was the basis of the Thomas System, the corpus of Congressional Research Reports, Public Policy File, and all existing Federal law and pending bills.
  • The Executive Office of President Clinton and the Vice President's Office of the National Performance Review (renamed National Partnership for Reinventing Government) used INQUERY on their web sites. The White House Home Page on the World Wide Web had links to many of the President's databases searchable using INQUERY. The publications server included transcripts of Presidential speeches, actual recorded speeches, press briefings, foreign and domestic policy documents, etc.
  • The General Services Administration (GSA) funded the application of advanced technology from the CIIR to develop a web searching capability across the entire Federal Government (all .gov and .mil sites). "Govbot" was the first government information portal that indexed over 1.5 million web pages for a one-stop shopping site of government information.

For a perspective on what the CIIR was focusing on during the 1990s, see "What Do People Want From Information Retrieval?" by W. Bruce Croft.