
Download Center

Before proceeding with any download, check our "License Agreement".

We would also like you to consider registering. It only takes a minute and gives us a contact to notify about any updates to the software.


List of available downloads:

Question Classifier

Wei Li
Last Updated: Mar. 28, 2002

This is a question classifier that maps natural-language questions into entity types, such as PERSON, LOCATION, NUMBER, and so on. It contains three different models: one rule-based question pattern model and two probabilistic language models.

Novelty Track, TREC

Alvaro Bolivar
Last Updated: Jan. 14, 2003

This is a collection-building toolkit for assembling the training set used by the CIIR in its participation in the Novelty track at TREC 2002. For details, see the conference proceedings.

KStem Java Implementation

Sergio Guzman-Lara
Last Updated: Apr. 27, 2007

This is the source code of a Java implementation of KStem (a stemmer designed by Bob Krovetz). In particular, this implementation is adapted for Lucene. To install, download KStem.jar to Lucene's src directory and unjar it there. Then compile.

Word Image Data Sets *Requires Registration*

Toni Rath
Last Updated: Jan. 07, 2003

Data sets containing word images from the George Washington collection with meta-data for retrieval performance evaluation.

Table Extractor

David Pinto and Xing Wei
Last Updated: Mar. 29, 2003

This software package is a table tagger, which processes text tables in documents by tagging each cell. The input is a file that may have many documents in it. The outputs are the processed table cells, extracted tables, tagged file and non-table text.

IESL

The IESL Lab

Downloadable code and data from the Information Extraction and Synthesis Laboratory (IESL) can be found at http://www.cs.umass.edu/~mccallum/code-data.html.

Event Threading experiment

Nallapati, R., Feng, A., Peng, F., and Allan, J.
Last Updated: Feb. 22, 2005

This is the experimental data from "Event Threading within News Topics" in the Proceedings of the CIKM 2004 conference, pp. 446-453.

Indri

The Lemur Project

Indri is a new search engine from the Lemur project, a cooperative effort between the University of Massachusetts Amherst and Carnegie Mellon University to build language modeling information retrieval tools.

Indri Home Page

Stemming Class from Stemming and Cooccurrence on a Larger Corpus

Jeremy Pickens

Three sets of experiments were done, using initial classes created by (1) the Porter stemmer, (2) K-Stem, and (3) the Porter stemmer classes merged with the K-Stem classes in a connected-component manner.
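
The connected-component merge in (3) can be illustrated with a minimal sketch. It assumes each stemmer's output is given as a list of word sets (one set per stem class) and treats any two classes that share a word as belonging to the same merged class. The function name merge_classes and the toy word lists are hypothetical and are not taken from the downloadable data or from actual Porter or K-Stem output.

```python
from collections import defaultdict


def merge_classes(*class_collections):
    """Merge several collections of word classes into connected components.

    Hypothetical helper: two words end up in the same merged class if they
    are linked through a chain of classes that share at least one word.
    """
    parent = {}

    def find(word):
        # Register unseen words as their own root.
        parent.setdefault(word, word)
        root = word
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited word straight at the root.
        while parent[word] != root:
            parent[word], word = root, parent[word]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for classes in class_collections:
        for word_class in classes:
            words = list(word_class)
            if not words:
                continue
            find(words[0])
            for other in words[1:]:
                union(words[0], other)

    merged = defaultdict(set)
    for word in list(parent):
        merged[find(word)].add(word)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in merged.values())


# Toy input, made up for illustration only (not real stemmer output).
porter_like = [{"run", "running"}, {"runner"}]
kstem_like = [{"running", "runner"}, {"walk", "walked"}]
print(merge_classes(porter_like, kstem_like))
```

In this toy input, "run" and "runner" never share a class within a single collection, but both are linked to "running", so the merge places all three in one class.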