Handwritten Document Retrieval - Data Sets

Image/Feature Data Sets


The archives are gzip-compressed tar (tape archive) files. To unpack:
  1. UNIX: gunzip -c archive.tgz | tar xvf -
  2. Windows: use WinZip or WinRAR
Each archive contains a file README.txt with instructions on how to use the data.
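
As a platform-independent alternative to the tools above, the unpacking step can be sketched in Python with the standard tarfile module. This is only a minimal sketch: the function name unpack_archive, the file name archive.tgz and the target directory are placeholders chosen for illustration, and the location of README.txt inside each archive is an assumption, so the sketch searches the extracted tree for it.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir.

    Returns the path of the first README.txt found in the extracted
    tree, or None if the archive does not contain one.
    (Hypothetical helper, not part of the data set distribution.)
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members such as
            # absolute paths or paths that escape dest.
            tar.extractall(dest, filter="data")
        else:
            # Older Python versions without extraction filters.
            tar.extractall(dest)

    # Assumption: README.txt may sit at the top level or in a subdirectory.
    return next(dest.rglob("README.txt"), None)


# Example call (placeholder names):
#   readme = unpack_archive("archive.tgz", "unpacked")
#   if readme is not None:
#       print(readme.read_text(errors="replace"))
```

The helper returns the README.txt path rather than printing the file, so the caller decides how to display the instructions.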


Please note that by downloading any of the data on this web page, you agree to our copyright notice and agree to read the instructions in the README.txt file contained in each archive.
Do not share the data sets or the URL of this page! Instead, please point interested people to the download page of the Center for Intelligent Information Retrieval (click on the button next to Word Image Data Sets).


  1. 20 pages of George Washington's manuscripts, with segmentation information and ground truth, i.e. annotations (58.5 MB)
  2. Data set of good quality (low degradation) (15.8 MB)
  3. Profile features ("time series") extracted from the preceding data set (4.3 MB)