Multimedia Indexing and Retrieval Laboratory

Research at the Multimedia Indexing and Retrieval Laboratory (MIR) focuses on retrieval from databases of images, videos and scanned handwritten documents.

Libraries have traditionally annotated images manually with text and then retrieved the images using those annotations. This is labor intensive, expensive and tedious. One of our current approaches uses statistical models to automatically annotate and retrieve images (or videos) given a small annotated training set. The problem can be viewed as similar to cross-lingual retrieval, where, say, a set of documents in French is retrieved using an English query. Cross-lingual retrieval requires a parallel corpus of documents in English and French for training. By analogy, we have a parallel vocabulary of image features and annotation words obtained from a training set. Given this training set, a relevance-based language model (relevance model) is learned. This relevance model is then used to annotate unseen test images. The test images may then be retrieved via their automatic annotations using text queries and a language model based retrieval approach. We have also applied a number of other models to this area, and the approach is very promising.
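The annotation step described above can be illustrated with a minimal sketch. It assumes each image has already been reduced to a list of discrete visual tokens (called "blobs" here) and that every training image is equally likely a priori. The function names, the smoothing weights and the token representation are all illustrative assumptions, not the laboratory's actual implementation. The score for a word is the sum, over training images, of the probability that the training image generates the test image's blobs times the probability that it generates the word.

```python
from collections import Counter


def smoothed(count, length, bg_count, bg_total, lam):
    """Interpolate an image-level estimate with a collection-level estimate."""
    return (1 - lam) * count / length + lam * bg_count / bg_total


def annotate(test_blobs, training, lam_word=0.1, lam_blob=0.1, top_k=2):
    """Rank annotation words for an unseen image.

    training: list of (words, blobs) pairs, each a non-empty list of strings.
    test_blobs: list of blob tokens for the image to annotate.
    Returns up to top_k (word, probability) pairs, most probable first.
    """
    # Collection-level counts used for smoothing.
    word_bg = Counter(w for words, _ in training for w in words)
    blob_bg = Counter(b for _, blobs in training for b in blobs)
    n_words = sum(word_bg.values())
    n_blobs = sum(blob_bg.values())

    scores = Counter()
    for words, blobs in training:
        word_counts, blob_counts = Counter(words), Counter(blobs)
        # Probability that this training image generates the test blobs.
        p_blobs = 1.0
        for b in test_blobs:
            p_blobs *= smoothed(blob_counts[b], len(blobs),
                                blob_bg[b], n_blobs, lam_blob)
        # Accumulate the joint probability of each word with the test blobs.
        for w in word_bg:
            p_word = smoothed(word_counts[w], len(words),
                              word_bg[w], n_words, lam_word)
            scores[w] += p_blobs * p_word

    total = sum(scores.values())
    if total == 0:
        # Happens when a test blob never occurs in the training set.
        return []
    return [(w, s / total) for w, s in scores.most_common(top_k)]
```

The smoothing keeps a word or blob that is missing from one training image from forcing that image's contribution to zero. Real image features are continuous, so a practical system would need either a quantization step or a continuous density model in place of the discrete tokens assumed here.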

Current handwriting recognition works well for constrained domains such as postal address recognition and bank check recognition. There has been little work on unconstrained domains like historical manuscripts. MIR has developed the first automatic system for retrieving handwritten manuscripts and has demonstrated it on a 1000-page (8 Gb) database of George Washington's manuscripts.

The approach is similar to that used for image annotation and retrieval. The scanned page images are automatically segmented into word images using a scale space page segmentation algorithm. The word images are preprocessed and features are extracted from them. A small training dataset is produced by annotating the words in a small portion of the manuscripts. A statistical model is learned using this training set and is then used to automatically annotate the test set with words and probabilities. A language model based retrieval approach may then be used to retrieve pages given a text (ASCII) query. As mentioned above, this has been demonstrated on a part of the George Washington dataset. We have also developed handwriting recognition algorithms. We are currently working on improving performance, on scalability issues and on learning models for out-of-vocabulary terms.
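The retrieval step can be illustrated with a short sketch of query-likelihood ranking over probabilistic annotations. It assumes each page is a list of word images and that the annotation model has produced, for each word image, a dictionary mapping candidate words to probabilities. The data layout, the function name rank_pages and the interpolation weight are hypothetical choices made for illustration only.

```python
import math


def rank_pages(query_terms, pages, lam=0.5):
    """Rank pages by the smoothed log-likelihood of generating the query.

    pages: dict mapping page id -> list of per-word-image dicts,
           each dict mapping a candidate word to its probability.
    Returns a list of (page_id, log_score), best match first.
    """
    def expected_count(page, term):
        # Expected number of occurrences of the term on the page.
        return sum(dist.get(term, 0.0) for dist in page)

    total_images = sum(len(page) for page in pages.values())
    if total_images == 0:
        return []

    # Collection-level term probabilities, used for smoothing.
    p_coll = {
        t: sum(expected_count(page, t) for page in pages.values()) / total_images
        for t in query_terms
    }

    ranked = []
    for page_id, page in pages.items():
        log_score = 0.0
        for t in query_terms:
            p_page = expected_count(page, t) / len(page) if page else 0.0
            p = (1 - lam) * p_page + lam * p_coll[t]
            if p == 0.0:
                # The term is not a candidate anywhere in the collection.
                log_score = float("-inf")
                break
            log_score += math.log(p)
        ranked.append((page_id, log_score))

    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Because the ranking uses the full probability distribution for each word image rather than a single best guess, a page can still be retrieved when the correct word is not the top-ranked annotation for its image.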

Past work by MIR includes a multi-modal retrieval system combining appearance-based image retrieval and text retrieval, which was applied to a large database of trademarks containing image and text data from the US Patent and Trademark Office. The database contained 68,000 trademarks that could be searched using either image retrieval or combined image and text retrieval, while 615,000 trademarks could be searched using text retrieval.