Current Research

My research focuses on the problems of organizing and retrieving visual information. Visual information here includes not only pictures and video but also images of scanned documents and text. I believe that one needs to take advantage of as much information as possible to retrieve it. Thus, I work on a number of different ways of retrieving images (content based image retrieval), but I also work on extracting text from images (so that it may be used to annotate the image) and on indexing handwritten manuscripts such as the George Washington collection at the Library of Congress. The idea there is to make an index, like the one at the back of a printed book, for handwritten manuscript collections. Information from disparate sources needs to be put together, which naturally leads to the question of how cross-modal information should be combined.

Automatic Image Annotation and Retrieval

The focus here is on automatically annotating and retrieving images. We are doing this using relevance-based language models, in collaboration with Jiwoon Jeon and Victor Lavrenko. An annotated image can be viewed as being described using two different vocabularies: an image vocabulary of features and a word (keyword) vocabulary. The problem is then one of learning the associations between terms in the two vocabularies, and this can be done in a number of different ways (it is done for pairs of languages in machine translation and cross-lingual retrieval). We do this using a relevance model, which essentially computes the joint probabilities of terms in the two vocabularies. These joint probabilities can be used either to annotate a test image or to retrieve images in response to a test query. See the MM-41 paper in the list.
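To make the mechanics concrete, here is a minimal Python sketch (not our actual implementation) of how a relevance model of this kind can score annotation words for a test image. It assumes the image has already been quantized into discrete "blob" tokens; the function name, smoothing scheme, and parameter values are illustrative only.

    import numpy as np

    def annotation_scores(blob_counts, word_counts, query_blobs, alpha=0.1, beta=0.1):
        """Score every vocabulary word for an unannotated test image under a
        relevance model: P(w, b_1..b_m) ~ sum_J P(J) P(w|J) prod_i P(b_i|J),
        where J ranges over the annotated training images."""
        # Smoothed per-image distributions, interpolated with collection frequencies.
        coll_blob = blob_counts.sum(0) / blob_counts.sum()
        coll_word = word_counts.sum(0) / word_counts.sum()
        p_blob = (1 - alpha) * blob_counts / blob_counts.sum(1, keepdims=True) + alpha * coll_blob
        p_word = (1 - beta) * word_counts / word_counts.sum(1, keepdims=True) + beta * coll_word

        # Weight each training image by how well it explains the test image's blobs
        # (uniform prior over training images).
        log_like = np.log(p_blob[:, query_blobs]).sum(axis=1)
        weights = np.exp(log_like - log_like.max())
        weights /= weights.sum()

        # Joint probability of each word with the observed blobs, up to a constant.
        return weights @ p_word

The same joint probabilities can be read in the other direction: given a word query, images are ranked by how probable the query words are under each image's estimated word distribution.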

Indexing and Retrieving Handwritten Archival Collections

This project seeks to index and retrieve collections of handwritten material written by a single author. Libraries contain an enormous amount of handwritten manuscript material, which is often of interest to many people; well-known examples include the early Presidential papers at the Library of Congress. Since the material is handwritten, optical character recognition (OCR) cannot be used. For large collections of manuscripts written in a single hand, however, different occurrences of the same word, viewed as pictures, are likely to look similar. We are using two approaches:

  1. In the first approach we want to create an index like the one at the back of a book. The idea is therefore to segment pages into words and then cluster the words into classes based on image matching. For example, if the word "Independence" occurs 100 times in one of George Washington's manuscripts, then one of these "similarity classes" will contain the 100 occurrences of "Independence". The top 2000 or so of these classes are then assigned ASCII equivalents: the user is shown one member of a class and assigns the ASCII equivalent, and the program can then automatically assign it to the other members of the class. Although we have tried a number of different matching techniques, our current favourite uses dynamic time warping (a small sketch of this kind of matching appears after this list) - this work is with Toni Rath. See the CVPR'03 paper.
  2. The second approach uses relevance-based language models to retrieve handwritten documents given a text query. The idea is again to first segment pages into words. Features are then computed over the words and discretized, so that every word can be represented in terms of a small discrete feature set. The joint probabilities of these features and their ASCII representations are learned using the relevance model. Given a text query, lines or pages can then be retrieved by formulating this as a retrieval problem and solving it using a language model. I am doing this with Toni Rath and Victor Lavrenko.
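Here is a rough Python sketch of the kind of dynamic time warping comparison mentioned in the first approach. It assumes each word image has already been reduced to a sequence of per-column feature vectors (for example, projection profile and word contours); the feature choice and the normalization are illustrative rather than the exact scheme we use.

    import numpy as np

    def dtw_distance(word_a, word_b):
        """Dynamic time warping distance between two word images, each given as a
        sequence of per-column feature vectors. Smaller values mean the two words
        look more alike."""
        n, m = len(word_a), len(word_b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(word_a[i - 1] - word_b[j - 1])  # column-to-column cost
                # Three allowed warping moves: advance in word_a, in word_b, or in both.
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        # Normalize by a rough path length so long words are not penalized.
        return cost[n, m] / (n + m)

Clustering words into "similarity classes" then amounts to grouping together word images whose pairwise DTW distances are small.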

As part of this work I also did some work with Nitin Srimal on automatically segmenting handwritten manuscript images using scale space. The basic idea is that at a particular scale the words are well separated into blobs. The technique seems to work reasonably well.
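A very rough Python sketch of the idea, assuming a binarized page image and illustrative (untuned) scale and threshold values:

    import numpy as np
    from scipy import ndimage

    def word_blobs(page, sigma=(2.0, 6.0), threshold=0.5):
        """Rough scale-space word segmentation: smooth a binarized page image with
        an anisotropic Gaussian so the letters within a word merge into one blob,
        then label the connected components and return their bounding boxes."""
        smoothed = ndimage.gaussian_filter(page.astype(float), sigma=sigma)
        blobs, _ = ndimage.label(smoothed > threshold * smoothed.max())
        return ndimage.find_objects(blobs)  # list of (row slice, column slice) boxes

The anisotropic smoothing (more blur along the writing direction) is what causes letters to merge into word-sized blobs before lines merge into each other.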

Meta Search

The idea here is to combine the outputs of several search engines to produce a single ranking. We modeled the score distributions of the ranked outputs of each search engine using a mixture of a Gaussian and an exponential (Gaussian for relevant documents, exponential for non-relevant ones). The parameters of these distributions may be used to compute the posterior probability of relevance given the score of each document, and the documents can then be reordered using this probability. I did this with Toni Rath and Fangfang Feng.
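As a sketch of the reordering step, and assuming the mixture parameters have already been fit for a given engine (for example with EM), the posterior can be computed as follows; the function and parameter names are illustrative.

    import numpy as np

    def posterior_relevance(scores, pi, mu, sigma, lam):
        """Posterior probability of relevance given a score, assuming relevant
        scores follow N(mu, sigma^2) with prior pi and non-relevant scores follow
        an exponential distribution with rate lam."""
        scores = np.asarray(scores, dtype=float)
        p_rel = pi * np.exp(-0.5 * ((scores - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        p_non = (1 - pi) * lam * np.exp(-lam * scores)
        return p_rel / (p_rel + p_non)

Documents returned by different engines can then be merged by sorting on this posterior rather than on the raw scores, which are not directly comparable across engines.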

Previous Stuff

Finding objects in foreground/background images, with Madirakshi Das. The idea is that one really wants retrieval based on parts of images - for example, the flower or bird in the image - rather than on the entire image, which may include a lot of background. We found that for specific domains - flower images, bird images - one could use domain knowledge to segment out the object well enough to determine its color and then use that color for retrieval.

Work on image retrieval by appearance, with S. Ravela, using Gaussian derivative filters at several scales.
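A crude Python illustration of this kind of multi-scale Gaussian derivative description (the derivative orders, scales, and pooling here are illustrative, not the actual feature set):

    import numpy as np
    from scipy import ndimage

    def appearance_vector(image, scales=(1.0, 2.0, 4.0)):
        """Crude appearance descriptor: responses of first- and second-order
        Gaussian derivative filters at several scales, summarized here by their
        global means."""
        feats = []
        for s in scales:
            for order in [(0, 1), (1, 0), (0, 2), (2, 0), (1, 1)]:  # derivative order per axis
                resp = ndimage.gaussian_filter(image.astype(float), sigma=s, order=order)
                feats.append(resp.mean())
        return np.array(feats)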

Finding text in images, with Victor Wu. The basic idea was to detect text embedded in images by treating the text as a texture.

I have also tried using Gaussians and their derivatives at multiple scales to find affine-transformed versions of images.

Here is a list of papers from my group. Some of them are available online.