My research focuses on the problems of organizing and retrieving visual information. Visual information here includes not only pictures and video but also images of scanned documents and text. I believe that one needs to take advantage of as much information as possible to retrieve such material. Thus, I work on a number of different ways of retrieving images (content-based image retrieval), but also on extracting text from images (so that it may be used to annotate the image). I also work on indexing handwritten manuscripts such as the George Washington collection at the Library of Congress. The idea here is to create, for handwritten manuscript collections, an index like the one at the back of a printed book. Information from disparate sources needs to be put together, which naturally leads to the question of how cross-modal information should be combined.
The focus here is on automatically annotating and retrieving images. We do this using relevance-based language models, in collaboration with Jiwoon Jeon and Victor Lavrenko. Given an annotated image, one can view it as being described using two different vocabularies: an image vocabulary of features and a word (keyword) vocabulary. The problem is then one of learning the associations between terms in the two vocabularies, and this can be done in a number of different ways (it is done for pairs of languages in machine translation and cross-lingual retrieval). We do this using a relevance model, which essentially computes the joint probabilities of terms in the two vocabularies. These joint probabilities can be used either to annotate a test image or to retrieve images in response to a text query. See the MM-41 paper in the list.
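To make the joint-probability idea concrete, here is a minimal sketch of a relevance-model style estimate: the joint probability of a word and a set of image terms ("blobs") is a sum over annotated training images of a uniform image prior times smoothed per-image probabilities of the word and the blobs. This is only an illustration of the general form, not the exact model in the paper; the function names, the dictionary layout of training data, and the smoothing weights are assumptions.

```python
# Sketch of a relevance-model style joint probability
#   P(w, b_1..b_m) = sum_J P(J) * P(w | J) * prod_i P(b_i | J)
# over a set of annotated training images J. Illustrative only.
from collections import Counter

def smoothed_prob(count, total, bg_count, bg_total, lam):
    """Mix a per-image maximum-likelihood estimate with a collection model."""
    ml = count / total if total else 0.0
    bg = bg_count / bg_total if bg_total else 0.0
    return (1 - lam) * ml + lam * bg

def joint_prob(word, query_blobs, train_images, lam_w=0.1, lam_b=0.1):
    """train_images: list of dicts {'words': [...], 'blobs': [...]}."""
    bg_words = Counter(w for img in train_images for w in img['words'])
    bg_blobs = Counter(b for img in train_images for b in img['blobs'])
    n_words, n_blobs = sum(bg_words.values()), sum(bg_blobs.values())
    p_J = 1.0 / len(train_images)          # uniform prior over training images
    total = 0.0
    for img in train_images:
        wc, bc = Counter(img['words']), Counter(img['blobs'])
        p = p_J * smoothed_prob(wc[word], len(img['words']),
                                bg_words[word], n_words, lam_w)
        for b in query_blobs:
            p *= smoothed_prob(bc[b], len(img['blobs']),
                               bg_blobs[b], n_blobs, lam_b)
        total += p
    return total
```

Ranking the word vocabulary by this quantity for a new image gives candidate annotations; ranking images by the joint probability of a query word with each image's blobs gives retrieval.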
This project seeks to index and retrieve collections of handwritten material written by a single author. Libraries contain an enormous amount of handwritten manuscript material that is of interest to many people. Well-known examples include the early Presidential papers at the Library of Congress. Since the material is handwritten, optical character recognition (OCR) technology cannot be used. For large collections of manuscripts written in a single hand, however, different instances of the same word, viewed as pictures, are likely to be similar. We are using two approaches:
As part of this work, I also did some work with Nitin Srimal on automatically segmenting handwritten manuscript images using scale space. The basic idea is that at a particular scale, the words separate well into blobs. The technique seems to work reasonably well.
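The following is a rough sketch of the blob idea, not the actual algorithm: smooth the (inverted) line image with an anisotropic Gaussian so that characters within a word merge into a single blob, then take connected components as word candidates. The scale parameters and the threshold are illustrative assumptions and would need tuning per collection.

```python
# Scale-space style word segmentation sketch: at a suitable (anisotropic)
# scale, ink within a word blurs into one blob per word.
import numpy as np
from scipy import ndimage

def word_blobs(gray_line, sigma_x=6.0, sigma_y=2.0):
    """gray_line: 2D array of a text line, dark ink on a light background."""
    ink = 255.0 - gray_line.astype(float)                  # make ink bright
    smoothed = ndimage.gaussian_filter(ink, sigma=(sigma_y, sigma_x))
    mask = smoothed > smoothed.mean()                       # crude threshold into blobs
    labels, _ = ndimage.label(mask)
    return ndimage.find_objects(labels)                     # bounding slices per blob
```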
Finding objects in foreground/background images, with Madirakshi Das. The idea is that one really wants retrieval based on parts of images - for example, the flower or bird in the image - rather than on the entire image, which may include a lot of background. We found that for specific domains - flower images, bird images - one could use domain knowledge to segment out the object well enough to determine its color and then use that color for retrieval.
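Once the foreground object has been segmented, one simple way to use its color for retrieval is a coarse hue histogram compared by histogram intersection. The sketch below is only illustrative of that step; the bin count and the similarity measure are assumptions, not the method from the paper.

```python
# Describe a segmented foreground region by a hue histogram and compare
# regions by histogram intersection (1.0 = identical distributions).
import numpy as np

def hue_histogram(hue, mask, bins=16):
    """hue: 2D array of hue values in [0, 1); mask: boolean foreground mask."""
    hist, _ = np.histogram(hue[mask], bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def histogram_intersection(h1, h2):
    return float(np.minimum(h1, h2).sum())
```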
Work on image retrieval by appearance with S. Ravela using Gaussian derivative filters at several scales.
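As a rough illustration of appearance features of this kind, one can filter an image with first- and second-order Gaussian derivatives at several scales and summarize the responses, for example with coarse histograms. The particular scales, derivative orders, and histogram summary below are assumptions for the sketch, not the representation used in that work.

```python
# Multi-scale Gaussian derivative responses as a simple appearance descriptor.
import numpy as np
from scipy import ndimage

def gaussian_derivative_features(gray, scales=(1.0, 2.0, 4.0)):
    gray = gray.astype(float)
    responses = []
    for s in scales:
        responses.append(ndimage.gaussian_filter(gray, s, order=(0, 1)))  # d/dx
        responses.append(ndimage.gaussian_filter(gray, s, order=(1, 0)))  # d/dy
        responses.append(ndimage.gaussian_filter(gray, s, order=(0, 2)))  # d2/dx2
        responses.append(ndimage.gaussian_filter(gray, s, order=(1, 1)))  # d2/dxdy
        responses.append(ndimage.gaussian_filter(gray, s, order=(2, 0)))  # d2/dy2
    # summarize each response image with a coarse histogram
    feats = [np.histogram(r, bins=32)[0] for r in responses]
    return np.concatenate(feats).astype(float)
```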
Finding text in images, with Victor Wu. The basic idea was to detect text embedded in images by treating text as a distinctive texture.
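A minimal sketch of the texture intuition, not the actual detector: text regions tend to have high local energy in the horizontal-derivative response because of frequent stroke transitions, so thresholding that energy picks out candidate text pixels. The scale, window size, and threshold factor below are assumptions.

```python
# Treat text as texture: local energy of a horizontal Gaussian derivative
# response is high over text; threshold it to get candidate text regions.
import numpy as np
from scipy import ndimage

def text_mask(gray, sigma=1.5, window=15, k=2.0):
    gray = gray.astype(float)
    dx = ndimage.gaussian_filter(gray, sigma, order=(0, 1))   # horizontal detail
    energy = ndimage.uniform_filter(dx * dx, size=window)     # local texture energy
    return energy > k * energy.mean()                         # candidate text pixels
```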
I have tried using Gaussians and their derivatives at multiple scales to find affine-transformed versions of images.
Here is a list of papers from my group. Some of them are available online.