Content-based Image Retieval

*note* The CS server doesn't like serving images, so images may have to be "refreshed" (Netscape) or "shown" (IE) individually.

Abstract

For this lab, I developed a system for content-based image retrieval. First, a database of histograms is created. Then, a sample image is provided, and its histogram is matched to the histograms in the database to find similar pictures. Histogram comparisons are done in both RGB and rg color spaces using either histogram interseciton or L2 norms. Also, a "central object mode" can be specified so that central and peripheral histograms can be compared with user-specified weightings. Histograms are scaled to their image sizes at all times.


Detailed description

Histogram Generation

In order to generate the histograms, the color space is divided into an arbitrary number of "buckets." For these analyses, RGB space was divided into an 8x8x8 bucket array and rg space was divided into a 16x16 bucket array. Color values are scaled to fit into the buckets evenly across the 0-255 range. Each pixel in an image is visited and classifed to a bucket, incrementing that buckets counter by 1.0/(number of pixels). This scaling ensures that images of varying sizes will be compared fairly. In the database file creation, each image in the image directory is visited, and its filename and histogram array is stored in the database file. Database files for the 670 images ranged between 1-5MB (a wee bit smaller than the 640MB required to store the image files), depending upon color mode (RGB vs. rg) and comparison mode (central object vs. whole image).

Histogram Comparison

Histograms were compared by two methods. Histogram intersection yields a "goodness of match" value where 1.0 is a perfect match and 0.0 is the worst match possible. L2 Norms calculate a distance measure between histogram values where 0.0 is a perfect match and larger numbers are worse matches. In comparing a sample image, the histogram of the image is compared with the stored histograms and given a score. The histograms which recieve a score within a specified percent of the second best score (the image itself is the best score) are listed as the output of the program. Whem the program is working in central obect mode, weights for the importance of the central histogram comparision and the perimeter histogram comparison can be specified.


Results

The different methods were good for matching different types of pictures.

Required image: pic.0002.ppm

The results for each method will be shown first, follwed by a discussion.

RGB space using histogram intersection

1.0000000.6948550.6886320.6662630.6615780.6532810.6492710.6378270.6361020.628577

RGB space using L2 norm

0.0000000.0190800.0198080.0206340.0232950.0271270.027809

rg Space using histogram intersection

1.0000000.8874080.8767150.8711850.8664150.8613460.8591950.8555050.8520720.8487460.8421390.8356290.8329560.8320890.8213530.8106660.8017030.8010590.799899

rg Space using L2 norms

0.0000000.0044660.0049430.0052530.0061740.0061860.006653

Discussion of pic.0002.ppm

All of the methods gave fairly decent results. They all found at least one image of the blue-shirted professor lecturing. However, none of them returned pic.0003.ppm which was in the same room but didn't include the blue-shirted professor. It's my guess that the lack of blue in the image caused it not to be returned.
RGB space using histogram intersections found a lot of matches with similar "goodness of match" values. Looking at the images, a clear match of colors can be found between the first one and all the others, but most of them have to be regarded as garbage. When searching for pictures of a classroom, we don't want to find pictures of fire extinguishers, even if both images have a lot of reds, yellows, silvers, and whites. While it returned 100% of the images of the blue-shirted professor, it also returned 60% garbage.
RGB space using the L2 norm didn't fare as well as the histogram intersection method. It only found one picture of the professor (and it was the worst match of all) and returned 71% garbage. Interestingly only one of the garbage images was different, so it looks like L2 norm kept the "bad" matches while throwing out the "good."
rg Space using histogram intersection returned a whole mess of images with similar match values. However, it did the best of all the methods because it returned two of the good matches as the top two, and the other match as the fith match. It found all blue-shirted professors but also returned 79% garbage. However, since all the good matches were at the beginning of the list, this may be an indication that the threshold for matches should be smaller for rg space than for RGB space. With the threshold set just right, the number could be reduced to 33% garbage.
rg space using L2 norms didn't do as well as the histogram intersection. It found all but one of the good matches, but they were evenly spaced out in the results. 57% garbage was returned.

Another image: grass matching

Here, I selected an image of young grass in bright sunlight to compare how RGB and rg spaces match colors. Since in the previous exampl the histogram intersection method proved more useful than the L2 norm, I didn't bother compiling L2 norm results.

RGB

1.0000000.7263920.667783

rg

1.0000000.8026370.7453920.728497

Discussion of grass

The interesting thing to note here is that RGB returned pictures of grass all with the same illumination: bright sunlight with a color-rich exposure. rg, however, was able to find the last image which is also grass, but appears to be under different illumination. Neither method returned any real garbage (although the third rg image is on the edge), but it's also impossible to judge what images were missed--the database is full of summer pictures with grass on them.


Inside the Art gallery

The wood floor inside the art gallery provides a large degree of consitency between all photographs taken there.

RGB

1.0000000.7855040.7736020.7731140.7114650.6485320.644464

rg

1.0000000.8054810.7998500.7864290.7745910.7645970.7282260.7277740.7136380.7026520.653180

Discussion of the art gallery

As you can see, this worked amazingly well. I increased the matching threshold until I had a miss in rg, which is the last image on the right. RGB space matching didn't return as many images before the matching threshold was reached, most likely because it's sensitive to slight changes in illumination and camera exposure which don't affect rg space comparisons. With the lowered threshold (lowered to 20%), RGB returned no garbage images and found 7 of the 12 images taken in the gallery. rg returned 10 of the 12 gallery images before returning a garbage image, but it's hard to really call it garbage: the color of the wood table looks just like the floor of the gallery, and the cream-colored walls are very similar, as well. Neither method was able to find the three following images:



Discussion

The content-based image retrieval system here presents a simple method of finding similar images. Calculation of histograms is a simple and fast process, and the storage space required is minimal. For most types of pictures, the returned pictures are good matches for the submitted image. However, for images with a variety of colors, the results can be less helpful. For instance, the pictures of the blue-shirted professor matched pictures of chairs or fire extinquishers just as well as other pictures of the blue-shirted professor. To a human, a classroom is clearly not a dessert tray, but in RGB space histogram comparisons, they're very similar. Thus, histogram matching is only particularly useful when the image is dominated by a few colors.
Compared to object recognition, histogram-based matching is extremely simple. No complex feature vectors are required (a histogram may be a multi-valued array, but each value represents the same thing), and tranformations such as scales, translations, and rotations are completely ignored. Where object recognition can be a resource- and time-consuming process, histogram matching is exceedingly fast. For the database of 670 images, finding matches takes much less than a second, and I'll admit that my code isn't even particularly optimized. If it were running on a nice server with multiple processors and a RAID system, something very similar to this could be used as a search engine for images on the internet. The time consumed in comparisons only grows linearly with the number of images in the database, regardless of image size. Creating the histograms for the database, however, does take quite a while--about 5 minutes for this database. This can be done offline, though, so the user wouldn't see the time.


Extensions

As an extension to the lab, I experimented with histogram intersection and L2 norms in both RGB space and rg space and directly compared the results. I also implemented a "central object" comparison. For this, the image is divided into two regions: the center and the perimeter. The division is arbitrarily set so that the center is exactly 1/4 the dimensions of the image, but this can be changed. Separate histograms are stored in the the database. Comparisons are performed by attaching user-specified "weights" to the perimeter and center comparisons. The results are very encouraging, although not unexpected. The idea is that if you have a picture focused on something yellow, and you want to find more pictues focused on something yellow, all that needs to be compared is the color at the center of the image.

Here, the center of the object is weighted at 1.0, and the perimeter at 0.0:
1.0000000.5750490.5708500.5465330.5255370.496387
The second to last image shows up because the table color is close to the mask, and the book is close to the chair color.

Here, yellow objects are found:
1.0000000.8100590.8012700.7115720.689063


Here, objects with dark perimeters are found. Note that two of the matching images were even taken on the same table. This was done in RGB space because the black table would match anything gray in rg space.
1.0000000.8208950.8001010.7962530.7592580.7223370.717038



Possible Future Work

Another thing that could be useful would be to record the variance of position in image space of each of the buckets in the histogram. This way, an image with a green blob would not match an image with green speckles. This would require twice the storage space (basically another histogram for each image, except that the second histogram would contain variances rather than occurrences), but the storage space is small, so it wouldn't be much of a problem.