Thresholding and Object Recognition


Abstract

For this lab, I created a program which reads in an image and identifies objects in the image. It expects a uniform background of higher intensity than the objects. First, it determines the optimal threshold for dividing the image and uses that threshold to extract regions. After rotating each object so that its major axis is aligned vertically, it calculates a feature set of the height to width ratio, percent of the bounding box filled, and relative size. Depending on command line options, it then either prints that information to a database file or uses the previously written database file to recognize the objects.

Thresholding Algorithm

In order to determine the optimal threshold automatically, I used a modified version of Algorithm 5.2 in Image Processing, Analysis, and Machine Vision by Sonka et al. The algorithm starts with the threshold set to 128 and divides the image into background and object sets. The mean of each set is then calculated, and the threshold is set to the average of those means. This is repeated until the threshold stops moving. However, I found that weighting the background and object intensities evenly led to inadequate segmentation: bits of the objects showed up as background, leaving unconnected pieces. To fix this, I weighted the algorithm to find a threshold closer to the background intensity. Instead of averaging the object and background means at each step of the iteration, I summed the object mean with 1.6 times the background mean and divided by 2.6. This allowed the program to pick an appropriate threshold for each image (and they were all different).
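The weighted iteration described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the function name, the flat list of pixel intensities, the 0.5 convergence tolerance, and the iteration cap are all assumptions made for the example.

```python
def weighted_iterative_threshold(pixels, bg_weight=1.6, start=128.0, max_iter=100):
    """Iteratively pick a threshold biased toward the brighter background mean.

    pixels: flat iterable of grey-level intensities (assumed layout).
    """
    pixels = list(pixels)
    threshold = float(start)
    for _ in range(max_iter):
        # The background is expected to be brighter than the objects.
        background = [p for p in pixels if p > threshold]
        objects = [p for p in pixels if p <= threshold]
        if not background or not objects:
            break  # one class is empty, so keep the current threshold
        bg_mean = sum(background) / len(background)
        obj_mean = sum(objects) / len(objects)
        # Weighted average: (object mean + 1.6 * background mean) / 2.6
        new_threshold = (obj_mean + bg_weight * bg_mean) / (1.0 + bg_weight)
        if abs(new_threshold - threshold) < 0.5:  # assumed stopping tolerance
            return new_threshold
        threshold = new_threshold
    return threshold
```

With bg_weight set to 1.0 this reduces to the unweighted averaging step; larger weights push the threshold toward the background intensity, so fewer dark-ish object pixels are misclassified as background.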

Segmentation Algorithm

After thresholding, I do some additional image processing. First, I use a median filter to remove noise. Then I grow the object regions using 8-connectivity, shrink them using 4-connectivity, and grow them again using 8-connectivity. This serves to reconnect objects such as the pliers, which may have been split apart by narrow specularities.
I did not write my own segmentation algorithm but rather used the one provided on the lab page. It's a standard two-pass segmentation algorithm. I told it to ignore regions smaller than 350 pixels. I used the region IDs, bounding box information, and centroid information to generate this image.
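The grow, shrink, grow sequence described above can be sketched on a small binary image. This sketch assumes that "grow in 8" means growing into all eight neighbors and "shrink in 4" means shrinking against the four edge neighbors; the function names, the list-of-lists image layout (1 = object), and the choice to ignore neighbors that fall outside the image are also assumptions.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _neighbors(img, r, c, offsets):
    """Values of the in-bounds neighbors of pixel (r, c)."""
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc] for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def grow(img, offsets):
    """A pixel becomes object if it or any neighbor is object."""
    return [[1 if img[r][c] or any(_neighbors(img, r, c, offsets)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def shrink(img, offsets):
    """A pixel stays object only if all its in-bounds neighbors are object."""
    return [[1 if img[r][c] and all(_neighbors(img, r, c, offsets)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def close_gaps(img):
    """Grow in 8, shrink in 4, grow in 8 again."""
    return grow(shrink(grow(img, N8), N4), N8)
```

Note that the sequence grows more than it shrinks, so the regions end up slightly larger than they started; the point is that a one-pixel gap between two pieces is filled and stays filled.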

Feature Extraction

My three features for each object were the ratio of height to width, percent of the bounding box filled, and relative size. In order to make these numbers meaningful, I first rotated each object so that its major axis was aligned vertically. For objects which didn't have a square bounding box, I found the angle of rotation by using the central moments. For objects with square bounding boxes, this angle of rotation was fairly meaningless, so I performed an exhaustive search of rotation angles, finding the angle that maximized the percent filled of the bounding box. If at some point in the exhaustive search the bounding box was no longer square, I exited the search and returned the angle as determined by the moments (this happens when long, thin objects are at 45 degree angles in the original image).
In order to perform the rotations, I allocated a square image whose side is the square root of two times the larger of the height and the width, which makes sure the rotated object does not fall outside the new image. In doing the rotation, I placed the centroid of the object at the center of the image.
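For reference, the standard way to get a major-axis angle from second-order central moments, together with the canvas size described above, looks roughly like this. It is a sketch under assumptions: the function names and the list of (x, y) pixel coordinates are hypothetical, and the sign of the angle depends on whether y increases upward or downward in the image.

```python
import math


def major_axis_angle(pixels):
    """Angle of the major axis, in radians from the x axis.

    pixels: list of (x, y) coordinates belonging to one object.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments about the centroid.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def rotation_canvas_side(height, width):
    """Side of a square canvas that holds the bounding box at any rotation."""
    return math.ceil(math.sqrt(2.0) * max(height, width))
```

Rotating the object by a quarter turn minus this angle brings the major axis to vertical. When mu20 and mu02 are nearly equal and mu11 is near zero, the angle is poorly defined, which matches the observation that it was fairly meaningless for square-looking objects.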
After that, I extracted the bounding box of the rotated object using the two-pass segmentation function and determined the ratio of height to width and the percentage of the bounding box filled. In order to determine the relative size, I used the size information from the original extraction, multiplying the quotient of the size of the object and the size of all the objects combined by the number of objects. This gives the object size as a multiple of the average object size.
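The three features reduce to a few lines of arithmetic. The sketch below assumes the bounding box height and width and the pixel counts are already known; the function names are hypothetical.

```python
def shape_features(height, width, area):
    """Return (height-to-width ratio, fraction of the bounding box filled)."""
    return height / width, area / (height * width)


def relative_sizes(areas):
    """Each object's size divided by the total size, times the object count.

    This equals the size divided by the average size, so the values
    average to 1 over the image.
    """
    total = sum(areas)
    count = len(areas)
    return [area / total * count for area in areas]
```

For example, three objects of 100, 200, and 300 pixels get relative sizes of 0.5, 1.0, and 1.5.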

Object Models

In order to create object models, the program is run with the "-g" (generate) flag on the command line. It then reads in the specified image and records the feature set for each object in a file.

Matching Objects

To identify objects in another image based on the model image, the program is run with the "-c" (compare) command line option. The program then creates a feature set for each object, reads in the feature sets from the database file stored by running the program with the "-g" option, and calculates a similarity value between each object in the new image and each object in the database.
In order to determine how similar a new object is to an object in the database, I used a modified scaled Euclidean distance. For the ratio of height to width and the percent filled values, I used a normal Euclidean distance scaled by the variance of each feature. In determining the variances, I threw out one outlier from each data set. This makes sure that values like the 6.9 ratio of the pen don't make the variances of the other data points seem larger than they are (all other values are between 1.0 and 2.3).
Relative size is a different kind of number, however. Because relative size is completely dependent upon what other objects are in the image, it's really only useful as a rough guide and should only affect the comparison when the relative sizes are very different. To achieve this, I squared the relative sizes and took the ratio of the two squared values, ordered so that the ratio is at least one. This ratio is fairly close to one for similar relative sizes, but as the match gets worse, the ratio grows even faster. This prevents the clip from being identified as, say, the envelope (and it can look like a small version of the envelope when the metal has a reflection and doesn't show up after thresholding). Then, for a final distance measure, I sum five times the ratio distance, seven times the percent filled distance, and 0.5 times the ratio of the squared relative sizes. These weightings were determined experimentally.
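One way to read the distance measure described above is sketched here. Several details are assumptions rather than facts from the program: that the per-feature distance is the squared difference divided by the variance, that the discarded outlier is the value farthest from the mean, that the variance is the population variance, and all function names.

```python
def trimmed_variance(values):
    """Population variance after dropping the value farthest from the mean."""
    mean = sum(values) / len(values)
    outlier = max(values, key=lambda v: abs(v - mean))
    kept = list(values)
    kept.remove(outlier)  # drop a single outlier (assumed selection rule)
    kept_mean = sum(kept) / len(kept)
    return sum((v - kept_mean) ** 2 for v in kept) / len(kept)


def match_distance(a, b, var_ratio, var_filled):
    """Weighted distance between two (ratio, filled, rel_size) feature tuples."""
    d_ratio = (a[0] - b[0]) ** 2 / var_ratio
    d_filled = (a[1] - b[1]) ** 2 / var_filled
    # Ratio of squared relative sizes, ordered so it is at least 1.
    big, small = max(a[2], b[2]), min(a[2], b[2])
    size_ratio = (big / small) ** 2
    return 5.0 * d_ratio + 7.0 * d_filled + 0.5 * size_ratio
```

Under this reading, two identical feature sets have a distance of 0.5 rather than 0, because the size ratio term never drops below one.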
The program then spits out a lot of information about what it saw and what it thought was what. At the bottom is the output for the image shown above. After listing the "distances" between the database objects and a region, it says what it thinks the region matches, and gives an (x,y) location of the centroid of the region in the original image. If nothing is similar enough (the empirically determined cutoff is 35), it says that it doesn't know what the object is.
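The final decision step, picking the closest database object and rejecting it if the distance exceeds the cutoff, can be sketched as below. The function name is hypothetical, and whether a distance exactly equal to 35 counts as a match is an assumption. The example distances are the ones listed for region 0 in the sample output at the end of this report.

```python
def best_match(distances, cutoff=35.0):
    """Return the name with the smallest distance, or None if it is too far."""
    name = min(distances, key=distances.get)
    if distances[name] > cutoff:
        return None  # nothing in the database is similar enough
    return name


region0 = {
    "Disk Case": 107.206, "Funnel": 17.7979, "Velvet": 40.0492,
    "Orange Disk": 81.7346, "Envelope": 155.217, "Clip": 80.5603,
    "Black Disk": 62.3292, "Stop Sign": 33.7673, "Compaq thingy": 63.1157,
    "Pliers": 1.00484, "Pen": 720.976,
}
print(best_match(region0))  # Pliers
```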

Results

The system worked remarkably well. Since color was not used as a feature, the orange and black disks were often confused, but every other object was correctly identified. In one case, the orange disk was mistaken for the disk case. However, in this image, there is no "shine" on the metal of the disk, so the binarized image is just a square. Looking at the binarized image, I can't tell the difference between the two, so I don't see how I could expect my program to do so.

Confusion Matrix

This matrix shows which objects were identified as which. Note that the only off-diagonal values come from confusing the two disks (and the one confusion of the orange disk with the disk case noted above). Actual objects are on the vertical axis and identifications are on the horizontal axis.
               Disk Case  Funnel  Velvet  Pen  Envelope  Clip  Black Disk  Orange Disk  Stop Sign  Compaq Thingy  Pliers
Disk Case              7       0       0    0         0     0           0            0          0              0       0
Funnel                 0       3       0    0         0     0           0            0          0              0       0
Velvet                 0       0       4    0         0     0           0            0          0              0       0
Pen                    0       0       0    4         0     0           0            0          0              0       0
Envelope               0       0       0    0         5     0           0            0          0              0       0
Clip                   0       0       0    0         0     4           0            0          0              0       0
Black Disk             0       0       0    0         0     0           2            2          0              0       0
Orange Disk            1       0       0    0         0     0           3            3          0              0       0
Stop Sign              0       0       0    0         0     0           0            0          5              0       0
Compaq Thingy          0       0       0    0         0     0           0            0          0              6       0
Pliers                 0       0       0    0         0     0           0            0          0              0       4

Hit Rate

This table shows the fraction correctly identified for each object. Note that the low rates for the disks are due to confusion between the two.
Object         Correctness rate
Disk Case      1.0
Funnel         1.0
Velvet         1.0
Pen            1.0
Envelope       1.0
Clip           1.0
Black Disk     0.5
Orange Disk    0.429
Stop Sign      1.0
Compaq Thingy  1.0
Pliers         1.0
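
The rates in this table follow directly from the confusion matrix: each is the diagonal entry divided by the row total. A short sketch of that arithmetic, using the counts from the matrix above (the variable and function names are only for illustration):

```python
LABELS = ["Disk Case", "Funnel", "Velvet", "Pen", "Envelope", "Clip",
          "Black Disk", "Orange Disk", "Stop Sign", "Compaq Thingy", "Pliers"]

# Rows are actual objects, columns are identifications, in LABELS order.
CONFUSION = [
    [7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
]


def hit_rates(labels, matrix):
    """Diagonal count divided by the row total, for each actual object."""
    return {name: row[i] / sum(row)
            for i, (name, row) in enumerate(zip(labels, matrix))}


rates = hit_rates(LABELS, CONFUSION)
print(round(rates["Orange Disk"], 3))  # 0.429
```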

Extensions

I completed two extensions. The first was automatic threshold determination, which I described above in the "Thresholding Algorithm" section. The second was to correctly identify all the objects except the disks without using color. I didn't use color, and the hit rate table shows my results.

Sample output for the image shown above

Examining region 0
Examining region 1
Examining region 2
Examining region 3
Examining region 4
Examining region 5
Examining region 6
Examining region 7
Examining region 8
Examining region 9
Examining region 10
           ratio  filled  relSize
Region  0: 1.593, 0.3034, 0.0758
Region  1: 7.385, 0.9183, 0.05184
Region  2: 2.141, 0.8897, 7.148
Region  3: 2.474, 0.4967, 0.5966
Region  4: 1.526, 0.897, 0.1481
Region  5: 1.127, 0.9443, 0.6663
Region  6: 1.037, 0.7831, 0.01229
Region  7: 1.3, 0.5732, 4.393
Region  8: 1.104, 0.8522, 0.6647
Region  9: 1.973, 0.5374, 1.192
Region 10: 1.021, 0.9699, 3.014
Variances: r=0.221194, pf=0.036244, rs=1.020198
Distances for Region 0:
   Disk Case: 107.206
   Funnel: 17.7979
   Velvet: 40.0492
   Orange Disk: 81.7346
   Envelope: 155.217
   Clip: 80.5603
   Black Disk: 62.3292
   Stop Sign: 33.7673
   Compaq thingy: 63.1157
   Pliers: 1.00484
   Pen: 720.976
I think that region 0 is the Pliers. (188, 46)
Distances for Region 1:
   Disk Case: 912.562
   Funnel: 750.942
   Velvet: 890.697
   Orange Disk: 890.478
   Envelope: 718.893
   Clip: 825.893
   Black Disk: 926.004
   Stop Sign: 609.3
   Compaq thingy: 800.834
   Pliers: 833.369
   Pen: 5.53649
I think that region 1 is the Pen. (301, 29)
Distances for Region 2:
   Disk Case: 30.1203
   Funnel: 29.8678
   Velvet: 34.5851
   Orange Disk: 29.1409
   Envelope: 1.61258
   Clip: 385.737
   Black Disk: 35.1189
   Stop Sign: 24.1402
   Compaq thingy: 28.3606
   Pliers: 97.5102
   Pen: 554.378
I think that region 2 is the Envelope. (74, 128)
Distances for Region 3:
   Disk Case: 92.835
   Funnel: 13.1157
   Velvet: 35.6113
   Orange Disk: 75.5848
   Envelope: 52.1141
   Clip: 95.4227
   Black Disk: 68.1402
   Stop Sign: 2.30198
   Compaq thingy: 51.9084
   Pliers: 26.9546
   Pen: 488.614
I think that region 3 is the Stop Sign. (380, 113)
Distances for Region 4:
   Disk Case: 10.9554
   Funnel: 26.5809
   Velvet: 31.0373
   Orange Disk: 5.71718
   Envelope: 42.7372
   Clip: 8.57802
   Black Disk: 9.55048
   Stop Sign: 40.0093
   Compaq thingy: 1.04288
   Pliers: 69.5012
   Pen: 662.737
I think that region 4 is the Compaq thingy. (240, 113)
Distances for Region 5:
   Disk Case: 1.5638
   Funnel: 38.8804
   Velvet: 27.7009
   Orange Disk: 0.682926
   Envelope: 30.5356
   Clip: 35.7895
   Black Disk: 4.08735
   Stop Sign: 61.8157
   Compaq thingy: 5.01564
   Pliers: 87.3531
   Pen: 766.436
I think that region 5 is the Orange Disk. (158, 151)
Distances for Region 6:
   Disk Case: 64.5772
   Funnel: 53.546
   Velvet: 147.184
   Orange Disk: 28.5934
   Envelope: 437.884
   Clip: 6.89704
   Black Disk: 27.6741
   Stop Sign: 94.3718
   Compaq thingy: 13.4977
   Pliers: 58.2943
   Pen: 795.114
I think that region 6 is the Clip. (226, 197)
Distances for Region 7:
   Disk Case: 36.952
   Funnel: 7.45659
   Velvet: 0.690492
   Orange Disk: 27.1453
   Envelope: 45.6328
   Clip: 252.56
   Black Disk: 16.278
   Stop Sign: 26.5433
   Compaq thingy: 27.64
   Pliers: 30.8433
   Pen: 763.228
I think that region 7 is the Velvet. (347, 251)
Distances for Region 8:
   Disk Case: 4.95547
   Funnel: 27.2488
   Velvet: 16.8616
   Orange Disk: 1.38111
   Envelope: 33.7409
   Clip: 36.9956
   Black Disk: 0.977556
   Stop Sign: 51.2766
   Compaq thingy: 4.21215
   Pliers: 66.5266
   Pen: 773.288
I think that region 8 is the Black Disk. (109, 262)
Distances for Region 9:
   Disk Case: 58.8859
   Funnel: 1.93019
   Velvet: 12.1545
   Orange Disk: 44.9658
   Envelope: 39.0446
   Clip: 100.118
   Black Disk: 36.4036
   Stop Sign: 3.80245
   Compaq thingy: 30.1514
   Pliers: 18.07
   Pen: 591.85
I think that region 9 is the Funnel. (196, 287)
Distances for Region 10:
   Disk Case: 1.29358
   Funnel: 47.4468
   Velvet: 30.4431
   Orange Disk: 3.25482
   Envelope: 29.7935
   Clip: 159.362
   Black Disk: 6.94174
   Stop Sign: 72.2025
   Compaq thingy: 13.3241
   Pliers: 104.07
   Pen: 806.05
I think that region 10 is the Disk Case. (298, 384)