Lab 3: Face Recognition by appearance

Casey Smith

For this lab, I implemented face recognition using appearance-based methods. Using a database of 75 pictures of faces and SVD, I generated eigenfaces. Each new face could then be represented as a point in face space whose coordinates are the dot products of the face with each eigenface (or with only the most significant few eigenfaces).

Image processing

Scaling

The face photographs were taken at different times under different conditions, so the faces varied considerably in size. Fortunately, all the pictures were taken against a bright wall. To scale the faces, I set separate brightness thresholds for the left side of the face, the top of the head, and the right side of the face. I detected the left edge of the face by examining a column of 20 pixels extending upward from the center of the image. This column started at the left edge of the image and was moved to the right until 3 of its 20 pixels were dark enough to be face pixels rather than background pixels. Requiring 3 dark pixels avoided false edges from noise and gave good results. The top of the head and the right side of the head were found in a similar fashion.

The pictures tend to be illuminated from the left, where the windows were, so I needed three different thresholds: Caleb's skin is very pale, making it impossible to use the same threshold for the left and right, and Nik's bald head is brighter than the shadows on the right side of people's faces. The scaled image was created at the same size as the original (200x135), with 5 pixels of padding to the left, right, and top.
[Figure: an original photograph and the resulting scaled image.]
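The left-edge search described above can be sketched roughly as follows. This is a minimal illustration, not the lab's actual code: the function name `find_left_edge`, the list-of-rows image layout, the threshold value of 128, and the synthetic test picture are all assumptions made for the example.

```python
def find_left_edge(image, threshold, column_height=20, min_dark=3):
    """Return the x index of the first column whose probe has enough dark pixels.

    image is a list of rows (top to bottom), each a list of 0-255 brightness
    values. The probe is a vertical strip of column_height pixels extending
    upward from the vertical center of the image.
    """
    height = len(image)
    width = len(image[0])
    center_row = height // 2
    top_row = max(0, center_row - column_height + 1)
    for x in range(width):
        dark = sum(
            1 for y in range(top_row, center_row + 1) if image[y][x] < threshold
        )
        if dark >= min_dark:
            return x
    return None  # no column looked like part of a face


# Synthetic picture (an assumption for illustration): bright background (250)
# with a dark block (60) whose left side is at column 4.
rows, cols = 40, 12
picture = [[250] * cols for _ in range(rows)]
for y in range(5, 35):
    for x in range(4, 9):
        picture[y][x] = 60
picture[15][1] = 40  # one noisy dark pixel; fewer than min_dark, so ignored

left_edge = find_left_edge(picture, threshold=128)
print(left_edge)
```

The same routine, run from the right edge inward or with a horizontal strip moved down from the top, would give the other two boundaries, each with its own threshold.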

Average Image

Next, the average image was calculated by summing all the images pixel by pixel (into an array of longs rather than the images' unsigned char arrays, which would overflow) and dividing by the number of images.
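As a small sketch of this step, assuming each image is stored as a flat list of pixel values (the function name `average_image` and the three-pixel toy images are hypothetical):

```python
def average_image(images):
    """Average equally sized images given as flat lists of 0-255 pixel values."""
    if not images:
        raise ValueError("need at least one image")
    pixel_count = len(images[0])
    # Python integers do not overflow; in C this accumulator would need a
    # wider type than unsigned char, as the text notes.
    totals = [0] * pixel_count
    for img in images:
        if len(img) != pixel_count:
            raise ValueError("all images must have the same size")
        for i, value in enumerate(img):
            totals[i] += value
    return [total / len(images) for total in totals]


faces = [[10, 200, 255], [30, 100, 255], [20, 0, 255]]
mean_face = average_image(faces)
print(mean_face)  # [20.0, 100.0, 255.0]
```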

Difference Images

When doing comparisons and building face space, I always used difference images, that is, the difference between an image and the average image. The difference images were scaled to have a length of 1. (The visualizations here rescale them to the range 0 to 255 instead of small negative to small positive values.)
[Figure: a scaled image and the resulting difference image.]
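A possible sketch of the difference-image step and of the rescaling used only for display. The helper names `difference_image` and `to_display_range` and the toy pixel values are assumptions; "length" is taken here to mean the Euclidean norm.

```python
import math


def difference_image(image, mean_image):
    """Subtract the mean image and scale the result to unit Euclidean length."""
    diff = [p - m for p, m in zip(image, mean_image)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        return diff  # the image equals the mean; nothing to normalize
    return [d / norm for d in diff]


def to_display_range(diff):
    """Linearly rescale values to 0..255, for visualization only."""
    lo, hi = min(diff), max(diff)
    if hi == lo:
        return [0 for _ in diff]
    return [round(255 * (d - lo) / (hi - lo)) for d in diff]


mean_face = [20.0, 100.0, 255.0]
diff = difference_image([23, 104, 255], mean_face)
shown = to_display_range(diff)
print(diff)
print(shown)
```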

Building Face Space

Face space was created by using SVD to find eigenfaces (or scaled versions of eigenfaces). The images in the database were loaded into the columns of a matrix, which was fed into the SVD routine to produce the eigenfaces. The singular values returned indicate the importance of each eigenface (each is the square root of the corresponding eigenvalue). Then, the most important eigenfaces were selected to represent face space. Usually, the sum of the eigenvalues corresponding to the eigenfaces used should be at least 90% of the total sum of the eigenvalues. However, for the database at hand, this would require 42 eigenfaces. Instead, I tried 15 and 20, and found that 20 represented faces much better.
[Figure: the top 6 eigenfaces.]
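The 90% rule of thumb can be illustrated with a short sketch. It assumes the singular values have already been returned by some SVD routine (the SVD itself is not shown), and the function name `eigenfaces_needed` and the example singular values are made up for illustration.

```python
def eigenfaces_needed(singular_values, fraction=0.90):
    """Smallest k whose top-k eigenvalues reach `fraction` of the eigenvalue sum.

    Eigenvalues are taken to be the squared singular values.
    """
    eigenvalues = sorted((s * s for s in singular_values), reverse=True)
    total = sum(eigenvalues)
    if total == 0:
        return 0
    running = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        running += value
        if running >= fraction * total:
            return k
    return len(eigenvalues)


# Hypothetical singular values; the eigenvalues are 16, 4, 1, 1 (sum 22).
example_singular_values = [4.0, 2.0, 1.0, 1.0]
k90 = eigenfaces_needed(example_singular_values)
print(k90)  # 2, since 16 + 4 = 20 >= 0.9 * 22
```

Using fewer eigenfaces than this rule suggests, as was done here with 15 and 20, trades reconstruction quality for a smaller representation.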

Projecting into Face Space

Once the eigenfaces have been recorded, any face can be projected into face space by recording the dot product of the image with each eigenface. Thus, if 20 eigenfaces are being used, projecting a picture into face space returns an array of 20 doubles. This projection is lossy. I can determine whether a picture can be represented in face space by projecting it into face space, reconstructing an image from the projection, and taking the sum of squared differences (SSD) between the original and the reconstruction. If the SSD is above some threshold, I can consider the picture a non-face (or at the very least a face that cannot be adequately represented by the eigenfaces). Recall that all image projection and processing is done on difference images.
[Figure: an original difference image and its projected difference image.]
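Projection, reconstruction, and the SSD check might look roughly like this. The sketch assumes orthonormal eigenfaces stored as flat lists; the function names and the toy 3-pixel "eigenfaces" are illustrative, not taken from the lab code.

```python
def project(diff_image, eigenfaces):
    """Coordinates in face space: one dot product per eigenface."""
    return [sum(p * e for p, e in zip(diff_image, face)) for face in eigenfaces]


def reconstruct(coords, eigenfaces):
    """Rebuild an image as the coordinate-weighted sum of the eigenfaces."""
    size = len(eigenfaces[0])
    return [
        sum(c * face[i] for c, face in zip(coords, eigenfaces))
        for i in range(size)
    ]


def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two orthonormal toy "eigenfaces" in a 3-pixel image space (assumed values).
eigenfaces = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
diff = [0.6, 0.0, 0.8]  # a unit-length difference image

coords = project(diff, eigenfaces)
approx = reconstruct(coords, eigenfaces)
error = ssd(diff, approx)
print(coords, approx, error)
```

In this toy case the third pixel cannot be expressed by either eigenface, so it is lost in the projection and shows up entirely in the SSD (about 0.64).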

Model Building

For each person in the database, there are three standard poses and some miscellaneous poses. The standard ones are forward, left 45 degrees, and right 45 degrees. For my models, I simply represented each person as three points in face space corresponding to the three standard poses.

Recognition

To recognize an image, I find the closest model point in face space. Since each person has three separate points in face space, I can label the picture with both a person and a pose (forward, left, or right). Beyond some threshold, a recognition should not be considered valid: if the distance between the input image and the best match is more than about 0.35, it shouldn't be considered a match. An example result is below. csmith.02 is a forward-facing image not used in the model creation. The results show that it strongly matches csmith facing forward, and the SSD between the projection and the original difference image is small enough that the image can be adequately represented in face space:
./identify namesList bases.dat models.dat /project//data/faces/pgm/csmith.02.pgm
SSD: 0.31699
Best Matches: 
         csmith facing f: 0.0294193
           evan facing f: 0.468535
         csmith facing l: 0.592199
          laura facing l: 0.677967
           evan facing l: 0.852638
         jordan facing l: 0.984172
         jordan facing f: 0.987545
           paul facing f: 1.01283
           paul facing l: 1.01858
           matt facing f: 1.02219
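The nearest-point lookup with a rejection threshold could be sketched as below. The use of Euclidean distance is an assumption (the report does not say which distance the tool computes), and the function name `recognize`, the dictionary layout, and the model coordinates are hypothetical.

```python
import math


def recognize(coords, models, max_distance=0.35):
    """Return (name, pose, distance) of the nearest model point, or None.

    models maps (name, pose) to a point in face space. None means the best
    match is farther away than max_distance and should be rejected.
    """
    best = None
    for (name, pose), point in models.items():
        distance = math.sqrt(sum((c - p) ** 2 for c, p in zip(coords, point)))
        if best is None or distance < best[2]:
            best = (name, pose, distance)
    if best is None or best[2] > max_distance:
        return None
    return best


# Hypothetical 2-D face-space points; real models would have 15 or 20 coordinates.
models = {
    ("csmith", "f"): [0.10, 0.20],
    ("csmith", "l"): [0.50, -0.30],
    ("evan", "f"): [-0.40, 0.10],
}

match = recognize([0.12, 0.18], models)
far = recognize([2.0, 2.0], models)
print(match)
print(far)
```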

Error Analysis

Below is the confusion matrix for recognizing the "02" images. These were secondary forward-facing pictures that were not used in model building. Some non-faces were also used. The overall accuracy (number correct / number attempted) for the system is 0.8667 (26/30).
brandoncharliecshetlandcsmithdaveeliericevanjanejbulnesjessejordankuanlauraluismattmaxwellnikohsupaulrbergerstephaniesuortgillettexzhuonew facenon face
brandon100000000000000000000000000
charlie010000000000000000000000000
cshetland001000000000000000000000000
csmith000100000000000000000000000
dave000010000000000000000000000
eli000001000000000000000000000
eric000000100000000000000000000
evan000000010000000000000000000
jane000000001000000000000000000
jbulnes000000000000000000000000001
jesse000000000010000000000000000
jordan000000000001000000000000000
kuan000000000000100000000000000
laura000000000000010000000000000
luis000000000000001000000000000
matt000000000000000100000000000
maxwell000000000000000010000000000
nik000000000000000001000000000
ohsu000000000000000000100000000
paul000000000000000000010000000
rberger000000000000000000001000000
stephanie000000000000000000000100000
suor000000000000000000000000001
tgillette000000000000000000000001000
xzhuo000000000000000000000000100
non face000000000000000000000000023
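The accuracy figure can be recomputed from the nonzero entries of the matrix. The sketch below stores only those entries, keyed by (actual, assigned) label; reading the rows as actual and the columns as assigned labels is an interpretation of the table, and the variable names are chosen for illustration.

```python
from collections import Counter

# People whose "02" image was assigned their own label (the diagonal entries).
correct_people = [
    "brandon", "charlie", "cshetland", "csmith", "dave", "eli", "eric", "evan",
    "jane", "jesse", "jordan", "kuan", "laura", "luis", "matt", "maxwell",
    "nik", "ohsu", "paul", "rberger", "stephanie", "tgillette", "xzhuo",
]

# Sparse confusion matrix: (actual, assigned) -> count.
confusion = Counter({(name, name): 1 for name in correct_people})
confusion[("jbulnes", "non face")] += 1   # a face rejected as a non-face
confusion[("suor", "non face")] += 1      # a face rejected as a non-face
confusion[("non face", "new face")] += 2  # non-faces accepted as unknown faces
confusion[("non face", "non face")] += 3  # non-faces correctly rejected

attempted = sum(confusion.values())
correct = sum(n for (actual, assigned), n in confusion.items() if actual == assigned)
accuracy = correct / attempted
print(correct, attempted, round(accuracy, 4))  # 26 30 0.8667
```

All four errors involve the non-face decision rather than confusion between two known people.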

Extensions

For extensions, I attempted to reject non-face images and to create novel poses. Theoretically, if face space were well filled, interpolating between two poses should generate a smooth transition between the two. However, our data set is far too small to make this possible. The image below doesn't look like a smooth transition--it looks like a blending of the two poses.
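The pose interpolation can be sketched as a straight-line blend between two points in face space. This assumes simple linear interpolation, which the report does not state explicitly; the function name `interpolate` and the 2-D coordinates are hypothetical.

```python
def interpolate(point_a, point_b, t):
    """Point a fraction t of the way from point_a to point_b (0 <= t <= 1)."""
    return [(1 - t) * a + t * b for a, b in zip(point_a, point_b)]


# Hypothetical face-space coordinates for one person's forward and left poses.
forward = [0.2, -0.1]
left = [0.6, 0.3]

steps = [interpolate(forward, left, t / 4) for t in range(5)]
for point in steps:
    print(point)
```

Because reconstruction from face-space coordinates is a weighted sum of eigenfaces, a linear blend of coordinates reconstructs to the same linear blend of the two projected images, which is consistent with the result looking like a cross-fade rather than a rotation.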