Lab 4: Real-time person detection

Casey Smith

For this lab, I implemented a method for detecting people in real time using a video camera attached to a computer. Since the camera is stationary on a table, my first step was to detect motion relative to the background. To find the background, I implemented a rotating buffer which records one image per second. If a pixel doesn't change significantly across the entire buffer, the stored background image is updated to include that pixel. A moving object is then anything that isn't the background. If a moving pixel is flesh-toned (in rg chromaticity) and is next to several other flesh-toned pixels, it is assumed to be part of a face. Faces are assumed not to lie one above another in the image, so once a potential face is located, the next 80 columns are skipped.

The Basic Problem

Looking at a picture and identifying faces is not as easy for a computer as it is for a person. The goal for this lab was to implement a process which could be executed at 30 frames per second, which makes the problem even harder. Characteristics must be found which reliably distinguish faces from any other object in the image. Fortunately, faces possess several such features. My solution made use of the fact that faces tend to move around and that their rg chromaticity is fairly distinctive, yet moderately consistent among different people. Below are example input images.


Methods

Motion Detection

My first step was to find a reliable way of detecting motion in the images. I accomplished this by dynamically finding a background image and checking each pixel in an input image against the background. The background was found by determining stable pixels in a buffer of recent images. The background and the images in the buffer were initialized to 0. The buffer was set up as an array. New images were added to the array every second or so. When the array was filled, new images were added starting at the beginning again, replacing the oldest stored image. Each time an image was added to the buffer, the variance of each pixel across the buffer was determined. If a pixel was stable across all the buffer images, it was assumed to be a valid background pixel, and the pixel in the background image was replaced. This allows the system to adjust to changes in lighting conditions and to the movement of typically stationary objects, such as doors and chairs.
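
The buffer-and-variance scheme described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used in the lab: it assumes grayscale frames stored as flat lists, and the class name `BackgroundModel`, the buffer size, the variance threshold `max_variance`, and the choice to copy the newest frame's value into the background are all assumptions made for the example.

```python
class BackgroundModel:
    """Rotating buffer of frames with a per-pixel stability test (sketch)."""

    def __init__(self, width, height, buffer_size=5, max_variance=4.0):
        self.num_pixels = width * height
        self.max_variance = max_variance  # assumed stability threshold
        # Buffer images and background both start at 0, as in the text.
        self.buffer = [[0] * self.num_pixels for _ in range(buffer_size)]
        self.background = [0] * self.num_pixels
        self.next_slot = 0

    def add_frame(self, frame):
        """Store one frame (flat list of intensities), then update the
        background wherever the pixel is stable across the whole buffer."""
        self.buffer[self.next_slot] = list(frame)
        # Wrap around so the oldest stored image is overwritten next time.
        self.next_slot = (self.next_slot + 1) % len(self.buffer)

        count = len(self.buffer)
        for i in range(self.num_pixels):
            values = [image[i] for image in self.buffer]
            mean = sum(values) / count
            variance = sum((v - mean) ** 2 for v in values) / count
            if variance <= self.max_variance:
                # Assumption: a stable pixel takes the newest frame's value.
                self.background[i] = frame[i]
```

In use, `add_frame` would be called about once per second, while the comparison against `background` runs on every frame. Because the buffer starts at 0, no pixel is accepted as background until the buffer has filled with consistent values.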

Motion was then determined by comparing each pixel in the input to the corresponding pixel in the stable background image. If the two pixels differed by more than some threshold, the image was assumed to contain a moving object at that location. This method has flaws. Moving objects are not always very different from the background color, and the background is not always stable: due to imperfections in the camera, areas of high gradient tend to flicker significantly. Below is an example of an image segmented into moving and non-moving parts.
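
The per-pixel comparison described above amounts to a thresholded absolute difference. The sketch below assumes grayscale intensities in flat lists; the function name `motion_mask` and the threshold value of 25 are illustrative choices, not values taken from the lab.

```python
def motion_mask(frame, background, threshold=25):
    """Return one boolean per pixel: True where the input differs from
    the background by more than the threshold (assumed value)."""
    return [abs(pixel - stable) > threshold
            for pixel, stable in zip(frame, background)]
```

A lower threshold catches objects that are close to the background color but also lets more of the camera flicker through as false motion.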


Finding People

People tend to move around, even when they're "sitting still." Our heads turn; our eyes blink; we get uncomfortable and reposition ourselves. Thus, the movement image found above is very helpful. If something's moving, it's fairly likely that it's a person, at least in an indoor scene like a computer lab. However, we must have a way to check for sure. Fortunately, people also tend to have skin, and skin tends to occupy a very specific location in rg chromaticity space, although ethnic variations can cause difficulties. Unfortunately, converting a normal image to a chromaticity image requires several operations per pixel (two adds and two multiplications), and the operations are dependent, making standard pipelining ineffective. The motion image above helps narrow down the search. I checked only pixels in motion for face tones. If a pixel qualified, a region down and to the right of it was checked for facial tones. If a significant portion of that area was flesh-toned, a positive identification was recorded. Faces are assumed not to lie above one another, so once a face is detected, the next 80 columns are ignored. To make this possible, the image is scanned in column order (column 1, then column 2, then column 3, etc.). This method prevents multiple matches for one face.
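
As a rough sketch of the search described above, the code below converts a pixel to rg chromaticity (r = R/(R+G+B), g = G/(R+G+B)), tests it against a skin range, and scans the image in column order with the 80-column skip. The skin ranges, the 20-pixel window, the 50% fill requirement, and all function names are assumptions for illustration; the text does not give the actual values.

```python
def rg_chromaticity(red, green, blue):
    """Normalize out intensity: r = R/(R+G+B), g = G/(R+G+B)."""
    total = red + green + blue
    if total == 0:
        return (0.0, 0.0)  # black pixel: chromaticity is undefined
    return (red / total, green / total)


def is_skin(pixel, r_range=(0.38, 0.55), g_range=(0.26, 0.36)):
    """Skin test with assumed, illustrative chromaticity bounds."""
    r, g = rg_chromaticity(*pixel)
    return r_range[0] <= r <= r_range[1] and g_range[0] <= g <= g_range[1]


def skin_fraction(image, x0, y0, window):
    """Fraction of skin pixels in the region down and to the right of
    (x0, y0), clipped to the image bounds."""
    height, width = len(image), len(image[0])
    skin = total = 0
    for y in range(y0, min(y0 + window, height)):
        for x in range(x0, min(x0 + window, width)):
            total += 1
            skin += is_skin(image[y][x])
    return skin / total


def find_faces(image, moving, window=20, min_fraction=0.5, skip=80):
    """Scan column by column. image[y][x] is an (R, G, B) tuple and
    moving[y][x] is a boolean. Returns the (x, y) of each detection."""
    height, width = len(image), len(image[0])
    faces = []
    x = 0
    while x < width:
        hit = None
        for y in range(height):
            if (moving[y][x] and is_skin(image[y][x])
                    and skin_fraction(image, x, y, window) >= min_fraction):
                hit = (x, y)
                break
        if hit is None:
            x += 1
        else:
            faces.append(hit)
            x += skip + 1  # ignore the next `skip` columns
    return faces
```

Since only moving pixels are converted to chromaticity, the cost of the conversion is paid on a small part of each frame, which is the narrowing-down role the motion image plays in the text.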

Evaluation

For most cases, this method works fairly well, and it performs at the full 30 frames per second. The adaptive background successfully adjusts to changes in the environment. Movements are reliably identified, and the chromaticity evaluation successfully finds most faces of varying tones. However, no attempt was made to distinguish face skin from hand and arm skin. In most cases, the 80-column skip will prevent multiple identifications of the same person. However, if someone sticks out his or her arms, or gets close to the camera and spans more than 80 columns, multiple identifications can be made for a single person.

Successful examples



Not-so-successful examples

Here, I span more than 80 columns, so I'm recognized as two people.



Due to imperfections in the camera, certain portions of the image are never stable. As mentioned earlier, areas of high gradient flicker. If, however, a person stands still in front of one of these unstable areas, the program can learn that he or she is the real background. This allows the program to treat the flickering edges as motion once the person leaves. (Until a stable background is found, no data is available for comparison to detect motion, so flickering areas are ignored, provided nothing stable takes their place for long enough.) Occasionally, the edges flicker through flesh tones and are identified as people.