In much the same way, computer vision is extremely valuable to any machine which is designed to act on its own and needs to adjust to a changing environment. As the Machine Vision and Image Analysis Laboratory at McMaster University says on its homepage, "Computer vision is an emerging new field with applications in automated manufacturing, autonomous vehicle guidance, radar signal processing, and robot guidance." In all of these applications, sight is very important to the machines. It is critical for such machines to know where they are going, and what is in the world around them.
One of the most useful aspects of vision is depth perception. In order for machines to guide themselves correctly, they need to know how far away things are. Humans have exceptional depth perception, using both monocular and binocular depth cues to determine the distance to objects in the world around them. The technology has not yet been developed for computers to analyze depth as well as humans can. Eventually, however, it may be possible for machines to detect depth as well as or better than humans can.
One of the most common methods the human brain uses to detect depth is binocular disparity. It relies on the fact that humans have two eyes, each of which sees a slightly different view of the world. The principle states that the amount of difference between the views of the two eyes varies inversely with the distance to the object a person is looking at. In other words, as objects get farther and farther away from a person’s eyes, the difference between what the two eyes see becomes smaller and smaller. The human brain uses this idea to determine depth by comparing what each eye sees. If the two eyes see very different views of an object, the brain knows that the object is very close. If, however, the two eyes see almost the same thing, the brain knows that the object is very far away.
Binocular disparity is not enough for humans to detect all depth, however. As was stated earlier, the brain uses both monocular and binocular depth cues to determine the distances to objects. One of the major problems with binocular disparity is that objects can cover each other up. Because each eye has a different view of a scene, certain points can be seen with one eye, but not the other. This results in points in the visual array of one eye with no corresponding points in the visual array of the other eye. It is impossible to determine the distance to these points using binocular disparity because only one eye sees them. In cases such as this, monocular depth cues must be used to finish where binocular cues left off.
One such monocular cue is called projective distortion. This effect can be seen in two ways: "(1) as a surface recedes from the viewer its markings become smaller… and (2) as a surface is inclined…, its markings appear compressed in the direction of inclination" (Brady, p. 19). Projective distortion gives the human brain a general idea of the shapes of the objects in the visual field. It can use this information to make educated guesses about the shape of the objects that it sees so that it can fill in the blanks left by binocular disparity.
Another aspect of human vision which must be explored is the ability of humans to determine where one object ends and another begins. One of the ways that humans accomplish this is by finding "luminance-defined edges" (Liu, p. 66) within the visual field. A computer can also be programmed to find such edges. Using the brightness values recorded in the image, an algorithm can be written which could "segment the image, based on the difference of average brightnesses in adjacent regions" (Brady, p. 142). Such an algorithm would allow a computer to define the edges of objects.
The objective of this project was to write a computer program which uses the principle of binocular disparity, in conjunction with the other aspects of human vision described above, to determine the distance to objects. This type of program is crucial to the field of machine vision. If autonomous machines that can adjust to a changing environment are to be created, the ability to detect depth is an extremely valuable tool. The intent of this project was to bring the ability of computers closer to the ultimate goal of machines being able to see their environment as well as or better than humans.
After the research portion of the project was completed, the next step was to design a C++ program which would determine the depth to simulated objects, which were simply two rectangles on two separate random dot backgrounds. The rectangles were placed so as to simulate the binocular effects of images taken stereoscopically. The program then output the disparity in pixels between the locations of the two simulated objects on the random dot fields.
Once this was completed, the next step was to take stereoscopic pictures which would test the program as it was being changed to accept actual pictures. The pictures that were most helpful for this stage were ones that very much resembled the simulated pictures. These pictures were taken of a black square of paper taped to a white wall. The pictures were taken with a black and white digital camera which was mounted on a slider with the camera pointed perpendicular to the slider. The slider was then placed parallel to the wall a measured distance away. Once the square was placed in view of the camera, a picture was taken. The camera was then slid along the slider four centimeters to the left, and another picture was taken.
These pictures are examples of stereoscopic pictures of a black square taped to a white wall. These are two 4 cm. squares, and the pictures were taken at a distance of 4 ft.
Once the pictures were taken, they had to be converted from the TIFF file format to the Targa file format. This was done using PhotoFinish. The computer program then had to be altered so that it could read in the Targa files. This was done using object-oriented programming techniques to create a specific class for the Targa images. Once this was completed, the program needed to be further altered to accommodate the differences between real and simulated images.
One of the major steps in this process was creating a procedure which could determine the edges of images in the visual array. The method which was used to do this looked for sharp contrasts between pixels in the image. If a pixel was next to a pixel of a sharply different brightness, both pixels were considered to be on the edge of an object.
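The edge-finding rule described above can be sketched in C++ roughly as follows. This is a minimal illustration and not the project's actual code: the `GrayImage` struct, the `findEdgePixels` function, and the `threshold` parameter are hypothetical names, and comparing each pixel only with its right and lower neighbors is an assumption about what "next to" means.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical grayscale image: row-major brightness values, 0 (black) to 255 (white).
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // size must equal width * height

    int at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                      static_cast<std::size_t>(x)];
    }
};

// Flags both pixels of every horizontally or vertically adjacent pair whose
// brightness differs by more than `threshold` (an assumed tuning parameter).
std::vector<bool> findEdgePixels(const GrayImage& img, int threshold) {
    const std::size_t w = static_cast<std::size_t>(img.width);
    const std::size_t h = static_cast<std::size_t>(img.height);
    assert(img.pixels.size() == w * h);

    std::vector<bool> isEdge(img.pixels.size(), false);
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            const std::size_t here = static_cast<std::size_t>(y) * w + static_cast<std::size_t>(x);
            // Compare with the neighbor to the right.
            if (x + 1 < img.width && std::abs(img.at(x, y) - img.at(x + 1, y)) > threshold) {
                isEdge[here] = true;
                isEdge[here + 1] = true;
            }
            // Compare with the neighbor below.
            if (y + 1 < img.height && std::abs(img.at(x, y) - img.at(x, y + 1)) > threshold) {
                isEdge[here] = true;
                isEdge[here + w] = true;
            }
        }
    }
    return isEdge;
}
```

For a black square on a white wall, a fairly large threshold should flag only the outline of the square, while a threshold that is too small would also flag camera noise on the wall.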
The next step was to determine the disparity between the location of the object in one image and its location in the other. Because it was known that the images contained only one object, which was at a uniform distance from the camera, there was no need to find different distances to different parts of the image. It was therefore sufficient to take the average x-coordinate of all of the edge points in each image. The difference between the two average x-coordinates was equal to the disparity, in pixels, in the location of the object between the two images.
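The averaging step can be illustrated with a short C++ sketch. It assumes each image's edge points are stored as a row-major mask of flags, one per pixel; `meanEdgeX` and `disparityInPixels` are hypothetical names, not functions from the project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Average x-coordinate of all pixels flagged as edges in a row-major mask.
// `width` is the image width in pixels; at least one edge pixel must be present.
double meanEdgeX(const std::vector<bool>& isEdge, int width) {
    assert(width > 0);
    const std::size_t w = static_cast<std::size_t>(width);
    double sumX = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < isEdge.size(); ++i) {
        if (isEdge[i]) {
            sumX += static_cast<double>(i % w);  // column index of pixel i
            ++count;
        }
    }
    assert(count > 0);  // a single visible object is assumed, as in the text
    return sumX / static_cast<double>(count);
}

// Disparity in pixels: how far the object's mean edge position shifts
// between the two images.
double disparityInPixels(const std::vector<bool>& firstEdges,
                         const std::vector<bool>& secondEdges, int width) {
    return std::fabs(meanEdgeX(firstEdges, width) - meanEdgeX(secondEdges, width));
}
```

Because the result is an average over many edge points, it can be a fractional number of pixels even though each individual edge point sits on a whole pixel.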
Once the disparity in pixels was determined, it was then necessary to find the formula for converting that disparity in pixels to the distance from the camera to the object in inches. To do this, the known distances to the objects in the pictures were compared to the disparity output by the computer. Because it was known that the disparity varied inversely with the distance to the object, it was simply a matter of determining what constant to multiply the equation by. This was determined by plotting the distance versus disparity graph and using a linear regression to find the constant.
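One way such a constant could be obtained is sketched below. The sketch assumes the relationship distance = k / disparity and fits k by least squares as a straight line through the origin when distance is plotted against 1 / disparity. That is one plausible reading of the regression described above, not necessarily the exact procedure used in the project, and the names `CalibrationSample`, `fitDisparityConstant`, and `distanceFromDisparity` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One calibration measurement: a known distance and the disparity the
// program reported for it.
struct CalibrationSample {
    double distanceInches;
    double disparityPixels;
};

// Least-squares estimate of k in distance = k / disparity, treating the data
// as a line through the origin in the variable u = 1 / disparity:
//   k = sum(distance * u) / sum(u * u)
double fitDisparityConstant(const std::vector<CalibrationSample>& samples) {
    double numerator = 0.0;
    double denominator = 0.0;
    for (const CalibrationSample& s : samples) {
        assert(s.disparityPixels > 0.0);
        const double u = 1.0 / s.disparityPixels;
        numerator += s.distanceInches * u;
        denominator += u * u;
    }
    assert(denominator > 0.0);  // at least one sample is required
    return numerator / denominator;
}

// Converts a measured disparity to a distance using the fitted constant.
double distanceFromDisparity(double k, double disparityPixels) {
    assert(disparityPixels > 0.0);
    return k / disparityPixels;
}
```

Fitting against 1 / disparity keeps the regression linear even though distance and disparity themselves are inversely related.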
Another important aspect of this project was calculating the amount of error involved in determining the distance to an object. Because the size of the pixels affects the amount of accuracy that is possible in a program such as this, it is important to calculate how precise an answer is possible. This was done using the constant determined earlier, and by performing calculations which resulted in the distance represented by each pixel at each given distance from the camera. Because an object cannot be shifted over by a fraction of a pixel, this number represents the smallest amount of error possible in this program at any given distance.
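The kind of calculation described above can be sketched as follows, assuming the relationship distance = k / disparity with k the constant found earlier. The function name `depthStepPerPixel` is hypothetical, and the sketch measures the step toward the camera, that is, the change in reported distance when the disparity grows by one pixel.

```cpp
#include <cassert>

// Change in reported distance caused by a one-pixel increase in disparity,
// for an object at `distanceInches`, assuming distance = k / disparity.
double depthStepPerPixel(double k, double distanceInches) {
    assert(k > 0.0 && distanceInches > 0.0);
    const double disparity = k / distanceInches;   // disparity at this distance
    const double nearer = k / (disparity + 1.0);   // distance if disparity were one pixel larger
    return distanceInches - nearer;
}
```

Algebraically this step equals d * d / (k + d) for an object at distance d, so it grows roughly with the square of the distance and shrinks as the constant k becomes larger.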
There are, however, many more steps that need to be taken before the full potential of this project is realized. This project only has the capability to determine the depth to an object if it is the only object in an image. The program also does not include very sophisticated algorithms for determining the edges of objects. Only rudimentary edge-locating techniques were utilized in this project, and steps can be taken to further enhance the ability of this project to determine where one object ends and another begins. This project only deals with the simplest of situations: one object of uniform distance from the camera. A program still needs to be developed which can distinguish between objects in the image, and determine the distances to each one of them.
The error analysis is an interesting part of this project. Some error is inherent in the program because an image does not contain an infinite number of pixels. It is also clear that using higher resolution images will reduce this unavoidable error. The results also show that the farther the camera is from an object, the more error there is. This can be seen intuitively by considering an object so far away that the disparity between the two images is somewhere between one and two pixels. If the images show a disparity of one pixel, the computer reports that the object is twice as far away as it would if the images showed a disparity of two pixels. When an object is that far away, doubling the distance makes a very big difference. When an object is very close, however, the question may be whether the disparity is 75 or 76 pixels. This difference does not create as much error. The amount of error in this program depends largely on the resolution of the images used as well as the distance from the camera to the object.
Another interesting aspect of the error in this project is that the error goes down if the camera is shifted a larger distance between taking the pictures. For example, if the camera had been shifted 8 cm. instead of just 4 cm., the error would have been cut in half. This leads to the conclusion that the camera should be shifted as far as possible between the two pictures. The limiting factor in this process is that the objects in question must be in the picture both times. If the camera is shifted too much, the object will not be in one of the pictures. It is therefore necessary to determine how far it is possible to move the camera in the specific application that will use this program. Once that has been determined, the program only needs to be changed slightly by changing the constant in the equation to 464 times the number of centimeters the camera is shifted.
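The effect of the camera shift can be illustrated with a brief sketch built on the 464-per-centimeter figure quoted above. It assumes the resulting distance is in inches, as in the earlier regression, and that distance = constant / disparity. The function names are hypothetical.

```cpp
#include <cassert>

// Disparity constant for a given camera shift, using the 464-per-centimeter
// figure quoted in the text (distance assumed to be in inches).
double disparityConstant(double shiftCm) {
    assert(shiftCm > 0.0);
    return 464.0 * shiftCm;
}

// Distance to the object for a given camera shift and measured disparity.
double distanceInches(double shiftCm, double disparityPixels) {
    assert(disparityPixels > 0.0);
    return disparityConstant(shiftCm) / disparityPixels;
}

// Change in reported distance caused by a one-pixel increase in disparity,
// for an object at `objectDistanceInches`.
double depthStepInches(double shiftCm, double objectDistanceInches) {
    assert(objectDistanceInches > 0.0);
    const double k = disparityConstant(shiftCm);
    const double disparity = k / objectDistanceInches;
    return objectDistanceInches - k / (disparity + 1.0);
}
```

Under these assumptions, for an object 48 inches away `depthStepInches` gives roughly 1.2 inches with a 4 cm. shift and roughly 0.6 inches with an 8 cm. shift, which is consistent with the error being cut approximately in half. These figures follow from the sketch alone and are not measurements from the project.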
This project was a successful attempt to write a computer program which could take two stereoscopic images and determine the distance to an object in the images. Through this project, the concept of binocular disparity was verified, showing that disparity does in fact vary inversely with the distance to an object. There will always be some error involved in determining depth in this manner, but this error can be minimized by using high resolution images and by shifting the camera as far as possible between pictures. The computer program in this project successfully determined the distance to an object in pictures taken stereoscopically, and it can be used as a stepping stone for further projects which will enable a computer to detect depth.
I would also like to thank Johann Schleier-Smith, who helped me tremendously with the computer programming aspects of this project.
Walt Houser CPCUG
Coordinator.