From University of California - Los Angeles
Computers with human-like vision could strengthen security and surveillance, UCLA researchers say
Unmanned military vehicles could distinguish cave entrances from shadows and locate other hazards if they had a sense of vision similar to humans, say researchers at UCLA’s Henry Samueli School of Engineering and Applied Science.
A Pentagon spokesman was recently quoted concerning the difficulty of spotting caves in Afghanistan, saying, “From a cockpit perspective, a cave looks like nothing more than a shadow on the ground.”
Stefano Soatto, assistant professor in UCLA’s computer science department and head of the engineering school’s vision lab, is studying how the human visual system works in order to pass the ability on to machines. “In practice, the human visual system is still by far the best around, but this may not be so for long,” Soatto said.
Soatto’s research team is examining how people use vision to interact with others and with their environment, and is designing systems that will allow computers to interact in similar ways.
“We use senses to build models of the world around us that allow us to walk through an unfamiliar environment and interact with it,” Soatto said. “I want a machine to be able to do the same thing.”
The projects under way at the UCLA Vision Lab all involve “dynamic vision,” the ability of a computer to take in visual sensory information about its surroundings and use what it “sees” of its changing environment to perform assigned tasks, such as exploring underground bunkers or monitoring bank vaults.
As Soatto explains, “The world has certain physical properties — shape, motion, material properties of objects and so forth. Humans have developed, over the course of evolution, a particular way of representing their environment that has been crucial for them to survive.”
Machines, especially computers, can also be made to interpret the physical world and interact with it, whether that environment is inside a nuclear reactor or on the operating table.
Soatto is talking about much more than simple photography or video. “We know how to build cameras to capture images, we know how to build computers to crunch numbers, and we know how to build robots that move and perform pre-assigned tasks,” Soatto said. “However, we still do not know how to put everything together and endow a machine with a sense of vision.”
For a computer to perform “real-world” tasks, it must do more than simply capture and analyze a photograph. Using only that information, a computer cannot distinguish a photograph of a scene from the scene itself. To interact with a changing environment, the computer needs to gain additional information about spatial properties of the environment — shape, motion, distances, angles — measurable properties you can only get as images change over time. Multiple points of view are needed, where either the scene or the viewer’s perspective changes. Only then can a three-dimensional representation of the world be created.
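To make the point about multiple viewpoints concrete, here is a minimal sketch of how distance can be recovered from two images. It is an illustration only, not the UCLA lab's method. It assumes an idealized pinhole camera that slides sideways by a known distance between the two shots, and the function and parameter names are hypothetical. Under those assumptions, a point that shifts by d pixels between the images lies at depth Z = f * B / d, where f is the focal length in pixels and B is the sideways distance moved.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Estimate depth in meters from the pixel shift of one point between two views.

    Assumes an ideal pinhole camera translated sideways by baseline_m meters,
    with no rotation between the two shots (an illustrative simplification).
    """
    if disparity_px <= 0:
        # No measurable shift means a single view cannot fix the distance.
        raise ValueError("disparity must be positive to recover depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 800-pixel focal length, camera moved 0.1 m sideways.
near_point = depth_from_disparity(800.0, 0.1, 40.0)  # large shift, closer point
far_point = depth_from_disparity(800.0, 0.1, 20.0)   # small shift, farther point
```

Nearby points shift more than distant ones, which is why a single still image, with no shift to measure, carries no such distance information.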
Consider face-recognition security systems. Sometimes used at banks, airports and even public events, these systems are designed to recognize and allow passage to certain people while denying entry to strangers. But the system can be fooled in ways that the human visual system cannot, Soatto said.
“Current systems capture one image of your face, match it to a database and can recognize it as yours and let you in. However, if an intruder shows up with a photograph of your face, the system would not be able to distinguish that 2-D photograph from your 3-D face, and would therefore let the intruder in.” A computer with a true sense of vision would be able to tell the difference, says Soatto.
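One way to picture the difference between a photograph and a real face is to look at how facial features shift between two camera positions. The sketch below is a hedged illustration under stated assumptions, not the technique used by Soatto's team: it assumes a pinhole camera translated sideways, feature points already matched between the two images, and a hypothetical tolerance of half a pixel. For any flat surface, even a tilted one, the pixel shift varies as a simple linear function of image position (shift = a*x + b*y + c), so a large deviation from the best such fit suggests real depth relief, such as a nose that sits closer to the camera than the cheeks.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_affine_shift(samples):
    """Least-squares fit of shift = a*x + b*y + c over (x, y, shift) samples."""
    n = len(samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sxx = sum(x * x for x, _, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    sxd = sum(x * d for x, _, d in samples)
    syd = sum(y * d for _, y, d in samples)
    sd = sum(d for _, _, d in samples)
    # Normal equations of the least-squares problem, solved by Cramer's rule.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxd, syd, sd]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("need at least three non-collinear feature points")
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / det)
    return tuple(coeffs)


def looks_flat(samples, tol_px=0.5):
    """True if every shift fits one plane within tol_px (a hypothetical tolerance)."""
    a, b, c = fit_affine_shift(samples)
    worst = max(abs(d - (a * x + b * y + c)) for x, y, d in samples)
    return worst <= tol_px


# Made-up feature points: four near the edges of the face, one at the nose tip.
tilted_photo = [(-100, -100, 9.0), (100, -100, 11.0),
                (-100, 100, 9.0), (100, 100, 11.0), (0, 0, 10.0)]
real_face = [(-100, -100, 10.0), (100, -100, 10.0),
             (-100, 100, 10.0), (100, 100, 10.0), (0, 0, 14.0)]
```

With these made-up numbers, `looks_flat(tilted_photo)` is true and `looks_flat(real_face)` is false, because the nose shifts more than a flat surface would allow. A practical system would also have to cope with noisy matches and camera rotation, which this sketch ignores.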
In July, Soatto’s team was the first to demonstrate a computer system that could track an object’s movement and shape in real time, as it is happening. By capturing and processing images in real time, unmanned planes could tell the difference between a shadow on a hill and the entrance to a cave, or between an airstrip and a long trench. The plane could react immediately to what it was seeing, rather than later, after the data had been analyzed. It also means a computer could do more than just perform pre-assigned tasks based on data collected at a certain moment; it would constantly update what it knows about its environment and truly interact with a changing world.
Soatto’s team is also investigating how to enable a computer to recognize distinct human movements, such as a person walking, and predict whether the figure is a man, woman or child. In some cases the computer could even identify a person by gait, or predict age and even mood, distinguishing joyful skipping from suspicious skulking.
The UCLA Vision Lab’s work may relieve humans of unwanted dangers or enable them to do things that would otherwise not be possible. “Think of everything that animals and humans do with vision and all the occasions when you may want a machine to do that instead: mowing your lawn, exploring unfamiliar areas or staying up all night to check that intruders are not in your building,” Soatto said.
Soatto began the UCLA Vision Lab in 2000, and currently has eight graduate student researchers. For more information about the lab and its ongoing projects, log on to http://vision.ucla.edu/.