A surveillance video capturing a man removing his sweater near an SUV packed with explosives during the Times Square bomb scare in May swiftly resulted in a multi-agency search.
But as soon as news broke that the man was not connected to the threat, questions about the reliability of surveillance came to the forefront.
The challenge with surveillance is that it is costly and not always reliable, said Paul Cohen, head of the University of Arizona's computer science department.
"The cost of knowing what is going on is very, very high," Cohen said. "The problem right now is not that we cannot put more cameras up, the problem is that monitoring those cameras is very, very expensive and error-prone."
To help remedy some of these issues, Cohen and his collaborators are working to build an intuitive system not merely to capture video, but to detect suspicious human behaviors.
The team has received a $2 million grant for the project's first two years from the Defense Advanced Research Projects Agency, or DARPA, with Cohen serving as the principal investigator.
After the initial period, the team will have a chance to qualify for an additional $3 million for a three-year period, Cohen said.
Drawing on ways the human brain processes visual information, the research team plans to build a visual detection and tracking system, models of human behaviors and simulators to generate possible future scenes.
Another challenge with modern-day surveillance is that it must be monitored, said Kobus Barnard, an associate professor of computer science and member of the research team.
What is required, and what the team intends to use, is high-level semantics. "We have to bring in a lot of information and pull together these hypotheses," Barnard said.
Others involved at the UA are Ian Fasel, an assistant research professor of computer science, and Wesley Kerr, a doctoral candidate in the computer science department.
The team also is collaborating with Deva Ramanan, an assistant professor of computer science at the University of California, Irvine, and Clay Morrison, an assistant research professor in the computer science department.
One of the project's major components is the incorporation of visual imagination, a process by which the brain decodes what the eyes see while also guessing what should occur next.
"The same parts of the brain are involved in vision and visual imagination. To some extent the brain is telling the eyes what they should be seeing. So we want to model this sort of heavy involvement of the brain," Cohen said.
The researchers are taking on a highly complex and difficult challenge in training a system to do just that.
"It has been done before in small pilot demonstrations. But as the number of activities grows, the technology might not scale up," Cohen said. He also noted that "getting from seeing to understanding" actions is somewhat difficult.
To start, the team will focus on 48 specific verbs or commands, developing a multi-level system that can differentiate actions such as "carry," "escape," "run" and "climb."
But the system is being designed not merely to detect such behaviors but to understand them in the context of interactivity – people interacting with people and people interacting with objects.
Given that goal, the team is figuring out ways to involve the three parts of the system simultaneously so that it will be able to make guesses and deductions, Barnard said.
"The goal of the system is to really take the next step forward in the semantics of actions that are a little bit extended in time and more complex than what has been achieved so far, which is still in the research domain," Barnard said.
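To illustrate the kind of distinction the team is after, consider a toy sketch that maps coarse motion cues to action verbs such as "carry," "run" and "climb." This is purely hypothetical: the feature names, thresholds and rules below are not drawn from the UA/DARPA system, which would learn far richer, multi-level models rather than hand-written rules.

```python
# Hypothetical illustration only: these features and thresholds are invented
# for this sketch and do not come from the research project described above.

def classify_action(speed, vertical_motion, holding_object):
    """Map coarse per-person motion features to a single action verb.

    speed           -- assumed horizontal speed in meters per second
    vertical_motion -- assumed upward speed in meters per second
    holding_object  -- True if an object is detected in the person's hands
    """
    if vertical_motion > 0.5:      # sustained upward movement
        return "climb"
    if holding_object:             # person-object interaction
        return "carry"
    if speed > 3.0:                # faster than a typical walking pace
        return "run"
    return "walk"

# Example: a person moving upward is labeled "climb" regardless of speed.
print(classify_action(speed=1.0, vertical_motion=0.8, holding_object=False))
```

A real system of the kind the researchers describe would replace these brittle rules with learned models and would reason about extended sequences of behavior, but the sketch shows the basic input-to-verb mapping the 48-verb vocabulary implies.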
In addition to drawing on a range of scientific disciplines, such as computer science and geometry, the project has implications for several fields, including law enforcement and the military, Cohen said.
"We researchers have always been interested in unsophisticated things any human can do while funding agencies are often interested in things that only a tiny portion of the population can do," Cohen said, adding that the initial proposal was one of the best DARPA programs he's come across.
"You learn most about intelligence by studying what everyone can do," he added. "Any 3-year-old can look at a movie and tell you what she sees. No machine can."