The Unblinking Eye

San Francisco International Airport (SFO) is a hub for business travelers, vacationers, immigrants, stopover passengers, on-site workers—and a whole lot of suspicious-looking people. That's why it's not surprising to discover that the airport operates an extensive video surveillance system. What is surprising is how very smart the system is.

When it comes to video surveillance, people tend to imagine banks of sharp-eyed human observers endlessly scanning video screens for anything out of the ordinary. But that's not necessarily true anymore. Sophisticated video analysis technologies are rapidly replacing people as ever-vigilant sentinels.

"If you have a security guard looking at a monitor, he's probably going to look at it for 10, 20 minutes and then get bored and zone out," says Dilip Sarangan, a security analyst for Frost and Sullivan, a technology market research firm. "A computer never gets bored, and nothing goes unchecked."

By studying human behavior and automatically detecting the presence and absence of various objects in real time, intelligent video analysis promises enhanced security at an overall lower cost. "It's more of a proactive rather than a reactive approach to video surveillance," says T. Jeff Vining, a security industry analyst at Gartner, another technology research firm.

Government agencies and other organizations are scooping up intelligent video analysis products at an accelerating pace. Over a dozen firms now offer some form of the technology. The vendor pool includes companies like Vidient, Westec Interactive, and Visual Defence, all of which offer products that can survey a local area—indoors or outdoors—and spot anything out of the ordinary.

Sales of intelligent video systems will grow from $60 million in 2005 to $400 million in 2012, Sarangan predicts. "It's heading into the business mainstream," he says.

SCANNING SFO
More than 32 million passengers pass through San Francisco International Airport each year. Visually studying even a small percentage of this flood of humanity for quirks and behavior that might betray a sinister motive would require an army of human observers glued to video monitors. For a solution that would prove effective without financially crippling manpower costs, SFO turned to SmartCatch, an intelligent video analysis technology offered by Vidient.

SmartCatch works in conjunction with the airport's existing closed-circuit television (CCTV) systems to detect aberrant or suspicious behavior and distinguish those patterns of activity from normal activities (Fig. 1). When the behavior-based software spots an anomaly, it sends a video clip via a pager, laptop, cell phone, or other communications device to a responder, who can then investigate the situation.

"When we say ‘behavior,' we don't mean facial recognition or license plate reading. We're really talking about a combination of human and object behaviors," says Steve Goldberg, Vidient's CEO. In other words, the system looks for people and objects, such as suitcases or packages, that aren't in the right place or have lingered in a place for too long.

"So if you parked your car at the curb, where it's only supposed to be for dropoff, and the car doesn't move, it will alert security," says Michael McCarron, SFO's community affairs director. The system also can spot "human tailgating," when two people pass through a secure door on a single ID card swipe, as well as things like crowd formation and people going through an exit lane the wrong way.

Vidient's Windows-based technology relies on sophisticated video algorithms developed over three years by NEC's computer vision engineers. "The algorithms are generally based on adaptive filtering or adaptive processing, neural network types that have been used in other data and voice applications," says Goldberg. SmartCatch detects suspicious situations with an accuracy rate between 95% and 98%, Goldberg notes.

Like most other intelligent video analysis technologies, Vidient's product functions by seeing each image as a mosaic of pixels. The algorithms then work to make sense out of the mosaic's movement, or lack of movement, and to separate the pixel cluster from background clutter. "Basically, video analytics is all software," Sarangan says.
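
One common textbook way to separate a moving pixel cluster from background clutter is running-average background subtraction. The Python sketch below shows that technique under stated assumptions. It is a stand-in for illustration, not the adaptive, neural-network-style processing Goldberg describes. Frames are assumed to be small grayscale grids held as lists of lists, and the function names, the blending factor `alpha`, and the brightness `threshold` are all hypothetical.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * p for b, p in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose brightness differs from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for b, p in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the flagged pixels, or None if there are none."""
    hits = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


# Example: a bright object appears in an otherwise static 3x4 scene.
background = [[10.0] * 4 for _ in range(3)]
frame = [row[:] for row in background]
frame[1][2] = 200
frame[2][3] = 180
box = bounding_box(foreground_mask(background, frame))
```

Because the background is a slowly updated average, gradual changes such as shifting daylight are absorbed into the model, while an object that appears suddenly stands out as foreground.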

ADVANCING TECHNOLOGY
Cameras streaming IP video make it relatively easy to add analytic technology to a new or existing surveillance system, says Michael Godfrey, Visual Defence's chief technology officer. Since the raw data is in a digital format already, intelligent video analysis technology can be dropped into the system easily. "I can put my analytic server anywhere within the network," Godfrey says.

Thanks to faster and more powerful processors, it's now possible to build analytic capabilities directly into surveillance cameras. Lumenera, for example, has introduced a series of cameras that use Texas Instruments' DaVinci digital video technology to deliver advanced image processing, compression, and video analytics.

"The cameras themselves are getting more intelligent," Godfrey says. Also, many "smart" cameras now support downloadable analytics modules produced by third-party vendors. This lets system owners use a module designed for a particular task, such as body movement analysis or object tracking. "You're not tied with one specific type," Godfrey notes. Different modules can be distributed to various cameras across the system, wherever a particular capability is needed.

Whether it's camera-based or server-based, analytics has its limits despite these advances. Even the most sophisticated algorithms running on the most powerful processors can have trouble coping with busy, visually complex environments—the types of places authorities most want to monitor.

"If you put it into a urban area, like New York City, there's so much going on at once it's almost like it overloads the brain," Vining says. "But if you have a defined area to monitor, it can work very well." Even so, intelligent video analysis can still be tricked into registering false alerts.

"I might be standing outside the airport waiting for somebody to pick me up," Sarangan says. "It might look like I'm loitering, but I'm not doing anything wrong." Yet system users are generally willing to tolerate the occasional false positive as the price they must pay for not overlooking a possibly serious situation, notes Vining.

Network capacity is another concern. "If you're streaming [video] across the network, it's about 2 Mbytes/s," Sarangan says. That means a system with 100 cameras needs to move nearly 200 Mbytes every second. Since many large-footprint installations like mass transit systems can require thousands of cameras, network costs can quickly mount. "That's a lot of data to be streaming across a network," Sarangan says.
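
The arithmetic behind those figures is easy to rerun for other camera counts. The short Python sketch below takes Sarangan's rough figure of 2 Mbytes/s per camera as its only input from the article. The function names and the 1,000-camera case are illustrative assumptions, and the conversion to megabits assumes 8 bits per byte with no protocol overhead.

```python
def aggregate_mbytes_per_s(cameras, per_camera_mbytes_per_s=2.0):
    """Total streaming load if every camera sends video at the same rate."""
    return cameras * per_camera_mbytes_per_s


def mbytes_to_mbits(mbytes_per_s):
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_s * 8


# 100 cameras, as in the text, and a hypothetical 1,000-camera transit system.
for cameras in (100, 1000):
    load = aggregate_mbytes_per_s(cameras)
    print(f"{cameras} cameras: {load:.0f} Mbytes/s ({mbytes_to_mbits(load):.0f} Mbits/s)")
```

At 100 cameras the load works out to 200 Mbytes/s, or 1,600 Mbits/s, and it scales linearly with every camera added.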

EXPANDING MARKET
As intelligent video analysis becomes more widely available at ever lower price points, the technology is filtering down to a wide array of enterprises. "We have seen strong demand for the technology from specialty retailers, jewelry stores, and even supermarkets," says Jon Bolen, chief technology officer of Westec Interactive (Fig. 2).

Retailers can use intelligent video analysis to detect shoplifters. Casinos can tap the tools to spot cheaters. And theme parks often turn to smart cameras to identify and locate lost children.

It could even be a powerful business tool. Stores can judge which floor displays are most popular with shoppers, while fast food restaurants can better assess their staffing needs by monitoring crowd sizes throughout the day.

Industry players believe most people are willing to give up a little privacy, at least in public, in return for enhanced security. Vining believes intelligent video analysis systems are destined to pop up in an ever-growing number of public spaces.

"It's what the world is coming to," Vining says.
