Several evolving technologies have combined to create new opportunities for video to serve as an information resource. Advances in image compression and broadband wireless communications along with falling costs for imaging sensors have made the installation of video cameras easier and cheaper for a widening range of locations.
Now, users are looking for systems that can help them use these image streams effectively, by generating alerts and extracting information. The key to providing such help is video analysis.
Like any evolving technology, video analysis goes by many names, including intelligent video, smart video processing, and video analytics. Whatever the name, however, the goal is simple: extracting important information from the image stream.
The exact nature of the information extracted, and the system's further use of that information, vary with the application. But the work can include such tasks as area security, object classification and counting, feature extraction and recognition, and movement tracking (Fig. 1).
The roots of video analysis go back many years to video motion detection in security systems. Yet now, expanding broadband capability is opening up new opportunities and thus pushing the demand for video analysis.
“Adding broadband to video changes things,” said Danny Petkevich, director of the video and vision business unit at Texas Instruments. “It makes the adoption and installation of video technology easier, allowing it to solve problems it could never efficiently address before.”
A part of this application growth comes from a shift from analog video cameras to IP-based (Internet Protocol) digital cameras. According to Michael Long, video product manager at Analog Devices, most video cameras for security systems will be IP-based by 2012. Long estimates that 25% to 50% of these cameras will have some form of video analysis processing built in.
ANALYSIS CAPABILITIES HAVE GROWN
Meanwhile, video analysis has grown from its motion detection roots into a more capable and robust technology. Early systems needed a highly controlled, stable environment in terms of lighting and camera position to generate accurate results.
However, the technology has been making steady progress toward eliminating such constraints, according to Nik Gagvani, chief technical officer at Cernium. This progress enables video analysis to extract information from cameras in a widening range of installations.
True to its historic roots, video analysis has area security as its largest application market. The technology has gone far beyond simple motion detection, though. Security applications of video analysis include such refinements as video tripwires (sounding an alert when someone approaches a protected area) and left-object detection, such as spotting unattended luggage at an airport (Fig. 2). Analysis can also reveal when objects that should be at a given location are moved or missing.
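A video tripwire can be pictured as a line segment drawn across the image, with an alert raised whenever a tracked object's path crosses it. The sketch below is only an illustration under stated assumptions: it presumes an upstream tracker already supplies the object's position in each frame as (x, y) image coordinates, and the function names are hypothetical rather than taken from any vendor's product.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def side_of_line(a: Point, b: Point, p: Point) -> float:
    """Signed-area test: the sign tells which side of the line a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def segments_cross(p1: Point, p2: Point, a: Point, b: Point) -> bool:
    """True if the movement step p1->p2 strictly crosses the wire segment a->b."""
    d1 = side_of_line(a, b, p1)
    d2 = side_of_line(a, b, p2)
    d3 = side_of_line(p1, p2, a)
    d4 = side_of_line(p1, p2, b)
    # The step's endpoints must lie on opposite sides of the wire, and the
    # wire's endpoints must lie on opposite sides of the step.
    return d1 * d2 < 0 and d3 * d4 < 0


def tripwire_crossings(track: List[Point], a: Point, b: Point) -> List[int]:
    """Return each index i where the step track[i-1] -> track[i] crosses the wire."""
    return [
        i for i in range(1, len(track))
        if segments_cross(track[i - 1], track[i], a, b)
    ]


# Hypothetical wire across the scene and a track that walks over it.
wire_start, wire_end = (0.0, 5.0), (10.0, 5.0)
track = [(2.0, 1.0), (3.0, 3.0), (4.0, 6.0), (5.0, 8.0)]
crossings = tripwire_crossings(track, wire_start, wire_end)  # step 2 crosses
```

A track that passes beyond the end of the wire does not trigger, because the test treats the wire as a finite segment rather than an infinite line.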
An offshoot of meeting security needs is the ability of video analysis to categorize objects in the image, such as identifying people or automobiles, and then count them as they pass through the field of view. This ability has many uses, from monitoring the occupancy rate of a facility or parking garage to measuring traffic flow on the highway.
According to Vaidhi Nathan, president and CEO of IntelliVision, as many as 20% of traffic intersections in the U.S. use video cameras for functions such as controlling lights to manage traffic flow based on car counts. Nathan also notes that online traffic-flow summaries such as those from Yahoo and Google Maps derive travel speeds from cameras counting cars.
Once an analysis system has categorized an object, it has the opportunity to extract key features of that object. Feature extraction can be as simple as recognizing what part of the person is the face and then capturing and storing the face's image. It can also extend beyond simple image capture to try matching the image to a template for identification. Such identification can be part of a security system's authentication for access control, or it can be used to locate suspects in a crowd.
Feature extraction also allows analysis systems to locate and read license plates on vehicles. Systems can use this ability for security, as part of a check-in, check-out process in a parking garage, or to extract travel time by logging the vehicle's movement past a series of cameras. Feature extraction can even identify and read hazardous materials labels on trucks prior to allowing tunnel entry.
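The travel-time idea reduces to matching plate sightings between two cameras. The following is a minimal sketch, assuming a plate reader already emits (camera, plate, timestamp) records; the record layout, camera names, and function name are hypothetical.

```python
from typing import Dict, List, Tuple

# (camera_id, plate_text, timestamp_in_seconds): an assumed record layout.
Sighting = Tuple[str, str, float]


def travel_times(sightings: List[Sighting], start_cam: str, end_cam: str) -> Dict[str, float]:
    """Pair each plate's pass of start_cam with its next pass of end_cam."""
    last_start: Dict[str, float] = {}
    result: Dict[str, float] = {}
    for cam, plate, t in sorted(sightings, key=lambda s: s[2]):
        if cam == start_cam:
            # Remember the most recent time this plate passed the first camera.
            last_start[plate] = t
        elif cam == end_cam and plate in last_start:
            # Plate reached the second camera: elapsed time is the travel time.
            result[plate] = t - last_start.pop(plate)
    return result


log = [
    ("A", "ABC123", 0.0),
    ("A", "XYZ789", 10.0),
    ("B", "ABC123", 95.0),
    ("B", "XYZ789", 130.0),
    ("B", "NOPE000", 140.0),  # never seen at camera A, so it is ignored
]
times = travel_times(log, "A", "B")
```

Dividing the known distance between the two cameras by each elapsed time would then give an average speed for that vehicle.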
Object categorization further serves as a first step in motion analysis applications for video. A variety of sports motion analysis systems is available, for instance, to help users examine and improve their golf and tennis swings. Video motion analysis also provides user-generated input to simulation games, such as tracking a swinging bat to determine the distance and direction a simulated baseball will travel.
TRACKING OBJECT MOTION
Furthermore, motion analysis allows systems to use security cameras in stores and entertainment sites to track customer movements and determine what displays and activities draw and hold their interest. At least one company, BSR Labs, has expanded its motion analysis offering beyond tracking to determining general behavioral patterns.
The company's AISight analytics software “learns” normal target behaviors over time by monitoring a video stream. The system then uses that behavior model to identify suspicious activity such as loitering or anomalous behavior that might represent previously unanticipated security threats. Cernium's Gagvani anticipates that next-generation systems may even be able to recognize subtle differences in behavior such as distinguishing the meeting of two friends from the making of a drug deal, an application that for some raises the specter of “Big Brother” (see “Bringing Privacy To Security,” at www.electronicdesign.com, ED Online 21482).
A completely different type of application for video analysis is on the horizon: the driver assistance system. IntelliVision's Nathan notes that high-end automobiles already have built-in camera systems and expects that every car will have them in the next 10 years. The driver assistance systems use that in-car camera to detect potential risks and either alert the driver or take corrective action.
A camera pointed at the road ahead, for example, allows the system to detect an obstacle in the road, find pedestrians about to step in front of the car, or notice when the car is drifting out of its lane. It might then sound an alarm or even take control of the car to avoid a collision. A camera pointed at the driver might look for signs that the driver is falling asleep and then trigger a wake-up alarm. Such applications are under active development at automobile companies.
Despite the wide range of applications for video analysis, however, its foundation building blocks are relatively few. According to Analog Devices' Long, most analysis algorithms build off a common base function: edge detection. If a development team does not want to start from scratch, though, there are higher-level building blocks.
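To make the edge-detection base function concrete, here is a small sketch using the well-known Sobel operator on a grayscale image held as a list of rows. It is one common way to detect edges, not necessarily the method any vendor named here uses, and a production system would rely on an optimized library rather than plain Python loops.

```python
from typing import List

Image = List[List[int]]

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Return |Gx| + |Gy| for each interior pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            # Sum of absolute values is a cheap stand-in for sqrt(gx^2 + gy^2).
            out[y][x] = abs(gx) + abs(gy)
    return out


# Test image: dark left half, bright right half, giving one vertical edge.
image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = sobel_magnitude(image)
```

The response is large only in the two columns that straddle the brightness step and zero in the flat regions on either side.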
Ed Troha, managing director for global marketing at ObjectVideo, indicated the available algorithms fall into three categories. One category is database-related object recognition and identification. Motion detection constitutes another category. Both of these work at the pixel level, one looking for template matches in a single frame and the other looking for frame-by-frame changes.
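Frame-by-frame change detection can be sketched in a few lines. The example below assumes two grayscale frames of equal size stored as lists of rows; the threshold values are illustrative assumptions, not figures from any of the vendors quoted.

```python
from typing import List

Frame = List[List[int]]


def changed_pixels(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[bool]]:
    """Mark each pixel whose brightness changed by more than `threshold`."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_detected(prev: Frame, curr: Frame, threshold: int = 25,
                    min_fraction: float = 0.01) -> bool:
    """Report motion when at least `min_fraction` of the pixels changed."""
    mask = changed_pixels(prev, curr, threshold)
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)
    return changed / total >= min_fraction


still = [[0] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[1][2] = 200  # one pixel brightens sharply between frames
```

Raising `min_fraction` makes the detector ignore small changes, which is one simple way to trade sensitivity against false alarms.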
The third category identified by Troha is behavior-based analytics. These algorithms do more than simply react to a predefined event, such as a person entering a restricted area. Behavioral analytics instead are able to perform a function by actively analyzing a scene. Such algorithms, for example, can identify when an object has entered or left the scene and query a list of rules to select an appropriate response. For instance, this makes it possible for the system to distinguish between a person in a museum walking past a painting and someone walking off with the painting.
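The "query a list of rules" step might look something like the following sketch. The event and rule structures, class labels, and response names are all hypothetical; they simply show how a scene event could be matched against an ordered rule list to pick a response, as in the museum example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SceneEvent:
    kind: str          # "entered" or "left"
    object_class: str  # e.g. "person" or "painting"


@dataclass(frozen=True)
class Rule:
    kind: str
    object_class: str
    response: str


def select_response(event: SceneEvent, rules: List[Rule], default: str = "ignore") -> str:
    """Return the response of the first rule matching the event, else the default."""
    for rule in rules:
        if rule.kind == event.kind and rule.object_class == event.object_class:
            return rule.response
    return default


museum_rules = [
    Rule("left", "painting", "raise_alarm"),  # a painting leaving the scene is theft
    Rule("entered", "person", "log_visit"),   # a visitor arriving is routine
]

visitor_walks_past = select_response(SceneEvent("left", "person"), museum_rules)
painting_removed = select_response(SceneEvent("left", "painting"), museum_rules)
```

Here a person leaving the scene falls through to the default, while a painting leaving the scene triggers the alarm rule.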
PREPROCESSING IMPROVES RESULTS
Another building block that has emerged in the industry is not a form of analysis but a kind of pre-processing that makes the analysis more robust. Examples include noise filtering, compensation for camera motion, and removal of distractions in the image such as water ripples, birds, and swaying tree limbs. Such pre-processing reduces the chances of false alarms in the later analysis.
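As one example of noise filtering, a 3x3 median filter removes isolated speckle while preserving edges better than simple averaging. This sketch is a generic textbook technique chosen for illustration, not a description of any particular vendor's pre-processing.

```python
from typing import List

Image = List[List[int]]


def median_filter_3x3(img: Image) -> Image:
    """Replace each interior pixel with the median of its 3x3 neighborhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(
                img[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


# A flat gray patch with one "hot" pixel of sensor noise in the center.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
cleaned = median_filter_3x3(noisy)
```

Because the outlier never reaches the middle of the sorted window, it disappears entirely instead of being smeared into its neighbors.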
To be able to apply these building blocks, however, you first need a system. The algorithm companies recommend that developers begin by looking closely at their application and operating environment. What a video analysis system can achieve depends in part on factors such as the amount and nature of the scene's illumination, the position and distance of the camera from target objects, and the field of view.
The light inside a building hallway, for instance, simplifies analysis while the light in an alleyway—which varies hourly, daily, and seasonally—complicates things. Camera sensitivity to infrared (IR) light is different from its response to visible light. A camera with a distant view will not be able to recognize faces as reliably as one closer to the subject. These and other system factors will affect what is achievable and the quality of results using video analysis.
The next place to look in the system design is at the camera, including the optics, the image sensor, and the on-board signal processor—all equally important in achieving good results with video analysis. Ideally, the camera's imaging characteristics should reflect its use in machine vision.
“Most camera designers aim for an output signal that is most pleasing to viewers,” said Cernium's Gagvani, “but that's not the best thing for analysis.” Gagvani's recommendations for camera characteristics include monochrome operation with good contrast and VGA resolution. Color, he pointed out, is not often used in analysis applications, partly because color fidelity is difficult to keep consistent without complete control over the lighting conditions. High resolution is typically unnecessary for most applications and simply loads down the system.
A camera characteristic more important than absolute performance is consistency. Analysis software is simply unable to distinguish between image changes that stem from such things as automatic gain control (AGC) adjustments and those that represent real change in the scene. Similarly, cameras should minimize frame timing jitter to ensure accurate motion detection and object tracking. Consistency is even more important than frame rate in most applications. Delivering consistent video at 10 frames per second yields better results than operating at a higher rate with dropped frames.
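Frame-timing consistency can be checked from capture timestamps alone. The sketch below assumes each frame carries a timestamp in seconds; treating any gap longer than 1.5 times the nominal frame period as a suspected dropped frame is an illustrative assumption, not an industry rule.

```python
from statistics import mean, pstdev
from typing import Dict, List


def frame_timing_stats(timestamps: List[float], nominal_fps: float,
                       drop_factor: float = 1.5) -> Dict[str, float]:
    """Summarize inter-frame intervals: mean, jitter (std. deviation), suspected drops."""
    nominal = 1.0 / nominal_fps
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # A gap much longer than the nominal period suggests a missing frame.
    drops = sum(1 for dt in intervals if dt > drop_factor * nominal)
    return {
        "mean_interval": mean(intervals),
        "jitter": pstdev(intervals),
        "suspected_drops": drops,
    }


steady = frame_timing_stats([0.0, 0.1, 0.2, 0.3, 0.4], nominal_fps=10)
gappy = frame_timing_stats([0.0, 0.1, 0.3, 0.4], nominal_fps=10)
```

The steady 10-frame/s stream shows essentially zero jitter, while the second stream reveals one suspected drop and a correspondingly larger spread.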
The next system design factor to consider is the system architecture (Fig. 3). Video analysis has traditionally occurred on a central server receiving video feeds from several cameras. This architecture is still the best choice when the system must work with an installed network of existing cameras. Increasingly, however, new system designs place the analysis capability in the camera itself, which then sends both video and meta-data about the video to the video management and display system. A third possible architecture places video analysis at the digital video recorder.
ANALYSIS NEEDS PROGRAMMABILITY
In any of the architectures, the processing hardware that performs the analysis is most likely to be a DSP or CPU rather than specialized silicon. Many video applications, including analysis, call for video compression to minimize transmission bandwidth or storage space, and specialized hardware exists to handle such compression (see “Lights, Camera, Process,” ED Online 19672).
Analysis, however, is best performed before compression and has not yet established an industry-wide set of common functions that would make specialized hardware practical. Thus, a typical video analysis design would use one set of hardware for image processing and a programmable processor for the analysis.
Some hardware specialization is beginning to occur. Eutecus, for instance, offers a design for the Xilinx Spartan FPGA family that serves as an analytics engine for its VALib video analysis library. Eptascape offers hardware that automatically extracts MPEG-7 meta-data from a video stream as part of its analysis package. Texas Instruments recently introduced C674x DSP family members with a video port that provides dual input and output channels for use in analytics and other video applications. But for the most part, developers must work with more generic devices.
Because video analysis uses a lot of convolution, the most suitable processor architecture has a single-instruction, multiple-data (SIMD) structure, according to Cernium's Gagvani. He also said that memory bandwidth is important in video analysis, so processors should have a large local cache. Fortunately, the processing speed requirements for most video analysis algorithms are quite modest. ObjectVideo's entire analytics software library requires less than 20% of a typical TI DSP's processing capacity and less than 7% of an Intel Atom's, according to Troha.
SOFTWARE IS KEY
With system architecture and hardware settled, what remains is software. Crafting efficient and effective analysis software from scratch or even just porting C-language routines for embedded operation can require substantial time and expertise. Fortunately, developers have numerous sources available for software libraries and design support, beginning with DSP vendors.
Analog Devices, according to Long, offers “a one-stop shop with free extensions that enable developers to get started,” all internally developed and maintained. In addition to a library of task-level functions such as edge detection, the company provides a variety of functional modules for video analysis such as its recently released motion detection module.
Texas Instruments supplies its DSP customers with its royalty-free VLIB software library, which includes more than 40 vision-related kernels. These kernels provide a foundation for end-application development by handling tasks such as background modeling and subtraction, object feature extraction, tracking, and recognition as well as low-level pixel processing. The library is optimized for the TMS320C64x DSP core, which can avoid man-years of C-language porting efforts as well as increase performance by an order of magnitude, according to Petkevich.
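Background modeling and subtraction, one of the tasks such kernels handle, can be illustrated with a simple running-average model. The sketch below is a generic technique written in plain Python; it does not use or reproduce the VLIB API, and the class name, `alpha`, and `threshold` values are assumptions chosen for illustration.

```python
from typing import List

Frame = List[List[int]]


class RunningAverageBackground:
    """Per-pixel background model updated as an exponential running average."""

    def __init__(self, first_frame: Frame, alpha: float = 0.05, threshold: float = 30.0):
        self.alpha = alpha          # how quickly the background adapts
        self.threshold = threshold  # minimum difference that counts as foreground
        self.background = [[float(p) for p in row] for row in first_frame]

    def apply(self, frame: Frame) -> List[List[bool]]:
        """Return a foreground mask and fold background pixels into the model."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, p in enumerate(row):
                bg = self.background[y][x]
                is_foreground = abs(p - bg) > self.threshold
                mask_row.append(is_foreground)
                if not is_foreground:
                    # Only background pixels update the model, so a moving
                    # object does not get absorbed into it right away.
                    self.background[y][x] = (1 - self.alpha) * bg + self.alpha * p
            mask.append(mask_row)
        return mask


model = RunningAverageBackground([[100, 100], [100, 100]])
mask = model.apply([[100, 200], [104, 100]])
```

The bright pixel is flagged as foreground, while the slight brightening at the lower left is treated as a lighting drift and blended into the background.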
Independent software vendors offer even more advanced libraries that bring developers another step closer to their end product design. Many of these libraries were first developed in C for server-based system architectures but are now also available as object code for embedded development. These libraries boast fully developed information-extraction functions such as license plate recognition, intrusion detection, and video counters.
Application developers only need to decide how to use the information these functions provide. They don't need to know how to extract the information from the video. Vendors include Abstract Computing International, Agent Video Intelligence, BSR Labs, Cernium, Eutecus, Eptascape, IntelliVision, ObjectVideo, Survision, and VCA Technology. These vendors caution developers new to the field to have realistic expectations, however, for what video analysis can achieve.
Reliable and accurate detection and classification of color, for instance, is a highly complex task that depends as much on target illumination as analysis software. Without controlled lighting, analysis results can be variable. Increasing the processing power or camera resolution does not improve results significantly, either for color or other types of analysis.
While more processing power enables the system to support more simultaneous video channels or achieve faster results, it's unable to extract more information than a more modest processor. Available algorithms work in broad strokes, able to distinguish a person from an automobile, but typically unable to make fine distinctions such as telling a Ford from a Chevrolet.
Within their limits, though, video analysis algorithms should perform reliably in a wide range of environments, although poor lighting, bad camera angle, camera movement, reflections, and other environmental factors will have a detrimental impact. Even so, developers should expect systems in most installations to detect target objects appropriately better than 90% of the time (less than 10% false negatives), according to Cernium's Gagvani.
False positives, while more difficult to measure, should stay less than 10%. (This does not count nuisance recognitions such as reflections in a window being identified as a person.) The more controlled the environment, the better these results will be.
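These targets can be checked against field data with simple arithmetic. The sketch below assumes hand-labeled counts from a test recording, and it interprets the false-positive figure as the share of alerts that were false, which is one plausible reading of the guideline rather than a definition given by the vendors.

```python
from typing import Dict


def detection_rates(true_positives: int, false_negatives: int,
                    false_positives: int) -> Dict[str, float]:
    """Compute detection and error rates from labeled event counts."""
    actual_events = true_positives + false_negatives  # things that really happened
    alerts_raised = true_positives + false_positives  # things the system reported
    return {
        "detection_rate": true_positives / actual_events,
        "false_negative_rate": false_negatives / actual_events,
        "false_alert_fraction": false_positives / alerts_raised,
    }


def meets_targets(rates: Dict[str, float]) -> bool:
    """Check the rule of thumb: over 90% detection, under 10% false alerts."""
    return rates["detection_rate"] > 0.90 and rates["false_alert_fraction"] < 0.10


# Hypothetical trial: 100 real events, 93 caught, plus 5 spurious alerts.
trial = detection_rates(true_positives=93, false_negatives=7, false_positives=5)
trial_ok = meets_targets(trial)
```

In this made-up trial the system detects 93% of events and about 5% of its alerts are false, so it would meet both guidelines.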
What these performance results suggest, and vendors point out, is that video analysis is best used to support human activity rather than serve as the entire solution. “People need to look at analytics as a tool to give humans more information to make a judgment,” said ObjectVideo's Troha. “It's a team effort.”
IntelliVision's Nathan pointed out that video systems don't work like human eyes and cannot recognize things the way people do. These systems are good at keeping track of results over long periods of time, he said. An analysis system can watch a hundred cameras simultaneously for months at a time without losing efficiency (Fig. 4). A human cannot.
While the real abilities of video analysis are a far cry from what Hollywood would have us believe, they can handle the job in many key applications. As cameras proliferate, opportunities to put these brains with those eyes will increase. Success hinges on setting appropriate expectations, working with vendors to speed development and integration, and cleverly combining and applying the fundamental capabilities video analysis has to offer.