Event-Based Image Sensor Views Videos in a New Way

Image processing is a demanding task, in part because it normally involves processing every pixel in an image, and video streams multiply that work across many frames. Determining what changes from one frame to the next is useful for detecting objects and other alterations, but doing so takes a lot of processing power and bandwidth. This approach is needed because image-capture devices deliver a frame at a time.

But what if that wasn’t the only way to get image information?

Prophesee’s Metavision image sensor (Fig. 1) takes a different approach: it’s an event-based sensor. Aim it at a solid-color image and it generates almost no data because the pixels aren’t changing. Move a hand in front of the background and the sensor starts sending a stream of events that indicate which pixels changed. In most scenes, the changing pixels are only a fraction of the total.

1. Prophesee’s Metavision can deliver 66 million events/s with a dynamic range over 120 dB. A comparable imaging system would need to operate at 10,000 frames/s to keep up.

The sensor works with a programmable threshold for each pixel, which gives the system a wide dynamic range of over 120 dB. This is significantly higher than most image sensors. As a result, the sensor can detect changes in scenes that span a wide range of brightness, which would cause problems for a conventional sensor.
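
The per-pixel threshold idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Prophesee’s implementation or API: the function name events_from_frames, the log-intensity comparison, and the 0.2 threshold are all hypothetical choices made for illustration. Each pixel remembers the level at its last event and reports a new event only when the change exceeds the threshold.

```python
import math

def events_from_frames(frames, threshold=0.2):
    """Simulate an event stream from a list of 2-D intensity frames.

    Hypothetical model (not Prophesee's API): each pixel stores the log
    intensity at its last event and fires when the current log intensity
    differs from that reference by more than `threshold`.
    Returns a list of (frame_index, x, y, polarity) tuples.
    """
    # Reference log intensity per pixel, initialised from the first frame.
    ref = [[math.log(v + 1.0) for v in row] for row in frames[0]]
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                level = math.log(value + 1.0)
                delta = level - ref[y][x]
                if abs(delta) > threshold:
                    events.append((t, x, y, 1 if delta > 0 else -1))
                    ref[y][x] = level  # reset the reference after an event
    return events

# A uniform 4x4 background; one bright pixel moves left to right along row 1.
background = [[50] * 4 for _ in range(4)]
frames = [background]
for step in range(3):
    frame = [row[:] for row in background]
    frame[1][step] = 200
    frames.append(frame)

events = events_from_frames(frames)
pixel_samples = 4 * 4 * (len(frames) - 1)
print(len(events), "events from", pixel_samples, "pixel samples")
# prints: 5 events from 48 pixel samples
```

In this toy scene, only the pixels the bright spot enters or leaves produce events, while a static background produces none.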

For example, sun glare or low light can be a problem because the typical image sensor has a more limited range for each pixel. A conventional sensor can be set up for a low-light situation, but then it would deliver a maximum white value if a bright light is in the scene. A typical color sensor may report RGB values, but with only 8 or 16 bits per pixel.
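
To put the 120-dB figure next to those bit depths, the short calculation below converts between decibels and intensity ratios. It is a rough comparison that assumes the usual 20·log10 convention for image-sensor dynamic range and treats the 8- and 16-bit figures as the depth of a single, linearly encoded channel. The helper names are hypothetical.

```python
import math

def ratio_to_db(ratio):
    """Convert a brightest-to-darkest intensity ratio to decibels (20*log10)."""
    return 20.0 * math.log10(ratio)

def db_to_ratio(db):
    """Convert decibels back to an intensity ratio."""
    return 10.0 ** (db / 20.0)

for bits in (8, 16):
    levels = 2 ** bits - 1  # largest code relative to the smallest nonzero code
    print(f"{bits}-bit linear: {ratio_to_db(levels):.1f} dB")
print(f"120 dB: {db_to_ratio(120):,.0f}:1")
# prints:
# 8-bit linear: 48.1 dB
# 16-bit linear: 96.3 dB
# 120 dB: 1,000,000:1
```

Under these assumptions, 120 dB corresponds to a million-to-one ratio between the brightest and darkest levels, more than a linear 16-bit value can represent.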

Figure 2 highlights the area where data would be generated by the Metavision sensor. In this case, the video is of a person swinging a golf club. The highlighted area is where pixels are changing.

2. A conventional system would record full frames, but most of the pixels in the image don’t change significantly. That’s why data compression works well with video. The gray areas highlight what the Metavision chip would report as changes.

There’s a downside to the sensor: it doesn’t report the color of each pixel, only the changes. An application may combine a conventional image sensor with the Metavision sensor if color information is needed. Still, the Metavision sensor changes the way an application analyzes video, because the application may only be interested in following the changes in an image.

For instance, an application may be tracking a hand gesture. It doesn’t matter whether a person is wearing a blue glove or not. The application simply wants to recognize the gesture and the Metavision sensor can provide that information more economically.

Frame-Based vs. Event-Based

Figure 3 attempts to highlight the difference between a frame-based imaging system and an event-based system. Imagine that there’s a rotating disk with a dot on the periphery. The spiral is a mapping of the blue dot’s position over time. Orange and red dots highlight the blue dot’s position at particular points in time. The frames to the right show how a conventional imaging system would report the data, although the actual dot is all we’re concerned with. The circles and other dots are there to provide perspective.

3. Prophesee was demonstrating a rotating disk with a dot on the periphery. A conventional frame-based system would capture a full image, but the dot would be moving more quickly than the frames could be captured. The spiral highlights the data that the Metavision system would deliver.

The frame-based system would essentially show the dot jumping from one point to another. A sufficiently high frame rate would reveal the rotating nature of the system. Essentially, we have a Nyquist sampling issue.
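
The sampling issue can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the 50-rev/s disk speed and the frame rates are made-up values, and apparent_rotation_hz is a hypothetical helper. It computes the rotation rate an observer would infer from frames alone, which matches the true rate only when the frame rate exceeds twice the rotation rate.

```python
def apparent_rotation_hz(true_hz, frame_rate):
    """Rotation rate (rev/s) inferred from frames alone.

    Between frames the dot advances true_hz / frame_rate revolutions, but
    only the fractional part is visible, so the inferred rate wraps into
    the interval [-frame_rate / 2, frame_rate / 2).
    """
    half = frame_rate / 2.0
    return (true_hz + half) % frame_rate - half

true_hz = 50.0  # hypothetical disk speed in revolutions per second
for fps in (30, 60, 120, 1000):
    print(f"{fps:>5} frames/s -> appears to rotate at "
          f"{apparent_rotation_hz(true_hz, fps):+.1f} rev/s")
# prints:
#    30 frames/s -> appears to rotate at -10.0 rev/s
#    60 frames/s -> appears to rotate at -10.0 rev/s
#   120 frames/s -> appears to rotate at +50.0 rev/s
#  1000 frames/s -> appears to rotate at +50.0 rev/s
```

At 30 and 60 frames/s, the disk in this example would appear to turn slowly backward. The motion is only recovered correctly once the frame rate passes 100 frames/s.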

Of course, a sufficiently fast frame-based solution will provide enough information for the video to be analyzed. However, this also means that a lot of data must be processed. If a machine-learning algorithm is being applied, then even more processing power is necessary.

The event-based system would deliver significantly less data even though it could easily track the rotation. In fact, a microcontroller could easily handle the data from this type of imaging system, even with a more complex scene and scenario. Of course, it could be overwhelmed by massive changes, such as a multitude of changing light conditions, reflections, etc., but that would be unusual in most cases.
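
A back-of-the-envelope comparison shows the scale of the difference. The resolution, the 10,000-frames/s equivalent, and the 66-million-events/s peak are the figures quoted in this article. The rotating-dot scene (200 active pixels, each changing 1,000 times per second) is a purely hypothetical example, and the helper names are made up for this sketch.

```python
# Figures quoted in the article.
WIDTH, HEIGHT = 640, 480
EQUIVALENT_FRAME_RATE = 10_000   # frames/s a frame-based system would need
PEAK_EVENT_RATE = 66_000_000     # events/s, sensor maximum

def frame_pixel_rate(width, height, frames_per_second):
    """Pixel samples per second that a frame-based pipeline must process."""
    return width * height * frames_per_second

def scene_event_rate(active_pixels, changes_per_pixel_per_second):
    """Events per second when only some pixels change (hypothetical scene model)."""
    return active_pixels * changes_per_pixel_per_second

frame_rate = frame_pixel_rate(WIDTH, HEIGHT, EQUIVALENT_FRAME_RATE)
# Hypothetical numbers: a dot covering 200 pixels, each crossing its
# threshold 1,000 times per second.
dot_rate = scene_event_rate(200, 1_000)

print(f"Frame-based: {frame_rate:,} pixel samples/s")
print(f"Event-based peak: {PEAK_EVENT_RATE:,} events/s "
      f"({frame_rate / PEAK_EVENT_RATE:.1f}x fewer)")
print(f"Hypothetical dot scene: {dot_rate:,} events/s")
# prints:
# Frame-based: 3,072,000,000 pixel samples/s
# Event-based peak: 66,000,000 events/s (46.5x fewer)
# Hypothetical dot scene: 200,000 events/s
```

The comparison counts samples and events only. The number of bytes per event or per pixel depends on the encoding and isn’t assumed here.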

Likewise, many applications have a more controlled environment, or the areas within the scene may be partitioned or managed in some fashion. The threshold that the Metavision sensor has for each pixel can be adjusted as well, allowing areas to essentially be ignored.
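
One way to picture the effect of adjustable thresholds is a per-pixel threshold map in which ignored regions get an infinite threshold. This sketch is a software illustration only, not the sensor’s configuration interface. The function names, the row-based mask, and the sample values are all hypothetical.

```python
import math

def threshold_map(width, height, default, ignore_rows=()):
    """Per-pixel thresholds; rows listed in ignore_rows can never trigger."""
    return [[math.inf if y in ignore_rows else default for _ in range(width)]
            for y in range(height)]

def changed_pixels(prev, curr, thresholds):
    """Return (x, y) for pixels whose absolute change exceeds their threshold."""
    hits = []
    for y, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if abs(c - p) > thresholds[y][x]:
                hits.append((x, y))
    return hits

prev = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
curr = [[90, 10, 10], [10, 10, 60], [10, 10, 10]]

print(changed_pixels(prev, curr, threshold_map(3, 3, default=20)))
# prints: [(0, 0), (2, 1)]
print(changed_pixels(prev, curr, threshold_map(3, 3, default=20, ignore_rows={0})))
# prints: [(2, 1)]
```

With the top row masked, the large change at pixel (0, 0) is no longer reported, so downstream code never has to handle it.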

Prophesee is now delivering its sensor. “This is a major milestone for Prophesee and underscores the progress in commercializing our pioneering Event-Based Vision sensing technology. After several years of testing and prototyping, we can now offer product developers an off-the-shelf means to take advantage of the benefits of our machine-vision inventions that move the industry out of the traditional frame-based paradigm for image capture,” said Luca Verre, co-founder and CEO of Prophesee.

Frame-based video processing remains a useful paradigm. However, event-based video processing opens up a whole new area, potentially giving very-low-end platforms the ability to handle image-processing chores that they couldn’t handle if a frame-based input stream were used.

Development hardware and software available from Prophesee give developers the ability to implement the Metavision sensor right away. The current sensor has a 640 × 480 resolution with a 15-µm pixel size in a 0.75-in. format. It has a 0.04-lux low-light cutoff and under 1-mHz background noise activity. The sensor comes in a 13 × 15 mm PBGA package.