Electronic Design
Adjust Everything in Your Car with a Wave of the Hand

Thanks to gesture-recognition software, we’ll soon be able to control music, climate control, and other automotive functions through simple body movements.

Gesture recognition is the ability of a device to identify a series of human body movements. The technology relies on a camera and IC devices that scan the scene and identify it as a 2D or 3D profile. It also uses the time-of-flight (ToF) technique, which consists of sending an infrared beam at the target to be analyzed; the reflected signal is then processed by the receiving electronics.

Various IC solutions—with the aid of software algorithms for the recognition of gestures—create a depth map of the received images. As a result, they can respond in real time to the movements of the body. The algorithms also include a number of mathematical functions for facial recognition and voice- and eye-tracking.
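As a rough illustration of how a depth map can drive a real-time response, the sketch below tracks the horizontal position of whatever is closest to the camera across a few frames and labels the motion as a swipe. The function names, the 0.5-m range gate, and the travel threshold are hypothetical choices made for this example, not part of any specific IC vendor's software.

```python
def hand_centroid_x(depth_map, max_range_m=0.5):
    """Mean column index of pixels closer than max_range_m, or None if none are.

    depth_map is a list of rows; each value is a distance in meters.
    The range gate is an assumed, simplistic way to isolate a nearby hand.
    """
    cols = [c for row in depth_map for c, d in enumerate(row) if d < max_range_m]
    return sum(cols) / len(cols) if cols else None


def classify_swipe(frames, min_travel_px=3):
    """Label a sequence of depth maps as a left swipe, right swipe, or nothing."""
    xs = [x for x in (hand_centroid_x(f) for f in frames) if x is not None]
    if len(xs) < 2:
        return "none"
    travel = xs[-1] - xs[0]  # positive means motion toward higher column indices
    if travel >= min_travel_px:
        return "swipe_right"
    if travel <= -min_travel_px:
        return "swipe_left"
    return "none"


# Three tiny 1 x 6 depth maps: background at 2.0 m, a "hand" at 0.3 m moving right.
frames = [
    [[0.3, 2.0, 2.0, 2.0, 2.0, 2.0]],
    [[2.0, 2.0, 0.3, 2.0, 2.0, 2.0]],
    [[2.0, 2.0, 2.0, 2.0, 0.3, 2.0]],
]
gesture = classify_swipe(frames)  # "swipe_right"
```

A production system would segment the hand more carefully and classify far richer gestures; the point here is only that the depth channel makes the foreground easy to separate.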

On this front, automobiles are rapidly becoming camera-enabled. Using one or more image sensors, the cameras can capture the three-dimensional space, which makes it possible to develop products that transform imaging data into meaningful operations.

One key sensor-enabled technology is gesture recognition, which helps keep the driver’s eyes on the road while still controlling several functions safely. In the automotive market, the ToF technique is seen as a promising solution for implementing gesture interaction technology (Fig. 1).

1. The base components of a ToF camera system for gesture recognition are the image sensor, an objective lens, and an infrared illumination light source that emits RF modulated light. The processor and software algorithm scan the images to analyze the gesture and proceed to the recognition.

How Does It Work?

The goal of a ToF camera is to capture a whole image of a scene. These cameras consist of a transmitter (a lighting block that illuminates the region of interest with modulated light) and a receiving sensor (an array of pixels that collects light from the same region of interest). The vision-control algorithm then scans the images to analyze and recognize the gesture.

The objective of ToF sensors is to demodulate the reflected light, measuring at each pixel the correlation between the transmitted and reflected light. The pixels collect light from separate parts of a scene. Recombining their outputs creates a reconstructed scene.
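One widely used way to turn this per-pixel correlation into a distance is four-phase demodulation: the correlation is sampled at demodulation offsets of 0°, 90°, 180°, and 270°, and the phase delay of the reflected light follows from an arctangent. The article does not say which scheme a given sensor uses, so the sketch below is an assumption for illustration. The function names, the 20-MHz modulation frequency, and the amplitude and offset values are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def simulate_samples(distance_m, f_mod_hz, amplitude=100.0, offset=300.0):
    """Ideal correlation samples at 0, 90, 180, and 270 degrees for one pixel.

    Assumes a sinusoidal correlation: offset + amplitude * cos(phase - theta).
    """
    phase = 4 * math.pi * f_mod_hz * distance_m / SPEED_OF_LIGHT  # round trip
    return [offset + amplitude * math.cos(phase - math.radians(theta))
            for theta in (0, 90, 180, 270)]


def distance_from_samples(samples, f_mod_hz):
    """Recover distance from four correlation samples (four-phase method)."""
    c0, c90, c180, c270 = samples
    # The differences cancel the constant offset (ambient light in this model).
    phase = math.atan2(c90 - c270, c0 - c180) % (2 * math.pi)
    return SPEED_OF_LIGHT * phase / (4 * math.pi * f_mod_hz)


def unambiguous_range(f_mod_hz):
    """Largest distance measurable before the phase wraps around."""
    return SPEED_OF_LIGHT / (2 * f_mod_hz)


F_MOD = 20e6  # hypothetical modulation frequency
estimated = distance_from_samples(simulate_samples(0.6, F_MOD), F_MOD)
max_range = unambiguous_range(F_MOD)  # about 7.5 m at 20 MHz
```

Running the same calculation for every pixel in the array yields the depth map. A higher modulation frequency improves depth resolution but shortens the unambiguous range.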

All of the sensor’s pixels are controlled by an input of the correlation/demodulation block and the modulation block. The demodulation of the pixels is synchronous with the modulation of the transmitted light signal. In the simplest form, each pixel can be approximated by the model shown in Figure 2.

In reset mode, the pixel is reset by the RST (reset) signal to a preset voltage value. During integration time, the photocurrent is directed to Node-A or Node-B by activating the appropriate demodulation signals. In the reading stage, the demodulation is stopped and the address-decoding signals are activated to read the entire array in a programmed sequence. The ToF sensors use pixel technology based on a Current Assisted Photonic Demodulator (CAPD).

Node-A and Node-B (Fig. 2, again) consist of reverse-biased diodes. Modulation is accomplished by alternately changing the direction of the voltage applied between the DMIX0 and DMIX1 nodes. Because this modulation field is applied within the substrate, the generated electrons can be collected, which contributes to higher sensitivity. The voltage used for demodulation controls the intensity of the electric field and, thus, the drift velocity of the electrons generated.
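The reset, integration, and readout sequence can be mimicked with a toy model. The sketch below is a deliberately idealized assumption: it uses square-wave (pulsed) light, perfect steering of charge between the two nodes, and unit-less "electrons per time step." The class and function names are hypothetical and do not model a real CAPD device.

```python
class TwoTapPixel:
    """Toy model of one ToF pixel with two collection nodes (A and B)."""

    def __init__(self, preset=0):
        self.preset = preset
        self.node_a = preset
        self.node_b = preset

    def reset(self):
        """Reset stage: both nodes return to the preset level."""
        self.node_a = self.preset
        self.node_b = self.preset

    def integrate(self, electrons_per_step, steer_to_a):
        """Integration stage: steer each step's charge to Node-A or Node-B."""
        for electrons, to_a in zip(electrons_per_step, steer_to_a):
            if to_a:
                self.node_a += electrons
            else:
                self.node_b += electrons

    def read(self):
        """Readout stage: charge accumulated on each node since reset."""
        return self.node_a - self.preset, self.node_b - self.preset


def square_wave(period, high_steps, delay, cycles):
    """0/1 sequence that is high for high_steps per period, shifted by delay."""
    return [1 if (t - delay) % period < high_steps else 0
            for t in range(period * cycles)]


PERIOD, CYCLES, DELAY = 8, 10, 1
demod = [bool(v) for v in square_wave(PERIOD, PERIOD // 2, 0, CYCLES)]
light = square_wave(PERIOD, PERIOD // 2, DELAY, CYCLES)  # reflected light

pixel = TwoTapPixel()
pixel.reset()
pixel.integrate(light, demod)
a, b = pixel.read()                               # (30, 10)
estimated_delay = (PERIOD / 2) * b / (a + b)      # 1.0 time step
```

The later the reflected pulse arrives, the more of its charge lands on Node-B, so the ratio between the two nodes encodes the delay.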

The conversion of photons (reflected light on the array) into electrons is a quantum process governed by stochastic functions. In particular, the rate of photon generation follows a Poisson distribution. Similarly, the arrival of reflected photons from the target and their conversion into electrons within the pixel are quantum processes with a Poisson distribution. As a result, not all of the light that strikes the pixel converts into electrons.

2. Shown is a wiring diagram representative of a single pixel. The pixels are tasked with collecting light from separate parts of a scene. Recombining their outputs with a software algorithm creates a reconstructed image used in machine-vision or gesture-recognition systems.

To check the quality of the system, it’s best to measure the quantum efficiency as a function of the wavelength of the light used in the transmission:

n(λ) = n_e / n_p

where n_e is the number of electrons produced and n_p is the number of photons that activate the corresponding pixels.

The number of electrons has two components: one produced by the modulated light and the other by ambient light, which acts as noise and lowers the system signal-to-noise ratio (SNR). Given the quantum efficiency described in the equation above, we can define the responsivity as:

R(λ) = n(λ) · (λ · q_e) / (h · c)

where c is the speed of light, h is the Planck constant, q_e is the charge of a single electron, and λ is the wavelength of the light used. When DMIX0 is low and DMIX1 is high (Fig. 2, again), all of the generated electrons are collected by Node-A, while none are collected by Node-B. In this case, the demodulation is said to be perfect.

Conversely, when DMIX1 is low and DMIX0 is high, all of the generated electrons should be collected by Node-B and none by Node-A. In practice, this ideal condition is never fully met. The measure of how closely a pixel approaches it is called demodulation contrast, whose ideal value is one. The demodulation contrast only affects how the electrons are distributed between the nodes, not how many are generated. As a result, SNR increases in proportion to the demodulation contrast.
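The quantities in this section can be tied together numerically. The sketch below computes quantum efficiency and responsivity from the two equations given earlier, then adds two pieces that are assumptions rather than statements from the article: a common definition of demodulation contrast as (A − B)/(A + B), and a simplified shot-noise SNR model in which Poisson noise from signal and ambient electrons is the only noise source. All function names and sample numbers are hypothetical.

```python
import math

PLANCK_H = 6.62607015e-34          # J*s
SPEED_OF_LIGHT = 299_792_458.0     # m/s
ELECTRON_CHARGE = 1.602176634e-19  # C


def quantum_efficiency(n_electrons, n_photons):
    """Electrons produced per incident photon."""
    return n_electrons / n_photons


def responsivity(qe, wavelength_m):
    """Responsivity in A/W: R = qe * (wavelength * q_e) / (h * c)."""
    return qe * wavelength_m * ELECTRON_CHARGE / (PLANCK_H * SPEED_OF_LIGHT)


def demodulation_contrast(n_a, n_b):
    """Assumed definition: 1.0 when every electron reaches the intended node."""
    return (n_a - n_b) / (n_a + n_b)


def shot_noise_snr(signal_e, ambient_e, contrast):
    """Simplified model: Poisson noise only, SNR proportional to contrast."""
    return contrast * signal_e / math.sqrt(signal_e + ambient_e)


qe = quantum_efficiency(300, 1000)            # 0.3
r_850nm = responsivity(qe, 850e-9)            # roughly 0.21 A/W
contrast = demodulation_contrast(900, 100)    # 0.8
snr = shot_noise_snr(10_000, 30_000, contrast)  # 40.0
```

In this model, ambient light adds noise without adding signal, which is why the ambient electron count appears only under the square root.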

When designing a ToF system, field of view (FoV) must be chosen appropriately according to the scene coverage requirements. For example, in gesture recognition for laptops, a large FoV is more suitable because the subjects are close to the camera. On the other hand, for televisions, a narrower FoV may be appropriate because the subjects are distant.
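The trade-off follows from simple geometry: a scene of width w at distance d needs a field of view of 2·atan(w / 2d). The short sketch below applies that relation; the function names and the example dimensions are hypothetical and ignore lens distortion.

```python
import math


def required_fov_deg(scene_width_m, distance_m):
    """Horizontal field of view needed to cover a scene width at a distance."""
    return math.degrees(2 * math.atan(scene_width_m / (2 * distance_m)))


def coverage_width_m(fov_deg, distance_m):
    """Scene width covered by a given field of view at a given distance."""
    return 2 * distance_m * math.tan(math.radians(fov_deg) / 2)


# Laptop-style case: a 1-m-wide interaction zone only 0.5 m away.
near_fov = required_fov_deg(1.0, 0.5)   # about 90 degrees

# Television-style case: a 2-m-wide zone viewed from 3 m away.
far_fov = required_fov_deg(2.0, 3.0)    # about 37 degrees
```

The same relation applies in a car cabin, where the hand is typically close to the sensor and a wide field of view is therefore likely to be needed.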


Suppose we want to start the rear wipers just by moving our eyes to the rear-view mirror or turn on the radio by simply moving our eyes to the console. These are among the many examples and functions that the control algorithm must be able to decode in real time.

One critical feature is the ability to monitor psycho-physical symptoms that could have negative consequences while driving, such as extreme fatigue. Modern eye-tracking systems use infrared LEDs (IREDs) as the source of illumination and high-resolution cameras to detect the reflected light. The algorithms process the raw data and calculate the position of the pupils.
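A minimal way to picture the pupil-position step is a threshold-and-centroid calculation on a grayscale infrared image, assuming the pupil shows up as the darkest region (the "dark pupil" arrangement). This is only a sketch under that assumption. Production eye trackers use considerably more robust methods, and the function name and threshold value here are hypothetical.

```python
def pupil_center(image, threshold=50):
    """Return the (row, col) centroid of pixels darker than threshold.

    image is a list of rows of grayscale values (0 = black, 255 = white).
    Returns None when no pixel is dark enough to count as pupil.
    """
    count = row_sum = col_sum = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value < threshold:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return row_sum / count, col_sum / count


# 5 x 5 test image: bright iris/skin (200) with a dark 2 x 2 pupil (10).
image = [[200] * 5 for _ in range(5)]
for r in (1, 2):
    for c in (2, 3):
        image[r][c] = 10

center = pupil_center(image)  # (1.5, 2.5)
```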

3. An eye-tracking system generally comprises two components: a light source directed toward the eye, and a camera. The goal of the camera is to track the reflection of the light source along with ocular features. Other components such as a display and processor can be included in eye-tracking systems for medical applications.

From frontal images of the driver, the system can work back to the area where the driver is looking. The infrared lighting ensures good contrast between the iris and pupil, regardless of eye color and environmental conditions (Fig. 3).
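Once the gaze point is known, decoding it into a command can be as simple as checking which cabin region it falls in and requiring the gaze to dwell there for a minimum time. The sketch below illustrates that idea using the rear-view mirror and console examples mentioned earlier. The region coordinates, action names, and the 15-frame dwell time are hypothetical values chosen for the example.

```python
# Regions in normalized image coordinates: (x_min, y_min, x_max, y_max).
# These boxes are made up for illustration.
REGIONS = {
    "rear_view_mirror": (0.40, 0.00, 0.60, 0.15),
    "console": (0.35, 0.60, 0.65, 0.95),
}
ACTIONS = {
    "rear_view_mirror": "start_rear_wipers",
    "console": "turn_on_radio",
}


def region_at(x, y):
    """Name of the region containing the gaze point, or None."""
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None


def decode_gaze(samples, dwell_frames=15):
    """Return the actions triggered by a sequence of (x, y) gaze samples.

    An action fires once when the gaze has stayed in the same region for
    dwell_frames consecutive samples, which filters out brief glances.
    """
    actions = []
    current, count = None, 0
    for x, y in samples:
        region = region_at(x, y)
        if region == current:
            count += 1
        else:
            current, count = region, 1
        if current is not None and count == dwell_frames:
            actions.append(ACTIONS[current])
    return actions


samples = [(0.5, 0.05)] * 20 + [(0.9, 0.5)] * 3 + [(0.5, 0.8)] * 14
triggered = decode_gaze(samples)  # ["start_rear_wipers"]
```

The dwell requirement is one simple guard against unintended activation: the 14-frame look at the console in the example is too short to trigger anything.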

The Future

Many challenges must be overcome for the automotive sector to fully adopt and implement gesture-recognition technology. The first systems will be testbeds that support the development of gesture-recognition platforms, as well as rapid-prototyping solutions to speed time-to-market.

The new technology is already enabling automobile manufacturers to integrate high-tech features in their cars, so that the driver can check the safety of the vehicle control systems. For example, an approaching hand might activate the infotainment system, while tilting the head could turn on the direction indicator. Interpreting such gestures incorrectly, however, might jeopardize the safety of the driver or others around the vehicle.

Multimodal human-machine interaction (HMI) is used in today’s vehicles to offer the driver various redundant ways to control functions. Voice and touch have already become standard features. But for gesture recognition in cars to succeed in simplifying driving, comfort, or infotainment features, these systems must ensure that they’re reading gestures appropriately. Thus, manufacturers are turning to the science of gesture recognition, which interprets human gestures as input commands by using mathematical algorithms. This may include small changes in facial expression, but also body motion.

It’s expected that the gesture-recognition system will be the next-generation in-car user interface. As such, semiconductor suppliers are developing the hardware and software algorithms to enable user command input with natural hand and finger movements.
