As mobile phones begin taking advantage of LTE and 4G networks, the performance demands on in-vehicle hands-free systems will increase. Where yesterday almost any more or less comprehensible conversation was acceptable, today users expect a level of clarity and intelligibility that was unknown just a few years ago.
To deliver this performance, hands-free systems have become very sophisticated and complex, which makes evaluating them difficult and time-consuming. Designers need to be aware of key features that contribute to system performance before choosing a hands-free system for their automotive projects.
Acoustic Echo Cancellation
In a hands-free call, the downlink speech signal carrying the far-end talker’s words is sent to the vehicle loudspeakers. When the sound is played over these loudspeakers, it is picked up by the near-end microphones and sent back to the far end, creating an acoustic echo that can compromise the clarity of a conversation.
Acoustic echo cancellation (AEC) prevents far-end sounds from echoing back to the far end. It requires complex, computation-intensive procedures because it must cancel only unwanted sounds and function effectively in conditions of high noise and acoustic coupling, as well as during echo path variations caused by the vehicle occupants moving (Fig. 1).
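As a rough illustration of the adaptive filtering behind echo cancellation, the sketch below uses a normalized least-mean-squares (NLMS) filter to estimate the loudspeaker-to-microphone echo path and subtract the estimated echo from the microphone signal. NLMS is a common textbook technique chosen here for illustration, not necessarily what any production hands-free system uses. The function name, tap count, step size, and test signal are all assumptions, and the sketch ignores double talk, background noise, and echo path changes, which are the hard parts in a vehicle.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS sketch)."""
    w = [0.0] * taps   # running estimate of the echo path
    x = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_est                      # residual that goes to the far end
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out


# Hypothetical test case: white noise played through a short, fixed echo path.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
path = [0.6, 0.3, -0.2, 0.1]
mic = [sum(path[k] * far[n - k] for k in range(len(path)) if n - k >= 0)
       for n in range(len(far))]

residual = nlms_echo_cancel(far, mic)
echo_power = sum(v * v for v in mic[-500:]) / 500
residual_power = sum(v * v for v in residual[-500:]) / 500
```

Once the filter has converged on this idealized signal, residual_power is far below echo_power. A real cabin needs a much longer filter and logic to pause adaptation while the near-end person is talking.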
Noise Reduction And Speech Reconstruction
Automobiles are subject to large and rapid fluctuations in noise levels from HVAC systems, the engine, and other moving parts. Most engine and road noise is low frequency, but it can’t be removed by simply filtering out these frequencies, because that would also thin the near-end person’s voice.
To make optimal use of available bandwidth, mobile phone networks (CDMA and some GSM) “gate” transmission. When the near-end person pauses or speaks very softly, the network stops transmitting, reopening transmission when the person resumes speaking at sufficient volume (Fig. 2). Because conversations from a vehicle often carry a lot of background noise, the person on the far end of the conversation hears an unpleasant oscillation between conversation with background noise and sudden silences.
Reducing the unpleasant effect of network gating demands noise reduction and improved signal-to-noise ratio (SNR), especially in the lower frequencies. To be effective in an automobile, noise reduction requires more than 16 dB of noise attenuation. This is now possible without damaging speech quality by using dynamic noise removal, speech reconstruction, and advanced algorithms that mitigate issues such as CDMA gating on downstream cell-phone codecs.
The system should do more than basic noise removal, and it shouldn’t use algorithms that fake greater noise attenuation by gating the speech themselves. Gating inside the hands-free system only adds to the choppiness that network gating already causes.
Multi-Channel Support
Hands-free systems are often required to support either one or many talkers at the near end and to block some of them during activities that use voice recognition. These sorts of spatially selective tasks are often at odds with the requirement to reduce background noise.
To meet these conflicting requirements, the hands-free system can use two or more well-spaced microphones and specialized algorithms for handling multi-channel input. Multiple well-placed microphones pick up each talker’s speech more clearly, which helps reduce noise. Multi-channel processing can also improve voice recognition by blocking off-axis speech and noise sources.
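One simple way to see how multiple microphones favor one talker over off-axis sources is a delay-and-sum beamformer: each channel is shifted so that the wanted talker lines up across microphones, and the channels are then averaged. The sketch below is a minimal illustration under stated assumptions (integer sample delays that are already known, two microphones, a sine wave standing in for speech). The function name and test values are hypothetical, and production systems use considerably more elaborate multi-channel algorithms.

```python
import math


def delay_and_sum(channels, delays):
    """Advance each channel by its integer sample delay, then average them."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(length)]


# Hypothetical case: the talker's speech reaches mic 2 three samples after mic 1.
speech = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
mic1 = speech
mic2 = [0.0] * 3 + speech[:-3]

steered = delay_and_sum([mic1, mic2], [0, 3])     # aligned on the talker
unsteered = delay_and_sum([mic1, mic2], [0, 0])   # no alignment

steered_power = sum(steered[n] ** 2 for n in range(3, 197))
unsteered_power = sum(unsteered[n] ** 2 for n in range(3, 197))
```

When the delays match the talker, the channels add coherently and the speech is preserved. A source arriving with different delays adds less coherently and is attenuated, which is why unsteered_power is lower here.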
Microphone Array Independence
A vehicle manufacturer may use a variety of microphone arrays and configurations. Even within the same project, different vehicle trims may use different arrays. A good hands-free system can be adapted to multiple arrays and configurations without changes to the software.
Many algorithms that work well in an anechoic chamber don’t work in a car due to the hard glass and plastic surfaces near the microphones. Further, many solutions fail robustness tests when faced with a sensitivity mismatch or loss of a microphone. Look for a system with a record of multi-channel support in automotive environments.
Automatic Gain Control
Automatic gain control is used to maintain consistent near-end voice levels without increasing gain in response to wind buffeting, GSM buzz from the mobile phone, and other noise sources. If a person in the back seat speaks, the automatic gain controller (AGC) should amplify the person’s voice automatically. The person on the far end shouldn’t need to ask anyone to speak up.
The AGC must also function equally well for the receiving (downlink) side of a conversation. Far-side terminals, cell phones, networks, and the people speaking all contribute to huge variations in incoming voice levels. These variations can reach 30 dB very rapidly. The AGC should compensate for them while avoiding sudden or otherwise noticeable changes.
The AGC should compensate for quantization noise (or quantization error: the difference between the analog and digital signal values) and network dropouts. A soft look-ahead limiter in the AGC is an effective technique for handling sudden changes in loudness and ensuring that speech is neither clipped nor distorted.
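The look-ahead idea can be sketched as follows: the limiter inspects samples slightly ahead of the one it is about to output, so it can start easing the gain down before a loud burst arrives instead of clipping it. This is a minimal offline sketch, assuming a fixed ceiling and a short look-ahead window (both hypothetical values). It takes the minimum required gain over the look-ahead window and then averages it, one common way to get a smooth gain curve that still never exceeds the ceiling. In a real-time system the look-ahead corresponds to delaying the output by the same number of samples.

```python
def lookahead_limit(samples, ceiling=0.8, lookahead=8):
    """Keep |output| <= ceiling using a smoothed, look-ahead gain curve."""
    # Gain each sample would need on its own to stay under the ceiling.
    needed = [min(1.0, ceiling / abs(s)) if s != 0 else 1.0 for s in samples]
    # Look ahead: smallest gain needed anywhere in the next `lookahead` samples.
    ahead = [min(needed[n:n + lookahead + 1]) for n in range(len(samples))]
    out = []
    for n, s in enumerate(samples):
        # Averaging the look-ahead minima ramps the gain instead of stepping it.
        window = ahead[max(0, n - lookahead):n + 1]
        gain = sum(window) / len(window)
        out.append(s * gain)
    return out


# Hypothetical input: quiet speech level with a sudden loud burst in the middle.
burst = [0.1] * 50 + [2.0] * 20 + [0.1] * 50
limited = lookahead_limit(burst)
```

In this example the gain starts to fall a few samples before the burst and recovers gradually after it, so the burst is held at the ceiling without a hard clip.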
Equalization
Microphones, amplifiers, speakers, phones, far-side terminals, and cabin acoustics all vary, so a hands-free system must accommodate significant differences in send and receive frequency response. It should, therefore, include a simple means of implementing an equalizer (EQ), such as a parametric equalizer, that allows specification of node centers, widths, and gain values.
The best EQ for low noise conditions often isn’t the best choice for high noise conditions. Many systems use the same EQ for low and high noise conditions. A more elegant solution involves specifying separate EQ curves for low and high noise conditions, then automatically and seamlessly blending between these curves based on the SNR.
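The blending described above can be sketched as a per-band interpolation between two gain curves, with the blend weight driven by the measured SNR. The SNR thresholds, band gains, and function name below are hypothetical, and a real system would also smooth the SNR estimate over time so the curve doesn't jump.

```python
def blend_eq(low_noise_db, high_noise_db, snr_db, snr_low=5.0, snr_high=25.0):
    """Interpolate per-band EQ gains (dB) between two curves based on SNR."""
    t = (snr_db - snr_low) / (snr_high - snr_low)
    t = max(0.0, min(1.0, t))   # 0 = noisy cabin, 1 = quiet cabin
    return [t * quiet + (1.0 - t) * noisy
            for quiet, noisy in zip(low_noise_db, high_noise_db)]


# Hypothetical three-band curves (gain in dB per band).
quiet_curve = [0.0, 2.0, 4.0]
noisy_curve = [6.0, 2.0, -2.0]
midway = blend_eq(quiet_curve, noisy_curve, snr_db=15.0)   # [3.0, 2.0, 1.0]
```

At or above snr_high the low-noise curve is used unchanged, at or below snr_low the high-noise curve is used, and in between the gains move linearly from one to the other.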
Wind Buffet Suppression
Automobile interiors are filled with constantly changing winds and turbulence from HVAC systems, open windows, and other sources. Unfortunately, directional microphones, which are the microphones of choice for in-vehicle systems, are more susceptible to wind buffeting than other microphone designs.
To counter these effects, a hands-free system needs wind buffet suppression capabilities. The algorithm it uses should not be limited to passive high-pass filtering of the microphone signal, but should identify and selectively remove the acoustic effects of wind and turbulence.
Intelligibility Enhancement
Lower-frequency consonants (e.g., /p/, /t/, and /k/) are often masked by the noise in the vehicle, which is predominantly low frequency. But higher-frequency consonants (e.g., /s/ and /f/), which many people find difficult to distinguish even in the best acoustic environments, often aren’t masked, making them good candidates for effective intelligibility enhancement.
Noise removal doesn’t directly improve the intelligibility of a call, because it only removes information. However, it can be used with intelligibility enhancement techniques that reconstruct masked low-energy consonants and boost high-energy consonants to make them easier to distinguish.
Noise-Dependent Receive Gain
To help compensate for loudness masking, the hands-free system should automatically adjust the receive level supplied to the head unit, based on the noise in the vehicle. By removing sudden fluctuations in speech levels, this technique can help reduce driver distraction because less effort is required to carry on a conversation.
Gain adjustments use ambient noise compensation or dynamic level control algorithms. These algorithms should provide smooth responses to changes in noise levels:
- If the adjustment is too slow, the driver may reach for the volume control, which can provoke “gain chasing.”
- If the adjustment is too quick, the sudden change may seem unnatural or even startle the driver.
Look for a solution that supports parametric control of the gain change rates and has a proven record in production. Look also for algorithms that make minimal assumptions. The best make only one: that the noise measured at the microphone is the same as at the ear.
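The trade-off between adjusting too slowly and too quickly can be sketched as a receive gain that follows the cabin noise level but is only allowed to change by a limited amount per update. Everything below is a hypothetical parameterization for illustration (the quiet-cabin reference level, the slope, the maximum boost, and the rise and fall rates), not a description of any particular product.

```python
def receive_gain_track(noise_db, quiet_db=50.0, slope=0.5, max_boost_db=12.0,
                       rise_db_per_step=0.5, fall_db_per_step=0.25):
    """Return the receive boost (dB) at each update, rate-limited both ways."""
    gain = 0.0
    track = []
    for level in noise_db:
        # Target boost grows with noise above the quiet-cabin reference.
        target = max(0.0, min(max_boost_db, slope * (level - quiet_db)))
        # Limit how far the gain may move in one update.
        step = max(-fall_db_per_step, min(rise_db_per_step, target - gain))
        gain += step
        track.append(gain)
    return track


# Hypothetical noise profile: quiet cabin, 20 dB louder for a while, quiet again.
noise = [50.0] * 5 + [70.0] * 40 + [50.0] * 60
track = receive_gain_track(noise)
```

Here the boost ramps up to 10 dB over 20 updates and falls back more slowly. Exposing the rise and fall rates as parameters is what allows the response to be tuned between sluggish and startling.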
Bandwidth Extension
Bandwidth extension (BWE) can improve the quality of far-end speech. It uses filtering and wave shaping to reconstruct high-frequency and low-frequency portions of the incoming signal that were dropped by the narrowband mobile network.
BWE is difficult to implement, because there is often little speech information to work with and it requires a judicious balance between too little reconstruction and too much. Too little brings little benefit, while too much produces exaggerated bass and treble speech components. The best measure of a BWE solution is its record of acceptance by users in the field.
Cellular networks are beginning to support bandwidths greater than 3.5 kHz. With the advent of 4G and LTE networks and devices, mobile Voice over Internet Protocol (VoIP) calls will soon follow. Figure 3 shows how users’ perceptions of network quality improve with an LTE network.
Since a hands-free system should remain current for the life of the vehicle (more than 10 years), any system that can’t run at a sample rate of 16 kHz is already dated. Look for a supplier that is working to support a 32-kHz sample rate and has experience with VoIP codecs.
While the frequency range of calls may quadruple in the next few years, it is unlikely that computing resources will keep pace. Hence, the best solutions will offer super-wideband processing without requiring an increase in computing resources.
- See Shreyas Paranjpe et al., “Acoustic Echo Cancellation for Wideband Audio.”
- From Kari Järvinen et al., “Media coding for the next generation mobile system LTE,” Computer Communications 33 (2010): 1916-1927.