Smart Moves

When handset met handheld, two different audio worlds collided. Several years on, analogue engineers are still perfecting integrated solutions to handle voice, music playback and ring tones.

There was a time when digital audio was neatly segregated into Hi-Fi and telephony. Hi-Fi generally meant stereo and 16-bit resolution, sampled at 44.1kHz – the original Compact Disc (TM) specification. Telephony was mono and low-resolution, typically digitised at 8 bits and 8kHz.

Different types of mixed-signal ICs appeared to suit each application. Hi-Fi audio codecs were quick to make use of multi-bit sigma-delta technology to improve sound quality, while phone parts remained much simpler, the low data rate and low-cost transducers restricting the scope for quality improvements.

The two types of codec also had different interfaces. A number of data formats emerged for Hi-Fi stereo, the most widely used today being I2S (Inter-IC Sound). Telephony codecs generally have a PCM (Pulse Code Modulation) interface. Strictly speaking, the term PCM encompasses most digital formats in use today, including I2S; its original purpose was to distinguish between digital coding and analogue technologies like frequency modulation. However, in digital telephony, PCM usually refers to one specific, monophonic data format that is incompatible with Hi-Fi stereo.

The rise of computer audio spawned yet another type of interface. While the quality requirements were similar to the established consumer audio market, there was a need to play audio files recorded at different sample rates (notably 8kHz, 44.1kHz and 48kHz). Sample rate conversion in software was possible, but computationally expensive. The AC'97 standard delegated this task to the codec, where it can be performed more efficiently by dedicated hardware. AC'97 has become the de facto industry standard for computer audio.

Portable systems initially remained true to their origins: personal CD, minidisc and MP3 players used I2S DACs, mobile phones stayed with PCM and audio-enabled PDAs generally had the same AC'97 codecs as desktop computers. It is hardly surprising, therefore, that the first generation of combined systems usually consisted of phone and PDA circuitry arranged side by side in one box, with a PCM voice codec controlled by a communications processor and a Hi-Fi stereo (AC'97 or I2S) codec connected to an applications processor.

However, codecs not designed with this application in mind offered little or no provision for interconnections between the two audio subsystems. Discrete solid-state switches were often inserted into analogue signal paths, introducing pops, clicks and harmonic distortion and taking up board space.

An integrated solution tailored to the application is preferable. The SoC philosophy has led some vendors to integrate stereo DACs or codecs with other large ICs. However, this approach does not yield the audio quality that can be obtained with dedicated audio chips. Combined power management and audio ICs tend to compromise audio quality, as power regulators often inject noise into nearby audio signal paths.

Integrating audio into digital ICs is equally problematic because true Hi-Fi components typically require a 0.35µm process optimised for mixed-signal applications, whereas digital logic has moved down to 0.18µm and beyond. For both types of circuitry to coexist on one chip, either performance in the analogue domain is compromised, or – if the whole IC were to be built on a larger geometry – the die would grow to an unacceptable size.

Loudspeaker amplifiers are particularly hard to integrate, as they generate significant amounts of heat that need to be dissipated. Many combined chips lack this function, and thus cannot be considered true 'system on a chip' solutions because an external speaker driver IC is required. Another common problem is an insufficient number of analogue inputs or outputs, due to a desire to keep the IC as small as possible. In square packages with pins arranged around the perimeter, such as the popular QFN (quad flat no-lead) package, extending the length of each side by, say, 1mm to accommodate a few extra pins adds far more to the IC's footprint if the package was already large to begin with.
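The footprint argument is simple geometry: growing a square of side s by 1mm adds (s + 1)² − s² = 2s + 1 square millimetres, so the penalty scales with the starting size. A minimal sketch, in which the 4mm and 7mm package sizes are illustrative assumptions rather than figures from any particular part:

```python
def footprint_increase_mm2(side_mm: float, extra_mm: float = 1.0) -> float:
    """Extra board area when each side of a square package grows by extra_mm."""
    return (side_mm + extra_mm) ** 2 - side_mm ** 2


# Hypothetical package sizes, chosen only to show the trend.
for side in (4.0, 7.0):
    added = footprint_increase_mm2(side)
    print(f"{side:.0f}mm -> {side + 1:.0f}mm per side: +{added:.0f} mm^2")
```

The same 1mm extension costs 9mm² on the smaller package and 15mm² on the larger one.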

Dedicated audio ICs avoid these problems. The overall chip count can still be reduced by integrating other mixed-signal functions, such as touch screen digitisation, with the voice and Hi-Fi codecs. Where the voice codec is integrated into a telephony chipset, a Hi-Fi codec with extra analogue inputs, outputs and internal mixing may be appropriate.

Audio integration can be achieved in a number of ways. Sharing the ADCs and DACs reduces hardware cost, but makes it impossible to play or record two audio streams simultaneously. Having dedicated converters for each function overcomes this problem and prolongs battery life, as telephony-grade audio blocks can be designed with lower power consumption than Hi-Fi functions. However, such a solution increases silicon cost. A common compromise is to have separate DACs, but share the ADCs. This permits audio playback while a phone call is in progress but no recording to the applications processor during a call. The ADC's power consumption can be kept in check by powering one channel off and running the other at a lower sample rate.

While it is possible to share internal circuit blocks between the communications and applications domains, the same is not true for the interface. This is because each audio stream runs on a separate clock domain with its own clock frequency. As long as this remains the case, combined smartphone codecs need both a PCM interface and a separate I2S or AC'97 connection.

In stationary systems, audio clocks are usually generated by a crystal oscillator. For example, AC'97 specifies that compliant codecs have an on-chip oscillator that connects to an external 24.576MHz (512 × 48kHz) crystal, while I2S parts use a clock that is a multiple of the sample rate, most often 256 times.
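The clock arithmetic can be checked in a few lines. This is only a sketch; the function name and the choice of sample rates are illustrative, while the 512 and 256 multiples come from the text above.

```python
AC97_CRYSTAL_HZ = 24_576_000  # 24.576MHz crystal specified for AC'97 codecs


def master_clock_hz(sample_rate_hz: int, multiple: int = 256) -> int:
    """Master clock as an integer multiple of the audio sample rate."""
    return sample_rate_hz * multiple


# The AC'97 crystal is exactly 512 times the 48kHz sample rate.
assert master_clock_hz(48_000, multiple=512) == AC97_CRYSTAL_HZ

# Typical I2S master clocks at 256 times the sample rate.
for rate in (8_000, 44_100, 48_000):
    print(f"{rate} Hz sample rate -> {master_clock_hz(rate)} Hz master clock")
```

At 256 times the sample rate, 44.1kHz audio needs 11.2896MHz and 48kHz audio needs 12.288MHz, which is why a single fixed crystal cannot serve both families of sample rates directly.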

In smartphone design, the extra power consumption, board space and cost of clock crystals have led designers to derive the Hi-Fi audio clock from another clock already present on the board. The odd frequency ratios involved require a phase-locked loop (PLL), but this solution is still preferred to an extra crystal because low-power, low-noise PLLs can be integrated into mixed-signal ICs at relatively low cost.
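To see why the ratios are 'odd', consider a rough example. The 13MHz reference below is an assumption made for illustration (the article does not name a specific system clock), and the targets are 256 times the 44.1kHz and 48kHz sample rates.

```python
from fractions import Fraction

REFERENCE_HZ = 13_000_000  # assumed existing system clock, for illustration only


def pll_ratio(reference_hz: int, target_hz: int) -> Fraction:
    """Exact multiplication ratio a PLL must synthesise, in lowest terms."""
    return Fraction(target_hz, reference_hz)


for sample_rate in (44_100, 48_000):
    target = 256 * sample_rate
    ratio = pll_ratio(REFERENCE_HZ, target)
    print(f"{target} Hz = {REFERENCE_HZ} Hz x {ratio.numerator}/{ratio.denominator}")
```

Under these assumptions the ratios reduce to 7056/8125 and 1536/1625, neither of which can be produced by a simple integer divider, hence the need for a PLL.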

Many of the toughest design problems in smartphones are related to microphones. There are usually at least two microphones to consider: the built-in (internal) microphone and the external microphone that is part of the headset. Besides phone calls, these microphones can also be used for recording voice notes, or even the audio track of a video clip under the control of the applications processor.

To eliminate off-chip switching, smartphone codecs need to provide sufficient microphone inputs, preferably with individually adjustable gains, and flexible internal routing to cover all usage scenarios. Besides recording, a 'side tone' function should also be provided. This adds an attenuated version of the microphone signal to the analogue outputs, so that callers using a headset can hear their own voice. Insertion detection enables seamless switching between internal and external microphones when a headset is plugged in or disconnected.

Noise is another common concern. The high-frequency and digital parts of the circuit generate interference that is picked up by PCB tracks carrying microphone signals, and boosted by on-chip preamplifiers. While careful PCB layout plays a large part in avoiding this problem, differential microphone inputs are another effective counter-measure. However, differential inputs have their own layout requirements: the two PCB tracks must run in parallel and next to each other, so that any noise picked up in one track is also present in the other, and is therefore cancelled out in the microphone preamplifier.
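The cancellation can be shown with an idealised numerical sketch. It assumes that both tracks pick up exactly the same interference, which real layouts only approximate; the sample values are invented for the example.

```python
def differential_input(positive, negative):
    """Ideal differential preamplifier: outputs the difference of its two inputs."""
    return [p - n for p, n in zip(positive, negative)]


# Invented sample values for the wanted signal and the coupled interference.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
noise = [0.25, -0.125, 0.5, 0.125, -0.25, 0.375]

# The microphone drives the two tracks in anti-phase; the interference
# couples equally (common mode) into both.
track_p = [s / 2 + n for s, n in zip(signal, noise)]
track_n = [-s / 2 + n for s, n in zip(signal, noise)]

recovered = differential_input(track_p, track_n)
print(recovered == signal)
```

If the tracks are routed apart, the noise terms on each track differ and the residue after subtraction is no longer zero, which is why the layout rule matters.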

Acoustic noise cancellation is a separate problem and requires two microphones: one picks up the speaker's voice with background noise, the other only background noise. A simple subtraction in the analogue domain rarely yields satisfactory results because the two noise signals will differ in phase and amplitude depending on which direction the noise is coming from. Digital signal processing is needed here. However, the codec must facilitate the task by digitising two microphone signals.

Another type of noise occurring in outdoor use is wind noise. This is mostly confined to frequencies below 200Hz and can therefore be greatly reduced with a high-pass filter. The simplest solution is to use a smaller coupling capacitor at the microphone input. However, this prevents the microphone from being used for indoor music recording – there would be no bass. For dual-use microphones, the filtering should therefore be optional. Incidentally, most audio ADCs already have a built-in high-pass filter to remove DC bias from the digital signal. IC vendors have customised this feature for mobile applications by making the corner frequency selectable – a few Hz for Hi-Fi and somewhere between 100 and 200Hz for voice with wind noise filtering enabled. Naturally, analogue and digital filtering can also be combined to create a higher-order filter characteristic.
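Both filter options can be sketched briefly. The analogue corner frequency follows the usual first-order relation f = 1/(2πRC); the digital filter below is a generic discretised first-order high-pass, not the implementation used in any particular codec, and the component values and the 150Hz corner are assumptions chosen for illustration.

```python
import math


def rc_corner_hz(r_ohms: float, c_farads: float) -> float:
    """Corner frequency of a first-order RC high-pass (coupling capacitor + input resistance)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def high_pass(samples, corner_hz: float, sample_rate_hz: float):
    """Generic first-order digital high-pass filter (discretised RC network)."""
    rc = 1.0 / (2.0 * math.pi * corner_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


# Assumed values: 10 kOhm input resistance and a 100nF coupling capacitor.
print(f"analogue corner: {rc_corner_hz(10_000, 100e-9):.1f} Hz")

# A constant (DC) input decays towards zero at the filter output.
dc_response = high_pass([1.0] * 8000, corner_hz=150.0, sample_rate_hz=8000)
print(f"DC residue after one second: {abs(dc_response[-1]):.2e}")
```

Making `corner_hz` a parameter mirrors the selectable corner frequency described above: a few Hz leaves the bass intact for music recording, while a setting between 100 and 200Hz suppresses wind noise for voice.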

Handling mobile phone headsets also requires specific analogue circuitry. The first obvious task is to re-route output signals from the earpiece or other speaker to the headset when it is plugged in. Although sockets with integrated mechanical switches can do this, they are bulky and expensive. Moreover, the signal level used in a speaker may not be appropriate for the headset. Separate analogue outputs for the earpiece, speakers and headset with separate volume controls will solve this problem and allow for using a simpler socket. Although a mechanical switch is still needed, a single-pole, single-throw type with one end connected to the ground pin is sufficient, so that the socket only needs one extra pin. However, in a multimedia phone, activation of this switch does not necessarily indicate that a headset has been inserted; in a standard size socket, it could just as well be a headphone that does not include a microphone. The presence or absence of a microphone should therefore be detected separately. This can be achieved by sensing the microphone's bias current – if no current is flowing, no microphone is plugged in. Conversely, an unusually large bias current is also significant: in order to avoid adding another contact to standard headphone / headset jacks, the button used to answer a call from the headset (the so-called hook switch) usually shorts out the microphone. As a result, the bias current increases, indicating that the hook switch has been pressed. By adding a current sensor to the on-chip microphone bias circuit, smartphone codecs can detect both conditions and automatically take the correct action in each case.
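The detection logic described above amounts to a small decision tree. The sketch below is illustrative only: the state names and both current thresholds are hypothetical, and a real codec would take its thresholds from the data sheet and the microphone in use.

```python
from enum import Enum


class JackState(Enum):
    NOTHING = "nothing plugged in"
    HEADPHONES = "headphones without microphone"
    HEADSET = "headset with microphone"
    HOOK_PRESSED = "headset with hook switch pressed"


# Hypothetical thresholds in microamps, for illustration only.
MIC_PRESENT_UA = 50.0   # below this, assume no microphone is drawing bias current
HOOK_SHORT_UA = 600.0   # above this, assume the hook switch is shorting the microphone


def classify_jack(plug_inserted: bool, bias_current_ua: float) -> JackState:
    """Combine the socket switch with the sensed microphone bias current."""
    if not plug_inserted:
        return JackState.NOTHING
    if bias_current_ua < MIC_PRESENT_UA:
        return JackState.HEADPHONES
    if bias_current_ua > HOOK_SHORT_UA:
        return JackState.HOOK_PRESSED
    return JackState.HEADSET


print(classify_jack(True, 300.0).value)
```

With these assumed thresholds, a plug drawing 300µA of bias current is classified as a headset with a microphone.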

The number and output power of loudspeakers in mobile phones have ballooned recently. Whereas a single earpiece was the norm in the 1990s, modern clamshell designs feature inside and outside speakers to play sounds while the phone is open or closed, respectively. Supporting stereo ring tones requires two outside speakers, while the popular hands-free function might need another 'large' (by mobile phone standards) speaker besides the small earpiece. As with microphones, providing dedicated analogue outputs for each speaker offers many advantages over off-chip switching. Since loudspeaker amplifiers can draw large supply currents, it is crucial that they are powered down when not active. Smartphone codecs offer increasingly granular power management, allowing each individual output to be enabled or disabled to avoid any unnecessary draining of the battery. Moreover, the voltage regulators in existing power management solutions often cannot supply enough current to drive speakers at full volume.

Codec vendors have responded to this issue by designing on-chip loudspeaker amplifiers to run directly from the battery (typically around 4.2V with lithium ion batteries) rather than the regulated supply voltage. Although this does not normally result in a power saving – the speaker amplifier merely dissipates extra power that would otherwise be consumed in a regulator – it eliminates the need for an additional voltage regulator.

So what does the future hold for smartphone audio? Notable trends in digital audio today include the migration from stereo to multi-channel surround sound formats and the likely adoption of the recently introduced 'Azalia' (Intel High Definition Audio) standard in large parts of the PC and notebook space. While those who, not so long ago, ridiculed the idea of stereo speakers in a mobile phone have been proven wrong, it seems unlikely that handheld devices will go multi-channel in the foreseeable future. Likewise, Azalia's new features currently do not justify the higher cost and power consumption over AC'97. The I2S versus AC'97 debate is ongoing, with some designers favouring the less complex I2S interface while others prefer the lower pin count and easy handling of different sample rates that AC'97 offers. As many low-power CPUs for use in smartphones now offer dual-standard audio interfaces that cater to both camps, both standards may continue to coexist. Designing a codec to support both standards, by contrast, is much more difficult because the VRA (variable rate audio) feature of AC'97 requires a different clocking scheme from I2S.

Successful integration of the applications and communications processors into a single digital device using a single audio clock would make it possible to merge the voice and Hi-Fi audio interfaces, and might eventually prompt a move back to less complex codecs. But for now, IC vendors are concentrating on integrating other existing mixed-signal components into their audio codecs, including touchscreen functions, voltage regulators and power management. The ready availability of integrated imaging solutions has so far inhibited integration of audio with camera or video functions, but this is by no means a law set in stone. Meanwhile, audio features like 3D enhancement, graphic equalisers and dynamic compression look set to proliferate, along with incremental improvements in sound quality, power consumption and package size.

See associated figure 1

See associated figure 2

See associated figure 3

See associated figure 4
