New Audio Technologies Resound On Many Fronts

Some fascinating developments have reverberated within the world of audio this year (see "Multichannel Audio, DDS Keep DACs Humming" at www.electronicdesign.com, ED Online 11763). For one thing, the trend toward more channels in home-entertainment systems has somewhat reversed itself, thanks to ever-more sophisticated algorithms for "virtualizing" speakers.

A certain "spousal acceptance factor" worked against installing all of the speakers that came with a 5.1 or 7.1 system. Lately, it's become possible to create the impression that the audience is hearing sound from speakers that aren't really there—hence the "virtual" moniker and a cessation of hostilities between spouses across the land.

Also, interesting things happen when personal media players encounter the limited bandwidths associated with wireless. At least one vendor is exploring what happens when a cell phone becomes a media player. Not only that, cultural factors are driving companies to serve unique submarkets around the globe.

For example, like most North Americans past a certain age, I generally keep my cell-phone ringer set to "tingle" mode. Yet younger cultures on this continent, along with other cultures elsewhere, venerate their ring tones and want everyone around them to hear when they've downloaded the latest. That leads to the need for a separate, relatively high-fidelity Class D audio amplifier just for the phone's ringtone speaker.

Finally, a number of recent silicon chips for the audio signal chain across the audio spectrum embody ideas or technical specs that just weren't available last January.

ALGORITHMS GALORE Let's start by looking at that intersection of personal audio with wireless. The latest standard is MPEG-4 aacPlus, also known as HE AAC, or just AAC Plus. It combines the Advanced Audio Coding (AAC) standard that Apple uses in iPods with Coding Technologies' Spectral Band Replication (SBR) and Parametric Stereo (PS) technologies.

SBR and PS allow much lower bit rates than AAC alone (Fig. 1). That's particularly important in wireless applications, where streaming audio shares the bandwidth with two-way communication modes. But even high-speed landline connections can't guarantee long-term bit-rate consistency.

This has led to the use of "perceptual codecs," which use psychoacoustic algorithms to decide what information to leave out of the data stream. MP3 is a perceptual codec. But get down below 128 kbits/s (about 12:1 compression), and the music starts to sound funny.

Standardized in ISO/IEC 14496-3:2001/Amd.1:2003, SBR makes it possible to either increase the audio bandwidth at a given bit rate (for music) or improve coding efficiency at a given quality level (for speech). SBR is mainly a post-process, though some pre-processing is performed in the encoder to guide the decoding process.

In practice, the underlying coder handles the lower part of the spectrum, and the SBR decoder reconstructs the higher frequencies based on an analysis of the lower frequencies. The encoder assists by inserting guidance information into the encoded bitstream at a very low data rate.

SBR can enhance the efficiency of perceptual audio codecs by about 30%. When used with MP3, perceived stereo quality at 64 kbits/s is said to sound as good as conventional MP3 does at 100 kbits/s or higher.

In aacPlus, PS takes SBR a step further. The PS encoder extracts a parametric representation of the stereo image of an audio signal and transmits a portion of it along with the monaural signal in the bit stream. Based on the parametric stereo information, the decoder then can regenerate the stereo image.
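To make the encode/decode split concrete, here is a toy sketch in Python. It is not the aacPlus algorithm: the real PS tool works per frequency band and carries several kinds of stereo cues, while this version assumes a single pair of level parameters for a whole frame. The function names `ps_encode` and `ps_decode` are made up for illustration.

```python
import math

def ps_encode(left, right):
    """Downmix to mono and keep one gain per channel (toy stereo parameters)."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    e_mono = sum(m * m for m in mono)
    if e_mono == 0:
        return mono, (1.0, 1.0)
    # Each gain is the ratio of channel energy to mono energy, as an amplitude.
    gain_l = math.sqrt(sum(l * l for l in left) / e_mono)
    gain_r = math.sqrt(sum(r * r for r in right) / e_mono)
    return mono, (gain_l, gain_r)

def ps_decode(mono, gains):
    """Rebuild left and right from the mono signal plus the two gains."""
    gain_l, gain_r = gains
    return [m * gain_l for m in mono], [m * gain_r for m in mono]

# A single source panned mostly to the left.
source = [math.sin(2 * math.pi * n / 32) for n in range(64)]
left = [0.8 * s for s in source]
right = [0.2 * s for s in source]

mono, gains = ps_encode(left, right)
left_out, right_out = ps_decode(mono, gains)
```

For a simple panned source like this one, the decoder recovers both channels from one audio channel plus two numbers, which hints at where the bit-rate savings come from.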

Where bandwidth is no barrier, in audio-visual receivers (AVRs) and the home theater, disc recording standards such as HD DVD and Blu-ray are driving new HD-audio formats like Dolby TrueHD, Dolby's lossless technology. While it's now appearing mainly in AVRs and home-theaters-in-a-box (HTIBs), you can expect it to eventually show up in cars and PCs. It's the mandatory format for HD DVD, and it's optional for Blu-ray.

TrueHD runs bandwidths as high as 18 Mbits/s. It was developed to deliver sound that's bit-for-bit identical to the studio masters using high-definition disc-based media. TrueHD supports up to eight full-range channels of 24-bit/96-kHz audio. Actually, it can support more, but the HD DVD and Blu-ray standards limit their maximum number of audio channels to eight.
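As a rough sanity check on those numbers, the raw (uncompressed) PCM payload for eight channels of 24-bit/96-kHz audio can be worked out directly. This is only arithmetic on the figures quoted above. TrueHD is a compressed lossless format, so an actual stream will differ, and the helper name `pcm_bit_rate` is just for illustration.

```python
def pcm_bit_rate(channels, bits_per_sample, sample_rate_hz):
    """Raw PCM bit rate in bits per second, before any compression."""
    return channels * bits_per_sample * sample_rate_hz

rate_bps = pcm_bit_rate(channels=8, bits_per_sample=24, sample_rate_hz=96_000)
print(rate_bps / 1e6)  # 18.432 (Mbits/s)
```

The result lands close to the 18-Mbit/s ceiling mentioned above.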

Data flows via High-Definition Multimedia Interface (HDMI) 1.3, a single-cable digital connection for audio and video. In addition to the lossless compression, TrueHD supports metadata such as dialogue normalization, which maintains a consistent volume level when the user switches among multiple programming sources, and dynamic range control (also called "night mode"), which throttles peak volume levels so gunfire, car crashes, and explosions don't disturb anyone.

AUDISTRY'S ARTISTRY At last January's Consumer Electronics Show, Dolby Labs announced Audistry. This new subsidiary offered a package of five DSP technologies for MP3 portables and TVs meant to get bigger results with fewer MIPS. Typically, the technologies operate at 15 to 30 MIPS and use less than 4k of memory. Audistry-based chips are available or are on the way from Analog Devices, Freescale, Intel, Nagano Japan Radio Co., Sanyo, and Texas Instruments.

Audistry technology comes from Australian company Lake Technology, acquired by Dolby last year. Prior to the acquisition, Dolby marketed earlier Lake products such as Dolby Virtual Speaker and Dolby Headphone.

Let's look at the Audistry technologies, starting with the "Sound Space Expander" (Fig. 2). Without some trickery, you can't get much stereo separation from a boom box or a big-screen TV with speakers at the edges of the display. That's what this technology was designed to address.

Old-fashioned stereo widening is simple, and it was first achieved in the analog domain. Take a little left channel and feed it out of phase to the right speaker. Then, take a little right channel and feed it out of phase to the left speaker. But problems arise. The sounds sent to the sides of the acoustic field no longer are present in the middle of the field.
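A few lines of Python show both the trick and its side effect. This is a minimal sketch of the generic cross-feed idea described above, not any vendor's implementation, and the 0.3 mixing factor is an arbitrary assumption.

```python
def widen(left, right, k=0.3):
    """Classic widening: subtract a fraction k of the opposite channel."""
    new_left = [l - k * r for l, r in zip(left, right)]
    new_right = [r - k * l for l, r in zip(left, right)]
    return new_left, new_right

# Center content (identical in both channels) loses level...
center_l, center_r = widen([1.0], [1.0])

# ...while hard-left content keeps its level and gains an inverted
# copy in the right channel, which is what pushes it outward.
side_l, side_r = widen([1.0], [0.0])
```

With k = 0.3, the center signal drops to roughly 0.7 of its original level in each channel, while the hard-left signal stays at 1.0. That imbalance is the hole in the middle of the field.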

This is a problem with movie dialogue, where the actors are supposed to be in the middle of the scene while their voices don't seem to come from any particular point. It's also a problem with music when certain tones from a particular instrument are stretched further across the acoustic field than other tones from the same instrument.

Technical details are sparse, but Dolby says, "Using the symmetrical positioning of the speakers and applying our advanced patented technique, Sound Space Expander creates stable, solid stereo images with a very wide field that 'wraps' around the listener. By focusing on the left-right panned elements of the sound, while carefully maintaining the stable localization of the center channel, Sound Space Expander creates a clean and convincing sound field around the listener."

Sound Space for Headphones, the second Audistry technology, means precisely that. Because headphones are on top of or in the listener's ear, they can't reproduce the same variations in timing that the listener would get from speakers. This causes listening fatigue. Another drawback to headphones is that the center channel seems to be right inside the listener's head, instead of coming from a point somewhere in front of the listener.

Sound Space for Headphones presents the stereo from two input channels as if they came from distinct locations to the left, center, and right of the listener. It does this in a way similar to spreading the acoustic field for stereo speakers. It sends some of the left signal to the right ear and vice versa. Listeners can choose from seven settings that determine the dimensions of the soundscape. The technology also creates different room simulations. As a result, the listener can match the type of virtual room to a particular type of music.

Audistry's Natural Bass technology represents an attempt to get natural-sounding bass from small speakers, overcoming the speaker's natural rolloff characteristics at lower frequencies. The traditional perceptual-codec approach inserts artificial bass note harmonics. The listener's brain fills in the lower frequencies it believes should be there. This approach has harmonic distortion problems when multiple instruments are producing the original bass frequencies.

Instead, Natural Bass is custom-tailored to specific speakers. It then compresses the initial peak of a note and expands the sustain and decay, according to how much volume a particular speaker can handle without distortion.

Audistry's Intelligent Volume Control is like the "night mode" feature in TrueHD. Based on Dolby's literature, it seems to be a basic compander. But since it's performed in the digital domain, this companding can "anticipate" sudden volume changes.
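What "anticipate" might mean in practice is easiest to see with a look-ahead gain stage. The sketch below is a generic look-ahead limiter, offered only as an illustration. Dolby hasn't published the details, so the threshold, the window length, and the abrupt gain steps are all assumptions, and only the compression half of a compander is shown.

```python
def lookahead_limit(samples, threshold=0.5, lookahead=4):
    """Scale each sample by a gain based on the loudest upcoming sample."""
    out, gains = [], []
    for i in range(len(samples)):
        # Current sample plus the next `lookahead` samples.
        window = samples[i:i + lookahead + 1]
        peak = max(abs(s) for s in window)
        gain = 1.0 if peak <= threshold else threshold / peak
        gains.append(gain)
        out.append(samples[i] * gain)
    return out, gains

# Quiet passage, one loud transient, quiet again.
quiet_then_bang = [0.1] * 8 + [1.0] + [0.1] * 8
out, gains = lookahead_limit(quiet_then_bang)
```

Here the gain already drops four samples before the transient arrives, so the peak never exceeds the threshold. An analog circuit can only react after the fact. A production design would ramp the gain smoothly rather than step it.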

At this month's Audio Engineering Society conference in San Francisco, Dolby introduced Media Producer. This suite of professional audio mastering tools runs on the Mac OS X platform. It includes an encoder and a decoder, along with bitstream and metadata editing features. Dolby Media Producer suite supports Dolby Digital, Dolby Digital Plus, Dolby TrueHD, and MLP Lossless for the HD DVD and Blu-ray formats, as well as DVD-Video and DVD-Audio formats.

The live-sound Dolby Lake Processor helps sound engineers set up and tweak speaker systems at events. The Dolby DP600 Program Optimizer, also first shown at the AES conference, is an intelligent file-based audio loudness analysis and correction system. It helps broadcasters normalize the loudness of all file-based programming and commercials without affecting the original dynamic range.

CHIPS In September, National Semiconductor introduced the LM4562, a 34-V high-fidelity audio op amp for high-end professional audio applications. It boasts a total harmonic distortion plus noise (THD+N) of 0.00003%. At the same time, National also announced the LM4702 high-voltage stereo driver (Fig. 3).

Other op-amp specs include 2.7-nV/√Hz input noise density at 217 Hz, a 1/f noise corner of 60 Hz, and 600-Ω output drive even into capacitive loads as high as 100 pF. Slew rate is 20 V/µs, and gain bandwidth is 56 MHz. Thousand-unit pricing is $2.35 in small-outline ICs and $2.65 in dual-inline packages.

The LM4702 is a 200-V stereo audio amplifier driver that drives high-power discrete transistors, or Darlington pairs, in systems delivering from 25 W to 300 W or more per channel. It comes in three grades, differentiated by operating voltage level, performance specification, and guarantee.

The "C" version targets high-volume applications, such as stereo systems and audio/visual (AV) receivers. The "B" version includes a higher voltage rating of ±20 to ±100 V, along with tighter specifications, for high-end audio, guitar amplifiers, professional audio amplifiers, and high-fidelity-powered speakers. The "A" version is fully specified, with all limits tested and guaranteed over voltage and temperature from ±20 to ±100 V. It comes in a military 883-compliant, gold-plated TO-3 package.

Because the LM4702 doesn't drive speakers, National has published representative THD+N measurements on a reference design (AN-1490): 0.0006% into an 8-Ω load. The "C" version costs $4.50 in 1000-unit quantities. The "B" version is $24.95 in 100-unit quantities. And, the "A" version is $150 in 25-unit quantities.

Texas Instruments introduced 300- and 200-W/channel Class Ds just before the AES conference. The single-channel TAS5261 can drive more than 300 W into a 4-Ω speaker, while the two-channel TAS5162 can drive 200 W per channel at 6 Ω and 125 W at 8 Ω. They're intended for high-end DVD receivers and mid- to high-end AVRs. In a sense, this is a story about silicon power density, because the big chip can drive up to 17 A.
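To put that current figure in perspective, the load current for a given power follows from P = I²R. The sketch below assumes a sine-wave signal and a purely resistive speaker, neither of which the vendor data quoted here specifies, and the function name is hypothetical.

```python
import math

def speaker_currents(power_w, load_ohms):
    """RMS and peak current for a sine wave into a resistive load."""
    i_rms = math.sqrt(power_w / load_ohms)   # from P = I^2 * R
    i_peak = i_rms * math.sqrt(2)            # sine-wave crest factor
    return i_rms, i_peak

i_rms, i_peak = speaker_currents(power_w=300, load_ohms=4)
```

Under those assumptions, 300 W into 4 Ω calls for about 8.7 A RMS and about 12.2 A peak, which sits within the 17-A capability cited for the chip.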

With this generation of Class D amps, TI is achieving better than 95% efficiency with 110-dB signal-to-noise ratios (SNRs) and THD+N numbers below 0.09% at 125 W into 8 Ω. The 300-W TAS5261 comes in a PSOP-3 (plastic small-outline package), costing $5.25 each in quantities of 1000. The dual-200-W TAS5162 comes in PSOP-3 and high-power thin-shrink small-outline packaging (HTSSOP), but it isn't due out until December.
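Efficiency translates directly into heat the package must shed. As a back-of-the-envelope sketch, assuming the 95% figure applies at the 125-W output level (the article's numbers don't say at what power it was measured):

```python
def dissipated_power(output_w, efficiency):
    """Power lost as heat for a given output power and efficiency (0 to 1)."""
    input_w = output_w / efficiency
    return input_w - output_w

loss_w = dissipated_power(output_w=125, efficiency=0.95)
```

That works out to roughly 6.6 W of heat per channel at 125 W out.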

Meanwhile, fabless JamTech is in production with its JM2020 sub-ranging pulse-width-modulation (PWM) Class D amp (Fig. 4). The analogy is to analog-to-digital converters (ADCs), where a sub-ranging architecture employs a two-step approach.

In an 8-bit sub-ranging ADC, a 4-bit converter completes a first, coarse conversion. The 4-bit result is converted back to an analog signal (with an 8-bit-accurate digital-to-analog converter) and subtracted from the input signal, after which the residue is again digitized to 4 bits. Then, the results of the first and second passes are combined.
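The two-pass idea is easy to model in software. The sketch below is an idealized model of a generic sub-ranging ADC with a perfect internal DAC. It says nothing about how JamTech applies the concept to PWM, and the function name and 1-V reference are assumptions.

```python
def subranging_adc_8bit(vin, vref=1.0):
    """Idealized 8-bit conversion built from two 4-bit passes."""
    coarse_lsb = vref / 16

    # Pass 1: coarse 4-bit conversion of the input.
    coarse = min(int(vin / coarse_lsb), 15)

    # Ideal DAC reconstructs the coarse estimate; subtract to get the residue.
    residue = vin - coarse * coarse_lsb

    # Pass 2: digitize the residue to 4 bits across one coarse step.
    fine = min(int(residue / coarse_lsb * 16), 15)

    # Combine: the coarse bits are the upper nibble, the fine bits the lower.
    return (coarse << 4) | fine

code = subranging_adc_8bit(0.3)   # same result as a direct 8-bit conversion
```

The payoff is in hardware cost: two 4-bit stages need far fewer comparators than one 8-bit flash converter, at the price of requiring the internal DAC to be accurate to the full 8 bits.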

JamTech is the first company to offer a commercial product that applies a subranging technology to Class D. The result is 16-bit audio quality with a 98-dB range of linearity, which ultimately provides accurate sound reproduction at low signal levels. It also reduces zero-crossing distortion. The JM2020 comes in a 64-pin quad flat no-lead (QFN) package and costs $3.10 each in 1000-unit lots.
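For context, the textbook figure for an ideal N-bit quantizer driven by a full-scale sine wave is SNR = 6.02N + 1.76 dB. The company doesn't say how it arrives at its 98-dB number, so the connection drawn here is only an observation.

```python
def ideal_snr_db(bits):
    """Ideal quantization SNR for a full-scale sine wave, in dB."""
    return 6.02 * bits + 1.76

snr_16 = ideal_snr_db(16)
```

For 16 bits, the formula gives about 98.1 dB, which lines up with the 98-dB range quoted.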

Not every chip vendor concentrates exclusively on Class D for new designs. Maxim's MAX9777 and MAX9778 amps combine a stereo 3-W bridge-tied load (BTL) audio amplifier, stereo headphone amp, headphone sensing, and a 2:1 input multiplexer. The MAX9777 has an I²C interface, and the MAX9778 has a parallel-control interface. THD+N is 0.002%. Prices start at $1.25.

High-performance audio chips needn't be limited to stationary boxes either. In September, Agere announced a complete platform for entry-level cell phones (bill-of-materials cost for audio around $30) that brings downloaded music played on the phone up to CD quality. To someone too fussy about photography to tolerate the sub-brownie picture-taking offered by current cell phones and too un-hip to care about text messaging, this seems like a promising avenue for featurizing low-priced future phones.

Agere's TruEntry X125 platform comprises chips, software, and a product development kit. The silicon includes an integrated speaker amp, a polyphonic synthesizer, and USB On-The-Go with charger, plus power management and battery charging.

A dedicated applications processor that can communicate with the phone's communications engine lets users receive calls and surf the Web while they listen to music. It complies with the Enhanced Data Rates for Global Evolution (EDGE) standard, but it runs three times faster (236.8 kbits/s). The platform's software component includes high-level development tools and a system layer with a standard set of cell-phone interfaces.

Also on the personal media player side, Texas Instruments' PCM3793 and PCM3794 stereo audio codecs target battery-operated applications, including digital still cameras and portable media players. The PCM3793 integrates a Class D amplifier that can drive 700 mW per channel into an 8-Ω load. Key with the PCM3794, which is suitable for use with external amplifiers, is its low power consumption (7 mW in playback mode for extended battery life).

The feature set also is impressive. It includes a notch filter with programmable center frequencies to suppress the sound of a camera zoom motor, three-band equalization, and digital stereo enhancement to spread the acoustic field from closely spaced speakers. Both chips come in 5- by 5-mm QFNs. The PCM3793 and PCM3794 cost $4.50 and $4.25, respectively, in quantities of 1000.

In August, Maxim introduced a 24-bit, I²S-compatible stereo DAC with 87-dB dynamic range and less than -87-dB THD+N. The MAX5556 integrates interpolation, de-emphasis, analog output filters, click-and-pop-free power-up and power-down, and line-level outputs that swing 3.5 V p-p into a 10-kΩ load.

The MAX5556 has a 16- to 24-bit I²S-compatible serial interface, while the MAX5557 will offer a 16- to 24-bit left-justified interface. The MAX5558 will feature a 16-bit right-justified interface, and the MAX5559 will have an 18-bit right-justified interface. All of the devices come in a narrow, eight-pin small-outline (SO) package and cost $0.99.

Somewhat earlier, Maxim announced the MAX5406. This stereo audio processor for boom boxes provides pushbutton volume, balance, and treble control without a microcontroller. It's designed to accommodate new multimedia applications in which the boom box is simply a front end for an audio player.

That said, there's a lot going on inside the Maxim chip. It intelligently controls the tone control wiper advance rate so that the (virtual) wiper advances at 4 Hz after the user presses the button continuously for 1 second and at 16 Hz after the button has been pressed for 4 seconds. Also, for applications that have RF signals on the same board as the audio, the MAX5406 integrates passive RF filters.
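The wiper behavior amounts to a small lookup on how long the button has been held. Here's a sketch of that logic in Python. The two thresholds and rates come from the description above, but the behavior during the first second (treated here as a single step with no auto-repeat) is an assumption, as is the function name.

```python
def wiper_step_rate_hz(hold_seconds):
    """Auto-repeat rate of the virtual wiper for a continuously held button."""
    if hold_seconds >= 4.0:
        return 16   # fast repeat after 4 seconds
    if hold_seconds >= 1.0:
        return 4    # slow repeat after 1 second
    return 0        # assumption: a short press gives one step, no auto-repeat

rates = [wiper_step_rate_hz(t) for t in (0.5, 2.0, 5.0)]
```

A two-speed scheme like this lets a user nudge the setting precisely with short presses, yet still sweep the full range quickly by holding the button down.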

Besides boom boxes, potential applications include rear-seat entertainment controls, portable stereos, video-game consoles, and karaoke machines. THD is less than 0.01%, and output noise is less than 25 µV RMS. It operates from a +2.7- to +5.25-V single supply or a ±2.7-V dual supply. Packaging is a 4- by 4-mm thin fine-pitch QFN. Prices start at $2.17.

While the MAX5406 takes (and debounces) pushbutton inputs, the MAX5440 is a debounced rotary-encoder interface for volume and balance controls that connects directly to the audio circuit's power amplifier. An integrated bias generator provides the necessary (V_DD - V_SS)/2 voltage for unipolar input signals.

The object is to make OEM products smaller by eliminating mechanical pots. The chip's two 40-kΩ resistor strings offer ratiometric and end-to-end temperature coefficients down to 5 ppm/°C for volume control and 35 ppm/°C for tone control. It operates from the same voltage rails as the MAX5406. The chip comes in a 24-pin shrink small-outline package (SSOP), and prices start at $1.47.