Use Programmable DSPs For Cost-Effective PCI Digital Audio Design

Designing computer systems involves difficult cost-performance trade-offs. It's a given that systems offering the highest performance levels--high concurrency, sophisticated new features, and the highest quality levels--will always require hardware acceleration. At the other end of the spectrum, the increasing performance of host processors will never outpace the growing demands of applications for additional computation. But shrewd designers can find a successful middle ground between the two extremes. By way of illustration, consider what's happening with PC audio.

PC audio is undergoing a profound metamorphosis. Driven by new 3D games, published DVD movie content, and live audio downloaded from the Internet, future PCs will be offering more audio-related features and higher levels of performance. Music synthesizers will sound more realistic and be capable of creating a wider variety of sounds. It will be possible to position sound effects 360° around the listener without the expense and hassle of wiring additional rear channel speakers. Users will be listening to 5.1-channel sound tracks from DVD as well as musical accompaniment to Internet web sites. These applications are only the tip of the iceberg.

However, these new capabilities can only reach mass-market prices through architectural innovation. Audio signal processing consumes significant computational resources. Running all the audio functions on the host processor degrades system performance on other applications. Hardware accelerators preserve the performance of the host CPU, but they add cost. Innovative system design minimizes the additional cost while preserving system performance.

Audio Accelerators Needed

System architects have a spectrum of options available for implementing audio subsystems that trade hardware cost for performance. The ends of the spectrum are well-defined. At one end are the basic, sub-$1000 systems. They run most signal-processing tasks on the host CPU, an architectural philosophy known as Host Signal Processing (HSP). This is done using an audio codec to provide baseline recording and playback functionality. At the other end are the performance systems. They rely on hardware ASICs or DSPs to implement signal-processing tasks.

To understand the rationale for adding hardware acceleration, hypothesize a set of signal-processing functions at a particular performance level, running entirely on the host CPU. Recognize that each real-time audio task steals cycles from other graphics, data, or numeric processing tasks. So, above 10% to 20% loading, the signal-processing burden detracts noticeably from the execution speed of the applications.

For example, a game's graphics will be noticeably slower; the game might respond to user control more sluggishly. Consumers who purchase a computer with a 300-MHz Pentium processor, half of which is consumed by signal-processing tasks, will find that applications will run as if the computer had a 150-MHz Pentium processor. Also, cramming too many real-time algorithms on the host CPU can lead to unstable systems with high technical support costs, and ultimately consumer disappointment and frustration.

After defining a set of signal-processing functions that do not impose an excessive burden, consider the following three points:

* Signal-processing capabilities span a range. If the baseline system includes a wavetable synthesizer with 24 voices, adding a hardware accelerator could increase the number of synthesized voices to 64.

* Signal-processing performance spans a range. Many classes of algorithms sound or work better when more computational horsepower becomes available, and the demand for MIPS always seems to be insatiable.

A wavetable synthesizer, for example, could use "layering" (the application of more than one oscillator per voice) to make the synthesized sound more realistic. Adding even a single additional layer essentially doubles the computational requirements, and high-quality synthesizers often provide three or four layers. Adding a hardware accelerator improves sound quality in the case of wavetable synthesis.

* Signal-processing concurrency spans a range. If a hypothetical system offers wavetable, adding a hardware accelerator could make it possible to offer 3D positioning and a hands-free speakerphone as well. There is no limit to the cleverness of algorithm designers, so there will always be other useful algorithms cropping up. To run them concurrently would require PCs equipped with hardware accelerators.
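The scaling behind the first two points can be made concrete with a rough cost model. The sketch below assumes that compute load grows in proportion to the number of oscillators running at once (voices times layers per voice); that proportionality and the function names are illustrative assumptions, not measurements from any particular synthesizer.

```python
def oscillator_count(voices, layers_per_voice):
    """Oscillators running at once, used here as a rough proxy for compute load."""
    return voices * layers_per_voice


# 24-voice, single-layer baseline taken from the text
BASELINE = oscillator_count(24, 1)


def relative_load(voices, layers_per_voice):
    """Load relative to the 24-voice, single-layer baseline (assumed linear)."""
    return oscillator_count(voices, layers_per_voice) / BASELINE


for layers in (1, 2, 3, 4):
    print(f"24 voices x {layers} layer(s): {relative_load(24, layers):.1f}x baseline")
```

Under this assumption, a second layer doubles the load of the 24-voice baseline, and a four-layer synthesizer needs about four times the baseline, which is why layering is a natural candidate for acceleration.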

More sophisticated capabilities, higher levels of performance, and concurrency all require more computation. This additional capability can be provided at any given time by a hardware accelerator, or perhaps in the future by faster CPUs. Thus, performance systems are bellwethers. As CPUs get faster, PCs using them may acquire additional capabilities that previously required hardware accelerators to implement. However, the other applications also will increase their computational requirements, limiting the CPU performance available for signal processing. Consequently, it's very difficult to predict when host processors will be sufficiently powerful to run not only today's signal processing tasks, but also tomorrow's applications. These two architectural philosophies, host signal processing and hardware acceleration, define the endpoints of the spectrum of implementation options.

The Balanced Architecture

The vast middle range of the spectrum is served by an elegant "Balanced Architecture." This architecture makes it possible to achieve approximately the performance of the high-end systems at mainstream consumer price points. The Balanced Architecture reconciles the conflicting objectives above by using a hardware accelerator and the host CPU in a uniquely synergistic manner (see the figure).

This reference design of the Balanced Architecture from Analog Devices is called SoundMax 64 with dynamic interprocessor voice allocation (see "The AD1818A-Based SoundMax 64 Accelerator," below). Additional features include:

* Sample-rate conversion from as many as eight independent sample rates, and mixing of the results.

* DirectInput-compatible analog/digital game port support for joysticks such as Microsoft Sidewinder.

* Chorus, reverb, parametric filtering, and dynamic LPF audio effects.

* ACPI- and On Now-compliant PCI power management.

To understand the Balanced Architecture concept, consider first the characteristics of the hardware accelerator. The ideal hardware accelerator (for either dedicated or balanced architectures) incorporates both a programmable core and some fixed-function circuitry. The programmable core preserves the advantages of programmable systems. These advantages are:

* Software solutions allow the same hardware to be reconfigured to serve multiple functions--for example, a Dolby Digital decoder in a DVD playback scenario or a wavetable synthesizer in a gaming scenario.

* Software solutions allow field driver upgrades to support evolving standards--for example, alternative multichannel audio decoders such as MPEG-2, DTS, or Sony's new DSD standard.

* Software solutions accelerate development. Bugs in fixed-function hardware require redesign and modifications to masks; bugs in software can be fixed by changing the code.

* Software solutions make it easy to accommodate OEM customization in audio algorithms and feature sets such as dynamic equalization for a specific set of OEM speakers.

To reduce cost, the hardware accelerator implements common immutable operations in fixed-function form, thereby saving DSP MIPS for the functions that benefit most from them. A prominent example of a common audio operation is the sample-rate conversion and mixing required by Microsoft's DirectSound API. Mixing is required whenever there is more than one source of audio.

The PC98 specification co-developed by Microsoft and Intel specifies support for at least seven sample rates, so it is likely that the mixing operation also will require sample-rate conversion--an extremely compute-intensive function. The requirements for sample-rate conversion and mixing are unlikely to change dramatically, so it is sensible to build these operations into fixed-function hardware.
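The convert-then-mix operation can be sketched in a few lines. This is a minimal illustration that assumes mono streams of floating-point samples in the range -1.0 to 1.0 and uses linear interpolation, which is far cruder than the filtering a production converter would apply; the function names are hypothetical and do not come from DirectSound or the AD1818A.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Convert a mono stream from src_rate to dst_rate by linear interpolation."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples consumed per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def mix(streams, out_rate):
    """Bring every (samples, rate) stream to out_rate, then sum and clip."""
    converted = [resample_linear(s, rate, out_rate) for s, rate in streams]
    length = max((len(c) for c in converted), default=0)
    mixed = [0.0] * length
    for stream in converted:
        for i, value in enumerate(stream):
            mixed[i] += value
    # Clip to the legal range so that loud sums do not wrap around
    return [max(-1.0, min(1.0, v)) for v in mixed]


doubled = resample_linear([0.0, 1.0, 0.0, -1.0], 22050, 44100)
print(len(doubled))  # 8
```

Even this crude version touches every output sample of every stream, which hints at why the operation is worth moving off the DSP core and into fixed-function hardware.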

The virtue of the Balanced Architecture becomes apparent when considering the requirements for the wavetable synthesizer. Historically, the polyphony requirements for wavetable synthesizers were determined by the General MIDI standard, which calls for 24 voices. However, the polyphony requirements will be driven to 64 voices this year by a component of the DirectMusic Core that supports downloadable sounds through the MIDI Manufacturers Association's DLS 1.0 specification. DLS lets developers use the wavetable synthesizer to support sound effects as well as the musical accompaniment.

For example, the game could download the sound of a laser blast once, and then trigger it numerous times using MIDI commands. Or better yet, it could download the engine drone of a race car and then pitch-shift it in the synthesizer as the car accelerates or decelerates. The 64-voice polyphony empowers wavetable synthesizers to support not only the requirements for musical accompaniment, but the requirements for sound effects as well.

Although synthesizers must be capable of synthesizing 64 voices, they will rarely be called upon to deliver that many. Music scores generally require only 12 to 16 voices, so a synthesizer capable of only 32-voice polyphony will often suffice, even with the additional requirements imposed by sound effects. A hardware accelerator configured for 64-voice polyphony devotes expensive, power-consuming silicon real estate to functionality that will rarely be utilized.

The balanced solution takes advantage of these statistics. A MIDI dispatcher running on the host receives the commands that specify the desired sounds. It forwards these commands to the hardware accelerator as long as the accelerator has not reached its limit of 32 voices. After the hardware accelerator reaches this limit, the MIDI dispatcher routes any requests for additional voices to a synthesis engine running on the host. Both synthesis engines implement the same algorithm and use the same data set (which resides in host memory), so the sound that emerges is the same regardless of the platform on which the voice originated.
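The routing rule described above can be sketched as follows. This is a simplified model, not the SoundMax driver code: the class name, the method names, and the choice to reject requests beyond 64 voices (rather than steal an older voice) are assumptions made for illustration.

```python
class VoiceDispatcher:
    """Sends voices to the accelerator first and overflows to the host engine."""

    def __init__(self, accel_limit=32, total_limit=64):
        self.accel_limit = accel_limit  # voices the accelerator can synthesize
        self.total_limit = total_limit  # overall polyphony requirement
        self.active = {}                # voice id -> "accelerator" or "host"

    def _count(self, engine):
        return sum(1 for e in self.active.values() if e == engine)

    def note_on(self, voice_id):
        """Return the engine chosen for this voice, or None if polyphony is exhausted."""
        if voice_id in self.active:
            return self.active[voice_id]
        if len(self.active) >= self.total_limit:
            return None  # a real synthesizer would probably steal a voice instead
        if self._count("accelerator") < self.accel_limit:
            engine = "accelerator"
        else:
            engine = "host"
        self.active[voice_id] = engine
        return engine

    def note_off(self, voice_id):
        """Release the voice so that its slot can be reused."""
        self.active.pop(voice_id, None)


dispatcher = VoiceDispatcher()
engines = [dispatcher.note_on(v) for v in range(40)]
print(engines.count("accelerator"), engines.count("host"))  # 32 8
```

Because both engines are assumed to run the same algorithm on the same sample set, the dispatcher needs to track only where each voice lives, not how it sounds.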

Because 32-voice polyphony covers most situations, the average load on the host CPU is negligible. Furthermore, the cost of a 32-voice hardware accelerator is half the cost of a 64-voice one. Also note that this balanced solution uses abundant, inexpensive, host DRAM to store the sample set, thereby avoiding the cost of additional memory. The hardware accelerator accesses the sample set over the PCI bus, which offers bandwidth to spare for this application.

Another advantage of this balanced solution is that it supports Dynamic Concurrency Scaling. To achieve the highest level of performance possible, DSP tasks always consume as much of the DSP as is available. If something happens that requires execution of an additional task, the running tasks receive a message requesting that they free up resources.

For example, in a pure gaming scenario, the wavetable synthesizer can run in a high-performance mode. If the PC switches to a scenario that also requires a modem (for telegaming, perhaps), the DSP frees resources either by shifting some voices over to the host engine, or dropping some refinements of the wavetable synthesis algorithm. Dynamic concurrency scaling assures that DSP resources are always utilized to the fullest extent possible, given the concurrency requirements at the moment.
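One way to picture this negotiation is as a budget problem: each task offers several quality modes with different costs, and running tasks step down until a newcomer fits. The sketch below assumes the 66-MIPS figure quoted for the DSP core in the sidebar as the budget; the per-task MIPS numbers, the class, and the "degrade the most expensive task first" policy are invented for illustration.

```python
class ScalableTask:
    """A DSP task with several quality modes, most expensive first."""

    def __init__(self, name, mode_costs_mips):
        self.name = name
        self.mode_costs_mips = mode_costs_mips  # hypothetical costs per mode
        self.level = 0                          # 0 = highest quality

    @property
    def cost(self):
        return self.mode_costs_mips[self.level]

    def can_degrade(self):
        return self.level < len(self.mode_costs_mips) - 1

    def degrade(self):
        self.level += 1


def admit(running, new_task, budget_mips):
    """Ask running tasks to free resources until new_task fits.

    Returns True and appends new_task on success. On failure, restores the
    original quality levels and returns False.
    """
    saved_levels = [t.level for t in running]
    while sum(t.cost for t in running) + new_task.cost > budget_mips:
        candidates = [t for t in running if t.can_degrade()]
        if not candidates:
            for task, level in zip(running, saved_levels):
                task.level = level
            return False
        # Policy assumption: the most expensive task gives up resources first
        max(candidates, key=lambda t: t.cost).degrade()
    running.append(new_task)
    return True


synth = ScalableTask("wavetable", [60, 40, 25])
modem = ScalableTask("modem", [30])
running = [synth]
print(admit(running, modem, budget_mips=66), synth.cost)  # True 25
```

In the article's scheme a task can also free resources by moving voices to the host engine; a cheaper mode in this model can stand in for either option.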

The hardware accelerator connects between the PCI bus and the AC 97 audio codec. The bandwidth of the PCI bus supports audio streaming, and scatter-gather DMA provides access to host memory for storing the wavetable sample set. Saving the cost of the local memory chips required in ISA systems in itself provides a compelling rationale for the shift to PCI.
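Scatter-gather DMA matters here because a large sample set in host memory is rarely physically contiguous. The following sketch shows the general idea with a descriptor list that maps a logical buffer onto scattered pages; the descriptor layout, the 4096-byte page size, and the function names are generic assumptions and do not describe the AD1818A's actual descriptor format.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    address: int  # physical start address of one fragment (made-up values below)
    length: int   # fragment length in bytes


def build_descriptors(page_addresses, total_bytes, page_size=4096):
    """Describe a buffer of total_bytes spread across the given pages."""
    descriptors = []
    remaining = total_bytes
    for address in page_addresses:
        if remaining <= 0:
            break
        chunk = min(page_size, remaining)
        previous = descriptors[-1] if descriptors else None
        if previous and previous.address + previous.length == address:
            previous.length += chunk  # merge physically adjacent pages
        else:
            descriptors.append(Descriptor(address, chunk))
        remaining -= chunk
    if remaining > 0:
        raise ValueError("not enough pages for the buffer")
    return descriptors


def locate(descriptors, offset):
    """Translate a logical byte offset into a physical address."""
    for d in descriptors:
        if offset < d.length:
            return d.address + offset
        offset -= d.length
    raise IndexError("offset beyond end of buffer")


table = build_descriptors([0x10000, 0x11000, 0x40000], total_bytes=10000)
print(len(table), hex(locate(table, 8192)))  # 2 0x40000
```

With a table like this, the bus master can fetch any part of the sample set without the host first copying it into one contiguous block.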

Another use for the PCI bus-mastering interface is for so-called "digital-ready" PCs. Digital-ready means that all digital audio created in the PC can be mixed in the PC and sent to USB speakers or IEEE 1394-enabled consumer A/V appliances. Deferring the inevitable conversion from digital to analog may preserve audio fidelity. In fact, someday it will be commonplace to render the digital audio to analog outside of the PC. Until then, digital-ready configurations make it possible to use digital links optionally. The primary digital audio output of the hardware accelerator is through an AC-link to a companion AC 97 audio codec.

In the foreseeable future, systems offering the lowest cost will always be based on HSP. During the time a particular feature set migrates from pure hardware acceleration to pure HSP, the balanced architecture will make it possible for moderately priced systems to support the capabilities of the performance systems with only a trivial cost-performance penalty.

The AD1818A-Based SoundMax 64 Accelerator

The reference design includes the AD1818A accelerator, one or more audio codecs, and all required software to implement the balanced architecture. The design combines full-featured DirectX audio and telephony acceleration, supports PCI bus master and target operation, and complies with Rev. 2.1 of the PCI bus specification. The 64-channel scatter-gather DMA support enables efficient use of host memory. The bus redirection mechanism supports USB and 1394 audio peripherals. The PCI and AC 97 controllers support Advanced Configuration and Power Interface (ACPI) and On Now-compliant PCI-PM advanced power management.

The AD1818A uses a programmable DSP core to implement certain functions driven by two application scenarios:

Gaming

* DirectSound3D: eight streams

* DirectMusic: 32-voice DLS-1 wavetable synthesis

* Spatial enhancement (broadening of the stereo sound stage)

DVD playback

* Dolby Digital or MPEG-2 audio decoding

* 3D positioning (for virtual surround)

The SoundMax 64 includes a 66-MIPS, 733-MOPS DSP core from Analog Devices. It integrates two SRAMs, one of 16k words by 16 bits and one of 16k words by 24 bits (80 kbytes in total)--sufficient for the needs of polyphonic wavetable synthesizers, 3D positioning algorithms, or Dolby Digital (AC-3) decoders.
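The 80-kbyte total can be checked with a little arithmetic, assuming the two memories are organized as 16k words of 16 bits and 16k words of 24 bits (with 16k taken as 16,384); that reading is an inference from the quoted total rather than a figure from a data sheet.

```python
WORDS = 16 * 1024  # "16k" words per SRAM block (assumed to mean 16,384)


def sram_bytes(words, bits_per_word):
    """Size of an SRAM block in bytes."""
    return words * bits_per_word // 8


narrow_sram = sram_bytes(WORDS, 16)  # 32,768 bytes
wide_sram = sram_bytes(WORDS, 24)    # 49,152 bytes
total_kbytes = (narrow_sram + wide_sram) // 1024
print(total_kbytes)  # 80
```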

The 64-channel DirectSound mixer is equipped with variable-sample-rate conversion to relieve the host CPU of a significant computational load. Sampled data streams must all be at precisely the same sample rate before they're mixed, to avoid clicks, pops, and distortion due to rate mismatch. Variable rate conversion answers this requirement transparently--the AD1818A supports eight sample rates simultaneously, including eight independent time bases. The on-chip OPL3-compatible music synthesizer and MPU-401-compatible MIDI UART provide legacy music synthesis.

The AC 97-compliant AC-link interface uniquely supports four analog-to-digital converters and six digital-to-analog converters, totaling ten simultaneous streams of analog input and output (see the figure).

By combining an AD1818A with multiple AD1819A codecs, the following architectures are supported:

* Single codec designs: DLS Level 1 wavetable synthesis with spatial enhancement, DirectSound3D audio localization, Dolby Digital decode with virtual surround sound output

* Dual codec designs: Same as above plus concurrent V.34/56 kbits/s data/fax/voice modem

* Triple codec designs: Same as above plus Dolby Digital decode with 5.1-channel surround sound output.