Low-Cost Digital Audio Playback Doesn’t Have To Compromise Quality


Fig 1. A typical audio docking station or device accessory consists of a serial data-input channel, a deserializer, a sample-rate adjustment circuit, a DAC with output filtering, and an audio power-output stage.

Fig 2. Designers can take advantage of several typical playback solutions in use. First, a separate custom or off-the-shelf sample-rate conversion chip can be used to execute the conversion algorithm, modifying the input sample rate into a “common” rate that feeds into a simple audio DAC (top). An alternate approach leverages a more complex DAC with on-chip PLLs to lock onto the different sample data rates (middle). Third, a lower-cost solution performs SRC on the MCU (bottom).

Fig 3. The PIC32 MCUs employ a MIPS processor core that delivers adequate throughput to handle the SRC and leave plenty of headroom for all the other functions required by a docking system or other application.

Fig 4. In simplified form, an SRC circuit upsamples the input audio data, filters the upsampled audio, and then downsamples to deliver the audio data at the desired sample rate.

Fig 5. USB clock mismatch introduces DAC underrun or overrun. This error manifests as subtle audible clicks.

When designing audio docking stations and accessories for portable digital audio devices and other digital audio sources, designers are constrained by cost while trying to deliver the highest-quality audio playback.

In a typical docking station and device accessory, a digital audio source that plugs into the unit sends a serial stereo audio data stream over the dock’s USB interface. The dock captures the data stream while performing other crucial tasks and routes the stream to a digital-to-analog converter (DAC) at a specific sample rate (Fig. 1).

Since there are many possible sources of digitized audio, and not all of the sources use the same sampling rate, the dock typically adapts the sampling frequency to the source or converts the sampled data stream into a common data rate. Therefore, one of the challenges in the design of the docking system or device accessory is to perform the sample-rate conversion (SRC) without degrading the audio quality, and at the lowest cost possible.

To deal with these challenges, designers typically have used a dedicated SRC circuit and/or a high-end audio DAC that incorporates complex phase-locked loops (PLLs) to ensure flexible sample rates for stable communication of the sampled audio data.

The USB interface is a convenient interface for the transfer of audio data. But to meet the requirements of professional audio, the subtle loss of quality due to USB clock mismatch must be addressed. In addition to converting sample rates, a flexible sample-rate converter also helps alleviate USB clock mismatch issues.

Digital Audio Data Basics

When analog audio is converted into a discrete digital format, the analog signal must be sampled at a frequency of at least twice its highest-frequency component, a limit known as the Nyquist rate. An audio signal that spans 20 Hz to 20 kHz therefore has a Nyquist rate of 40 kHz. Sampling it at 44.1 kHz leaves a margin above that limit, so the signal can be reconstructed without aliasing when converted back to the analog domain.

In addition to the sampling rate, designers have to decide on the resolution of the conversion—will the analog signal be converted into a 16-bit word or a 24-bit word? For compact disc (CD) audio files, the standard is 16-bit resolution with a 44.1-kHz sample rate. However, there are higher-performance CD music options for audiophiles.

One such standard encodes the data with 24-bit resolution and increases the sampling rate to 96 kHz. For audio professionals, the audio files are encoded with a resolution of 24 bits per sample, which provides plenty of headroom when the studios mix and manipulate the audio in preparation for creating the master.

The resolution choice allows the designer to trade off sound quality against file size, even with compression. The higher the resolution, the better the audio quality, but also the larger the resulting file. For instance, a 4-minute raw stereo audio file with a 44.1-kHz sample rate and 16-bit resolution requires about 42 Mbytes.

However, if the resolution is reduced, less storage is required. With a resolution of 12 bits, the resulting audio file size would drop to about 32 Mbytes, but at the expense of lower audio quality.
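The storage figures follow from simple arithmetic: duration times sample rate times bits per sample times channel count. The short sketch below works that out for uncompressed audio. The function name is hypothetical, and the calculation ignores any file-container overhead.

```python
def raw_audio_size_bytes(duration_s, sample_rate_hz, bits_per_sample, channels=2):
    """Size in bytes of uncompressed PCM audio, ignoring any container header."""
    total_bits = duration_s * sample_rate_hz * bits_per_sample * channels
    return total_bits // 8


# A 4-minute stereo clip at 44.1 kHz, at two different resolutions.
cd_quality_bytes = raw_audio_size_bytes(4 * 60, 44_100, 16)
reduced_bytes = raw_audio_size_bytes(4 * 60, 44_100, 12)

print(cd_quality_bytes / 1e6, "Mbytes at 16 bits")
print(reduced_bytes / 1e6, "Mbytes at 12 bits")
```

Dropping from 16 to 12 bits removes exactly one quarter of the raw data, which is the whole of the saving before any compression is applied.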

The USB interface can readily handle the streaming of high-quality audio, and it is widely used for that purpose. With its universal ease of use, USB audio can transfer high-resolution and high-sample-rate audio with negligible jitter when packaged with a flexible audio interface.

Isochronous data transfer, among its various other uses, is used to stream audio data to and from a source at a constant rate in real time. Stereo audio data packets, with size governed by the sample rate of the audio stream, are transferred as part of USB frames every 1 ms on the USB full-speed link. USB audio also provides controls for common features such as volume, tone, gain control, and equalizers, among other control and processing units.
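Because a 1-ms frame cannot carry a fractional sample, a 44.1-kHz stream has to alternate between frames of 44 and 45 samples per channel. The sketch below shows one common way to schedule this with an integer accumulator. It is an illustration rather than a pattern quoted from the USB audio class specification, and the function name is hypothetical.

```python
def samples_per_frame(sample_rate_hz, n_frames, frames_per_second=1000):
    """Return the per-channel sample count carried by each successive frame.

    An integer accumulator carries the fractional remainder from frame to
    frame, so the long-term average matches the sample rate exactly.
    """
    sizes = []
    accumulator = 0
    for _ in range(n_frames):
        accumulator += sample_rate_hz
        count = accumulator // frames_per_second
        accumulator -= count * frames_per_second
        sizes.append(count)
    return sizes


print(samples_per_frame(44_100, 10))  # nine frames of 44, then one of 45
print(samples_per_frame(48_000, 3))
print(samples_per_frame(32_000, 3))
```

Multiplying each count by two channels gives the stereo payload per frame, for example 88 or 90 samples for a 44.1-kHz stream.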

Cost And Quality Considerations

The differences in bit rates and sample rates mean that the hardware in the playback system or dock must be able to handle the differing-rate data streams. To do that, the system has three options. It can use a more complex, and therefore more expensive, audio DAC that phase-locks to each sample rate and adjusts itself to each playback option. It can pair an external sample-rate converter IC with a low-cost audio DAC. Or it can convert all the streams into a standard sample rate and bit rate, using an algorithm running on a microcontroller (MCU), so that a simple low-cost audio DAC can handle them (Fig. 2). The low-cost audio DACs are optimized for high signal-to-noise ratio (SNR), low power, and low clock jitter.

Many playback system designers use the solutions presented in Figure 2. In one scenario, a designer purchases an off-the-shelf SRC and audio DAC solution to perform the SRC and DAC playback. Another solution optimized for high SNR, low power, and minimal jitter combines an SRC IC with a low-cost audio DAC. However, that extra SRC chip adds to the total system cost. The other camp uses the more expensive audio DAC that contains the phase-lock circuits and switched capacitor filters to handle the fractional clocks so they can lock onto each of the different sample-rate data streams.

Although these approaches deliver viable solutions, they either require an extra chip or a higher-cost DAC. However, by using a relatively powerful 32-bit microcontroller or 16-bit digital signal controller (DSC) architecture that delivers performance levels of 40 to 80 MIPS, designers can eliminate the need for an external rate-conversion chip.

This is because a 32-bit MCU or 16-bit DSC architecture with adequate horsepower lends itself extremely well to performing sample-rate conversion on-chip, without compromising audio quality. Since the SRC output sample rate is constant, a low-cost, high-quality, 24-bit audio DAC can be used.

For instance, many current systems employ an 8/16-bit embedded MCU to handle all the housekeeping functions, such as communicating over the USB interface, display control, button control, volume control, and interface management. The PIC32 and dsPIC33E DSC high-performance MCUs can be used to perform SRC on chip. They also can perform all of the system interface and housekeeping functions.

By using a high-performance MCU, the sample-rate conversion can be performed while maintaining audio quality, since the intermediate calculations have higher bit resolution. This prevents quality deterioration due to truncation, and many devices have specialized instructions suitable for DSP operations.

The choice of system resources, such as the on-chip nonvolatile flash memory, static RAM, and peripherals available in the PIC32 or dsPIC33E families, allows a designer to select the appropriate device based upon system requirements (Fig. 3). Some devices also offer low power consumption for use in power-conscious applications.

SRC On The MCU

Here’s a quick look at how the SRC can be achieved for the most common audio sampling rates. The SRC algorithm converts real-time audio data sampled at 44.1 kHz or 32 kHz to a sampling rate of 48 kHz. To match the USB frame rate, the input audio is processed in 1-ms frames. Each frame carries 64 samples (32 per channel) for 32-kHz stereo input, or 88 or 90 samples (44 or 45 per channel) for 44.1-kHz stereo input. The output consists of 96 samples (48 per channel) per millisecond.

In a typical SRC block, the incoming audio data passes through an up-sampler or an interpolation stage. The signal then passes through an anti-aliasing low-pass filter, followed by a down-sampler or decimation stage.

Consider the conversion from a 32-kHz to a 48-kHz sample rate, where the conversion factor is 3:2. The input is up-sampled by a factor of 3 by inserting two zeros after every input sample, and a finite impulse response (FIR) filter with a steep roll-off then smooths the signal. A gain factor is applied to the smoothed signal to compensate for the loss caused by inserting the zeros.

The resulting intermediate signal is down-sampled by a factor of 2 to obtain an output audio signal at a sampling rate of 48 kHz (Fig. 4). Because the down-sampler discards every second filtered sample, the filter output for those discarded samples never needs to be computed. Skipping that work is a simplified form of the polyphase filtering technique, which improves the speed of the SRC.
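The 3:2 conversion can be sketched in a few lines. The example below is a minimal illustration and is not taken from any vendor library. The 49-tap Hamming-windowed sinc filter and all function names are assumptions chosen for readability. It never builds the zero-stuffed signal: for each output it visits only the filter taps that line up with real input samples, and it computes only the outputs that survive down-sampling.

```python
import math


def design_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter, normalized to unity DC gain.

    cutoff is in cycles per sample at the up-sampled rate (0 to 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [c / total for c in taps]


def resample_rational(x, up, down, num_taps=49):
    """Resample x by the ratio up/down without forming the zero-stuffed signal."""
    cutoff = 0.5 / max(up, down)
    # The gain of `up` compensates for the energy lost to the inserted zeros.
    h = [up * c for c in design_lowpass(num_taps, cutoff)]
    n_out = len(x) * up // down
    y = []
    for m in range(n_out):
        j = m * down            # position in the virtual up-sampled stream
        acc = 0.0
        k = j % up              # first tap aligned with a real input sample
        while k < num_taps:
            i = (j - k) // up   # index of that input sample
            if 0 <= i < len(x):
                acc += h[k] * x[i]
            k += up             # taps in between would multiply zeros
        y.append(acc)
    return y


# One 1-ms frame of a single channel: 32 samples in, 48 samples out.
frame_out = resample_rational([1.0] * 32, up=3, down=2)
print(len(frame_out))
```

A production implementation would keep filter history between frames so that frame edges do not see truncated input, and would typically use fixed-point arithmetic with a wide accumulator.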

For the 44.1-kHz to 48-kHz sample-rate conversion, where the conversion factor is 160:147, the input in the processing block is first up-sampled by a factor of 2 to 88.2 kHz by inserting a zero after every input sample and applying an FIR filter to smooth the signal. A gain factor is applied to the smoothed signal to compensate for the loss caused by inserting the zeros.

Polynomial interpolation then reduces every sequence of 147 samples at 88.2 kHz to 80 samples at 48 kHz, so the output audio data has a sampling rate of 48 kHz. Polyphase filtering is also employed in this mode to reduce redundancy. The filter dominates the overall processing load, so there is a tradeoff between filter length and output quality.
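The 147-to-80 step can be illustrated with a small interpolator. The article does not state the polynomial order used, so the four-point cubic Lagrange form below is an assumption, as are the function names. Each output sample sits at position m × 147/80 in the 88.2-kHz stream, and integer arithmetic keeps that position exact.

```python
def lagrange_cubic(y0, y1, y2, y3, t):
    """Cubic through points at x = -1, 0, 1, 2, evaluated at x = t (0 <= t < 1)."""
    c0 = -t * (t - 1) * (t - 2) / 6
    c1 = (t + 1) * (t - 1) * (t - 2) / 2
    c2 = -(t + 1) * t * (t - 2) / 2
    c3 = (t + 1) * t * (t - 1) / 6
    return c0 * y0 + c1 * y1 + c2 * y2 + c3 * y3


def resample_147_to_80(x):
    """Turn every 147 input samples (88.2 kHz) into 80 output samples (48 kHz)."""
    last = len(x) - 1

    def at(j):
        # Clamp at the block edges; a streaming version would keep history instead.
        return x[min(max(j, 0), last)]

    y = []
    for m in range(len(x) * 80 // 147):
        position = m * 147              # output position in units of 1/80 input sample
        i = position // 80              # input sample just before the output instant
        t = (position % 80) / 80        # fractional offset within that interval
        y.append(lagrange_cubic(at(i - 1), at(i), at(i + 1), at(i + 2), t))
    return y


ramp = [float(n) for n in range(294)]   # two blocks of 147 samples
converted = resample_147_to_80(ramp)
print(len(converted))
```

A cubic reproduces a straight-line input exactly, which makes a ramp a convenient sanity check: away from the clamped edges, output m should equal m × 147/80.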

A free SRC library is available in Microchip’s code library. The algorithm requires about 30 MIPS of the processor’s bandwidth, 6 kbytes of flash, and 1.5 kbytes of RAM, while providing good SNR.

USB Clock Mismatch

The USB specifications place a tolerance budget and a limit on the USB clock frequency as a way to achieve immunity to radio interference. Two clocks that both sit within the allowed tolerance can still run at slightly different rates, and this USB clock mismatch reduces audio quality. The real-time streaming audio samples must arrive at precise, regular intervals so the DAC can convert the digital samples to an analog signal at the constant rate for which it is configured.

The DAC, which expects to receive audio samples at a particular sample rate, cannot miss even a single sample. A missing sample manifests as a subtle click for the listener, since the DAC fails to generate an accurate representation of the streamed audio signal.

On an MCU or microprocessor with an embedded USB module, the USB clock is sourced from an independent clock such as an on-chip PLL with an external crystal oscillator of specific value. Since this USB clock source and the USB clock source of the audio source are independent, the mismatch in clocks introduces buffer overrun or underrun, causing clicks (Fig. 5).
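A little arithmetic shows how quickly a small mismatch turns into a click. The sketch below estimates how fast a buffer fills or drains for a given mismatch, and how long a given safety margin lasts. The 500-ppm mismatch and 48-sample margin are illustrative numbers only, not values taken from the USB specification, and the function names are hypothetical.

```python
import math


def drift_samples_per_second(nominal_rate_hz, mismatch_ppm):
    """Samples gained or lost per second for a given clock mismatch."""
    return nominal_rate_hz * abs(mismatch_ppm) / 1_000_000


def seconds_until_slip(nominal_rate_hz, mismatch_ppm, margin_samples):
    """Time until a buffer with the given margin overruns or underruns."""
    drift = drift_samples_per_second(nominal_rate_hz, mismatch_ppm)
    return math.inf if drift == 0 else margin_samples / drift


# Illustrative case: 48-kHz stream, 500-ppm mismatch, 1 ms (48 samples) of margin.
print(drift_samples_per_second(48_000, 500))
print(seconds_until_slip(48_000, 500, 48))
```

With these assumed numbers the buffer drifts by 24 samples every second, so a 1-ms margin is used up in 2 seconds unless something corrects the rate.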

A straightforward solution to the underrun or overrun issue is to use a good asynchronous sample-rate converter (ASRC), which estimates the input sample rate with jitter attenuation and dynamically tunes its internal filters for the new sample rate. However, a good ASRC is very expensive, and the system still requires a DAC for analog conversion.

As an effective low-cost solution, the USB audio packets are buffered and the clocks of the DAC can be tuned to prevent underrun or overrun using a feedback mechanism. The feedback mechanism monitors the buffer level and ensures it stays within an acceptable range, while achieving at least the same quality achieved by an expensive ASRC, if not better.
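The buffer-level feedback idea can be tried out in a simple simulation. The article does not specify a control law, so the proportional controller below is an assumption. The gain, the target level, and the ±0.1% adjustment limit are likewise illustrative. On real hardware the adjustment would be written to a clock-trim or fractional-divider register in discrete steps rather than applied as a floating-point ratio.

```python
def simulate_buffer(source_ppm, n_frames, gain=0.0, target=96.0,
                    nominal=48.0, limit=0.001):
    """Track the buffer level (in samples) over successive 1-ms frames.

    source_ppm: how much faster the source runs than the nominal DAC clock.
    gain: fractional DAC rate change per sample of level error (0 = no feedback).
    limit: largest fractional rate change the clock tuning is allowed to make.
    """
    level = target
    history = []
    for _ in range(n_frames):
        produced = nominal * (1 + source_ppm * 1e-6)
        error = level - target
        adjust = max(-limit, min(limit, gain * error))
        consumed = nominal * (1 + adjust)   # DAC speeds up as the buffer fills
        level += produced - consumed
        history.append(level)
    return history


open_loop = simulate_buffer(500, 10_000)               # 10 s with no correction
closed_loop = simulate_buffer(500, 10_000, gain=1e-4)  # 10 s with feedback

print(round(open_loop[-1], 1), round(closed_loop[-1], 1))
```

Without feedback the level climbs steadily and would overrun any finite buffer. With feedback it settles a few samples above the target, at the point where the tuned DAC rate matches the source rate.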

One option is to use a DAC with a PLL that allows fractional tuning of the sample rate based upon buffer-level feedback. This solution is less expensive than an ASRC chip.

Some PIC32 MCUs have a reference clock generation module that can be used to generate master clocks for audio DACs. This module also can be used internally as a clock source for the serial interface. It can be adjusted to tune the sample rate with good resolution to prevent buffer underrun and overrun.

This is the lowest-cost solution that also achieves low power. Alternatively, other PIC32 MCUs and dsPIC33E DSCs have a flexible independent system clock PLL and a USB clock PLL with independent clock sources. The USB PLL can be clocked with an external crystal to achieve the exact 48-MHz USB clock that’s required.

The system clock can be sourced from the fast RC (FRC) clock. On the PIC32 and dsPIC33E devices, the FRC clock is tunable and can be configured to tune the DAC clocks. This prevents buffer underrun and overrun while maintaining an acceptable DAC sample rate with a swing range of 0.2%.

Conclusion

The dock or device accessory needs to adapt to the sample rate of the source or convert the sampled data stream into a common data rate. Loss of quality due to USB clock mismatch must be addressed by tuning the sample rate. Both the SRC and the sample-rate tuning that prevents buffer overrun and underrun can be moved from external chips to the main processing unit, such as an MCU, which reduces system cost while still maintaining high audio quality.

A 32-bit MCU or 16-bit DSC with adequate power to perform SRC and interface with a low-cost, high-quality audio DAC can also prevent any audio-quality degradation due to USB clock differences by tuning the sample rate. The versatility and flexibility of features on high-performance MCUs like the PIC32 MCU or dsPIC33E DSC devices can be used to deliver a professional audio-quality solution while keeping the cost and power consumption low.