Flexible Software Environment Eases Digital Audio Playback

Digital audio is a relatively young field. Its mainstream technology is based on the compact-disc (CD) standard, which provides two channels of stereo audio encoded using linear pulse code modulation (PCM). With the advent of the Dolby Pro Logic format, multichannel audio offered enhanced audio performance by supplying two more channels of information (center and mono surround). Additional digital audio improvements, such as Dolby Digital, DTS, MPEG, and others, have been developed over the past decade. These provide six or more discrete channels of digital audio content to further enhance the audio ambience on playback.

To meet the diverse range of signal-processing requirements for those audio channels, programmable digital-signal processors (DSPs) can be used to implement multichannel decoder designs. To do that, the DSP chips can leverage a flexible software architecture that was developed to address the multiple channel and format problems. An architecture like this must be able to deal with issues such as decoder switching latency, undesirable transient noise associated with bit-stream transitions, post-decoder processing control and expandability, system-level volume management, and chip resource management.

The software architecture consists of an executive kernel, which provides a framework to manage the various resources on the DSP chip, algorithms, and post-processing functions. Also, the software architecture establishes design rules for decoder algorithms and post-decoder processes, simplifying the design of the software functions and their integration into the overall application.

This software architecture and kernel, although initially developed for use across the Motorola DSP563xx processor family, can theoretically be adapted for use on any standard DSP architecture. No ports to other architectures, however, have been crafted at this point. In the future, an overview of the software architecture and executive kernel code for the DSP563xx family will be available from the Motorola DSP web site, located at http://mot-sps.com/adc/dspaudio.html.

The last few years have seen a flurry of activity in multichannel audio on the consumer front. It's not uncommon to find six discrete channels of audio for home theaters, and room for eight or more in the future. Yet the large number of channels presents a challenge, since the raw bit rate associated with multiple audio-PCM channels is tremendous.

To improve efficiency from a storage and transmission standpoint, advanced psychoacoustic models are utilized to analyze the signal. The model identifies the inaudible portion of the signal that is being masked by stronger signals. Inaudible portions are then discarded. By using these techniques, it's possible to attain data-reduction ratios of about 10:1 while minimizing audible artifacts. Some popular algorithms that fit this category are Dolby Digital, DTS, and MPEG audio-encoding formats.
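To put rough numbers on this, the short Python sketch below computes a raw PCM bit rate and applies the roughly 10:1 reduction mentioned in the text. The six-channel count comes from the article; the 48-kHz sample rate and 16-bit word length are assumed figures chosen only for illustration.

```python
def pcm_bit_rate(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw linear-PCM bit rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


# Assumed example figures: six channels, 48 kHz, 16-bit samples.
raw = pcm_bit_rate(channels=6, sample_rate_hz=48_000, bits_per_sample=16)
reduced = raw / 10  # the roughly 10:1 data reduction cited in the text

print(f"raw PCM: {raw / 1e6:.3f} Mbit/s")
print(f"after 10:1 reduction: {reduced / 1e3:.1f} kbit/s")
```

Under these assumptions, the six raw channels need several megabits per second, while the reduced stream fits in a few hundred kilobits per second.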

As the different encoding formats proliferate in the marketplace, consumers must be able to access multiple decoders on a single platform, reducing the potential clutter of dedicated players for each format. Additionally, that single platform should be able to support the as yet undefined decoders of tomorrow. The software architecture provides management and a framework for such future enhancements.

The tremendous pace at which technology continues to evolve has made one thing clear. Instead of creating a fixed solution, it would be preferable to provide a template that has provisions to handle the basic elements required for inputting and outputting audio data. Furthermore, that template should be able to integrate new decoders as they become available. It should also allow an external host to control audio processing. And, the scheme should permit the addition of specialized software in a standardized manner (a plug-and-play-like environment) to process the audio signal after it has been decoded. These add-ons are called post-processing phases (PPPs). Examples of PPPs include tone control, ambience enhancement, and any other specialized audio algorithm.

These software-architecture objectives are a good match for the latest entertainment systems. They would fit well with DVD players, audio/video home theaters, multimedia-enabled personal computers, set-top boxes, digital televisions, and the new personal audio players that handle MP3 and other algorithms. Although the need to support the multiple formats is growing, the integration of multiple decoders into a seamless, elegant system is still a very difficult task to perform well. A number of complex issues inherent in multidecoder implementations—if not addressed properly—can manifest themselves as annoying operational delays and audio artifacts impacting end-user satisfaction.

To overcome the basic hurdles in implementing a multichannel, multidecoder (MCMD) environment, a software framework that acts as a unifying structure can be implemented on a DSP architecture. Such a framework can help overcome many of the shortcomings of previous attempts, eliminating various annoyances that cause consumer dissatisfaction with audio quality. These annoyances include, but aren't limited to, decoder switching delays, unexpected output transients, and limited post-processing features.

In a system that supports multiple decoder formats, a common problem associated with task switching and related program latencies arises. As input formats dynamically change (e.g., Dolby Digital bit stream to DTS bit stream), transitions from one decoder to another incur a latency (normally on the order of seconds) associated with the loading and setup of the new decoder software. This is particularly an issue when decoder algorithms are stored off chip and require loading into the DSP chip (Fig. 1a). Often, this results in the loss of the first couple of seconds of audio from a new source.

On top of the latency and loading issues, a multiple-decoder system must handle the potential introduction of undesirable transient signals at certain bit-stream transition times. Problems like these usually manifest themselves in the form of an unpleasant pop, click, or sporadic noise. These glitches are particularly hard to control because bit streams are quite frequently random at the time of transition. They also have unique behavior dependent on the source player. For example, many DVD players will produce incompletely encoded frames, or generate what could be interpreted erroneously as PCM data, when the bit stream changes or the player is in pause, stop, or fast-forward mode.

One of the most important elements of any MCMD system, though, is the handling of decoded audio data for subsequent post processing. Post-decoder bass management, 3D virtualization, mixdown features, delay management, and other essential MCMD functions aren't generally supported due to the complexity of designing these features into multiprocessor platforms. This has led some system manufacturers to move toward the integration of multiple decoders into a single environment for system-level optimization.

A traditional implementation, then, might employ what now would be considered an expensive and inefficient means to resolve the support issues. In past years, a single DSP chip didn't have all the resources needed, in terms of both computation power and memory, to do the whole job. Traditional implementations might have used multiple decoders and dedicated post processors (typically programmable DSPs), or required a large external ROM for storage and support for the decoders. The soft decoders would typically be stored in a host microcontroller's memory or in a separate EPROM.

The framework described here helps eliminate these issues while adding features that are designed to enhance system flexibility and adaptability, reduce complexity, and improve design cycles. A much simpler system architecture results, because a single DSP-based decoder can host the MCMD software (Fig. 1b). All the decoders, along with the PPPs' algorithms, can now be integrated on a single DSP chip.

The software architecture manages a variety of system-level functions to control the decoding operations and post processing, and it can eliminate the various effects that cause user dissatisfaction. Its functions include housekeeping and coordination, as well as interfacing to a host processor, decoders, and PPPs. It also conducts bit-stream autodetection and dynamic task switching (changing decoders), along with monitoring of computation resources and memory, volume management, and output control.

A host-transfer protocol defined by the software architecture controls the processor peripherals, the decoder, and the post-processing control variables. This protocol uses the memory locations of control and processing data to define opcodes that transfer information to and from a host. This memory-location-based protocol is necessary to enable the integrated post processes to operate concurrently and dynamically. Customized initialization of the processor peripherals or dynamic loading of post-processing functions in a live system is possible via this flexible host protocol.

Processing changes or filter-coefficient loading can be completed dynamically to handle sample-rate changes, for instance. Memory is allocated for controlling new post processes as they're added to the system according to the software architecture's design rules. This system simplifies the interfacing process for the development of post processing and provides for expandability and modularity. Upgrading an MCMD design or developing a new tier of product with differentiating features becomes an easier task for the manufacturer as a result of this flexible control.

As with the host-control information, the software architecture is responsible for detecting the type of data at the input. Since the system cannot predict the type of input data a consumer may interject into the audio system, erroneous bit streams may be introduced as a result of user ignorance. Or, momentary errors could occur in otherwise valid bit streams. In either case, the system must react quickly to eliminate sporadic noise on the output.

The software architecture does this through auto-detection of input bit streams and then rapidly task switching to the appropriate on-chip decoder algorithm. At the same time, the software monitors the spurious errors and suppresses them as needed to eliminate transient pops, clicks, and glitches. The processor in use maintains the decoders in on-chip ROM, reducing task-switching latency by a factor of 10 to 20 over similar designs with off-chip decoders.
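The detect-then-switch behavior can be sketched in a few lines of Python. This is a simplified illustration, not the kernel's actual logic: it recognizes only the Dolby Digital (AC-3) and DTS sync words at the start of a frame, leaves out PCM and MPEG, and mutes on anything it doesn't recognize. The `DecoderSwitcher` class and its return strings are hypothetical names; a real detector would generally validate far more than a sync word.

```python
from typing import Optional

# Sync words as they appear at the start of a frame, most significant byte first.
AC3_SYNC = bytes([0x0B, 0x77])              # Dolby Digital (AC-3)
DTS_SYNC = bytes([0x7F, 0xFE, 0x80, 0x01])  # DTS, 16-bit big-endian form


def detect_format(frame: bytes) -> Optional[str]:
    """Return a decoder name if the frame starts with a known sync word."""
    if frame.startswith(DTS_SYNC):
        return "dts"
    if frame.startswith(AC3_SYNC):
        return "dolby_digital"
    return None


class DecoderSwitcher:
    """Hypothetical executive logic: pick a decoder and mute on bad data."""

    def __init__(self) -> None:
        self.active: Optional[str] = None

    def on_frame(self, frame: bytes) -> str:
        fmt = detect_format(frame)
        if fmt is None:
            return "mute"             # unrecognized data: output silence, not noise
        if fmt != self.active:
            self.active = fmt         # task switch to the matching decoder
            return "switch:" + fmt
        return "decode:" + fmt


switcher = DecoderSwitcher()
frames = [
    bytes([0x0B, 0x77, 0x00, 0x00]),        # Dolby Digital frame start
    bytes([0x0B, 0x77, 0x12, 0x34]),        # another Dolby Digital frame
    bytes([0x00, 0x00, 0x00, 0x00]),        # garbage during a transition
    bytes([0x7F, 0xFE, 0x80, 0x01, 0x00]),  # DTS frame start
]
actions = [switcher.on_frame(f) for f in frames]
print(actions)
```

Muting on unrecognized frames is one simple way to keep transition garbage from reaching the outputs as pops or clicks.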

The embedded software architecture is "open" in the sense that it has a level of standardization facilitating a plug-in environment for specialized algorithms in the form of PPPs. The existence of the software architecture in ROM offers the added benefit of having a decoder chip that automatically boots up, configures itself, and starts decoding audio bit streams every time it's powered up. This automated startup process and system support lets PPP developers focus on algorithm development, rather than system-level issues like data-buffer handling and control interfacing.

Once a PPP has been developed, the software architecture allows it to be nonintrusively downloaded in the background while the decoder chip is still decoding audio. And, it's very easy to change operating parameters of PPPs on the fly. This provides a flexible development environment that's conducive to algorithm prototyping as well.

Prior to and during decoding, processing information must be transferred between the various components operating on the data. This communication function is necessary to integrate multiple processes into a single processor, since the processing of one PPP could affect the operation of subsequent PPPs.

Local and global data structures are configured by the software architecture. Each component, then, has its own local control block, as well as a means to communicate to subsequent processes via the global control block. Information such as sample rate, number of channels processed, and volume levels can be transferred between post processes. This communication protocol enables PPPs to change sample-rate-dependent coefficients if necessary, process only the channels with relevant data, or compensate for varying volume levels. Additional variables can be added to the structures as the system functionality grows.
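A minimal sketch of the two kinds of control block might look like the following. The field names and the `prepare_ppp` helper are illustrative assumptions, not the layout used by the actual kernel; the point is only that a PPP reads shared state (sample rate, active channels) and keeps its own private parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GlobalControlBlock:
    """Shared state written by the decoder and read by every PPP."""
    sample_rate_hz: int = 48_000
    active_channels: Tuple[str, ...] = ("L", "R")
    volume_db: Dict[str, float] = field(default_factory=dict)


@dataclass
class LocalControlBlock:
    """Per-PPP state; a host could change these fields on the fly."""
    enabled: bool = True
    coeff_rate_hz: int = 0  # sample rate the loaded coefficients match


def prepare_ppp(local: LocalControlBlock, shared: GlobalControlBlock,
                wanted: List[str]) -> List[str]:
    """Refresh rate-dependent state and return the channels worth processing."""
    if local.coeff_rate_hz != shared.sample_rate_hz:
        # Stand-in for recomputing or reloading filter coefficients.
        local.coeff_rate_hz = shared.sample_rate_hz
    return [ch for ch in wanted if ch in shared.active_channels]


shared = GlobalControlBlock(sample_rate_hz=44_100, active_channels=("L", "R", "C"))
tone = LocalControlBlock()
channels = prepare_ppp(tone, shared, wanted=["L", "R", "Ls", "Rs"])
print(channels, tone.coeff_rate_hz)
```

Here the PPP skips the surround channels because the decoder reported no data for them, and it picks up the new sample rate without any host involvement.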

In an open architecture that permits the addition of new functionality by multiple developers, it becomes necessary to provide some means to manage memory utilization and computing resources, the latter measured in millions of instructions per second (MIPS). Upon initialization, the software architecture will read and write memory to verify that enough memory is available for the decoders prior to activation. The software architecture also allocates memory to support PPPs and provides tables of unused memory for the developer to use. Developers can then manage this memory while integrating various PPPs.

The software architecture also provides a "MIPS meter," which is a sort of tachometer for each block. The meter can be monitored to determine the number of MIPS required by a particular software component or to determine how many MIPS are available for new PPPs.
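One plausible way to picture such a meter is as a conversion from cycles spent per audio block to MIPS. The sketch below is an assumption about how the arithmetic could work, not a description of the kernel's meter: it assumes one instruction per clock cycle, and the 100-MIPS budget, 256-sample block size, and cycle counts are made-up figures.

```python
class MipsMeter:
    """Tracks per-component load from measured cycles per audio block."""

    def __init__(self, budget_mips: float, block_size: int, sample_rate_hz: int) -> None:
        self.budget_mips = budget_mips
        self.blocks_per_second = sample_rate_hz / block_size
        self.cycles = {}  # component name -> cycles spent on the last block

    def record(self, name: str, cycles_per_block: int) -> None:
        self.cycles[name] = cycles_per_block

    def mips(self, name: str) -> float:
        # Assumes one instruction per cycle, so cycles/s equals instructions/s.
        return self.cycles[name] * self.blocks_per_second / 1e6

    def available(self) -> float:
        return self.budget_mips - sum(self.mips(n) for n in self.cycles)


meter = MipsMeter(budget_mips=100.0, block_size=256, sample_rate_hz=48_000)
meter.record("decoder", 160_000)
meter.record("bass_manager", 64_000)
print(meter.mips("decoder"), meter.mips("bass_manager"), meter.available())
```

With these figures the decoder accounts for 30 MIPS and the bass manager for 12 MIPS, leaving 58 MIPS of headroom for additional PPPs.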

When a new post process or decoder is enabled, the output level should remain at a relatively constant perceived volume. Users get annoyed when an audio stream is too low and then too loud. To control the sound level, a volume-management feature can be integrated into the software architecture to handle the variation in decoder and post-processing levels.

The volume manager sums the volume changes of the enabled processes for each channel and provides this information for use in one of several volume-implementation methods. Volume compensation can be applied through a digital-volume PPP or an external volume component. Volume-compensation information can be retrieved by a host and applied directly to an external volume chip. In some cases, the volume chip can be driven directly from the processor.
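The summing step can be illustrated with a short sketch. The dictionary layout, the process names, and the decibel figures are all assumptions made for the example; the article specifies only that the changes of the enabled processes are summed per channel and handed to whichever volume method is in use.

```python
from typing import Dict, List


def net_level_change(processes: List[dict], channels: List[str]) -> Dict[str, float]:
    """Sum the per-channel level changes (in dB) of all enabled processes."""
    totals = {ch: 0.0 for ch in channels}
    for proc in processes:
        if not proc["enabled"]:
            continue
        for ch, delta_db in proc["level_db"].items():
            if ch in totals:
                totals[ch] += delta_db
    return totals


processes = [
    {"name": "decoder", "enabled": True, "level_db": {"L": -3.0, "R": -3.0}},
    {"name": "bass_manager", "enabled": True, "level_db": {"L": -1.5, "R": -1.5}},
    {"name": "virtualizer", "enabled": False, "level_db": {"L": 2.0, "R": 2.0}},
]
net = net_level_change(processes, ["L", "R"])
compensation = {ch: -change for ch, change in net.items()}
print(net, compensation)
```

The compensation values are what a digital-volume PPP, or a host driving an external volume chip, would apply to bring the perceived level back to where it started.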

Input data handling, buffering, and decoding are best managed by the decoder itself, while the output process is fully controlled by the software architecture. Decoders and PPPs can then perform buffer-to-buffer processing, leaving the output handling to the software architecture. That simplifies the development of decoders and PPPs by removing the output handler from the list of tasks to be performed by the developer.

At the heart of the software architecture is an executive kernel that controls the decoder chip's operation (Fig. 2).

Viewed from a high level, the executive controls not only the decoder functions, but also the PPPs' options and the host interface. Along with basic sequencing, the executive provides intercomponent communication and monitors time-based processing metrics. It also coordinates the operation of various software modules by interfacing to and interacting with four main components: the host controller, the decoders and input drivers, the PPPs, and the output drivers (not shown).

All communication between an external controller, like a host microcontroller, and the DSP chip must occur through the executive communication protocol running on the DSP. Host-controller communication is implemented by key software control and status memory locations (or "registers"). The protocol lets the host access memory-mapped hardware registers and reference absolute addresses in on-chip memory. It does so with single-word transfer and multiword block transfer commands. This permits the controller to perform hardware initialization, download code, or transfer data while the decoder chip is running. Also, it implements a table of indirect memory accesses to map standard commands, or "opcodes," which define commands for executive, decoder, and standard PPP operation.
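The flavor of a memory-location-based protocol can be conveyed with a toy model. Everything named here is hypothetical (the `HostInterface` class, the opcode value, and the addresses), and the 24-bit word width is an assumption about the target; the sketch simply shows direct single-word and block writes alongside an opcode that resolves to a control-block address through a table.

```python
WORD_MASK = 0xFFFFFF  # 24-bit data words, assumed here


class HostInterface:
    """Toy model of a memory-location-based host protocol."""

    def __init__(self, size: int = 64) -> None:
        self.memory = [0] * size
        self.opcodes = {}  # opcode -> address of the control word it maps to

    def define_opcode(self, opcode: int, address: int) -> None:
        self.opcodes[opcode] = address

    def write_word(self, address: int, value: int) -> None:
        self.memory[address] = value & WORD_MASK

    def read_word(self, address: int) -> int:
        return self.memory[address]

    def write_block(self, address: int, values: list) -> None:
        for offset, value in enumerate(values):
            self.write_word(address + offset, value)

    def command(self, opcode: int, value: int) -> None:
        # Indirect access: the opcode table supplies the target address.
        self.write_word(self.opcodes[opcode], value)


SET_DECODER_MODE = 0x10   # hypothetical opcode
DECODER_MODE_ADDR = 5     # hypothetical control-block location

host = HostInterface()
host.define_opcode(SET_DECODER_MODE, DECODER_MODE_ADDR)
host.command(SET_DECODER_MODE, 2)
host.write_block(8, [0x111111, 0x222222, 0x1FFFFFF])
print(host.read_word(DECODER_MODE_ADDR), [hex(w) for w in host.memory[8:11]])
```

Because an opcode is only a table entry pointing at a memory location, adding a control variable for a new PPP amounts to adding another entry rather than changing the protocol.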

The protocol provides a host controller with the ability to perform standard operations, like periodic checks of overall system status, decoder mode, sampling rate, and active channels. It also gives the host controller simple commands to change system modes, decoder modes, downmix options, and standard post-processing operating modes. This is all accomplished by mapping commands to each executive software module's control block. By implementing the executive's host interface in this manner, the host can communicate effectively to all the components of the system.

Instead of using a generic input-driver function, the software architecture tightly couples the input driver to the respective decoder. While this doesn't let a single input driver be used across decoders, it does result in a more efficient implementation for each decoder. And, since it's conceivable that the data format associated with each decoder type could be different, this tradeoff enables the architecture to better accommodate future decoders. The decoders themselves operate in conjunction with their specialized input driver and respective decoder algorithm. When implemented for the DSP563xx, decoder algorithms can be optimized for the minimal use of memory and MIPS, while adhering to the design rules of the software architecture.

A very important part of the solution, both in flexibility and enhanced functionality to the end user, is the PPP structure. Since decoder bit streams routinely pack data into data blocks, it makes sense to implement post-processing control using a block-based scheme. Frequently, this turns out to be a processing advantage for PPP software, since processing-loop overhead can be absorbed over a larger set of audio data. The software architecture provides the host controller with standard commands to handle the dynamic initialization and activation of custom PPP functions. It also lets the host download PPP code or modified parameters without disturbing the DSP software that's decoding the audio stream. The processed audio buffer is at last passed to the output driver, which sends audio data out of the decoder chip (Fig. 3).

In a typical processing flow, the incoming bit stream from a source such as a laserdisc or DVD player could be a Dolby Digital, DTS, or MPEG-2 multichannel movie soundtrack. Other sources of audio information include DTS multichannel CDs, HDCD-encoded (high-definition compatible digital) CDs, as well as conventional audio formats (PCM). A double-banked buffer scheme is used throughout to store data in the input and output buffers. The bit streams get packetized into words and sent to the input buffer. At this point, auto-detection software within the context of decoders recognizes the bit stream being received and determines the correct decoder to be used. This task needs to be performed relatively quickly. At the same time, spurious noise has to be suppressed. The data in the buffers, then, must be scrutinized very efficiently.

The system detects the incoming bit stream, and the appropriate decoder algorithm is invoked to process the input buffers. Data is uncompressed, decoded, and then copied as blocks of audio samples into the corresponding channel buffer (e.g., left, right, center). The decoder extracts information from the incoming bit stream, such as decoder type, sample rate, and number of channels. This material is placed in the post-decoder control (PDC) block (Fig. 4). Important system information, such as buffer locations and volume levels on each channel, also is set in the PDC. It's then available to all PPPs and any external controller.

Next, the software architecture invokes the PPP chain to process audio that's delivered by a decoder. Provisions are made for numerous slots in which PPPs may be located. The executive addresses each PPP slot sequentially and determines whether the slot is active or inactive before it is invoked (Fig. 3, again). The host controller has the ability to load in PPP functions and then activate or deactivate them by controlling the contents of the slot locations that the executive addresses. Some of the slots are occupied by standard PPPs that exist in ROM (e.g., Dolby Pro Logic decoding, bass manager, volume manager, THX processing). Meanwhile, other slots may contain user-specific PPPs such as 3D virtualization, soundfields, delays, and equalization.
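The slot mechanism lends itself to a compact sketch. The `PPPChain` class and the two trivial PPPs below are hypothetical stand-ins; they show only the sequential walk over slots, the active/inactive check, and a host-style load and activate sequence.

```python
from typing import Callable, Dict, List, Optional

Buffers = Dict[str, List[float]]
PPP = Callable[[Buffers], None]


class PPPChain:
    """Fixed number of slots, visited in order; inactive or empty slots are skipped."""

    def __init__(self, num_slots: int) -> None:
        self.slots: List[Optional[PPP]] = [None] * num_slots
        self.active: List[bool] = [False] * num_slots

    def load(self, slot: int, ppp: PPP) -> None:
        self.slots[slot] = ppp

    def set_active(self, slot: int, on: bool) -> None:
        self.active[slot] = on

    def run(self, buffers: Buffers) -> None:
        for ppp, on in zip(self.slots, self.active):
            if ppp is not None and on:
                ppp(buffers)  # each PPP modifies the channel buffers in place


def halve(buffers: Buffers) -> None:
    for samples in buffers.values():
        for i in range(len(samples)):
            samples[i] *= 0.5


def invert(buffers: Buffers) -> None:
    for samples in buffers.values():
        for i in range(len(samples)):
            samples[i] = -samples[i]


chain = PPPChain(num_slots=4)
chain.load(0, halve)
chain.load(2, invert)
chain.set_active(0, True)           # slot 2 is loaded but not yet active

audio = {"L": [1.0, -0.5], "R": [0.25, 0.5]}
chain.run(audio)
first_pass = {ch: list(s) for ch, s in audio.items()}

chain.set_active(2, True)           # host enables the second PPP on the fly
chain.run(audio)
print(first_pass, audio)
```

Toggling a slot's active flag changes the processing on the next block without touching the other slots or reloading anything.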

A block of global variables is made available to the PPPs. Among these variables are pointers to the output buffers for each channel. While the software architecture is "ping-ponging" the double-banked input and output data buffers, the rest of the system doesn't need to be concerned about those details. The contents of the data-buffer pointers are synchronized by the software architecture to always be valid. All the PPPs process the audio samples in place within the output buffer (similar to a read-modify-write operation). At the end of the PPP chain, the audio data is sent to the output driver, where the channels are paired (left/right, left surround/right surround, center/subwoofer, aux left/aux right) and then sent out of the decoder chip.
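A double-banked ("ping-pong") buffer can be sketched as two banks and an index that flips each block. The class and property names are assumptions for illustration; the key idea is that consumers always go through one accessor, so they never need to know which bank is current.

```python
class PingPongBuffer:
    """Two banks: one being filled, one ready for in-place processing."""

    def __init__(self, block_size: int) -> None:
        self.banks = [[0.0] * block_size, [0.0] * block_size]
        self.fill_index = 0

    @property
    def filling(self) -> list:
        return self.banks[self.fill_index]

    @property
    def ready(self) -> list:
        return self.banks[1 - self.fill_index]

    def swap(self) -> None:
        self.fill_index = 1 - self.fill_index


buf = PingPongBuffer(block_size=4)
buf.filling[:] = [1.0, 2.0, 3.0, 4.0]  # decoder writes block 0
buf.swap()
block = buf.ready                       # pointer handed to the PPPs
for i in range(len(block)):
    block[i] *= 0.5                     # in-place, read-modify-write style
buf.filling[:] = [5.0, 6.0, 7.0, 8.0]  # decoder writes block 1 meanwhile
first_out = list(buf.ready)
buf.swap()
second_out = list(buf.ready)
print(first_out, second_out)
```

The PPP code only ever sees `ready`, which mirrors how the synchronized buffer pointers hide the bank switching from the rest of the system.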

Each individual PPP, in turn, has an associated post-processing control block that contains parameters of local interest (Fig. 4, again). The host processor has the ability to directly manipulate these parameters, providing a mechanism to alter the nature of the processing algorithm on the fly. It's important that certain information be shared among PPPs to let them work in a modular and independent manner. Also, in certain cases, the order in which the PPP functions are executed could have auditory effects.

As an example, a 3D-virtualization PPP doesn't work well if a downmixing function precedes it. By sending simple commands, an external controller can turn these PPPs on, modify them, or bypass them in real time without having to reload the entire system software, which would cause breaks in the audio stream. If PPPs are written in a position-independent manner by using the relative addressing features, the software may be reused in any configuration or with any number of PPPs.

Auxiliary channels also are supported within this software architecture, further expanding its flexibility. For instance, a second room may be supported with a two-channel virtual surround mixdown while simultaneously processing the primary room's 5.1-channel surround output. One system issue that's solved by the software architecture is the difficulty of keeping volume consistent. When different decoders are used and multiple post-processing configurations are considered, it can be a challenge to maintain the system's final operating volume. The software architecture provides volume management to eliminate this problem. It also allows the end customer to turn post-processing phases on and off or change the type of source material without having to constantly adjust the system volume.

Third-party developers with unique audio solutions can port their solutions to this architecture. This will result in a large library of PPPs that provides system designers with many options. Developers need only test and ensure that the algorithm works as a PPP within the software architecture, without worrying about what other PPPs are running concurrently. By taking a slot approach, the software architecture eliminates the need to share DSP source code (thus separating individual intellectual property). Different object modules that work within the software architecture from various developers may be assembled at link or run time to operate together in an audio system. It's clear that the existence of the software architecture has promoted a true system solution that makes the designer's job much easier while giving the end customer a better product.