Implement A Single-Chip Multichannel VoIP DSP Engine

The Internet explosion has created an affordable communication pipe. It's not only facilitating e-mail correspondence and web browsing, though. Voice-over-Internet-protocol (VoIP) communication is now garnering support as well.

Many packet-based voice technologies can be found in VoIP products, like VoIP gateways, voice-capable cable modems, IP phones, and PBXs. All of these new applications rely on DSP technology for multimedia processing. In fact, designers can implement a single-chip multichannel VoIP DSP engine that is scalable and can handle a full T1 trunk (24 channels) of compressed voice or fax communications over an IP network (Fig. 1).

VoIP systems carry the voice and signaling information that's required to interface telephony equipment to a packet network. Full-duplex real-time voice or fax communications are compressed by the VoIP DSP engine and encoded by a microprocessor into network-ready frames. UDP/IP headers are added to the compressed packets, which are then sent to the IP network. Packets received from the network are decoded by the microprocessor, fed into the VoIP DSP engine, and decompressed. A dynamic jitter buffer automatically compensates for network delay variation to enable smooth real-time voice communication. Voice processing includes echo cancellation, voice compression, voice-activity detection, and voice packetization.
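To illustrate how a dynamic jitter buffer might track network delay variation, the sketch below uses the interarrival jitter estimator defined for RTP in RFC 3550 (a running average with gain 1/16). It is not taken from any particular VoIP DSP engine. The function names, the 20-ms minimum depth, and the "four times the jitter" safety rule are assumptions chosen for illustration.

```python
def update_jitter(jitter, transit_prev, transit_now):
    """One step of the RFC 3550 interarrival jitter estimator."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


def playout_delay_ms(arrival_ms, send_ms, min_delay_ms=20.0, safety=4.0):
    """Suggest a jitter-buffer depth from packet send/arrival times.

    min_delay_ms and safety are illustrative assumptions, not values
    from the article or from any standard.
    """
    jitter = 0.0
    prev_transit = None
    for arrival, sent in zip(arrival_ms, send_ms):
        transit = arrival - sent            # one-way transit time (relative)
        if prev_transit is not None:
            jitter = update_jitter(jitter, prev_transit, transit)
        prev_transit = transit
    return max(min_delay_ms, safety * jitter), jitter
```

With perfectly regular arrivals the estimate stays at zero and the buffer sits at its minimum depth. As delay variation grows, the suggested depth grows with it.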

The processor implements the industry-standard RTP/RTCP packet-streaming protocol, adaptive jitter-buffer management, and fax-relay support. It also executes the media gateway control protocol (MGCP) or any other IP call control stack. These very complex protocols can be licensed from protocol companies such as RadVision, or from VoIP chip/subsystems vendors like Audiocodes. These companies license portable C language software application libraries with all of the modules that are needed to support VoIP, including RTP/RTCP and the MGCP call control stack.
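As a concrete picture of the RTP packetization step, the following sketch builds and parses the 12-byte fixed RTP header described in RFC 3550. The function names are hypothetical, and the example assumes no padding, no header extension, and no CSRC entries. Payload type 18 is the static RTP assignment for G.729.

```python
import struct


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a minimal RTP fixed header (version 2, CC=0) to a payload."""
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet):
    """Decode the fixed header fields of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],
    }
```

For example, two 10-byte G.729 frames (20 ms of speech) could be sent as `build_rtp_packet(18, seq, ts, ssrc, frames)`, with the timestamp advancing by 160 per packet at the 8-kHz sampling rate.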

The MGCP Client Stack application library provides communication with a call-control entity. It supports the IETF MGCP protocol, and any IETF-compliant call agent can control it. MGCP assumes a call-control architecture in which the call-control intelligence is outside the VoIP gateway and is handled by an external call agent. MGCP commands, received from an external call agent through the IP network, are decoded and executed on a lower-level stack that accesses the VoIP DSP engine. MGCP commands create new connections, delete existing connections, modify connection parameters, and report call and connection status.
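The command handling described above can be sketched as a small parser for the text form of an MGCP command, as laid out in the IETF MGCP specification. This is only an illustration, not the licensed stack's API. The function name and the endpoint name in the usage note are hypothetical, and the sketch ignores any session description that follows the first blank line.

```python
# Connection-related MGCP verbs and the actions they request.
VERBS = {
    "CRCX": "create connection",
    "MDCX": "modify connection",
    "DLCX": "delete connection",
    "AUCX": "audit connection",
}


def parse_mgcp_command(text):
    """Parse the command line and parameter lines of one MGCP command."""
    lines = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:            # a blank line ends the parameter section
            break
        lines.append(line)
    verb, transaction_id, endpoint, protocol, version = lines[0].split()
    if protocol != "MGCP":
        raise ValueError("not an MGCP command line")
    params = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")   # split on the first colon only
        params[key.strip()] = value.strip()
    return {
        "verb": verb,
        "action": VERBS.get(verb, "other"),
        "transaction_id": transaction_id,
        "endpoint": endpoint,
        "version": version,
        "params": params,
    }
```

A call agent's request such as `CRCX 1204 aaln/1@gw.example.net MGCP 1.0` followed by `C:`, `L:`, and `M:` parameter lines would come back as a dictionary that a lower-level layer could translate into DSP-engine channel commands.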

Designers have a number of VoIP DSP-engine architecture options at their disposal. As in any other situation, it's important to choose the DSP that best fits the application and customer requirements. Considering the increased drive for higher channel density (maximum number of channels/calls per chip), designers must focus on the voice-algorithm load.

Preprogrammed VoIP DSP Engine

It's possible to buy a standard DSP and either write the software or purchase it from a third-party speech-coder vendor. This formidable task, though, requires a lot of effort. Instead, designers should employ preprogrammed VoIP DSP engines like the Audiocodes AC48XXX. These engines contain all of the algorithms and meet the tough real-time requirements. When designers face price pressures, which are usually associated with high-volume products, they should try an integrated solution with a DSP core and other peripherals in a single chip.

The VoIP DSP engine uses a DSP core, on-chip SRAM, and an interface to external memory to be cost-effective without compromising performance (Fig. 2). To avoid an unnecessary load on the DSP core, designers should include a DMA that autonomously transfers information from the pulse-code-modulation (PCM) interface to buffers in the external memory, and later from the external memory to the on-chip SRAM for digital signal processing.

When the signal processing is completed, the DMA moves the data back to the external memory and forwards the packetized data to the host interface at the microprocessor's request. The DMA also performs the data transfers in the opposite direction for packets that are coming from the host interface and directed toward the PCM interface.

Carrying voice over packet networks creates new problems that didn't exist with traditional telephone networks, such as much longer network delays, lost voice packets, and echoes. Delay in VoIP applications is the sum of network and speech-processing delays; long delays make echoes noticeable and prevent fluent conversations. Network delays are a function of the capacity of the network and the packet processing. Speech-processing delays combine the actual compression time and voice-frame collection. VoIP systems address the echo problem by implementing echo-cancellation algorithms.
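To give a feel for what an echo-cancellation algorithm does, here is a minimal sketch of a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach. The article does not say which algorithm the engine uses, so treat this as an assumed stand-in. The tap count, step size, and simulated echo path are illustrative, and the sketch makes no claim of G.165/G.168 compliance.

```python
import random


def simulate_echo(far_end, path):
    """Convolve the far-end signal with an (assumed) echo path."""
    out = []
    for n in range(len(far_end)):
        out.append(sum(path[k] * far_end[n - k]
                       for k in range(len(path)) if n - k >= 0))
    return out


def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-9):
    """Adapt an FIR estimate of the echo path and subtract the echo."""
    w = [0.0] * taps                 # adaptive filter coefficients
    history = [0.0] * taps           # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate        # what remains after cancellation
        norm = sum(xi * xi for xi in history) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        residual.append(e)
    return residual, w


rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.0, 0.5, -0.3, 0.1]    # hypothetical echo path
residual, learned = nlms_echo_cancel(far, simulate_echo(far, true_path))
```

In this noise-free simulation the learned coefficients converge toward the simulated echo path and the residual decays toward zero. A real line echo canceller also has to cope with near-end speech (double talk) and much longer echo tails.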

IP networks may drop data frames. Packets that contain voice are time sensitive, so unlike pure data packets, dropped voice packets can't be recovered through retransmission. This problem is addressed by "replaying" the last packet received during the interval when the lost packet was supposed to be played out.
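The replay strategy can be sketched in a few lines. Here a lost packet is represented by `None`; the function name and the use of a silence frame before any packet has arrived are assumptions made for the example.

```python
def conceal_losses(frames, silence_frame):
    """Replace each lost frame (None) with the last frame received.

    If a loss occurs before any frame has arrived, fall back to
    silence_frame (an illustrative choice, not from the article).
    """
    played = []
    last_good = None
    for frame in frames:
        if frame is not None:
            last_good = frame
            played.append(frame)
        elif last_good is not None:
            played.append(last_good)     # replay the previous packet
        else:
            played.append(silence_frame)
    return played
```

For instance, `conceal_losses([b"A", None, b"B"], b"S")` plays `b"A"` twice before moving on to `b"B"`. Simple replay works best for isolated losses; long bursts of repeated frames tend to sound unnatural.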

The VoIP DSP-engine requirement set features various voice-compression algorithms with varying voice/audio quality and bandwidth efficiency, along with fax and data-modem capability. The fax and modem support is necessary to serve any call type, including fax and modem, rather than rejecting or disconnecting these types of calls. The requirement set is shown below:

Vocoders: G.723.1, G.729a, G.726/727, G.711
Fax: V.17 group 3 fax relay
Modem: V.32bis
Echo canceller: G.165/G.168 compliance
Tone signaling: DTMF relay (EIA/TIA-464 compliance)

Additionally, programmable transmit and receive gain, silence compression, and packet-loss compensation algorithms are required. Independent dynamic speech-coder selection per channel increases system flexibility.

Typically managed by a host processor, the DSP system acts as a bidirectional gateway between a telephony interface such as PCM and a digital network. After the signal from the PCM interface is processed by the echo canceller, a voice/fax classifier forwards it to the appropriate software module for further compression (Fig. 3). Fax channels are demodulated to extract the payload, which is forwarded to the packet network as a bit stream. Voice channels are compressed by one of the speech-coder modules. The intervals of silence are subject to a very high compression ratio for optimal bandwidth utilization. A DTMF relay preserves any tone signaling superimposed on the voice.
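Before DTMF tones can be relayed, they have to be detected in the PCM stream. One widely used technique is the Goertzel algorithm, which measures signal power at each of the eight standard DTMF frequencies. The sketch below assumes 8-kHz sampling and a clean tone; the function names and the 205-sample block length are illustrative, and a practical detector would add energy thresholds and twist checks so that speech is not mistaken for a digit.

```python
import math

LOW_HZ = (697, 770, 852, 941)         # DTMF row frequencies
HIGH_HZ = (1209, 1336, 1477, 1633)    # DTMF column frequencies
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate=8000):
    """Squared magnitude of the signal at one frequency (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, rate=8000):
    """Pick the strongest row and column tone (no threshold: a sketch)."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_HZ[i], rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH_HZ[i], rate))
    return KEYS[row][col]


def make_tone(key, n=205, rate=8000):
    """Generate a clean two-tone test signal for one DTMF key."""
    for r, row_keys in enumerate(KEYS):
        c = row_keys.find(key)
        if c >= 0:
            return [math.sin(2 * math.pi * LOW_HZ[r] * t / rate) +
                    math.sin(2 * math.pi * HIGH_HZ[c] * t / rate)
                    for t in range(n)]
    raise ValueError("not a DTMF key")
```

Once a digit is recognized, a relay can send it across the packet network as an event rather than as compressed audio, which low-bit-rate vocoders may distort.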

Simultaneously, data from the host interface is processed to reconstruct the original signal (Fig. 4). Fax channels are modulated before being relayed to the PCM interface. Voice channels are decoded, and the silence intervals are interpolated by the comfort noise generator. A bad-frame handler compensates for the lost voice packets to minimize the disturbance at the receiving end.

Sophisticated vocoders like G.723.1 and G.729 are very complex to implement. In contrast to modems, which are based on filtering and correlations, these highly irregular DSP algorithms contain many vocoder-specific routines: stochastic codebook search, pitch prediction, parameter estimation, vector quantization, and others. Such algorithms mix control code with a large amount of mathematical calculation.

Thanks to its configurable long-instruction-word (CLIW) architecture, conditional execution, and the memory destination orientation (nonload-store), the Carmel DSP core is an ideal candidate for a VoIP Gateway DSP engine (see "DSP Core's CLIW Breeds VLIW Performance," below). The CLIW architecture offers the right balance between scalar/superscalar DSPs that produce good code size (but moderate computational power) and the very-long-instruction-word (VLIW) architectures that provide good computational power (but inefficient code size).

With the CLIW, software can be designed so that control code and register initialization are performed with regular instructions (good code density), while the inner loops are written as long instructions, like VLIW, to minimize the MIPS count. Together with conditional execution, the CLIW offers the most powerful architecture for vocoder implementation.

For example, the code listing shows a G.729a stochastic codebook inner loop implementation on the Carmel (three cycles), compared to a more conventional DSP core (14 to 16 cycles). Note that this loop consumes 10% to 20% of the overall G.729a computational load, while it is less than 1% of the vocoder's code. The algorithm processing load on the Carmel DSP is:

G.711 packetized PCM	0.2 MIPS
G.723.1 vocoder	7.5 MIPS
G.729a vocoder	5.0 MIPS
G.726/727 vocoder	5.0 MIPS
V.17 G3 fax relay	6.5 MIPS
V.32bis modem relay	9.0 MIPS
Line echo canceller	1.5 MIPS
DTMF relay	0.3 MIPS
Voice/fax/data classifier	0.4 MIPS
Real-time scheduler	0.5 MIPS overhead
Maximum load/channel	10.4 MIPS

This VoIP system design, based on the Carmel DSP core, supports a multichannel fax and voice-over-packet application. Optimized for maximum channel density, it can be easily scaled to a wide variety of solutions. Running at 250 MHz, the core can handle all 24 channels of a full T1 VoIP gateway. The DSP MIPS requirements for the Carmel-based implementation are nearly 50% lower than the MIPS required by other advanced DSPs.
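The channel-density claim can be checked with simple arithmetic. The sketch below takes the maximum per-channel load from the table (10.4 MIPS) and assumes the core delivers roughly one MIPS per MHz of clock, which is an assumption made here and not a figure stated in the article.

```python
MAX_LOAD_PER_CHANNEL_MIPS = 10.4   # "Maximum load/channel" from the table
T1_CHANNELS = 24


def channels_supported(core_mips, per_channel_mips=MAX_LOAD_PER_CHANNEL_MIPS):
    """Number of worst-case channels that fit in the MIPS budget."""
    return int(core_mips // per_channel_mips)


def total_load_mips(channels, per_channel_mips=MAX_LOAD_PER_CHANNEL_MIPS):
    """Worst-case aggregate load for a given channel count."""
    return channels * per_channel_mips


# Assumption: a 250-MHz core provides about 250 MIPS.
core_budget_mips = 250.0
fits = channels_supported(core_budget_mips)
needed = total_load_mips(T1_CHANNELS)
```

Under that assumption, 24 worst-case channels need about 249.6 MIPS, which just fits within a 250-MIPS budget and leaves almost no headroom for a 25th channel.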