Understanding MPEG Audio Codecs From mp3 To xHE-AAC

Audio codecs are one of the fundamental building blocks of modern media systems. The first and still the most prominent MPEG audio codec is mp3, which first appeared on the market in 1998. Since then, Fraunhofer IIS and other contributors in ISO-MPEG have developed and standardized several more audio codecs. Each of these codecs has changed, or will change, the way media is consumed.

ISO-MPEG: The Codec Cradle

MPEG audio and video coding started in January 1988 with the founding of MPEG as a working group of the International Organization for Standardization (ISO). The goal of the “Moving Picture Experts Group” was to develop standards for the coded representation of moving pictures and audio. The first MPEG meeting had 25 attendees. Today, more than 350 experts from more than 200 companies and organizations in about 20 countries attend MPEG meetings.¹ The formal name of the group is ISO/IEC JTC 1/SC 29/WG 11.²

Unlike in other standardization organizations, the development work on new standards does not take place during MPEG meetings. Instead, the group, which meets four times a year, reviews technical submissions and assigns work to its members. At the end of the development and standardization process, a new ISO standard is published and made available for download on the ISO website.³

The most prominent examples of MPEG standards are MPEG-1, MPEG-2, and MPEG-4. However, there are many more, such as the latest audio standards in MPEG-D or HTTP adaptive streaming in MPEG-DASH.⁴ Video coding is, of course, a very prominent part of MPEG. For example, the most important video codec these days, MPEG-4 AVC/H.264, was developed and standardized in MPEG jointly with the ITU-T. As with audio, today's media world would look completely different without the MPEG video standards.

MPEG Layer 3: mp3

The music industry has transformed dramatically since the arrival of mp3, which changed the way consumers buy and access their music. It's still the dominant format for music distribution because an mp3 file plays on virtually every device. Development of the mp3 technology began in the late 1980s, and it broke through with the introduction of the file extension “.mp3” in 1995. In the same year, Fraunhofer IIS presented the first hardware prototype of an mp3 player. The “mp3” file extension quickly became a synonym for the standard name “MPEG Layer 3,” but it took another three years until the first mp3 players hit the market in 1998.

Specifically, mp3 is a perceptual audio codec. Such codecs are based on perceptual models of the human auditory system. These models describe which elements of an audio signal can or cannot be perceived by the human ear, regardless of whether the listener has a highly trained ear. Perceptual audio codecs exploit this by analyzing the audio signal: the aspects the ear can perceive are prioritized and represented very carefully in the coded file, while components it cannot perceive are coded coarsely or discarded. As a result, listeners can't tell the difference between an mp3 file and the original if the bit rate is chosen properly (i.e., at least 192 kbits/s).
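
To make the idea concrete, the following minimal sketch implements the simplest ingredient of a perceptual model: the absolute threshold of hearing (threshold in quiet), using Terhardt's widely cited approximation. It is not mp3's actual psychoacoustic model, which also accounts for simultaneous and temporal masking; the function names and test tones are illustrative assumptions.

```python
import math

def absolute_threshold_db(freq_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_components(components):
    """Keep only (frequency_hz, level_db_spl) pairs above the threshold in quiet.

    Simplification: masking by neighbouring components is ignored here.
    """
    return [(f, level) for f, level in components
            if level > absolute_threshold_db(f)]

# Hypothetical sinusoidal components: (frequency in Hz, level in dB SPL)
tones = [(50.0, 30.0), (1000.0, 10.0), (3300.0, -2.0), (15000.0, 40.0)]
for f, level in tones:
    print(f"{f:>7.0f} Hz at {level:5.1f} dB SPL, threshold {absolute_threshold_db(f):6.1f} dB")
print(audible_components(tones))
```

Only the 1-kHz and 3.3-kHz tones survive; the quiet 50-Hz and 15-kHz tones fall below the threshold, so an encoder would not need to spend bits on them.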

Besides mp3, most current audio codecs of the MPEG family are also based on perceptual models: they reduce data rate and file size by cleverly exploiting the characteristics of the human auditory system. This is also true for the AAC family of audio codecs.

AAC Family

Even before mp3 was adopted on a broad scale, MPEG started developing another audio codec. The goal was to achieve the same high audio quality as mp3, but at a significantly reduced data rate. This marked the birth of a whole new family of codecs, from AAC in 1997 up to Extended HE-AAC in 2012.

The first version of the AAC codec was standardized in 1997 as Advanced Audio Coding (AAC) in MPEG-2. Drawing on their experience developing mp3 and various proprietary codecs, AT&T, Dolby, Fraunhofer IIS, and Sony started from scratch to design a new state-of-the-art audio codec. The MPEG-4 standard later extended MPEG-2 AAC with additional tools, including perceptual noise substitution (PNS), spectral band replication (SBR), and parametric stereo (PS).

AAC-LC

The basic MPEG-4 AAC profile is the “AAC Profile,” commonly called AAC-LC (Low Complexity). It provides audio quality up to transparency. In audio coding, transparency means an audio quality that even expert listeners with so-called “golden ears” can't distinguish from the original, although the original and the coded signal aren't mathematically identical. AAC-LC therefore meets even the highest quality requirements of broadcasters.

Typical AAC-LC bit rates range from 128 to 192 kbits/s for stereo and are around 320 kbits/s for 5.1 multichannel signals, with all channels encoded discretely. With sampling rates from 8 to 192 kHz, bit rates up to 256 kbits/s per channel, and support for up to 48 channels, AAC-LC remains one of the most flexible audio codecs. Its most prominent application is Apple iTunes, and it's also used in the Japanese ISDB standard for digital TV.
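
To put these bit rates in perspective, a quick calculation compares them with uncompressed PCM. The sketch assumes 16-bit PCM sources (44.1 kHz for stereo, 48 kHz for 5.1) and, for simplicity, counts the LFE channel as a full-rate channel; the helper names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed linear PCM in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def compression_ratio(coded_kbps: float, sample_rate_hz: int,
                      bits_per_sample: int, channels: int) -> float:
    """How many times smaller the coded stream is than the PCM source."""
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels) / coded_kbps

# Stereo AAC-LC at 128 kbit/s versus a 44.1-kHz, 16-bit CD source (1411.2 kbit/s)
print(round(compression_ratio(128, 44_100, 16, 2), 1))   # 11.0
# 5.1 AAC-LC at 320 kbit/s versus six channels of 48-kHz, 16-bit PCM (4608 kbit/s)
print(round(compression_ratio(320, 48_000, 16, 6), 1))   # 14.4
```

In other words, even at transparent quality AAC-LC stores roughly one-eleventh of the data of the CD source.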

HE-AAC and HE-AACv2

The MPEG-4 “High Efficiency AAC Profile” (HE-AAC), which combines AAC-LC with the parametric SBR tool, reduces the overall bit rate further while maintaining excellent audio quality. Operating below 128 kbits/s for stereo signals, HE-AAC reduces the bit rate by up to 30% compared to AAC-LC at equivalent audio quality.

In HE-AAC, AAC-LC codes the lower part of the audio spectrum, while the SBR tool encodes the upper part. SBR is a parametric approach that exploits the relationship between the lower and upper parts of the spectrum for a guided reconstruction of the signal's full audio spectrum. To reduce the bit rate even further, the AAC-LC core that codes the lower part runs at half the sampling frequency of the overall signal.
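
The principle can be illustrated with a deliberately simplified sketch operating on one frame of spectral magnitudes. It assumes a plain "copy-up" patch and a per-band energy envelope as the only side information; real SBR works in a QMF filter-bank domain and also transmits noise-floor and tonal-component data, so the function names, band layout, and values below are illustrative assumptions rather than the standardized algorithm.

```python
import math

def dual_rate_split(output_rate_hz: int):
    """Core sampling rate, core Nyquist bandwidth, and full output bandwidth.

    The core bandwidth is only an upper bound; the actual SBR crossover
    frequency is configurable and often lower.
    """
    core_rate = output_rate_hz // 2            # AAC-LC core runs at half the rate
    return core_rate, core_rate / 2, output_rate_hz / 2

def sbr_encode_envelope(spectrum, crossover, band_width):
    """Encoder side: energy of each high-band group of bins.

    Only these energies are transmitted, not the high-band bins themselves.
    Assumes len(spectrum) - crossover is a multiple of band_width.
    """
    high = spectrum[crossover:]
    return [sum(m * m for m in high[i:i + band_width])
            for i in range(0, len(high), band_width)]

def sbr_decode(low_band, envelope, band_width):
    """Decoder side: patch low-band bins upward, then rescale to the envelope."""
    out = list(low_band)
    for b, target_energy in enumerate(envelope):
        # Copy-up patch: reuse low-band bins, wrapping around if needed
        patch = [low_band[(b * band_width + k) % len(low_band)]
                 for k in range(band_width)]
        patch_energy = sum(m * m for m in patch)
        gain = math.sqrt(target_energy / patch_energy) if patch_energy > 0 else 0.0
        out.extend(gain * m for m in patch)
    return out

spectrum = [8.0, 6.0, 5.0, 4.0, 3.0, 2.0, 2.0, 1.0]     # hypothetical 8-bin frame
envelope = sbr_encode_envelope(spectrum, crossover=4, band_width=2)
print(envelope)                                          # [13.0, 5.0]
rebuilt = sbr_decode(spectrum[:4], envelope, band_width=2)
print(dual_rate_split(48_000))                           # (24000, 12000.0, 24000.0)
```

The reconstructed high band has the right energy per band but not the original fine structure, which is acceptable because the ear is far less sensitive to detail at high frequencies.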

HE-AAC typically operates at 48 to 64 kbits/s for stereo and 160 kbits/s for 5.1 multichannel signals. Like AAC-LC, HE-AAC supports sampling rates from 8 to 192 kHz and up to 48 channels, as well as audio-specific metadata.

The “High Efficiency AAC v2 Profile” (HE-AACv2) adds the PS tool to HE-AAC, applying a parametric approach to coding the stereo image to reduce the bit rate even further. Instead of transmitting two channels, the PS encoder produces a mono downmix, which is HE-AAC encoded, and extracts parameters from the stereo signal that enable the decoder to reconstruct the stereo image.

The PS data is transmitted together with the SBR data in the ancillary data fields of the AAC bit stream. The HE-AAC decoder decodes the mono signal, and the PS decoder then recreates the stereo image from it. Transmitting an HE-AAC encoded mono signal plus parametric stereo data is more efficient than transmitting a two-channel HE-AAC encoded signal. HE-AACv2 typically operates at 24 to 32 kbits/s for a stereo signal.
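
The encoder/decoder split described above can be sketched in a few lines. This toy version transmits only one inter-channel intensity difference (IID) per frequency band; the real PS tool works in a hybrid QMF domain and also uses phase and coherence parameters (IPD, OPD, ICC) with a decorrelator. The input format (spectral coefficients of one frame), band layout, and function names are assumptions for illustration.

```python
import math

EPS = 1e-12  # avoids log(0) and division by zero for silent bands

def band_energy(x, start, width):
    return sum(v * v for v in x[start:start + width])

def ps_encode(left, right, band_width):
    """Mono downmix plus one IID value (in dB) per band of band_width coefficients."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    iids = [10 * math.log10((band_energy(left, i, band_width) + EPS) /
                            (band_energy(right, i, band_width) + EPS))
            for i in range(0, len(left), band_width)]
    return mono, iids

def ps_decode(mono, iids, band_width):
    """Pan each band of the mono signal so its left/right level ratio is restored."""
    left, right = [], []
    for b, iid_db in enumerate(iids):
        ratio = 10 ** (iid_db / 20)            # amplitude ratio L/R for this band
        norm = math.sqrt(1 + ratio ** 2)
        # g_l / g_r == ratio, and g_l**2 + g_r**2 == 2
        g_l, g_r = math.sqrt(2) * ratio / norm, math.sqrt(2) / norm
        for m in mono[b * band_width:(b + 1) * band_width]:
            left.append(g_l * m)
            right.append(g_r * m)
    return left, right

left = [1.0, 1.0, 0.5, 0.5]     # band 0 is louder on the left (hypothetical values)
right = [0.5, 0.5, 1.0, 1.0]    # band 1 is louder on the right
mono, iids = ps_encode(left, right, band_width=2)
print([round(d, 2) for d in iids])   # [6.02, -6.02]
l_out, r_out = ps_decode(mono, iids, band_width=2)
```

Here two numbers of side information replace an entire second channel of four coefficients, which is where the bit-rate saving comes from.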

AAC and HE-AAC are found in many of today’s applications. In particular, AAC and HE-AAC are established as the main audio codecs (besides mp3) in a host of Internet applications.

HE-AACv2 is widely established in state-of-the-art TV broadcast systems. It’s part of the DVB toolbox and a mandatory codec in most countries that recently introduced the second generation of terrestrial TV, such as Spain, Great Britain, France, Ireland, Sweden, Austria, Italy, Denmark, Finland, and Norway. In Brazil and many other South American countries, HE-AAC is the only audio codec defined for terrestrial TV broadcast.

In addition, HE-AAC is an established part of the Smart TV environment. For example, it's the mandatory audio codec of the Hybrid Broadcast Broadband TV (HbbTV) standard in Europe. As a result, all HD-capable TV receivers, such as TV sets and set-top boxes sold in Europe and South America, support HE-AAC. All major broadcast encoder manufacturers included HE-AAC in their devices long ago, and HE-AACv2 supports all relevant broadcast metadata.

HE-AAC has become the dominant audio-streaming codec. All major streaming and media platforms support HE-AAC, including Flash, Silverlight, Windows Media Player, Winamp, and iTunes. The Mac OS X and Windows operating systems include HE-AAC, as do the iOS, Android, Windows Phone, Symbian, and BlackBerry mobile platforms. Today's established HTTP adaptive streaming systems, such as Apple HLS, Microsoft Smooth Streaming, and Adobe HTTP Dynamic Streaming, are also based on codecs of the AAC family.
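
In HTTP adaptive streaming, the AAC profile of each rendition is usually signaled by an RFC 6381 codec string of the form "mp4a.40.<Audio Object Type>": 2 for AAC-LC, 5 for SBR (HE-AAC), and 29 for PS (HE-AACv2). The sketch below builds an audio-only HLS master playlist with the #EXT-X-STREAM-INF tag; the playlist URIs and bit rates are hypothetical, and in practice BANDWIDTH should be the measured peak bit rate including container overhead.

```python
# RFC 6381 codec strings: "mp4a.40." followed by the MPEG-4 Audio Object Type
AUDIO_CODECS = {
    "AAC-LC": "mp4a.40.2",
    "HE-AAC": "mp4a.40.5",     # AOT 5 = SBR
    "HE-AACv2": "mp4a.40.29",  # AOT 29 = PS
}

def master_playlist(variants):
    """Build an audio-only HLS master playlist.

    variants: list of (profile, bandwidth_bps, uri) tuples.
    """
    lines = ["#EXTM3U"]
    for profile, bandwidth, uri in variants:
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},'
                     f'CODECS="{AUDIO_CODECS[profile]}"')
        lines.append(uri)
    return "\n".join(lines) + "\n"

playlist = master_playlist([
    ("HE-AACv2", 32_000, "audio_32k/index.m3u8"),    # hypothetical renditions
    ("HE-AAC", 64_000, "audio_64k/index.m3u8"),
    ("AAC-LC", 128_000, "audio_128k/index.m3u8"),
])
print(playlist)
```

A player can then pick the HE-AACv2 rendition on a weak mobile link and switch up to AAC-LC when more bandwidth is available.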

HE-AACv2 is also an important part of other streaming standards in the consumer electronics domain, playing an integral role in the specifications of the Open IPTV Forum, ATIS, HbbTV, and DLNA. Consequently, almost all digital TVs, Blu-ray players, set-top boxes, and gaming consoles support the codec. This widespread support makes HE-AACv2 the codec of choice for content providers, which is why many Web radio and online streaming services (e.g., Pandora, Aupeo, Hulu, and BBC iPlayer) are based on it.

MPEG Surround

MPEG Surround can be viewed as an extension of the Parametric Stereo principle from stereo to multichannel audio. In contrast to the PS tool, MPEG Surround is more scalable in terms of bit rate and quality. It can be combined with the codecs of the AAC family to achieve very high coding efficiency. Another advantage of MPEG Surround is its backward compatibility with stereo decoders.

The bit stream includes the AAC-encoded core stereo signal as one element and the MPEG Surround description as a second element. A stereo decoder extracts and decodes only the core stereo signal, while an MPEG Surround capable decoder recreates the full multichannel audio signal. As a result, MPEG Surround can serve a mixed receiver population of inexpensive or legacy stereo-only receivers and multichannel receivers without simulcasting separate stereo and multichannel signals.
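
The two-element structure and its backward compatibility can be illustrated with a crude, energy-only sketch. The encoder forms a stereo downmix with the common -3 dB weights for centre and surround channels (as in ITU-R BS.775) and sends per-channel energies as a stand-in for the spatial parameters. Real MPEG Surround instead transmits per-band channel level differences (CLD) and inter-channel coherence (ICC) and upmixes through tree-structured OTT/TTT modules, so everything below, including the function names and frame values, is an illustrative assumption.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB weight for centre and surround channels

def energy(x):
    return sum(s * s for s in x)

def mps_encode(ch):
    """ch: dict of 'L', 'R', 'C', 'Ls', 'Rs' sample lists (LFE omitted).

    Returns the core stereo downmix (first element) and crude spatial
    side information (second element).
    """
    lt = [l + G * c + G * ls for l, c, ls in zip(ch["L"], ch["C"], ch["Ls"])]
    rt = [r + G * c + G * rs for r, c, rs in zip(ch["R"], ch["C"], ch["Rs"])]
    side_info = {name: energy(x) for name, x in ch.items()}
    return (lt, rt), side_info

def decode(core, side_info=None, multichannel=False):
    lt, rt = core
    if not multichannel or side_info is None:
        return {"L": lt, "R": rt}      # legacy stereo receiver ignores the extension
    mid = [(a + b) / 2 for a, b in zip(lt, rt)]
    sources = {"L": lt, "Ls": lt, "R": rt, "Rs": rt, "C": mid}
    out = {}
    for name, src in sources.items():
        # Restore only each channel's level, not its original waveform
        e_src = energy(src)
        gain = math.sqrt(side_info[name] / e_src) if e_src > 0 else 0.0
        out[name] = [gain * s for s in src]
    return out

channels = {"L": [1.0, 0.0], "R": [0.0, 1.0], "C": [0.5, 0.5],
            "Ls": [0.2, 0.0], "Rs": [0.0, 0.2]}   # hypothetical two-sample frame
core, side_info = mps_encode(channels)
stereo_out = decode(core)
surround_out = decode(core, side_info, multichannel=True)
print(sorted(stereo_out), sorted(surround_out))
```

Both receivers consume the same bit stream; only the multichannel decoder uses the second element, which is why no simulcast is needed.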

Low-Delay Audio Codecs

Beyond broadcast, streaming, and music distribution, MPEG codecs are also used in communications applications. The AAC family's communications codecs are particularly popular in high-quality conferencing and video telephony systems because they enable service providers and operators to offer Full-HD Voice services. Full-HD Voice denotes the highest audio quality currently achievable in communications systems.

Whereas traditional narrowband telephony transmits an audio bandwidth of only up to 3.5 kHz, Full-HD Voice systems transmit the full audible spectrum of 14 kHz and more. As a result, Full-HD Voice calls sound as clear as talking to someone in the same room. The Full-HD Voice codecs of the AAC family are Low Delay AAC (AAC-LD), Enhanced Low Delay AAC (AAC-ELD), and Enhanced Low Delay AAC v2 (AAC-ELDv2).

AAC-LD represents the standard for high-quality video conferencing, offering full-bandwidth, low-delay audio coding. It features an algorithmic delay of only 20 ms, while offering a good compression ratio and high sound quality for all types of audio signals.
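
Both the bandwidth and the delay figures follow from simple arithmetic. The sketch below applies the Nyquist criterion to the bandwidths quoted above and uses a simplified delay model for an MDCT codec without look-ahead (one frame of buffering plus one frame of window overlap, i.e., 2N samples). The frame length of 480 samples at 48 kHz is an assumption that reproduces the 20-ms figure; AAC-LD also allows 512-sample frames.

```python
def min_sample_rate_hz(audio_bandwidth_hz: int) -> int:
    """Nyquist: the sampling rate must be at least twice the audio bandwidth."""
    return 2 * audio_bandwidth_hz

def mdct_algorithmic_delay_ms(frame_length: int, sample_rate_hz: int) -> float:
    """Simplified algorithmic delay of an MDCT codec without look-ahead:
    framing delay (N samples) plus window overlap (N samples)."""
    return 2 * frame_length * 1000 / sample_rate_hz

print(min_sample_rate_hz(3_500))              # 7000 (narrowband telephony samples at 8 kHz)
print(min_sample_rate_hz(14_000))             # 28000 (e.g., 32- or 48-kHz sampling)
print(mdct_algorithmic_delay_ms(480, 48_000)) # 20.0
print(round(mdct_algorithmic_delay_ms(512, 48_000), 2))  # 21.33
```

The model also shows the basic trade-off: shorter frames reduce delay but give the transform less frequency resolution to work with.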

AAC-ELD is an enhanced version of AAC-LD that combines the AAC-LD core, using a low-delay filter bank, with a low-delay version of Spectral Band Replication (SBR). AAC-ELD is well suited to any delay-critical application that demands full audio bandwidth at data rates as low as 24 kbits/s.

AAC-LD and AAC-ELD are already in use today for professional and consumer video-conferencing applications. One such example is Apple’s FaceTime application.

AAC-ELDv2 is the latest extension of the successful AAC-ELD audio codec. To achieve lower bit rates for stereo signals, AAC-ELDv2 combines the benefits of AAC-ELD with delay-optimized parametric multichannel coding. With this approach, only one mono channel plus compact parametric side information is transmitted instead of two discrete channels.

Extended HE-AAC

In early 2012, MPEG finalized the standardization of Extended HE-AAC (xHE-AAC). It significantly improves the audio quality of both music and speech, particularly at very low data rates such as 8 kbits/s, but also at higher bit rates such as 24, 32, and 64 kbits/s. It is compatible with HE-AAC streams.

The new codec unites the previously separate worlds of general audio coding and speech coding by combining the strengths of existing speech and music codecs. By adding a new set of coding tools to the HE-AACv2 codec, Extended HE-AAC outperforms both dedicated speech codecs and general audio-coding schemes, providing consistently high audio quality for all signal types.

Summary

From entertainment to communication, MPEG audio codecs populate virtually all state-of-the-art consumer electronics, IT, and communications devices. It began with the mp3 audio codec at the end of the 1990s, and development has continued unabated to this day.

Whereas mp3 and its successor AAC are widely known among consumers, other MPEG codecs such as HE-AAC or AAC-ELD operate under the hood. As a result, they're mostly known to experts, even though most of us use these codecs every day, whether watching Internet video or making calls with Apple FaceTime.