Fraunhofer IIS is based in Erlangen, Germany. It has been dedicated to innovation in compressed audio technology for more than 20 years, including the development of the MP3 compression algorithm. It is one of the world's authoritative sources for audio and multimedia products and is the co-developer of AAC. Its technology can be found in more than 1 billion devices worldwide.
Fraunhofer IIS is part of Fraunhofer-Gesellschaft, Europe's largest organization for applied research with 17,000 employees and a $2 billion annual research budget. The institute supports the development of open international standards and data services in ISO MPEG, ITU-T, ITU-R, 3GPP, IETF, DVB, Eureka DAB, ISDB etc. Fraunhofer IIS has contributed to the design and standardization of many digital radio systems, including Digital Audio Broadcasting (DAB) and Digital Radio Mondiale (DRM) as well as U.S. and European satellite radio systems, such as Worldspace Satellite Radio, SiriusXM Satellite Radio, ESDR and DVB-SH. The organization offers contract-based research and development services, and provides its customers with guidance through the maze of multimedia standards and technologies.
I recently spoke with Stefanie Frank, Marketing Manager at the Audio and Multimedia Division of Fraunhofer IIS, and Manfred Lutzky, Group Manager Audio for Communication at Fraunhofer IIS.
Wong: What are the trends/issues Fraunhofer is seeing in HD videoconferencing?
Frank: We see major opportunities for new interactivity with video communication at home, allowing friends and families to take part in social gaming and conversation in real time, no matter the distance. And with the switch to IP-based networks, the living room is becoming a place not only for entertainment but also for communication. This development is driven by two ongoing trends. The first comes from the business world, where high-quality videoconferencing and telepresence systems are already available today (for example Cisco's TelePresence systems). The second is the rapid growth in internet connectivity of TVs and set-top boxes. Those devices are usually connected to the living room's multi-channel hi-fi equipment, which allows for high-quality audio. We expect that both worlds will meet so that high-quality videoconferencing will also be possible at home. Friends and family can virtually meet at the living room table however far away they might be. Communication will be as natural and as clear as talking to someone in the same room.
We also expect that home video communication will go mobile to allow for high-quality communication on mobile phones and smartphones. Apple's FaceTime is pioneering this trend.
Commercial web-based conferencing services for business use, which use the PC-inherent loudspeakers with full audio bandwidth, are becoming increasingly popular. We are seeing a need for better quality communication for those services, addressing higher audio quality, the opportunity to share documents and presence information during the conference call as well as technologies that ensure high-quality conversations even under adverse IP-network conditions.
Wong: How is Fraunhofer IIS addressing these issues?
Lutzky: Fraunhofer has developed, and continues to develop, technologies for next generation communication systems, including technologies that provide users with a "same room" communication experience.
Fraunhofer's audio codecs for communication, Low Delay AAC (AAC-LD) and its successor, Enhanced Low Delay AAC (AAC-ELD), are MPEG audio codecs for maximum speech and audio quality at very low coding delay and very low bit-rates. Both codecs support full audio bandwidth, which allows high audio quality for all audio sources, including music, speech and background sounds. Designed for high-quality IP-communication applications and devices, AAC-LD is derived from MPEG AAC-LC, while AAC-ELD is a combination of MPEG AAC-LD and Spectral Band Replication (SBR).
In addition, Fraunhofer has built an Audio Communication Engine, which combines several components that vastly improve the sound quality and clarity of videoconferences. The Audio Communication Engine is the label for a whole set of technologies, including MPEG audio codecs for communication in CD quality, Echo Control for hands-free conversations and a low-delay IP-streaming system that ensures high quality even under bad network conditions - all components that together create one system for the best possible clarity. The Audio Communication Engine allows for multi-channel audio to support a spatial impression while talking. Spatial impression is important for addressing the so-called "cocktail party effect", which describes the situation of talking to one person in a room full of people. Here it is important to be able to locate the audio source (the person) in order to better understand what he or she is saying.
Wong: Can you give us an example?
Lutzky: Fraunhofer is involved with the European Union Project "Together Anywhere, Together Anytime" (TA2), which aims to make communication easier among friends separated in space and time through video communication at home. TA2 includes different concepts such as family games, video sharing, storytelling and reading together. To make this a reality, several technology components are needed, and one of these components is Fraunhofer's HD communication technology, which includes a low-delay video solution and the Audio Communication Engine.
Wong: What customers are using Fraunhofer's communication technology?
Frank: Fraunhofer's communication technology is used in many different products and market segments.
The audio codec AAC-(E)LD is used for telephone and VoIP applications from companies like Yamaha Corporation (for example PJP-50/-25/-EC200) and Apple Inc. (iChat). In addition, many manufacturers of professional videoconferencing and telepresence systems rely on AAC-(E)LD. These include companies like Cisco Systems, Lifesize Communications, Tandberg, Sony and many more. Apart from professional systems used in business, AAC-LD is implemented in products for the end-consumer market, for example in Umi from Cisco, the videoconferencing system for the living room. In addition, AAC-ELD is used in the videoconferencing application FaceTime from Apple. Cisco's business-targeted tablet "Cius" also uses AAC-LD in its communications apps.
AAC-(E)LD is also used in broadcasting studio equipment. The most important equipment manufacturers use AAC-(E)LD for professional audio transmission, including Telos Systems, Mayah Communications GmbH, Digigram SA, Comrex and many more.
In addition, Fraunhofer provides its Acoustic Echo Control and MPEG AAC-LC audio codec for Teliris' telepresence 6G solutions.
Wong: How did Fraunhofer develop the idea for the Audio Communication Engine?
Lutzky: The Audio Communication Engine consists of three components: the audio codec, an echo control technology and a low-delay IP-streaming system.
With the development of MP3 and AAC, Fraunhofer has enabled high-quality music storage and audio streaming applications. At the same time, the audio quality of telephone conversations has not improved over the last century. Fraunhofer recognized that the audio quality used for music can also be achieved for telephone calls at the same data rate currently used for telephone calls (for example 64 kbit/s). This is why the institute developed low-delay versions of the music codec AAC-LC for communication applications: Low Delay AAC (AAC-LD) and Enhanced Low Delay AAC (AAC-ELD). Both achieve the same audio quality as AAC-LC at slightly higher data rates.
In addition, we recognized that current echo cancellation approaches were not suitable for PC-based applications where high audio bandwidth is used. That is why we developed an Echo Control technology for hands-free conversation, which is also able to deal with changing acoustic background conditions.
The Audio Communication Engine is designed for communication applications over IP networks, which are best-effort networks and thus unreliable. To address this issue we developed a low-delay IP-streaming system that allows for reliable, high-quality, low-delay communication even under adverse network conditions (packet loss and packet jitter).
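As a rough illustration of what handling packet loss and jitter involves on the receiving side, here is a minimal sketch of a jitter buffer in Python. It is not Fraunhofer's streaming system; the class names, the `depth` parameter and the use of sequence numbers are assumptions made for this example. The buffer holds a few packets before playout starts, releases frames in sequence order, and returns `None` for a frame that has not arrived in time so that the decoder can conceal it.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Packet:
    seq: int                               # sequence number (hypothetical, RTP-like)
    payload: bytes = field(compare=False)  # one encoded audio frame


class JitterBuffer:
    """Reorders packets and releases one frame per playout tick."""

    def __init__(self, depth: int = 3):
        self.depth = depth      # packets to collect before playout starts (assumed value)
        self.heap = []          # min-heap of Packet, ordered by seq
        self.next_seq = None    # sequence number due at the next tick
        self.started = False

    def push(self, pkt: Packet) -> None:
        # A packet whose playout slot has already passed is useless: drop it.
        if self.next_seq is not None and pkt.seq < self.next_seq:
            return
        heapq.heappush(self.heap, pkt)

    def pop(self) -> Optional[bytes]:
        """Return the frame due now, or None while buffering or if it is missing."""
        if not self.started:
            if len(self.heap) < self.depth:
                return None
            self.started = True
            self.next_seq = self.heap[0].seq
        # Discard duplicates of frames that were already played.
        while self.heap and self.heap[0].seq < self.next_seq:
            heapq.heappop(self.heap)
        frame = None
        if self.heap and self.heap[0].seq == self.next_seq:
            frame = heapq.heappop(self.heap).payload
        self.next_seq += 1
        return frame


# Packets 1, 2 and 4 arrive out of order; packet 3 never arrives.
buf = JitterBuffer(depth=2)
for seq in (2, 1, 4):
    buf.push(Packet(seq, f"frame{seq}".encode()))
played = [buf.pop() for _ in range(4)]
# played == [b"frame1", b"frame2", None, b"frame4"]
```

A larger `depth` tolerates more jitter but adds delay, which is the trade-off a low-delay system has to balance.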
Wong: How does the Audio Communication Engine compare to other alternatives on the market today?
Frank: Deutsche Telekom Labs conducted an independent listening test on behalf of the 3rd Generation Partnership Project (3GPP), which unites telecommunications standards bodies, comparing super-wideband voice and audio codecs. The purpose of this test was to evaluate low-delay audio and voice codecs that are suitable for high-quality (super-wideband) communication applications. The codecs in the test were AAC-ELD, AAC-LD, CELT, G.718, G.719, G.722.1 Annex C, Silk, Speex, G.722.2 (AMR-WB) and G.722. In the results, AAC-ELD and AAC-LD were at the forefront of audio quality. AAC-ELD was the only audio codec that achieved excellent audio quality at a bit-rate as low as 32 kbit/s. Details of the test report and its results can be found at: ftp://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_59/Docs/S4-100479.zip.
Wong: How does Fraunhofer address issues relating to efficiency, reliability and quality for videoconferencing technology?
Lutzky: We understand reliability as a system-inherent component, which we address through the robustness of the coding tools involved in the system. The Audio Communication Engine is robust against adverse network conditions, such as jitter, packet loss and limited bandwidth. This is achieved through the excellent packet loss concealment capability of the audio codec AAC-ELD, which allows for intelligibility at packet loss rates of up to 30 percent.
To reduce packet loss under adverse network conditions it helps to reduce the bit-rate of the audio codec. As shown in the Deutsche Telekom listening test, AAC-ELD achieves excellent audio quality at 32 kbit/s.
In addition, the audio codec supports features that allow for fast adaptation to network jitter, which is exploited by the low-delay IP-streaming system.
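To put the bit-rate point into numbers, the following sketch estimates the payload size per packet and the resulting rate on the wire for a given codec bit-rate. The 20 ms packet interval is an assumption chosen for illustration, not a figure from Fraunhofer, and the function names are made up for this example. The header sizes are the standard minimum IPv4, UDP and RTP headers; link-layer overhead is ignored.

```python
IPV4_HEADER = 20  # bytes, minimum IPv4 header
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed RTP header without extensions


def payload_bytes(codec_kbps: float, packet_ms: float) -> float:
    """Encoded audio bytes carried by one packet."""
    return codec_kbps * packet_ms / 8  # kbit/s * ms = bits, / 8 = bytes


def wire_kbps(codec_kbps: float, packet_ms: float) -> float:
    """Codec bit-rate plus IP/UDP/RTP header overhead, in kbit/s."""
    packets_per_second = 1000 / packet_ms
    header_bits = (IPV4_HEADER + UDP_HEADER + RTP_HEADER) * 8
    return codec_kbps + header_bits * packets_per_second / 1000


# Assumed 20 ms of audio per packet (hypothetical value).
print(payload_bytes(32, 20), wire_kbps(32, 20))  # 80.0 48.0
print(payload_bytes(64, 20), wire_kbps(64, 20))  # 160.0 80.0
```

Under these assumptions, dropping from 64 kbit/s to 32 kbit/s halves the audio payload and cuts the total rate on the wire from 80 to 48 kbit/s, which leaves more headroom on a congested link.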
To achieve efficient communication experience via videoconferencing, the following parameters are important:
- Low delay: With the Audio Communication Engine we achieve an end-to-end delay of 80 ms over a standard LAN, which suits human conversation well and is significantly below that of state-of-the-art conferencing products (for example, 150 ms and more).
- Spatial Audio: The Audio Communication Engine supports three audio channels, which is common in telepresence applications. It is also capable of more than three audio channels and is therefore future-proof.
Wong: What are some of the challenges Fraunhofer has found in developing technologies for videoconferencing?
Lutzky: One of the challenges for conferencing is the interoperability of video- and teleconferencing systems from different manufacturers. Often there is no easy way to connect different systems. Challenges include the choice of the audio codec to be used during the session and the spatial rendering of the audio channels. To solve this it is important to rely on international standards for communication equipment. Two standardization bodies are dealing with this task at the moment: ITU-T Study Group 16 Question 5 and the IMTC Telepresence Activity Group. This is only one example of the challenges in developing technologies for videoconferencing.
Wong: How do Fraunhofer's videoconferencing technologies compare to other current solutions?
Frank: Current conferencing systems are either very expensive (>10,000 Euro) or are not low delay (>150 ms). Fraunhofer's videoconferencing technology components, however, are based on PC hardware and software, which is widely available at a very competitive price. In addition, the Fraunhofer technologies achieve a delay below 100 ms at low data rates.