The recent emergence of digital-assistant products has consumers asking Alexa or Google to perform tasks and provide information in every room of their home. The first products introduced in late 2014, and still dominating the nascent market today, were smart speakers (also called artificial-intelligence [AI] speakers). Smart speakers combine a wireless speaker system with an AI platform; initially, their primary function was to stream music from the cloud. In the past two years, smart speakers have added displays; cameras; streaming video; and home-automation control of lights, climate and security systems.
The emergence of this smart-home ecosystem (Fig. 1) has resulted in a significant redundancy of smart devices throughout the home. However, recently added functionality such as displays and home-automation control has led to form factors that begin to suit specific rooms of the house, although they are not yet optimized for the needs of each individual room. The market is approaching a tipping point, where smart speakers need to become targeted to specific rooms and coexist with other room-optimized speakers.
1. The smart-home hub ecosystem.
All smart speakers contain the same fundamental components (Fig. 2):
• Input: Smart speakers use microelectromechanical-system (MEMS) microphones for voice capture, while digital signal processors (DSPs) run algorithms such as acoustic beamforming, noise reduction, and acoustic echo cancellation. Initial solutions used digital MEMS microphones to output a digitally converted bitstream to a DSP, but did so at the expense of accuracy and dynamic range. Higher-performance solutions employ analog MEMS microphones with separate, highly integrated audio analog-to-digital converters (ADCs). These ADCs can greatly increase dynamic range and reduce the number of microphones required.
• Output: The same DSPs also process the digital decoding of the audio stream, performing equalization and outputting the audio to speaker amplifiers. Newer speakers employ digital amplifiers to integrate DSP functionality such as equalization and tuning for specific electrical parameters. In addition, these mini DSPs perform various levels of speaker protection and maintain audio quality under adverse conditions such as over-temperature and loss of voltage to the power stage. Since the digital audio content is processed on-chip, these amplifiers can reduce power consumption by varying modulation schemes and controlling the power stage (based on the audio content) on its way to the output stage.
• Connectivity: The primary form of connectivity required for all smart speakers is Wi-Fi. Although the 802.11ac bandwidth isn’t required for audio streaming, it has emerged as the de facto standard, since many of the same system-on-chip (SoC) vendors provide video-streaming SoCs. A Wi-Fi radio integrated circuit (IC) also often integrates Bluetooth.
2. TI’s system block diagram of a typical smart speaker with display shows both the fundamental components of smart speakers as well as some more recent additions.
In terms of Bluetooth, both basic rate (Bluetooth Classic) for audio streaming from devices such as smartphones, and Bluetooth Low Energy (BLE) for control and communication between paired devices, are used. As Bluetooth 5.0 emerges, expect new compression profiles that allow for audio streaming, eliminating the need for Bluetooth Classic. For legacy compatibility, however, both may be available for some time. Bluetooth 5.0 also enables a mesh network, opening up a future where the audio content can move from device to device, with only one device serving as the audio hub.
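The acoustic beamforming mentioned in the Input description above can be illustrated with a minimal delay-and-sum sketch. It is not the algorithm of any particular DSP or vendor; it assumes a linear microphone array, a far-field talker, and whole-sample delays, and the function names, spacing, and sample rate are hypothetical choices made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def arrival_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Whole-sample arrival delays for a plane wave hitting a linear array.

    angle_deg is measured from broadside (0 = straight ahead); positive
    angles point toward +x, so microphones with larger x hear the sound
    earlier. The earliest microphone gets delay 0.
    """
    sin_a = math.sin(math.radians(angle_deg))
    raw = [-x * sin_a / SPEED_OF_SOUND * sample_rate_hz for x in mic_positions_m]
    earliest = min(raw)
    return [round(d - earliest) for d in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels.

    Sound from the steered direction adds coherently; sound from other
    directions is misaligned and partially cancels.
    """
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


# Simulate a click arriving from 30 degrees at three mics spaced 5 cm apart.
mics = [0.0, 0.05, 0.10]
delays = arrival_delays(mics, 30.0, 16000)
source = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
channels = [
    [source[i - d] if i - d >= 0 else 0.0 for i in range(len(source))]
    for d in delays
]
steered = delay_and_sum(channels, delays)       # aimed at the talker
unsteered = delay_and_sum(channels, [0, 0, 0])  # aimed straight ahead
```

In this toy case the steered output recovers the click at full amplitude, while the unsteered output smears it across three samples at one-third amplitude. Production beamformers typically add fractional delays and adaptive weighting on top of this idea.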
Adding Some Spice
Designers are adding functions that enhance the user experience to differentiate their smart speakers. Replacing buttons with capacitive touch panels, for example, enables more intuitive control, reduces cost, and improves reliability. In some cases, haptic feedback maintains the tactile feeling with which consumers are familiar. Since a smart speaker enters many modes, including responding to commands, colorful LED lighting patterns provide visual feedback and add a bit of flair. Adding ambient light sensors to adjust the output brightness, from brighter in sunlight to dimmer in darkness, further enhances the consumer experience.
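As a rough illustration of the ambient-light adjustment described above, the following sketch maps a light-sensor reading to an LED brightness level. The function name, the lux thresholds, and the brightness limits are all assumptions chosen for the example rather than values from any datasheet; the mapping is logarithmic because perceived brightness follows light level roughly logarithmically.

```python
import math


def led_brightness(lux, min_pct=5.0, max_pct=100.0,
                   dark_lux=1.0, bright_lux=10000.0):
    """Map an ambient-light reading in lux to an LED brightness percentage.

    Readings are clamped to [dark_lux, bright_lux] and interpolated on a
    log scale, so a dark bedroom gets min_pct and direct daylight gets
    max_pct. All default values are illustrative assumptions.
    """
    clamped = min(max(lux, dark_lux), bright_lux)
    frac = math.log10(clamped / dark_lux) / math.log10(bright_lux / dark_lux)
    return min_pct + frac * (max_pct - min_pct)


night = led_brightness(0.2)      # dark room: held at the minimum level
indoor = led_brightness(100.0)   # typical indoor lighting: mid-range
sunlit = led_brightness(50000)   # direct sunlight: held at the maximum level
```

A real design would likely also filter the sensor reading over time and add hysteresis so the display does not flicker when the light level hovers near a boundary.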
Let’s review some smart speaker features that are room-specific, based on these rooms’ inherent functions and what consumers want most.
As mentioned earlier, additional functionality has led to smart speakers with form factors suitable for specific rooms of a home. Small speakers with LED displays can look like alarm clocks (Fig. 3), for example, which is great for bedrooms. Display size and resolution aren’t critical, as the display is simply a clock face. But adjusting LED brightness for daytime and nighttime use is important, and requires the addition of ambient light sensors.
3. A typical alarm-clock-styled smart speaker for a bedroom adds a new dimension to an old application.
Speaker size and amplifier power can scale down to match the smaller room. Since these rooms are often quiet, and the distance between the speaker and the consumer is probably between one and four meters, designers can reduce the number of microphones and the complexity of voice-recognition algorithms. The addition of a camera for video calls is probably not a good idea in rooms where consumers prefer a bit of privacy.
Since these speakers are usually positioned on a nightstand or desk, ac power is readily available, so battery power isn’t an absolute requirement. However, given the location of these speakers, it makes sense to add battery-charging capability to charge smartphones and smartwatches, either through USB or wirelessly. Finally, consumers should have the ability to connect to and control lights, thermostats, and security systems from this location.
Kitchen and Office
Larger speakers with tablet-sized LED displays or short-throw projection displays (Fig. 4) seem to be nicely suited for kitchens or offices. Display resolution is important here, but space is also a priority in mechanical design. Placed against a wall or under a cabinet, speakers with short-throw projection displays using TI DLP technology enable smaller enclosures and larger images when projected against a wall or countertop.
4. A smart speaker with projection display for kitchens or offices adds video to help create a more immersive user experience.
However, designers should consider the potential placement of these speakers next to sources of high ambient light like windows. Placing these speakers on kitchen islands or office desks will require a backlit LED display.
From these locations, consumers will want high-resolution video streaming to watch recipe videos or TV shows, check the news, or see who is at the front door. These smart speakers would require a camera for video chats, as well as the ability to control lights and thermostats.
Although kitchens or offices may not be larger than bedrooms, the potential for high traffic/activity, or the desire for higher-fidelity sound, will require louder speakers and more powerful audio amplifiers. Designers will likely face a tradeoff: the short distance from voice to microphone favors fewer microphones and simpler voice-recognition algorithms, while the higher ambient noise favors more of both.
The power consumption of video streaming and display or projection makes battery operation impractical, but the larger body size of the display should provide some relief with heat dissipation.
The living room has been the center of multimedia news and entertainment for almost a century. Today, living rooms include myriad devices that interact with us and with each other, either wired or wirelessly.
Consider the cable or satellite set-top box (or emerging Internet Protocol television and over-the-top content boxes), the TV, the soundbar, and even the remote controls. All of these devices interact in some fashion, and now all include wireless connectivity. Each device has a specific role within a living room and is content to coexist, making little attempt to replace another device.
Smart speakers in living rooms and the addition of AI to entertainment devices changed everything. Designers of traditional living-room devices had to race to add AI functionality and (at a slower pace) home automation. This sets up the unavoidable scenario of devices “stepping” on each other and creating confusion and potentially frustration for consumers, especially if products don’t share the same AI platform.
This battle for an AI hub is a boon for manufacturers of audio and video SoCs, Wi-Fi and Bluetooth connectivity ICs, audio ICs, MEMS microphones, and speakers. However, for consumers, it’s hard to imagine how so many AI devices can coexist, knowing which device is supposed to be processing which command and which device will provide the audio feedback.
In the long run, having multiple devices with microphones placed around a large room isn’t such a bad thing. If done right, this scenario could greatly improve overall beamforming and voice-recognition accuracy, as long as the devices are talking to each other. The devices need to work together to assign (and allow) a single device to communicate with the cloud and then arbitrate which device will output the audio result.
This setup isn’t as much a hardware challenge as it is a significant connectivity and AI platform challenge. Such scenarios will require a mesh network, with each device registered in the network and protocols added to the AI platform to arbitrate which device will output the audio response and which device will perform the desired function. With Bluetooth, Wi-Fi, and Zigbee protocols all in play, you can begin to see the size of the challenge. Adding 5G could add to the confusion, though it may also provide the answer. That remains to be seen.
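To make the arbitration problem concrete, here is a minimal sketch of how registered devices might agree on roles for a single spoken command. It does not describe any existing AI platform or mesh protocol: the `Report` fields, the signal-to-noise criterion, and the speaker ranking are hypothetical, and a real system would also need discovery, timeouts, and a transport over which to exchange these reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """What each device shares after hearing the wake word (hypothetical)."""
    device_id: str
    wake_snr_db: float  # assumed: signal-to-noise ratio of the captured voice
    has_display: bool
    speaker_rank: int   # assumed: higher means better audio output


def arbitrate(reports, needs_display=False):
    """Return (listener_id, responder_id) for one utterance.

    The listener is the device that heard the talker best and forwards
    the command to the cloud; the responder is the device that plays or
    shows the result. Ties are broken by device_id so that every device
    running this function on the same reports reaches the same answer.
    """
    if not reports:
        raise ValueError("no device heard the wake word")
    listener = max(reports, key=lambda r: (r.wake_snr_db, r.device_id))
    candidates = [r for r in reports if r.has_display] if needs_display else list(reports)
    if not candidates:
        candidates = list(reports)  # no display anywhere: fall back to audio
    responder = max(candidates, key=lambda r: (r.speaker_rank, r.device_id))
    return listener.device_id, responder.device_id


reports = [
    Report("soundbar", 12.0, False, 3),
    Report("kitchen", 18.5, True, 2),
    Report("bedroom", 6.0, False, 1),
]
audio_roles = arbitrate(reports)                      # spoken answer
video_roles = arbitrate(reports, needs_display=True)  # answer needs a screen
```

With these sample reports, the kitchen device hears the command best in both cases, while the response goes to the soundbar for audio and to the kitchen display when a screen is needed. Separating the listening role from the responding role mirrors the two arbitration decisions described above.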
Who Will be King?
TV manufacturers will argue that the TV is the obvious choice for a primary home hub, but televisions fall short of the audio quality and number of speakers that 3D sound technology consumers will soon expect. The same audio challenge exists for set-top boxes. Standalone smart speakers don’t connect to TVs, so they’re not an option. The lone remaining device in the living room from which all media can emanate is the soundbar.
Today, soundbars include a Wi-Fi set-top box (an IPTV). These soundbars are available for purchase through retail outlets that employ over-the-top (OTT) streaming on-demand video for anyone who purchases a subscription. Other soundbars are available from the multiple-system operators that deliver live television broadcasts.
Regardless of type, soundbars have the capability to integrate AI voice-recognition systems, high-fidelity 3D audio, and streaming TV. Soon, they should be able to incorporate home-automation radios and protocols such as Zigbee to dim lights, and a camera to stream live feeds of security systems or video chats.
Although this utopic smart-home hub ecosystem doesn’t exist exactly in this form today, it will arrive, and quickly. It’s up to designers of these systems to keep the consumer experience front and center, to develop hardware and software that optimizes each device for the room in which it’s placed, and to play fair with other devices in a seamless room-to-room meshed network. Systems with this functionality will gain a foothold and help the market grow at its current pace (from 1 million units shipped in 2015 to over 165 million in 2024, a CAGR of more than 11%) [1]. Those that don’t will end up in yard sales next to the 3D glasses.
Mike Gilbert has over 35 years of experience in the semiconductor industry, mostly in applications, product definition, and marketing. For the past seven years, he has worked in the Systems Engineering and Marketing organization at Texas Instruments (TI), which looks at integrated-circuit solutions at the system level across functions from power management, signal chain, interface, and wireless connectivity to embedded processing. Gilbert also has system expertise in industrial motor drives, medical systems, and, most recently, personal electronics.
1. 2019 SAR Insight & Consulting, SensiAn Research Limited
Lo, Wenchau Albert, and Gilbert, Mike. “Smart speaker fundamentals: Weighing the many design trade-offs.” TI.com white paper (PDF).