Smart speakers are the hot item these days, but they are really smart listening devices. The voice-activated computers of sci-fi movies, like Jarvis in the Iron Man films, only work if listening devices and speakers are available everywhere. What we’re missing is a host of Apple HomePods, Amazon Echos, Harman Kardon Invoke speakers with Microsoft’s Cortana, and Google Home smart speakers (Fig. 1) populating Tony Stark’s entire house. Of course, a trillionaire would have all of this hidden when the home was built.
Apple’s HomePod joins Google Home to take on the incumbent, Amazon Echo. These are all relatively simple, albeit well-engineered, devices: a wireless SoC that drives a speaker or two and is linked to an array of microphones… “the better to hear you with, my dear.”
1. Apple’s HomePod (a), Amazon’s Echo (b), Harman Kardon’s Invoke with Microsoft’s Cortana (c), and the Google Home (d) smart speakers are building audio walled gardens.
These devices can simply act as wireless speakers streaming audio from sources like your smartphone or PC. They tend to have few physical controls, so they rely on a smartphone app or verbal commands to control them.
Most verbal commands require an internet connection, since these are Internet of Things (IoT) devices (Fig. 2) designed to do more than just play music. Their functionality extends to being a control center for the home as well as a platform for ordering products and services (see “Speaking of Orders: Who’s Winning?”). They can also do useful things like look up the day’s weather on the internet or add events to your calendar.
Much of this magic occurs in the cloud, which means that if the internet connection is down or intermittent, the capabilities of these platforms will be diminished. This makes them less suitable for environments with limited connectivity (see “Will Trashing Net Neutrality Kill Your Customer Base?”). High-latency environments can make interaction challenging as well.
Still, the platforms are more than low-end micros that just stream audio in either direction. They’re typically packed with robust multi-microphone hardware and software designed to improve interactive voice response (IVR) support, as well as to distinguish between multiple people speaking and determine where each one is relative to the device.
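To make the idea of locating a talker concrete, here is a minimal sketch of the textbook two-microphone approach: estimate the time difference of arrival (TDOA) between the microphones by cross-correlation, then convert that delay into an angle. The function names, the 16-kHz sample rate, and the 10-cm spacing are illustrative assumptions, not details of any product named here; commercial devices typically use larger arrays and more robust methods such as GCC-PHAT and beamforming.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air


def estimate_lag(mic_a, mic_b, max_lag):
    """Return the lag (in samples) by which mic_b trails mic_a,
    chosen to maximize the cross-correlation within +/- max_lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                score += sample * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(mic_a, mic_b, sample_rate, spacing_m):
    """Angle in degrees from broadside, assuming a distant (plane-wave)
    source. Positive means the sound reached mic_a first."""
    # The physical delay can never exceed spacing / speed of sound.
    max_lag = math.ceil(spacing_m / SPEED_OF_SOUND * sample_rate)
    lag = estimate_lag(mic_a, mic_b, max_lag)
    tau = lag / sample_rate  # delay in seconds
    # Clamp to arcsin's domain to absorb quantization error.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / spacing_m))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    pulse = [1.0, 0.5, -0.3, 0.8]
    mic_a = [0.0] * 20 + pulse + [0.0] * 20
    mic_b = [0.0] * 23 + pulse + [0.0] * 17  # same pulse, 3 samples later
    print(direction_of_arrival(mic_a, mic_b, 16000, 0.10))
```

Under these assumptions, a pulse that reaches the second microphone three samples after the first works out to roughly 40 degrees off broadside. Cross-correlation is used here only because it is the simplest correct way to expose the idea.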
One reason for doing much of the heavy lifting in the cloud is the use of tools like artificial intelligence (AI), machine learning, and deep neural networks (DNNs). These techniques benefit from more computing horsepower and even specialized AI and DNN hardware (see “CPUs, GPUs, and Now AI Chips”).
Developers can target these platforms from a variety of standpoints. It’s possible to build your own version of these platforms (see “Building Your Own Alexa Echo”). This allows IVR support to be built into devices ranging from refrigerators to televisions. There are two reasons to go this route. First, it provides a way to control the device itself. Second, it allows the device to replace or supplement a smart speaker, since a single smart speaker can typically cover only one room. Amazon even gives a quantity discount when buying multiple Echo Dots (see “Echoes of ‘Little Green Men Attack!’”).
2. The smart speaker is just part of an IoT environment that can also include purchasing and delivery of products and services.
Part of the challenge is providing audio processing good enough for the device to work reliably, which is why audio processing has become such a hot embedded topic. Another challenge is coordinating with other devices in the environment. Initially, a single device will be found in a home or office, but multiple devices with overlapping coverage will be the norm in the future. No one wants two identical orders placed just because the request was heard by more than one device. Likewise, roaming akin to Wi-Fi roaming, but on an audio scale, may become possible in the future.
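One way to picture the duplicate-order problem is as arbitration: every device that hears a request reports it, and a coordinator keeps a single copy. The sketch below is hypothetical; the `Heard` record, the `arbitrate` function, the 0.5-s window, and the assumption that devices share a clock and produce normalized transcriptions are all illustrative, since the vendors’ actual arbitration schemes aren’t public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heard:
    device_id: str
    utterance: str    # normalized transcription of the request (assumed)
    timestamp: float  # seconds on a clock shared by all devices (assumed)
    level_db: float   # how loudly this device heard the speaker


def arbitrate(reports, window_s=0.5):
    """Collapse reports of the same utterance that start within window_s
    of the first report into one, keeping the loudest device's copy."""
    groups = []  # each entry: [first_timestamp, best_report]
    for report in sorted(reports, key=lambda r: r.timestamp):
        for group in groups:
            first_ts, best = group
            if (best.utterance == report.utterance
                    and report.timestamp - first_ts <= window_s):
                # Same request heard by another device: keep the louder one.
                if report.level_db > best.level_db:
                    group[1] = report
                break
        else:
            # No matching recent request, so this is a new one.
            groups.append([report.timestamp, report])
    return [best for _, best in groups]


if __name__ == "__main__":
    reports = [
        Heard("kitchen", "order paper towels", 10.00, -20.0),
        Heard("living_room", "order paper towels", 10.05, -35.0),
        Heard("living_room", "what's the weather", 10.10, -25.0),
        Heard("kitchen", "order paper towels", 30.00, -22.0),
    ]
    for r in arbitrate(reports):
        print(r.device_id, r.utterance, r.timestamp)
```

Anchoring each group’s window to its first report keeps a chain of slightly delayed echoes from stretching the window indefinitely, while a genuinely repeated order placed well after the window still goes through.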
Software developers can create applications that work with Siri, Alexa, Google Assistant, and Cortana. Such support can bring voice-enabled services to new or existing hardware, which may connect to the smart speaker over the LAN or through the internet.
The major question for developers is: “Which platform, and how many platforms, do I support?” That’s because these platforms are essentially exclusive walled gardens, which makes it challenging for hardware developers to support even one of them, let alone several.
Will there be a fifth platform in the future? That remains to be seen, but it will be a tough row to hoe with the four heavyweights already in the mix. Quite a few services are needed to compete in this area, not just voice recognition and a cloud service.