The fastest way to get started with hardware development for Amazon’s Alexa Voice Service (AVS) is with Cirrus Logic’s Alexa Voice Capture Development Kit for Amazon AVS (Fig. 1). The kit comes complete with a Raspberry Pi 3 and an expansion board with a pair of CS7250B digital MEMS microphones and CS47L24 smart codec from Cirrus Logic.
Including the Raspberry Pi 3 is key, since the software can be targeted and delivered in a fashion that eliminates configuration errors that often haunt solutions providing only part of the hardware. In actuality, the Raspberry Pi 3 is critical to the solution, but it is significantly underutilized. This has the advantage of allowing significant enhancement to the system…but more on that later.
The kit comes in its own carrying case. Plugging the Raspberry Pi 3 into the expansion board and connecting the speaker took minutes. The Ethernet cable and USB power supply were next. It takes longer to read the getting-started documentation and to sign up for the Amazon account than anything else. The end result is access to the kit’s web interface using a standard browser (Fig. 2).
There are a couple of strings to copy from the web-based Amazon account interface after creating a new logical device. These are copied to the AVS configuration page on the device’s web interface. A quick login through the web interface to Amazon completed the linkage, and I now had an operational Alexa device. I started by streaming WHYY, a local public radio station. The whole process took less than 15 minutes.
Well, actually it took a little longer, since I deviated slightly from the instructions: The Raspberry Pi was plugged into my test network. I had to change the device URL in the Amazon AVS web configuration from the default given in the instructions to the name that my DHCP/DNS configuration assigned to the device. Other than that, the only extra step was configuring the Wi-Fi support, and that is also done from the device’s web interface. A reboot was all that was necessary to have a wireless Alexa device.
The device’s web interface allows tweaking of the codec parameters, although I found the defaults to be sufficient. The target of this reference design is low-cost solutions, since it only has a pair of microphones. This allowed operation in a quiet room from a distance, but it required me to be within a couple of feet of the device if I had loud music playing from a different source.
On the other hand, if the device itself is streaming the audio, as with the radio station, it can cancel this out more effectively, thereby allowing operation over a longer distance. This is called “barge-in” support, and it allows this solution to track voice commands without silencing its streaming audio. It is possible to say things like “Alexa, louder, louder” to increase volume. That is a bit more difficult if the audio is toned down when the Alexa keyword is spoken.
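To make the cancellation idea concrete, here is a minimal Python sketch of one textbook approach: a normalized least-mean-squares (NLMS) adaptive filter that uses the known playback signal as a reference and subtracts its estimated echo from the microphone signal. This is an illustration only, not a description of how the CS47L24 actually implements it. The function names, the simulated room response, the tap count, and the step size are all assumptions made for the example.

```python
import math
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the playback echo from the mic signal."""
    weights = [0.0] * taps   # current estimate of the speaker-to-mic path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for ref_sample, mic_sample in zip(reference, mic):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate   # what is left after cancellation
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)


# Simulated playback: noise standing in for streamed audio (assumption).
rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(4000)]

# Hypothetical room response: a delayed direct path plus one weaker reflection.
room = [0.0, 0.0, 0.6, 0.0, 0.25]
echo = []
for n in range(len(playback)):
    total = 0.0
    for k, gain in enumerate(room):
        if n - k >= 0:
            total += gain * playback[n - k]
    echo.append(total)

# A faint "voice" that the device should still be able to pick out.
voice = [0.05 * math.sin(2 * math.pi * n / 50) for n in range(len(playback))]
mic = [e + v for e, v in zip(echo, voice)]

cleaned = nlms_echo_cancel(playback, mic)
print("mic energy (last 1000 samples):    ", energy(mic[-1000:]))
print("cleaned energy (last 1000 samples):", energy(cleaned[-1000:]))
```

Because the filter needs the playback signal as its reference, this kind of cancellation only helps with audio the device itself is producing. That matches what I saw: music from a different source cannot be removed this way.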
The device status web interface was useful in tracking how well it was responding to voice input. It presents the inputs from both microphones, as well as the audio output.
The Raspberry Pi essentially acts as a router and recognizes the keyword. It also initializes the codec chip, which does all the heavy lifting when it comes to audio support. It is possible to develop a significantly lighter-weight solution if the device simply acts as an Amazon Echo Dot replacement.
Adding an application that matches an Alexa “skill” to the Raspberry Pi is one way to expand the capabilities of the system. There is plenty of space and horsepower, given the minimal overhead from the Cirrus Logic support. Implementing a feature as a skill has the advantage that the same action can also be triggered by other means, such as a matching app on a smartphone.
This type of enhancement is beyond the scope of the kit’s documentation and this article, but there are lots of resources available, including Amazon’s AVS SDK.
Using this “skill” approach assumes that the Alexa support is running locally and there is a link to the matching support in the cloud. Break the connection for any reason and the device can wind up being a brick.
One way to keep the device operational to some degree is to provide additional user interface options like buttons, or to implement local voice command recognition. This is one way the Raspberry Pi can shine: Cirrus Logic works with Sensory’s TrulyHandsfree Voice Control technology. The company can work with you to create a voice command set. These commands could then initiate local actions without requiring a network connection or even Alexa connectivity.
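A rough Python sketch of that fallback idea follows. It assumes a recognizer has already turned speech into a text phrase; the command set, the handler results, and the `send_to_cloud` callback are all hypothetical, and none of this reflects the Sensory or Cirrus Logic APIs. Local commands are checked first, so they keep working when the network link is down.

```python
from typing import Callable, Dict, Optional

# Hypothetical local command set; a real one would come from the recognizer vendor.
LOCAL_COMMANDS: Dict[str, Callable[[], str]] = {
    "volume up": lambda: "volume raised",
    "volume down": lambda: "volume lowered",
    "stop": lambda: "playback stopped",
}


def normalize(phrase: str) -> str:
    """Lowercase the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def handle_phrase(
    phrase: str,
    cloud_available: bool,
    send_to_cloud: Optional[Callable[[str], str]] = None,
) -> str:
    """Run a local action if one matches; otherwise defer to the cloud if possible."""
    key = normalize(phrase)
    if key in LOCAL_COMMANDS:
        return LOCAL_COMMANDS[key]()   # works with or without a network
    if cloud_available and send_to_cloud is not None:
        return send_to_cloud(key)      # hypothetical hand-off to the cloud service
    return "unavailable offline"


print(handle_phrase("  Volume UP ", cloud_available=False))
print(handle_phrase("play jazz", cloud_available=False))
```

With the link down, the first call still runs its local action, while the second reports that it cannot be served instead of leaving the device unresponsive.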
Many of the kits I have looked at require a good bit of expertise and patience to get operational. This kit was a joy to set up and use. There is a significant learning plateau to reach the ability to add an Alexa “skill,” and a much higher plateau to implement local voice commands, but both are possible.
Likewise, as noted, the kit targets low-cost, dual-microphone solutions that might be used in white goods such as washing machines, where the person is standing in front of or near the device. Cirrus Logic also offers support and reference designs that handle more microphones and that can include local voice command support on the codec.
If you are just investigating AVS, this is a great platform with which to get started.