As a young man, I yearned for a machine like the ship’s computer on Star Trek: a gadget that can listen to and obey a human voice, and answer in kind. Fifteen years ago, after reading about university researchers who had gotten voice-controlled artificial intelligence systems working, I taught myself Linux and set up a server in my attic in the hope that the technology had arrived to let me build such a thing myself. It had not.
But now voice control has come to the masses. Amazon’s Echo smart speaker was the hot holiday gift of 2016, and last year the company released the smaller Dot and Tap gadgets. Like the Echo, these tie into Amazon’s intelligent personal assistant, Alexa.
It is Alexa, running in the cloud, that converts your speech into text, interprets the text, and responds verbally, musically, or by passing commands to some other smart gadget such as a Wi-Fi–enabled lightbulb.
Alexa isn’t the only player in this game, of course: Apple and Google have their own speech-driven AIs. But unlike those companies, Amazon has placed a heavy emphasis on inviting tinkerers and developers to expand the uses for Alexa, in two ways.
First, the company showed programmers how to create new “skills” (voice-controlled apps) that Alexa can invoke, and it set up a section of its online store to distribute them. Within months, the store contained almost 10,000 skills, with hundreds being added each week. (Currently, all skills are free.)
Second, and more interesting to me, Amazon released programming interfaces for Alexa and uploaded free source code and tutorials to GitHub. Anyone can use these to make their own Echo-like gadget on hardware as inexpensive as a US $40 Raspberry Pi 3 equipped with a cheap USB microphone and speaker.
I resolved to build an Alexa Pi that could do all that an Echo can but also play music in stereo through better speakers. And for extra credit, I wanted to try to use the same hardware to make an intelligent speaker that doesn’t rely on Amazon at all.
A quick survey of user forums turned up a problem with my plan to use a cheap USB microphone: Alexa needs cleaner audio input than a single microphone can provide. The Echo uses seven microphones and sophisticated noise-cancellation circuitry to discern voice commands from across the room, even when music is playing.
Fortunately, audio-and-voice-tech company Conexant recently released the AudioSmart development kit, a board that includes two adjustable microphones, noise-cancellation hardware, and firmware preprogrammed to listen for the “Alexa” wake word. When the board hears the wake word, it sends a trigger signal to the Pi’s general-purpose input/output (GPIO) port to let the Pi know that it should start listening for a verbal command. Although the kit is aimed at development engineers (and pricey at $300), it can be reprogrammed to respond reliably to any wake word, unlike Amazon’s Echo and Dot, which offer only a choice of “Alexa,” “Echo,” “Amazon,” or “Computer” (the latter proving that Amazon engineers watch Star Trek, too).
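The handoff between the board and the Pi amounts to edge detection on a single digital line. The Python sketch below simulates it under stated assumptions: the trigger is treated as active-high (idle at 0, driven to 1 on the wake word), and the pin readings and the on_trigger handler are hypothetical, not taken from Conexant’s or Amazon’s code.

```python
# Hypothetical simulation of the wake-word handoff. Assumption: the board
# idles the trigger line at 0 and drives it to 1 when it hears the wake word.


def rising_edges(samples):
    """Return the sample indices where the line goes from 0 to 1."""
    return [
        i for i in range(1, len(samples))
        if samples[i - 1] == 0 and samples[i] == 1
    ]


def on_trigger(index):
    # Placeholder for the real work: record the spoken command and
    # send it to the speech service.
    return f"wake word detected at sample {index}: start capturing command"


if __name__ == "__main__":
    # Hypothetical readings of the trigger pin, one per polling interval.
    trigger_line = [0, 0, 1, 1, 0, 0, 0, 1, 0]
    for idx in rising_edges(trigger_line):
        print(on_trigger(idx))
```

On real hardware, the RPi.GPIO library’s add_event_detect() with GPIO.RISING can register a callback for the same kind of edge, so the Pi need not poll the pin in a loop.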
Following Amazon’s tutorial on GitHub, I had the AudioSmart connected to the Pi and the system responding to verbal commands within a day. I linked it to the Alexa app on my iPhone, chose some skills from the online store, and soon had it turning lights on and off in the bedroom and queuing up TV shows on my Plex media server.
The effect was pretty magical—except for one glaring weakness. My setup required a monitor and keyboard to run: By default, Amazon forces a user to authenticate the device with its servers by manually logging into an Amazon Web page, which then passes a “token” (a long string of characters) to a graphical-user-interface program running on the Pi. The token expires after a few hours.
Clearly I wasn’t going to set up a monitor and keyboard in the kitchen just to turn on the lights. There has to be a better way, I thought.
There is, but it turned out to be devilishly complicated. You can use a special Android app to generate a reusable token for the Alexa gadget that works even after rebooting. Amazon provides sample code for the app, but you have to configure, build, and run it yourself, using Android Studio. The documentation is sketchy and out of date. Many hours of work went into getting the app to run and to communicate successfully with the Pi, and then into configuring the Pi so that all the necessary pieces of software start in the right order at boot time.
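The essential trick is a credential that outlives a reboot. Here is a minimal Python sketch, assuming an OAuth 2.0-style arrangement in which a long-lived refresh token, obtained once through the companion app, is stored on the Pi and traded for short-lived access tokens as needed. The file name, the token fields, the one-hour lifetime, and the refresh_access_token() stub are all hypothetical; the stub stands in for the real network call to Amazon’s servers.

```python
import json
import time
from pathlib import Path

# Hypothetical sketch: a long-lived refresh token is kept on disk and traded
# for short-lived access tokens. refresh_access_token() is a stand-in stub,
# not Amazon's API.

TOKEN_FILE = Path("alexa_tokens.json")  # hypothetical location


def refresh_access_token(refresh_token):
    """Stub for the network call to the authorization server."""
    return {"access_token": "new-access-" + refresh_token[-4:], "expires_in": 3600}


def load_tokens(path=TOKEN_FILE):
    # Reading tokens back from disk is what lets the device survive a reboot.
    return json.loads(path.read_text())


def save_tokens(tokens, path=TOKEN_FILE):
    path.write_text(json.dumps(tokens))


def get_access_token(tokens, now=None, margin=60):
    """Return a valid access token, refreshing it if it has (nearly) expired."""
    now = time.time() if now is None else now
    expired = now >= tokens.get("expires_at", 0) - margin
    if tokens.get("access_token") is None or expired:
        fresh = refresh_access_token(tokens["refresh_token"])
        tokens["access_token"] = fresh["access_token"]
        tokens["expires_at"] = now + fresh["expires_in"]
    return tokens["access_token"]


if __name__ == "__main__":
    tokens = {"refresh_token": "long-lived-abcd"}
    print(get_access_token(tokens, now=1000))
    save_tokens(tokens)
    print(load_tokens() == tokens)
```

Refreshing a minute before the recorded expiry (the margin parameter) avoids sending a request with a token that lapses in transit.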
At last, however, I was able to unplug the monitor and keyboard, boot up the Pi, and say “Alexa, tell me a joke.”
“What did the dog say after a long day at work?” Alexa responded: “Today was ruff.” You’re telling me.
Searching for a simpler route, I came across Mycroft AI, a startup in Lawrence, Kan., that has created a completely open-source alternative to Alexa. Even the hardware designs for its Echo-like product, called Mark 1, are free to download and build or modify yourself. I grabbed the Picroft disk image and copied it to a microSD card, which I stuck into my Pi. The Pi booted right up and started running the AI. (I did need to change a few files to make the system work with the AudioSmart board.)
Mycroft’s system is still in its early days and has only a fraction of the skills that Alexa offers. It is much more flexible, however: You can set it to use IBM’s Watson to convert your verbal commands to text and use Google Voice to talk to you, for example. Creating a new skill is as easy as writing a few dozen lines of Python code. The vast universe of open-source Linux software is there for the combining—so that your AI can boldly go where no AI has gone before.
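To give a sense of what a skill boils down to, here is a framework-free Python sketch: match the transcribed text against keywords, then run a handler that returns a sentence to speak. It is not Mycroft’s skill API, which uses its own base class and intent parsers such as Adapt and Padatious; the skill decorator, the keywords, and the handlers here are hypothetical.

```python
import re

# Hypothetical registry of (required keywords, handler) pairs.
SKILLS = []


def skill(*keywords):
    """Register a handler that fires when every keyword appears in the utterance."""
    def register(handler):
        SKILLS.append((frozenset(keywords), handler))
        return handler
    return register


@skill("lights", "on")
def lights_on(utterance):
    return "Turning the lights on."


@skill("lights", "off")
def lights_off(utterance):
    return "Turning the lights off."


@skill("joke")
def tell_joke(utterance):
    return "What did the dog say after a long day at work? Today was ruff."


def handle(utterance):
    """Route transcribed speech to the first skill whose keywords all match."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for keywords, handler in SKILLS:
        if keywords <= words:
            return handler(utterance)
    return "Sorry, I don't know how to help with that."


if __name__ == "__main__":
    for text in ["Turn the lights on", "Hey Mycroft, tell me a joke", "Open the pod bay doors"]:
        print(handle(text))
```

First-match keyword routing keeps the sketch short; a real intent parser typically also ranks competing matches and extracts parameters, such as which room’s lights to switch.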
This article appears in the May 2017 print issue as “Build Your Own Amazon Echo.”